-# embedding-atlas
+# Embedding Atlas
 
-A Python package that provides a command line tool to visualize a dataset with embeddings.
+A Python package that provides a command line tool to visualize a dataset with embeddings. It also includes a Jupyter widget and a Streamlit widget.
 
-## Development Setup
+- Documentation: https://apple.github.io/embedding-atlas
+- GitHub: https://github.com/apple/embedding-atlas
 
-Install the [uv](https://github.com/astral-sh/uv) package manager.
+## Installation
 
-To launch the command line tool, run it with `uv run`.
+```bash
+pip install embedding-atlas
+```
+
+and then launch the command line tool:
+
+```bash
+embedding-atlas [OPTIONS] INPUTS...
+```
+
+## Loading Data
+
+You can load your data in two ways: locally or from Hugging Face.
+
+### Loading Local Data
+
+To get started with your own data, run:
+
+```bash
+embedding-atlas path_to_dataset.parquet
+```
+
+### Loading Hugging Face Data
+
+You can instead load datasets from Hugging Face:
+
+```bash
+embedding-atlas huggingface_org/dataset_name
+```
+
+## Visualizing Embedding Projections
+
+To visualize embedding projections, pre-compute the X and Y coordinates, and specify the column names with `--x` and `--y`, such as:
 
 ```bash
-uv run embedding-atlas --help
+embedding-atlas path_to_dataset.parquet --x projection_x --y projection_y
 ```
 
-Run `./start.sh` to launch a test server with a online dataset.
+You may use the [SentenceTransformers](https://sbert.net/) package to compute high-dimensional embeddings from text data, and then use the [UMAP](https://umap-learn.readthedocs.io/en/latest/index.html) package to compute 2D projections.
+
+You may also specify a column for pre-computed nearest neighbors:
+
+```bash
+embedding-atlas path_to_dataset.parquet --x projection_x --y projection_y --neighbors neighbors
+```
 
-To build the wheel, run `./build.sh` to build the wheel.
+The `neighbors` column should have values in the following format: `{"ids": [id1, id2, ...], "distances": [d1, d2, ...]}`.
+If this column is specified, you'll be able to see nearest neighbors for a selected point in the tool.
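The new README text specifies the shape of a `neighbors` value but not how to compute one. Below is a minimal, dependency-free sketch that produces per-row dicts in that `{"ids": [...], "distances": [...]}` shape by brute-force search. Several details are assumptions, not stated in the commit: distances are Euclidean, `ids` are 0-based row positions, and a row is never listed as its own neighbor. `nearest_neighbors` is a hypothetical helper, not part of embedding-atlas.

```python
import math
from typing import Dict, List, Sequence


def nearest_neighbors(
    vectors: Sequence[Sequence[float]], k: int
) -> List[Dict[str, list]]:
    """Brute-force k-nearest-neighbor search (hypothetical helper).

    Returns one dict per input row shaped like
    {"ids": [...], "distances": [...]}, sorted by increasing distance.
    Assumptions: Euclidean distance, "ids" are 0-based row positions,
    and each row excludes itself from its own neighbor list.
    """
    results = []
    for i, query in enumerate(vectors):
        # Distance from row i to every other row: O(n^2) overall,
        # which is fine for small datasets only.
        candidates = [
            (math.dist(query, other), j)
            for j, other in enumerate(vectors)
            if j != i
        ]
        candidates.sort()  # by distance, ties broken by row position
        top = candidates[:k]
        results.append({
            "ids": [j for _, j in top],
            "distances": [d for d, _ in top],
        })
    return results


if __name__ == "__main__":
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
    rows = nearest_neighbors(points, k=2)
    print(rows[0])  # {'ids': [1, 2], 'distances': [1.0, 2.0]}
```

In practice the neighbors would usually be computed on the high-dimensional embeddings rather than on the 2D projection, and a real dataset would call for an approximate nearest-neighbor index instead of this quadratic loop. The resulting list could then be stored as a column of the Parquet file passed to `--neighbors`; whether the tool expects a struct column or a JSON-encoded string is not stated here, so check the documentation.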