Embeddings turn text into numerical vectors that capture meaning. They are the foundation for RAG pipelines, semantic search, recommendations, and clustering. Eden AI exposes an OpenAI-compatible embeddings endpoint that works the same way across all supported providers — pick a model, send text, get vectors.

Documentation Index
Fetch the complete documentation index at: https://www.edenai.co/docs/llms.txt
Use this file to discover all available pages before exploring further.
What are embeddings?
An embedding is a list of floating-point numbers — a vector — that represents a piece of text in a high-dimensional space. The model is trained so that texts with similar meaning land close together in that space, and unrelated texts land far apart. Concretely, “cat” and “kitten” produce nearly identical vectors. “Cat” and “airplane” produce vectors that point in very different directions. The distance between two vectors (usually measured with cosine similarity) is a numerical proxy for how related the two texts are.

You generate embeddings once for your corpus, store the vectors, and then compare new query vectors against the stored ones at lookup time. That’s the entire shape of semantic search and RAG.

Use cases
- Retrieval-Augmented Generation (RAG) — embed your docs, retrieve the most relevant chunks for a user question, feed them into an LLM.
- Semantic search — match queries against documents by meaning, not keywords.
- Recommendations — suggest similar products, articles, or songs based on description vectors.
- Clustering and topic discovery — group thousands of texts by meaning without labels.
- Deduplication — find near-duplicates that don’t share exact wording.
- Anomaly detection — flag inputs that look unlike anything in your corpus.
- Classification — train a small classifier on top of frozen embeddings instead of fine-tuning a full model.
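The cosine-similarity comparison described under "What are embeddings?" can be sketched in plain Python. The vectors below are made-up three-dimensional toys; real models return hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for illustration only.
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
airplane = [0.0, 0.2, 0.95]

# Related texts score close to 1.0; unrelated texts score near 0.
assert cosine_similarity(cat, kitten) > cosine_similarity(cat, airplane)
```

Identical directions score 1.0, orthogonal directions score 0.0; everything between is a graded notion of relatedness.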
Endpoints
Model ids use the provider/model format — the same format used everywhere else in V3.
List available models
/v3/embeddings/models returns each model's id, owned_by, context_length, pricing, capabilities, and regions. Use any id as the model field below.
Create embeddings
The example picks a model from the catalog at runtime so the snippet never goes stale.

Worked example: semantic search
This is the smallest end-to-end example that demonstrates the full retrieval pattern: embed a query and a small corpus in one batched call, score with cosine similarity, return the top matches.
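A sketch of that pattern in plain Python. The scoring logic runs as-is; the batched API call is commented out and assumes a hypothetical embed_batch helper wrapping POST /v3/embeddings:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_matches(query_vec, corpus_vecs, corpus, k=2):
    # Score every stored vector against the query; highest cosine first.
    scored = sorted(
        zip((cosine(query_vec, v) for v in corpus_vecs), corpus), reverse=True
    )
    return [text for _, text in scored[:k]]

corpus = [
    "How to reset your password",
    "Quarterly revenue report",
    "Troubleshooting login failures",
]
query = "I can't sign in to my account"

# One batched call embeds the query and the corpus together; data[i] matches input[i].
# embed_batch is a hypothetical helper around POST /v3/embeddings:
#   vectors = embed_batch(model_id, [query] + corpus)
#   query_vec, corpus_vecs = vectors[0], vectors[1:]
#   print(top_matches(query_vec, corpus_vecs, corpus))
```

With real embeddings, the login-related documents should outscore the revenue report for this query.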
Each response also reports a cost field. Swap in your own corpus, persist corpus_vecs to a vector database, and you have a working RAG retriever.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | yes | provider/model from /v3/embeddings/models. |
| input | string \| string[] \| int[] \| int[][] | yes | Text to embed, or a batch. Pre-tokenized integer inputs are also accepted. |
| encoding_format | "float" \| "base64" | no | Defaults to "float". "base64" reduces wire payload size. |
| dimensions | integer | no | Truncate the output vector. Only supported by models that advertise it (3-series). |
| user | string | no | End-user identifier for abuse tracking. |
| metadata | object | no | Eden extension. Free-form metadata stored with the request. |
Response
provider and cost are Eden extensions on top of the OpenAI shape. When encoding_format is "base64", each embedding is a base64-encoded string instead of a list.
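A decoding sketch for base64 responses, assuming the packed little-endian float32 layout that OpenAI uses for base64 embeddings (the layout is an assumption here, not stated on this page):

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    # Assumed layout: raw little-endian float32 bytes, base64-encoded.
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with a made-up vector (values chosen to be exact in float32).
vec = [0.5, -1.25, 2.0]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
assert decode_embedding(encoded) == vec
```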
Batching
Pass a list to input to embed multiple strings in one call. Items in data keep their index matching the input order — that’s the property the worked example relies on. Batching is significantly cheaper and faster than one call per string.
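The index property can be sketched with a mocked response (the response shape is the OpenAI-compatible one assumed throughout this page; the vectors are made up and shortened for clarity):

```python
texts = ["first", "second", "third"]

# Mocked /v3/embeddings response for input=texts: one item per input string,
# each carrying the index of the string it embeds.
response = {
    "data": [
        {"index": 0, "embedding": [0.1, 0.2]},
        {"index": 1, "embedding": [0.3, 0.4]},
        {"index": 2, "embedding": [0.5, 0.6]},
    ]
}

# Sort by index before pairing, so the mapping is correct even if items
# were ever delivered out of order.
vectors = {item["index"]: item["embedding"] for item in response["data"]}
by_text = {texts[i]: vec for i, vec in sorted(vectors.items())}
assert by_text["second"] == [0.3, 0.4]
```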
Choosing a model
There is no single best embedding model — picking one is a tradeoff between quality, dimension count, context length, language coverage, and price.

- Small / fast / cheap (e.g. *-small variants) — good default for most semantic-search workloads. Lower latency, lower cost per token, vectors are smaller so storage and dot products are faster.
- Large / higher quality (e.g. *-large variants) — meaningfully better recall on hard retrieval tasks (long technical docs, multilingual corpora). Costs more and produces larger vectors.
- Context length — long-context embedding models let you embed entire documents without chunking. Most 8k-context models still need chunking for paragraphs above ~2k tokens.
- Dimensions — some 3-series models support a dimensions parameter to truncate outputs. Smaller vectors save storage and speed up similarity search at a small recall cost.
- Multilingual — verify language support in the model’s capabilities before using it for non-English corpora.
Errors
Common HTTP status codes returned by /v3/embeddings:
| Status | Meaning |
|---|---|
| 400 | Invalid request — usually a missing model, malformed input, or unsupported parameter. |
| 401 | Missing or invalid Authorization: Bearer token. |
| 402 | Insufficient credits. Top up from the dashboard. |
| 404 | The model id does not exist or is not enabled on your account. |
| 429 | Rate limit exceeded — back off and retry, or switch to a model with more headroom. |
| 5xx | Upstream provider error. Configure a fallback to route to a backup model. |
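One way to handle 429s is exponential backoff, sketched here with a generic callable so it works with any HTTP client. RateLimited is a stand-in exception; the concrete error type, retry counts, and delays depend on your client and workload:

```python
import time

class RateLimited(Exception):
    """Stand-in for whatever your HTTP client raises on a 429."""

def with_backoff(call, max_retries=3, base_delay=1.0):
    # Exponential backoff: wait base_delay, then 2x, then 4x, ... between attempts.
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage: with_backoff(lambda: post_embeddings(payload))
```

For sustained limits, prefer batching (fewer calls for the same work) or a model with more headroom over ever-longer retry loops.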
Best practices
- Smart routing is not supported on embeddings. Always pass a concrete provider/model, not @edenai/.... See Smart routing for which endpoints support it.
- Compare vectors with cosine similarity. It is the standard distance for embedding spaces. Normalize once at write time so retrieval is a single dot product.
- Re-index when you change models. Vectors from different models are not compatible — store (text, embedding, model_id) together and re-embed if you switch.
- Cache embeddings. They are deterministic for the same (model, input) pair, so caching by hash avoids re-billing for unchanged content.
- Chunk before embedding long documents. Most models cap at 8k tokens; for retrieval, paragraph-sized chunks (~200–500 tokens) generally outperform whole-document embeddings.
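The caching advice can be sketched with a content-addressed key over the (model, input) pair. The in-memory dict here is a placeholder; in production you would swap in Redis or a database table, and embed_fn stands in for your actual API call:

```python
import hashlib

_cache: dict[str, list[float]] = {}

def cache_key(model: str, text: str) -> str:
    # Hash both parts with a separator so ("a", "bc") and ("ab", "c") cannot collide.
    h = hashlib.sha256()
    h.update(model.encode())
    h.update(b"\x00")
    h.update(text.encode())
    return h.hexdigest()

def embed_cached(model: str, text: str, embed_fn) -> list[float]:
    key = cache_key(model, text)
    if key not in _cache:
        _cache[key] = embed_fn(model, text)  # only bill when the content is new
    return _cache[key]
```

Re-embedding a whole corpus after small edits then only pays for the changed documents; remember to key the cache by model id so a model switch naturally triggers re-embedding.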
Related
- List LLM models — discover all chat-completion models.
- Smart routing — automatic provider selection (chat only).
- Fallback — route to a backup model on errors.
- Plans & pricing — credits, budgets, and per-call costs.
- OpenAI SDK (Python) — call /v3/embeddings through the OpenAI client.