Embeddings turn text into numerical vectors that capture meaning. They are the foundation for RAG pipelines, semantic search, recommendations, and clustering. Eden AI exposes an OpenAI-compatible embeddings endpoint that works the same way across all supported providers — pick a model, send text, get vectors.

What are embeddings?

An embedding is a list of floating-point numbers — a vector — that represents a piece of text in a high-dimensional space. The model is trained so that texts with similar meaning land close together in that space, and unrelated texts land far apart. Concretely, “cat” and “kitten” produce nearly identical vectors. “Cat” and “airplane” produce vectors that point in very different directions. The distance between two vectors (usually measured with cosine similarity) is a numerical proxy for how related the two texts are. You generate embeddings once for your corpus, store the vectors, and then compare new query vectors against the stored ones at lookup time. That’s the entire shape of semantic search and RAG.
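
To make the distance idea concrete, here is a minimal sketch of cosine similarity between two vectors. The three-dimensional values are made up for illustration; real embedding models produce hundreds or thousands of dimensions.

import numpy as np

# Toy 3-dimensional "embeddings", illustrative values only.
cat = np.array([0.9, 0.1, 0.0])
kitten = np.array([0.85, 0.15, 0.05])
airplane = np.array([0.0, 0.2, 0.95])

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(cat, kitten))    # ~0.996: similar meaning, vectors nearly parallel
print(cosine(cat, airplane))  # ~0.023: unrelated, vectors nearly orthogonal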

Use cases

  • Retrieval-Augmented Generation (RAG) — embed your docs, retrieve the most relevant chunks for a user question, feed them into an LLM.
  • Semantic search — match queries against documents by meaning, not keywords.
  • Recommendations — suggest similar products, articles, or songs based on description vectors.
  • Clustering and topic discovery — group thousands of texts by meaning without labels.
  • Deduplication — find near-duplicates that don’t share exact wording.
  • Anomaly detection — flag inputs that look unlike anything in your corpus.
  • Classification — train a small classifier on top of frozen embeddings instead of fine-tuning a full model (see the sketch after this list).
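
For the classification case, a minimal sketch with scikit-learn, using toy stand-in vectors; in practice the vectors come from /v3/embeddings, as shown later on this page.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for real embeddings: two classes of 4-dimensional vectors.
train_vectors = np.array([
    [0.90, 0.10, 0.00, 0.05], [0.80, 0.20, 0.10, 0.00],  # "billing" texts
    [0.00, 0.10, 0.90, 0.80], [0.10, 0.00, 0.80, 0.90],  # "technical" texts
])
train_labels = ["billing", "billing", "technical", "technical"]

# The embedding model stays frozen; only this small classifier is trained.
clf = LogisticRegression(max_iter=1000).fit(train_vectors, train_labels)

# Classify new text by embedding it with the same model, then predicting.
print(clf.predict([[0.85, 0.15, 0.05, 0.00]]))  # -> ['billing']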

Endpoints

GET  /v3/embeddings/models     List available embedding models
POST /v3/embeddings            Create embeddings
Models are identified as provider/model — the same format used everywhere else in V3.

List available models

import requests

response = requests.get(
    "https://api.edenai.run/v3/embeddings/models",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)

for model in response.json()["data"]:
    print(model["id"], "-", model.get("context_length"))
Each item exposes id, owned_by, context_length, pricing, capabilities, and regions. Use any id as the model field below.
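
For orientation, a catalog item has roughly this shape. The values below are illustrative placeholders, and the exact layout of the nested fields may differ:

{
  "id": "<provider>/<model>",
  "owned_by": "<provider>",
  "context_length": 8192,
  "pricing": { "...": "..." },
  "capabilities": ["..."],
  "regions": ["..."]
}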

Create embeddings

The example picks a model from the catalog at runtime so the snippet never goes stale.
import requests

headers = {"Authorization": "Bearer YOUR_API_KEY"}

model_id = requests.get(
    "https://api.edenai.run/v3/embeddings/models",
    headers=headers,
).json()["data"][0]["id"]

response = requests.post(
    "https://api.edenai.run/v3/embeddings",
    headers={**headers, "Content-Type": "application/json"},
    json={
        "model": model_id,
        "input": "The quick brown fox jumps over the lazy dog",
    },
).json()

vector = response["data"][0]["embedding"]
print(f"{model_id}: {len(vector)} dimensions, cost=${response['cost']}")

Semantic search example

This is the smallest end-to-end example of the full retrieval pattern: embed a query and a small corpus in one batched call, score with cosine similarity, and return the top matches.
import requests
import numpy as np

API_KEY = "YOUR_API_KEY"
headers = {"Authorization": f"Bearer {API_KEY}"}

# 1. Pick a model from the catalog.
model_id = requests.get(
    "https://api.edenai.run/v3/embeddings/models",
    headers=headers,
).json()["data"][0]["id"]

# 2. Define a query and a corpus of documents to search over.
query = "How do I track my API costs?"
corpus = [
    "Eden AI returns a `cost` field on every response so you can track spend per call.",
    "You can cap spending by creating custom API keys with a per-token budget.",
    "Smart routing with `@edenai` applies to chat/completions, not to embeddings.",
    "The /v3/models endpoint lists every available chat-completions model.",
    "To upload files for vision-capable LLMs, use the /v3/upload endpoint.",
]

# 3. Embed query + corpus in a single batched call. Eden returns vectors in input order.
payload = {"model": model_id, "input": [query, *corpus]}
response = requests.post(
    "https://api.edenai.run/v3/embeddings",
    headers={**headers, "Content-Type": "application/json"},
    json=payload,
).json()

vectors = np.array([item["embedding"] for item in response["data"]])
query_vec, corpus_vecs = vectors[0], vectors[1:]

# 4. Cosine similarity = dot product of L2-normalized vectors.
def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

scores = normalize(corpus_vecs) @ normalize(query_vec)
ranking = np.argsort(scores)[::-1]

print(f"Embedded {len(corpus) + 1} texts for ${response['cost']:.6f}\n")
for rank, idx in enumerate(ranking[:3], start=1):
    print(f"{rank}. ({scores[idx]:.3f}) {corpus[idx]}")
The first hit should be the document about the cost field. Swap in your own corpus, persist corpus_vecs to a vector database, and you have a working RAG retriever.

Request body

Field            Type                                 Required  Description
model            string                               yes       provider/model from /v3/embeddings/models.
input            string | string[] | int[] | int[][]  yes       Text to embed, or a batch. Pre-tokenized integer inputs are also accepted.
encoding_format  "float" | "base64"                   no        Defaults to "float". "base64" reduces wire payload size.
dimensions       integer                              no        Truncate the output vector. Only supported by models that advertise it (3-series).
user             string                               no        End-user identifier for abuse tracking.
metadata         object                               no        Eden extension. Free-form metadata stored with the request.
Unknown top-level fields are forwarded to the underlying provider, so provider-specific options can be passed through unchanged.
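
As a sketch, a pass-through might look like the payload below. input_type is a hypothetical provider-specific option used only for illustration; check the underlying provider's documentation for the options it actually accepts.

payload = {
    "model": model_id,
    "input": "hello world",
    # Hypothetical provider-specific option, forwarded unchanged by Eden AI.
    "input_type": "query",
}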

Response

{
  "object": "list",
  "model": "<provider>/<model>",
  "provider": "<provider>",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, "..."]
    }
  ],
  "usage": { "prompt_tokens": 9, "total_tokens": 9 },
  "cost": 0.0000012
}
provider and cost are Eden extensions on top of the OpenAI shape. When encoding_format is "base64", each embedding is a base64-encoded string instead of a list.
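
To turn a "base64" response back into numbers, here is a minimal sketch, assuming the float32 little-endian byte layout that OpenAI-compatible APIs use for base64 embeddings:

import base64
import numpy as np

# `response` is the parsed JSON of a request made with encoding_format="base64".
b64_string = response["data"][0]["embedding"]
vector = np.frombuffer(base64.b64decode(b64_string), dtype="<f4")  # little-endian float32
print(len(vector), "dimensions")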

Batching

Pass a list to input to embed multiple strings in one call. Items in data keep their index matching the input order — that’s the property the worked example relies on. Batching is significantly cheaper and faster than one call per string.
payload = {
    "model": model_id,
    "input": ["first sentence", "second sentence", "third sentence"],
}
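
Because data[i] carries index i, you can zip the response straight back onto the inputs, for example:

response = requests.post(
    "https://api.edenai.run/v3/embeddings",
    headers={**headers, "Content-Type": "application/json"},
    json=payload,
).json()

# Each item lines up with the input at the same position.
vectors = {
    text: item["embedding"]
    for text, item in zip(payload["input"], response["data"])
}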

Choosing a model

There is no single best embedding model — picking one is a tradeoff between quality, dimension count, context length, language coverage, and price.
  • Small / fast / cheap (e.g. *-small variants) — good default for most semantic-search workloads. Lower latency, lower cost per token, vectors are smaller so storage and dot products are faster.
  • Large / higher quality (e.g. *-large variants) — meaningfully better recall on hard retrieval tasks (long technical docs, multilingual corpora). Costs more and produces larger vectors.
  • Context length — long-context embedding models let you embed entire documents without chunking; even with an 8k-token cap, though, retrieval quality still tends to improve when documents longer than ~2k tokens are chunked.
  • Dimensions — some 3-series models support a dimensions parameter to truncate outputs. Smaller vectors save storage and speed up similarity search at a small recall cost (see the sketch below).
  • Multilingual — verify language support in the model’s capabilities before using it for non-English corpora.
A common pattern: prototype with a small model, then A/B test a larger model on your evaluation set before committing to the storage cost.
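
For models that support truncation, the request is just one extra field, as sketched below; model_id must be a model whose catalog entry advertises dimensions support.

payload = {
    "model": model_id,
    "input": "shorter vectors, cheaper storage",
    "dimensions": 256,  # truncate the output vector to 256 dimensions
}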

Errors

Common HTTP status codes returned by /v3/embeddings:
Status  Meaning
400     Invalid request — usually a missing model, malformed input, or unsupported parameter.
401     Missing or invalid Authorization: Bearer token.
402     Insufficient credits. Top up from the dashboard.
404     The model id does not exist or is not enabled on your account.
429     Rate limit exceeded — back off and retry, or switch to a model with more headroom.
5xx     Upstream provider error. Configure a fallback to route to a backup model.

Best practices

  • Smart routing is not supported on embeddings. Always pass a concrete provider/model, not @edenai/.... See Smart routing for which endpoints support it.
  • Compare vectors with cosine similarity. It is the standard distance for embedding spaces. Normalize once at write time so retrieval is a single dot product.
  • Re-index when you change models. Vectors from different models are not compatible — store (text, embedding, model_id) together and re-embed if you switch.
  • Cache embeddings. They are deterministic for the same (model, input) pair, so caching by hash avoids re-billing for unchanged content (see the sketch after this list).
  • Chunk before embedding long documents. Most models cap at 8k tokens; for retrieval, paragraph-sized chunks (~200–500 tokens) generally outperform whole-document embeddings.
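
A minimal sketch of hash-based caching, using an in-memory dict for illustration; swap in Redis or a database for real workloads:

import hashlib
import requests

headers = {"Authorization": "Bearer YOUR_API_KEY"}
_cache: dict[str, list[float]] = {}

def embed_cached(model: str, text: str) -> list[float]:
    # Embeddings are deterministic per (model, input), so hash that pair.
    key = hashlib.sha256(f"{model}\x00{text}".encode()).hexdigest()
    if key not in _cache:
        response = requests.post(
            "https://api.edenai.run/v3/embeddings",
            headers={**headers, "Content-Type": "application/json"},
            json={"model": model, "input": text},
        ).json()
        _cache[key] = response["data"][0]["embedding"]
    return _cache[key]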