Introduction
When working with the Eden AI API, the currency of your interactions isn't just data or results: it's tokens. But what exactly are tokens, and how do they relate to the characters in your text? This guide provides a practical understanding of LLM token billing and explains how various providers handle this crucial aspect of AI model usage.
The Challenge of LLM Token Billing
Large language model (LLM) token billing is at the heart of AI-powered applications. In this context, a “token” is not an arbitrary unit of measurement but a unit of text defined by the model's tokenizer. Tokens can be as small as a single character or as large as an entire word, depending on the language and context. This variability is what makes understanding LLM billing both challenging and fascinating.
With Eden AI, tokenization is handled directly by the model behind each provider, so the same text can yield different token counts depending on the provider you choose.
From Characters to Tokens: A Journey through Tokenization
Consider the sentence "Hello, World!". It has 13 characters, counting the space and punctuation. But how many tokens is it? This depends on the tokenization method used by the AI model. Some models might see "Hello" as one token and ", World!" as three tokens (for instance ",", " World", and "!"), making it four tokens in total. Other models might break it down differently.
For example, OpenAI's GPT models use Byte Pair Encoding (BPE) tokenization. BPE starts from individual bytes (or characters in textual data) and iteratively merges the most frequently occurring adjacent pair into a single new token that is added to the vocabulary. This method is efficient for handling both common and rare words, and it often leads to tokens that correspond to common subwords in a language.
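The merge step described above can be illustrated with a toy sketch. This is not the tokenizer any provider actually ships: real BPE tokenizers learn their merge table from a large corpus and then apply it as a fixed vocabulary, whereas this example learns merges from the input string itself. The function names are hypothetical.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair of tokens, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def toy_bpe(text, num_merges):
    """Start from single characters and apply up to `num_merges` merges."""
    tokens = list(text)
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
    return tokens


if __name__ == "__main__":
    text = "aaabdaaabac"
    tokens = toy_bpe(text, num_merges=3)
    print(f"{len(text)} characters -> {len(tokens)} tokens: {tokens}")
```

On this small input, three merges reduce 11 characters to 5 tokens, which shows why frequently repeated character sequences end up costing fewer tokens than rare ones.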
LLM Token Billing: Per-Request vs. Per-Token
LLMs can be billed either per request or per token. A per-request billing model charges a flat fee for each API call, regardless of the amount of data processed. A per-token model, on the other hand, charges based on the number of tokens processed.
Per-token billing can be more cost-effective for tasks that require processing small amounts of text, while per-request billing might be more economical for tasks that require processing large amounts of text in a single call.
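The trade-off can be made concrete with a small calculation. The prices below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not taken from any provider's price list.

```python
def per_request_cost(num_requests, fee_per_request):
    """Flat-fee model: the cost depends only on the number of API calls."""
    return num_requests * fee_per_request


def per_token_cost(total_tokens, price_per_1k_tokens):
    """Usage-based model: the cost scales with the tokens processed."""
    return total_tokens / 1000 * price_per_1k_tokens


def break_even_tokens(fee_per_request, price_per_1k_tokens):
    """Tokens per request at which both billing models cost the same."""
    return fee_per_request / price_per_1k_tokens * 1000


if __name__ == "__main__":
    FEE_PER_REQUEST = 0.01  # hypothetical flat fee per call
    PRICE_PER_1K = 0.002    # hypothetical price per 1K tokens

    threshold = break_even_tokens(FEE_PER_REQUEST, PRICE_PER_1K)
    print(f"Break-even at about {threshold:.0f} tokens per request")

    for tokens in (500, 20_000):
        flat = per_request_cost(1, FEE_PER_REQUEST)
        usage = per_token_cost(tokens, PRICE_PER_1K)
        cheaper = "per-token" if usage < flat else "per-request"
        print(f"{tokens} tokens: flat={flat:.4f}, usage={usage:.4f} -> {cheaper}")
```

With these assumed prices, calls below roughly 5,000 tokens are cheaper under per-token billing, and larger calls are cheaper under the flat fee.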
Unraveling the Token Count in Different Languages
Language is a beautiful, complex system, and it adds an extra layer of complexity to token billing. Different languages have different tokenization patterns, which can lead to different token counts for the same content.
Consider the word “communication”. In English, this is a single word with 13 characters. But how many tokens is it? It could be broken down into "commun" and "ication", making it two tokens. In French, the word is also spelled “communication”, so it is likely to produce the same number of tokens. But in German, it’s “Kommunikation”, which might be tokenized differently.
In logographic scripts such as Chinese, a single character can represent a whole word or concept, so one character carries more information than one English letter. That does not automatically translate into fewer tokens: tokenizers trained mostly on English text often split text in non-Latin scripts, such as Chinese or Arabic, into more tokens than the equivalent English. Similarly, in highly agglutinative languages like Finnish or Turkish, a single word can carry the meaning of an entire English phrase, but such long words are usually split into several subword tokens. The token count for the same content can therefore differ noticeably from one language to another.
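One rough way to see why scripts behave differently is to compare character counts with UTF-8 byte counts. This is only a proxy under an assumption: for byte-level tokenizers, which start from UTF-8 bytes, the byte count is an upper bound on the token count, and the actual number depends on the merges the tokenizer has learned. The Chinese and Arabic words are illustrative translations of "communication".

```python
# Illustrative samples; the non-English words are approximate translations.
SAMPLES = {
    "English": "communication",
    "German": "Kommunikation",
    "Chinese": "沟通",
    "Arabic": "تواصل",
}


def text_stats(text):
    """Return the number of characters and UTF-8 bytes in `text`."""
    return {"characters": len(text), "utf8_bytes": len(text.encode("utf-8"))}


if __name__ == "__main__":
    for language, word in SAMPLES.items():
        stats = text_stats(word)
        print(f"{language}: {stats['characters']} characters, "
              f"{stats['utf8_bytes']} UTF-8 bytes")
```

Latin-script words use one byte per character here, while the Chinese and Arabic samples use two or three bytes per character, so a byte-level tokenizer has more raw units to merge before it reaches a compact token sequence.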
How Eden AI Handles Token Billing
At Eden AI, we've built a system that takes into account the nuances of token billing across different providers and languages. Our API tracks the tokens used for your text generation tasks and bills you accordingly.
In our system, we’ve made the following design decisions to ensure fair and transparent billing:
- Token-Based Billing: We bill based on the number of tokens processed by the LLM, providing a fair reflection of the computational resources used.
- Transparency: We provide detailed billing information so that you always know exactly what you’re being billed for.
- Provider Flexibility: We support multiple LLM providers, each with their own tokenization methods. Our system handles the complexities of different tokenization methods, so you don’t have to.
LLM Token-Based Billing Comparison between Providers
Let’s take a look at some of the major LLM providers and compare their token-based billing. Note that these figures are approximate and can change, so always check with the provider for the most up-to-date pricing information.
OpenAI GPT-3.5
OpenAI’s GPT-3.5 family includes several models with different capabilities and pricing. Billing is per token, and the rate depends on the model. For instance, “gpt-3.5-turbo”, one of the most capable and efficient models in the family, is priced per 1K tokens, with separate rates for “input” and “output” tokens; output tokens cost more.
OpenAI GPT-4
OpenAI’s GPT-4 follows a similar per-token pricing model, but it’s typically more expensive due to its enhanced capabilities. Like GPT-3.5, it differentiates between “input” and “output” tokens, with output tokens costing more.
Anthropic Claude 3
Anthropic’s Claude 3 also uses a per-token billing model, though details can vary based on the specific model and its capabilities. It’s always advisable to check Anthropic’s official documentation for the most up-to-date pricing information.
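Because input and output tokens can be priced separately, the cost of a single call is the sum of two terms. The sketch below shows that arithmetic; the `Pricing` class and the rates are hypothetical placeholders, not actual provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-1K-token rates for one model."""
    input_per_1k: float
    output_per_1k: float


def request_cost(prompt_tokens, completion_tokens, pricing):
    """Cost of one call when input and output tokens have separate rates."""
    input_cost = prompt_tokens / 1000 * pricing.input_per_1k
    output_cost = completion_tokens / 1000 * pricing.output_per_1k
    return input_cost + output_cost


if __name__ == "__main__":
    example = Pricing(input_per_1k=0.5, output_per_1k=1.5)  # placeholder rates
    cost = request_cost(prompt_tokens=1200, completion_tokens=300, pricing=example)
    print(f"Estimated cost: {cost:.4f}")
```

Since output tokens carry the higher rate, capping the length of completions is often the most direct way to control spending.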
How to Count Tokens when using the Eden AI API for Text Generation?
Understanding how to count tokens when using the Eden AI API for text generation is essential for managing costs and ensuring your prompts are effective.
When you make a request to the text generation endpoint, you’ll receive usage information in the response, including the number of tokens used for both the prompt and the completion. This information is detailed in the text generation API documentation.
Here’s a step-by-step guide for Python users:
Install the Necessary Libraries
To call LLMs and count tokens, you’ll need to install the openai, tiktoken, and requests libraries, among others. You can do this using pip.
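Assuming the package names match the library names above, the install command would be `pip install openai tiktoken requests`. A small sketch, using only the standard library, can then confirm that the packages are importable before you go further. The helper name is hypothetical.

```python
import importlib.util

# Package names taken from the paragraph above.
REQUIRED_PACKAGES = ["openai", "tiktoken", "requests"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```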
Import the Libraries
Make a Request and Count Tokens
In this example, we make a request to the text generation endpoint and print the number of input and output tokens. This information is crucial for billing purposes and helps you understand how your usage translates into tokens.
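A minimal sketch of such a request follows. Several details are assumptions rather than confirmed API facts: the endpoint URL, the payload fields, the `EDENAI_API_KEY` environment variable name, and the `usage` layout with `prompt_tokens` and `completion_tokens`. Check them against the text generation API documentation before relying on them.

```python
import os

# Assumed endpoint; verify against the Eden AI text generation documentation.
EDENAI_URL = "https://api.edenai.run/v2/text/generation"


def build_request(prompt, api_key, provider="openai", max_tokens=100):
    """Build the headers and JSON payload (field names are assumptions)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    payload = {"providers": provider, "text": prompt, "max_tokens": max_tokens}
    return headers, payload


def extract_usage(response_json, provider="openai"):
    """Return (input_tokens, output_tokens) from an assumed usage layout.

    Missing fields yield None instead of raising, since the layout is unverified.
    """
    usage = response_json.get(provider, {}).get("usage") or {}
    return usage.get("prompt_tokens"), usage.get("completion_tokens")


def generate(prompt, api_key, provider="openai"):
    """Send the request and return the parsed JSON response."""
    import requests  # third-party; imported here so the helpers work without it

    headers, payload = build_request(prompt, api_key, provider)
    response = requests.post(EDENAI_URL, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    api_key = os.environ.get("EDENAI_API_KEY")  # hypothetical variable name
    if api_key:
        result = generate("Explain tokenization in one sentence.", api_key)
        input_tokens, output_tokens = extract_usage(result)
        print(f"Input tokens: {input_tokens}")
        print(f"Output tokens: {output_tokens}")
    else:
        print("Set EDENAI_API_KEY to run the live request.")
```

Keeping the parsing in its own function makes it easy to log token counts per call and to adjust the field names if the actual response differs from the layout assumed here.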

