Language codes standardization

summary
  • Adding examples to the prompt does help, but only up to a certain point.
  • 10 examples seem to be good enough: beyond 10 examples, there is generally no significant improvement in performance.
  • Fine-tuning with very few examples gives worse performance than few-shot: fine-tuning can hurt performance when done with very few examples (10 in our case).

by Eden AI

In this post, we investigate to what extent AI models are capable of learning from examples provided within the prompt itself. We attempt to figure out the best way to leverage what is called “few-shot learning” in practical scenarios, and when to prefer fine-tuning over it.

What is few-shot learning?

Few-shot learning is about providing the AI with some example pairs of inputs and associated outputs as part of the prompt. This helps the AI to understand what kind of output format we are looking for, and to better understand the task.

In this post, we look at how few-shot learning can be used for classification. The examples in the prompt look like this:

Eden AI - Few-shot learning prompt
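
As a rough illustration of such a prompt, here is a minimal sketch in Python. The `Plot:` / `Genre:` layout, the instruction line, the sample plots, and the function name `build_few_shot_prompt` are all assumptions made for this example; they are not the exact prompt used in the experiments.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]],
    query: str,
    labels: List[str],
) -> str:
    """Format labelled (plot, genre) pairs, followed by the unlabelled query."""
    lines = [f"Classify the movie plot into one of: {', '.join(labels)}.", ""]
    for plot, genre in examples:
        lines.append(f"Plot: {plot}")
        lines.append(f"Genre: {genre}")
        lines.append("")
    # The prompt ends on an open "Genre:" slot for the model to complete.
    lines.append(f"Plot: {query}")
    lines.append("Genre:")
    return "\n".join(lines)


# Hypothetical examples, invented for illustration only.
examples = [
    ("A detective hunts a serial killer through a rainy city.", "thriller"),
    ("Two rival chefs fall in love during a cooking contest.", "romance"),
]
prompt = build_few_shot_prompt(
    examples,
    query="A crew of astronauts discovers a signal from a distant moon.",
    labels=["thriller", "romance", "sci-fi"],
)
print(prompt)
```

Passing an empty `examples` list yields the zero-shot version of the same prompt.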

What is the ideal number of examples?

We investigate the ideal number of examples for our task (multi-class text classification). We use a movie plot dataset, and try to classify the movie genre given the plot.

In the figure below, we plot classification accuracy as a function of the number of examples for 3 different models (GPT-3.5 Turbo, Cohere Command, and Anthropic Claude 2). Zero examples corresponds to zero-shot learning (no examples in the prompt).

Eden AI - Few-shot learning figure

Looking at these curves, we can note that:

  • Adding examples to the prompt does help, but only up to a certain point. Adding more examples does not keep improving performance, and can even reduce it past a certain point.
  • 10 examples seem to be good enough: beyond 10 examples, we generally saw no significant improvement in performance. This is consistent with what has been observed in the literature for few-shot learning tasks.
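
A sweep like this can be sketched as a small evaluation loop. The code below is only an assumed outline: `accuracy_by_shot_count` and `toy_classify` are hypothetical names, and `toy_classify` is a word-overlap stand-in for a real model call, so the numbers it produces on the toy data say nothing about the models in the figure.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (text, label)


def accuracy_by_shot_count(
    classify: Callable[[List[Example], str], str],
    pool: Sequence[Example],
    test_set: Sequence[Example],
    shot_counts: Sequence[int],
) -> Dict[int, float]:
    """Measure accuracy on test_set for each number of in-prompt examples."""
    results: Dict[int, float] = {}
    for k in shot_counts:
        shots = list(pool[:k])  # k = 0 gives the zero-shot setting
        correct = sum(classify(shots, text) == label for text, label in test_set)
        results[k] = correct / len(test_set)
    return results


def toy_classify(shots: List[Example], text: str) -> str:
    """Stand-in for a model call: label of the shot sharing the most words."""
    if not shots:
        return "unknown"
    words = set(text.lower().split())
    best = max(shots, key=lambda s: len(words & set(s[0].lower().split())))
    return best[1]


# Toy data, invented for illustration only.
pool = [("space ship alien", "sci-fi"), ("love wedding kiss", "romance")]
test_set = [("alien invasion from space", "sci-fi"), ("a wedding and a kiss", "romance")]
results = accuracy_by_shot_count(toy_classify, pool, test_set, shot_counts=[0, 1, 2])
print(results)
```

In a real run, `classify` would build the prompt from `shots`, call the model, and parse the predicted genre from the response.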

How should examples be chosen?

Choosing the examples wisely can make a big difference. Here we investigate how the choice of examples impacts performance.

Random vs Diverse few-shot

Providing diverse examples can help the model better understand the range of possible inputs. We compare random and diverse examples below.

Eden AI - Random vs Diverse few-shot

  • Diversity helps on average, but may not help every model
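
The post does not spell out how diverse examples were selected. One simple interpretation, assumed here purely for illustration, is to balance the examples across labels so that every class appears before any class repeats. The function names `pick_random` and `pick_label_balanced` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (text, label)


def pick_random(pool: Sequence[Example], k: int, seed: int = 0) -> List[Example]:
    """Baseline: k examples drawn uniformly at random, without replacement."""
    return random.Random(seed).sample(list(pool), k)


def pick_label_balanced(pool: Sequence[Example], k: int, seed: int = 0) -> List[Example]:
    """Assumed notion of diversity: cover as many labels as possible."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in pool:
        by_label[example[1]].append(example)
    for group in by_label.values():
        rng.shuffle(group)

    labels = sorted(by_label)
    picked: List[Example] = []
    # Round-robin over labels so every class appears before any repeats.
    while len(picked) < k and any(by_label[label] for label in labels):
        for label in labels:
            if by_label[label] and len(picked) < k:
                picked.append(by_label[label].pop())
    return picked


# Toy pool dominated by one class, invented for illustration only.
pool = [(f"drama plot {i}", "drama") for i in range(8)]
pool += [("a space plot", "sci-fi"), ("a love plot", "romance")]
balanced = pick_label_balanced(pool, k=3)
print(sorted(label for _, label in balanced))
```

On a skewed pool like this one, random sampling will often return mostly the majority class, while the balanced picker covers all three labels. Diversity could also be defined in embedding space (for example, picking examples that are far apart), which this sketch does not attempt.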

Fixed vs Dynamic (or Adaptive) few-shot

Dynamic few-shot means that the examples are chosen based on the input. For example, we can use a retrieval model to find the most similar examples to the input. We compare fixed and dynamic examples below.

Eden AI - Fixed vs Dynamic few-shot

  • Dynamic is better than fixed examples for all models
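
Dynamic selection can be sketched as a nearest-neighbour lookup over the example pool. To stay self-contained, the code below swaps in a bag-of-words cosine similarity where a real system would use a retrieval or embedding model; that substitution, and the names `bag_of_words`, `cosine`, and `pick_dynamic`, are assumptions for this example.

```python
import math
import re
from collections import Counter
from typing import List, Sequence, Tuple

Example = Tuple[str, str]  # (text, label)


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_dynamic(pool: Sequence[Example], query: str, k: int) -> List[Example]:
    """Return the k pool examples most similar to the query text."""
    query_vec = bag_of_words(query)
    ranked = sorted(
        pool,
        key=lambda example: cosine(query_vec, bag_of_words(example[0])),
        reverse=True,
    )
    return ranked[:k]


# Toy pool, invented for illustration only.
pool = [
    ("A spaceship crew fights an alien", "sci-fi"),
    ("A couple plans their wedding", "romance"),
    ("A detective chases a killer", "thriller"),
]
shots = pick_dynamic(pool, "An alien attacks a spaceship", k=1)
print(shots)
```

Because the selected examples change with every input, the prompt has to be rebuilt per query, and the pool vectors are worth precomputing once rather than on each call as done here.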

When should we prefer fine-tuning over few-shot learning?

Fine-tuning is generally much more powerful than few-shot learning when done correctly. However, it takes more time and effort to set up. There are also some cases where fine-tuning gives worse results than few-shot learning.

In our experiment, we fine-tune the Davinci model (GPT-3) with 3 different training set sizes: 2 examples per class (10 total), 50 examples per class (250 total), and 500 examples per class (2,500 total). We compare the fine-tuned models with the best few-shot learner.

Eden AI - Fine-tuning vs few-shot

  • Fine-tuning with very few examples gives worse performance than few-shot: Fine-tuning can hurt performance when done with very few examples (10 in our case). More data is needed.
  • Fine-tuning with 500 examples per class gives much better performance than few-shot: Fine-tuning is preferred when you have a larger amount of annotated data.
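
Building training sets of a fixed size per class can be sketched as follows. The post does not describe its data preparation, so this is an assumed outline: the helper names are hypothetical, and the `prompt` / `completion` JSONL layout follows the format that OpenAI's legacy fine-tuning endpoint accepted for GPT-3 base models, which may differ from what was actually used.

```python
import json
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, str]  # (plot, genre)


def sample_per_class(
    data: Sequence[Example], n_per_class: int, seed: int = 0
) -> List[Example]:
    """Draw exactly n_per_class examples for every label in the data."""
    rng = random.Random(seed)
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in data:
        by_label[example[1]].append(example)

    subset: List[Example] = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) < n_per_class:
            raise ValueError(f"only {len(group)} examples for class {label!r}")
        subset.extend(rng.sample(group, n_per_class))
    return subset


def to_jsonl(examples: Sequence[Example]) -> str:
    """Serialize examples as one JSON object per line (assumed format)."""
    rows = (
        {"prompt": f"Plot: {plot}\nGenre:", "completion": f" {genre}"}
        for plot, genre in examples
    )
    return "\n".join(json.dumps(row) for row in rows)


# Toy data with 5 classes, invented for illustration only.
genres = ["action", "comedy", "drama", "horror", "romance"]
data = [(f"{genre} plot {i}", genre) for genre in genres for i in range(3)]
train = sample_per_class(data, n_per_class=2)
print(len(train))
print(to_jsonl(train).splitlines()[0])
```

With 5 classes, `n_per_class=2` gives the 10-example setting described above; the 250- and 2,500-example sets follow from `n_per_class=50` and `n_per_class=500` on a large enough dataset.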

Takeaways for few-shot learning

  • 10 examples in the prompt is generally enough
  • Dynamic few-shot outperforms fixed few-shot
  • Diverse examples help on average but may not help every model
  • Fine-tuning is preferred when you have more than ~50 training examples per class

Written by Taha Zemmouri