Lilac is Now on Eden AI: Access Kimi K2.6, MiniMax M2.7, GLM 5.1 & Gemma 4

Lilac is now available as a provider on Eden AI, bringing cost-efficient LLM inference to the platform through an OpenAI-compatible API. Backed by YC, Lilac routes requests to idle enterprise GPU capacity, helping developers access warm inference infrastructure without reserved-capacity overhead.

Developers can now use Kimi K2.6, MiniMax M2.7, GLM 5.1, and Gemma 4 through Eden AI. This gives teams more model choice, lower inference costs, and access to new models without any new integration.

What is Lilac?

Lilac is a YC S25-backed inference startup building a more cost-efficient way to serve large language models. The Lilac inference API routes requests to underused enterprise GPU clusters, where capacity often runs below full utilization, giving developers OpenAI-compatible access to LLM inference without cold starts or reserved-capacity markup.

Unlike Groq, Together, or Fireworks, Lilac does not rely only on dedicated inference infrastructure. Its approach is based on idle GPU inference, turning existing enterprise compute into a distributed serving layer. This helps lower costs while keeping latency predictable and pricing simple, with pay-per-token access starting from $5.

Our Interview with Lilac’s CEO

To better explain what Lilac brings to developers, Lucas Ewing, CEO of Lilac, shared insights into the company’s mission, model offering, and infrastructure approach. In the interview below, he explains how Lilac supports cost-efficient AI inference, what makes its routing layer different, and how the Eden AI integration simplifies access for developers.

Can you introduce Lilac and its mission?

Lilac builds cost-efficient inference infrastructure for large language models. Its mission is to make high-performance AI inference easier and more affordable by using GPU capacity that is already deployed but underutilized.

Many enterprise GPU clusters sit idle at different times of day. Lilac connects that available capacity to developers who need reliable model inference, helping reduce compute waste while lowering the cost of serving AI workloads.

Can you go into more detail about your offering, your models, and what makes them unique?

Lilac offers hosted inference for open-weight language models, including models such as Kimi, GLM, and Gemma. These models support common developer workloads like text generation, reasoning, coding, tool use, structured outputs, AI agents, and long-context applications.

What makes Lilac different is the routing layer behind the API. Instead of serving every request from a single fixed deployment, Lilac routes traffic across a distributed fleet of enterprise GPUs. This lets us route users to nearby healthy GPUs for lower latency, while load balancing across a wider pool of warm capacity for better average throughput. Because the GPUs are already running in enterprise environments, Lilac can offer usage-based inference without requiring customers to reserve or manage dedicated infrastructure.

Who are your target users or customers?

Lilac is built for developers and teams running production LLM workloads who care about cost, latency, and simplicity. Typical users include AI application developers, agent builders, developer-tool companies, research teams, support automation teams, and businesses adding LLM features to existing products.

Lilac is especially useful for teams that want access to capable open-weight models without building and operating their own GPU serving stack.

What led you to integrate with Eden AI?

Eden AI gives developers a single place to access multiple AI providers and choose the right provider for each workload. Integrating with Eden AI makes it easier for developers to try Lilac without changing their broader provider strategy or building a separate integration path. For Lilac, the partnership is a good fit because many Eden AI users already compare providers across price, latency, and model coverage. Lilac adds another option for teams looking for cost-sensitive LLM inference backed by distributed GPU capacity.

What’s next for your company? What are your future plans or vision?

Our focus is expanding model coverage, improving routing performance, and growing the distributed GPU network behind Lilac. We plan to support more high-utility open-weight models, improve latency-aware routing, and give developers better visibility into model performance, pricing, and availability.

Longer term, our vision is to make underutilized GPU capacity a standard part of AI infrastructure. We believe a large amount of compute is already deployed but not fully used, and better routing can make that capacity useful for developers while creating new revenue opportunities for GPU owners.

Which models are now available via Lilac on Eden AI?

Kimi K2.6 API

Kimi K2.6, developed by Moonshot AI, is a 1T-parameter Mixture-of-Experts model with 32B activated parameters and a 262K-token context window. Available via Lilac on Eden AI, it supports text and image inputs, tool use, and structured output, with reasoning enabled by default. Pricing starts at $0.70 per 1M input tokens and $3.50 per 1M output tokens.

Model	Provider	Input ($/1M tokens)	Output ($/1M tokens)	Context	Vision
Kimi K2.6	Eden AI via Lilac	$0.70	$3.50	262K	Image
Kimi K2.6	DeepInfra (FP4)	$0.60	—	262K	No
Kimi K2.6	Fireworks	$0.70	—	262K	No

MiniMax M2.7 API

MiniMax M2.7 is a text-only model built for long-context reasoning and cost-efficient AI workflows. It supports a 205K-token context window, FP8 inference, reasoning, tool use, and structured output. At $0.30 per 1M input tokens and $1.20 per 1M output tokens, it is the lowest-priced text-only model among the four Lilac models available on Eden AI; only the multimodal Gemma 4 31B costs less.

Model	Provider	Input ($/1M tokens)	Output ($/1M tokens)	Context	Vision
MiniMax M2.7	Eden AI via Lilac	$0.30	$1.20	205K	No
MiniMax M2.7	OpenRouter	$0.28	$1.20	205K	No
MiniMax M2.7	Fireworks	$0.30	$1.20	205K	No

GLM 5.1 API

GLM 5.1, from Z.ai and Zhipu, is a 754B-parameter Mixture-of-Experts model with a 203K-token context window. The GLM 5.1 API is well suited for coding tasks, software agents, and agentic workflows that require reliable instruction following, tool use, and structured output. Via Lilac on Eden AI, pricing is $0.90 per 1M input tokens and $3.00 per 1M output tokens.

Gemma 4 31B API

Gemma 4 31B API gives developers access to Google’s 31B open-weight model through Lilac on Eden AI. It supports a 262K-token context window and multimodal inputs across text, image, and video, which is still uncommon through standard API access. It is also the cheapest multimodal option in this set, priced at $0.11 per 1M input tokens and $0.35 per 1M output tokens.

Model	Provider	Input ($/1M tokens)	Output ($/1M tokens)	Context	Vision
Gemma 4 31B	Eden AI via Lilac	$0.11	$0.35	262K	Image + Video
Gemma 4 31B	Together AI	—	—	262K	No
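
The per-token prices quoted above translate into a workload cost with simple arithmetic: tokens times price, divided by one million. The following is a minimal sketch that uses only the Lilac-on-Eden-AI prices listed in this article. The function name and the example workload (2M input tokens, 500K output tokens) are illustrative assumptions, and actual billing may differ.

```python
# Prices in USD per 1M tokens (input, output), copied from this article.
PRICES_PER_1M = {
    "Kimi K2.6": (0.70, 3.50),
    "MiniMax M2.7": (0.30, 1.20),
    "GLM 5.1": (0.90, 3.00),
    "Gemma 4 31B": (0.11, 0.35),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one workload (hypothetical helper)."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Example workload (an assumption): 2M input tokens and 500K output tokens.
for name in PRICES_PER_1M:
    print(f"{name}: ${estimate_cost(name, 2_000_000, 500_000):.3f}")
```

Because output tokens cost several times more than input tokens for every model listed, the input/output ratio of a workload matters as much as the headline input price.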

Next, let’s look at how to choose the right Lilac model depending on your workload, context length, modality, and budget.

Which Model by Lilac Should You Choose?

Best reasoning at mid price - Kimi K2.6

Choose Kimi K2.6 for complex reasoning, agentic pipelines, and multimodal tasks that combine text and image inputs. Reasoning is enabled by default, and the 262K context window can handle large codebases, long documents, or multi-step workflows without aggressive chunking.

Highest-volume text at lowest cost - MiniMax M2.7

Choose MiniMax M2.7 for production text pipelines where cost matters most. At $0.30 per 1M input tokens, it has the lowest input price of the text-only models in this set, while still supporting reasoning, tool use, and structured output.

Code generation / agentic tasks - GLM 5.1

Choose GLM 5.1 for code generation, multi-step agents, and engineering workflows that depend on reliable tool use. Its 754B Mixture-of-Experts architecture and strong coding benchmarks make it the best fit for code-heavy or agentic workloads.

Video or image understanding cheaply - Gemma 4 31B

Choose Gemma 4 31B for multimodal pipelines that need image and video understanding through an API. It is the only model in this set with video frame support, and at $0.11 per 1M input tokens, it is also the lowest-cost option overall.

Use Case	Recommended Model	Context	Vision	Input Price
Complex reasoning and multimodal agents	Kimi K2.6	262K	Text + image	$0.70/M
High-volume text pipelines	MiniMax M2.7	205K	No	$0.30/M
Code generation and agentic engineering	GLM 5.1	203K	No	$0.90/M
Image and video understanding	Gemma 4 31B	262K	Text + image + video	$0.11/M

How to access these models on Eden AI?

Accessing Lilac models on Eden AI requires no new integration: they work through Eden AI’s existing unified API.

1. Create a free Eden AI account and get your API key.
2. Select any Lilac model in the Eden AI playground, such as Kimi K2.6, MiniMax M2.7, GLM 5.1, or Gemma 4, or pass the model ID directly in your API call.
3. Call the API using the OpenAI-compatible SDK. If you already use Eden AI, no code changes are needed beyond selecting the Lilac model.

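import os
from openai import OpenAI

# Point the OpenAI SDK at Eden AI's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["EDEN_AI_API_KEY"],  # read the API key from the environment
    base_url="https://api.edenai.run/v3"
)

# Select the Lilac-hosted model by its Eden AI model ID.
response = client.chat.completions.create(
    model="lilac/moonshotai/kimi-k2.6",
    messages=[
        {
            "role": "user",
            "content": "Hello, what can you do?"
        }
    ]
)

# Print the text of the first completion choice.
print(response.choices[0].message.content)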
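
Kimi K2.6 is described above as accepting image inputs. The sketch below shows what such a request body could look like, assuming Eden AI's OpenAI-compatible endpoint accepts the standard OpenAI chat format for image content parts, which this article does not confirm. The helper name `build_image_question` and the example URL are hypothetical.

```python
from typing import Any, Dict, List


def build_image_question(question: str, image_url: str) -> List[Dict[str, Any]]:
    """Build an OpenAI-style user message that pairs a text question with one image.

    Assumption: the endpoint accepts OpenAI chat "content parts" for images.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


# Placeholder URL; replace it with a real, publicly reachable image.
messages = build_image_question(
    "What is shown in this image?",
    "https://example.com/chart.png",
)
```

The resulting `messages` list can be passed as the `messages` argument of `client.chat.completions.create` in place of the plain-text message used in the call above.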

Conclusion

Lilac models are available now on Eden AI: Kimi K2.6, MiniMax M2.7, GLM 5.1, and Gemma 4 31B. This gives developers more model choice, lower inference costs, and access through the same Eden AI integration they already use.

FAQs - Lilac is on Eden AI

What is Lilac AI?

Lilac AI is a YC-backed inference startup that routes LLM requests to idle enterprise GPU clusters, turning underused compute into always-available model infrastructure. It provides OpenAI-compatible, pay-per-token access with no commitments or reserved-capacity requirements.

Which models are available via Lilac on Eden AI?

Eden AI now gives developers access to Kimi K2.6 for reasoning and multimodal workflows, MiniMax M2.7 for low-cost text pipelines, GLM 5.1 for coding and agentic tasks, and Gemma 4 31B for image and video understanding, with more models coming soon.

Is Lilac reliable enough for production use?

Lilac's idle GPU model does not mean cold or unavailable infrastructure. It routes requests only to hardware that is already powered on, with always-warm capacity, no cold starts, and a real-time performance dashboard tracking TPS and TTFT metrics.

How does Lilac pricing compare to Fireworks or DeepInfra for Kimi K2.6?

Lilac prices Kimi K2.6 at $0.70 per 1M input tokens, compared with Fireworks at about $0.70 per 1M input tokens and DeepInfra at about $0.60 per 1M input tokens for FP4. This is competitive, especially when combined with Eden AI's routing, fallback, and provider-switching features.

Can I switch between Lilac and other providers on Eden AI without changing my code?

Yes. Eden AI's unified API lets developers switch between Lilac and other providers by changing the model name in one line, with no re-integration and no new API keys.
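
Because only the model name changes between providers, a client-side fallback can be sketched as a loop over model IDs. This is an illustration under stated assumptions, not Eden AI's built-in fallback feature: the helper `complete_with_fallback` is hypothetical, and it works with any client object that exposes `chat.completions.create` the way the OpenAI SDK does.

```python
from typing import Any, Optional, Sequence, Tuple


def complete_with_fallback(
    client: Any, models: Sequence[str], prompt: str
) -> Tuple[str, str]:
    """Try each model ID in order and return (model_id, reply_text) from the first success."""
    last_error: Optional[Exception] = None
    for model_id in models:
        try:
            response = client.chat.completions.create(
                model=model_id,
                messages=[{"role": "user", "content": prompt}],
            )
            return model_id, response.choices[0].message.content
        except Exception as exc:  # deliberately broad: this is only a sketch
            last_error = exc
    # Every candidate failed (or the list was empty).
    raise RuntimeError("all candidate models failed") from last_error
```

With the client from the access example, a call could look like `complete_with_fallback(client, ["lilac/moonshotai/kimi-k2.6", "<another-model-id>"], "Hello")`, where the second entry is a placeholder for any other model ID available on Eden AI.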

Last updated on June 10, 2026

Taha Zemmouri

Taha Zemmouri is the CEO and co-founder of Eden AI. With previous experience in AI consulting, he brings a strong business perspective to artificial intelligence and focuses on turning AI capabilities into practical value for companies. With a background in data science and a real entrepreneurial mindset, he combines technical understanding, business vision, and hands-on execution to make AI more accessible and easier to integrate.