IIT Madras’ AI4Bharat Launches ‘Indic LLM-Arena’ Benchmark to Evaluate AI Models for Indian Languages and Context

Developed in collaboration with Google Cloud, the Indic LLM-Arena is a crowdsourced platform that allows users to compare AI models through anonymous voting.

Chennai, November 10, 2025: The AI4Bharat research lab at the Indian Institute of Technology (IIT) Madras has unveiled a new open-source benchmark designed to test the performance of large language models (LLMs) in Indian languages and contexts. The initiative, known as Indic LLM-Arena, also evaluates the safety and cultural relevance of these models in real-world Indian use cases.

Developed with assistance from Google Cloud, the Indic LLM-Arena introduces a crowdsourced evaluation framework where thousands of anonymous users cast votes to determine which AI model performs better. According to AI4Bharat’s official blog post, the results are displayed on a “human-in-the-loop leaderboard,” ensuring that model rankings are grounded in actual user preferences.

Focus on Indian Languages and Code-Mixed Scenarios

Currently, the Indic LLM-Arena supports text-based inputs in multiple Indian languages and in code-mixed formats such as Hinglish and Tanglish. The research team said it plans to expand the platform in the future to include AI agents and multimodal models that can process visual and audio data.

“Evaluation goes beyond translating India’s 22 scheduled languages. It’s about understanding how Indians naturally communicate, often mixing multiple languages within a sentence,” the AI4Bharat team explained. They added that all anonymized data, code, and testing pipelines will be released under an open-source license to promote transparency and community collaboration.

🚀 Announcing the Indic LLM-Arena 🇮🇳
At AI4Bharat (IIT Madras), our mission has always been clear – build open, inclusive, and world-class AI for Indian languages.

To further this goal, today, we’re introducing the Indic LLM-Arena, a crowd-sourced, human-in-the-loop leaderboard…

— AI4Bharat (@ai4bharat) November 10, 2025

Addressing the Lack of Regional Benchmarks

The creation of Indic LLM-Arena comes amid growing concerns from Indian AI developers and researchers about the absence of regional benchmarks that accurately assess AI model performance. Earlier this month, OpenAI introduced the IndQA benchmark, which evaluates an AI model’s language proficiency and cultural understanding in Indian contexts.

According to OpenAI, the IndQA benchmark includes 2,278 questions across 12 Indian languages and 10 cultural domains, developed in collaboration with 261 experts from across the country. AI4Bharat hopes that its new benchmark will further complement such initiatives by offering an India-specific comparative evaluation framework.

“Indic LLM-Arena allows researchers, startups, and businesses to compare how different models perform on Indic-specific languages and real-world applications,” AI4Bharat said. “Organizations across industries can use these insights to make data-driven decisions about model adoption, reduce deployment risks, and accelerate responsible AI development for Indian users.”

How the Indic LLM-Arena Works

AI4Bharat explained that Indic LLM-Arena uses a fair, blind, side-by-side comparison system inspired by international benchmarking platforms such as LMArena. The process involves the following steps:

1. Users enter a prompt in any Indian language or a mix of languages.
2. The platform displays responses from two anonymous AI models, labelled Model A and Model B, to prevent brand bias.
3. Participants vote for the response they find superior, or mark the pair as a tie.
4. After thousands of user votes, AI4Bharat applies the Bradley–Terry statistical model to rank the models based on their real-world performance with Indian language prompts.
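
The ranking step can be illustrated with a minimal Python sketch of a Bradley–Terry fit over pairwise votes. This is not AI4Bharat's published pipeline: the function name `fit_bradley_terry`, the vote format, the model names, and the choice to count a tie as half a win for each side are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_bradley_terry(votes, iterations=500):
    """Estimate Bradley-Terry strengths from pairwise votes.

    votes: iterable of (model_a, model_b, outcome), outcome in {"a", "b", "tie"}.
    Returns a dict mapping each model name to a strength; strengths sum to 1.
    """
    wins = defaultdict(float)  # (fractional) wins per model
    comparisons = defaultdict(lambda: defaultdict(float))  # comparisons[i][j]: votes on pair (i, j)
    # Assumption for this sketch: a tie counts as half a win for each model.
    credit = {"a": (1.0, 0.0), "b": (0.0, 1.0), "tie": (0.5, 0.5)}

    for model_a, model_b, outcome in votes:
        if outcome not in credit:
            raise ValueError(f"unknown outcome: {outcome!r}")
        credit_a, credit_b = credit[outcome]
        wins[model_a] += credit_a
        wins[model_b] += credit_b
        comparisons[model_a][model_b] += 1.0
        comparisons[model_b][model_a] += 1.0

    # Iterative update: p_i <- W_i / sum_j n_ij / (p_i + p_j), then renormalize.
    strengths = {model: 1.0 for model in comparisons}
    for _ in range(iterations):
        updated = {}
        for i, opponents in comparisons.items():
            denominator = sum(
                count / (strengths[i] + strengths[j])
                for j, count in opponents.items()
            )
            updated[i] = wins[i] / denominator
        total = sum(updated.values())
        strengths = {model: value / total for model, value in updated.items()}
    return strengths


# Hypothetical votes; "model-x", "model-y", "model-z" are placeholder names.
example_votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "b"),
    ("model-y", "model-z", "a"),
    ("model-y", "model-z", "tie"),
    ("model-x", "model-z", "a"),
    ("model-x", "model-z", "b"),
]
ranking = sorted(fit_bradley_terry(example_votes).items(), key=lambda kv: kv[1], reverse=True)
for name, strength in ranking:
    print(f"{name}: {strength:.3f}")
```

Under the Bradley–Terry model, the probability that model i is preferred over model j is p_i / (p_i + p_j), so the fitted strengths can be read as relative preference odds rather than absolute scores.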

The lab stated that it will soon publish an updated public leaderboard after accounting for statistical uncertainties. In addition, detailed leaderboards categorized by language, task, and domain will be released to help researchers perform more targeted evaluations.
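
The announcement does not say how statistical uncertainty will be estimated. One common approach, shown here only as an assumption, is to bootstrap the votes: resample them with replacement many times and report a percentile interval. The sketch below does this for the win rate of one model over another; the function name `bootstrap_win_rate_interval` and the encoding of outcomes (1.0 for a Model A win, 0.0 for a Model B win, 0.5 for a tie) are hypothetical.

```python
import random


def bootstrap_win_rate_interval(outcomes, resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for Model A's win rate against Model B.

    outcomes: list of floats, 1.0 (A preferred), 0.0 (B preferred), 0.5 (tie).
    Returns (lower, upper) bounds of the interval.
    """
    if not outcomes:
        raise ValueError("at least one vote is required")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(outcomes)

    # Resample the votes with replacement and record the win rate each time.
    rates = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(resamples))

    # Pick the percentile cut-offs, clamped to valid list indices.
    lower_index = max(0, int((1.0 - confidence) / 2.0 * resamples))
    upper_index = min(resamples - 1, int((1.0 + confidence) / 2.0 * resamples) - 1)
    return rates[lower_index], rates[upper_index]


# Hypothetical data: 60 votes for Model A, 40 for Model B.
sample_outcomes = [1.0] * 60 + [0.0] * 40
low, high = bootstrap_win_rate_interval(sample_outcomes)
print(f"win rate 0.60, 95% interval: [{low:.2f}, {high:.2f}]")
```

If the interval for a pair of models comfortably excludes 0.5, the preference is unlikely to be an artifact of too few votes; overlapping intervals suggest the two models cannot yet be separated.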

By creating a transparent, community-driven testing ecosystem, IIT Madras’ AI4Bharat aims to make India a global leader in inclusive AI innovation that truly understands and reflects the country’s linguistic diversity.

About the Author

Ashish Kumar
