OpenCompass
Large Model Evaluation System
An open-source, efficient, and comprehensive large model evaluation system and open platform from Shanghai AI Laboratory
CompassKit
Evaluation Toolkit
Provides a rich set of evaluation benchmarks and model templates as an open-source toolkit, and supports flexible extension to meet a wide range of practical model evaluation needs
View on GitHub →
CompassHub
Benchmark Community
Publish and share evaluation benchmarks and leaderboards within the community; high-quality benchmarks are selected for inclusion in the official evaluation leaderboards
Go to the Community →
CompassRank
Evaluation Leaderboards
Provides comprehensive, objective, and neutral scores and rankings for top-tier large language models and multimodal models. Submit your model API or repository address here and see how it compares with the leading models
View Leaderboards →
Supports both the Large Language Model Leaderboard and the Multi-modal Leaderboard
One-stop Evaluation
NLP
Large Language Model Evaluation
OpenCompass performs an in-depth and holistic assessment of large language models, leveraging an extensive array of rigorously curated benchmarks across eight fundamental dimensions: language comprehension, knowledge precision, logical deduction, creative ideation, mathematical problem-solving, programming proficiency, extended text analysis, and intelligent agent engagement.
How do we evaluate?
The OpenCompass platform supports more than 40 HuggingFace and API models and incorporates a diverse range of 100+ benchmarks with approximately 400,000 questions to evaluate models across eight dimensions. Its efficient distributed evaluation system allows for quick and thorough assessment of billion-scale models. The platform accommodates various evaluation methods, including zero-shot, few-shot, and chain-of-thought evaluations, and features a highly extendable modular design for easy addition of new models, benchmarks, or customized task strategies. Additionally, OpenCompass includes robust experiment management and reporting tools for detailed tracking and real-time result presentation.
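As an illustration of this workflow, here is a minimal sketch of the kind of Python config that drives an OpenCompass evaluation: benchmark and model definitions are imported as config modules and collected into `datasets` and `models` lists. The specific module paths below (GSM8K and an InternLM2 chat model) are assumptions based on the public examples shipped with the toolkit and may differ between versions.

```python
# Minimal sketch of an OpenCompass-style evaluation config (e.g. configs/eval_demo.py).
# The dataset/model module paths are assumptions; check the configs/ directory
# of your OpenCompass checkout for the exact names available in your version.
from mmengine.config import read_base

with read_base():
    # A benchmark definition (GSM8K, covering mathematical problem-solving)
    from .datasets.gsm8k.gsm8k_gen import gsm8k_datasets
    # A model definition (a HuggingFace chat model)
    from .models.hf_internlm.hf_internlm2_chat_7b import models

datasets = [*gsm8k_datasets]
```

With such a config in place, an evaluation is typically launched with `python run.py configs/eval_demo.py -w outputs/demo`; the distributed runner partitions the inference and evaluation tasks and aggregates the scores into a summary report.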
6 Key Features for Professional Evaluation
Open Source and Reproducible
A complete open source and reproducible large language model evaluation system
Rich Model Support
One-stop evaluation support for various types of models (HF models, API models, custom open-source models); a configuration sketch follows this feature list
Distributed and Efficient Evaluation
Provides distributed and efficient evaluation of dozens of benchmarks, supporting rapid evaluation of models with hundreds of billions of parameters
Comprehensive Capability Dimensions
Comprehensive division of capability dimensions and corresponding benchmark support
Flexible Expansion
Supports flexible and convenient addition of evaluation benchmarks and models
Diverse Evaluation Methods
Supports zero-shot, few-shot, and chain-of-thought evaluation
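To make the "Rich Model Support" feature concrete, the fragment below sketches how a locally hosted HuggingFace model and an API-served model can be declared side by side in a single `models` list and evaluated in one run. The class names come from `opencompass.models`; the model paths, abbreviations, and field values are illustrative assumptions rather than a fixed recipe.

```python
# Sketch: mixing a HuggingFace model and an API model in one evaluation run.
# Paths, abbreviations, and batch sizes below are illustrative assumptions.
from opencompass.models import HuggingFaceCausalLM, OpenAI

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='llama-3-8b-instruct-hf',            # name shown in result tables
        path='meta-llama/Meta-Llama-3-8B-Instruct',
        max_out_len=256,
        batch_size=8,
        run_cfg=dict(num_gpus=1),
    ),
    dict(
        type=OpenAI,
        abbr='gpt-4o-api',
        path='gpt-4o',                            # API-side model name
        key='ENV',                                # read the key from OPENAI_API_KEY
        max_out_len=256,
        batch_size=8,
    ),
]
```

Custom open-source models follow the same pattern: wrap them in a model class exposing the common generation interface and add the resulting dict to the list.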
Partners
If you are interested in cooperation, please contact us via email:
opencompass@pjlab.org.cn
Aliyun
East Money
Duxiaoman
Fudan University
Nanjing University
Nanyang Technological University
OPPO
Tencent
TigerResearch
The Chinese University of Hong Kong
Zhejiang University
Zhipu·AI
Partner Program
We have launched the OpenCompass Partner Program and welcome more outstanding models to join our leaderboards. We also sincerely invite you to contribute more representative and reliable benchmarks
Evaluation Leaderboards
Only open-source models or models served via external APIs are supported (internal, in-development models are not currently supported)
Benchmark Community
Make your benchmark shine: co-create and share evaluation benchmarks with the community, establish unique leaderboards, and together shape new standards for large model evaluation