What is OpenCompass?
OpenCompass is a platform focused on understanding AGI, including large language models and multi-modality models.
We aim to:
- develop high-quality libraries that reduce the difficulty of evaluation
- provide convincing leaderboards that improve understanding of large models
- create powerful toolchains targeting a variety of abilities and tasks
- build solid benchmarks to support large model research
- conduct research on large model inference (analysis, reasoning, prompt engineering)
OpenCompass
- OpenCompass is an LLM evaluation platform that supports a wide range of models (LLaMA, LLaMA2, ChatGLM2, ChatGPT, Claude, etc.) across 80+ datasets.
- https://github.com/open-compass/opencompass
VLMEvalKit
- VLMEvalKit is a toolkit for evaluating large vision-language models (LVLMs), currently supporting ~20 LVLMs and five multi-modal benchmarks.
- https://github.com/open-compass/vlmevalkit
Topic | Paper
--- | ---
Automated Software Development |
Critic Reasoning |
Hallucination Annotation | ANAH: Analytical Annotation of Hallucinations in Large Language Models
Mathematical Reasoning |
Tool Utilization | T-Eval: Evaluating the Tool Utilization Capability Step by Step
Multi-Modality |
Subjective Evaluation | BotChat: Evaluating LLMs’ Capabilities of Having Multi-Turn Dialogues
Domain Evaluation | LawBench: Benchmarking Legal Knowledge of Large Language Models