Benchmarks / BEIR
Public leaderboard
| # | Model | Score |
|---|---|---|
| 1 | BGE-M3 | 54.1 |
Your artifacts on this benchmark
Ranked among public results
| Artifact | Score | Rank | Δ |
|---|---|---|---|
| claims-copilot-v3 | 79.8 | #4 | -1.2 |
| wealth-advisor-rag-v5 | 82.1 | #1 | +1.1 |
| trade-finance-helper-v3 | 76.4 | #6 | -0.9 |
| claude-3-5-sonnet (vendor baseline) | 81.0 | #2 | 0.0 |
| gpt-4o (vendor baseline) | 77.3 | #5 | 0.0 |
Latest result card — wealth-advisor-rag-v5 on BEIR
| Metric | Value |
|---|---|
| Overall score | 82.1 |
| Public rank | #1 |
| Cases evaluated | 10,231 |
| Cost | $184.20 |
Category breakdown
| Category | Score |
|---|---|
| Open-book QA: 10-K | 86.0% |
| Open-book QA: 10-Q | 83.0% |
| Earnings transcripts | 79.0% |
| Cross-document reasoning | 74.0% |
| Numerical reasoning | 81.0% |
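As a quick sanity check on the wealth-advisor-rag-v5 result card, the sketch below recomputes two derived figures from the numbers shown above: the plain average of the five category scores and the average cost per evaluated case. The card does not say how categories are combined into the overall score, so the unweighted mean is an assumption used only for comparison, and all variable names are illustrative.

```python
# Values copied from the wealth-advisor-rag-v5 result card.
category_scores = {
    "Open-book QA: 10-K": 86.0,
    "Open-book QA: 10-Q": 83.0,
    "Earnings transcripts": 79.0,
    "Cross-document reasoning": 74.0,
    "Numerical reasoning": 81.0,
}
overall_score = 82.1
cases_evaluated = 10_231
cost_usd = 184.20

# Unweighted mean of the five categories. This is an assumption for
# comparison only: the card does not state the aggregation method.
unweighted_mean = sum(category_scores.values()) / len(category_scores)

# Gap between the reported overall score and the unweighted mean.
gap = overall_score - unweighted_mean

# Average cost per evaluated case, in US dollars.
cost_per_case = cost_usd / cases_evaluated

print(f"Unweighted category mean: {unweighted_mean:.1f}")
print(f"Reported overall minus mean: {gap:+.1f}")
print(f"Cost per case: ${cost_per_case:.4f}")
```

The unweighted mean comes out at 80.6, which is 1.5 points below the reported 82.1. That gap suggests the overall score is not a simple average of these five categories. It may be weighted by case count or include cases outside the listed categories, but the card does not confirm either.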