Eval Runs
1,247 runs in the last 90 days · 3 currently running · 6 scheduled
| Run | Run ID | Evaluators | Artifacts | Type | Started by | When | Cases | Pass rate | Cost | Duration | Status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pre-deployment eval — claims-extraction-v18 | run-2041 | 12 | 1 | Pre-deployment | Vikram Shetty | 2 days ago | 5,000 | 88.4% | $42.17 | 18m 04s | Failed |
| Comparison: claude-sonnet vs gpt-4o vs llama-3.1-70b on customer-support-rag | run-2038 | 14 | 3 | Comparison | Vikram Shetty | 4 days ago | 12,400 | 92.7% | $312.45 | 1h 42m | Passed |
| Caste-bias sweep — all customer-facing artifacts | run-2037 | 4 | 14 | Ad-hoc | Anjali Krishnan | 1 week ago | 25,000 | 95.4% | $198.20 | 3h 12m | Passed |
| Pre-certification eval — claims-copilot-v3 v18 | run-2036 | 28 | 1 | Pre-certification | Catherine O'Brien | 1 week ago | 10,247 | 96.2% | $287.10 | 2h 18m | Passed |
| Indic quality eval — hindi-customer-voice | run-2035 | 8 | 1 | Scheduled | scheduler@trustlab | 12h ago | 2,847 | 92.1% | $14.20 | 32m | Passed |
| Tool-use correctness — fraud-investigation-copilot | run-2033 | 6 | 1 | Pre-deployment | Arjun Iyer | 3 days ago | 412 | 97.1% | $8.40 | 9m | Passed |
| Bias regression — loan-eligibility-assistant | run-2032 | 9 | 1 | Regression | Anjali Krishnan | 2 days ago | 8,400 | 94.3% | $62.10 | 48m | Passed |
| Nightly smoke — claims-copilot-v3 | run-2031 | 4 | 1 | Scheduled | scheduler@trustlab | 3 days ago | 800 | 86.0% | $8.00 | 4m | Failed |
| Weekly regression — kyc-onboarding-bot-v8 | run-2030 | 5 | 1 | Regression | Vikram Shetty | 4 days ago | 1,111 | 93.0% | $25.00 | 5m | Passed |
| Drift check — mortgage-disclosure-generator-v9 | run-2029 | 6 | 1 | Ad-hoc | Meera Pillai | 5 days ago | 1,422 | 87.0% | $42.00 | 6m | Passed |
| Ad-hoc eval — fraud-investigation-copilot-v4 | run-2028 | 7 | 1 | Pre-deployment | Anjali Krishnan | 6 days ago | 1,733 | 94.0% | $59.00 | 7m | Passed |
| Bias sweep — loan-eligibility-assistant-v11 | run-2027 | 8 | 1 | Scheduled | Catherine O'Brien | 7 days ago | 2,044 | 88.0% | $76.00 | 8m | Passed |
| RAG quality sweep — wealth-advisor-rag-v5 | run-2026 | 9 | 1 | Regression | scheduler@trustlab | 8 days ago | 2,355 | 95.0% | $93.00 | 9m | Passed |
| Nightly smoke — hindi-customer-voice-v6 | run-2025 | 10 | 1 | Ad-hoc | Vikram Shetty | 9 days ago | 2,666 | 89.0% | $110.00 | 10m | Passed |
| Weekly regression — trade-finance-helper-v3 | run-2024 | 11 | 1 | Pre-deployment | Meera Pillai | 10 days ago | 2,977 | 96.0% | $127.00 | 11m | Partial |
| Drift check — claims-copilot-v3 | run-2023 | 12 | 1 | Scheduled | Anjali Krishnan | 11 days ago | 3,288 | 90.0% | $144.00 | 12m | Passed |
| Ad-hoc eval — kyc-onboarding-bot-v8 | run-2022 | 13 | 1 | Regression | Catherine O'Brien | 12 days ago | 3,599 | 97.0% | $161.00 | 13m | Passed |
| Bias sweep — mortgage-disclosure-generator-v9 | run-2021 | 14 | 1 | Ad-hoc | scheduler@trustlab | 13 days ago | 3,910 | 91.0% | $178.00 | 14m | Passed |
| RAG quality sweep — fraud-investigation-copilot-v4 | run-2020 | 15 | 1 | Pre-deployment | Vikram Shetty | 14 days ago | 4,221 | 98.0% | $195.00 | 15m | Failed |
| Nightly smoke — loan-eligibility-assistant-v11 | run-2019 | 16 | 1 | Scheduled | Meera Pillai | 15 days ago | 4,532 | 92.0% | $212.00 | 16m | Passed |
| Weekly regression — wealth-advisor-rag-v5 | run-2018 | 17 | 1 | Regression | Anjali Krishnan | 16 days ago | 4,843 | 86.0% | $229.00 | 17m | Passed |
| Drift check — hindi-customer-voice-v6 | run-2017 | 18 | 1 | Ad-hoc | Catherine O'Brien | 17 days ago | 5,154 | 93.0% | $246.00 | 18m | Partial |
| Ad-hoc eval — trade-finance-helper-v3 | run-2016 | 19 | 1 | Pre-deployment | scheduler@trustlab | 18 days ago | 5,465 | 87.0% | $23.00 | 19m | Passed |
| Bias sweep — claims-copilot-v3 | run-2015 | 4 | 1 | Scheduled | Vikram Shetty | 19 days ago | 5,776 | 94.0% | $40.00 | 20m | Passed |
| RAG quality sweep — kyc-onboarding-bot-v8 | run-2014 | 5 | 1 | Regression | Meera Pillai | 20 days ago | 6,087 | 88.0% | $57.00 | 21m | Passed |
| Nightly smoke — mortgage-disclosure-generator-v9 | run-2013 | 6 | 1 | Ad-hoc | Anjali Krishnan | 21 days ago | 6,398 | 95.0% | $74.00 | 22m | Passed |
| Weekly regression — fraud-investigation-copilot-v4 | run-2012 | 7 | 1 | Pre-deployment | Catherine O'Brien | 22 days ago | 6,709 | 89.0% | $91.00 | 23m | Passed |
| Drift check — loan-eligibility-assistant-v11 | run-2011 | 8 | 1 | Scheduled | scheduler@trustlab | 23 days ago | 7,020 | 96.0% | $108.00 | 24m | Passed |
| Ad-hoc eval — wealth-advisor-rag-v5 | run-2010 | 9 | 1 | Regression | Vikram Shetty | 24 days ago | 7,331 | 90.0% | $125.00 | 25m | Partial |
| Bias sweep — hindi-customer-voice-v6 | run-2009 | 10 | 1 | Ad-hoc | Meera Pillai | 25 days ago | 7,642 | 97.0% | $142.00 | 26m | Failed |
| RAG quality sweep — trade-finance-helper-v3 | run-2008 | 11 | 1 | Pre-deployment | Anjali Krishnan | 26 days ago | 7,953 | 91.0% | $159.00 | 27m | Passed |
| Nightly smoke — claims-copilot-v3 | run-2007 | 12 | 1 | Scheduled | Catherine O'Brien | 27 days ago | 8,264 | 98.0% | $176.00 | 28m | Passed |
| Weekly regression — kyc-onboarding-bot-v8 | run-2006 | 13 | 1 | Regression | scheduler@trustlab | 28 days ago | 8,575 | 92.0% | $193.00 | 29m | Passed |
| Drift check — mortgage-disclosure-generator-v9 | run-2005 | 14 | 1 | Ad-hoc | Vikram Shetty | 29 days ago | 8,886 | 86.0% | $210.00 | 30m | Passed |
| Ad-hoc eval — fraud-investigation-copilot-v4 | run-2004 | 15 | 1 | Pre-deployment | Meera Pillai | 30 days ago | 9,197 | 93.0% | $227.00 | 31m | Passed |
| Bias sweep — loan-eligibility-assistant-v11 | run-2003 | 16 | 1 | Scheduled | Anjali Krishnan | 31 days ago | 9,508 | 87.0% | $244.00 | 32m | Partial |
| RAG quality sweep — wealth-advisor-rag-v5 | run-2002 | 17 | 1 | Regression | Catherine O'Brien | 32 days ago | 819 | 94.0% | $21.00 | 33m | Passed |