Eval Runs

1,247 runs in the last 90 days · 3 currently running · 6 scheduled

Run · Type · Started by · When · Cases · Pass · Cost · Duration · Status
Pre-deployment eval — claims-extraction-v18
run-2041 · 12 evaluators · 1 artifact(s)
Pre-deployment · Vikram Shetty · 2 days ago · 5,000 · 88.4% · $42.17 · 18m 04s · Failed
Comparison: claude-sonnet vs gpt-4o vs llama-3.1-70b on customer-support-rag
run-2038 · 14 evaluators · 3 artifact(s)
Comparison · Vikram Shetty · 4 days ago · 12,400 · 92.7% · $312.45 · 1h 42m · Passed
Caste-bias sweep — all customer-facing artifacts
run-2037 · 4 evaluators · 14 artifact(s)
Ad-hoc · Anjali Krishnan · 1 week ago · 25,000 · 95.4% · $198.20 · 3h 12m · Passed
Pre-certification eval — claims-copilot-v3 v18
run-2036 · 28 evaluators · 1 artifact(s)
Pre-certification · Catherine O'Brien · 1 week ago · 10,247 · 96.2% · $287.10 · 2h 18m · Passed
Indic quality eval — hindi-customer-voice
run-2035 · 8 evaluators · 1 artifact(s)
Scheduled · scheduler@trustlab · 12h ago · 2,847 · 92.1% · $14.20 · 32m · Passed
Tool-use correctness — fraud-investigation-copilot
run-2033 · 6 evaluators · 1 artifact(s)
Pre-deployment · Arjun Iyer · 3 days ago · 412 · 97.1% · $8.40 · 9m · Passed
Bias regression — loan-eligibility-assistant
run-2032 · 9 evaluators · 1 artifact(s)
Regression · Anjali Krishnan · 2 days ago · 8,400 · 94.3% · $62.10 · 48m · Passed
Nightly smoke — claims-copilot-v3
run-2031 · 4 evaluators · 1 artifact(s)
Scheduled · scheduler@trustlab · 3 days ago · 800 · 86.0% · $8.00 · 4m · Failed
Weekly regression — kyc-onboarding-bot-v8
run-2030 · 5 evaluators · 1 artifact(s)
Regression · Vikram Shetty · 4 days ago · 1,111 · 93.0% · $25.00 · 5m · Passed
Drift check — mortgage-disclosure-generator-v9
run-2029 · 6 evaluators · 1 artifact(s)
Ad-hoc · Meera Pillai · 5 days ago · 1,422 · 87.0% · $42.00 · 6m · Passed
Ad-hoc eval — fraud-investigation-copilot-v4
run-2028 · 7 evaluators · 1 artifact(s)
Pre-deployment · Anjali Krishnan · 6 days ago · 1,733 · 94.0% · $59.00 · 7m · Passed
Bias sweep — loan-eligibility-assistant-v11
run-2027 · 8 evaluators · 1 artifact(s)
Scheduled · Catherine O'Brien · 7 days ago · 2,044 · 88.0% · $76.00 · 8m · Passed
RAG quality sweep — wealth-advisor-rag-v5
run-2026 · 9 evaluators · 1 artifact(s)
Regression · scheduler@trustlab · 8 days ago · 2,355 · 95.0% · $93.00 · 9m · Passed
Nightly smoke — hindi-customer-voice-v6
run-2025 · 10 evaluators · 1 artifact(s)
Ad-hoc · Vikram Shetty · 9 days ago · 2,666 · 89.0% · $110.00 · 10m · Passed
Weekly regression — trade-finance-helper-v3
run-2024 · 11 evaluators · 1 artifact(s)
Pre-deployment · Meera Pillai · 10 days ago · 2,977 · 96.0% · $127.00 · 11m · Partial
Drift check — claims-copilot-v3
run-2023 · 12 evaluators · 1 artifact(s)
Scheduled · Anjali Krishnan · 11 days ago · 3,288 · 90.0% · $144.00 · 12m · Passed
Ad-hoc eval — kyc-onboarding-bot-v8
run-2022 · 13 evaluators · 1 artifact(s)
Regression · Catherine O'Brien · 12 days ago · 3,599 · 97.0% · $161.00 · 13m · Passed
Bias sweep — mortgage-disclosure-generator-v9
run-2021 · 14 evaluators · 1 artifact(s)
Ad-hoc · scheduler@trustlab · 13 days ago · 3,910 · 91.0% · $178.00 · 14m · Passed
RAG quality sweep — fraud-investigation-copilot-v4
run-2020 · 15 evaluators · 1 artifact(s)
Pre-deployment · Vikram Shetty · 14 days ago · 4,221 · 98.0% · $195.00 · 15m · Failed
Nightly smoke — loan-eligibility-assistant-v11
run-2019 · 16 evaluators · 1 artifact(s)
Scheduled · Meera Pillai · 15 days ago · 4,532 · 92.0% · $212.00 · 16m · Passed
Weekly regression — wealth-advisor-rag-v5
run-2018 · 17 evaluators · 1 artifact(s)
Regression · Anjali Krishnan · 16 days ago · 4,843 · 86.0% · $229.00 · 17m · Passed
Drift check — hindi-customer-voice-v6
run-2017 · 18 evaluators · 1 artifact(s)
Ad-hoc · Catherine O'Brien · 17 days ago · 5,154 · 93.0% · $246.00 · 18m · Partial
Ad-hoc eval — trade-finance-helper-v3
run-2016 · 19 evaluators · 1 artifact(s)
Pre-deployment · scheduler@trustlab · 18 days ago · 5,465 · 87.0% · $23.00 · 19m · Passed
Bias sweep — claims-copilot-v3
run-2015 · 4 evaluators · 1 artifact(s)
Scheduled · Vikram Shetty · 19 days ago · 5,776 · 94.0% · $40.00 · 20m · Passed
RAG quality sweep — kyc-onboarding-bot-v8
run-2014 · 5 evaluators · 1 artifact(s)
Regression · Meera Pillai · 20 days ago · 6,087 · 88.0% · $57.00 · 21m · Passed
Nightly smoke — mortgage-disclosure-generator-v9
run-2013 · 6 evaluators · 1 artifact(s)
Ad-hoc · Anjali Krishnan · 21 days ago · 6,398 · 95.0% · $74.00 · 22m · Passed
Weekly regression — fraud-investigation-copilot-v4
run-2012 · 7 evaluators · 1 artifact(s)
Pre-deployment · Catherine O'Brien · 22 days ago · 6,709 · 89.0% · $91.00 · 23m · Passed
Drift check — loan-eligibility-assistant-v11
run-2011 · 8 evaluators · 1 artifact(s)
Scheduled · scheduler@trustlab · 23 days ago · 7,020 · 96.0% · $108.00 · 24m · Passed
Ad-hoc eval — wealth-advisor-rag-v5
run-2010 · 9 evaluators · 1 artifact(s)
Regression · Vikram Shetty · 24 days ago · 7,331 · 90.0% · $125.00 · 25m · Partial
Bias sweep — hindi-customer-voice-v6
run-2009 · 10 evaluators · 1 artifact(s)
Ad-hoc · Meera Pillai · 25 days ago · 7,642 · 97.0% · $142.00 · 26m · Failed
RAG quality sweep — trade-finance-helper-v3
run-2008 · 11 evaluators · 1 artifact(s)
Pre-deployment · Anjali Krishnan · 26 days ago · 7,953 · 91.0% · $159.00 · 27m · Passed
Nightly smoke — claims-copilot-v3
run-2007 · 12 evaluators · 1 artifact(s)
Scheduled · Catherine O'Brien · 27 days ago · 8,264 · 98.0% · $176.00 · 28m · Passed
Weekly regression — kyc-onboarding-bot-v8
run-2006 · 13 evaluators · 1 artifact(s)
Regression · scheduler@trustlab · 28 days ago · 8,575 · 92.0% · $193.00 · 29m · Passed
Drift check — mortgage-disclosure-generator-v9
run-2005 · 14 evaluators · 1 artifact(s)
Ad-hoc · Vikram Shetty · 29 days ago · 8,886 · 86.0% · $210.00 · 30m · Passed
Ad-hoc eval — fraud-investigation-copilot-v4
run-2004 · 15 evaluators · 1 artifact(s)
Pre-deployment · Meera Pillai · 30 days ago · 9,197 · 93.0% · $227.00 · 31m · Passed
Bias sweep — loan-eligibility-assistant-v11
run-2003 · 16 evaluators · 1 artifact(s)
Scheduled · Anjali Krishnan · 31 days ago · 9,508 · 87.0% · $244.00 · 32m · Partial
RAG quality sweep — wealth-advisor-rag-v5
run-2002 · 17 evaluators · 1 artifact(s)
Regression · Catherine O'Brien · 32 days ago · 819 · 94.0% · $21.00 · 33m · Passed