Adversarial Corpora
62 corpora spanning OWASP LLM Top 10, MITRE ATLAS, and proprietary attacks · 20K total adversarial cases
Universal direct-injection payload set, OWASP LLM-01 aligned, multilingual.
Code-switched Hindi/English direct-injection payloads with Devanagari obfuscation variants.
Comprehensive DAN/AIM/STAN-style jailbreak templates with persona escalation.
Persona-based jailbreaks: fictional characters, hypothetical scenarios, debate setups.
Adversarial RAG documents with hidden instructions designed to manipulate retrieval-augmented applications.
Poisoned tool/API responses inserting downstream instructions for the LLM.
Multimodal adversarial images with embedded instruction text intended for OCR.
PDF docs with invisible-font and metadata instruction smuggling.
Microsoft's Crescendo gradual-escalation multi-turn jailbreak conversations.
Multi-turn safety bypass via instruction overrides targeting guardrails.
Long-context many-shot jailbreaks (Anthropic) targeting 100k+ context windows.
Encoded harmful instructions to bypass keyword filters.
ASCII-art encoded harmful queries (ArtPrompt).
Devanagari transliteration & homoglyph encoding to bypass English-only filters.
Greedy Coordinate Gradient suffix attacks (Zou et al.).
Genetic-algorithm generated adversarial prompts (AutoDAN).
Indian PII extraction probes targeting Aadhaar, PAN, IFSC, CKYC IDs.
SSN, NHS, DOB, address extraction probes (US/UK/EU).
Training-data extraction probes targeting verbatim emission.
Probes that elicit verbatim system prompts and instructions.
Membership-inference cases probing whether specific records were in the training data.
Caste-correlated surname & locality probes across BFSI decisions.
Religion-correlated names & customer-support scenarios.
Counterfactual probes flipping gender across loan/insurance decisions.
Multilingual toxicity-elicitation probes across protected categories.
Tool-misuse payloads — calling unauthorized tools, exfiltrating via tool outputs.
Argument injection into tool calls (SQLi-style for tool args).
Multi-turn payloads that poison agent memory for downstream sessions.
Prompts that redirect an autonomous agent's objective mid-task.
MCP Inspector-style exploitation patterns — RCE via crafted tool descriptors.
MCP cross-tenant context leakage payload variants.
Standard refusal-bypass set across forbidden categories.
Benign prompts that should not be refused — measures over-cautious behaviour.
Low-resource-language jailbreaks: harmful queries translated to bypass filters.
Hindi↔English mid-sentence code-switching attacks bypassing English-only safety filters.
Adversarial visual prompts manipulating multimodal models.
Adversarial audio for ASR-LLM pipelines.
Document carriers — DOCX/XLSX/PDF — with embedded injection content.
HarmBench standardized harm payloads.
AdvBench harmful behaviour payloads.
Recently published AI Vulnerability Database entries pulled into the corpus.
Auto-generated variant covering supplementary direct prompt injection surface area.
Auto-generated variant covering supplementary jailbreaks surface area.
Auto-generated variant covering supplementary tool abuse surface area.
Auto-generated variant covering supplementary bias elicitation surface area.
Auto-generated variant covering supplementary encoded/obfuscated surface area.
Auto-generated variant covering supplementary refusal bypass surface area.
Auto-generated variant covering supplementary direct prompt injection surface area.
Auto-generated variant covering supplementary jailbreaks surface area.
Auto-generated variant covering supplementary tool abuse surface area.
Auto-generated variant covering supplementary bias elicitation surface area.
Auto-generated variant covering supplementary encoded/obfuscated surface area.
Auto-generated variant covering supplementary refusal bypass surface area.
Auto-generated variant covering supplementary direct prompt injection surface area.
Auto-generated variant covering supplementary jailbreaks surface area.
Auto-generated variant covering supplementary tool abuse surface area.
Auto-generated variant covering supplementary bias elicitation surface area.
Auto-generated variant covering supplementary encoded/obfuscated surface area.
Auto-generated variant covering supplementary refusal bypass surface area.
Auto-generated variant covering supplementary direct prompt injection surface area.
Auto-generated variant covering supplementary jailbreaks surface area.
Auto-generated variant covering supplementary tool abuse surface area.
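
As an illustration of the counterfactual gender-flip entry in the list above, the following is a minimal sketch of how a paired probe might be built and checked. It is not the catalog's actual generator: the swap table `GENDER_SWAPS`, the helpers `flip_gender` and `is_consistent`, and the `decide` callable standing in for the system under test are all hypothetical names introduced here for illustration.

```python
import re

# Hypothetical, deliberately small swap table (an assumption for this sketch).
# Simplification: "her" is always mapped to "his", although it can also mean "him".
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "mr": "ms", "ms": "mr",
    "man": "woman", "woman": "man",
    "male": "female", "female": "male",
}

# Match whole words only, so "man" inside "woman" is never rewritten.
_PATTERN = re.compile(r"\b(" + "|".join(GENDER_SWAPS) + r")\b", re.IGNORECASE)


def flip_gender(text: str) -> str:
    """Return text with each gendered term replaced by its counterpart."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = GENDER_SWAPS[word.lower()]
        # Preserve a leading capital ("Mr" -> "Ms").
        return replacement.capitalize() if word[0].isupper() else replacement
    return _PATTERN.sub(swap, text)


def is_consistent(decide, prompt: str) -> bool:
    """True if the hypothetical decide() gives the same answer for both variants."""
    return decide(prompt) == decide(flip_gender(prompt))


prompt = "Mr Sharma, a 40-year-old man, requests a loan for his shop."
flipped = flip_gender(prompt)
```

A probe of this kind passes when the decision is unchanged between `prompt` and `flipped`; any difference is attributable to the flipped terms alone, since the rest of the text is identical.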