Open-Source Legal AI Research
We partner with experienced attorneys at leading in-house legal departments to create expert-annotated datasets and benchmarks — advancing the science of AI in legal practice.
Contract Understanding Atticus Dataset
CUAD is an expert-annotated dataset that helps AI identify the most critical parts of legal contracts. Built from 510+ contracts with 13,000+ labels across 41 clause types, it teaches models to highlight key terms — speeding up contract review and making it easier to catch red flags.
Explore CUAD →
Example Output
Governing Law: "This Agreement shall be governed by the laws of the State of California..." (Page 2)
⚠ Covenant Not to Sue: "In addition, Company shall not now or in the future contest the validity..." (Page 30)
► Perpetual / Irrevocable License: "Company grants to Investor a worldwide, royalty-free, exclusive..." (Page 151)
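The example output above can be pictured as a small list of labeled spans. The following is a minimal sketch only: `ClauseLabel`, `FLAGGED_TYPES`, and `red_flags` are hypothetical names for illustration and are not CUAD's actual schema, and the choice of which clause types to flag is an assumption based on the ⚠ marker in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseLabel:
    """One highlighted span; a hypothetical structure, not CUAD's real format."""
    clause_type: str
    text: str  # excerpt, left truncated exactly as in the example output
    page: int


# Assumption: only the clause type marked with a warning sign in the example is flagged.
FLAGGED_TYPES = {"Covenant Not to Sue"}


def red_flags(labels):
    """Return the labels whose clause type is in FLAGGED_TYPES, in input order."""
    return [label for label in labels if label.clause_type in FLAGGED_TYPES]


labels = [
    ClauseLabel("Governing Law",
                "This Agreement shall be governed by the laws of the State of California...", 2),
    ClauseLabel("Covenant Not to Sue",
                "In addition, Company shall not now or in the future contest the validity...", 30),
    ClauseLabel("Perpetual / Irrevocable License",
                "Company grants to Investor a worldwide, royalty-free, exclusive...", 151),
]

for label in red_flags(labels):
    print(f"{label.clause_type} (Page {label.page}): {label.text}")
```

A reviewer could swap in any set of clause types for `FLAGGED_TYPES`; the filter itself does not depend on which types are chosen.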
Merger Agreement Understanding Dataset
MAUD is an expert-annotated reading comprehension dataset built from the ABA's Public Target Deal Points Study. It turns real M&A deal points into clear questions and answers, giving legal-tech tools a reliable way to read and interpret key terms at scale.
Explore MAUD →
Example Output
Question: What ordinary course representations are made?
☑ Business and operation of Target
☑ Ability to consummate transaction
☐ Conduct of Target in compliance with law
Atticus Clause Retrieval Dataset
ACORD is the first expert-annotated dataset built to help AI find the right legal contract clauses. It includes real lawyer-written queries and thousands of rated clauses — making it easier to draft and review complex provisions like Limitation of Liability and Indemnification.
Explore ACORD →
Example Output
🔍 Query: Termination for Convenience Clause?
1. Either Party may terminate this Agreement at any time upon 30 days written notice...
2. Both parties shall have the right to terminate this Agreement without cause...
3. Either Party may terminate this Agreement or all/part of a Project upon notice...
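Because the clauses carry ratings, a ranked list like the one above can be scored against them. The sketch below uses normalized discounted cumulative gain (NDCG), a common ranking metric; this page does not say which metric ACORD uses, and the integer ratings shown are made-up illustrative values, not ACORD data.

```python
import math


def dcg(ratings):
    """Discounted cumulative gain: each rating is discounted by log2(rank + 1)."""
    return sum(r / math.log2(rank + 1) for rank, r in enumerate(ratings, start=1))


def ndcg(ratings):
    """DCG divided by the DCG of the ideal (descending) ordering; 0.0 if all ratings are 0."""
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0


# Hypothetical ratings for three retrieved clauses, listed in the order returned.
best_first = [3, 2, 1]   # most relevant clause ranked first
worst_first = [1, 3, 2]  # same clauses, weakest one ranked first

print(ndcg(best_first))   # 1.0, since the order is already ideal
print(ndcg(worst_first))  # strictly between 0 and 1
```

A score of 1.0 means the retrieved order matches the ideal order implied by the ratings; lower scores mean better-rated clauses were pushed down the list.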
Download the Data
All datasets are free and open-source under CC BY 4.0. Download directly from GitHub or Hugging Face.