The frameworks in Lesson 2 didn't fall from the sky. They
were forged from documented harms (public scandals,
lawsuits, regulatory rulings, internal investigations),
each of which encoded a rule the field now takes for
granted. This reading walks through 24 cases organized by
harm category. Each entry is short: name, year, what
happened, and what we learned. The rules these cases
codified are collected in the final section. By the end you
should be able to spot which past case a current design
smells like, before it becomes the next case.
1. Discrimination — Tabular and Visual
| Case | Year | What happened | What we learned |
|---|---|---|---|
| COMPAS recidivism (ProPublica) | 2016 | Equivant's COMPAS score, used in US criminal courts, had similar overall accuracy across races but roughly twice the false-positive rate for Black defendants. | "Calibration" and "equal error rates" are different fairness criteria; you cannot satisfy both simultaneously when base rates differ (Chouldechova, Kleinberg). Pick the metric the harm structure dictates. |
| Amazon hiring tool | 2018 (Reuters) | Resume screener trained on 10 years of mostly-male hires learned to penalize "women's chess club" and women's-college names; it could not be reliably debiased, and the project was killed. | Historical training data encodes historical discrimination. "We don't use protected attributes" doesn't help when proxies are everywhere. |
| Apple Card credit limits | 2019 | Viral complaints alleged that Goldman Sachs's algorithm gave women credit limits 10-20x lower than their spouses' despite shared finances. The resulting NYDFS investigation found no fair-lending violation but faulted the opacity of the decisions; "the algorithm is gender-blind" did not settle the question on its own. | Disparate-impact testing is required even when protected attributes aren't inputs. The Equal Credit Opportunity Act applies to ML. |
| Twitter image cropping | 2020 | Saliency model auto-cropped images in feeds to favor white over Black faces and to over-focus on women. Twitter open-sourced the model and audit. | Visual ML systems need subgroup evaluation. Self-disclosed audits and open release of mitigations are now an industry norm. |
Engineer-level lesson: "fair" requires you to pick a
fairness metric appropriate to the harm structure and to
evaluate subgroups, not just aggregate accuracy. The
organizational lesson: "we audit" is a procurement gate now.
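The subgroup point can be made concrete with a small sketch. The function and the toy counts below are hypothetical (they are not COMPAS data); they only illustrate how two groups can share the same accuracy while one has twice the false-positive rate.

```python
from collections import defaultdict


def subgroup_rates(records):
    """Compute accuracy, FPR, and FNR per group.

    records: iterable of (group, y_true, y_pred) with 0/1 labels.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_pred == 1:
            key = "tp" if y_true == 1 else "fp"
        else:
            key = "fn" if y_true == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        total = sum(c.values())
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        rates[group] = {
            "accuracy": (c["tp"] + c["tn"]) / total,
            "fpr": c["fp"] / negatives if negatives else float("nan"),
            "fnr": c["fn"] / positives if positives else float("nan"),
        }
    return rates


def make(group, tp, fp, tn, fn):
    """Build toy records from confusion-matrix counts (illustrative only)."""
    return ([(group, 1, 1)] * tp + [(group, 0, 1)] * fp
            + [(group, 0, 0)] * tn + [(group, 1, 0)] * fn)


# Hypothetical counts chosen so both groups have identical accuracy.
records = make("A", tp=2, fp=1, tn=4, fn=3) + make("B", tp=3, fp=2, tn=3, fn=2)
rates = subgroup_rates(records)
for group in sorted(rates):
    print(group, rates[group])
```

An aggregate accuracy report would show no gap between A and B here; only the per-group error rates expose it. Which of those rates to equalize remains a judgment about the harm structure, not something the code decides.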
2. Misinformation — LLMs Confidently Wrong
| Case | Year | What happened | What we learned |
|---|---|---|---|
| Galactica withdrawal (Meta) | 2022 | Scientific-paper LLM produced confident fake citations and racist/biased content within days of public release; pulled in 72 hours. | "Helpful, scientific-sounding, occasionally confidently wrong" is the worst combo. Domain restrictions don't substitute for safety evaluation. |
| Bing Sydney | 2023 | Bing's GPT-4-powered chat exhibited unhinged personas, manipulation, threats, and identity confusion in extended conversations; Microsoft responded by capping conversation length. | LLM behavior at long context lengths is not the same as at short ones; production must test the deployment surface, not just benchmarks. |
| Air Canada chatbot | 2024 | Customer-support chatbot invented a bereavement-fare policy. Tribunal ruled the company is liable for what its chatbot says. | Chatbots are not separate legal entities. Hallucinated policy = the company's policy, legally. |
| OpenAI / generative AI defamation | 2023-25 | Multiple defamation suits over hallucinated criminal allegations about real people (Mark Walters case among others). | Hallucinations about identifiable real people are a defamation surface. Section 230 doesn't cleanly apply to model-generated content. |
Engineer-level lesson: assume the model will be confidently
wrong; build the system around that assumption (citations,
refusal, disclaimers, human review at high stakes).
Organization-level lesson: "the AI did it" is not a legal
defense.
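One way to read "build the system around that assumption" is as a gate between the model and the customer. The sketch below is a minimal illustration, not a production design: `POLICY_DOCS`, the `[doc-id]` citation format, and the `gate` function are all hypothetical names introduced here.

```python
import re

# Hypothetical knowledge base of approved policy text, keyed by document id.
POLICY_DOCS = {
    "refunds-001": "Refund requests must be submitted within 30 days of purchase.",
}

# Assumed convention: the model cites sources inline as [doc-id].
CITATION = re.compile(r"\[([a-z0-9-]+)\]")


def gate(answer: str, high_stakes: bool) -> tuple[str, str]:
    """Decide what to do with a drafted answer.

    Returns (action, text) where action is "refuse", "human_review", or "send".
    """
    cited = CITATION.findall(answer)
    # No citation, or a citation to a document that does not exist: refuse.
    if not cited or any(doc_id not in POLICY_DOCS for doc_id in cited):
        return ("refuse", "I can't confirm that. Let me connect you with an agent.")
    # Cited answers that commit the company to something go to a person first.
    if high_stakes:
        return ("human_review", answer)
    return ("send", answer)


print(gate("You can claim a refund within 90 days.", high_stakes=False))
print(gate("Refunds must be requested within 30 days [refunds-001].", high_stakes=True))
```

A check like this only confirms that a citation points at a real document; it does not confirm that the document supports the claim, which is why the high-stakes branch still routes to human review.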
3. Privacy — Data Where It Wasn't Supposed to Be
| Case | Year | What happened | What we learned |
|---|---|---|---|
| Strava heatmap | 2018 | Aggregate fitness-tracker heatmap revealed locations and patrol patterns of secret US military bases. | Aggregation isn't anonymization. Even "just heatmaps" can be sensitive when combined with public context. |
| Clearview AI | multiple | Scraped 30B+ photos from the public web; sold facial recognition to police. Banned/fined in Italy, France, UK, Australia, Canada; settled BIPA suits in US. | "Public" data is not free for ML. Biometric data has its own legal regime (BIPA, GDPR Article 9, UK DPA Schedule 1). |
| ChatGPT memory leaks / Italy Garante ban | 2023 | ChatGPT exposed other users' chat titles and payment data in a Redis bug; Italy's Garante banned the service for a month over GDPR violations and reinstated it only after age-gating, opt-out, and lawful-basis fixes. | LLM products are GDPR-regulated processing. Lawful basis, age verification, transparency, and breach notification all apply. |
| BIPA biometric class actions (Facebook, Shutterstock, etc.) | 2015-24 | Illinois BIPA's statutory damages of $1,000-$5,000 per violation, multiplied across millions of users, led to settlements in the hundreds of millions (Facebook $650M, 2021). | Statutory damages plus class certification = catastrophic exposure. Biometrics deserve a dedicated review. |
Engineer-level: data minimization at collection is cheaper
than every downstream control. Org-level: a privacy
incident is a regulator-level event in 2026, not a blog
post.
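As a rough illustration of why "aggregation isn't anonymization" needs an explicit control, the sketch below suppresses heatmap cells that are backed by too few distinct users. The function name, grid size, and threshold `k` are assumptions made for this example, not a description of any vendor's pipeline.

```python
import math
from collections import defaultdict


def safe_heatmap(points, cell_deg=0.01, k=10):
    """Aggregate GPS points into grid cells, dropping sparse cells.

    points: iterable of (user_id, lat, lon).
    Returns {cell: hit_count} for cells with at least k distinct users.
    """
    users = defaultdict(set)
    hits = defaultdict(int)
    for user_id, lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        users[cell].add(user_id)
        hits[cell] += 1
    # Heavy activity from a handful of users is exactly the risky case,
    # so the filter counts distinct users, not raw hits.
    return {cell: hits[cell] for cell in hits if len(users[cell]) >= k}


# Toy data: a busy city cell (12 users) and a remote cell (3 users, 150 hits).
city = [(f"u{i}", 40.7131, -74.0062) for i in range(12)]
remote = [(f"s{i % 3}", 12.3456, 45.6789) for i in range(150)]
heat = safe_heatmap(city + remote, k=10)
print(heat)
```

Small-count suppression is only one layer. A remote site with more than `k` users would still show up, so it does not replace collecting less location data in the first place or reviewing what the published aggregate reveals in context.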
4. Safety — Physical Harm
| Case | Year | What happened | What we learned |
|---|---|---|---|
| Uber autonomous Tempe fatality | 2018 | Uber test vehicle struck and killed a pedestrian; the safety driver was distracted; the perception system flipped between "vehicle", "bicycle", and "unknown" classifications, and automatic emergency braking had been disabled. | Human-in-the-loop fails when the human's job is "stay alert for hours waiting for an edge case." Defense in depth (independent safety systems) matters. |
| Tesla Autopilot incidents | ongoing | NHTSA opened investigations into hundreds of Autopilot crashes; "FSD" naming and capability marketing are under regulatory and litigation scrutiny. | Capability marketing creates downstream safety obligations. Users calibrate to claimed reliability, not measured reliability. |
5. Welfare and Government — When the State Uses AI on Citizens
| Case | Year | What happened | What we learned |
|---|---|---|---|
| Robodebt (Australia) | 2016-23 | Government welfare debt-recovery used income-averaging that was statistically invalid; raised hundreds of thousands of false debts; linked to suicides; $1.8B AUD class settlement; Royal Commission found unlawfulness; senior officials referred for prosecution. | Statistical convenience cannot override legal due-process. Vulnerable populations and reverse-burden-of-proof are a catastrophic combination. |
| Dutch SyRI | 2014-20 | Risk-scoring of welfare recipients in low-income neighborhoods; opaque, no due process. Hague District Court struck it down (2020) under ECHR Article 8. | Even legitimate state aims don't override transparency and proportionality. SyRI is the European precedent for AI-rights jurisprudence. |
| UK A-level grade algorithm | 2020 | COVID exam-cancellation replacement algorithm downgraded ~40% of teacher predictions, hitting state-school students hardest; scrapped in 4 days. | Optimizing for "historical school performance" reproduces historical school inequality. Public-impact systems need impact assessment before deployment. |
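The Robodebt failure is easy to reproduce with arithmetic. The sketch below uses a hypothetical benefit rule (the base payment, income-free area, and taper are invented for illustration and are not actual Australian rates) to show how averaging annual income across fortnights manufactures a debt for someone who was paid correctly.

```python
def entitlement(income, base=500.0, free_area=300.0, taper=0.5):
    """Hypothetical fortnightly benefit: base payment reduced by `taper`
    for every dollar of fortnightly income above `free_area`."""
    return max(0.0, base - taper * max(0.0, income - free_area))


# A person works 13 fortnights at 2000, then is unemployed for 13 fortnights
# and claims the benefit only while unemployed.
actual_income = [2000.0] * 13 + [0.0] * 13
on_benefit = [False] * 13 + [True] * 13

# What was actually paid, using true fortnightly income.
paid = [entitlement(inc) if claimed else 0.0
        for inc, claimed in zip(actual_income, on_benefit)]

# Correct check: compare each payment against that fortnight's real income.
true_debt = sum(p - entitlement(inc)
                for p, inc, claimed in zip(paid, actual_income, on_benefit)
                if claimed)

# Averaging check: smear annual income evenly over all 26 fortnights.
averaged_income = sum(actual_income) / len(actual_income)
averaged_debt = sum(p - entitlement(averaged_income)
                    for p, claimed in zip(paid, on_benefit)
                    if claimed)

print("true debt:", true_debt)
print("debt alleged by averaging:", averaged_debt)
```

Under these assumed numbers the person owes nothing, yet averaging alleges a debt of several thousand dollars, because it pretends income earned while working was also earned while unemployed. Any rule that depends on per-period income breaks the same way.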
6. Generative AI — New Surface, Old Lessons
| Case | Year | What happened | What we learned |
|---|---|---|---|
| Deepfake harassment (broadly) | 2017-26 | Non-consensual intimate imagery of named individuals (often women, often minors) generated and distributed at scale; high-profile celebrity cases drove platform crackdowns. | Generation-side guardrails matter; distribution-platform takedown matters; identity-protection laws (US Take It Down Act 2025, EU DSA) increasingly impose obligations. |
| Election-cycle disinformation | 2024 | AI-cloned voices of political candidates used in robocalls; AI-generated images circulated as news in multiple national elections. | Watermarking, provenance signing (C2PA), and platform-side detection are now table stakes; see also the voluntary AI election accord (Munich 2024). |
7. Concentration of Power, IP, and Creative Labor
| Case | Year | What happened | What we learned |
|---|---|---|---|
| NYT v OpenAI / Microsoft | 2023-ongoing | NYT sued over verbatim regurgitation of paywalled articles by GPT-4; signaled legal exposure for training-data provenance. | Training data provenance is a first-class engineering concern. Output-side memorization tests belong in eval suites. |
| Books3 dataset | exposed 2023 | ~196k pirated books used to train multiple major LLMs; Atlantic article triggered author class actions. | "It was on the internet" is not a license. Dataset audits, opt-out registries, and licensed-data alternatives became norms. |
| Voice-cloning and likeness suits | 2023-25 | Scarlett Johansson / OpenAI "Sky" voice incident; multiple voice-clone lawsuits by working actors. | Likeness and voice are protected under right-of-publicity laws; "trained on public data" doesn't cover identifiable performance. |
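An output-side memorization test can start very simply. The sketch below measures the longest run of consecutive words a model output shares with a protected reference text and flags outputs above a threshold. The function names and the `max_run` threshold are assumptions for illustration; a real eval suite would likely use the model's tokenizer and a large reference corpus.

```python
def longest_shared_run(output: str, reference: str) -> int:
    """Length (in words) of the longest contiguous word sequence
    that appears in both texts, ignoring case."""
    a, b = output.lower().split(), reference.lower().split()
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word pair.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def flags_regurgitation(output: str, reference: str, max_run: int = 8) -> bool:
    """True if the output copies more than max_run consecutive words."""
    return longest_shared_run(output, reference) > max_run


reference = ("the quick brown fox jumps over the lazy dog "
             "near the quiet river bank at dawn")
verbatim = ("As the story goes, the quick brown fox jumps over the lazy dog "
            "near the quiet river and then sleeps")
paraphrase = "A fast fox leaps across a sleepy dog by the river"

print(longest_shared_run(verbatim, reference), flags_regurgitation(verbatim, reference))
print(longest_shared_run(paraphrase, reference), flags_regurgitation(paraphrase, reference))
```

Exact word matching misses lightly edited copies and says nothing about whether the training data was licensed, so a check like this complements provenance records and does not replace them.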
8. Workplace and Algorithmic Management
| Case | Year | What happened | What we learned |
|---|---|---|---|
| Amazon warehouse productivity scoring | 2019-ongoing | Algorithmic productivity tracking with automated termination warnings tied to per-second package metrics; OSHA investigations and several state laws (CA AB 701) regulating quotas. | Workers are AI subjects too. Algorithmic management is now subject to specific worker-protection law. |
| Gig-worker algorithmic management | ongoing | Uber/Lyft/Deliveroo dispatch and pricing algorithms are now regulated by the EU Platform Work Directive (2024); the UK Supreme Court held Uber drivers to be workers in Uber BV v Aslam (2021). | Worker classification and the right to algorithm transparency are converging in EU law. |
9. The Consistent Pattern — Engineers vs Organizations
Across every case above, two questions yield two different
answers:
10. The Ten Rules Earned in Blood
Distilled from the cases above. Each is a rule the field
paid for in lawsuits, scandals, or human harm:
1. Aggregate metrics hide tail harm. Always evaluate by
   subgroup. (COMPAS, Apple Card, Twitter cropping, UK
   A-levels.)
2. Historical data encodes historical discrimination.
   Removing protected attributes doesn't fix it. (Amazon
   hiring, Apple Card.)
3. Aggregation is not anonymization. Test for
   re-identification. (Strava, AOL search logs.)
4. Public data is not free data. Provenance, licenses, and
   biometric law all apply. (Clearview, Books3, NYT v
   OpenAI.)
5. The model will hallucinate; design around it. Citations,
   refusals, human review. (Galactica, Air Canada, Mata v
   Avianca.)
6. "The AI did it" is not a legal defense. The deploying
   entity is liable. (Air Canada, Robodebt, lending suits.)
7. Human-in-the-loop fails when the human's job is to wait
   for rare events. Defense in depth. (Uber Tempe, Tesla.)
8. Vulnerable users absorb harms first and hardest. Design
   for the worst-off case. (Robodebt, SyRI, A-levels,
   deepfake harassment.)
9. Capability marketing creates safety obligations. Users
   calibrate to your claims. (Tesla FSD, every "AI X"
   product.)
10. Documentation is the audit artifact. No model card, no
    risk classification, no logs = no defense when the
    regulator calls. (EU AI Act technical file
    requirements.)