Ethical Frameworks for AI Development

35 min read · video · Foundations of AI Ethics

2 of 20 · AI Ethics & Safety

Lesson 1 made the case that ethics is now a practical engineering concern. This lesson is the toolkit: the moral frameworks practitioners actually use, the four canonical principle sets that policy documents keep referencing, the conflicts that inevitably arise between the values they name, and a worked example of three frameworks applied to the same shipping decision. By the end you'll be able to read an EU AI Act recital, a NIST AI RMF function, or a model card and recognize which ethical tradition is doing the work.

1. Consequentialism — What Are the Net Outcomes?

Consequentialist ethics judges actions by their outcomes. In the utilitarian flavor, the right action is whichever maximizes expected welfare. Engineers like it because it feels measurable: define a metric, compute the impact, decide.

What gets harder when AI is the actor:

Asymmetric harms: a recommender that raises engagement 1% on average can also push a small number of users into eating-disorder content. The averaged welfare gain hides the tail.
Long-tail risks: a generative model can be net positive on 99.9% of prompts and still produce a defamation lawsuit from the remaining 0.1%.
Welfare across whom? Summed-utility math doesn't distinguish "100 people +1 unit" from "1 person +100 units"; equity questions get flattened.
Counterfactual baselines: "compared to what?" If your hiring model is biased but human reviewers are more biased, is deployment net-positive? The argument is defensible; the conclusion is uncomfortable.
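The flattening described in the list above can be made concrete with a few lines of arithmetic. This is a minimal sketch: the welfare numbers are invented for illustration, and `summarize` is a hypothetical helper, not part of any library.

```python
from statistics import mean

# Hypothetical per-user welfare changes in arbitrary units (invented data).
broad_small_gain = [1.0] * 100          # 100 people gain 1 unit each
one_big_gain = [100.0] + [0.0] * 99     # 1 person gains 100 units

# A recommender change: 99 users gain a little, 1 user is harmed a lot.
recommender_change = [2.0] * 99 + [-50.0]


def summarize(deltas):
    """Put the aggregate view (total, mean) next to the tail view (worst case)."""
    return {"total": sum(deltas), "mean": mean(deltas), "worst": min(deltas)}


for name, deltas in [
    ("broad_small_gain", broad_small_gain),
    ("one_big_gain", one_big_gain),
    ("recommender_change", recommender_change),
]:
    print(name, summarize(deltas))
```

The first two distributions have identical totals and means, so a purely aggregate metric cannot tell them apart. The third has a positive mean while its worst-case value is strongly negative, which is why reporting a tail statistic alongside the average is a reasonable default.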

Consequentialism dominates A/B testing, recommender-system reasoning, and most product analytics. It's strong for measurable trade-offs and weak for rights-violating actions that "happen to be efficient."

2. Deontology — Rules and Duties

Deontological ethics judges actions by whether they comply with duties or rules, regardless of outcome. Translated to AI: "the system shall not deceive," "the system shall not discriminate on protected attributes," "the system shall not be the sole decider in legally significant cases."

This is the dominant tradition in regulation. The EU AI Act's prohibited-practices list (manipulation, social scoring, real-time remote biometric identification in public spaces outside narrow law-enforcement exceptions) is deontological: certain uses are forbidden regardless of how much aggregate welfare they would produce. GDPR Article 22's right to human intervention in solely automated decisions is deontological. "Don't generate CSAM" is deontological: no welfare calculation required.

Where deontology is hardest: when rules conflict (don't reveal private data vs don't deceive a user about why a decision was made), and when rules give counterintuitive results in edge cases. Most production AI policies are deontological at the floor (prohibited behaviors) and consequentialist above it (optimize within the rules).
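The "deontological floor, consequentialist above it" pattern can be sketched as a filter followed by a maximization. Everything here is an assumption made for illustration: the `Action` fields, the two rules, and the welfare scores are hypothetical, and a real policy would have many more rules and a far less tidy welfare estimate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    expected_welfare: float          # hypothetical score from a product metric
    deceives_user: bool
    uses_protected_attribute: bool


# Deontological floor: each rule returns True when the action is permitted.
RULES = [
    lambda a: not a.deceives_user,
    lambda a: not a.uses_protected_attribute,
]


def choose(actions):
    """Drop any action that breaks a rule, then maximize welfare among the rest."""
    permitted = [a for a in actions if all(rule(a) for rule in RULES)]
    if not permitted:
        return None  # nothing passes the floor: do not ship, escalate instead
    return max(permitted, key=lambda a: a.expected_welfare)


candidates = [
    Action("dark-pattern upsell", 9.0, deceives_user=True, uses_protected_attribute=False),
    Action("plain upsell", 6.5, deceives_user=False, uses_protected_attribute=False),
    Action("no upsell", 5.0, deceives_user=False, uses_protected_attribute=False),
]

best = choose(candidates)
print(best.name)  # plain upsell
```

The highest-scoring candidate is rejected before any welfare comparison happens, which is the point of the floor: the rules are not one more term in the objective that a large enough gain could outweigh.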

3. Virtue Ethics — What Kind of Team Are We?

Virtue ethics asks "what would a good person/team/company do?" rather than "what does the rule say?" or "what are the outcomes?" In an engineering org this lives mostly in culture: defaults, code-review norms, what gets a thumbs-up in design docs, what gets pushed back on.

Why it matters operationally: the rules and the consequences are both incomplete. New cases arise — multimodal jailbreaks, agentic misuse, synthetic-data bias laundering — that are not yet covered. The team's defaults determine what ships in those cases. A culture where "let's add a refusal" is a normal review comment is a virtue-ethics win.

Concretely, virtue ethics shows up as hiring people who push back, rewarding ethics catches in retros, treating the internal red team as respected rather than annoying, and not punishing the engineer who delays a launch for a fairness finding.

4. Justice / Rawlsian Fairness — Design for the Worst-Off

Rawls' "veil of ignorance" thought experiment: design the system as if you didn't know which user you'd be. The practical translation: prioritize the worst-off group's experience.

This is in tension with equal-treatment fairness ("treat everyone identically"). Equal treatment of unequal starting positions can perpetuate inequality — the classic example is a credit model that treats applicants identically but trains on historical data that already encoded discrimination, so identical treatment locks in the gap.

Rawlsian framings show up explicitly in: equity-focused fairness metrics (worst-group accuracy, calibration on worst-served subgroups), inclusive design for users with disabilities, the EU AI Act's emphasis on "vulnerable groups", and most public-sector AI standards.
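Worst-group accuracy, one of the equity-focused metrics named above, is simple to compute. The sketch below uses plain Python and invented labels; the function names and the two groups "a" and "b" are hypothetical.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return accuracy per group as a dict {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def worst_group_accuracy(y_true, y_pred, groups):
    """Return (group, accuracy) for the group the model serves worst."""
    accs = group_accuracies(y_true, y_pred, groups)
    worst = min(accs, key=accs.get)
    return worst, accs[worst]


# Invented evaluation data: group "a" has 6 examples, group "b" has 4.
y_true = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]
groups = ["a"] * 6 + ["b"] * 4

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall)                                       # 0.8
print(worst_group_accuracy(y_true, y_pred, groups))  # ('b', 0.5)
```

An overall accuracy of 0.8 looks acceptable, while the smaller group sees only 0.5. Reporting the minimum rather than the average is the metric-level version of designing for the worst-off user.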

5. Procedural Ethics — Fair Process, Right to Explain, Right to Appeal

Procedural ethics doesn't insist on a particular outcome; it insists that the process be fair. Three components show up everywhere in AI law:

Notice: the user knows an automated system was used (NYC LL144, EU AI Act Article 50, sectoral disclosure rules).
Right to explanation: the user can get a meaningful description of why (GDPR Article 22 read with the non-binding Recital 71; ICO guidance; sectoral lending and employment law, such as ECOA adverse-action notices).
Right to appeal / human review: the user can contest the decision and get a human to look at it (GDPR Article 22; EU AI Act human oversight).

Procedural ethics is the backbone of the EU AI Act's human-oversight requirements and of essentially every responsible-AI policy. It's the framework that survives even when consequentialist and deontological analyses disagree: regardless of who's right about the outcome, affected users get notice, explanation, and appeal.

6. Care Ethics — Relational, Attentive, Responsible

Care ethics emphasizes relationships and the moral weight of attention to specific others, especially the vulnerable. Less common in policy text, very useful when designing for:

Mental-health and crisis chatbots (where the relational frame is the product).
Children's products (where consent is constrained and asymmetry of power is large).
Healthcare AI (where dignity and individual context matter).
Elder care, disability assistance, refugee services.

Concretely, care ethics asks: have we listened to the people most affected? Have we co-designed with them? When things go wrong for one user, do we treat them as a case or as a statistic? Care ethics is uncomfortable with "we'll fix it in v2" when v1 caused harm to a specific person.

7. The Four Canonical Principle Sets

Set	Year	What it gives you
Asilomar AI Principles	2017	23 principles spanning research, ethics & values, and longer-term issues. The earliest broadly-cited industry consensus document; foreshadowed alignment and existential-risk discussions.
Belmont Report (medical, adapted for AI)	1979 / adapted ongoing	Respect for persons, beneficence, justice. Originally for human-subjects research; the IRB tradition that increasingly applies to AI research with human data.
OECD AI Principles	2019, updated 2024	Inclusive growth, human-centered values, transparency, robustness/safety, accountability. Adopted by 47+ countries; the closest thing to a global baseline.
NIST AI RMF trustworthy characteristics	2023	Valid & reliable, safe, secure & resilient, accountable & transparent, explainable & interpretable, privacy-enhanced, fair (with harmful bias managed). The closest thing to an engineering checklist.

All four sets converge on roughly the same dozen ideas, including fairness, transparency, accountability, robustness, privacy, human oversight, and beneficence. The differences are emphasis and operationalization. The NIST AI RMF is the most useful for engineers because it pairs each characteristic with measurable practices.

8. The Principles-to-Practice Gap

"Be fair, accountable, transparent" doesn't ship code. The hard work is converting each principle into a concrete engineering or product action. Side-by-side:

Principle	Concrete eng / PM action
Fairness	Define the protected groups; pick fairness metric (TPR parity, calibration, etc.); add subgroup eval to CI; set release threshold.
Transparency	Ship a model card; version it; expose it to customers; log model version in every prediction.
Accountability	Named owner per model; runbook; incident-response procedure; quarterly model review.
Robustness	Out-of-distribution test set; adversarial test set; perturbation tests in CI; canary deploys.
Privacy	Data minimization in collection; DPIA; differential-privacy training where applicable; deletion API.
Human oversight	Confidence threshold below which humans review; UI never auto-confirms high-stakes decisions; appeals queue.
Explainability	SHAP / counterfactual explanation surface; "why this decision?" link in UI; reason codes in adverse-action notices.

This table is the artifact every Trustworthy-AI program eventually produces. As of 2026 the industry hasn't standardized it; your job is to write your team's version.
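As one illustration of turning a row of the table into something CI can run, the Fairness row (pick a metric, add a subgroup eval, set a release threshold) might look like the sketch below. It assumes binary labels, TPR parity as the chosen metric, and a `MAX_TPR_GAP` of 0.10; the threshold, function names, and data are all hypothetical choices, not a recommended standard.

```python
MAX_TPR_GAP = 0.10  # hypothetical release threshold; set this per product and regulation


def true_positive_rate(y_true, y_pred):
    """TPR = correctly predicted positives / actual positives (None if no positives)."""
    preds_on_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_on_positives:
        return None
    return sum(p == 1 for p in preds_on_positives) / len(preds_on_positives)


def tpr_by_group(y_true, y_pred, groups):
    """Compute TPR separately for each group that has at least one positive label."""
    rates = {}
    for group in set(groups):
        idx = [i for i, g in enumerate(groups) if g == group]
        rate = true_positive_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if rate is not None:
            rates[group] = rate
    if not rates:
        raise ValueError("no group has positive examples; TPR parity is undefined")
    return rates


def release_gate(y_true, y_pred, groups, max_gap=MAX_TPR_GAP):
    """Return (passed, gap, rates); passed is False when the TPR gap exceeds max_gap."""
    rates = tpr_by_group(y_true, y_pred, groups)
    gap = max(rates.values()) - min(rates.values())
    return gap <= max_gap, gap, rates


# Invented evaluation data for two protected groups.
y_true = [1, 1, 1, 1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
groups = ["a"] * 5 + ["b"] * 5

passed, gap, rates = release_gate(y_true, y_pred, groups)
print(passed, round(gap, 3), rates)
```

Here group "a" has a TPR of 0.75 and group "b" a TPR of one third, so the gate fails. In a CI job the boolean would typically become an assertion or a non-zero exit code that blocks the release.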

9. The Five Conflicts You Will Hit

Conflict	Where it shows up	Decision frame
Privacy vs accuracy	More data = better model; more data = more privacy risk.	Data minimization + differential privacy + use-purpose limits. Accept some accuracy loss for high-sensitivity domains.
Fairness vs accuracy	Subgroup-equal models often have lower aggregate accuracy.	Define which fairness metric is required (regulatory or product); set a worst-group floor; accept aggregate loss to meet it.
Transparency vs IP	Full disclosure of training data / weights vs business secrets.	Layered disclosure: model card always; weights / data only as required by jurisdiction; auditor access under NDA.
Autonomy vs safety	Refusing to answer = paternalism; answering = potential harm.	Tier responses by risk; refuse for narrow high-harm categories; default to informed-user assumption elsewhere.
Profit vs welfare	Engagement-optimizing recommender vs user wellbeing.	Cap downside (max session length, content-quality floors); diversify metrics beyond engagement; report quarterly.

No framework eliminates these conflicts. The job is to surface them, decide explicitly, and document the decision so the next person doesn't re-litigate it.

10. The Framework Chooser

Decision type	Dominant framework	Why
"Should we ship this feature at all?"	Deontology + Justice	Rules and worst-off groups gate launch.
"How do we tune the trade-off?"	Consequentialism	Once you're inside the rules, optimize.
"What does the affected user get?"	Procedural	Notice, explanation, appeal.
"What's our default when policy is silent?"	Virtue	Culture decides the unwritten cases.
"Vulnerable population product"	Care + Justice	Relational attention; design for worst-off.

11. Worked Example — A Generative Customer-Support Bot for Vulnerable Users

Concrete decision: ship a generative AI chatbot for a utilities company whose customers include many low-income and elderly users. The bot answers questions about billing, payment plans, and shut-off notices.

Through Three Frameworks

Consequentialist lens: aggregate impact analysis. Bot resolves 60% of tickets, average wait drops 12 minutes, customer satisfaction rises 8 points. Net positive — ship.
Deontological lens: rules check. The Air Canada chatbot ruling (Moffatt v. Air Canada, 2024) indicates the company can be held liable for what the bot says about policy. Shut-off notices have legal effect, which puts them in GDPR Article 22 territory and likely makes the system "high-risk" under the EU AI Act if we deploy in the EU. Rule: bot must not autonomously commit the company to anything; must not give shut-off-related advice without a human handoff.
Care / Justice lens: who is the worst-off user? Elderly user, low literacy, English as second language, payment overdue, fearful of shut-off. Care ethics says: design for that user. The bot's defaults — tone, plain language, always-visible "talk to a person" button, never ending a session in distress without escalation — come from this analysis, not the consequentialist one.

The shipping decision is "yes, with constraints": bot deployed for low-stakes informational questions; any payment, shut-off, or hardship topic auto-escalates; one-tap human handoff at all times; logged for post-launch monitoring of subgroup outcomes. That decision required three frameworks; one would have missed something.
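The "yes, with constraints" decision can be sketched as a small routing function. This assumes an upstream classifier has already assigned each message a topic label; the topic names, the `Route` enum, and the choice to send unrecognized topics to a human are all hypothetical design assumptions layered on the constraints described above.

```python
from enum import Enum


class Route(Enum):
    BOT = "bot"
    HUMAN = "human"


# Hypothetical topic labels produced by an upstream classifier.
INFORMATIONAL_TOPICS = {"billing_faq", "office_hours", "meter_reading"}
ESCALATION_TOPICS = {"payment", "shut_off", "hardship"}


def route(topic, user_asked_for_human=False):
    """Send a message to the bot only when it is clearly low-stakes."""
    if user_asked_for_human:
        return Route.HUMAN               # one-tap handoff always wins
    if topic in ESCALATION_TOPICS:
        return Route.HUMAN               # payment, shut-off, hardship auto-escalate
    if topic in INFORMATIONAL_TOPICS:
        return Route.BOT
    return Route.HUMAN                   # assumption: unknown topics default to a person


def handle(topic, user_group, log, user_asked_for_human=False):
    """Route the message and record it for post-launch subgroup monitoring."""
    decision = route(topic, user_asked_for_human)
    log.append({"topic": topic, "user_group": user_group, "route": decision.value})
    return decision


audit_log = []
print(handle("billing_faq", "general", audit_log))        # Route.BOT
print(handle("shut_off", "elderly", audit_log))           # Route.HUMAN
print(handle("billing_faq", "elderly", audit_log, True))  # Route.HUMAN
```

Each branch maps to one lens: the escalation set is the deontological rule, the always-available handoff and the conservative default come from the care and justice analysis, and the log supports the consequentialist follow-up on subgroup outcomes.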

12. The Mental Model

What You Learned

Six frameworks: consequentialism (outcomes: A/B tests, recommenders), deontology (rules: EU AI Act prohibited uses, GDPR Article 22), virtue (defaults: culture and code-review norms), Rawlsian justice (worst-off: equity metrics, vulnerable groups), procedural (notice + explanation + appeal: the spine of EU AI law), and care (relational: vulnerable-user contexts).
Four canonical principle sets (Asilomar, Belmont, OECD, NIST AI RMF) agree on the same dozen ideas; NIST is the most engineering-actionable.
The principles-to-practice gap is closed by a side-by-side table of principle → concrete action, which your team must write.
Five conflicts (privacy vs accuracy, fairness vs accuracy, transparency vs IP, autonomy vs safety, profit vs welfare) are permanent; surface them, decide, document.
Hard decisions use multiple frameworks; the worked example showed three applied to one bot.
Lesson 3 covers the historical incidents these frameworks were forged in.
