AIMaks

Ethical Frameworks for AI Development

35 min read · video · Foundations of AI Ethics
Lesson 2 of 20 · AI Ethics & Safety

Lesson 1 made the case that ethics is now a practical engineering concern. This lesson is the toolkit: the moral frameworks practitioners actually use, the four canonical principle sets you'll see referenced in every policy document, the inevitable conflicts between them, and a worked example of three frameworks applied to the same shipping decision. By the end you'll be able to read an EU AI Act recital, a NIST AI RMF function, or a model card and recognize which ethical tradition is doing the work.

1. Consequentialism — What Are the Net Outcomes?

Consequentialist ethics judges actions by their outcomes. In the utilitarian flavor: the right action is whichever maximizes expected welfare. Engineers love it because it feels measurable: define a metric, compute the impact, decide.

What gets harder when AI is the actor:

  • Asymmetric harms — a recommender that raises engagement 1% on average can also push a small number of users into eating-disorder content. The averaged welfare gain hides the tail.
  • Long-tail risks — a generative model is net positive on 99.9% of prompts and produces a defamation lawsuit on the rest.
  • Welfare across whom? — utilitarian math doesn't distinguish "100 people +1 unit" from "1 person +100 units"; equity questions get flattened.
  • Counterfactual baselines — "compared to what?" If your hiring model is biased but humans are more biased, is deployment net-positive? Defensible argument; uncomfortable conclusion.
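The "averaged gain hides the tail" problem is easy to see in numbers. The following is a minimal sketch, assuming per-user welfare changes are available as plain numbers; the function name `summarize_welfare`, the `tail_percent` parameter, and the data are all made up for illustration.

```python
from statistics import mean

def summarize_welfare(deltas, tail_percent=1):
    """Compare the average welfare change with the worst-off tail.

    deltas: per-user welfare change (hypothetical units, negative = harm).
    tail_percent: size of the worst-off slice to inspect, in percent.
    """
    if not deltas:
        raise ValueError("deltas must be non-empty")
    ordered = sorted(deltas)  # most harmed users come first
    tail_size = max(1, len(ordered) * tail_percent // 100)
    return {
        "mean": mean(ordered),
        "tail_mean": mean(ordered[:tail_size]),
        "worst": ordered[0],
    }

# Made-up numbers: 990 users gain a little, 10 users are harmed a lot.
deltas = [1.0] * 990 + [-50.0] * 10
summary = summarize_welfare(deltas)
print(summary)
```

Here the mean is positive while the worst 1% of users sees a large loss, which is why reporting a tail statistic next to the average is a reasonable default for a consequentialist review.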

Consequentialism dominates A/B testing, recommender-system reasoning, and most product analytics. It's strong for measurable trade-offs and weak for rights-violating actions that "happen to be efficient."

2. Deontology — Rules and Duties

Deontological ethics judges actions by whether they comply with duties or rules, regardless of outcome. Translated to AI: "the system shall not deceive," "the system shall not discriminate on protected attributes," "the system shall not be the sole decider in legally significant cases."

This is the dominant tradition in regulation. The EU AI Act's prohibited-practices list (including manipulative techniques, social scoring, and most real-time remote biometric identification in public spaces by law enforcement) is deontological: certain uses are forbidden regardless of how much aggregate welfare they would produce. GDPR's right to human intervention in solely automated decisions is deontological. "Don't generate CSAM" is deontological too: no welfare calculation is required.

Where deontology is hardest: when rules conflict (don't reveal private data vs don't deceive a user about why a decision was made), and when rules give counterintuitive results in edge cases. Most production AI policies are deontological at the floor (prohibited behaviors) and consequentialist above it (optimize within the rules).
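The "deontological floor, consequentialist above it" pattern can be sketched as a two-step selection: filter out anything that breaks a rule, then optimize among what remains. This is an illustrative sketch only; the `Candidate` fields, the `RULES` table, and the welfare numbers are hypothetical, not a real policy engine.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    expected_welfare: float          # hypothetical outcome estimate
    deceives_user: bool = False
    uses_protected_attribute: bool = False

# Hypothetical rule set: each rule returns True when the candidate violates it.
RULES = {
    "no_deception": lambda c: c.deceives_user,
    "no_protected_attribute": lambda c: c.uses_protected_attribute,
}

def violations(candidate):
    """Names of every rule the candidate breaks."""
    return [name for name, broken in RULES.items() if broken(candidate)]

def choose(candidates):
    """Drop rule-breaking candidates first, then maximize expected welfare."""
    permitted = [c for c in candidates if not violations(c)]
    if not permitted:
        return None  # nothing clears the floor, so nothing ships
    return max(permitted, key=lambda c: c.expected_welfare)

options = [
    Candidate("dark_pattern_upsell", expected_welfare=9.0, deceives_user=True),
    Candidate("plain_upsell", expected_welfare=6.0),
    Candidate("no_upsell", expected_welfare=4.0),
]
print(choose(options).name)
```

The highest-scoring option never reaches the optimization step because it fails a rule; the welfare estimate only ranks options that are already permitted.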

3. Virtue Ethics — What Kind of Team Are We?

Virtue ethics asks "what would a good person/team/company do?" rather than "what does the rule say?" or "what are the outcomes?" In an engineering org this lives mostly in culture: defaults, code-review norms, what gets a thumbs-up in design docs, what gets pushed back on.

Why it matters operationally: the rules and the consequences are both incomplete. New cases arise — multimodal jailbreaks, agentic misuse, synthetic-data bias laundering — that are not yet covered. The team's defaults determine what ships in those cases. A culture where "let's add a refusal" is a normal review comment is a virtue-ethics win.

Concretely, virtue ethics shows up as: hiring people who push back, rewarding ethics catches in retros, treating the internal red team as cool rather than annoying, and not punishing the engineer who delays a launch for a fairness finding.

4. Justice / Rawlsian Fairness — Design for the Worst-Off

Rawls' "veil of ignorance" thought experiment: design the system as if you didn't know which user you'd be. The practical translation: prioritize the worst-off group's experience.

This is in tension with equal-treatment fairness ("treat everyone identically"). Equal treatment of unequal starting positions can perpetuate inequality — the classic example is a credit model that treats applicants identically but trains on historical data that already encoded discrimination, so identical treatment locks in the gap.

Rawlsian framings show up explicitly in: equity-focused fairness metrics (worst-group accuracy, calibration on worst-served subgroups), inclusive design for users with disabilities, the EU AI Act's emphasis on "vulnerable groups", and most public-sector AI standards.
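Worst-group accuracy, one of the metrics named above, is simple to compute. The sketch below assumes labels, predictions, and a group identifier per example are available as parallel lists; the function names and the toy data are illustrative.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}

def worst_group(y_true, y_pred, groups):
    """Return (group, accuracy) for the group the model serves worst."""
    per_group = accuracy_by_group(y_true, y_pred, groups)
    group = min(per_group, key=per_group.get)
    return group, per_group[group]

# Toy data: group "a" is large and well served, group "b" is small and not.
y_true = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0, 0, 0]
groups = ["a"] * 8 + ["b"] * 2

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(overall, worst_group(y_true, y_pred, groups))
```

Overall accuracy looks healthy here while the smaller group gets coin-flip performance, which is the gap a Rawlsian metric is meant to surface.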

5. Procedural Ethics — Fair Process, Right to Explain, Right to Appeal

Procedural ethics doesn't insist on a particular outcome; it insists that the process be fair. Three components show up everywhere in AI law:

  • Notice — the user knows an automated system was used (NYC LL144, EU AI Act Article 50, sectoral disclosure rules).
  • Right to explanation — the user can get a meaningful description of why (GDPR Article 22 + Recital 71; ICO guidance; sectoral lending and employment law).
  • Right to appeal / human review — the user can contest the decision and get a human to look at it (GDPR Article 22; EU AI Act human oversight; ECOA adverse-action notices).

Procedural ethics is the backbone of the EU AI Act's human-oversight requirements and of essentially every responsible-AI policy. It's the framework that survives even when consequentialist and deontological analyses disagree: regardless of who's right about the outcome, affected users get notice, explanation, and appeal.
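One way to make notice, explanation, and appeal checkable is to carry them on the record of each automated decision. The following is a minimal sketch under that assumption; `DecisionRecord` and its fields are hypothetical names, not taken from any regulation or library.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DecisionRecord:
    decision_id: str
    outcome: str
    model_version: str
    notice_given: bool                      # was the user told a system was used?
    reason_codes: List[str] = field(default_factory=list)
    appeal_status: Optional[str] = None     # None, "open", or "resolved"

    def missing_safeguards(self):
        """List the procedural safeguards this record cannot demonstrate."""
        missing = []
        if not self.notice_given:
            missing.append("notice")
        if not self.reason_codes:
            missing.append("explanation")
        return missing

    def open_appeal(self):
        """Mark the decision as contested so a human reviewer picks it up."""
        if self.appeal_status is None:
            self.appeal_status = "open"

record = DecisionRecord(
    decision_id="d-001",
    outcome="declined",
    model_version="2024-05-01",
    notice_given=True,
)
print(record.missing_safeguards())
record.open_appeal()
print(record.appeal_status)
```

A record like this says nothing about whether the outcome was right; it only makes it auditable whether the affected user got the process they were owed.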

6. Care Ethics — Relational, Attentive, Responsible

Care ethics emphasizes relationships and the moral weight of attention to specific others, especially the vulnerable. Less common in policy text, very useful when designing for:

  • Mental-health and crisis chatbots (where the relational frame is the product).
  • Children's products (where consent is constrained and asymmetry of power is large).
  • Healthcare AI (where dignity and individual context matter).
  • Elder care, disability assistance, refugee services.

Concretely, care ethics asks: have we listened to the people most affected? Have we co-designed with them? When things go wrong for one user, do we treat them as a case or as a statistic? Care ethics is uncomfortable with "we'll fix it in v2" when v1 caused harm to a specific person.

7. The Four Canonical Principle Sets

| Set | Year | What it gives you |
| --- | --- | --- |
| Asilomar AI Principles | 2017 | 23 principles spanning research, ethics & values, and longer-term issues. The earliest broadly-cited industry consensus document; foreshadowed alignment and existential-risk discussions. |
| Belmont Report (medical, adapted for AI) | 1979 / adapted ongoing | Respect for persons, beneficence, justice. Originally for human-subjects research; the IRB tradition that increasingly applies to AI research with human data. |
| OECD AI Principles | 2019, updated 2024 | Inclusive growth, human-centered values, transparency, robustness/safety, accountability. Adopted by 47+ countries; the closest thing to a global baseline. |
| NIST AI RMF trustworthy characteristics | 2023 | Valid & reliable, safe, secure & resilient, accountable & transparent, explainable & interpretable, privacy-enhanced, fair (with harmful bias managed). The closest thing to an engineering checklist. |

All four sets converge on roughly the same handful of ideas: fairness, transparency, accountability, robustness, privacy, human oversight, beneficence. The differences are emphasis and operationalization. The NIST AI RMF is the most useful for engineers because its core functions (Govern, Map, Measure, Manage) and companion Playbook tie the characteristics to concrete practices.

8. The Principles-to-Practice Gap

"Be fair, accountable, transparent" doesn't ship code. The hard work is converting each principle into a concrete engineering or product action. Side-by-side:

| Principle | Concrete eng / PM action |
| --- | --- |
| Fairness | Define the protected groups; pick fairness metric (TPR parity, calibration, etc.); add subgroup eval to CI; set release threshold. |
| Transparency | Ship a model card; version it; expose it to customers; log model version in every prediction. |
| Accountability | Named owner per model; runbook; incident-response procedure; quarterly model review. |
| Robustness | Out-of-distribution test set; adversarial test set; perturbation tests in CI; canary deploys. |
| Privacy | Data minimization in collection; DPIA; differential-privacy training where applicable; deletion API. |
| Human oversight | Confidence threshold below which humans review; UI never auto-confirms high-stakes decisions; appeals queue. |
| Explainability | SHAP / counterfactual explanation surface; "why this decision?" link in UI; reason codes in adverse-action notices. |

This table is the artifact every Trustworthy-AI program eventually produces. As of 2026, the industry hasn't standardized it; your job is to write your team's version.
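As one illustration of turning a principle into a release check, the fairness row (pick a metric, run a subgroup eval in CI, set a threshold) could look roughly like this. It is a sketch under stated assumptions: binary labels, TPR parity as the chosen metric, and a `max_gap` of 0.1 picked arbitrarily for the example; none of these are prescribed by the frameworks above.

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives predicted positive; None if there are none."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return None
    return sum(preds_for_positives) / len(preds_for_positives)

def tpr_by_group(y_true, y_pred, groups):
    """TPR per group, skipping groups with no positive examples."""
    rates = {}
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        rate = true_positive_rate([y_true[i] for i in idx],
                                  [y_pred[i] for i in idx])
        if rate is not None:
            rates[group] = rate
    return rates

def passes_release_gate(y_true, y_pred, groups, max_gap=0.1):
    """True when the largest TPR difference between groups is within max_gap."""
    rates = tpr_by_group(y_true, y_pred, groups)
    if not rates:
        raise ValueError("no group has positive examples to evaluate")
    return max(rates.values()) - min(rates.values()) <= max_gap

# Toy evaluation set with two groups.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(tpr_by_group(y_true, y_pred, groups))
print(passes_release_gate(y_true, y_pred, groups))
```

In a CI job, a failing gate would typically be wired to fail the build (for example via an `assert` in a test); the threshold itself is a product and regulatory decision, not something the code can choose.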

9. The Five Conflicts You Will Hit

| Conflict | Where it shows up | Decision frame |
| --- | --- | --- |
| Privacy vs accuracy | More data = better model; more data = more privacy risk. | Data minimization + differential privacy + use-purpose limits. Accept some accuracy loss for high-sensitivity domains. |
| Fairness vs accuracy | Subgroup-equal models often have lower aggregate accuracy. | Define which fairness metric is required (regulatory or product); set a worst-group floor; accept aggregate loss to meet it. |
| Transparency vs IP | Full disclosure of training data / weights vs business secrets. | Layered disclosure: model card always; weights / data only as required by jurisdiction; auditor access under NDA. |
| Autonomy vs safety | Refusing to answer = paternalism; answering = potential harm. | Tier responses by risk; refuse for narrow high-harm categories; default to informed-user assumption elsewhere. |
| Profit vs welfare | Engagement-optimizing recommender vs user wellbeing. | Cap downside (max session length, content-quality floors); diversify metrics beyond engagement; report quarterly. |

No framework eliminates these conflicts. The job is to surface them, decide explicitly, and document the decision so the next person doesn't re-litigate it.
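For the fairness-vs-accuracy conflict, "set a worst-group floor, accept aggregate loss, document the decision" can be sketched as a small model-selection step that also records what was traded away. The model names, accuracy figures, and the 0.80 floor below are invented for illustration.

```python
def pick_model(candidates, worst_group_floor):
    """Choose the most accurate model that clears the worst-group floor.

    Returns a small decision record, or None if no candidate qualifies.
    """
    eligible = [m for m in candidates
                if m["worst_group_accuracy"] >= worst_group_floor]
    if not eligible:
        return None
    chosen = max(eligible, key=lambda m: m["aggregate_accuracy"])
    best_aggregate = max(m["aggregate_accuracy"] for m in candidates)
    return {
        "chosen": chosen["name"],
        "worst_group_floor": worst_group_floor,
        # Aggregate accuracy knowingly traded away to meet the floor.
        "aggregate_accuracy_given_up": round(
            best_aggregate - chosen["aggregate_accuracy"], 4),
    }

# Hypothetical evaluation results for three candidate models.
candidates = [
    {"name": "model_a", "aggregate_accuracy": 0.93, "worst_group_accuracy": 0.71},
    {"name": "model_b", "aggregate_accuracy": 0.91, "worst_group_accuracy": 0.85},
    {"name": "model_c", "aggregate_accuracy": 0.88, "worst_group_accuracy": 0.87},
]
decision = pick_model(candidates, worst_group_floor=0.80)
print(decision)
```

Storing the returned record alongside the release notes is one way to make the trade-off explicit, so the next reviewer can see both the floor and the accuracy that was given up to meet it.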

10. The Framework Chooser

| Decision type | Dominant framework | Why |
| --- | --- | --- |
| "Should we ship this feature at all?" | Deontology + Justice | Rules and worst-off groups gate launch. |
| "How do we tune the trade-off?" | Consequentialism | Once you're inside the rules, optimize. |
| "What does the affected user get?" | Procedural | Notice, explanation, appeal. |
| "What's our default when policy is silent?" | Virtue | Culture decides the unwritten cases. |
| "Vulnerable population product" | Care + Justice | Relational attention; design for worst-off. |

11. Worked Example — A Generative Customer-Support Bot for Vulnerable Users

Concrete decision: ship a generative AI chatbot for a utilities company whose customers include many low-income and elderly users. The bot answers questions about billing, payment plans, and shut-off notices.

The shipping decision is "yes, with constraints": the bot is deployed for low-stakes informational questions; any payment, shut-off, or hardship topic auto-escalates; a one-tap human handoff is available at all times; and interactions are logged for post-launch monitoring of subgroup outcomes. That decision required three frameworks; any single one would have missed something.
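The constraints in that decision map onto a small routing rule. The sketch below assumes an upstream step has already labeled each message with a topic and a customer segment (that classifier is not shown and is itself an assumption); the topic labels, segment names, and function names are hypothetical.

```python
# Topics that always go to a human, per the shipping decision.
ESCALATION_TOPICS = frozenset({"payment", "shut_off", "hardship"})

def route(topic, user_requested_human=False):
    """Return (handler, reason) for one incoming message."""
    if user_requested_human:
        return "human", "user_requested_handoff"   # one-tap handoff always wins
    if topic in ESCALATION_TOPICS:
        return "human", "high_stakes_topic"
    return "bot", "low_stakes_informational"

def handle(message_log, topic, customer_segment, user_requested_human=False):
    """Route a message and append a log entry for subgroup monitoring."""
    handler, reason = route(topic, user_requested_human)
    message_log.append({
        "topic": topic,
        "customer_segment": customer_segment,
        "handler": handler,
        "reason": reason,
    })
    return handler

log = []
print(handle(log, "opening_hours", "general"))
print(handle(log, "shut_off", "elderly"))
print(handle(log, "opening_hours", "low_income", user_requested_human=True))
print(len(log))
```

Note that this sketch sends any topic outside the escalation set to the bot; a more cautious variant would use an allowlist of known low-stakes topics and escalate everything else.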

12. The Mental Model
