By Jeremy Abram — JeremyAbram.net
Machine-learning systems don’t just forecast what we’ll click or buy; they increasingly infer who we are, what we’ll become, and what opportunities we should (or shouldn’t) receive. When predictions jump from behavioral (“will you churn?”) to ontological (“what kind of person are you?”), the stakes shift from convenience to power—reshaping credit, work, health, safety, and even identity. This piece explains how we got here, where the risks hide, and how to build/operate predictive systems that serve the public without locking people into algorithmic destinies.
1) From “Next Best Click” to “Next Best Life”
Early recommender systems nudged us toward songs, shows, and products. Today, predictive models sit upstream of life opportunities—credit lines, job screening, insurance pricing, school placement, bail and parole recommendations, medical risk flags, social-service case prioritization, and more.
This transition matters because:
- Predictions become prescriptions. A risk score intended to “assist” can become decisive, especially when overloaded staff rely on it.
- Predictions reshape the thing being predicted. If your feed hides job posts because you’re “unlikely to apply,” your future conforms to yesterday’s probabilities.
- Predictions bleed into identity. Inferring traits like reliability, leadership potential, or even likely health conditions moves from behavioral forecasting to normative labeling.
Call this shift algorithmic destiny: when the scaffolding of code silently narrows (or opens) life paths.
2) Why Models Predict More Than You Gave Them
Even without “sensitive” inputs, modern systems infer more than we expect:
- Proxy Variables: ZIP code, shopping time, phone model, commute length—each harmless alone, but together they act as proxies for income, health access, or protected attributes.
- Embedding Space Leakage: Text, audio, and image embeddings compress rich context; similarity in vector space enables latent attribute inference (e.g., political leaning, education level) even when those fields are hidden.
- Linkage & Graph Effects: Network structure (friends, follows, co-locations) predicts traits with high accuracy; you can be classified because of others’ data, not yours.
- Temporal & Context Drift: A model trained for “engagement” slowly optimizes for arousal or outrage if that’s what maximizes the target—subtly redefining the label over time.
- Performativity & Goodhart’s Law: When a metric becomes the goal, actors game it; the model then “learns” the gamed world, not the real one.
The upshot: data minimalism doesn’t guarantee inference minimalism. What a model can deduce often exceeds what you meant it to know.
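To make that concrete, here is a minimal sketch of a leakage probe: train a simple classifier to recover a hidden attribute from embeddings alone. Everything below is synthetic and illustrative; in practice you would probe your real embedding matrix against a consented, held-out label.

```python
# Minimal sketch: probe whether "neutral" embeddings leak a latent attribute.
# All data here is synthetic; in practice, use your real embeddings and a
# consented, held-out label for the attribute being probed.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n, dim = 5_000, 64

# Synthetic stand-in: a few embedding directions correlate with a hidden
# attribute (e.g., political leaning or education level).
latent = rng.integers(0, 2, size=n)
emb = rng.normal(size=(n, dim))
emb[:, :5] += latent[:, None] * 0.8  # leakage baked in for illustration

X_tr, X_te, y_tr, y_te = train_test_split(emb, latent, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, probe.predict_proba(X_te)[:, 1])
print(f"Probe AUC for hidden attribute: {auc:.2f}")  # well above 0.5 means leakage
```

If a simple linear probe recovers the attribute well above chance, the representation carries it, whether or not any downstream model was ever told about it.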
3) From Prediction to Power: Where the Harms Land
Think in two buckets:
Allocative Harms (who gets what)
- Credit limits, price quotes, hiring callbacks, ad reach, apartment showings, school support, organ waitlists, medical follow-up, public benefits triage.
Representational Harms (who gets to be seen as what)
- Stereotype amplification in search/feeds, misgendering/misidentification, stigmatizing risk labels (“non-adherent”, “low potential”), cultural erasure in training data.
Both harms compound through feedback loops: decisions change behaviors, which retrain models, which tighten the loop.
4) The Technical Anatomy of “Destiny”
Understanding a few technical patterns helps spot danger early:
- Label & Target Choice
- Surrogate targets (e.g., clicks for “interest,” arrests for “crime”) bake in bias if the surrogate is systematically skewed.
- Counterfactual targets (“would this have happened under a different action?”) are fairer but harder—enter causal inference.
- Data Provenance & Consent
- Mixed first/third-party data, scraped content, purchased graphs—provenance affects legality, trust, and recourse.
- Distribution Shift
- Models trained on yesterday’s economy and culture degrade silently; unfairness grows when shift is asymmetric across groups (a minimal per-group drift check is sketched after this list).
- Inference Risk
- Even if you never collect health data, a model might infer pregnancy risk from purchases and time-of-day activity. That’s unintended sensitive inference.
- Interpretability ≠ Accountability
- SHAP/LIME explain which features were important—not whether the target is legitimate or whether the system should exist at all.
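For the distribution-shift item above, one lightweight check is a population stability index (PSI) computed per feature and per group, since shift that looks mild overall can be severe for one group. The function below is a generic sketch; the feature and group column names are placeholders.

```python
# Sketch: population stability index (PSI) per group, to surface asymmetric
# distribution shift. Column names and group labels are illustrative only.
import numpy as np
import pandas as pd

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference sample and a live sample of one feature."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.clip(np.histogram(expected, edges)[0] / len(expected), 1e-6, None)
    a_frac = np.clip(np.histogram(actual, edges)[0] / len(actual), 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

def psi_by_group(train: pd.DataFrame, live: pd.DataFrame,
                 feature: str, group_col: str) -> pd.Series:
    """Shift can look small overall yet large for one group; check both."""
    groups = sorted(set(train[group_col]) & set(live[group_col]))
    return pd.Series(
        {g: psi(train.loc[train[group_col] == g, feature].to_numpy(),
                live.loc[live[group_col] == g, feature].to_numpy())
         for g in groups},
        name=f"psi_{feature}",
    )

# Usage (hypothetical frames): psi_by_group(train_df, live_df, "commute_minutes", "region")
# Rule of thumb: PSI > 0.25 for any group is worth investigating even if overall PSI looks fine.
```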
5) Governance: What “Good” Looks Like in Practice
Below is a pragmatic governance stack you can actually run.
A. Purpose & Necessity
- Write the purpose in a single sentence. (“Triage follow-up calls to reduce missed appointments, not to screen out costly patients.”)
- Kill switch criteria. Predefine what metrics or audit findings pause the system.
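Kill switch criteria work best when they are written down as machine-checkable thresholds rather than prose. A minimal sketch follows; the metric names and limits are placeholders, not recommendations.

```python
# Sketch: pre-declared pause criteria evaluated by a monitoring job.
# Metric names and thresholds are placeholders; set them with your domain experts.
PAUSE_CRITERIA = {
    "calibration_gap_any_group": 0.10,   # |predicted - observed| rate, worst group
    "false_positive_rate_gap": 0.05,     # max FPR difference across groups
    "psi_any_feature": 0.25,             # distribution-shift trigger
    "unresolved_appeals_over_sla": 0,    # any SLA breach pauses the system
}

def should_pause(latest_metrics: dict) -> list[str]:
    """Return the criteria that are breached; a non-empty list means pause."""
    return [name for name, limit in PAUSE_CRITERIA.items()
            if latest_metrics.get(name, 0) > limit]

# Usage: breached = should_pause(latest_metrics)
# A non-empty list should trigger your pause/rollback runbook, not a debate.
```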
B. Dataset Hygiene
- Datasheets for Datasets / Model Cards. Who collected what, why, when, with what consent, and known gaps.
- Sensitive Inference Audit. Attempt to predict sensitive traits from your features; if AUC is high, you likely have proxy risk. Decide: remove features, coarsen, or add constraints.
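The sensitive inference audit can be a short go/no-go script: try to predict the sensitive trait from the model’s features and gate on cross-validated AUC. The gate value, frame, and column names below are assumptions for illustration, not a standard.

```python
# Sketch: sensitive-inference audit. If "harmless" features predict a sensitive
# trait with high AUC, you have proxy risk. Frame and column names are hypothetical;
# features are assumed to be numerically encoded.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

AUC_GATE = 0.65  # illustrative threshold; pick yours with legal/ethics review

def sensitive_inference_audit(features: pd.DataFrame, sensitive: pd.Series) -> dict:
    """Cross-validated AUC of predicting a sensitive trait from model features."""
    clf = GradientBoostingClassifier(random_state=0)
    aucs = cross_val_score(clf, features, sensitive, cv=5, scoring="roc_auc")
    auc = float(aucs.mean())
    return {
        "auc": auc,
        "proxy_risk": auc > AUC_GATE,
        "next_step": ("remove/coarsen features or add constraints"
                      if auc > AUC_GATE else "document and monitor"),
    }

# Usage (hypothetical): sensitive_inference_audit(X_features, y_sensitive)
```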
C. Modeling Guardrails
- Target Selection Review. Is the target a justified proxy? Consider outcome-rate disparities and historically produced labels.
- Fairness Metrics as Gates, not reports. Measure at least: calibration within groups, false-positive/negative rates, and counterfactual fairness where feasible.
- Monotonicity & Constraints. Use constrained models when domain logic demands it (e.g., more on-time payments should never reduce a credit score).
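Monotonic constraints are directly supported in common gradient-boosting libraries. Here is a minimal sketch using scikit-learn’s HistGradientBoostingClassifier on synthetic data; the feature set and constraint signs are illustrative.

```python
# Sketch: enforce "more on-time payments can never lower the score" with a
# monotonic constraint. Data is synthetic; the per-feature signs are what matter.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(1)
n = 4_000
on_time_payments = rng.poisson(12, n)          # constraint +1: non-decreasing effect
utilization = rng.uniform(0, 1, n)             # constraint -1: non-increasing effect
account_age_months = rng.integers(1, 240, n)   # constraint 0: unconstrained

X = np.column_stack([on_time_payments, utilization, account_age_months])
y = (0.15 * on_time_payments - 2.0 * utilization + rng.normal(0, 1, n) > 0).astype(int)

model = HistGradientBoostingClassifier(
    monotonic_cst=[1, -1, 0],  # one entry per feature, in column order
    random_state=0,
).fit(X, y)
```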
D. Human-in-the-Loop (HITL) Design
- Decision support, not decision replacement. UI must show confidence, alternatives, and why the model is unsure.
- Write-over, not write-around. Humans can override, but overrides are logged and analyzed for systemic fixes.
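Overrides only turn into systemic fixes if they are logged in a structured, aggregatable form. A minimal record shape and weekly rollup are sketched below; the field names are assumptions to adapt to your own case system.

```python
# Sketch: structured override log so weekly review can find systemic patterns.
# Field names are illustrative; adapt to your case-management system.
from dataclasses import dataclass, asdict
from datetime import datetime
import pandas as pd

@dataclass
class OverrideRecord:
    case_id: str
    model_score: float
    model_decision: str
    human_decision: str
    reason_code: str          # from a controlled vocabulary, not free text
    reviewer_id: str
    timestamp: datetime

def weekly_override_summary(records: list[OverrideRecord]) -> pd.DataFrame:
    """Overrides clustered by reason: recurring reasons point to model/data fixes."""
    df = pd.DataFrame([asdict(r) for r in records])
    return (df.groupby("reason_code")
              .agg(n=("case_id", "count"), mean_score=("model_score", "mean"))
              .sort_values("n", ascending=False))

# Usage: weekly_override_summary(records) where records come from your decision log.
```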
E. Post-Deployment Reality
- Drift & Impact Monitoring. Track error by subgroup, opportunity allocation, and complaints; watch for policy drift (use-case creep). A minimal subgroup monitor is sketched after this list.
- Recourse & Appeals. Provide plain-language explanations and actionable steps to change outcomes; publish SLAs for responses.
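The subgroup monitor referenced above can be a small recurring job that reports error rates and beneficial-intervention rates per group. The column names in this sketch are placeholders.

```python
# Sketch: recurring impact monitor. Tracks error and beneficial-intervention
# rates per subgroup; column names ("group", "score", "label", "got_benefit")
# are placeholders for your own schema.
import pandas as pd

def impact_report(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """df needs columns: group, score (predicted prob.), label (0/1), got_benefit (0/1)."""
    df = df.assign(pred=(df["score"] >= threshold).astype(int))
    report = df.groupby("group").apply(
        lambda g: pd.Series({
            "n": len(g),
            "false_positive_rate": ((g["pred"] == 1) & (g["label"] == 0)).sum()
                                   / max((g["label"] == 0).sum(), 1),
            "false_negative_rate": ((g["pred"] == 0) & (g["label"] == 1)).sum()
                                   / max((g["label"] == 1).sum(), 1),
            "benefit_rate": g["got_benefit"].mean(),
        })
    )
    # Watch the spread between best- and worst-served groups over time.
    report.attrs["fpr_gap"] = report["false_positive_rate"].max() - report["false_positive_rate"].min()
    return report
```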
F. Documentation & Accountability
- Model Card + Decision Policy shipped with every model.
- Independent Review. Annual external audit of fairness, security, and data provenance.
- Incident Playbooks. Pre-write what you’ll do if sensitive inference is discovered, or a leak exposes training data.
6) Design Patterns to Avoid Algorithmic Fate
- Prediction as Hypothesis, Not Verdict
- Treat scores as questions: “What else would we need to know to confirm this prediction responsibly?”
- Counterfactual Evaluation
- Use uplift modeling / causal trees for interventions; don’t rank by propensity alone—rank by benefit of action (a two-model uplift sketch follows this list).
- Fairness through Awareness
- Sometimes you must collect protected attributes to evaluate and correct bias. Blindness can entrench inequity.
- Right-Sized Model Complexity
- A smaller, constrained model with clear guardrails may be safer than a black-box giant with marginal accuracy gains.
- Purpose-Bound Data
- Technical: data contracts and purpose fields at the table/feature level; Policy: ban secondary use without renegotiated consent.
- Explain Actions, Not Just Scores
- “You were flagged as high churn risk due to reduced product usage in the last 14 days; scheduling a success call will reset your status if you complete two sessions.” That’s recourse.
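For the counterfactual-evaluation pattern, the simplest form of “rank by benefit of action” is a two-model (T-learner) uplift sketch: fit one response model on treated cases, one on untreated, and rank by the predicted difference. The data below is synthetic; real uplift work needs randomized or carefully de-confounded treatment assignment.

```python
# Sketch: two-model ("T-learner") uplift scoring. Rank people by the estimated
# benefit of the intervention, not by raw risk. Data is synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
n = 6_000
X = rng.normal(size=(n, 5))
treated = rng.integers(0, 2, size=n).astype(bool)

# Synthetic outcome: the intervention only helps when feature 0 is high.
base = 1 / (1 + np.exp(-X[:, 1]))
lift = 0.25 * (X[:, 0] > 0) * treated
y = (rng.uniform(size=n) < np.clip(base + lift, 0, 1)).astype(int)

m_treat = GradientBoostingClassifier(random_state=0).fit(X[treated], y[treated])
m_ctrl = GradientBoostingClassifier(random_state=0).fit(X[~treated], y[~treated])

uplift = m_treat.predict_proba(X)[:, 1] - m_ctrl.predict_proba(X)[:, 1]
priority = np.argsort(-uplift)  # contact those who benefit most, not the riskiest
```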
7) Sector Snapshots
- Finance (credit & underwriting): Proxy features (utilities, device type) can encode income/race indirectly. Use feature sensitivity analyses and monotone constraints to keep the score aligned with bona fide credit behavior.
- Employment: Resume parsers learn historical biases; prefer skills-based targets over titles/schools. Validate with structured interviews and adverse-impact ratios pre-launch (an adverse-impact ratio check is sketched after this list).
- Healthcare: Risk flags improve triage but can crowd out clinical judgment. Pair with counterfactual validation (“Was care improved when flag fired?”) and clinical override dashboards.
- Public Sector: Resource triage must include appeal mechanisms, public documentation, and explicit do-not-use cases (e.g., no predictive eviction blacklists).
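The adverse-impact check mentioned for employment is conventionally the four-fifths selection-rate ratio. A minimal computation is sketched below; column names are placeholders, and the 0.8 figure is a screening heuristic, not a legal determination.

```python
# Sketch: adverse-impact ratio (four-fifths rule) for a screening model.
# Column names are placeholders; "selected" means passing the screen (0/1).
import pandas as pd

def adverse_impact_ratio(df: pd.DataFrame, group_col: str = "group",
                         selected_col: str = "selected") -> pd.Series:
    """Selection rate of each group divided by the highest group's rate."""
    rates = df.groupby(group_col)[selected_col].mean()
    return rates / rates.max()

# A ratio below ~0.8 for any group is the conventional flag for further review,
# not a verdict by itself.
# Example (hypothetical frame): adverse_impact_ratio(screening_results_df)
```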
8) Measuring What Matters (and Not Getting Tricked)
- Calibration: Does a “0.7” risk really happen ~70% of the time across groups? (A per-group calibration check is sketched after this list.)
- Error Symmetry: False positives vs. false negatives often have unequal moral cost. Set thresholds accordingly.
- Opportunity Equity: Track who receives beneficial interventions vs. harmful ones; optimize for equitable benefit, not just AUC.
- Product Metrics vs. People Metrics: Add well-being/harms dashboards alongside business KPIs.
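Per-group calibration is easy to check directly: bin the predicted probabilities, compare them to observed outcome rates within each group, and look for gaps that overall AUC hides. The column names in this sketch are placeholders.

```python
# Sketch: calibration within groups. Does a predicted 0.7 actually occur ~70%
# of the time for every group? Column names are placeholders.
import numpy as np
import pandas as pd

def calibration_by_group(df: pd.DataFrame, bins: int = 10) -> pd.DataFrame:
    """df needs columns: group, score (predicted prob. in [0, 1]), label (0/1 outcome)."""
    df = df.assign(bin=pd.cut(df["score"], np.linspace(0, 1, bins + 1), include_lowest=True))
    out = (df.groupby(["group", "bin"], observed=True)
             .agg(mean_score=("score", "mean"),
                  observed_rate=("label", "mean"),
                  n=("label", "count"))
             .reset_index())
    out["gap"] = (out["mean_score"] - out["observed_rate"]).abs()
    return out  # large gaps concentrated in one group = miscalibration that AUC hides
```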
9) Policy Landscape in Brief (Principles That Age Well)
Regulations vary by region, but durable themes you can adopt now:
- Risk-tiering: Higher oversight for use cases with safety, rights, or livelihood impact.
- Transparency: Public descriptions of model purpose, data sources, and known limits.
- Human oversight & contestability: People can challenge automated decisions and get timely review.
- Data minimization & purpose limitation: Collect only what is necessary, use it only for declared purposes.
- Security & provenance: Protect training data; document lineage and licenses.
Whether mandated or not, these are table stakes for legitimacy.
10) A Builder’s Checklist (Pin to Your Wall)
Before training
- Write the one-sentence purpose and who benefits.
- Map stakeholders and potential harms; draft a non-use list.
- Audit data provenance; document consent and licenses.
- Run a sensitive inference probe on your features.
During modeling
- Justify the target label; consider causal or uplift objectives when appropriate.
- Evaluate at least three fairness metrics; set go/no-go gates.
- Add constraints that encode domain logic (monotonicity, safety bounds).
- Produce a Model Card and Decision Policy.
At launch
- Ship a user-facing explanation & recourse path.
- Enable human overrides; log and review them weekly.
- Start drift monitoring with subgroup breakouts.
After launch
- Quarterly fairness + calibration audits; publish deltas.
- Review appeals: turn patterns into model/data fixes.
- Reconfirm purpose; watch for scope creep.
11) The Human Core: Agency, Dignity, and Multiplicity
Predictions are frames—not fates. Anytime a model silently narrows a person’s options, it makes a claim about who they are allowed to be. Good systems keep people legible without making them legible only as one thing. That means multiple pathways, reversible decisions, and the humility to treat a score as a starting point for conversation, not the end of it.
12) Glossary (fast, useful)
- Allocative harm: Unequal distribution of resources/opportunities.
- Representational harm: Harmful portrayals that shape identity and norms.
- Proxy variable: A non-sensitive feature that correlates with a sensitive trait.
- Counterfactual fairness: A decision would be unchanged in a world where protected attributes differ.
- Calibration: Predicted probabilities match observed frequencies.
- Distribution shift: Real-world data no longer matches training data.
© Jeremy Abram — JeremyAbram.net. All rights reserved.