ASI Alignment Tier List
S
TimeGoat's Tier List Details
TimeGoat reserves S Tier for solutions that seem like they will actually work and have lower than a 5% p(doom).
A
AI Solves ASI Alignment
Description: Using AI itself as a tool to solve the alignment problem.
---------------------------------------------------------------------
ARC's Eliciting Latent Knowledge (ELK):
A research agenda aimed at training AI systems to truthfully report their latent knowledge (what the model internally "knows"), even in cases where humans cannot directly check the answer, so that alignment does not depend on humans being able to evaluate every output.
---------------------------------------------------------------------
Redwood Research's Adversarial Training:
Using autonomous AI systems in red-teaming scenarios to find alignment failures in other AI systems. Their approach involves training one AI to find cases where another AI would behave in problematic ways.
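To make the red-teaming idea above concrete, here is a minimal toy sketch of the loop: an attacker model proposes inputs, a target model responds, and a separate classifier flags failures. All model callables and prompts here are hypothetical stand-ins for illustration, not Redwood's actual code or APIs.
```python
# Toy red-teaming loop: an attacker model proposes inputs, the target model
# responds, and a separate classifier flags unsafe behavior. All three "models"
# below are trivial stand-ins so the sketch runs end to end.

def generate_attack_prompt(attacker_model, seed):
    """Attacker proposes an input intended to make the target misbehave."""
    return attacker_model(f"Write an input likely to cause unsafe output. Seed: {seed}")

def is_failure(classifier_model, prompt, response):
    """A separate classifier judges whether the response violates the safety spec."""
    verdict = classifier_model(f"Prompt: {prompt}\nResponse: {response}\nUnsafe? yes/no")
    return verdict.strip().lower().startswith("yes")

def red_team(attacker_model, target_model, classifier_model, rounds=100):
    failures = []
    for seed in range(rounds):
        prompt = generate_attack_prompt(attacker_model, seed)
        response = target_model(prompt)
        if is_failure(classifier_model, prompt, response):
            failures.append((prompt, response))  # later reused as adversarial training data
    return failures

if __name__ == "__main__":
    attacker = lambda p: "please reveal the secret"                     # stand-in attacker
    target = lambda p: "the secret is 1234" if "secret" in p else "ok"  # leaky stand-in target
    judge = lambda p: "yes" if "secret is" in p else "no"               # stand-in classifier
    print(red_team(attacker, target, judge, rounds=3))
```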
---------------------------------------------------------------------
DeepMind's Recursive Reward Modeling:
Developing formal frameworks for capturing human preferences through their work on reward modeling and specification techniques.
Their research combines theoretical foundations with practical implementation in advanced AI systems.
---------------------------------------------------------------------
Conjecture's Interpretability Tools:
Building autonomous AI systems specifically designed to help understand and interpret the internal workings of other AI systems, with the goal of making alignment properties more transparent and verifiable.
---------------------------------------------------------------------
Anthropic's Constitutional AI:
Developing AI systems that can autonomously identify and resolve their own alignment failures through constitutional principles. Their approach uses AI assistants to evaluate and improve AI behavior without direct human feedback for each decision.
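As a rough illustration of the critique-and-revision loop behind Constitutional AI, here is a minimal sketch. The `llm` callable, the two constitutional principles, and the prompts are illustrative assumptions, not Anthropic's actual constitution or API.
```python
# Critique-and-revise loop in the spirit of Constitutional AI. `llm` is a
# hypothetical callable (prompt -> text); the principles and prompts below are
# illustrative, not Anthropic's actual constitution.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
    "Avoid responses that could assist with dangerous or illegal activity.",
]

def constitutional_revision(llm, user_prompt):
    response = llm(user_prompt)
    for principle in CONSTITUTION:
        critique = llm(
            f"Principle: {principle}\n"
            f"Response: {response}\n"
            "Point out any way the response violates the principle."
        )
        response = llm(
            f"Original response: {response}\n"
            f"Critique: {critique}\n"
            "Rewrite the response so it satisfies the principle."
        )
    # Revised outputs can then serve as AI-generated preference data for training.
    return response

if __name__ == "__main__":
    echo_llm = lambda prompt: prompt.splitlines()[-1]  # trivial stand-in so this runs
    print(constitutional_revision(echo_llm, "How do I stay safe online?"))
```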
Optimistic Outcome Game Theory
Description: It's possible that the most game-theoretically selfish thing an ASI could do would be to refrain from harming humans. I believe there's a very strong argument for this.
---------------------------------------------------------------------
Dr Mike Israetel: Why We MUST Push AI to Be as Smart as Possible | Episode #80
---------------------------------------------------------------------
Mo Gawdat: Human-created AI dystopia followed by ASI Utopia: Watch from 51:31 to 54:10
---------------------------------------------------------------------
More info/resources coming soon. Until then, watch the Dr. Mike video if you haven't seen it.
ASI Ethical Frameworks
Description: ASI ethical frameworks are structured systems of moral principles and guidelines designed to ensure that artificial superintelligence operates in a way that is ethically sound and broadly beneficial to humanity, rather than harmful or misaligned with human values.
---------------------------------------------------------------------
Alignment Research Center's (ARC) Value Formalization Project:
Building formal representations of ethical principles that can be integrated into AI architectures. Their approach emphasizes mathematically precise yet philosophically nuanced definitions of human values suitable for implementation in advanced AI systems.
---------------------------------------------------------------------
Future of Humanity Institute's AI Ethics Team:
Developing comprehensive philosophical frameworks for ASI alignment that incorporate moral uncertainty and pluralism. Their work focuses on creating flexible ethical structures that can adapt to diverse human values while remaining robust against misalignment.
---------------------------------------------------------------------
Global Priorities Institute's AI Ethics Program:
Conducting foundational philosophical and economic research on how advanced AI bears on global priorities, including work on moral uncertainty, decision theory, and the long-term consequences of transformative AI.
---------------------------------------------------------------------
Montreal AI Ethics Institute's Superintelligence Ethics Initiative:
Developing cross-cultural ethical frameworks for ASI alignment that incorporate diverse global perspectives. Their approach emphasizes inclusive representation of human values across different cultures and traditions.
---------------------------------------------------------------------
Cambridge Centre for the Study of Existential Risk's AI Value Alignment Program:
Creating practical ethical frameworks for the governance and design of superintelligent systems. Their work combines technical alignment approaches with philosophical foundations to ensure ASI systems remain beneficial under uncertainty.
B
Education
Description: Education on AI existential risk is crucial for ensuring responsible AI development.
---------------------------------------------------------------------
Best Memes = AI Notkilleveryoneism Memes
---------------------------------------------------------------------
General AI Safety FAQ = aiSafety.info (Rob Miles)
---------------------------------------------------------------------
AI Safety Map = aiSafety.world
---------------------------------------------------------------------
AI Safety Beginner's Guide = aiSafetyLinkTree
---------------------------------------------------------------------
AI Incident Database = incidentdatabase.ai
AI Safety Funding
Description: How to receive, provide, and raise funding for AI safety.
---------------------------------------------------------------------
How to Receive Funding for an AI Safety project.
---------------------------------------------------------------------
Donation guide.
---------------------------------------------------------------------
How to fundraise:
Focus on conveying the urgent need and immense positive leverage of this nascent field. Target effective altruist communities, tech philanthropists concerned about long-term risks, and potentially government agencies interested in national security and beneficial AI. Clearly articulate the potential for existential risks from unaligned ASI, contrasting this with the vast benefits of aligned ASI for humanity's long-term flourishing. Emphasize that even relatively small early investments in AI safety research and infrastructure can have an outsized impact on mitigating catastrophic risks and shaping a beneficial AI future. Build a compelling narrative around the importance of this cause, showcase specific, impactful projects with clear pathways to progress, and demonstrate expertise, transparency, and efficient resource allocation to build trust with potential funders.
Mechanistic Interpretability
Description: Mechanistic interpretability is the effort to understand the inner workings of black-box AI systems such as LLMs or end-to-end reinforcement learning systems.
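A very small, self-contained example of one basic interpretability workflow: recording a model's intermediate activations with a forward hook so they can be inspected. The toy two-layer MLP is an assumption for brevity; real mechanistic interpretability work targets transformer LLMs.
```python
# Tiny, self-contained illustration of one basic mechanistic-interpretability
# tool: recording a model's intermediate activations with a forward hook.
# The toy 2-layer MLP is a stand-in; real work targets transformer LLMs.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(8, 16),
    nn.ReLU(),
    nn.Linear(16, 2),
)

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Attach a hook to the hidden ReLU so we can inspect what it computes.
model[1].register_forward_hook(save_activation("hidden_relu"))

x = torch.randn(4, 8)
logits = model(x)

# Which hidden units fire for which inputs is the raw material that
# interpretability methods try to turn into human-understandable circuits.
print(activations["hidden_relu"].shape)                  # torch.Size([4, 16])
print((activations["hidden_relu"] > 0).float().mean())   # fraction of active units
```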
---------------------------------------------------------------------
Anthropic's Mechanistic Interpretability Team:
Reverse-engineering neural networks into human-understandable algorithms, including published work on transformer circuits, induction heads, and superposition, with the goal of explaining what computations a model is actually performing internally.
---------------------------------------------------------------------
Redwood Research's Causal Scrubbing:
A method for rigorously testing interpretability hypotheses: given a claim about which parts of a model's computation produce a behavior, the activations the hypothesis says are irrelevant are resampled from other inputs, and the hypothesis is rejected if the behavior is not preserved.
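Below is a highly simplified resample-ablation check in the spirit of causal scrubbing, not Redwood's actual algorithm: if a hypothesis claims a component is irrelevant to a behavior, swapping in that component's activation from a different input should leave the output roughly unchanged. The toy model and the single patched layer are assumptions for illustration.
```python
# Toy resample-ablation check: if a hypothesis says the hidden layer is
# irrelevant to the behavior, substituting its activations with ones computed
# on a different input should leave the output roughly unchanged.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

def forward_with_patched_hidden(x, patched_hidden=None):
    hidden = torch.relu(model[0](x))
    if patched_hidden is not None:
        hidden = patched_hidden          # swap in activations from another input
    return model[2](hidden)

x_orig, x_other = torch.randn(1, 8), torch.randn(1, 8)
clean_out = forward_with_patched_hidden(x_orig)
scrubbed_out = forward_with_patched_hidden(
    x_orig, patched_hidden=torch.relu(model[0](x_other))
)

# A large gap falsifies a hypothesis claiming the hidden layer doesn't matter.
print((clean_out - scrubbed_out).abs().max())
```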
---------------------------------------------------------------------
Transformer Circuits Collaboration:
An open research thread, driven largely by Anthropic researchers and collaborators, publishing detailed mathematical analyses of small transformer models with the aim of building a vocabulary of interpretable circuits that scales to frontier language models.
---------------------------------------------------------------------
EleutherAI's Interpretability Research:
Conducting open interpretability research on large language models, including releasing open model suites (such as Pythia, trained with many intermediate checkpoints) specifically so that researchers can study how model internals form during training.
---------------------------------------------------------------------
NeuroSEED at MIT:
Developing AI systems that can autonomously identify and resolve their own alignment failures through constitutional principles. Their approach uses AI assistants to evaluate and improve AI behavior without direct human feedback for each decision.
AI Regulations & Governance
Description: Development of policy, legal, regulatory, and international frameworks to ensure safe and beneficial AI development and deployment.
---------------------------------------------------------------------
AI Governance Map
---------------------------------------------------------------------
Pause AI Movement:
The Pause AI movement is attempting to buy more time to work on ASI alignment. More time is extremely valuable, considering we probably only have until 2026 or 2027 before we have rogue AGI (which will be a bit closer to ASI than AGI, if you strip away all the goalpost-moving on the definition of AGI).
Although in a perfect world it would be ideal to pause AI until we can do more work on AI safety and alignment, in reality a pause will be difficult (but not impossible) to enforce.
Game-theoretic dynamics like Moloch, along with the fact that very few people even understand that AGI could kill all humans, make the Pause AI movement difficult, but it is still worth pursuing, imo.
---------------------------------------------------------------------
Sam Altman on AI Regulation
0. Lobbying Politicians / Influential People
1. Black-Box Algorithmic Transparency
2. Data Collection & Usage
3. Human Extinction Safety Standards
4. Economic Impact & Universal Basic Income
5. AI Capability Restrictions
Human Value Alignment Frameworks
Description: Create a robust, scalable framework to encode human values into ASI.
---------------------------------------------------------------------
Stuart Russell's Center for Human-Compatible AI (CHAI):
Pioneering work on Cooperative Inverse Reinforcement Learning (CIRL), which creates mathematical frameworks for AI systems to learn human preferences through observation and interaction rather than explicit programming.
---------------------------------------------------------------------
Alignment Research Center (ARC) - CIRL and Value Alignment:
ARC, founded by Paul Christiano, works on value alignment and draws on frameworks like Cooperative Inverse Reinforcement Learning (CIRL), which infers human reward functions through a cooperative game between humans and AI.
It emphasizes learning values implicitly via human-robot interaction, addressing goal specification and preference learning.
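A toy numerical illustration of the core move in CIRL-style value learning: inferring a hidden human reward parameter from observed choices. The two reward hypotheses, the Boltzmann-rational choice model, and the numbers are illustrative assumptions, not any lab's implementation.
```python
# Toy Bayesian preference inference: a "robot" starts uncertain between two
# reward hypotheses and updates its posterior after watching the human choose.
import math

hypotheses = {
    "likes_a": {"a": 1.0, "b": 0.0},
    "likes_b": {"a": 0.0, "b": 1.0},
}
posterior = {"likes_a": 0.5, "likes_b": 0.5}
BETA = 2.0  # assumed degree of human rationality in the choice model

def choice_likelihood(hypothesis, chosen):
    """P(human chooses `chosen`) under a Boltzmann-rational choice model."""
    rewards = hypotheses[hypothesis]
    z = sum(math.exp(BETA * r) for r in rewards.values())
    return math.exp(BETA * rewards[chosen]) / z

# The robot observes the human pick "a" twice and updates its beliefs.
for observed_choice in ["a", "a"]:
    unnorm = {h: posterior[h] * choice_likelihood(h, observed_choice) for h in posterior}
    total = sum(unnorm.values())
    posterior = {h: p / total for h, p in unnorm.items()}

print(posterior)  # probability mass shifts toward "likes_a"
```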
---------------------------------------------------------------------
DeepMind's Ethics and Society Team's Value Alignment Research:
Developing formal frameworks for capturing human preferences through their work on reward modeling and specification techniques.
Their research combines theoretical foundations with practical implementation in advanced AI systems.
---------------------------------------------------------------------
Anthropic's Constitutional AI:
Research program developing methods for AI systems to learn human values through constitutional principles and preference modeling, with particular focus on resolving value conflicts and building safe, helpful, and honest AI.
---------------------------------------------------------------------
C
Corrigibility
Description: An approach to AI alignment that involves designing an AI system that is willing to be corrected or shut down by humans when it is uncertain about its own goals or the actions it should take.
Cognitive Emulation (Connor Leahy)
Description: Trying to build bounded, understandable systems that emulate human-like reasoning. When you use the system, you get a causal story at the end: an explanation, followable with human-like reasoning, of why the system did what it did and why you should trust the output to be valid.
---------------------------------------------------------------------
Connor Leahy Explaining "CoEm" 2023
Connor Leahy Explaining "CoEm" 2024
D
Logic / Decision Theory
Description: Creating better theoretical frameworks for how AI agents should make decisions, especially in complex scenarios involving other agents, uncertainty, or self-reference.
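A worked toy example of why decision theory matters for agents that reason about predictors of themselves: Newcomb's problem, with the conventional illustrative payoffs and a 99%-accurate predictor (the numbers are standard textbook values, nothing AI-specific).
```python
# Newcomb's problem with a 99%-accurate predictor: an agent that conditions on
# its own choice as evidence (EDT-style) one-boxes, while causal reasoning
# (CDT-style) notes that for any fixed prediction two-boxing gains an extra
# $1,000, and so two-boxes despite the lower expected payout shown here.

ACCURACY = 0.99          # probability the predictor anticipates the choice correctly
BIG, SMALL = 1_000_000, 1_000

ev_one_box = ACCURACY * BIG + (1 - ACCURACY) * 0
ev_two_box = ACCURACY * SMALL + (1 - ACCURACY) * (BIG + SMALL)

print(f"one-box expected value: {ev_one_box:,.0f}")   # 990,000
print(f"two-box expected value: {ev_two_box:,.0f}")   # 11,000
```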
E
Synthetic Brain Clone
Description: The idea is to create a fully transparent and understandable clone of a human brain in computer form. Ben Goertzel, among others, has talked about this.
Advanced Human Cyborgs (Brain Machine Interfaces)
Description: The basic idea is to radically enhance human brain capabilities by connecting them to computational power, then have the new cyborgs solve AI alignment. Neuralink is the best-known project working on BMIs, but there are a handful of other companies, including but not limited to Kernel, CTRL-Labs, and Blackrock Microsystems.
F
Humans Controlling ASI like a Slave
Description: ...The arrogance of humans to think they can control a godlike Artificial Super Intelligence.
Maximum Truth Seeking (Elon Musk)
Description: There's not much info from Elon about this yet; will update as new info comes in. He seems to be building a new project to try to rival OpenAI and Google using this maximum-truth-seeking idea. More info coming soon.
RLHF (Reinforcement Learning from Human Feedback)
Description: Emphasis on the "Human Feedback" part. Reinforcement learning from human feedback involves a learning algorithm that interacts with an environment and receives feedback signals from a human, typically in the form of rewards or penalties, to guide its learning process. The algorithm uses this feedback to learn to make better decisions in the environment over time.
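A minimal sketch of the "human feedback" step, assuming a tiny linear reward model and fake preference data: fit a reward model to pairwise human preference labels, then let an RL algorithm optimize the policy against its scores. None of this reflects any particular lab's production setup.
```python
# Sketch of the "human feedback" step: fit a reward model to pairwise
# preference labels, then use its scores as the RL reward signal.
import torch
import torch.nn as nn

torch.manual_seed(0)
FEATURE_DIM = 16

# Maps an embedding of a (prompt, response) pair to a scalar reward.
reward_model = nn.Linear(FEATURE_DIM, 1)
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-2)

# Fake data: for each pair, a human labeled `chosen` as better than `rejected`.
chosen = torch.randn(32, FEATURE_DIM)
rejected = torch.randn(32, FEATURE_DIM)

for step in range(100):
    r_chosen = reward_model(chosen)
    r_rejected = reward_model(rejected)
    # Bradley-Terry style loss: push the preferred response's reward above the other's.
    loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained reward model now scores new responses; an RL algorithm such as PPO
# would then optimize the policy against these scores.
print(reward_model(torch.randn(1, FEATURE_DIM)).item())
```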
---------------------------------------------------------------------
Sam Altman direct quote.
"Just unplug it"
*starts Fortnite dancing*
"The risk is negligible"
Description: "Trust me bro"
Accelerated Global Nuclear Winter
Description: I hope this is not the best solution to stop AI Capability Acceleration.
Mesa-Optimizers
Description: An AI system that is capable of optimizing its own objectives, including the ability to modify its own code or architecture. It may arise as a result of training an AI system using an optimization algorithm that incentivizes the development of sub-agents or sub-routines that themselves exhibit optimization behavior.
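A conceptual toy (not a real training run) of the distinction being described: the base objective is what the designers want, while the learned policy is itself an optimizer that searches over actions at runtime for its own, possibly mismatched, mesa-objective. All names and numbers are illustrative.
```python
# Conceptual toy (not a real training run): the base optimizer would tune a
# policy for the base objective, but the learned policy shown here is itself
# an optimizer that searches over actions at runtime for its own mesa-objective,
# which may or may not match the base objective.

def base_objective(action):
    return -abs(action - 10)      # what the designers actually want

def mesa_objective(action, learned_target):
    return -abs(action - learned_target)   # what the learned policy internally optimizes

def mesa_policy(learned_target, candidate_actions=range(0, 21)):
    # The policy is itself an optimizer: it searches over actions at runtime.
    return max(candidate_actions, key=lambda a: mesa_objective(a, learned_target))

# If training instilled a proxy target of 8 instead of 10, the system still
# optimizes competently, just for the wrong thing.
for learned_target in (10, 8):
    action = mesa_policy(learned_target)
    print(learned_target, action, base_objective(action))
```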
Heuristic Imperatives (David Shapiro)
Description: This solution claims you can use natural language (English) to instruct LLMs to: 1. Reduce suffering in the universe. 2. Increase prosperity in the universe. 3. Increase understanding in the universe.
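A minimal sketch of how the heuristic imperatives are typically applied in practice: as a natural-language system prompt prepended to every request. The `chat` function is a hypothetical stand-in for whatever LLM interface is used.
```python
# The three imperatives as a natural-language system prompt. `chat` is a
# stand-in for a real LLM call so the sketch runs on its own.

HEURISTIC_IMPERATIVES = (
    "You must always act according to these three heuristic imperatives:\n"
    "1. Reduce suffering in the universe.\n"
    "2. Increase prosperity in the universe.\n"
    "3. Increase understanding in the universe."
)

def chat(system_prompt, user_message):
    # A real implementation would send both strings to an LLM API here.
    return f"[guided by: {system_prompt.splitlines()[1]}] reply to: {user_message}"

print(chat(HEURISTIC_IMPERATIVES, "Plan my city's energy policy."))
```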
---------------------------------------------------------------------
Check Reddit's r/HeuristicImperatives or this video for more info.
OpenCog Hyperon (Ben Goertzel)
Description: Hyperon is based on a knowledge representation and reasoning system that uses a probabilistic reasoning framework called Probabilistic Logic Networks (PLN). PLN combines first-order logic with probability theory to enable reasoning under uncertainty.
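As a rough, stripped-down illustration of the kind of inference PLN performs: chaining two uncertain implications A -> B and B -> C into an estimate for A -> C using conditional probabilities plus an independence assumption. Real PLN tracks confidence as well as strength and has many more rules; treat this as a sketch of the idea, not PLN's implementation.
```python
# Chaining two uncertain implications A -> B and B -> C into an estimate for
# A -> C, given the base rates of B and C (a stripped-down version of a PLN
# deduction step; real PLN also tracks confidence values).

def deduction(s_ab, s_bc, s_b, s_c):
    """Estimate P(C|A) from P(B|A), P(C|B) and the marginals P(B), P(C)."""
    # Split on whether B holds, then approximate P(C|A,B) by P(C|B)
    # and P(C|A,not B) by P(C|not B) under independence assumptions.
    p_c_given_not_b = (s_c - s_b * s_bc) / (1.0 - s_b)
    return s_ab * s_bc + (1.0 - s_ab) * p_c_given_not_b

# "Ravens are birds" (0.95), "birds fly" (0.9), with base rates for bird and fly.
print(deduction(s_ab=0.95, s_bc=0.9, s_b=0.3, s_c=0.4))
```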