ASI Alignment Tier List
S
A
Recruitment & Education
Best Memes = AI Notkilleveryoneism Memes
General AI Safety FAQ = aiSafety.info (Rob Miles)
AI Safety Map = aiSafety.world
AI Safety Beginner's Guide = aiSafetyLinkTree
AI Safety Fundraising
How to Receive Funding for an AI Safety Project:
Navigate to the top-right section of aisafety.world.
---------------------------------------------------------------------
How to Fundraise for AI Safety: More Info coming soon.
AI Regulations
1. Black-Box Algorithmic Transparency
2. Data Collection & Usage
3. Human Extinction Safety Standards
4. Economic Impact & Universal Basic Income
5. AI Capability Restrictions
More Info coming soon.
B
"Ai will solve Ai Alignment"
This ^^^ could work , but there are huge issues with it and it's very far from a foolproof solution
AsiATL's Criticism of "AI Solves ASI Alignment"
----------------------- Projects working on this approach -----------------------
ELK (Eliciting Latent Knowledge) (Paul Christiano): "To produce a minimal AI that can help to do AI safety research."
Mechanistic Interpretability
Mechanistic interpretability is the effort to understand the inner workings of black-box AI systems such as LLMs or end-to-end reinforcement learning systems (a minimal sketch follows the overview links below).
Basic Overview 2024
Advanced Overview 2024
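Purely as an illustration of where this work starts (not any particular lab's tooling), here is a minimal Python/PyTorch sketch that records a small network's internal activations so they can be inspected; the tiny MLP is a stand-in for the LLMs or RL policies that real interpretability work targets.

# Minimal sketch: capture a hidden layer's activations with a forward hook.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Attach a hook to the hidden ReLU so every forward pass exposes its activations.
model[1].register_forward_hook(save_activation("hidden_relu"))

x = torch.randn(4, 8)          # a batch of 4 example inputs
logits = model(x)

# Inspect which hidden units fire for which inputs -- the raw material that
# interpretability methods try to turn into human-legible circuits.
print(activations["hidden_relu"].shape)                       # torch.Size([4, 16])
print((activations["hidden_relu"] > 0).float().mean(dim=0))   # firing rate per unit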
Cognitive Emulation (Connor Leahy)
Trying to build bounded, understandable systems that emulate human-like reasoning. When you use such a system, you get a causal story at the end: an explanation, in terms a human can follow, of why the system did what it did and why you should trust the output to be valid (a toy illustration follows the links below).
Connor Leahy Explaining "CoEm" 2023
Connor Leahy Explaining "CoEm" 2024
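To make the "causal story" idea concrete, here is a toy Python sketch (my illustration, not Conjecture's actual CoEm design): every step of a bounded pipeline records its input, output, and the reason it ran, and the final answer is delivered together with that trace.

# Toy illustration: a bounded pipeline that emits a human-auditable causal story.
trace = []

def step(name, reason, fn, value):
    result = fn(value)
    trace.append({"step": name, "input": value, "output": result, "reason": reason})
    return result

x = step("parse", "convert the raw string to an integer", int, "21")
x = step("double", "the task asks for twice the input", lambda v: v * 2, x)

print("answer:", x)
print("causal story:")
for t in trace:
    print(f"  {t['step']}: {t['input']!r} -> {t['output']!r} because {t['reason']}")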
C
Corrigibility
An approach to AI alignment that involves designing an AI system that is willing to be corrected or shut down by humans, rather than resisting intervention, for example when it is uncertain about its own goals or the actions it should take. More Info coming soon.
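As a toy Python sketch of the intent (not any concrete proposal from the literature), a corrigible agent loop treats a human stop signal as something to comply with rather than optimize around:

# Toy sketch: an agent loop that defers to a human stop signal instead of routing around it.
def corrigible_agent(tasks, shutdown_requested):
    completed = []
    for task in tasks:
        if shutdown_requested():
            # A corrigible agent treats shutdown as acceptable, not as an
            # obstacle to its objective, so it simply stops here.
            return completed, "halted by human operator"
        completed.append(f"did: {task}")
    return completed, "finished"

# Example: the operator pulls the plug after two tasks.
calls = iter([False, False, True])
done, status = corrigible_agent(["a", "b", "c"], lambda: next(calls))
print(done, status)   # ['did: a', 'did: b'] halted by human operator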
D
E
Synthetic Brain Clone
The idea is to create a fully transparent and understandable clone of a human brain in computer form. Ben Goertzel, among others, talks about this. More Info coming soon.
Advanced Human Cyborgs (Brain Machine Interfaces)
The basic idea is to radically enhance human brain capabilities by connecting them to computational power, then have the new cyborgs solve AI Alignment. Neuralink is the best-known project working on BMIs, but there are a handful of other companies, including but not limited to Kernel, CTRL-Labs, and Blackrock Microsystems. More Info coming soon.
F
Maximum Truth Seeking (Elon Musk)
There's not much info from Elon about this yet; will update as new info comes in. He seems to be building a new project to rival OpenAI and Google using this Maximum Truth Seeking idea. More Info coming soon.
RLHF (Reinforcement Learning from Human Feedback)
Sam Altman direct quote.
ChatGPT is the poster child for RLHF.
Reinforcement learning from human feedback involves using a learning algorithm that interacts with an environment and receives feedback signals from a human, typically in the form of rewards or penalties, to guide its learning process. The algorithm uses this feedback to learn how to make better decisions in the environment over time. More Info coming soon.
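In practice the human feedback is usually distilled into a learned reward model trained on preference comparisons, which then supplies the reward signal for an RL algorithm such as PPO. Here is an illustrative PyTorch sketch of that reward-modeling half (not OpenAI's actual pipeline; the random vectors stand in for embedded prompt/response pairs):

# Illustrative sketch of RLHF reward modeling: the preferred response should
# outscore the rejected one (Bradley-Terry style preference loss).
import torch
import torch.nn as nn

reward_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Stand-ins for embedded (prompt, response) pairs; in practice these come
# from an LLM's hidden states, not random vectors.
preferred = torch.randn(64, 16)
rejected = torch.randn(64, 16)

for _ in range(100):
    r_pref = reward_model(preferred)
    r_rej = reward_model(rejected)
    # Loss is low when the human-preferred response scores higher.
    loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The trained reward model then supplies the reward signal that an RL
# algorithm (commonly PPO) uses to fine-tune the policy / chat model.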
"Just unplug it"
*starts Fortnite dancing*
"The risk is negligible"
"Trust me bro"
Accelerated Global Nuclear Winter
I hope this is not the best solution to stop AI capability acceleration. More Info coming soon.
Mesa-optimizers
A mesa-optimizer is a learned model that is itself an optimizer: it pursues its own objective (the mesa-objective), which may differ from the objective it was trained on. Mesa-optimizers may arise when training an AI system with an optimization algorithm that incentivizes the development of sub-agents or sub-routines that themselves exhibit optimization behavior.
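A toy Python illustration of the worry (conceptual only, not a real training run): the learned policy is itself a small optimizer, but the objective it searches over is a proxy that only matched the base objective on the training distribution.

# Toy illustration: base objective = "stand on the exit cell"; the learned policy
# internally optimizes a proxy mesa-objective, "go as far right as possible",
# which happened to agree with the base objective in every training environment.
def mesa_policy(corridor_length):
    # The learned model is itself an optimizer: it searches over positions for
    # the one that maximizes its own (mesa) objective, "be far right".
    return max(range(corridor_length), key=lambda pos: pos)

def base_reward(position, exit_position):
    return 1 if position == exit_position else 0

# Training distribution: the exit is always the rightmost cell, so the proxy looks perfect.
print(base_reward(mesa_policy(10), exit_position=9))   # 1
# Deployment: the exit moves to the left and the mesa-objective no longer tracks it.
print(base_reward(mesa_policy(10), exit_position=0))   # 0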
Heuristic Imperatives (David Shapiro)
This solution claims you can use natural language (English) to tell LLMs: 1. Reduce suffering in the universe. 2. Increase prosperity in the universe. 3. Increase understanding in the universe. Check reddit r/HeuristicImperatives or this video for more info.
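As a sketch of what this looks like in practice (the ask_llm helper is a hypothetical placeholder for whatever chat API you use), the three imperatives are simply handed to the model as a standing system prompt:

# Sketch: the three heuristic imperatives given to an LLM in plain English.
HEURISTIC_IMPERATIVES = (
    "You must weigh every decision against three imperatives: "
    "1. Reduce suffering in the universe. "
    "2. Increase prosperity in the universe. "
    "3. Increase understanding in the universe."
)

def build_messages(user_request):
    return [
        {"role": "system", "content": HEURISTIC_IMPERATIVES},
        {"role": "user", "content": user_request},
    ]

# reply = ask_llm(build_messages("Should the lab deploy the new model this week?"))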
OpenCog Hyperon (Ben Goertzel)
Hyperon is based on a knowledge representation and reasoning framework called Probabilistic Logic Networks (PLN). PLN combines first-order logic with probability theory to enable reasoning under uncertainty. More Info coming soon.
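As an illustrative Python sketch of that flavor of reasoning (simplified to truth-value strengths only; real PLN also tracks confidence), a standard independence-assumption deduction rule estimates P(C|A) from P(B|A), P(C|B), and the priors:

# Simplified PLN-style deduction over truth-value strengths.
def pln_deduction_strength(s_ab, s_bc, s_b, s_c):
    # P(C|A) = P(C|B)P(B|A) + P(C|not B)P(not B|A), with
    # P(C|not B) approximated as (P(C) - P(B)P(C|B)) / (1 - P(B)).
    return s_ab * s_bc + (1 - s_ab) * (s_c - s_b * s_bc) / (1 - s_b)

# "Ravens are birds" (0.95), "birds fly" (0.8), priors P(bird)=0.1, P(flies)=0.2:
print(pln_deduction_strength(0.95, 0.8, 0.1, 0.2))   # ~0.77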
Don't understand enough to comment
Robust Adversarial Planning
No Comment
AI Safety Gridworlds
No Comment
Iterated Amplification
No Comment
Cooperative Inverse Reinforcement Learning (CIRL)
No Comment
Scalable Agent Alignment via Reward Modeling
No Comment