AI Safety at UCLA Intro Fellowship: Reinforcement Learning Track
Table of Contents
- Week 1: Preventing an AI-related catastrophe
- Week 2: The future is going to be wild + Policy Gradient
- Week 3: PyTorch Intro + Unsolved Problems in ML Safety
- Week 4: AI Safety Field Background
- Week 5: Failure Modes in AI
- Week 6: Open Problems in AI X-Risk
Week 1: Preventing an AI-related catastrophe
Core Readings: (110 min)
- Intelligence Explosion (20 min)
- AlphaGo - The Movie (1hr 30 min)
Learning Goals:
- Familiarize yourself with the arguments for AI being an existential risk
- Understand why RL enables superhuman performance
Week 2: The future is going to be wild + Policy Gradient
Core Readings: (105 min)
Theoretical
- AI and Compute (5 min)
- The Bitter Lesson (10 min)
- All Possible Views About Humanity’s Future are Wild (15 min)
- “This can’t go on” (25 min)
Practical
- (if unfamiliar) Neural Networks, Chapters 1 and 2 (30 min)
- Policy Gradient Explanation (20 min)
Learning Goals:
Theoretical
- Understand the relationship between compute and general capabilities.
- Gain experience with the types of datasets used in modern AI systems.
- See how AI could impact a wide range of industries.
- Reflect on the radical impact AI can have on the future of humanity.
- Reflect on the strange possibilities of our economic future.
- Reflect on the speed with which AI might transition from powerful to superintelligent.
Practical
- Understand Markov Decision Processes (MDPs)
- Understand the intuition behind the policy gradient.
Week 3: PyTorch Intro + Unsolved Problems in ML Safety
Core Readings: (140 min)
- Why AI alignment could be hard with modern deep learning (20 min)
- Policy Gradient Discrete Exercise (60 min)
- Policy Gradient Continuous Exercise (60 min)
Learning Goals:
Theoretical
- Understand the issues with using performance alone to evaluate classifiers.
Practical
- Implement both the discrete and continuous versions of the Policy Gradient.
Week 4: AI Safety Field Background
Core Readings: (105 min)
Theoretical
Practical
- (Stages 1 and 2): Connect4
Learning Goals:
- Understand how ML research is conducted and how it affects AI safety research.
- Be able to evaluate if a research agenda advances general capabilities.
- Learn about the variety of different research approaches tackling alignment.
Week 5: Failure Modes in AI
Core Readings: (45 min)
Theoretical
- X-Risk Analysis for AI Research (Appendix A, pp. 13-14) (10 min)
- What Failure Looks Like (10 min)
- Clarifying What Failure Looks Like (25 min)
Practical
- (Stage 3): Connect4
Learning Goals:
- Be able to determine how an AI safety project may reduce X-risk.
- Evaluate the failure modes of misaligned AI.
- Understand the factors that lead to value lock-in.
Week 6: Open Problems in AI X-Risk
Core Readings: (75 min)
Theoretical
- Open Problems in AI X-Risk (60 min)
- AI Governance: Opportunity and Theory of Impact (15 min)
Practical
- PPO Notebook
Learning Goals:
- Pick a research agenda you find particularly interesting (perhaps to pursue later).
- Understand the role AI governance plays in the broader field of AI safety.
Before the next meeting, think through (or write down) your answers to these questions:
- If you were to pursue a research question/topic in AI safety, what would it be?
- Which area of AI safety do you find most interesting? Which do you find most promising?