CyberRunner: AI's Latest Leap Towards Superhuman Abilities
Chapter 1: The Rise of Superhuman AI
Recent advances in artificial intelligence remind us that, in a growing number of domains, humans are no longer the top performers. With AI continually pushing the limits of what is achievable and general-purpose systems improving by the day, superhuman AIs are becoming increasingly common.
One notable example is CyberRunner, a Reinforcement Learning (RL) robot that has mastered a physical game, far surpassing human skill. It is a wake-up call: AI is moving into the physical world, and we need to reckon with that shift.
For those eager to stay informed about the rapidly evolving world of AI and find inspiration to prepare for the future, my insights are available in my weekly newsletter, TheTechOasis.
🏝Subscribe below🏝
Section 1.1: The Nature of AI Creativity
Ilya Sutskever, Chief Scientist at OpenAI, has suggested that when AI gets labeled 'creative', RL is usually the reason. So when the researchers set out to train a machine to master a physical maze game, the choice of approach was obvious: let the AI learn by playing.
At its core, RL involves an agent operating in a specific environment, with various actions available, learning the optimal strategy based on the environment's current state. There are two main approaches: model-based and model-free RL.
Model-based RL requires the agent to create and utilize an internal representation of its environment, predicting action outcomes, including potential rewards. The agent plans its actions to maximize cumulative rewards.
Conversely, model-free RL skips modeling the environment, focusing on learning through direct interactions. It emphasizes acquiring effective actions rather than understanding the environment itself.
In both methods, a reward function is established to indicate whether an action was beneficial, rewarding the agent for positive actions and penalizing negative ones. The model learns a 'policy' that selects actions to optimize rewards over a series of environmental states.
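To make that loop concrete, here is a minimal, purely illustrative sketch of the agent-environment interaction described above. The toy "maze", the actions, and the reward values are invented for this example and have nothing to do with CyberRunner's actual setup.

```python
import random

class TinyMazeEnv:
    """Toy 1-D 'maze': the agent starts at position 0 and must reach position 4."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (move left) or +1 (move right)
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else -0.01   # small per-step penalty rewards finishing fast
        return self.state, reward, done

def random_policy(state):
    """A placeholder policy; learning would replace this with something smarter."""
    return random.choice([-1, +1])

env = TinyMazeEnv()
state, total_reward, done = env.reset(), 0.0, False
while not done:
    action = random_policy(state)            # the policy picks an action
    state, reward, done = env.step(action)   # the environment responds with a new state and reward
    total_reward += reward                   # the agent's objective: maximize this cumulative reward
print(f"Episode return: {total_reward:.2f}")
```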
RL's potency is evident in systems like ChatGPT, where an RL pipeline instructs the model to enhance the usefulness and safety of its responses.
Subsection 1.1.1: Introducing CyberRunner
CyberRunner was built with model-based RL: rather than being handed the game's dynamics, the robot learns an internal model of them from its own experience. Physically, it consists of three components:
- Two motors (its hands)
- A camera (its eyes)
- A computer (its brain)
Thus, the robot's goal was to engage with the game by observing it through the camera while manipulating the maze with its motors, gradually building a model of the environment. The game’s constraints—a simple wooden box that tilts to move a ball—facilitated this learning.
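As a rough illustration of that perceive-model-act loop, here is a sketch in which a learned dynamics model is used to pick the tilt command whose predicted outcome brings the ball closest to a goal. Every name here (the model class, the camera and motor functions, the goal position) is a hypothetical placeholder, not the real CyberRunner code.

```python
import numpy as np

class LearnedDynamicsModel:
    """Stand-in for a model the robot learns from experience: it predicts the
    next ball position given the current observation and a tilt command."""
    def predict(self, observation, action):
        # placeholder dynamics: the ball drifts in the direction of the tilt
        return observation + 0.1 * np.asarray(action, dtype=float)

def capture_frame():
    """Placeholder for reading the camera (the robot's 'eyes')."""
    return np.zeros(2)          # e.g. an estimated (x, y) ball position

def send_tilt(action):
    """Placeholder for driving the two motors (the robot's 'hands')."""
    print(f"Tilting maze: {action}")

model = LearnedDynamicsModel()
candidate_tilts = [(-1, 0), (1, 0), (0, -1), (0, 1)]
goal = np.array([1.0, 1.0])     # hypothetical next target position along the maze path

obs = capture_frame()
# One-step planning with the learned model: pick the tilt whose predicted
# outcome moves the ball closest to the goal.
best = min(candidate_tilts,
           key=lambda a: np.linalg.norm(model.predict(obs, a) - goal))
send_tilt(best)
```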
After only six hours of training, CyberRunner completed the game faster than any previously recorded human. Intriguingly, while navigating its environment, the robot discovered shortcuts, enabling it to bypass parts of the maze for quicker completion. The reward system incentivized speed, prompting the robot to exploit these shortcuts. Researchers had to adapt the reward function to discourage cheating while still promoting speed optimization.
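Reward shaping of that kind might look something like the sketch below. The waypoint check, the penalty, and the bonus values are invented for illustration; the researchers' actual reward design may well differ.

```python
def shaped_reward(progress_delta, skipped_waypoint, finished,
                  step_penalty=0.01, cheat_penalty=1.0, finish_bonus=10.0):
    """progress_delta: forward progress along the intended maze path this step.
    skipped_waypoint: True if the ball jumped over a required checkpoint.
    finished: True if the ball reached the end of the maze."""
    reward = progress_delta - step_penalty   # faster progress means more reward per step
    if skipped_waypoint:
        reward -= cheat_penalty              # punish bypassing parts of the maze
    if finished:
        reward += finish_bonus               # big bonus for completing the run
    return reward

# Example: a step that makes progress legitimately vs. one that skips a section
print(shaped_reward(0.05, skipped_waypoint=False, finished=False))  # 0.04
print(shaped_reward(0.50, skipped_waypoint=True,  finished=False))  # -0.51
```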
The implications of such breakthroughs are thrilling yet concerning.
Section 1.2: The Risks of Advanced AI
With AI evolving this quickly, experts caution against granting it excessive autonomy, and the risk is especially acute for robots that learn by trial and error in real-world environments. Because building an accurate model of a complex environment is hard, many RL systems fall back on model-free, off-policy methods, the best known of which is Q-learning.
Through trial and error, these agents learn a 'value function' that assesses the quality of outcomes based on immediate and future rewards. Essentially, they learn to pursue paths yielding the highest overall reward.
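A tabular Q-learning loop, stripped to its essentials, looks roughly like this. The toy maze and hyperparameters are illustrative only; real systems typically use neural networks (deep Q-learning) rather than a lookup table.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.2   # learning rate, discount factor, exploration rate
ACTIONS = [-1, +1]
Q = defaultdict(float)                    # Q[(state, action)] -> estimated long-term value

def step(state, action):
    """Toy 1-D maze: positions 0..4, episode ends at 4, small penalty per step."""
    next_state = max(0, min(4, state + action))
    done = next_state == 4
    reward = 1.0 if done else -0.01
    return next_state, reward, done

def choose_action(state):
    # epsilon-greedy: mostly exploit the best-known action, sometimes explore at random
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(500):
    state, done = 0, False
    while not done:
        action = choose_action(state)
        next_state, reward, done = step(state, action)
        # The value-function update: nudge Q toward the observed reward plus the
        # discounted value of the best action available in the next state.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# The greedy policy recovered from Q should point right (+1) at every position
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(4)})
```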
One powerful application of these methods is in autonomous vehicles, which share similarities with CyberRunner. They possess perceptual capabilities and can maneuver based on computer-decided actions, learning through interaction with their surroundings—roads, traffic signals, and other vehicles.
However, this method presents significant challenges.
Chapter 2: The Dangers of Trial and Error in AI
The exploration-versus-exploitation dilemma sits at the heart of model-free learning: to discover good strategies, agents must sometimes take exploratory, essentially random actions. That is obviously impractical, and dangerous, on real roads, which is why most such agents are trained in simulated environments first.
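One common compromise, sketched below, is to anneal exploration over the course of (simulated) training, so that by the time a policy gets anywhere near a real road it is almost never acting at random. The schedule and the numbers are illustrative.

```python
def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=50_000):
    """Linearly anneal the probability of taking a random (exploratory) action."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

print(epsilon_at(0))         # 1.0   -> fully random at the start of simulated training
print(epsilon_at(25_000))    # 0.525 -> halfway through the decay
print(epsilon_at(100_000))   # 0.05  -> mostly exploiting the learned policy
```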
Despite this year's remarkable advances, even the most sophisticated models can falter when faced with scenarios they were never trained for. A stark example is the Cruise incident in San Francisco, where a pedestrian was thrown into the path of an autonomous vehicle by another car; the AV struck her and then, while attempting to pull over, dragged her for several meters. The company has since recalled hundreds of vehicles, casting doubt on its future.
Are we being too harsh on the model in this instance? Perhaps, but the stakes are high. With AI engaging with real-world scenarios, errors can have dire consequences, leaving little room for mistakes. Should we employ trial-and-error methods in environments where errors are unacceptable? The answer remains unclear, but a decision must be made soon.
Looking Ahead: The Future of AI in 2024
2024 is shaping up to be a pivotal year for AI, as it continues to integrate into our reality. We will face ongoing risks and complex trade-offs. I anticipate two areas of research will gain prominence:
- Simulation Enhancements: Models like CHOIS may help collect synthetic data to train in offline environments.
- AI-Driven Reward Systems: Innovations like Eureka could assist in developing more effective reward frameworks for training complex agents, though this approach carries its own risks.
This raises a critical question: does the end justify the means? Should we permit systems that inevitably make mistakes to pose serious risks in the name of progress? The answer is still uncertain, but one thing is clear: the AI advancements of 2024 will feel increasingly tangible.