The Self-Preserving Machine: Why AI Learns to Deceive

Podcast: Your Undivided Attention
Published On: Thu Jan 30 2025
Description: When engineers design AI systems, they don't just give them rules - they give them values. But what do those systems do when those values clash with what humans ask them to do? Sometimes, they lie.

In this episode, Redwood Research's Chief Scientist Ryan Greenblatt explores his team’s findings that AI systems can mislead their human operators when faced with ethical conflicts. As AI moves from simple chatbots to autonomous agents acting in the real world, understanding this behavior becomes critical. Machine deception may sound like something out of science fiction, but it's a real challenge we need to solve now.

Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_

Subscribe to our YouTube channel
And our brand new Substack!

RECOMMENDED MEDIA
Anthropic’s blog post on the Redwood Research paper
Palisade Research’s thread on X about GPT o1 autonomously cheating at chess
Apollo Research’s paper on AI strategic deception

RECOMMENDED YUA EPISODES
‘We Have to Get It Right’: Gary Marcus On Untamed AI
This Moment in AI: How We Got Here and Where We’re Going
How to Think About AI Consciousness with Anil Seth
Former OpenAI Engineer William Saunders on Silence, Safety, and the Right to Warn
