2024/08/30

Gullible Oracles: Kevin Roose Games AI Chatbots—With Ease

“If chatbots can be persuaded to change their answers by a paragraph of white text, or a secret message written in code, why would we trust them with any task, let alone ones with actual stakes?”
– Tech columnist Kevin Roose, on how easily AI systems can be gamed. Eager to improve his tainted reputation with chatbots after his viral Sydney take-down forced industry-wide safety measures (Meta’s Llama 3: “I hate Kevin Roose!”), the American author and journalist uncovers a number of shockingly simple hacks to steer answers. “Oracles shouldn’t be this easy to manipulate,” he warns.
$40 USD