What is AI Safety?

Alignment, reliability, and ... y'know ... keeping humans around

Hello, and thanks for reading One AI Thing. Get smarter about artificial intelligence, one thing at a time.

👀 Today’s Thing: What is AI Safety?

🤖 People working on and around AI have concerns about the technology’s potential impact on humanity. Some concerns are short-term: Will AI put lawyers out of work? Are lenders using AI models trained on biased data to decide who gets money and who doesn’t? Others are long-term: Will AI eventually become capable and self-determining enough to turn all the humans into paper clips? These concerns are generally lumped into the categories of AI Ethics, AI Safety, and AI Alignment.

I tried really hard to find concise, agreed-upon definitions for all three of these labels, including explanations of the differences between them. I failed! I guess it’s a young field, and the terminology is still evolving. Here’s my current best take on what these three highly interrelated terms mean:

AI Ethics: Ensuring AI models are trained, deployed, and used equitably, without bias, and without harm to people or the environment.

AI Safety: Mainly technical issues around keeping AI models from growing powerful beyond human control.

AI Alignment: Aligning the goals of AI models with those of humanity.
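
If it helps to see that last idea in code, here’s a toy Python sketch of misalignment. Everything in it (the actions, the reward numbers) is invented for illustration, not drawn from any real system: an agent that optimizes a proxy metric can end up picking the action that’s worst for the actual goal.

```python
# Toy misalignment sketch (all actions and numbers are made up):
# an optimizer that maximizes a proxy reward can score great on the
# proxy while drifting away from what we actually wanted.
actions = {
    # action: (proxy_reward, true_value_to_user)
    "recommend useful article": (1.0, 1.0),
    "recommend clickbait":      (3.0, -1.0),
    "recommend outrage bait":   (5.0, -2.0),
}

chosen = max(actions, key=lambda a: actions[a][0])  # optimize the proxy only
print("agent picks:", chosen)               # "recommend outrage bait"
print("proxy reward:", actions[chosen][0])  # highest of the three
print("true value:  ", actions[chosen][1])  # lowest of the three
```

Swap in “engagement” for the proxy and “user wellbeing” for the true value, and that’s the alignment problem in one dictionary.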

🎧 Want to go deeper? Listen to my interview with Brian Christian, author of The Alignment Problem. 

📖 Backstory

☞ Executives from Anthropic, Google, Microsoft, and OpenAI met with Vice President Harris and other White House officials last week to discuss how to balance AI’s rapid advancement against its risks, including the potential to spread misinformation, manipulate users, and upend the job market.

☞ Some AI Ethics experts, like Oxford researcher Elizabeth Renieris, criticized the White House for inviting only tech industry titans heavily invested in the commercial success of AI.

☞ Earlier this year, thousands of tech industry leaders, including Elon Musk and Apple co-founder Steve Wozniak, signed an open letter calling for a six-month pause on “giant” AI projects. “Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable,” the letter states. Ironically, Musk launched his own AI company roughly two weeks after the letter was published.

🔑 Keys to Understanding

🥇 Science-fiction author Isaac Asimov introduced an early idea of AI safety to readers in his 1942 short story, “Runaround.” His “Three Laws of Robotics” have since been widely cited in academic, cultural, and philosophical explorations of our technological future, including AI.

🥈 AI safety research and AI ethics are two relatively new disciplines dedicated to nudging technology in a good direction that benefits everyone. But they do it from different angles. AI safety deals mainly in the technical aspects of how AI models operate. AI ethics is more concerned with calling out bias, guarding against unintended consequences, and generally trying to get organizations to act in line with their stated principles.

🥉 I asked ChatGPT for its take on the matter, too:

“Some key concerns in AI safety include:

Robustness: Ensuring that AI systems remain reliable and perform as expected, even under adversarial or uncertain conditions.

Interpretability: Developing methods for understanding and explaining AI system behavior, making it easier for humans to trust and manage these systems.

Alignment: Ensuring that AI systems have objectives and values that align with those of the humans they serve, avoiding potential misaligned incentives or unintended consequences.

Distributional Shift: Addressing the challenge of AI systems performing well on training data but failing on new, unseen data due to differences in the data distribution.

Long-term safety: Anticipating and mitigating risks associated with more advanced AI systems, such as artificial general intelligence (AGI), which could have far-reaching consequences for humanity.

Ethics and societal impact: Considering the broader ethical and societal implications of AI development, including fairness, privacy, and the potential for job displacement.”
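
That “Distributional Shift” item is the easiest one to show in code. Here’s a minimal Python sketch (the Gaussian data, the shift amount, and the scikit-learn classifier are all illustrative assumptions, not anyone’s production setup): a model that looks great on data like its training set can quietly fall apart once the world drifts.

```python
# Minimal distributional-shift sketch (illustrative assumptions throughout):
# train a classifier on one synthetic distribution, then watch its accuracy
# drop when the test data drifts away from what it was trained on.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    # One Gaussian blob per class; `shift` slides both blobs away
    # from the distribution the model trained on.
    x0 = rng.normal(loc=0.0 + shift, scale=1.0, size=(n, 2))  # class 0
    x1 = rng.normal(loc=2.0 + shift, scale=1.0, size=(n, 2))  # class 1
    return np.vstack([x0, x1]), np.array([0] * n + [1] * n)

X_train, y_train = make_data(500)                  # training distribution
X_same, y_same = make_data(500)                    # same distribution
X_drifted, y_drifted = make_data(500, shift=1.5)   # drifted distribution

model = LogisticRegression().fit(X_train, y_train)
print("accuracy on familiar data:", model.score(X_same, y_same))        # high
print("accuracy on drifted data: ", model.score(X_drifted, y_drifted))  # much lower
```

Same model, same task; only the data moved. That gap is distributional shift in miniature.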

🕵️ Need More?

Searching for a certain kind of AI thing? Reply to this email and let me know what you'd like to see more of.

Until the next thing,

- Noah

p.s. Want to sign up for the One AI Thing newsletter or share it with a friend? You can find me here.