News moves fast. 12 hours ago I was enjoying the demolition that the US put on Paraguay when I heard that Anthropic had shut down access to Fable and Mythos (their latest and most powerful models).
Since then, more news has surfaced about what went down, and I feel like it's a good exercise in understanding both the policy and psychodrama around AI today - with maybe even a moral or two like Aesop's .... FABLES (yes I'm going to keep making bad jokes and no you can't stop me)
Part I: The event
Let's first lay out the facts of the matter. To the best of my knowledge, here's what transpired.
- Some researchers (apparently at Amazon) uncovered ways to jailbreak Fable to (possibly) perform cybersecurity-related attacks.
- Someone (apparently Andy Jassy) told the White House (or the Treasury Secretary) about this.
- The WH and Anthropic had a back and forth on what needed to be done about this: Anthropic claimed these were not serious jailbreaks, and the WH said that they were and that Anthropic needed to either take down the model or something...
- The WH then invoked export controls to demand that Anthropic block access to Fable/Mythos for foreign nationals (regardless of where they happen to be).
- Anthropic blocked access entirely, arguing that they had no way of distinguishing foreign nationals from American citizens.
Now most of the reporting will focus on solidifying the facts of the matter (I hope) and will probably also focus on the drama. Drama is fun (don't get me wrong), but it can make thinking about policy really hard.
So let me lay out some of the questions about the drama that might be useful to have answered, but then try to focus on the bigger policy questions that come out of this.
- What were these mysterious jailbreaks? This is actually an important question that will shape the policy response as well.
- How were these jailbreaks flagged and sent up the chain, and why was the communication of the form "hey I called my buddy at the WH and told him stuff"?
- What actually transpired in the discussion between the WH and Anthropic?
Part II: Clean-slate Policy
Let's pretend we are working in a vacuum for a second, and think about this with policy hats on, without worrying about the actual players (unrealistic I know, but a useful exercise).
The US Government is worried about powerful models allowing any user to generate (say) cybersecurity hacks that can compromise critical national infrastructure (e.g., the financial sector, which is why the Treasury Secretary is paying close attention). These models are general purpose and have many uses, and the USG doesn't want to simply shut them down entirely (we can debate that, but not right now).
What they'd like to do is have some way to monitor models for specific kinds of risks, before deployment, and also on a continuing basis. Maybe there's some kind of voluntary program where providers of powerful models give access to independent testers (e.g., some kind of Center for AI Security Standards and Innovation) who can identify risks, communicate these risks to the companies involved, and make sure that mitigations are put in place. It wouldn't be perfect, but it would be an ongoing process.
If this sounds familiar, it should, because a) it's how we do cybersecurity right now without any government involvement and b) it's a little bit of how the recent WH EO was constructed (there are other parts of the EO that are problematic, but again, not for now).
In other words, there's a way to do what the government wants, if this is indeed what they want and companies are willing to cooperate. (This is setting aside whether you and I want the government to do this. That's a different discussion.)
Part III: (we know) Drama
Well, that's all well and good. But unfortunately I don't live in a rationalist universe where I can write 20,000-word screeds on moreright.com and be "aligned" with everyone else. What's the reality here?
The first thing I want to emphasize is that drama loves a good guy (yes "guy") and a bad guy, and it's really tempting to first decide who's the bad guy and then decide the other one must be good. It would be really tempting to say, e.g., "the Trump administration has no clue on AI and therefore Anthropic is the good guy", or "Tech companies are evillll and the administration is therefore doing the right thing".
Unfortunately (reporters please please pretty please pay attention), it's not that simple.
There are no innocent actors here.
This particular administration has always approached AI regulation in a very "we will say we are hands off but actually we are not, and it's really about who's in favor and who's not that decides how we will act" way. Trying to retrofit logical policy actions onto that is hard, and this case is no different. The administration seems to operate its AI policy on some mix of favoritism, pique, and vengeance, and so it's hard to reconcile this reaction with the complete silence when (say) Grok was churning out CSAM and deepfake nonconsensual porn on demand while also being used within the Department of War/Defense. For more on the internal incoherence of the administration's approach to AI, see Justin Hendrix's great analysis.
Anthropic is the "hero" of the moment, because their seeming adversary is the "bad guy" for so many in tech policy. But on the eve of the UFC fight on the WH lawn, keep in mind that these are all actors, and there's an audience. Anthropic is about to go public and make an insane amount of money for some people. It's in their interest to say "oh yeah our models are SCARY (good) and the best out there" and also say at the same time "Yeah your jailbreak is not that scary and we are fine and can release our systems". I don't doubt that there are people at Anthropic who genuinely believe such things, but Anthropic is a corporation (not a "lab") and is in the business of market control and profit.
Specifically, it is entirely possible that Fable is both a great improvement on Opus, and can do some questionable things better, and is also susceptible to the same jailbreaks and vulnerabilities as other models. It's possible it's not some special unicorn that is so dangerous we all have to trust in Anthropic's good intentions, but just the next incarnation of a product with many of the same weaknesses. We just don't know because Anthropic won't say, and won't actually allow for independent testing separately from the folks they want to give access to.
Part IV: So what should we do?
This episode doesn't change many of the things we understand already about the contours of AI policy. And in fact it's dangerous to over-index on one episode: that tends to lead to a whack-a-mole approach to AI regulation that has been harmful in other settings.
1. We need to regulate the downstream risks and harms that come from the introduction of AI.
All this nonsense around "but but innovation" needs to stop. You can tell an argument is not very useful when it's been used over and over again for virtually every single sector of society over the past century, including all the currently regulated sectors that we don't want to loosen regulations on.
We needed to do this 10 years ago. And we need to do this now. The AI industry is not some delicate hothouse flower that needs nurturing. It's a robust trillion-dollar enterprise that's reshaping our world and will do so without our say-so.
2. It's more effective to focus sector by sector.
Cybersecurity risks are concrete risks that we can evaluate in a focused way. And we can make use of the infrastructure and policy around cybersecurity to do so. Will this exact framework work for (say) threats to the electrical grid? Probably not, and so we need a different "vertical" for understanding, evaluating, and mitigating risks in that sector. And so on.
3. You don't need to focus only on the tech: focus on the ecosystem of actors and safeguards currently in place.
There's a lot of concern around the use of AI in medicine, and in the financial sector. But these are both heavily regulated sectors where there are already checks in place to make sure that the systems function as we want them to. Are they perfect? No. But it's easier to tweak an existing system of safeguards. Maybe AI is used to generate a new drug: but such a drug will need to go through regular clinical trials with real people (not synthetic!) in order to be put on the market. So focus on where AI might be compromising an existing system of governance, rather than assuming we need to regulate the model itself.
4. Testing testing testing (independently)
To really assess the risks associated with the introduction of AI in different sectors, we need ... testing. Independent testing, not whatever blog posts the "labs" (companies) put out. And focused testing on specific issues, rather than general "capability testing". We also need to build and support the infrastructure for that. This post is already too long for me to go on a rant about the decimation of the scientific research apparatus in the US courtesy of the administration, but yes, that decimation will have a direct effect on our ability to test for risks and harms, and has to be part of any policy directions we explore.