Saturday, June 13, 2026

The "fable" of Anthropic and the USG

News moves fast. 12 hours ago I was enjoying the demolition that the US put on Paraguay when I heard that Anthropic had shut down access to Fable and Mythos (their latest and most powerful models). 

Since then, more news has surfaced about what went down, and I feel like it's a good exercise in understanding both the policy and psychodrama around AI today  - with maybe even a moral or two like Aesop's .... FABLES (yes I'm going to keep making bad jokes and no you can't stop me)

Part 1: The event

Let's first lay out the facts of the matter. To the best of my knowledge, here's what transpired. 

  1. Some researchers (apparently at Amazon) uncovered ways to jailbreak Fable to (possibly) perform cybersecurity-related attacks. 
  2. Someone (apparently Andy Jassy) told the White House (or the Treasury Secretary) about this. 
  3. The WH and Anthropic had a back and forth on what needed to be done about this: Anthropic claimed these were not serious jalbreaks, and the WH said that they were and that Anthropic needed to either take down the model or something...
  4. The WH then invoked export controls to demand that Anthropic block access to Fable/Mythos for foreign nationals (regardless of where they happen to be)
  5. Anthropic blocked access entirely, arguing that they had no way of distinguishing foreign nationals from American citizens. 
Now most of the reporting will focus on solidifying the facts of the matter (I hope) and will probably also focus on the drama. Drama is fun (don't get me wrong), but it can make thinking about policy really hard. 

So let me lay out some of the questions about the drama that might be useful to have answered, but then try to focus on the bigger policy questions that come out of this. 
  1. What were these mysterious jailbreaks? This is actually an important question that will shape the policy response as well. 
  2. How were these jailbreaks flagged and sent up the chain, and why was the communication of the form "hey I called my buddy at the WH and told him stuff"? 
  3. What actually transpired in the discussion between the WH and Anthropic? 

Part II: Clean-slate Policy

Let's pretend we are working in a vacuum for a second, and think about this with policy hats on, without worrying about the actual players (unrealistic I know, but a useful exercise). 

The US Government is worried about powerful models allowing any user to generate (say) cybersecurity hacks that can compromise critical national infrastructure (for e.g the financial sector which is why the Treasury Secretary is paying close attention). These models are general purpose and have many uses, and the USG doesn't just want to shut them down entirely (we can debate that, but not right now). 

What they'd like to do is have some way to monitor models for specific kinds of risks, before deployment, and also on a continuing basis. Maybe there's some kind of voluntary program where providers of powerful models give access to independent testers (for eg. some kind of Center for AI Security Standards and Innovation) who can identify risks, communicate these risks to the companies involved, and make sure that mitigations are put in place. It wouldn't be perfect, but it would be an ongoing process. 

If this sounds familiar, it should. because a) it's how we do cybersecurity right now without any government involvement and b) it's a little bit of how the recent WH EO was constructed (there were other parts of the EO that are problematic, but again, not for now)

In other words, there's a way to do what the government wants if this is indeed what they want and companies are willing to cooperate (this is setting aside whether you and I want the government to do this. That's a different discussion)

Part III: (we know) Drama

Well. that's all well and good. But I don't unfortunately live in a rationalist universe where I can write 20,000 word screeds on moreright.com and be "aligned" with everyone else. What's the reality here? 

The first thing I want to emphasize is that drama loves a good guy (yes "guy") and a bad guy, and it's really tempting to first decide who's the bad guy and then decide the other one must be good. It would be really tempting to say for e.g "the Trump administration has no clue on AI and therefore Anthropic is the good guy", or "Tech companies are evillll and the administration is therefore doing the right thing". 

Unfortunately (reporters please please pretty please pay attention), it's not that simple. 

There are no innocent actors here. 

This particular administration has always approached AI regulation in a very "we will say we are hands off but actually we are not but it's really about who's in favor and who's not that decides how we will act" way. Trying to retrofit logical policy actions onto that is hard, and this case is no different. The administration seems to operate its AI policy on some mix of favoritism, pique, and vengeance, and so it's hard to reconcile this reaction with the complete silence when (say) Grok was churning out CSAM and deepfake nonconsensual porn on demand while also being used within the department of war defense. For more on the internal incoherence of the administration's approach to AI, see Justin Hendrix's great analysis

Anthropic is the "hero" of the moment, because their seeming adversary is the "bad guy" for so many in tech policy. But on the eve of the UFC fight on the WH lawn, keep in mind that these are all actors, and there's an audience. Anthropic is about to go public and make an insane amount of money for some people. It's in their interest to say "oh yeah our models are SCARY (good) and the best out there" and also say at the same time "Yeah your jailbreak is not that scary and we are fine and can release our systems". I don't doubt that there are people at Anthropic who genuinely believe such things, but Anthropic is a corporation (not a "lab") and is in the business of market control and profit. 

Specifically, it is entirely possible that Fable is both a great improvement on Opus, and can do some questionable things better, and is also susceptible to the same jailbreaks and vulnerabilities as other models. It's possible it's not some special unicorn that is so dangerous we all have to trust in Anthropic's good intentions, but just the next incarnation of a product with many of the same weaknesses. We just don't know because Anthropic won't say, and won't actually allow for independent testing separately from the folks they want to give access to. 

Part IV: So what should we do? 

This episode doesn't change many of the things we understand already about the contours of AI policy. And in fact it's dangerous to overindex on one episode - that tends to leads to a whack-a-mole approach to doing AI regulation that has been harmful in other settings

1. We need to regulate the downstream risks and harms that come from the introduction of AI. 

All this nonsense around "but but innovation" needs to stop. You can tell an argument is not very useful when it's been used over and over again for virtually every single sector of society over the past century, including all the currently regulated sectors that we don't want to loosen regulations on. 

We need to do this 10 years ago. And we need to do this now. The AI industry is not some delicate hothouse flower that needs nurturing. It's a robust trillion dollar enterprise that's reshaping our world and will do so without our say so. 

2. It's more effective to focus sector by sector. 

Cybersecurity risks are concrete risks that we can evaluate in a focused way. And we can make use of the infrastructure and policy around cybersecurity to do so. Will this exact framework work for (say) threats to the electrical grid? probably not, and so we need a different "vertical" for understanding, evaluating, and mitigating risks in that sector. And so on. 

3. You don't need to focus only on the tech: focus on the ecosystem of actors and safeguards currently in place

There's a lot of concern around the use of AI in medicine, and in the financial sector. But these are both heavily regulated sectors where there are already checks in place to make sure that the systems function as we want them to. Are they perfect? no. But it's easier to tweak an existing system of safeguards. Maybe AI is used to generate a new drug: but such a drug will need to go through regular clinical trials with real people (not synthetic!) in order to be put on the market. So focus on where AI might be compromising an existing system of governance, rather than assuming we need to regulate the model itself. 

4. Testing testing testing (independently)
To really assess the risks associated with the introduction of AI in different sectors, we need ... testing. Independent testing - not whatever blog posts the labs companies put out. But focused testing on specific issues, rather than general "capability testing". And we need to build and support the infrastructure for that. This is already too long to go on a rant about the decimation of the scientific research apparatus in the US courtesy of the administration, but yes, the decimation of the scientific research apparatus in the US will have a direct effect on our ability to test for risks and harms, and has to be part of any policy directions we explore. 

Wednesday, May 20, 2026

The unit distances problem

 OpenAI just announced that ChatGPT has disproved a conjecture about one of Erdos's most famous problems: the unit distance problem.

This problem is personal to me: I spent a good chunk of time during my Ph.D mulling over it, and it's what hooked me into computational geometry. Like most of Erdos's problems, it's really easy to state

Let P be a set of n points in the plane (amen). What is the maximum number of unit distances that can be achieved? 

(the amen is a running joke in my old field of computational geometry) 

Note that this is really about duplicate distances (because if you get a bunch of pairs of points at distance d, you can scale the point set so that d = 1). It's also trivial to see that the maximum is at least n-1 (just put points evenly spaced along the number line), and that the number can't be more than n-choose-2 (because that's the number of pairs). 

So what's the real number? Erdos showed using a fairly complicated construction that you can get a set of points that has $n^{1 + 1/\log\log n}$ unit distances. On the other side of it, a famous result from 1983 by Spencer, Szemederi and Trotter showed that you can't get more than $O(n^{4/3})$ distances. 

Erdos himself used a really elegant but weaker argument to show that you can't get more than $O(n^{3/2})$ distances. And the argument was cool and deceptive (in retrospect). To get a distance pair of 1, a circle of size 1 around a point must touch another point (and vice versa). Draw an edge between those two points when this happens. So here's a fun fact: you can't create a situation where two points are connected by edges to each of three other points. Or to be more formal, you can't create a situation where there's a $K_{2,3}$ bipartite graph hidden inside the set of edges. By a well known result in graph theory this means this graph can't have too many edges in it (because if it did you'd eventually find one of these special graphs): specifically no more than $O(n^{3/2})$ edges. 

So there's a limit. But what's the real limit? A long standing conjecture was that you could NOT get anything nontrivially more than n pairs. Specifically that you couldn't get $\Omega(n^{1 + \epsilon})$ pairs for any $\epsilon$. This is frustrating because the gap between this, and $n^{4/3}$ is huge. 

Turns out this conjecture was wrong. And ChatGPT proved it, by building a complicated generalization of Erdos's construction that is indeed of size $\Omega(n^{1 + \epsilon})$ for some $\epsilon > 0$. 

This was a tantalizing, infuriating, and beautiful problem that has resisted progress for a very long time, and touches on some very deep concepts in mathematics. It's really impressive that an AI system has provided a proof for it. For more on the significance of the result and some interpretation of the proof technique, check out the companion article

Thursday, May 07, 2026

The AAAI 2026 AI review experiment

 AAAI did an experiment this year where they supplemented human reviews with AI-generated reviews and solicited feedback from authors and the review hierarchy about the process. They've now written up the experiment

The paper isn't too long, and I'd encourage you to read the whole thing (or, I don't know, put it into notebookLM and make a podcast out of it!). Some interesting points stood out to me as I read the report. 

The complexity of the process

The process of architecting the AI review was not the cartoonish "hey ChatGPT review this paper for me". It was carefully structured to focus on specific elements of the review (content, readability, evaluation, setup, etc). The system had what is now standard: a second LLM that acted as a critic and was not told where the review came from, and a third LLM that has to integrate the analysis of the critic and the original review into a final review. I've heard plenty of cases where this architecture does better than just getting the review or even just having a judge. 

To be clear, the critic was only doing a 'meta review'. It didn't have access to the original paper, so its goal was mostly structural/formal: does the review have all the elements and does it avoid things like accidental author reveal, or obnoxious comments etc. 

One thing that wasn't clear from the article was how exactly the LLM was checking code, experiments, theorems and proofs, "using the code interpreter as needed". I'd want to see more details about that seemingly agentic handoff. 

The perception of the results

There's a pretty dramatic signal in the survey results (and the number of responses was decent). AI-generated reviews were viewed as better than human generated reviews along six of nine categories. Where humans did better was on not nitpicking, identifying technical errors, and providing useful suggestions, but where AI reviews did better included being thorough and providing useful suggestions for improvement (which reminds of https://www.refine.ink/)

It was interesting to see that almost across the board, authors were more enthusiastic about the AI reviews than the reviewer hierarchy. If I'm being my burnt-out AC persona, I'd say this is because authors are likely grateful to get any kind of thorough review of their paper, and man do human reviews of papers suck. 

The human-AI interaction

The survey had free form responses that were interesting from a qualitative perspective. I think this is where the report fell down a bit, because I suspect there's a rich trove of analysis to do on the assessments that people wrote in. A couple of highlighted examples though brought home the important point that perhaps the best use of these AI reviews is before submission itself, kind of like what STOC 2026 did in their experiment. Because the AI reviews are great at identifying lots of small things that a friendly pre-submission review might miss, but they don't have the same kind of judgement and taste that a person has. 

A minor notes:

  • The process cost less than $1/paper, for 30,000 submissions. That's not a bad amount to spend. But you have to wonder why reviewers can't get compensation for their work but OpenAI gets paid. 

Disqus for The Geomblog