Saturday, June 13, 2026

The "fable" of Anthropic and the USG

News moves fast. 12 hours ago I was enjoying the demolition that the US put on Paraguay when I heard that Anthropic had shut down access to Fable and Mythos (their latest and most powerful models). 

Since then, more news has surfaced about what went down, and I feel like it's a good exercise in understanding both the policy and psychodrama around AI today  - with maybe even a moral or two like Aesop's .... FABLES (yes I'm going to keep making bad jokes and no you can't stop me)

Part 1: The event

Let's first lay out the facts of the matter. To the best of my knowledge, here's what transpired. 

  1. Some researchers (apparently at Amazon) uncovered ways to jailbreak Fable to (possibly) perform cybersecurity-related attacks. 
  2. Someone (apparently Andy Jassy) told the White House (or the Treasury Secretary) about this. 
  3. The WH and Anthropic had a back and forth on what needed to be done about this: Anthropic claimed these were not serious jailbreaks, and the WH said that they were and that Anthropic needed to either take down the model or something...
  4. The WH then invoked export controls to demand that Anthropic block access to Fable/Mythos for foreign nationals (regardless of where they happen to be)
  5. Anthropic blocked access entirely, arguing that they had no way of distinguishing foreign nationals from American citizens. 
Now most of the reporting will focus on solidifying the facts of the matter (I hope) and will probably also focus on the drama. Drama is fun (don't get me wrong), but it can make thinking about policy really hard. 

So let me lay out some of the questions about the drama that might be useful to have answered, but then try to focus on the bigger policy questions that come out of this. 
  1. What were these mysterious jailbreaks? This is actually an important question that will shape the policy response as well. 
  2. How were these jailbreaks flagged and sent up the chain, and why was the communication of the form "hey I called my buddy at the WH and told him stuff"? 
  3. What actually transpired in the discussion between the WH and Anthropic? 

Part II: Clean-slate Policy

Let's pretend we are working in a vacuum for a second, and think about this with policy hats on, without worrying about the actual players (unrealistic I know, but a useful exercise). 

The US Government is worried about powerful models allowing any user to generate (say) cybersecurity hacks that can compromise critical national infrastructure (e.g. the financial sector, which is why the Treasury Secretary is paying close attention). These models are general purpose and have many uses, and the USG doesn't just want to shut them down entirely (we can debate that, but not right now). 

What they'd like to do is have some way to monitor models for specific kinds of risks, before deployment, and also on a continuing basis. Maybe there's some kind of voluntary program where providers of powerful models give access to independent testers (e.g. some kind of Center for AI Security Standards and Innovation) who can identify risks, communicate these risks to the companies involved, and make sure that mitigations are put in place. It wouldn't be perfect, but it would be an ongoing process. 

If this sounds familiar, it should, because a) it's how we do cybersecurity right now without any government involvement and b) it's a little bit of how the recent WH EO was constructed (there were other parts of the EO that are problematic, but again, not for now).

In other words, there's a way to do what the government wants if this is indeed what they want and companies are willing to cooperate (this is setting aside whether you and I want the government to do this. That's a different discussion)

Part III: (we know) Drama

Well, that's all well and good. But unfortunately I don't live in a rationalist universe where I can write 20,000 word screeds on moreright.com and be "aligned" with everyone else. What's the reality here? 

The first thing I want to emphasize is that drama loves a good guy (yes "guy") and a bad guy, and it's really tempting to first decide who's the bad guy and then decide the other one must be good. It would be really tempting to say e.g. "the Trump administration has no clue on AI and therefore Anthropic is the good guy", or "Tech companies are evillll and the administration is therefore doing the right thing". 

Unfortunately (reporters please please pretty please pay attention), it's not that simple. 

There are no innocent actors here. 

This particular administration has always approached AI regulation in a very "we will say we are hands off but actually we are not but it's really about who's in favor and who's not that decides how we will act" way. Trying to retrofit logical policy actions onto that is hard, and this case is no different. The administration seems to operate its AI policy on some mix of favoritism, pique, and vengeance, and so it's hard to reconcile this reaction with the complete silence when (say) Grok was churning out CSAM and deepfake nonconsensual porn on demand while also being used within the department of war defense. For more on the internal incoherence of the administration's approach to AI, see Justin Hendrix's great analysis

Anthropic is the "hero" of the moment, because their seeming adversary is the "bad guy" for so many in tech policy. But on the eve of the UFC fight on the WH lawn, keep in mind that these are all actors, and there's an audience. Anthropic is about to go public and make an insane amount of money for some people. It's in their interest to say "oh yeah our models are SCARY (good) and the best out there" and also say at the same time "Yeah your jailbreak is not that scary and we are fine and can release our systems". I don't doubt that there are people at Anthropic who genuinely believe such things, but Anthropic is a corporation (not a "lab") and is in the business of market control and profit. 

Specifically, it is entirely possible that Fable is both a great improvement on Opus, and can do some questionable things better, and is also susceptible to the same jailbreaks and vulnerabilities as other models. It's possible it's not some special unicorn that is so dangerous we all have to trust in Anthropic's good intentions, but just the next incarnation of a product with many of the same weaknesses. We just don't know because Anthropic won't say, and won't actually allow for independent testing separately from the folks they want to give access to. 

Part IV: So what should we do? 

This episode doesn't change many of the things we understand already about the contours of AI policy. And in fact it's dangerous to overindex on one episode - that tends to lead to a whack-a-mole approach to doing AI regulation that has been harmful in other settings.

1. We need to regulate the downstream risks and harms that come from the introduction of AI. 

All this nonsense around "but but innovation" needs to stop. You can tell an argument is not very useful when it's been used over and over again for virtually every single sector of society over the past century, including all the currently regulated sectors that we don't want to loosen regulations on. 

We need to do this 10 years ago. And we need to do this now. The AI industry is not some delicate hothouse flower that needs nurturing. It's a robust trillion dollar enterprise that's reshaping our world and will do so without our say so. 

2. It's more effective to focus sector by sector. 

Cybersecurity risks are concrete risks that we can evaluate in a focused way. And we can make use of the infrastructure and policy around cybersecurity to do so. Will this exact framework work for (say) threats to the electrical grid? Probably not, and so we need a different "vertical" for understanding, evaluating, and mitigating risks in that sector. And so on. 

3. You don't need to focus only on the tech: focus on the ecosystem of actors and safeguards currently in place

There's a lot of concern around the use of AI in medicine, and in the financial sector. But these are both heavily regulated sectors where there are already checks in place to make sure that the systems function as we want them to. Are they perfect? No. But it's easier to tweak an existing system of safeguards. Maybe AI is used to generate a new drug: but such a drug will need to go through regular clinical trials with real people (not synthetic!) in order to be put on the market. So focus on where AI might be compromising an existing system of governance, rather than assuming we need to regulate the model itself. 

4. Testing testing testing (independently)
To really assess the risks associated with the introduction of AI in different sectors, we need ... testing. Independent testing - not whatever blog posts the companies put out. But focused testing on specific issues, rather than general "capability testing". And we need to build and support the infrastructure for that. This is already too long to go on a rant about the decimation of the scientific research apparatus in the US courtesy of the administration, but yes, the decimation of the scientific research apparatus in the US will have a direct effect on our ability to test for risks and harms, and has to be part of any policy directions we explore. 

Wednesday, May 20, 2026

The unit distances problem

 OpenAI just announced that ChatGPT has disproved a conjecture about one of Erdos's most famous problems: the unit distance problem.

This problem is personal to me: I spent a good chunk of time during my Ph.D mulling over it, and it's what hooked me into computational geometry. Like most of Erdos's problems, it's really easy to state

Let P be a set of n points in the plane (amen). What is the maximum number of unit distances that can be achieved? 

(the amen is a running joke in my old field of computational geometry) 

Note that this is really about duplicate distances (because if you get a bunch of pairs of points at distance d, you can scale the point set so that d = 1). It's also trivial to see that the maximum is at least n-1 (just put points evenly spaced along the number line), and that the number can't be more than n-choose-2 (because that's the number of pairs). 
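To make those two trivial bounds concrete, here is a small brute-force sketch. The function names are mine, and it only handles integer coordinates so that the distance test is exact; it is an illustration of the counting, not anything from the proof.

```python
from itertools import combinations


def count_unit_distances(points):
    """Count unordered pairs of integer points at Euclidean distance exactly 1."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == 1
    )


def line_points(n):
    """n points evenly spaced along the number line."""
    return [(i, 0) for i in range(n)]


def grid_points(k):
    """A k-by-k square grid (n = k * k points)."""
    return [(i, j) for i in range(k) for j in range(k)]


if __name__ == "__main__":
    n = 10
    print(count_unit_distances(line_points(n)))  # n - 1 unit distances
    print(n * (n - 1) // 2)                      # n-choose-2, the trivial upper bound
    print(count_unit_distances(grid_points(4)))  # 2 * k * (k - 1) for a k-by-k grid
```

Even the plain square grid beats the line by a constant factor (roughly 2n unit distances instead of n - 1), which hints at why lattice-like constructions are the natural place to look for more.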

So what's the real number? Erdos showed using a fairly complicated construction that you can get a set of points that has $n^{1 + c/\log\log n}$ unit distances for some constant $c > 0$. On the other side of it, a famous result from 1983 by Spencer, Szemerédi and Trotter showed that you can't get more than $O(n^{4/3})$ unit distances. 

Erdos himself used a really elegant but weaker argument to show that you can't get more than $O(n^{3/2})$ unit distances. And the argument was cool and deceptive (in retrospect). To get a distance pair of 1, a circle of radius 1 around a point must pass through another point (and vice versa). Draw an edge between those two points when this happens. So here's a fun fact: you can't create a situation where two points are connected by edges to each of three other points, because two distinct unit circles intersect in at most two points. Or to be more formal, you can't create a situation where there's a $K_{2,3}$ bipartite graph hidden inside the set of edges. By a well known result in graph theory this means this graph can't have too many edges in it (because if it did you'd eventually find one of these special graphs): specifically no more than $O(n^{3/2})$ edges. 
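For anyone who wants to see where the $3/2$ comes from, here is a sketch of the standard double-counting behind that graph theory result (the notation $m$ and $d_v$ is mine). Let $m$ be the number of edges and $d_v$ the number of points at unit distance from the point $v$. Since any two points have at most two common neighbors, counting pairs of neighbors around each point gives

$$\sum_{v} \binom{d_v}{2} \le 2\binom{n}{2} = n(n-1).$$

On the other hand, $\sum_v d_v = 2m$, and $\binom{x}{2}$ is convex, so

$$\sum_{v} \binom{d_v}{2} \ge n\binom{2m/n}{2} = m\left(\frac{2m}{n} - 1\right).$$

Putting the two together, and using $m \le \binom{n}{2}$,

$$\frac{2m^2}{n} \le n(n-1) + m \le \frac{3}{2}n^2, \qquad \text{so} \qquad m \le \sqrt{3/4}\; n^{3/2} = O(n^{3/2}).$$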

So there's a limit. But what's the real limit? A long standing conjecture was that you could NOT get anything nontrivially more than n pairs. Specifically that you couldn't get $\Omega(n^{1 + \epsilon})$ pairs for any fixed $\epsilon > 0$. This is frustrating because the gap between this and $n^{4/3}$ is huge. 

Turns out this conjecture was wrong. And ChatGPT proved it, by building a complicated generalization of Erdos's construction that is indeed of size $\Omega(n^{1 + \epsilon})$ for some $\epsilon > 0$. 

This was a tantalizing, infuriating, and beautiful problem that has resisted progress for a very long time, and touches on some very deep concepts in mathematics. It's really impressive that an AI system has provided a proof for it. For more on the significance of the result and some interpretation of the proof technique, check out the companion article

Thursday, May 07, 2026

The AAAI 2026 AI review experiment

 AAAI did an experiment this year where they supplemented human reviews with AI-generated reviews and solicited feedback from authors and the review hierarchy about the process. They've now written up the experiment

The paper isn't too long, and I'd encourage you to read the whole thing (or, I don't know, put it into notebookLM and make a podcast out of it!). Some interesting points stood out to me as I read the report. 

The complexity of the process

The process of architecting the AI review was not the cartoonish "hey ChatGPT review this paper for me". It was carefully structured to focus on specific elements of the review (content, readability, evaluation, setup, etc). The system had what is now standard: a second LLM that acted as a critic and was not told where the review came from, and a third LLM that has to integrate the analysis of the critic and the original review into a final review. I've heard plenty of cases where this architecture does better than just getting the review or even just having a judge. 

To be clear, the critic was only doing a 'meta review'. It didn't have access to the original paper, so its goal was mostly structural/formal: does the review have all the elements and does it avoid things like accidental author reveal, or obnoxious comments etc. 
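As a rough illustration of that three-stage structure, here is a minimal sketch. Everything in it is hypothetical: the class name, the prompts, and the idea of treating an "LLM" as a plain function from prompt to text are my assumptions, not the AAAI system's actual code. In particular, I'm assuming the integrator sees only the draft and the critique.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: an "LLM" is any function from a prompt string to a response string.
LLM = Callable[[str], str]


@dataclass
class ReviewPipeline:
    reviewer: LLM    # drafts the review from the paper
    critic: LLM      # meta-reviews the draft without seeing the paper
    integrator: LLM  # merges draft and critique into the final review

    def run(self, paper_text: str) -> str:
        draft = self.reviewer(f"Review this paper:\n{paper_text}")
        # The critic gets only the draft: no paper text, and nothing
        # about where the review came from.
        critique = self.critic(f"Critique this review:\n{draft}")
        return self.integrator(
            f"Draft review:\n{draft}\n\nCritique:\n{critique}\n\n"
            "Write the final review."
        )
```

Plugging in three stub functions is enough to check the information flow: the paper text should never show up in the prompt the critic receives.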

One thing that wasn't clear from the article was how exactly the LLM was checking code, experiments, theorems and proofs, "using the code interpreter as needed". I'd want to see more details about that seemingly agentic handoff. 

The perception of the results

There's a pretty dramatic signal in the survey results (and the number of responses was decent). AI-generated reviews were viewed as better than human generated reviews along six of nine categories. Where humans did better was on not nitpicking, identifying technical errors, and providing useful suggestions, but where AI reviews did better included being thorough and providing useful suggestions for improvement (which reminds me of https://www.refine.ink/).

It was interesting to see that almost across the board, authors were more enthusiastic about the AI reviews than the reviewer hierarchy. If I'm being my burnt-out AC persona, I'd say this is because authors are likely grateful to get any kind of thorough review of their paper, and man do human reviews of papers suck. 

The human-AI interaction

The survey had free form responses that were interesting from a qualitative perspective. I think this is where the report fell down a bit, because I suspect there's a rich trove of analysis to do on the assessments that people wrote in. A couple of highlighted examples though brought home the important point that perhaps the best use of these AI reviews is before submission itself, kind of like what STOC 2026 did in their experiment. Because the AI reviews are great at identifying lots of small things that a friendly pre-submission review might miss, but they don't have the same kind of judgement and taste that a person has. 

A minor note:

  • The process cost less than $1/paper, for 30,000 submissions. That's not a bad amount to spend. But you have to wonder why reviewers can't get compensation for their work but OpenAI gets paid. 

Sunday, June 22, 2025

Paper links from my keynote at FAccT

It's always difficult to keep track of the papers a speaker mentions when they're giving a talk. I'm delivering a keynote at FAccT 2025, and so thought I'd make a list of paper references for easy access to anyone interested. Note that some of my papers are not yet publicly available. I'll make them accessible as soon as they are, and you can find the link here when they are. 

  1. FATML 2016 closing panel
  2. FATML 2014
  3. AI Bill of Rights; the NIST AI Risk Management Framework
  4. Biden administration memos on AI
    1. Executive order 14110
    2. OMB Memo M-24-10
    3. National Security Memo on AI
  5. CNTR AISLE framework
  6. Remarks by the Catholic Church and Pope Leo XIV (one, two)
  7. Explainer on Sociotechnical AI policy
  8. Fairness and Abstractions in Sociotechnical Systems
  9. Participatory AI
  10. Measurement and Fairness
  11. Framework for understanding sources of harm throughout the machine learning lifecycle
  12. Explanations in artificial intelligence: insights from the social sciences
  13. DOGE and Veterans Affairs Contracts.
  14. Distinguishing Predictive and Generative AI in Regulation (coming soon)
  15. Sovereignty as a Service (coming soon)
  16. Evaluation Science
  17. Position paper on evaluating genAI.
  18. Multi-lingual functional evaluation (coming soon)
  19. Data and DOGE panel at Brown
  20. MIT Tech Review article on Amsterdam deployment of AI.
  21. Red-Teaming AI Policy
  22. Better proxy estimation
  23. Genetic data governance
  24. Audit trails (coming soon)
  25. CNTR website (and tech policy summer school)


Friday, March 07, 2025

Standing up for Science

 It's been forever since I've written a blog post. Twitter, and then X, and then Bluesky, has absorbed most of my hot takes. But I think more and more that it's time to move away from transient thoughts to things that are more well formed, and so I'm going to try and blog a bit more again. 

Mar 7, 2025 was Stand Up For Science Day. I was invited to speak at the Rhode Island local event. It was a freezing cold day in front of the Rhode Island State House in Providence, RI. With encouragement from my students, we did a little "teach-in" on campus first to lay out some of the history of federal funding in the US (going back to Vannevar Bush and Endless Frontiers), and why some of the new administration moves were so radical and dangerous. 

Then a bunch of us walked over to the State House for the rally. There was a good crowd there in spite of the bitter wind - by my estimate it was over 100 and perhaps close to 200. Lots of fantastic placards, including this one: 


And then it was time for me to speak. I've never spoken at a rally before, and it took a good amount of preparation (and much more trepidation) to generate my 3 minutes of remarks. The crowd was very encouraging, cheering every time I paused, and that helped a lot :). 

Here's what I said. 

Monday, May 17, 2021

Transitions

 I've been at the U of Utah and Salt Lake City for 14 years (14.5 really). It was my first academic job and the longest time I've spent anywhere (throughout my whole life). So it's a little hard to accept that I'm moving to my next adventure. 

It's a two-part adventure, because why make one move when you can make two. 

Firstly, as of today, I'm going to be working with Alondra Nelson at the White House Office of Science and Technology Policy, advising on matters relating to fairness and bias in tech systems. This is a scary and exciting new position, and I hope to help to nudge things along just a bit further in the direction of tech that can help more than it harms, especially for those who've been left behind in our rush to an algorithmically controlled future. 

Secondly, I'm moving to Brown University to join the CS department there as well as their Data Science Initiative. Together with Seny Kamara and others, I'm going to start a new center on Computing for the People, to help think through what it means to do computer science that truly responds to the needs of people, instead of hiding behind a neutrality that merely gives more power to those already in power. 

Lots of changes, and because of the pandemic, all this will happen in slow motion, but it's a whirlwind of emotions (and new clothes - apparently tech conference T-shirts don't work in formal settings - WHO KNEW!!!). 


Friday, December 25, 2020

Lars Arge.

 Not a post I'd have wanted to make on Christmas day, but that's how it goes sometimes. 

Lars Arge just passed away, on Dec 23. For those of us who've been following his battles with cancer, this might not come as a total shock, but there was always hope, and that's no longer an option. 

It's hard to imagine this in 2020, but there was a time not that long ago (at least in my mind) when "big data" wasn't really a thing. Companies were acquiring lots of data, and "GIGA byte" was a thing, but there was no real appreciation of the computational challenge associated with big data. 

A paper by Aggarwal and Vitter in 1988 made the first step towards changing that, introducing the external memory model as a way to think about computations when you have memory accesses that are cheap (in RAM) and expensive (on disk). 

It's a diabolically simple model: all main memory access is free, and any disk access costs 1 unit (but you can get a block of data of size B for that one unit of access). It's not meant to be realistic, but like the best computational models, it's meant to isolate the key operations that are expensive so that we can study how algorithm design needs to change. 
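To see why the model changes how you design algorithms, here is a toy sketch that counts I/Os for two access patterns over the same data. The function names are mine, and I'm making one loud simplifying assumption: main memory holds just a single block at a time (the real model has a memory of size M that holds many blocks).

```python
import math


def scan_ios(n_items, block_size):
    """I/Os to read n_items laid out contiguously on disk: ceil(N / B)."""
    return math.ceil(n_items / block_size)


def access_ios(indices, block_size):
    """Count I/Os for visiting items in the given order.

    Simplifying assumption: memory holds exactly one block, so touching an
    item in a different block than the one currently loaded costs 1 I/O.
    Accesses within the loaded block are free.
    """
    ios = 0
    loaded = None
    for i in indices:
        block = i // block_size
        if block != loaded:
            ios += 1
            loaded = block
    return ios


if __name__ == "__main__":
    N, B = 1000, 100
    sequential = list(range(N))
    # Visit item 0 of every block, then item 1 of every block, and so on.
    strided = [j * B + i for i in range(B) for j in range(N // B)]

    print(scan_ios(N, B))             # N / B I/Os
    print(access_ios(sequential, B))  # same as the scan
    print(access_ios(strided, B))     # one I/O per item: N I/Os
```

Both orders touch exactly the same N items and do the same amount of "work" in the RAM model, but the strided order pays a factor of B more in I/Os. That gap is the thing external memory algorithms are designed around.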

Lars was one of the foremost algorithm designers for this new world of external memory. His Ph.D thesis laid out ideas for how to build data structures that are external memory efficient, and his research over the next many decades, in true Tarjan/Hopcroft form, built the fundamental structures and concepts one would need to even think about efficient algorithm design, with many clever ideas around batching queries, processing data in main memory to prepare for queries, and streaming access to disk when appropriate. 

Formal algorithmic models are often misunderstood. They look simplistic, miss many of the details that seem relevant in practice, and appear to encourage theoretical game playing divorced from reality. But a formal model at its best does its work invisibly. It shifts the way we think about a framework. It fosters the design of new paradigms for efficient algorithms, and it allows us to layer optimizations on that move a system from theory to practice without ever having to compromise the underlying design principles.

Lars was a force of nature in this area. I first remember meeting him in 1998 at AT&T Labs when I was interning and he was visiting there. He had boundless energy for this space, and seemingly wanted to turn everything into an external memory algorithm, whether it was geometry, data structures, or even the most basic algorithms like sorting. His intuition was the best kind of algorithmic intuition: build up the core primitives, and the rest would follow. 

And this is exactly what happened. The field exploded. For a while, "big data algorithms" WERE external memory algorithms. There was no other way to even talk about big data. And that spawned even more models. Streaming algorithms were inspired by external memory and the realization that a one pass stream was an effective way to work with large data. Cache-oblivious algorithms asked about what would happen if we took the same two-part hierarchy with main memory and disk and extended it to the cache. Semi-external memory models asked how we might modify the base model for graph computations. The MapReduce framework from the early 2000s generalized the external memory model to handle newer kinds of streaming/memory-limited architectures, in turn to be followed by Spark and so many other models. 

I'd go as far as to say this: all of the conceptual developments we see today in big data computations at some level can be traced back to work on external memory algorithms, and that was driven by Lars (and his collaborators). 

It wasn't just the papers he wrote. Lars was a leader in shaping the field. Early in the 2000s he moved back from Duke University to Aarhus University, and from there started to build what would become one of the foremost institutes for thinking about big data, first as a BRICS center and then as the appropriately named MADALGO Institute. 

Many of us who had anything to do with big data visited MADALGO at some point in our careers. I spent one of the best summers of my life being hosted by him during my sabbatical - my children still remember that summer we spent in Aarhus and wish we could go back each year. He instinctively knew that the best way to foster the area was to facilitate a generation of researchers who would bring their own ideas to Aarhus, mix and exchange them,  and then go away and share them with the world. 

And he wasn't merely content with that. He wanted to demonstrate the power of his perspective beyond just the realm of academia. He started a company, SCALGO, that applied the principles of external memory algorithms (and so much more) to help with modeling geospatial data. I remember distinctly him telling me about the first time he demonstrated SCALGO products in a forum with other companies doing GIS work and how the performance of their system blew the other products out of the water. As someone who was (at the time) deeply embedded in the theory of computer science, I was astounded and encouraged by this validation of formal thinking. 

Lars was a giant in our field (his email address was always large@..., and this worked more appropriately than one would ever dream of). But he was also a giant both in real life and in his personality. He was the warmest, most fun person to be around. He seemed almost ego-free, and often downplayed his own accomplishments, claiming that his main talent was hanging around with smarter people. He was extremely generous with his time and resources (which is why so many of us were able to visit Aarhus and benefit from being at MADALGO).

He was the life of any party -- I still remember when he hosted the Symposium on Computational Geometry in Denmark. It felt like we were at a post-battle Viking celebration (and yes he got up on a table and shouted "SKÅL" over and over again while an actual pig was roasting on a spit nearby). I remember him taking me to a Denmark-Sweden soccer game and warning me not to wear anything with blue on it. I remember us going go-kart racing and his stream of trash talking. 

Lars was the entire package: a great person, a great researcher, a visionary leader, and a canny entrepreneur. I will miss him greatly. 

Thursday, April 11, 2019

New conference announcement

Martin Farach-Colton asked me to mention this, which is definitely NOT a pox on computer systems. 
ACM-SIAM Algorithmic Principles of Computer Systems (APoCS20) 
https://www.siam.org/Conferences/CM/Main/apocs20
January 8, 2020
Hilton Salt Lake City Center, Salt Lake City, Utah, USA
Colocated with SODA, SOSA, and Alenex 
The First ACM-SIAM APoCS is sponsored by SIAM SIAG/ACDA and ACM SIGACT. 
Important Dates:  

  • August 9: Abstract Submission and Paper Registration Deadline
  • August 16: Full Paper Deadline
  • October 4: Decision Announcement

Program Chair: Bruce Maggs, Duke University and Akamai Technologies 
Submissions: Contributed papers are sought in all areas of algorithms and architectures that offer insight into the performance and design of computer systems.  Topics of interest include, but are not limited to algorithms and data structures for: 

  • Databases
  • Compilers
  • Emerging Architectures
  • Energy Efficient Computing
  • High-performance Computing
  • Management of Massive Data
  • Networks, including Mobile, Ad-Hoc and Sensor Networks
  • Operating Systems
  • Parallel and Distributed Systems
  • Storage Systems

A submission must report original research that has not previously been published and is not concurrently being published. Manuscripts must not exceed twelve (12) single-spaced double-column pages, in addition to the bibliography and any pages containing only figures.  Submissions must be self-contained, and any extra details may be submitted in a clearly marked appendix. 
Steering Committee: 

  • Michael Bender
  • Guy Blelloch
  • Jennifer Chayes
  • Martin Farach-Colton (Chair)
  • Charles Leiserson
  • Don Porter
  • Jennifer Rexford
  • Margo Seltzer

Tuesday, March 26, 2019

On PC submissions at SODA 2020

SODA 2020 (in SLC!!) is experimenting with a new submission guideline: PC members will be allowed to submit papers. I had a conversation about this with Shuchi Chawla (the PC chair) and she was kind enough (thanks Shuchi!) to share the guidelines she's provided to PC members about how this will work.


SODA is allowing PC members (but not the PC chair) to submit papers this year. To preserve the integrity of the review process, we will handle PC member submissions as follows. 
1. PC members are required to declare a conflict for papers that overlap in content with their own submissions (in addition to other CoI situations). These will be treated as hard conflicts. If necessary, in particular if we don't have enough confidence in our evaluation of a paper, PC members will be asked to comment on papers they have a hard conflict with. However, they will not have a say in the final outcome for such papers.  
2. PC submissions will receive 4 reviews instead of just 3. This is so that we have more confidence on our evaluation and ultimate decision. 
3. We will make early accept/reject decisions on PC members submissions, that is, before we start considering "borderline" papers and worrying about the total number of papers accepted. This is because the later phases of discussion are when subjectivity and bias tend to creep in the most. 
4. In order to be accepted, PC member submissions must receive no ratings below "weak accept" and must receive at least two out of four ratings of "accept" or above.  
5. PC member submissions will not be eligible for the best paper award.

My understanding is that this was done to solve the problem of not being able to get people to agree to be on the PC - this year's PC has substantially more members than prior years.

And yet....

Given all the discussion about conflicts of interest, implicit bias, and double blind review, this appears to be a bizarrely retrograde move, and in fact one that sends a very loud message that issues of implicit bias aren't really viewed as a problem. As one of my colleagues put it sarcastically when I described the new plan:

"why don't they just cut out the reviews and accept all PC submissions to start with?"
and as another colleague pointed out:

"It's mostly ridiculous that they seem to be tying themselves in knots trying to figure out how to resolve COIs when there's a really easy solution that they're willfully ignoring..."

Some of the arguments I've been hearing in support of this policy frankly make no sense to me.

First of all, the idea that a more heightened scrutiny of PC papers can alleviate the bias associated with reviewing papers of your colleagues goes against basically all of what we know about implicit bias in reviewing. The most basic tenet of human judgement is that we are very bad at filtering our own biases and this only makes it worse. The one thing that theory conferences (compared to other venues) had going for them regarding issues of bias was that PC members couldn't submit papers, but now....

Another claim I've heard is that the scale of SODA makes double blind review difficult. It's hard to hear this claim without bursting out into hysterical laughter (and from the reaction of the people I mentioned this to, I'm not the only one).  Conferences that manage with double blind review (and PC submissions btw) are at least an order of magnitude bigger (think of all the ML conferences). Most conference software (including EasyChair) is capable of managing the conflicts of interest without too much trouble. Given that SODA (and theory conferences in general) are less familiar with this process, I’ve recommended in the past that there be a “workflow chair” whose job it is to manage the unfamiliarity associated with dealing with the software. Workflow chairs are common at bigger conferences that typically deal with 1000s of reviewers and conflicts.

Further, as a colleague points out, what one should really be doing is "aligning nomenclature and systems with other fields: call current PC as SPC or Area Chairs, or your favorite nomenclature, and add other folks as reviewers. This way you (i) get a list of all conflicts entered into the system, and (ii) recognize the work that the reviewers are doing more officially as labeling the PC members. "


Changes in format (and culture) take time, and I'm still hopeful that the SODA organizing team  will take a lesson from ESA 2019  (and their own resolution to look at DB review more carefully that was passed a year or so ago) and consider exploring DB review. But this year's model is certainly not going to help.

Update: Steve Blackburn outlines how PLDI handles PC submissions (in brief, double blind + external review committee)

Update: Michael Ekstrand takes on the question that Thomas Steinke asks in the comments below: "How is double blind review different from fairness-through-blindness?".

Tuesday, February 19, 2019

OpenAI, AI threats, and norm-building for responsible (data) science

All of twitter is .... atwitter?... over the OpenAI announcement and partial non-release of code/documentation for a language model that purports to generate realistic-sounding text from simple prompts. The system actually addresses many NLP tasks, but the one that's drawing the most attention is the deepfakes-like generation of plausible news copy (here's one sample).

Most consternation is over the rapid PR buzz around the announcement, including somewhat breathless headlines (that OpenAI is not responsible for) like

OpenAI built a text generator so good, it’s considered too dangerous to release
or
Researchers, scared by their own work, hold back “deepfakes for text” AI
There are concerns that OpenAI is overhyping solid but incremental work, that they're disingenuously allowing for overhyped coverage in the way they released the information, or worse that they're deliberately controlling hype as a publicity stunt.

I have nothing useful to add to the discussion above: indeed, see posts by Anima Anandkumar, Rob Munro, Zachary Lipton and Ryan Lowe for a comprehensive discussion of the issues relating to OpenAI.  Jack Clark from OpenAI has been engaging in a lot of twitter discussion on this as well.

But what I do want to talk about are the larger issues around responsible science that this kerfuffle brings up. Caveat, as Margaret Mitchell puts it in this searing thread.

To understand the kind of "norm-building" that needs to happen here, let's look at two related domains.

In computer security, there's a fairly well-established model for finding weaknesses in systems. An exploit is discovered, the vulnerable entity is given a chance to fix it, and then the exploit is revealed, often simultaneously with patches that rectify it. Sometimes the vulnerability isn't easily fixed (see Meltdown and Spectre). But it's still announced.

A defining characteristic of security exploits is that they are targeted, specific and usually suggest a direct patch. The harms might be theoretical, but are still considered with as much seriousness as the exploit warrants.

Let's switch to a different domain: biology. Starting from the sequencing of the human genome through the million-person precision medicine project to CRISPR and cloning babies, genetic manipulation has provided both invaluable technology for curing disease as well as grave ethical concerns about misuse of the technology. And professional organizations as well as the NIH have (sometimes slowly) risen to the challenge of articulating norms around the use and misuse of such technology.

Here, the harms are often more diffuse, and the harms are harder to separate from the benefits. But the harm articulation is often focused on the individual patient, especially given the shadow of abuse that darkens the history of medicine.

The harms with various forms of AI/ML technology are myriad and diffuse. They can cause structural damage to society - in the concerns over bias, the ways in which automation affects labor, the way in which fake news can erode trust and a common frame of truth, and so many others - and they can cause direct harm to individuals. And the scale at which these harms can happen is immense.

So where are the professional groups, the experts in thinking about the risks of democratization of ML, and all the folks concerned about the harms associated with AI tech? Why don't we have the equivalent of the Asilomar conference on recombinant DNA?

I appreciate that OpenAI has at least raised the issue of thinking through the ethical ramifications of releasing technology. But as the furore over their decision has shown, no single imperfect actor can really claim to be setting the guidelines for ethical technology release, and "starting the conversation" doesn't count when (again as Margaret Mitchell points out) these kinds of discussions have been going on in different settings for many years already.

Ryan Lowe suggests workshops at major machine learning conferences. That's not a bad idea. But it will attract the people who go to machine learning conferences. It won't bring in the journalists, the people getting SWAT'd (and in one case killed) by fake news, the women being harassed by trolls online with deep-fake porn images. 

News is driven by news cycles. Maybe OpenAI's announcement will lead to us thinking more about issues of responsible data science. But let's not pretend these are new, or haven't been studied for a long time, or need to have a discussion "started".


Monday, January 28, 2019

FAT* Session 2: Systems and Measurement.

Building systems that have fairness properties and monitoring systems that do A/B testing on us.

Session 2 of FAT*: my opinionated summary.

Sunday, January 27, 2019

FAT* blogging

I'll be blogging about each session of papers from the FAT* Conference. So as not to clutter your feed, the posts will be housed at the fairness blog that I co-write along with Sorelle Friedler and Carlos Scheidegger.

The first post is on Session 1: Framing and Abstraction.

Thursday, December 20, 2018

The theoryCS blog aggregator REBORN

(will all those absent today please email me)

(if you can't hear me in the back, raise your hand)

The theoryCS blog aggregator is back up and running at its new location -- cstheory-feed.org -- which of course you can't know unless you're subscribed to the new feed, which....

More seriously, we've announced this on the cstheory twitter feed as well, so feel free to repost this and spread the word so that all the theorists living in caves plotting their ICML, COLT and ICALP submissions will get the word. 

Who's this royal "we"? Arnab Bhattacharyya and myself (well mostly Arnab :)). 

For anyone interested in the arcana of how the sausage (SoCG?) gets made, read on: 

Arvind Narayanan had set up an aggregator based on the Planet Venus software for feed aggregation (itself based on python packages for parsing feeds). The two-step process for publishing the aggregator works as follows:
  1. Run the software to generate the list of feed items and associated pages from a configuration file containing the list of blogs
  2. Push all the generated content to the hosting server. 
Right now, both Arnab and I have git access to the software and config files and can edit the config to update blogs etc. The generator is run once an hour and the results are pushed to the new server. 
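
For the curious, here's a minimal sketch of what that two-step publish could look like as a script. To be clear, this is not our actual setup: the config file name, output directory, remote host, and the use of rsync for the push are all made-up placeholders, and I'm assuming the generator is invoked as planet.py with a config file as its argument. Scheduling (the once-an-hour part) would be handled separately by something like cron.

```python
import subprocess
from typing import Callable, List, Sequence

# All of these values are hypothetical placeholders for illustration;
# the post does not specify the real file names, paths, or host.
CONFIG_FILE = "planet-config.ini"
OUTPUT_DIR = "output/"
REMOTE_TARGET = "user@example.org:/var/www/aggregator/"


def build_steps(
    config: str = CONFIG_FILE,
    output_dir: str = OUTPUT_DIR,
    remote: str = REMOTE_TARGET,
) -> List[List[str]]:
    """Return the two commands of the publish process, in order."""
    # Step 1: generate feed items and pages from the list of blogs
    # in the config file (assumed invocation of the generator).
    generate = ["python", "planet.py", config]
    # Step 2: push the generated content to the hosting server
    # (rsync is an assumption; any copy mechanism would do).
    push = ["rsync", "-az", "--delete", output_dir, remote]
    return [generate, push]


def run_checked(cmd: Sequence[str]) -> None:
    """Run one command and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), check=True)


def publish(run: Callable[[Sequence[str]], None] = run_checked) -> None:
    """Run generate, then push; a failed generate stops before the push."""
    for cmd in build_steps():
        run(cmd)
```

Calling publish() would run both steps for real; passing a different run function (say, one that just prints each command) gives a dry run, which is handy when editing the config.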

So if you have updates or additions, either of us can make the changes and they should be reflected fairly soon on the main page. The easiest way to verify this is to wait a few hours, reload the page and see if your changes have appeared. 

The code is run off a server that Arnab controls and both of us have access to the domain registry. I say this in the interest of transparency (PLUG!!) but also so that if things go wonky as they did earlier, the community knows who to reach. 

Separately, I've been pleasantly surprised at the level of concern and anxiety over the feed -- mainly because it shows what a valuable community resource the feed is and that I'm glad to be one of the curators. 

If you've read this far, then you really are interested in the nitty gritty, and so if you'd like to volunteer to help out, let us know. It would be useful, e.g., to have a volunteer in Europe so that we have different time zones covered when things break. And maybe our central Politburo (err. I mean the committee to advance TCS) might also have some thoughts, especially in regard to their mission item #3:
To promote TCS to and increase dialog with other research communities, including facilitating and coordinating the development of materials that educate the general scientific community and general public about TCS.

Thursday, December 06, 2018

The theoryCS aggregator

As you all might know, the cstheory blog aggregator is currently down. Many people have been wondering what's going on and when it will be back up so here's a short summary.

The aggregator has been thus far maintained by Arvind Narayanan who deserves a HUGE thanks for setting up the aggregator, lots of custom code and the linked twitter account. Arvind has been planning to hand it over and the domain going down was a good motivator for him to do that.

Currently I have all the code that is used to generate the feed, as well as control over the twitter feed. Arnab Bhattacharyya has kindly volunteered to be the co-manager of the aggregator. What remains to be done now is

  • set up a new location to run the aggregator code from
  • set up hosting for the website
  • link this to the twitter account. 
None of these seem too difficult and the main bottleneck is merely having Arnab and me put together a few hours of work to get this all organized (we have a domain registered already). We hope to have it done fairly soon so you can all get back to reading papers and blogs again. 

Saturday, November 24, 2018

Should credit scores be used for determining residency?

It's both exhilarating and frustrating when you see the warnings in papers you write play out in practice. Case in point, the proposal by DHS to use credit scores to ascertain whether someone should be granted legal residence.

Josh Lauer at Slate does a nice analysis of the proposal and I'll extract some relevant bits for commentary. First up: what does the proposal call for? (emphasis mine)
The new rule, contained in a proposal signed by DHS Secretary Kirstjen Nielsen, is designed to help immigration officers identify applicants likely to become a “public charge”—that is, a person primarily dependent on government assistance for food, housing, or medical care. According to the proposal, credit scores and other financial records (including credit reports, the comprehensive individual files from which credit scores are generated) would be reviewed to predict an applicant’s chances of “self-sufficiency.”
So what's the problem with this? What we're seeing is an example of the portability trap (from our upcoming FAT* paper). Specifically, scores designed in a different context (for deciding who to give loans to) are being used in this context (to determine self-sufficiency). Why is this a problem?
Unfortunately, this is not what traditional credit scores measure. They are specialized algorithms designed for one purpose: to predict future bill-paying delinquencies, for any reason. This includes late payments or defaults caused by insurmountable medical debts, job loss, and divorce—three leading causes of personal bankruptcy—as well as overspending and poor money management.
That is, the reason the portability trap is a problem is that you're using one predictor to train another system. And if you're trying to make any estimations about the validity of the resulting process, then you have to know whether the thing you're observing (in this case the credit score) has any relation to the thing you're trying to observe (the construct of "self-sufficiency"). And this is something we harp on a lot in our paper on axiomatic considerations of fairness (and ML in general).

And in this case there's a clear disconnect:
Credit scores do not predict whether an individual will become a public charge. And they do not predict financial self-sufficiency. They are only useful in this context if one believes credit scores reveal something about a person’s character. In other words, if one believes that people with low credit scores are moochers and malingerers. Given the Trump administration’s hostility toward (brown-skinned) immigrants, this conflation of credit scores and morality is not surprising.
And this is a core defining principle of our work: that beliefs about the world control how we choose our representations and learning procedures: the procedures cannot be justified except in the context of the beliefs that underpin them. 

I think that if you read anything I've written, it will be clear where I stand on the normative question of whether this is a good idea (tl;dr: NOT). But as a researcher, it's important to lay out a principled reason for why, and this sadly merely confirms that our work is on the right track.


Friday, November 02, 2018

What do I work on ?

So, what do you work on? 

As questions go, this is one of the most rudimentary. It's the conference equivalent of "Nice weather we're having", or "How about them Broncos!". It's a throat-clearer, designed to start a conversation in an easy non-controversial way. 

And yet I'm always having to calculate and calibrate my answers. There's a visible pause, a hesitation as I quickly look through my internal catalog of problems and decide which one I'll pull out. On the outside, the hesitation seems strange: as if I don't quite know what I work on, or if I don't know how to explain it. 

It's an occupational hazard that comes from living on the edge of many different areas. I go to data mining conferences, machine learning conferences, theory/geometry conferences, and (now) conferences on ethics, society and algorithms. And in each place I have a different circle of people I know, and a different answer to the question

So, what do you work on?  

It makes me uncomfortable, even though it shouldn't. I feel like I can only share a part of my research identity because otherwise my answer will make no sense or (worse!) seem like I'm trying to impress people with incomprehensible words. 

I don't doubt that most people share some form of this feeling. As researchers, none of us are one-dimensional, and most of us work on many different problems at a time. Probably the easiest answer to the question is the problem that one has most recently worked on. But I sense that my case is a little unusual: not the breadth per se, but the range of topics (and styles of problem solving) that I dabble in. 

So, what do you work on? 

I often joke that my research area is a random walk through computer science and beyond. I started off in geometry, dabbled with GPUs (alas, before they were popular), found my way into information theory and geometry (and some differential geometry), slipped down the rabbit hole into data mining, machine learning, and a brief side foray into deep learning, and then built a nice little cottage in algorithmic fairness, where I spend more time talking to social scientists and lawyers than computer scientists.

Being an academic nomad has its virtues: I don't really get bored with my work. But it also feels like I'm always starting from square one with my learning and that there are always people who know way more about every topic than I do. And my academic roamings seem to mirror my actual nomadic status. I'm a foreigner in a land that gets stranger and less familiar by the day, and the longest time I've spent in any location is the place I'm in right now.



So, what do you work on? 

Maybe, in a way that's so American, "What do you work on" is really a question of "Who are you" in the way we bind together our work and our identity. When my students come and ask me what they should work on, what they're really asking me is to tell them what their research identity is, and my answer usually is, "whatever you want it to be right now". It's a frustrating answer no doubt, but I feel that it lowers the import of the question to a manageable level. 

So, what DO you work on?

I do algorithmic fairness, and think about the ethics of automated decision-making. I bring an algorithmic (and geometric) sensibility to these questions. I'm an amateur computational philosopher, a bias detective, an ML-translator for lawyers and policy folk, and my heart still sings when I see a beautiful lemma. 


Monday, October 22, 2018

On teaching ethics to tech companies

Kara Swisher (who is unafraid to call it like it is!) has a new op-ed in the NYT titled "Who will teach Silicon Valley to be ethical". She asks
How can an industry that, unlike other business sectors, persistently promotes itself as doing good, learn to do that in reality? Do you want to not do harm, or do you want to do good? These are two totally different things. 
And how do you put an official ethical system in place without it seeming like you’re telling everyone how to behave? Who gets to decide those rules anyway, setting a moral path for the industry and — considering tech companies’ enormous power — the world.

There are things that puzzle me about this entire discussion about ethics and tech. It seems like an interesting idea for tech companies to incorporate ethical thinking into their operations. Those of us who work in this space are clamoring for more ethics education for budding technologists.

There is of course the cynical view that this is merely window dressing to make it look like Big Tech (is that a phrase now?) cares without actually having to change their practices.

But let's put that aside for a minute. Suppose we assume that indeed tech companies are (in some shape or form) concerned about the effects of technology on society and that their leaders do want to do something about it.

What I really don't understand is the idea that we should teach Silicon Valley to be ethical. This seems to play into the overarching narrative that tech companies are trying to do good in the world and slip up because they're not adults yet -- a problem that can be resolved by education that will allow them to be good "citizens" with upstanding moral values.

This seems rather ridiculous. When chemical companies were dumping pesticides on the land by the ton and Rachel Carson wrote Silent Spring, we didn't shake our heads sorrowfully at companies and send them moral philosophers. We founded the EPA!

When the milk we drink was being adulterated with borax and formaldehyde and all kinds of other horrific additives that Deborah Blum documents so scarily in her new book 'The Poison Squad', we didn't shake our heads sorrowfully at food vendors and ask them to grow up. We passed a law that led eventually to the formation of the FDA.

Tech companies are companies. They are not moral agents, or even immoral agents. They are amoral profit-maximizing vehicles for their shareholders (and this is not even a criticism). Companies are supposed to make money, and do it well. Facebook's stock price didn't slip when it was discovered how their systems had been manipulated for propaganda. It slipped when they proposed changes to their newsfeed ratings mechanisms to address these issues.

It makes no sense to rely on tech companies to police themselves, and to his credit, Brad Smith of Microsoft made exactly this point in a recent post on face recognition systems. Regulation, policing and whatever else we might imagine, has to come from the outside. While I don't claim that regulation mechanisms all work as they are currently conceived, the very idea of checks and balances seems more robust than merely hoping that tech companies will get their act together on their own.

Don't get me wrong. It's not even clear what has to be regulated here. Unlike with poisoned food or toxic chemicals, it's not clear how to handle poisonous speech or toxic propaganda. And that's a real discussion we need to have.

But let's not buy into Silicon Valley's internal hype about "doing good". Even Google has dropped its "Don't be evil" credo.

Thursday, October 11, 2018

Google's analysis of the dilemma of free speech vs hate speech

Breitbart just acquired a leaked copy of an internal google doc taking a cold hard look at the problems of free speech, fake news and censorship in the current era. I wrote a tweet storm about it, but also wanted to preserve it here because tweets, once off the TL, cease to exist.

Breitbart acquired an internal google doc discussing the misinformation landscape that the world finds itself in now: https://www.scribd.com/document/390521673/The-Good-Censor-GOOGLE-LEAK#from_embed … 
I almost wish that Google had put out this document to read in public. It's a well thought out exploration of the challenges faced by all of us in dealing with information dissemination, fake news, censorship and the like. And to my surprise, it (mostly) is willing to point fingers backwards at Google and other tech companies for their role in it (although there are some glaring omissions like the building of the new censored search tool in China). It's not surprising that people inside Google are thinking carefully about these issues, even as they flail around in public. And the analysis is comprehensive without attempting to provide glib solutions.

Obviously, since this is a doc generated within Google, the space of solutions is circumscribed to those that have tech as a major player. For example, the idea of publicly run social media isn't really on the table, nor are better ways to decentralize value assignment for news, or alternate models for search that don't require a business model. But with those caveats in mind, the analysis of the problems is reasonable.

Monday, October 08, 2018

A new sexual harassment policy for TCS conferences.

One of my most visited posts is the anonymous post by a theoryCS colleague describing her own #metoo moments inside the TCS conference circuit. It was a brutal and horrific story to read.

Concurrently (I don't know if the blog post had an effect, but one can but hope it helped push things along), a committee was set up under the auspices of TCMF (FOCS), ACM, SIAM, and EATCS to
Draft a proposal for joint ToC measures to combat discrimination, harassment, bullying, and retaliation, and all matters of ethics that might relate to that.
That committee has now completed its work, and a final report is available. The report was also endorsed at the FOCS business meeting this week. The report is short, and you should read it. The main takeaways/recommendations are that every conference should
  • adopt a code of conduct and post it clearly. 
  • recruit and train a group of advocates to provide confidential support to those facing problems at a conference
  • have mechanisms for authors to declare a conflict of interest without needing to be openly specific about the reasons. 
There are many useful references in the report, as well as more concrete suggestions about how to implement the above recommendations. This committee was put together fast, and generated a very useful report quickly. Well done!
