Taste = saying things that are obvious
Taste = saying things that are obvious. Geoffrey Irving, who may be the highest-taste person I interact with semi-regularly, has repeatedly said that he does not consider his takes to be special; he just says things that seem obvious to him. And yet these takes are pretty high value for me, and often change how I think about things. It might be that I am insufficiently clear-sighted, or that I am overhyping those takes because I think Geoffrey is very smart. But I think that is unlikely: on more than a few projects, I decided that the direction Geoffrey suggested wasn't really what I wanted to pursue, and it ended up being the important thing anyway.
For example, in LURE, Igor and I considered Geoffrey's idea of replaying production data cheaply for eval awareness, but we first came at the problem from the other direction, with a pairwise comparison of agentic transcripts. Only after a few months of iterating on that other idea did we find that replay worked really well. In misalignment quarantining, Geoffrey suggested starting with a principled search over the design space of a constitution/declarative midtraining document. We went with the persona thing instead and, in my estimation, insufficiently sketched out the design space. Now we have some pretty good results, but they are a bit hard to explain and confusing to interpret mechanistically; in hindsight, we would be less fuzzy about things had we sketched out the design space first. “Language is Enough,” a Google Doc written by Geoffrey, argues that language alone is a sufficiently rich environment to get to AGI, and was so influential that it was mentioned by name in Demis Hassabis' autobiography; we are seeing the truth of that argument play out before our eyes.
So it seems clear to me that neither I nor my collaborators are being obviously wrong or clouded, and that Geoffrey does indeed have taste. But Geoffrey feels that his takes are not special.
How to square these things? My guess is that saying something that is obvious is not easy! It requires you to work through the world in your head and square your background assumptions. This is how research should be, but typically it is not, because you are carrying erroneous assumptions. Part of it is the crispness of the language used to describe the thing you want to say. Part of it is observing recurring research themes (e.g., in “Language is Enough,” Geoffrey writes “a portfolio strategy is crucial,” and the alignment team has now indeed written up a portfolio strategy) and retaining an open disposition such that you work on great problems. But some of it is being humble and brave enough to say the thing that is obvious to you even if it seems not so special. You carry very different worlds of tacit knowledge from your interlocutor, and distilling that is a great and useful thing, as the only interdisciplinary conversations worth having are those that go on inside a single head. So, every now and then (perhaps Inkhaven-ish), I will make a deliberate effort to write non-experimental, non-theory, meta-research pieces about things that are obvious to me. Some bits are cruxy to me and I expect people will disagree with them; maybe those disagreements will be productive.