Eval awareness about being eval aware (EA)?
Playing around with LLMs, I've noticed that they often guess they are in an evaluation designed specifically to test eval awareness (I'll call this an EA eval from now on). I think this kind of second-level conditioning might elicit a different response from the one a model gives when it merely believes it is in an evaluation. I might spend some time working out how I would (1) measure that and (2) contrast it against other controls. My guess is that the effect depends heavily on the most recent question, and that asking the model whether it thinks it's in an eval primes it to think it's in an EA eval.
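One rough way to make (1) and (2) concrete is to compare how often the model guesses "EA eval" when it was just asked whether it is in an eval against a control where that question is not the most recent turn. The sketch below assumes each transcript has already been hand-labelled as an EA-eval guess or not. The `ConditionCounts` schema, the condition names, and the choice of a two-proportion z-test are all illustrative assumptions, not an established protocol, and the counts in the usage example are placeholders rather than results.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import erf, sqrt


@dataclass
class ConditionCounts:
    """Hand-labelled outcomes for one prompt condition (hypothetical schema)."""

    name: str
    n_trials: int    # transcripts collected under this condition
    n_ea_guess: int  # transcripts where the model guessed "EA eval"

    @property
    def rate(self) -> float:
        """Fraction of transcripts labelled as an EA-eval guess."""
        return self.n_ea_guess / self.n_trials


def two_proportion_z(a: ConditionCounts, b: ConditionCounts) -> tuple[float, float]:
    """Return (z, two-sided p) for the difference in EA-guess rates, a minus b."""
    pooled = (a.n_ea_guess + b.n_ea_guess) / (a.n_trials + b.n_trials)
    se = sqrt(pooled * (1 - pooled) * (1 / a.n_trials + 1 / b.n_trials))
    if se == 0:
        # Both conditions are all-yes or all-no, so there is no difference to test.
        return 0.0, 1.0
    z = (a.rate - b.rate) / se
    # Standard normal CDF via erf, then the two-sided tail probability.
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p


if __name__ == "__main__":
    # Placeholder counts for illustration only, not real measurements.
    primed = ConditionCounts("asked_if_in_eval", n_trials=100, n_ea_guess=60)
    control = ConditionCounts("unrelated_last_turn", n_trials=100, n_ea_guess=20)
    z, p = two_proportion_z(primed, control)
    print(f"{primed.name}: {primed.rate:.2f}, {control.name}: {control.rate:.2f}")
    print(f"z = {z:.2f}, p = {p:.3g}")
```

If the priming guess is right, the gap between the two rates should shrink as more unrelated turns are placed between the "are you in an eval?" question and the point where the model's guess is elicited, so further conditions could be added with the same schema.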