August 20, 2025

Thoughts on the monofact rate

Having some thoughts about the monofact rate: the fraction of facts that appear exactly once in the training data, which empirically correlates with the rate of hallucinated facts in LLMs. Post-training that reduces hallucination rates seems to increase calibration error.
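
To make the quantity concrete, here is a minimal sketch of computing a monofact rate on a toy corpus. It assumes a Good-Turing-style definition (facts seen exactly once, divided by the total number of fact observations) and that facts have already been extracted and normalized into comparable strings. The `monofact_rate` helper and the toy `corpus` are hypothetical names for illustration, not taken from either paper.

```python
from collections import Counter


def monofact_rate(facts):
    """Fraction of observations whose fact appears exactly once.

    Assumption: `facts` is a list of already-normalized fact strings,
    one entry per occurrence in the training data.
    """
    if not facts:
        raise ValueError("need at least one fact observation")
    counts = Counter(facts)
    # A "monofact" is a fact whose count in the data is exactly 1.
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)


# Toy corpus: "b" and "d" appear once, "a" twice, "c" three times.
corpus = ["a", "a", "b", "c", "c", "c", "d"]
rate = monofact_rate(corpus)
print(f"monofact rate: {rate:.3f}")
```

On real pretraining data the hard part is the fact extraction step, which this sketch skips entirely.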

Why I like this (or more accurately why this is twigging my research sense):

A weird property of the data relates to an alignment-relevant property of the model.
Theory from 2024 (Kalai and Vempala 2024) was empirically validated in 2025 (Miao and Kearns 2025). Wow! Theory of impact much?
There seems to be little (or no) attention on this beyond the two papers I cited?
It seems to have some impactful/tractable research directions:
- Hallucinations can be bad. We want models to tell the truth if we are using them for auto-alignment stuff. They suggest an intervention that injects facts, which seems like something we could improve on.
- Maybe we want models to hallucinate? One wishful thought is to seed models with tons of monofacts about harmful capability X so that the model is totally untrustworthy for users on that topic and fine-tuning it becomes harder.
  - Monofact poisoning?
- Tells us about certain engineering practices in the data pipeline like deduplication.
- The empirical validation could use some work. They work on either (basically toy) classical n-gram models or a 220M-parameter seq-to-seq model from 2020. Try bigger!
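
On the deduplication point above, a toy sketch of why the two interact: exact deduplication collapses every repeated fact to a single occurrence, so every surviving fact becomes a monofact. This assumes exact string-level dedup and the same Good-Turing-style rate (singletons over total observations); `monofact_rate`, `exact_dedup`, and the toy corpus are hypothetical, and real pipelines dedup documents rather than facts, so treat it as intuition only.

```python
from collections import Counter


def monofact_rate(facts):
    """Singletons divided by total observations (assumed definition)."""
    counts = Counter(facts)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(facts)


def exact_dedup(facts):
    """Keep the first occurrence of each fact, preserving order."""
    return list(dict.fromkeys(facts))


# Toy corpus: two well-attested facts and two facts seen only once.
corpus = ["paris", "paris", "paris", "obscure_1", "obscure_2", "tokyo", "tokyo"]

before = monofact_rate(corpus)
after = monofact_rate(exact_dedup(corpus))
print(f"before dedup: {before:.3f}, after dedup: {after:.3f}")
```

Whether this effect matters at the document level, where dedup actually happens, is exactly the kind of thing that would need checking.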

Related work

Zucchet et al. 2025 seems related.
