> Transformers are the dominant architecture in AI, yet why they work remains poorly understood. This paper offers a precise answer: a transformer is a Bayesian network.
Why would being a Bayesian network explain why transformers work? Bayesian networks existed long before transformers and never achieved their performance.
A Bayesian network is a very general concept: it applies to any multidimensional probability distribution. It's a directed graph that encodes conditional independences between variables. Ish.
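To make that concrete, here's a toy sketch (my own, not from the paper): a chain-structured network A → B → C over binary variables. The graph asserts that C is independent of A given B, so the joint factorizes as P(a, b, c) = P(a)·P(b|a)·P(c|b).

```python
import itertools

# Toy Bayesian network over three binary variables with chain structure A -> B -> C.
# The missing edge A -> C encodes the conditional independence C ⊥ A | B,
# so the joint factorizes as P(a, b, c) = P(a) * P(b | a) * P(c | b).

p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}  # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # p_c_given_b[b][c]

def joint(a, b, c):
    """Joint probability read off the network's factorization."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Any joint distribution admits some Bayesian-network factorization (e.g. the
# fully connected chain rule); the graph is informative precisely when it
# omits edges, i.e. asserts independences.
total = sum(joint(a, b, c) for a, b, c in itertools.product([0, 1], repeat=3))
print(round(total, 10))  # a valid distribution sums to 1
```

The fully connected graph represents any distribution trivially, which is why "transformers are Bayesian networks" is only interesting if the paper identifies a graph with meaningful structure.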
I haven't taken the time to review the paper, but if the claim stands, it gives us another tool in the toolbox for understanding transformers.