Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I would like to counteract your statement that each token adds a distraction.

In our experiments, we see a surprising benefit to rewriting blocks to use more tokens, especially long lists etc..

E.g. compare these two options

"The following conditions are excluded from your contract - condition A - condition B ... - condition Z"

The next one works better for us:

"The following conditions are excluded from your contract - condition A is excluded - condition B is excluded ... - condition Z is excluded"

And we now have scripts to rewrite long documents like this, explicitly adding more tokens. Would you have any opinion on this?



This observation makes sense, because all models currently probably use some kind of a sparse attention architecture.

So the closer the two related pieces of information are to each other in the input context, the larger the chance their relationship will be preserved.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: