On a challenging multi-hop Q&A benchmark released on @huggingface, we observe that, compared to RAG, long-context, and finetuning baselines, self-taught models show stronger closed-book multi-document reasoning, less forgetting, and better in-context reasoning.