2/ Gemini’s performance, and its comparison to other models, is inconsistently presented (w/ 0-shot, X-shot, variable-shot, CoT@32, …) across 32 benchmarks that are not equally robust, are fairly limited as evaluations of intelligence, and are mostly obscure to a non-tech audience.
8/n