Podcast Lesson
"Optimize for inference cost when users massively outnumber trainers Meta made the unusual decision to train Llama 3 on far more data than is theoretically optimal for a single training run, because their serving scale inverts normal priorities: "our ratio of inference compute required to training is probably much higher than most other companies that are doing this stuff just because the sheer volume of the community that we're serving." Even with the 70-billion model, "by the end it was still learning" — they stopped not because the model stopped improving but because they needed to move on. Whenever you deploy something to millions of users, optimize for the cost of running it at scale, not just the cost of building it. Source: Mark Zuckerberg, Dwarkesh Patel Podcast, Llama 3, Meta AI, Future of AI"
Dwarkesh Podcast
Dwarkesh Patel
"Mark Zuckerberg — Llama 3, $10B models, Caesar Augustus, & 1 GW datacenters"
⏱ 24:00 into the episode
Why This Lesson Matters
This insight is one of the core ideas explored in "Mark Zuckerberg — Llama 3, $10B models, Caesar Augustus, & 1 GW datacenters" on the Dwarkesh Podcast, and it applies directly to anyone shipping models at scale: the economics of a deployed model are dominated by serving, not training. The timestamp above takes you to the moment this was said, so you can hear it in context.
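To make the trade-off concrete, here is a back-of-the-envelope sketch using the common approximations of roughly 6·N·D FLOPs to train a model with N parameters on D tokens, and roughly 2·N FLOPs per generated token at inference. The parameter counts and token volumes below are illustrative assumptions for comparing a near-compute-optimal larger model against an over-trained smaller one; they are not Meta's actual figures.

```python
# Back-of-the-envelope comparison: a ~compute-optimal big model vs. an
# over-trained smaller model, using the standard approximations:
#   training FLOPs  ~ 6 * N * D   (N = parameters, D = training tokens)
#   inference FLOPs ~ 2 * N       per generated token
# All numbers below are illustrative assumptions, not Meta's figures.

def total_flops(params: float, train_tokens: float, served_tokens: float) -> float:
    """Lifetime compute: one training run plus all inference traffic."""
    training = 6 * params * train_tokens
    inference = 2 * params * served_tokens
    return training + inference

big   = {"params": 140e9, "train_tokens": 2.8e12}  # near compute-optimal (D ~ 20N)
small = {"params": 70e9,  "train_tokens": 15e12}   # trained far past D ~ 20N

# Sweep the lifetime volume of tokens served to users.
for served in (0.0, 1e12, 100e12):
    big_cost   = total_flops(big["params"],   big["train_tokens"],   served)
    small_cost = total_flops(small["params"], small["train_tokens"], served)
    print(f"served={served:.0e}  big={big_cost:.2e}  small={small_cost:.2e}  "
          f"smaller model wins: {small_cost < big_cost}")
```

Under these assumptions the smaller model costs more to train (about 6.3e24 vs. 2.4e24 FLOPs) but half as much per served token, so the crossover lands near 28 trillion lifetime served tokens; beyond that, the over-trained smaller model is cheaper in total, which is exactly the regime a platform serving billions of users operates in.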