Podcast Lesson
"Optimize for inference cost when users massively outnumber trainers Meta made the unusual decision to train Llama 3 on far more data than is theoretically optimal for a single training run, because their serving scale inverts normal priorities: "our ratio of inference compute required to training is probably much higher than most other companies that are doing this stuff just because the sheer volume of the community that we're serving." Even with the 70-billion model, "by the end it was still learning" — they stopped not because the model stopped improving but because they needed to move on. Whenever you deploy something to millions of users, optimize for the cost of running it at scale, not just the cost of building it. Source: Mark Zuckerberg, Dwarkesh Patel Podcast, Llama 3, Meta AI, Future of AI"
Dwarkesh Podcast
Dwarkesh Patel
"Mark Zuckerberg — Llama 3, $10B models, Caesar Augustus, & 1 GW datacenters"
⏱ 24:00 into the episode
Why This Lesson Matters
This insight is one of the core ideas explored in "Mark Zuckerberg — Llama 3, $10B models, Caesar Augustus, & 1 GW datacenters" on the Dwarkesh Podcast, and it applies directly to anyone shipping models at scale: the economics of a deployed model are dominated by serving, not training. The timestamp above takes you to the moment this was said, so you can hear it in context.
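To make the trade-off concrete, here is a back-of-the-envelope sketch using the common approximations of roughly 6·N·D FLOPs to train a model with N parameters on D tokens, and roughly 2·N FLOPs per generated token at inference. The parameter counts and token volumes below are illustrative assumptions for comparing a near-compute-optimal larger model against an over-trained smaller one; they are not Meta's actual figures.

```python
# Back-of-the-envelope comparison: a ~compute-optimal big model vs. an
# over-trained smaller model, using the standard approximations:
#   training FLOPs  ~ 6 * N * D   (N = parameters, D = training tokens)
#   inference FLOPs ~ 2 * N       per generated token
# All numbers below are illustrative assumptions, not Meta's figures.

def total_flops(params: float, train_tokens: float, served_tokens: float) -> float:
    """Lifetime compute: one training run plus all inference traffic."""
    training = 6 * params * train_tokens
    inference = 2 * params * served_tokens
    return training + inference

big   = {"params": 140e9, "train_tokens": 2.8e12}  # near compute-optimal (D ~ 20N)
small = {"params": 70e9,  "train_tokens": 15e12}   # trained far past D ~ 20N

# Sweep the lifetime volume of tokens served to users.
for served in (0.0, 1e12, 100e12):
    big_cost   = total_flops(big["params"],   big["train_tokens"],   served)
    small_cost = total_flops(small["params"], small["train_tokens"], served)
    print(f"served={served:.0e}  big={big_cost:.2e}  small={small_cost:.2e}  "
          f"smaller model wins: {small_cost < big_cost}")
```

Under these assumptions the smaller model costs more to train (about 6.3e24 vs. 2.4e24 FLOPs) but half as much per served token, so the crossover lands near 28 trillion lifetime served tokens; beyond that, the over-trained smaller model is cheaper in total, which is exactly the regime a platform serving billions of users operates in.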