Thumbnails can move video engagement more than many algorithmic improvements do. Generating them at scale (millions per day) requires careful frame extraction, smart cropping, and a deliberate CDN strategy, and most teams underinvest until a quality regression forces the conversation.


Frame extraction strategy

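Don't extract from the first frame, which is often black or part of an intro. Instead, sample frames at 10%, 25%, 50%, and 75% of the duration, score each one with ML models (face detection, aesthetic score), and keep the top-scoring frame. With ffmpeg plus ML inference on a GPU, this takes roughly 50 ms per video.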
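A minimal sketch of the sampling and selection step follows. It computes the seek times, builds one ffmpeg command per sample (each command list can be passed to `subprocess.run`), and picks the best candidate. The `Candidate` fields and the 0.6/0.4 weighting are hypothetical stand-ins for whatever face-detection and aesthetic models you actually run; producing those scores is out of scope here.

```python
from dataclasses import dataclass
from typing import Sequence

# Sample points as fractions of the video duration; t=0 is deliberately excluded.
SAMPLE_FRACTIONS = (0.10, 0.25, 0.50, 0.75)


def sample_timestamps(duration_s: float,
                      fractions: Sequence[float] = SAMPLE_FRACTIONS) -> list:
    """Return seek times in seconds, skipping the (often black) first frame."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return [round(duration_s * f, 3) for f in fractions]


def ffmpeg_extract_cmd(src: str, t: float, out: str) -> list:
    """Build an ffmpeg command that writes a single frame at time t.

    Placing -ss before -i uses fast input seeking; -q:v 2 asks for high JPEG quality.
    """
    return ["ffmpeg", "-y", "-ss", f"{t:.3f}", "-i", src,
            "-frames:v", "1", "-q:v", "2", out]


@dataclass
class Candidate:
    timestamp: float
    face_score: float       # hypothetical: e.g. confidence of the largest face, 0..1
    aesthetic_score: float  # hypothetical: output of an aesthetic model, 0..1


def pick_best(cands: Sequence[Candidate], face_weight: float = 0.6) -> Candidate:
    """Return the candidate with the highest (assumed) linear blend of the two scores."""
    def score(c: Candidate) -> float:
        return face_weight * c.face_score + (1 - face_weight) * c.aesthetic_score
    return max(cands, key=score)
```

For a 120-second video, `sample_timestamps(120.0)` yields seek points at 12, 30, 60, and 90 seconds. A linear blend is only one option; some teams gate on face presence first and rank by aesthetics second.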

Smart-crop for variable aspect ratios

The source is usually 16:9, but the feed needs 1:1, mobile needs 9:16, and the hero slot needs 21:9. Use ML saliency detection so the crop keeps faces or the main subject in frame. Options include open_clip with a saliency model, AWS Rekognition, and GCP Vision.
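Once a model has produced a salient point, the crop geometry is straightforward. This sketch assumes a single (cx, cy) pixel coordinate, for example the center of the highest-confidence face box; the `saliency_crop` name and that single-point interface are illustrative and do not come from any of the tools above. It takes the largest crop of the target aspect ratio and shifts it toward the salient point, clamped so it never leaves the frame.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def saliency_crop(src_w: int, src_h: int, aspect_w: int, aspect_h: int,
                  cx: float, cy: float) -> Box:
    """Largest aspect_w:aspect_h crop centred as close to (cx, cy) as the frame allows."""
    target = aspect_w / aspect_h
    if src_w / src_h > target:
        # Source is wider than the target ratio: keep full height, trim width.
        h = src_h
        w = round(src_h * target)
    else:
        # Source is narrower (or equal): keep full width, trim height.
        w = src_w
        h = round(src_w / target)
    # Centre on the salient point, then clamp so the box stays inside the source.
    x = min(max(round(cx - w / 2), 0), src_w - w)
    y = min(max(round(cy - h / 2), 0), src_h - h)
    return Box(x, y, w, h)
```

For a 1920x1080 source with the subject at x=1500, the 1:1 crop is pushed against the right edge: `saliency_crop(1920, 1080, 1, 1, 1500, 540)` returns `Box(x=840, y=0, w=1080, h=1080)`. Real pipelines often blend several salient regions or add padding around faces; a single point keeps the sketch small.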


CDN delivery and cache

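Generate WebP and AVIF for modern clients, with a JPEG fallback. Serve from a CDN with long cache lifetimes and a version in the URL, so a regenerated thumbnail gets a new URL instead of requiring a cache purge. Generate lazily on the first request and warm popular videos asynchronously. A cache hit rate of around 95% is sustainable.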
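A sketch of the delivery side follows, assuming the page or API chooses the format when it builds the image URL. The `/thumbs/...` path scheme, the 12-character content hash used as the version, and the dict standing in for object storage are all hypothetical. Because the version is part of the URL, the bytes behind any given URL never change, so the response can be cached for a year and marked `immutable`.

```python
import hashlib

# Preferred formats in order; anything else falls back to JPEG.
FORMAT_PREFERENCE = (("image/avif", "avif"), ("image/webp", "webp"))

# One year, and "immutable" because a versioned URL's content never changes.
CACHE_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}


def _accepted_types(accept_header: str) -> set:
    """Parse an Accept header into the set of media types with q > 0."""
    types = set()
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        mime = fields[0].lower()
        q = 1.0
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        if mime and q > 0:
            types.add(mime)
    return types


def choose_format(accept_header: str) -> str:
    """Serve AVIF or WebP only when the client lists them explicitly; else JPEG."""
    offered = _accepted_types(accept_header)
    for mime, ext in FORMAT_PREFERENCE:
        if mime in offered:
            return ext
    return "jpg"


def content_version(image_bytes: bytes) -> str:
    """Short content hash; it changes whenever the thumbnail is regenerated."""
    return hashlib.sha256(image_bytes).hexdigest()[:12]


def thumbnail_url(video_id: str, variant: str, version: str, fmt: str) -> str:
    # Hypothetical path scheme with the version embedded in the URL.
    return f"/thumbs/{video_id}/{variant}/v{version}.{fmt}"


def get_or_generate(store: dict, key: str, generate) -> bytes:
    """Lazy generation: build on the first request, then serve the stored copy."""
    if key not in store:
        store[key] = generate()
    return store[key]
```

If you instead serve every format from a single URL and negotiate at the edge, the response must also carry `Vary: Accept` so the CDN caches each format separately. Async warming can reuse `get_or_generate` from a background job that walks a list of popular video IDs.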

The pipeline in one line: sample → score → crop → encode to multiple formats → serve from a CDN with long cache lifetimes. Most teams underinvest until A/B testing shows the impact.