Language of instruction

Instructions written in English often outperform instructions in the user's native language, because English dominates the training data mix. The trade-off: instructions written in the target language reduce accidental English leaking into the output.
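
One way to weigh this trade-off is to measure how much English leaks into the output under each instruction language. The sketch below is a crude heuristic, not an established metric: it only makes sense when the target language uses a non-Latin script, and the function name and sample outputs are made up for illustration.

```python
import re

# A token made only of ASCII letters, with optional trailing punctuation.
_LATIN_TOKEN = re.compile(r"[A-Za-z][A-Za-z'-]*[.,!?;:]?")


def latin_token_ratio(text: str) -> float:
    """Fraction of whitespace-separated tokens written in ASCII letters.

    Assumption: the target language uses a non-Latin script, so ASCII
    words in the output are treated as English leakage.
    """
    tokens = text.split()
    if not tokens:
        return 0.0
    latin = sum(1 for t in tokens if _LATIN_TOKEN.fullmatch(t))
    return latin / len(tokens)


# Hypothetical model outputs for a Russian-language task (invented samples).
outputs = {
    "english_instructions": "Конечно! Here is the summary: текст короткий.",
    "native_instructions": "Конечно! Вот краткое содержание: текст короткий.",
}

# Leak ratio per instruction variant; lower means fewer English tokens.
leak_by_variant = {name: latin_token_ratio(text) for name, text in outputs.items()}
```

Running the same check over a real evaluation set for both instruction variants shows whether the quality gain from English instructions is worth the extra leakage for a given language.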

Cross-lingual retrieval

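Multilingual embedding models (BGE-M3, e5-mistral) map text from different languages into a shared vector space, so a query in language A can retrieve documents in language B. For many language pairs, this native cross-lingual retrieval beats translate-then-retrieve.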
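
A minimal sketch of native cross-lingual retrieval: embed the query and the documents with the same model, then rank by cosine similarity. The `embed` callable is an assumption standing in for a real multilingual model such as BGE-M3; here it is a toy lookup table with hand-picked vectors so the example runs without any dependencies.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(
    query: str,
    docs: Sequence[str],
    embed: Callable[[str], Vector],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank docs by similarity to the query, with no translation step."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy stand-in for a multilingual embedding model (hypothetical vectors).
# A real model would place the German query near the English password doc.
TOY_VECTORS = {
    "Wie setze ich mein Passwort zurück?": [0.9, 0.1],
    "How to reset your password": [0.8, 0.2],
    "Shipping rates for Europe": [0.1, 0.9],
}


def toy_embed(text: str) -> Vector:
    return TOY_VECTORS[text]


results = retrieve(
    "Wie setze ich mein Passwort zurück?",
    ["Shipping rates for Europe", "How to reset your password"],
    toy_embed,
    top_k=1,
)
```

To compare against translate-then-retrieve, the same `retrieve` function can be called with a translated query and a monolingual `embed`, which keeps the ranking logic identical across both setups.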

Translation as pivot

For complex reasoning, one option is to translate the input to English, reason in English, and translate the answer back. The round trip loses nuance. Modern models such as GPT-4 and Claude do fine without the pivot for most language pairs.
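
The pivot pattern can be sketched as a three-step pipeline next to the direct path. The `translate` and `reason` callables are hypothetical placeholders for whatever translation and model calls a system uses; the stubs below only tag the text so the data flow is visible.

```python
from typing import Callable

# (text, source_lang, target_lang) -> translated text
Translate = Callable[[str, str, str], str]
# prompt -> model answer
Reason = Callable[[str], str]


def answer_with_pivot(
    question: str,
    lang: str,
    translate: Translate,
    reason: Reason,
    pivot: str = "en",
) -> str:
    """Translate to the pivot language, reason there, translate back."""
    if lang == pivot:
        return reason(question)  # already in the pivot language
    pivoted_question = translate(question, lang, pivot)
    pivoted_answer = reason(pivoted_question)
    return translate(pivoted_answer, pivot, lang)


def answer_direct(question: str, reason: Reason) -> str:
    """No pivot: the model reasons in the question's own language."""
    return reason(question)


# Stubs for illustration only: they mark each step instead of doing real work.
def stub_translate(text: str, src: str, tgt: str) -> str:
    return f"[{src}->{tgt}] {text}"


def stub_reason(prompt: str) -> str:
    return f"answer({prompt})"


pivoted = answer_with_pivot("Frage", "de", stub_translate, stub_reason)
direct = answer_direct("Frage", stub_reason)
```

Each translation step in the pivot path is a place where nuance can be lost, which is why the direct path is worth testing first for any language pair a model handles well.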