Khalid Yusuf Dahir
Independent researcher on multilingual AI safety and low-resource NLP, studying transformer-based large language models from the linear-algebra layer up. Building the corpora, tokenizers, and safety benchmarks Somali language models will be measured against. Previously architected Somalia's first national Electronic Health Record system, now serving 100+ clinics.
Research
- SomaliWeb v1 · arXiv:2605.18232 · quality-filtered Somali corpus + BPE-16K tokenizer + Somali LID benchmark
- multilingual-safety-probe · Llama-3 refusal-rate gradient across 5 languages, ρ=0.97
- somaliweb-v1 dataset · 819k documents, ~303M tokens on Hugging Face
Writing
- 2026 / 05 · The Cliff Below Which Safety Training Vanishes
- 2026 / 04 · The Mental Maps That Made ML Click
- 2026 / 04 · Seven Chapters Deep: What Linear Algebra Looks Like From the Inside
- 2026 / 03 · What Doing Linear Algebra by Hand Showed Me About Transformers
- 2026 / 03 · The Journey from Software Developer to AI Researcher
Elsewhere