justbytecode

ML Engineer focused on LLM inference optimization and production systems.

• Built a 700M-parameter hybrid LLM (Mamba + Transformer) from scratch
• Achieved 2.4x faster CPU inference via speculative decoding (minimal sketch below)
• Implemented FlashAttention-2 CUDA kernels (2.1x higher throughput)
• Contributor to vLLM
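
For anyone unfamiliar with speculative decoding, here is a minimal greedy-verification sketch. It is illustrative only, not the project code: `draft_next` and `target_next` are hypothetical callables standing in for a small draft model and a large target model.

```python
from typing import Callable, List

def speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],   # hypothetical: cheap draft model, returns next token id
    target_next: Callable[[List[int]], int],  # hypothetical: large target model, returns next token id
    max_new_tokens: int = 32,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens,
    the target verifies them, and accepted prefixes are kept."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft model proposes k tokens autoregressively (cheap).
        proposed, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            proposed.append(t)
            ctx.append(t)
        # 2) Target model verifies the proposals position by position.
        #    (A real implementation scores all k positions in a single
        #    batched forward pass; this loop keeps the sketch simple.)
        for t in proposed:
            expected = target_next(tokens)
            if expected == t:
                tokens.append(t)          # draft token accepted
            else:
                tokens.append(expected)   # rejected: take the target's token and stop
                break
    return tokens[: len(prompt) + max_new_tokens]
```

The speedup comes from the verification step: when the draft agrees with the target, several tokens are committed for roughly the cost of one target forward pass.
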

Specializing in LLM inference optimization, CUDA kernel development, speculative decoding, and production model deployment.

Open to freelance work (LLM optimization, deployment, custom AI systems).