Software & Systems Engineer
I work with systems programming, real-time computing, and simulation engineering. My background spans Linux, low-level development in C and Python, and debugging latency-critical systems where determinism and microseconds count. Currently exploring applied AI and LLM integration.
Reach me at bryan@ramos.codes
Some notes from maintaining a TurboQuant llama.cpp fork and testing whether keeping hot MoE experts on the GPU can make a huge local model practical.
The machine I built for local AI work, why I built it, and why good-enough hardware changes how I use these tools.
Cloud AI is useful, but I want more of my day-to-day AI workflow on infrastructure I control.