I suppose if we couldn’t laugh at things that don’t make sense, we couldn’t react to a lot of life.
— Bill Watterson

We walk through the architecture, dataflow, and some design decisions of the vLLM library.
1. Elegant local language model inference with ollama
11 March 2025
We investigate the internals of ollama, a widely-adored local LLM inference management platform built atop ggml.
2. Understanding ggml, from the ground up
23 February 2025
On-device, low-latency language model inference has become increasingly important in the design of production systems. Here, we dive deep into one leading framework for performant inference.
3. Learning Undirected Graphical Models
06 September 2020
Undirected graphical models formed a large part of the initial push for machine intelligence, and remain relevant today. Here, I motivate and derive Monte Carlo-based learning algorithms for such models.
4.
15 August 2020
We motivate and derive the generalized backpropagation algorithm for arbitrarily structured networks.
5. Grokking Fully Convolutional Networks
16 August 2020
We discuss the fundamental ideas behind fully convolutional networks, including the transformation of fully connected layers to convolutional layers and upsampling via transposed convolutions ("deconvolutions").
6. Learning Convolutional Networks
06 July 2020
We motivate and derive the backpropagation learning algorithm for convolutional networks.
7.
05 July 2020
We motivate and derive the backpropagation learning algorithm for feedforward networks.
8. On Ken Thompson's "Reflections on Trusting Trust"
01 July 2020
A detailed look at one of my favorite software security papers, and its implications for bootstrapping trust.
9. Learning Directed Latent Variable Models
23 June 2020
Directed latent variable models provide a powerful way to represent complex distributions by combining simple ones. However, they often have intractable log-likelihoods, yielding complicated learning algorithms. In this post, we build intuition for these concepts.
10.