Understanding LLM Systems with a 3-Layer Abstraction

Performance optimization of LLM systems requires a thorough understanding of the full software stack. I could not find a comprehensive article that covers the big picture, so instead of waiting for one, I decided to write it myself. This article is not a comprehensive review or a best-practices guide, but rather my overall perspective on the current LLM system landscape.

Develop Hardware-Efficient AI without being a Hardware Expert

Disclaimer: This blog post was originally published on OmniML’s website in November 2022. Since the release of ChatGPT, the ML industry has greatly shifted its priorities, and many of the previous assumptions no longer hold. Nevertheless, I have kept the post as a historical reference for the pre-LLM-era view of ML systems.

Understand Autograd - A Bottom-up Tutorial

You may have wondered how autograd actually works in imperative programming. In this post, I explain it with step-by-step examples. Unlike other tutorials, this post does not borrow a single line of code from PyTorch or MXNet; instead, everything is built from scratch.
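To give a flavor of what "from scratch" means here, the core of reverse-mode autograd can be sketched in a few dozen lines: each operation records its inputs and a local backward rule, and `backward()` walks the recorded graph in reverse topological order. This is only an illustrative sketch; the `Value` class and its API are my own naming, not code from the post.

```python
class Value:
    """A scalar that records the operations producing it, for reverse-mode autograd."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = lambda g: None  # leaf nodes propagate nothing

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward(g):
            self.grad += g    # d(a+b)/da = 1
            other.grad += g   # d(a+b)/db = 1
        out._backward_fn = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward(g):
            self.grad += g * other.data  # d(a*b)/da = b
            other.grad += g * self.data  # d(a*b)/db = a
        out._backward_fn = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then propagate gradients in reverse.
        order, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                order.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward_fn(v.grad)


# Example: z = x*y + x, so dz/dx = y + 1 and dz/dy = x.
x, y = Value(3.0), Value(4.0)
z = x * y + x
z.backward()
# x.grad is now 5.0 and y.grad is 3.0
```

The topological sort matters because a node (like `x` above) can feed into several operations; its gradient must accumulate all incoming contributions before they are propagated further.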