Mixture of Experts: How AI Models Scale Without Going Broke
A visual deep-dive into sparse activation, gating networks, and the architecture powering modern large language models
GenAI@Adobe | Rephrase AI | CMU | IIT BHU
I recently went down an optimizer rabbit hole. It started innocently — I wanted to build one of those classic Alec Radford-style contour visualizations where you watch SGD, Adam, and friends race toward a minimum. By the end, I’d built two completely different demos, read a dozen papers, and learned that the most important new optimizer in deep learning literally cannot show its advantage on the toy problems we’ve been using for a decade.
A step-by-step walkthrough of Denoising Diffusion Probabilistic Models (DDPMs) — the algorithm powering Stable Diffusion, DALL·E, and Imagen — implemented in ~300 lines of PyTorch.
If you’re reading this, you’ve probably noticed the last post on this blog is dated somewhere around 2017–2018. A whole different era. A different version of me.
As part of my first project at CMU, I was asked to verify the benefits of using the implementation of Yan et al.'s RIDI: Robust IMU Double Integration for robust localisation. The method might look too mathematically involved when you read the paper for the first time. However, the work is revolutionary in that it uses IMU data, which, compared to vision-based inputs, consumes negligible battery and resources and can be processed faster as well!
After my sophomore year I started working on sketches! Sketches are very different from images. Let’s take an example to understand this.
A description of the depth network that we worked on in Adelaide!
I did my internship under Prof. Ian D. Reid at the University of Adelaide, where I was supposed to work on a ConvNet architecture that can give us the depth and the pose at the same time! So, in essence, a full-fledged vSLAM system using a deep learning framework. Should be possible, shouldn’t it? Just imagine: when you were a kid, you were not taught geometry or the concepts of optics and vision to move around. You just started moving around, bumping, falling, and eventually learning how to walk! Can we train a DeepNet with a similar idea?
Imagine designing a web page on your iPad using an Apple Pencil and, within seconds, having that web page created from the design you have made! Well, that seems to be a moonshot. However, researchers have started taking steps towards this phantasmagorical dream. OpenAI, in their requests for research, has put down a project that I feel could be considered the first ‘baby’ step towards it.
“As I entered the room I saw there was a green apple kept on the table…”
When I started working on Deep Learning and Computer Vision, I had one of the finest mentors, Ravi Kiran Sarvadevabhatla from IISc Bangalore, who helped me get a kick start in this field by making a log of relevant links that helped me learn various concepts with ease. They are, in essence, related links from multiple blogs on the internet that help you skim through specific articles and topics from a particular source. Hope they help you too :)
Well, I think I have started understanding the titbits of the corporate world, aka the industrial sector. The first thing that I noticed after coming to the industry was the hierarchical structure that exists within the organization.
Have you seen the movie Memento by Christopher Nolan? No worries if not! Let me explain the idea. The protagonist in the film suffers from a syndrome referred to as short-term memory loss. He cannot make new memories!