Yoga LLM & Yoga VLM: A blog-post series

A guide for developers on getting started with small-scale LLM & multimodal LM projects

Vijayasri Iyer
2 min read · Jun 21, 2024
Photo by kike vega on Unsplash

Over the last few weekends, I have been working on a pet project to understand the full pipeline of fine-tuning LLMs and multimodal LLMs. Given my affinity towards Yoga and AI, I couldn’t have thought of a better idea than to create a custom LLM and VLM (vision-language model) for Yoga. In this blog series, I will walk through the process of creating a dataset from scratch, fine-tuning an LLM on it, and releasing it on the HuggingFace Hub.

For this project, I will be using Google’s suite of open models, Gemma: Gemma 2B for the language-only work and PaliGemma for the vision-language work. Apart from their small size, Gemma models ship with a tokenizer that has a very large vocabulary, which makes it easier to extend them to new languages, while PaliGemma adds image understanding on top of the same family. Below you can find the list of blog posts in this series (and, after the list, a short code sketch for loading Gemma 2B). You can read them in order, or pick and choose the ones that match your needs. Alright, enjoy!

  1. Yoga LLM: Dataset Creation (using publicly available data and creating synthetic data to train your LLM)
  2. Yoga LLM: Instruction Fine-tuning
  3. Yoga LLM: Extending the LLM to a New Language
  4. Yoga VLM: Multimodal Dataset Creation
  5. Yoga VLM: Multimodal Instruction Fine-tuning
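
If you would like to follow along, here is a minimal sketch (my own setup, not part of the series itself) of pulling the Gemma 2B base checkpoint from the HuggingFace Hub with the transformers library. I am assuming you have transformers and torch installed, have accepted the Gemma license on the Hub, and are logged in with an access token, since the checkpoint is gated.

```python
# Minimal sketch: load the gated "google/gemma-2b" checkpoint from the Hub
# and run a quick generation as a sanity check.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short completion to confirm the model loaded correctly.
inputs = tokenizer("Yoga is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The later posts in the series build on this starting point: the same checkpoint gets fine-tuned on the Yoga dataset, and PaliGemma is loaded in a similar way for the multimodal posts.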

Stay tuned to this page, as I will keep adding links to each of the blog posts in the series as they are released!


Vijayasri Iyer

Machine Learning Scientist @ Pi School. MTech in AI. Musician. Yoga Instructor. Learnaholic. I write about anything that makes me curious.