I am an MSc student in Data Science at Alliance University, expecting to graduate in June 2026.
My long-term research goal is to advance trustworthy machine learning methods for healthcare. I am particularly interested in machine learning, computer vision, natural language processing, and generative models, with applications to medical imaging and clinical decision support.
I am seeking PhD opportunities in Machine Learning and Healthcare applications starting Fall 2026. If you are interested in my research, collaborations, or potential opportunities, feel free to contact me.
Email / Medium / LinkedIn / GitHub / Google Scholar
Machine Learning Research Intern @ I2CS Research Group, IIITK (May 2025 - Present)
Conducted supervised research on Vision-Language Models (VLMs), focusing on enhancing multimodal alignment.
Engineered and trained a custom Vision-Language Model (VLM) by integrating a fine-tuned CLIP vision encoder with a 7B-parameter LLM.
Implemented a recently proposed architecture from the literature to strengthen the vision-language bridge, improving model performance.
Benchmarked the custom VLM’s performance against established vision-language baselines on key metrics.
Authored key sections of the research manuscript for a publication, including the literature review, architectural description of the models, and comparative performance analysis.
A Comparative Analysis of Generative Models for Medical Image Augmentation (Ongoing)
Leading an independent research project investigating how modern generative models and pre-processing choices affect the augmentation of dermatological datasets and the performance of downstream diagnostic classifiers.
Implementing and benchmarking state-of-the-art generative architectures, including StyleGAN-based models and Denoising Diffusion Probabilistic Models (DDPMs), to synthesize high-fidelity 256×256 medical images.
Evaluating the quality and clinical utility of synthetic images through both quantitative metrics (e.g., FID) and the performance improvement of a downstream ResNet-based classifier.
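The FID metric used above compares the Gaussian statistics of real and synthetic feature activations. A minimal sketch of the computation, assuming the feature means and covariances have already been extracted with an Inception-style encoder (the function name is mine, not the project's code):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians fitted to image features."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; small imaginary
    # components from numerical error are discarded.
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical real and synthetic statistics give an FID of zero; larger values indicate a greater distribution gap.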
An End-to-End Sign Language Translation Pipeline from Static Gestures to English Using T5
N A Adarsh Pritam, Asha Kurian. Proceedings of the International Conference on Emerging Technologies in Computing and Communication (ETCC 2025) [PDF] [Publication]
Proposed an end-to-end ASL-to-English translation system that integrates vision-based gesture recognition (MediaPipe, Teachable Machine) with a fine-tuned FLAN-T5 model for natural language generation.
Achieved substantial improvements in translation quality (BLEU: 0.74, ROUGE-L: 0.89), producing fluent and contextually coherent English sentences.
Delivered a lightweight, modular prototype suitable for deployment on consumer hardware, highlighting its potential as an assistive communication tool in education and healthcare settings.
Built-from-Scratch Personalized Conversational AI to Mimic Communication Style via Personal Chat Logs (Ongoing) [GitHub]
Designed an end-to-end custom language model to emulate individual communication styles using personal chat logs.
Implemented data extraction, cleaning, and a custom BPE tokenizer, followed by training a decoder-only transformer from scratch.
Applied QLoRA-based fine-tuning for parameter-efficient adaptation, demonstrating potential for personalized dialogue systems and low-resource model customization.
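At its core, a custom BPE tokenizer of the kind described reduces to an iterative pair-merge loop over a word-frequency table. A toy sketch for illustration only (word-level frequencies, no byte-level fallback; this is not the project's actual implementation):

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Learn BPE merges from a list of words (characters as initial symbols)."""
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge to every word in the vocabulary.
        new_vocab = Counter()
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab
```

Each learned merge becomes a new token; applying the merges in order tokenizes unseen text consistently with training.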
Reimplemented a LLaMA-style Transformer from Scratch in PyTorch [GitHub]
Developed an educational reimplementation of a modern LLaMA-inspired large language model, constructed entirely from first principles in PyTorch.
Implemented and analyzed key components of transformer architecture, including multi-head attention, grouped-query attention (GQA), and rotary positional embeddings (RoPE).
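Of those components, rotary positional embeddings are compact enough to illustrate directly. A simplified single-sequence NumPy sketch that rotates even/odd feature pairs by position-dependent angles (the repository's PyTorch version may pair and cache frequencies differently):

```python
import numpy as np

def rope(x, base=10000.0):
    """Apply rotary positional embeddings to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)       # one frequency per pair
    angles = np.outer(np.arange(seq_len), freqs)    # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # 2D rotation of each (even, odd) feature pair.
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair undergoes a pure rotation, vector norms are preserved and the query-key dot product ends up depending only on relative position, which is the property that makes RoPE attractive for long-context attention.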
How do models like LLaMA and GPT understand context so effectively? The answer lies in multi-head self-attention. In this post, I provide a step-by-step breakdown of this foundational mechanism, translating complex theory into intuitive concepts.
This is part of my journey to understand large language models from first principles. I’m building a LLaMA-like model from scratch, documenting each component in this blog series.
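As a taste of that breakdown, here is a minimal framework-free sketch of multi-head self-attention in NumPy. It is illustrative only: shapes and names are my own, and it omits masking, dropout, and batching:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """x: (seq, d_model); all weight matrices: (d_model, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq, d_model) -> (num_heads, seq, d_head)
        return t.reshape(seq, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    # Concatenate heads back to (seq, d_model), then project.
    out = (attn @ v).transpose(1, 0, 2).reshape(seq, d_model)
    return out @ Wo
```

Each head attends over the full sequence in its own low-dimensional subspace, which is what lets the model track several kinds of context relationships at once.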
For a complete overview of my open source contributions, check this GitHub search query. Below is the same information organized by year.
Improved the model’s positional embedding layer by refactoring duplicated logic to enhance code clarity and robustness (OpenAI) [approved]
Improved code documentation by adding comprehensive docstrings to utility and helper functions (OpenAI) [merged]
Improved an example script by removing dead code to enhance long-term code health and maintainability (OpenAI) [merged]
Refactored a utility function for clarity, improving code readability and long-term maintainability (OpenAI) [merged]
Resolved a UnicodeDecodeError in a text-file reader by specifying the encoding explicitly for cross-platform compatibility (FreeCodeCamp) [merged]
Fixed typo in function docstring to improve clarity (FreeCodeCamp) [merged]
AWS Certified AI Practitioner - Amazon Web Services (AWS)
AWS Certified AI Practitioner Early Adopter - Amazon Web Services (AWS)
Mathematics for Machine Learning and Data Science Specialization - DeepLearning.AI
Google Business Intelligence Specialization - Google
Google Data Analytics Specialization - Google