Writing
A Small LLM and a Point-to-Click Model Beat a Big Multimodal Agent
I built a computer-use agent by splitting it in two: a small reasoning model that plans, and a vision-grounding model that turns 'click the blue Sign In button' into pixel coordinates. Cheaper, faster, and weirdly more reliable than a single big multimodal call.
Understanding NATs in LLM Training
A rough guide to what NATs actually mean in LLM training loss, because I kept seeing this term everywhere and finally decided to understand it.
Fine-Tuning a 2B Vision Model for PDF-to-Markdown with GRPO
I spent a day fine-tuning Qwen3-VL-2B for PDF-to-markdown conversion using SFT + GRPO. Total cost: $4. Here's what worked, what didn't, and why GRPO alone fails on vision models.
Research
Flash Flood Susceptibility Mapping
A graph neural network approach to flash flood susceptibility mapping that respects watershed connectivity. Achieved AUC = 0.978 using GraphSAGE on 460 sub-watersheds with a six-year SAR flood inventory. Uses conformal prediction to produce intervals with statistically guaranteed 90% coverage, and identifies 1,457 km of high-risk highways.
WhisperGate
A lightweight trainable gate module (~12K parameters) that sits between Whisper's frozen encoder and decoder, learning to classify each encoder frame as speech or non-speech. Eliminates 100% of hallucinations on silence and noise inputs while preserving clean-speech word error rate.
Pulse-Driven Neural Architecture (PDNA)
Introduces structured oscillatory dynamics into continuous-time neural networks. A pulse module injects learnable sinusoidal perturbations that improve temporal gap robustness on sequential tasks, with self-attention for inter-dimensional coordination.
Experience
Autify
Senior Software Engineer
Working on AI-driven Quality Assurance tools. Developed a multi-environment desktop app using Electron, TypeScript, and React. Built integrations with Atlassian, Jira, and Figma. Implemented Computer/Mobile Use agents using Vision Language Models. Technical lead for Genesis, building the product architecture and AI-native workflows.
Director & Co-Founder (Technology)
Managed the product team for a special needs intervention platform. Implemented a Next.js project with Apollo GraphQL, Express, and MongoDB. Created a data analytics pipeline using Python, RabbitMQ, and PyTorch for game-based autism screening (86% sensitivity). Developed a GenAI-based curriculum creation and recommendation system.
Fellow
Fellow of the Japan-India Transformative Technology Network (2023), a program that connects and empowers outstanding change-makers in India and Japan.
Freelance
Data, ML, Python
Implemented API servers in Django REST Framework with ReactJS front ends. Worked on Python and AI/ML-related projects. Assisted with quick prototyping and feature validation.
Education
Guru Nanak Dev University
Bachelor's Degree in Computer Science and Engineering
Skills
Languages
Frameworks
Tools & Infra
AI & Concepts
Projects
Video Assessment for Behavioral Conditions
AI-driven system to analyze video data for behavioral traits using the Iceberg Model and Google Gemini multimodal models.