Projects
Built a document ingestion and preprocessing pipeline that uses Docling to convert PDFs to Markdown and LangChain’s RecursiveCharacterTextSplitter to split large documents into chunks sized for efficient embedding and querying. Integrated Mixedbread’s embedding model and open-source LLMs via Ollama to meet data privacy and security requirements, tuned Chroma vector store performance, and improved answer relevance by 20% through custom prompt engineering. Built a pytest test suite that uses the same LLM as the query path to validate RAG accuracy across 100+ predefined test cases, and designed a custom evaluation pipeline that quickly flags inconsistent responses to keep the system reliable.
Developed convolutional neural network models in PyTorch and Keras that predict a photo's location from the landmarks it contains, helping tag photos that lack geolocation metadata. Engineered features from landmark images and tuned CNN hyperparameters, achieving 88% test-set accuracy in classifying photos into landmark location categories. Deployed the best-performing model in a photo location prediction app that automatically tags images with location information.
Created a Python program that assists people with medical conditions such as Parkinson's disease, for whom tremors or sudden, uncontrolled motor movements make typing difficult. The program provides real-time spelling and grammar correction, using Google's Gemini language model to correct text and simulated keystrokes to apply the corrections.
Created a Python desktop application that monitors file changes in real time with Watchdog, stores files in AWS S3, and maintains version history for rollback. Optimized file versioning with hashing and delta storage, reducing cloud storage usage by 35% and lowering bandwidth consumption during synchronization. Designed a PyQt6 user interface that lets users upload files, view version history, and revert to previous file states while maintaining data integrity.
Built a scalable RSVP tracking system with real-time updates that supports up to 1,000 invitees and lets event organizers monitor responses through a dashboard. Automated email forwarding to distribute RSVP links to 100+ recipients per event, reducing manual communication work for organizers. Improved event management efficiency by 25% through automated RSVP status analytics that give hosts actionable insights in real time.
Developed a Chrome extension that lets users toggle the removal and restoration of YouTube Shorts, built with JavaScript, MutationObserver, and Chrome storage, achieving a 99% success rate in removing Shorts. Optimized performance by reducing DOM access by 50% and throttling MutationObserver callbacks, improving page load times by 15%. Actively used by 600+ users worldwide.
Led a team of 5 in developing an interactive web application that simulates classic operating-system algorithms, including Banker’s algorithm, Peterson’s algorithm, Dining Philosophers, and Producer-Consumer, as a practical learning platform. Integrated an explanatory chatbot to help junior university students understand the algorithms, and hosted the application on Netlify for easy access.