Data Analyst Intern, Data Science Foundations Lab, HKUST
Project Overview
- Developed a full-stack RAG pipeline to enhance LLM response accuracy and relevance across 100,000+ vector database queries.
- Fine-tuned reranking models (Qwen3, FlagReranker) to boost performance on QA benchmarks.
- Automated preprocessing and embedding generation pipelines for large-scale datasets.
- Explored trade-offs between retrieval accuracy, computational cost, and latency for scalable RAG deployment.
This project deepened my understanding of vector databases, high-dimensional search, and hybrid retrieval methods for LLMs. I focused on bridging optimization and NLP by designing efficient, data-driven retrieval systems.
CS 1110 Teaching Assistant, Cornell University
Role Overview
- Led bi-weekly lab sessions to guide students through programming exercises on recursion, object-oriented design, and algorithmic problem-solving.
- Provided one-on-one consulting to help students debug code, refine logic, and build conceptual understanding in Python.
- Developed instructional materials such as walkthroughs, visual examples, and practice problems to strengthen comprehension.
This experience enhanced my ability to communicate technical concepts, mentor effectively, and foster analytical thinking, skills that directly support data-driven research and technical collaboration.
AEW Facilitator, College of Engineering, Cornell University
Workshop Overview
- Designed interactive workshops and problem sets to reinforce key concepts in Multivariable Calculus.
- Encouraged collaborative learning by guiding peer discussions and supporting diverse problem-solving strategies.
- Clarified challenging topics through examples connecting theory to practical applications in engineering and data analysis.
By leading AEW sessions, I developed strong communication, facilitation, and problem-solving skills while helping students build confidence and technical proficiency.
Let’s Connect
Email: songyijin1214@gmail.com
LinkedIn: Hedy Song