I am a graduate student in Computer Science at New York University. My interests are centered on recommender systems, language model reasoning, and machine learning systems that are rigorous, reproducible, and useful beyond a single benchmark.

Recently, I have been building end-to-end retrieval-ranking pipelines, transformer training code from scratch, and reinforcement learning workflows for reasoning models. I am especially interested in how system design choices affect stability, evaluation quality, and downstream behavior.

Prior to NYU, I received my B.E. in Computer Science and Technology from Anhui University, where I graduated as an Outstanding Undergraduate Graduate of the Class of 2025. During my undergraduate studies, I also completed an exchange program at Deakin University.

I am happy to discuss research, engineering, internships, and collaboration. You can reach me at ym3447@nyu.edu or browse my GitHub.

I am currently interested in:

  • Recommender Systems: retrieval, ranking, re-ranking, sequence modeling, and evaluation design.
  • LLM Reasoning and Post-Training: supervised fine-tuning, reward design, GRPO-style optimization, and test-time behavior.
  • ML Systems: reproducible data pipelines, checkpointing, experiment tracking, and low-friction deployment.

news

Apr 14, 2026 Launched this academic-style personal website.
Mar 2026 Completed a reinforcement learning pipeline for LLM reasoning using SFT and GRPO on Qwen2.5-Math-1.5B.
Mar 2026 Finished a four-stage recommendation system for short-video feed ranking on KuaiRand-Pure.
Feb 2026 Built a GPT-style transformer language model from scratch with a streaming tokenizer and memmap training pipeline.
Aug 2025 Received overseas-study support from Anhui University for pursuing graduate study at New York University.
Mar 2024 Completed a machine learning engineering internship at National Quantum, working on image segmentation and deployment.
Jul 2023 - Oct 2023 Served as student lead of the Information Technology Honors Class visiting cohort from the School of Computer Science and Technology at Anhui University and led the visit to Deakin University. Completed a four-month exchange program there and received a Certificate of Completion on Nov 10, 2023.

selected projects

  1. 01 RecSys

    Four-Stage Recommendation System for Short-Video Feed

    Jan 2026 - Mar 2026

    Built a full recommendation system on KuaiRand-Pure with multi-channel retrieval, coarse ranking, fine ranking, and rule-based re-ranking. The pipeline used time-based splitting and materialized intermediate candidates to keep experiments reproducible.

    Highlights: Recall@100 of 0.2476 (validation) and 0.2343 (test), validation AUC of 0.8149 for coarse ranking, and improved top-100 recall after candidate reduction.

    ItemCF Two-Tower Graph Embedding LightGBM DIN
  2. 02 Benchmark

    Retrieval-Ranking Benchmark for Recommender Systems

    Sep 2025 - Nov 2025

    Built a unified recommendation benchmark on MovieLens-32M with negative sampling and Recall/NDCG evaluation, and compared BPR-MF, GRU4Rec, SASRec, BERT4Rec, and dual-tower retrieval.

    Highlights: BERT4Rec achieved Recall@10 = 0.97 and NDCG@10 = 0.80.

    BPR-MF GRU4Rec SASRec BERT4Rec Dual-Tower
  3. 03 LLM

    GPT-Style Transformer Language Model from Scratch

    Jan 2026 - Feb 2026

    Implemented a byte-level BPE tokenizer and core transformer modules including causal self-attention, RoPE, RMSNorm, and SwiGLU, then paired them with a streaming tokenization and uint16 memmap pipeline.

    Highlights: processed 541M training tokens and 5.46M validation tokens, reaching a loss of 1.475 on the full validation set.

    Transformer RoPE RMSNorm SwiGLU TinyStories
  4. 04 RL

    Reinforcement Learning for LLM Reasoning

    Feb 2026 - Mar 2026

    Built a training and evaluation pipeline for reasoning models based on Qwen2.5-Math-1.5B, covering zero-shot baselines, SFT, GRPO, reward design, automatic grading, and rollout-based updates.

    Highlights: improved Countdown validation accuracy to 32.4%.

    SFT REINFORCE GRPO Reward Design
  5. 05 Systems

    LLM Systems Optimization

    PyTorch + Triton

    Implemented a FlashAttention-2 optimized attention stack for a decoder-only Transformer, including a custom torch.autograd.Function reference path and a Triton fused forward kernel with causal masking.

    Highlights: built a GPU benchmarking and profiling workflow with CUDA-synchronized timing, torch.compile, mixed precision, Nsight profiling, and PyTorch memory snapshots to compare baseline attention against FlashAttention across model sizes and sequence lengths.

    PyTorch Triton CUDA FlashAttention torch.compile
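
To make the Recall@K and NDCG@K figures quoted in the projects above concrete, here is a minimal sketch of how such metrics are commonly computed for one user. It assumes binary relevance and a single ranked candidate list per user. The function names and the toy item IDs are hypothetical and are not taken from the project code, which may differ in how it samples negatives or averages across users.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking divided by the ideal DCG."""
    # A hit at 0-based position `pos` contributes 1 / log2(pos + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k of them) first.
    ideal_hits = min(len(relevant), k)
    if ideal_hits == 0:
        return 0.0
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg


# Toy example (hypothetical item IDs): one of two relevant items is in the top 3.
ranked_items = [10, 20, 30, 40]
relevant_items = {20, 40}
print(recall_at_k(ranked_items, relevant_items, k=3))  # 0.5
print(ndcg_at_k(ranked_items, relevant_items, k=3))
```

In a full benchmark these per-user values would typically be averaged over all evaluation users; that aggregation step is left out here to keep the sketch short.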

experience

National Quantum

Machine Learning Engineer Intern · Jan 2024 - Mar 2024

Worked on magnetic-particle image segmentation under limited-data conditions. I designed data processing and annotation workflows, used SAM for bootstrapped labeling, trained UNet variants in PyTorch, and improved validation performance with augmentation, Optuna, Focal Loss, and Dice Loss.

I also completed model export and production-side inference with ONNX Runtime, which gave me end-to-end exposure from training to deployment.
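
As a small illustration of the Dice Loss mentioned above, the sketch below computes a soft Dice loss for a single binary mask. It is a plain-Python version under simplifying assumptions (flattened per-pixel foreground probabilities, a hypothetical smoothing constant `eps`), not the PyTorch code used during the internship.

```python
from typing import Sequence


def soft_dice_loss(
    probs: Sequence[float], target: Sequence[int], eps: float = 1e-6
) -> float:
    """Soft Dice loss for one flattened binary mask.

    probs  -- predicted foreground probabilities in [0, 1]
    target -- ground-truth labels (0 or 1), same length as probs
    eps    -- smoothing term (assumed value) that avoids division by zero
    """
    if len(probs) != len(target):
        raise ValueError("probs and target must have the same length")
    intersection = sum(p * t for p, t in zip(probs, target))
    denominator = sum(probs) + sum(target)
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - dice


# A perfect prediction gives a loss near 0; a fully wrong one gives a loss near 1.
print(soft_dice_loss([1.0, 0.0, 1.0, 0.0], [1, 0, 1, 0]))
print(soft_dice_loss([0.0, 1.0], [1, 0]))
```

Because the loss is driven by overlap rather than per-pixel accuracy, it is often paired with a pixel-wise loss such as Focal Loss when the foreground is small relative to the image.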

education

New York University · Aug 2025 - May 2027

M.S. in Computer Science, Concentration in Artificial Intelligence · GPA 3.9 / 4.0

Anhui University · Sep 2021 - Jun 2025

B.E. in Computer Science and Technology · GPA 3.73 / 4.0 · Outstanding Undergraduate Graduate of the Class of 2025 · Exchange Program at Deakin University, Australia

Alongside academics, I served as class monitor of the Information Technology Honors Class, headed the college Arts Department, and captained the college soccer team.