Yuhan Ma

I am a graduate student in Computer Science at New York University. My interests are centered on recommender systems, language model reasoning, and machine learning systems that are rigorous, reproducible, and useful beyond a single benchmark.

Recently, I have been building end-to-end retrieval-ranking pipelines, transformer training code from scratch, and reinforcement learning workflows for reasoning models. I am especially interested in how system design choices affect stability, evaluation quality, and downstream behavior.

Prior to NYU, I received my B.E. in Computer Science and Technology from Anhui University, where I graduated as an Outstanding Undergraduate Graduate of the Class of 2025. I also joined an exchange program at Deakin University.

I am happy to discuss research, engineering, internships, and collaboration in general. You can reach me at ym3447@nyu.edu or browse my GitHub.

I am currently interested in:

Recommender Systems: retrieval, ranking, re-ranking, sequence modeling, and evaluation design.
LLM Reasoning and Post-Training: supervised fine-tuning, reward design, GRPO-style optimization, and test-time behavior.
ML Systems: reproducible data pipelines, checkpointing, experiment tracking, and low-friction deployment.

news

Apr 14, 2026	Launched this academic-style personal website.
Mar 2026	Completed a reinforcement learning pipeline for LLM reasoning using SFT and GRPO on Qwen2.5-Math-1.5B.
Mar 2026	Finished a four-stage recommendation system for short-video feed ranking on KuaiRand-Pure.
Feb 2026	Built a GPT-style transformer language model from scratch with a streaming tokenizer and memmap training pipeline.
Aug 2025	Received overseas-study support from Anhui University for pursuing graduate study at New York University.
Mar 2024	Completed a machine learning engineering internship at National Quantum, working on image segmentation and deployment.
Jul 2023 - Oct 2023	Led the Information Technology Honors Class visiting cohort from the School of Computer Science and Technology at Anhui University on its visit to Deakin University, and completed a four-month exchange program there. Received a Certificate of Completion on Nov 10, 2023.

selected projects

01 RecSys

Four-Stage Recommendation System for Short-Video Feed

Jan 2026 - Mar 2026

Built a full recommendation system on KuaiRand-Pure with multi-channel retrieval, coarse ranking, fine ranking, and rule-based re-ranking. The pipeline used time-based splitting and materialized intermediate candidates to keep experiments reproducible.

Highlights: Recall@100 of 0.2476 on validation and 0.2343 on test, validation AUC of 0.8149 for coarse ranking, and improved top-100 recall after candidate reduction.
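The time-based splitting used in this pipeline can be illustrated with a minimal sketch. This is not the project code: the record fields (`user`, `item`, `ts`), the cutoff values, and the helper name `time_split` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical interaction record: {"user": ..., "item": ..., "ts": ...}
Interaction = Dict[str, int]


def time_split(
    logs: List[Interaction], val_start: int, test_start: int
) -> Tuple[List[Interaction], List[Interaction], List[Interaction]]:
    """Split interactions by timestamp so training never sees later events."""
    train = [r for r in logs if r["ts"] < val_start]
    val = [r for r in logs if val_start <= r["ts"] < test_start]
    test = [r for r in logs if r["ts"] >= test_start]
    return train, val, test


# Toy log with made-up timestamps (assumption: integer, comparable values).
logs = [
    {"user": 1, "item": 10, "ts": 100},
    {"user": 2, "item": 11, "ts": 150},
    {"user": 1, "item": 12, "ts": 220},
    {"user": 3, "item": 10, "ts": 310},
]
train, val, test = time_split(logs, val_start=200, test_start=300)
```

Cutting on time rather than at random keeps the evaluation closer to how a feed is served: the model is always scored on interactions that happened after everything it was trained on.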

ItemCF · Two-Tower · Graph Embedding · LightGBM · DIN

02 Benchmark

Retrieval-Ranking Benchmark for Recommender Systems

Sep 2025 - Nov 2025

Built a unified recommendation benchmark on MovieLens-32M with negative sampling, Recall, and NDCG evaluation, and compared BPR-MF, GRU4Rec, SASRec, BERT4Rec, and dual-tower retrieval.

Highlights: BERT4Rec achieved Recall@10 = 0.97 and NDCG@10 = 0.80.
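For readers less familiar with the metrics, Recall@K and NDCG@K with binary relevance can be sketched as below. This is a generic illustration under stated assumptions, not the benchmark's evaluation code: the function names are hypothetical, and Recall@K is defined here as hits divided by the number of relevant items, which is only one of several conventions in use.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Binary-relevance NDCG: DCG of the ranking over DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank 1 -> log2(2) = 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: one held-out item (id 3) ranked second.
ranked_items = [5, 3, 9, 1]
held_out = {3}
recall = recall_at_k(ranked_items, held_out, k=2)
ndcg = ndcg_at_k(ranked_items, held_out, k=2)
```

With a single held-out item per user, Recall@K reduces to a hit rate, while NDCG@K additionally rewards placing that item closer to the top.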

BPR-MF · GRU4Rec · SASRec · BERT4Rec · Dual-Tower

03 LLM

GPT-Style Transformer Language Model from Scratch

Jan 2026 - Feb 2026

Implemented a byte-level BPE tokenizer and core transformer modules including causal self-attention, RoPE, RMSNorm, and SwiGLU, then paired them with a streaming tokenization and uint16 memmap pipeline.

Highlights: processed 541M training tokens and 5.46M validation tokens, with a full-dev loss of 1.475.
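As a small illustration of one of the modules listed above, RMSNorm rescales a vector by the inverse of its root mean square and then applies a learned per-dimension gain. The sketch below uses plain Python lists for readability; the function name, the `eps` default, and the list-based interface are assumptions for illustration, not the project's PyTorch implementation.

```python
import math
from typing import List


def rms_norm(x: List[float], gain: List[float], eps: float = 1e-5) -> List[float]:
    """Scale x by 1 / sqrt(mean(x^2) + eps), then by a per-dimension gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, gain)]


# Toy input; with unit gain the output has a mean square close to 1.
hidden = [3.0, 4.0]
normed = rms_norm(hidden, gain=[1.0, 1.0])
```

Unlike LayerNorm, RMSNorm does not subtract the mean and has no bias term, so it needs one fewer reduction per vector.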

Transformer · RoPE · RMSNorm · SwiGLU · TinyStories

04 RL

Reinforcement Learning for LLM Reasoning

Feb 2026 - Mar 2026

Built a training and evaluation pipeline for reasoning models based on Qwen2.5-Math-1.5B, covering zero-shot baselines, SFT, GRPO, reward design, automatic grading, and rollout-based updates.

Highlights: improved Countdown validation accuracy to 32.4%.
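The core idea behind GRPO-style updates is that each rollout is scored relative to the other rollouts sampled for the same prompt, so no separate value network is needed. A minimal sketch of that group-relative advantage follows. The function name, the `eps` value, and the choice of population standard deviation are assumptions for illustration; the project's exact normalization may differ.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group: four rollouts for one prompt, graded 1.0 (correct) or 0.0.
group_rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_advantages(group_rewards)
```

When every rollout in a group receives the same reward, all advantages are zero and that prompt contributes no gradient signal, which is one reason reward design and prompt difficulty matter in this setup.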

SFT · REINFORCE · GRPO · Reward Design

05 Systems

LLM Systems Optimization

PyTorch + Triton

Implemented a FlashAttention-2 attention stack for a decoder-only Transformer, including a reference path written as a custom torch.autograd.Function and a fused Triton forward kernel with causal masking.

Highlights: built a GPU benchmarking and profiling workflow with CUDA-synchronized timing, torch.compile, mixed precision, Nsight profiling, and PyTorch memory snapshots to compare baseline attention against FlashAttention across model sizes and sequence lengths.
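FlashAttention avoids materializing the full attention matrix by processing keys and values in blocks and keeping a running maximum and running normalizer (the online softmax). The pure-Python sketch below shows that recurrence for a single query row with scalar values and checks it against the naive computation. It is an illustration of the idea only, not the Triton kernel: the function names and the block size are hypothetical, and the scores are assumed to be finite.

```python
import math
from typing import List


def naive_attention_row(scores: List[float], values: List[float]) -> float:
    """Softmax-weighted sum computed with the whole row in memory."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def online_attention_row(
    scores: List[float], values: List[float], block: int = 2
) -> float:
    """Same result, computed block by block with running statistics."""
    running_max = float("-inf")
    denom = 0.0  # running sum of exp(score - running_max)
    acc = 0.0    # running sum of exp(score - running_max) * value
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        new_max = max(running_max, max(s_blk))
        # Rescale previous partial sums to the new maximum (0.0 on first block).
        scale = math.exp(running_max - new_max)
        p_blk = [math.exp(s - new_max) for s in s_blk]
        denom = denom * scale + sum(p_blk)
        acc = acc * scale + sum(p * v for p, v in zip(p_blk, v_blk))
        running_max = new_max
    return acc / denom


row_scores = [0.5, 2.0, -1.0, 3.0, 0.0]
row_values = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = naive_attention_row(row_scores, row_values)
blocked = online_attention_row(row_scores, row_values, block=2)
```

Because each block only needs the running maximum, normalizer, and accumulator, memory use is independent of sequence length for a given query row, which is what makes the fused kernel practical at long context.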

PyTorch · Triton · CUDA · FlashAttention · torch.compile

experience

National Quantum

Machine Learning Engineer Intern · Jan 2024 - Mar 2024

Worked on magnetic-particle image segmentation under limited-data conditions. I designed data processing and annotation workflows, used SAM for bootstrapped labeling, trained UNet variants in PyTorch, and improved validation performance with augmentation, Optuna, Focal Loss, and Dice Loss.

I also completed model export and production-side inference with ONNX Runtime, which gave me end-to-end exposure from training to deployment.

education

New York University · Aug 2025 - May 2027

M.S. in Computer Science, Concentration in Artificial Intelligence · GPA 3.9 / 4.0

Anhui University · Sep 2021 - Jun 2025

B.E. in Computer Science and Technology · GPA 3.73 / 4.0 · Outstanding Undergraduate Graduate of the Class of 2025 · Exchange Program at Deakin University, Australia

Alongside academics, I served as class monitor of the Information Technology Honors Class, headed the college Arts Department, and captained the college soccer team.
