GitHub | Email: me@ytzi.org, zi.ya@northeastern.edu
I'm a PhD student at Northeastern University, advised by Arjun Guha. Previously, I was a Master of Math (CS) student at the University of Waterloo, advised by Gregor Richards.
I'm interested in both research and teaching. My current research covers Large Language Models for Code, specifically their evaluation and human perception. At the University of Waterloo, I was a teaching assistant for various courses over many terms, and for one term I taught a first-year CS course as an instructor.
I have also worked as a software developer intern at several companies, including Amazon, IBM, and Oracle.
StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code
Accepted by Findings of the Association for Computational Linguistics (ACL Findings)
Hannah McLean Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, and Carolyn Jane Anderson
Paper | arXiv | Dataset
How Beginning Programmers and Code LLMs (Mis)read Each Other
Accepted by ACM CHI Conference on Human Factors in Computing Systems 2024 (CHI'24)
Hannah McLean Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, and Carolyn Jane Anderson
Paper
StarCoder: May the Source be With You!
Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Randy, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Suriya Gunasekar, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries.
arXiv | GitHub | Model on HuggingFace
SantaCoder: don't reach for the stars!
Accepted by DL4C workshop at ICLR 2023
Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, Ian Yu, Paulo Villegas, Marco Zocca, Sourab Mangrulkar, David Lansky, Huu Nguyen, Danish Contractor, Luis Villa, Jia Li, Dzmitry Bahdanau, Yacine Jernite, Sean Hughes, Daniel Fried, Arjun Guha, Harm de Vries, Leandro von Werra
arXiv
MultiPL-E: A Scalable and Polyglot Approach to Benchmarking Neural Code Generation
Published in IEEE TSE
Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda
PDF | arXiv | GitHub | Dataset on HuggingFace | BigCode Eval Harness
CS 2500 - Fundamentals of Computer Science 1 - Fall 2022, Fall 2023
CS 115 - Introduction to Computer Science - Winter 2020
CS 115 - Introduction to Computer Science - Fall 2020
CS 442 - Principles of Programming Languages - Winter 2019
CS 246 - Object-Oriented Software Development - Spring 2015, Fall 2015, Spring 2016, Spring 2017
CS 241E - Foundations of Sequential Programs (Enriched) - Fall 2017
CS 241 - Foundations of Sequential Programs - Winter 2017, Fall 2018, Spring 2019, Fall 2019