I am a computer architect working on memory systems and domain-specific hardware.


Ongoing project at Alibaba DAMO:

Cost-efficient LLM Inference Accelerator
The project is motivated by in-house profiling of Alibaba's production-level LLMs. I work as the lead architect. We have developed an FPGA proof-of-concept (PoC) demo; a paper and an open-source project are on the way.

Past projects at Alibaba:

Large-scale Distributed GNN Training with Cloud FPGA:
The project targeted Alibaba's TB-level distributed graph neural network training. I worked as the lead architect. We delivered a 4-card FPGA demo, a paper at ISCA'22, 6 co-authored research papers (OSDI, MICRO, etc.), and 14+ patents.

Recommendation Accelerator with 3D-DRAM
This is an accelerator test chip motivated by Alibaba's recommendation system. I was a major contributor (2nd author). The test chip was published at ISSCC 2022.


Previous projects at UCSB:

Processing-In-Memory (PIM) and Near Data Processing (NDP) Architecture:
PIM/NDP architectures based on DRAM and emerging NVM for applications such as deep learning and bioinformatics. The pioneering work PRIME has 1000+ citations. Publications at ISCA'16, DAC'16, MICRO'17/18/19, IEDM'17, HPCA'20, etc.

Memory Subsystem Optimization for Big Data Applications:
Memory optimizations for (dynamic) graph analytics, persistent databases, blockchain, and homomorphic encryption. Publications at DAC'18, CAL'18, MICRO'18, HPCA'19, MICRO'19, etc.

Non-Von Neumann Architecture for Deep Neural Network:
Algorithm-architecture co-design. Publications at ISCA'16, MICRO'16, TPDS'18, ASPLOS'19, MICRO'20, etc.

Non-volatile Processor Architecture and Chip Design for IoT:
HPCA Best Paper; Micro Top Picks 2016.