I am a computer architect working on memory systems and domain-specific hardware.
Ongoing project at Alibaba DAMO:
Cost-efficient LLM Inference Accelerator
The project is motivated by in-house profiling of Alibaba's production-level LLM. I work as the lead architect. We have developed an FPGA PoC demo; a paper and an open-source project are on the way.
Past projects at Alibaba:
Large-scale distributed GNN Training with Cloud FPGA:
The project targeted Alibaba's TB-level distributed graph neural network training. I worked as the lead architect.
We have delivered a 4-card FPGA demo, a paper at ISCA'22, 6 co-authored research papers (OSDI, MICRO, etc.), and 14+ patents.
Recommendation Accelerator with 3D-DRAM
This is an accelerator test chip motivated by Alibaba's recommendation system.
I was a major contributor (2nd author). The test chip was published at ISSCC 2022.
Previous projects at UCSB:
Processing-In-Memory (PIM) and Near Data Processing (NDP) Architecture:
PIM/NDP on DRAM and emerging NVM for applications such as deep learning and bioinformatics.
The pioneering work PRIME has 1000+ citations. Publications at ISCA'16, DAC'16, MICRO'17/18/19, IEDM'17, HPCA'20, etc.
Memory Subsystem Optimization for Big Data Applications:
Memory optimizations for (dynamic) graph analytics, persistent databases, blockchain, and homomorphic encryption.
Publications at DAC'18, CAL'18, MICRO'18, HPCA'19, MICRO'19, etc.
Non-Von Neumann Architecture for Deep Neural Network:
Algorithm-architecture co-design.
Publications at ISCA'16, MICRO'16, TPDS'18, ASPLOS'19, MICRO'20, etc.
Non-volatile Processor Architecture and Chip Design for IoT:
HPCA best paper; IEEE Micro Top Picks 2016.