Summary
With colleagues from the Harvard Architecture, Circuits, and Compilers Lab, I built and demonstrated state-of-the-art deep learning and Bayesian inference accelerators every year from 2018 to 2020.
Papers
- A 16nm 25mm2 SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators – World’s first demonstration of deep learning acceleration spanning Arm cores, an eFPGA, and cache-coherent accelerators on a single SoC
- A 3mm2 Programmable Bayesian Inference Accelerator for Unsupervised Machine Perception using Parallel Gibbs Sampling in 16nm – World’s first Bayesian inference chip (a minimal sketch of the parallel Gibbs-sampling idea follows this list)
- A 25mm2 SoC for IoT Devices with 18ms Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET – World’s first silicon demonstration of attention-based sequence-to-sequence DNN inference, the mechanism at the core of today’s LLMs (see the attention sketch below)
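To illustrate what "parallel Gibbs sampling" means in the Bayesian accelerator above, here is a minimal sketch of chromatic (checkerboard) Gibbs sampling on a binary Ising-style Markov random field. The model, grid size, and parameter names are illustrative assumptions for exposition, not the chip's actual model or datapath; the key idea is that same-color sites share no edges, so every site of one color can be updated in parallel.

```python
# Chromatic (checkerboard) parallel Gibbs sampling on a binary Ising-style MRF.
# Illustrative assumption only -- not the accelerator's actual model.
import numpy as np

def parallel_gibbs_step(x, coupling, rng):
    """One sweep: update all 'black' sites, then all 'white' sites in parallel."""
    h, w = x.shape
    for color in (0, 1):
        # Sum of the four grid neighbors (zero padding at the borders).
        nbr = np.zeros_like(x, dtype=float)
        nbr[1:, :] += x[:-1, :]
        nbr[:-1, :] += x[1:, :]
        nbr[:, 1:] += x[:, :-1]
        nbr[:, :-1] += x[:, 1:]
        # Conditional P(x_i = +1 | neighbors) for an Ising model with spins +/-1.
        p = 1.0 / (1.0 + np.exp(-2.0 * coupling * nbr))
        # Checkerboard mask: same-color sites are conditionally independent.
        mask = (np.add.outer(np.arange(h), np.arange(w)) % 2) == color
        draws = np.where(rng.random(x.shape) < p, 1, -1)
        x[mask] = draws[mask]
    return x

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(32, 32))
for _ in range(100):
    x = parallel_gibbs_step(x, coupling=0.5, rng=rng)
```

Because each color class updates as one data-parallel step, this schedule maps naturally onto an array of parallel sampling units in hardware.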
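The attention-based sequence-to-sequence recognizer on the speech-to-text SoC is built around scaled dot-product attention, the same primitive at the heart of modern LLMs. A minimal numpy sketch follows; the shapes and names are illustrative assumptions, not the chip's actual datapath.

```python
# Scaled dot-product attention -- the core of attention-based seq2seq models.
# Shapes and names are illustrative assumptions.
import numpy as np

def attention(q, k, v):
    """q: (T_q, d) queries; k, v: (T_k, d) keys/values. Returns (T_q, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])         # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                              # weighted sum of values

rng = np.random.default_rng(0)
out = attention(rng.standard_normal((4, 8)),    # 4 decoder queries, width 8
                rng.standard_normal((10, 8)),   # 10 encoder keys
                rng.standard_normal((10, 8)))   # 10 encoder values
```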