
COMMITTEE CHAIR: Dr. Suxia Cui

TITLE: TRANSCENDING VON NEUMANN: PERFORMANCE AND ENERGY EFFICIENCY ANALYSIS OF HYBRID IN-MEMORY AI ACCELERATOR FOR TRANSFORMER

ABSTRACT: Deep learning has been revolutionized by Transformer-based AI models, but these models remain computationally intensive owing to memory-bound operations and energy-hungry matrix multiplications. Conventional AI accelerators such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs) are constrained by the von Neumann bottleneck, which causes high latency, high energy consumption, and limited throughput. By performing computation directly within memory arrays, in-memory computing (IMC) has emerged as a promising alternative for reducing data transfer overhead and improving efficiency. This paper investigates Non-Volatile Memory (NVM) IMC-based AI accelerators, classified into analog in-memory computing (AIMC) architectures (e.g., ReRAM, PCM, and FeFETs for analog matrix computations) and digital compute-in-memory (CIM) architectures, which use SRAM and NVM for bit-wise operations. The analysis examines individual accelerators in terms of precision, energy efficiency, performance, latency, throughput, and trade-offs in hardware utilization. Hybrid architectures that combine AIMC and CIM to optimize Transformer inference are also explored. Finally, the open challenges of precision, scalability, and keeping pace with rapidly evolving generative AI models are analyzed. IMC-based accelerators promise high-performance AI inference with minimal energy consumption and data movement.
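
For illustration only (not part of the paper or this announcement): a minimal Python sketch of the idea behind analog in-memory matrix-vector multiplication, in which weights stay resident in a crossbar array as conductances and results are read back through a finite-resolution ADC. The array sizes and the adc_bits parameter are assumptions chosen to show the precision trade-off the abstract mentions, not values from the work being presented.

import numpy as np

def analog_crossbar_mvm(weights, inputs, adc_bits=8):
    # Weights are programmed once as conductances; inputs arrive as voltages.
    # Column currents accumulate the products in place (Ohm's law plus
    # Kirchhoff's current law), so the weight matrix never crosses a memory bus.
    currents = weights @ inputs
    # Model the read-out ADC: finite resolution limits achievable precision.
    scale = np.max(np.abs(currents))
    if scale == 0:
        return currents
    levels = 2 ** (adc_bits - 1) - 1
    return np.round(currents / scale * levels) / levels * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))   # weight matrix held in the array
x = rng.standard_normal(128)         # input activation vector
exact = W @ x                        # digital reference result
approx = analog_crossbar_mvm(W, x, adc_bits=6)
print("max abs error with a 6-bit ADC:", float(np.max(np.abs(exact - approx))))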

Keywords: Transformer Models; In-Memory Computing; Analog In-Memory Computing (AIMC); AI Hardware Accelerators; Non-Volatile Memory (NVM); Digital Compute-in-Memory (CIM); Latency Optimization; SRAM-based AI Accelerators; Energy-Efficient AI; Edge AI.

Room Location: Electrical Engineering Building Room 315D