An international team of researchers has designed and built a chip that runs computations directly in memory and can run a wide variety of AI applications, all at a fraction of the energy consumed by computing platforms for general-purpose AI computing.
The NeuRRAM neuromorphic chip brings AI closer to running on a wide range of edge devices, disconnected from the cloud, where they can perform sophisticated cognitive tasks anywhere and anytime without depending on a network connection to a centralized server. Applications abound in every corner of the world and every facet of our lives, and range from smartwatches to VR headsets, smart earbuds, smart sensors in factories, and rovers for space exploration.
The NeuRRAM chip is not only twice as energy efficient as state-of-the-art “compute-in-memory” chips, an innovative class of hybrid chips that run computations in memory, it also delivers results that are just as accurate as those of conventional digital chips. Conventional AI platforms are much bulkier and are typically constrained to using large data servers operating in the cloud.
Additionally, the NeuRRAM chip is highly versatile and supports many different neural network models and architectures. As a result, the chip can be used for many different applications, including image recognition and reconstruction as well as voice recognition.
“The conventional wisdom is that the higher efficiency of compute-in-memory comes at the cost of versatility, but our NeuRRAM chip achieves efficiency without sacrificing versatility,” said Weier Wan, the paper’s first corresponding author and a recent Ph.D. graduate of Stanford University who worked on the chip while at UC San Diego, where he was co-advised by Gert Cauwenberghs in the Department of Bioengineering.
The research team, co-led by bioengineers at the University of California San Diego, presents its results in the Aug. 17 issue of Nature.
Currently, AI computing is both power hungry and computationally expensive. Most AI applications on edge devices involve moving data from the devices to the cloud, where the AI processes and analyzes it. Then the results are moved back to the device. This is because most edge devices are battery powered and therefore have only a limited amount of power that can be dedicated to computing.
By reducing the power consumption needed for AI inference at the edge, this NeuRRAM chip could lead to more robust, smarter and more accessible edge devices and to smarter manufacturing. It could also lead to better data privacy, as moving data from devices to the cloud comes with increased security risks.
On AI chips, moving data from memory to compute units is a major bottleneck. “[…] two-hour job,” Wan said.
To solve this data transfer problem, the researchers used what is known as resistive random-access memory (RRAM), a type of non-volatile memory that allows computation directly within memory rather than in separate computing units. RRAM and other emerging memory technologies used as synapse arrays for neuromorphic computing were put to use in the lab of Philip Wong, Wan’s adviser at Stanford and a main contributor to this work. Computing with RRAM chips is not necessarily new, but it generally leads to a decrease in the accuracy of the computations performed on the chip and a lack of flexibility in the chip’s architecture.
“Compute-in-memory has been common practice in neuromorphic engineering since it was introduced more than […] years ago,” said Cauwenberghs. “What is new with NeuRRAM is that the extreme efficiency now goes together with great flexibility for diverse AI applications with almost no loss in accuracy over standard digital general-purpose compute platforms.”
A carefully crafted methodology was key to the work, with multiple levels of “co-optimization” across the abstraction layers of hardware and software, from the design of the chip to its configuration to run various AI tasks. Additionally, the team made sure to account for various constraints ranging from the physics of the memory devices to circuitry and network architecture.
“This chip now provides us with a platform to solve these problems across the stack, from devices and circuits to algorithms,” said Siddharth Joshi, assistant professor of computer science and engineering at the University of Notre Dame, who started working on the project as a Ph.D. student and postdoctoral researcher in Cauwenberghs’ lab at UC San Diego.
Chip performance
The researchers measured the chip’s energy efficiency by a measure known as the energy-delay product, or EDP. EDP combines both the amount of energy consumed for every operation and the amount of time it takes to complete the operation. By this measure, the NeuRRAM chip achieves 1.6 to 2.3 times lower EDP (lower is better) and 7 to […] times higher computational density than state-of-the-art chips.
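As a rough illustration of how the energy-delay product is computed and compared, the sketch below multiplies energy per operation by time per operation for two designs. The function name and all numbers are hypothetical and chosen only to show the arithmetic; they are not measurements from the NeuRRAM chip.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy per operation times time per operation."""
    return energy_joules * delay_seconds


# Hypothetical figures for two designs (not measured values).
baseline_edp = energy_delay_product(energy_joules=2.0e-12, delay_seconds=10.0e-9)
improved_edp = energy_delay_product(energy_joules=1.0e-12, delay_seconds=10.0e-9)

# Lower EDP is better, so a ratio above 1 favors the second design.
improvement = baseline_edp / improved_edp
print(f"EDP is {improvement:.1f} times lower")
```

Because EDP is a product, a design can improve it by saving energy, by finishing each operation faster, or both.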
The researchers ran various AI tasks on the chip. It achieved 85% accuracy on a handwritten digit recognition task, 17.7% on an image classification task and 84.7% on a Google speech command recognition task. In addition, the chip achieved a 30% reduction in image reconstruction error on an image recovery task. These results are comparable to those of existing digital chips that perform computation at the same bit precision, but with drastic savings in energy.
The researchers point out that one key contribution of the paper is that all the results presented were obtained directly on the hardware. In many previous works on compute-in-memory chips, AI benchmark results were often obtained partly by software simulation.
Next steps include improving architectures and circuits and scaling the design to more advanced technology nodes. The researchers also plan to tackle other applications, such as spiking neural networks.
“We can do better at the device level, improve circuit design to implement additional features and address diverse applications with our dynamic NeuRRAM platform,” said Rajkumar Kubendran, an assistant professor at the University of Pittsburgh, who started work on the project as a Ph.D. student in Cauwenberghs’ research group at UC San Diego.
“As a researcher and engineer, my ambition is to bring research innovations from the lab into practical use,” Wan said.
The key to NeuRRAM’s energy efficiency is an innovative method for sensing output in memory. Conventional approaches use voltage as input and measure current as the result. But this leads to the need for more complex and more power-hungry circuits. In NeuRRAM, the team designed a neuron circuit that senses voltage and performs analog-to-digital conversion in an energy-efficient way. This voltage-mode sensing can activate all rows and columns of an RRAM array in a single computation cycle, allowing for higher parallelism.
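To make the sensing idea more concrete, here is a minimal sketch of an idealized array in which every row is driven at once and each floating column settles to the conductance-weighted average of the row voltages, which is then digitized. This is a textbook circuit model, not the team’s actual circuit; the function names, the uniform quantizer and all values are assumptions made for illustration.

```python
def column_voltages(input_voltages, conductances):
    """Settled voltage of each floating column in an ideal resistive array.

    conductances[i][j] connects row i to column j. With all rows driven at
    once, each column settles to the conductance-weighted mean of the rows.
    """
    n_cols = len(conductances[0])
    outputs = []
    for j in range(n_cols):
        total_g = sum(row[j] for row in conductances)
        weighted = sum(v * row[j] for v, row in zip(input_voltages, conductances))
        outputs.append(weighted / total_g)
    return outputs


def quantize(voltage, v_min, v_max, bits):
    """Uniform analog-to-digital conversion to an integer code (illustrative)."""
    levels = (1 << bits) - 1
    clipped = min(max(voltage, v_min), v_max)
    return round((clipped - v_min) / (v_max - v_min) * levels)


# Hypothetical 3-row, 2-column array; conductances in arbitrary units.
conductances = [[1.0, 3.0],
                [1.0, 1.0],
                [2.0, 4.0]]
row_voltages = [1.0, 0.0, 1.0]  # volts, all rows driven in the same cycle

analog = column_voltages(row_voltages, conductances)
codes = [quantize(v, v_min=0.0, v_max=1.0, bits=4) for v in analog]
print(analog, codes)
```

In this model a single step produces every column output, which is the kind of parallelism the paragraph above describes.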
In the NeuRRAM architecture, CMOS neuron circuits are physically interleaved with the RRAM weights. This differs from conventional designs, where CMOS circuits are typically on the periphery of the RRAM weights. The neuron’s connections with the RRAM array can be configured to serve as either input or output of the neuron. This allows neural network inference in various data flow directions without incurring overhead in area or power consumption. This in turn makes the architecture easier to reconfigure.
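One way to picture configurable data flow directions is a single stored weight array that can be used from either side. The sketch below is an abstract model under that assumption, with a hypothetical `matvec` helper and made-up weights; it says nothing about how the chip’s circuits are actually wired.

```python
def matvec(weights, inputs, direction="forward"):
    """Multiply through one stored weight array in either direction.

    forward:  drive rows, read columns -> y[j] = sum_i w[i][j] * x[i]
    backward: drive columns, read rows -> y[i] = sum_j w[i][j] * x[j]
    """
    if direction == "forward":
        return [sum(w_row[j] * x for w_row, x in zip(weights, inputs))
                for j in range(len(weights[0]))]
    if direction == "backward":
        return [sum(w * x for w, x in zip(w_row, inputs)) for w_row in weights]
    raise ValueError(f"unknown direction: {direction}")


# Hypothetical 2-row, 3-column weight array, stored only once.
weights = [[1, 2, 3],
           [4, 5, 6]]

forward_out = matvec(weights, [1, 1], "forward")
backward_out = matvec(weights, [1, 0, 1], "backward")
print(forward_out, backward_out)
```

The same `weights` object serves both calls, so no transposed copy has to be stored, which is the intuition behind avoiding extra area or power.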
To ensure that the accuracy of AI computations can be preserved across various neural network architectures, the researchers developed a set of hardware-algorithm co-optimization techniques. The techniques were verified on various neural networks, including convolutional neural networks, long short-term memory networks, and restricted Boltzmann machines.
As a neuromorphic AI chip, NeuRRAM performs parallel distributed processing across its neurosynaptic cores. To simultaneously achieve high versatility and high efficiency, NeuRRAM supports data parallelism by mapping one layer of the neural network model onto multiple cores for parallel inference on multiple data. Additionally, NeuRRAM provides model parallelism by mapping different layers of a model onto different cores and performing inference in a pipelined fashion.
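The two mapping strategies can be sketched in a few lines of ordinary Python. This is only a software analogy: the “cores” are plain lists, the layers are toy functions, and every name here is hypothetical rather than part of the NeuRRAM tool chain.

```python
EMPTY = object()  # marks a pipeline stage that holds no value yet


def data_parallel(layer, batch, n_cores):
    """Copy one layer onto n_cores and split the inputs among the copies."""
    results = [None] * len(batch)
    for core in range(n_cores):
        shard = batch[core::n_cores]               # inputs handled by this core
        results[core::n_cores] = [layer(x) for x in shard]
    return results


def model_parallel(layers, batch):
    """Place each layer on its own core and stream inputs through as a pipeline."""
    n = len(layers)
    held = [EMPTY] * n                             # output held by each core
    outputs = []
    for item in list(batch) + [EMPTY] * n:         # extra steps drain the pipeline
        if held[-1] is not EMPTY:
            outputs.append(held[-1])
        for s in range(n - 1, 0, -1):              # each value advances one core per step
            held[s] = layers[s](held[s - 1]) if held[s - 1] is not EMPTY else EMPTY
        held[0] = layers[0](item) if item is not EMPTY else EMPTY
    return outputs


def double(x):
    return 2 * x


def add_one(x):
    return x + 1


dp_out = data_parallel(double, [1, 2, 3, 4, 5], n_cores=2)
mp_out = model_parallel([double, add_one], [1, 2, 3])
print(dp_out, mp_out)
```

In the pipelined case, once the first input has moved to the second stage, the first stage is already working on the next input, so all stages stay busy.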
An international research team
The work is the result of an international team of researchers.
The UC San Diego team designed the CMOS circuits that implement the neural functions interfacing with the RRAM arrays to support the synaptic functions in the chip’s architecture, for high efficiency and versatility. Wan, working closely with the entire team, implemented the design, characterized the chip, trained the AI models and ran the experiments. Wan also developed a software toolchain that maps AI applications onto the chip.
The RRAM synapse array and its operating conditions were extensively characterized and optimized at Stanford University.
The RRAM array was fabricated and integrated on CMOS at Tsinghua University.
The Notre Dame team contributed to both the design and architecture of the chip, as well as the design and development of the machine learning model that followed.
The research began as part of the National Science Foundation-funded Expeditions in Computing project on Visual Cortex on Silicon at Penn State University, with continued funding support from the Office of Naval Research Science of AI program, the Semiconductor Research Corporation and DARPA JUMP program, and Western Digital Corporation.