Edge AI

ReRAM: A Game Changer for Artificial Intelligence

AI Gets Physical

Artificial intelligence is no longer confined to data centers or specialized systems.
It is becoming ubiquitous.

Today, AI is embedded into everyday products, robots, and machines that sense, decide, and act in real time. From vision and speech to control and autonomy, AI is increasingly expected to operate continuously, efficiently, and reliably, often at the edge and within the physical world.

While AI training remains compute-intensive and typically centralized, inference is rapidly moving into tightly constrained real-world systems. In this environment, memory has become a first-order design constraint. Performance, energy efficiency, determinism, and scalability now matter as much as raw compute throughput, forcing a fundamental rethink of traditional memory architectures.

Constraints at the Edge

In AI inference, trained models apply learned weights to make real-time predictions and decisions. Moving this capability into edge devices places intelligence closer to where data is generated and acted on, rather than relying on constant communication with the cloud. This enables real-time processing, lowers bandwidth requirements, and supports privacy-sensitive applications across a wide range of intelligent systems.

Since edge devices are typically small, battery-powered, and resource-constrained, power consumption and cost are critical. This presents a fundamental challenge for AI workloads, which are inherently data-intensive. In fact, AI systems spend most of their energy not on computation, but on accessing and moving data between memory and compute. As a result, minimizing data movement while delivering ultra-low-power memory access and low latency is now a primary requirement for enabling efficient, scalable AI at the edge.
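As a rough, order-of-magnitude illustration of this energy split, the sketch below tallies compute versus data-movement energy for a toy inference pass, using per-operation figures widely cited for a ~45nm process (Horowitz, ISSCC 2014). The model size, fetch counts, and the use of a DRAM read as a proxy for any off-chip access are illustrative assumptions:

```python
# Back-of-envelope energy split for one inference pass of a toy model.
# Per-operation energies are order-of-magnitude figures for a ~45nm node
# (after Horowitz, ISSCC 2014); real values vary by process and design.
E_MAC_PJ = 4.6             # one 32-bit FP multiply-accumulate (3.7 + 0.9 pJ)
E_ONCHIP_READ_PJ = 5.0     # one 32-bit read from a small on-chip SRAM
E_OFFCHIP_READ_PJ = 640.0  # one 32-bit off-chip read (DRAM figure as proxy)

n_macs = 10_000_000          # MACs per inference (assumed toy model)
n_weight_reads = 2_000_000   # weight words fetched per inference (assumed)

compute_uj = n_macs * E_MAC_PJ / 1e6
offchip_uj = n_weight_reads * E_OFFCHIP_READ_PJ / 1e6
onchip_uj = n_weight_reads * E_ONCHIP_READ_PJ / 1e6

print(f"compute:              {compute_uj:7.1f} uJ")
print(f"off-chip fetches:     {offchip_uj:7.1f} uJ")  # dominates the budget
print(f"same fetches on-chip: {onchip_uj:7.1f} uJ")
print(f"data-movement share (off-chip): "
      f"{offchip_uj / (compute_uj + offchip_uj):.0%}")
```

Even in this crude model, off-chip weight fetches dwarf the arithmetic itself, which is why keeping weights resident on-chip pays off.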

NVM for AI Inference at the Edge

The semiconductor industry has improved energy efficiency by moving to advanced logic nodes, delivering higher performance in a smaller silicon area and a lower power envelope. Memory has not kept pace: while leading-edge logic is now manufactured at nodes as advanced as 3nm, embedded flash cannot scale below 28nm. As a result, NVM and AI compute are often built on separate dies and cannot be integrated on the same chip.

Today’s edge AI systems therefore rely on a two-chip architecture, with external flash storing model weights and on-chip SRAM serving as a temporary buffer that feeds the AI engines. This approach increases power consumption, cost, and security risk while limiting performance, since data movement remains a bottleneck. These limitations are driving the industry to adopt new memory technologies such as ReRAM.

ReRAM: The NVM Best Positioned to Lead the Way

Weebit ReRAM is a fast, embedded NVM that can scale to the advanced process nodes needed to meet the demands of physical and edge AI systems.

ReRAM enables a one-chip solution with tight coupling between memory and compute, deterministic access times, and significantly lower energy per operation. It also eliminates the risk of eavesdropping on the inter-chip communication of a multi-chip solution. These characteristics make ReRAM particularly well suited for AI inference workloads, where predictable performance, power efficiency, and system integration are critical.

Weebit ReRAM can be deployed in three different AI architectures: Near-Memory Compute (NMC), where the memory resides close to the compute logic; In-Memory Compute (IMC), where the memory not only stores data but also performs the computation; and Neuromorphic Compute, where the memory mimics the operation of biological synapses.

AI Architectures Enabled by Weebit ReRAM

Single-Chip Near-Memory Compute with ReRAM

As embedded flash fails to scale at advanced logic nodes, memory and compute are increasingly split across separate dies, increasing latency, power, cost, and complexity. Weebit ReRAM enables single-chip near-memory compute by integrating large, fast NVM directly alongside MCUs and AI accelerators, keeping data local and reducing energy-hungry data movement.

By storing AI weights and firmware on-chip, Weebit ReRAM enables instant-on operation and higher efficiency than SRAM-based approaches. With more than 4× higher density than SRAM, ReRAM supports higher model accuracy in a smaller footprint, while reducing system cost, complexity, and security exposure.
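As a minimal sketch of the instant-on benefit, the toy model below compares boot flows: a two-chip design must shadow-copy weights from external flash into SRAM before inference can start, while a single-chip design reads weights in place from on-chip NVM. The sizes, bandwidth, and latency constants are illustrative assumptions, not measured Weebit figures:

```python
# Toy boot-time comparison: two-chip (shadow-copy weights from external
# flash into SRAM) vs. single-chip (execute in place from on-chip NVM).
# All sizes, bandwidths, and latencies are illustrative assumptions.
MODEL_BYTES = 4 * 1024 * 1024   # 4 MB of model weights (assumed)
QSPI_FLASH_BW = 50e6            # ~50 MB/s external flash read (assumed)
ONCHIP_FIRST_READ_S = 1e-6      # weights already resident: no copy phase

def boot_two_chip_s() -> float:
    # Inference cannot start until the full weight image is copied to SRAM.
    return MODEL_BYTES / QSPI_FLASH_BW

def boot_single_chip_s() -> float:
    # Weights are read in place from on-chip NVM as inference begins.
    return ONCHIP_FIRST_READ_S

print(f"two-chip weight copy: {boot_two_chip_s() * 1e3:8.1f} ms")
print(f"single-chip start-up: {boot_single_chip_s() * 1e3:8.4f} ms")
```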

In-Memory Compute with ReRAM

In-memory compute (IMC) architectures use memory elements as compute resources as well, allowing calculations to run directly within the memory itself. Since the weights never need to move, data transfers between memory and the CPU are eliminated, dramatically improving speed and lowering power consumption.

ReRAM’s physical properties make it well suited for such analog computations, enabling highly parallel operations such as matrix-vector multiplications with significantly improved energy efficiency for AI inference.
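To make the mechanism concrete, here is a minimal NumPy simulation of an analog crossbar matrix-vector multiply: signed weights are mapped onto differential pairs of cell conductances, inputs are applied as read voltages, and each bit line sums its products as current by Kirchhoff's current law. The array size, conductance window, encoding, and noise level are illustrative assumptions, not Weebit device parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Map signed weights onto differential conductance pairs (G_pos, G_neg),
# each bounded by an assumed programmable conductance window.
G_MIN, G_MAX = 1e-6, 1e-4                 # siemens, illustrative window
W = rng.normal(size=(8, 16))              # 8 outputs x 16 inputs, toy weights
scale = (G_MAX - G_MIN) / np.abs(W).max()
G_pos = G_MIN + np.clip(W, 0, None) * scale
G_neg = G_MIN + np.clip(-W, 0, None) * scale

x = rng.uniform(-1, 1, size=16)           # input activations
V_READ = 0.2                              # read-voltage full scale (assumed)
V = x * V_READ                            # encode inputs as word-line voltages

# Kirchhoff's current law: each bit line sums I = sum_j G_ij * V_j in one
# step. Per-cell multiplicative noise models device variability (assumed 1%).
noise_p = rng.normal(1.0, 0.01, size=G_pos.shape)
noise_n = rng.normal(1.0, 0.01, size=G_neg.shape)
I = (G_pos * noise_p) @ V - (G_neg * noise_n) @ V

y_analog = I / (V_READ * scale)           # decode currents to weight scale
y_exact = W @ x                           # digital reference
print("max |analog - digital|:", np.abs(y_analog - y_exact).max())
```

In a physical array the entire multiply completes in a single read cycle, so energy scales with the read operation rather than with the number of multiply-accumulates.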

Neuromorphic Computing with ReRAM

ReRAM is a strong foundation for neuromorphic computing, a brain-inspired approach to edge AI that emphasizes parallelism, data locality, and event-driven operation. Its non-volatility and analog programmability make ReRAM well suited for implementing synaptic weights in spiking neural networks (SNNs), where neural matrices map directly to ReRAM arrays and weights are stored as cell conductances.

Because ReRAM closely mirrors the physical behavior of biological synapses, it enables fast, real-time processing of large data sets at dramatically lower power than conventional neural network implementations. This makes ReRAM a compelling path towards highly efficient, next-generation edge AI systems.
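As a minimal illustration of the mapping, the sketch below simulates a layer of leaky integrate-and-fire neurons whose synaptic weights live in a conductance matrix, as they would in a ReRAM array. The neuron model, thresholds, and array dimensions are generic SNN assumptions rather than a Weebit-specific design:

```python
import numpy as np

rng = np.random.default_rng(1)

N_IN, N_OUT = 32, 8
# Synaptic weights stored as normalized cell conductances (assumed range).
G = rng.uniform(0.0, 1.0, size=(N_OUT, N_IN))

V_THRESH = 4.0       # firing threshold (assumed)
LEAK = 0.9           # membrane leak factor per timestep (assumed)
v = np.zeros(N_OUT)  # membrane potentials

for t in range(100):
    # Event-driven input: a sparse binary spike vector at each timestep.
    in_spikes = (rng.random(N_IN) < 0.1).astype(float)

    # One crossbar read integrates all synaptic currents in parallel:
    # each neuron's input current is a row of G times the spike vector.
    v = LEAK * v + G @ in_spikes

    # Fire and reset.
    out_spikes = v >= V_THRESH
    v[out_spikes] = 0.0

    if out_spikes.any():
        print(f"t={t:3d} neurons fired: {np.flatnonzero(out_spikes)}")
```

Because spikes are sparse and binary, the array is read only when events arrive, which is the data-locality and event-driven behavior the neuromorphic approach exploits.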

Learn More about Weebit ReRAM in AI

Organizations working with Weebit on AI initiatives