How the Nehalem Microprocessor Microarchitecture Works

By: Jonathan Strickland

Nehalem and QuickPath

Intel built the Core i7 chip series using the Nehalem microarchitecture.
Courtesy Intel

According to Intel, the Nehalem microarchitecture uses a system the company calls QuickPath. QuickPath encompasses the connections between the processors, memory and other components.

In older Intel microprocessors, commands come in through an input/output (I/O) controller to a centralized memory controller. The memory controller contacts a processor, which may request data. The memory controller retrieves this data from memory storage and sends it to the processor. The processor makes computations based upon that data and sends the results back through the memory controller to the I/O controller. As microprocessors become more complex, with multiple processors on a single chip, this model becomes less efficient because every request has to pass through the one shared memory controller.


With the older microarchitecture, Intel's chips had a memory bandwidth of up to 21 gigabytes per second. QuickPath connectivity raises that ceiling, allowing more data to move between processor and memory each second.

Processors using the new technology decentralize communication between processors and memory. Instead of sharing a centralized memory controller, each processor has its own memory controller, dedicated memory and cache memory. The processors communicate directly with the I/O controller, which passes commands straight to them. Because each processor has a dedicated memory controller, memory and cache, information flows more freely. Each processor can communicate with its dedicated memory at a speed of 32 gigabytes per second.
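To put the two bandwidth figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the peak numbers quoted in this article (21 and 32 gigabytes per second). The function names are made up for illustration, and the even split of the shared controller's bandwidth among processors is a simplifying assumption, not something Intel specifies.

```python
# Peak figures quoted in the article, in gigabytes per second.
OLD_SHARED_GBPS = 21.0    # one centralized memory controller for all processors
NEW_PER_CPU_GBPS = 32.0   # each processor's own memory controller


def shared_share_gbps(processors: int, total: float = OLD_SHARED_GBPS) -> float:
    """Bandwidth per processor if one controller is split evenly (assumption)."""
    if processors < 1:
        raise ValueError("need at least one processor")
    return total / processors


def dedicated_share_gbps(processors: int, per_cpu: float = NEW_PER_CPU_GBPS) -> float:
    """With a controller per processor, the local share does not shrink."""
    if processors < 1:
        raise ValueError("need at least one processor")
    return per_cpu


def transfer_seconds(gigabytes: float, bandwidth_gbps: float) -> float:
    """Best-case time to move a block of data at a given peak bandwidth."""
    return gigabytes / bandwidth_gbps


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, shared_share_gbps(n), dedicated_share_gbps(n))
```

Under these assumptions, two processors sharing the older controller would each see about half of the 21 gigabytes per second, while each processor with its own controller keeps the full 32. Real workloads rarely reach peak figures, so treat the output as an upper bound.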

Nehalem-based processors also have point-to-point interconnections with one another. That means if one processor needs to access data within another processor's cache, it can send a request directly to that processor and get a response. Each interconnection contains distinct data pathways, so data can flow in both directions at the same time, speeding up data transfers. Transfer speeds between the multiple processors and the I/O controller can be up to 25.6 gigabytes per second.
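The 25.6 gigabytes per second figure can be reproduced with simple arithmetic. The sketch below assumes a link running at 6.4 billion transfers per second and carrying 2 bytes of data per transfer in each direction. Those two inputs come from Intel's published QuickPath specifications rather than from this article, so they are labeled as assumptions in the code.

```python
import math

# Assumed link parameters (not stated in the article):
GIGATRANSFERS_PER_SECOND = 6.4   # transfers per second, in billions
BYTES_PER_TRANSFER = 2           # 16 bits of data per transfer, per direction

# From the article: data flows in both directions at the same time.
DIRECTIONS = 2


def link_bandwidth_gbps(gt_per_s: float, bytes_per_transfer: int, directions: int) -> float:
    """Peak link bandwidth in gigabytes per second, summed over all directions."""
    return gt_per_s * bytes_per_transfer * directions


one_way = link_bandwidth_gbps(GIGATRANSFERS_PER_SECOND, BYTES_PER_TRANSFER, 1)
both_ways = link_bandwidth_gbps(GIGATRANSFERS_PER_SECOND, BYTES_PER_TRANSFER, DIRECTIONS)

if __name__ == "__main__":
    print(f"one direction: {one_way:.1f} GB/s, both directions: {both_ways:.1f} GB/s")
```

Seen this way, the headline number is the sum of two one-way pathways, which is why the simultaneous two-way flow matters so much to the total.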

(c) 2009

QuickPath allows processors to take shortcuts when they ask other processors for information. Imagine a quad-core microprocessor with processors A, B, C and D. There are links between all of the processors. In older architectures, if processor A needed information from D, it would send a request. D would then send a request to processors B and C to make sure D had the most recent instance of that data. B and C would send the results to D, which would then be able to send information back to A. Each round of messages is called a hop -- this example has four hops.

QuickPath skips one of these steps. Processor A would send its initial request -- called a "snoop" -- to B, C and D, with D designated as the respondent. Processors B and C would send data to D. D would then send the result to A. This method skips one round of messages, so there are only three hops. It seems like a small improvement, but over billions of calculations it makes a big difference.

In addition, if one of the other processors has the information A requests, it can send the data directly to A. That reduces the number of hops to two. QuickPath also packs information into more compact payloads.
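The three scenarios above can be written out as lists of message rounds, which makes the hop counts easy to check. This is a toy model in Python, not Intel's actual protocol: the function names are invented for illustration, each round simply lists (sender, receiver) pairs, and the two-hop case leaves out any replies from processors that don't hold the data.

```python
from typing import List, Sequence, Tuple

Message = Tuple[str, str]   # (sender, receiver)
Round = List[Message]       # messages sent in parallel during one hop


def older_protocol(requester: str, owner: str, others: Sequence[str]) -> List[Round]:
    """Request, owner checks its peers, peers reply, owner answers."""
    return [
        [(requester, owner)],
        [(owner, p) for p in others],
        [(p, owner) for p in others],
        [(owner, requester)],
    ]


def quickpath_snoop(requester: str, owner: str, others: Sequence[str]) -> List[Round]:
    """Requester snoops everyone at once, peers report to owner, owner answers."""
    return [
        [(requester, p) for p in (owner, *others)],
        [(p, owner) for p in others],
        [(owner, requester)],
    ]


def quickpath_direct(requester: str, holder: str, peers: Sequence[str]) -> List[Round]:
    """Requester snoops everyone, and the processor holding the data replies directly."""
    if holder not in peers:
        raise ValueError("holder must be one of the snooped peers")
    return [
        [(requester, p) for p in peers],
        [(holder, requester)],
    ]


def hops(rounds: List[Round]) -> int:
    """A hop is one round of messages."""
    return len(rounds)


def total_messages(rounds: List[Round]) -> int:
    """Every individual message, across all rounds."""
    return sum(len(r) for r in rounds)


if __name__ == "__main__":
    print(hops(older_protocol("A", "D", ["B", "C"])))
    print(hops(quickpath_snoop("A", "D", ["B", "C"])))
    print(hops(quickpath_direct("A", "B", ["B", "C", "D"])))
```

In this simplified model, the older approach and the QuickPath snoop send the same number of individual messages. What shrinks is the number of rounds that have to happen one after another, and that is where the time savings come from.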