Wider memory systems

Consider what happens when a cache line is refilled from memory: consecutive locations in main memory are read to fill consecutive locations within the cache line. The number of bytes transferred depends on how big the line is, anywhere from 16 bytes to 256 bytes or more. We want the refill to proceed quickly because an instruction is stalled in the pipeline, or perhaps the processor is waiting for more instructions. In the narrow memory system shown below, two DRAM chips each provide 4 bits of data every 100 ns (remember cycle time), for a total of 8 bits per cycle. A 16-byte line therefore needs 16 memory cycles, and the cache fill takes 1600 ns.
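The arithmetic can be written out as a short Python sketch. The function name `fill_time_ns` and its parameters are illustrative, and the model assumes that every memory cycle delivers one full-width transfer with no other overhead.

```python
def fill_time_ns(line_bytes, width_bits, cycle_ns=100):
    """Time to refill one cache line over a memory path of the given width."""
    line_bits = line_bytes * 8
    # Each memory cycle delivers width_bits; round up for a partial last transfer.
    transfers = -(-line_bits // width_bits)
    return transfers * cycle_ns


# Two 4-bit DRAM chips side by side give an 8-bit path.
narrow_ns = fill_time_ns(line_bytes=16, width_bits=2 * 4)
print(narrow_ns)  # 1600
```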

Narrow memory system

This figure shows three labeled boxes. A small box on the left side of the figure is labeled CPU, with a single thick black line connecting it to the second, larger box on its right, labeled Cache. To the right of the Cache box is a box labeled DRAM DRAM. Between these two boxes are two thick grey arrows. One arrow, pointing from Cache to DRAM DRAM, is labeled Address, and the other, pointing from DRAM DRAM to Cache, is labeled 8 bits.

One way to make the cache-line fill operation faster is to “widen” the memory system, as shown in the wide memory system below. Instead of having two DRAMs, we place more of them side by side (eight in the figure). Now on every 100-ns cycle we get 32 contiguous bits instead of 8, and our cache-line fills are four times faster: the 16-byte line fills in 400 ns rather than 1600 ns.

Wide memory system

This figure shows three labeled boxes. A small box on the left side of the figure is labeled CPU, with a single thick black line connecting it to the second, larger box on its right, labeled Cache. To the right of the Cache box is a box labeled DRAM DRAM DRAM DRAM DRAM DRAM DRAM DRAM. Between these two boxes are two thick grey arrows. One arrow, pointing from Cache to the DRAM box, is labeled Address, and the other, pointing from the DRAM box to Cache, is labeled 32 bits.

We can improve the performance of a memory system by increasing its width up to the length of the cache line, at which point we can fill the entire line in a single memory cycle. On the SGI Power Challenge series of systems, the memory width is 256 bits. The downside of a wider memory system is that DRAMs must be added in multiples: every expansion has to supply enough chips to span the full width. In many modern workstations and personal computers, memory is expanded in the form of single inline memory modules (SIMMs). SIMMs currently are either 30-, 72-, or 168-pin modules, each of which is made up of several DRAM chips ready to be installed into a memory subsystem.
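To make the trade-off concrete, the following Python sketch tabulates memory cycles per line fill against the number of chips that must be installed together. It is an illustration under stated assumptions, not a description of any particular machine: the 32-byte (256-bit) cache line is hypothetical, the 4-bit chips are carried over from the earlier example, and the function names are made up for this sketch.

```python
def cycles_per_fill(line_bits, width_bits):
    """Memory cycles needed to fill one cache line (rounded up)."""
    return -(-line_bits // width_bits)


def chips_per_row(width_bits, bits_per_chip=4):
    """Chips that must be installed together to span the memory width."""
    return -(-width_bits // bits_per_chip)


LINE_BITS = 256  # assumption: a 32-byte cache line

print("width_bits  cycles_per_fill  chips_per_row")
for width in (8, 32, 64, 128, 256, 512):
    print(f"{width:10d}  {cycles_per_fill(LINE_BITS, width):15d}  "
          f"{chips_per_row(width):13d}")
```

In this model the cycle count stops shrinking once the width reaches the line length, while the number of chips that have to be added as a group keeps growing.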

Bypassing cache

It’s interesting that we have spent nearly an entire chapter on how great a cache is for high performance computers, and now we are going to bypass the cache to improve performance. As mentioned earlier, some types of processing result in non-unit strides (or bouncing around) through memory. These memory reference patterns bring out the worst-case behavior in cache-based architectures, and they are the patterns that see improved performance when the cache is bypassed. Inability to support these types of computations remains an area where traditional supercomputers can significantly outperform high-speed RISC processors. For this reason, RISC processors that are serious about number crunching may have special instructions that bypass data cache memory; the data are transferred directly between the processor and the main memory system.

By the way, most machines have uncached memory spaces for process synchronization and I/O device registers. However, memory references to these locations bypass the cache because of the address chosen, not necessarily because of the instruction chosen.

In [link] we have four banks of SIMMs that can do cache fills at 128 bits per 100 ns memory cycle. Remember that the data is available after 50 ns, but we can’t get more data until the DRAMs refresh 50–60 ns later. However, if we are doing 32-bit non-unit-stride loads and have the capability to bypass cache, each load will be satisfied from one of the four SIMMs in 50 ns. While that SIMM refreshes, another load can occur from any of the other three SIMMs in 50 ns. In a random mix of non-unit-stride loads there is a 75% chance that the next load will fall on a “fresh” DRAM, because three of the four SIMMs are idle. If the load falls on a bank while it is refreshing, it simply has to wait until the refresh completes.
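The 75% figure can be checked with a small simulation. The Python sketch below is a simplified model, not a description of any real memory controller: it assumes one load outstanding at a time, uniformly random bank selection, a 50 ns access, and a 50 ns refresh (the low end of the 50–60 ns range). The function name `simulate_loads` and its parameters are illustrative.

```python
import random


def simulate_loads(num_loads, num_banks=4, access_ns=50, refresh_ns=50, seed=0):
    """Return (fraction of loads hitting a fresh bank, average ns per load)."""
    rng = random.Random(seed)
    bank_ready = [0] * num_banks  # time at which each bank finishes refreshing
    now = 0
    stalled = 0
    for _ in range(num_loads):
        bank = rng.randrange(num_banks)
        if bank_ready[bank] > now:
            # The bank is still refreshing: wait until it is ready.
            stalled += 1
            now = bank_ready[bank]
        now += access_ns                      # data arrives after the access time
        bank_ready[bank] = now + refresh_ns   # bank is busy until refresh completes
    return 1 - stalled / num_loads, now / num_loads


fresh, avg_ns = simulate_loads(100_000)
print(f"fresh fraction: {fresh:.3f}, average time per load: {avg_ns:.1f} ns")
```

Under these assumptions the fresh fraction comes out close to 0.75 for four banks, and the average time per load lands between the 50 ns best case and the 100 ns full memory cycle.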

Source:  OpenStax, High performance computing. OpenStax CNX. Aug 25, 2010 Download for free at http://cnx.org/content/col11136/1.5