[linux-elitists] 32 essential computer books?

Martin Pool mbp@sourcefrog.net
Thu Dec 18 17:16:22 PST 2003


On 18 Dec 2003 Jim Thompson <jim@netgate.com> wrote:

> > Thank you for bringing up this point, Nick. I had forgotten about
> > that one. It is rather amazing what happens to algorithms when you
> > change some of the assumptions and throw all the oddities of modern
> > hardware into the fray.  For example sometimes it is worth it to
> > have the processor do twice as much work internally rather than
> > doing an additional memory access.
> 
> Note, you haven't shown this example.

(I have never been mistaken for a hardware designer, so if anyone else
knows more about it, please go right ahead.)

Many years ago, I invented (heh) on my C64 the technique of
memoization: storing values in a table rather than repeatedly
calculating them.  On that machine, it was an obvious win: one array
access to look it up, rather than a complicated formula.
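
(Sketching the idea in C rather than digging out the original C64
code; the names and numbers here are made up, it's just the shape of
the technique:)

  #include <stdio.h>
  #include <math.h>

  #define TABLE_SIZE 256

  /* Fill the table once, up front. */
  static double sine_table[TABLE_SIZE];

  static void build_table(void)
  {
      int i;
      for (i = 0; i < TABLE_SIZE; i++)
          sine_table[i] = sin(i * 6.283185307179586 / TABLE_SIZE);
  }

  int main(void)
  {
      build_table();
      /* Each later use is a single array access, not a recalculation. */
      printf("%f\n", sine_table[42]);
      return 0;
  }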

However, on modern machines, memory access is relatively very slow.
There are several reasons: cache misses can take a long time; memory
access risks a TLB miss or even a page fault; multiprocessors may
contend for access to memory; and superscalar design may mean that on
a particular cycle the CPU has the capacity to do an ALU operation but
not to access memory.  Remember that some of these things will slow
you down even if you have a machine that has plenty of memory for its
load.  Accessing memory may affect the pipeline in complicated ways.

The worst case is enormous, thousands of cycles or more.  (The *real*
worst case of a disk access takes a small eternity.)  The average case
is much less, but probably still rather more than the cycles/inst for
non-memory operations.
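
(If you want a rough feel for the average case, a pointer-chasing toy
like the one below will do; this is my own sketch, with clock()-based
timing and made-up sizes, not a rigorous benchmark.  Each load depends
on the previous one, and the buffer is much bigger than the cache, so
almost every access misses:)

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N (1u << 22)   /* 4M entries; far larger than the caches of the day */

  int main(void)
  {
      size_t i, k, p;
      size_t *next = malloc(N * sizeof *next);
      if (!next)
          return 1;

      /* Sattolo's algorithm: one big random cycle, so the walk covers
       * the whole buffer and a prefetcher can't guess the pattern. */
      for (i = 0; i < N; i++)
          next[i] = i;
      for (i = N - 1; i > 0; i--) {
          size_t j = (size_t)random() % i;
          size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
      }

      clock_t start = clock();
      p = 0;
      for (k = 0; k < N; k++)
          p = next[p];          /* dependent load: latency, not bandwidth */
      double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

      printf("ended at %zu, about %.0f ns per load\n", p, secs * 1e9 / N);
      free(next);
      return 0;
  }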

And of course there is the throughput vs. latency distinction:
streaming through memory can be bandwidth-limited, but a chain of
dependent loads pays the full latency on every access.

On some chips I think it is literally true that the balance of
execution units means that they can do twice as many instructions
internally as memory accesses, even before allowing for memory-related
stalls.  And why not?  Adding faster memory requires new buses, new
memory controllers, new memory.  Adding more integer or FP units is, I
think, relatively cheap in terms of die area and design effort.

So Ben is right: in some cases, it's worth doing more instructions
rather than risking a memory access.  In some cases it's not.
Memoization, for example, is no longer an automatic win.
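
(A concrete, if made-up, illustration of the tradeoff: counting set
bits either through a 64KB memoization table or entirely in registers.
The ALU version executes more instructions but touches no memory;
whether it wins depends on whether the table stays in cache.)

  #include <stdint.h>

  static uint8_t popcount_table[1 << 16];   /* 64KB of memoized answers */

  void init_popcount_table(void)
  {
      unsigned i;
      for (i = 1; i < (1u << 16); i++)
          popcount_table[i] = (uint8_t)((i & 1) + popcount_table[i >> 1]);
  }

  /* Memoized: two table lookups per word. */
  int popcount_lookup(uint32_t x)
  {
      return popcount_table[x & 0xffff] + popcount_table[x >> 16];
  }

  /* Pure ALU: a dozen register operations, no memory traffic. */
  int popcount_alu(uint32_t x)
  {
      x = x - ((x >> 1) & 0x55555555);
      x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
      x = (x + (x >> 4)) & 0x0f0f0f0f;
      return (int)((x * 0x01010101) >> 24);
  }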

I have not seen an undergrad algorithms text that reflects this
reality.  Many of them say "real computers have cache and parallelism
but we'll ignore that", but this is becoming an increasingly
unrealistic assumption.

See http://surriel.com/research_wanted/ for some order-of-magnitude
numbers to back this up, and (for example) "Intel Itanium Architecture
Software Developer's Manual, Part II, Optimization Guide" for gory
details.

Incidentally, I think Cormen, Leiserson & Rivest's "Introduction to
Algorithms" (MIT Press) is as good a text as any.

> > I don't know if I understand the problem well enough to explain it
> > but what I gather is the fundamental problem is that all the
> > mathematics of computer science is based upon the assumption that
> > you are executing on a Von Neuman machine and parallel computers are
> > not Von Neuman machines.-ben
> 
> Wow... are you ever wrong.

Well, Ben is wrong that *all* of CS is based on that assumption, but
certainly many people still assume it.  And as you say there are
parallel algorithms that he probably needs to know about.

> "Speed, it seems to me, provides the one genuinely modern pleasure."
> 			-- Aldous Huxley (1894 - 1963)

Indeed.  

 http://distcc.samba.org/

-- 
Martin 
