[linux-elitists] 32 essential computer books?

Jim Thompson jim@netgate.com
Fri Dec 19 10:57:26 PST 2003

> > Note, you haven't shown this example.
> Correct, I haven't shown you the exact machine language. However, the
> next time I'm reading through some optimized itanium assembly language
> and I see a case of it, I will try to remember to send it onto this
> list. One situation where this tends to come up is when:
>  1) you calculate a value
>  2) use it one way 
>  3) then store it somewhere
>  4) the calculated number gets pushed out of the registers
>  5) you need the value again. 
> With no -O it fetches it from the memory.
> With -O3 it recalculates it.
> In C it might look like this:
> data->highest=array[37];
> <lots of code>
> b=data->highest;
> What actually gets generated with the optimizer turned on is something
> like:
> data->highest=array[37];
> <lots of code>
> b=array[37];
> I've seen something like that recently when working with the intel
> compiler on an itanium machine but I don't remember exactly where.

This is a compiler implementation issue, not an algorithm issue.

> This kind of thing shows up ALL the time in ccNUMA and grid computing
> codes. Think about it for a second, if the piece of information you need
> could be halfway across the planet on another node in a grid, or if it
> exists on other nodes's cache on a ccNUMA machine it is much faster to
> recalculate the information rather than fetch it from wherever it
> happens to be.

OK.  Its still not an algorithm issue.

> > > For example, in my current world of high performance computing, what I
> > > am seeing is that most of the time you can't use data structures more
> > > elaborate than an array. Even linked lists are verbotten. The problem is
> > > if you have this massive amount of data and you need to do something to
> > > it, and you have lots of processors the data pretty much has to be in an
> > > array. There is no parallel algorithm to traverse a linked list, or most
> > > kinds of trees or things like that.
> > 
> > Bullshit.  Here is one quick example.
> > 
> > http://www.extreme.indiana.edu/hpjava/papers/implicit/node34.html
> Uh no. You and are are talking about a different scale of parallelism.
> Our smallest machine is 512 CPUs. Two of our machines have over 2000
> CPUs. The one we are currently building has 4000 CPUs and the one that
> is scheduled for 2005 is 65,000 CPUs. The technique that the author used
> to parallelize his merge sort doesn't scale.

Its not linear, no.   You've got an O(n) sort algorithm that uses arrays?

> What you find overall is that many if not most normal data structures
> are exceptionally difficult to work with when you need to parallelize
> the application. Arrays are good because you can have different
> execution units work on different portions of the array. 

sometimes, yes.  Perhaps even often in some applications, but you were
saying that ADTs are useless for parallel code.

> > I cut my teeth (first real unix kernel job) at Convex, which built
> > SIMD (vector) machines, then parallel SIMD machines. That was 1986 or so.
> Parallel computers today are not that simple. I will readily admit that
> when I was talking about parallel computers, I was talking about the
> parallel computers that I am used to working with. These are overgrown
> beowulf clusters not even single system image ccNUMA machines. 

There are many kinds of parallel computers.   We can talk about
large-scale beowulf clusters if you like.  I'll admit up front that
I've never had access to one.

I do imagine that they're great for some types of applications
(algorithms), and simply horrid (due to communication overhead) for

> I will stand behind my statement though. The computer science is lagging
> very far behind the state of the art in the hardware we can build. We do
> not have very many well researched and well understood algorithms and
> data structures which work well on the kinds of parallel machines that
> we are building. I've heard it repeated several times that much of the
> problem is that the mathematics to express these algorithms are pushing
> the limits of our understanding.

perhaps the machines *will* take over then.

> > Even then, the above statement was wrong.
> I would argue that the statement was more wrong then than it is now.
> > 
> > These days its cheap to build an N-way (for small values of 'N')
> > parallel machine, and various CPUs have SIMD fp units attached.
> > 
> > While vectorizing arrays is EZ (read: cheap to implement), arrays
> > bring with them their own special issues.  Accesses tend to saturate
> > the memory system (and thrash the data cache) 
> I would say that with SIMD you are absolutely correct. However, what we
> are dealing with here is not SIMD. We are dealing with these days are
> distributed memory parallel execution environments. 
> I would argue that yours is a variation on a straw man attack. 

OK, whatever.  I don't know what I'm talking about.  These are not the
droids you're looking for.

"Speed, it seems to me, provides the one genuinely modern pleasure."
			-- Aldous Huxley (1894 - 1963)

More information about the linux-elitists mailing list