[linux-elitists] 32 essential computer books?
Fri Dec 19 10:57:26 PST 2003
> > Note, you haven't shown this example.
> Correct, I haven't shown you the exact machine language. However, the
> next time I'm reading through some optimized Itanium assembly language
> and I see a case of it, I will try to remember to send it on to this
> list. One situation where this tends to come up is when:
> 1) you calculate a value
> 2) use it one way
> 3) then store it somewhere
> 4) the calculated number gets pushed out of the registers
> 5) you need the value again.
> With no -O it fetches it from memory.
> With -O3 it recalculates it.
> In C it might look like this:
> <lots of code>
> What actually gets generated with the optimizer turned on is something
> like this:
> <lots of code>
> I've seen something like that recently when working with the Intel
> compiler on an Itanium machine, but I don't remember exactly where.
This is a compiler implementation issue, not an algorithm issue.
> This kind of thing shows up ALL the time in ccNUMA and grid computing
> codes. Think about it for a second: if the piece of information you
> need could be halfway across the planet on another node in a grid, or
> if it exists in another node's cache on a ccNUMA machine, it is much
> faster to recalculate the information than to fetch it from wherever
> it happens to be.
OK. It's still not an algorithm issue.
> > > For example, in my current world of high performance computing, what I
> > > am seeing is that most of the time you can't use data structures more
> > > elaborate than an array. Even linked lists are verboten. The problem
> > > is that if you have this massive amount of data, you need to do
> > > something to it, and you have lots of processors, the data pretty
> > > much has to be in an array. There is no parallel algorithm to
> > > traverse a linked list, or most kinds of trees, or things like that.
> > Bullshit. Here is one quick example.
> > http://www.extreme.indiana.edu/hpjava/papers/implicit/node34.html
> Uh, no. You and I are talking about a different scale of parallelism.
> Our smallest machine is 512 CPUs. Two of our machines have over 2000
> CPUs. The one we are currently building has 4000 CPUs and the one that
> is scheduled for 2005 is 65,000 CPUs. The technique that the author used
> to parallelize his merge sort doesn't scale.
It's not linear, no. You've got an O(n) sort algorithm that uses arrays?
> What you find overall is that many if not most normal data structures
> are exceptionally difficult to work with when you need to parallelize
> the application. Arrays are good because you can have different
> execution units work on different portions of the array.
Sometimes, yes. Perhaps even often in some applications, but you were
saying that ADTs are useless for parallel code.
> > I cut my teeth (first real unix kernel job) at Convex, which built
> > SIMD (vector) machines, then parallel SIMD machines. That was 1986 or so.
> Parallel computers today are not that simple. I will readily admit that
> when I was talking about parallel computers, I was talking about the
> parallel computers that I am used to working with. These are overgrown
> Beowulf clusters, not even single-system-image ccNUMA machines.
There are many kinds of parallel computers. We can talk about
large-scale Beowulf clusters if you like. I'll admit up front that
I've never had access to one.
I do imagine that they're great for some types of applications
(algorithms), and simply horrid (due to communication overhead) for
others.
> I will stand behind my statement though. The computer science is lagging
> very far behind the state of the art in the hardware we can build. We do
> not have very many well researched and well understood algorithms and
> data structures which work well on the kinds of parallel machines that
> we are building. I've heard it repeated several times that much of the
> problem is that the mathematics needed to express these algorithms is
> pushing the limits of our understanding.
Perhaps the machines *will* take over then.
> > Even then, the above statement was wrong.
> I would argue that the statement was more wrong then than it is now.
> > These days its cheap to build an N-way (for small values of 'N')
> > parallel machine, and various CPUs have SIMD fp units attached.
> > While vectorizing arrays is EZ (read: cheap to implement), arrays
> > bring with them their own special issues. Accesses tend to saturate
> > the memory system (and thrash the data cache)
> I would say that with SIMD you are absolutely correct. However, what
> we are dealing with here is not SIMD. What we are dealing with these
> days are distributed-memory parallel execution environments.
> I would argue that yours is a variation on a straw man attack.
OK, whatever. I don't know what I'm talking about. These are not the
droids you're looking for.
"Speed, it seems to me, provides the one genuinely modern pleasure."
-- Aldous Huxley (1894 - 1963)