Don Marti

Sat 29 Nov 2008 10:11:07 AM PST

What (more) Sun Should Do

Tim Bray has posted a thought-provoking "What Sun Should Do."

It's good to see that he picks up on two of the reasons why I would still pay more for a Linux server or hosting plan than for Solaris: the Solaris command-line tools are ready for the Museum of Unix, and Solaris has nothing like the time-saving package management available on Linux.

"The Mac and Linux platforms are remarkably similar once you drop into the command-line mode where developers live around deployment time; they have a mostly-common "GNU Userland". Solaris needs to offer a similar set of commands and utilities, so that when you go there for deployment the commands you’re used to using will Just Work."

"Also, developers depend crucially on package management with the implied provision of dependency and version tracking. Solaris must have something that’s generally competitive with APT, which means that we need to get IPS polished and finished pretty damn quick, or adopt the Nexenta approach of combining the Solaris kernel with the Debian userland."

Brian Aker agrees. The first thing you have to do with a new Sun system is put decent tools on it, and installing and updating software on Solaris hasn't changed much since 1992. Maybe that's because almost everyone who has had to do it on Solaris was on the clock, and a lot of the people doing it on Linux were playing with it on their own time. I don't know. (I have suggested that Sun should just build OpenSolaris kernel packages for Ubuntu and CentOS, with ext3 and Linux kernel personality. Add a repository, install the kernel package, reboot, and there's a Solaris user, and another person to test the world's free software on your platform. Done right, that would be easier for the user than switching Linux distributions.)

But what should Sun do? (Besides having their, um, "Enterprise Class" Marketing department re-brand MySQL as Sun Java System Enterprise Database Suite, of course.) One more important item for the list: besides the all-web, all-the-time plan, high-performance computing needs to stay on any Sun to-do list.

Sun holds the number 6 and number 29 spots on the November 2008 TOP500 list, and a future Sun that wants to sell to serious web and "cloud" customers will want to grab some more spots above the fold on future lists. Even though HPC customers are the worst IT customers in the world—always beating you up on price, wanting bizarre configurations, running their own weird Linux load instead of the branded OS you want to sell, and reporting bugs you can't duplicate—you need them.

You'll probably best be able to sell Management on doing HPC by pointing out the similarities, on the software side, between current HPC environments and the future infrastructure needed for "the cloud." Sun Grid Engine already runs on the Sun Constellation Linux Cluster, the current number 6 machine on the TOP500 list. Bray writes, "I am convinced that we have to go ahead and build some Cloud infrastructure anyhow and operate it and make it pay for itself, so that when the ecosystem does find its shape, we'll understand it and be positioned to sell the Web Suite into it." That's right. And in order to build that kind of understanding, computer companies do HPC, which is sort of like a racing program for a car company: easy to dismiss as an ego-driven waste, but in an HPC project you get more control over software at all levels of the stack than in an ordinary data center project. You can put newer software out there, not bleeding-edge stuff but at least not the stale releases that end up on most large IT projects, and see what happens. And you can use paying (for small values of "paying," but still) customers instead of paid testers.

But there's a more important reason. When your hardware engineers start sneaking out to smoke crack, your HPC customers will tell you first, because they have the biggest "antenna" to spot flakeouts, and all their stuff has to be exactly right. Scientific journals don't accept Fail Whale pictures. "I've discovered a new subatomic particle! Oh, no, wait. Piece of crap computer." Building a PC-architecture system looks easy. World of Warcraft players do it in their basements—how hard can it be for a company that designs its own processors? But really getting it right is tricky, and HPC customers have been burned by enough not-quite-right x86 machines that they're worth the hassle just as a quality filter.

This is not new advice, just old advice that hasn't expired yet.

"For more than two years, it has been apparent in the IBM Company that we were behind in the large scientific area. This is an area where, since the days of our Harvard machine, we have attempted to lead. Although four or five years ago there was some doubt as to whether or not we should continue to try to lead in this area because of the expense and other considerations, at some point between two and three years ago it became evident that the fallout from the building of such large-scale machines was so great as to justify their continuance at almost any cost." -- T.J. Watson, Jr., May 17, 1965