[linux-elitists] A modest proposal: SSD disk acceleration in kernel.

Karsten M. Self karsten@linuxmafia.com
Fri Oct 24 02:11:38 PDT 2008


SSD:  it's the seeks, stupid.

I've just realized (probably through Don) that Linus has a blog.  And
he's discovered SSD (solid state disk drives) -- the new-fangled
flash-based storage that's just starting to break out.

    http://torvalds-family.blogspot.com/2008/10/so-i-got-one-of-new-intel-ssds.html

Linus is in love.

I noticed SSDs as one of two things I saw at LinuxWorld Expo this past
August which really impressed me[1].  The demo that one vendor set up
[2] was absolutely brilliant:  four servers, hooked up to one SSD,
streaming 1000 DVD-quality videos, simultaneously.  As I recognized
instantly:  seeks no longer matter.  As Linus notes in his own post, the
raw sustained read/write speeds aren't terribly much more impressive
than conventional disk, but for true random access, it's a quantum
difference over conventional disk.

Those of us who've dealt with system performance issues for the past two
(or four, or eight, or...) decades have long known that the real culprit
in systems slowing down isn't CPU, or even swapping per se; it's the
physical inertial limits of disk heads having to fly between positions
on the disk while writing or reading data, or worse:  lots of both at
once.

The downside of SSDs is the cost -- they're still (and I'm guessing as
price-points dive downward, likely forever will be) about 10 times the
cost of equivalent rotating media storage -- or conversely, you can
afford 1/10 the amount of SSD as you can conventional disk.

That said, it's cheaper than RAM (and non-volatile), and with a standard
80/20 split, SSD as a drive accelerator seems like a true win:  your
CPU(s) interface with the SSD, and it pushes reads and writes through to
conventional media on the back end.  With some discussion of the coming
nonviability of RAID 5 & 6 (due to the risk of non-recoverable failure
during array rebuild operations), front-ending RAID might also make
sense.
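
A back-of-the-envelope sketch of why the 80/20 split matters.  The
latency figures below are illustrative assumptions (not measurements,
and not from any vendor), and effective_latency_ms is a made-up helper
name.  The point is only that average random-access time is a weighted
mix of the two tiers, so a modest SSD that catches most accesses does
most of the work:

```python
# Illustrative numbers only: both latencies are assumptions, not measurements.
DISK_MS = 10.0   # assumed average random access on rotating disk (seek + rotation)
SSD_MS = 0.1     # assumed random access on SSD


def effective_latency_ms(hit_rate, fast_ms=SSD_MS, slow_ms=DISK_MS):
    """Average random-access latency when `hit_rate` of accesses hit the fast tier."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * fast_ms + (1.0 - hit_rate) * slow_ms


if __name__ == "__main__":
    for hit_rate in (0.0, 0.8, 0.95):
        ms = effective_latency_ms(hit_rate)
        print(f"hit rate {hit_rate:.0%}: {ms:.2f} ms average, "
              f"{DISK_MS / ms:.1f}x vs. disk alone")
```

Under those assumed numbers, an 80% hit rate takes the average from
10 ms to about 2 ms.  The remaining misses dominate, which is why the
caching policy is worth tuning.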

One possibility, of course, is for manufacturers to build SSD caches
directly onto conventional disks or disk packs.  I don't think that's
the right way to go.  Another is to manually configure SSDs as just
another disk partition and manually move stuff in and out of there as
needed.  Puhlease.

Put SSD acceleration in the kernel instead. 

As with other aspects of system tuning, allowing the kernel to sort out
(with operator-adjusted parameters, of course) appropriate caching,
stale/dirty, floor/ceiling, and other aspects of I/O operations wins for
several reasons:

1. You can change the kernel.  The delta of ease between updating a
kernel version and replacing (or even just re-flashing) hardware ROMs 
will mean the ability to continuously tune for improved performance even
on old (or possibly buggy) hardware.  We've seen a similar situation in
the HW v. SW RAID argument, where a significant win for software RAID is
that it's vastly easier to change mechanisms.

2. You can readily tune kernel-based SSD accelerator performance.  We've
got numerous methods and mechanisms for querying and setting kernel
values, from bootparms to module arguments to /proc and /sys filesystem
entries.  These interfaces will be (at least more) standard across
different hardware variants (vendors and modules) -- or would you prefer
interfacing via vendor-specific boot disks, utilities, BIOS interfaces,
and/or syntaxes?  They will also be largely standardized across Linux
distros, meaning your friendly neighborhood sysop doesn't have to waste
precious neurons learning and unlearning different methods.

3. Linux with its development model has a massive leg up over any
proprietary OS.  Updates can be tested and deployed through (or around)
the kernel tree as needed.  Other vendors will be playing catchup on
this years down the road.
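
To make the "operator-adjusted parameters" idea concrete, here's a toy
userspace sketch of a write-back cache with a dirty floor and ceiling.
It is not kernel code and not a description of any existing
implementation.  The class name, the parameter names, and the plain
dict standing in for the rotating disk are all hypothetical.

```python
from collections import OrderedDict


class TieredCache:
    """Toy write-back cache: a small fast tier (SSD) in front of a slow tier (disk).

    Every name and parameter here is hypothetical, for illustration only.
    """

    def __init__(self, backing, capacity, dirty_ceiling, dirty_floor):
        if capacity < 1 or not 0 <= dirty_floor <= dirty_ceiling <= capacity:
            raise ValueError("need 0 <= dirty_floor <= dirty_ceiling <= capacity, capacity >= 1")
        self.backing = backing             # dict standing in for the rotating disk
        self.capacity = capacity           # max blocks held in the fast tier
        self.dirty_ceiling = dirty_ceiling  # start flushing above this many dirty blocks
        self.dirty_floor = dirty_floor     # ...and stop once down to this many
        self.blocks = OrderedDict()        # block number -> data, least recently used first
        self.dirty = OrderedDict()         # dirty block numbers, oldest write first
        self.disk_reads = 0
        self.disk_writes = 0

    def read(self, blkno):
        if blkno in self.blocks:           # hit: served from the fast tier, no seek
            self.blocks.move_to_end(blkno)
            return self.blocks[blkno]
        self.disk_reads += 1               # miss: go to the slow tier
        data = self.backing.get(blkno)
        self._install(blkno, data)
        return data

    def write(self, blkno, data):
        self._install(blkno, data)         # writes land in the fast tier first
        self.dirty[blkno] = True
        self.dirty.move_to_end(blkno)
        if len(self.dirty) > self.dirty_ceiling:
            self.flush(down_to=self.dirty_floor)

    def flush(self, down_to=0):
        """Write the oldest dirty blocks back until only `down_to` remain dirty."""
        while len(self.dirty) > down_to:
            blkno, _ = self.dirty.popitem(last=False)
            self.backing[blkno] = self.blocks[blkno]
            self.disk_writes += 1

    def _install(self, blkno, data):
        self.blocks[blkno] = data
        self.blocks.move_to_end(blkno)
        while len(self.blocks) > self.capacity:
            victim = next(iter(self.blocks))   # least recently used block
            if victim in self.dirty:           # never drop unwritten data
                self.backing[victim] = self.blocks[victim]
                del self.dirty[victim]
                self.disk_writes += 1
            del self.blocks[victim]


if __name__ == "__main__":
    disk = {n: f"old{n}" for n in range(10)}
    cache = TieredCache(disk, capacity=4, dirty_ceiling=2, dirty_floor=1)
    for n in (1, 2, 3):
        cache.write(n, f"new{n}")
    print("disk writes so far:", cache.disk_writes)
    cache.flush()
    print("disk writes after full flush:", cache.disk_writes)
```

The floor and ceiling are exactly the kind of knob that would be
exposed via /proc or /sys.  Raise the ceiling to batch more writes per
spin-up, lower it to shrink the window of unflushed data.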

Oh yeah:  this posting is a bit of prior art, which is among the reasons
I'm choosing to pitch this out.  So any patent apps _after_ today's date
will have a little problem.


SSD really is pretty damned fascinating.  Among other things:

  - Hashing isn't restricted to memory.  Until now, in-memory hashing
    was an excellent tool for increasing software performance ... until
    your lookup tables got so large they hit disk, at which point a
    hash's prime benefit, random access, became its chief liability.
    SSD means never having to say "I'm sorry" to a hashing method.

  - Swap to SSD.  As an intermediate stage between disk and memory in
    speed, and with no seek penalty, parking your swap on SSD becomes a
    no-brainer.  Likewise your suspend/restore partition (which is
    already in swap if you or your distro are sane).  Suddenly paging
    itself becomes a much less significant matter (you lose some
    performance, but it's not order-of-magnitude as now).

  - Database.  Say no more.  Table, access, lock, and other contention
    issues evaporate.  I'm sure Oracle is looking at this stuff; I hope
    the PostgreSQL and MySQL folk are as well.

  - Power conservation.  Laptops with both SSD and rotational storage
    can minimize spin-ups to only those times when data must be flushed
    to (or read from) disk. Hopefully rarely.

I suspect we're going to see future storage configurations in which
10-50% of system storage is SSD, and the remainder fixed disk for the
bulk of server and desktop solutions.  High-end laptops will be fully
SSD, as will most handhelds (cost, weight, performance, power, and
reliability will drive this).  Allowing the OS to tune for maximum
performance is almost certainly going to be the best way to utilize
this capability.
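
For a rough feel of what a mixed configuration costs, here's the
arithmetic using the 10x figure from earlier in this post.  That ratio
is my guess rather than a quoted price, and relative_cost is a made-up
helper name.

```python
SSD_COST_RATIO = 10.0  # assumption from the post: SSD ~10x disk per unit of storage


def relative_cost(ssd_fraction, ratio=SSD_COST_RATIO):
    """Cost of a mixed SSD/disk setup relative to the same capacity on disk alone."""
    if not 0.0 <= ssd_fraction <= 1.0:
        raise ValueError("ssd_fraction must be between 0 and 1")
    return (1.0 - ssd_fraction) + ssd_fraction * ratio


if __name__ == "__main__":
    for fraction in (0.1, 0.5, 1.0):
        print(f"{fraction:.0%} SSD: {relative_cost(fraction):.1f}x the all-disk cost")
```

At 10% SSD the storage bill is a bit under double the all-disk figure.
At 50% it's five and a half times.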

... or am I smoking?


Peace.

--------------------
Notes:

1.  The other was Infiscale, who apparently deliver large-scale server
    farm solutions to most of the large US gov't labs.

2.  The company was Fusion IO.  It was such a damned impressive and
    effective demo that they deserve the plug.

-- 
Karsten M. Self <karsten@linuxmafia.com>        http://linuxmafia.com/~karsten
    Ceterum censeo, Caldera delenda est.