This afternoon I sat through a presentation from a few guys at Netezza. They were here to discuss their system for high-performance data analytics. What they’ve effectively done is build a large database machine with some special hardware to accelerate database queries via parallel processing nodes. These are some notes I jotted down:
Architecture:
- SMP Host
- 100+ specialized processing units per cabinet (they named them SPU’s for “snippet processing units”)
- SPU’s have their own PPC CPU, commodity disk, memory, and an FPGA
- GigE networks between SPU’s
- SMP Host partitions queries and brokers activity to the processing nodes
- Hardware fault-tolerant (SPU’s can be hotswapped)
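To make the host/SPU split concrete for myself, here is a rough Python sketch of the scatter-gather pattern as I understood it from the talk. It is not Netezza's implementation: the `Spu` class, `distribute`, `host_query`, and the round-robin data placement are all names and choices I made up for illustration, and threads stand in for what would really be separate hardware nodes.

```python
from concurrent.futures import ThreadPoolExecutor


class Spu:
    """Hypothetical stand-in for one processing node holding a slice of the table."""

    def __init__(self, rows):
        self.rows = rows

    def run_snippet(self, predicate, column):
        # Filter the local slice and return a partial aggregate (sum, count),
        # so only a small result travels back to the host.
        values = [row[column] for row in self.rows if predicate(row)]
        return sum(values), len(values)


def distribute(rows, n_spus):
    # Assumption: round-robin placement. A real system would more likely
    # place rows by hashing on a distribution key.
    return [Spu(rows[i::n_spus]) for i in range(n_spus)]


def host_query(spus, predicate, column):
    # The host sends the same snippet to every node, then merges the partials.
    with ThreadPoolExecutor(max_workers=len(spus)) as pool:
        partials = list(pool.map(lambda s: s.run_snippet(predicate, column), spus))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


# Toy table: 1000 rows spread across 8 simulated nodes.
rows = [{"id": i, "amount": i * 10} for i in range(1000)]
spus = distribute(rows, 8)
total, count = host_query(spus, lambda r: r["id"] % 2 == 0, "amount")
```

The point of the pattern is that filtering and partial aggregation happen next to the disk that holds the data, and the host only has to combine one small result per node.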
I’ll admit my skepticism tends to mount against any speaker who spends a lot of time at the outset on a marketing pitch when the audience is full of scientists. Do scientists need to be reminded that data sizes are growing? Or that enterprise X, Y, and Z are already using your product? Just show me how it works.
I did a quick search across my feeds to see if anyone has written about Netezza and, not surprisingly, there is a post over at Computing at Scale. It appears there are similar efforts from Teradata, Greenplum, and DATAllegro in this space. I can imagine how a system like Netezza’s might complement more traditional supercomputing. There’s certainly a big effort to commercialize the “new era of HPC,” but the technologies that come out of it are business-driven and not science-driven.
