IBM, Intel, others question the usefulness of the Top500 key metric, the Linpack test
Like Hollywood's Academy Awards, the Top500 list of supercomputers is dutifully watched by high-performance computing (HPC) participants and observers, even as they vocally doubt its fidelity to excellence.
"The Top 500 [uses] an artificial problem -- it doesn't measure about 80 percent of the workloads" that are usually run on supercomputers, said John Hengeveld, director of technical compute marketing for Intel's Data Center Group, speaking on the sidelines of the SC2010 supercomputing conference this week. "It is not a representative benchmark for the industry."
"The list is unclear exactly what it measures," agreed Dave Turek, who heads up IBM's deep computing division, in an interview last week.
The selection process for the Academy Awards ("The Oscars"), run by the Academy of Motion Picture Arts and Sciences, is shrouded in mystery, and, perhaps not surprisingly, observers grumble over which movies and people receive awards and which remain neglected. With the Top500, though, the discontent centers on the single metric used to rank the supercomputers, a benchmark called Linpack.
Many question the use of a single metric to rank the performance of something as mind-bogglingly complex as a supercomputer.
During one panel at the SC2010 conference this week in New Orleans, one high-performance-computing vendor executive joked about stringing together 100,000 Android smartphones to get the largest Linpack number, thereby revealing the "stupidity" of Linpack.
The Top500 list is compiled twice a year by researchers at the University of Mannheim, Germany; the U.S. Department of Energy's Lawrence Berkeley National Laboratory; and the University of Tennessee, Knoxville.
In the latest iteration, unveiled Sunday, China's newly built Tianhe-1A system topped the list, reporting a sustained performance of 2.57 petaflops. Placing second was the DOE's Oak Ridge Leadership Computing Facility's Jaguar system, reporting 1.75 petaflops.
While grumbling about Linpack is nothing new, the discontent was pronounced this year as more systems, such as the Tianhe-1A, used GPUs (graphics processing units) to boost Linpack ratings, in effect gaming the Top500 list.
"It is difficult to figure out the real application speed from the benchmark. I don't want to make the Top500 just a contest for the number of GPUs," noted an attendee at the Top500 awards ceremony.
"Linpack has many problems with it, but it has a few positive things. It is important to keep in mind that it is one number and it should be taken in the context of a number of things," argued Jack Dongarra, one of the judges for the ranking, during the awards presentation.
One advantage of Linpack is that, thanks to its simplicity, supercomputer keepers can improve their scores on a fairly regular basis, which makes for exciting news coverage as different facilities and system builders vie with one another for the next top spot.
One year ago, for instance, the Cray-built Jaguar system, clocking in at 1.75 petaflops, knocked the DOE Los Alamos National Laboratory's Roadrunner system from the top spot. Roadrunner, which at that point clocked 1.04 petaflops, had been the first system to break the petaflop barrier, in June 2008.
But Linpack, being only one metric, does not account for many aspects of a supercomputer's performance.
"The important thing is how well does your application run on these machines. That is a harder thing to measure and a harder thing to compare across different machines," Dongarra said.
Because Linpack is basically a set of Fortran routines that solve linear equations, it is best suited to measuring the computational muscle of a machine. "Linpack captures how well can you make lots of processors work in parallel on a single big problem," Intel's Hengeveld said.
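To make the idea concrete, here is a minimal sketch of a Linpack-style measurement: time the solution of a dense system of linear equations and divide an operation count by the elapsed time. It is written in plain Python rather than Fortran, runs on a single core, and is not the benchmark code the Top500 uses. The function names are hypothetical, and the operation count of roughly 2/3 n^3 + 2 n^2 is assumed here as the conventional figure for this kind of solver.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's matrix and vector are left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest entry in column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
            b[i] -= factor * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        partial = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - partial) / a[i][i]
    return x


def linpack_style_flops(n, seed=0):
    """Return (estimated flops, max residual) for a random n-by-n system."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    # Check the answer: largest absolute entry of (a x - b).
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    # Assumed conventional operation count for a dense solve of size n.
    operations = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    return operations / elapsed, residual
```

Calling `linpack_style_flops(200)` returns a flops estimate for the machine it runs on, along with a residual that confirms the answer is correct. The single number it produces says nothing about memory bandwidth, I/O or reliability, which is the limitation the critics describe.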
Linpack is less suitable, however, for estimating the memory performance of a machine, which is an increasingly crucial metric for many of today's big data-styled problems, Hengeveld said.
Another aspect that Linpack does not measure is reliability, Turek noted. "If your mean time between failures is a week or two, Linpack will not teach you anything about that," he said.
The good news is that additional supercomputing benchmarks are increasingly being developed and deployed. For several years, Dongarra and other researchers have offered the HPC Challenge, which rates supercomputers on seven different benchmarks.
For the past four years, Virginia Tech researcher Wu Feng has ranked supercomputers by energy efficiency in the Green500 list, the latest iteration of which will be released Thursday at SC2010.
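An energy ranking of this kind can be sketched as a simple ratio of performance to power. The example below assumes the metric is megaflops per watt. The system names, petaflop figures and power figures are hypothetical placeholders chosen for illustration, not measurements of any real machine.

```python
def mflops_per_watt(petaflops, megawatts):
    """Convert sustained petaflops and power draw in megawatts to MFLOPS per watt."""
    mflops = petaflops * 1e9  # 1 petaflop = 1e15 flops = 1e9 megaflops
    watts = megawatts * 1e6
    return mflops / watts


def rank_by_efficiency(systems):
    """Sort (name, petaflops, megawatts) tuples from most to least efficient."""
    return sorted(
        systems,
        key=lambda entry: mflops_per_watt(entry[1], entry[2]),
        reverse=True,
    )


# Hypothetical systems: the faster machine is not the more efficient one.
example_systems = [
    ("hypothetical system A", 2.0, 4.0),  # 500 MFLOPS per watt
    ("hypothetical system B", 1.0, 1.0),  # 1,000 MFLOPS per watt
]
ranking = rank_by_efficiency(example_systems)
```

In this made-up case, system B ranks first on efficiency even though system A would rank first on raw Linpack performance, which is why a second list can tell a different story from the Top500.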
Also at SC2010, a group of researchers introduced the first edition of Graph500, which will be a set of benchmarks that measure supercomputer performance on data-intensive applications.
The team has finished the first benchmark -- a search problem across multiple nodes involving multiple analysis techniques -- and announced the winners. Topping the list was the DOE Argonne National Laboratory's 8,192-core Intrepid system, which was able to execute 6.6 billion references, or results, per second.
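A graph search benchmark of this sort can be illustrated with a breadth-first search that counts how many edges it examines per second. This is a single-machine sketch under the assumption that the search kernel is a breadth-first traversal scored by edges traversed per second. The function names and the tiny example graph are hypothetical, and a real run would use a very large graph spread across many nodes.

```python
from collections import deque
import time


def bfs_with_edge_count(adjacency, source):
    """Breadth-first search that returns the parent map and edges examined."""
    parent = {source: source}  # the source is recorded as its own parent
    queue = deque([source])
    edges_traversed = 0
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, ()):
            edges_traversed += 1  # every examined edge counts as work done
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent, edges_traversed


def traversed_edges_per_second(adjacency, source):
    """Time one search and report edges examined per second."""
    start = time.perf_counter()
    _, edges = bfs_with_edge_count(adjacency, source)
    elapsed = time.perf_counter() - start
    return edges / elapsed if elapsed > 0 else float("inf")


# Hypothetical toy graph: a square of four vertices plus one isolated vertex.
toy_graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2], 4: []}
toy_parent, toy_edges = bfs_with_edge_count(toy_graph, 0)
```

Unlike a dense linear solve, this workload does almost no arithmetic and spends its time chasing references through memory, which is the kind of data-intensive behavior Linpack does not exercise.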
Time will tell if any of these benchmarks will supplant the Top500. Also like the Oscars, the Top500 may have a staying power that defies logic.
"It is a very convenient way to write about the industry," Turek said. "I care about the misinformation it conveys to the casual user. But I know the people who buy in this space are sophisticated and not fooled by it."
Copyright 2009 IDG Magazines Norge AS. All rights reserved