April 12, 2002 - World's Largest Database reaches 500,000 Gigabytes

Date Issued: April 12, 2002

Contact:

Neil Calder, Stanford Linear Accelerator Center: 1 (650) 926-8707, neil.calder@slac.stanford.edu

Last week at the Stanford Linear Accelerator Center (SLAC), the BABAR experiment's database stored its 500,000th Gigabyte - a milestone that makes it the largest known database in the world. The BABAR experiment - a collaboration of 600 physicists from nine nations - observes collisions between subatomic particles to understand how the behavior of matter and antimatter shaped our universe. BABAR, also known as the "B Factory," mass-produces huge quantities of scientific data with industrial efficiency. Up to 500 Gigabytes of data is sent relentlessly to the experiment's database daily.

The half million Gigabytes of data in the BABAR database, printed out, would fill one billion books. That's nearly 60 times the number of books in the Library of Congress, the largest library in the world. "The need to store the avalanche of information coming from the experiment and then efficiently search and retrieve specific data samples has driven physicists and computer experts to create innovative technology," said SLAC Director Jonathan Dorfan. "Governments, commercial corporations and institutes will face similar needs in the near future and the knowledge and experience we have gained will be passed on."

In 1996, while work was beginning on the construction of the experimental apparatus, a small group of dedicated researchers at SLAC and the Lawrence Berkeley National Laboratory (LBNL), both U.S. Department of Energy laboratories, began the arduous task of constructing an efficient and convenient way of storing and retrieving the enormous output of information expected from the experiment. Working closely with physicists from the BABAR project, as well as researchers at other physics laboratories, the development team chose to base the system on a new object-oriented database technology. Objectivity/DB, a product of Objectivity, Inc., based in nearby Mountain View in the heart of Silicon Valley, was chosen to meet the demands of the BABAR data.

The team at SLAC and LBNL worked for more than two years to customize the core database software to provide the scientists with the initial features they needed for this immense project. The SLAC and LBNL researchers wrote more than half a million lines of software code to give the physicists access to their data in a simple and reliable fashion. "We like the challenge," said Jacek Becla, the BABAR Database group manager. "We bet on a promising, but somewhat unproven, object-oriented database technology back in 1996."

In May 1999, the experiment began taking data and the database was put to the test. It was not at all clear that the technology could keep up with the vast flow of information coming from the experiment. Bottlenecks were found not only in the database software, but also in other parts of the information system: the network used to transmit the data, and even the operating systems of the computers being used. One by one these problems were eliminated and the database began to hit its stride. Now in 2002, the database is capable of recording data at speeds no one had dared dream of in 1999. More than 1,000 Gigabytes of data can be stored every day.

"This milestone of 500,000 Gigabytes is a vindication of the potential of object databases that we anticipated when we embarked on this project," said LBNL's David Quarrie, a chief architect of the system. "A lot of credit should go to the members of the BABAR database group and to Objectivity, who have collaborated well with us." "This is a great example of how research scientists can collaborate with high-tech industry to create new systems that in the near future can benefit all kinds of fields."

Particle physics laboratories such as SLAC have been leaders in information technology research for decades. Ten years ago, SLAC researchers set the World Wide Web loose on America, with astounding results.

"We have to stay on the cutting edge in order to use our resources in the most efficient way," said Richard Mount, SLAC's head of computing. "The database is the biggest not because we want it to be but because we need it. And it looks like we have a lot more scaling to do in the next few years as the amount of data along with the number of data analysis jobs grows."
