The Scientific Data and Computing Center (SDCC) at the US Department of Energy’s (DOE) Brookhaven National Laboratory has revealed it is now storing more than 300 petabytes of data.
The data relates to nuclear and particle physics experiments. The archive is the largest tape archive in the US containing this type of data, and the third-largest scientific data repository in the country.
In a press release published by Brookhaven National Laboratory, Alexei Klimentov, a Brookhaven Lab physicist who manages SDCC, said the lab now holds six times more data than would be generated by compiling all of written history from Sanskrit to the present day.
The data held by SDCC comes from experiments at the DOE’s Relativistic Heavy Ion Collider, which has been operating at Brookhaven National Laboratory since 2000, and from the ATLAS experiment at the Large Hadron Collider, located at CERN, the European Organization for Nuclear Research.
All the data is available online and on demand. It is stored in a ‘high-tech’ tape storage library, with physicists able to access the data on disk. Klimentov said access is automated, with robots grabbing tapes and mounting the desired information onto disks.
The lab has also developed its own software and website to keep track of data transfers, in addition to collaborating with IBM and other DOE labs to design a data management system dubbed High-Performance Storage System (HPSS).
HPSS ensures that different data storage systems, such as tapes, databases, and disks, can communicate with one another. The consortium also developed the software physicists use to access SDCC’s data.
SDCC engineer Ognian Novakov explained that the lab favors a tape-to-disk system because of its cost and environmental benefits. Disk storage requires computers to keep the disks spinning constantly, and those disks in turn need to be cooled, while tape is relatively static when not in use, meaning it has lower power demands.
According to SDCC, most of its tape libraries are now located in a facility with power and cooling efficiencies designed specifically for data systems, and also have capacity for the growing cache of data still being compiled by the labs it serves.
“The storage capacity on tape generally doubles every four to five years,” Novakov explained. “We started 26 years ago with 20-gigabyte tape cartridges; now we are at 18 terabytes on one cartridge — and it’s even smaller in physical size.”
By periodically rewriting data from older media to new, Novakov said the lab is “freeing a lot of slots in the library.”
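The cartridge figures Novakov quotes can be turned into a rough back-of-the-envelope calculation. The sketch below is illustrative only: it assumes decimal units (1 TB = 1,000 GB), treats the quoted numbers as exact, and the function names are invented for this example. It also assumes, purely for illustration, that the whole 300-petabyte archive sits on a single cartridge generation, which the lab has not said is the case.

```python
import math

# Decimal storage units (an assumption; the article does not specify).
GB = 10**9
TB = 10**12
PB = 10**15

def doubling_time_years(start_bytes, end_bytes, elapsed_years):
    """Average doubling time implied by growth between two capacities."""
    doublings = math.log2(end_bytes / start_bytes)
    return elapsed_years / doublings

def cartridges_needed(total_bytes, cartridge_bytes):
    """Whole cartridges required to hold total_bytes (rounded up)."""
    return (total_bytes + cartridge_bytes - 1) // cartridge_bytes

# Figures quoted in the article: 20 GB cartridges 26 years ago, 18 TB now.
growth = (18 * TB) / (20 * GB)
implied = doubling_time_years(20 * GB, 18 * TB, 26)

print(f"Capacity growth per cartridge: {growth:.0f}x")
print(f"Implied average doubling time: {implied:.1f} years")
print(f"18 TB cartridges for 300 PB: {cartridges_needed(300 * PB, 18 * TB):,}")
print(f"20 GB cartridges for 300 PB: {cartridges_needed(300 * PB, 20 * GB):,}")
```

Taken at face value, the two endpoints amount to a 900-fold increase per cartridge, or an average doubling time of roughly 2.6 years, somewhat faster than the "four to five years" rule of thumb in the quote. The cartridge counts show why migrating to newer media frees library slots: under these assumptions the same 300 petabytes would need about 16,667 of the current cartridges versus 15 million of the original ones.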