Storage Upgrade Proposal
Darryn Schneider
April 16, 2008
Executive Summary
Expand current online storage by 190TB usable to a total of 400TB which will be divided into 3 file systems.
Migrate from Ibrix distributed file system to Lustre distributed file system with support from Sun Microsystems.
Implement a secondary storage system based on a tape-based file system (HSM system) using an application by FileTek with an initial capacity of 400TB.
Expand present Atempo backup system to support expanded storage, maintaining nightly incremental backups and full offsite backups for potential disaster recovery.
Purchase new tape library for expanded backup system and secondary storage.
Total cost to implement $1400k.
Introduction
Over the initial 5 years of the construction phase of the IceCube project the scope of data storage has grown significantly. A series of meetings was held with key stakeholders, and a revised set of requirements has been identified. The detailed requirements for the next 5 years have been captured in the document “IceCube Project Data Storage Requirements, 2008 to 2013”.
This proposal is for an expansion which will meet immediate online storage needs for the next year, backup and secondary storage needs for 2 years, and provide a framework for further expansion for 5 years. The framework will provide an environment which can be expanded as required at reasonable cost to meet future requirements, and provide flexibility in dealing with any additional scope change.
Evaluation
The online storage systems were evaluated against the following list of required or desired features and performance requirements.
* Hardware Management
o Add hardware and expand file system while online
o Remove hardware and shrink file system
o Fail over servers
* File System Maintenance
o Be able to do file system maintenance without bringing down file system
o Be able to move data as needed to perform maintenance
* Performance
o Load balance across arrays
o Load balance across servers
o Minimum RW speeds of 400MB/sec
o Support performance for multi server clients (minimum 8 servers 20 to 40 clients each)
o Support multiple file systems of at least 400TB
* Other
o Support for group/user quotas
o Vendor support
o Performance reporting
* OS Support
o Linux client support
o Linux server support
o 32/64 bit
* Costs
o Initial
o Expansion
Options
Three main scenarios were considered.
1. Replace Ibrix with Lustre.
2. Replace Ibrix with a supported appliance with vendor-neutral storage.
3. Replace Ibrix with Lustre, and add a single-vendor NAS/SAN solution.
Primary Storage Solution
The present IceCube storage system is a distributed file system and software SAN solution called Ibrix, built on vendor-neutral hardware. The major advantage of this approach is vendor independence on hardware, the most expensive component of the storage system. It is proposed that Ibrix be replaced by a similar software system called Lustre, which allows for the continued use of existing hardware and the same vendor neutrality on hardware.
Present storage hardware is a mix of Apple XServe XRaid arrays and NexSAN SATABeast arrays. These arrays are very price competitive and reliable. However, the Apple product has been discontinued and also has the disadvantage of lower density (14 drives per 3U compared to 42 drives per 4U). The NexSAN units have the disadvantage of a growing management overhead, with each unit requiring independent management. It is proposed that future expansion use a product by Digi-Data which expands with the addition of JBOD disk shelves, thus reducing management overhead, and which, averaged over a number of arrays, is cheaper than the NexSAN product. IceCube has evaluated this product and is very happy with the performance. To provide a minimum of 190TB usable storage at RAID 6, 6 arrays of 48 1TB drives each would be required (1 head unit and 5 JBODs).
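The drive count above can be sanity-checked with a rough calculation. The sketch below is illustrative only: the 6 shelves of 48 1TB drives come from this proposal, but the RAID 6 group sizes and hot-spare counts are assumptions (the actual Digi-Data layout is not stated here), and file system overhead is ignored.

```python
def raid6_usable_tb(shelves, drives_per_shelf, drive_tb, group_size, spares_per_shelf):
    """Usable TB when each shelf is split into RAID 6 groups.

    Each RAID 6 group gives up 2 drives' worth of capacity to parity.
    """
    drives_in_groups = drives_per_shelf - spares_per_shelf
    groups_per_shelf = drives_in_groups // group_size
    data_drives_per_shelf = groups_per_shelf * (group_size - 2)
    return shelves * data_drives_per_shelf * drive_tb

# Hypothetical layout A: four 10-drive groups (8 data + 2 parity) and 8 spares per shelf
conservative_tb = raid6_usable_tb(6, 48, 1, group_size=10, spares_per_shelf=8)

# Hypothetical layout B: four 12-drive groups (10 data + 2 parity), no spares
dense_tb = raid6_usable_tb(6, 48, 1, group_size=12, spares_per_shelf=0)

print(conservative_tb, dense_tb)  # 192 240
```

Under either assumed layout the 288 drives yield at least the 190TB usable minimum, with the denser layout leaving margin for spares and formatting overhead.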
The present SAN fabric is Fibre Channel, using a mix of Cisco and QLogic FC switches. While the Cisco switches fit more closely with the IceCube network infrastructure, it was decided to move to QLogic switches for the SAN fabric, as they are considered high quality within the industry and provide significant cost savings over the Cisco switches. For the proposed expansion a single blade-based QLogic SANBox 9000 with three 16-port 4Gb/sec blades and a single 10Gb/sec blade (for switch interconnect) has been identified. This switch provides for significant expansion into the future, supporting up to 8 blades.
To provide optimal access to the new storage, the core switch at 222 will need expanding with a 10Gb/sec network blade to support high speed networking to CPU resources and to Chamberlin Hall, where future HPC systems will be located. A basic 1Gb/sec switch will also be required to connect the Lustre storage servers.
Secondary Storage Solution
IceCube implemented a tape-based file system, better known as an HSM (Hierarchical Storage Management) system, late last year, primarily as an application to automate taping of data at South Pole. This system consists of a buffer disk, a tape library, and an application that automates the movement of data between the two as data access is required by users. These systems are designed for data which has long periods of dormancy but for which occasional access is required over long periods of time. This describes much of the IceCube data. These systems have enormous benefits in much lower storage costs per TB, and lower power consumption and cooling requirements.
The system chosen for the South Pole was a basic system that met some very specific needs. While it is functional for data in the Data Warehouse, it will not meet long-term needs for intensive user access. It is proposed that a fully featured HSM be implemented in the Data Warehouse which will allow for the long-term storage of large volumes of data that have significant dormancy periods (months). A number of products were investigated, and the product that best met IceCube’s needs, while not being cost prohibitive, is from a vendor called FileTek. This is a very mature product with a good reputation within the industry. FileTek has also provided a very competitive educational discount.
Initially this system would have a capacity of about 400TB, and would not need expanding for at least 2 years. It would use 4 LTO-4 tape drives and a buffer disk of about 10TB. The drives would be co-located in the tape library also used by the backup solution.
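As a rough check on the scale of this system, the sketch below estimates the cartridge count and the time to move the buffer disk contents to tape. It assumes the published LTO-4 native (uncompressed) figures of 800GB per cartridge and 120MB/sec per drive, decimal units, and ideal streaming with no mount or seek time; the function names are illustrative only.

```python
LTO4_NATIVE_GB = 800      # LTO-4 native (uncompressed) capacity per cartridge
LTO4_NATIVE_MB_S = 120    # LTO-4 native transfer rate per drive

def tapes_needed(capacity_tb, tape_gb=LTO4_NATIVE_GB):
    """Cartridges needed to hold capacity_tb, rounded up (decimal units)."""
    capacity_gb = capacity_tb * 1000
    return -(-capacity_gb // tape_gb)  # integer ceiling division

def hours_to_stream(data_tb, drives, mb_per_s=LTO4_NATIVE_MB_S):
    """Hours to stream data_tb across all drives at the native rate."""
    seconds = data_tb * 1_000_000 / (drives * mb_per_s)
    return seconds / 3600

hsm_tapes = tapes_needed(400)          # 400TB initial HSM capacity
buffer_hours = hours_to_stream(10, 4)  # 10TB buffer disk, 4 drives

print(hsm_tapes, round(buffer_hours, 1))  # 500 5.8
```

Real throughput will be lower once tape mounts, positioning, and small files are taken into account, so these numbers are best read as lower bounds on time and media.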
This system will provide an important level of flexibility in managing data storage within IceCube, and should greatly reduce the risk posed by future scope increases, or at least minimize their cost, by providing a cheaper data storage technology.
Backup
IceCube’s backup policy follows standard minimum best practice: a full onsite backup plus nightly incrementals, along with an offsite copy for potential disaster recovery. Presently IceCube uses a backup application called TIMEnavigator by Atempo to do the full plus nightly backups on site, and a single backup copy is kept at Chamberlin Hall. This system works well but needs expansion to support the expanded storage system. To meet these needs a new library with an expansion cabinet will be required, which can be shared with the secondary storage system, along with an upgrade to a faster, higher capacity tape technology. The current latest technology is LTO-4, and 4 drives would meet needs for 2 years. Further expansion would require additional drives, which could be accommodated in the proposed new library. The proposed library, Qualstar LRM 837200, can be expanded with additional expansion cabinets to meet IceCube needs for the next 5 years. The Qualstar library has the additional benefit of not having a licensing or support cost based on the resources (tape or drive slots) used. This greatly reduces the potential cost associated with unexpected future increases in scope. The existing library would be maintained at current capacity and moved to Chamberlin Hall for access to South Pole data, which is written on LTO-3 tapes using the QStar HSM application. The largest cost of expanding the backup system is tape media, at about $150k.
Cost Summary
Item                                                                    Cost
Lustre Servers                                                          Existing
Lustre Support                                                          $100k
Digi-Data disk array (1 head unit and 5 JBODs of 48 1TB drives each)    $350k
QLogic SAN Switch                                                       $140k
Qualstar library, expansion cabinet, 8 LTO-4 drives                     $665k
FileTek HSM application                                                 -
HSM Server                                                              Existing
HSM Buffer disk                                                         Existing
Cisco network switches                                                  $60k
UPS (power)                                                             $10k
Total                                                                   $1325k
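The total can be cross-checked by summing the priced line items. The sketch below simply re-adds the figures listed in the Cost Summary; items marked as existing, and the FileTek application (which has no figure listed), are recorded as None and excluded from the sum.

```python
# Line items from the Cost Summary, in $k; None marks items with no new cost listed
costs_k = {
    "Lustre Servers": None,            # existing
    "Lustre Support": 100,
    "Digi-Data disk array": 350,
    "QLogic SAN Switch": 140,
    "Qualstar library, expansion cabinet, 8 LTO-4 drives": 665,
    "FileTek HSM application": None,   # no figure listed
    "HSM Server": None,                # existing
    "HSM Buffer disk": None,           # existing
    "Cisco network switches": 60,
    "UPS (power)": 10,
}

total_k = sum(cost for cost in costs_k.values() if cost is not None)
print(total_k)  # 1325
```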
Recurring Costs
The storage system will have associated maintenance costs, vendor support contracts, and future expansion costs. While these expenditures will be covered by operating funds, they are listed here so that all costs are understood. The costs of future support after the expiry of initial support are not known exactly, and best estimates are given. For instance, the QLogic switch initial support is for 3 years, and QLogic does not have a price identified for 1 year of support 3 years from now.
Lustre Servers
Additional servers may be required to maintain performance as storage and HPC expand. No additional servers should be required for the first 3 years.
Lustre Support
Lustre support is for 3 years. The initial year, including initial installation support, is about $50k. Support for subsequent years is significantly cheaper. After 3 years support will be about $15k per year.
Digi-Data disk array
At present costs the average price per usable TB is $1.8k. The head unit can support up to 500 disks before becoming performance limited. Thus an additional head unit would not be required until 2 years after the initial purchase.
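The per-TB figure and the head unit's headroom follow from numbers already given in this proposal. The sketch below reworks them, assuming that future expansion continues in shelves of 48 drives (the same shelf size as the initial purchase).

```python
array_cost_k = 350       # Digi-Data array cost from the Cost Summary, in $k
usable_tb = 190          # usable capacity at RAID 6
drives_per_shelf = 48
initial_shelves = 6      # 1 head unit and 5 JBODs
head_unit_max_drives = 500

cost_per_usable_tb_k = array_cost_k / usable_tb

initial_drives = initial_shelves * drives_per_shelf
# Whole additional shelves that fit under the head unit's drive limit
extra_shelves = (head_unit_max_drives - initial_drives) // drives_per_shelf

print(round(cost_per_usable_tb_k, 2), initial_drives, extra_shelves)  # 1.84 288 4
```

On these assumptions the initial head unit can take about 4 more shelves before a second head unit is needed.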
QLogic SAN Switch
The initial purchase includes 3 years of support and warranty. Thereafter support would be about $10k per year. Additional blades presently cost $10k and would not be required for 2 years after the initial purchase.
Qualstar library
The initial purchase includes 3 years of support and warranty. Beyond this, support would be on the order of $10k per year. The initial purchase would have tape capacity for at least 3 years. However, additional tapes may be required after 2 years, and an expansion cabinet after 3 years. Expansion cabinets are presently $45k each and add capacity for an additional 1075 tapes, which can be used by both the HSM system and the backup system.
FileTek HSM application
Initial purchase includes the first year of support. Support thereafter is $18k per year.
HSM Server
No support required.
HSM Buffer disk
This is a recently purchased disk array which includes 3 years of support and warranty. After 3 years this array may require upgrading.
Cisco network switches
Initial purchase includes 1 year of support. Subsequent support would be about $10k per year.
UPS (power)
Additional UPS units will be required as new hardware is added. 2 additional UPS units (about $5k) will be required at the first disk expansion after 1 year.
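To give a sense of how the support estimates above accumulate, the sketch below totals annual vendor support by year. It uses only the approximate per-year figures and initial coverage periods stated in this section, and it excludes hardware expansion, tape media, UPS units, and any upgrade of the HSM buffer disk, for which no annual figure is given.

```python
# (approximate annual support in $k, years covered by the initial purchase)
support = {
    "Lustre Support": (15, 3),
    "QLogic SAN Switch": (10, 3),
    "Qualstar library": (10, 3),
    "FileTek HSM application": (18, 1),
    "Cisco network switches": (10, 1),
}

def annual_support_k(year):
    """Estimated support cost in $k for a given year after purchase (1-based)."""
    return sum(rate for rate, covered in support.values() if year > covered)

for year in range(1, 6):
    print(year, annual_support_k(year))
# 1 0
# 2 28
# 3 28
# 4 63
# 5 63
```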