Strategic Storage Task Committee Recommendations
April 17, 2023
Google currently provides unlimited storage to NC State. Starting in November 2024, free Google storage will be capped at 1.4 petabytes (PB). Current Google storage usage is 5.5PB. Each additional petabyte of storage costs $144K/year. To avoid a recurring cost of at least $432K/year (the price of 3PB of additional storage), the Strategic Storage Task Committee recommends a series of actions beginning in April 2023.
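The figures above can be checked with a short calculation. The following is a minimal sketch, not an official cost model: the function names are hypothetical, and it assumes storage is priced linearly per petabyte, which the committee text does not confirm (Google may bill in whole-petabyte increments).

```python
from decimal import Decimal

# Figures taken from the paragraph above.
PRICE_PER_PB_K = Decimal("144")    # $K per additional PB per year
FREE_CAP_PB = Decimal("1.4")       # free cap starting November 2024
CURRENT_USAGE_PB = Decimal("5.5")  # current Google storage usage


def overage_pb(usage_pb, cap_pb=FREE_CAP_PB):
    """Storage above the free cap, in PB (never negative)."""
    return max(usage_pb - cap_pb, Decimal("0"))


def annual_cost_k(purchased_pb):
    """Yearly cost in $K, ASSUMING linear per-PB pricing (an assumption)."""
    return Decimal(purchased_pb) * PRICE_PER_PB_K


if __name__ == "__main__":
    print("Overage at current usage (PB):", overage_pb(CURRENT_USAGE_PB))
    print("Cost of 3 additional PB ($K/year):", annual_cost_k(3))
```

Under these assumptions, current usage exceeds the cap by 4.1PB, and 3PB of purchased storage corresponds to the $432K/year figure cited above.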
Our goal is to create a path for NC State to have the storage it needs to be successful in its mission. Based on our investigations, we have determined that Google storage is the most efficient and cost-effective system for enterprise users and for researchers with modest data storage needs. However, for large bodies of research data, we recommend transitioning from Google storage to Office of Information Technology (OIT) Research Storage.
Recommendation 1. Improve the usability of OIT Research Storage to support increased use.
While NC State has dozens of storage systems, most support a particular application (e.g., storage for classes hosted on Moodle). There are currently two offerings that support general use: Google storage (the “Google Drive”) and OIT Research Storage. Google storage will continue to meet most productivity-based needs, such as collaborative and individual coursework, projects, and business operations. Moving forward, however, we recommend that it not be considered the primary storage solution for large amounts of research data. Instead, OIT Research Storage should become the standard tool for research storage.
OIT and the Office of Research and Innovation (ORI) currently fund 2TB of OIT Research Storage for every faculty member, which can be increased up to 30TB upon request. Each active sponsored project also receives 2TB of storage that is shared by the project team. However, simply put, Google storage is currently more user-friendly than OIT Research Storage. Google storage is easier to access, and file-sharing with external users is more straightforward.
As we consider quotas on Google storage (see Recommendation 3), the university does not want to encourage users to store research data on personal external hard drives, servers, or computers that are not secured or backed up. Consequently, we recommend implementing a series of projects to make OIT Research Storage easier to access and use. Initial projects could include developing a user-friendly GUI for requesting OIT Research Storage, creating simple scripts to mount OIT Research Storage, automating data moves from Google storage to OIT Research Storage, improving documentation, and pursuing other short-term projects that make the currently available Globus tools (a file transfer and sharing tool) simpler to use for data transfer. We also need to develop an archival storage solution to address the demand for storing long-term, infrequently used data. Improvements in research storage will be guided by the implementation steering team.
Recommendation 2. Office of Information Technology (OIT) and College IT offices should work individually with the heaviest users of Google storage to develop plans to optimize their storage use.
We recommend that OIT and the College IT offices immediately start working one-on-one with the heaviest users of Google storage to develop plans to migrate their data to appropriate storage. OIT and the College IT offices would work together to select accounts for a pilot, then use the information and experience gained from working with these heavy users to plan storage usage and the transition for the broader campus community.
Recommendation 3. Implement storage quotas for all faculty, staff, and students by August 1, 2023.
Initial quotas will be set so that they do not affect any user or account as we work through the process to right-size storage. Current data shows that 278 individuals and 86 shared drives are using more than 1TB of Google storage. If each of these can reduce usage to 1TB or less, this will free approximately 1.2PB. We recommend that OIT and the College IT offices provide individual support for anyone with over 10TB of data to develop tailored solutions. Additional efforts to manage our Google storage will be guided by an implementation steering team (see Recommendation 5).
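To illustrate how the freed-storage estimate and the 10TB support threshold could be computed from a usage report, here is a minimal sketch. The account names and usage values are invented for illustration only and are not NC State data; the function names are hypothetical, and the 1TB and 10TB thresholds come from the paragraph above.

```python
QUOTA_TB = 1.0                # target quota discussed above
INDIVIDUAL_SUPPORT_TB = 10.0  # threshold for one-on-one support


def freed_tb(usage_tb, quota_tb=QUOTA_TB):
    """Total storage freed if every account above the quota drops to it."""
    return sum(max(used - quota_tb, 0.0) for used in usage_tb.values())


def needs_individual_support(usage_tb, threshold_tb=INDIVIDUAL_SUPPORT_TB):
    """Accounts whose usage exceeds the individual-support threshold."""
    return sorted(name for name, used in usage_tb.items() if used > threshold_tb)


# Hypothetical usage report in TB (illustrative values, not real data).
sample_usage_tb = {
    "account_a": 0.4,
    "account_b": 1.0,
    "account_c": 3.5,
    "account_d": 12.0,
}

if __name__ == "__main__":
    print("Freed if all reduce to quota (TB):", freed_tb(sample_usage_tb))
    print("Needs individual support:", needs_individual_support(sample_usage_tb))
```

Run against a real usage export, the same two calculations would reproduce the approximate 1.2PB estimate and identify the accounts that warrant tailored solutions.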
Recommendation 4. Develop a communications plan, supported by documentation and tools, to support reduction of Google storage use.
The implementation of Google storage quotas and increased use of OIT Research Storage will need to be supported by both a detailed communications plan and an ongoing education program. Faculty, staff, and students need authoritative guidance on how and where to store their data, and we recommend the development of a straightforward tool to help them choose among storage options. Simple tools, such as those provided by Cornell and Michigan, could be developed for OIT’s website. Onboarding and orientations should cover data storage at a high level so that new faculty, staff, and students understand what is available.
Recommendation 5. Develop a multi-year implementation plan for the broader campus community to reduce its usage of Google storage and increase its use of OIT Research Storage; pilot the plan over the summer and communicate it at the beginning of the Fall 2023 semester.
The initial quotas described in Recommendation 3 will not solve the Google storage problem. We recommend that an implementation steering team be established immediately to develop a multi-year transition plan. There are several specific actions that need to be planned, scheduled, communicated, and implemented.
Deleting stored data for unused accounts, including alumni accounts, will free up significant storage. Combining these deletions with the quotas in Recommendation 3 will leave approximately 3PB of Google storage in active use. To continue reducing storage toward the ~2PB goal, additional actions need to be planned, scheduled, communicated, and implemented:
- Continue stepping down quotas for faculty, staff, and students over a multi-year period. Implement a robust monitoring process to determine new quotas, and announce them sufficiently in advance of implementation.
- Quotas on individual and shared drives should be set to encourage shared drive usage. Shared drives are important to enterprise and research continuity as they allow important documents and data to persist after employees and students leave NC State.
- Implement a robust exception process for researchers who need collaboration space while a long-term shared research storage solution is developed.
Recommendation 6. Make short-term purchases of storage from Google to support the implementation timeline.
As quotas are implemented and the campus moves data out of Google storage, there will be a 1-3 year window where NC State needs to purchase additional Google storage. We anticipate that the storage needs will decrease each year, requiring 2PB the first year ($288K) and 1PB for years 2 and 3 ($144K/year).
Recommendation 7. Develop a long-term storage strategy for NC State.
We recommend that the implementation steering team begin developing recommendations for a long-term storage strategy for NC State. Issues that should be considered include:
- Developing a cold data storage service and policies for its use, and ensuring that easy-to-use, automated tools are available for moving data between active/hot storage platforms, including OIT Research Storage, and archive/cold storage systems
- Clarifying and documenting policies for purchasing storage using ledger 5 funds
- Clarifying and documenting policies for use of cloud storage
- Reviewing, documenting, and improving how faculty can share data externally (currently using Globus and Google)
- Providing input to a training team for creating and providing ongoing education around storage solutions for faculty, staff, and students
- Determining what non-OIT user storage solutions exist, advising on how to ensure those services are appropriately managed and configured, and educating users about these options
- Developing options for database storage, which is not currently available via OIT Research Storage
- Developing options for storing secure non-CUI data (e.g., HIPAA identifiable datasets)
Implementation Costs. Implementation costs include the short-term purchase of Google storage (Recommendation 6), at a cost of $288K in FY24 and $144K in each of FY25 and FY26. Additional costs include $50K/year for five years, starting in FY24, to purchase additional OIT Research Storage. The effort will also need one new FTE of research storage development staff for implementation and for support of storage maintenance, management, and transition activities.
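The totals implied by these figures can be tallied as follows. This is a rough sketch using only the dollar amounts and quantities stated in Recommendation 6 and the paragraph above; the variable names are hypothetical, and the cost of the new FTE is excluded because the document does not state it.

```python
PRICE_PER_PB_K = 144  # $K per PB per year (from the document)

# Short-term Google storage purchases in PB, by fiscal year (Recommendation 6).
google_purchases_pb = {"FY24": 2, "FY25": 1, "FY26": 1}

# Additional OIT Research Storage: $50K/year for five years starting in FY24.
RESEARCH_STORAGE_K_PER_YEAR = 50
RESEARCH_STORAGE_YEARS = 5

# Per-year Google cost in $K.
google_cost_k = {fy: pb * PRICE_PER_PB_K for fy, pb in google_purchases_pb.items()}

google_total_k = sum(google_cost_k.values())
research_total_k = RESEARCH_STORAGE_K_PER_YEAR * RESEARCH_STORAGE_YEARS

# The FTE cost is NOT included; the document gives no figure for it.
total_excluding_fte_k = google_total_k + research_total_k

if __name__ == "__main__":
    print("Google storage by year ($K):", google_cost_k)
    print("Google storage total ($K):", google_total_k)
    print("OIT Research Storage total ($K):", research_total_k)
    print("Total excluding FTE ($K):", total_excluding_fte_k)
```

On these numbers, the Google purchases sum to $576K over three years and the OIT Research Storage purchases to $250K over five years, for $826K before staffing costs.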
Committee Members
- Thomas Birkland tabirkla@ncsu.edu, College of Humanities and Social Sciences
- Shawn Dunning sgdunnin@ncsu.edu, College of Engineering
- Paul Huffman prhuffma@ncsu.edu, Department of Physics
- Susan Ivey slivey@ncsu.edu, University Libraries
- Cristina Lanzas clanzas@ncsu.edu, Department of Population Health and Pathobiology
- Maggie Merry mamerry@ncsu.edu, Poole College of Management
- Chris Reberg-Horton screberg@ncsu.edu, Department of Crop and Soil Sciences
- Eric Sills edsills@ncsu.edu, Office of Information Technology
- Jeff Webster jsw@ncsu.edu, DELTA
- Alyson Wilson agwilso2@ncsu.edu, Office of Research and Innovation