If there’s one constant in a good economy or bad, uncertain times or stable, it’s our reliance on mostly unstructured data and the analysis we derive from massive data collection. Unstructured data are the documents, images, audio and video files, sensor data and research data that run companies today.
Think surveillance and bodycam video plus rapid DNA analysis to solve crimes faster, supply chain analysis to forecast availability of core products and services, sensor-driven analysis of soil and weather conditions to improve crop yields or customer support call analysis to improve products and experiences.
And now there’s generative AI, with its long list of potential societal benefits and risks. The information technology professionals who manage both the data and the technologies that store, protect and deliver it to users and applications are the key players in the data economy. In fact, preparing for AI is the leading data storage priority, followed by cloud cost optimization, according to the Komprise “2023 State of Unstructured Data Management” survey.
As we enter 2024, organizations will need to innovate and work smarter with AI while navigating constant cost constraints. Data storage and backups comprise at least 30 percent of the IT budget. Our predictions below center on the data management component of optimizing AI and cloud technologies. Getting this right, with generative AI unleashing a new era of end-user productivity and technical proficiency, has long-term implications.
How will the way we handle unstructured data evolve?
- AI data governance will require a multi-layered approach.
- FinOps expertise will be the key to successful cloud migrations.
- Cross-silo skills will be required for storage IT careers.
- A top priority will be preparing unstructured data for AI.
1. Multi-Layered Approach to AI Data Governance
The Komprise survey of IT decision-makers found that enterprises are restricting the tools and/or data that employees are allowed to use. This is an important first step, but AI data governance requires a strategic program.
Generative AI has created a multitude of risks, from privacy and security to data leakage, transparency, accuracy, ethics and more. Rather than relying on one system to manage these different issues, IT will need to deploy layers of AI security tools, starting at the network level to prevent AI tools from accessing blocked data and to stop users from sending corporate data to unauthorized AI services.
A second level of protection sits at the data level, auditing which data was moved where, when and by whom, and alerting if personally identifiable information (PII) or other sensitive data is being shared. Finally, a security mechanism could exist at the user layer to warn users when they are engineering prompts with corporate or sensitive data, or to provide feedback when prompts may be giving away too much corporate context. Visibility into unstructured data assets across hybrid cloud storage is fundamental to protecting data and monitoring generative AI projects.
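The user-layer check described above can be sketched as a simple prompt scanner that flags likely PII before a prompt leaves the organization. This is a minimal illustration, not any vendor's product: the regex patterns and categories below are illustrative assumptions, and a real deployment would use a proper data classification service.

```python
import re

# Illustrative PII patterns -- assumptions for this sketch only.
# Production systems would use a dedicated classification service.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def scan_prompt(prompt):
    """Return the PII categories detected in a prompt."""
    return [name for name, pattern in PII_PATTERNS.items()
            if pattern.search(prompt)]

def check_prompt(prompt):
    """Warn the user and block the prompt if sensitive data is found,
    before it is sent to an external generative AI service."""
    hits = scan_prompt(prompt)
    if hits:
        print(f"Warning: prompt appears to contain {', '.join(hits)}")
        return False
    return True
```

A network-level control would apply the same kind of policy in a proxy or gateway rather than in the client, so prompts are checked even when users bypass sanctioned tools.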
2. FinOps Expertise for Cloud Migrations
Industry research shows that managing cloud spend is a top enterprise challenge, and many organizations have limited visibility into this spend or how to optimize it. Meanwhile, data volumes continue to grow faster than storage budgets. IT leaders need cost-efficient options for data as it ages, such as cloud object storage.
While 27 percent of enterprises were managing 10PB of data or more in 2022, this year, that segment of heavy data owners has jumped to a remarkable 32 percent, according to the Komprise survey. There is ample waste from over-procurement of storage capacity to avoid any business disruption, underutilization of cloud resources and one-size-fits-all storage strategies.
Incorporating financial operations into daily practice will be a core factor in generating value and return on investment from cloud data migrations. In 2024, IT will need to understand data storage costs and data usage patterns before and after a migration project and communicate these metrics clearly with upper management to create buy-in for the cloud.
Organizations that adopt an analytics-first approach to unstructured data management will avoid cloud waste. They will be able to delete duplicate and orphaned data along with data that is no longer needed before a migration and can right-place data in the appropriate cloud tier. This analysis should include clear distinctions between the many tiers of cloud storage with automated processes to move data as it ages to low-cost storage for maximum cost savings.
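The analytics-first, age-based tiering described above can be sketched as a simple scan: walk a directory tree, bucket files by last-access age, and estimate the monthly saving from moving cold data to a cheaper tier. The per-GB prices and the 90-day cutoff below are illustrative assumptions; real tiering decisions would use actual cloud pricing and organization-specific policies.

```python
import os
import time

# Illustrative per-GB/month prices (assumed, not real quotes):
# e.g. a standard object tier vs. an archive tier.
HOT_PRICE, COLD_PRICE = 0.023, 0.004
COLD_AFTER_DAYS = 90  # assumed cutoff for "cold" data

def tiering_report(root, now=None):
    """Bucket files under `root` by last-access age and estimate the
    monthly saving from tiering cold files to cheaper storage."""
    now = now or time.time()
    hot_bytes = cold_bytes = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish mid-scan
            age_days = (now - st.st_atime) / 86400
            if age_days >= COLD_AFTER_DAYS:
                cold_bytes += st.st_size
            else:
                hot_bytes += st.st_size
    cold_gb = cold_bytes / 1e9
    return {
        "hot_gb": hot_bytes / 1e9,
        "cold_gb": cold_gb,
        "monthly_saving": cold_gb * (HOT_PRICE - COLD_PRICE),
    }
```

Running a report like this before a migration gives the before-and-after cost metrics the FinOps discussion above calls for, and the same scan can drive automated movement of aging data to lower-cost tiers.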
3. Cross-Silo Skills for Storage IT Careers
The term FinOps will be part of the storage architect’s nomenclature in 2024. As storage becomes more software- and services-centric, managing hardware is less of a requirement. Instead, managing vendors and contracts and delivering secure, cost-efficient data services to departments and users will take up the bulk of storage professionals’ time. Enterprises are also moving away from being single-vendor shops, so storage administrators must be able to hop between different technologies rather than specialize in one platform.
This requires broader skills and knowledge in networking, security, cloud architecture, cost modeling and data analytics. Data titles like “data insights engineer” or “data management architect” will replace storage-specific job titles. In mature infrastructure teams, managers responsible for storage will work with data science and AI teams to procure AI-ready infrastructure and devise plans for data classification and data workflows to analytics platforms.
4. Unstructured Data Preparation for AI
With strategies in place for cost optimization and AI data governance, IT organizations are well positioned to focus on leveraging unstructured data for new use cases. Unstructured data contains hidden value for AI.
IT leaders will look for automated ways to analyze unstructured data, index metadata and enrich/classify data using AI and machine learning. This will allow teams to run deep analytics to discover and feed only the right data to AI applications, saving significant manual effort for researchers and data scientists.
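The indexing and classification step above can be sketched as a metadata catalog: walk the files, record basic attributes, and tag each one so downstream AI pipelines can select only the data they need. The tag rules here are illustrative keyword assumptions; a production pipeline would enrich the index with ML-based classifiers as the paragraph describes.

```python
import os

# Illustrative extension-based tag rules (assumed for this sketch);
# real enrichment would use ML classification, not just extensions.
TAG_RULES = {
    "research": {".csv", ".parquet"},
    "media": {".mp4", ".wav", ".jpg"},
    "document": {".pdf", ".docx", ".txt"},
}

def classify(path):
    """Assign a coarse tag based on file extension."""
    ext = os.path.splitext(path)[1].lower()
    for tag, exts in TAG_RULES.items():
        if ext in exts:
            return tag
    return "other"

def build_index(root):
    """Index every file under `root` with size and tag metadata."""
    index = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue
            index.append({"path": path, "bytes": size,
                          "tag": classify(path)})
    return index

def select(index, tag):
    """Return only the files matching a tag, e.g. to feed an AI job."""
    return [entry["path"] for entry in index if entry["tag"] == tag]
```

Querying the index with `select(index, "research")` illustrates the payoff: analysts feed only the right slice of data to an AI application instead of manually sifting the whole estate.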