What Is a Data Platform? 28 Examples in Big Data You Should Know.
It’s unclear when plain old “data” became “big data,” but the latter term probably originated in 1990s Silicon Valley pitch meetings and lunch rooms. What’s easier to pinpoint is how data has exploded in the 21st century.
The total amount of data recorded until 2003 was five exabytes, or five quintillion bytes. (A quintillion is a million, cubed.) In 2011 alone, recorded data weighed in at 1.8 zettabytes, or 360 times as much. By 2025, according to one estimate, humans will produce 463 exabytes of data per day. The volume is almost unfathomably immense.
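The figures above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where an exabyte is 10^18 bytes and a zettabyte is 10^21 bytes; the variable names are illustrative only.

```python
# Decimal (SI) byte units; binary units (EiB, ZiB) would give slightly different numbers.
EXABYTE = 10**18    # one quintillion bytes, i.e. a million cubed
ZETTABYTE = 10**21  # one thousand exabytes

data_until_2003 = 5 * EXABYTE
data_in_2011 = 18 * ZETTABYTE // 10      # 1.8 ZB, kept as an exact integer
daily_2025_estimate = 463 * EXABYTE      # the per-day estimate quoted in the text

# How many times larger 2011's output was than everything recorded until 2003.
ratio_2011_vs_2003 = data_in_2011 // data_until_2003

# How many days at the estimated 2025 rate it would take to match all of 2011.
days_to_match_2011 = data_in_2011 / daily_2025_estimate

print(f"Recorded until 2003: {data_until_2003:,} bytes")
print(f"2011 vs. everything until 2003: {ratio_2011_vs_2003}x")
print(f"Days at the 2025 rate to match 2011: {days_to_match_2011:.1f}")
```

Under these assumptions, the 2011 total is 360 times the pre-2003 total, and the projected 2025 rate would match all of 2011 in about four days.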
Hal Koss contributed reporting to this story.
What Is a Data Platform?
Because the persistent gush of data from numerous sources is only growing more intense, lots of sophisticated and highly scalable big data platforms, many of which are cloud-based, have popped up to store and parse the ever-expanding mass of information.
Companies increasingly rely on these platforms to collect loads of data and turn them into organized, actionable business insights. This helps firms get a better view of their customers, target audiences, discover new markets and make predictions about what to do next.
Data Platform Examples
We’ve rounded up the 28 big data platforms that make petabytes of data feel manageable.
Google Cloud offers lots of big data management tools, each with its own specialty. BigQuery warehouses petabytes of data in an easily queried format. Dataflow analyzes ongoing data streams and batches of historical data side by side. With Google Data Studio, clients can turn varied data into custom graphics.
Users can analyze data stored on Microsoft’s cloud platform, Azure, with a broad spectrum of open-source Apache technologies, including Hadoop and Spark. Azure also features a native analytics tool, HDInsight, that streamlines data cluster analysis and integrates seamlessly with Azure’s other data tools.
Amazon Web Services
Better known as AWS, Amazon’s cloud-based platform comes with analytics tools that are designed for everything from data prep and warehousing to SQL queries and data lake design. All the resources scale with your data as it grows in a secure cloud-based environment. Features include customizable encryption and the option of a virtual private cloud.
Snowflake is a data warehouse used for storage, processing and analysis. It runs entirely on public cloud infrastructure (Amazon Web Services, Google Cloud Platform and Microsoft Azure) and pairs that infrastructure with a new SQL query engine. Built like a SaaS product, it has an architecture that is deployed and managed entirely in the cloud.
Rooted in Apache’s Hadoop, Cloudera can handle massive amounts of data. Clients routinely store more than 50 petabytes in Cloudera’s Data Warehouse, which can manage data including machine logs, text, and more. Meanwhile, Cloudera’s DataFlow — previously Hortonworks’ DataFlow — analyzes and prioritizes data in real time.
The cloud-native Sumo Logic platform offers apps — including Airbnb and Pokémon GO — three different types of support. It troubleshoots, tracks business analytics and catches security breaches, drawing on machine learning for maximum efficiency. It’s also flexible and able to manage sudden influxes of data.
Sisense’s data analytics platform processes data swiftly thanks to its signature In-Chip Technology. The interface also lets clients build, use and embed custom dashboards and analytics apps. And with its AI technology and built-in machine learning models, Sisense enables clients to identify future business opportunities.
The Tableau platform — available on-premises or in the cloud — allows users to find correlations, trends and unexpected interdependences between data sets. The Data Management add-on further enhances the platform, allowing for more granular data cataloging and the tracking of data lineage.
Designed to accommodate the needs of banking, healthcare and other data-heavy fields, Collibra lets employees companywide find quality, relevant data. The versatile platform features semantic search, which can find more relevant results by unraveling contextual meanings and pronoun referents in search phrases.
Talend’s data replication product, Stitch, allows clients to quickly load data from hundreds of sources into a data warehouse, where it’s structured and ready for analysis. And Data Fabric, Talend’s unified data integration solution, combines data integration with data governance and integrity, and offers application and API integration.
Qualtrics Experience Management
Qualtrics’ experience management platform lets companies assess the key experiences that define their brand: customer experience; employee experience; product experience; design experience; and the brand experience, defined by marketing and brand awareness. Its analytics tools turn data on employee satisfaction, marketing campaign impact and more into actionable predictions rooted in machine learning and AI.
Teradata’s Vantage analytics software works with various public cloud services, but users can also combine it with Teradata Cloud storage. This all-Teradata experience maximizes synergy between cloud hardware and Vantage’s machine learning and NewSQL engine capabilities. Teradata Cloud users also enjoy special perks, like flexible pricing.
Oracle Cloud’s big data platform can automatically migrate diverse data formats to cloud servers, purportedly with no downtime. The platform can also operate on-premises and in hybrid settings, enriching and transforming data whether it’s streaming in real time or stored in a centralized repository, also known as a data lake. A free tier of the platform is available as well.
Domo’s big data platform draws on clients’ full data portfolios to offer industry-specific findings and AI-based predictions. Even when relevant data sprawls across multiple cloud servers and hard drives, Domo clients can gather it all in one place with Magic ETL, a drag-and-drop tool that streamlines the integration process.
MongoDB doesn’t force data into spreadsheets. Instead, its cloud-based platforms store data as flexible JSON documents — in other words, as digital objects that can be arranged in a variety of ways, even nested inside each other. Designed for app developers, the platforms offer of-the-moment search functionality. For example, users can search their data for geotags and graphs as well as text phrases.
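To make the idea of flexible, nested documents concrete, here is a minimal sketch using only Python’s standard library rather than MongoDB’s own drivers. The customer record, its field names and the `get_path` helper are all hypothetical, invented for illustration.

```python
import json

# Hypothetical customer record; every field name here is made up for illustration.
customer = {
    "name": "Ada",
    "tags": ["beta", "newsletter"],
    # A document nested inside the customer document, with another nested inside it.
    "address": {"city": "Chicago", "geo": {"lat": 41.88, "lon": -87.63}},
    # A list of embedded documents; a spreadsheet would need a second sheet for these.
    "orders": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}


def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.geo.lat' through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        current = current[key]
    return current


latitude = get_path(customer, "address.geo.lat")
total_qty = sum(order["qty"] for order in customer["orders"])

# The whole record serializes to JSON text and back without losing its shape.
round_tripped = json.loads(json.dumps(customer))

print(latitude, total_qty, round_tripped == customer)
```

The point of the sketch is only that one record can carry lists and sub-records of varying shape; a real document database adds indexing, querying and storage on top of that model.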
Civis Analytics’ cloud-based platform offers end-to-end data services, from data ingestion to modeling and reports. Designed with data scientists in mind, the platform integrates with GitHub to ease user collaboration and is purportedly ultra-secure — both HIPAA-compliant and SOC 2 Type II-certified.
Alteryx’s designers built the company’s eponymous platform with simplicity and interdepartmental collaboration in mind. Its interlocking tools allow users to create repeatable data workflows — stripping busywork from the data prep and analysis process — and deploy R and Python code within the platform for quicker predictive analytics.
Zeta Interactive’s Marketing Platform
This platform from Zeta Interactive uses its database of billions of permission-based profiles to help users optimize their omnichannel marketing efforts. The platform’s AI features sift through the diverse data, helping marketers target key demographics and attract new customers.
This software-only SQL data warehouse is storage system-agnostic. That means it can analyze data from cloud services, on-premises servers and any other data storage space. Vertica works quickly thanks to columnar storage, which facilitates the scanning of only relevant data. It offers predictive analytics rooted in machine learning for industries that include finance and marketing.
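The benefit of columnar storage can be illustrated with a toy comparison. This is a simplified sketch of the general technique, not Vertica’s actual implementation; the sample table and the `to_columns` helper are hypothetical.

```python
# Row-oriented layout: all fields of a record are stored together.
rows = [
    {"id": 1, "region": "east", "revenue": 120},
    {"id": 2, "region": "west", "revenue": 80},
    {"id": 3, "region": "east", "revenue": 200},
    {"id": 4, "region": "north", "revenue": 50},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row scan: a row store loads whole records, modeled here by counting every field.
row_total = sum(row["revenue"] for row in rows)
row_values_read = sum(len(row) for row in rows)

# Column scan: only the single column the query needs is read.
column_total = sum(columns["revenue"])
column_values_read = len(columns["revenue"])

print(f"row layout:    total={row_total}, values read={row_values_read}")
print(f"column layout: total={column_total}, values read={column_values_read}")
```

Both layouts produce the same total, but in this model the columnar scan reads a third as many values because it skips the `id` and `region` fields entirely.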
Treasure Data’s customer data platform sorts morasses of web, mobile and IoT data into rich, individualized customer profiles so marketers can communicate with their desired demographics in a more tailored and personalized way.
Actian’s cloud-native data warehouse, which debuted in March 2019, was built for near-instantaneous results, even if users run multiple queries at once. Backed by support from Microsoft and Amazon’s public clouds, it can analyze data in public and private clouds. For easy app use, the platform comes with ready-made connections to Salesforce, Workday and others.
Born out of the open-source Greenplum Database project, this platform uses PostgreSQL to conquer varied data analysis and operations projects, from quests for business intelligence to deep learning. Greenplum can parse data housed in clouds and servers, as well as container orchestration systems. Additionally, it comes with a built-in toolkit of extensions for location-based analysis, document extraction and multi-node analysis.
Hitachi Vantara’s Pentaho
This data integration and analytics platform streamlines the data ingestion process by forgoing hand coding and offering time-saving functions like drag-and-drop integration, pre-made data transformation templates and metadata injection. Once users add data, the platform can mine business intelligence from any data format thanks to its data-agnostic design.
This intelligent, in-memory analytics database was designed for speed, especially on clustered systems. It can analyze all types of data — including sensor, online transaction, location and more — via massive parallel processing. The cloud-first platform also analyzes data stored in appliances and can function purely as software.
IBM’s full-stack cloud platform comes with 170 built-in tools, including many for customizable big data management. Users can opt for a NoSQL or SQL database, or store their data as JSON documents, among other database designs. The platform can also run in-memory analysis and integrate open-source tools like Apache Spark.
Users can import data into MarkLogic’s platform as is. Items ranging from images and videos to JSON and RDF files coexist peaceably in the flexible database, uploaded via a simple drag-and-drop process powered by Apache NiFi. Organized around MarkLogic’s Universal Index, files and metadata are easily queried. The database also integrates with a host of more intensive analytics apps.
Though it’s possible to code within Datameer’s platform, it’s not necessary. Users can upload structured and unstructured data directly from many data sources by following a simple wizard. From there, point-and-click data cleansing and a built-in library of more than 270 functions, like chronological organization and custom binning, make it easy to drill into data even if users don’t have a computer science background.
The largest public cloud provider in China, Alibaba operates in 24 regions worldwide, including the United States. Its popular cloud platform offers a variety of database formats and big data tools, including data warehousing, analytics for streaming data and speedy Elasticsearch, which can scan petabytes of data scattered across hundreds of servers in real time.