It's unclear when plain old “data” became “big data,” but the latter term probably originated in 1990s Silicon Valley pitch meetings and lunch rooms. What's easier to pinpoint is how data has exploded in the 21st century.
The total amount of data recorded until 2003 was five exabytes, or five quintillion bytes. (A quintillion is a million, cubed.) In 2011 alone, recorded data weighed in at 1.8 zettabytes, or 1,800 exabytes: 360 times more. By 2020, according to one estimate, the average person will produce 1.5 GB of data per day. Multiply that by 365 days and then again by a good chunk of the world's 7.5 billion-person population, and the volume is almost unfathomably immense.
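For a rough sense of scale, the multiplication above can be sketched in a few lines of Python. This is only a back-of-the-envelope estimate: it assumes decimal units (1 GB = 10^9 bytes, 1 zettabyte = 10^21 bytes), and the `share_of_population` parameter is a hypothetical knob standing in for "a good chunk" of the world's population.

```python
# Back-of-the-envelope estimate of yearly data volume.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 ZB = 10**21 bytes).
BYTES_PER_GB = 10**9
BYTES_PER_ZB = 10**21


def yearly_volume_zb(gb_per_person_per_day=1.5,
                     population=7.5e9,
                     share_of_population=1.0,
                     days=365):
    """Return the estimated yearly data volume in zettabytes."""
    total_bytes = (gb_per_person_per_day * BYTES_PER_GB
                   * days * population * share_of_population)
    return total_bytes / BYTES_PER_ZB


# Upper bound: every one of the 7.5 billion people produces the average amount.
upper_bound_zb = yearly_volume_zb()

# Hypothetical scenario: only half the population produces data at that rate.
half_population_zb = yearly_volume_zb(share_of_population=0.5)

print(f"Upper bound: {upper_bound_zb:.2f} ZB per year")
print(f"Half the population: {half_population_zb:.2f} ZB per year")
```

Even the half-population scenario comes out larger than the 1.8 zettabytes recorded in all of 2011.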
Big Data Analytics Platforms To Know
- Microsoft Azure
Because the persistent gush of data from numerous sources is only growing more intense, lots of sophisticated and highly scalable big data analytics platforms, many of them cloud-based, have popped up to parse the ever-expanding mass of information.
We’ve rounded up the 31 big data platforms that make petabytes of data feel manageable.
Location: Redwood City, Calif.
What it does: The cloud-native Sumo Logic platform offers apps, including Airbnb and Pokemon GO, three different types of support: it troubleshoots, tracks business analytics and catches security breaches, drawing on machine learning for maximum efficiency. It’s also flexible and able to manage sudden influxes of data.
What it does: Users can analyze data stored on Microsoft’s cloud platform, Azure, with a broad spectrum of open-source Apache technologies, including Hadoop and Spark. Azure also features a native analytics tool, HDInsight, that streamlines data cluster analysis and integrates seamlessly with Azure's other data tools.
Location: Palo Alto, Calif.
What it does: Rooted in Apache’s Hadoop, Cloudera can handle massive amounts of data. Clients routinely store more than 50 petabytes in Cloudera’s Data Warehouse, which can manage data including machine logs, text, and more. Meanwhile, Cloudera’s DataFlow—previously Hortonworks’ DataFlow—analyzes and prioritizes data in real time.
SHYFT Data and Analytics Platform
What it does: SHYFT designed its data analytics platform for the life sciences industries. To protect patient privacy, the HIPAA- and PII-compliant tool automatically finds, imports and runs analytics on hundreds of data streams, blending them into a cohesive whole. The platform’s quick and slick data visualizations help users uncover unexpected correlations between datasets.
Location: Mountain View, Calif.
What it does: Google Cloud offers lots of big data management tools, each with its own specialty. BigQuery warehouses petabytes of data in an easily queried format. Cloud Dataflow analyzes ongoing data streams and batches of historical data side by side. With Google Data Studio, clients can turn varied data into custom graphics.
What it does: Sisense’s data analytics platform processes data swiftly thanks to its signature In-Chip Technology. The interface also lets clients build, use and embed custom dashboards and analytics apps. After a recent merger, Sisense is poised to combine its platform with Periscope Data’s. The merger will allow users to simultaneously comb data repositories with SQL, Python and R.
What it does: Designed to accommodate the needs of banking, healthcare and other data-heavy fields, Collibra lets employees companywide find quality, relevant data. The versatile platform features semantic search, which can find more relevant results by unraveling contextual meanings and pronoun referents in search phrases.
Company location: San Francisco
What the platform does: Talend’s trio of big data integration platforms includes a free basic platform and two paid subscription platforms, all rooted in open-source tools like Apache Spark. The paid platforms, though—one designed for existing data, the other for real-time data streams—come with more power and tech support. Both can clean and parse data, delete duplicate data and detect fraud automatically, among other functions.
Company location: Austin, Texas
What the platform does: The Tableau platform, available on-premises or in the cloud, allows users to find correlations, trends and unexpected interdependencies between data sets. The Data Management add-on further enhances the platform, allowing for more granular data cataloging and the tracking of data lineage.
Company location: Santa Clara, Calif.
What the platform does: MapR’s platform, which the company terms "dataware," has attracted customers like American Express and Samsung with its massive capacity (exabytes!) and robust security measures. But it's not a platform so much as a meta-platform: a dashboard for managing big data spread across various platforms, clouds, servers and edge-computing devices. Its interface offers users a 10,000-foot perspective on the totality of their data while letting them manage various data types in one place.
Qualtrics Experience Management
What it does: Qualtrics’ platform lets companies assess the four key experiences that define their brand: customer experience; employee experience; product experience; and the brand experience, defined by marketing and brand awareness. Its analytics tools turn data on employee satisfaction, marketing campaign impact and more into actionable predictions rooted in machine learning and AI.
What it does: This scalable cloud-based big data platform compiles and unifies data for giant enterprises, including Bank of America and Coca-Cola. Along the way, it can pull in relevant third-party data — like conversion rates and buyer behavior intel — from 1010Reveal. The searchable platform efficiently processes multiple complex queries at once.
Location: San Diego, Calif.
What it does: Teradata’s Vantage analytics software works with various public cloud services, but users can also combine it with Teradata Cloud storage. This all-Teradata experience maximizes synergy between cloud hardware and Vantage’s machine learning and NewSQL engine capabilities. Teradata Cloud users also enjoy special perks—new Vantage features, for instance, are available on Teradata’s cloud before they're available to users of other cloud services.
Company location: Westminster, Colo.
What the platform does: Oracle Cloud’s big data platform can automatically migrate diverse data formats to cloud servers, purportedly with no downtime. The platform can also operate on-premises and in hybrid settings, enriching and transforming data whether it’s streaming in real time or stored in a centralized repository, a.k.a. a "data lake." The platform comes in three formats, including basic and governance editions.
Company location: American Fork, Utah
What the platform does: Domo’s big data platform draws on clients’ full data portfolios to offer industry-specific findings and AI-based predictions. Even when relevant data sprawls across multiple cloud servers and hard drives, Domo clients can gather it all in one place with Magic ETL, a drag-and-drop tool that streamlines the integration process.
What it does: MongoDB doesn’t force data into spreadsheets. Instead, its cloud-based platforms store data as flexible JSON documents: in other words, as digital objects that can be arranged in a variety of ways, even nested inside each other. Designed for app developers, the platforms offer of-the-moment search functionality. For example, users can search their data for geotags and graphs as well as text phrases.
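To make the idea of flexible, nested JSON documents concrete, here is a minimal sketch in plain Python that uses only the standard `json` module. It does not use MongoDB's own API; the document's field names and the `get_path` helper are hypothetical, chosen purely for illustration.

```python
import json

# A hypothetical document. Fields can nest inside each other, and two items
# in the same list do not have to share the same set of fields.
order = {
    "customer": {
        "name": "Ada",
        "address": {"city": "Chicago", "geo": [-87.63, 41.88]},
    },
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1, "gift_note": "Happy birthday"},
    ],
}


def get_path(document, dotted_path):
    """Follow a dotted path such as 'customer.address.city' into nested dicts."""
    value = document
    for key in dotted_path.split("."):
        value = value[key]
    return value


# Round-trip through JSON text, as a document store would on write and read.
encoded = json.dumps(order)
decoded = json.loads(encoded)

city = get_path(decoded, "customer.address.city")
total_qty = sum(item["qty"] for item in decoded["items"])
print(city, total_qty)
```

Unlike a spreadsheet row, the second item carries an extra `gift_note` field without requiring a new column for every other item.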
What it does: Civis Analytics’ cloud-based platform offers end-to-end data services, from data ingestion to modeling and reports. Designed with data scientists in mind, the platform integrates with GitHub to ease user collaboration and is purportedly ultra-secure—both HIPAA-compliant and SOC 2 Type II-certified.
Company location: Broomfield, Colo.
What the platform does: Alteryx’s designers built the company’s eponymous platform with simplicity and interdepartmental collaboration in mind. Its four interlocking tools allow users to create repeatable data workflows, stripping busywork from the data prep and analysis process, and deploy R and Python code within the platform for quicker predictive analytics.
Zeta Interactive’s Marketing Platform
What it does: Designed for marketers, this platform from Zeta Interactive pulls data from three different clouds onto one dashboard. (One cloud is devoted to marketing, another to customer experience and a third to in-depth customer data culled from millions of user profiles with permission.) The platform’s AI features sift through the diverse data, helping marketers target key demographics and attract new customers.
Hewlett Packard Enterprise’s Vertica
Location: Palo Alto, Calif.
What it does: This software-only SQL data warehouse is storage system-agnostic. That means it can analyze data from cloud services, on-premises servers and any other data storage space. Vertica works quickly thanks to columnar storage, which lets a query scan only the columns it needs rather than every field of every record. Its latest version offers predictive analytics rooted in machine learning for industries that include finance and marketing.
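The benefit of columnar storage can be illustrated with a small Python sketch. This is a toy model of the general technique, not Vertica's actual implementation; the sample records and the `to_columns` helper are hypothetical.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "east", "revenue": 120.0},
    {"id": 2, "region": "west", "revenue": 80.0},
    {"id": 3, "region": "east", "revenue": 100.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one field sit together.
columns = to_columns(rows)

# In the row layout, totaling revenue means visiting every record,
# including the "id" and "region" fields the query never uses.
row_total = sum(record["revenue"] for record in rows)

# In the column layout, the same query reads a single list.
column_total = sum(columns["revenue"])

print(row_total, column_total)
```

Both layouts return the same answer; the difference is how much unrelated data a query has to pass over to get it, which matters once tables hold billions of records.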
Arm Treasure Data
Location: Mountain View, Calif.
What it does: Treasure Data’s customer data platform sorts morasses of web, mobile and IoT data into rich, individualized customer profiles so marketers can communicate with their desired demographics in a more tailored and personalized way.
Amazon Web Services
What it does: Best known as AWS, Amazon’s cloud-based platform comes with 11 analytics tools that are designed for everything from data prep and warehousing to SQL queries and data lake design. All the resources scale with your data as it grows in a secure cloud-based environment. Features include customizable encryption and the option of a virtual private cloud.
Location: San Francisco
What it does: Actian’s cloud-native data warehouse, which debuted in March 2019, was built for near-instantaneous results, even if users run multiple queries at once. Backed by support from Microsoft and Amazon’s public clouds, it can analyze data in public and private clouds. For easy app use, the platform comes with ready-made connections to Salesforce, Workday and others.
Location: San Francisco
What it does: Born out of the open-source Greenplum Database project, this platform uses PostgreSQL to conquer varied data analysis and operations projects, from quests for business intelligence to deep learning. Pivotal Greenplum can parse data housed in clouds and servers, as well as container orchestration systems. Additionally, it comes with a built-in toolkit of extensions for location-based analysis, document extraction and multi-node analysis.
Hitachi Vantara’s Pentaho
Location: Orlando, Fla.
What it does: This platform streamlines the data ingestion process by forgoing hand coding and offering time-saving functions like drag-and-drop integration, pre-made data transformation templates and metadata injection. Once users add data, the platform can mine business intelligence from any data format thanks to its data-agnostic design.
Location: Nuremberg, Germany
What it does: This intelligent, in-memory analytics database was designed for speed, especially on clustered systems. It can analyze all types of data, including sensor, online transaction and location data, via massively parallel processing. The cloud-first platform also analyzes data stored in appliances and can function purely as software.
Location: Armonk, N.Y.
What it does: IBM’s full-stack cloud comes with 170 built-in tools, including more than 20 for customizable big data management. Users can opt for a NoSQL or SQL database, or store their data as JSON documents, among other database designs. The platform can also run in-memory analysis and integrate open-source tools like Apache Spark.
Location: San Carlos, Calif.
What it does: Users can import data into MarkLogic’s platform as is. Items ranging from images and videos to JSON and RDF files coexist peaceably in the flexible database, uploaded via a simple drag-and-drop process powered by Apache NiFi. Organized around MarkLogic’s Universal Index, files and metadata are easily queried. The database also integrates with a host of more intensive analytics apps.
Location: San Francisco, Calif.
What it does: Though it’s possible to code within Datameer’s platform, it’s not strictly necessary. Users can upload structured and unstructured data directly from more than 70 data sources by following a simple wizard. From there, the point-and-click data cleansing and built-in library of more than 270 functions, like chronological organization and custom binning, make it easy to drill into data even if users don't have a computer science background.
Location: Palo Alto, Calif.
What it does: Designed for time-series data pulled from the likes of CollectD, JMX and Amazon Web Services, this platform specializes in spotting trends — and, more important, deviations from them. The latter capacity means that when something suspicious happens, users can send and receive intelligent alerts, activated by multi-dimensional criteria rather than simplistic thresholds.
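The difference between a simplistic threshold and a deviation-from-trend alert can be sketched in Python with the standard `statistics` module. This is a one-dimensional illustration of the general idea only, not the platform's actual alerting logic; the sample latency values, the window size and the cutoff are all hypothetical.

```python
from statistics import mean, stdev


def static_alerts(series, threshold):
    """Return indices where a value exceeds a fixed threshold."""
    return [i for i, value in enumerate(series) if value > threshold]


def deviation_alerts(series, window=5, cutoff=3.0):
    """Return indices where a value sits more than `cutoff` standard
    deviations away from the mean of the preceding `window` points."""
    alerts = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        spread = stdev(recent)
        if spread == 0:
            continue  # flat history: skip rather than divide by zero
        if abs(series[i] - mean(recent)) / spread > cutoff:
            alerts.append(i)
    return alerts


# Hypothetical latency readings with one spike that stays under the fixed limit.
latency_ms = [100, 102, 98, 101, 99, 100, 150, 101, 99, 100]

fixed = static_alerts(latency_ms, threshold=200)
adaptive = deviation_alerts(latency_ms)
print(fixed, adaptive)
```

The fixed threshold of 200 never fires, while the deviation check flags the jump to 150 because it breaks sharply from the recent trend.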
Location: Hangzhou, Zhejiang, China
What it does: The leading public cloud provider in China, Alibaba operates in 19 regions worldwide, including the U.S. Its popular cloud platform offers a variety of database formats and big data tools, including data warehousing, analytics for streaming data and speedy Elasticsearch, which can scan petabytes of data scattered across hundreds of servers in real time.