There is a lot of noise in the industry about Big Data and Big Data analytics. If you are just starting to look at Big Data solutions, it can be difficult to wade through vendor claims, especially as they relate to IT operations and security. In order to understand whether a legacy solution is simply being rebranded or if you’re making an investment in technologies that can truly give you a competitive advantage, you can put them to the REAL Big Data test. Evaluate each solution by these criteria:
Reliable: the entire Big Data framework, from data-in-transit to data-at-rest, must be reliable and robust to prevent data loss. You need to know that you are collecting all the data you send into your Big Data solution in order to trust any of it.
Extensible: the solution must be built on an open architecture so you always have access to your data using industry standard tools, so you can integrate reference data, and so you can easily share data in a controlled manner.
Analytics: whether you need real-time streaming analytics today or will get there eventually, you want a solution that provides statistical modeling, data visualizations, and “data replays” for model testing – all as data arrives and implemented using industry standards.
Limitless: your long-term data needs are hard to predict, but you know they will “grow quickly”, which means your solution should be able to collect data at your current rate of generation and at least two orders of magnitude more to be ready for future needs (if you’re at 1TB/day today, look for a solution proven at 100TB/day). Similarly, your solution must be able to keep petabytes of data online. The third critical capability is response time for data analysis: the data you need should be available in seconds, no matter how old it is.
When we founded Rocana, we went through intense technical diligence in selecting the Big Data components underlying our solution and applied REAL to our architecture decisions in order to ensure that our solution meets customer needs today and in the future. In this post I’ll give you more tangible examples of how we applied REAL to Rocana Ops.
Rocana is built on a robust data bus and a high fidelity store. Fundamentally good choices at the architectural level mean:
we can guarantee delivery of events as soon as they are seen by our collection system (agents or APIs)
we can record and store first class metrics (CPU usage, memory statistics, disk and network activity) at any interval the user requires (default is every five seconds)
we can keep data virtually forever, and it is always online and accessible (limited only by cost of storage)
We encourage users to collect all data and keep it for as long as they need, which may be for anything from compliance reporting to testing analytical models to unforeseen business needs. When deciding how long to keep data and at what granularity, Rocana users only need to consider the cost of storage. We price our software based on the number of users, not the amount of data, so our customers can collect and store what they need in order to ensure their business is operating successfully. We invest in ongoing R&D to reduce storage requirements, and we work with market leaders in servers and storage to make sure our customers can take advantage of the increasing efficiencies in storage and power density.
Reliable also means low latency data delivery. Data that arrives at some “undetermined time in the future” is about as useful as data that never arrives. At Rocana we use three definitions of “real time” when discussing customer needs:
High Frequency Trading Real Time -- sub-millisecond
Human Real Time -- sub-second
Near Real Time -- within 15 seconds
Anything outside of a 15-second window is simply not real time by any reasonable definition of the term. At Rocana we deliver logs and metrics (analyzed, searchable, and viewable) in Human Real Time. We generally see queries return in either Human Real Time or Near Real Time, depending on complexity. No query ever takes hours.
While operational event data is structured differently, at its core it is no different from the transactional data that has been powering businesses for decades, and it can similarly be used in a nearly infinite number of ways. Out of the box, Rocana Ops allows sysadmins and security operators to take advantage of operational data in powerful ways with purpose-built analytics, anomaly detection algorithms, and rich visualizations suited for IT operations and security forensics.
Data collection methods must be flexible enough to support the incredible variety of sources in a global enterprise. While syslog is extremely important, it is just one of many sources. Beware of systems that can only onboard data from sources whose formats are known in advance. Rocana’s flexible event schema allows the system to capture new event types without requiring configuration within the platform prior to receiving data. Both raw and extracted data can be retained within the event, allowing for easy reprocessing in case additional operations are required later. Our architecture white paper is an excellent resource for learning more about the design considerations around event data collection and management, and a recent blog entry by CTO Eric Sammer takes a deep dive into our innovative event schema.
Also, we don’t assume that IT operations and security are the only uses of operational data. Our customers access this data for a variety of purposes, including improving billing systems, capacity planning, and churn analysis. Rather than try to be all things to all people (a recipe for startup disaster), we built mechanisms into Rocana Ops so our customers can manage, access, and control their own data. Because we leverage open source tools, we made a very conscious choice to also maintain open file formats and schema specifications. This means if you have a team of data scientists using R or SAS to investigate churn or fraud, web developers looking to create their own search mechanisms, or business analysts interested in the familiar SQL search syntax, we’ve got them all covered – at no extra charge, out of the box.
Rocana Ops provides a mechanism to create streams of data for sharing via a publish/subscribe model. The use of custom, real-time data streams through Rocana Ops has no impact on users’ ability to continue to search and analyze data. Our customers use the publish/subscribe capability in Rocana Ops in two ways:
shipping data to spreadsheets and BI dashboarding tools
forwarding data to purpose-built analytic tools that are not REAL Big Data systems, but serve legacy purposes and require a subset of operational event data.
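The publish/subscribe pattern behind both of these uses can be sketched in a few lines. This is a toy in-process illustration of the pattern only, not Rocana’s actual API; all names here are invented for the example:

```python
from collections import defaultdict

class EventBus:
    """Toy publish/subscribe bus illustrating the pattern (not Rocana's API)."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a callback to receive all future events on a topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Deliver the event to every subscriber of the topic.
        for callback in self._subscribers[topic]:
            callback(event)

# Example: one stream feeds a BI dashboard, another a legacy analytics tool;
# publishing does not interfere with either consumer.
bus = EventBus()
dashboard, legacy_tool = [], []
bus.subscribe("metrics", dashboard.append)
bus.subscribe("metrics", legacy_tool.append)
bus.publish("metrics", {"host": "web01", "cpu": 0.82})
```

The key property is decoupling: each downstream consumer receives its own copy of the stream, so adding a subscriber never affects the others.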
With Rocana Ops extensibility, our customers minimize license costs and reduce TCO for tools that were never designed for REAL Big Data but have been jerry-rigged to address Big Data needs.
In the IT and security spaces, we encounter many tools that let users search their data. Search, by definition, is not analysis. Though powerful when you know what you’re looking for, search only helps you locate data, while leaving you responsible for the analysis. A search-based approach requires, at a minimum, that each user have a general idea of where or when an event happened in order to retrieve it and begin analysis. Legacy systems that have tried to build analytic applications on top of search-based paradigms have largely failed. The batch-oriented nature of search is the polar opposite of streaming analytics and compromises time-series analysis. Search-oriented systems also struggle with true “sliding windows” and out-of-order event sequences.
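To make the sliding-window and out-of-order point concrete, here is a minimal sketch of event-time windowing with a lateness allowance, the kind of streaming construct a batch search index handles poorly. It uses simplified tumbling windows and invented names, purely for illustration:

```python
from collections import defaultdict

class WindowedCounter:
    """Counts events into fixed event-time windows, tolerating
    out-of-order arrivals up to an allowed lateness. Illustrative only."""
    def __init__(self, window_seconds=60, allowed_lateness=15):
        self.window = window_seconds
        self.lateness = allowed_lateness
        self.counts = defaultdict(int)  # window start time -> event count
        self.watermark = 0              # highest event time seen so far

    def add(self, event_time):
        self.watermark = max(self.watermark, event_time)
        # Reject only events that arrive later than the allowed lateness.
        if event_time < self.watermark - self.lateness:
            return False
        start = event_time - (event_time % self.window)
        self.counts[start] += 1
        return True

w = WindowedCounter()
for t in [5, 70, 62, 58, 130]:  # 58 arrives out of order, within lateness
    w.add(t)
```

Because windows are keyed by event time rather than arrival time, the late event at t=58 still lands in the correct window instead of being miscounted.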
When building Rocana Ops, we realized that sysadmins and security operations can no longer maintain tribal knowledge of all the applications, services, systems, and their interdependencies, which means that they can no longer simply search for problem areas. To address this issue, we apply analytics, machine learning, and visualizations to show users anomalous events in their datacenters. In effect, we sort the acres of grass into searchable haystacks and provide context, which allows even a novice Site Reliability Engineer to become a powerful operator.
Rocana Ops’ out-of-the-box functionality includes anomaly detection. Here’s a quick synopsis from a more detailed blog entry on how our anomaly detection engine works:
“We use historical data to construct a quantitative representation of the data distribution exhibited by each metric being monitored. New data points are compared against these representations and are assigned a score. A decision is made on whether the new data point is an anomaly, based on a threshold we derive from recent observations of the data. One of the key advantages of this approach is that the thresholds are not static, but rather evolve with the data.”
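A heavily simplified sketch of the idea in that quote, assuming a rolling window of recent observations and a z-score-style threshold (the actual engine is more sophisticated, and all names here are illustrative):

```python
from collections import deque
import statistics

class AnomalyDetector:
    """Sketch: score new points against the distribution of recent data,
    so the effective threshold evolves as the data evolves."""
    def __init__(self, window=100, z_threshold=3.0):
        self.history = deque(maxlen=window)  # recent observations only
        self.z_threshold = z_threshold

    def score(self, value):
        # Distance of the new point from the recent distribution, in stdevs.
        if len(self.history) < 2:
            return 0.0
        mean = statistics.mean(self.history)
        stdev = statistics.pstdev(self.history) or 1e-9
        return abs(value - mean) / stdev

    def observe(self, value):
        # Score first, then fold the point into the evolving baseline.
        s = self.score(value)
        self.history.append(value)
        return s > self.z_threshold

detector = AnomalyDetector(window=50)
for v in [10, 11, 9, 10, 12, 10, 11, 9, 10, 10]:
    detector.observe(v)
detector.observe(10)   # close to the recent distribution: not anomalous
detector.observe(100)  # far outside the recent distribution: anomalous
```

Because the baseline is a bounded window of recent observations, the threshold adapts automatically when a metric’s normal level shifts, which is the key advantage the quote describes.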
The ability to perform real-time analysis against many high-volume data feeds is a critical requirement for both IT operations and security use cases.
What is the definition of “Big”? When people talk about operational data (logs, metrics, network diagnostics, application instrumentation, etc.), they tend to view “Big” through a historical lens. This means they look at what they’re collecting now and bucket themselves generally into one of three camps: sub-100GB per day, up to 1TB per day, and greater than 1TB per day. The issue is that what you’re collecting now and what you should be or will be collecting are two very different things. Constrained by legacy systems’ scalability limits and onerous per-byte or per-CPU pricing schemes, companies continue to struggle to gain visibility and control of their infrastructure with these aging systems.
For Rocana, none of the current sizing buckets equates to “Big.” Our definition of Big is “two orders of magnitude greater than what you need today”. If you’re collecting 100GB/day, you need a system that can handle 10TB/day. Are you at 1TB/day today? Look for a 100TB/day solution. We built our system to ingest over a petabyte per day on modern, commodity hardware. Collecting data volumes such as these is no longer a luxury but a necessity. In order to triage IT and security issues that may overlap dozens or hundreds of systems, including virtualized applications, network, storage, and compute tiers, you can no longer afford to cherry pick data into siloed monitoring applications. Everything operational must be stored in a single repository that delivers true analytics and not just search.
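The sizing rule above is simple arithmetic: take today’s ingest rate and multiply by 100. A trivial helper makes the rule explicit (numbers are the examples from the text):

```python
def required_capacity(current_gb_per_day, headroom_orders_of_magnitude=2):
    """Apply the 'two orders of magnitude' rule: size for 100x today's rate."""
    return current_gb_per_day * 10 ** headroom_orders_of_magnitude

required_capacity(100)    # 100 GB/day today -> 10,000 GB/day (10 TB/day)
required_capacity(1000)   # 1 TB/day today -> 100,000 GB/day (100 TB/day)
```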
Rocana has taken a Big Data approach to the problems of IT and security operations, building on multiple execution engines, combining a wide range of data sets, and developing algorithmic techniques that are tightly coupled with out-of-the-box visualizations. When you consider the compute power required to index, ingest, and apply dozens of anomaly detection techniques in Human Real Time on an infrastructure that contains thousands of machines, tens of thousands of VMs and containers, arrays of critical network devices, and millions of loglines and metrics per hour, you realize why this is not a problem that can be addressed on a single, rack-mounted appliance, with a procedure as limiting as search, or using a platform built for a bygone era.
At Rocana, our pedigree is Big Data -- all of the founders of Rocana were very early Cloudera employees and executives -- and we spent the past five years of our lives helping the largest organizations in the world solve real-world business problems using modern Big Data technologies such as Apache Hadoop and the Hadoop ecosystem. We created Rocana in direct response to the customer demand we saw while working as part of the Cloudera services team.
Because we are architected from the ground up for real-time streaming, long-term full fidelity data retention, high-speed search, and machine learning and analytics, we can extract value from machine data that our competitors simply cannot. A primary example would be converting a “saved search” to the Big Data version of a materialized view. What this means for the user is that their hundreds or thousands of saved searches no longer have any impact on the search subsystems, so they never slow down regardless of the scale of data or level of detail.
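One way to picture the “saved search as materialized view” idea: instead of re-executing each saved search against the search tier, maintain its result incrementally as events are ingested. A toy sketch under that assumption (illustrative only, not Rocana’s implementation):

```python
class MaterializedSearch:
    """Keeps a saved search's result up to date at ingest time,
    so reading it later puts no load on the search subsystem."""
    def __init__(self, predicate):
        self.predicate = predicate  # the saved search's condition
        self.matches = []           # incrementally maintained result set

    def on_ingest(self, event):
        # Evaluated once per event at write time, not at query time.
        if self.predicate(event):
            self.matches.append(event)

# Saved search: "all ERROR events from the web tier"
saved = MaterializedSearch(lambda e: e["level"] == "ERROR" and e["tier"] == "web")
for e in [{"level": "INFO", "tier": "web"},
          {"level": "ERROR", "tier": "web"},
          {"level": "ERROR", "tier": "db"}]:
    saved.on_ingest(e)
# Reading the saved search is now just a lookup of saved.matches.
```

This is why thousands of saved searches can coexist without slowing queries: the cost is paid once per event at ingest, not repeatedly at read time.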
Rocana Ops removes the constraints that scalability limits and onerous pricing models place on IT staff’s ability to execute. The scale and complexity of modern infrastructures require real-time analytics and visualizations, which in turn require a platform designed to capture, store, and analyze the totality of your operational data. While that data may have been measured in gigabytes a few years ago, it is already reaching tens of terabytes per day for many organizations and growing quickly. Soon, a petabyte of operational event data under management will be commonplace. That’s REAL Big Data.
To learn more about applying REAL to your Big Data initiatives, join us for an informational webinar where we will walk you through the REAL approach to make sure you meet all the criteria when evaluating a Big Data solution.