Beyond Open Source Splunk: Part 1

    
March 20, 2017 Author: Omer Trajman

As usual, Matt Asay nailed it when he wrote “Why Splunk keeps beating open source competitors” over at InfoWorld. I shared some of my observations with him for the article: for example, open source software tends to come as a box of parts and not as a complete solution, while most of the dollars being spent on Splunk are from organizations that need a complete solution and don't have the time or the talent to build a solution on their own. Further, as Matt uncovers, Splunk is used (or misused) in a wide range of small-scale pockets for ad-hoc search.

Digging a little deeper, I thought it would be informative to share how customers are adopting open source and out-of-the-box products like Rocana in addition to their Splunk deployments. As Nancy Gohring, Senior Analyst at 451 Research, points out in her recent report “Big data, machine learning shape performance-monitoring developments,” the clear market need is not just for an open source version of Splunk, but for a centralized event warehouse, built on an open source data lake and delivered as a complete, enterprise-grade product.

Rocana is Born

For some context, three years ago when Matt was penning his article “In a world of open source big data, Splunk should not exist,” we were coming to the same conclusion. We had spent years creating bespoke “Splunk-on-Hadoop” for our customers at Cloudera. As a result, Eric Sammer (now our CTO), Don Brown (now our CIO), and I decided to productize this solution, building entirely on an open source foundation. This is what became Rocana Ops.

One critical lesson we learned from Cloudera Chief Strategy Officer Mike Olson is that pure open source is a bad way to build a business. We dug into the core tenets of where open source could not only replace Splunk but also deliver 100x the price performance. Then we complemented it with purpose-built software containing features that the open source ecosystem had not created, such as role-based access controls, disaster recovery, embedded machine learning, and more.

Three Pieces of the New Operational Monitoring Puzzle

From day one, we realized that there were three critical pieces of this puzzle:

  1. Affordable at petabyte scales
  2. Open formats and integrations
  3. Out-of-the-box analytics and visualizations

Our customers run Rocana Ops at tens of TBs a day, cost-effectively, without breaking a sweat. They collect 100TB a day on just $1M of hardware and keep petabytes online and available for query and analysis. We keep all of this data in open formats and give our customers secure, open access to their data. Our customers don’t want to lock themselves into software that forces proprietary formats and APIs or locks them out of their own data.

Entirely on their own, our customers have connected Rocana to collect data from sources they would not have attempted to feed into Splunk, including Tanium, Juniper, BlueCoat, Cloud Foundry, OpenStack, AppDynamics, FireEye, and more – as well as proprietary systems. Fortune 50 customers are using Rocana to store full-fidelity events and forward just the filtered subset they need to downstream systems, including Splunk ES, ArcSight, and QRadar.

Open source has helped us deliver a solution that Splunk has not yet been able to – a solution that the market is desperately looking for.

This is the first in a series of two blog posts. In my next post, I will take a closer look at how Rocana and Splunk are used together today, and at the evolution of open source competition in the market.