Matt Asay wrote a great piece last week on why Splunk should not exist in a world of open source. His logic is spot on, but his timing is off. As Matt astutely concludes, sometimes just being open is not enough to succeed. He can remove the word “sometimes” from that claim. As Mike Olson has said repeatedly, open is not enough. Splunk built their solution with a singular purpose and has turned that focus into a remarkable time-to-value proposition for log data mining.
The cadence of development is different for a platform than for a single-purpose application
In the other corner is a suite of Big Data technologies. For brevity we’re going to use Hadoop as the umbrella term for a collection of related technologies rather than Hadoop the project. Hadoop, it turns out, is not just a one-trick pony. Over the years, Hadoop has been built to solve a broad suite of problems and is being adopted across a wide variety of industries. Hadoop is disrupting data warehousing, analytics, ETL and, yes, log management. Addressing all of these problems means a longer path to broad adoption. It also takes longer to stabilize an open platform to the point where developers can start building applications.
It took Splunk 10 years to build their solution
Compare what was involved in creating Splunk with what was involved in creating Hadoop. Splunk has been singularly focused on collecting and searching log files (where time series is a required dimension) for over a decade. The first attempt to commercialize Hadoop began less than six years ago, with an initial focus on indexing web pages, and the platform evolved from there. The reality is that Hadoop technology has finally caught up with Splunk, thanks to the growing stability of SolrCloud, ElasticSearch, Flume, Stinger, Impala and others. All of these building blocks are required to assemble a viable alternative to Splunk. The stage is set for open source to deliver a solution that not only meets but exceeds Splunk's functionality in short order.
First mover isn’t always an advantage
Splunk came to market with a very well built solution at a time when the burden of log management on IT was just starting to become unmanageable with existing tools. Folks who compare Splunk to open source seem to be missing the point that Splunk is a polished solution that reflects the tail end of an old ecosystem. With a singular focus and lacking a platform on which to build, Splunk created the right solution for the right time, but that time was pre-Big Data. Open source is driven by communities: it’s messy, it requires a critical mass of interest, and it is constantly adapting to the changing landscape. Open source Big Data has required time to bake before it could be used to tackle the problems that Splunk solves as elegantly as Splunk solves them.
Open source is just getting started
We’re at a crossroads today: mature legacy products are competing with a fundamentally new ecosystem. Open source for Big Data has been taking off, and yet it is still dwarfed by the mature technologies it aims to replace. Hadoop in particular only started crossing the chasm last year. Over the next few years, the application ecosystem for open source will emerge, and the products built on Hadoop will be able to leverage more powerful technologies to deliver solutions that are superior to Splunk and other well-built legacy software.
Matt is absolutely correct that in a world of open source big data, Splunk should not exist. Like many of us early adopters, Matt is living in the future. When the rest of the world catches up, we will all realize how right he was.