It’s been striking that as the Big Data space has come into its own over the past several years, none of the vendors, buyers or analysts of Big Data technologies have figured out how to reconcile existing enterprise software pricing models with massively distributed systems.
Having served as one of the early Solutions Architects at Cloudera, later running pre-sales and services teams at Cloudera and Wibidata, I’ve advised hundreds of organizations on their Big Data strategies and have observed various experiments in pricing. Though some models worked in the heyday of proprietary software and the early days of open source, as large-scale open source Big Data systems become more common, the industry is struggling to balance cost and value across the solution stack.
Consider a few of the most common pricing options and their relative merits:
Pay per machine...twice
Most Big Data vendors sell their software “by the node” with the reasoning that as machines get more powerful, customers get more value for the same dollars. The rationale for the vendor is straightforward: since these contracts are based on subscription software license and support, the assumption is that support costs scale proportionally as cluster sizes grow. As it turns out, this is not completely true. While there is clearly a difference in support costs between a 10-node cluster and a 100-node cluster, the gap is not nearly so large when comparing 100 nodes to 1000.
When software infrastructure costs scale linearly with hardware costs, there are diminishing benefits both to the customer, who is paying twice for each machine, and to the vendors, who are creating a disincentive for customers to explore new use cases. Early on, usage of Big Data clusters would stagnate and vendors fought for shrinking renewal contracts as the cluster support costs dropped year over year. Big Data vendors have since compensated for this by adding management software and introducing discount schedules. Unfortunately, application vendors are starting to pick up on the same model, and now customers are starting to scratch their heads at paying three times for the same hardware.
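To make the "paying twice" (and then three times) arithmetic concrete, here is a minimal sketch. Every dollar figure and name in it is a made-up assumption for illustration, not a real vendor price. It shows that under flat per-node pricing the software share of total spend is the same at 10 nodes as at 1000, so the buyer never sees an economy of scale.

```python
# Hypothetical illustration: every price below is an assumed, made-up figure.
HARDWARE_COST_PER_NODE = 5_000  # assumed annual hardware cost per node
PLATFORM_FEE_PER_NODE = 4_000   # assumed annual platform subscription per node
APP_FEE_PER_NODE = 3_000        # assumed annual application license per node


def annual_cost(nodes: int, include_app: bool = False) -> dict:
    """Break down yearly spend for a cluster where everything is priced per node."""
    cost = {
        "hardware": nodes * HARDWARE_COST_PER_NODE,
        "platform": nodes * PLATFORM_FEE_PER_NODE,
    }
    if include_app:
        # A per-node application license charges for the same machine a third time.
        cost["application"] = nodes * APP_FEE_PER_NODE
    cost["total"] = sum(cost.values())
    return cost


def software_share(nodes: int, include_app: bool = False) -> float:
    """Fraction of total spend that goes to software rather than hardware."""
    cost = annual_cost(nodes, include_app)
    return (cost["total"] - cost["hardware"]) / cost["total"]


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(software_share(n), 3), round(software_share(n, include_app=True), 3))
```

Discount schedules change the picture only to the extent that the per-node fee actually falls as the cluster grows; with a flat fee, the ratio never moves.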
Pay per compute power
The per-CPU model used by early database and middleware vendors is the least favorite among buyers. Though it worked well in the early days of single-core CPUs, more recently vendors have needed to continuously redefine what a CPU is. Vendors have developed an expertise in coming up with creative multiplier schemes to address any new multithreading or core innovations that reduce the number of CPUs needed to solve the same problem. As a result, software buyers must closely track changes in CPU technology and map them to ever-changing pricing.
This becomes obvious when observing how some companies try to game the per-CPU pricing model. Companies exploit per-CPU pricing by creating two systems: one extremely storage-dense system with virtually no processing power that provides cost-effective long-term storage, and one much smaller system where the actual CPU work is performed. The smaller system is loaded up with high-end CPUs and designed to hold data for short periods of time and minimize overall costs. The level of ongoing effort in architecting and maintaining this two-system orchestration for tens or hundreds of applications clearly demonstrates the adversarial relationship this pricing model creates between buyer and vendor.
Pay per byte
Originally conceived as a tiering system for data management hardware appliances, which were sold in varying capacities, pricing per byte stored or ingested is gaining steam with application vendors. There are clear advantages to this scheme for organizations that can easily quantify the value of their data. Buyers have 100% confidence that they are only paying for what they use. Buyers can also determine how much they need by figuring out what use cases they want to implement, quantifying the data required to support those use cases and forecasting growth.
When considering legacy software, the per-byte model works well, yet it becomes a liability when applied to Big Data. The buyer loses all economy of scale advantages gained by advances in hardware and infrastructure software, while application vendors continue to accrue price leverage with each passing byte. This model is particularly popular with SaaS vendors who store their customers’ data and can therefore justify per-byte pricing as passing on their own costs. Yet the price per byte is marked up significantly. There’s a reason the valuations of SaaS companies have skyrocketed relative to their non-SaaS peers over the last few years.
The buyer’s pricing disadvantage in the per-byte model is further exacerbated because the long-term costs are rarely apparent at the outset. The buyer gradually sinks deeper and deeper into the per-byte quicksand, not realizing that there is a point at which their ROI goes negative. The biggest drawback to the per-byte model is that it assumes that all bytes are created equal. From the vendor perspective, this is hugely beneficial, as what inevitably happens is the buyer creates business cases that inflate perceived ROI in order to justify using the software. This leads to an artificial inflation of value assigned to both the data and the use case. Perversely, this inflation serves as further justification for higher per-byte pricing, creating a vicious feedback loop.
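The point at which ROI goes negative can be sketched with a toy model. The assumptions here are mine, not measurements: the per-terabyte price is invented, and the value of data is modeled with a logarithmic curve simply to encode the idea that not all bytes are equally valuable, so each additional terabyte is worth less than the one before while the bill keeps growing linearly.

```python
import math

# Hypothetical illustration: both constants are assumed, made-up figures.
PRICE_PER_TB = 2_000    # assumed annual per-byte price, expressed per terabyte
VALUE_SCALE = 100_000   # assumed scale of business value extracted from data


def data_value(tb: int) -> float:
    """Assumed diminishing-returns value curve: later bytes are worth less."""
    return VALUE_SCALE * math.log1p(tb)


def per_byte_cost(tb: int) -> float:
    """Per-byte pricing treats every byte the same, so cost grows linearly."""
    return PRICE_PER_TB * tb


def net_return(tb: int) -> float:
    """Value extracted minus the per-byte bill at a given data volume."""
    return data_value(tb) - per_byte_cost(tb)


def first_unprofitable_size(max_tb: int = 100_000):
    """Smallest whole number of terabytes at which net return turns negative."""
    for tb in range(1, max_tb + 1):
        if net_return(tb) < 0:
            return tb
    return None  # never goes negative within the scanned range


if __name__ == "__main__":
    print("ROI turns negative at", first_unprofitable_size(), "TB")
```

With different assumed curves the break-even point moves, but as long as value grows more slowly than linearly and price does not, it exists, and nothing in the invoice tells the buyer when they have crossed it.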
Much-needed changes in pricing
The past ten years of advances in hardware and software have presented an opportunity to completely rethink how enterprise infrastructure software is priced and bought. Marc Andreessen has been popularizing the meme that “Software is eating the world” and there is a bit of additional nuance to the argument. The open source movement has shifted much of the value in software away from core infrastructure (databases, web servers, application servers) to the applications based on domain-specific knowledge. This means that many of the infrastructure costs that were formerly dominant in enterprise software have diminished to the point that they are a small fraction of the total cost of ownership.
A reasonable model, given these conditions, is to base pricing on the cost of equivalent functionality. This is the tradeoff companies are increasingly making when they buy an open source platform and build (or contract out) a functional solution. As applications begin to emerge in the Big Data space, they become viable alternatives for buyers who would otherwise build and maintain their own in-house applications. The cost of building and maintaining an application sets the pricing threshold for a vendor. This model is completely decoupled from the infrastructure costs or the data that is used, giving buyers the opportunity to fully benefit from running on Big Data.
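As a rough sketch of how that threshold might be computed, assume (my simplification, not a standard formula) that the in-house alternative costs a one-time build amount spread evenly over a planning horizon, plus a fixed yearly maintenance cost. The function names and the figures in the usage example are hypothetical.

```python
def vendor_price_ceiling(build_cost: float, annual_maintenance: float, years: int) -> float:
    """Annual price above which building in-house becomes the cheaper option.

    Simplified model: the build cost is amortized evenly over `years`,
    and maintenance is a flat yearly amount.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    return build_cost / years + annual_maintenance


def should_buy(annual_vendor_price: float, build_cost: float,
               annual_maintenance: float, years: int) -> bool:
    """True when the vendor's price is at or below the build-it-yourself cost."""
    return annual_vendor_price <= vendor_price_ceiling(build_cost, annual_maintenance, years)


if __name__ == "__main__":
    # Made-up figures: a 600k build amortized over 3 years plus 100k/year upkeep.
    print(vendor_price_ceiling(600_000, 100_000, 3))
    print(should_buy(250_000, 600_000, 100_000, 3))
```

Note that neither node counts nor data volume appear anywhere in the calculation, which is the sense in which this model is decoupled from infrastructure and data.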