Blog Post

New Solutions Enabled by Big Data

February 27, 2014 Author: Omer Trajman

We recently described how new technologies in data collection, data processing and sensor systems are leading us into a world beyond search. These systems were built broadly on concepts that Google developed to revolutionize search. In response to our post, Doug Cutting was quick to point out that search is far from dead, and we completely agree. What we are seeing is not the death of search but new architectures that combine search with other supposedly dead technologies, like SQL and stream processing, and with new capabilities like machine learning to deliver dramatically better solutions to consumers and enterprise users alike.

Recent revelations underscore the importance of creating new solutions. Neiman Marcus, for example, disclosed that the malware infiltrating its systems triggered tens of thousands of log notifications, yet these were a small fraction of all notifications. For months, the IT team missed critical clues that an attack was underway because the signal was so small relative to the noise of normal operations. As the efficiency of storage increases and the cost of processing decreases, we can now deploy systems that combine multiple types of processing to sift through the noise and deliver the critical information that users need to see. By tying together a cross-functional set of capabilities, applications that transparently take user context into account can preemptively deliver critical information or even take pre-approved actions on behalf of a user.

Applying these techniques to the world of enterprise IT can be incredibly powerful. A support desk technician can be presented with pre-assigned tickets pointing to systems that are likely to be faulty, which better focuses their attention. Consider how Cloudera uses multiple engines to create a better support system. What these examples have in common is that data is both indexed in a search system and stored efficiently in a distributed, queryable format. Whether a user searches for a keyword or drills down from an aggregated view, the two engines work in concert to deliver both the answer to the request and useful contextual information.
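
To make the idea of two engines working in concert a little more concrete, here is a minimal sketch in Python. It is a toy under stated assumptions, not how Cloudera or any production system is built: the class name DualStore, its methods, and the sample hosts and messages are all hypothetical, an in-memory dictionary stands in for the search index, and a plain list stands in for distributed queryable storage.

```python
from collections import Counter, defaultdict


class DualStore:
    """Toy store (hypothetical) that keeps each record in two forms:
    an inverted index for keyword search and a row list for aggregates."""

    def __init__(self):
        self.rows = []                 # stand-in for a queryable table
        self.index = defaultdict(set)  # term -> row ids, stand-in for a search index

    def add(self, host, message):
        """Store the record once, then index every word of its message."""
        row_id = len(self.rows)
        self.rows.append({"host": host, "message": message})
        for term in message.lower().split():
            self.index[term].add(row_id)
        return row_id

    def search(self, keyword):
        """Keyword lookup, answered by the index."""
        row_ids = self.index.get(keyword.lower(), ())
        return [self.rows[i] for i in sorted(row_ids)]

    def count_by_host(self):
        """Aggregated view, answered by scanning the stored rows."""
        return Counter(row["host"] for row in self.rows)

    def search_with_context(self, keyword):
        """Answer the search and attach per-host totals as context."""
        totals = self.count_by_host()
        return [dict(hit, host_total=totals[hit["host"]])
                for hit in self.search(keyword)]


# Sample data is invented purely for illustration.
store = DualStore()
store.add("web-1", "disk latency warning")
store.add("web-1", "login ok")
store.add("db-1", "disk failure imminent")

results = store.search_with_context("disk")
for row in results:
    print(row["host"], row["message"], row["host_total"])
```

A keyword search for "disk" returns the two matching records, each annotated with how many notifications its host has produced overall, which is the kind of contextual information an aggregated view would supply.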

Solutions like these are possible today because of the economics of data storage and processing unlocked by technologies like Hadoop. It is becoming clear that future enterprise applications are following in the footsteps of consumer tools like Google, just as the next generation of enterprise infrastructure, modeled on how Google built its own, is being widely adopted. New applications will take advantage of all of these systems to collect, search and query data automatically, and deliver ever more powerful solutions to enterprise users.