A Post Search World

February 12, 2014 Author: Don Brown

Over the last few years, we’ve witnessed the rise of virtual personal assistants in our everyday lives. Most prominent are technologies such as Apple’s Siri and Google Now. The movie Her explores a world where someone falls in love with his ever-present electronic assistant. What’s been lost as the general public gets wrapped up in the novelty of these new tools is that these personal assistants are not orthogonal or auxiliary to traditional search, but are slowly being introduced as an eventual replacement for the common search query. Three forces, acting in conjunction, are moving us into a “post-search” world.

The first of these forces is the technology developed to handle the exponential growth of the Internet over the last twenty years. The data collection tools created to reliably monitor and maintain the infrastructure that supports today’s Internet can ingest, transform and store petabytes of data per day. For the first decade of this century, organizations focused on using these new systems to capture operational data including log files from web servers, databases, application servers and the transactions and reference data required to analyze those logs.

As a consequence of this new ability to store data at this scale, processing frameworks have steadily improved. It seems like nearly every week we hear of a new product announcement or a benchmark for an existing product being shattered. Where we started with the batch-oriented MapReduce programming paradigm, we now see technologies like Impala, Spark and Tez pushing us inexorably closer to being able to analyze everything in “human real time.”

Now, with the emergence of the Internet of Things and the Quantified Self, the stage is set for a truly remarkable new wave of data-driven products. For example, imagine a world where a watch that monitors your vital signs notices an arrhythmia (Basis), uses a dataset to classify this as high risk for your genetic profile (23andMe), then polls another dataset to find a specialist based on location, age and cost (Kyruus), and within seconds books the appointment at an opportune time based on availability (ZocDoc). Finally, the application reminds you at an appropriate time to leave for that appointment (Google Now). It’s readily apparent in this consumer-focused example that search would be a limitation, not a feature, of the system. As data volumes grow and disparate datasets need to be incorporated, search no longer suffices as a primary interface.
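To make the shape of such a pipeline concrete, here is a minimal sketch in Python. Every function and dataset below is a hypothetical stand-in; none corresponds to a real API from Basis, 23andMe, Kyruus, or ZocDoc. The point is simply that the datasets are chained together automatically, with no search query ever issued by the user.

```python
# Hypothetical sketch of the watch-to-appointment pipeline described above.
# All functions, thresholds, and provider records are illustrative stand-ins,
# not real service APIs.

def detect_arrhythmia(vitals):
    """Flag an irregular heartbeat from raw readings (crude variability check)."""
    return max(vitals) - min(vitals) > 40

def classify_risk(event_detected, genetic_profile):
    """Classify the event against a (hypothetical) genetic risk dataset."""
    if event_detected and genetic_profile.get("arrhythmia_marker"):
        return "high"
    return "low"

def find_specialist(max_cost):
    """Poll a (hypothetical) provider dataset; pick the nearest affordable one."""
    providers = [
        {"name": "Dr. A", "distance_km": 3, "cost": 150},
        {"name": "Dr. B", "distance_km": 12, "cost": 90},
    ]
    affordable = [p for p in providers if p["cost"] <= max_cost]
    return min(affordable, key=lambda p: p["distance_km"]) if affordable else None

def run_pipeline(vitals, profile):
    """Chain detection -> classification -> lookup, end to end."""
    if not detect_arrhythmia(vitals):
        return None
    if classify_risk(True, profile) != "high":
        return None
    return find_specialist(max_cost=200)

specialist = run_pipeline([60, 62, 140, 61], {"arrhythmia_marker": True})
```

Each stage consumes the previous stage’s output directly, which is precisely why a search box never appears anywhere in the flow.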

While many people are actively working on bringing consumer ideas like the above to fruition, very few are approaching the problem from the enterprise perspective. Consider if, rather than relying on standing meetings with multiple groups or fault alerts on dashboards and spreadsheets, a security analyst at a bank could simply sit down and, based on his or her role, be presented with the most important task to work on at any given moment. For this security analyst, that task might require displaying the relevant dashboards and cross-correlating web logs, IRC chatter, botnet traffic, and firewall logs in order to preempt a burgeoning DDoS attack. If that same analyst were on the phone, rather than presenting them with logs, such a system would initiate a call with their NOC/SOC instead. The aggregate knowledge that can be encoded in software and applied in real time is now far more powerful and less error-prone than any human system could be. An even more sophisticated system would make recommendations on possible solutions and ways to prevent future attacks, or even begin to take pre-approved actions on its own.
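The role-based prioritization described above can be sketched in a few lines. This is a toy model under stated assumptions: the feed names, weights, and incident records are invented for illustration and do not come from any real security product. The idea is that correlated signals are scored together and the analyst is handed the single highest-priority task, rather than searching across dashboards.

```python
# Toy sketch: score incidents by weighting their correlated signal sources,
# then surface the top task for the analyst's role. Weights and feeds are
# hypothetical, chosen only to mirror the example in the text.

SIGNAL_WEIGHTS = {
    "web_logs": 1.0,
    "irc_chatter": 2.0,
    "botnet_traffic": 3.0,
    "firewall": 1.5,
}

def score_incident(incident):
    """Sum weighted signal strengths; more corroborating sources, higher score."""
    return sum(SIGNAL_WEIGHTS.get(src, 0.0) * strength
               for src, strength in incident["signals"].items())

def next_task(incidents, role):
    """Return the highest-scoring incident relevant to the analyst's role."""
    relevant = [i for i in incidents if role in i["roles"]]
    return max(relevant, key=score_incident, default=None)

incidents = [
    {"id": "ddos-17", "roles": {"security"},
     "signals": {"web_logs": 0.9, "botnet_traffic": 0.8, "irc_chatter": 0.4}},
    {"id": "disk-02", "roles": {"ops"},
     "signals": {"web_logs": 0.2}},
]

task = next_task(incidents, "security")
```

A production system would of course learn its weights and ingest live feeds, but even this toy version captures the inversion the post describes: the system ranks and pushes work to the analyst instead of waiting to be queried.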

These two far-ranging examples demonstrate that search is but one tool among many now available. Moreover, as datasets become larger and more numerous, accomplishing even menial tasks through search alone becomes extremely difficult. Interestingly, the technology behind these tools is considerably more advanced than what’s been exposed thus far, but there is significant concern over the “creepiness” factor. Put another way, allowing technology to make suggestions or inferences based on all the data we’ve collectively put on the Internet is going to expose our personal lives in new, often unexpected or even undesirable, ways. There will be a slow adoption curve in this space as the public’s willingness to be analyzed to the Nth degree catches up with this new capability, even in open source tools.