In a previous blog post, we explained why we built Rocana Search: enterprise monitoring teams require full text search to address problems in real time, and retaining terabytes of data per day over the course of many years can bring general purpose search systems to their knees. The custom sharding model in Rocana Search meets those massive scalability demands with no throughput drop as data continues to pile on.
Let’s dive into some detail on how this modern search system works...
Rocana Search uses four major open source components:
- Apache Lucene
- Apache Hadoop HDFS
- Apache Kafka
- Apache Zookeeper
Rocana Search processes event data, where each event is typically a log file line plus metadata like the timestamp, host, service name, and more. For the deep dive, you can read about the Avro schema here which is part of our open source project, Osso.
Rocana Search indexes each event in Lucene so it's available for full text search. It also categorizes on several event attributes, such as hostname, OS service, etc. These facets enable grouping, which lets users focus on just a portion of the data – for example, the log events for one particular machine in the data center.
Rocana Search stores the Lucene indexes on HDFS, the Hadoop distributed file system, providing redundancy. We leverage that automatic redundancy as part of our failover model, where Rocana Search transparently handles a node failure and automatically reincorporates nodes when they come back up. This creates a resilient system that doesn’t need its hand to be held in the face of failures.
Rocana’s lightweight agent collects log data on individual servers and places it in Kafka, where it becomes the input data to Rocana Search.
Finally, Rocana Search uses ZooKeeper for discovery of nodes and leader election. This means no nodes require special configuration. Any Rocana Search node can dynamically become the leader, which is helpful if the leader node fails.
Rocana Search runs on every data node in your Hadoop cluster. Each Rocana Search node reads new data out of Kafka, off the same Kafka topic (which is analogous to a queue). These Kafka topics get further broken down into partitions. An analogy for Kafka partitions is a list of people’s names, in which one partition might be those starting with A - C, another D - F, etc.
To prevent Rocana Search nodes from sitting idle, the Kafka topic needs enough partitions – at least one per Rocana Search node (see Figure 1 below) – but to improve ingest throughput we often recommend twice that many.
Rocana Search reads the data from Kafka and instructs Lucene to index it. Lucene writes the index to an HDFS directory, typically partitioned by day, then further subdivided into “slices” of the day. There’s exactly one slice per Kafka partition, ensuring Rocana Search nodes don’t step on each other’s toes. Here’s an example HDFS path:
Rocana Search uses a divide-and-conquer approach to execute queries. Each data node in the Hadoop cluster runs Rocana Search, and each Rocana Search instance is really two query components in one:
- A query coordinator, which accepts the original query and responds with the final result
- A query executor, for running query fragments: a small part of the original query
You can send a query to any Rocana Search node since any can act as a query coordinator. A query coordinator breaks the original query into smaller query fragments, which are portions of the query. Each fragment has the same query parameters – such as filters, limit, and sorting – as the original query, but they search just one slice of the data.
The coordinator fans the fragments out to every query executor, which executes the query fragment on the slice of data it owns (see Figure 2 below). This approach provides data locality for faster queries. When finished, the query executors return their results to the coordinator, which combines all results into a single response back to the original client.
Rocana Search is a modern, time-oriented search system, built on open source components and scalable to meet the in-depth monitoring demands that large enterprises now require. General purpose search systems (e.g. Solr) share a few similarities to Rocana Search’s architecture, but a key differentiator – the dynamic partitioning model – enables years of data retention instead of just months.
Stay tuned to our blog as we continue to publish more technical details on the unique architecture and features that make Rocana Ops the go-to choice for large enterprises seeking to turn total visibility into immediate action!