This chapter covers the basics of Lucene and search, describes Lucene's basic query structures, and introduces its data structures and types, which vary widely in application and usage.
1. What Is Search, Anyway?
2. Meet Lucene
3. Types of Structures In Lucene
4. Query Types -- Done The Lucene Way
5. Lucene Vs Relational Databases
Chapter 2: Hello World -- The Lucene Way - 10 pages
This chapter tries out a few basic Lucene queries on a standard data set. The reader will index the data set and run different types of queries against it, exploring scoring, document-level boosting, TopN hits, and the use of Collectors.
1. Index Data In Lucene
2. Internals of a Lucene Index
3. Scoring and Boosting
4. Doing Your First Query
5. TopN Hits -- Why Should I Care About the 100th Hit?
6. Collectors -- The Life Of Your Application
Chapter 3: Build A Personal Desktop File Searcher - 40 pages
This chapter details how to build a file searcher with Lucene that can search the entire file system of the user’s computer and return relevant documents and files for a partial or complete keyword.
1. Basics of Document Searching with Lucene
2. Partial Searches and Matching
3. A Bit About TF/IDF
4. Build The Core of Our Searcher
5. Building the File System Seek and Search Functionality
6. Bringing It All Together
Chapter 4: A Bit About Spatial Indexing - 20 pages
This chapter covers the basics of spatial indexing and space vectors, then spatial indexing and querying in Lucene, including advanced details of N-dimensional indexing and searching.
This chapter details how to build a location-aware search engine with a representative data set, allowing location constraints to be specified during a search.
1. What is Location Aware Searching?
2. Representing Data As Spatial Data
3. Metadata Searches
4. Combining Searches -- Actual Text and Location Combined
Chapter 6: Create a Text Classifier with Apache Mahout and Lucene - 30 pages
This chapter covers building a classifier using Lucene and Apache Mahout, a popular machine learning framework.
1. What is Mahout?
2. What is a Text Classifier Engine?
3. Building The Model in Mahout
4. Building the Parser in Lucene
5. Bringing It All Together
Chapter 7: Performance Tuning Your Lucene Applications - 15 pages
Performance is key to any search application, and small changes to an application can have an amplified effect on its performance. We will benchmark applications, learn common pitfalls, and learn best practices for tuning the performance of search applications built with Lucene.
1. Lucene Performance Basics
2. Performance Benchmarking
3. Lucene Performance Tuning
4. Lucene Performance with System Performance Tools
Chapter 8: Your First Lucene Patch - 15 pages
This chapter focuses on building your first patch to the heart of the engine itself. We will go through the full cycle: writing a patch, testing it, adhering to community code standards, navigating JIRA, and interacting with the community.
1. Lucene Internals
2. Working with Git
3. Writing a Patch
4. Test, Test, Test!
5. Opening a JIRA for Your Issue
6. Community Interaction
Atri is a distributed systems engineer with expertise in building and scaling large data-oriented systems, and an Apache Lucene/Solr committer. He has worked for Microsoft, where he was responsible for scaling the storage and query engines for Azure CosmosDB. He is also a long-time PostgreSQL contributor and an Apache committer and PMC member for HAWQ, MADlib, and Apex.
Gain a thorough knowledge of Lucene's capabilities and use it to develop your own search applications. This book explores the Java-based, high-performance text search engine library used to build search capabilities in your applications.
Starting with the basics of Lucene and searching, you will learn about the types of queries it supports and take a look at scoring models. Applying this basic knowledge, you will develop a hello world app using basic Lucene queries and explore features such as scoring and document-level boosting.
Along the way you will also uncover the concepts of partial searching and matching in Lucene and then learn how to integrate geographical information (geospatial data) in Lucene using spatial queries and n-dimensional indexing. This will prepare you to build a location-aware search engine with a representative data set that allows location constraints to be specified during a search. You’ll also develop a text classifier using Lucene and Apache Mahout, a popular machine learning framework.
After a detailed review of performance benchmarking and the common issues associated with it, you’ll learn some of the best practices for tuning the performance of your application. By the end of the book you’ll be able to build your first Lucene patch, where you will not only write your patch, but also test it and ensure it adheres to community coding standards.