Is a NoSQL database an alternative to a search engine?
Let's talk about some terms first.
NoSQL (Not only SQL): a NoSQL database differs from an RDBMS in some way.
IR (information retrieval): the science of searching documents and their metadata, and of retrieving them.
- MongoDB is a document-based database with the following features (reference: http://www.mongodb.org/):
- Document-oriented storage
- Full Index Support
- Replication and High Availability
- Fast In-Place Updates
- Commercial Support
- Lucene features (reference: http://lucene.apache.org/java/docs/features.html):
- Scalable, High-Performance Indexing (which is actually quite fast)
- Powerful, Accurate and Efficient Search Algorithms
- * ranked searching — best results returned first
- * many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
- * fielded searching (e.g., title, author, contents)
- * date-range searching
- * sorting by any field
- * multiple-index searching with merged results
- * allows simultaneous update and searching
- Cross Platform
A NoSQL database is preferable when the database needs to be scalable and highly available, with fast query results. However, it doesn’t completely solve the problem of information retrieval.
Search (information retrieval) isn’t just about grabbing any documents that match. If you want your search results to have any relevance at all, you’re going to need something along the lines of TF-IDF, phrase matching (words in a sequence score higher), or any number of other IR techniques to improve search precision.
NoSQL databases such as MongoDB don’t provide relevance-based search results. I think this is the biggest factor to consider when choosing between a NoSQL database and a search engine framework.
Another alternative is to couple a database with a search engine to get the benefits of both. For example:
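To make the relevance point concrete, here is a minimal TF-IDF ranking sketch in Python. It is only an illustration of the idea, not how Lucene actually scores documents: the function names (`tokenize`, `tf_idf_rank`), the whitespace tokenizer, and the "+ 1" smoothing on the IDF term are all my own assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def tf_idf_rank(query, docs):
    """Return (doc_index, score) pairs for matching docs, best first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for i, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term]:
                # Rare terms weigh more; "+ 1" keeps very common terms non-zero.
                idf = math.log(n_docs / df[term]) + 1.0
                score += (tf[term] / len(tokens)) * idf
        if score > 0:
            scores.append((i, score))

    # Highest score first: this ordering is the "relevance" a plain match lacks.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "mongodb stores documents",
    "lucene ranks search results by relevance",
    "search search search engine",
]
ranked = tf_idf_rank("search", docs)
```

A plain "does it match" query would return the second and third documents in no particular order. Here the third one comes first because the query term makes up a larger share of it.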
- couchdb-lucene integrates CouchDB with Lucene
- Solr integrates with RDBMSes (such as MySQL) and uses Lucene as its search library.
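The coupling idea can be sketched roughly as follows. This is a toy, not the couchdb-lucene or Solr API: the class `SearchableStore`, its methods, and the `"text"` field are hypothetical. A dict stands in for the database and a simple inverted index stands in for the search engine. It only does boolean matching, where a real search engine would also rank the results.

```python
from collections import defaultdict


class SearchableStore:
    """Hypothetical wrapper: a primary store plus a separate search index."""

    def __init__(self):
        self.store = {}                # stand-in for the database
        self.index = defaultdict(set)  # stand-in for the search index: term -> doc ids

    def put(self, doc_id, doc):
        # On update, drop the old document's postings so the index stays in sync.
        old = self.store.get(doc_id)
        if old is not None:
            for term in old["text"].lower().split():
                self.index[term].discard(doc_id)
        self.store[doc_id] = doc
        for term in doc["text"].lower().split():
            self.index[term].add(doc_id)

    def search(self, query):
        # Ask the index for ids containing every query term,
        # then fetch the full documents from the primary store.
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.index.get(t, set()) for t in terms))
        return [self.store[i] for i in sorted(ids)]


s = SearchableStore()
s.put(1, {"text": "NoSQL document store"})
s.put(2, {"text": "Lucene search library"})
hits = s.search("lucene")
```

The point of the design is that every write goes to both places, while a search asks the index for ids and the database for the documents themselves.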
That's all for now.
Comments and suggestions are welcome 🙂