Web development: search engines

The Apache Lucene project develops open-source search software, including:

  • Apache Lucene Core provides a Java-based indexing and search implementation, as well as spellchecking, hit highlighting and advanced analysis / tokenization capabilities
  • Apache Solr is a high performance enterprise search server, with XML / HTTP and JSON / Python / Ruby APIs, hit highlighting, faceted search, caching, replication, distributed search, database integration, web admin and search interfaces
  • Apache PyLucene is a Python port of the Lucene Core project
  • Apache Open Relevance Project is a subproject with the aim of collecting and distributing free materials for relevance testing and performance
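
To make "indexing", "analysis" and "tokenization" concrete, here is a minimal sketch in plain Python. It is not Lucene's API; the stop-word list, the sample documents and the function names are assumptions chosen for illustration. The idea is that text goes through an analysis step, the resulting terms are stored in an inverted index, and a query is answered by looking up its terms.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "is", "of", "the"}


def analyze(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query term."""
    terms = analyze(query)
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Made-up sample documents.
docs = {
    1: "Lucene is a Java search library",
    2: "Solr is a search server built on Lucene",
    3: "Sphinx is a full text search server",
}
index = build_index(docs)
print(search(index, "search server"))
print(search(index, "The Lucene"))
```

The query passes through the same `analyze` function as the documents, which is why "The Lucene" and "lucene" find the same entries. A real engine adds much more on top of this (stemming, positions, scoring, on-disk storage).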

Sphinx is an open source full text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind. It’s written in C++ and works on Linux, Windows, MacOS, Solaris, FreeBSD, and a few other systems.

Sphinx lets you batch index and search data stored in an SQL database, NoSQL storage, or plain files, quickly and easily. Alternatively, you can index and search data on the fly, working with Sphinx much as you would with a database server.

A variety of text processing features enable fine-tuning Sphinx for your particular application requirements, and a number of relevance functions ensure you can tweak search quality as well.

Searching via SphinxAPI is as simple as 3 lines of code, and querying via SphinxQL is even simpler, with search queries expressed in good old SQL.
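
As a rough idea of what a SphinxQL search looks like, the sketch below builds a full-text query string in Python. The index name `articles`, the helper name `sphinxql_match` and the selected column are assumptions made for illustration; only the `SELECT ... FROM <index> WHERE MATCH(...)` shape comes from SphinxQL itself.

```python
import re


def sphinxql_match(index_name, limit=20):
    """Build a SphinxQL full-text query with a placeholder for the search text.

    The index name cannot be passed as a bound parameter, so it is checked
    against a conservative identifier pattern before being interpolated.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", index_name):
        raise ValueError(f"unsafe index name: {index_name!r}")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"SELECT id FROM {index_name} WHERE MATCH(%s) LIMIT {limit}"


# 'articles' is a hypothetical index name.
sql = sphinxql_match("articles", limit=10)
print(sql)

# Assumption: a Sphinx server is configured to accept MySQL-protocol
# connections. A MySQL client library that fills in %s placeholders
# could then run the statement along the lines of:
#     cursor.execute(sql, ("full text search",))
```

Leaving the search text as a placeholder keeps user input out of the SQL string, which matters as much here as with any other SQL-speaking backend.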

Xapian is an Open Source Search Engine Library. It’s written in C++, with bindings to allow use from Perl, Python, PHP, Java, Tcl, C#, Ruby and Lua.

Xapian is a highly adaptable toolkit which allows developers to easily add advanced indexing and search facilities to their applications. It supports the Probabilistic Information Retrieval model and also supports a rich set of boolean query operators.
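
The two ideas in that paragraph, boolean operators and probabilistic ranking, can be illustrated with a small plain-Python sketch. This is not Xapian's API: the documents and function names are made up, and BM25 is used here simply as a well-known weighting formula from the probabilistic family, with commonly used default parameters.

```python
import math
from collections import Counter

# Made-up sample documents, already lowercased.
docs = {
    1: "open source search engine library",
    2: "search server with sql queries",
    3: "python bindings for the search library",
}
tokens = {doc_id: text.split() for doc_id, text in docs.items()}


def posting(term):
    """Ids of the documents that contain the term."""
    return {d for d, words in tokens.items() if term in words}


# Boolean query operators map directly onto set operations.
def op_and(a, b):
    return a & b


def op_or(a, b):
    return a | b


def op_and_not(a, b):
    return a - b


def bm25(doc_id, terms, k1=1.2, b=0.75):
    """Score one document against the query terms with the BM25 formula."""
    n = len(tokens)
    avg_len = sum(len(words) for words in tokens.values()) / n
    counts = Counter(tokens[doc_id])
    doc_len = len(tokens[doc_id])
    score = 0.0
    for term in terms:
        tf = counts[term]
        if tf == 0:
            continue
        df = len(posting(term))
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        norm = tf + k1 * (1 - b + b * doc_len / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score


# Boolean filter: (search AND library) AND_NOT python
matches = op_and_not(op_and(posting("search"), posting("library")), posting("python"))
print(sorted(matches))

# Ranked retrieval: library OR sql, best score first.
query = ["library", "sql"]
candidates = op_or(posting("library"), posting("sql"))
ranked = sorted(candidates, key=lambda d: bm25(d, query), reverse=True)
print(ranked)
```

The boolean part decides which documents match at all, while the weighting part orders them: a rare term such as "sql" contributes more to the score than a common one such as "library".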
