Posted by: yuvan004 | September 6, 2010

Sphinx 1.10-beta


share this

About

Sphinx is a full-text search engine, publicly distributed under GPL version 2. Commercial licensing (eg. for embedded use) is available upon request.

Technically, Sphinx is a standalone software package provides fast and relevant full-text search functionality to client applications. It was specially designed to integrate well with SQL databases storing the data, and to be easily accessed scripting languages. However, Sphinx does not depend on nor require any specific database to function.

Applications can access Sphinx search daemon (searchd) using any of the three different access methods: a) via native search API (SphinxAPI), b) via Sphinx own implementation of MySQL network protocol (using a small SQL subset called SphinxQL), or c) via MySQL server with a pluggable storage engine (SphinxSE).

Official native SphinxAPI implementations for PHP, Perl, Ruby, and Java are included within the distribution package. API is very lightweight so porting it to a new language is known to take a few hours or days. Third party API ports and plugins exist for Perl, C#, Haskell, Ruby-on-Rails, and possibly other languages and frameworks.

Starting version 1.10-beta, Sphinx supports two different indexing backends: “disk” index backend, and “realtime” (RT) index backend. Disk indexes support online full-text index rebuilds, but online updates can only be done on non-text (attribute) data. RT indexes additionally allow for online full-text index updates. Previous versions only supported disk indexes.

Data can be loaded into disk indexes using a so-called data source. Built-in sources can fetch data directly from MySQL, PostgreSQL, ODBC compliant database (MS SQL, Oracle, etc), or a pipe in a custom XML format. Adding new data sources drivers (eg. to natively support other DBMSes) is designed to be as easy as possible. RT indexes, as of 1.10-beta, can only be populated using SphinxQL.

As for the name, Sphinx is an acronym which is officially decoded as SQL Phrase Index. Yes, I know about CMU’s Sphinx project.

Sphinx features

Key Sphinx features are:

  • high indexing and searching performance;
  • advanced indexing and querying tools (flexible and feature-rich text tokenizer, querying language, several different ranking modes, etc);
  • advanced result set post-processing (SELECT with expressions, WHERE, ORDER BY, GROUP BY etc over text search results);
  • proven scalability up to billions of documents, terabytes of data, and thousands of queries per second;
  • easy integration with SQL and XML data sources, and SphinxAPI, SphinxQL, or SphinxSE search interfaces;
  • easy scaling with distributed searches.

Where to get Sphinx

Sphinx is available through its official Web site at http://sphinxsearch.com/.

History

Sphinx development was started back in 2001, because I didn’t manage to find an acceptable search solution (for a database driven Web site) which would meet my requirements. Actually, each and every important aspect was a problem:

  • search quality (ie. good relevance)
    • statistical ranking methods performed rather bad, especially on large collections of small documents (forums, blogs, etc)
  • search speed
    • especially if searching for phrases which contain stopwords, as in “to be or not to be”
  • moderate disk and CPU requirements when indexing
    • important in shared hosting enivronment, not to mention the indexing speed.

Despite the amount of time passed and numerous improvements made in the other solutions, there’s still no solution which I personally would be eager to migrate to.

Considering that and a lot of positive feedback received from Sphinx users during last years, the obvious decision is to continue developing Sphinx (and, eventually, to take over the world).

Supported systems

Most modern UNIX systems with a C++ compiler should be able to compile and run Sphinx without any modifications.

Currently known systems Sphinx has been successfully running on are:

  • Linux 2.4.x, 2.6.x (many various distributions)
  • Windows 2000, XP
  • FreeBSD 4.x, 5.x, 6.x, 7.x
  • NetBSD 1.6, 3.0
  • Solaris 9, 11
  • Mac OS X

Required tools

On UNIX, you will need the following tools to build and install Sphinx:

  • a working C++ compiler. GNU gcc is known to work.
  • a good make program. GNU make is known to work.

On Windows, you will need Microsoft Visual C/C++ Studio .NET 2003 or 2005. Other compilers/environments will probably work as well, but for the time being, you will have to build makefile (or other environment specific project files) manually

Installing Sphinx on Linux

Before installing sphinx just do the following thingsin your terminal type this code.

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install build-essential
sudo apt-get install libmysqlclient15-dev    
  1. Extract everything from the distribution tarball (haven’t you already?) and go to the sphinx subdirectory:

    $ tar xzvf sphinx-0.9.8.tar.gz
    $ cd sphinx

  2. Run the configuration program:

    $ ./configure

    There’s a number of options to configure. The complete listing may be obtained by using --help switch. The most important ones are:

    • --prefix, which specifies where to install Sphinx; such as --prefix=/usr/local/sphinx (all of the examples use this prefix)
    • --with-mysql, which specifies where to look for MySQL include and library files, if auto-detection fails;
    • --with-pgsql, which specifies where to look for PostgreSQL include and library files.
  3. Build the binaries:

    $ make

  4. Install the binaries in the directory of your choice: (defaults to /usr/local/bin/ on *nix systems, but is overridden with configure --prefix)

    $ make install

Installing Sphinx on Windows

Installing Sphinx on a Windows server is often easier than installing on a Linux environment; unless you are preparing code patches, you can use the pre-compiled binary files from the Downloads area on the website.

  1. Extract everything from the .zip file you have downloaded – sphinx-0.9.8-win32.zip (or sphinx-0.9.8-win32-pgsql.zip if you need PostgresSQL support as well.) You can use Windows Explorer in Windows XP and up to extract the files, or a freeware package like 7Zip to open the archive.For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx, such that searchd.exe can be found in C:\Sphinx\bin\searchd.exe. If you decide to use any different location for the folders or configuration file, please change it accordingly.
  2. Edit the contents of sphinx.conf.in – specifically entries relating to @CONFDIR@ – to paths suitable for your system.
  3. Install the searchd system as a Windows service:C:\Sphinx\bin> C:\Sphinx\bin\searchd --install --config C:\Sphinx\sphinx.conf.in --servicename SphinxSearch
  4. The searchd service will now be listed in the Services panel within the Management Console, available from Administrative Tools. It will not have been started, as you will need to configure it and build your indexes with indexer before starting the service. A guide to do this can be found under Quick tour.During the next steps of the install (which involve running indexer pretty much as you would on Linux) you may find that you get an error relating to libmysql.dll not being found. If you have MySQL installed, you should find a copy of this library in your Windows directory, or sometimes in Windows\System32, or failing that in the MySQL core directories. If you do receive an error please copy libmysql.dll into the bin directory.

Known installation issues

If configure fails to locate MySQL headers and/or libraries, try checking for and installing mysql-devel package. On some systems, it is not installed by default.

If make fails with a message which look like

/bin/sh: g++: command not found
make[1]: *** [libsphinx_a-sphinx.o] Error 127

try checking for and installing gcc-c++ package.

If you are getting compile-time errors which look like

sphinx.cpp:67: error: invalid application of `sizeof' to
    incomplete type `Private::SizeError<false>'

this means that some compile-time type size check failed. The most probable reason is that off_t type is less than 64-bit on your system. As a quick hack, you can edit sphinx.h and replace off_t with DWORD in a typedef for SphOffset_t, but note that this will prohibit you from using full-text indexes larger than 2 GB. Even if the hack helps, please report such issues, providing the exact error message and compiler/OS details, so I could properly fix them in next releases.

Quick Sphinx usage tour

All the example commands below assume that you installed Sphinx in /usr/local/sphinx, so searchd can be found in /usr/local/sphinx/bin/searchd.

To use Sphinx, you will need to:

  1. Create a configuration file.Default configuration file name is sphinx.conf. All Sphinx programs look for this file in current working directory by default.Sample configuration file, sphinx.conf.dist, which has all the options documented, is created by configure. Copy and edit that sample file to make your own configuration: (assuming Sphinx is installed into /usr/local/sphinx/)

    $ cd /usr/local/sphinx/etc
    $ cp sphinx.conf.dist sphinx.conf
    $ vi sphinx.conf

    Sample configuration file is setup to index documents table from MySQL database test; so there’s example.sql sample data file to populate that table with a few documents for testing purposes:

    $ mysql -u test < /usr/local/sphinx/etc/example.sql

  2. Run the indexer to create full-text index from your data:

    $ cd /usr/local/sphinx/etc
    $ /usr/local/sphinx/bin/indexer –all

  3. Query your newly created index!

To query the index from command line, use search utility:

$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/search test

To query the index from your PHP scripts, you need to:

  1. Run the search daemon which your script will talk to:

    $ cd /usr/local/sphinx/etc
    $ /usr/local/sphinx/bin/searchd

  2. Run the attached PHP API test script (to ensure that the daemon was succesfully started and is ready to serve the queries):

    $ cd sphinx/api
    $ php test.php test

  3. Include the API (it’s located in api/sphinxapi.php) into your own scripts and use it.

Happy searching!


Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Categories

%d bloggers like this: