Sphinx is a full-text search engine, publicly distributed under GPL version 2. Commercial licensing (eg. for embedded use) is available upon request.
Technically, Sphinx is a standalone software package that provides fast and relevant full-text search functionality to client applications. It was specially designed to integrate well with SQL databases storing the data, and to be easily accessed by scripting languages. However, Sphinx does not depend on nor require any specific database to function.
Applications can access the Sphinx search daemon (searchd) using any of three access methods: a) via the native search API (SphinxAPI), b) via Sphinx's own implementation of the MySQL network protocol (using a small SQL subset called SphinxQL), or c) via MySQL server with a pluggable storage engine (SphinxSE).
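As a rough illustration of option b), SphinxQL can be tried from the stock mysql command-line client, since searchd speaks the MySQL network protocol. The sketch below is only that: the port 9306, the variable names SPHINXQL_HOST and SPHINXQL_PORT, and the function name are assumptions, not something this text specifies, and searchd must actually be configured with a MySQL-protocol listener for it to work.

```shell
#!/bin/sh
# Send one SphinxQL statement to searchd through the regular mysql client.
# Assumptions (not from the text above): searchd has a MySQL-protocol
# listener on port 9306; SPHINXQL_HOST and SPHINXQL_PORT are variable
# names invented for this sketch.
sphinxql() {
    # 127.0.0.1 (rather than localhost) makes the client use TCP, not a socket.
    mysql -h "${SPHINXQL_HOST:-127.0.0.1}" -P "${SPHINXQL_PORT:-9306}" -e "$1"
}
```

A call might look like sphinxql "SELECT * FROM test1 WHERE MATCH('test') LIMIT 5", where test1 is a hypothetical index name.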
Official native SphinxAPI implementations for PHP, Perl, Ruby, and Java are included within the distribution package. The API is very lightweight, so porting it to a new language is known to take a few hours or days. Third party API ports and plugins exist for Perl, C#, Haskell, Ruby-on-Rails, and possibly other languages and frameworks.
Starting with version 1.10-beta, Sphinx supports two different indexing backends: the “disk” index backend and the “realtime” (RT) index backend. Disk indexes support online full-text index rebuilds, but online updates can only be done on non-text (attribute) data. RT indexes additionally allow for online full-text index updates. Previous versions only supported disk indexes.
Data can be loaded into disk indexes using a so-called data source. Built-in sources can fetch data directly from MySQL, PostgreSQL, any ODBC-compliant database (MS SQL, Oracle, etc), or a pipe in a custom XML format. Adding new data source drivers (eg. to natively support other DBMSes) is designed to be as easy as possible. RT indexes, as of 1.10-beta, can only be populated using SphinxQL.
As for the name, Sphinx is an acronym which is officially decoded as SQL Phrase Index. Yes, I know about CMU’s Sphinx project.
Key Sphinx features are:
- high indexing and searching performance;
- advanced indexing and querying tools (flexible and feature-rich text tokenizer, querying language, several different ranking modes, etc);
- advanced result set post-processing (SELECT with expressions, WHERE, ORDER BY, GROUP BY etc over text search results);
- proven scalability up to billions of documents, terabytes of data, and thousands of queries per second;
- easy integration with SQL and XML data sources, and SphinxAPI, SphinxQL, or SphinxSE search interfaces;
- easy scaling with distributed searches.
Where to get Sphinx
Sphinx is available through its official Web site at http://sphinxsearch.com/.
Sphinx development was started back in 2001, because I didn’t manage to find an acceptable search solution (for a database driven Web site) which would meet my requirements. Actually, each and every important aspect was a problem:
- search quality (ie. good relevance)
  - statistical ranking methods performed rather badly, especially on large collections of small documents (forums, blogs, etc)
- search speed
  - especially if searching for phrases which contain stopwords, as in “to be or not to be”
- moderate disk and CPU requirements when indexing
  - important in shared hosting environments, not to mention the indexing speed.
Despite the amount of time passed and numerous improvements made in the other solutions, there’s still no solution which I personally would be eager to migrate to.
Considering that and a lot of positive feedback received from Sphinx users over the last few years, the obvious decision is to continue developing Sphinx (and, eventually, to take over the world).
Most modern UNIX systems with a C++ compiler should be able to compile and run Sphinx without any modifications.
Currently known systems Sphinx has been successfully running on are:
- Linux 2.4.x, 2.6.x (many various distributions)
- Windows 2000, XP
- FreeBSD 4.x, 5.x, 6.x, 7.x
- NetBSD 1.6, 3.0
- Solaris 9, 11
- Mac OS X
On UNIX, you will need the following tools to build and install Sphinx:
- a working C++ compiler. GNU gcc is known to work.
- a good make program. GNU make is known to work.
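Before starting a build, it may be worth checking that both tools are actually on your PATH. The following is a minimal sketch, not part of the Sphinx distribution; the function name is made up for this example, and it only tests for the g++ and make commands rather than for any particular version.

```shell
#!/bin/sh
# Print the name of every required build tool that is not on PATH.
# The default tool list (g++ and make) mirrors the requirements above;
# the function name is hypothetical.
missing_build_tools() {
    # Use the arguments as the tool list, or fall back to the defaults.
    [ "$#" -gt 0 ] || set -- g++ make
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

If missing_build_tools prints nothing, both tools were found; otherwise it prints one missing tool name per line.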
On Windows, you will need Microsoft Visual C/C++ Studio .NET 2003 or 2005. Other compilers/environments will probably work as well, but for the time being, you will have to build a makefile (or other environment-specific project files) manually.
Installing Sphinx on Linux
Before installing Sphinx, run the following commands in your terminal:
$ sudo apt-get update
$ sudo apt-get dist-upgrade
$ sudo apt-get install build-essential
$ sudo apt-get install libmysqlclient15-dev
- Extract everything from the distribution tarball (haven’t you already?) and go to the sphinx subdirectory:
$ tar xzvf sphinx-0.9.8.tar.gz
$ cd sphinx
- Run the configuration program:
$ ./configure
There are a number of options to configure. The complete listing may be obtained by using the --help switch. The most important ones are:
--prefix, which specifies where to install Sphinx, such as --prefix=/usr/local/sphinx (all of the examples use this prefix);
--with-mysql, which specifies where to look for MySQL include and library files, if auto-detection fails;
--with-pgsql, which specifies where to look for PostgreSQL include and library files.
- Build the binaries:
$ make
- Install the binaries in the directory of your choice (defaults to /usr/local/bin/ on *nix systems, but is overridden with --prefix):
$ make install
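The configure, build, and install steps above can be chained in a small script. This is a sketch under a few assumptions: SPHINX_PREFIX, DRY_RUN, run, and build_sphinx are names invented here, the script is meant to be run from inside the extracted source directory, and make install may need elevated privileges depending on the prefix.

```shell
#!/bin/sh
# Sketch of the configure/build/install sequence.
# SPHINX_PREFIX and DRY_RUN are variable names invented for this example.
SPHINX_PREFIX="${SPHINX_PREFIX:-/usr/local/sphinx}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run from inside the extracted source directory; stops at the first failure.
build_sphinx() {
    run ./configure --prefix="$SPHINX_PREFIX" &&
    run make &&
    run make install
}
```

Setting DRY_RUN=1 before calling build_sphinx only prints the three commands, which is a cheap way to double-check the prefix before anything gets installed.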
Installing Sphinx on Windows
Installing Sphinx on a Windows server is often easier than installing on a Linux environment; unless you are preparing code patches, you can use the pre-compiled binary files from the Downloads area on the website.
- Extract everything from the .zip file you have downloaded (sphinx-0.9.8-win32-pgsql.zip if you need PostgreSQL support as well). You can use Windows Explorer in Windows XP and up to extract the files, or a freeware package like 7-Zip to open the archive.
For the remainder of this guide, we will assume that the folders are unzipped into C:\Sphinx, such that searchd.exe can be found in C:\Sphinx\bin\searchd.exe. If you decide to use any different location for the folders or configuration file, please change it accordingly.
- Edit the contents of sphinx.conf.in – specifically entries relating to @CONFDIR@ – to paths suitable for your system.
- Install the searchd system as a Windows service:
C:\Sphinx\bin> C:\Sphinx\bin\searchd --install --config C:\Sphinx\sphinx.conf.in --servicename SphinxSearch
The searchd service will now be listed in the Services panel within the Management Console, available from Administrative Tools. It will not have been started, as you will need to configure it and build your indexes with indexer before starting the service. A guide to do this can be found under Quick tour.
During the next steps of the install (which involve running indexer pretty much as you would on Linux) you may find that you get an error relating to libmysql.dll not being found. If you have MySQL installed, you should find a copy of this library in your Windows directory, or sometimes in Windows\System32, or failing that in the MySQL core directories. If you do receive an error, please copy libmysql.dll into the bin directory.
Known installation issues
If configure fails to locate MySQL headers and/or libraries, try checking for and installing the mysql-devel package. On some systems, it is not installed by default.
If make fails with a message which looks like
/bin/sh: g++: command not found
make: *** [libsphinx_a-sphinx.o] Error 127
try checking for and installing the package that provides the g++ compiler.
If you are getting compile-time errors which look like
sphinx.cpp:67: error: invalid application of `sizeof' to incomplete type `Private::SizeError<false>'
this means that some compile-time type size check failed. The most probable reason is that the off_t type is less than 64-bit on your system. As a quick hack, you can edit sphinx.h and replace off_t with DWORD in the typedef for SphOffset_t, but note that this will prohibit you from using full-text indexes larger than 2 GB. Even if the hack helps, please report such issues, providing the exact error message and compiler/OS details, so I can properly fix them in the next releases.
Quick Sphinx usage tour
All the example commands below assume that you installed Sphinx in /usr/local/sphinx, so searchd can be found in /usr/local/sphinx/bin/searchd.
To use Sphinx, you will need to:
- Create a configuration file.
The default configuration file name is sphinx.conf. All Sphinx programs look for this file in the current working directory by default.
A sample configuration file, sphinx.conf.dist, which has all the options documented, is created by configure. Copy and edit that sample file to make your own configuration (assuming Sphinx is installed into /usr/local/sphinx):
$ cd /usr/local/sphinx/etc
$ cp sphinx.conf.dist sphinx.conf
$ vi sphinx.conf
The sample configuration file is set up to index the documents table from the MySQL database test, so there’s an example.sql sample data file to populate that table with a few documents for testing purposes:
$ mysql -u test < /usr/local/sphinx/etc/example.sql
- Run the indexer to create full-text index from your data:
$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/indexer --all
- Query your newly created index!
To query the index from the command line, use the search utility:
$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/search test
To query the index from your PHP scripts, you need to:
- Run the search daemon which your script will talk to:
$ cd /usr/local/sphinx/etc
$ /usr/local/sphinx/bin/searchd
- Run the attached PHP API test script (to ensure that the daemon was successfully started and is ready to serve the queries):
$ cd sphinx/api
$ php test.php test
- Include the API (it’s located in api/sphinxapi.php) into your own scripts and use it.
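Most problems with the steps above come down to a binary or the configuration file not being where the commands expect it. The following sketch checks the layout the tour relies on; the function name is hypothetical, and the file list (indexer, searchd, and search under bin, plus etc/sphinx.conf) is simply taken from the example commands, so adjust it if your installation differs.

```shell
#!/bin/sh
# List the files the tour relies on that are missing under a given prefix.
# The function name is hypothetical; the paths follow the examples above.
check_sphinx_layout() {
    prefix="${1:-/usr/local/sphinx}"
    for bin in indexer searchd search; do
        [ -x "$prefix/bin/$bin" ] || echo "missing executable: $prefix/bin/$bin"
    done
    [ -f "$prefix/etc/sphinx.conf" ] || echo "missing config: $prefix/etc/sphinx.conf"
}
```

Calling check_sphinx_layout with no arguments checks /usr/local/sphinx; empty output means everything the tour needs was found.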