Indexing your content

15 May 2019

You want to add advanced full-text search to your website or application, but MySQL's FULLTEXT indexes do not fully meet your needs, or your database has grown so large that searching has become slow.

It is time to discover Sphinxsearch, an open-source indexing engine for your text content. It can be up to 50x faster than MySQL, which makes it worth considering when your database is large.

Installation on Debian 9 Linux

If your server runs Debian, nothing could be simpler:
apt-get install sphinxsearch

To configure it, enable the service at startup in the file /etc/default/sphinxsearch:
START=yes
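
If you would rather script this step than edit the file by hand, the change can be sketched as a small shell function. The function name enable_sphinx_start is made up for this example, and it assumes the file either already contains a START= line or can safely have one appended.

```shell
#!/bin/sh
# Illustrative helper (not part of the sphinxsearch package):
# make sure the given defaults file contains START=yes.
enable_sphinx_start() {
    defaults_file="${1:-/etc/default/sphinxsearch}"
    if grep -q '^START=' "$defaults_file"; then
        # Rewrite the existing START line, whatever its current value.
        # Going through a temporary copy avoids relying on "sed -i".
        sed 's/^START=.*/START=yes/' "$defaults_file" > "$defaults_file.tmp" &&
            cat "$defaults_file.tmp" > "$defaults_file" &&
            rm -f "$defaults_file.tmp"
    else
        # No START line yet: append one.
        printf 'START=yes\n' >> "$defaults_file"
    fi
}
```

Called as root without an argument, it edits /etc/default/sphinxsearch; passing another path lets you try it on a copy first.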

Configuring content acquisition

To index your content, a little configuration in /etc/sphinxsearch/sphinx.conf is enough. You will find my configuration below, but the principle is this: you configure a MySQL connector, telling Sphinx which tables and fields to index, and you put the indexing commands in a cron job to keep the index up to date.

#
# Sphinx configuration file sample
#
# WARNING! While this sample file mentions all available options,
# it contains (very) short helper descriptions only. Please refer to
# doc/sphinx.html for details.
#

#############################################################################
## data source definition
#############################################################################

source src1
{
	# data source type. mandatory, no default value
	# known types are mysql, pgsql, mssql, xmlpipe, xmlpipe2, odbc
	type			= mysql

	#####################################################################
	## SQL settings (for 'mysql' and 'pgsql' types)
	#####################################################################

	# some straightforward parameters for SQL source types
	sql_host		= localhost
	sql_user		= XXXX
	sql_pass		= XXXX
	sql_db			= XXXX
	sql_port		= 3306	# optional, default is 3306

	# main document fetch query
	# mandatory, integer document ID field MUST be the first selected column
	sql_query		= \
		SELECT ID as id, post_content from wp_posts


	# ranged query throttling, in milliseconds
	# optional, default is 0 which means no delay
	# enforces given delay before each query step
	sql_ranged_throttle	= 0
	sql_query_pre = SET NAMES utf8
}

source src1_delta: src1{
	sql_query		= \
		SELECT ID as id, post_content from wp_posts where post_modified>=concat(CURRENT_DATE(), ' 00:00:00')
}

#############################################################################
## index definition
#############################################################################

# local index example
#
# this is an index which is stored locally in the filesystem
#
# all indexing-time options (such as morphology and charsets)
# are configured per local index
index posts
{
	source			= src1
	path			= /var/lib/sphinxsearch/data/posts
	docinfo			= extern
	dict			= keywords
	mlock			= 0
	morphology		= none
	min_word_len		= 1
	min_infix_len		= 1
	html_strip		= 1
	#charset_table		= 0..9, A..Z->a..z,.,@
	charset_type = utf-8
	charset_table = 0..9, a..z, _, A..Z->a..z, U+00C0->a, U+00C1->a,\
        U+00C2->a, U+00C3->a, U+00C4->a, U+00C5->a, U+00C7->c, U+00C8->e,\
        U+00C9->e, U+00CA->e, U+00CB->e, U+00CC->i, U+00CD->i, U+00CE->i,\
        U+00CF->i, U+00D1->n, U+00D2->o, U+00D3->o, U+00D4->o, U+00D5->o,\
        U+00D6->o, U+00D8->o, U+00D9->u, U+00DA->u, U+00DB->u, U+00DC->u,\
        U+00DD->y, U+00DF->s, U+00E0->a, U+00E1->a, U+00E2->a, U+00E3->a,\
        U+00E4->a, U+00E5->a, U+00E7->c, U+00E8->e, U+00E9->e, U+00EA->e,\
        U+00EB->e, U+00EC->i, U+00ED->i, U+00EE->i, U+00EF->i, U+00F1->n,\
        U+00F2->o, U+00F3->o, U+00F4->o, U+00F5->o, U+00F6->o, U+00F8->o,\
        U+00F9->u, U+00FA->u, U+00FB->u, U+00FC->u, U+00FD->y, U+00FF->y,\
        U+0100->a, U+0101->a, U+0102->a, U+0103->a, U+0104->a, U+0105->a,\
        U+0106->c, U+0107->c, U+0108->c, U+0109->c, U+010A->c, U+010B->c,\
        U+010C->c, U+010D->c, U+010E->d, U+010F->d, U+0110->d, U+0111->d,\
        U+0112->e, U+0113->e, U+0114->e, U+0115->e, U+0116->e, U+0117->e,\
        U+0118->e, U+0119->e, U+011A->e, U+011B->e, U+011C->g, U+011D->g,\
        U+011E->g, U+011F->g, U+0120->g, U+0121->g, U+0122->g, U+0123->g,\
        U+0124->h, U+0125->h, U+0126->h, U+0127->h, U+0128->i, U+0129->i,\
        U+012A->i, U+012B->i, U+012C->i, U+012D->i, U+012E->i, U+012F->i,\
        U+0130->i, U+0131->i, U+0134->j, U+0135->j, U+0136->k, U+0137->k,\
        U+0139->l, U+013A->l, U+013B->l, U+013C->l, U+013D->l, U+013E->l,\
        U+013F->l, U+0140->l, U+0141->l, U+0142->l, U+0143->n, U+0144->n,\
        U+0145->n, U+0146->n, U+0147->n, U+0148->n, U+0149->n, U+014C->o,\
        U+014D->o, U+014E->o, U+014F->o, U+0150->o, U+0151->o, U+0154->r,\
        U+0155->r, U+0156->r, U+0157->r, U+0158->r, U+0159->r, U+015A->s,\
        U+015B->s, U+015C->s, U+015D->s, U+015E->s, U+015F->s, U+0160->s,\
        U+0161->s, U+0162->t, U+0163->t, U+0164->t, U+0165->t, U+0166->t,\
        U+0167->t, U+0168->u, U+0169->u, U+016A->u, U+016B->u, U+016C->u,\
        U+016D->u, U+016E->u, U+016F->u, U+0170->u, U+0171->u, U+0172->u,\
        U+0173->u, U+0174->w, U+0175->w, U+0176->y, U+0177->y, U+0178->y,\
        U+0179->z, U+017A->z, U+017B->z, U+017C->z, U+017D->z, U+017E->z,\
        U+017F->s, U+0180->b, U+0181->b, U+0182->b, U+0183->b, U+0186->o,\
        U+0187->c, U+0188->c, U+0189->d, U+018A->d, U+018B->d, U+018C->d,\
        U+018E->e, U+0190->e, U+0191->f, U+0192->f, U+0193->g, U+0197->i,\
        U+0198->k, U+0199->k, U+019A->l, U+019C->m, U+019D->n, U+019E->n,\
        U+019F->o, U+01A0->o, U+01A1->o, U+01A4->p, U+01A5->p, U+01AB->t,\
        U+01AC->t, U+01AD->t, U+01AE->t, U+01AF->u, U+01B0->u, U+01B2->v,\
        U+01B3->y, U+01B4->y, U+01B5->z, U+01B6->z, U+01C5->d, U+01C8->j,\
        U+01C8->l, U+01CB->j, U+01CB->n, U+01CD->a, U+01CE->a, U+01CF->i,\
        U+01D0->i, U+01D1->o, U+01D2->o, U+01D3->u, U+01D4->u, U+01D5->u,\
        U+01D6->u, U+01D7->u, U+01D8->u, U+01D9->u, U+01DA->u, U+01DB->u,\
        U+01DC->u, U+01DD->e, U+01DE->a, U+01DF->a, U+01E0->a, U+01E1->a,\
        U+01E4->g, U+01E5->g, U+01E6->g, U+01E7->g, U+01E8->k, U+01E9->k,\
        U+01EA->o, U+01EB->o, U+01EC->o, U+01ED->o, U+01F0->j, U+01F2->d,\
        U+01F4->g, U+01F5->g, U+01F8->n, U+01F9->n, U+01FA->a, U+01FB->a,\
        U+01FE->o, U+01FF->o, U+0200->a, U+0201->a, U+0202->a, U+0203->a,\
        U+0204->e, U+0205->e, U+0206->e, U+0207->e, U+0208->i, U+0209->i,\
        U+020A->i, U+020B->i, U+020C->o, U+020D->o, U+020E->o, U+020F->o,\
        U+0210->r, U+0211->r, U+0212->r, U+0213->r, U+0214->u, U+0215->u,\
        U+0216->u, U+0217->u, U+0218->s, U+0219->s, U+021A->t, U+021B->t,\
        U+021E->h, U+021F->h, U+0220->n, U+0221->d, U+0224->z, U+0225->z,\
        U+0226->a, U+0227->a, U+0228->e, U+0229->e, U+022A->o, U+022B->o,\
        U+022C->o, U+022D->o, U+022E->o, U+022F->o, U+0230->o, U+0231->o,\
        U+0232->y, U+0233->y, U+0234->l, U+0235->n, U+0236->t, U+0237->j,\
        U+023A->a, U+023B->c, U+023C->c, U+023D->l, U+023E->t, U+023F->s,\
        U+0240->z, U+0243->b, U+0244->u, U+0245->v, U+0246->e, U+0247->e,\
        U+0248->j, U+0249->j, U+024A->q, U+024B->q, U+024C->r, U+024D->r,\
        U+024E->y, U+024F->y, U+0250->a, U+0253->b, U+0254->o, U+0255->c,\
        U+0256->d, U+0257->d, U+0258->e, U+025B->e, U+025C->e, U+025D->e,\
        U+025E->e, U+025F->j, U+0260->g, U+0261->g, U+0262->g, U+0265->h,\
        U+0266->h, U+0268->i, U+026A->i, U+026B->l, U+026C->l, U+026D->l,\
        U+026F->m, U+0270->m, U+0271->m, U+0272->n, U+0273->n, U+0274->n,\
        U+0275->o, U+0279->r, U+027A->r, U+027B->r, U+027C->r, U+027D->r,\
        U+027E->r, U+027F->r, U+0280->r, U+0281->r, U+0282->s, U+0284->j,\
        U+0287->t, U+0288->t, U+0289->u, U+028B->v, U+028C->v, U+028D->w,\
        U+028E->y, U+028F->y, U+0290->z, U+0291->z, U+0297->c, U+0299->b,\
        U+029A->e, U+029B->g, U+029C->h, U+029D->j, U+029E->k, U+029F->l,\
        U+02A0->q, U+02AE->h, U+02AF->h, U+02B0->h, U+02B1->h, U+02B2->j,\
        U+02B3->r, U+02B4->r, U+02B5->r, U+02B6->r, U+02B7->w, U+02B8->y,\
        U+02E1->l, U+02E2->s, U+02E3->x, U+040D->i, U+0418->i, U+0419->i,\
        U+0438->i, U+0439->i, U+043E->o, U+0456->i, U+04D0->a, U+04D1->a,\
        U+04E6->o, U+04E7->o, U+04E8->o, U+04E9->o, U+04EA->o, U+04EB->o,\
        U+16D2->b, U+1D03->b, U+1D05->d, U+1D07->e, U+1D08->e, U+1D09->i,\
        U+1D0A->j, U+1D0B->k, U+1D0C->l, U+1D0D->m, U+1D0E->n, U+1D0F->o,\
        U+1D10->o, U+1D11->o, U+1D12->o, U+1D13->o, U+1D16->o, U+1D17->o,\
        U+1D18->p, U+1D19->r, U+1D1A->r, U+1D1B->t, U+1D1C->u, U+1D1D->u,\
        U+1D1E->u, U+1D1F->m, U+1D20->v, U+1D21->w, U+1D22->z, U+1D2C->a,\
        U+1D2E->b, U+1D2F->b, U+1D30->d, U+1D31->e, U+1D32->e, U+1D33->g,\
        U+1D34->h, U+1D35->i, U+1D36->j, U+1D37->k, U+1D38->l, U+1D39->m,\
        U+1D3A->n, U+1D3B->n, U+1D3C->o, U+1D3E->p, U+1D3F->r, U+1D40->t,\
        U+1D41->u, U+1D42->w, U+1D43->a, U+1D44->a, U+1D47->b, U+1D48->d,\
        U+1D49->e, U+1D4B->e, U+1D4C->e, U+1D4D->g, U+1D4E->i, U+1D4F->k,\
        U+1D50->m, U+1D52->o, U+1D53->o, U+1D54->o, U+1D55->o, U+1D56->p,\
        U+1D57->t, U+1D58->u, U+1D59->u, U+1D5A->m, U+1D5B->v, U+1D62->i,\
        U+1D63->r, U+1D64->u, U+1D65->v, U+1D6C->b, U+1D6D->d, U+1D6E->f,\
        U+1D6F->m, U+1D70->n, U+1D71->p, U+1D72->r, U+1D73->r, U+1D74->s,\
        U+1D75->t, U+1D76->z, U+1D77->g, U+1D79->g, U+1D7B->i, U+1D7D->p,\
        U+1D7E->u, U+1D80->b, U+1D81->d, U+1D82->f, U+1D83->g, U+1D84->k,\
        U+1D85->l, U+1D86->m, U+1D87->n, U+1D88->p, U+1D89->r, U+1D8A->s,\
        U+1D8C->v, U+1D8D->x, U+1D8E->z, U+1D8F->a, U+1D91->d, U+1D92->e,\
        U+1D93->e, U+1D94->e, U+1D96->i, U+1D97->o, U+1D99->u, U+1D9C->c,\
        U+1D9D->c, U+1D9F->e, U+1DA0->f, U+1DA1->j, U+1DA2->g, U+1DA3->h,\
        U+1DA4->i, U+1DA6->i, U+1DA7->i, U+1DA8->j, U+1DA9->l, U+1DAA->l,\
        U+1DAB->l, U+1DAC->m, U+1DAD->m, U+1DAE->n, U+1DAF->n, U+1DB0->n,\
        U+1DB1->o, U+1DB3->s, U+1DB5->t, U+1DB6->u, U+1DB8->u, U+1DB9->v,\
        U+1DBA->v, U+1DBB->z, U+1DBC->z, U+1DBD->z, U+1DCA->r, U+1E00->a,\
        U+1E01->a, U+1E02->b, U+1E03->b, U+1E04->b, U+1E05->b, U+1E06->b,\
        U+1E07->b, U+1E08->c, U+1E09->c, U+1E0A->d, U+1E0B->d, U+1E0C->d,\
        U+1E0D->d, U+1E0E->d, U+1E0F->d, U+1E10->d, U+1E11->d, U+1E12->d,\
        U+1E13->d, U+1E14->e, U+1E15->e, U+1E16->e, U+1E17->e, U+1E18->e,\
        U+1E19->e, U+1E1A->e, U+1E1B->e, U+1E1C->e, U+1E1D->e, U+1E1E->f,\
        U+1E1F->f, U+1E20->g, U+1E21->g, U+1E22->h, U+1E23->h, U+1E24->h,\
        U+1E25->h, U+1E26->h, U+1E27->h, U+1E28->h, U+1E29->h, U+1E2A->h,\
        U+1E2B->h, U+1E2C->i, U+1E2D->i, U+1E2E->i, U+1E2F->i, U+1E30->k,\
        U+1E31->k, U+1E32->k, U+1E33->k, U+1E34->k, U+1E35->k, U+1E36->l,\
        U+1E37->l, U+1E38->l, U+1E39->l, U+1E3A->l, U+1E3B->l, U+1E3C->l,\
        U+1E3D->l, U+1E3E->m, U+1E3F->m, U+1E40->m, U+1E41->m, U+1E42->m,\
        U+1E43->m, U+1E44->n, U+1E45->n, U+1E46->n, U+1E47->n, U+1E48->n,\
        U+1E49->n, U+1E4A->n, U+1E4B->n, U+1E4C->o, U+1E4D->o, U+1E4E->o,\
        U+1E4F->o, U+1E50->o, U+1E51->o, U+1E52->o, U+1E53->o, U+1E54->p,\
        U+1E55->p, U+1E56->p, U+1E57->p, U+1E58->r, U+1E59->r, U+1E5A->r,\
        U+1E5B->r, U+1E5C->r, U+1E5D->r, U+1E5E->r, U+1E5F->r, U+1E60->s,\
        U+1E61->s, U+1E62->s, U+1E63->s, U+1E64->s, U+1E65->s, U+1E66->s,\
        U+1E67->s, U+1E68->s, U+1E69->s, U+1E6A->t, U+1E6B->t, U+1E6C->t,\
        U+1E6D->t, U+1E6E->t, U+1E6F->t, U+1E70->t, U+1E71->t, U+1E72->u,\
        U+1E73->u, U+1E74->u, U+1E75->u, U+1E76->u, U+1E77->u, U+1E78->u,\
        U+1E79->u, U+1E7A->u, U+1E7B->u, U+1E7C->v, U+1E7D->v, U+1E7E->v,\
        U+1E7F->v, U+1E80->w, U+1E81->w, U+1E82->w, U+1E83->w, U+1E84->w,\
        U+1E85->w, U+1E86->w, U+1E87->w, U+1E88->w, U+1E89->w, U+1E8A->x,\
        U+1E8B->x, U+1E8C->x, U+1E8D->x, U+1E8E->y, U+1E8F->y, U+1E90->z,\
        U+1E91->z, U+1E92->z, U+1E93->z, U+1E94->z, U+1E95->z, U+1E96->h,\
        U+1E97->t, U+1E98->w, U+1E99->y, U+1E9A->a, U+1E9B->s, U+1EA0->a,\
        U+1EA1->a, U+1EA2->a, U+1EA3->a, U+1EA4->a, U+1EA5->a, U+1EA6->a,\
        U+1EA7->a, U+1EA8->a, U+1EA9->a, U+1EAA->a, U+1EAB->a, U+1EAC->a,\
        U+1EAD->a, U+1EAE->a, U+1EAF->a, U+1EB0->a, U+1EB1->a, U+1EB2->a,\
        U+1EB3->a, U+1EB4->a, U+1EB5->a, U+1EB6->a, U+1EB7->a, U+1EB8->e,\
        U+1EB9->e, U+1EBA->e, U+1EBB->e, U+1EBC->e, U+1EBD->e, U+1EBE->e,\
        U+1EBF->e, U+1EC0->e, U+1EC1->e, U+1EC2->e, U+1EC3->e, U+1EC4->e,\
        U+1EC5->e, U+1EC6->e, U+1EC7->e, U+1EC8->i, U+1EC9->i, U+1ECA->i,\
        U+1ECB->i, U+1ECC->o, U+1ECD->o, U+1ECE->o, U+1ECF->o, U+1ED0->o,\
        U+1ED1->o, U+1ED2->o, U+1ED3->o, U+1ED4->o, U+1ED5->o, U+1ED6->o,\
        U+1ED7->o, U+1ED8->o, U+1ED9->o, U+1EDA->o, U+1EDB->o, U+1EDC->o,\
        U+1EDD->o, U+1EDE->o, U+1EDF->o, U+1EE0->o, U+1EE1->o, U+1EE2->o,\
        U+1EE3->o, U+1EE4->u, U+1EE5->u, U+1EE6->u, U+1EE7->u, U+1EE8->u,\
        U+1EE9->u, U+1EEA->u, U+1EEB->u, U+1EEC->u, U+1EED->u, U+1EEE->u,\
        U+1EEF->u, U+1EF0->u, U+1EF1->u, U+1EF2->y, U+1EF3->y, U+1EF4->y,\
        U+1EF5->y, U+1EF6->y, U+1EF7->y, U+1EF8->y, U+1EF9->y, U+2071->i,\
        U+207F->n, U+2090->a, U+2091->e, U+2092->o, U+2093->x, U+210C->h,\
        U+2111->i, U+211C->r, U+2128->z, U+212D->c, U+2184->c, U+2C60->l,\
        U+2C61->l, U+2C62->l, U+2C63->p, U+2C64->r, U+2C65->a, U+2C66->t,\
        U+2C67->h, U+2C68->h, U+2C69->k, U+2C6A->k, U+2C6B->z, U+2C6C->z,\
        U+2C74->v, U+2C75->h, U+2C76->h, U+2C9E->o, U+2C9F->o, U+10300->a,\
        U+10309->i, U+1030F->o, U+10316->u	
}

index posts_delta: posts{
	source			= src1_delta
	path			= /var/lib/sphinxsearch/data/posts_delta
}

#############################################################################
## indexer settings
#############################################################################

indexer
{
	# memory limit, in bytes, kilobytes (16384K) or megabytes (256M)
	# optional, default is 128M, max is 2047M, recommended is 256M to 1024M
	mem_limit		= 128M

	# maximum IO calls per second (for I/O throttling)
	# optional, default is 0 (unlimited)
	#
	# max_iops		= 40

	# maximum IO call size, bytes (for I/O throttling)
	# optional, default is 0 (unlimited)
	#
	# max_iosize		= 1048576


	# maximum xmlpipe2 field length, bytes
	# optional, default is 2M
	#
	# max_xmlpipe2_field	= 4M


	# write buffer size, bytes
	# several (currently up to 4) buffers will be allocated
	# write buffers are allocated in addition to mem_limit
	# optional, default is 1M
	#
	# write_buffer		= 1M


	# maximum file field adaptive buffer size
	# optional, default is 8M, minimum is 1M
	#
	# max_file_field_buffer	= 32M


	# how to handle IO errors in file fields
	# known values are 'ignore_field', 'skip_document', and 'fail_index'
	# optional, default is 'ignore_field'
	#
	# on_file_field_error = skip_document


	# lemmatizer cache size
	# improves the indexing time when the lemmatization is enabled
	# optional, default is 256K
	#
	# lemmatizer_cache = 512M
}

#############################################################################
## searchd settings
#############################################################################

searchd
{
	# [hostname:]port[:protocol], or /unix/socket/path to listen on
	# known protocols are 'sphinx' (SphinxAPI) and 'mysql41' (SphinxQL)
	#
	# multi-value, multiple listen points are allowed
	# optional, defaults are 9312:sphinx and 9306:mysql41, as below
	#
	# listen			= 127.0.0.1
	# listen			= 192.168.0.1:9312
	# listen			= 9312
	# listen			= /var/run/searchd.sock
	# listen			= 9312
	listen			= 127.0.0.1:9306:mysql41

	# log file, searchd run info is logged here
	# optional, default is 'searchd.log'
	log			= /var/log/sphinxsearch/searchd.log

	# query log file, all search queries are logged here
	# optional, default is empty (do not log queries)
	query_log		= /var/log/sphinxsearch/query.log

	# client read timeout, seconds
	# optional, default is 5
	read_timeout		= 5

	# request timeout, seconds
	# optional, default is 5 minutes
	client_timeout		= 300

	# maximum amount of children to fork (concurrent searches to run)
	# optional, default is 0 (unlimited)
	max_children		= 30

	# maximum amount of persistent connections from this master to each agent host
	# optional, but necessary if you use agent_persistent. It is reasonable to set the value
	# as max_children, or less on the agent's hosts.
	persistent_connections_limit	= 30

	# PID file, searchd process ID file name
	# mandatory
	pid_file		= /var/run/sphinxsearch/searchd.pid

	# seamless rotate, prevents rotate stalls if precaching huge datasets
	# optional, default is 1
	seamless_rotate		= 1

	# whether to forcibly preopen all indexes on startup
	# optional, default is 1 (preopen everything)
	preopen_indexes		= 1

	# whether to unlink .old index copies on successful rotation.
	# optional, default is 1 (do unlink)
	unlink_old		= 1

	# attribute updates periodic flush timeout, seconds
	# updates will be automatically dumped to disk this frequently
	# optional, default is 0 (disable periodic flush)
	#
	# attr_flush_period	= 900

	# MVA updates pool size
	# shared between all instances of searchd, disables attr flushes!
	# optional, default size is 1M
	mva_updates_pool	= 1M

	# max allowed network packet size
	# limits both query packets from clients, and responses from agents
	# optional, default size is 8M
	max_packet_size		= 8M

	# max allowed per-query filter count
	# optional, default is 256
	max_filters		= 256

	# max allowed per-filter values count
	# optional, default is 4096
	max_filter_values	= 4096


	# socket listen queue length
	# optional, default is 5
	#
	# listen_backlog		= 5


	# per-keyword read buffer size
	# optional, default is 256K
	#
	# read_buffer		= 256K


	# unhinted read size (currently used when reading hits)
	# optional, default is 32K
	#
	# read_unhinted		= 32K


	# max allowed per-batch query count (aka multi-query count)
	# optional, default is 32
	max_batch_queries	= 32


	# max common subtree document cache size, per-query
	# optional, default is 0 (disable subtree optimization)
	#
	# subtree_docs_cache	= 4M


	# max common subtree hit cache size, per-query
	# optional, default is 0 (disable subtree optimization)
	#
	# subtree_hits_cache	= 8M


	# multi-processing mode (MPM)
	# known values are none, fork, prefork, and threads
	# threads is required for RT backend to work
	# optional, default is threads
	workers			= threads # for RT to work


	# max threads to create for searching local parts of a distributed index
	# optional, default is 0, which means disable multi-threaded searching
	# should work with all MPMs (ie. does NOT require workers=threads)
	#
	# dist_threads		= 4


	# binlog files path; use empty string to disable binlog
	# optional, default is build-time configured data directory
	#
	# binlog_path		= # disable logging
	# binlog_path		= /var/lib/sphinxsearch/data # binlog.001 etc will be created there


	# binlog flush/sync mode
	# 0 means flush and sync every second
	# 1 means flush and sync every transaction
	# 2 means flush every transaction, sync every second
	# optional, default is 2
	#
	# binlog_flush		= 2


	# binlog per-file size limit
	# optional, default is 128M, 0 means no limit
	#
	# binlog_max_log_size	= 256M


	# per-thread stack size, only affects workers=threads mode
	# optional, default is 64K
	#
	# thread_stack			= 128K


	# per-keyword expansion limit (for dict=keywords prefix searches)
	# optional, default is 0 (no limit)
	#
	# expansion_limit		= 1000


	# RT RAM chunks flush period
	# optional, default is 0 (no periodic flush)
	#
	# rt_flush_period		= 900


	# query log file format
	# optional, known values are plain and sphinxql, default is plain
	#
	# query_log_format		= sphinxql


	# version string returned to MySQL network protocol clients
	# optional, default is empty (use Sphinx version)
	#
	# mysql_version_string	= 5.0.37


	# default server-wide collation
	# optional, default is libc_ci
	#
	collation_server		= utf8_general_ci


	# server-wide locale for libc based collations
	# optional, default is C
	#
	# collation_libc_locale	= ru_RU.UTF-8


	# threaded server watchdog (only used in workers=threads mode)
	# optional, values are 0 and 1, default is 1 (watchdog on)
	#
	# watchdog				= 1

	
	# costs for max_predicted_time model, in (imaginary) nanoseconds
	# optional, default is "doc=64, hit=48, skip=2048, match=64"
	#
	# predicted_time_costs	= doc=64, hit=48, skip=2048, match=64


	# current SphinxQL state (uservars etc) serialization path
	# optional, default is none (do not serialize SphinxQL state)
	#
	# sphinxql_state			= sphinxvars.sql


	# maximum RT merge thread IO calls per second, and per-call IO size
	# useful for throttling (the background) OPTIMIZE INDEX impact
	# optional, default is 0 (unlimited)
	#
	# rt_merge_iops			= 40
	# rt_merge_maxiosize		= 1M


	# interval between agent mirror pings, in milliseconds
	# 0 means disable pings
	# optional, default is 1000
	#
	# ha_ping_interval		= 0


	# agent mirror statistics window size, in seconds
	# stats older than the window size (karma) are retired
	# that is, they will not affect master choice of agents in any way
	# optional, default is 60 seconds
	#
	# ha_period_karma			= 60


	# delay between preforked children restarts on rotation, in milliseconds
	# optional, default is 0 (no delay)
	#
	# prefork_rotation_throttle	= 100


	# a prefix to prepend to the local file names when creating snippets
	# with load_files and/or load_files_scatter options
	# optional, default is empty
	#
	# snippets_file_prefix		= /mnt/common/server1/
}

#############################################################################
## common settings
#############################################################################

common
{

	# lemmatizer dictionaries base path
	# optional, default is /usr/local/share (see ./configure --datadir)
	#
	# lemmatizer_base = /usr/local/share/sphinx/dicts


	# how to handle syntax errors in JSON attributes
	# known values are 'ignore_attr' and 'fail_index'
	# optional, default is 'ignore_attr'
	#
	# on_json_attr_error = fail_index


	# whether to auto-convert numeric values from strings in JSON attributes
	# with auto-conversion, string value with actually numeric data
	# (as in {"key":"12345"}) gets stored as a number, rather than string
	# optional, allowed values are 0 and 1, default is 0 (do not convert)
	#
	# json_autoconv_numbers = 1


	# whether and how to auto-convert key names in JSON attributes
	# known value is 'lowercase'
	# optional, default is unspecified (do nothing)
	#
	# json_autoconv_keynames = lowercase


	# path to RLP root directory
	# optional, default is /usr/local/share (see ./configure --datadir)
	#
	# rlp_root = /usr/local/share/sphinx/rlp


	# path to RLP environment file
	# optional, default is /usr/local/share/rlp-environment.xml (see ./configure --datadir)
	#
	# rlp_environment = /usr/local/share/sphinx/rlp/rlp/etc/rlp-environment.xml


	# maximum total size of documents batched before processing them by the RLP
	# optional, default is 51200
	#
	# rlp_max_batch_size = 100k


	# maximum number of documents batched before processing them by the RLP
	# optional, default is 50
	#
	# rlp_max_batch_docs = 100


	# trusted plugin directory
	# optional, default is empty (disable UDFs)
	#
	# plugin_dir			= /usr/local/sphinx/lib

}

# --eof--

As you will have gathered, this configuration file indexes the content of a WordPress blog. It is split into two indexes so that indexing stays fast:

  • a posts index, which covers all of the blog's posts and can be refreshed once a day, for example
  • a posts_delta index, which covers only the content modified today

Two cron jobs are needed: the first indexes all posts; the second indexes the content modified today, then merges it into the main index.

  • indexer posts --rotate
  • indexer posts_delta --rotate ; indexer --merge posts posts_delta --rotate
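
The two jobs can be wrapped in one small script that cron calls with a mode. This is only a sketch: the reindex function, the INDEXER override and the schedule in the comments are assumptions made for the example, while the indexer commands themselves are the ones listed above. One deliberate difference: the merge is chained with && rather than ;, so it is skipped when the delta build fails.

```shell
#!/bin/sh
# Hypothetical cron wrapper around the two indexing jobs.
# Example crontab entries (script path and times are illustrative):
#   0 3 * * *     /usr/local/bin/reindex.sh full
#   */15 * * * *  /usr/local/bin/reindex.sh delta

# Set INDEXER=echo to preview the commands without running them.
INDEXER="${INDEXER:-indexer}"

reindex() {
    case "$1" in
        full)
            # Rebuild the main index from scratch.
            $INDEXER posts --rotate
            ;;
        delta)
            # Index today's changes, then fold them into the main index.
            # The merge only runs if the delta build succeeded.
            $INDEXER posts_delta --rotate &&
                $INDEXER --merge posts posts_delta --rotate
            ;;
        *)
            echo "usage: reindex full|delta" >&2
            return 2
            ;;
    esac
}
```

In a real script the last line would be a call such as reindex "$1", so that the crontab entries above select the mode.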

How do you use it from PHP?

Indexed content is all well and good, but how do I use it from PHP?
It is very simple: with the configuration above, searchd speaks the MySQL protocol on 127.0.0.1:9306, so we can connect to Sphinx with mysqli, for example, which ships with PHP.

Then simply query Sphinx to get the IDs of the posts matching your search, for example:

SELECT id FROM posts WHERE MATCH('@post_content *clement*') limit 100

And there you go: the query returns every post in which a word contains "clement". You can mix the asterisks or remove them to run "starts with", "ends with" or "exact word" searches.
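
To try these variants before writing any PHP, the same listener can be queried with the stock mysql command-line client instead of mysqli. The sketch below is an illustration under assumptions: build_query and search_ids are made-up names, the mode keywords are invented for the example, and the search term is reduced to ASCII letters, digits and underscores, a deliberately blunt way of keeping quotes and query operators out of the MATCH string.

```shell
#!/bin/sh
# Build the SphinxQL query for one search term.
# $1 = term, $2 = contains (default) | prefix | suffix | exact
build_query() {
    # Conservative sanitising: drop everything except A-Z, a-z, 0-9 and _.
    term=$(printf '%s' "$1" | tr -cd 'A-Za-z0-9_')
    case "${2:-contains}" in
        contains) pattern="*${term}*" ;;   # a word contains the term
        prefix)   pattern="${term}*" ;;    # a word starts with the term
        suffix)   pattern="*${term}" ;;    # a word ends with the term
        exact)    pattern="${term}" ;;     # a word equals the term
        *) echo "unknown mode: $2" >&2; return 2 ;;
    esac
    printf "SELECT id FROM posts WHERE MATCH('@post_content %s') LIMIT 100\n" "$pattern"
}

# Send the query to searchd's SphinxQL listener (127.0.0.1:9306 in the
# configuration above) and print one matching ID per line.
search_ids() {
    mysql -h 127.0.0.1 -P 9306 --batch --skip-column-names \
        -e "$(build_query "$1" "$2")"
}
```

For instance, search_ids clement prefix would list the posts containing a word that starts with "clement", provided searchd is running and the posts index has been built.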

Once you have the IDs, insert them as a list (AND ID IN (…)) to complete your WordPress search.

This is only a brief overview: there are many more ways to build your queries, such as combining fields, excluding some, or weighting results to sort them, and all of this is well covered in the documentation.
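
As a rough illustration of that last step, here is how IDs arriving one per line on standard input could be turned into the condition. The function name ids_to_in_clause is invented for the example; in PHP you would do the equivalent with the rows returned by mysqli. Lines that are not purely numeric are dropped, and an empty result produces a condition that matches nothing, since an empty IN () list is not valid SQL.

```shell
#!/bin/sh
# Read post IDs (one per line) on stdin and print the SQL condition.
ids_to_in_clause() {
    # Keep purely numeric lines only, then join them with commas.
    ids=$(sed -n '/^[0-9][0-9]*$/p' | paste -sd, -)
    if [ -z "$ids" ]; then
        # No hit: return a condition that selects no row.
        printf 'AND 1 = 0\n'
    else
        printf 'AND ID IN (%s)\n' "$ids"
    fi
}
```

The resulting fragment is meant to be appended to the WHERE clause of the query that reads wp_posts.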
