sphinx sql_joined_field的妙用及设置技巧

主题:sphinx sql_joined_field的妙用及设置技巧

在sphinx 0.9.9版中,增加了sql_joined_field字段设置,但它的存在价值在哪里呢?

行内的人,可以用mysql的连表查询来组织需要的几乎所有的数据,对于sql_joined_field的功能可能不太有感觉

sql_joined_field:SQL连接字段设置

连接/有效载荷字段获取查询。 多值选项,可选,默认值为空。 仅对SQL数据源有效 (mysql, pgsql, mssql) .

sql_joined_field 提供两种不同的方式:连接字段,或者有效载荷(有效载荷字段)。其语法格式如下:

sql_joined_field = FIELD-NAME 'from' ( 'query' | 'payload-query' ); \ QUERY [ ; RANGE-QUERY ]

如:

sql_joined_field = \
atags from query; \
SELECT d.document_public_id,da.name FROM zjs_document_attribute_content_public da \
INNER JOIN zjs_document_public d ON d.document_id = da.document_id \
ORDER BY d.document_public_id ASC
参数说明:

where

FIELD-NAME 是 连接/有效载荷 字段名称;
QUERY 是一个用于获取数据到索引的SQL查询.
RANGE-QUERY 是一个可选的用于获取范围知道索引的SQL查询. (版本2.0.1-beta增加.)

Joined fields let you avoid JOIN and/or GROUP_CONCAT statements in the main document fetch query (sql_query). This can be useful when SQL-side JOIN is slow, or needs to be offloaded on Sphinx side, or simply to emulate MySQL-specific GROUP_CONCAT funcionality in case your database server does not support it.

The query must return exactly 2 columns: document ID, and text to append to a joined field. Document IDs can be duplicate, but they must be in ascending order. All the text rows fetched for a given ID will be concatented together, and the concatenation result will be indexed as the entire contents of a joined field. Rows will be concatenated in the order returned from the query, and separating whitespace will be inserted between them. For instance, if joined field query returns the following rows:

( 1, 'red' ) ( 1, 'right' ) ( 1, 'hand' ) ( 2, 'mysql' ) ( 2, 'sphinx' )

then the indexing results would be equivalent to that of adding a new text field with a value of 'red right hand' to document 1 and 'mysql sphinx' to document 2.

Joined fields are only indexed differently. There are no other differences between joined fields and regular text fields.

Starting with 2.0.1-beta, ranged queries can be used when a single query is not efficient enough or does not work because of the database driver limitations. It works similar to the ranged queries in the main indexing loop, 参见 第 3.7 节 “区段查询”. The range will be queried for and fetched upfront once, then multiple queries with different$start and $end substitutions will be run to fetch the actual data.

Payloads let you create a special field in which, instead of keyword positions, so-called user payloads are stored. Payloads are custom integer values attached to every keyword. They can then be used in search time to affect the ranking.

The payload query must return exactly 3 columns: document ID; keyword; and integer payload value. Document IDs can be duplicate, but they must be in ascending order. Payloads must be unsigned integers within 24-bit range, ie. from 0 to 16777215. For reference, payloads are currently internally stored as in-field keyword positions, but that is not guaranteed and might change in the future.

Currently, the only method to account for payloads is to use SPH_RANK_PROXIMITY_BM25 ranker. On indexes with payload fields, it will automatically switch to a variant that matches keywords in those fields, computes a sum of matched payloads multiplied by field wieghts, and adds that sum to the final rank.
示例:sql_joined_field = \
tagstext from query; \
SELECT docid, CONCAT('tag',tagid) FROM tags ORDER BY docid ASC

发表评论

电子邮件地址不会被公开。 必填项已用*标注