Calculation of search result relevance
Search result relevance
The Wise catalog uses Solr to handle search requests. In response to a search query, Solr evaluates all searchable terms (titles, links, etc.) and calculates a score for each item that indicates how relevant the item is to the query.
The score is calculated based on these factors:
Factor | Description |
---|---|
tf - Term Frequency | The frequency with which a term appears in the item record. Given a search query, the higher the term frequency, the higher the item score. |
idf - Inverse Document Frequency | The rarer a term is across the records in the index, the higher its contribution to the score. |
coord - Coordination Factor | The more query terms that are found in a record, the higher its score. |
FieldNorm - Field length | The more words that a field contains, the lower its score. This factor deprioritizes items with longer field values. |
For more information about how these factors are calculated, see http://lucene.apache.org/core/3_5_0/api/core/org/apache/lucene/search/Similarity.html
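As a rough illustration of how the four factors interact, the sketch below uses the default formulas from the Lucene 3.5 `DefaultSimilarity` class (square-root term frequency, logarithmic inverse document frequency, overlap ratio, inverse square-root length norm). It is a simplification, not the Wise implementation: query normalization, boosts, and Lucene's lossy encoding of norms are left out, and the function `simple_score` and its inputs are hypothetical.

```python
import math

def tf(freq):
    # Term frequency: grows with the number of occurrences, but sub-linearly.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms contribute more.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def coord(overlap, max_overlap):
    # Coordination factor: fraction of query terms found in the record.
    return overlap / max_overlap

def field_norm(num_terms):
    # Field length norm: longer fields are scored lower.
    return 1.0 / math.sqrt(num_terms)

def simple_score(query_terms, field_terms, doc_freqs, num_docs):
    """Simplified score of one field against a query (hypothetical helper).

    doc_freqs maps each term to the number of records containing it.
    """
    if not query_terms or not field_terms:
        return 0.0
    norm = field_norm(len(field_terms))
    matched = 0
    total = 0.0
    for term in query_terms:
        freq = field_terms.count(term)
        if freq == 0:
            continue
        matched += 1
        total += tf(freq) * idf(doc_freqs[term], num_docs) ** 2 * norm
    return coord(matched, len(query_terms)) * total
```

With identical matches, a short field such as `["harry", "potter"]` scores higher than a longer field that contains the same two terms, because only the length norm differs.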
Fields are weighted in the following manner:
Field | Weight |
---|---|
Title | 10.0 |
Series | 1.0 |
Author | 0.9 |
Other author | 0.6 |
Author synonym | 0.5 |
Topic | 0.5 |
tt_info | 0.2 |
ISXN | 0.1 |
PIM | 0.1 |
SISO | 0.1 |
PPN | 0.1 |
Occupations | 0.4 |
Purchase info | 0.1 |
Corporation | 0.4 |
Other corporation | 0.3 |
Based on the weights above, a match in the Title field weighs 10 times more than a match in the Series field.
Note: The catalog can be searched by ISBN, PPN, and any of the other fields listed above.
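The effect of the field weights can be sketched as a weighted sum of per-field scores. The weights are taken from the table above. Summing the weighted scores is an assumption made for illustration, since how Solr combines field scores depends on the query parser configuration, and the function `weighted_score` is hypothetical.

```python
# Field weights as listed in the table above.
FIELD_WEIGHTS = {
    "Title": 10.0,
    "Series": 1.0,
    "Author": 0.9,
    "Other author": 0.6,
    "Author synonym": 0.5,
    "Topic": 0.5,
    "tt_info": 0.2,
    "ISXN": 0.1,
    "PIM": 0.1,
    "SISO": 0.1,
    "PPN": 0.1,
    "Occupations": 0.4,
    "Purchase info": 0.1,
    "Corporation": 0.4,
    "Other corporation": 0.3,
}

def weighted_score(field_scores):
    """Combine per-field scores using the field weights.

    field_scores maps a field name to its unweighted match score.
    Summation is an illustrative assumption, not confirmed Wise behavior.
    """
    return sum(FIELD_WEIGHTS[field] * score for field, score in field_scores.items())

# The same raw match counts 10 times more in Title than in Series.
title_hit = weighted_score({"Title": 1.0})
series_hit = weighted_score({"Series": 1.0})
```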
Boost factors
In addition to the calculations above, the following boost factors make catalog search results more relevant to queries. These factors feed into the relevancy score of each title, and titles are displayed in the results in order of their overall relevancy score: the higher the score, the higher the title's position in the search results.
Boost by position and exactness of field match
Titles are boosted based on where the search query matches within a field and how exact the match is. For example, a title whose field matches the query at the start is boosted slightly more than a title whose same field matches the query towards the end. The following are some of the factors used to weight the matched titles:
- Exact phrase match of the search query at the start of the field
- Exact phrase match of the search query anywhere in the field
- Exact match of individual terms in the query
- Partial match of the search query
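The four match types listed above can be sketched as a simple classifier that returns the strongest tier a field reaches. This is one possible interpretation only: "exact match of individual terms" is read here as every query term appearing as a whole word in any order, "partial match" as any query term appearing inside a word of the field, and the tier names and the function `match_tier` are hypothetical. Tokenization is a plain whitespace split, so punctuation is not handled.

```python
# Tiers ordered from strongest to weakest match (hypothetical names).
TIERS = ["phrase_at_start", "phrase_anywhere", "exact_terms", "partial", "none"]

def match_tier(query, field_value):
    """Return the strongest match tier of a query against a field value."""
    q = query.casefold().split()
    f = field_value.casefold().split()
    if not q or not f:
        return "none"
    n = len(q)
    # Exact phrase at the start of the field.
    if f[:n] == q:
        return "phrase_at_start"
    # Exact phrase elsewhere in the field.
    if any(f[i:i + n] == q for i in range(1, len(f) - n + 1)):
        return "phrase_anywhere"
    # Every query term present as a whole word, in any order.
    if all(term in f for term in q):
        return "exact_terms"
    # At least one query term found inside a word of the field.
    if any(term in word for term in q for word in f):
        return "partial"
    return "none"
```

Sorting matched titles by `TIERS.index(match_tier(query, title))` then places start-of-field phrase matches ahead of weaker matches.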
Boost by popularity
Wise now considers the popularity of a title when calculating its relevancy score. Popularity is derived from the WorldShare ILL request count over the lifetime of the title.
Note: The WorldShare ILL request count will be updated in Wise titles with every release.
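How the request count is translated into a boost is not described here. Purely as an illustration, a popularity signal of this kind is often damped logarithmically so that very popular titles do not drown out all other factors. The formula and the function `popularity_boost` below are assumptions, not the Wise calculation.

```python
import math

def popularity_boost(ill_request_count):
    """Hypothetical log-damped boost from a lifetime ILL request count.

    A title with no requests gets a neutral boost of 1.0; each tenfold
    increase in requests adds roughly 1.0 to the boost.
    """
    return 1.0 + math.log10(1 + max(0, ill_request_count))
```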
Boost by category
The search relevancy boost for media type and language can be defined in the Wise Manager. Matched titles in the search results are weighted according to the boost defined for the category to which each title belongs.
Based on circulation data from the last year, the following boost values are defined as defaults. These will be applied to the search results by default unless changed in Wise Manager.
Media Type
- Book – Very High
- DVD – High
- Large Print – High
Language
- English – Very High
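The default category boosts can be pictured as a lookup table with optional overrides, similar in spirit to changing the defaults in Wise Manager. The boost levels come from the lists above. The numeric multipliers, the treatment of unlisted categories as neutral, and the function `category_boost` are hypothetical, since the actual values behind "High" and "Very High" are not stated.

```python
# Default boost levels as listed above.
DEFAULT_BOOSTS = {
    "media_type": {"Book": "Very High", "DVD": "High", "Large Print": "High"},
    "language": {"English": "Very High"},
}

# Hypothetical multipliers; the real values behind each level are not documented here.
LEVEL_MULTIPLIER = {"Very High": 2.0, "High": 1.5, "None": 1.0}

def category_boost(media_type, language, overrides=None):
    """Return the combined boost for a title's media type and language.

    overrides has the same shape as DEFAULT_BOOSTS and replaces individual
    defaults. Categories without a defined boost are treated as neutral.
    """
    overrides = overrides or {}
    boost = 1.0
    for category, value in (("media_type", media_type), ("language", language)):
        levels = {**DEFAULT_BOOSTS[category], **overrides.get(category, {})}
        boost *= LEVEL_MULTIPLIER[levels.get(value, "None")]
    return boost
```

For example, `category_boost("DVD", "English", overrides={"media_type": {"DVD": "Very High"}})` applies a changed DVD level while keeping the remaining defaults.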
Boost factor configuration
For more information on configuring boost factors, see Configure boost factors for searching.