
Solr query syntax

The main query for a Solr search is specified via the q parameter. An example of a standard query is shown below:

http://localhost:8983/solr/query?q=test

If you add debug=query, you can see how Solr parses your query, as shown below:

http://localhost:8983/solr/query?debug=query&q=hello

An example response is shown below:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"hello",
      "debug":"query"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "debug":{
    "rawquerystring":"hello",
    "querystring":"hello",
    "parsedquery":"text:hello",
    "parsedquery_toString":"text:hello",
    "QParser":"LuceneQParser"}}
The response section will normally contain the top-ranking documents for the query. In the above example, no documents matched the query.
In the debug section, one can see how the query was parsed, and the fact that text was used as the default field to search.
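You can also change which field is searched by default with the df parameter. A small sketch (it assumes a field named title exists in your schema):

http://localhost:8983/solr/query?q=hello&df=title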
Basic Queries:
A "term" query is a single-word query on a single field that must match exactly. For example,
text:hello
Here, 'text' is the field name and 'hello' is the word we are going to match.
Phrase Query:
A phrase query matches multiple terms (words) in sequence.
text:"john smith"
This query will match text containing john smith, but it will not match smith john or john v smith.
Proximity Query:
A proximity query is like a phrase query with a tilde (~) followed by a slop value that specifies the number of term position moves (edits) allowed. For example:
text:"solr analytics"~1
This query will match text containing solr analytics, solr faceted analytics (edit distance 1), and analytics solr (edit distance 1). It will not match solr super faceted analytics or analytics faceted solr since those would both require an edit distance of 2 to get the terms into the matching positions.

Boolean Query:
A boolean query contains multiple clauses. A clause may be optional, mandatory, or prohibited. For example:
solr search
The default operator is “OR”, meaning that clauses are optional. When there are no mandatory clauses, at least one of the optional clauses in a query must match for the full query to match. The example query above will thus match documents containing solr or search (or both) in the default search field.
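The default operator can also be changed per request with the q.op parameter. For example, the following sketch makes both terms mandatory:

http://localhost:8983/solr/query?q=solr search&q.op=AND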

Boolean query examples:

+solr +search facet -highlight /* uses + for mandatory and - for prohibited */
solr AND search OR facet NOT highlight /* this is equivalent to the first query */
Semantics: solr and search must both match, highlight must not match. facet may or may not match but will contribute to the query score if it does (i.e. the presence of the facet only affects scores of matching documents, not which documents match.)

Boosted Query:
Any query clause can be boosted with the ^ operator. The boost is multiplied into the normal score for the clause and will affect its importance relative to other clauses.

Boosted Query Examples:

text:solr^10 text:rocks
(inStock:true AND text:solr)^123.45 text:hi

Range Query:
A range query selects documents with values between a specified lower and upper bound. Range queries work on numeric fields, date fields, and even string and text fields.

Range Query Examples:

age:[18 TO 30]   // matches ages 18-30 inclusive (endpoints 18 and 30 are included)
age:[65 TO *]    // "open ended" range, matches age >= 65
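Curly braces make an endpoint exclusive. A small sketch (the price field is illustrative):

price:{100 TO 200}   // matches prices strictly between 100 and 200 (endpoints excluded)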

Constant Score Query:
A ConstantScoreQuery is like a boosted query, but it produces the same score for every document that matches the query. The score produced is equal to the query boost. The ^= operator is used to turn any query clause into a ConstantScoreQuery.

Constant Score Query Examples:

+color:blue^=1 text:shoes
(inStock:true text:solr)^=100 native code faceting

Filter Query:
A filter query retrieves a set of documents matching a query from the filter cache. Since scores are not cached, all documents that match the filter produce the same score (0 by default). Cached filters will be extremely fast when they are used again in another query.

Filter Query Example:

description:HDTV OR filter(+promotion:tv +promotion_date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY])

Query Comments:
One can embed comments in a query using C-style comments surrounded with /* */. Comments can be nested.

Query Comments Example:
description:HDTV /* TODO: +promotion:tv +promotion_date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY] */


ProsperaSoft offers Solr development solutions. You can email at info@prosperasoft.com to get in touch with ProsperaSoft Solr experts and consultants.

 

How to make a search request in Solr?

Solr is able to achieve fast search responses because, instead of searching the text directly, it searches an index.

You can make a search request in Solr in the following ways:

Search Request Query example:

Suppose we need to get all books with the title 'Java'.

Then we would write the query as shown below:

http://localhost:8983/solr/demo/query?q=title_t:java&fl=author_s,title_t

Here, in the above example, 'fl' (field list) is used to specify which fields should be returned from documents matching the query.

We should see a result like the following:

{"response":{"numFound":2,"start":0,"docs":[
{
"title_t":"The Java Book",
"author_s":"Abc"},
{
"title_t":"Java Black Book",
"author_s":"Def"}]
}}

Solr Search Request in JSON:

If you prefer using JSON to search the index, you can use the JSON Request API:

$ curl http://localhost:8983/solr/demo/query -d '
{
  "query" : "title_t:java",
  "fields" : ["title_t", "author_s"]
}'
Sorting and Paging Search Results:
By default, Solr will return the top 10 documents ordered by highest score (relevance) first. We can change the number of results, and add filtering and sorting, as shown below:
$ curl http://localhost:8983/solr/demo/query -d '
q=*:*&
fq=publisher_s:Abc&   // filter query based on publisher
rows=3&
sort=pubyear_i desc&  // sorts the "pubyear_i" field in descending order
fl=title_t,pubyear_i'
To limit the number of search results in the above query, we use rows=3.
And we get the response as requested:

"response":{"numFound":5,"start":0,"docs":[
{
"pubyear_i":1999,
"title_t":["Abc"]},
{
"pubyear_i":1996,
"title_t":["Def"]},
{
"pubyear_i":1992,
"title_t":["Fgh"]}]
}
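The same filtered, sorted, and paged request can also be written with the JSON Request API shown earlier. A sketch, reusing the illustrative field and publisher names from above:

$ curl http://localhost:8983/solr/demo/query -d '
{
  "query"  : "*:*",
  "filter" : "publisher_s:Abc",
  "sort"   : "pubyear_i desc",
  "limit"  : 3,
  "fields" : ["title_t", "pubyear_i"]
}'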



Faceted search in Solr

Faceting is the arrangement of search results into categories based on indexed terms.  Faceted search provides an effective way to allow users to refine search results, continually drilling down until the desired items are found.

Implementing Faceting with Solr:

It’s  simple to get faceting information from Solr, as there are few prerequisites. Solr offers the following types of faceting:

Field faceting – retrieve the counts for all terms, or just the top terms in any given field. The field must be indexed.
Query faceting – return the number of documents in the current search results that also match the given query.

Faceting commands are added to any normal Solr query request, and the faceting counts come back in the same query response.

Example of Field Facet:

Suppose a user entered "camera" in the search box. The Solr query to retrieve the top "camera" matches would be:

http://localhost:8983/solr/query?q=camera

Now suppose we also want to show facet counts by manufacturer. We just need to add the following:

&facet=true
&facet.field=manu  // “manu” field available in schema

The query response will now contain facet count information for the given fields in addition to the top matches for the query.

"facet_fields" : {
"manu" : [
"Canon USA" , 25,
"Olympus" , 21,
"Sony" , 12,
"Panasonic" , 9,
"Nikon" , 4 ]
}

Example of Query facet:

For a query facet, we simply add facet.query commands to our query request.

Here, we want facet counts for prices of $100 or less and for prices between $100 and $200.

&facet=true
&facet.query=price:[* TO 100]
&facet.query=price:[100 TO 200]

Response would be as shown below:

"facet_queries" : {
"price:[* TO 100]" : 28,
"price:[100 TO 200]" : 54
}

Now let’s assume that the user wants to drill down on the constraint $400-$500 from the Price facet to get a new set of results that include only cameras in that price range. For this we use the fq (filter query) parameter, which allows one to filter by a query. We’ll also send the relevant faceting commands again since we also want to update the facet counts.

http://localhost:8983/solr/query?q=camera&facet=on&facet.field=manu&facet.field=camera_type&fq=price:[400 TO 500]

There are many other facet.* parameters available that you can use in your search query.
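For example, parameters such as facet.limit, facet.mincount, and facet.sort control how many constraints come back, the minimum count required, and their ordering. A sketch:

&facet=true
&facet.field=manu
&facet.limit=5       // return at most 5 manufacturer constraints
&facet.mincount=1    // skip constraints with a count of zero
&facet.sort=count    // order constraints by count (the default when facet.limit > 0)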



Analyzers, Tokenizers, and Filters in Solr

Understanding Analyzers, Tokenizers, and Filters in Solr:

Field analyzers are used both during ingestion, when a document is indexed, and at query time. An analyzer examines the text of fields and generates a token stream.

Tokenizers break field data into lexical units, or tokens.

Filters examine a stream of tokens and keep them, transform or discard them, or create new ones.

Analyzers:

An analyzer examines the text of fields and generates a token stream. Analyzers are specified as a child of the <fieldType> element in the schema.xml configuration file. For example:

<fieldType name="nametext" class="solr.TextField">
<analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
</fieldType>

In this case a single class, WhitespaceAnalyzer, is responsible for analyzing the content of the named text field and emitting the corresponding tokens.

Tokenizers:

The job of a tokenizer is to break up a stream of text into tokens, where each token is (usually) a sub-sequence of the characters in the text. An analyzer is aware of the field it is configured for, but a tokenizer is not. Tokenizers read from a character stream (a Reader) and produce a sequence of Token objects (a TokenStream). For example:

<fieldType name="text" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
</fieldType>

The class named in the tokenizer element is not the actual tokenizer, but rather a class that implements the org.apache.solr.analysis.TokenizerFactory interface. This factory class will be called upon to create new tokenizer instances as needed. Objects created by the factory must derive from org.apache.lucene.analysis.TokenStream, which indicates that they produce sequences of tokens. If the tokenizer produces tokens that are usable as is, it may be the only component of the analyzer. Otherwise, the tokenizer's output tokens will serve as input to the first filter stage in the pipeline.

Filters:

Filters consume input and produce a stream of tokens. The job of a filter is usually easier than that of a tokenizer, since in most cases a filter looks at each token in the stream sequentially and decides whether to pass it along, replace it, or discard it. For example,

<fieldType name="text" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory"/>
</analyzer>
</fieldType>

This example starts with Solr's standard tokenizer, which breaks the field's text into tokens. Those tokens then pass through Solr's standard filter, which removes dots from acronyms and performs a few other common operations. All the tokens are then set to lowercase, which will facilitate case-insensitive matching at query time. Finally, the English Porter filter applies Porter stemming, reducing words to their root forms.
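As a rough illustration (the input text is made up), the field value "Running I.B.M. Servers" would pass through this chain approximately as follows:

StandardTokenizer:    Running | I.B.M. | Servers
StandardFilter:       Running | IBM | Servers      (dots removed from the acronym)
LowerCaseFilter:      running | ibm | servers
EnglishPorterFilter:  run | ibm | server           (Porter stemming)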



Solr Spellcheck

If Solr SpellCheck is not working properly, then do the following:

Before indexing:

Remove or comment out StopFilterFactory from both the index and query analyzers.

Don't use KeywordTokenizerFactory, because it treats the entire field value as a single term. Use StandardTokenizerFactory instead.
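For example, the field type feeding the spellchecker might use an analyzer like the following minimal sketch (the field type name is illustrative), with a StandardTokenizerFactory and no StopFilterFactory:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>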



How to do spellcheck in Solr?

To enable spellcheck in Solr, you need to follow the steps given below:

1. In your solrconfig.xml, make the following changes:

Most importantly, add a solr.SpellCheckComponent to the solrconfig.xml file, as shown below:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <!-- class for the spellcheck mechanism -->
<lst name="spellchecker">
<str name="classname">solr.IndexBasedSpellChecker</str>
<str name="spellcheckIndexDir">./spellchecker</str> <!-- directory that holds the spellcheck index -->
<str name="field">content</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>

2. Add a request handler that uses the above defined component, as shown below:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="spellcheck">true</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>

3. Now, make changes to the schema.xml file:

<field name="content" type="text" indexed="true" stored="false" multiValued="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="'" replacement="" replace="all"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
generateNumberParts="1"
catenateWords="1"
stemEnglishPossessive="0"
/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>

Now your Solr is ready with spellcheck functionality.
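To try it out, you can send a request with a misspelled term to the handler configured above. A sketch ("helllo" is just an illustrative misspelling, and <core-name> is a placeholder for your core):

http://localhost:8983/solr/<core-name>/select?q=helllo&spellcheck=true&spellcheck.collate=true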



How to filter data based on the type of user?

Solr does not have document-level security, so you would have to retrieve and index access control lists for each document. Then you need to apply a filter query to every search and pass in the user’s security group and/or username.

Let's say your document is indexed like this, where the values of the multivalued field "access" are determined at index time by the actual permissions on the file:

<doc>
<field name="id">42</field>
<field name="name">Products.xlsx</field>
<field name="title">Product list</field>
<field name="content">…</field>
<field name="access">OFFICE\Manager</field>
<field name="access">OFFICE\Staff</field>
</doc>
Then you can decorate the query request handler with a default filter query parameter in solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="fq">access:"OFFICE\Everyone"</str>
</lst>
</requestHandler>
Now searches by default will not return the Products.xlsx document, since the default ‘user’ that is impersonated (namely “OFFICE\Everyone”) does not appear in the “access” field. But as soon as you pass in an actual user’s group to override the default filter query, Solr will return the document:

/solr/collection1/select?q=content:"product x"&fq=access:"OFFICE\Manager"
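If the user belongs to several groups, the filter query can combine them. A sketch (the group names are illustrative):

/solr/collection1/select?q=content:"product x"&fq=access:("OFFICE\Manager" OR "OFFICE\Staff")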
Of course when the permissions change, the index must be updated as soon as possible to reflect this.



How to stop Solr using Jetty?

To start and stop Solr using Jetty, we have the following commands:

Starting Solr with Jetty:

java -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar

Stopping Solr with Jetty:

java -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar --stop



Solr Schema Design Considerations

Indexed Fields

The number of indexed fields greatly increases the following:

  • Memory usage during indexing
  • Segment merge time
  • Optimization times
  • Index size

These effects can be reduced by the use of omitNorms="true".

omitNorms=true|false

    • This is arguably an advanced option.
    • Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
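For example, a field used only for exact matching and filtering could be declared with norms omitted. A sketch (the field name is illustrative):

<field name="category" type="string" indexed="true" stored="true" omitNorms="true"/>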
Also consider carefully whether a field should be a "string" type (the whole value is indexed as a single, untokenized term) or a "text" type (the value is tokenized and analyzed for full-text search).

Stored fields

Retrieving the stored fields of a query result can be a significant expense. This cost is affected largely by the number of bytes stored per document: the higher the byte count, the more sparsely the documents will be distributed on disk and the more I/O is necessary to retrieve the fields (usually this is a concern when storing large fields, like the entire contents of a document).

Consider storing large fields outside of Solr. If you feel inclined to do so regardless, consider using compressed fields, which increase the CPU cost of storing and retrieving the fields but lower the I/O burden.

If you aren’t always using all the stored fields, then enabling lazy field loading can be a huge boon, especially if compressed fields are used.
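Lazy field loading is enabled in the <query> section of solrconfig.xml. A minimal sketch:

<query>
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
</query>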