Tag Archives: Solr

Effective Use of Technology in Legal Process

RESEARCHING FOR YOUR CASE – A CURSE OR A BOON?

If you are in the legal domain, you know that research is the backbone of building a strong case. Research is one of the most vital and time-consuming activities in a lawyer’s workload. While the industry has innovated by improving search efficiency, the effectiveness of search still relies largely on the research expertise of lawyers, paralegals and law librarians.

The vast growth of available data adds further challenges to the task of identifying the information most critical and relevant to a case. You might depend on the duopoly of Lexis.com and WestLaw to provide vast information databases, combined with your internal expertise and experience in searching, to uncover relevant results.

It is worth mentioning that these giants are certainly capable of surfacing relevant data, but sometimes you may not need the features you are paying for, may find them tedious to explore, or may need custom features that are simply not in these products!

You may be on a “pay-per-search” or flat-rate subscription model, which may force you to be judicious in your use of search. Firms often negotiate flat-rate subscriptions to large, comprehensive databases.

START QUESTIONING NOW…

Have you ever thought about gaining efficiency without sacrificing accuracy and comprehensiveness? Any ability to quickly obtain the most accurate and complete set of results is likely to help you meet this efficiency goal and serve your clients’ needs.

Now the question arises – what alternatives do you have that will give you accurate and efficient results?

AND THE SOLUTION IS….

Here it is – you can build a search engine solution that helps your business better organize, access, and search its digital content, with the complete system close at hand.

So the next time you research:

  • State Law
  • Federal Law
  • Analytical Materials
  • Public Records
  • News and Periodicals
  • Patent
  • Cases/Suits

Have your own enterprise search solution at hand and easily access your data of federal and state court decisions, statutes, regulations, court rules, topical databases, legal newspapers and periodicals, as well as law and related information from common-law countries, making your research a boon rather than a curse!

TECHNOLOGY TO LOOK UPON…

There are a number of technologies, such as Lucene, Solr, dtSearch, Elasticsearch, Nutch, Hibernate Search and Hadoop, that can be used for search-based solutions.

Look into technologies like Apache Solr – an HTTP search server built on top of Lucene. Specifically, Solr gives you REST-like HTTP/XML and JSON APIs.

What Solr gives you is the following:

  • Full-text search
  • Highlighting of hits
  • Faceted search
  • Dynamic clustering
  • Database integration
  • Rich document handling of formats like Word and PDF

LEGAL AND SOLR GO HAND IN HAND

Now, coming to the legal domain, here is how Solr can be useful:

  • E-discovery and forensics search: Solr’s server packaging of Lucene provides a wide range of enterprise search functions and a convenient RESTful XML/JSON interface for e-discovery and forensics-based search solutions.
  • Trademark search: Solr is a highly tunable search engine and can be customized to search for trademarks with high precision and relevance. Trademarks can be searched using different algorithms – exact, slop match, fuzzy match, phonetic, synonym, starts with, contains, ends with, sounds-like, etc. – and ranked in decreasing order of relevance (see the sketch below).
  • Patent search: Patents can be filtered or faceted based on their categories, goods and services, and other parameters as required. The results can also be sorted in the desired order with some configuration and customization.
  • Cases/suits search: Solr offers lightning-fast response times for queries. If you are searching your earlier cases or law orders, Solr can be tuned to return results with high precision.

And many more as per your field and practice.
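
To make the trademark case concrete, here is a minimal sketch of a Solr query (the core name trademarks and the field mark_name are hypothetical) that combines fuzzy and slop matching in one request:

http://localhost:8983/solr/trademarks/select?q=mark_name:prospera~2 OR mark_name:"prospera soft"~1

Here ~2 on a single term allows a fuzzy match within an edit distance of two, and ~1 on the phrase allows one word-position move; sounds-like matching would additionally require the field to be analyzed with a phonetic filter such as solr.PhoneticFilterFactory.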

STILL DOUBTING SOLR? THEN WAIT

Are there companies using Solr today? To be convinced that Solr is actually used in a lot of enterprise projects, take a look at the impressive list of public projects powered by Solr.

WHOM TO TURN TO FOR A CUSTOM SEARCH SOLUTION

ProsperaSoft offers full-text search expertise and can help you build a search solution on a minimal, affordable budget. ProsperaSoft provides enterprise search, data ingestion, and classification and taxonomy solutions for clients across a wide range of sectors, including legal services, news, digital publishing, media monitoring, e-commerce and recruitment, by leveraging open-source and proprietary search technologies such as Lucene, Solr, dtSearch, Elasticsearch, Nutch, Hibernate Search and Hadoop.

With years of experience in the field and a quest for delivering quality enterprise search solutions that meet our customers’ exact requirements, we can provide reliable, accurate, fast and economical search solutions.

WANT TO KNOW MORE?

Just get in touch with us, and we can discuss how ProsperaSoft can contribute to your research.

Solr query syntax

The main query for a Solr search is specified via the q parameter. An example of a standard query is shown below:

http://localhost:8983/solr/query?q=test

If you add debug=query, you can see how Solr is parsing your query. You can do this as shown below:

http://localhost:8983/solr/query?debug=query&q=hello

An example of the response is shown below:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"hello",
      "debug":"query"}},
  "response":{"numFound":0,"start":0,"docs":[]
  },
  "debug":{
    "rawquerystring":"hello",
    "querystring":"hello",
    "parsedquery":"text:hello",
    "parsedquery_toString":"text:hello",
    "QParser":"LuceneQParser"}}
The response section will normally contain the top-ranking documents for the query. In the above example, no documents matched the query.
In the debug section, one can see how the query was parsed, and the fact that text was used as the default field to search.
Basic Queries:
A “term” query is a single-word query in a single field that must match exactly. For example,
text:hello
Here, ‘text’ is the field name, and ‘hello’ is the word we are going to match.
Phrase Query:
A phrase query matches multiple terms (words) in sequence.
text:"john smith"
This query will match text containing john smith but will not match smith v john or smith john.
Proximity Query:
A proximity query is like a phrase query with a tilde (~) followed by a slop value that specifies the number of term position moves (edits) allowed.

text:"solr analytics"~1
This query will match text containing solr analytics, solr faceted analytics (edit distance 1), and analytics solr (edit distance 1). It will not match solr super faceted analytics or analytics faceted solr since those would both require an edit distance of 2 to get the terms into the matching positions.

Boolean Query:
A boolean query contains multiple clauses. A clause may be optional, mandatory, or prohibited. For example:

solr search

The default operator is “OR”, meaning that clauses are optional. When there are no mandatory clauses, at least one of the optional clauses in a query must match for the full query to match. The example query above will thus match documents containing solr or search (or both) in the default search field.
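
If you would rather have mandatory semantics by default, the default operator can be overridden per request with the q.op parameter; for example, the following makes both terms of the query above required:

http://localhost:8983/solr/query?q=solr search&q.op=AND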

Boolean query examples:

+solr +search facet -highlight /* uses + for mandatory and - for prohibited */
solr AND search OR facet NOT highlight /* this is equivalent to the first query */
Semantics: solr and search must both match, highlight must not match. facet may or may not match, but will contribute to the query score if it does (i.e., the presence of facet only affects the scores of matching documents, not which documents match).

Boosted Query:
Any query clause can be boosted with the ^ operator. The boost is multiplied into the normal score for the clause and will affect its importance relative to other clauses.

Boosted Query Examples:

text:solr^10 text:rocks
(inStock:true AND text:solr)^123.45 text:hi

Range Query:
A range query selects documents with values between a specified lower and upper bound. Range queries work on numeric fields, date fields, and even string and text fields.

Range Query Examples:

age:[18 TO 30]   // matches ages 18-30 inclusive (endpoints 18 and 30 are included)
age:[65 TO *]    // “open ended” range matches ages 65 and above

Constant Score Query:
A ConstantScoreQuery is like a boosted query, but it produces the same score for every document that matches the query. The score produced is equal to the query boost. The ^= operator is used to turn any query clause into a ConstantScoreQuery.

Constant Score Query Examples:

+color:blue^=1 text:shoes
(inStock:true text:solr)^=100 native code faceting

Filter Query:
A filter query retrieves a set of documents matching a query from the filter cache. Since scores are not cached, all documents that match the filter produce the same score (0 by default). Cached filters will be extremely fast when they are used again in another query.

Filter Query Example:

description:HDTV OR filter(+promotion:tv +promotion_date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY])
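
In everyday use, the most common way to exercise the filter cache is the fq request parameter, which applies a cached, non-scoring filter alongside the main query; a sketch using the same fields as above:

http://localhost:8983/solr/query?q=description:HDTV&fq=promotion:tv&fq=promotion_date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]

Each fq clause is cached independently, so filters that recur across queries are especially cheap.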

Query Comments:
One can embed comments in a query using C-style comments surrounded with /* */. Comments can be nested.

Query Comments Example:
description:HDTV /* TODO: +promotion:tv +promotion_date:[NOW/DAY-7DAYS TO NOW/DAY+1DAY] */


ProsperaSoft offers Solr development solutions. You can email us at info@prosperasoft.com to get in touch with ProsperaSoft Solr experts and consultants.

How to make a search request in Solr?

Solr is able to achieve fast search responses because, instead of searching the text directly, it searches an index.

You can make a search request to Solr in the following ways:

Search Request Query example:

Suppose we need to get all books with the title ‘Java’.

Then we would write the query as shown below:

http://localhost:8983/solr/demo/query?q=title_t:java&fl=author_s,title_t

Here, ‘fl’ is used to specify which fields should be returned from the documents matching the query.

We should see a result like the following:

{"response":{"numFound":2,"start":0,"docs":[
  {
    "title_t":"The Java Book",
    "author_s":"Abc"},
  {
    "title_t":"Java Black Book",
    "author_s":"Def"}]
}}

Solr Search Request in JSON:

If you prefer using JSON to search the index, you can use the JSON Request API:

$ curl http://localhost:8983/solr/demo/query -d '
{
  "query" : "title_t:java",
  "fields" : ["title_t", "author_s"]
}'
Sorting and Paging Search Results:
By default, Solr will return the top 10 documents, ordered by highest score (relevance) first. We can change this as shown below:
$ curl http://localhost:8983/solr/demo/query -d '
q=*:*&
fq=publisher_s:Abc&
rows=3&
sort=pubyear_i desc&
fl=title_t,pubyear_i'
Here fq filters the results by publisher, rows=3 manages the search result count, and sort orders the results by the “pubyear_i” field in descending order.
And we get the response as requested:

"response":{"numFound":5,"start":0,"docs":[
  {
    "pubyear_i":1999,
    "title_t":["Abc"]},
  {
    "pubyear_i":1996,
    "title_t":["Def"]},
  {
    "pubyear_i":1992,
    "title_t":["Fgh"]}]
}
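
Paging is a small step further: the start parameter gives the offset of the first result to return, and rows gives the page size. A sketch against the same hypothetical demo core that fetches the second page of three results:

$ curl http://localhost:8983/solr/demo/query -d '
q=*:*&
fq=publisher_s:Abc&
start=3&
rows=3&
sort=pubyear_i desc&
fl=title_t,pubyear_i'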


ProsperaSoft offers Solr development solutions. You can email us at info@prosperasoft.com to get in touch with ProsperaSoft Solr experts and consultants.

Faceted search in Solr

Faceting is the arrangement of search results into categories based on indexed terms. Faceted search provides an effective way to allow users to refine search results, continually drilling down until the desired items are found.

Implementing Faceting with Solr:

It’s simple to get faceting information from Solr, as there are few prerequisites. Solr offers the following types of faceting:

  • Field faceting – retrieve the counts for all terms, or just the top terms, in any given field. The field must be indexed.
  • Query faceting – return the number of documents in the current search results that also match the given query.

Faceting commands are added to any normal Solr query request, and the faceting counts come back in the same query response.

Example of Field Facet:

Suppose the user entered “camera” in the search bar. The Solr query to retrieve the top matches would be:

http://localhost:8983/solr/query?q=camera

Now we also want to facet on manufacturers. We just need to add the following:

&facet=true
&facet.field=manu  // the "manu" field must be available in the schema

The query response will now contain facet count information for the given fields in addition to the top matches for the query.

"facet_fields" : {
  "manu" : [
    "Canon USA" , 25,
    "Olympus" , 21,
    "Sony" , 12,
    "Panasonic" , 9,
    "Nikon" , 4 ]
}

Example of Query facet:

For a query facet, we simply add facet.query commands to our query request.

Here, we want counts for products priced $100 or less, and between $100 and $200.

&facet=true
&facet.query=price:[* TO 100]
&facet.query=price:[100 TO 200]

Response would be as shown below:

"facet_queries" : {
  "price:[* TO 100]" : 28,
  "price:[100 TO 200]" : 54
}

Now let’s assume that the user wants to drill down on the $400-$500 constraint from the price facet to get a new set of results that includes only cameras in that price range. For this we use the fq (filter query) parameter, which allows one to filter by a query. We’ll also send the relevant faceting commands again, since we want to update the facet counts.

http://localhost:8983/solr/query?q=camera&facet=on&facet.field=manu&facet.field=camera_type&fq=price:[400 TO 500]

There are many more facet.* parameters available that you can use in your search queries.
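
For instance, facet.limit caps how many constraints come back per field, facet.mincount hides zero-count constraints, and facet.sort switches between count and lexical ordering:

http://localhost:8983/solr/query?q=camera&facet=true&facet.field=manu&facet.limit=5&facet.mincount=1&facet.sort=count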


ProsperaSoft offers Solr development solutions. You can email us at info@prosperasoft.com to get in touch with ProsperaSoft Solr experts and consultants.

Analyzers, Tokenizers, and Filters in Solr

Understanding Analyzers, Tokenizers, and Filters in Solr:

Field analyzers are used both during ingestion, when a document is indexed, and at query time. An analyzer examines the text of fields and generates a token stream.

Tokenizers break field data into lexical units, or tokens.

Filters examine a stream of tokens and keep them, transform or discard them, or create new ones.

Analyzers:

An analyzer examines the text of fields and generates a token stream. Analyzers are specified as a child of the <fieldType> element in the schema.xml configuration file. For example:

<fieldType name="nametext" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.WhitespaceAnalyzer"/>
</fieldType>

In this case a single class, WhitespaceAnalyzer, is responsible for analyzing the content of the nametext field and emitting the corresponding tokens.

Tokenizers:

The job of a tokenizer is to break up a stream of text into tokens, where each token is (usually) a sub-sequence of the characters in the text. An analyzer is aware of the field it is configured for, but a tokenizer is not. Tokenizers read from a character stream (a Reader) and produce a sequence of Token objects (a TokenStream). For example:

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

The class named in the tokenizer element is not the actual tokenizer, but rather a class that implements the org.apache.solr.analysis.TokenizerFactory interface. This factory class will be called upon to create new tokenizer instances as needed. Objects created by the factory must derive from org.apache.lucene.analysis.TokenStream, which indicates that they produce sequences of tokens. If the tokenizer produces tokens that are usable as is, it may be the only component of the analyzer. Otherwise, the tokenizer’s output tokens will serve as input to the first filter stage in the pipeline.

Filters:

Filters consume input and produce a stream of tokens. The job of a filter is usually easier than that of a tokenizer, since in most cases a filter looks at each token in the stream sequentially and decides whether to pass it along, replace it, or discard it. For example,

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"/>
  </analyzer>
</fieldType>

This example starts with Solr’s standard tokenizer, which breaks the field’s text into tokens. Those tokens then pass through Solr’s standard filter, which removes dots from acronyms and performs a few other common operations. All the tokens are then set to lowercase, which facilitates case-insensitive matching at query time. Finally, the English Porter filter applies stemming, reducing words to their root form so that different inflections of the same word match one another.
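
As a rough illustration of the pipeline (the exact output depends on the Solr version and filter configuration), an input such as “The I.B.M. PowerBooks” might flow through the chain as follows:

Input text:           The I.B.M. PowerBooks
StandardTokenizer:    [The] [I.B.M.] [PowerBooks]
StandardFilter:       [The] [IBM] [PowerBooks]
LowerCaseFilter:      [the] [ibm] [powerbooks]
EnglishPorterFilter:  [the] [ibm] [powerbook]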


ProsperaSoft offers Solr development solutions. You can email us at info@prosperasoft.com to get in touch with ProsperaSoft Solr experts and consultants.

 

Solr Spellcheck

If Solr SpellCheck is not working properly, then do the following before indexing:

  • Remove or comment out StopFilterFactory from both the index and query analyzers.
  • Don’t use KeywordTokenizerFactory, because it treats the whole field value as a single term. Use StandardTokenizerFactory instead.


ProsperaSoft offers Solr development solutions. You can email us at info@prosperasoft.com to get in touch with ProsperaSoft Solr experts and consultants.

How to do spellcheck in Solr?

To do spellcheck in Solr, you need to follow the steps given below:

1. Make the following changes in your solrconfig.xml:

Most importantly, add a solr.SpellCheckComponent to the solrconfig.xml file, as shown below:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <!-- component providing the spellcheck mechanism -->
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str> <!-- directory that holds the spellcheck index -->
    <str name="field">content</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

2. Add a handler that uses the component defined above, as shown below:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

3. Now, make changes to the schema.xml file:

<field name="content" type="text" indexed="true" stored="false" multiValued="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="'" replacement="" replace="all"/>
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1"
      generateNumberParts="1"
      catenateWords="1"
      stemEnglishPossessive="0"
    />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Now your Solr is ready with spellcheck functionality.
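
With the "standard" handler above set as the default, a misspelled query should now come back with suggestions and a collated correction; a hedged usage sketch (add spellcheck.build=true once if the spellcheck index has not yet been built):

http://localhost:8983/solr/select?q=helo wrld&spellcheck=true&spellcheck.collate=true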


ProsperaSoft offers Solr development solutions. You can email us at info@prosperasoft.com to get in touch with ProsperaSoft Solr experts and consultants.

How to filter data based on type of user?

Solr does not have document-level security, so you would have to retrieve and index access control lists for each document. Then you need to apply a filter query to every search and pass in the user’s security group and/or username.

Let’s say your document is indexed like this, where the values for the multivalued field “access” are determined at index time by the actual permissions on the file:

<doc>
  <field name="id">42</field>
  <field name="name">Products.xlsx</field>
  <field name="title">Product list</field>
  <field name="content">…</field>
  <field name="access">OFFICE\Manager</field>
  <field name="access">OFFICE\Staff</field>
</doc>
Then you can decorate the query request handler with a default filter query parameter in solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="fq">access:"OFFICE\Everyone"</str>
  </lst>
</requestHandler>
Now searches will not return the Products.xlsx document by default, since the default ‘user’ that is impersonated (namely “OFFICE\Everyone”) does not appear in the “access” field. But as soon as you pass in an actual user’s group to override the default filter query, Solr will return the document:

/solr/collection1/select?q=content:"product x"&fq=access:"OFFICE\Manager"
Of course, when the permissions change, the index must be updated as soon as possible to reflect this.
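
For a user who belongs to several groups, the groups can simply be ORed together inside the filter query (the group names here are the illustrative ones from the example above):

/solr/collection1/select?q=content:"product x"&fq=access:("OFFICE\Manager" OR "OFFICE\Staff")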


ProsperaSoft offers Solr development solutions. You can email us at info@prosperasoft.com to get in touch with ProsperaSoft Solr experts and consultants.

How to start and stop Solr using Jetty?

To start and stop Solr using Jetty, we have the following commands:

Starting solr with Jetty –

java -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar

Stopping Solr with Jetty –

java -DSTOP.PORT=8079 -DSTOP.KEY=mysecret -jar start.jar --stop


ProsperaSoft offers Solr development solutions. You can email us at info@prosperasoft.com to get in touch with ProsperaSoft Solr experts and consultants.

Solr Schema Design Considerations

Indexed Fields

The number of indexed fields greatly increases the following:

  • Memory usage during indexing
  • Segment merge time
  • Optimization times
  • Index size

These effects can be reduced by the use of omitNorms="true".

omitNorms=true|false

    • This is arguably an advanced option.
    • Set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text fields or fields that need an index-time boost need norms.
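
A minimal sketch of what this looks like in a schema.xml field declaration (the field and type names are illustrative):

<field name="title" type="text" indexed="true" stored="true" omitNorms="true"/>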
Also choose deliberately between the "text" and "string" field types: a "string" field is indexed verbatim as a single token, which suits exact matching, sorting, and faceting, while a "text" field is analyzed and tokenized for full-text search.

Stored fields

Retrieving the stored fields of a query result can be a significant expense. This cost is affected largely by the number of bytes stored per document: the higher the byte count, the more sparsely the documents are distributed on disk, and the more I/O is necessary to retrieve the fields (usually this is a concern when storing large fields, like the entire contents of a document).

Consider storing large fields outside of Solr. If you feel inclined to store them in Solr regardless, consider using compressed fields, which increase the CPU cost of storing and retrieving the fields but lower the I/O burden.

If you aren’t always using all the stored fields, then enabling lazy field loading can be a huge boon, especially if compressed fields are used.
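
Lazy field loading is enabled with a single flag in the <query> section of solrconfig.xml:

<query>
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
</query>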