Solr: (field-based) relative term frequency driving result order


We are consolidating all the content we collect for a record into a single content field, which is our main search source in Solr. The problem is that some records have a content field of about 100k characters, while others have 10M or more.

As a result, a search on a term pushes the 10M-character records to the top of the result list.

We would like to limit or counterbalance this by introducing a "relative term frequency", e.g. the number of occurrences divided by the total number of words in the content field. Since we don't know which terms people will search on, (I think) we cannot calculate this at indexing time.
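To make the idea concrete, here is a minimal sketch of the ratio described above, in plain Java with no Solr or Lucene involved. The class name `RelativeTf`, the method `relativeTf`, and the whitespace tokenization are assumptions for illustration only; a real Solr field would be tokenized by its analyzer chain.

```java
// Illustrative only: not part of Solr or Lucene.
final class RelativeTf {

    // Hypothetical helper: occurrences of `term` divided by the total
    // number of whitespace-separated tokens in `content`.
    static double relativeTf(String content, String term) {
        String[] tokens = content.toLowerCase(java.util.Locale.ROOT).trim().split("\\s+");
        // Splitting an empty string yields one empty token; treat that as "no words".
        if (tokens.length == 1 && tokens[0].isEmpty()) {
            return 0.0;
        }
        String needle = term.toLowerCase(java.util.Locale.ROOT);
        int hits = 0;
        for (String token : tokens) {
            if (token.equals(needle)) {
                hits++;
            }
        }
        return (double) hits / tokens.length;
    }

    public static void main(String[] args) {
        String shortDoc = "solr ranks documents";
        String longDoc = "solr solr index search query filter facet shard replica core";
        // The long document has more raw hits (2 vs 1) but a lower ratio.
        System.out.println(relativeTf(shortDoc, "solr"));
        System.out.println(relativeTf(longDoc, "solr"));
    }
}
```

The point of the ratio is that a long record with more raw hits can still rank below a short record in which the term makes up a larger share of the text.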

Any suggestions or ideas on how to do this?

You can start with a custom Similarity class.
It allows you to modify the above parameters and scoring factors.
You need to check the tf (term frequency) method and customize it.
The custom similarity class can be referenced in the schema.xml file through the <similarity> element.

Check Lucene's DefaultSimilarity class for reference on the actual implementation.
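As a rough sketch of what you would be overriding: the tf() method in DefaultSimilarity returns the square root of the raw frequency, and the length norm is 1/sqrt(number of terms in the field), which is computed at index time. The code below only mirrors that arithmetic in plain Java so the two approaches can be compared; it does not use the Lucene API, the class `TfSketch` and its method names are made up for illustration, and it ignores idf, boosts, and the way norms are encoded in the index.

```java
// Illustrative only: mirrors the arithmetic, does not extend Lucene classes.
final class TfSketch {

    // Same formula as tf(freq) in DefaultSimilarity: sqrt of raw frequency.
    static double classicTf(int freq) {
        return Math.sqrt(freq);
    }

    // Same formula as the default length norm without boost: 1 / sqrt(numTerms).
    static double classicLengthNorm(int numTerms) {
        return numTerms == 0 ? 0.0 : 1.0 / Math.sqrt(numTerms);
    }

    // Hypothetical alternative: occurrences divided by field length.
    static double relativeTf(int freq, int numTerms) {
        return numTerms == 0 ? 0.0 : (double) freq / numTerms;
    }

    public static void main(String[] args) {
        int[][] docs = { {5, 100}, {50, 100_000} }; // {freq, numTerms}
        for (int[] d : docs) {
            double classic = classicTf(d[0]) * classicLengthNorm(d[1]);
            double relative = relativeTf(d[0], d[1]);
            System.out.println("freq=" + d[0] + " terms=" + d[1]
                    + " classic=" + classic + " relative=" + relative);
        }
    }
}
```

Two things follow from this. First, tf() only receives the frequency, so the field length has to come in through the length norm; since the norm depends only on the document, it can be computed at index time even though the search terms are unknown. Second, tf times length norm in the default formula is already the square root of the relative frequency, so if long records still dominate it is worth checking whether norms are omitted on the content field before writing a custom class.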

Also check the documentation on changing the similarity.

