SOLR: (field-based) relative term frequency driving result order
We are consolidating the content we collect for each record into a single content field, which is our main search source in Solr. The problem is that for some records the content field has about 100k characters, while for others it has 10m or more.
As a result, a search on a term pushes the 10m-character records to the top of the result list.
We would like to limit or counterbalance this by introducing a "relative term frequency", e.g. the number of occurrences divided by the total number of words in the content field. Since we don't know which terms people will search on, (I think) we cannot calculate this at indexing time.
Any suggestions or ideas on how to do this?
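To make the ratio concrete, here is a minimal sketch of what is meant by "relative term frequency". It is plain Java and does not use Solr or Lucene; the class name RelativeTf, the method name, and the whitespace tokenization are assumptions made only for illustration (a real analyzer would tokenize differently).

```java
import java.util.Locale;

public class RelativeTf {

    // Occurrences of `term` divided by the total number of
    // whitespace-separated words in `content` (case-insensitive).
    static double relativeTermFrequency(String content, String term) {
        String trimmed = content.trim();
        if (trimmed.isEmpty()) {
            return 0.0; // no words at all, so avoid dividing by zero
        }
        String[] words = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        String needle = term.toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String word : words) {
            if (word.equals(needle)) {
                hits++;
            }
        }
        return (double) hits / words.length;
    }

    public static void main(String[] args) {
        // A short record where the term is dense, and a longer one where it is sparse.
        String shortDoc = "solr ranks solr documents";
        String longDoc = "solr is one of many many many many many search servers";
        System.out.println(relativeTermFrequency(shortDoc, "solr"));
        System.out.println(relativeTermFrequency(longDoc, "solr"));
    }
}
```

With this measure the short record scores higher than the long one, even though both contain the term, which is the ordering we are after.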
You can start with a custom Similarity class.
It will allow you to modify the above parameters and the scoring factors.
You need to check the tf (term frequency) method and customize it.
The custom Similarity class can be referenced in the schema.xml file.
Check Lucene's DefaultSimilarity class for reference; it is the actual implementation.
Also check the documentation on changing the similarity.
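As a rough illustration of which factors are involved, the sketch below mirrors, in plain Java, the two formulas that (as far as I know) the classic DefaultSimilarity uses: tf as the square root of the raw count, and the length norm as 1 / sqrt(number of terms in the field). It does not extend the Lucene class or depend on Lucene at all; the class name SimilaritySketch, the method names, the linear relativeTf variant, and the sample counts are all assumptions for illustration, so verify the formulas against the source of your Lucene version.

```java
public class SimilaritySketch {

    // Mirrors the classic tf factor: square root of the raw term count.
    static float classicTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Mirrors the classic length norm (ignoring boosts): 1 / sqrt(numTerms).
    static float classicLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical alternative: a linear relative term frequency.
    static float relativeTf(float freq, int numTerms) {
        return numTerms == 0 ? 0f : freq / numTerms;
    }

    public static void main(String[] args) {
        // Made-up counts for a small record and a very large record.
        int smallTerms = 15_000, smallHits = 20;
        int largeTerms = 1_500_000, largeHits = 1_000;

        System.out.println("raw tf only:   small=" + classicTf(smallHits)
                + " large=" + classicTf(largeHits));
        System.out.println("tf * norm:     small="
                + classicTf(smallHits) * classicLengthNorm(smallTerms)
                + " large=" + classicTf(largeHits) * classicLengthNorm(largeTerms));
        System.out.println("relative tf:   small=" + relativeTf(smallHits, smallTerms)
                + " large=" + relativeTf(largeHits, largeTerms));
    }
}
```

In a real setup the customized tf logic would live in a subclass of the Lucene similarity and be registered through a similarity element in schema.xml. Note that Lucene stores norms in a lossy compact encoding, so the values actually used at query time will differ from the exact ones computed here.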