Indexing a field as a NumericField using Sitecore Advanced Database Crawler

I need to search a price field using a Lucene range query. However, the results are not accurate or consistent because I am using a TermRangeQuery from the Lucene.Net API, which compares terms as strings. I believe that with a NumericRangeQuery I could get accurate results. To use NumericRangeQuery, the field needs to be indexed as a NumericField. Is there a way I can do this with Advanced Database Crawler?
I tried to do this by altering the Advanced Database Crawler source code, but it is not working for me.
These are the changes I have made in Advanced Database Crawler. In the CreateField method of the scSearchContrib.Crawler.Crawlers.AdvancedDatabaseCrawler class, I added the following code:
    if (name.EndsWith("numeric"))
    {
        field = new NumericField(name, storageType, true);
    }
In the index configuration I have given the field name with the text "numeric" appended to it. However, I am passing the correct field name by removing the "numeric" part.
When building the index I get an error like this:
Job started: RebuildSearchIndex|System.NullReferenceException: Object reference not set to an instance of an object.
at Lucene.Net.Store.IndexOutput.WriteString(String s)
at Lucene.Net.Index.FieldsWriter.WriteField(FieldInfo fi, Fieldable field)
at Lucene.Net.Index.StoredFieldsWriterPerThread.AddField(Fieldable field, FieldInfo fieldInfo)
at Lucene.Net.Index.DocFieldProcessorPerThread.ProcessDocument()
at Lucene.Net.Index.DocumentsWriter.UpdateDocument(Document doc, Analyzer analyzer, Term delTerm)
at Lucene.Net.Index.DocumentsWriter.AddDocument(Document doc, Analyzer analyzer)
at Lucene.Net.Index.IndexWriter.AddDocument(Document doc, Analyzer analyzer)
at Lucene.Net.Index.IndexWriter.AddDocument(Document doc)
at Sitecore.Search.IndexUpdateContext.AddDocument(Document document)
at Sitecore.Search.Crawlers.DatabaseCrawler.AddItem(Item item, IndexUpdateContext context)
at Sitecore.Search.Crawlers.DatabaseCrawler.AddTree(Item root, IndexUpdateContext context)
at Sitecore.Search.Crawlers.DatabaseCrawler.AddTree(Item root, IndexUpdateContext context)
at Sitecore.Search.Crawlers.DatabaseCrawler.AddTree(Item root, IndexUpdateContext context)
at Sitecore.Search.Index.Rebuild()
at Sitecore.Shell.Applications.Search.RebuildSearchIndex.RebuildSearchIndexForm.Builder.Build()|Job ended: RebuildSearchIndex (units processed: 1)
Can someone tell me a way to do this using Advanced Database Crawler?
Thanks in Advance

Even though I couldn't index the field as a numeric field, I found a workaround for the problem: index the values with padded zeros so that the Lucene TermRangeQuery gives correct search results. Every price is indexed with leading zeros so that each value contains 10 digits. That way the results I get are accurate.
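To illustrate why the padding works, here is a minimal Python sketch (not the Lucene.Net or Sitecore code itself). It assumes prices are non-negative whole numbers and imitates a term range check with plain string comparison; the helper names pad_price and in_term_range are made up for this example.

```python
def pad_price(value: int, width: int = 10) -> str:
    """Return the price as a fixed-width, zero-padded string for indexing."""
    if value < 0:
        raise ValueError("zero padding only preserves order for non-negative values")
    text = str(value)
    if len(text) > width:
        raise ValueError(f"{value} does not fit in {width} digits")
    return text.zfill(width)


def in_term_range(term: str, lower: str, upper: str) -> bool:
    """Imitate an inclusive term range check: terms are compared as strings."""
    return lower <= term <= upper


prices = [5, 40, 150, 1200]

# Unpadded terms: "1200" sorts between "100" and "200", so it matches wrongly.
unpadded_hits = [p for p in prices if in_term_range(str(p), "100", "200")]

# Padded terms: string order now agrees with numeric order.
padded_hits = [
    p for p in prices
    if in_term_range(pad_price(p), pad_price(100), pad_price(200))
]

print(unpadded_hits)  # [150, 1200]
print(padded_hits)    # [150]
```

The same padding has to be applied to the lower and upper bounds at query time, otherwise the comparison is again between strings of different lengths.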

Related

Mapping user spreadsheet columns to database fields

I’m not sure where to start on this project. I know how to read the contents of the excel spreadsheet, I know how to identify the header row, I know how to loop over the contents. I believe I have the UX portion worked out but I am not sure how to process the data.
I’ve googled and only found .Net solutions but I’m looking for a ColdFusion/Lucee solution.
I have a working form allowing me to map a user's spreadsheet columns to my database values (this is being kept simple for this post; the user does not have direct access to the database).
Now that I have my data, I'm not sure how to loop over the data results. I believe there will be several loops (an outer and an inner). Then of course I also need to loop over the file contents, but I think if I can get the headings mapped out, I can figure out the rest.
Any good links, tutorials, or guides would be greatly appreciated.
Some pseudo code might be enough to get me started.
User uploads form
System reads headers and content.
User is presented with a form listing the columns from their uploaded spreadsheet to match with available database fields (e.g. “column1” matches “customer name”).
User submits form.
Now what?
UPDATED
Here is what the data looks like AFTER the mapping has been done in my form. The column delimiter is the ::: and within the column the ||| indicates the ID associated with the selected column value. I've included the ID and the column value since I plan on displaying the mapping again as a confirmation. Having the ID saves a trip to the database.
If I understand correctly, your question is: how do you provide the user with a form allowing them to map their spreadsheet columns to those of the database?
Since you have their spreadsheet column names, and you have the database column names, then this problem is essentially a UI/UX problem. You need to show both lists, and allow the user to map them. I can imagine several approaches to this. My first thought would be some sort of drag/drop operation, as follows:
Create a list of boxes, one for each field in your database table, and include the field name in (or above) the box. I'll call this the db field list. Then, create another list for each column from the spreadsheet, which I'll call the spreadsheet column list. The user would drag/drop items from the spreadsheet column list to the db field list.
When a mapping has been completed by the user, you would store the column/field names as data on the DOM element of the db field list box. Then, upon submission, you would collect the mapping data by visiting each box and adding it to an array. Finally, you would serialize that array into JSON and send it to your form submission handler.
This could be difficult or easy, depending on your knowledge of UI implementations using JavaScript. jQuery makes this easy (if you know jQuery). There's even a jQuery UI plugin that does this: https://jqueryui.com/droppable/.
A quick search for "javascript drag drop" would help, and here are a few articles I found:
https://www.w3schools.com/html/html5_draganddrop.asp
https://medium.com/quick-code/simple-javascript-drag-drop-d044d8c5bed5
You would also need to submit the array of mappings using javascript. You could search for that as well, and here's an article I found:
https://codereview.stackexchange.com/questions/94493/submit-an-array-as-an-html-form-value-using-javascript
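For the "Now what?" step after the form is submitted, the outer/inner loop the question describes could look roughly like the sketch below. It is written in Python only as language-neutral pseudocode (the question asks for ColdFusion/Lucee, and the same structure should carry over). The function name apply_mapping, the sample headers and the database field names are all hypothetical, and the mapping is assumed to have already been parsed into a simple "spreadsheet column to database field" lookup.

```python
def apply_mapping(headers, rows, mapping):
    """Turn spreadsheet rows into dicts keyed by database field name.

    headers: column names from the spreadsheet's header row
    rows:    list of rows, each a list of cell values in header order
    mapping: {spreadsheet column name: database field name} from the form
    """
    # Resolve each mapped column to its position once, outside the row loop.
    positions = {}
    for column, field in mapping.items():
        if column not in headers:
            raise KeyError(f"mapped column {column!r} is not in the spreadsheet")
        positions[field] = headers.index(column)

    records = []
    for row in rows:                            # outer loop: one spreadsheet row
        record = {}
        for field, index in positions.items():  # inner loop: one mapped column
            # Short rows get None for missing trailing cells.
            record[field] = row[index] if index < len(row) else None
        records.append(record)
    return records


# Hypothetical sample data; unmapped columns are simply skipped.
headers = ["column1", "column2", "column3"]
rows = [
    ["Acme Ltd", "ignored", "acme@example.com"],
    ["Globex", "ignored", "info@globex.example"],
]
mapping = {"column1": "customer_name", "column3": "email"}

records = apply_mapping(headers, rows, mapping)
print(records[0])  # {'customer_name': 'Acme Ltd', 'email': 'acme@example.com'}
```

Each resulting record can then be validated and passed to a parameterized INSERT, so the user-supplied column names never reach the SQL directly.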

Kibana/Elastic Regex Query Returns No Results

We have Logstash receiving syslog files and then storing these in an Elasticsearch index.
We are trying to query this index with Kibana to find some particular information but we cannot get the regex queries to work.
The log data we are trying to search within is below.
Field name = message
Field type = keyword
<14>1 2018-05-02T13:53:48.079000Z snrvro04 vco - - [liagent#6876
anctoken="" component="WorkflowManagementServiceImpl" context=""
filepath="/var/log/vco/app-server/integration-server.log"
instanceid="6a6dbf1d-2f72-45db-ab57-04b84aa97b90"
log_message="Workflow 'Get ID of
Workflow/8f59ca66-7472-4efa-ac5f-dfc34059c5f1' updated (with
content)." priority="INFO" product="vro" token="" user="" wfid=""
wfname="" wfstack=""] 2018-05-02 13:53:48.079+0000 vco:
[component="WorkflowManagementServiceImpl" priority="INFO"
thread="https-jsse-nio-0.0.0.0-8281-exec-7" user="" context=""
token="" wfid="" wfname="" anctoken="" wfstack=""
instanceid="6a6dbf1d-2f72-45db-ab57-04b84aa97b90"] Workflow 'Get ID of
Workflow/8f59ca66-7472-4efa-ac5f-dfc34059c5f1' updated (with content).
The information we are trying to search for is:
component="WorkflowManagementServiceImpl"
AND more importantly:
Workflow 'Get ID of Workflow/8f59ca66-7472-4efa-ac5f-dfc34059c5f1'
The top criteria should always be the same, but the Workflow name and ID will change. The only part that remains the same within this bit of text is Workflow ' and the final '
We are currently trying our queries against the Workflow name and ID to see if we can match on that, but our queries return no results.
The regex we currently have is as follows, and we have tried numerous alternatives.
/(?<=Workflow '.*\/)(.*')/
If we run the search * Workflow * (wildcard, without the spaces) - it returns everything with the word Workflow as expected.
If we run the search Workflow we get no results.
If anyone can provide pointers towards where we are going wrong, or getting confused, that would be great!
Thanks
We resolved this by using Grok filters in Logstash to organise/clean the data before it hits the Elasticsearch Indexes, then we were able to search successfully within Kibana.
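The Grok patterns we used are not shown here, but the kind of extraction involved can be sketched in Python. The pattern below is an assumption written for this example, not the actual filter; the field names wfname and wfid are borrowed from the keys that appear in the sample log line. As far as we can tell, the original query failed partly because Elasticsearch regexp queries use Lucene's regular expression syntax, which has no lookbehind and has to match the whole term, so pulling the values into their own fields at ingest time avoids the problem altogether.

```python
import re

# Assumed pattern: "Workflow '", the workflow name, "/", a UUID-shaped ID, "'".
WORKFLOW_RE = re.compile(
    r"Workflow '(?P<wfname>.+?)/"
    r"(?P<wfid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})'"
)


def extract_workflow(message):
    """Return {'wfname': ..., 'wfid': ...} or None if the message has no workflow."""
    match = WORKFLOW_RE.search(message)
    return match.groupdict() if match else None


# Shortened version of the sample message, joined onto one line.
message = (
    '[component="WorkflowManagementServiceImpl" priority="INFO"] '
    "Workflow 'Get ID of Workflow/8f59ca66-7472-4efa-ac5f-dfc34059c5f1' "
    "updated (with content)."
)

fields = extract_workflow(message)
print(fields)
# {'wfname': 'Get ID of Workflow', 'wfid': '8f59ca66-7472-4efa-ac5f-dfc34059c5f1'}
```

Once the name and ID are stored as separate fields, a plain term or wildcard query on those fields is enough in Kibana.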

Drupal 8: Altering Search API queries

I'm working on a project which includes the following activated modules:
Drupal core 8.2.3
Database Search 8.x-1.0-beta4
Search API 8.x-1.0-beta4
Search API Term Handlers 8.x-1.0-beta4
Views 8.2.3
I have a list of nids which need to be excluded from the search result of the site-wide search. The search uses Search API and has been setup using Views.
The table in the database is: "search_api_db_default_index"
The field I wish to target is: "nid"
I wasn't able to get HOOK_search_api_query_alter or HOOK_search_api_results_alter to fire, so I am attempting to manipulate the query through HOOK_views_query_alter.
I have attempted to use both the "addWhere" and "addCondition" methods with the following syntax:
When using the addCondition method, I attempted

    $query->addCondition('search_api_db_default_index.nid', $oneBadNid, '<>');

and

    $query->addCondition('search_api_db_default_index.nid', $manyBadNids, 'NOT IN');

and when using the addWhere method, I attempted

    $query->addWhere('AND', 'search_api_index_default_index.nid', $oneBadNid, '<>');

and

    $query->addWhere('AND', 'search_api_index_default_index.nid', $manyBadNids, 'NOT IN');
Regardless of whether or not I prefix the field with the table name, searching always results in triggering the following notice:
Unknown field in filter clause: 'search_api_db_default_index.nid' .
It seems that the field name is always wrapped in an html encoded string representing a single quotation, but this occurs both when using double quotations or single quotations around the supplied table.field parameter.
I am not even sure that this is what is keeping me from altering my query, but it is the only thing close to an error which I have discovered in this process. It's also possible that I'm simply not supposed to be targeting the table in the manner written, but I did not find any documentation directing me to the proper methodology.
I would appreciate any insight into this issue! Thanks!
Generally you can use
$fields = $query->getIndex()->getFields();
on the query to get an array of fields you can use within the search_api query.
Piggy-backing off of Nebel54's comment, and attempting this on my own, you don't need to include the 'table' name when setting the addCondition. However, I did need to use hook_search_api_query_alter over a views-specific one.
    function mymodule_search_api_query_alter(\Drupal\search_api\Query\QueryInterface &$query) {
      // Ensure field_myfield is being indexed
      $fields = $query->getIndex()->getFields();
      if (isset($fields['field_myfield'])) {
        $query->addCondition('field_myfield', 'myvalue', '<>');
      }
    }

Sitecore Multilist with search returns nothing on the second page

I am having a problem with a "Multilist with search" field. This is a Sitecore 8 instance. The field uses a query like this to fetch a list from a Lucene search index named "agents_master_index":
TemplateFilter={3EA2CB30-0D04-4D73-9282-0103D8F34074} & StartSearchLocation={95A07C68-36B6-4D0D-AAE3-A2BFBF40C2C6}&SortField=Agent Name
I have multiple issues:
1) When I open an Item based on this template, it is very slow but it ultimately returns some results on first page of the list, however the pagination and go-to-item buttons are not working and the field is not showing number of pages.
2) If I try template's standard values' item, the above problem doesn't happen but if I click on "next page" button, it returns nothing.
I looked into the Search log file to see what's going on. It turns out that when it successfully returns the first page of results, it is executing the following query:
4832 12:32:34 INFO ExecuteQueryAgainstLucene (agents_master_index): +_datasource:sitecore +(+(+_path:11111111111111111111111111111111 +_latestversion:1) +(+_path:95a07c6836b64d0daae3a2bfbf40c2c6 +_template:3ea2cb300d044d7392820103d8f34074)) - Filter :
but to return the second page of results, the multilist runs this query:
http://localhost/sitecore/shell/Applications/Buckets/Services/Search.ashx?fromBucketListField=*&sort=Agent%20Name&template={3EA2CB30-0D04-4D73-9282-0103D8F34074}&location=95a07c6836b64d0daae3a2bfbf40c2c6&pageSize=10&pageNumber=2&sc_content=master
and returns this JSON-like result, which is basically empty:
({"CurrentPage":1,"Location":"current item","PageNumbers":0,"SearchCount":"0","SearchTime":"04.3539","facets":null,"items":[],"launchType":"contenteditor:launchtab","ContextData":[],"ContextDataView":[]})
and in the search log file what is actually being executed is this:
4832 12:28:05 INFO Search Query : +(_content:* _name:* _displayname:*) +_template:3ea2cb300d044d7392820103d8f34074 +_path:95a07c6836b64d0daae3a2bfbf40c2c6
4832 12:28:05 INFO Search Index : sitecore_index
4832 12:28:05 INFO Search Took : 4346ms
I don't understand why, to retrieve the second page, it's looking into sitecore_index instead of "agents_master_index". What is wrong here? Should I fix my query? How can I force it to pick the correct Lucene index (if this is the reason behind this whole confusing problem)?
Any help or insight is greatly appreciated.
Edit
By the way, the StartSearchLocation refers to an item bucket. The "agents_master_index" refers to the same location in its definition.
UPDATE
OK, so far I have managed to work around the second problem. After exchanging some comments with Richard, I concluded that (at least in Sitecore 8) the content editor expects the items we want to search in the multilist component to be in the same index as the "Root" item ( {11111111-1111-1111-1111-111111111111} ), so I just added the same crawler we had in "agents_master_index" to the "sitecore_index" and it worked!
However, this still works only on the template's standard values item, so the first problem has not been solved yet. In other words, the multilist doesn't work when clicking through to the second page or doing any search on items that have been created from that template; it only works properly on the template itself (the standard values item).
I came across this "Fix for Sitecore Multilist and TreeList with Search Bug", which looks very similar to my problem. I tried it but it didn't work for me :(

Solr + Haystack searching

I am trying to implement a search engine for a new app.
The app allows people to rate items (+1 or -1) - Giving the items a +ve or -ve score.
When people search for items, I'd like to take into account their rating and to order the results accordingly. If the item is a match, it should show up. But if it's a match with a high score it should be boosted up the results a bit.
A really good match should win over a fairly good match with a high score, so it needs to be weighted along with the rest of it (i.e. I boosted my titles a bit).
Not stuck on Solr by any means, only just started playing today.
With Solr, you can maintain a field on the document that holds the difference.
The difference is between the total number of +1s and -1s.
Solr allows you to boost on field values using function queries.
So you can query with a boost on the difference field, so that documents with a better difference score higher than others.
On the indexing side, as this difference would change quite often, the respective document needs to be updated every time.
Solr does not allow a single field to be updated on its own, so you need to handle the incremental updates of the difference field.
If that is a concern for you, you can try using ExternalFileField.
This allows certain fields of documents, such as ranking or popularity, to be kept outside the index in a separate file.
The file can be updated and the index committed to reflect the changes.
The field can also be used with function queries to boost the results as needed; however, it has a lot of limitations.
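As a rough sketch of the function-query boost described above, the request parameters could be built like this. It assumes the eDisMax query parser and a hypothetical numeric field named rating_diff holding upvotes minus downvotes; the field names title and text and the title^2 weight are likewise made up for the example. The bf parameter adds the function's value to the relevance score, so a strong text match can still outrank a weaker match with a high rating.

```python
from urllib.parse import urlencode


def rating_difference(upvotes, downvotes):
    """Value to store in the (hypothetical) rating_diff field at index time."""
    return upvotes - downvotes


def build_search_params(user_query, boost_field="rating_diff"):
    """Build eDisMax request parameters with an additive boost on the rating field."""
    return {
        "defType": "edismax",   # use the Extended DisMax query parser
        "q": user_query,
        "qf": "title^2 text",   # assumed fields; titles weighted a bit higher
        "bf": boost_field,      # boost function: the field's own value
        "wt": "json",
    }


params = build_search_params("blah")
query_string = urlencode(params)
print(query_string)
```

The resulting query string would be appended to the core's /select URL. How strongly the rating should count relative to text relevance is a tuning question, for example by wrapping the field in a scaling function.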
You can order your results by a field that stores the rating. Prefix the field name with "-" to sort in descending order, so the highest-rated items come first:
    sqs.filter(content='blah').order_by('-rating')