Save substring from field to copyfield using regex in Solr

I'm importing data from a MySQL table using the Data Import Handler. I have a column msg, of type text. Using a regex, I need to save a substring of it in a copy field.
msg: 94eb2c0cb17ef354bb052c57f40c\r\nContent-Type: text/plain; charset=UTF-8\r\nContent-Transfer-Encoding pnr:986|0978325
Expected Solr result:
{
"msg_body": "94eb2c0cb17ef354bb052c57f40c\\r\\nContent-Type: text/plain; charset=UTF-8\\r\\nContent-Transfer-Encoding pnr:986-0978325",
"pnr_number": "pnr:986-0978325"
}
My REGEX:
(pnr|(P|p)[ _.:,!"'-/$](N|n)[ _.:,!"'-/$](R|r))+[ _.:,!"'-/$]+[0-9]{3}[ _.:,!"'-/$]+[0-9]{7}
Please help me out, as I'm new to Solr.

You'll need to define a custom field for pnr_number.
Use a copyField to copy msg_body to pnr_number.
In the custom field definition, use:
<filter class="solr.PatternCaptureGroupFilterFactory" pattern="regex goes here" preserve_original="false"/>

Since you are using the Data Import Handler, you have three options:
Use a RegexTransformer in the DIH definition.
Use a RegexReplaceProcessorFactory update request processor (in solrconfig.xml).
Use a regex filter in the analyzer chain.
With the first two options, the regex extracts the pattern before the field is indexed. With the last option, the stored representation (if you store the field) contains the original full string, but the indexed (searchable) representation contains the regex match.
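Whichever option you pick, it may help to check the pattern outside Solr first. The following is only a minimal Python sketch, not Solr configuration: the helper name extract_pnr is made up for illustration, and the sample string is the msg_body value from the expected result in the question.

```python
import re

# Pattern copied verbatim from the question. Inside the brackets, '-/ is a
# character range (from ' to /), which is why a "-" separator is accepted.
PNR_PATTERN = re.compile(
    r'''(pnr|(P|p)[ _.:,!"'-/$](N|n)[ _.:,!"'-/$](R|r))+[ _.:,!"'-/$]+[0-9]{3}[ _.:,!"'-/$]+[0-9]{7}'''
)


def extract_pnr(msg):
    """Return the first PNR-like substring in msg, or None if there is none."""
    match = PNR_PATTERN.search(msg)
    return match.group(0) if match else None


msg_body = ("94eb2c0cb17ef354bb052c57f40c\r\nContent-Type: text/plain; "
            "charset=UTF-8\r\nContent-Transfer-Encoding pnr:986-0978325")
print(extract_pnr(msg_body))  # prints pnr:986-0978325
```

Note that the raw msg value in the question separates the digit groups with "|", which is not in the character class, so the pattern as written would not match that form; extract_pnr returns None for it.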

Related

Django: display a preview of an object's attribute - Class based views

Using a ListView Class-based-view, I am looping over the objects present in the database of a certain model in my HTML template, and, for instance, I can access an object's "body_text" attribute with the following syntax: {{object.body_text}}
What if I wanted to only show the first 20 characters of that "body_text" attribute in my HTML template?
How can I set that?

1st Method
Use the truncatechars filter in your HTML template. It truncates a string if it is longer than the specified number of characters. Truncated strings will end with a translatable ellipsis character (“…”).
{{object.body_text|truncatechars:20}}
Reference:
https://docs.djangoproject.com/en/dev/ref/templates/builtins/?from=olddocs#truncatechars
2nd Method
Use the slice filter in your HTML template.
{{object.body_text|slice:":20"}}
Reference: https://docs.djangoproject.com/en/dev/ref/templates/builtins/#slice
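The difference between the two methods is easiest to see side by side. This is a plain-Python approximation of what the two filters return, not Django's own implementation; the function names are made up for illustration, and the ellipsis handling assumes the single-character "…" described above.

```python
def preview_slice(text, length):
    """Approximates {{ text|slice:":N" }}: a hard cut, no ellipsis."""
    return text[:length]


def preview_truncatechars(text, length):
    """Approximates {{ text|truncatechars:N }}.

    Assumption: the result is at most `length` characters long,
    with the ellipsis counted as one of them.
    """
    if len(text) <= length:
        return text
    return text[:length - 1] + "…"


body_text = "The quick brown fox jumps over the lazy dog"
print(preview_slice(body_text, 20))
print(preview_truncatechars(body_text, 20))
```

So slice is the better fit if you want exactly the first 20 characters, while truncatechars signals to the reader that the text was cut.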

Found it, eventually.
Use the |slice:":" filter in your HTML template.
For example, if you only want to display the first 10 characters of a given attribute, use:
{{object.body_text|slice:":10"}}

Search and Replace Filter in Google Analytics

I'm using filters to consolidate hits to URLs with different variables into one URL,
so:
example.com/abc/123 - 1 hit
example.com/abc/345 - 1 hit
will be consolidated to:
example.com/abc/ - 2 hits
I'm using the Search and Replace filter like this:
Search string : /abc/.*
Replace string : /abc/
When I verify this filter, it says no data would be changed. When I change the config to
Search string : /abc/.*
Replace string : /
It reports a major change. It seems the replace string is not right. I basically want to strip the dynamic portion of the URL by replacing any hit that has a dynamic portion with a URL that only has a static portion.

It should work the way you've indicated, but just in case, here's the setup.
Set the Filter Field to "Request URI", then use /abc/.* in the search field and /abc/ in the replace field.
Check the result in the Real-Time report (under Content), and make sure you do this in a test view first so that you don't accidentally apply it to your production data.
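To see what that search/replace pair should do to the request URIs, here is a small Python sketch. It only illustrates the regular expression itself and says nothing about how Google Analytics applies filters internally; the function name and the sample URIs are hypothetical.

```python
import re
from collections import Counter


def consolidate(uri, search=r"/abc/.*", replace="/abc/"):
    """Apply the search-and-replace pair from the filter to one request URI."""
    return re.sub(search, replace, uri)


# Hypothetical hits: two dynamic URLs under /abc/ and one unrelated page.
hits = ["/abc/123", "/abc/345", "/other/page"]
counts = Counter(consolidate(uri) for uri in hits)
print(counts["/abc/"])        # prints 2
print(counts["/other/page"])  # prints 1
```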

How to compare and substitute strings in Ruby on Rails?

Trying to build a barebones concept in Ruby on Rails that will take a string, map each individual word in this string, compare it and then substitute the word if it matches predefined strings in related databases.
For example: User inputs in text field "What does lol and brb mean?" Hits Submit button. The action gives back the same text with "lol" and "brb" changed to "laughing out loud" and "be right back".
So far I have a Post model & table for the User input that stores the string in the database.
I have an Acronym model & table that has "lol" and "brb" stored in database with a foreign key reference to Acronym_Translate model & table that has "laughing out loud" and "be right back" referenced to "lol" and "brb", respectively.
How would I connect the Post model/table to the Acronym model/table in order to compare the strings in Post and substitute with strings from Acronym model/table? And what command could achieve such function? Would gsub! method work here?
Any help would be appreciated!

Are you sure that you want to connect the Post table to the Acronym table? This means that you would have to identify and keep a record of each instance of an acronym within a post.
You can do this with a many-to-many relation or, if you want to store extra data about each acronym occurrence, you can create a link table named AcronymPost and use a has-many-through relationship between Post and Acronym. When you parse a post value and identify an acronym in it, you would have to record this in the database and then use gsub to replace the acronym in the post value with its translation.
Alternatively, you can iterate through your table of acronyms and use the (string).include? method to check whether each one occurs in the post. Finally, you can use gsub to replace the acronym with its translation.
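The lookup-and-substitute step can be sketched independently of the models. The example below is in Python rather than Ruby and uses a hard-coded dictionary as a stand-in for the Acronym and translation tables, so treat the names as hypothetical; in Rails, gsub with a pattern and a hash plays the role that re.sub plays here.

```python
import re

# Stand-in for the acronym tables: acronym (lowercase) -> translation.
ACRONYMS = {"lol": "laughing out loud", "brb": "be right back"}


def expand_acronyms(text, acronyms):
    """Replace every whole-word acronym in text with its translation."""
    if not acronyms:
        return text
    # \b keeps "lol" from matching inside longer words such as "lollipop".
    alternation = "|".join(re.escape(key) for key in acronyms)
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: acronyms[m.group(0).lower()], text)


print(expand_acronyms("What does lol and brb mean?", ACRONYMS))
# prints What does laughing out loud and be right back mean?
```

A single pass with one combined pattern avoids re-scanning the post once per acronym, which matters if the acronym table grows.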

Remove duplicate documents from results returned by Solr

I have a scenario in which I am searching for a no pan string in documents. In my application, we send two query requests to Solr: the first is an exact query (i.e. a phrase query), which returns exact results, and the second is an AND query. The results of the exact query are also contained in the results of the AND query, so I want to remove those records from the AND query results. Is it possible to do this on the Solr side?
I am using the sunspot gem and Rails.

If your documents contain a unique field, use that field in the second query as a "not in" clause.
For example, suppose id is the unique field.
Collect the ids returned by the first query
and add them to the second query as
-id:(123 OR 345)
The second query's results will then exclude those documents.
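Building that clause from the first result set is plain string work. A minimal Python sketch follows; the function name is hypothetical, and it assumes the ids are simple values that need no query escaping.

```python
def build_exclusion_clause(ids, field="id"):
    """Build a negative clause such as -id:(123 OR 345) from a list of ids.

    Assumption: ids contain no characters that need escaping in the query.
    """
    if not ids:
        return ""  # nothing to exclude
    return "-{}:({})".format(field, " OR ".join(str(i) for i in ids))


exact_ids = [123, 345]  # ids returned by the first (exact) query
print(build_exclusion_clause(exact_ids))  # prints -id:(123 OR 345)
```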

Or you can simply reformulate the second request so that it keeps the AND query but excludes the exact matches.
The updated second request becomes the negation of your current first request (to exclude exact matches) combined with your current second request via an AND operation.

How do I search a db field for the string after a "#" and add it to another db field in django

I want to have a content entry block. When a user types #word or #blah in a field, I want to efficiently search that field and add the string right after the "#" to a different field as an entry in a different table, like what Twitter does. This would allow a user to sort by that string later.
I believe that I would do this as part of the save method on the model, but I'm not sure. Also, if the #blah already exists, then the content would belong to that "blah".
Can anyone suggest samples of how to do this? This is a little beyond what I'm able to figure out on my own.
Thanks!

You can use a regex (the re module) during save(), or whenever you like, to check whether your field text contains #(?P<blah>\w+), extract your blah, and use it for whatever you want.
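As a rough sketch of the extraction step only, assuming tags should be case-insensitive and deduplicated (the function name and the sample text are made up for illustration):

```python
import re

# Same idea as the pattern above: "#" followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(?P<tag>\w+)")


def extract_hashtags(text):
    """Return the distinct hashtags in text, lowercased, in order of appearance."""
    tags = []
    for match in HASHTAG_RE.finditer(text):
        tag = match.group("tag").lower()
        if tag not in tags:
            tags.append(tag)
    return tags


print(extract_hashtags("Reading about #django and #Regex, more #django"))
# prints ['django', 'regex']
```

Inside save() you could then loop over the returned names and attach the content to a tag row for each one, for instance with get_or_create on a hypothetical Tag model, so that an existing #blah is reused rather than duplicated.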
