Regex Replace in Mirasvit Sphinx Search


I have a number of SKUs listed on my site. The SKUs being searched for are 12 digits long, but in my store they are listed on the product detail page as 8 characters.
Mirasvit Search has a function to replace this, but how it is supposed to work is a mystery...
I'm debugging the Sphinx Search Replace function on an old Magento store (a client's website):
Replace 12 characters with 8 if the regex matches the following pattern:
/([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])/
Match Replace (4 characters)
([0-9][0-9][0-9][0-9])$
By
(empty)
I need to replace 166278010201 to 16241702 in order to show matching search results...
I've included the documentation:
https://mirasvit.com/doc/extension_searchsphinx/current/ssp/global/long_tail

You may use
Match Expression - /[0-9]{12}/
Replace Expression - /[0-9]{4}$/
Replace Char - empty
This will find all 12-digit chunks of text and remove the last 4 digits from each match found.
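
To check the two expressions outside the extension, here is a minimal Python sketch. It assumes the extension applies the Replace Expression inside each chunk found by the Match Expression; that is an assumption based on the description above, not confirmed Mirasvit behavior. The function name shorten_skus is made up for illustration.

```python
import re

MATCH_EXPRESSION = re.compile(r"[0-9]{12}")    # every 12-digit chunk
REPLACE_EXPRESSION = re.compile(r"[0-9]{4}$")  # last 4 digits of a chunk


def shorten_skus(query: str) -> str:
    """Remove the last 4 digits from each 12-digit chunk in the query.

    Assumption: the replace rule is applied only within each match.
    """
    return MATCH_EXPRESSION.sub(
        lambda m: REPLACE_EXPRESSION.sub("", m.group(0)),
        query,
    )


print(shorten_skus("166278010201"))  # 16627801
```

Trimming the last four digits of 166278010201 gives 16627801, so a different 8-character target would need a different rule.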

Related

How to Match Tilde-Delimited Data Using Regex

I have data like this:
~10~682423~15~Test Data~10~68276127~15~More Data~10~6813~15~Also Data~
I'm trying to use Notepad++ to find and replace the values within tag 10 (682423, 68276127, 6813) with zeroes. I thought the syntax below would work, but it selects the first occurrence of the text I want and the rest of the line, instead of just the text I want (~10~682423~, for example). I also tried dozens of variations from searching online, but they also either did the same thing or wouldn't return any results.
~10~.*~

You can use (?<=~10~)\d+(?=~) and replace with 0. The lookbehind (?<=~10~) checks that ~10~ precedes the digit sequence, and the lookahead (?=~) ensures that a ~ follows it. If the value after ~10~ can contain characters other than digits, use (?<=~10~)[^~]+(?=~).
The problem with ~10~.*~ is that .* is greedy: it matches any character, including ~, so the match runs to the last ~ on the line.
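
The difference between the greedy pattern and the lookaround pattern can be checked with a short Python sketch. Python's re module stands in for Notepad++ here, which is an assumption; both support these fixed-width lookarounds.

```python
import re

data = "~10~682423~15~Test Data~10~68276127~15~More Data~10~6813~15~Also Data~"

# Greedy .* runs to the last ~ on the line, so the whole line is matched.
greedy = re.search(r"~10~.*~", data).group(0)

# Lookarounds match only the digits between ~10~ and the next ~.
zeroed = re.sub(r"(?<=~10~)\d+(?=~)", "0", data)

print(greedy == data)  # True
print(zeroed)          # ~10~0~15~Test Data~10~0~15~More Data~10~0~15~Also Data~
```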

Use
\b10~\d+
Replace with 10~0. \b10~ matches 10 only as a whole number (the 10 inside 210 does not match), and \d+ matches one or more digits.
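
A quick way to see the effect of the word boundary is the Python sketch below. The sample line with a 210 tag is made up for illustration and is not from the question.

```python
import re

# Hypothetical sample: tag 210 must be left alone, tag 10 must be zeroed.
sample = "~210~999~10~682423~15~Test Data~"

result = re.sub(r"\b10~\d+", "10~0", sample)

print(result)  # ~210~999~10~0~15~Test Data~
```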

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is that the User column can also contain a |, to make it even worse.
I could use a macro to fix this, but the file is a few million lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open to suggestions on how to do this with another program or command-line utility, on either Linux or Windows.

Match \|[^|]+\|[^|]+(\|[^|]+$)
Replace $1
Basically, anchor to the end of the line and remove the two columns before the last one (I assume columns can't be empty; replace + with * if they can).
If you need finer control than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
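
For the script route, a minimal Python sketch could split each line from the right, so a | inside the user column does not matter. It assumes every line has at least four fields; the function name drop_fifth_and_sixth is made up for illustration.

```python
def drop_fifth_and_sixth(line: str) -> str:
    """Drop the two columns before the last one, counting from the right.

    Assumption: the line has at least four |-separated fields;
    otherwise the unpacking below raises ValueError.
    """
    head, _col5, _col6, last = line.rsplit("|", 3)
    return f"{head}|{last}"


rows = [
    "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1",
    "8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389",
]
for row in rows:
    print(drop_fifth_and_sixth(row))
```

For a file with millions of lines, the same function can be applied while iterating over the open file line by line, so the whole file never has to sit in memory.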

I've captured three groups and given them names. If you use a replace utility like sed or Vim's regex, you can replace the remove group with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group names and use \1 etc. instead, depending on what environment you use. Note that this pattern counts columns from the start of the line, so it only works when the user column does not itself contain a |.
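
As a sketch of the concatenation approach, here is the same pattern in Python, which spells named groups as (?P<name>...). It also shows that counting columns from the left goes wrong on the row whose user column contains a |.

```python
import re

# Same pattern as above, in Python's named-group syntax.
pattern = re.compile(
    r"^(?P<keep_before>(?:[^|]+\|){3})"
    r"(?P<remove>(?:[^|]+\|){2})"
    r"(?P<keep_after>.*)$"
)


def drop_columns(line: str) -> str:
    # Keep the first three columns and everything after the removed pair.
    return pattern.sub(r"\g<keep_before>\g<keep_after>", line)


simple = "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1"
tricky = "8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389"

print(drop_columns(simple))  # correct: columns 5 and 6 are gone
print(drop_columns(tricky))  # wrong columns removed, because the user column contains |
```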

In Notepad++, press Ctrl+H, then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex explained:
\|\d+ : matches the first field, a | followed by digits
\|\d+ : matches the second field, a | followed by digits
(\|[0-9a-z]+) : matches and captures the field after the second number
$ : anchors the match to the end of the line
Replacement:
$1 : replaces the whole match with the captured group, i.e. whatever matched between the parentheses (\|[0-9a-z]+)
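
The same find and replace can be tried outside Notepad++ with a short Python sketch. It assumes Python's re.MULTILINE flag as a stand-in for Notepad++'s line-by-line handling of $, and uses \1 where Notepad++ uses $1.

```python
import re

text = "\n".join([
    "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1",
    "3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433",
    "8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389",
])

# With MULTILINE, $ matches at the end of every line, not only at the end of the text.
cleaned = re.sub(r"\|\d+\|\d+(\|[0-9a-z]+)$", r"\1", text, flags=re.MULTILINE)

print(cleaned)
```

Because the match is anchored to the end of each line, the extra | characters in the third row's user column are left untouched.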

Mongo: how to query by regex in Mongo

I have the following problem. Given a document like this:
{"_id":ObjectId("XXXXXXXX"),"phone":"123456"}
I want to query the documents where the length of the phone field is 5. I ran the following commands:
db.Phone.find({"phone":{"$regex":"\d{5}"}})
or
db.Phone.find({"phone":/\d{5}/})
Neither of them works. Could anybody help me figure out how to use regex in Mongo?

If you want to find docs where the phone number is exactly 5 digits, you need to anchor the regex to the start and end of the string with ^ and $:
db.Phone.find({phone: /^\d{5}$/})
Otherwise it will match any string that contains at least 5 digits in a row, anywhere in the string.
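
The effect of the anchors can be illustrated without a database. The sketch below uses Python's re module rather than MongoDB's own regex engine, which is an assumption; for these simple patterns the anchors behave the same way. The sample phone values are made up.

```python
import re

# Hypothetical phone values for illustration.
phones = ["123456", "12345", "1234", "a12345"]

# Unanchored: matches any string containing 5 digits in a row.
unanchored = [p for p in phones if re.search(r"\d{5}", p)]

# Anchored: matches only strings that are exactly 5 digits.
anchored = [p for p in phones if re.search(r"^\d{5}$", p)]

print(unanchored)  # ['123456', '12345', 'a12345']
print(anchored)    # ['12345']
```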

Regex: matching string with 2 specific characters

I'm working in Google Analytics and trying to use the RegEx advanced filter option to display page names that contain two /, but not three /. The text string within the first section will always be products; however, after the second / it is random.
For example,
I want to include these page name strings:
/products/skis
/products/snowboards
/products/skates
I want to exclude these page name strings:
/products/skis/mens
/products/snowboards/womens
/products/skates/red
Again, the products part is consistent...but the second text section is random.
Appreciate any help -- thanks!

One possibility would be this:
^\/products\/[a-zA-Z]+$
This matches the first slash, followed by 'products', followed by a second slash, and then any string of letters (no digits or special characters). Nothing else may come after.

To match page names starting with /products/ and not containing a third slash, you can use this regex:
^\/products\/[^\/]+$
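
The two suggested patterns can be compared with a small Python sketch. Python's re module stands in for the Google Analytics filter, which is an assumption, and /products/ski-boots is a made-up page name added to show where the patterns differ.

```python
import re

letters_only = re.compile(r"^/products/[a-zA-Z]+$")
no_third_slash = re.compile(r"^/products/[^/]+$")

pages = [
    "/products/skis",
    "/products/snowboards",
    "/products/skis/mens",
    "/products/ski-boots",  # hypothetical page with a hyphen
]

matched_letters = [p for p in pages if letters_only.search(p)]
matched_no_slash = [p for p in pages if no_third_slash.search(p)]

print(matched_letters)   # ['/products/skis', '/products/snowboards']
print(matched_no_slash)  # ['/products/skis', '/products/snowboards', '/products/ski-boots']
```

The [^/]+ version is the safer choice when the second section may contain hyphens, digits, or other characters.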

Search and replace regular expression in Open Office calc

I've got something like this (in Open Office Calc):
Streetname. Number
Streetname. Number a
etc.
Now I want to delete everything in front of the number.
So I need to do a search and replace I guess.
^.*?([0-9])
this one matches Streetname. Number .. but what should I put in the replace field?
If I do the search and replace, it deletes everything within the datafield :(

In the Search for field, enter the following regex: (.*?[:space:])([0-9]+)
And in the Replace with field, enter: $2
That means you search for:
any characters followed by a space,
then one or more digits.
All of that is replaced with $2, the reference to the digits.
It will replace Streetname. Number 24 with 24. Why did you put a in your example?
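
The same idea can be sketched in Python, with \s in place of OpenOffice's [:space:] and \2 in place of $2. The street values are made up for illustration, and Python's regex engine is only a stand-in for Calc's.

```python
import re

# Hypothetical cell values for illustration.
cells = ["Main St. 24", "Main St. 24 a"]

# Group 1: everything up to the space before the number; group 2: the digits.
stripped = [re.sub(r"(.*?\s)([0-9]+)", r"\2", cell, count=1) for cell in cells]

print(stripped)  # ['24', '24 a']
```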
