Maria DB Regular Expression - regex

I wrote a query to find rows in a table where a given word occurs.
SELECT tr.track_order AS `order`, tr.title AS title
FROM tracks tr
LEFT JOIN `options` op ON tr.id = op.track_id
WHERE tr.sim_id ='sdf' AND
(tr.description REGEXP '[[:<:]]word[[:>:]]')
It works well. But when the word I am searching for contains special characters at the beginning or at the end, the query does not work as I want.
Actually, I want to replace those words with new words.
Can anyone suggest the right regular expression?
Could you help me?
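One possible direction, sketched in Python rather than SQL: word-boundary markers only fire next to a word character, so they stop working when the search term begins or ends with punctuation. Escaping the term and using "no word character on either side" lookarounds avoids that. The helper name whole_word_pattern and the sample term c++ are made up for illustration. MariaDB 10.0.5 and later use the PCRE library, so the same lookaround syntax should carry over to REGEXP and REGEXP_REPLACE, but that is an assumption to verify on your server.

```python
import re

def whole_word_pattern(word: str) -> re.Pattern:
    # re.escape neutralises characters such as + or $ inside the term itself.
    # (?<!\w) and (?!\w) mean "no word character directly before/after",
    # which still works when the term starts or ends with punctuation.
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")

text = "I like c++ a lot."

# A \b boundary fails here: there is no word boundary between '+' and ' '.
boundary_hits = re.findall(r"\bc\+\+\b", text)

# The lookaround version finds the standalone term.
lookaround_hits = whole_word_pattern("c++").findall(text)

# It does not fire when the term is glued to a following word character.
glued_hits = whole_word_pattern("c++").findall("c++x is different")

# Replacing the matched term, which is what the question ultimately wants.
replaced = whole_word_pattern("c++").sub("C plus plus", text)

print(boundary_hits, lookaround_hits, glued_hits, replaced)
```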

Related

RegEx works everywhere except in Pentaho RegEx Evaluation Step

I have a couple of RegEx that work on the online regex websites but not in Pentaho. Could you please help?
Here's the string:
:6585d0f0ba88767ac3b590f719596d864d73e9c1:
harmonicbalance/src/harmonicbalance/HarmonicBalanceFlowModel.cpp
harmonicbalance/src/harmonicbalance/HbFlutterModel.cpp
:8302994b565553c83a048b8905ae597349d99627:
emp/src/emp/PhasePairSingleParticleReynoldsNumber.h
emp/src/emp/TomiyamaDragCoefficientMethod.cpp
:9da194f17ec08bb20ad1be8df68b78ca137ab18a:
combustion/src/combustion/ReactingSpeciesTransportBasedModel.cpp
combustion/src/complexchemistry/TurbulentFlameClosure.cpp
:6a59f0be1e347a65e525e58742bb304639ea9bc4:
meshing/src/meshing/SurfaceMeshManipulation.cpp
physics/src/discretization/FvIndirectRegionInterfaceManager.cpp
physics/src/discretization/FvIndirectRegionInterfaceManager.h
physics/src/discretization/FvRepresentation.cpp
physics/src/discretization/FvRepresentation.h
:64b7f6d36b11b6cd94c20cad53463b7deef8c85a:
resourceclient/src/resourceclient/ResourcePool.cpp
resourceclient/src/resourceclient/ResourcePool.h
resourceclient/src/resourceclient/RestClient.cpp
resourceclient/src/resourceclient/RestClient.h
resourceclient/src/resourceclient/test/ResourcePoolTest.cpp
I would like to capture two groups. The first group should extract every commit SHA1 and the second group should extract the file names.
Below are the expressions I tried:
(?:^:([A-Za-z0-9]+):|(?!^)\G)\n+([A-Za-z/.-]+)
https://regex101.com/r/3IBkPz/1
^:(\w+):\s+((?:\s*(?!:)[^\s]+)+)
https://regex101.com/r/oIoDvM/1
Thoughts?
AFAIK (as of PDI-8.0), the Regex Evaluation step does NOT support the regex 'g' modifier; your regex pattern must cover the whole text to produce a match.
For example: the following pattern will not match anything in Regex Evaluation step:
:([0-9a-f]+):\s+([^:]+)
but if I prepend .* to this pattern and pick "Enable dotall mode":
.*:([0-9a-f]+):\s+([^:]+)
it will match the last commit (sha1 + filenames). You can try moving .* to the end of
the original pattern, which will get you the first commit. So if you want to retrieve
the full list of commits (sha1 + filenames), as you would with the g modifier, this step is
probably not a solution for you.
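The behaviour described above can be reproduced outside Pentaho. The sketch below uses Python's re.fullmatch as a stand-in for a step that must match the entire input; that equivalence is an assumption, not something taken from the PDI documentation. The sample text is the first two commits from the question.

```python
import re

text = (
    ":6585d0f0ba88767ac3b590f719596d864d73e9c1:\n"
    "harmonicbalance/src/harmonicbalance/HarmonicBalanceFlowModel.cpp\n"
    "harmonicbalance/src/harmonicbalance/HbFlutterModel.cpp\n"
    ":8302994b565553c83a048b8905ae597349d99627:\n"
    "emp/src/emp/PhasePairSingleParticleReynoldsNumber.h\n"
    "emp/src/emp/TomiyamaDragCoefficientMethod.cpp\n"
)

base = r":([0-9a-f]+):\s+([^:]+)"

# Whole-string match with the bare pattern fails: [^:]+ cannot cross
# the colon that opens the second commit.
no_match = re.fullmatch(base, text)

# Leading .* in dotall mode: the greedy .* swallows everything up to the last commit.
last = re.fullmatch(r".*" + base, text, re.DOTALL)

# Trailing .* in dotall mode: the match is pinned to the start, so the first commit is captured.
first = re.fullmatch(base + r".*", text, re.DOTALL)

# For comparison, a global search (the 'g' behaviour) returns every commit.
all_commits = re.findall(base, text)

print(no_match, last.group(1), first.group(1), len(all_commits))
```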
As the fields are basically split by colons ':' and new lines, you can probably try the following approach:
Use the Split field to rows step with Delimiter=':' and include rownum in the output. This rownum can be used to filter rows, since even numbers are sha1 values and odd numbers are filenames
Use the Analytic Query step to create a new field with LEAD = 1, so that you get sha1 and filenames in the same row
Use the Calculator and Filter steps to calculate the remainder of rownum/2 and keep only rows with an odd rownum
Use Split field to rows again to split filenames into individual filename values using "\n" (with "Delimiter is a Regular Expression" enabled). You might want to filter out the EMPTY filename, since the delimiter only supports one char
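The split-based idea can be checked quickly outside Pentaho. This is only a Python sketch of the same logic (split on ':', pair each hash with the block that follows it, then split the block on newlines), not a description of how the PDI steps are implemented. The variable names are illustrative.

```python
text = (
    ":6585d0f0ba88767ac3b590f719596d864d73e9c1:\n"
    "harmonicbalance/src/harmonicbalance/HarmonicBalanceFlowModel.cpp\n"
    "harmonicbalance/src/harmonicbalance/HbFlutterModel.cpp\n"
    ":8302994b565553c83a048b8905ae597349d99627:\n"
    "emp/src/emp/PhasePairSingleParticleReynoldsNumber.h\n"
    "emp/src/emp/TomiyamaDragCoefficientMethod.cpp\n"
)

# Splitting on ':' yields an empty first element, then hashes and
# file blocks in alternation.
parts = text.split(":")

commits = {}
for sha, block in zip(parts[1::2], parts[2::2]):
    # Drop the empty strings produced by leading/trailing newlines.
    commits[sha] = [name for name in block.split("\n") if name]

for sha, files in commits.items():
    print(sha, files)
```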

Regular Expression Match (get multiple stuff in a group)

I have trouble working on this regular expression.
Here is the string, all on one line. I want to extract the entries in swatchColorList; specifically, I want the words Natural Burlap, Navy, Red.
What I have tried is '[(.*?)]' to get everything inside the brackets, but what I really want is to do it in one step. Is that possible, or do I need to do this in two steps?
Thanks
{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}
You can try this regex
(?<=[[,]\{\")[^"]+
If lookbehind is not supported, you can use
[[,]\{"([^"]+)
This will save the needed word in group 1.
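A quick way to check both patterns is Python's re module. In this sketch the bracket inside the character class is written as \[ to avoid Python's "possible nested set" warning; the pattern is otherwise the same. The input is a shortened copy of the string from the question, trimmed for readability.

```python
import re

# Shortened copy of the JSON string from the question.
raw = (
    '{"id":"1349306","swatchColorList":['
    '{"Natural Burlap":"8/optimized/8769128_fpx.tif"},'
    '{"Navy":"5/optimized/8748315_fpx.tif"},'
    '{"Red":"8/optimized/8748318_fpx.tif"}],'
    '"primaryColor":"Natural Burlap",'
    '"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}'
)

# Capture-group version: the name ends up in group 1.
group_names = re.findall(r'[\[,]\{"([^"]+)', raw)

# Lookbehind version: the whole match is the name.
lookbehind_names = re.findall(r'(?<=[\[,]\{")[^"]+', raw)

print(group_names)
print(lookbehind_names)
```

Note that "colorFamily" is not picked up, because its object is preceded by a colon rather than by [ or a comma.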
import json

# Named raw rather than str so the built-in type is not shadowed.
raw = '{"id":"1349306","categoryName":"Kids","imageSource":"7/optimized/8769127_fpx.tif","swatchColorList":[{"Natural Burlap":"8/optimized/8769128_fpx.tif"},{"Navy":"5/optimized/8748315_fpx.tif"},{"Red":"8/optimized/8748318_fpx.tif"}],"suppressColorSwatches":false,"primaryColor":"Natural Burlap","clickableSwatch":true,"selectedColorNameID":"Natural Burlap","moreColors":false,"suppressProductAttribute":false,"colorFamily":{"Natural Burlap":"Ivory/Cream"},"maxQuantity":6}'
obj = json.loads(raw)
words = []
# Each swatch entry is a one-key object whose key is the color name.
for thing in obj["swatchColorList"]:
    for word in thing:
        words.append(word)
        print(word)
Output will be
Natural Burlap
Navy
Red
The words will also be stored in the words list. I realize this is not a regex, but I want to discourage the use of regex on serialized object notations, as regular expressions are not intended for parsing strings with nested expressions.

Regular Expression for phrases starting with TO

I am pretty new to regular expressions. I want to write a regular expression that matches TO followed by the text after it, even when that text is on the next line. I tried this, but it doesn't work properly:
^TO\n?\s?[A-Za-z0-9]\n?[A-Za-z0-9]
It only highlights TO W11 properly, where everything is on one line. For the first entry it highlights only TO, and for the third entry it highlights only the first line. Basically, it doesn't read across the new lines.
Some of my data looks like this:
TO
EXTERNAL
TRAVERSE
TO W11
TO CONTROL
TRAVERSE
I would appreciate it if anybody could help me.
Make sure you use a multiline regex:
var options = RegexOptions.Multiline;
foreach (Match match in Regex.Matches(input, pattern, options))
...
More at: http://msdn.microsoft.com/en-us/library/yd1hzczs(v=vs.110).aspx
It looks like your pattern isn't matching because the start of the string is really a space and not the T character. Also, [A-Za-z0-9] matches only one character, and you want the whole word. I used the + to denote that I want one or more matches of those characters.
(TO\n?\s?[A-Za-z0-9]+)
This regex matches "TO EXTERNAL", "TO W11" and "TO CONTROL". Be sure to use the global modifier so that you get all matches, not just the first one.
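To see what the pattern returns on the sample data, here is a small Python check. The original thread is about .NET, so this only illustrates the pattern itself; re.findall plays the role of the global modifier, and the whitespace normalisation at the end is an extra step added for readability.

```python
import re

data = "TO\nEXTERNAL\nTRAVERSE\nTO W11\nTO CONTROL\nTRAVERSE"

# findall returns every non-overlapping match, like a global search.
matches = re.findall(r"TO\n?\s?[A-Za-z0-9]+", data)

# The first match still contains the line break; collapse whitespace
# so every result reads as "TO <word>".
normalized = [" ".join(m.split()) for m in matches]

print(matches)
print(normalized)
```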

Regex substring

I'm trying to select a substring using regex and I'm going round in circles. I need to select everything before the first "_".
Example URL - GI_2013_JUNE_10_VOL3_LASTCHANCE
So the result I'm looking for from the URL above would be "GI". The text before the first "_" can vary in length.
Any help would be much appreciated.
The regex would be:
^[^_]+
and grab the whole regex match. But as a comment says, using a substring function is more efficient!
^[^_]*
...is the expression you're looking for.
It basically says: Select everything that is not an underscore, starting at the beginning of the string.
http://regexr.com?356in
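A short Python comparison of the two patterns and the plain substring approach mentioned above. The only difference between + and * shows up when the string starts with an underscore: + finds no match, while * matches the empty string.

```python
import re

url = "GI_2013_JUNE_10_VOL3_LASTCHANCE"

# Regex version: everything from the start up to the first underscore.
prefix_regex = re.match(r"^[^_]+", url).group()

# Substring version: no regex needed.
prefix_split = url.partition("_")[0]

# Edge case: a string that begins with an underscore.
plus_result = re.match(r"^[^_]+", "_ABC")   # None, + needs at least one character
star_result = re.match(r"^[^_]*", "_ABC")   # matches the empty string

print(prefix_regex, prefix_split, plus_result, repr(star_result.group()))
```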

Expressing basic Access query criteria as regular expressions

I'm familiar with Access's query and filter criteria, but I'm not sure how to express similar statements as regular expression patterns. I'm wondering if someone can help relate them to some easy examples that I understand.
If I were using regular expressions to match fields like Access, how would I express the following statements? Examples are similar to those found on this Access Query and Filter Criteria webpage. As in Access, case is insensitive.
"London"
Strings that match the word London exactly.
"London" or "Paris"
Strings that match either the words London or Paris exactly.
Not "London"
Any string but London.
Like "S*"
Any string beginning with the letter s.
Like "*st"
Any string ending with the letters st.
Like "*the*dog*"
Any strings that contain the words 'the' and 'dog' with any characters before, in between, or at the end.
Like "[A-D]*"
Any strings beginning with the letters A through D, followed by anything else.
Not Like "*London*"
Any strings that do not contain the word London anywhere.
Not Like "L*"
Any strings that don't begin with an L.
Like "L*" And Not Like "London*"
Any strings that begin with the letter L but not the word London.
Regexes are much more powerful than any of the patterns you have been using to create criteria in Access SQL. If you limit yourself to these types of patterns, you will miss most of the really interesting features of regexes.
For instance, Access patterns can't handle things like searching for dates, extracting IP addresses, simple email or URL detection or validation, or basic reference code validation (such as checking whether an Order Reference code follows a mandated coding structure, say something like PO123/C456), etc.
As @Smandoli mentioned, you'd better forget your preconceptions about pattern matching and dive into the regex language.
I found the book Mastering Regular Expressions to be invaluable, but tools are the best way to experiment freely with regex patterns; I use RegexBuddy, but there are other tools available.
Basic matches
Now, regarding your list, and using fairly standardized regular expression syntax:
"London"
Strings that match the word London exactly.
^London$
"London" or "Paris"
Strings that match either the words London or Paris exactly.
^(London|Paris)$
Not "London"
Any string but London.
You match for ^London$ and invert the result (NOT)
Like "S*"
Any string beginning with the letter s.
^s
Like "*st"
Any string ending with the letters st.
st$
Like "*the*dog*"
Any strings that contain the words 'the' and 'dog' with any characters before, in between, or at the end.
the.*dog
Like "[A-D]*"
Any strings beginning with the letters A through D, followed by anything else.
^[A-D]
Not Like "*London*"
Any strings that do not contain the word London anywhere.
Reverse the matching result for London (you can use a negative lookahead like
^((?!London).)*$, but I don't think it's available to the more basic Regex engine available to Access).
Not Like "L*"
Any strings that don't begin with an L.
^[^L] (negative matching for single characters is easier than negative matching for a whole word, as we've seen above).
Like "L*" And Not Like "London*"
Any strings that begin with the letter L but not the word London.
^L(?!ondon).*$
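The patterns in this list can be sanity-checked with a few lines of Python. This is only a sketch: Python's re engine is not the VBScript engine Access would use, the sample city names are invented, and the "does not contain" check is written as ^((?!London).)*$ so that the lookahead is tested before each character is consumed. re.IGNORECASE mirrors Access's case-insensitive matching.

```python
import re

def matches(pattern: str, value: str) -> bool:
    # Case-insensitive search, mirroring Access's behaviour.
    return re.search(pattern, value, re.IGNORECASE) is not None

checks = [
    (r"^London$",         "london",              True),
    (r"^London$",         "London Bridge",       False),
    (r"^(London|Paris)$", "Paris",               True),
    (r"^s",               "Sydney",              True),
    (r"st$",              "Budapest",            True),
    (r"the.*dog",         "walk the big dog",    True),
    (r"^[A-D]",           "Berlin",              True),
    (r"^[A-D]",           "Eton",                False),
    (r"^((?!London).)*$", "Greater London area", False),
    (r"^((?!London).)*$", "Paris",               True),
    (r"^[^L]",            "Lyon",                False),
    (r"^L(?!ondon).*$",   "Lyon",                True),
    (r"^L(?!ondon).*$",   "London Bridge",       False),
]

results = [matches(p, v) == expected for p, v, expected in checks]
print(all(results))
```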
Using Regexes in SQL Criteria
In Access, creating a user-defined function that can be used directly in SQL queries is easy.
To use regex matching in your queries, place this function in a module:
' ----------------------------------------------------------------------'
' Return True if the given string value matches the given Regex pattern '
' ----------------------------------------------------------------------'
Public Function RegexMatch(value As Variant, pattern As String) As Boolean
    If IsNull(value) Then Exit Function
    ' Using a static, we avoid re-creating the same regex object for every call '
    Static regex As Object
    ' Initialise the Regex object '
    If regex Is Nothing Then
        Set regex = CreateObject("vbscript.regexp")
        With regex
            .Global = True
            .IgnoreCase = True
            .MultiLine = True
        End With
    End If
    ' Update the regex pattern if it has changed since last time we were called '
    If regex.pattern <> pattern Then regex.pattern = pattern
    ' Test the value against the pattern '
    RegexMatch = regex.test(value)
End Function
Then you can use it in your query criteria, for instance to find, in a PartTable table, all parts whose description matches a variation of screw 18mm, such as Pan Head Screw length 18 mm or even SCREW18mm.
SELECT PartNumber, Description
FROM PartTable
WHERE RegexMatch(Description, "screw.*?\d+\s*mm")
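As a quick check of the criteria pattern outside Access, the sketch below assumes the intended pattern is screw.*?\d+\s*mm (digits written as \d+) and tests it against the two descriptions mentioned above plus one invented non-matching part. Python's engine is used here, not VBScript's, so treat it as an illustration of the pattern only.

```python
import re

# IGNORECASE mirrors the .IgnoreCase = True setting of the VBA function.
screw_pattern = re.compile(r"screw.*?\d+\s*mm", re.IGNORECASE)

descriptions = [
    "Pan Head Screw length 18 mm",
    "SCREW18mm",
    "Hex nut 18 mm",  # invented example that should not match
]

matching = [d for d in descriptions if screw_pattern.search(d)]
print(matching)
```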
Caveat
Because the regex matching uses old scripting libraries, the flavour of regex available is a bit more limited than the one found in .NET.
It's still fairly powerful, as it is more or less the same as the one used by JavaScript.
Read about the VBScript regex engine to check what you can and cannot do.
The worst part, though, is probably that regex matching with this library is fairly slow, so you should be very careful not to overuse it.
That said, it can be very useful sometimes. For instance, I used regexes to sanitize data input from users and detect entries with similar patterns that should have been normalised.
Well used, regexes can enhance data consistency, but use sparingly.
Regex is difficult to break into initially. Honestly, looking for spoon-fed examples is not going to help as much as "getting your hands dirty" with it. Also, MS Access is not a good springboard. Regex doesn't "cognate" well with the SQL query process -- not in application, and not in mental orientation. What you need is some text files to process, using a text editor.
Our solution was to open the Excel file in OpenCalc (part of Apache OpenOffice, https://www.openoffice.org/) which provides what seems like full regular expressions for both the find and replace.
We test the regular expressions at http://regexr.com/