Regex MySQL find words separated by punctuation - regex

Take an input like "This is, in text, an example! Cool stuff"
I have some C# code that takes it, removes the punctuation, splits on the spaces, and returns the first 6 elements:
var title = new string(input.Where(c => !char.IsPunctuation(c)).ToArray()).Split(' ').Take(6);
so I get an array of:
["This", "is", "in", "text", "an", "example"]
From that array, how can I work backwards to match it against the original input? I've tried:
'This|is|in|text|an|example' but it's not precise enough, since I think it's matching with ORs instead of ANDs.
I'm going to use the regex expression in an SQL query, something like:
SELECT t.*, Max(e.Timestamp) As EventUpdated, Min(e.Timestamp) as Timestamp
From test t
Left Join edithistory e on t.IdTimelineinfo = e.IdTimelineinfo
where t.date = "2020-12-06" and t.Title REGEXP 'Testing|two|events|on|the';
I'm really new to regex and would appreciate any help.

I ended up using a regex like the following:
var DbTitle = string.Join("[^a-zA-Z]*", ArrayOfWords);
var title = $"[^a-zA-Z]*{DbTitle}";
SELECT t.*, Max(e.Timestamp) As EventUpdated, Min(e.Timestamp) as Timestamp
From test t
Left Join edithistory e on t.IdTimelineinfo = e.IdTimelineinfo
where t.date = @date and t.Title Regexp @title and Confirmed = 1;
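For illustration, here is a minimal Python sketch of the same idea. The answer above is C# plus MySQL; the helper name build_title_pattern is made up for this example, and Python's re module only stands in for MySQL's REGEXP, so edge-case behaviour may differ. It joins the words with [^a-zA-Z]* so that they must appear in order with only non-letters between them, and contrasts that with the alternation, which matches as soon as any single word occurs.

```python
import re

def build_title_pattern(words):
    """Join words so they must appear in order, separated only by non-letters."""
    separator = "[^a-zA-Z]*"
    # re.escape guards against words that contain regex metacharacters.
    return separator + separator.join(re.escape(word) for word in words)

words = ["This", "is", "in", "text", "an", "example"]
original = "This is, in text, an example! Cool stuff"

sequence_pattern = build_title_pattern(words)
alternation_pattern = "|".join(words)

# The sequence pattern matches the original title...
matches_original = re.search(sequence_pattern, original) is not None
# ...but not an unrelated title that merely shares a word.
matches_unrelated = re.search(sequence_pattern, "Nothing in common") is not None
# The alternation matches the unrelated title too, because "in" occurs in it.
alternation_matches_unrelated = (
    re.search(alternation_pattern, "Nothing in common") is not None
)
```

The pattern is not anchored, so it can still match in the middle of a longer title; adding ^ at the start would tighten it if that matters.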

Related

Replace Invalid Email Format with a Valid One

In Power Query, I have a list of emails that includes invalid ones. I would like to use M code to identify and "fix" them. For example, my email list includes something like "1234.my_email_gmail_com#error.invalid.com".
I am looking for Power Query to find similar email addresses, then produce an output of a valid email. For the example above, it should be "my_email#gmail.com"
Essentially, I want to do the following:
Remove the digits at the front (number of digits varies)
Remove the "#error.invalid.com"
Replace the first underscore "_" from the right to "."
Replace the second underscore "_" from the right to "#"
I'm still new to Power Query, especially M code. I appreciate any help and guidance I can get.
Try the function cleanEmailAddress below:
let
    cleanEmailAddress = (invalidEmailAddress as text) as text =>
        let
            removeLeadingNumbers = Text.AfterDelimiter(invalidEmailAddress, "."), // Assumes invalid numbers are followed by "." which itself also needs removing.
            removeInvalidDomain = Text.BeforeDelimiter(removeLeadingNumbers, "#"),
            replaceLastOccurrence = (someText as text, oldText as text, newText as text) as text =>
                let
                    lastPosition = Text.PositionOf(someText, oldText, Occurrence.Last),
                    replaced = if lastPosition >= 0 then Text.ReplaceRange(someText, lastPosition, Text.Length(oldText), newText) else someText
                in
                    replaced,
            overwriteTopLevelDomainSeparator = replaceLastOccurrence(removeInvalidDomain, "_", "."),
            overwriteAtSymbol = replaceLastOccurrence(overwriteTopLevelDomainSeparator, "_", "#")
        in
            overwriteAtSymbol,
    cleaned = cleanEmailAddress("1234.my_email_gmail_com#error.invalid.com")
in
    cleaned
Regarding:
"Remove the digits at the front (number of digits varies)"
Your question doesn't mention what to do with the leading . (which remains if you remove only the leading digits), but your expected output ("my_email#gmail.com") suggests it should be removed too. Email addresses that do not have a . immediately after the leading digits will not be handled correctly, and the removeLeadingNumbers expression would need to be improved to cover them.
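To make the four steps easier to follow outside Power Query, here is a rough Python sketch of the same logic. It is not M code and not a drop-in replacement: the function names are invented for this example, and the separator defaults to "#" only because that is how the sample addresses appear in the question. Unlike the M version, it strips the leading digits with a regex, so a missing "." after the digits is tolerated.

```python
import re

def replace_last(text, old, new):
    """Replace only the last occurrence of old in text (no-op if absent)."""
    head, found, tail = text.rpartition(old)
    return head + new + tail if found else text

def clean_email_address(raw, separator="#"):
    # Step 1: drop the leading digits and the "." that may follow them.
    without_digits = re.sub(r"^\d+\.?", "", raw)
    # Step 2: drop the separator and the invalid domain after it.
    local_part = without_digits.split(separator, 1)[0]
    # Step 3: the last underscore becomes the "." before the top-level domain.
    with_dot = replace_last(local_part, "_", ".")
    # Step 4: the next-to-last underscore becomes the separator.
    return replace_last(with_dot, "_", separator)

cleaned = clean_email_address("1234.my_email_gmail_com#error.invalid.com")
```

Steps 3 and 4 must run in this order, because each call only looks for whatever underscore is currently last.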
This seems to work too:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Added Custom" = Table.AddColumn(Source, "Valid", each Text.ReplaceRange(Text.ReplaceRange(Text.BetweenDelimiters([Column1],".","#"),Text.PositionOf(Text.BetweenDelimiters([Column1],".","#"),"_",Occurrence.Last),1,"."),Text.PositionOf(Text.ReplaceRange(Text.BetweenDelimiters([Column1],".","#"),Text.PositionOf(Text.BetweenDelimiters([Column1],".","#"),"_",Occurrence.Last),1,"."),"_",Occurrence.Last),1,"#"))
in
#"Added Custom"

Find and Replace REGEX results with new string

I want to replace a 10-digit pictureID number with a single text string in my WP database (table wp_posts, field post_content):
pictureid=0001234567 (where the last 7 digits are different for every photo)
to a single value:
sourceids=2518
When I query for the pictureID numbers with REGEXP, it seems to return all the records I want to change.
SELECT * FROM `wp_posts` WHERE `post_content` REGEXP 'pictureid=000[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Next: how do I change the pictureID in the records found to sourceids=2518?
I tried
update wp_posts set post_content = replace(post_content, 'REGEXP 'pictureid=000[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'','sourceids=2518');
but this won't work
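Use REGEXP_REPLACE on the column itself (available in MySQL 8.0+ and MariaDB 10.0.5+):
UPDATE wp_posts SET post_content = REGEXP_REPLACE(post_content, 'pictureid=000[0-9]{7}', 'sourceids=2518') WHERE post_content REGEXP 'pictureid=000[0-9]{7}';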
Sorry, the reply was poorly formatted, so I'll do it this way.
It's not working. I did the following to test REGEXP:
SELECT * FROM wp_posts WHERE post_content REGEXP 'pictureid=0001119708' = WORKING
SELECT * FROM wp_posts WHERE post_content REGEXP 'pictureid=000[0-9][0-9][0-9][0-9][0-9][0-9][0-9]' = WORKING
Trying to replace 'pictureid=000#######' (where # is any digit, for example 0001234567) with the single value 'sourceids=2518':
SELECT * FROM wp_posts WHERE post_content REGEXP_REPLACE ('pictureid=000[0-9][0-9][0-9][0-9][0-9][0-9][0-9]','sourceids=2518') = NOT WORKING
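As a way to check the pattern before touching the database, here is a small Python sketch of the substitution being asked for. It only models the text transformation; the sample post content and the function name are made up, and the trailing (?![0-9]) lookahead is an extra safeguard that is not part of the original pattern. The last query above fails because REGEXP_REPLACE is a function that returns the rewritten string, not a comparison operator that can sit between a column and its arguments in a WHERE clause.

```python
import re

# "pictureid=000" followed by exactly seven digits; the lookahead keeps the
# pattern from matching the first ten digits of a longer number.
PICTURE_ID = re.compile(r"pictureid=000[0-9]{7}(?![0-9])")

def replace_picture_ids(post_content, replacement="sourceids=2518"):
    """Return post_content with every matching picture id replaced."""
    return PICTURE_ID.sub(replacement, post_content)

# Hypothetical post content, for illustration only.
sample = "[gallery pictureid=0001234567] and [gallery pictureid=0007654321]"
updated = replace_picture_ids(sample)
```

Running the SELECT with the same pattern first, as done above, is a reasonable way to preview which rows an UPDATE would touch.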

How can I replace a string which comes with some pattern and keeping the rest same?

I have a string something like
val changeMe= "select myTable.x,myTable.y,myTable.myTable from myTable join myTable2 ON myTable.x = myTable2.x"
I just want to replace the table name myTable with another string such as myTable3, while keeping the column name in myTable.myTable unchanged, so the output string should be:
val outputString= "select myTable3.x,myTable3.y,myTable3.myTable from myTable3 join myTable2 ON myTable3.x = myTable2.x"
How can I do that using regex in Scala?
Thanks.
Use a lookbehind.
(?<!\\.)\\bmyTable\\b
See demo.
https://regex101.com/r/OFNGdK/1
You can use : (?<=[, ])myTable(?=[^2])
Sounds like a job for replaceAll().
changeMe.replaceAll("(?<![.])myTable(?!2)", "myTable3")
//res0: String = select myTable3.x,myTable3.y,myTable3.myTable from myTable3 join myTable2 ON myTable3.x = myTable2.x
Use negative look-behind and look-ahead to help isolate the target string from its imitators.
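The same look-behind idea can be tried quickly in Python, whose re module supports the same constructs for this case. This is only a sketch for experimenting with the pattern, not Scala code; the variable names are adapted from the question. The negative look-behind (?<!\.) rejects myTable when it follows a dot (the column name), and the trailing \b rejects myTable2.

```python
import re

change_me = (
    "select myTable.x,myTable.y,myTable.myTable from myTable "
    "join myTable2 ON myTable.x = myTable2.x"
)

# Not preceded by "." (so the column name is kept) and bounded on both
# sides (so myTable2 is left alone).
table_name = re.compile(r"(?<!\.)\bmyTable\b")

output_string = table_name.sub("myTable3", change_me)
```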

compare list items against another list

So let's say I have a 3-item list:
myString = "prop zebra cool"
items = myString.split(" ")
#items = ["prop", "zebra", "cool"]
And another list, content, containing hundreds of string items. It's actually a list of files.
Now I want to get only the items of content that contain all of the items.
So I started this way:
assets = []
for c in content:
    for item in items:
        if item in c:
            assets.append(c)
And then somehow isolate only the items that are duplicated in the assets list.
This would work fine, but I don't like it; it's not elegant, and I'm sure there is some other way to deal with this in Python.
If I interpret your question correctly, you can use all.
In your case, assuming:
content = [
"z:/prop/zebra/rig/cool_v001.ma",
"sjasdjaskkk",
"thisIsNoGood",
"shakalaka",
"z:/prop/zebra/rig/cool_v999.ma"
]
string = "prop zebra cool"
You can do the following:
assets = []
matchlist = string.split(' ')
for c in content:
    # keep c only if every search word occurs somewhere in it
    if all(s in c for s in matchlist):
        assets.append(c)
print(assets)
Alternative Method
If you want to have more control (ie. you want to make sure that you only match strings where your words appear in the specified order), then you could go with regular expressions:
import re
# convert content to a single, tab-separated, string
contentstring = '\t'.join(content)
# generate a regex string to match
matchlist = [r'(?:{0})[^\t]+'.format(s) for s in string.split(' ')]
matchstring = r'([^\t]+{0})'.format(''.join(matchlist))
assets = re.findall(matchstring, contentstring)
print(assets)
Assuming \t does not appear in the strings of content, you can use it as a separator and join the list into a single string (obviously, you can pick any other separator that better suits you).
Then you can build your regex so that it matches any substring containing your words and any other character, except \t.
In this case, matchstring results in:
([^\t]+(?:prop)[^\t]+(?:zebra)[^\t]+(?:cool)[^\t]+)
where:
(?:word) is a non-capturing group: word must match, but it is not returned as a separate group
[^\t]+ matches one or more characters other than \t
the outer () captures and returns each whole string matching your rule (in this case z:/prop/zebra/rig/cool_v001.ma and z:/prop/zebra/rig/cool_v999.ma)
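If joining everything into one tab-separated string feels awkward, a variation on the same idea is to test each item separately. The sketch below is an alternative, not the method described above: the function name is made up, and it differs slightly in that it does not require extra characters before the first word or after the last one. It uses re.escape so that search words containing regex metacharacters are taken literally.

```python
import re

def filter_in_order(content, words):
    """Keep the items in which all words occur, in the given order."""
    # Escaped words joined by ".*": each word must follow the previous one.
    pattern = re.compile(".*".join(re.escape(word) for word in words))
    return [item for item in content if pattern.search(item)]

content = [
    "z:/prop/zebra/rig/cool_v001.ma",
    "sjasdjaskkk",
    "thisIsNoGood",
    "shakalaka",
    "z:/prop/zebra/rig/cool_v999.ma",
]

assets = filter_in_order(content, "prop zebra cool".split(" "))
reversed_order = filter_in_order(content, "cool zebra prop".split(" "))
```

Here assets holds the two .ma paths, while reversed_order is empty because the words never occur in that order.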

How do I match a pattern which can be anywhere in a string but not between two words?

Say I've two strings,
$string1 = "select * from auditor where name in (select 'Jack' name from employee where id = 1)";
$string2 = "select * from employee where name = 'Jack'";
Now I need a regex to find anything enclosed in single quotes, but only in the where clause; i.e., $string1 should not match, because the single quotes are used in the select clause, while $string2 should match, because they are used in the where clause.
I tried
(?!select .*\'(.*)\' where)where (.*\'(.*)\')
The correct advice if the queries can be arbitrarily complex is to use a parser because SQL is context-sensitive and thus beyond the power of sane regexes.
If the where clauses will always be simple, you can constrain the patterns with, for example
if (/\bwhere\s+\w+\s+=\s+'([^']*)'/i) {
    print " MATCH [$1]\n";
}
to look for a quoted string in a clause of the form where column = 'foo'.
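For anyone who wants to try the constrained pattern outside Perl, here is a rough Python equivalent run against the two strings from the question. The function name is invented for this sketch, and it inherits the same limitation: it only recognises clauses of the exact form where column = 'value'.

```python
import re

# where <column> = '<value>', case-insensitive, capturing only the value.
SIMPLE_WHERE = re.compile(r"\bwhere\s+\w+\s+=\s+'([^']*)'", re.IGNORECASE)

def quoted_where_values(sql):
    """Return the quoted values found in simple where clauses."""
    return SIMPLE_WHERE.findall(sql)

string1 = ("select * from auditor where name in "
           "(select 'Jack' name from employee where id = 1)")
string2 = "select * from employee where name = 'Jack'"

values1 = quoted_where_values(string1)
values2 = quoted_where_values(string2)
```

values1 is empty because neither where in string1 is followed by column = 'value', while values2 contains Jack.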
You can try this approach:
where(?!.*(?:select|insert|update|delete)).*?'([^']+)'
and get the result in group 1
For the following 3 input:
$string1 = "select * from auditor where name in (select 'Jack0' name from employee where id = 1)";
$string2 = "select * from auditor where name in (blablabla 'Jack1' name from employee where id = 'jack2')";
$string3 = "select * from employee where name = 'Jack3'";
Output:
You will get Jack1, jack2 and Jack3 in the first capture group, but not Jack0.
P.S. Note that you don't need insert/update/delete; they were put in the regex just to make it a bit more generic.
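The behaviour described for the three inputs can be checked with a short Python sketch. It uses a variant of the lookahead pattern in which the keyword alternatives are grouped explicitly inside the lookahead, so that .* applies to all of them; the function name is made up for this example. As with any regex approach to SQL, it is a heuristic and will misjudge queries that do not look like these samples.

```python
import re

# A "where" not followed later on the line by select/insert/update/delete,
# then the nearest single-quoted value (captured in group 1).
WHERE_VALUE = re.compile(
    r"where(?!.*(?:select|insert|update|delete)).*?'([^']+)'"
)

def where_values(sql):
    """Return every group-1 capture found in sql, scanning left to right."""
    return WHERE_VALUE.findall(sql)

string1 = ("select * from auditor where name in "
           "(select 'Jack0' name from employee where id = 1)")
string2 = ("select * from auditor where name in "
           "(blablabla 'Jack1' name from employee where id = 'jack2')")
string3 = "select * from employee where name = 'Jack3'"

results = [where_values(s) for s in (string1, string2, string3)]
```

Getting both values out of string2 relies on repeated (global) matching, which findall does here.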