Find and Replace REGEX results with new string - regex

I want to replace a 10 digits pictureID number to a single text string in my WP-database (wp_post field: post_content)
pictureid=0001234567 (where the last 7 digits are different for every photo)
to a single value:
sourceids=2518
When I query for the pictureID numbers wit REGEX it seems te return al the records I want to change.
SELECT * FROM `wp_posts` WHERE `post_content` REGEXP 'pictureid=000[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Next: what to do to change pictureID in those records found to the sourceids=2518
I did try
update wp_posts set post_content = replace(post_content, 'REGEXP 'pictureid=000[0-9][0-9][0-9][0-9][0-9][0-9][0-9]'','sourceids=2518');
but this won't work

Use REGEXP_REPLACE(pictureid,'000[0-9][0-9][0-9][0-9][0-9][0-9][0-9]',sourceid)

Sorry the reply is poorly formatted so will do it this way
Its not working,I did the following: testing REGEXP:
SELECT * FROM wp_posts WHERE post_content REGEXP 'pictureid=0001119708' = WORKING
SELECT * FROM wp_posts WHERE post_content REGEXP 'pictureid=000[0-9][0-9][0-9][0-9][0-9][0-9][0-9]' = WORKING
Trying to replace 'pictureid=000#######' (where # is any numeric value, example: 00012345670 by this single value 'sourceids-2518'
SELECT * FROM wp_posts WHERE post_content REGEXP_REPLACE ('pictureid=000[0-9][0-9][0-9][0-9][0-9][0-9][0-9]','sourceids=2518') = NOT WORKING

Related

Optional parts of regex pattern in vba

I am trying to build regex pattern for the text like that
numb contra: 1.29151306 number mafo: 66662308
numb contra 1.30789668number mafo 60.046483
numb contra/ 1.29154056 number mafo: 666692638
numb contra 137459625
mafo: 666692638
mafo: 666692638 numb contra/ 1.29154056
Here's the pattern I could build
contra?.\s+?(\d+\.?\d+)(.+mafo.?\s+(\d+\.?\d+))?
It works fine for all the lines except the last one. How can I implement all the possibilities to include the last line too?
Please have a look at this link
https://regex101.com/r/pSThAU/1
All is OK as for contra but not as for mafo
I think the key here is to make your regexp do less and your vba do more. What I think I see here is either the word 'mafo' or 'contra' and a number following. Don't know what order or whether each is present or how many times. So you can scan each of your strings for ALL occurrences with a regexp like this:
(?:^|[^A-Z])(?:(mafo)|(contra))[^A-Z]\s*(\d*\.?\d+)
Then process it with some VBA code like this that I created in Excel:
Sub BreakItUp()
Dim rg As RegExp, scanned As MatchCollection, eachMatch As Match, i As Long, col As Long
Set rg = New RegExp
rg.Pattern = "(?:^|[^A-Z])(?:(mafo)|(contra))[^A-Z]\s*(\d*\.?\d+)"
rg.IgnoreCase = True
rg.Global = True
i = 1
Do While (Not IsEmpty(ActiveSheet.Cells(i, 1).Value))
Set scanned = rg.Execute(ActiveSheet.Cells(i, 1).Value)
col = 2
For Each eachMatch In scanned
ActiveSheet.Cells(i, col).Value = eachMatch.SubMatches(0) & eachMatch.SubMatches(1)
ActiveSheet.Cells(i, col + 1).Value2 = "'" & eachMatch.SubMatches(2)
col = col + 2
Next eachMatch
i = i + 1
Loop
End Sub
That MatchCollection object will get one item for each Match that occurs and the subMatches array contains each capturing group. You should be able write your own logic within this processing loop to interpret what was extracted. When I ran it on your data it created all the fields in blue:
Notice I added a line to your data that had two contra entries and one mafo and it found all the occurrences. You should be able to modify this to interpret the meanings.

Regex extract between trible double quotes and newlines

For example i want to parse python file with text between triple double quotes and make html table from this text.
Text block for example like that
"""
Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'
Replaces equals operator ('=') with 'BETWEEN # AND #'
Tested against:
* Microsoft SQL Server 2005
* MySQL 4, 5.0 and 5.5
* Oracle 10g
* PostgreSQL 8.3, 8.4, 9.0
Requirement:
* Microsoft Access
Notes:
* Useful to bypass weak and bespoke web application firewalls that
filter the greater than character
* The BETWEEN clause is SQL standard. Hence, this tamper script
should work against all (?) databases
>>> tamper('1 AND A > B--')
'1 AND A NOT BETWEEN 0 AND B--'
>>> tamper('1 AND A = B--')
'1 AND A BETWEEN B AND B--'
"""
Html table must be simple table contains 5 columns
Column everything between """ and \n if new line is empty
Column everything between Tested against: and \n if new line is empty or Requirement: and \n if new line is empty
Column everything between Notes: and \n if new line is empty
Column everything between >>> and \n
Column everything between 4 column end and \n
So result must be:
Replaces greater than operator ('>') with 'NOT BETWEEN 0 AND #'
Replaces equals operator ('=') with 'BETWEEN # AND #'
Microsoft SQL Server 2005
MySQL 4, 5.0 and 5.5
Oracle 10g
PostgreSQL 8.3, 8.4, 9.0
or
Microsoft Access
Useful to bypass weak and bespoke web application firewalls that
filter the greater than character
The BETWEEN clause is SQL standard. Hence, this tamper script
should work against all (?) databases
tamper('1 AND A > B--')
tamper('1 AND A = B--')
'1 AND A NOT BETWEEN 0 AND B--'
'1 AND A BETWEEN B AND B--'
What kind of syntax can i use to extract that?
I will use VBScript.RegExp .
Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile("C:\path\to\your.py").ReadAll
Set re = New RegExp
re.Pattern = """([^""]*)"""
re.Global = True
For Each m In re.Execute(txt)
WScript.Echo m.SubMatches(0)
Next
Your question is quite broad, so I'll just outline a way to deal with this. Otherwise I'd have to write the whole script for you, which isn't going to happen.
Extract everything between the docquotes. Use a regular expression like this to extract the text between the docquotes:
Set re1 = New RegExp
re1.Pattern = """""""([\s\S]*?)"""""""
For Each m In re1.Execute(txt)
docstr = m.SubMatches(0)
Next
Note that you need to set the re.Global to True if you have more than 1 docstring in your file and want all of them processed. Otherwise you'll get just the first match.
Remove leading and trailing whitespace with a second regular expression:
Set re2 = New RegExp
re2.Pattern = "^\s*|\s*$"
re2.Global = True 'find all matches
docstr = re2.Replace(docstr, "")
You can't use Trim for this, because the function handles only spaces, not other whitespace.
Either split the string at 2+ consecutive line breaks to get the doc sections, or use another regular expression to extract them:
Set re3 = New RegExp
re3.Pattern = "([\s\S]*?)\r\n\r\n" +
"Tested against:\r\n([\s\S]*?)\r\n\r\n" +
...
For Each m In re3.Execute(txt)
descr = m.SubMatches(0)
tested = m.SubMatches(1)
...
Next
Continue breaking down the sections until you have the elements you want to display. Then build the HTML from these elements.

How can I separate a string by underscore (_) in google spreadsheets using regex?

I need to create some columns from a cell that contains text separated by "_".
The input would be:
campaign1_attribute1_whatever_yes_123421
And the output has to be in different columns (one per field), with no "_" and excluding the final number, as it follows:
campaign1 attribute1 whatever yes
It must be done using a regex formula!
help!
Thanks in advance (and sorry for my english)
=REGEXEXTRACT("campaign1_attribute1_whatever_yes_123421","(("&REGEXREPLACE("campaign1_attribute1_whatever_yes_123421","((_)|(\d+$))",")$1(")&"))")
What this does is replace all the _ with parenthesis to create capture groups, while also excluding the digit string at the end, then surround the whole string with parenthesis.
We then use regex extract to actuall pull the pieces out, the groups automatically push them to their own cells/columns
To solve this you can use the SPLIT and REGEXREPLACE functions
Solution:
Text - A1 = "campaign1_attribute1_whatever_yes_123421"
Formula - A3 = =SPLIT(REGEXREPLACE(A1,"_+\d*$",""), "_", TRUE)
Explanation:
In cell A3 We use SPLIT(text, delimiter, [split_by_each]), the text in this case is formatted with regex =REGEXREPLACE(A1,"_+\d$","")* to remove 123421, witch will give you a column for each word delimited by ""
A1 = "campaign1_attribute1_whatever_yes_123421"
A2 = "=REGEXREPLACE(A1,"_+\d*$","")" //This gives you : *campaign1_attribute1_whatever_yes*
A3 = SPLIT(A2, "_", TRUE) //This gives you: campaign1 attribute1 whatever yes, each in a separate column.
I finally figured it out yesterday in stackoverflow (spanish): https://es.stackoverflow.com/questions/55362/c%C3%B3mo-separo-texto-por-guiones-bajos-de-una-celda-en...
It was simple enough after all...
The reason I asked to be only in regex and for google sheets was because I need to use it in Google data studio (same regex functions than spreadsheets)
To get each column just use this regex extract function:
1st column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){0}([^_]*)_')
2nd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){1}([^_]*)_')
3rd column: REGEXP_EXTRACT(Campaña, '^(?:[^_]*_){2}([^_]*)_')
etc...
The only thing that has to be changed in the formula to switch columns is the numer inside {}, (column number - 1).
If you do not have the final number, just don't put the last "_".
Lastly, remember to do all the calculated fields again, because (for example) it gets an error with CPC, CTR and other Adwords metrics that are calculated automatically.
Hope it helps!

Postgresql - How do I extract the first occurence of a substring in a string using a regular expression pattern?

I am trying to extract a substring from a text column using a regular expression, but in some cases, there are multiple instances of that substring in the string.
In those cases, I am finding that the query does not return the first occurrence of the substring. Does anyone know what I am doing wrong?
For example:
If I have this data:
create table data1
(full_text text, name text);
insert into data1 (full_text)
values ('I 56, donkey, moon, I 92')
I am using
UPDATE data1
SET name = substring(full_text from '%#"I ([0-9]{1,3})#"%' for '#')
and I want to get 'I 56' not 'I 92'
You can use regexp_matches() instead:
update data1
set full_text = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1];
As no additional flag is passed, regexp_matches() only returns the first match - but it returns an array so you need to pick the first (and only) element from the result (that's the [1] part)
It is probably a good idea to limit the update to only rows that would match the regex in the first place:
update data1
set full_text = (regexp_matches(full_text, 'I [0-9]{1,3}'))[1]
where full_text ~ 'I [0-9]{1,3}'
Try the following expression. It will return the first occurrence:
SUBSTRING(full_text, 'I [0-9]{1,3}')
You can use regexp_match() In PostgreSQL 10+
select regexp_match('I 56, donkey, moon, I 92', 'I [0-9]{1,3}');
Quote from documentation:
In most cases regexp_matches() should be used with the g flag, since
if you only want the first match, it's easier and more efficient to
use regexp_match(). However, regexp_match() only exists in PostgreSQL
version 10 and up. When working in older versions, a common trick is
to place a regexp_matches() call in a sub-select...

Replace using RegEx outside of text markers

I have the following sample text and I want to replace '[core].' with something else but I only want to replace it when it is not between text markers ' (SQL):
PRINT 'The result of [core].[dbo].[FunctionX]' + [core].[dbo].[FunctionX] + '.'
EXECUTE [core].[dbo].[FunctionX]
The Result shoud be:
PRINT 'The result of [core].[dbo].[FunctionX]' + [extended].[dbo].[FunctionX] + '.'
EXECUTE [extended].[dbo].[FunctionX]
I hope someone can understand this. Can this be solved by a regular expression?
With RegLove
Kevin
Not in a single step, and not in an ordinary text editor. If your SQL is syntactically valid, you can do something like this:
First, you remove every string from the SQL and replace with placeholders. Then you do your replace of [core] with something else. Then you restore the text in the placeholders from step one:
Find all occurrences of '(?:''|[^'])+' with 'n', where n is an index number (the number of the match). Store the matches in an array with the same number as n. This will remove all SQL strings from the input and exchange them for harmless replacements without invalidating the SQL itself.
Do your replace of [core]. No regex required, normal search-and-replace is enough here.
Iterate the array, replacing the placeholder '1' with the first array item, '2' with the second, up to n. Now you have restored the original strings.
The regex, explained:
' # a single quote
(?: # begin non-capturing group
''|[^'] # either two single quotes, or anything but a single quote
)+ # end group, repeat at least once
' # a single quote
JavaScript this would look something like this:
var sql = 'your long SQL code';
var str = [];
// step 1 - remove everything that looks like an SQL string
var newSql = sql.replace(/'(?:''|[^'])+'/g, function(m) {
str.push(m);
return "'"+(str.length-1)+"'";
});
// step 2 - actual replacement (JavaScript replace is regex-only)
newSql = newSql.replace(/\[core\]/g, "[new-core]");
// step 3 - restore all original strings
for (var i=0; i<str.length; i++){
newSql = newSql.replace("'"+i+"'", str[i]);
}
// done.
Here is a solution (javascript):
str.replace(/('[^']*'.*)*\[core\]/g, "$1[extended]");
See it in action