Regular expression: data between brackets (mysqli > pdo) - regex

For migrating a system to PDO, we would like to replace some queries. A regular expression would speed up the process. We are looking for a pattern that matches:
mysqli_query($db, [expression follows here])
So, we made use of the following regex:
mysqli_query\(\$db, (.*?)\)
The problem is that this breaks when the query itself contains an opening and closing parenthesis, for example in a JOIN, because the lazy (.*?) stops at the first ).
Example: mysqli_query($db, "SELECT users.id FROM users JOIN (... ) on .. WHERE users.id=1")
Is it possible to edit the regex so that it allows a ) when a matching ( was opened before it, i.e. so that the number of ( and ) inside the argument must be equal?

You need an editor with PCRE-compatible regexes (Dreamweaver probably only supports JavaScript-style regexes); then you can use a recursive regex like this:
mysqli_query\(\$db, ((?:[^()]++|\((?1)\))*)\)
Test it live on regex101.com.
Explanation:
( # Match and capture in group 1:
(?: # Start of non-capturing group that either matches
[^()]++ # a sequence of characters except parentheses
| # or
\( # an opening parenthesis
(?1) # followed by an expression that follows the same rules as group 1
\) # and a closing parenthesis.
)* # Do this any number of times (including 0)
) # End of group 1
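Python's built-in re module does not support recursion such as (?1), so the recursive pattern cannot be used there directly. As a hedged alternative, the sketch below does the same balancing by hand: it counts ( and ) after the prefix mysqli_query($db, and stops at the parenthesis that closes the call. The function name extract_query_args is hypothetical. Like the regex, it does not treat quotes specially, so a lone ) inside a string literal will still cut the match short.

```python
def extract_query_args(source: str, prefix: str = "mysqli_query($db, ") -> list:
    """Return every argument text that follows `prefix` up to the balancing ')'.

    Sketch only: parentheses inside string literals are counted too,
    exactly as the recursive regex does.
    """
    results = []
    start = source.find(prefix)
    while start != -1:
        i = start + len(prefix)
        depth = 1  # the '(' inside the prefix is already open
        j = i
        while j < len(source) and depth > 0:
            if source[j] == "(":
                depth += 1
            elif source[j] == ")":
                depth -= 1
            j += 1
        if depth == 0:
            # j is one past the closing ')', so leave it out of the slice
            results.append(source[i:j - 1])
        start = source.find(prefix, j)
    return results


php = 'mysqli_query($db, "SELECT users.id FROM users JOIN (SELECT 1) t WHERE users.id=1");'
print(extract_query_args(php))
```

Unbalanced input (no closing parenthesis) is simply skipped rather than raising an error.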

Try the following: mysqli_query\(\$db, ('|")(.+?)\1\)[\s;]. This assumes that the expression will be between single or double quotes. This might fail depending on what content comes after it though, and for some notations within the SQL query itself (e.g. if it encounters \')).
https://regex101.com/r/1NRYUw/1
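The quote-based pattern also runs unchanged in Python's re module. This sketch (the sample strings are made up for illustration) shows the normal case and the \') caveat from the answer: an escaped quote followed by ) and a space ends the match early.

```python
import re

# The quote-based pattern; \1 re-uses whichever quote opened the SQL string.
QUOTED_QUERY = re.compile(r"""mysqli_query\(\$db, ('|")(.+?)\1\)[\s;]""")

ok = 'mysqli_query($db, "SELECT users.id FROM users JOIN (SELECT 1) t WHERE users.id=1");'
print(QUOTED_QUERY.search(ok).group(2))

# Escaped quote + ')' + space: the lazy (.+?) stops right before \'
tricky = r"mysqli_query($db, 'SELECT \') FROM t');"
print(QUOTED_QUERY.search(tricky).group(2))
```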

Related

How to comment SQL statements in Notepad++?

How can I "block comment" SQL statements in Notepad++?
For example:
CREATE TABLE gmr_virtuemart_calc_categories (
id int(1) UNSIGNED NOT NULL,
virtuemart_calc_id int(1) UNSIGNED NOT NULL DEFAULT '0',
virtuemart_category_id int(1) UNSIGNED NOT NULL DEFAULT '0'
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
It should be wrapped with /* at the start and */ at the end using regex in Notepad++ to produce:
/*CREATE TABLE ... (...) ENGINE=MyISAM DEFAULT CHARSET=utf8;*/
You only offer one sample input, so I am forced to build the pattern literally. If this pattern isn't suitable because there are alternative queries and/or other interfering text, then please update your question.
Tick the "Match case" box.
Find what: (CREATE[^;]+;) Replace with: /*$1*/
Otherwise, you can use this for SQL statements that start with a capital letter and end with a semicolon:
Find what: ([A-Z][^;]+;) Replace with: /*$1*/
To improve accuracy, you might include ^ start of line anchors or add \r\n after the semi-colon or match the CHARSET portion before the semi-colon. There are several adjustments that can be made. I cannot be confident of accuracy without knowing more about the larger body of text.
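If the same replacement is needed outside Notepad++, here is a minimal Python sketch of the first Find/Replace pair, run on the sample from the question. Python's re.sub writes the $1 group reference as \1, and it is case-sensitive by default, which corresponds to ticking "Match case".

```python
import re

sql = """CREATE TABLE gmr_virtuemart_calc_categories (
id int(1) UNSIGNED NOT NULL,
virtuemart_calc_id int(1) UNSIGNED NOT NULL DEFAULT '0',
virtuemart_category_id int(1) UNSIGNED NOT NULL DEFAULT '0'
) ENGINE=MyISAM DEFAULT CHARSET=utf8;"""

# Same pattern as "Find what"; [^;]+ also crosses newlines, so the whole statement is captured.
commented = re.sub(r"(CREATE[^;]+;)", r"/*\1*/", sql)
print(commented)
```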
You could use a recursive regex.
Notepad++ uses the Boost regex engine; this pattern works with both Boost and PCRE.
https://regex101.com/r/P75bXC/1
Find (?s)(CREATE\s+TABLE[^(]*(\((?:[^()']++|'.*?'|(?2))*\))(?:[^;']|'.*?')*;)
Replace /*$1*/
Explained
(?s) # Dot-all modifier
( # (1 start) The whole match
CREATE \s+ TABLE [^(]* # Create statement
( # (2 start), Recursion code group
\(
(?: # Cluster group
[^()']++ # Possessive, not parentheses or quotes
| # or,
' .*? ' # Quotes (can wrap in atomic group if need be)
| # or,
(?2) # Recurse to group 2
)* # End cluster, do 0 to many times
\)
) # (2 end)
# Trailer before the semicolon that ends the statement
(?: # Cluster group, can be atomic (?> ) if need be
[^;'] # Not a quote or semicolon
| # or,
' .*? ' # Quotes
)* # End cluster, do 0 to many times
; # Semicolon at the end
) # (1 end)
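Python's re has no (?2) recursion, so as a hedged sketch the same idea can be written as a small scanner: it keeps a parenthesis depth counter and a single-quote flag, and only treats ; as the end of the statement when it is outside quotes and outside parentheses. The helper name comment_create_tables and the sample table are hypothetical. The sample puts ';(' in a quoted default on purpose: the simpler (CREATE[^;]+;) pattern would close the comment inside that string.

```python
import re


def comment_create_tables(sql: str) -> str:
    """Wrap each CREATE TABLE statement in /* ... */ (hypothetical helper).

    Tracks parenthesis depth and single-quoted strings, so a ';' inside
    quotes or parentheses does not end the statement. Backslash escapes
    are not handled, just like the '.*?' alternative in the regex.
    """
    out = []
    pos = 0
    for m in re.finditer(r"CREATE\s+TABLE", sql):
        if m.start() < pos:
            continue  # already inside a statement we wrapped
        i = m.end()
        depth = 0
        in_quote = False
        while i < len(sql):
            ch = sql[i]
            if in_quote:
                if ch == "'":
                    in_quote = False
            elif ch == "'":
                in_quote = True
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == ";" and depth == 0:
                break  # statement end found
            i += 1
        else:
            continue  # no terminating ';': leave this text untouched
        out.append(sql[pos:m.start()])
        out.append("/*" + sql[m.start():i + 1] + "*/")
        pos = i + 1
    out.append(sql[pos:])
    return "".join(out)


sample = "CREATE TABLE t (\n  note varchar(10) DEFAULT ';(',\n  id int(1)\n) ENGINE=MyISAM;\nSELECT 1;"
print(comment_create_tables(sample))
```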

How to match only expressions containing not equal groups?

I'm trying to capture only expressions with a difference in given groups by using regular expressions.
For example I need to capture these (in bold):
;TEXT;2;34;1;0;;;;;;3200;
PRINT_Polohopis.dgn;Different TEXT;2;64;1;0;;;;;;3200;
but not these (if it is the same):
;TEXT;2;34;1;0;;;;;;3200;
PRINT_Polohopis.dgn;TEXT;2;64;1;0;;;;;;3200;
So far I managed to create this regex:
^;([\w\s]*;).*\n(?:[\w\s_\.]*);(?:(?!(\1))(\K[\w\s]*;))
which works only if I include a semicolon inside the capturing groups.
Is it possible to capture those groups in a better way?
Something like this might work for you:
/^;([^;]+);.*?\n[^;]+;(?!\1;)([^;]+)/
The trick here is that a negative lookahead is used to make sure \1 (the backreference) is not at the desired position:
^; # Start of string and literal ;
([^;]+); # Capture all but ; followed by literal ;
.*?\n # Match rest of line
[^;]+; # Match all but ; followed by literal ;
(?!\1;) # Negative lookahead to make sure the captured
# group is not at this position, followed
# by literal ;
([^;]+) # Capture all but ;
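The pattern works as-is in Python's re module once the JavaScript-style /.../ delimiters are dropped. This sketch assumes the two lines being compared are adjacent, as in the question's samples; no flags are needed because ^ should only match at the start of the string.

```python
import re

# The answer's pattern without the /.../ delimiters.
pattern = re.compile(r"^;([^;]+);.*?\n[^;]+;(?!\1;)([^;]+)")

different = ";TEXT;2;34;1;0;;;;;;3200;\nPRINT_Polohopis.dgn;Different TEXT;2;64;1;0;;;;;;3200;\n"
same = ";TEXT;2;34;1;0;;;;;;3200;\nPRINT_Polohopis.dgn;TEXT;2;64;1;0;;;;;;3200;\n"

m = pattern.search(different)
print(m.group(2) if m else None)  # the differing field
print(pattern.search(same))       # no match when the fields are equal
```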

Remove the text outside the first brackets in R

I know this has been asked many times, but I've tried to adapt the other answers to my needs and couldn't make it work using SKIP and FAIL (I have to admit I'm a bit confused).
I'm using R actually.
The url I need to clean is:
url <- "posts.fields(id,from.fields(id,name),message,comments.summary(true).limit(0),likes.summary(true).limit(0))"
and I need to retain only the content inside the first brackets that are always prefixed by the word "fields" (while "posts" may vary). In other words something like
id,from.fields(id,name),message,comments.summary(true).limit(0),likes.summary(true).limit(0)
As you can see, there is some nesting inside. But I could eventually change my source code to accept this string too (removing every parenthesized part after every prefix):
id,from,message,comments,likes
I don't know how to remove the trailing parenthesis that balances the first one.
If it's good enough to just remove everything up to and including the first open parenthesis and also remove the last close parenthesis and thereafter then:
sub("^.*?\\((.*)\\)[^)]*$", "\\1", url)
Note:
If it's good enough to just remove the first open parenthesis and last close parenthesis then try this:
sub("\\((.*)\\)", "\\1", url)
Using a lazy .*? before fields (the inner (.*) stays greedy, so it still runs to the last closing parenthesis):
sub(".*?fields\\((.*)\\)", "\\1", url)
[1] "id,from.fields(id,name),message,comments.summary(true).limit(0),likes.summary(true).limit(0)"
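For anyone doing the same in Python rather than R, here is a sketch of the lazy-then-greedy substitution with re.sub. The second half, producing the flat id,from,message,comments,likes form the question also mentions, is an extra step not taken from the answers: it assumes you can simply strip every innermost (...) group repeatedly and then drop each .suffix.

```python
import re

url = ("posts.fields(id,from.fields(id,name),message,comments.summary(true)"
       ".limit(0),likes.summary(true).limit(0))")

# Port of sub(".*?fields\\((.*)\\)", "\\1", url): the lazy .*? stops at the
# first "fields(", the greedy (.*) runs on to the last ")" in the string.
inner = re.sub(r".*?fields\((.*)\)", r"\1", url)
print(inner)

# Extra step for the flat form: remove innermost (...) groups until none
# remain, then drop every ".something" suffix inside each field.
flat = inner
while True:
    stripped = re.sub(r"\([^()]*\)", "", flat)
    if stripped == flat:
        break
    flat = stripped
flat = re.sub(r"\.[^,]*", "", flat)
print(flat)
```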
You need to use a recursive pattern:
sub("[^.]*+(?:\\.(?!fields\\()[^.]*)*+\\.fields\\(([^()]*+(?:\\((?1)\\)[^()]*)*+)\\)(?s:.*)", "\\1", url, perl=T)
details:
# reach the dot before "fields("
[^.]*+ # all except a dot (possessive)
(?: # open a non-capturing group
\\. # a literal dot
(?!fields\\() # not followed by "fields("
[^.]* # all except a dot
)*+ # repeat the group zero or more times
\\.fields\\(
# match a content between parenthesis with any level of nesting
( # open the capture group 1
[^()]*+ # 0 or more characters that are not parentheses (possessive)
(?: # open a non capturing group
\\(
(?1) # recursion in group 1
\\) # a literal closing parenthesis
[^()]* # all that is not a parenthesis
)*+ # close the non capturing group and repeat 0 or more times (possessive)
) # close the capture group 1
\\)
(?s:.*) # the rest of the string (dot also matches newlines)
Possessive quantifiers are used here to limit the backtracking when for any reason a part of the pattern fails.
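Python's re module cannot run this pattern (no possessive quantifiers before Python 3.11 and no (?1) recursion at all), so here is a hedged sketch that reaches the same result by counting depth after the first .fields(. The name fields_content and the trailing .limit(10) are hypothetical; the extra suffix is there to show where the simpler greedy approach overshoots while depth counting does not.

```python
import re


def fields_content(url: str) -> str:
    """Return what sits inside the parentheses that follow the first ".fields(".

    Sketch assuming balanced parentheses; depth counting stands in for
    the recursive group (?1).
    """
    marker = ".fields("
    start = url.find(marker)
    if start == -1:
        raise ValueError("no '.fields(' in input")
    i = start + len(marker)
    depth = 1  # the '(' of the marker is already open
    for j in range(i, len(url)):
        if url[j] == "(":
            depth += 1
        elif url[j] == ")":
            depth -= 1
            if depth == 0:
                return url[i:j]
    raise ValueError("unbalanced parentheses")


url = ("posts.fields(id,from.fields(id,name),message,comments.summary(true)"
       ".limit(0),likes.summary(true).limit(0))")
longer = url + ".limit(10)"  # hypothetical trailing text

print(fields_content(longer))
print(re.sub(r".*?fields\((.*)\)", r"\1", longer))  # greedy version overshoots
```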

Regex for unique user count

I'm trying to create a regex to check the number of unique users.
In this case, 3 different users in 1 string means it's valid.
Let's say we have the following string
lab\simon;lab\lieven;lab\tim;\lab\davy;lab\lieven
It contains the domain for each user (lab) and their first name.
Each user is separated by ;
The goal is to have 3 unique users in a string.
In this case, the string is valid because we have the following unique users
simon, lieven, tim, davy = valid
If we take this string
lab\simon;lab\lieven;lab\simon
It's invalid because we only have 2 unique users
simon, lieven = invalid
So far, I've only come up with the following regex but I don't know how to continue
/(lab)\\(?:[a-zA-Z]*)/g
Could you help me with this regex?
Please let me know if anything is unclear or you need more information.
What you are after cannot be achieved through regular expressions on their own. Regular expressions are to be used for parsing information and not processing.
There is no particular pattern you are after, which is what regular expressions excel at. You will need to split by ; and use a data structure such as a set to store your string values.
Is this what you want:
1) Using regular expression:
import re
s = r'lab\simon;lab\lieven;lab\tim;\lab\davy;lab\lieven'
# [A-Za-z] instead of [A-z], which also matches [ \ ] ^ _ and `
pattern = re.compile(r'lab\\([A-Za-z]+)')
users = pattern.findall(s)
# valid when at least 3 distinct user names occur (duplicates are allowed)
if len(set(users)) >= 3:
    print('Valid')
else:
    print('Invalid')
2) Without using a regular expression:
s = r'lab\simon;lab\lieven;lab\tim;\lab\davy;lab\lieven'
users = [i.split('\\')[-1] for i in s.split(';')]
if len(set(users)) >= 3:
    print('Valid')
else:
    print('Invalid')
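In order to have a successful match, we need at least 3 lab\user entries, i.e.:
(?:\\?lab\\[\w]+(?:;|$)){3}
Note that this counts entries, not distinct names, so lab\simon;lab\lieven;lab\simon matches as well.
You didn't specify your engine, but with Python you can use:
import re
s = r'lab\simon;lab\lieven;lab\tim;\lab\davy;lab\lieven'
if re.search(r"(?:\\?lab\\[\w]+(?:;|$)){3}", s):
    print('Match')  # successful match
else:
    print('No match')  # match attempt failed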
Regex Explanation
(?:\\?lab\\[\w]+(?:;|$)){3}
Match the regular expression «(?:\\?lab\\[\w]+(?:;|$)){3}»
Exactly 3 times «{3}»
Match the backslash character «\\?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the character string “lab” literally «lab»
Match the backslash character «\\»
Match a single character that is a “word character” «[\w]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regular expression below «(?:;|$)»
Match this alternative «;»
Match the character “;” literally «;»
Or match this alternative «$»
Assert position at the end of a line «$»
Here is a beginner-friendly way to solve your problem:
You should .split() the string on ; so that each "lab" section becomes one element, and store the result in a list variable, like splitted_string.
Declare a second empty array to save each unique name, like unique_names.
Use a for loop to iterate through the splitted_string array. Check for unique names: if it isn't in your array of unique_names, add the name to unique_names.
Find the length of your unique_names list to see if it is at least 3 (the question treats 4 distinct users as valid). If yes, print that the string is valid. If not, print a fail message.
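The four steps above can be sketched in Python roughly as follows. This assumes splitting on ; (one element per lab\user entry) and keeping only the text after the last backslash, which also copes with the stray \lab\davy entry. Because the question treats 4 distinct users as valid, the check is "at least 3".

```python
s = r'lab\simon;lab\lieven;lab\tim;\lab\davy;lab\lieven'

# Step 1: one "lab\user" entry per element.
splitted_string = s.split(';')

# Steps 2-3: collect each name the first time it is seen.
unique_names = []
for entry in splitted_string:
    name = entry.split('\\')[-1]  # part after the last backslash
    if name not in unique_names:
        unique_names.append(name)

# Step 4: 3 or more distinct users means the string is valid.
if len(unique_names) >= 3:
    print('Valid:', unique_names)
else:
    print('Invalid:', unique_names)
```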
You seem like a practical person that is relatively new to string manipulation. Maybe you would enjoy some practical background reading on string manipulation at beginner sites like Automate The Boring Stuff With Python:
https://automatetheboringstuff.com/chapter6/
Or Codecademy, etc.
Another pure-regex answer, just for sport. As others said, you should probably not be doing this.
^([^;]+)(;\1)*;((?!\1)[^;]+)(;(\1|\3))*;((?!\1|\3)[^;]+)
Explanation :
^ from the start of the string
([^;]+) we catch everything that isn't a ';'.
that's our first user, and our first capturing group
(;\1)* it could be repeated
;((?!\1)[^;]+) but at some point, we want to capture everything that isn't either
our first user nor a ';'. That's our second user,
and our third capturing group
(;(\1|\3))* both the first and second user can be repeated now
;((?!\1|\3)[^;]+) but at some point, we want to capture yada yada,
our third user and sixth capturing group
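Since this pattern uses only backreferences and lookaheads, it also runs in Python's re module. The sketch below checks the two strings from the question; the third user ends up in group 6. One caveat worth knowing: (?!\1) compares prefixes, so a later user whose name starts with an earlier one (the made-up lab\tim followed by lab\timothy) is not counted as new.

```python
import re

# The pure-regex answer: groups 1, 3 and 6 hold three distinct entries.
THREE_USERS = re.compile(r"^([^;]+)(;\1)*;((?!\1)[^;]+)(;(\1|\3))*;((?!\1|\3)[^;]+)")

for s in (r"lab\simon;lab\lieven;lab\tim;\lab\davy;lab\lieven",
          r"lab\simon;lab\lieven;lab\simon"):
    m = THREE_USERS.search(s)
    print("valid" if m else "invalid", m.group(1, 3, 6) if m else None)
```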
This can be done with a simple regex.
Uses a conditional for each user name slot so that the required
three names are obtained.
Note that since the three slots are in a loop, the conditional guarantees that a
capture group is not overwritten (which would invalidate the assertion test
(?! \1 | \2 | \3 ) used below).
There is a complication. Each user name uses the same regex [a-zA-Z]+,
so to accommodate that, a function (the named group GetUser) is defined to check
that the name has not been matched before.
This uses the Boost engine, which requires a group to be
defined before it is back-referenced.
The workaround is to define the function at the bottom, after the groups are defined.
In Perl (and some other engines) a group does not have to be defined ahead
of time before it is back-referenced, so you could do away with the function
and put
(?! \1 | \2 | \3 ) # Cannot have seen this user
[a-zA-Z]+
in the capture groups at the top.
At a minimum, this requires conditionals.
Formatted and tested:
# (?:(?:.*?\blab\\(?:((?(1)(?!))(?&GetUser))|((?(2)(?!))(?&GetUser))|((?(3)(?!))(?&GetUser))))){3}(?(DEFINE)(?<GetUser>(?!\1|\2|\3)[a-zA-Z]+))
# Look for 3 unique users
(?:
(?:
.*?
\b lab \\
(?:
( # (1), User 1
(?(1) (?!) )
(?&GetUser)
)
| ( # (2), User 2
(?(2) (?!) )
(?&GetUser)
)
| ( # (3), User 3
(?(3) (?!) )
(?&GetUser)
)
)
)
){3}
(?(DEFINE)
(?<GetUser> # (4)
(?! \1 | \2 | \3 ) # Cannot have seen this user
[a-zA-Z]+
)
)

RegEx to match some wrapped texts

Consider following text:
aas( I)f df (as)(dfdsf)(adf).dgdf(sfg).(dfdf) asdfsdf dsfa(asd #54 54 !fa.) sdf
I want to retrieve the text between parentheses, but adjacent parenthesized groups should be considered a single unit. How can I do that?
For above example desired output is:
( I)
(as)(dfdsf)(adf)
(sfg).(dfdf)
(asd #54 54 !fa.)
Assumptions
No nesting (), and no escaping of ()
Parentheses are chained together with the . character, or by being right next to each other (no flexible spacing allowed).
(a)(b).(c) is considered a single token (the . is optional).
Solution
The regex below is to be used with global matching (match all) function.
\([^)]*\)(?:\.?\([^)]*\))*
Please add the delimiter on your own.
Explanation
Break down of the regex (spacing is insignificant). After and including # are comments and not part of the regex.
\( # Literal (
[^)]* # Match 0 or more characters that are not )
\) # Literal ). These first 3 lines match an instance of wrapped text
(?: # Non-capturing group
\.? # Optional literal .
\([^)]*\) # Match another instance of wrapped text
)* # The whole group is repeated 0 or more times
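As a quick check, here is the same pattern with Python's re.findall on the question's text. The pattern has only non-capturing groups, so findall returns each whole match, and the result lines up with the desired output listed in the question.

```python
import re

text = "aas( I)f df (as)(dfdsf)(adf).dgdf(sfg).(dfdf) asdfsdf dsfa(asd #54 54 !fa.) sdf"

# Only non-capturing groups, so findall returns the whole match each time.
chained = re.findall(r"\([^)]*\)(?:\.?\([^)]*\))*", text)
for token in chained:
    print(token)
```

Note that (adf).dgdf(sfg) is split into two tokens: after (adf) the optional \. matches, but the next character is d rather than (, so the repetition stops there.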
I'd go with: /(?:\(\w+\)(?:\.(?=\())?)+/g
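\(\w+\) to match a-zA-Z0-9_ inside literal parentheses (so groups containing spaces or punctuation, such as ( I) from the question, are not matched)
(?:\.(?=\())? to capture a literal . only if it's followed by another opening parenthesis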
The whole thing wrapped in (?:)+ to join adjacent captures together
var str = "aas(I)f df (asdfdsf)(adf).dgdf(sfg).(dfdf) asdfsdf dsfa(asdfa) sdf";
str.match(/(?:\(\w+\)(?:\.(?=\())?)+/g);
// -> ["(I)", "(asdfdsf)(adf)", "(sfg).(dfdf)", "(asdfa)"]
Try [^(](\([^()]+([)](^[[:alnum:]]*)?[(][^()]+)*\))[^)]. Capture group 1 is what you want.
This expression assumes that any character apart from parentheses may occur in the text between parentheses, and it won't match portions with nested parentheses.
This one should do the trick:
\([A-Za-z0-9]+\)