This may look like a duplicate of an already asked question, but it is not quite. What I'm looking for is one or more regular expressions, without the help of any programming language, to change the following text String.Concat( new string[] { "some", "random", "text", string1, string2, "end" }) into "some" + "random" + "text" + string1 + string2 + "end".
I was thinking of using two regular expressions, replacing the commas with pluses, and then removing the String.Concat( new string[] { ... }). The second part is quite easy, but I am struggling with the first regular expression. I used a positive look-behind expression, but it matches only the first comma: (?<=String\.Concat\(new string\[\] \{[^,}]*),
I'm not an expert, but I think this is a limitation of the regular expression engine: once the first comma is matched, the engine moves the starting index past that comma, and the look-behind group no longer matches before it.
Is there a regular expression that makes this substitution (pluses instead of commas) without the help of any programming language?
Just like you said: first replace all commas with plus signs:
Regex 1: /,/g
Replacement 1: " +"
Then remove all unnecessary stuff, capture what you need and use a backreference to the captured group as replacement:
Regex 2: /String\.Concat\(\s*new\s*string\s*\[\]\s*{\s*(.*?)\s*}\)/g
Replacement 2: "$1" (or however your tool specifies backreferences).
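For illustration, here is a minimal Python sketch of the same two replacements, with re.sub standing in for an editor's find/replace (Python writes the backreference as \1 rather than $1). Because Regex 1 is applied to the whole text, it would also replace any other comma in the file, so this sketch assumes the input holds only the String.Concat(...) expression.

```python
import re

source = 'String.Concat( new string[] { "some", "random", "text", string1, string2, "end" })'

# Step 1 (Regex 1): replace every comma with " +".
step1 = re.sub(r",", " +", source)

# Step 2 (Regex 2): drop the String.Concat(new string[] { ... }) wrapper and
# keep only the list contents captured in group 1.
wrapper = re.compile(r"String\.Concat\(\s*new\s*string\s*\[\]\s*\{\s*(.*?)\s*\}\)")
result = wrapper.sub(r"\1", step1)

print(result)  # "some" + "random" + "text" + string1 + string2 + "end"
```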
I'm assuming you're using a text editor.
First, find:
,
Substitution:
+
See the demo.
Second, find:
.*?{(.*?)}.*
Replacement:
$1
See the demo.
You should do it in two steps.
First substitution
s/^\s*String\s*\.Concat\s*\(\s*new\s+string\[\]\s*\{\s*("[^\}]*")\s*\}\)\s*$/$1/i
gets you
"some", "random", "text", string1, string2, "end"
Second one
s/\s*("[^"]*"|\b\w+\b)\s*,\s*/$1 + /g
returns your desired output
"some" + "random" + "text" + string1 + string2 + "end"
Check this: https://ideone.com/QuRKmt (sample code in Perl).
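If Perl isn't at hand, the two substitutions translate almost one-to-one to Python's re module. This is a sketch assuming one String.Concat(...) expression per line; the /i flag becomes re.IGNORECASE and $1 becomes \1.

```python
import re

source = 'String.Concat( new string[] { "some", "random", "text", string1, string2, "end" })'

# First substitution: keep only the list body, from the first to the last quote.
# Anchored to the whole line (^...$) and case-insensitive, as in the Perl version.
unwrap = re.compile(
    r'^\s*String\s*\.Concat\s*\(\s*new\s+string\[\]\s*\{\s*("[^}]*")\s*\}\)\s*$',
    re.IGNORECASE,
)
body = unwrap.sub(r"\1", source)

# Second substitution: a quoted string or a bare word followed by a comma
# becomes that token followed by " + ".
joiner = re.compile(r'\s*("[^"]*"|\b\w+\b)\s*,\s*')
result = joiner.sub(r"\1 + ", body)

print(body)    # "some", "random", "text", string1, string2, "end"
print(result)  # "some" + "random" + "text" + string1 + string2 + "end"
```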
Update: Notepad++ single-pass solution
(Note: you can also do this in RegexFormat8 using the Boost extended replacement option.)
Find (?:String\.Concat\(\s*new\s*string\s*\[\]\s*{\s*([^,})]*?)\s*(?=,|}\))|\G(?!^)\s*,\s*([^,})]*?)\s*(?=,|}\)))(?:}\)|(?=(?:\s*,\s*[^,})]*)+}\)))
Replace (?1$1: + $2)
(Without a conditional replacement it's + $1$2; see https://regex101.com/r/Vqe44r/1)
Formatted (expanded mode):
(?:
String \. Concat\( \s* new \s* string \s* \[\] \s* { \s*
( [^,})]*? ) # (1)
\s*
(?= , | }\) )
|
\G
(?! ^ )
\s* , \s*
( [^,})]*? ) # (2)
\s*
(?= , | }\) )
)
(?:
}\)
|
(?=
(?:
\s* , \s*
[^,})]*
)+
}\)
)
)
Output
"some" + "random" + "text" + string1 + string2 + "end"
The string I want to match against is as follows:
5 + __FXN1__('hello', 1, 3, '__HELLO__(hello) + 5') + 5 + (2/2) + __FXN2__('Good boy')
I tried the regex [A-Z0-9_]+\(.*?\), which matches
__FXN1__('hello', 1, 3, '__HELLO__(hello) and __FXN2__('Good boy')
What I am expecting is:
__FXN1__('hello', 1, 3, '__HELLO__(hello) + 5') and __FXN2__('Good boy')
How can I achieve this? Please help.
If the parentheses are always balanced, you can use a recursion-based regex like
__[A-Z0-9_]+__(\((?:[^()]++|(?-1))*\))
Note that it may fail if there is an unbalanced number of ( or ) inside strings; see this regex demo. In brief:
__[A-Z0-9_]+__ - __, one or more uppercase letters, digits or _ and then __
(\((?:[^()]++|(?-1))*\)) - Group 1: (, then any zero or more occurrences of one or more chars other than ( and ) or the whole Group 1 pattern recursed, and then a ) (so the (...) substring with any amount of paired nested parentheses is matched).
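Python's built-in re module has no recursion ((?-1), (?1)), so as a stand-in, here is a sketch that uses a regex only to find each __NAME__( start and then counts parentheses by hand. The helper name find_calls is hypothetical. Like the recursive pattern, it counts every ( and ), including those inside quoted strings, so it shares the same weakness with unbalanced parentheses in strings.

```python
import re

# Locates the start of each __NAME__( call; balancing is done by the loop below.
CALL_START = re.compile(r"__[A-Z0-9_]+__\(")


def find_calls(text: str) -> list[str]:
    """Return every __NAME__(...) call whose parentheses balance."""
    calls = []
    pos = 0
    while (m := CALL_START.search(text, pos)) is not None:
        depth = 0
        end = None
        # Start at the "(" that CALL_START just matched.
        for i in range(m.end() - 1, len(text)):
            ch = text[i]
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    end = i + 1
                    break
        if end is None:
            # No matching ")": skip this start and keep searching.
            pos = m.start() + 1
            continue
        calls.append(text[m.start():end])
        pos = end  # continue after the whole call, so nested calls are not reported
    return calls


s = "5 + __FXN1__('hello', 1, 3, '__HELLO__(hello) + 5') + 5 + (2/2) + __FXN2__('Good boy')"
print(find_calls(s))
```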
If you need to support unbalanced parentheses, it is safer to use a regex that just matches all allowed data formats, e.g.
__[A-Z0-9_]+__\(\s*(?:'[^']*'|\d+)(?:\s*,\s*(?:'[^']*'|\d+))*\s*\)
See the regex demo. Or, if ' can be escaped with a \ char inside the '...' strings, you can use
__[A-Z0-9_]+__\(\s*(?:'[^'\\]*(?:\\.[^'\\]*)*'|\d+)(?:\s*,\s*(?:'[^'\\]*(?:\\.[^'\\]*)*'|\d+))*\s*\)
See this regex demo.
Details:
__[A-Z0-9_]+__ - __, one or more uppercase letters, digits or _, and then __
\( - ( char
\s* - zero or more whitespaces
(?:'[^']*'|\d+) - either a ', zero or more chars other than ', and then a ', or one or more digits
(?:\s*,\s*(?:'[^']*'|\d+))* - zero or more occurrences of a , enclosed with optional whitespace and then either a '...' substring or one or more digits
\s*\) - zero or more whitespace and then a ).
Note if you need to support any kind of numbers, you need to replace \d+ with a more sophisticated pattern like [+-]?\d+(?:\.\d+)? or more.
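The two non-recursive patterns use only features Python's re module supports, so they can be tried as-is. In this sketch, the second test string (an escaped quote plus a stray ( inside a string) is an invented example, not from the question.

```python
import re

# '...' strings or integers, separated by commas.
SIMPLE = re.compile(
    r"__[A-Z0-9_]+__\(\s*(?:'[^']*'|\d+)(?:\s*,\s*(?:'[^']*'|\d+))*\s*\)"
)

# Same, but '...' strings may contain backslash-escaped chars such as \'.
ESCAPED = re.compile(
    r"__[A-Z0-9_]+__\(\s*(?:'[^'\\]*(?:\\.[^'\\]*)*'|\d+)(?:\s*,\s*(?:'[^'\\]*(?:\\.[^'\\]*)*'|\d+))*\s*\)"
)

text = "5 + __FXN1__('hello', 1, 3, '__HELLO__(hello) + 5') + 5 + (2/2) + __FXN2__('Good boy')"
print(SIMPLE.findall(text))

tricky = r"__F__('it\'s (', 2)"
print(SIMPLE.findall(tricky))   # the escaped quote breaks the simple pattern
print(ESCAPED.findall(tricky))  # the escape-aware pattern matches the whole call
```

Because neither pattern counts parentheses, the stray ( inside the string does no harm; only the argument format matters.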
I have the string "xyz(text1,(text2,text3)),asd" and I want to explode it on , with one condition: the split should happen only on commas that are not inside any brackets (here, ()).
I saw many such solutions on Stack Overflow, but they didn't work with my pattern. (example1) (example2)
What is correct regex for my pattern?
In my case, for xyz(text1,(text2,text3)),asd
the result should be
xyz(text1,(text2,text3)) and asd.
You may use a matching approach using a regex with a subroutine:
preg_match_all('~\w+(\((?:[^()]++|(?1))*\))?~', $s, $m)
See the regex demo
Details
\w+ - 1+ word chars
(\((?:[^()]++|(?1))*\))? - an optional capturing group matching
\( - a (
(?:[^()]++|(?1))* - zero or more occurrences of
[^()]++ - 1+ chars other than ( and )
| - or
(?1) - the whole Group 1 pattern
\) - a ).
PHP demo:
$rx = '/\w+(\((?:[^()]++|(?1))*\))?/';
$s = 'xyz(text1,(text2,text3)),asd';
if (preg_match_all($rx, $s, $m)) {
print_r($m[0]);
}
Output:
Array
(
[0] => xyz(text1,(text2,text3))
[1] => asd
)
If the requirement is to split at , but only outside nested parentheses, another idea would be to use preg_split and skip the parenthesized parts, again by means of a recursive pattern.
$res = preg_split('/(\((?>[^)(]*(?1)?)*\))(*SKIP)(*F)|,/', $str);
See this pattern demo at regex101 or a PHP demo at eval.in
The left side of the pipe character matches the parenthesized parts and skips them.
The right side matches the remaining commas, i.e. those outside of the parentheses.
The pattern used is a variant of common patterns for matching nested parentheses.
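Python's re module has neither recursion nor (*SKIP)(*F), so for comparison here is a plain loop, not a regex, that splits only on commas at parenthesis depth zero. It is a sketch that assumes only round brackets matter; split_top_level is a hypothetical helper name.

```python
def split_top_level(s: str, sep: str = ",") -> list[str]:
    """Split s on the single character sep, ignoring sep inside (...)."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(s[start:i])
            start = i + 1
    parts.append(s[start:])  # the text after the last top-level separator
    return parts


print(split_top_level("xyz(text1,(text2,text3)),asd"))
# ['xyz(text1,(text2,text3))', 'asd']
```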
I want to keep only the last term of a string separated by dots
Example:
My string is:
abc"val1.val2.val3.val4"zzz
Expected string after I apply the regex:
abc"val4"zzz
In other words, I want to drop the dot-separated parts on the left-hand side and keep only the last one.
The closest thing I tried was
val json="""abc"val1.val2.val3.val4"zzz"""
val sortie="""(([A-Za-z0-9]*)\.([A-Za-z0-9]*){2,10})\.([A-Za-z0-9]*)""".r.replaceAllIn(json, a=> a.group(3))
The result was:
abc".val4"zzz
Can you suggest a different regex solution, please?
Thanks
You may use
val s = """abc"val1.val2.val3.val4"zzz"""
val res = "(\\w+\")[^\"]*\\.([^\"]*\")".r replaceAllIn (s, "$1$2")
println(res)
// => abc"val4"zzz
See the Scala demo
Pattern details:
(\\w+\") - Group 1 capturing 1+ word chars and a "
[^\"]* - 0+ chars other than "
\\. - a dot
([^\"]*\") - Group 2 capturing 0+ chars other than " and then a ".
The $1 is the backreference to the first group and $2 inserts the text inside Group 2.
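The same pattern works unchanged in Python's re module; this sketch only swaps $1$2 for \1\2 in the replacement. The greedy [^"]* runs to the closing quote and then backtracks to the last dot, which is why only the final segment survives.

```python
import re

s = 'abc"val1.val2.val3.val4"zzz'
# Group 1: word chars plus the opening quote.
# [^"]*\. : everything up to the LAST dot inside the quotes (greedy, then backtrack).
# Group 2: the rest of the quoted text plus the closing quote.
res = re.sub(r'(\w+")[^"]*\.([^"]*")', r"\1\2", s)
print(res)  # abc"val4"zzz
```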
Maybe without Regex at all:
scala> json.split("\"").map(_.split("\\.").last).mkString("\"")
res4: String = abc"val4"zzz
This assumes you want each "token" (separated by ") to become the last dot-separated inner token.
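The split-based idea carries over directly to Python as a sketch. One difference worth knowing: Python's str.split keeps trailing empty strings, whereas Java/Scala String.split drops them, so if the input ended with a " the Python version would keep that closing quote while the Scala one-liner would lose it.

```python
s = 'abc"val1.val2.val3.val4"zzz'
# Split on quotes, keep the last dot-separated piece of every token, re-join with quotes.
res = '"'.join(token.split(".")[-1] for token in s.split('"'))
print(res)  # abc"val4"zzz
```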
I want to match string1 and anything that appears in the following lines:
['string1','string2','string3']
['string1' , 'string2' , 'string3']
['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']
Until it encounters the following:
string2
So with the right regex in the above 4 cases the results in bold would be matched:
['string1','string2','string3']
['string1' , 'string2' , 'string3']
['string1.domain.com' , 'string2.domain.com' , 'string3.domain.com']
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']
I tried to solve my issue on https://regex101.com/ using the following thread.
The regex I tried, taken from Question 8020848, did not match the string correctly:
((^|\.lpdomain\.com:8080' , ')(string1))+$
I could not get it to match only the part I wanted in this text:
['string1.domain.com:8080' , 'string2.domain.com:8080' , 'string3.domain.com:8080']
The following is what I got using the regex you suggested:
## -108,7 +108,7 ## node stringA, stringB, stringC,stringD inherits default {
'ssl_certificate_file' => 'test.domain.net_sha2_n.crt',
'ssl_certificate_key_file'=> 'test.domain.net_sha2.key' }
},
- service_upstream_members => ['string1.domain.com:8080', 'string2.domain.com:8080', 'string3.domain.com:8080', 'string4.domain.com:8080', 'string5.domain.com:8080'],
+ service_upstream_members => [ 'string2.domain.com:8080', 'string3.domain.com:8080', 'string4.domain.com:8080', 'string5.domain.com:8080'],
service2_upstream_members => ['string9:8080','string10:8080'],
service3_upstream_members => ['string11.domain.com:8080','string12.domain.com:8080','string13.domain.com:8080'],
service_name => 'test_web_nginx_z1',
As you can see, there is a leading space that for some reason wasn't removed, even though regex101.com shows that all the whitespace is matched by the regex
'string1[^']*'\s*,\s*
This is what I'm currently using (where server is a variable already defined in the script):
sed -i '' "s/'${server}[^']*'\s*,\s*//"
To match a ' followed by string1, then zero or more chars other than ', then a closing ', optional whitespace, a comma and again optional whitespace, you may use
'string1[^']*'\s*,\s*
See the regex demo.
Breakdown:
'string1 - a literal char sequence 'string1
[^']* - zero or more (*) characters other than ' (due to the negated character class [^...])
' - an apostrophe
\s* - 0+ whitespaces
, - a comma
\s* - 0+ whitespaces.
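A quick way to confirm the pattern itself removes the trailing whitespace is to run it through an engine that certainly treats \s as whitespace, such as Python's re (sketch below, using lines from the question). If the space survives in sed, one plausible cause, stated here as an assumption, is a sed that does not understand \s (the -i '' form suggests BSD/macOS sed); the POSIX bracket expression [[:space:]] is the portable spelling, e.g. s/'${server}[^']*'[[:space:]]*,[[:space:]]*//.

```python
import re

# ', string1, any non-' chars, ', optional whitespace, comma, optional whitespace
pattern = re.compile(r"'string1[^']*'\s*,\s*")

line = "['string1.domain.com:8080', 'string2.domain.com:8080', 'string3.domain.com:8080']"
print(pattern.sub("", line))
# ['string2.domain.com:8080', 'string3.domain.com:8080']
```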
This should match what you ask (according to your bold highlights) allowing for an unknown amount of spaces, etc.
(?:…) is a non-capturing group.
…+? is a lazy (non-greedy) quantifier: it matches as few characters as possible.
(string1.+?)(?:'string2)
(string1.+?)'string2
See example: https://regex101.com/r/lFPSEM/3
I am trying to write a regex to find where I use LTRIM/RTRIM in WHERE clauses in stored procedures.
The regex should match things like:
RTRIM(LTRIM(PGM_TYPE_CD))= 'P'))
RTRIM(LTRIM(PGM_TYPE_CD))='P'))
RTRIM(LTRIM(PGM_TYPE_CD)) = 'P'))
RTRIM(LTRIM(PGM_TYPE_CD))= P
RTRIM(LTRIM(PGM_TYPE_CD))= somethingelse))
etc...
I am trying something like...
.TRIM.*\)\s+
[RL]TRIM\s*\( will look for R or L followed by TRIM, any amount of whitespace, and then a (.
Is this what you want?
[LR]TRIM\([RL]TRIM\([^)]+\)\)\s*=\s*[^)]+\)*
What it does:
[LR] # Match single char, either "L" or "R"
TRIM # Match text "TRIM"
\( # Match an open parenthesis
[RL] # Match single char, either "R" or "L" (same as [LR], but easier to see intent)
TRIM # Match text "TRIM"
\( # Match an open parenthesis
[^)]+ # Match one or more of anything that isn't closing parenthesis
\)\) # Match two closing parentheses
\s* # Zero or more whitespace characters
= # Match "="
\s* # Again, optional whitespace (not required unless the next part is captured)
[^)]+ # Match one or more of anything that isn't closing parenthesis
\)* # Match zero or more closing parentheses.
If this is automated and you want to know which variables are in it, you can wrap parentheses around the relevant parts:
[LR]TRIM\([RL]TRIM\(([^)]+)\)\)\s*=\s*([^)]+)\)*
This gives you the first and second variables in groups 1 and 2 (as \1 and \2 or $1 and $2, depending on the regex flavor).
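As a sketch, here is the capturing version run with Python's re module against the sample lines from the question, printing group 1 (the trimmed column) and group 2 (the right-hand side).

```python
import re

TRIM_EQ = re.compile(r"[LR]TRIM\([RL]TRIM\(([^)]+)\)\)\s*=\s*([^)]+)\)*")

samples = [
    "RTRIM(LTRIM(PGM_TYPE_CD))= 'P'))",
    "RTRIM(LTRIM(PGM_TYPE_CD))='P'))",
    "RTRIM(LTRIM(PGM_TYPE_CD)) = 'P'))",
    "RTRIM(LTRIM(PGM_TYPE_CD))= P",
    "RTRIM(LTRIM(PGM_TYPE_CD))= somethingelse))",
]
for line in samples:
    m = TRIM_EQ.search(line)
    print(m.group(1), m.group(2))
```

Since [^)]+ in group 2 stops at the first ), the trailing )) are consumed by \)* rather than captured.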
How about something like this:
.*[RL]TRIM\s*\(\s*[RL]TRIM\s*\(([^\)]*)\)\s*\)\s*=\s*(.*)
This will capture the inside of the trim and the right side of the = in groups 1 and 2, and should handle all whitespace in all relevant areas.
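A sketch of this whitespace-tolerant variant in Python, with the opening parenthesis of group 1 in place (an unmatched ) would otherwise be a syntax error in most engines). The spaced-out input line is an invented example.

```python
import re

TRIM_ANY_WS = re.compile(r".*[RL]TRIM\s*\(\s*[RL]TRIM\s*\(([^\)]*)\)\s*\)\s*=\s*(.*)")

m = TRIM_ANY_WS.match("RTRIM ( LTRIM(PGM_TYPE_CD) ) = 'P'))")
print(m.groups())  # ('PGM_TYPE_CD', "'P'))")
```

Note that (.*) in group 2 also takes any trailing ) characters.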