Regex expression - text between quotes and brackets

I have the following JSON string that I need to parse:
{'ConnectionDetails':'{\'server\':\'johnssasd02\',\'database\':\'enterprise analytics\'}'}]}
I am already using the expression '([^']*)' to get everything in quotes, which correctly gets me the ConnectionDetails title. However, I now need an expression to get everything between '{ and }' in order to get the full path value. So I need to capture the following from the string above:
{\'server\':\'johnssasd02\',\'database\':\'enterprise analytics\'}
but I am having trouble coming up with the regex.
Thanks

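To extract the data between the curly braces {}, you can use the regex \{(.*?)\}. Note that on the full string above the match starts at the outermost {, so group 1 also contains 'ConnectionDetails':'{ in front of the inner value. To capture only the inner object, anchor on the surrounding quotes instead: '(\{.*?\})'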
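A minimal Python sketch of the quote-anchored variant, assuming the input is exactly the string from the question. The function name extract_inner_object is made up for illustration, and the question itself does not say which regex engine is in use.

```python
import re

# The sample string from the question, kept as a raw string so the
# backslashes stay literal.
RAW = r"""{'ConnectionDetails':'{\'server\':\'johnssasd02\',\'database\':\'enterprise analytics\'}'}]}"""


def extract_inner_object(text):
    """Return the first {...} that is wrapped in single quotes, or None."""
    # '(\{.*?\})' : a quote, then a lazily matched {...}, then a quote.
    match = re.search(r"'(\{.*?\})'", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_inner_object(RAW))
```

The lazy .*? stops at the first } that is followed by a quote, so this would need rethinking if the inner object could itself contain that sequence.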

I accomplished it within an SSIS Derived Column task, where I removed the unwanted characters from the input string. That way I don't have to deal with them in the regex.

Related

Tcl split regex over multiple lines

I have a long RE to match dates in multiple files and I would like to split it out over multiple lines so it is easier to read and update. I am setting it as a variable and then calling that variable in the regex statement.
set ::eval::regexdate { \d[\/\.-]\d{2}[\/\.-]\d{4}|\d{2}[\/\.-]\d{2}[\/\.-]\d{4}|\d{4}[\/\.-]\d{2}[\/\.-]\d{2}|(([12]\d|3[01])|([12]\d|3[01])(th|nd|rd|st))\s(January|February|March|April|May|June|July|August|September|October|November|December)\s\d{4}|(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[\/\.-]\d{2}[\/\.-]\d{4} }
I am then calling it with the following regexp line...
if {[regexp "($::eval::regexdate)" $linefromfile all date]} {
Do something...
}
This all works fine if the RE is set as one long string, but not if I try to break it out over multiple lines using (?x), as outlined in the post "regexp pattern across multiple lines":
set ::eval::regexdate {(?x)
\d[\/\.-]\d{2}[\/\.-]\d{4}|
\d{2}[\/\.-]\d{2}[\/\.-]\d{4}|
\d{4}[\/\.-]\d{2}[\/\.-]\d{2}|
(([12]\d|3[01])|([12]\d|3[01])(th|nd|rd|st))\s(January|February|March|April|May|June|July|August|September|October|November|December)\s\d{4}|
(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[\/\.-]\d{2}[\/\.-]\d{4}
}
I get the following error...
`couldn't compile regular expression pattern: quantifier operand invalid.`
I am not sure why this is happening. My understanding is that (?x) ignores all whitespace and comments, so it should just stitch the lines back together into one long RE, no? Are the "|" operators causing an issue in the way that I have split the RE up?
Any help would be greatly appreciated in figuring out why it won't work when using (?x).
Thanks
The problem is the way you use the regexdate variable in your regexp command. As the post you reference indicates, (?x) must be at the start of the regular expression. However, by using "($::eval::regexdate)" you put parentheses around it, effectively making the expression ((?x)…). Putting parentheses around the complete regular expression is not very useful anyway, as the regexp command already puts the full match in the first variable handed to it.
So, either omit the parentheses and use the complete match as the date:
regexp $::eval::regexdate $linefromfile date
Or move the (?x) to the call:
set ::eval::regexdate {
\d[\/\.-]\d{2}[\/\.-]\d{4}|
\d{2}[\/\.-]\d{2}[\/\.-]\d{4}|
\d{4}[\/\.-]\d{2}[\/\.-]\d{2}|
(([12]\d|3[01])|([12]\d|3[01])(th|nd|rd|st))\s(January|February|March|April|May|June|July|August|September|October|November|December)\s\d{4}|
(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[\/\.-]\d{2}[\/\.-]\d{4}
}
if {[regexp "(?x)($::eval::regexdate)" $linefromfile all date]} {
Do something...
}
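For comparison, here is a rough Python sketch of the same idea: a date pattern split over several lines in extended (verbose) mode. This is Python's re module, not Tcl. The pattern is a simplified stand-in for the one in the question, and the names DATE_PATTERN and find_date are made up for illustration. Passing the flag to re.compile sidesteps the question of where an inline (?x) has to sit.

```python
import re

# Simplified date pattern, one alternative per line. In verbose mode,
# unescaped whitespace and "#" comments inside the pattern are ignored.
DATE_PATTERN = re.compile(
    r"""
      \d{1,2}[/.-]\d{2}[/.-]\d{4}        # 1/02/2020 or 01/02/2020
    | \d{4}[/.-]\d{2}[/.-]\d{2}          # 2020-02-01
    | (?:[12]\d|3[01])(?:th|nd|rd|st)?   # day 10-31 with optional suffix
      \s
      (?:January|February|March|April|May|June|July|August
        |September|October|November|December)
      \s\d{4}                            # 21st March 2020
    """,
    re.VERBOSE,
)


def find_date(line):
    """Return the first date-like substring in line, or None."""
    match = DATE_PATTERN.search(line)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find_date("Report generated on 21st March 2020 at noon"))
```

As in the Tcl answer, the full match is used directly, so no extra pair of parentheses around the whole pattern is needed.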

regular expression replace removes first and last character when using $1

I have string like this:
&breakUp=Mumbai;city,Puma;brand&
where Mumbai;city and Puma;brand are filters (let's say) separated by commas (,). I have to add more filters, like Delhi;State.
I am using the following regular expression to find the above string:
&breakUp=.([\w;,]*).&
and the following replacement string:
&breakUp=$1,Delhi;State&
It finds the string correctly, but when replacing it removes the first and last characters, giving the following result:
&breakUp=umbai;city,Puma;bran,Delhi;State&
How can I resolve this?
Also, if there are no filters, I don't want that first comma. For example,
&breakUp=&
should become
&breakUp=Delhi;State&
How to do it?
My guess is that your expression is nearly fine; it just has two extra . characters in it, which should be removed:
&breakUp=([\w;,]*)&
To skip the empty case &breakUp=&, we can require at least one character in the value:
&breakUp=([^&]+)&
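Your problem is the leading and trailing periods: each . matches any single character, so the first and last characters of the value are consumed outside the capture group and lost in the replacement.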
Try using this regex:
&breakUp=([\w;,]*)&
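Both parts of the question (the lost characters and the unwanted leading comma when there are no filters) can be handled in one pass with a replacement function. Below is a minimal Python sketch. The question does not say which language or tool is used, so add_filter and its parameters are hypothetical names.

```python
import re

# Corrected pattern: no stray "." on either side of the capture group.
BREAKUP_PATTERN = re.compile(r"&breakUp=([\w;,]*)&")


def add_filter(query, new_filter):
    """Append new_filter to the breakUp value, adding a comma only if needed."""

    def replace(match):
        existing = match.group(1)
        # An empty value means there are no filters yet, so no comma.
        joined = f"{existing},{new_filter}" if existing else new_filter
        return f"&breakUp={joined}&"

    return BREAKUP_PATTERN.sub(replace, query)


if __name__ == "__main__":
    print(add_filter("&breakUp=Mumbai;city,Puma;brand&", "Delhi;State"))
    print(add_filter("&breakUp=&", "Delhi;State"))
```

In a tool that only supports plain replacement strings, the same effect would need two separate find-and-replace passes, one for the empty case and one for the non-empty case.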

How to do regular Expression in AutoIt Script

In an AutoIt script I am unable to write a regular expression for the string below. The numbers in it always change.
Actual String = _WinWaitActivate("RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] sbivvrwm060.dev.ib.tor.Test.com:30000","")
Here the PID, NPID and SID values change and the rest is always constant.
What I have tried is:
_WinWaitActivate("RX_IST2_AM [PID:'([0-9]{1,6})' NPID:'([0-9]{1,5})' SID:'([0-9]{1,9})' sbivvrwm060.dev.ib.tor.Test.com:30000","")
Can someone please help me?
As stated in the documentation, you should add the prefix REGEXPTITLE: and surround everything with square brackets, and "escape" the special characters in the title itself, such as the dots (.), the literal square brackets and the spaces ( ), with a backslash (\). Instead of [0-9] you can use \d. For example, pass "[REGEXPTITLE:RX_IST2_AM\ \[PID:(\d{1,6})\ NPID:(\d{1,5})\ SID:(\d{1,9})\] sbivvrwm060\.dev\.ib\.tor\.Test\.com:30000]" as the parameter for the Win...(...) functions.
You can even omit the round brackets ((...)) but keep their content if you don't need to capture it for further processing, as you would with StringRegExp(...) or StringRegExpReplace(...). With the _WinWaitActivate(...) function capturing makes no sense anyway, as it only matches and does not replace or return anything from your regular expression.
According to regex101, both work, with and without the round brackets. You should always use a tool like that site to confirm that your expression actually works for your input string.
Not familiar with AutoIt, but remember that the whole pattern has to match somewhere in your string for anything to be captured. For example, (goat)s will NOT capture the word goat if your string is goat or goater.
You have forgotten the ] in your regex, so your pattern doesn't match the string and the capture groups will not be extracted. Also, I'm not completely sold on the usage of '. Based on this page, you can do something like StringRegExp(yourstring, 'RX_IST2_AM \[PID:([0-9]{1,6}) NPID:([0-9]{1,5}) SID:([0-9]{1,9})\]', $STR_REGEXPARRAYGLOBALMATCH), with the literal square brackets escaped so that they are not read as a character class, and $1, $2 and $3 would be your results respectively. But maybe your approach works too.
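The pattern itself can be checked outside AutoIt. The sketch below uses Python's re module rather than AutoIt's StringRegExp, purely to show the escaped brackets and the three capture groups at work. The name parse_title is made up for illustration.

```python
import re

# Literal "[" and "]" are escaped; otherwise they would open a character class.
TITLE_PATTERN = re.compile(
    r"RX_IST2_AM \[PID:(\d{1,6}) NPID:(\d{1,5}) SID:(\d{1,9})\]"
)


def parse_title(title):
    """Return (pid, npid, sid) as strings, or None if the title does not match."""
    match = TITLE_PATTERN.search(title)
    return match.groups() if match else None


if __name__ == "__main__":
    title = (
        "RX_IST2_AM [PID:942564 NPID:10991 SID:498702881] "
        "sbivvrwm060.dev.ib.tor.Test.com:30000"
    )
    print(parse_title(title))
```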

Select last character of a substring in regexp

I'm trying to clean a huge geoJson datafile. I need to change the format of "text" field from
"text": "(2:Placename,Placename)"
to
"text": "Placename".
In Sublime Text I managed to write a regular expression that let me select and remove the first part, leaving something like this:
"text": "Placename)"
With following regexp I can select the text above, but I need to narrow it down to the last character:
text\": \".*?\)
No matter what I try, I can't figure out how to select the ")" character at the end of the Placename string throughout the whole file and remove it. Note that the "Placename" here can be any place name, like New York, London etc.
I tried to build an expression where first part finds the text field, then ignores n-amount of characters until it finds the ")" character.
After experimenting and Googling I couldn't find a solution here.
You can capture the value of the second place name with the following regexp:
/"text": "+\(\d+:[^,]+,(.*?)\)/
Which will capture "Placename" in $1
More info on capturing parentheses: http://www.regular-expressions.info/brackets.html
The trick is to use negated character classes and to escape any parentheses you want to match literally.
HTH
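The capture-and-replace approach can be sketched in Python as follows, assuming every "text" value has the shape (number:name,name) shown in the question. The function name clean_text_fields is hypothetical. In Sublime Text the equivalent would be a regex find-and-replace that puts the captured group back in.

```python
import re

# Match the whole quoted value and capture only the second place name.
TEXT_FIELD = re.compile(r'"text": "\(\d+:[^,]+,(.*?)\)"')


def clean_text_fields(geojson_text):
    """Rewrite "text": "(n:Name,Name)" as "text": "Name" throughout the input."""
    return TEXT_FIELD.sub(r'"text": "\1"', geojson_text)


if __name__ == "__main__":
    print(clean_text_fields('"text": "(2:New York,New York)"'))
```

Because the closing parenthesis is part of the match but not of the captured group, it disappears in the replacement without having to be selected on its own.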
I do not know if you are using a Unix system, but probably sed can do much of the work for you. It can interpret regular expressions, capture groups, and substitute by other groups of characters. I have tried an example with sed and the following sed command worked for me:
echo "\"text\": \"(2:Placename,Placename)\"" | sed -r 's/(\"text\": )\"\([[:digit:]]:[^0-9]+,([^0-9]+)\)\"/\1\"\2\"/g'
-r makes sed use extended regular expressions, so that parentheses and + work without backslashes. I am using parentheses to capture groups that I will use later in the substitution (e.g., a group for "text", and a group for the second placename). In the substitution part of sed, you can refer to a group with \n, where n is the number of the group that you want to use. This expression should help you to achieve your desired result.

Regex : parsing a file location

I am trying to parse a file location using regex, but I am getting extra characters in the match. The line that I am trying to parse is
A HREF="/MISO/getEQRFile;jsessionid=1JgnSTXhgvbpSYLVhp3h4ZpGltNpphxr1ncwlGnK3YXsh2phxKh9!794217179?entity=WEPM&nodeId=key0">EQR_WEPM_20131001_123354_M_082013.zip</a></b></td>
I need the text between the quotes. Currently I am using
^.+?<A\s*?HREF\s*?=\W(.+?.+?>) but it gives me the value
match.Groups[1].Value: /MISO/getEQRFile;jsessionid=1JgnSTXhgvbpSYLVhp3h4ZpGltNpphxr1ncwlGnK3YXsh2phxKh9!794217179?entity=WEPM&nodeId=key0">
which has an extra "> at the end. I would appreciate it if someone could help me out.
Your regex sure is strange... Note that you should use a proper HTML parser if you're trying to parse HTML.
What's wrong with your regex is that the > is inside the capture group, so the group captures everything up to and including the >.
Try using a negated class:
^.+?<A\s*?HREF\s*?="([^"]+)"
Or if you have single and/or double quotes:
^.+?<A\s*?HREF\s*?=(["'])(.*?)\1>
And use match.Groups[2].Value.
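The quote-backreference variant can be tried out quickly in Python. The question appears to use .NET (match.Groups), so this is only an illustration of the pattern, with a shortened, hypothetical input line and a made-up function name extract_href.

```python
import re

# Group 1 is the opening quote, group 2 the attribute value;
# \1 requires the same kind of quote to close it.
HREF_PATTERN = re.compile(r"""<a\s+href\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def extract_href(line):
    """Return the HREF value of the first <A ...> tag in line, or None."""
    match = HREF_PATTERN.search(line)
    return match.group(2) if match else None


if __name__ == "__main__":
    line = '<td><b><A HREF="/MISO/getEQRFile?entity=WEPM&nodeId=key0">EQR_WEPM.zip</a></b></td>'
    print(extract_href(line))
```

For anything beyond a one-off line like this, the earlier advice about using a proper HTML parser still applies.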
You can use a regex replace command with the pattern:
(<A\s*?HREF\s*?=\W(.+?.+?>))([^<]*)(</a\s*>)
and replace the match with the 3rd group (the filename itself):
\3
In .NET the replacement string is written $3 instead.