Regex: parsing a file location

I am trying to parse a file location with a regex, but the match includes extra characters. The line that I am trying to parse is
A HREF="/MISO/getEQRFile;jsessionid=1JgnSTXhgvbpSYLVhp3h4ZpGltNpphxr1ncwlGnK3YXsh2phxKh9!794217179?entity=WEPM&nodeId=key0">EQR_WEPM_20131001_123354_M_082013.zip</a></b></td>
I need the text between the quotes. Currently I am using
^.+?<A\s*?HREF\s*?=\W(.+?.+?>) but it gives me the value
match.Groups[1].Value: /MISO/getEQRFile;jsessionid=1JgnSTXhgvbpSYLVhp3h4ZpGltNpphxr1ncwlGnK3YXsh2phxKh9!794217179?entity=WEPM&nodeId=key0">
which has an extra "> at the end. I would appreciate it if someone could help me out.

Your regex sure is strange... Note that you should use a proper HTML parser if you're trying to parse HTML.
The problem with your regex is that the > is inside the capture group, so the group captures everything up to and including the >.
Try using a negated class:
^.+?<A\s*?HREF\s*?="([^"]+)"
Or if you have single and/or double quotes:
^.+?<A\s*?HREF\s*?=(["'])(.*?)\1>
And use match.Groups[2].Value.
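For comparison, here is a minimal Python sketch of the quoted-value approach. The sample line is a shortened stand-in for the one in the question and is assumed to start with <A; the helper name extract_href is made up for illustration. Python exposes groups through match.group(n) rather than match.Groups[n].Value.

```python
import re

# Shortened stand-in for the line in the question; the leading "<" is assumed.
line = '<A HREF="/MISO/getEQRFile?entity=WEPM&nodeId=key0">EQR_WEPM_20131001_123354_M_082013.zip</a></b></td>'

# Group 1 is the opening quote, group 2 is everything up to the same quote.
HREF_RE = re.compile(r'''<A\s*HREF\s*=\s*(["'])(.*?)\1''', re.IGNORECASE)


def extract_href(text):
    """Return the quoted HREF value, or None when no link is found."""
    match = HREF_RE.search(text)
    return match.group(2) if match else None


print(extract_href(line))
```

The backreference \1 makes the closing quote match whichever quote opened the value, so single-quoted attributes work as well.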

You can also use a regex replace with this pattern:
(<A\s*?HREF\s*?=\W(.+?.+?>))([^<]*)(</a\s*>)
and replace the match with the third group (the filename itself):
\3
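A rough Python equivalent of this replace, assuming a shortened sample line that starts with <A. Note that only the <A ...>...</a> element is replaced, so whatever follows it on the line is left in place.

```python
import re

# Shortened stand-in for the line in the question; the leading "<" is assumed.
line = '<A HREF="/MISO/getEQRFile?entity=WEPM&nodeId=key0">EQR_WEPM_20131001_123354_M_082013.zip</a></b></td>'

# Same pattern as above: group 1 is the opening tag, group 3 is the link text.
ANCHOR_RE = re.compile(r'(<A\s*?HREF\s*?=\W(.+?.+?>))([^<]*)(</a\s*>)')

# Replace each whole anchor element with its link text.
replaced = ANCHOR_RE.sub(r'\3', line)
print(replaced)
```

If only the filename is wanted, ANCHOR_RE.search(line).group(3) returns it without the trailing markup.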

Related

Notepad++ RegEX how do I append a character based on start of the character and before a character?

I would like to append _OLD to the end of each string that starts with SR_, whether or not it is followed by a ' character.
For example my text is the following:
SR_Apple
When the 'SR_APPLE' rotten, we must discard it.
I would like the find and replace to do:
SR_Apple_OLD
When the 'SR_APPLE_OLD' rotten, we must discard it.
I have tried (SR_*)+$.*(?='\s) based on what I have learned, but no luck so far. Please help. Thanks in advance.
For simple cases you should be able to use
Find: (\bSR_[\w]+)
Replace: $1_OLD
Finding (\bSR_.+?)('|$) and replacing with $1_OLD$2 could also work if the text after SR_ is more complex.
The lookahead you're using only matches when the string is followed by a ' and whitespace, so it won't find the text that isn't in quotes.
regex101 is a useful tool for debugging expressions.
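The simple find/replace can be tried outside Notepad++ too. A small Python sketch, using the sample text from the question (Python writes the group reference as \g<1> where Notepad++ uses $1):

```python
import re

text = "SR_Apple\nWhen the 'SR_APPLE' rotten, we must discard it."

# \b anchors at a word boundary, \w+ takes the rest of the identifier,
# so the closing quote is never part of the match.
result = re.sub(r'(\bSR_\w+)', r'\g<1>_OLD', text)
print(result)
```

Running the replacement a second time would append _OLD again, so it is meant as a one-off pass.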

Regex Expression - text between quotes and brackets

I have the following JSON string that I need to parse:
{'ConnectionDetails':'{\'server\':\'johnssasd02\',\'database\':\'enterprise analytics\'}'}]}
I am already using the expression '([^']*)' to get everything in quotes, which correctly gets me the ConnectionDetails title. However, I now need an expression that gets everything between '{ and }' so that I can read the full path value. I need to capture the following from the string above:
{\'server\':\'johnssasd02\',\'database\':\'enterprise analytics\'}
but I am having trouble coming up with the regex.
Thanks
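To extract the data between curly braces {} you can use the regex \{(.*?)\}. In the sample above the first { is the outer one, so that match starts too early; anchoring on the surrounding quotes, as in '(\{.*?\})', captures only the inner object.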
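A quick Python check on the sample string shows the difference between matching on braces alone and matching on the quote-plus-brace delimiters. The quote-anchored pattern is an assumption about the input always wrapping the inner object in single quotes.

```python
import re

# The sample string from the question, kept verbatim in a raw string.
raw = r"{'ConnectionDetails':'{\'server\':\'johnssasd02\',\'database\':\'enterprise analytics\'}'}]}"

# Braces only: the match starts at the outer "{".
outer = re.search(r"\{(.*?)\}", raw).group(1)

# Quote-anchored: the match starts at '{ and ends at }'.
inner = re.search(r"'(\{.*?\})'", raw).group(1)

print(outer)
print(inner)
```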
I accomplished it within an SSIS derived column task, where I removed the unwanted characters from the input string. That way I don't have to deal with them in the regex.

Parse the string using RegEx in notepad++

I am trying to parse out some data using notepad++ macro. Here is the example of the data I have
<abcdefghkdadajsdkdjg><hhDate>2019-12-31 <dklajdlajdkjasd>
I want hhDate 2019-12-31 from the above data. I am very new to regex, so I haven't tried a pattern yet; I used Notepad++'s own tools to select and delete the unnecessary text, but that didn't work out.
Any help is appreciated.
Thanks
This assumes each string is on its own line, because you have to match the whole line to remove the junk and keep the good part. Match the start of the line (^), then the first piece you want and wrap it in (), then the second piece and wrap it in (), then continue to the end of the line ($).
So in Notepad++, first get all the strings onto separate lines if they are not already. Then find/replace with 'Regular expression' mode selected:
Find:
^.*?<.*?<(hhDate)>(\d+-\d+-\d+).*$
Replace:
$1 $2
https://regex101.com/r/BKha4m/1
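The same find/replace can be sketched in Python to see what it does line by line. The function name keep_date is made up for illustration, and the second sample line in the usage note is invented test data.

```python
import re

LINE_RE = re.compile(r'^.*?<.*?<(hhDate)>(\d+-\d+-\d+).*$', re.MULTILINE)


def keep_date(text):
    """Rewrite every matching line to 'hhDate <date>'; other lines are untouched."""
    # Python uses \1 and \2 where Notepad++ uses $1 and $2.
    return LINE_RE.sub(r'\1 \2', text)


line = "<abcdefghkdadajsdkdjg><hhDate>2019-12-31 <dklajdlajdkjasd>"
print(keep_date(line))
```

re.MULTILINE makes ^ and $ match at every line boundary, which plays the role of the one-string-per-line assumption above.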
If you don't want the < before hh to be removed, try this shorter pattern.
Find what: \s<.*?>
Replace with: (leave empty)
Otherwise use \s<.*?><|<.*>
Uncheck "Match case".

Regex - Match multi-line content between double curly brackets

I am trying to refactor some code and need to use a regular expression to find a large number of strings. An example string looks like:
{{ Form::text('twitter', Input::old('twitter'),
array(
'class'=>'form-control ',
'placeholder'=>'E.g http://www.twitter.com/MyTwitterPage'
))
}}
I have managed to use \{\{(.*\s*Form::.*\s*)\}\} to match strings when they're on a single line, but it fails to match multi-line strings such as the above.
Also, I'm using PHPStorm's regex find feature if that's of any help.
Any help is much appreciated.
You can use
\{\{(\s*Form::\w*\((?:[^}]*(?:}[^}]+)*))}}
See the regex demo
It is basically the same as \{\{(\s*Form::\w*\([\s\S]*?)}}, but it uses unrolled logic and is thus much more efficient.
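To check that the two patterns agree on the sample, here is a small Python sketch. Python's re module is used only for the test; PHPStorm's find dialog uses its own regex engine, so treat this as a sanity check and not a statement about performance there.

```python
import re

source = """{{ Form::text('twitter', Input::old('twitter'),
array(
'class'=>'form-control ',
'placeholder'=>'E.g http://www.twitter.com/MyTwitterPage'
))
}}"""

# Unrolled version: runs of non-"}" characters, then single "}" followed by more text.
UNROLLED = re.compile(r'\{\{(\s*Form::\w*\((?:[^}]*(?:}[^}]+)*))}}')
# Lazy version: [\s\S] matches any character, newlines included.
LAZY = re.compile(r'\{\{(\s*Form::\w*\([\s\S]*?)}}')

unrolled_match = UNROLLED.search(source)
lazy_match = LAZY.search(source)
print(unrolled_match.group(1) == lazy_match.group(1))
```

Neither pattern relies on . matching newlines, which is why both work across lines without any flag.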
Alternatively, preprocess the file and remove the newlines,
or set the s (DOTALL) flag on the regex so that . also matches newlines. PHP's preg functions support this modifier:
/\{\{(.*\s*Form::.*\s*)\}\}/s

replacing all open tags with a string

Before somebody points me to that question, I know that one can't parse html with regex :) And this is not what I am trying to do.
What I need is:
Input: a string containing html.
Output: the same string with every opening tag <tag> replaced by
***<tag>
So if I get
<a><b><c></a></b></c>, I want
***<a>***<b>***<c></a></b></c>
as output.
I've tried something like:
(<[~/].+>)
and replace it with
***$1
But it doesn't really seem to work the way I want it to. Any pointers?
Clarification: it's guaranteed that there are no self closing tags nor comments in the input.
You have two problems: the character that negates a character class is ^, not ~; and the .+ is greedy, so it matches as many characters as possible before the final >. Use a lazy quantifier, and make it .*? rather than .+? so that single-character tag names such as <a> still match on their own:
(<[^/].*?>)
You can also probably drop the parentheses and replace with $0 or $&, depending on the language.
Try using: (<[^/].*?>) and replace it with ***$1
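A short Python sketch of this replacement, under the question's guarantee that the input has no self-closing tags or comments. The function name mark_opening_tags is made up for illustration; Python writes the group reference as \1 where other tools use $1.

```python
import re

# "<" followed by anything except "/", then as little as possible up to ">".
OPEN_TAG_RE = re.compile(r'(<[^/].*?>)')


def mark_opening_tags(html):
    """Prefix every opening tag with ***; closing tags are left alone."""
    return OPEN_TAG_RE.sub(r'***\1', html)


print(mark_opening_tags('<a><b><c></a></b></c>'))
```

Tags with attributes are handled the same way, as long as no attribute value contains a > character.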