Returning a portion of a regular expression match - regex

This question shows my ignorance of regular expressions. I've never understood them well enough.
If I wanted to match, for instance, just the URL portion of an <a> tag in HTML, what would I need to do?
My regular expression to get the entire tag is:
<A[^>]*?HREF\s*=\s*[""']?([^'"" >]+?)[ '""]?>
I have no idea what I would need to do to get the URL out of that and I have no clue where to look in regular expression documentation to figure this out.

If you are programming in Perl, you can use the $1 capture variable inside an if() statement. For example:
if( $HREF =~ /<A[^>]*?HREF\s*=\s*[""']?([^'"" >]+?)[ '""]?>/ ) {
print $1;
}

Exactly how you do it depends on the regex library you're using, but the way is to use a grouped expression. You actually already have one in your example, since grouped expressions are the parenthesized parts. The href attribute value is your first group (your zeroth group is the whole match).
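To make the group numbering concrete, here is a minimal sketch in Python (picked only for illustration; the question doesn't say which library is in use). The pattern is the one from the question with the doubled quotes written as single quotes, and the helper name extract_href and the sample tags are made up.

```python
import re

# The question's pattern as a Python raw string. The parentheses form
# capture group 1, which holds the href value.
HREF_PATTERN = re.compile(
    r"""<a[^>]*?href\s*=\s*["']?([^'" >]+?)[ '"]?>""",
    re.IGNORECASE,
)

def extract_href(tag):
    """Return the href value of an anchor tag string, or None if no match."""
    match = HREF_PATTERN.search(tag)
    if match is None:
        return None
    # group(0) is the whole match; group(1) is the first parenthesized group.
    return match.group(1)

print(extract_href('<a class="x" href="http://example.com/page">'))
```

Most regex libraries expose the same idea under a different name (Perl's $1, a groups collection, and so on).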

You can use parentheses (round brackets) to group parts of the regular expression match. In this case you could put parentheses around the URL part and then refer to that group by its number later.

I switched things up a bit - try something like this:
<a[^>]*href="([^"]*).*>

Related

Regular expression to check path of url as well as specific parameters

I have URLs like the following:
/home/lead/statusupdate.php?callback=jQuery211010657244874164462_1455536082020&ref=e13ec8e3-99a8-411c-be50-7e57991d7acb&status=5&_=1455536082021
I would like a regular expression to use in my Google Analytics goal that checks that the request URI is /home/lead/statusupdate.php and that both the ref and status parameters are present, regardless of the order in which they are passed and regardless of any extra parameters, because I really only care about those two. I have looked at these examples:
How to say in RegExp "contain this too"? and Regular Expressions: Is there an AND operator? but I can't seem to adapt the examples given there to work.
I'm using this online tool to test: http://www.regexr.com/ (perhaps the tool is the buggy one? I'll try in JavaScript in the meantime).
You can try:
\/home\/lead\/statusupdate\.php\?(ref=|.*(&ref=)).*(&status=)
If the order does not matter, then add the opposite:
\/home\/lead\/statusupdate\.php\?(status=|.*(&status=)).*(&ref=)
All put together:
\/home\/lead\/statusupdate\.php\?(((ref=|.*(&ref=)).*(&status=))|((status=|.*(&status=)).*(&ref=)))
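For what it's worth, here is a quick way to sanity-check the combined pattern outside of regexr. This is a minimal Python sketch, not Google Analytics itself (whose regex engine may behave differently); the helper name has_ref_and_status and the shortened test URIs are made up for illustration.

```python
import re

# The combined pattern from above, split over two raw strings for readability.
COMBINED = re.compile(
    r"\/home\/lead\/statusupdate\.php\?"
    r"(((ref=|.*(&ref=)).*(&status=))|((status=|.*(&status=)).*(&ref=)))"
)

def has_ref_and_status(uri):
    """True if the URI hits the page and carries both ref= and status=."""
    return COMBINED.search(uri) is not None

samples = [
    "/home/lead/statusupdate.php?callback=x&ref=abc&status=5&_=1",  # ref first
    "/home/lead/statusupdate.php?status=5&ref=abc",                 # status first
    "/home/lead/statusupdate.php?ref=abc",                          # status missing
]
for uri in samples:
    print(uri, has_ref_and_status(uri))
```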
try:
(/home/lead/statusupdate.php?A)|(/home/lead/statusupdate.php?B)|(/home/lead/statusupdate.php?C)|(/home/lead/statusupdate.php?D)|(/home/lead/statusupdate.php?E)|(/home/lead/statusupdate.php?F)
Note that here A,B,C,D,E,F are notations for six different permutations for 'callback' string, 'ref' string, 'status' string and '_' string.
Not really elegant but this works:
\/home\/lead\/statusupdate\.php(.*(ref|status)){2}
It looks for /home/lead/statusupdate.php followed by two repetitions of "any characters, then ref or status". Admittedly, this would also match a URL that contains ref twice or status twice.
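The caveat is easy to reproduce. A small Python sketch, assuming the pattern is used as a plain search; the function name matches and the test URIs are invented for the demonstration:

```python
import re

# The {2} pattern from this answer.
TWICE = re.compile(r"\/home\/lead\/statusupdate\.php(.*(ref|status)){2}")

def matches(uri):
    """True if the pattern is found anywhere in the URI."""
    return TWICE.search(uri) is not None

print(matches("/home/lead/statusupdate.php?ref=abc&status=5"))  # both present
print(matches("/home/lead/statusupdate.php?ref=abc&ref=def"))   # ref twice: still a match
print(matches("/home/lead/statusupdate.php?ref=abc"))           # only one parameter
```

The second URI is the false positive: it matches even though status never appears.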

Reg Expression to scrape background:url but 'url(data:image'

I am working on a Gradle script that goes through a large CSS file and scrapes out the URLs for images. So far:
def temp = ".post-format background:url(image/goes/here.jpg); {background: .post-format {background: url(../img/post //formats.png);display:;display:.woocommerce-info:before {background: url()center no-repeat #18919c }"
def list = temp.findAll(/background:[\s]?url\([^\)]*\)/){ match ->
match
}
This works, but it also picks up the 'data:image' URLs that we don't need. So the temp variable contains both the good 'image/goes/here.jpg' URL and the one we don't need, 'data:image/png[..]'. How would we have to update the regular expression to make it work? If you could also share the rationale behind the correct regular expression, to help us learn regular expressions better, I would much appreciate it. Thank you.
You can use the negative look ahead mechanism to accomplish what you want. Immediately following the escaped left parenthesis you insert (?!data:image) which means that you must not match that text at that point. So your regex becomes:
/background:[\s]?url\((?!data:image)[^\)]*\)/
You can see the approach illustrated in this rubular. See also How can I find everything BUT certain phrases with a regular expression?
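The same lookahead carries over to other regex flavors. Here is a minimal sketch in Python rather than Groovy, purely for illustration; the CSS sample string and the helper name find_background_urls are invented, and the pattern is the one above with \s? in place of [\s]?.

```python
import re

# Hypothetical CSS: one ordinary URL, one data: URI, one relative URL.
CSS = (
    ".a {background:url(image/goes/here.jpg);} "
    ".b {background: url(data:image/png;base64,AAAA);} "
    ".c {background: url(../img/formats.png);}"
)

# (?!data:image) rejects the match when the text right after "url(" starts
# with "data:image"; the lookahead itself consumes no characters.
PATTERN = re.compile(r"background:\s?url\((?!data:image)[^)]*\)")

def find_background_urls(css):
    """Return every background:url(...) match that is not a data:image URI."""
    return PATTERN.findall(css)

for hit in find_background_urls(CSS):
    print(hit)
```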
You didn't specify what language you're using, but if the URL you want is always the first one, just don't do a global match (which is what findAll does, whatever language that is). Most likely, changing temp.findAll to temp.match and assigning the results to a scalar string variable will do it. But please tell us which language.

Regular Expressions with conditions

I have a string that looks like:
this is a string [[and]] it is [[awesome|amazing]]
I have the following regular expression so far:
(?<mygroup>(?<=\[\[).+?(?=\]\]))
I am basically trying to capture everything inside the brackets. However, I need to add another condition that says: If the matched result contains a pipe delimiter then only return the word to the right of the pipe delimiter. If there is no pipe then just return everything inside the brackets.
The parsing result I am looking for given the example above should look like:
and
amazing
Any input is appreciated.
(?<mygroup>(?<=\[\[)([^|\]]*|)?([^|]+?)(?=\]\]))
You could use this regex:
(?<=\[\[[^\]]*?)(?!\w+\|)\w+(?=\]\])
It matches both "and" and "amazing" in your test example. You can check it out; I created a test app on Ideone.
From the regex info page:
The tremendous power and expressivity
of modern regular expressions can
seduce the gullible — or the foolhardy
— into trying to use regexes on every
string‐related task they come across.
My advice: Just grab what is between the brackets and parse it after.
Regular expressions are not the answer to everything. May those who follow after you be spared from deciphering the regex you come up with.
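A rough sketch of that "grab, then parse" approach, in Python for illustration only (the question doesn't name a language). The function name extract_terms is made up, and taking the text after the last pipe is an assumption about what should happen if there is more than one pipe.

```python
import re

# Step 1: a simple pattern that only grabs what sits between [[ and ]].
BRACKETED = re.compile(r"\[\[(.+?)\]\]")

def extract_terms(text):
    """Return the bracketed parts, keeping only the text right of a pipe."""
    results = []
    for inner in BRACKETED.findall(text):
        # Step 2: ordinary string handling. With no pipe present,
        # rsplit returns the whole string unchanged.
        results.append(inner.rsplit("|", 1)[-1])
    return results

print(extract_terms("this is a string [[and]] it is [[awesome|amazing]]"))
```

The regex stays trivial and the pipe rule lives in plain code, where it is easier to read and change.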

Need to filter IPs in Google Analytics

I need to filter the range 213.190.149.120 - 213.190.149.127, inclusive.
Anyone know if there is a regular expression I can use to do this?
Thanks,
C
If you need a strict regular expression, don't forget that . matches any character, so
^213.190.149.(1(2[0-7]))$
will match "213d190c149a125" for example, which is not what you want.
On top of that, you're capturing the digits of the last octet in groups, which consumes resources for no apparent reason. A simple yet stricter regex would be closer to what @Marc suggested:
^213\.190\.149\.12[0-7]$
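The difference between the two is easy to check. A small Python sketch (Google Analytics may use a different regex engine, so treat this only as a demonstration of the dot-escaping point); the helper names are invented:

```python
import re

LOOSE = re.compile(r"^213.190.149.(1(2[0-7]))$")   # unescaped dots
STRICT = re.compile(r"^213\.190\.149\.12[0-7]$")   # literal dots only

def loose_match(text):
    return LOOSE.match(text) is not None

def strict_match(text):
    return STRICT.match(text) is not None

print(loose_match("213d190c149a125"))    # accepted, because . matches any character
print(strict_match("213d190c149a125"))   # rejected
print(strict_match("213.190.149.120"))   # inside the range
print(strict_match("213.190.149.128"))   # just outside the range
```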
I don't know how Google Analytics expects the expression, but this would be a valid regular expression for your request:
213.190.149.12[0-7]
Okay, I found this link...
and it outputs:
^213\.190\.149\.(1(2[0-7]))$
Very handy for anyone else looking to do this.

matching table tag by regular expression in php

I need to match a substring in PHP. The substring looks like this:
<table class="tdicerik" id="dgVeriler"
I wrote a regular expression for it, <table\s*\sid=\"dgVeriler\", but it did not work. Where is my problem?
You forgot a dot:
<table\s.*\sid="dgVeriler"
would have worked.
<table\s+.*?\s+id="dgVeriler"
would have been better (making the repetition lazy, matching as little as possible).
<table\s+[^>]*?\s+id="dgVeriler"
would have been better still (making sure that we don't accidentally match outside of the <table>tag).
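To see why the [^>] version is the safest of the three, here is a small comparison in Python (used only for illustration; the patterns themselves are the same as in PHP). The sample HTML with an extra table in front and the helper name first_match are made up.

```python
import re

# Hypothetical input: an unrelated table comes before the one we want.
HTML = ('<table class="other"><tr></tr></table>'
        '<table class="tdicerik" id="dgVeriler">')

GREEDY = re.compile(r'<table\s.*\sid="dgVeriler"')
BOUNDED = re.compile(r'<table\s+[^>]*?\s+id="dgVeriler"')

def first_match(pattern, text):
    """Return the matched text, or None if the pattern is not found."""
    m = pattern.search(text)
    return m.group(0) if m else None

# The dot version starts at the first <table and runs across both tables.
print(first_match(GREEDY, HTML))
# [^>] cannot cross a tag boundary, so only the wanted tag matches.
print(first_match(BOUNDED, HTML))
```

Note that in this input the lazy .*? version would also start at the first table, since a regex always prefers the leftmost possible start.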
And not trying to parse HTML with regular expressions, using a parser instead, would probably have been best.
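As a rough idea of what the parser route looks like, here is a sketch using Python's standard html.parser module. The question is about PHP, so this only illustrates the approach and is not PHP code; the class name TableIdFinder and the function find_table are hypothetical.

```python
from html.parser import HTMLParser

class TableIdFinder(HTMLParser):
    """Remember the attributes of the first <table> with the wanted id."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found_attrs = None

    def handle_starttag(self, tag, attrs):
        # The parser hands over lowercased tag and attribute names.
        if tag == "table" and self.found_attrs is None:
            attr_map = dict(attrs)
            if attr_map.get("id") == self.wanted_id:
                self.found_attrs = attr_map

def find_table(html, wanted_id):
    """Return the attribute dict of the matching table, or None."""
    finder = TableIdFinder(wanted_id)
    finder.feed(html)
    finder.close()
    return finder.found_attrs

print(find_table('<div><table class="tdicerik" id="dgVeriler"><tr></tr></table></div>',
                 "dgVeriler"))
```

Attribute order, quoting style, and whitespace stop mattering, which is exactly what the regex versions struggle with.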
I don't know exactly what you want to get, but try this:
<table\s*.*id=\"dgVeriler\"