How to get this field in regex using a pattern? - regex

Regex is not being very friendly with me, giving me 0 matches haha.
Basically, I have a big string, that includes this:
<td class="fieldLabel02Std">FIELD_LABEL</td>
<td class="fieldLabel02Std">
VALUE
</td>
Thanks to the FIELD_LABEL I should be able to find it inside the bigger string. The "VALUE" is what I want to get.
I tried this pattern
String field = "FIELD_NAME";
String pattern = field + @"[\s\S]*?\<td[\s\S]*?\<\/td\>";
That didn't work. I was thinking about this:
Get the field_name + some characters + => which would be able to give me VALUE.
This gives me 0 matches.
Help is very appreciated!

You can use something like this:
FIELD_LABEL</td>[\n\r\s]*<td class="fieldLabel02Std">[\n\r\s]*(.+?)[\n\r\s]*</td>
Generally it's a bad idea to use a regex to parse HTML, but it can be acceptable if you have a small problem with a known HTML format and you don't mind it breaking as soon as they change a comma...
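For anyone who wants to try the idea outside the original code, here is a minimal Python sketch of the same pattern. The `html` string and the variable names are only assumptions built from the snippet in the question; the captured group is what holds VALUE.

```python
import re

# Sample input copied from the question (an assumption about the real page).
html = '''<td class="fieldLabel02Std">FIELD_LABEL</td>
<td class="fieldLabel02Std">
VALUE
</td>'''

field = "FIELD_LABEL"

# Label cell, optional whitespace, then the next cell; group 1 is its trimmed content.
pattern = re.escape(field) + r'</td>\s*<td class="fieldLabel02Std">\s*(.+?)\s*</td>'

m = re.search(pattern, html)
value = m.group(1) if m else None
print(value)
```

Note that `\s` already covers `\n` and `\r`, so `[\n\r\s]*` can be shortened to `\s*`.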

Consider the following Regex...
(?<=FIELD_LABEL[\S\s]*?\<td.*?\>[\S\s]*?)\w+(?=[\S\s]*?\</td\>)
Good Luck!

Is this what you are looking for?
FIELD_LABEL<\/td>[.\s]*?<td.*?>[.\s]*?VALUE[.\s]*?<\/td>
or
String pattern = field + @"<\/td>[.\s]*?<td.*?>[.\s]*?VALUE[.\s]*?<\/td>";

Related

Unable to accurately search a particular text in a html tag using Python

I have the regex below to identify text in an HTML tag, but it doesn't yield the expected result.
HTML Tag:
<td>Issue Amount</td>
<td>:</td>
<td>20,000,000.00</td>
Find = re.findall(?<=Issue Amount</td> <td>:</td> <td>) [0-9,]),soup_string)[0]
I need to get the numerical value 20,000,000.00 from this tag.
Any advice on what I am doing wrong here? I tried a couple of other ways, but with no success.
Do not under any circumstances try to parse XML with a regex unless you wish to invoke rite 666 Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.
Use an HTML parsing library; see this page for some ways to do it.
However, in your case you have mucked up your regex by looking for a single space between your </td> and <td> tags, whereas your data has carriage returns there. You can use the \s metacharacter to match any whitespace character.
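As a rough illustration of the \s suggestion, here is a small Python sketch. The `soup_string` content is assumed from the three lines shown in the question. A capture group is used instead of a lookbehind because Python's `re` only accepts fixed-width lookbehinds, so `\s*` could not go inside one.

```python
import re

# Assumed input: the three cells from the question, separated by newlines.
soup_string = """<td>Issue Amount</td>
<td>:</td>
<td>20,000,000.00</td>"""

# \s* tolerates spaces, tabs, or line breaks between the tags.
pattern = r"Issue Amount</td>\s*<td>:</td>\s*<td>([0-9,.]+)</td>"

match = re.search(pattern, soup_string)
amount = match.group(1) if match else None
print(amount)
```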
Below is the regex piece that helped me get the desired output. Thanks all for your inputs.
(?<=Issue Amount[td\W]{21})([\d,.]+)

Regex - How to pick out these integers, individually

I have the following HTML that I am trying to pick apart. For some reason I can't figure out the Regex (which, admittedly, I suck at):
<td class="score">
286
<span class="pos">(2455 of 3921)</span>
</td>
I'm looking to get the 3 integers out, individually. So, basically:
Score = 286
Place = 2455
Entries = 3921
I went through the 'numeric ranges' page on regular-expressions.info, but still can't figure it out!!! Yes, I know it is easy... apparently my brain can't comprehend this type of logic.
I will be using it in vb.net, BTW. In case that matters.
Here's a simple example of code that does it for you at ideone.com.
The gist looks something like:
Dim regex As Regex = New Regex("(\d+)[^\d]*(\d+)[^\d]*(\d+)")
Dim match As Match = regex.Match("<td class='score'> 286 <span class='pos'>(2455 of 3921)</span> </td>")
If match.Success Then
Console.WriteLine(match.Groups(1).Value)
Console.WriteLine(match.Groups(2).Value)
Console.WriteLine(match.Groups(3).Value)
End If
This regex fetches all numbers in a string.
/\d+/g;
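The "fetch all numbers" approach can be sketched in Python as well (used here only for illustration; the question is about vb.net). The `html` string is copied from the question, and the sketch assumes the cell contains exactly three runs of digits.

```python
import re

# Sample cell from the question.
html = """<td class="score">
286
<span class="pos">(2455 of 3921)</span>
</td>"""

# findall returns every run of digits, in order of appearance.
numbers = re.findall(r"\d+", html)

# Assumes exactly three numbers; unpacking fails loudly otherwise.
score, place, entries = (int(n) for n in numbers)
print(score, place, entries)
```

This only works as long as the tag attributes themselves contain no digits, so it is more fragile than the three-group pattern.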

regex wont find a match

I am trying to pull some info. Here is my regex:
<tr>
<td>([^<]+)<i><a href="([^<]+)" title="([^<]+)">([^<]+)<\/a><\/i><sup id="([^<]+)" class="([^<]+)"><a href="([^<]+)"><span>[<\/span>1<span>]<\/span><\/a><\/sup><\/td>
<td><a href="([^<]+)" title="([^<]+)">([^<]+)<\/a><\/td>
<td><a href="([^<]+)" title="([^<]+)">([^<]+)<\/a><\/td>
<td>([^<]+)<\/td>
<td>([^<]+)<\/td>
</tr>
Here is some sample HTML:
<tr>
<td><i>3Xtreme</i><sup id="cite_ref-18" class="reference"><span>[</span>18<span>]</span></sup></td>
<td>989 Studios</td>
<td>989 Studios</td>
<td>1999-03-31<sup>NA</sup></td>
<td>NA</td>
</tr>
As of now I just want to get the data to find matches. Can you see any reason why it would not match this?
For all the haters...
I don't care about your opinions on whether I should use regex on HTML or not. For this case it will work great. I have one page, and the data I need is in a table. Once I get the data I will save it to my db and never have to use the regex again. So if your comment or answer is about your opinion on using regex with HTML, don't post.
...Second line:
<td>([^<]+)<i>
cannot hope to match:
<td><i>
because '+' is equivalent to '{1,}' (at least one character), while there is nothing between those two tags. I didn't check the rest of your regex, but as written it can't work.
Edit:
Please also correct the "([^<]+)" and so on (I hope you see why)... And edit your regex when you correct it.
Edit 2:
Seeing as it's quite a disaster (sorry but it's the truth :/): please consider replacing every ([^<]+) that can't cover all your cases with a simple (.*?)
Edit 3:
[ and ] must be escaped. (\d will help you catch numbers)
<span>[<\/span>1<span>]<\/span>
Lots of problems here: you must escape the brackets and obviously 1 won't match 18
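Putting those corrections together, a pattern along these lines does match the sample row. This is only a sketch in Python against the sample as posted (which has no <a> links, unlike the original regex); the group layout is an assumption about which pieces are wanted.

```python
import re

# Sample row copied from the question.
row = """<tr>
<td><i>3Xtreme</i><sup id="cite_ref-18" class="reference"><span>[</span>18<span>]</span></sup></td>
<td>989 Studios</td>
<td>989 Studios</td>
<td>1999-03-31<sup>NA</sup></td>
<td>NA</td>
</tr>"""

pattern = re.compile(
    r"<td><i>(.*?)</i>"                                           # title
    r"<sup[^>]*><span>\[</span>(\d+)<span>\]</span></sup></td>\s*"  # escaped [ ] and \d+
    r"<td>(.*?)</td>\s*"                                          # second cell
    r"<td>(.*?)</td>\s*"                                          # third cell
    r"<td>(.*?)<sup>(.*?)</sup></td>\s*"                          # date and region
    r"<td>(.*?)</td>"                                             # last cell
)

m = pattern.search(row)
print(m.groups() if m else "no match")
```

The `\s*` between cells is needed because each cell sits on its own line in the sample.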

Regex html tags

I'm trying to figure out the regex for the following:
String</td><td>[number 0-100]%</td><td>[number 0-100]%</td><td>String</td><td>String</td>
Also, some of these td tags may have style attributes at some point.
I tried this:
String<.*>
and that returned
String</td>
but trying
String<.*><.*>
returned nothing. Why is this?
You probably shouldn't be trying to use a regex to parse HTML, because that way lies madness.
(.+)</td><td>(1?\d?\d)%</td><td>(1?\d?\d)%</td><td>(.+)</td><td>(.+)</td>
Use a character class, like <td[^>]*>, so the pattern matches both <td> and <td class="abc">.
Try the following:
(.+)(<[^>]+>){2}(1?\d?\d)%(<[^>]+>){2}(1?\d?\d)%(<[^>]+>){2}(.+)(<[^>]+>){2}(.+)<[^>]+>
You can test it here.
EDIT: Although this will work most of the time, it will fail if a tag has a > character inside one of its attribute values.
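Here is a small Python sketch combining the `<td[^>]*>` suggestion with the percentage groups. The sample `text` is made up to fit the layout described in the question, and `(100|[1-9]?\d)` is a stricter stand-in for `1?\d?\d`, which would also accept values up to 199.

```python
import re

# Hypothetical input following the layout in the question; one cell has a style attribute.
text = ('Alpha</td><td style="color:red">42%</td><td>100%</td>'
        '<td>Beta</td><td>Gamma</td>')

td = r"<td[^>]*>"            # opening tag with or without attributes
pct = r"(100|[1-9]?\d)%"     # 0 to 100 followed by a percent sign

pattern = re.compile(
    r"([^<]+)</td>"
    + td + pct + r"</td>"
    + td + pct + r"</td>"
    + td + r"([^<]+)</td>"
    + td + r"([^<]+)</td>"
)

m = pattern.search(text)
print(m.groups() if m else "no match")
```

The same caveat applies: a > inside an attribute value would break `<td[^>]*>`.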

ASP.net REGEX Question: Find specific match, then skip everything to end tag

strRegex = New StringBuilder
strRegex.Append("<td class=""[\s\w\W]*?"">(?<strTKOWins>[^<]+)[\s]*?<span class='[\s\w\W]*?'>(T)KOs[\s\w\W]*?</span>[\s\S]*</td>")
Regex = New System.Text.RegularExpressions.Regex(strRegex.ToString, RegexOptions.None)
Matches = Regex.Match(results, strRegex.ToString)
This is my code. I want to match:
[? what ? Please insert here what you want to match]
The problem is that after the end of the SPAN tag, I want to skip everything inside the Table Cell and skip all the way to the end tag </td>
How can I do that?
I have no idea what you are trying to do, but this regex will find a table cell with a span inside it and then go to its corresponding closing tag. Fill in the specifics you need and change it however you need to.
for eg,
text:
<td class="td class"> anything at all in here?! <span class="span class">span text</span>text in the tablecell?</td>
regex:
<td\s+class=".*?">.*?<span\s+class=".*?">.*?</span>.*?</td>
no idea what all this "strTKOWins" crap is or whether you want specific stuff in your span found?
(T)KOs[\s\w\W]*?
guess i cant really help until you respond anyways....
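In case it helps, here is one guess at the intent, sketched in Python rather than VB.NET: capture the number in front of the span, require the literal text (T)KOs inside the span, then skip lazily to the closing </td>. The `results` sample is invented, since the question never shows the real HTML, and Python writes named groups as `(?P<name>...)` where .NET uses `(?<name>...)`.

```python
import re

# Hypothetical cell; the real markup was not posted.
results = "<td class=\"stat\">12 <span class='label'>(T)KOs</span> other text here</td>"

pattern = re.compile(
    r'<td\s+class="[^"]*">\s*(?P<strTKOWins>[^<]+?)\s*'   # value before the span
    r"<span\s+class='[^']*'>\(T\)KOs.*?</span>"           # parentheses escaped to match literally
    r".*?</td>",                                          # lazily skip to the end of the cell
    re.DOTALL,                                            # let . cross line breaks
)

m = pattern.search(results)
wins = m.group("strTKOWins") if m else None
print(wins)
```

Unescaped, `(T)KOs` is a capture group around T and matches "TKOs", not "(T)KOs", which may be part of why the original pattern misses.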