Regular expression with multiple results

What's wrong with my regex?
"/Blabla\(2\) :.*<tr><td class=\"generic\">(.*)<\/td>.+<\/tr>/Uis"
....
<tr>
<td class="aaa">Blabla(1) :</td>
<td>
<table class="bbb"><tbody>
<tr class="ccc"><th>title1</th><th>title2</th><th>title3</th></tr>
<tr><td class="generic">word1</td><td class="generic">word2 </td><td class="generic">word3</td></tr>
<tr><td class="generic">word4</td><td class="generic">word5 </td><td class="generic">word6</td></tr>
</tbody></table>
</td>
</tr>
<tr>
<td class="aaa">Blabla(2) :</td>
<td>
<table class="bbb"><tbody>
<tr class="ccc"><th>title1</th><th>title2</th><th>title3</th></tr>
<tr><td class="generic">word1b</td><td class="generic">word2b </td><td class="generic">word3b</td></tr>
<tr><td class="generic">word4b</td><td class="generic">word5b </td><td class="generic">word6b</td></tr>
</tbody></table>
</td>
</tr>
What I want is the content of the first TD of each TR in the block that begins with Blabla(2).
So the expected result is word1b and word4b.
But only the first one is returned.
Thank you for your help. Please don't tell me to use a DOM parser; it's not possible in my case.

That's an interesting regex, in which I learned about the ungreedy flag, nice!
And for your problem, you might make use of \G to match immediately after the previous match and the flag g, assuming PCRE engine:
/(?:Blabla\(2\) :|(?<!^)\G).*<tr><td class=\"generic\">(.*)<\/td>.+<\/tr>/Uisg
regex101 demo
Or a little shorter with different delimiters:
'~(?:Blabla\(2\) :|(?<!^)\G).*<tr><td class="generic">(.*)</td>.+</tr>~Uisg'

Thanks to @Jerry, I learned some new tricks today:
(Blabla\(2\) :.*?|\G)<tr><td class=\"generic\">\K([^<]+).+?<\/tr>\r\n
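For readers whose engine lacks \G and \K (Python's built-in re module supports neither), here is a minimal sketch of a different, two-step technique: first isolate the block that starts at Blabla(2), then collect the first cell of each row inside it. The shortened sample HTML and the variable names are assumptions made for illustration.

```python
import re

# Shortened copy of the sample from the question (assumed layout).
html = '''<tr>
<td class="aaa">Blabla(1) :</td>
<td>
<table class="bbb"><tbody>
<tr class="ccc"><th>title1</th><th>title2</th></tr>
<tr><td class="generic">word1</td><td class="generic">word2 </td></tr>
<tr><td class="generic">word4</td><td class="generic">word5 </td></tr>
</tbody></table>
</td>
</tr>
<tr>
<td class="aaa">Blabla(2) :</td>
<td>
<table class="bbb"><tbody>
<tr class="ccc"><th>title1</th><th>title2</th></tr>
<tr><td class="generic">word1b</td><td class="generic">word2b </td></tr>
<tr><td class="generic">word4b</td><td class="generic">word5b </td></tr>
</tbody></table>
</td>
</tr>'''

# Step 1: cut out the block from "Blabla(2) :" up to the first closing </table>.
block = re.search(r'Blabla\(2\) :.*?</table>', html, re.S)

# Step 2: inside that block, take the generic cell that directly follows each <tr>.
first_cells = (
    re.findall(r'<tr><td class="generic">([^<]*)</td>', block.group(0))
    if block else []
)

print(first_cells)  # ['word1b', 'word4b']
```

The \G-based patterns above do the same job in a single pass on PCRE; the two-step version trades that for portability.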

Related

regular expression: what's wrong with my expression?

I'm having difficulty building a regex.
Suppose there is an HTML snippet like the one below.
I want to use JavaScript to extract each <tbody> block that contains the "apple" link (the <a> is inside the <td class="by">).
I constructed the following expression:
/<tbody.*?text[\s\S]*?<td class="by"[\s\S]*?<a.*?>apple<\/a>[\s\S]*?<\/tbody>/g
But the result is not what I wanted: each match contains more than one <tbody> block. How should the expression be written?
(I tested it at https://regex101.com/ and got the unexpected selection. Please forgive me, I can't figure out the problem :( )
<tbody id="text_0">
<td class="by">
...lots of other tags
cat
...lots of other tags
</td>
</tbody>
<tbody id="text_1">
...lots of other tags
<td class="by">
apple
</td>
...lots of other tags
</tbody>
<tbody id="text_2">
...lots of other tags
<td class="by">
cat
</td>
...lots of other tags
</tbody>
<tbody id="text_3">
...lots of other tags
<td class="by">
...lots of other tags
tiger
</td>
...lots of other tags
</tbody>
<tbody id="text_4">
<td class="by">
banana
</td>
</tbody>
<tbody id="text_5">
<td class="by">
peach
</td>
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_7">
<td class="by">
banana
</td>
</tbody>
And this is what I expect to get:
<tbody id="text_1">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>

This is not an answer to the regex part of the question, but shouldn't the td elements be embedded in tr elements? tr stands for "table row", while tbody stands for "table body". tbody usually groups the table rows. It is not prohibited to have more than one tbody in the same table, but it is usually not necessary. (tbody is actually optional; you can have tr directly inside the table element.)

First, Regex is not a good solution for parsing anything like HTML or XML.
I can fix your pattern to work with this specific example but I can't guarantee that it will work in all cases. Regex just is not the right tool for the job.
But anyway, replace the first 2 instances of [\s\S] in your pattern with [^<].
<tbody.*?text[^<]*?<td class="by"[^<]*?<a.*?>apple<\/a>[\s\S]*?</tbody>
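The matches span several blocks because a lazy [\s\S]*? is still allowed to run past a </tbody> whenever that is the only way to reach "apple". One alternative to the [^<] fix is a negative lookahead that refuses to cross a closing </tbody>. This is only a sketch in Python (the question uses JavaScript, where the same lookahead syntax is available); the trimmed sample and the plain-text "apple" are assumptions, since the posted snippet shows no <a> tags.

```python
import re

# Trimmed version of the sample from the question (assumed layout).
html = """<tbody id="text_0">
<td class="by">
cat
</td>
</tbody>
<tbody id="text_1">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_2">
<td class="by">
banana
</td>
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>"""

# Any run of characters that never steps over a closing </tbody>.
inside = r'(?:(?!</tbody>)[\s\S])*?'

pattern = re.compile(
    r'<tbody\b[^>]*>' + inside + r'<td class="by">' + inside + r'apple'
    + inside + r'</tbody>'
)

blocks = pattern.findall(html)
ids = [re.search(r'id="([^"]*)"', b).group(1) for b in blocks]
print(ids)  # ['text_1', 'text_6']
```

Because every filler segment stops at the first </tbody>, a block without "apple" simply fails to match instead of borrowing text from the next block.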

Start with this working regexp and go from there:
/<a href="(.*?)">apple<\/a>/g
If that is too broad and you want to make it more specific, add the next surrounding tag:
/<td.*?>\s*<a href="(.*?)">apple<\/a>/g
Then continue:
/<tbody.*?>\s*<td.*?>\s*<a href="(.*?)">apple<\/a>/g
Also, consider an alternate solution such as XPATH. Regular expressions can't really parse all variations of HTML.

Tree-like matches in regex with a fixed chain

I have a very specific task to achieve with a single regex.
Here's the pattern of the text I have to extract the data from (note that I'm parsing HTML-like code, stored in an immutable file):
<tr>
<td > <a ><img /></a>
</td>
<td > <a ><span >RootData</span></a>
</td>
<td > Data1.1
</td>
<td > <a ><img /></a>
</td>
<td > <a ><span >Data1.2</span></a>
</td>
<td >  
</td></tr>
<tr>
<td > Data2.1
</td>
<td > <a ><img /></a>
</td>
<td > <a ><span >Data2.2</span></a>
</td>
<td >  
</td></tr>
...
First there's a root contained inside the first "tr". Still inside this one, there's some data (Data1.1 and Data1.2) to extract.
Then comes a finite number of "tr" blocks, each containing data to extract.
I'd like the matches to look like this:
match 1 : 'RootData' 'Data1.1' 'Data1.2'
match 2 : 'RootData' 'Data2.1' 'Data2.2'
etc
So far I see how to do it with 2 regexes and 2 loops (one searching for the root, and the other finding all the data under that root), but I'd like it to be a single regex.
If any of you have already encountered this and could help, that'd be nice :)
Thanks in advance.

If I understand you correctly, you'd like to have a single regular expression provide more than one match for the same input. Regular expressions do not work that way, and are probably just not the right tool for the problem you're trying to solve.
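For comparison, the two-pass approach the question mentions is short to write. The sketch below is one possible reading of it in Python, not a single-regex solution: it splits the input into rows, strips the tags from each cell, and then prefixes every row's data with the root taken from the first row. The helper name cell_texts and the assumption that the root is always the first non-empty cell are mine.

```python
import re

# Sample reproduced from the question.
html = """<tr>
<td > <a ><img /></a>
</td>
<td > <a ><span >RootData</span></a>
</td>
<td > Data1.1
</td>
<td > <a ><img /></a>
</td>
<td > <a ><span >Data1.2</span></a>
</td>
<td >
</td></tr>
<tr>
<td > Data2.1
</td>
<td > <a ><img /></a>
</td>
<td > <a ><span >Data2.2</span></a>
</td>
<td >
</td></tr>"""


def cell_texts(row):
    """Return the non-empty text of each <td> in a row, with tags removed."""
    cells = re.findall(r'<td\b[^>]*>(.*?)</td>', row, re.S)
    texts = (re.sub(r'<[^>]+>', '', c).strip() for c in cells)
    return [t for t in texts if t]


# Pass 1: split into rows. Pass 2: extract the text cells of each row.
rows = [cell_texts(r) for r in re.findall(r'<tr\b[^>]*>(.*?)</tr>', html, re.S)]

# Assumption: the first non-empty cell of the first row is the root.
root, first_row_data = rows[0][0], rows[0][1:]
results = [(root, *first_row_data)] + [(root, *r) for r in rows[1:]]

for match in results:
    print(match)
# ('RootData', 'Data1.1', 'Data1.2')
# ('RootData', 'Data2.1', 'Data2.2')
```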

Legacy Site vulnerable to XSS Attack [closed]

I'm experiencing XSS in a legacy site.
The parameter vulnerable to this attack is: ldapSearch.jsp?f=
To check whether it is vulnerable, I added this XSS payload: "><img src=x onerror=prompt(0);>
The URL then looks like:
http://idenservices.hostname.com/axrac/ldapSearch.jsp?f=%22%3E%3Cimg%20src=x%20onerror=prompt%280%29;%3E
The XSS pop up comes up and proves that the site is vulnerable to XSS attacks.
Snippet from JSP
<tr>
<td class="required">*</td>
<td class="label"><h3>Enter User's Core ID</h3></td>
<td class="field"><input type="text" name="userid" size="25" maxlength="20" onkeypress="return isAlphaNumberKey(event)" onblur="return LowerCaseAlphanumeric(document.getElementById('userid'));">Lookup User</td>
</tr>
Snippet from JS
function userlookup(fieldName, formName)
{
var uri = "/axrac/ldapSearch.jsp?f=" + formName + "&f1=" + fieldName;
msgWindow=open(uri,'lookup','width=600,height=400,resizable=yes,toolbar=no,menubar=no,location=no,directories=no,status=no');
msgWindow.focus();
}
Adding ldapsearch.jsp
<%
String backFieldName = request.getParameter("f1");
String backFormName = request.getParameter("f");
%>
<table width="100%" cellpadding="0" cellspacing="0" border="0" class="PageSubHeader1">
<tr class="bg">
<td class="flag"> </td>
<td class="banner" width="100%"><h2>LDAP Search</h2></td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" border="0" class="PageIntroduction">
<tr>
<td class="copy">
<br/>When searching for a person by their name, please provide 2 or more letters for their first and last name.
If less than 2 letters are entered for both fields or if one field is empty, the search may not return any results.
</td>
</tr>
</table>
<p class="HorizontalRule"></p>
<form action='ldapSearchResults.jsp' method='post'>
<input type="HIDDEN" name="backFieldName" value="<%=backFieldName%>">
<input type="HIDDEN" name="backFormName" value="<%=backFormName%>">
<table width="100%" cellspacing="0" border="0" class="Forms">
<tr>
<td class="required">*</td>
<td class="instruction" colspan="2"><h2>Indicates required field</h3></td>
</tr>
<tr>
<td class="required">*</td>
<td class="label"><h3>First Name</h3></td>
<td class="field"><input type=text name='firstName' size="20"></td>
</tr>
<tr>
<td class="required">*</td>
<td class="label"><h3>Last Name</h3></td>
<td class="field"><input type=text name='lastName' size="20"></td>
</tr>
<tr>
<td> </td>
<td class="label" colspan="2"><h3>- Or -</h3></td>
</tr>
<tr>
<td class="required">*</td>
<td class="label"><h3>Core ID</h3></td>
<td class="field"><input type=text name='coreID' size="20"></td>
</tr>
</table>
<p class="HorizontalRule"></p>
<table width="100%" cellpadding="0" cellspacing="0" border="0" class="Buttons">
<tr>
<td><input type="submit" class="systemButton1" value="Submit Form" id="Submit"> <input type="reset" class="systemButton2" value="Reset Form" id="Reset"></td>
</tr>
</table>
</form>
</body>
</html>
I do not see any issue with the JavaScript, but the page is still prone to XSS attacks. I need help understanding why it is vulnerable and what I should do to fix it.

Need help in understanding why it is vulnerable
You take user input here:
String backFieldName = request.getParameter("f1");
Then you output it, without modification, here:
<input type="HIDDEN" name="backFieldName" value="<%=backFieldName%>">
(You do the same with other data too, but we'll use this for the example).
This allows anyone to craft a link that contains a "> followed by any HTML (including <script> elements or a Payment Required form) they want, send it to someone, and then have their HTML appear on your site when that person follows the link.
and what should I do to fix this.
Either convert any characters with special meaning in HTML to their respective entities, or run the data through a whitelist to filter out potentially bad input.
Further reading: OWASP XSS Prevention Cheat Sheet
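To make the first option concrete, here is a minimal sketch of entity-encoding a value before it is placed in an HTML attribute. It is written in Python rather than JSP purely for illustration, and the helper render_hidden_input is hypothetical; in a JSP page the same idea would be applied where <%=backFormName%> is written out.

```python
import html


def render_hidden_input(name, value):
    """Build a hidden input, escaping both values for use inside attributes."""
    return '<input type="hidden" name="{}" value="{}">'.format(
        html.escape(name, quote=True),
        html.escape(value, quote=True),
    )


# The payload from the question, as it would arrive in the "f" parameter.
payload = '"><img src=x onerror=prompt(0);>'
rendered = render_hidden_input("backFormName", payload)
print(rendered)
# <input type="hidden" name="backFormName" value="&quot;&gt;&lt;img src=x onerror=prompt(0);&gt;">
```

With the quote and angle brackets encoded, the payload stays inside the value attribute as inert text instead of closing the tag and injecting a new element.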

Clean html code with a Regex

I am parsing some transactions, for example 3 transactions look like this:
<TR class=DefGVRow>
<TD>29/04/2013</TD>
<TD>DEPOSITO 0140959158</TD>
<TD>0140959158</TD>
<TD align=right>336,00</TD>
<TD align=center>+</TD>
<TD align=right>16.210,60</TD></TR>
<TR class=DefGVAltRow>
<TD>29/04/2013</TD>
<TD>RETIRO ATM CTA/CTE</TD>
<TD>1171029739</TD>
<TD align=right>600,00</TD>
<TD align=center>-</TD>
<TD align=right>15.610,60</TD></TR>
<TR class=DefGVRow>
<TD>29/04/2013</TD>
<TD>C.SERV.CAJERO AUT.</TD>
<TD>1171029739</TD>
<TD align=right>3,25</TD>
<TD align=center>-</TD>
<TD align=right>15.607,35</TD></TR>
And my current Regex is:
<TR class=\w+>
<TD>(?<day>\d{1,2})/(?<month>\d{1,2})/(?<year>\d{4})</TD>
<TD>(?<description>.+?)</TD>
<TD>(?<id>\d{3,30})</TD>
<TD.+?>(?<amount>[\d\.]{1,20},\d{1,10})</TD>
<TD.+?>(?<info>.+?)</TD>
<TD.+?>(?<balance>[\d\.]{1,20},\d{1,10})</TD></TR>
How can I edit this part:
<TD>(?<description>.+?)</TD>
so that it handles an optional tag around the description? (Basically: how do I ignore the A tag when capturing the group?)
Thanks!

It is a very common problem. Please check this epic answer and stop using regexp to "parse" html, instead use a proper parser and get what you need with XPath or even a CSS selector.

This removes the 'optional' link:
<TR class=\w+>
<TD>(?<day>\d{1,2})/(?<month>\d{1,2})/(?<year>\d{4})</TD>
<TD>(?:<A href=".*>)?(?<description>.+?)(?:</A>)?</TD>
<TD>(?<id>\d{3,30})</TD>
<TD.+?>(?<amount>[\d\.]{1,20},\d{1,10})</TD>
<TD.+?>(?<info>.+?)</TD>
<TD.+?>(?<balance>[\d\.]{1,20},\d{1,10})</TD></TR>
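As a quick way to try the idea, here is a sketch of the same pattern in Python, where named groups are written (?P<name>...). The row with an <A href> link is a made-up example (the posted sample shows no links), and the tag wildcards are tightened to [^>]* as a precaution; neither change comes from the answer above.

```python
import re

# Hypothetical row in which the description is wrapped in a link.
linked_row = '''<TR class=DefGVRow>
<TD>29/04/2013</TD>
<TD><A href="detail?id=1">DEPOSITO 0140959158</A></TD>
<TD>0140959158</TD>
<TD align=right>336,00</TD>
<TD align=center>+</TD>
<TD align=right>16.210,60</TD></TR>'''

# Row from the question, without a link.
plain_row = '''<TR class=DefGVAltRow>
<TD>29/04/2013</TD>
<TD>RETIRO ATM CTA/CTE</TD>
<TD>1171029739</TD>
<TD align=right>600,00</TD>
<TD align=center>-</TD>
<TD align=right>15.610,60</TD></TR>'''

# The optional non-capturing groups swallow <A ...> and </A> when present.
pattern = re.compile(r'''<TR class=\w+>
<TD>(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{4})</TD>
<TD>(?:<A\b[^>]*>)?(?P<description>[^<]+)(?:</A>)?</TD>
<TD>(?P<id>\d{3,30})</TD>
<TD[^>]*>(?P<amount>[\d.]{1,20},\d{1,10})</TD>
<TD[^>]*>(?P<info>[^<]+)</TD>
<TD[^>]*>(?P<balance>[\d.]{1,20},\d{1,10})</TD></TR>''')

linked = pattern.search(linked_row)
plain = pattern.search(plain_row)
print(linked.group('description'))  # DEPOSITO 0140959158
print(plain.group('description'))   # RETIRO ATM CTA/CTE
```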

jmeter grab value from response data

I have a question about grabbing a certain value from the html response data in Jmeter.
I've been trying both regular expression and xpath extractor(see below) but having no luck.
This is part of the response data I receive:
<table border="0" cellpadding="2" cellspacing="1" style="border-collapse: collapse" id="AutoNumber2" bordercolorlight="#999999" bordercolordark="#999999" width="100%">
<tr>
<td class="head" align="center" colspan="2">Routing Sheet</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext">Today's Date:</td>
<td valign="top" width="50%" class="formtext">06/19/2012</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext"> HCSC Received Date:</td>
<td valign="top" width="50%" class="formtext">06/19/2012</td>
</tr>
<tr class="tablerow">
<td align="right" width="50%" class="formtext"> Package Log Date:</td>
<td valign="top" width="50%" class="formtext">06/19/2012 04:21PM</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext"> Group Specialist:</td>
<td valign="top" width="50%" class="formtext">WATTS, JOHN</td>
</tr>
<tr class="tablerow">
<td align="right" width="50%" class="formtext"> Case Underwriter:</td>
<td valign="top" width="50%" class="formtext">N/A</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext"> Medical Underwriter:</td>
<td valign="top" width="50%" class="formtext">N/A</td>
</tr>
<tr class="tablerow">
<td align="right" width="50%" class="formtext">Case Number:</td>
<td valign="top" width="50%" class="formtext">7402628</td>
</tr>
And I'm trying to grab the case number.
I have been trying the regex extractor:
Case Number:</td><td valign="top" width="50%" class="formtext">(.+?)</td>
But got a null value back.
And for xpath extractor I tried this:
//table[@id='AutoNumber2']/tbody/tr[8]/td[2]
but it's not working either.
I've been thinking of using Beanshell to grab the source code as a string and parse the number.
Is there any better way of grabbing that number?
And how can I use beanshell to grab the source code of the response data?
I tried using an XPath of /html but had no luck.
Thanks a lot

Try this, I tested it on your sample and it works :
Let me know if that works for you

Try this xpath:
//table[@id='AutoNumber2']/tr[8]/td[2]

If you are using the XPath Extractor to parse an HTML (not XML!) response, ensure that the "Use Tidy (tolerant parser)" option is checked.
Your XPath query should then return the value you want to extract.
Refine your XPath query so that it selects the row by its label instead of by position, e.g.
//table[@id='AutoNumber2']/tbody/tr[td/text()='Case Number:']/td[2]/text()
To use Beanshell for parsing look into this: Using jmeter variables in xpath extractor.
You can first test your xpath query using any other tool - Firefox addons at least:
XPath Checker
XPather
XPath Finder
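The label-based selection can also be tried outside JMeter. The sketch below uses Python's xml.etree.ElementTree, which implements only a subset of XPath (no text() step, for instance), so the query is a slightly adapted form of the one above. The trimmed, well-formed snippet stands in for the real response, which would normally need tidying first.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the response (assumed layout).
snippet = """<table id="AutoNumber2">
<tr><td class="head" colspan="2">Routing Sheet</td></tr>
<tr class="altrow"><td class="formtext">Today's Date:</td><td class="formtext">06/19/2012</td></tr>
<tr class="tablerow"><td class="formtext">Case Number:</td><td class="formtext">7402628</td></tr>
</table>"""

root = ET.fromstring(snippet)

# Pick the row whose cell text is exactly "Case Number:", then its second cell.
cell = root.find(".//tr[td='Case Number:']/td[2]")
case_number = cell.text if cell is not None else None
print(case_number)  # 7402628
```

Selecting by label keeps the query working if rows are added or reordered, which a positional tr[8] does not.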

You can use the View Results Tree listener to test and tweak your regular expression against your actual response data.
To find out what happens at runtime, use the Debug component.
At first glance, I see that it doesn't match because your regular expression is missing the newline that follows Case Number:</td> in the response.
See here for special characters that emulate new line.
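To illustrate the newline problem, here is a small sketch in Python: the only change to the expression from the question is a \s* between the two cells, which absorbs the line break and indentation. The indented sample is an assumption about how the raw response is laid out; \s has the same meaning in the Perl-style syntax used by JMeter's Regular Expression Extractor.

```python
import re

# Assumed raw layout: the two cells sit on separate, indented lines.
response = '''<tr class="tablerow">
    <td align="right" width="50%" class="formtext">Case Number:</td>
    <td valign="top" width="50%" class="formtext">7402628</td>
</tr>'''

# Original expression: expects the second <td> immediately after </td>.
original = re.compile(
    r'Case Number:</td><td valign="top" width="50%" class="formtext">(.+?)</td>'
)

# Same expression with \s* to absorb the newline and indentation.
tolerant = re.compile(
    r'Case Number:</td>\s*<td valign="top" width="50%" class="formtext">(.+?)</td>'
)

print(original.search(response))  # None
match = tolerant.search(response)
case_number = match.group(1) if match else None
print(case_number)  # 7402628
```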
