jmeter grab value from response data - regex

I have a question about grabbing a certain value from the HTML response data in JMeter.
I've been trying both the Regular Expression Extractor and the XPath Extractor (see below) but am having no luck.
This is part of the response data I receive:
<table border="0" cellpadding="2" cellspacing="1" style="border-collapse: collapse" id="AutoNumber2" bordercolorlight="#999999" bordercolordark="#999999" width="100%">
<tr>
<td class="head" align="center" colspan="2">Routing Sheet</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext">Today's Date:</td>
<td valign="top" width="50%" class="formtext">06/19/2012</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext"> HCSC Received Date:</td>
<td valign="top" width="50%" class="formtext">06/19/2012</td>
</tr>
<tr class="tablerow">
<td align="right" width="50%" class="formtext"> Package Log Date:</td>
<td valign="top" width="50%" class="formtext">06/19/2012 04:21PM</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext"> Group Specialist:</td>
<td valign="top" width="50%" class="formtext">WATTS, JOHN</td>
</tr>
<tr class="tablerow">
<td align="right" width="50%" class="formtext"> Case Underwriter:</td>
<td valign="top" width="50%" class="formtext">N/A</td>
</tr>
<tr class="altrow">
<td align="right" width="50%" class="formtext"> Medical Underwriter:</td>
<td valign="top" width="50%" class="formtext">N/A</td>
</tr>
<tr class="tablerow">
<td align="right" width="50%" class="formtext">Case Number:</td>
<td valign="top" width="50%" class="formtext">7402628</td>
</tr>
And I'm trying to grab the case number.
I have been trying the regex extractor:
Case Number:</td><td valign="top" width="50%" class="formtext">(.+?)</td>
But got a null value back.
And for xpath extractor I tried this:
//table[@id='AutoNumber2']/tbody/tr[8]/td[2]
but it's not working either.
I've been thinking of using Beanshell to grab the source code as a string and parse the number.
Is there any better way of grabbing that number?
And how can I use Beanshell to grab the source code of the response data?
I tried using an XPath of /html but had no luck.
Thanks a lot

Try this, I tested it on your sample and it works :
Let me know if that works for you

Try this xpath:
//table[@id='AutoNumber2']/tr[8]/td[2]

If you are using the XPath Extractor to parse an HTML (not XML) response, make sure the Use Tidy (tolerant parser) option is checked.
Your XPath query should then return the value you want to extract.
Refine your XPath query so that it selects the row by its label rather than by position, e.g.
//table[@id='AutoNumber2']/tbody/tr[td/text()='Case Number:']/td[2]/text()
To use Beanshell for parsing, look into this: Using jmeter variables in xpath extractor.
You can first test your xpath query using any other tool - Firefox addons at least:
XPath Checker
XPather
XPath Finder
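
To illustrate the label-based query outside JMeter, here is a minimal Python sketch using the standard library's ElementTree, which supports a limited XPath subset. The sample markup is a trimmed, well-formed copy of the table from the question (an assumption: the real page would first need a tolerant HTML parser, which is the role Use Tidy plays in JMeter).

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the routing sheet table (assumption: real
# HTML is rarely valid XML and would need a tolerant parser first).
SAMPLE = """<html><body>
<table id="AutoNumber2">
<tr><td class="head" colspan="2">Routing Sheet</td></tr>
<tr class="altrow">
<td class="formtext">Today's Date:</td>
<td class="formtext">06/19/2012</td>
</tr>
<tr class="tablerow">
<td class="formtext">Case Number:</td>
<td class="formtext">7402628</td>
</tr>
</table>
</body></html>"""

root = ET.fromstring(SAMPLE)

# Select the row by the text of its label cell instead of by position,
# then take the second cell of that row.
cell = root.find(".//table[@id='AutoNumber2']/tr[td='Case Number:']/td[2]")
case_number = cell.text.strip() if cell is not None else None
print(case_number)
```

Selecting by label keeps the query working if rows are added or reordered, which a positional tr[8] would not survive.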

You can use the View Results Tree listener to test and tweak your regular expression against the actual response data.
To find out what happens at runtime, use a Debug component such as the Debug Sampler.
At first glance, it doesn't match because your regular expression does not account for the newline that follows Case Number:</td> in the response.
See here for special characters that match a newline.
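
The newline problem can be reproduced in a few lines of Python. This is only a sketch of the idea, not JMeter configuration: the response fragment is copied from the question, and the tolerant pattern is one possible fix. The same pattern would be expected to carry over to the Regular Expression Extractor, but that is not tested here.

```python
import re

# Fragment of the response, including the line break between the two cells.
response = (
    '<tr class="tablerow">\n'
    '<td align="right" width="50%" class="formtext">Case Number:</td>\n'
    '<td valign="top" width="50%" class="formtext">7402628</td>\n'
    '</tr>'
)

# The pattern from the question expects "</td><td" with nothing in between.
strict = re.compile(
    r'Case Number:</td><td valign="top" width="50%" class="formtext">(.+?)</td>'
)

# \s* tolerates the newline (and any indentation) between the cells, and
# [^>]* avoids hard-coding the attributes of the value cell.
tolerant = re.compile(r'Case Number:</td>\s*<td[^>]*>(.+?)</td>')

strict_match = strict.search(response)
tolerant_match = tolerant.search(response)
print(strict_match)
print(tolerant_match.group(1))
```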

Related

How to find the element part of the anchor tag

I am totally new to Selenium. Please accept my apologies if this is a silly question.
I have the markup below on the website. How can I get the data-selectdate value using Selenium + Python? Once I have the data-selectdate value, I would like to compare it against the date I am interested in.
Your help is deeply appreciated.
Note: I am not using Beautiful soup or anything.
Cheers
<table role="grid" tabindex="0" summary="October 2018">
<thead>
<tr>
<th role="columnheader" id="dayMonday"><abbr title="Monday">Mon</abbr></th>
<th role="columnheader" id="dayTuesday"><abbr title="Tuesday">Tue</abbr></th>
<th role="columnheader" id="dayWednesday"><abbr title="Wednesday">Wed</abbr></th>
<th role="columnheader" id="dayThursday"><abbr title="Thursday">Thur</abbr></th>
<th role="columnheader" id="dayFriday"><abbr title="Friday">Fri</abbr></th>
<th role="columnheader" id="daySaturday"><abbr title="Saturday">Sat</abbr></th>
</tr>
</thead>
<tbody>
<tr>
<td role="gridcell" headers="dayMonday">
<a data-selectdate="2018-10-22T00:00:00+01:00" data-selected="false" id="day22"
class="day-appointments-available">22</a>
</td>
<td role="gridcell" headers="dayTuesday">
<a data-selectdate="2018-10-23T00:00:00+01:00" data-selected="false" id="day23"
class="day-appointments-available">23</a>
</td>
<td role="gridcell" headers="dayWednesday">
<a data-selectdate="2018-10-24T00:00:00+01:00" data-selected="false" id="day24"
class="day-appointments-available">24</a>
</td>
<td role="gridcell" headers="dayThursday">
<a data-selectdate="2018-10-25T00:00:00+01:00" data-selected="false" id="day25"
class="day-appointments-available">25</a>
</td>
<td role="gridcell" headers="dayFriday">
<a data-selectdate="2018-10-26T00:00:00+01:00" data-selected="false" id="day26"
class="day-appointments-available">26</a>
</td>
<td role="gridcell" headers="daySaturday">
<a data-selectdate="2018-10-27T00:00:00+01:00" data-selected="false" id="day27"
class="day-appointments-available">27</a>
</td>
</tr>
</tbody>
</table>
To get the values of the attribute data-selectdate you can use the following solution:
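from selenium.webdriver.common.by import By

# find_elements_by_css_selector was deprecated in Selenium 4 and later removed; use find_elements with By.CSS_SELECTOR
elements = driver.find_elements(By.CSS_SELECTOR, "table[summary='October 2018'] tbody td[role='gridcell'][headers^='day']>a")
for element in elements:
    print(element.get_attribute("data-selectdate"))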
You can use the get_attribute method of the WebElement class to read an element's attribute value.
from selenium.webdriver.common.by import By
css_locator = "table tr:nth-child(1) > td[headers='dayMonday'] > a"
ele = driver.find_element(By.CSS_SELECTOR, css_locator)
selectdate = ele.get_attribute('data-selectdate')
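
For the comparison step the question asks about, a minimal sketch follows. It assumes the attribute values have already been read with get_attribute, so they are hard-coded here to keep the example runnable without a browser. The target date and the helper name matches_wanted are hypothetical.

```python
from datetime import date, datetime

# Values as get_attribute("data-selectdate") would return them for the
# sample table; hard-coded so the sketch runs without a browser.
select_dates = [
    "2018-10-22T00:00:00+01:00",
    "2018-10-23T00:00:00+01:00",
    "2018-10-24T00:00:00+01:00",
]

# Hypothetical date of interest.
wanted = date(2018, 10, 23)


def matches_wanted(raw: str, target: date) -> bool:
    """Parse the ISO 8601 attribute value and compare its calendar date."""
    return datetime.fromisoformat(raw).date() == target


hits = [raw for raw in select_dates if matches_wanted(raw, wanted)]
print(hits)
```

Parsing the value into a datetime, rather than comparing strings, avoids surprises if the page ever changes the UTC offset or the time portion.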

regular expression: what's wrong with my expression?

I am having difficulty building a regex.
Suppose there is an HTML clip as below.
I want to use JavaScript to cut out each <tbody> block that contains the link "apple" (the <a> is inside the <td class="by">).
I constructed the following expression:
/<tbody.*?text[\s\S]*?<td class="by"[\s\S]*?<a.*?>apple<\/a>[\s\S]*?<\/tbody>/g
But the result is different from what I wanted: each match contains more than one <tbody> block. What should it be? Regards!
(I tested with https://regex101.com/ and got the unexpected selection. Please forgive me, I can't figure out the problem :( )
<tbody id="text_0">
<td class="by">
...lots of other tags
cat
...lots of other tags
</td>
</tbody>
<tbody id="text_1">
...lots of other tags
<td class="by">
apple
</td>
...lots of other tags
</tbody>
<tbody id="text_2">
...lots of other tags
<td class="by">
cat
</td>
...lots of other tags
</tbody>
<tbody id="text_3">
...lots of other tags
<td class="by">
...lots of other tags
tiger
</td>
...lots of other tags
</tbody>
<tbody id="text_4">
<td class="by">
banana
</td>
</tbody>
<tbody id="text_5">
<td class="by">
peach
</td>
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_7">
<td class="by">
banana
</td>
</tbody>
And this is what i expect to get
<tbody id="text_1">
<td class="by">
apple
</td>
</tbody>
<tbody id="text_6">
<td class="by">
apple
</td>
</tbody>
This is not an answer to the regex part of the question, but shouldn't the td elements be embedded in tr elements? tr stands for "table row", while tbody stands for "table body". tbody usually groups the table rows. It is not prohibited to have more than one tbody in the same table, but it is usually not necessary. (tbody is actually optional; you can have tr directly inside the table element.)
First, Regex is not a good solution for parsing anything like HTML or XML.
I can fix your pattern to work with this specific example but I can't guarantee that it will work in all cases. Regex just is not the right tool for the job.
But anyway, replace the first 2 instances of [\s\S] in your pattern with [^<].
<tbody.*?text[^<]*?<td class="by"[^<]*?<a.*?>apple<\/a>[\s\S]*?</tbody>
Start with this working regexp and go from there:
/<a href="(.*?)">apple<\/a>/g
If that is too broad and you want to make it more specific, add the next surrounding tag:
/<td.*?>\s*<a href="(.*?)">apple<\/a>/g
Then continue:
/<tbody.*?>\s*<td.*?>\s*<a href="(.*?)">apple<\/a>/g
Also, consider an alternate solution such as XPATH. Regular expressions can't really parse all variations of HTML.
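
The reason the original pattern spans several blocks is that a lazy quantifier only minimizes the match from a fixed starting point: the engine starts at the first <tbody and keeps extending until it finds "apple", even if that means crossing a </tbody>. One way around this, sketched here in Python rather than JavaScript, is to split the work into two steps. The sample markup is shortened and the href values are placeholders (assumptions), and the sketch assumes no nested tables.

```python
import re

# Shortened sample; the href values are placeholders (assumption).
html = """
<tbody id="text_0"><tr><td class="by"><a href="#">cat</a></td></tr></tbody>
<tbody id="text_1"><tr><td>x</td><td class="by">
<a href="#">apple</a>
</td></tr></tbody>
<tbody id="text_2"><tr><td class="by"><a href="#">banana</a></td></tr></tbody>
<tbody id="text_3"><tr><td class="by"><a href="#">apple</a></td></tr></tbody>
"""

# Step 1: one match per block. The lazy .*? stops at the first </tbody>
# after each opening tag, so a single match cannot span two blocks.
block_re = re.compile(r'<tbody\b[^>]*>.*?</tbody>', re.DOTALL)

# Step 2: test each block on its own for the "apple" link inside td.by.
apple_re = re.compile(
    r'<td class="by"[^>]*>.*?<a\b[^>]*>\s*apple\s*</a>', re.DOTALL
)

apple_blocks = [b for b in block_re.findall(html) if apple_re.search(b)]
ids = [re.search(r'id="([^"]+)"', b).group(1) for b in apple_blocks]
print(ids)
```

The same two-step idea should translate to JavaScript with a global match for the blocks followed by a filter, though that variant is not shown or tested here.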

get values from table with BeautifulSoup Python

I have a table from which I am extracting links and text, although I can only do one or the other. Any idea how to get both?
Essentially I need to pull the text: "TEXT TO EXTRACT HERE"
for tr in rows:
    cols = tr.findAll('td')
    count = len(cols)
    if len(cols) >1:
        third_column = tr.findAll('td')[2].contents
        third_column_text = str(third_column)
        third_columnSoup = BeautifulSoup(third_column_text)
        #issue starts here. How can I get either the text of the elm <td>text here</td> or the href texttext here
        for elm in third_columnSoup.findAll("a"):
            #print elm.text, third_columnSoup
            item = { "code": random.upper(),
                     "name": elm.text }
            items.insert(item )
The HTML Code is the following
<table cellpadding="2" cellspacing="0" id="ListResults">
<tbody>
<tr class="even">
<td colspan="4">sort results: <a href=
"/~/search/af.aspx?some=LOL&Category=All&Page=0&string=&s=a"
rel="nofollow" title=
"sort results in alphabetical order">alphabetical</a> | <strong>rank</strong> ?</td>
</tr>
<tr class="even">
<th>aaa</th>
<th>vvv.</th>
<th>gdfgd</th>
<td></td>
</tr>
<tr class="odd">
<td align="right" width="32">******</td>
<td nowrap width="60"><a href="/aaa.html" title=
"More info and direct link for this meaning...">AAA</a></td>
<td>TEXT TO EXTRACT HERE</td>
<td width="24"></td>
</tr>
<tr class="even">
<td align="right" width="32">******</td>
<td nowrap width="60"><a href="/someLink.html"
title="More info and direct link for this meaning...">AAA</a></td>
<td><a href=
"http://www.fdssfdfdsa.com/aaa">TEXT TO EXTRACT HERE</a></td>
<td width="24">
<a href=
"/~/search/google.aspx?q=lhfjl&f=a&cx=partner-pub-2259206618774155:1712475319&cof=FORID:10&ie=UTF-8"><img border="0"
height="21" src="/~/st/i/find2.gif" width="21"></a>
</td>
</tr>
<tr>
<td width="24"></td>
</tr>
<tr>
<td align="center" colspan="4" style="padding-top:6pt">
<b>Note:</b> We have 5575 other definitions for <strong><a href=
"http://www.ddfsadfsa.com/aaa.html">aaa</a></strong> in our
database</td>
</tr>
</tbody>
</table>
You can just use the text property on a td element:
from bs4 import BeautifulSoup

html = """HERE GOES THE HTML"""
soup = BeautifulSoup(html, 'html.parser')
for tr in soup.find_all('tr'):
    columns = tr.find_all('td')
    if len(columns) > 2:
        print(columns[2].text)
prints:
TEXT TO EXTRACT HERE
TEXT TO EXTRACT HERE
Hope that helps.
Another way to do it is the following:
third_column = tr.find_all('td')[2].contents
third_column_text = str(third_column)
third_columnSoup = BeautifulSoup(third_column_text, 'html.parser')
if third_columnSoup:
    print(third_columnSoup.text)
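
Since the question asks for both the text and the link, here is a minimal sketch that collects the two together for the third cell of each row. It deliberately uses the standard library's html.parser instead of BeautifulSoup so that it runs without third-party packages. The class name ThirdColumnParser and the shortened sample rows are assumptions made for illustration.

```python
from html.parser import HTMLParser


class ThirdColumnParser(HTMLParser):
    """Collect the text and any link target of the third <td> in each row."""

    def __init__(self):
        super().__init__()
        self.items = []          # one dict per row that has a third cell
        self._td_index = 0       # number of <td> tags seen in the current row
        self._in_third = False   # True while inside the third cell
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._td_index = 0
        elif tag == "td":
            self._td_index += 1
            if self._td_index == 3:
                self._in_third = True
                self._current = {"name": "", "href": None}
        elif tag == "a" and self._in_third:
            self._current["href"] = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "td" and self._in_third:
            self._current["name"] = self._current["name"].strip()
            self.items.append(self._current)
            self._in_third = False

    def handle_data(self, data):
        if self._in_third:
            self._current["name"] += data


# Two rows shortened from the table in the question.
html = """<table>
<tr class="odd"><td>******</td><td><a href="/aaa.html">AAA</a></td>
<td>TEXT TO EXTRACT HERE</td><td></td></tr>
<tr class="even"><td>******</td><td><a href="/someLink.html">AAA</a></td>
<td><a href="http://www.fdssfdfdsa.com/aaa">TEXT TO EXTRACT HERE</a></td><td></td></tr>
</table>"""

parser = ThirdColumnParser()
parser.feed(html)
print(parser.items)
```

With BeautifulSoup the same result can be reached by reading the text of the cell and separately looking for an a element inside it; the point of the sketch is only that text and href are gathered per cell rather than one or the other.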

Regular expression with multiple results

What's wrong with my regex ?
"/Blabla\(2\) :.*<tr><td class=\"generic\">(.*)<\/td>.+<\/tr>/Uis"
....
<tr>
<td class="aaa">Blabla(1) :</td>
<td>
<table class="bbb"><tbody>
<tr class="ccc"><th>title1</th><th>title2</th><th>title3</th></tr>
<tr><td class="generic">word1</td><td class="generic">word2 </td><td class="generic">word3</td></tr>
<tr><td class="generic">word4</td><td class="generic">word5 </td><td class="generic">word6</td></tr>
</tbody></table>
</td>
</tr>
<tr>
<td class="aaa">Blabla(2) :</td>
<td>
<table class="bbb"><tbody>
<tr class="ccc"><th>title1</th><th>title2</th><th>title3</th></tr>
<tr><td class="generic">word1b</td><td class="generic">word2b </td><td class="generic">word3b</td></tr>
<tr><td class="generic">word4b</td><td class="generic">word5b </td><td class="generic">word6b</td></tr>
</tbody></table>
</td>
</tr>
What I want to do is get the content of the first TD of each TR in the block beginning with Blabla(2).
So the expected result is word1b and word4b.
But only the first is returned.
Thank you for your help. Please don't tell me to use a DOM navigator; it's not possible in my case.
That's an interesting regex, from which I learned about the ungreedy flag, nice!
As for your problem, you can use \G to match immediately after the previous match, together with the g flag, assuming a PCRE engine:
/(?:Blabla\(2\) :|(?<!^)\G).*<tr><td class=\"generic\">(.*)<\/td>.+<\/tr>/Uisg
regex101 demo
Or a little shorter with different delimiters:
'~(?:Blabla\(2\) :|(?<!^)\G).*<tr><td class="generic">(.*)</td>.+</tr>~Uisg'
Thanks to @Jerry, I learned new tricks today:
(Blabla\(2\) :.*?|\G)<tr><td class=\"generic\">\K([^<]+).+?<\/tr>\r\n
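
Where \G and \K are not available (Python's standard re module, for example, supports neither), the same result can be reached in two passes. The following is a sketch under that assumption, using the markup from the question: first isolate the table that follows the Blabla(2) label, then collect the first cell of every row inside it.

```python
import re

html = """<tr>
<td class="aaa">Blabla(1) :</td>
<td>
<table class="bbb"><tbody>
<tr class="ccc"><th>title1</th><th>title2</th><th>title3</th></tr>
<tr><td class="generic">word1</td><td class="generic">word2 </td><td class="generic">word3</td></tr>
<tr><td class="generic">word4</td><td class="generic">word5 </td><td class="generic">word6</td></tr>
</tbody></table>
</td>
</tr>
<tr>
<td class="aaa">Blabla(2) :</td>
<td>
<table class="bbb"><tbody>
<tr class="ccc"><th>title1</th><th>title2</th><th>title3</th></tr>
<tr><td class="generic">word1b</td><td class="generic">word2b </td><td class="generic">word3b</td></tr>
<tr><td class="generic">word4b</td><td class="generic">word5b </td><td class="generic">word6b</td></tr>
</tbody></table>
</td>
</tr>"""

# Pass 1: cut out everything from the "Blabla(2) :" label to the end of
# the table that follows it.
block = re.search(r'Blabla\(2\) :.*?</table>', html, re.DOTALL)

# Pass 2: inside that block only, take the first generic cell of each row.
# The header row starts with <tr class="ccc"> and so does not match.
first_cells = (
    re.findall(r'<tr><td class="generic">(.*?)</td>', block.group(0))
    if block
    else []
)
print(first_cells)
```

A single pattern returns only one value because the Blabla(2) anchor occurs once; separating the anchor from the repeated part is what lets the second pass find every row.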

Clean html code with a Regex

I am parsing some transactions, for example 3 transactions look like this:
<TR class=DefGVRow>
<TD>29/04/2013</TD>
<TD>DEPOSITO 0140959158</TD>
<TD>0140959158</TD>
<TD align=right>336,00</TD>
<TD align=center>+</TD>
<TD align=right>16.210,60</TD></TR>
<TR class=DefGVAltRow>
<TD>29/04/2013</TD>
<TD>RETIRO ATM CTA/CTE</TD>
<TD>1171029739</TD>
<TD align=right>600,00</TD>
<TD align=center>-</TD>
<TD align=right>15.610,60</TD></TR>
<TR class=DefGVRow>
<TD>29/04/2013</TD>
<TD>C.SERV.CAJERO AUT.</TD>
<TD>1171029739</TD>
<TD align=right>3,25</TD>
<TD align=center>-</TD>
<TD align=right>15.607,35</TD></TR>
And my current Regex is:
<TR class=\w+>
<TD>(?<day>\d{1,2})/(?<month>\d{1,2})/(?<year>\d{4})</TD>
<TD>(?<description>.+?)</TD>
<TD>(?<id>\d{3,30})</TD>
<TD.+?>(?<amount>[\d\.]{1,20},\d{1,10})</TD>
<TD.+?>(?<info>.+?)</TD>
<TD.+?>(?<balance>[\d\.]{1,20},\d{1,10})</TD></TR>
How can I edit the
<TD>(?<description>.+?)</TD>
part to handle an optional tag inside the cell without capturing it? (Basically: how do I ignore the A tag when capturing the group?)
Thanks!
It is a very common problem. Please check this epic answer and stop using regexp to "parse" html, instead use a proper parser and get what you need with XPath or even a CSS selector.
This removes the 'optional' link:
<TR class=\w+>
<TD>(?<day>\d{1,2})/(?<month>\d{1,2})/(?<year>\d{4})</TD>
<TD>(?:<A href="[^>]*>)?(?<description>.+?)(?:</A>)?</TD>
<TD>(?<id>\d{3,30})</TD>
<TD.+?>(?<amount>[\d\.]{1,20},\d{1,10})</TD>
<TD.+?>(?<info>.+?)</TD>
<TD.+?>(?<balance>[\d\.]{1,20},\d{1,10})</TD></TR>
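
As a quick check of the optional-link idea, here is a Python sketch of the same pattern. Python spells named groups (?P<name>...) instead of (?<name>...). The href value in the first row is hypothetical, since the question's sample does not show the link, and the to_decimal helper is an addition for illustration.

```python
import re
from decimal import Decimal

ROW_RE = re.compile(
    r'<TR class=\w+>\s*'
    r'<TD>(?P<day>\d{1,2})/(?P<month>\d{1,2})/(?P<year>\d{4})</TD>\s*'
    # Optional opening <A ...> and closing </A> sit outside the named group.
    r'<TD>(?:<A\b[^>]*>)?(?P<description>.+?)(?:</A>)?</TD>\s*'
    r'<TD>(?P<id>\d{3,30})</TD>\s*'
    r'<TD[^>]*>(?P<amount>[\d.]{1,20},\d{1,10})</TD>\s*'
    r'<TD[^>]*>(?P<info>.+?)</TD>\s*'
    r'<TD[^>]*>(?P<balance>[\d.]{1,20},\d{1,10})</TD></TR>'
)

# First row carries a link (hypothetical href), second row does not.
rows = """<TR class=DefGVRow>
<TD>29/04/2013</TD>
<TD><A href="detail.aspx?id=1">DEPOSITO 0140959158</A></TD>
<TD>0140959158</TD>
<TD align=right>336,00</TD>
<TD align=center>+</TD>
<TD align=right>16.210,60</TD></TR>
<TR class=DefGVAltRow>
<TD>29/04/2013</TD>
<TD>RETIRO ATM CTA/CTE</TD>
<TD>1171029739</TD>
<TD align=right>600,00</TD>
<TD align=center>-</TD>
<TD align=right>15.610,60</TD></TR>"""


def to_decimal(text: str) -> Decimal:
    """Convert '16.210,60' (dot thousands, comma decimals) to a Decimal."""
    return Decimal(text.replace(".", "").replace(",", "."))


parsed = [
    (m.group("description"), to_decimal(m.group("amount")), to_decimal(m.group("balance")))
    for m in ROW_RE.finditer(rows)
]
print(parsed)
```

Using [^>]* for the tag attributes instead of .* keeps each piece of the pattern inside its own tag, so the match does not depend on backtracking to find the right closing bracket.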