using jsoup to modify data

using jsoup to modify data - coldfusion

I have successfully used jsoup to get the HTML from the website, but I am having some trouble displaying the data.
Here is the generated markup:
<tr class="2" id="AS 2238_2022-10-18T08:50:00"> <td id=" Air"> <img src="/webfids/logos/AS.jpg" width="138" height="31" title=" Air" alt=" Air"> </td> <td id="2238"> 2238</td> <td id="Phoenix"> Phoenix</td> <td id="1666108200000"> 8:50A 10-18-22</td> <td id="AS 2238_2022-10-18T08:50:00_status"> <font class="default"> On Time </font></td> <td id="AS 2238_2022-10-18T08:50:00_gate">2A</td> <td id="AS 2238_2022-10-18T08:50:00_terminal"> </td> <td id="AS 2238_2022-10-18T08:50:00_codeShares"> </td> <td id="AS 2238_2022-10-18T08:50:00_CDS"> </td> <td id="marker" style="display: none">0</td> </tr>
I am trying to remove the last TD of every row. There are many rows, so I loop over them.
Here is my code:
rows = TheTable.select("tr");
for ( row in rows ){
    writedump(row.ToString());
    writeoutput('<br><br><br>');
    row.select('##marker').remove();
    row.select("td:eq(0)").attr("rel", "nofollow");
    // writeoutput(image.toString());
}
I am trying to remove the last TD.
I also want to remove the img and just use the text from the img tag, such as its title or alt.

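for ( row in rows ) {
    // get the first image in the row; first() returns null when the row has no img
    image = row.select( "img" ).first();
    if ( !isNull( image ) ) {
        // attr() returns an empty string (not null) for a missing attribute,
        // so test the length instead of relying on the Elvis operator
        imageAlt = trim( image.attr( "alt" ) );
        if ( !len( imageAlt ) ) imageAlt = trim( image.attr( "title" ) );
        // replace the image with the extracted text
        image.parent().appendText( imageAlt );
        image.remove();
    }
    // remove the last column, if the row has any td cells
    lastCell = row.select( "td" ).last();
    if ( !isNull( lastCell ) ) lastCell.remove();
}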

Related

Multidimensional list in Thymeleaf (Java) - List<List<Object>>

Does anyone have a suggestion for how to iterate through a multidimensional list in Thymeleaf?
My multidimensional list is built as follows:
@Override
public List<List<PreferredZone>> findZonesByPosition(List<Position> positionList) {
    List<PreferredZone> prefZone = new ArrayList<>();
    List<List<PreferredZone>> listPrefZone = new ArrayList<>();
    long positionId = 0;
    for (int i = 0; i < positionList.size(); i++) {
        positionId = positionList.get(i).getPositionId();
        prefZone = prefZoneDAO.findFilteredZone(positionId);
        listPrefZone.add(prefZone);
    }
    return listPrefZone;
}
In my controller I add it as a model attribute:
List<List<PreferredZone>> prefZoneList = prefZoneService.findZonesByPosition(positionList);
model.addAllAttributes(prefZoneList);
Finally, I try to iterate over this two-dimensional list in an HTML table:
<table th:each="prefList :#{prefZoneList}" class="table table-striped display hover">
<thead>
<tr>
<th>ISO</th>
<th>Name</th>
<th>Ausschluss</th>
</tr>
</thead>
<!-- Loop over the data -->
<tr th:each="row, iterState :${prefList}" class="clickable-row">
<td th:text="${row[__${iterState.index}__]}.zoneIso"></td>
<td th:text="${row[__${iterState.index}__]}.zoneName"></td>
<td style="text-align:center;">
<input type="checkbox" th:value="${${row[__${iterState.index}__]}.zoneId}" id="zone" class="checkbox-round" />
</td>
</tr>
</table>
It doesn't work, however, and I have no other ideas for solving this.
I need a multidimensional list because I have a table with multiple records, and each record contains a button that opens a modal window. Each of these windows in turn contains an HTML table in which I have to display the records.
Do you have any suggestions?

You have a mistake in #{prefZoneList} and (as noted in the comments) in how you use iterState.index.
Try this:
<table th:each="prefList : ${prefZoneList}" class="table table-striped display hover">
<thead>
<tr>
<th>ISO</th>
<th>Name</th>
<th>Ausschluss</th>
</tr>
</thead>
<tr th:each="row : ${prefList}" class="clickable-row">
<td th:text="${row.zoneIso}"></td>
<td th:text="${row.zoneName}"></td>
<td style="text-align:center;">
<input type="checkbox" th:value="${row.zoneId}" id="zone" class="checkbox-round" />
</td>
</tr>
</table>
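The #{...} syntax is a message expression (used for externalized text), whereas ${...} is a variable expression, which is what you need to read a model attribute.
iterState.index is the current iteration index, starting at 0. It would be used like ${prefList[__${iterState.index}__].element}, where element is a field of the objects in prefList.
Also note that the template can only see prefZoneList if the controller registers it under that name, for example with model.addAttribute("prefZoneList", prefZoneList). model.addAllAttributes(Collection) generates a name for each element instead.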

Selenium Python I want to check an element does not have a value I get the error NoSuchElementException: Message: Unable to find element with xpath

I have an HTML table with some rows and columns. For a given row I can get the value I want from column 3, which has the value "14".
When a user deletes a record from the GUI, I would like to check that 14 is no longer present.
I get the error:
NoSuchElementException: Message: Unable to find element with xpath == //table[@id="reporting_view_report_dg_main_body"]//tr//td[3]/div/span[@title="14"]
My XPATH to find the value is:
usn_id_element = self.get_element(By.XPATH, '//table[@id="reporting_view_report_dg_main_body"]//tr//td[3]/div/span[@title="14"]')
My function routine to check the value is not there is
def is_usn_id_not_displayed_in_all_records_report_results(self, usn_id):
    # When a record has been disconnected call this method to check the record for usn id is not there anymore.
    usn_id_element = self.get_element(By.XPATH, '//table[@id="reporting_view_report_dg_main_body"]//tr//td[3]/div/span[@title="14"]')
    print "usn_id_element"
    print usn_id_element
    print usn_id_element.text
    if usn_id not in usn_id_element:
        return True
get_element routine:
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
# returns the element if found
def get_element(self, how, what):
    # params how: By locator type
    # params what: locator value
    try:
        element = self.driver.find_element(by=how, value=what)
    except NoSuchElementException, e:
        print what
        print "Element not found "
        print e
        # Build the screenshot name from the locator type, the locator value and the current date and time,
        # so that the name is unique and the screenshot can be saved
        screenshot_name = how + what + get_datetime_now()
        self.save_screenshot(screenshot_name)
        raise
    return element
The HTML snippet is:
<table id="reporting_view_report_dg_main_body" cellspacing="0" style="table-layout: fixed; width: 100%; margin-bottom: 17px;">
<colgroup>
<tbody>
<tr class="GFNQNVHJM" __gwt_subrow="0" __gwt_row="0"\>
<tr class="GFNQNVHIN" __gwt_subrow="0" __gwt_row="1"\>
<div __gwt_cell="cell-gwt-uid-9530" style="outline-style:none;">
<span title="14" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;padding-right: 1px;">14</span>
</div>
</td>
<td class="GFNQNVHIM GFNQNVHKM"\>
<td class="GFNQNVHIM GFNQNVHKM"\>
</tr>
<tr class="GFNQNVHIN" __gwt_subrow="0" __gwt_row="13">
<td class="GFNQNVHIM GFNQNVHJN GFNQNVHLM">
<td class="GFNQNVHIM GFNQNVHJN">
<td class="GFNQNVHIM GFNQNVHJN">
<div __gwt_cell="cell-gwt-uid-9530" style="outline-style:none;">
<span class="" title="14" style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;display:block;padding-right: 1px;">14</span>
</div>
</td>
<td class="GFNQNVHIM GFNQNVHJN"\>
<td class="GFNQNVHIM GFNQNVHJN"\>
</tr>
<tr class="GFNQNVHJM" __gwt_subrow="0" __gwt_row="14"\>
<tr class="GFNQNVHIN" __gwt_subrow="0" __gwt_row="15"\>
</tbody>
</table>
How can I check that the value is not there?
Thanks, Riaz

Right now you are checking that the attribute 'title' has a value of 14, not the contents of the cell. What happens after the delete occurs? Does the span remain in the cell? Does the value of the cell become blank, and does the value of the attribute 'title' also become blank?
The XPath below checks that the value of the cell is blank after deletion, assuming you get a blank cell after deletion:
"//table[@id='reporting_view_report_dg_main_body']//tr//td[3]/div/span[.='']"
If you want to check the value of title after deletion:
"//table[@id='reporting_view_report_dg_main_body']//tr//td[3]/div/span[not(@title='14')]"
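If the row disappears entirely after deletion, another common pattern is to ask for all matches and check that there are none, because find_elements (plural) returns an empty list instead of raising NoSuchElementException. The following is only a sketch under that assumption: the function name is made up for illustration, driver is assumed to be a Selenium WebDriver, and the locator strategy is passed as the plain string "xpath" (the value of By.XPATH) so the snippet needs no imports.

```python
def is_usn_id_absent(driver, usn_id,
                     table_id="reporting_view_report_dg_main_body"):
    """Return True when no span in column 3 of the table has title == usn_id.

    `driver` is assumed to be a Selenium WebDriver (hypothetical helper name).
    """
    xpath = (
        '//table[@id="%s"]//tr//td[3]/div/span[@title="%s"]'
        % (table_id, usn_id)
    )
    # find_elements returns [] when nothing matches, so no exception handling
    # is needed to detect absence.
    matches = driver.find_elements("xpath", xpath)
    return len(matches) == 0
```

If the grid refreshes asynchronously after the delete, the check may need to be retried or wrapped in an explicit wait before it is trusted.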

vb.net webbrowser regex extract to textbox

I want to extract these numbers from the page loaded in the WebBrowser control:
<tr style="font-size: 14pt;">
<td align="center">1</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">4</td>
<td align="center">5</td>
</tr>
The desired result is textbox.text = 12345

You can do it using regex, but that is not recommended. Extract it like this instead:
Dim elemcol As HtmlElementCollection = Webbrowser1.Document.GetElementsByTagName("td")
For i As Integer = 0 To (elemcol.Count - 1)
    Textbox1.Text &= elemcol(i).InnerHTML ' here do whatever you want with its content
Next i
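The same idea (walk the td elements with an HTML parser and join their contents, rather than using a regex) can be sketched outside VB.NET as well. The snippet below is a Python illustration using only the standard library's html.parser; the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class CellTextCollector(HTMLParser):
    """Collects the text found inside every <td> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a <td>
        self.cells = []   # one string per cell

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.depth += 1
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.cells[-1] += data


def join_cell_text(html):
    """Concatenate the stripped text of all td cells in document order."""
    parser = CellTextCollector()
    parser.feed(html)
    parser.close()
    return "".join(cell.strip() for cell in parser.cells)


sample = """<tr style="font-size: 14pt;">
<td align="center">1</td>
<td align="center">2</td>
<td align="center">3</td>
<td align="center">4</td>
<td align="center">5</td>
</tr>"""

print(join_cell_text(sample))
```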

get values from table with BeautifulSoup Python

I have a table from which I am extracting links and text, but I can only get one or the other. Any idea how to get both?
Essentially I need to pull the text "TEXT TO EXTRACT HERE".
for tr in rows:
    cols = tr.findAll('td')
    count = len(cols)
    if len(cols) >1:
        third_column = tr.findAll('td')[2].contents
        third_column_text = str(third_column)
        third_columnSoup = BeautifulSoup(third_column_text)
        # Issue starts here: how can I get either the text of the cell (<td>text here</td>) or the text of the link inside it?
        for elm in third_columnSoup.findAll("a"):
            #print elm.text, third_columnSoup
            item = { "code": random.upper(),
                     "name": elm.text }
            items.insert(item )
The HTML Code is the following
<table cellpadding="2" cellspacing="0" id="ListResults">
<tbody>
<tr class="even">
<td colspan="4">sort results: <a href=
"/~/search/af.aspx?some=LOL&Category=All&Page=0&string=&s=a"
rel="nofollow" title=
"sort results in alphabetical order">alphabetical</a> | <strong>rank</strong> ?</td>
</tr>
<tr class="even">
<th>aaa</th>
<th>vvv.</th>
<th>gdfgd</th>
<td></td>
</tr>
<tr class="odd">
<td align="right" width="32">******</td>
<td nowrap width="60"><a href="/aaa.html" title=
"More info and direct link for this meaning...">AAA</a></td>
<td>TEXT TO EXTRACT HERE</td>
<td width="24"></td>
</tr>
<tr class="even">
<td align="right" width="32">******</td>
<td nowrap width="60"><a href="/someLink.html"
title="More info and direct link for this meaning...">AAA</a></td>
<td><a href=
"http://www.fdssfdfdsa.com/aaa">TEXT TO EXTRACT HERE</a></td>
<td width="24">
<a href=
"/~/search/google.aspx?q=lhfjl&f=a&cx=partner-pub-2259206618774155:1712475319&cof=FORID:10&ie=UTF-8"><img border="0"
height="21" src="/~/st/i/find2.gif" width="21"></a>
</td>
</tr>
<tr>
<td width="24"></td>
</tr>
<tr>
<td align="center" colspan="4" style="padding-top:6pt">
<b>Note:</b> We have 5575 other definitions for <strong><a href=
"http://www.ddfsadfsa.com/aaa.html">aaa</a></strong> in our
database</td>
</tr>
</tbody>
</table>

You can just use the text property on a td element:
from bs4 import BeautifulSoup
html = """HERE GOES THE HTML"""
soup = BeautifulSoup(html, 'html.parser')
for tr in soup.find_all('tr'):
    columns = tr.find_all('td')
    # only rows with at least three cells have a third column
    if len(columns) > 2:
        print(columns[2].text)
prints:
TEXT TO EXTRACT HERE
TEXT TO EXTRACT HERE
Hope that helps.

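The way to do it is by doing the following:
third_column = tr.find_all('td')[2].contents
# join the child nodes into one markup string (str() on the list itself would add brackets)
third_column_text = "".join(str(node) for node in third_column)
third_columnSoup = BeautifulSoup(third_column_text, 'html.parser')
if third_columnSoup:
    print(third_columnSoup.text)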

How to extract data by matching a variable with the tag value in python

<!-- This is the first table, from which I get 4 ids (abc1 to abc4) that I need to match against the table below to get the required data -->
<table width="100%" border="0" class="BigClass">
<tbody>..</tbody>
</table>
<!-- This is the second table -->
<table width="100%" border="0" class="BigClass">
<tbody>
<tr align="left">
<td valign="top" colspan="2">
<strong> 1.
First Topic
</strong>
<a name="abc1" id="abc1"></a>
</td>
</tr>
<!-- This is the place where the first speaker and his/her text comes -->
<tr align="left">
<td style="text-align:justify;line-height:2;padding-right:10px;" colspan="2">
<strong> " First Speaker " </strong>
<br>
" Some Text "
</td>
</tr>
<!-- This is where the second speaker comes in -->
<tr align="left">
<td style="text-align:justify;line-height:2;padding-right:10px;" colspan="2">
<strong> " Second Speaker " </strong>
<br>
" Some Text "
</td>
</tr>
<tr><td colspan="2"><br></td></tr>
<tr><td colspan="2"><br></td></tr>
<!-- Then here comes the row with another id -->
<tr align="left">
<td valign="top" colspan="2">
<strong> 2.
Second Topic
</strong>
<a name="abc2" id="abc2"></a>
</td>
</tr>
<!-- Just like before, this will also have a set of speakers who have some text -->
I have two tables with the same class name, BigClass. From the first table I extracted 4 ids: abc1, abc2, abc3 and abc4.
Now I want to check whether these ids are present in the second table (which they are).
Once an id matches in the second table, I want to extract the speakers and the text of those speakers.
You can see the structure of the second table, from which I want to extract the data, above.

It seems the best way to extract the speaker and text information is to collect all ids in one list and all speaker info in another list, then cross-reference the ids needed and get the corresponding speaker info.
Here I pair each id with a speaker entry by position, so the id acts as the key and the speaker info as the value. I find the speaker info by relying on the fact that every td containing speaker info has a style attribute defined.
For extracting info from HTML, I am using the BeautifulSoup library.
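from bs4 import BeautifulSoup
# Parse the saved page; 'table.html' is the file holding the HTML above
with open('table.html') as f:
    soup = BeautifulSoup(f, 'html.parser')
idList = []
speakerList = []
idsRequired = ['abc1', 'abc2']
# Collect the id of every anchor that has one
for a in soup.find_all('a'):
    if 'id' in a.attrs:
        idList.append(a.attrs['id'])
# Speaker cells are the only td elements that carry a style attribute
for td in soup.find_all('td'):
    if 'style' in td.attrs:
        speakerList.append(td.text)
# Pair ids and speaker cells by position and print the ones requested
for key, value in zip(idList, speakerList):
    if key in idsRequired:
        print(value)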
This gives me the output as:
" First speaker "
" Some text "
" Second speaker "
" Some text "
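Pairing the two lists by position assumes one speaker row per id, whereas in the question's markup a topic row can be followed by several speaker rows. One possible alternative is to walk the document in order and file each styled td under the most recently seen anchor id. The sketch below does that with the standard library's html.parser instead of BeautifulSoup, so it runs without extra packages; the class name, function name and sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class SpeakerCollector(HTMLParser):
    """Groups the text of styled <td> cells under the last <a id=...> seen."""

    def __init__(self):
        super().__init__()
        self.current_id = None      # id of the most recent <a id="...">
        self.in_speaker_td = False  # True while inside a styled <td>
        self.by_id = {}             # topic id -> list of speaker-cell texts

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "id" in attrs:
            self.current_id = attrs["id"]
            self.by_id.setdefault(self.current_id, [])
        elif tag == "td" and "style" in attrs and self.current_id is not None:
            self.in_speaker_td = True
            self.by_id[self.current_id].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_speaker_td = False

    def handle_data(self, data):
        if self.in_speaker_td:
            # Trailing space keeps text from adjacent tags apart.
            self.by_id[self.current_id][-1] += data + " "


def speakers_for(html, ids_required):
    """Return {id: [speaker cell text, ...]} for the requested ids."""
    parser = SpeakerCollector()
    parser.feed(html)
    parser.close()
    return {
        key: [" ".join(text.split()) for text in parser.by_id.get(key, [])]
        for key in ids_required
    }


sample = """
<table>
<tr><td colspan="2"><strong>1. First Topic</strong><a name="abc1" id="abc1"></a></td></tr>
<tr><td style="line-height:2;" colspan="2"><strong>First Speaker</strong><br>Some Text</td></tr>
<tr><td style="line-height:2;" colspan="2"><strong>Second Speaker</strong><br>Other Text</td></tr>
<tr><td colspan="2"><strong>2. Second Topic</strong><a name="abc2" id="abc2"></a></td></tr>
<tr><td style="line-height:2;" colspan="2"><strong>Third Speaker</strong><br>More Text</td></tr>
</table>
"""

print(speakers_for(sample, ["abc1"]))
```

With this grouping, an id that is requested but never appears in the table simply maps to an empty list.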
