I have an HTML table like this:
<table style="width:100%">
<tr>
<td class="country">Germany</td>
</tr>
<tr>
<td class="city">Berlin</td>
</tr>
<tr>
<td class="city">Cologne</td>
</tr>
<tr>
<td class="city">Munich</td>
</tr>
<tr>
<td class="country">France</td>
</tr>
<tr>
<td class="city">Paris</td>
</tr>
<tr>
<td class="country">USA</td>
</tr>
<tr>
<td class="city">New York</td>
</tr>
<tr>
<td class="city">Las Vegas</td>
</tr>
</table>
From this table, I want to generate objects of the classes Country and City, where a Country has a List of Cities.
Now to the problem:
It's easy to write a regex that gets all countries and all cities, but I wonder whether I can get a group for the cities that repeats until the next country starts. I need this because I can't work out programmatically which city belongs to which country if they come back as separate regex matches.
It should look something like this (quick-and-dirty solution):
country">([\w]*)<{.*\n.*\n.*\n.*"city">([\w]*)}
The part in curly braces should repeat until the next country item shows up.
If you have a completely different idea for getting objects out of an HTML table in C#, let me know!
Thanks in advance!
I agree that for any non-trivial HTML, an HTML parser such as HtmlAgilityPack should be used. That said, if your HTML is as simple as the snippet above, this works, even if there are multiple line breaks in the string:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
string HTML = @"
<table style='width:100%'>
<tr><td class='country'>Germany</td></tr>
<tr><td class='city'>Berlin</td></tr>
<tr><td class='city'>Cologne</td></tr>
<tr><td class='city'>Munich</td></tr>
<tr><td class='country'>France</td></tr>
<tr><td class='city'>Paris</td></tr>
<tr><td class='country'>USA</td></tr>
<tr><td class='city'>New York</td></tr>
<tr><td class='city'>Las Vegas</td></tr>
</table>";
var regex = new Regex(
@"
class=[^>]*?
(?<class>[-\w\d_]+)
[^>]*>
(?<text>[^<]+)
<
",
RegexOptions.Compiled | RegexOptions.IgnoreCase
| RegexOptions.IgnorePatternWhitespace
);
var country = string.Empty;
var Countries = new Dictionary<string, List<string>>();
foreach (Match match in regex.Matches(HTML))
{
string countryCity = match.Groups["class"].Value.Trim();
string text = match.Groups["text"].Value.Trim();
if (countryCity.Equals("country", StringComparison.OrdinalIgnoreCase))
{
country = text;
Countries.Add(text, new List<string>());
}
else
{
Countries[country].Add(text);
}
}
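The same single-pass idea (remember the most recent country and attach every following city to it) also works with a real HTML parser instead of a regex. Below is a minimal sketch in Python rather than C#, using the standard library's html.parser instead of HtmlAgilityPack. The class name CountryCityParser and its attributes are made up for illustration, and the sketch assumes every city row comes after the row of the country it belongs to.

```python
from html.parser import HTMLParser


class CountryCityParser(HTMLParser):
    """Hypothetical helper: groups city cells under the preceding country cell."""

    def __init__(self):
        super().__init__()
        self.countries = {}            # country name -> list of city names
        self._cell_class = None        # class of the <td> currently open, if any
        self._current_country = None   # most recent country seen so far

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._cell_class = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell_class = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._cell_class is None:
            return  # whitespace between tags, or text outside a <td>
        if self._cell_class == "country":
            self._current_country = text
            self.countries[text] = []
        elif self._cell_class == "city" and self._current_country is not None:
            self.countries[self._current_country].append(text)


html = """
<table style="width:100%">
<tr><td class="country">Germany</td></tr>
<tr><td class="city">Berlin</td></tr>
<tr><td class="city">Cologne</td></tr>
<tr><td class="city">Munich</td></tr>
<tr><td class="country">France</td></tr>
<tr><td class="city">Paris</td></tr>
<tr><td class="country">USA</td></tr>
<tr><td class="city">New York</td></tr>
<tr><td class="city">Las Vegas</td></tr>
</table>
"""

parser = CountryCityParser()
parser.feed(html)
for country, cities in parser.countries.items():
    print(country, cities)
```

Because the grouping depends only on document order, no pattern has to span several rows, which is what made the regex in the question awkward.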
I have a table on an index page in my Razor Pages app. If the Applicant Last Name is null (or the field is empty in the database), I would like the Applicant Company Name to be placed in the cell instead.
I'm just not getting the syntax right, though. Here is what I currently have. Note: the comma between @obj.ApplicantLName and @obj.ApplicantFName seems to be an issue as well. I would like this to look like: Doe, John
Currently the results do not show any Applicant Company Names, and @obj.ApplicantLName @obj.ApplicantFName renders as DoeJohn (without the space).
<table id="ReferralTable" class="table table-bordered table-striped" style="width:100%">
<thead>
<tr>
<th>
<a asp-page="./Index" asp-route-sortOrder="@Model.RefNoCompleteSort">Referral No</a> / Tax Map No
</th>
<th>
Municipality
</th>
<th>
Referring Board
</th>
<th>
Applicant
</th>
<th>
Application Type - Class
</th>
</tr>
</thead>
<tbody>
@foreach(var obj in Model.Referral)
{
<tr>
<td width="15%">@obj.RefNoComplete</td>
<td width="30%">@obj.RefMunicipality</td>
<td width="20%">@obj.RefAgencyName</td>
<td width="20%">
@if(@obj.ApplicantLName is null)
{
@obj.ApplicantCompany
}
else
{
@obj.ApplicantLName, @obj.ApplicantFName
}
</td>
<td width="15%">@obj.ApplicationType - @obj.CurrentClass</td>
</tr>
<tr>
<td colspan="1">@obj.TaxMapNo</td>
<td colspan="4">@obj.Comments</td>
</tr>
</tr>
}
</tbody>
</table>
For your code to work, change it as shown below. Inside a code block Razor parses the content as C#, so the line containing the comma has to be wrapped in <text> to be emitted as markup:
@if (obj.ApplicantLName is null)
{
@obj.ApplicantCompany
}
else
{
<text>@obj.ApplicantLName, @obj.ApplicantFName</text>
}
Or you can replace it with the following one-liner:
@(obj.ApplicantLName is null ? obj.ApplicantCompany : obj.ApplicantLName + ", " + obj.ApplicantFName)
I am working with this data in Apache OpenOffice 4.1.2. My goal is a VLOOKUP that pulls data from Sheet1 into Sheet2 based on the pn column. Here is the formula I have now, but it is not lining up correctly. Any suggestions or corrections are welcome.
In Sheet2 I am using this:
=VLOOKUP(A2; Sheet1.A2:Sheet1.C500; 2; 1)
From what I understand, I'm expecting the formula to return the value in the name column of Sheet1 for the row whose pn column matches, across all 500 rows.
Sheet1
<table>
<thead>
<tr>
<th>code</th>
<th>name</th>
<th>pn</th>
</tr>
</thead>
<tbody>
<tr>
<td>111</td>
<td>one</td>
<td>101</td>
</tr>
<tr>
<td>112</td>
<td>two</td>
<td>102</td>
</tr>
</tbody>
</table>
Sheet2
<table>
<thead>
<tr>
<th>pn</th>
<th>qty</th>
<th>cur</th>
</tr>
</thead>
<tbody>
<tr>
<td>102</td>
<td>200</td>
<td>$</td>
</tr>
<tr>
<td>101</td>
<td>150</td>
<td>$</td>
</tr>
</tbody>
</table>
Does anyone have a suggestion on how to iterate through a multidimensional list in Thymeleaf?
My multidimensional list is built as follows:
@Override
public List<List<PreferredZone>> findZonesByPosition(List<Position> positionList) {
List <PreferredZone> prefZone = new ArrayList<>();
List<List<PreferredZone>> listPrefZone = new ArrayList<>();
long positionId = 0;
for (int i = 0; i < positionList.size(); i++) {
positionId = positionList.get(i).getPositionId();
prefZone = prefZoneDAO.findFilteredZone(positionId);
listPrefZone.add(prefZone);
}
return listPrefZone;
}
In my controller, I add it as an attribute:
List<List<PreferredZone>> prefZoneList = prefZoneService.findZonesByPosition(positionList);
model.addAllAttributes(prefZoneList);
Finally, I try to iterate over this two-dimensional list in an HTML table:
<table th:each="prefList :#{prefZoneList}" class="table table-striped display hover">
<thead>
<tr>
<th>ISO</th>
<th>Name</th>
<th>Ausschluss</th>
</tr>
</thead>
<!-- Loop over the data -->
<tr th:each="row, iterState :${prefList}" class="clickable-row">
<td th:text="${row[__${iterState.index}__]}.zoneIso"></td>
<td th:text="${row[__${iterState.index}__]}.zoneName"></td>
<td style="text-align:center;">
<input type="checkbox" th:value="${${row[__${iterState.index}__]}.zoneId}" id="zone" class="checkbox-round" />
</td>
</tr>
</table>
However, it doesn't work, and I have no other idea how to solve this.
I need a multidimensional list because I have a table with multiple records, and each record contains a button that opens a modal window. Each of these windows in turn contains an HTML table in which I have to display the records.
Do you have any suggestions for me?
You have a mistake in #{prefZoneList} and (as noted in the comments) in how you use iterState.index.
Try this:
<table th:each="prefList : ${prefZoneList}" class="table table-striped display hover">
<thead>
<tr>
<th>ISO</th>
<th>Name</th>
<th>Ausschluss</th>
</tr>
</thead>
<tr th:each="row : ${prefList}" class="clickable-row">
<td th:text="${row.zoneIso}"></td>
<td th:text="${row.zoneName}"></td>
<td style="text-align:center;">
<input type="checkbox" th:value="${row.zoneId}" id="zone" class="checkbox-round" />
</td>
</tr>
</table>
The #{...} syntax is a message expression (for externalized texts); to read a model variable you need ${...}.
iterState.index is the current iteration index, starting at 0. It would be used like ${prefList[__${iterState.index}__].element}, where element is a field of the items in prefList.
I have a table from which I am extracting links and text, but I can only get one or the other. Any idea how to get both?
Essentially, I need to pull the text: "TEXT TO EXTRACT HERE"
for tr in rows:
    cols = tr.findAll('td')
    count = len(cols)
    if len(cols) > 1:
        third_column = tr.findAll('td')[2].contents
        third_column_text = str(third_column)
        third_columnSoup = BeautifulSoup(third_column_text)
        # The issue starts here: how can I get the text of either <td>text here</td> or <td><a href="...">text here</a></td>?
        for elm in third_columnSoup.findAll("a"):
            # print elm.text, third_columnSoup
            item = { "code": random.upper(),
                     "name": elm.text }
            items.insert(item )
The HTML is the following:
<table cellpadding="2" cellspacing="0" id="ListResults">
<tbody>
<tr class="even">
<td colspan="4">sort results: <a href=
"/~/search/af.aspx?some=LOL&Category=All&Page=0&string=&s=a"
rel="nofollow" title=
"sort results in alphabetical order">alphabetical</a> | <strong>rank</strong> ?</td>
</tr>
<tr class="even">
<th>aaa</th>
<th>vvv.</th>
<th>gdfgd</th>
<td></td>
</tr>
<tr class="odd">
<td align="right" width="32">******</td>
<td nowrap width="60"><a href="/aaa.html" title=
"More info and direct link for this meaning...">AAA</a></td>
<td>TEXT TO EXTRACT HERE</td>
<td width="24"></td>
</tr>
<tr class="even">
<td align="right" width="32">******</td>
<td nowrap width="60"><a href="/someLink.html"
title="More info and direct link for this meaning...">AAA</a></td>
<td><a href=
"http://www.fdssfdfdsa.com/aaa">TEXT TO EXTRACT HERE</a></td>
<td width="24">
<a href=
"/~/search/google.aspx?q=lhfjl&f=a&cx=partner-pub-2259206618774155:1712475319&cof=FORID:10&ie=UTF-8"><img border="0"
height="21" src="/~/st/i/find2.gif" width="21"></a>
</td>
</tr>
<tr>
<td width="24"></td>
</tr>
<tr>
<td align="center" colspan="4" style="padding-top:6pt">
<b>Note:</b> We have 5575 other definitions for <strong><a href=
"http://www.ddfsadfsa.com/aaa.html">aaa</a></strong> in our
database</td>
</tr>
</tbody>
</table>
You can just use the text property on a td element:
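from bs4 import BeautifulSoup

html = """HERE GOES THE HTML"""
soup = BeautifulSoup(html, 'html.parser')

for tr in soup.find_all('tr'):
    columns = tr.find_all('td')
    # Skip header/footer rows that have fewer than three <td> cells
    if len(columns) > 2:
        # .text returns the cell's text whether or not it is wrapped in <a>
        print(columns[2].text)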
prints:
TEXT TO EXTRACT HERE
TEXT TO EXTRACT HERE
Hope that helps.
One way to do it is the following:
third_column = tr.find_all('td')[2].contents
third_column_text = str(third_column)
third_columnSoup = BeautifulSoup(third_column_text)
if third_columnSoup:
    print(third_columnSoup.text)
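To get both the text and the link in one pass, the third cell of each row can be recorded together with the href of the first <a> inside it, if there is one. The sketch below is not BeautifulSoup: it swaps in the standard library's html.parser so that it runs without third-party packages. The class name ThirdColumnParser is hypothetical, and the HTML is a reduced version of the two data rows from the question.

```python
from html.parser import HTMLParser


class ThirdColumnParser(HTMLParser):
    """Hypothetical helper: records (text, href) for the third <td> of each row."""

    def __init__(self):
        super().__init__()
        self.results = []    # one (text, href-or-None) tuple per qualifying row
        self._cells = None   # cells of the row being parsed; None outside <tr>
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._cells.append({"text": [], "href": None})
            self._in_td = True
        elif tag == "a" and self._in_td:
            cell = self._cells[-1]
            if cell["href"] is None:  # keep only the first link in the cell
                cell["href"] = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._cells is not None:
            # Same guard as in the answers above: only rows with a third cell
            if len(self._cells) > 2:
                third = self._cells[2]
                text = "".join(third["text"]).strip()
                self.results.append((text, third["href"]))
            self._cells = None
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._cells[-1]["text"].append(data)


html = """
<table id="ListResults">
<tr class="odd">
<td>******</td>
<td><a href="/aaa.html">AAA</a></td>
<td>TEXT TO EXTRACT HERE</td>
<td></td>
</tr>
<tr class="even">
<td>******</td>
<td><a href="/someLink.html">AAA</a></td>
<td><a href="http://www.fdssfdfdsa.com/aaa">TEXT TO EXTRACT HERE</a></td>
<td></td>
</tr>
</table>
"""

parser = ThirdColumnParser()
parser.feed(html)
for text, href in parser.results:
    print(text, href)
```

With BeautifulSoup itself, roughly the same pair should be obtainable from columns[2].get_text() together with columns[2].find('a'), which returns None when the cell contains no link.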
I have a ColdFusion query whose result is grouped by country name. When a country name is clicked, I want to open or close the list under that country. But I cannot get siblings() and parents() to work correctly. As a result, if I click on a country name (the fourth one, for example), it closes all the child rows and also the three country names before it.
Can someone help me choose the right selectors?
Thank you in advance,
Michel
The code:
<script type="text/javascript" language="javascript">
$(document).ready(function(){
var toggleMinus = '<cfoutput>#variables.strWebAddress#</cfoutput>/images/bullet_toggle_minus.png';
var togglePlus = '<cfoutput>#variables.strWebAddress#</cfoutput>/images/bullet_toggle_plus.png';
var $subHead = $('table#categorylist tbody th:first-child');
$subHead.prepend('<img src="' +toggleMinus+ '" alt="collapse this section" /> ');
$('img', $subHead).addClass('clickable').click(function(){
var toggleSrc = $(this).attr('src');
if(toggleSrc == toggleMinus){
$(this).attr('src',togglePlus).parents('.country').siblings().fadeOut('fast');
}else{
$(this).attr('src',toggleMinus).parents('.country').siblings().fadeIn('fast');
}
});
});
</script>
<table width="95%" border="0" cellspacing="2" cellpadding="2" align="center" id="categorylist">
<thead>
<tr>
<th class="text3" width="15%">
<cfmodule template="../custom_tags/get_message.cfm" keyName="L_ACTOR_CODENUMBER">
</th>
<th class="text3" width="15%">
<cfmodule template="../custom_tags/get_message.cfm" keyName="L_ACTOR_CODE">
</th>
<th class="text3" width="55%">
<cfmodule template="../custom_tags/get_message.cfm" keyName="L_ACTOR_NAME">
</th>
<th class="text3" width="15%">
<cfmodule template="../custom_tags/get_message.cfm" keyName="L_ACTIVE">
</th>
</tr>
</thead>
<tbody id="content">
<cfoutput query="qryCategoryUrl" group="country_name" groupcasesensitive="false">
<tr class="country">
<th style="font-weight:bold; text-align:left;" colspan="4">#country_name#</th>
</tr>
<cfoutput>
<tr>
<td valign="top" class="text3">#Replace(ACTOR_CODENUMBER, Chr(13) & Chr(10), "<br>", "ALL")# </td>
<td valign="top" class="text3">#Replace(ACTOR_CODE, Chr(13) & Chr(10), "<br>", "ALL")# </td>
<td valign="top" class="text3">#Replace(ACTOR_NAME, Chr(13) & Chr(10), "<br>", "ALL")# </td>
<td valign="top" class="text3"><cfmodule template="../custom_tags//get_message.cfm" keyName="#ACTIVE_display(qryCategoryUrl.ACTIVE)#"></td>
</tr>
</cfoutput>
</cfoutput>
</tbody>
</table>
Instead of:
.parents('.country').siblings().fadeOut('fast');
Try this:
.closest('.country').nextUntil('.country').fadeOut('fast');
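The difference is that .siblings() selects every other row in the tbody, whereas .nextUntil('.country') selects only the rows that follow the clicked country row, stopping before the next country row. And of course, apply the same change to the .fadeIn() call. You might also look into .fadeToggle().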
Here's a (reduced) example: http://jsfiddle.net/redler/5sqJz/. While it doesn't affect the example, presumably you would be setting the initial state of those detail rows as hidden.
Whoa, that is a lot of cfmodule usage; cfmodule can be a memory hog.
What I always recommend, though, is that people open their pages in whatever browser they use and try the SelectorGadget bookmarklet at http://www.selectorgadget.com/
This makes it easier to test and check the correct selector for your app's needs.