Beautifulsoup get the value of hrefs separated by commas - python-2.7

sup2 = soup2.find_all("div", {"class": "xxxxxxx"})
When I use find_all on a div, I get the following result:
<div class="xxxxxxx" data-reactid="37">aa , bb </div>
How can I get the href values of the links separated by commas?

Iterate over the Tag elements in sup2 and read the 'href' attribute of each nested <a> tag, e.g.:
hrefs = [a['href'] for tag in sup2 for a in tag.find_all('a')]
Using css selectors:
hrefs = [tag['href'] for tag in soup2.select("div.xxxxxxx a")]

Related

How to extract href value from an html response in postman

I have been trying to figure out how to extract the value of an href attribute from an HTML response and have not had any luck.
I have the following response:
<body id="bodytag" class="taskTab">
<script></script>
<div id="downloads">
<div class="files">Files.pdf
</div>
</div>
</body>
I have gathered that I can use cheerio to load the HTML and potentially get the value, but the only thing I have managed to get is the text Files.pdf. What I need is the path in the href attribute so that I can store it in a variable to use in a subsequent request.
This is just one example of what I have tried:
const $ = cheerio.load(pm.response.text());
console.log($('.files', '#downloads').text());
I also tried to use xpath without any luck. Any help would be greatly appreciated.
Try
const $ = cheerio.load(pm.response.text());
console.log($('.files').attr('href'));
This should return the href of the element. See the cheerio documentation for .attr().
I was close:
const $ = cheerio.load(pm.response.text());
var href = $('.files a').attr('href');
pm.environment.set('downloadLink', href);

Yesod Hamlet breaks HTML by replacing single quotes with double quotes

I have some HTML code that I'm using in Hamlet:
<div .modal-card .card data-options='{"valueNames": ["name"]}' data-toggle="lists">
Notice that the single quotes around the data-options value allow double quotes inside the string.
The problem is that when Hamlet renders the page, Hamlet puts " around the ' and so the HTML is broken:
<div class="modal-card card" data-options="'{" valuenames":"="" ["name"]}'="" data-toggle="lists">
When some external JS plugin code runs, it tries to parse the JSON inside data-options and fails.
How can I tell Hamlet to include a literal string?
I've tried various combinations of:
let theString = "{\"valueNames\": [\"name\"]}"
let theString2 = "data-options='{\"valueNames\": [\"name\"]}'"
etc
And in the hamlet file:
<div .modal-card .card data-options='#{ preEscapedText theString }' data-toggle="lists">
or
<div .modal-card .card #{ preEscapedText theString2 } data-toggle="lists">
But all attempts produce invalid HTML or invalid JSON inside the string.
How can I instruct Hamlet to simply include a literal string in the output HTML?
Update:
Tried more things, no result.
The theString2 example doesn't work because Hamlet seems to think that I'm trying to set id="{" as per https://www.yesodweb.com/book/shakespearean-templates#shakespearean-templates_attributes
Why not render the JSON escaped (" becomes &quot;) and “handle” the quotes later when parsing?
Interpolate in Hamlet:
<div #the-modal .modal-card .card data-options='#{theString}' data-toggle="lists">
Parse the data attribute as JSON:
let json = document.getElementById("the-modal").getAttribute("data-options");
let opts = JSON.parse(json); // At least in Chrome, it works!
As an alternative to theString2, you can also interpolate attributes in Hamlet using a tuple or list of tuples and the star symbol:
let dataOptions = ("data-options", "{\"valueNames\": [\"name\"]}") :: (Text, Text)
...
<div #the-modal .modal-card .card *{dataOptions} data-toggle="lists">
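The escape-then-parse round trip suggested in this answer can be checked outside the browser. The sketch below is not Hamlet or JavaScript; it uses Python's standard library to mimic the two steps, on the assumption that the template engine HTML-escapes the interpolated value and that the HTML parser decodes entities when the attribute is read back. The variable names are illustrative only.

```python
import html
import json

# The options the JS plugin expects to find in data-options.
options = {"valueNames": ["name"]}

# Step 1: what an escaping template engine would emit inside the attribute.
# Double quotes become &quot;, so the attribute stays well-formed no matter
# which quote character delimits it in the markup.
raw_json = json.dumps(options)
attribute_value = html.escape(raw_json, quote=True)

# Step 2: an HTML parser decodes entities when the attribute is read,
# so the consumer sees the original JSON text again and can parse it.
decoded = html.unescape(attribute_value)
parsed = json.loads(decoded)

print(attribute_value)
print(parsed)
```

The point is that the escaped form in the page source looks wrong to a human reader but is the correct encoding; only the decoded attribute value needs to be valid JSON.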

Selenium XPATH how to get text from Span tag underneath the input id tag

I have the following html snippet:
<div>
<span class="gwt-InlineLabel myinlineblock" style="display: none;" aria-hidden="true">Go to row</span>
<input id="data_configuration_view_preview_ib_row" class="gwt-IntegerBox marginleft red" type="text" size="8"/>
<span class="gwt-InlineLabel error myinlineblock marginleft" style="width: 7ex;" aria-hidden="false">Error!</span>
</div>
I am trying to locate the text Error!
I start from the input id tag as that has an ID. I am not able to go down to the span tag which has the text Error!
My xpath to start from the id is:
//input[@id="data_configuration_view_preview_ib_row"]
I have tried:
//input[@id="data_configuration_view_preview_ib_row"]/span[contains(text(), "Error!")]
What CSS or XPath can I use to locate the text Error!?
I have managed to locate the element with the following Xpath:
//input[@id="data_configuration_view_preview_ib_row"]//following-sibling::span[contains(text(), "Error!")]
Thanks, Riaz
You can use a CSS selector:
Using the error class:
span.error
Using the id data_configuration_view_preview_ib_row:
#data_configuration_view_preview_ib_row + span.error
Or you can use XPath:
Using the error class:
//span[contains(@class, 'error')]
Using the preceding element with id data_configuration_view_preview_ib_row:
//span[preceding::*[@id = 'data_configuration_view_preview_ib_row']]
Using the preceding sibling with id data_configuration_view_preview_ib_row:
//span[preceding-sibling::*[@id = 'data_configuration_view_preview_ib_row']]
Hope it helps..:)
Use the axis following-sibling to get the next element on the same level:
//input[@id="data_configuration_view_preview_ib_row"]/following-sibling::span
You could also use a CSS selector:
#data_configuration_view_preview_ib_row + span
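To see why the child step (/span) in the question finds nothing while the sibling step works, here is a small sketch that parses the snippet from the question with Python's standard xml.etree.ElementTree. This stands in for the browser DOM and is not Selenium; ElementTree's XPath subset has no following-sibling axis, so the sibling lookup is emulated by hand. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

snippet = """
<div>
  <span class="gwt-InlineLabel myinlineblock" style="display: none;" aria-hidden="true">Go to row</span>
  <input id="data_configuration_view_preview_ib_row" class="gwt-IntegerBox marginleft red" type="text" size="8"/>
  <span class="gwt-InlineLabel error myinlineblock marginleft" style="width: 7ex;" aria-hidden="false">Error!</span>
</div>
"""
root = ET.fromstring(snippet)  # root is the enclosing <div>
target_id = "data_configuration_view_preview_ib_row"

# Child step: the <input> has no children, so this matches nothing.
child_match = root.find(f".//input[@id='{target_id}']/span")

# Sibling step, emulated: take the <span> elements that come after the
# <input> among the children of the same parent.
children = list(root)
input_index = next(i for i, el in enumerate(children) if el.get("id") == target_id)
following_spans = [el for el in children[input_index + 1:] if el.tag == "span"]

print(child_match)
print([el.text for el in following_spans])
```

The first span ("Go to row") is excluded because it precedes the input, which matches what following-sibling::span selects.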

How to find tags with certain children attributes? - BeautifulSoup 4

I'm new to Python and BeautifulSoup, how would I search certain tags whose children have certain attributes?
For example,
<section ...>
<a href="URL" ...>
<h4 itemprop="name">ABC</h4>
<p class="open"></p>
</a>
</section>
I would like to get all names ('ABC') and URLs ("URL") where class="open". I can get all sections with
soup.findAll(lambda tag: tag.name == "section")
But I don't know how to add other conditions since tag.children is a listiterator.
Because you're looking for certain attributes with the <p> tags, I would search for only <p> tags with attrs={"class": "open"} and then select the parent (which is the <a> tag) and gather the rest of the information from that.
soup = BeautifulSoup(data, "html.parser")
items = soup.find_all("p", attrs={"class": "open"})
for item in items:
    name = item.parent.h4.text
    url = item.parent.attrs.get('href', None)
    print("{} : {}".format(name, url))

Selenium Xpath how do i get the value of an html id tag using starts-with

I have an HTML page with a div tag that has an id attribute. I would like to get the value of the id attribute using starts-with in XPath.
Here is an HTML snippet:
<div id="operations_edit_process_list_task_3">
<span/>
<span>
<span class=" myinlineblock" title="Clean"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;">
<select tabindex="-1">
</span>
</span>
Using XPath starts-with, I would like to get the id value into a variable so I can use it later in my code.
The number 3 at the end is dynamic; if I can use starts-with, I can get the id value out.
I tried the following XPath, but it does not work:
//div[starts-with[@div="operations_edit_process_list_task"]]
What is the correct syntax?
Thanks,
Riaz
I'm not sure how to do it with xpath, but with css_selector you can do
element = driver.find_element_by_css_selector("[id^='operations_edit_process_list_task']")
id = element.get_attribute("id")
# do whatever you want with the id value
I think the correct syntax for xpath is
//div[starts-with(@id, "operations_edit_process_list_task")]
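Once the id string has been read (for example with get_attribute("id") as shown above), the dynamic part can be pulled out with plain string handling. This is a minimal sketch that assumes the suffix after the fixed prefix is always a number, as in the sample id; the helper name task_number and the constant PREFIX are hypothetical.

```python
# Fixed part of the id, taken from the HTML sample in the question.
PREFIX = "operations_edit_process_list_task_"

def task_number(element_id: str) -> int:
    """Return the numeric suffix of an id that starts with PREFIX.

    Assumes the suffix is purely numeric; int() raises ValueError otherwise.
    """
    if not element_id.startswith(PREFIX):
        raise ValueError(f"unexpected id: {element_id!r}")
    return int(element_id[len(PREFIX):])

# In a Selenium script this string would come from element.get_attribute("id").
element_id = "operations_edit_process_list_task_3"
number = task_number(element_id)
print(number)
```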
To find an element with an XPath where the id starts with a given prefix:
"//*[starts-with(@id,'beginning of id')]"
To find an element with a CSS selector where the id starts with a given prefix:
"[id^='beginning of id']"