Beautifulsoup get the value of hrefs separated by commas

sup2 = soup2.find_all("div", {"class": "xxxxxxx"})
When I use find_all on the div, I get the following result:
<div class="xxxxxxx" data-reactid="37">aa , bb </div>
How do I get the href values of the comma-separated links?

Iterate over the Tag elements in sup2 and select the 'href' attribute, e.g.:
hrefs = [a['href'] for tag in sup2 for a in tag.find_all('a')]
Using CSS selectors:
hrefs = [tag['href'] for tag in soup2.select("div.xxxxxxx a")]
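A minimal sketch of both approaches from the answer, run against made-up markup. The pasted div in the question shows only the link text, so the <a> tags and the /page/aa and /page/bb paths here are assumptions, not the asker's real data.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the <a> tags and href values are assumed for illustration.
html_doc = """
<div class="xxxxxxx" data-reactid="37">
  <a href="/page/aa">aa</a> , <a href="/page/bb">bb</a>
</div>
"""

soup2 = BeautifulSoup(html_doc, "html.parser")
sup2 = soup2.find_all("div", {"class": "xxxxxxx"})

# Approach 1: walk each matched div and collect the href of every link inside it.
# href=True skips anchors that have no href attribute.
hrefs_nested = [a["href"] for tag in sup2 for a in tag.find_all("a", href=True)]

# Approach 2: a single CSS selector that does the same thing.
hrefs_css = [a["href"] for a in soup2.select("div.xxxxxxx a[href]")]

print(hrefs_nested)
print(hrefs_css)
```

The commas are plain text nodes between the links, so neither approach needs to split on them.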

Related

How to extract href value from an html response in postman

I have been trying to figure out how to extract the value of an href attribute from an HTML response and have not had any luck.
I have the following response:
<body id="bodytag" class="taskTab">
<script></script>
<div id="downloads">
<div class="files">Files.pdf
</div>
</div>
</body>
I have gathered that I can use cheerio to load the HTML and potentially get the value, but the only thing I have managed to get is the text Files.pdf. What I need is the path in the href attribute, so that I can store it in a variable to use in a subsequent request.
This is just one example of what I have tried:
const $ = cheerio.load(pm.response.text());
console.log($('.files', '#downloads').text());
I also tried to use xpath without any luck. Any help would be greatly appreciated.

Try:
const $ = cheerio.load(pm.response.text());
console.log($('.files a').attr('href'));
This should return the href of the link inside the .files element. See the cheerio documentation for .attr().

I was close:
const $ = cheerio.load(pm.response.text());
var href = $('.files a').attr('href');
pm.environment.set('downloadLink', href);

Yesod Hamlet breaks HTML by replacing single quotes with double quotes

I have some HTML code that I'm using in Hamlet:
<div .modal-card .card data-options='{"valueNames": ["name"]}' data-toggle="lists">
Notice that the single quotes for data-options allows the use of double quotes inside the string.
The problem is that when Hamlet renders the page, Hamlet puts " around the ' and so the HTML is broken:
<div class="modal-card card" data-options="'{" valuenames":"="" ["name"]}'="" data-toggle="lists">
Some external JS library plugin code runs, it tries to parse the JSON inside data-options and fails.
How can I tell Hamlet to include a literal string?
I've tried various combinations of:
let theString = "{\"valueNames\": [\"name\"]}"
let theString2 = "data-options='{\"valueNames\": [\"name\"]}'"
etc
And in the hamlet file:
<div .modal-card .card data-options='#{ preEscapedText theString }' data-toggle="lists">
or
<div .modal-card .card #{ preEscapedText theString2 } data-toggle="lists">
But all attempts produce invalid HTML or invalid JSON inside the string.
How can I instruct Hamlet to simply include a literal string in the output HTML?
Update:
Tried more things, no result.
The string2 example doesn't work because Hamlet seems to think that I'm trying to set id="{" as per https://www.yesodweb.com/book/shakespearean-templates#shakespearean-templates_attributes

Why not render the JSON escaped (" becomes &quot;) and “handle” the quotes later when parsing?
Interpolate in Hamlet:
<div #the-modal .modal-card .card data-options='#{theString}' data-toggle="lists">
Parse the data attribute as JSON:
let json = document.getElementById("the-modal").getAttribute("data-options");
let opts = JSON.parse(json); // At least in Chrome, it works!
As for the theString2 alternative, you can also interpolate attributes in Hamlet using a tuple or a list of tuples and the star symbol:
let dataOptions = ("data-options", "{\"valueNames\": [\"name\"]}") :: (Text, Text)
...
<div #the-modal .modal-card .card *{dataOptions} data-toggle="lists">
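To see why the escaped form is harmless, here is a rough round trip in Python rather than Haskell, using only the standard library. It is a sketch of the idea, not of Hamlet's internals: html.escape stands in for the escaping that interpolation applies, and HTMLParser stands in for the browser. The AttrGrabber class is a hypothetical helper.

```python
import html
import json
from html.parser import HTMLParser

# The options the JS plugin expects, as in the question.
options = {"valueNames": ["name"]}
raw_json = json.dumps(options)

# Escape the JSON for use inside a double-quoted attribute (" becomes &quot;).
escaped = html.escape(raw_json, quote=True)
tag = '<div id="the-modal" data-options="{}" data-toggle="lists"></div>'.format(escaped)


class AttrGrabber(HTMLParser):
    """Hypothetical helper: records the attributes of the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # The parser hands over attribute values with entities already decoded.
        self.attrs.update(dict(attrs))


parser = AttrGrabber()
parser.feed(tag)

# The decoded attribute value is the original JSON again.
recovered = json.loads(parser.attrs["data-options"])
print(escaped)
print(recovered)
```

An HTML parser decodes &quot; back to " before any script reads the attribute, which is why JSON.parse on getAttribute("data-options") works in the answer above.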

Selenium XPATH how to get text from Span tag underneath the input id tag

I have the following html snippet:
<div>
<span class="gwt-InlineLabel myinlineblock" style="display: none;" aria-hidden="true">Go to row</span>
<input id="data_configuration_view_preview_ib_row" class="gwt-IntegerBox marginleft red" type="text" size="8"/>
<span class="gwt-InlineLabel error myinlineblock marginleft" style="width: 7ex;" aria-hidden="false">Error!</span>
</div>
I am trying to locate the text Error!
I start from the input id tag as that has an ID. I am not able to go down to the span tag which has the text Error!
My xpath to start from the id is:
//input[@id="data_configuration_view_preview_ib_row"]
I have tried:
//input[@id="data_configuration_view_preview_ib_row"]/span[contains(text(), "Error!")]
What CSS or XPath can I use to locate the text Error!?
I have managed to locate the element with the following Xpath:
//input[@id="data_configuration_view_preview_ib_row"]//following-sibling::span[contains(text(), "Error!")]
Thanks, Riaz

You can use a CSS selector.
Using the error class:
span.error
Using the id data_configuration_view_preview_ib_row:
#data_configuration_view_preview_ib_row + span.error
Or you can use XPath.
Using the error class:
//span[contains(@class, 'error')]
Using any preceding element with the id data_configuration_view_preview_ib_row:
//span[preceding::*[@id = 'data_configuration_view_preview_ib_row']]
Using a preceding sibling with the id data_configuration_view_preview_ib_row:
//span[preceding-sibling::*[@id = 'data_configuration_view_preview_ib_row']]
Hope it helps..:)

Use the axis following-sibling to get the next element on the same level:
//input[@id="data_configuration_view_preview_ib_row"]/following-sibling::span
You could also use a CSS selector:
#data_configuration_view_preview_ib_row + span
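The reason the /span attempt in the question finds nothing is that the span is a sibling of the input, not a child. A small sketch with Python's standard-library ElementTree shows the relationship without a browser. ElementTree's XPath subset has no following-sibling axis, so the sibling walk is written by hand here; following_siblings is a hypothetical helper, not a Selenium or ElementTree API.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, trimmed to the attributes that matter here.
snippet = """
<div>
  <span class="gwt-InlineLabel myinlineblock" aria-hidden="true">Go to row</span>
  <input id="data_configuration_view_preview_ib_row" class="gwt-IntegerBox marginleft red" type="text" size="8"/>
  <span class="gwt-InlineLabel error myinlineblock marginleft" aria-hidden="false">Error!</span>
</div>
"""

INPUT_ID = "data_configuration_view_preview_ib_row"


def following_siblings(parent, anchor_id):
    """Return the children of parent that come after the child with anchor_id."""
    children = list(parent)
    for index, child in enumerate(children):
        if child.get("id") == anchor_id:
            return children[index + 1:]
    return []


root = ET.fromstring(snippet)
input_el = root.find("input[@id='%s']" % INPUT_ID)

# Equivalent of input/span: the input has no children, so this list is empty.
child_spans = input_el.findall("span")

# Equivalent of input/following-sibling::span: spans after the input, same parent.
sibling_spans = [el for el in following_siblings(root, INPUT_ID) if el.tag == "span"]
error_text = [el.text for el in sibling_spans]

print(child_spans)
print(error_text)
```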

How to find tags with certain children attributes? - BeautifulSoup 4

I'm new to Python and BeautifulSoup. How would I search for tags whose children have certain attributes?
For example,
<section ...>
<a href="URL" ...>
<h4 itemprop="name">ABC</h4>
<p class="open"></p>
</a>
</section>
I would like to get all names ('ABC') and URLs ("URL") where class="open". I can get all sections with:
soup.findAll(lambda tag: tag.name == "section")
But I don't know how to add other conditions since tag.children is a listiterator.

Because you're looking for certain attributes with the <p> tags, I would search for only <p> tags with attrs={"class": "open"} and then select the parent (which is the <a> tag) and gather the rest of the information from that.
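from bs4 import BeautifulSoup

# data holds the HTML source as a string
soup = BeautifulSoup(data, "html.parser")
# Find every <p class="open">, then read the name and URL from its parent <a>
items = soup.find_all("p", attrs={"class": "open"})
for item in items:
    name = item.parent.h4.text
    url = item.parent.attrs.get('href', None)
    print("{} : {}".format(name, url))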
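For a version that can be run as-is, here is a sketch with sample markup. The two sections, the DEF entry and the example.com URLs are invented for illustration. It uses find_parent("a") instead of .parent, an alternative that still works if the <p> ends up nested one level deeper, and it skips entries with a missing link or heading.

```python
from bs4 import BeautifulSoup

# Hypothetical sample data: only the first section is marked "open".
data = """
<section>
  <a href="https://example.com/abc"><h4 itemprop="name">ABC</h4><p class="open"></p></a>
</section>
<section>
  <a href="https://example.com/def"><h4 itemprop="name">DEF</h4><p class="closed"></p></a>
</section>
"""

soup = BeautifulSoup(data, "html.parser")

results = []
for p in soup.find_all("p", class_="open"):
    # Climb to the nearest enclosing <a>, wherever it sits above the <p>.
    link = p.find_parent("a")
    if link is None:
        continue
    heading = link.find("h4", itemprop="name")
    if heading is None:
        continue
    results.append((heading.get_text(strip=True), link.get("href")))

for name, url in results:
    print("{} : {}".format(name, url))
```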

Selenium Xpath how do i get the value of an html id tag using starts-with

I have an HTML page with a div tag which has an id attribute. I would like to get the value of the id attribute using starts-with in XPath.
Here is an HTML snippet:
<div id="operations_edit_process_list_task_3">
<span/>
<span>
<span class=" myinlineblock" title="Clean"
style="white-space:nowrap;overflow:hidden;text-overflow:ellipsis;empty-cells:show;">
<select tabindex="-1">
</span>
</span>
Using XPath's starts-with, I would like to get the id value into a variable so I can use it later in my code.
The number 3 at the end is dynamic, so if I can match on the prefix with starts-with, I can read the full id value.
I tried the following XPath, but it does not work:
//div[starts-with[@div="operations_edit_process_list_task"]]
What is the correct syntax?
Thanks,
Riaz

I'm not sure how to do it with xpath, but with css_selector you can do
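from selenium.webdriver.common.by import By
# Selenium 4 style; older releases spelled this driver.find_element_by_css_selector(...)
element = driver.find_element(By.CSS_SELECTOR, "[id^='operations_edit_process_list_task']")
element_id = element.get_attribute("id")
# do whatever you want with the id value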
I think the correct syntax for xpath is
//*[starts-with(@id, "operations_edit_process_list_task")]

To find an element by XPath where the id starts with a given prefix:
"//*[starts-with(@id,'beginning of id')]"
To find an element by CSS selector where the id starts with a given prefix:
"[id^='beginning of id']"

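The prefix match itself can be tried without a browser. This sketch uses Python's standard-library ElementTree, whose limited XPath support has no starts-with(), so the filter is done with str.startswith instead. The page markup, the extra div ids and the step that parses the trailing number are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: only the first id comes from the question.
page = """
<body>
  <div id="operations_edit_process_list_task_3"><span>Clean</span></div>
  <div id="operations_edit_process_list_header"><span>Header</span></div>
  <div id="unrelated_panel"/>
</body>
"""

PREFIX = "operations_edit_process_list_task"

root = ET.fromstring(page)

# Same idea as //div[starts-with(@id, PREFIX)]: keep divs whose id has the prefix.
matching_ids = [
    el.get("id")
    for el in root.iter("div")
    if el.get("id", "").startswith(PREFIX)
]

# The dynamic part is whatever follows the last underscore.
task_number = int(matching_ids[0].rsplit("_", 1)[1])

print(matching_ids)
print(task_number)
```

In Selenium the same selection is done by the XPath or CSS expressions above, and get_attribute("id") then returns the full id string.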