I am using Django to generate some graphs with Plotly, and I am trying to generate a PDF file of the following plots.
My HTML page is:
<center><p class="lead"><strong><em>All Plots</em></strong></p></center>
<div class="row">
    <div class="col-md-6">
        {{ sunburst|safe }}
    </div>
    <div class="col-md-6">
        {{ sunburst2|safe }}
    </div>
</div>
and my urls.py:
path('export_pdf/', views.export_pdf, name='export_pdf'),
and the views.py:
import plotly.express as px
from django.shortcuts import render

def export_pdf(request):
    df = px.data.gapminder().query("country=='Canada'")
    fig = px.line(df, x="year", y="lifeExp", title='Life expectancy in Canada')
    df2 = px.data.gapminder().query("continent=='Oceania'")
    fig2 = px.line(df2, x="year", y="lifeExp", color='country')
    # Embed each figure as an HTML fragment for the template.
    chart = fig.to_html()
    chart2 = fig2.to_html()
    context = {
        'sunburst': chart,
        'sunburst2': chart2,
    }
    return render(request, 'Home/Reporting/part_pdf.html', context)
I tried some code from this page, but I can't generate the file. Any help?
https://plotly.com/python/v3/pdf-reports/
Are there any other methods for doing that (an HTML page with plots and tables)?
All my best
I searched high and low and could not find anything. Hacked together a solution that worked very well for me:
import plotly.graph_objs as go
import plotly.io as pio

labels = ['Alice', 'Bob', 'Carl']
vals = [2, 5, 4]
data = [go.Bar(x=vals, y=labels, orientation='h')]
layout = go.Layout(margin_pad=10)  # magic-underscore shorthand for margin=dict(pad=10)
fig = go.Figure(data=data, layout=layout)
# Static-image export; this needs the kaleido (or orca) backend installed.
svg = pio.to_image(fig, format="svg")
context["svg"] = svg.decode("utf-8")  # embed the SVG markup directly in the template context
and then in the template, something like this:
<div>
{{ svg|safe }}
</div>
The first solution I tried was the one you referenced as well, but the quality was very bad. This solution is a lot crisper.
I used it with weasyprint, but I'm sure it will work with other solutions as well.
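For completeness, here is a minimal sketch of the WeasyPrint step, under the assumption that the figure and template are the ones from this thread (the template name and the 'svg' context key are just placeholders):

import plotly.graph_objs as go
import plotly.io as pio
from django.http import HttpResponse
from django.template.loader import render_to_string
from weasyprint import HTML

def export_pdf(request):
    fig = go.Figure(data=[go.Bar(x=[2, 5, 4], y=['Alice', 'Bob', 'Carl'], orientation='h')])
    # Static SVG export; requires the kaleido (or orca) backend.
    svg = pio.to_image(fig, format='svg')
    context = {'svg': svg.decode('utf-8')}
    # Render the same template used for the HTML view, then convert it to PDF.
    html_string = render_to_string('Home/Reporting/part_pdf.html', context)  # placeholder template
    pdf_bytes = HTML(string=html_string).write_pdf()
    response = HttpResponse(pdf_bytes, content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename="plots.pdf"'
    return response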
Related
With django_filters, I have my filterset and everything works fine; it's just the display that I am stuck with (box width).
As you can see below, I have changed the "size" attribute of some of the filter options, because the default is too wide. But the "rating" one, which is a NumberInput, doesn't work for some reason: the "size" attribute works for TextInput but not for NumberInput.
I want to change the size, or rather the width, of the NumberInput box that is displayed in the template (see the screenshots below).
Can anyone help? Thanks in advance!
class ReviewFilter(django_filters.FilterSet):
    comments = CharFilter(field_name='comments', lookup_expr='icontains', label="Comments ", widget=TextInput(attrs={'size': 15}))
    role_title = CharFilter(field_name='role_title', lookup_expr='icontains', label="Role title ", widget=TextInput(attrs={'size': 15}))
    date_range = DateRangeFilter(field_name="date_created", label=" Posted date ")
    rating = CharFilter(field_name="rating", lookup_expr='gte', label="Minimum rating ", widget=NumberInput(attrs={'size': 1}))  # <-- the problematic NumberInput filter

    class Meta:
        model = Review1
        fields = '__all__'
        exclude = {'recruiter', 'date_created', 'user'}
I have this:
Screenshot - Look at "Minimum rating" box width
But I want this:
Screenshot with filter search bar - Look at "Minimum rating" box width
I found a solution in the thread below, by using the attribute 'style': 'width:6ch'.
Apparently NumberInput doesn't have a working "size" attribute in HTML.
Solution to my problem
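Applied to the filterset above, the fixed field looks like this (a sketch based on the attribute from that thread; only the widget attrs change):

# An explicit CSS width via 'style' replaces the non-working 'size' attribute.
rating = CharFilter(field_name="rating", lookup_expr='gte', label="Minimum rating ",
                    widget=NumberInput(attrs={'style': 'width:6ch'}))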
You cannot directly style the django-filters search fields. I was stuck with the same problem, and here is my way of solving it.
Step 0: install widget_tweaks with pip install django-widget-tweaks
Step 1: add 'widget_tweaks' to INSTALLED_APPS in your settings.py
Step 2: add {% load widget_tweaks %} to your template.html
Step 3: style the individual elements in the template:
<div class="form-group col-sm-4 col-md-3">
    {{ filter.form.comments.label_tag }}
    {% render_field filter.form.comments class="form-control" %}
</div>
<div class="form-group col-sm-4 col-md-3">
    {{ filter.form.role_title.label_tag }}
    {% render_field filter.form.role_title class="form-control" %}
</div>
This way you can style individual form elements from the filters; you can also add HTML tag attributes such as type or class and style them based on your needs, as in the sketch below.
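For instance, the width fix from earlier in this thread could be applied straight in the template; a sketch (render_field passes arbitrary attributes through to the widget):

<div class="form-group col-sm-4 col-md-3">
    {{ filter.form.rating.label_tag }}
    {% render_field filter.form.rating class="form-control" style="width:6ch" %}
</div>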
You can use this link for reference: Link for Reference
I'm trying to get specific text that is in a span class from a web page. I can get the first instance, but I'm not sure how to iterate to get the one I need.
<div class="pricing-base__plan-pricing">
    <div class="pricing-base__plan-price pricing-base__plan-price--annual">
        <sup class="pricing-base__price-symbol">$</sup>
        <span class="pricing-base__price-value">14</span>
    </div>
    <div class="pricing-base__plan-price pricing-base__plan-price--monthly">
        <sup class="pricing-base__price-symbol">$</sup>
        <span class="pricing-base__price-value">18</span>
    </div>
    <div class="pricing-base__term">
        <div class="pricing-base__term-wrapper">
            <div class="pricing-base__date">mo*</div>
        </div>
    </div>
</div>
I need to get the "18" in the monthly-price line:
<span class="pricing-base__price-value">18</span>
That number changes quite often, and that is what my code is looking to scrape.
You can use a class selector, as shown below, to retrieve a list of all prices, then index into that list to get the annual and monthly values:
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.gotomeeting.com/meeting/pricingc')
soup = bs(r.content, 'lxml')
# All price values on the page, in document order (annual first, then monthly).
prices = [item.text for item in soup.select('.pricing-base__price-value')]
monthly = prices[1]
annual = prices[0]
You could also add in parent classes:
monthly = soup.select_one('.pricing-base__plan-price--monthly .pricing-base__price-value').text
annual = soup.select_one('.pricing-base__plan-price--annual .pricing-base__price-value').text
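Note that select_one returns None when nothing matches (for example, if the page markup changes), so a small guard avoids an AttributeError; a minimal sketch:

monthly_el = soup.select_one('.pricing-base__plan-price--monthly .pricing-base__price-value')
# Fall back to None instead of raising AttributeError on a missing node.
monthly = monthly_el.text if monthly_el else None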
I'm using Python 3.7, Django, and BeautifulSoup. I am currently looking for "span" elements in my document that contain the text "Review". I do so like this:
import re
import urllib.request
from bs4 import BeautifulSoup

html = urllib.request.urlopen(req, timeout=settings.SOCKET_TIMEOUT_IN_SECONDS).read()
my_soup = BeautifulSoup(html, features="html.parser")
rev_elts = my_soup.findAll("span", text=re.compile("Review"))
for rev_elt in rev_elts:
    ...  # processing
but I'd like to add a wrinkle: I don't want to consider those elements if they have a DIV ancestor with the class "child". So, for example, I don't want to consider something like this:
<div class="child">
<p>
<span class="s">Reviews</span>
...
</p>
</div>
How can I adjust my search to take this into account?
If you are using BeautifulSoup 4.7+, it has some improved CSS selector support. It handles many selectors up through CSS level 4 and a couple of custom ones like :contains(). In addition, it handles complex selectors in pseudo-classes like :not(), which level 4 was supposed to allow, but that support has recently been pushed out to the CSS level 5 selector spec.
So in this example we will use the custom :contains selector to search for spans which contain the text Review. In addition, we will say we don't want it to match div.child span.
from bs4 import BeautifulSoup
html = """
<div>
<p><span>Review: Let's find this</span></p>
</div>
<div class="child">
<p><span>Review: Do not want this</span></p>
</div>
"""
soup = BeautifulSoup(html, features="html.parser")
spans = soup.select('span:contains(Review):not(div.child span)')
print(spans)
Output
[<span>Review: Let's find this</span>]
Depending on your case, maybe :contains isn't robust enough. In that case, you can still do something similar. Soup Sieve is the underlying library included with Beautiful Soup 4.7+, and you can import it directly to filter your regular-expression matches:
from bs4 import BeautifulSoup
import soupsieve as sv
import re
html = """
<div>
<p><span>Review: Let's find this</span></p>
</div>
<div class="child">
<p><span>Review: Do not want this</span></p>
</div>
"""
soup = BeautifulSoup(html, features="html.parser")
spans = soup.find_all("span", text=re.compile("Review"))
spans = sv.filter(':not(div.child span)', spans)
print(spans)
Output
[<span>Review: Let's find this</span>]
A CSS selector is the way to go in this case, as #facelessuser has answered. But just in case you are wondering, this can be done without using a CSS selector as well.
You can iterate over all of an element's parents with .parents. You could define a custom filter function which returns False if any of the parents has a class of "child", and True otherwise (in addition to all your other conditions).
from bs4 import BeautifulSoup, Tag

html = """
<div class="child">
<p><span id="1">Review</span></p>
</div>
<div>
<p><span id="2">Review</span></p>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

def my_func(item):
    # Match only <span> tags whose text contains "Review"...
    if isinstance(item, Tag) and item.name == 'span' and 'Review' in item.text:
        # ...and reject any that have an ancestor with class "child".
        for parent in item.parents:
            if parent.has_attr('class') and 'child' in parent.get('class'):
                return False
        return True

my_spans = soup.find_all(my_func)
print(my_spans)
Outputs:
[<span id="2">Review</span>]
I have the following html structure:
<div id="mod_imoveis_result">
<a class="mod_res" href="#">
<div id="g-img-imo">
<div class="img_p_results">
<img src="/img/image.jpg">
</div>
</div>
</a>
</div>
This is a product results page, so there are 7 blocks per page with that mod_imoveis_result id. I need to get the image src from every block; each page has 7 blocks like the one above.
I tried:
import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem

class QuotesSpider(scrapy.Spider):
    name = "magichat"
    start_urls = ['https://magictest/results']

    def parse(self, response):
        for bimb in response.xpath('//div[@id="mod_imoveis_result"]'):
            yield {
                'img_url': bimb.xpath('//div[@id="g-img-imo"]/div[@class="img_p_results"]/img/@src').extract_first(),
                'text': bimb.css('#titulo_imovel::text').extract_first()
            }
        next_page = response.xpath('//a[contains(@class, "num_pages") and contains(@class, "pg_number_next")]/@href').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
I can't understand why the text target is OK but img_url gets the first result for all blocks on the page. Each page has 7 blocks, so 7 texts and 7 img_urls, but img_url is the same for all the other 6 blocks, while text is right. Why?
If I change extract_first to extract I get the other urls, but the results come in the same brackets. Example:
text: 1aaaa
img_url : a,b,c,d,e,f,g
but i need
text: 1aaaa
img_url: a
text: 2aaaa
img_url: b
What is wrong with that loop?
// selects from the root of the document, so bimb.xpath('//div[@id="g-img-imo"]/...') matches the first div[@id="g-img-imo"] anywhere on the page, no matter which block bimb currently points to. That missing context is the reason every item repeats the same img_url.
. selects the current node mentioned in your xpath, irrespective of how deep it is. So make the expression relative by prefixing it with a dot: bimb.xpath('.//div[@id="g-img-imo"]/div[@class="img_p_results"]/img/@src'). The relative selection then starts from the node marked by the arrow:
<div id="mod_imoveis_result">
<a class="mod_res" href="#">
---> <div id="g-img-imo">
<div class="img_p_results">
<img src="/img/image.jpg">
</div>
</div>
</a>
</div>
I hope I made it clear.
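Putting that together, a sketch of the corrected loop (only the img_url expression changes):

def parse(self, response):
    for bimb in response.xpath('//div[@id="mod_imoveis_result"]'):
        yield {
            # The leading dot makes the query relative to the current block,
            # so each of the 7 blocks yields its own image URL.
            'img_url': bimb.xpath('.//div[@id="g-img-imo"]/div[@class="img_p_results"]/img/@src').extract_first(),
            'text': bimb.css('#titulo_imovel::text').extract_first()
        }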
If all your divs have distinct class names, as in your case, then you can also select the image div directly and extract the image URL:
//*[@class="img_p_results"]/img/@src
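Used from the spider callback, that expression would collect every image URL on the page in one list (sketch):

img_urls = response.xpath('//*[@class="img_p_results"]/img/@src').extract()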
I am using Scrapy and XPath to parse web-site in Russian language.
In this topic, alecxe suggested to me how to construct the xpath expression to get the values. However, I don't understand how I can handle the case when the Param1_name is in Russian.
Here is the xpath expression:
//*[text()="Param1_name_in_russian"]/following-sibling::text()
Html snippet:
<div class="obj-params">
<div class="wrap">
<div class="obj-params-col" style="min-width:50%;">
<p>
<b>Param1_name_in_russian</b>" Param1_value"</p>
<p>
<strong>Param2_name_in_russian</strong>" Param2_value</p>
<p>
<strong>Param3_name_in_russian</strong>" Param3_value"</p>
</div>
</div>
<div class="wrap">
<div class="obj-params-col">
<p>
<b>Param4_name_in_russian</b>Param4_value</p>
<div class="inline-popup popup-hor left">
<b>Param5_name</b>
<a target="_blank" href="link">Param5_value</a></div></div>
EDITED based on comments:
I assume I didn't specify the question properly, since none of the suggested solutions worked for me, i.e. when I tested the suggested XPath expressions in the Scrapy console, the output was nothing. Thus, I am providing more detailed information about the web-site that I need to parse:
link to the web-site: link to real-estate web site
screenshot of what I need to parse:
Consider declaring your encoding at the beginning of the file as latin-1. See the documentation for a thorough explanation as to why.
I'll be using lxml instead of Scrapy below, but the logic is the same.
Code:
#!/usr/bin/env python
# -*- coding: latin-1 -*-
from lxml import html
markup = """<div class="obj-params">
<div class="wrap">
<div class="obj-params-col" style="min-width:50%;">
<p>
<b>Некий текст</b>" Param1_value"</p>
<p>
<strong>Param2_name_in_russian</strong>" Param2_value</p>
<p>
<strong>Param3_name_in_russian</strong>" Param3_value"</p>
</div>
</div>
<div class="wrap">
<div class="obj-params-col">
<p>
<b>Param4_name_in_russian</b>Param4_value</p>
<div class="inline-popup popup-hor left">
<b>Param5_name</b>
<a target="_blank" href="link">Param5_value</a></div></div>"""
tree = html.fromstring(markup)
pone_val = tree.xpath(u"//*[text()='Некий текст']/following-sibling::text()")
print pone_val
Result:
['" Param1_value"']
[Finished in 0.5s]
Note that since this is a unicode string, the u at the beginning of the XPath is necessary, as per #warwaruk's comment on your question.
Let us know if this helps.
EDIT:
Based on the site's markup, there's actually a better way to get the values. Again, using lxml and not Scrapy since the difference between the two here is just .extract() anyway. Basically, check my XPath for the name, room, square, and floor.
import requests as rq
from lxml import html
url = "http://www.lun.ua/%D0%BF%D1%80%D0%BE%D0%B4%D0%B0%D0%B6%D0%B0-%D0%BA%D0%B2%D0%B0%D1%80%D1%82%D0%B8%D1%80-%D0%BA%D0%B8%D0%B5%D0%B2"
r = rq.get(url)
tree = html.fromstring(r.text)
divs = tree.xpath("//div[@class='obj-left']")
for div in divs:
name = div.xpath("./h3/span/a/text()")[0]
details = div.xpath(".//div[@class='obj-params-col'][1]")[0]
room = details.xpath("./p[1]/text()[last()]")[0]
square = details.xpath("./p[2]/text()[last()]")[0]
floor = details.xpath("./p[3]/text()[last()]")[0]
print name.encode("utf-8")
print room.encode("utf-8")
print square.encode("utf-8")
print floor.encode("utf-8")
This doesn't print them all out well on my end (I'm getting some [Decode error - output not utf-8] messages). However, encoding issues aside, I believe this approach is much better scraping practice overall.
Let us know what you think.