I have these lines of code on my page:
<cfoutput><li><img src="#ADDSlide#" /></li>
<li><img src="#TAPSlide#" /></li>
<li><img src="#ATHSlide#" /></li>
<li><img src="#STASlide#" /></li></cfoutput>
These items are generated randomly from an array. How could I further randomize the order of these 4 tiles?
You can use the shuffle() method of java.util.Collections to do this:
<cfscript>
items = [
{"id":1, "key":"a"},
{"id":2, "key":"b"},
{"id":3, "key":"c"}
];
Collection = CreateObject("java", "java.util.Collections");
Collection.Shuffle(items);
writeDump(items);
</cfscript>
Each time you run it you'll get the items in a different order.
Hat tip to Mark Mandel, who introduced it to me.
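For reference, the Javadoc for Collections.shuffle describes it as traversing the list backwards from the last element to the second, swapping a randomly chosen element into the current position (a Fisher-Yates shuffle). A minimal Python sketch of that same in-place algorithm follows; the slide file names are hypothetical stand-ins for the ADDSlide/TAPSlide/ATHSlide/STASlide variables, and in real Python code random.shuffle already does this for you.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Shuffle a list in place by walking it backwards and swapping."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # random position in 0..i (inclusive)
        items[i], items[j] = items[j], items[i]
    return items


# Hypothetical stand-ins for the four slide variables in the question.
slides = ["ADDSlide.jpg", "TAPSlide.jpg", "ATHSlide.jpg", "STASlide.jpg"]
fisher_yates_shuffle(slides)
print("\n".join(f'<li><img src="{src}" /></li>' for src in slides))
```

Because every position is swapped with a uniformly chosen earlier-or-same position, each of the 24 orderings of the four tiles is equally likely.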
<div class="col col-1-1"><h2 class="heading">Flowers</h2><ul class="icon-list"> <li class="col col-1-2 no-gutter">
<svg class="icon icon--medium">
<use xlink:href="https://"></use>
</svg>
measure 1<span class="icon-list__count">81</span> </li>
<li class="col col-1-2 no-gutter">
<svg class="icon icon--medium">
<use xlink:href="https://"></use>
</svg>
measure 2 <span class="icon-list__count">52</span> </li>
<li class="col col-1-2 no-gutter">
<svg class="icon icon--medium">
<use xlink:href="https://"></use>
</svg>
measure 3<span class="icon-list__count">29</span> </li>
</ul></div>
This is one example of a list of measures for one type of flower. How can I scrape the values of the measures and store them in a Python dictionary? I'd like the code to be flexible enough to handle other pages that might contain only measures 2 and 3, measures 3 and 4 (a new measure that doesn't appear on this page), or entirely new measures 4 and 5.
I'm new to Python, so I'd appreciate any advice.
BeautifulSoup works best for static websites, where the content is already in the HTML rather than generated by JavaScript.
Use the identifiers present in the tags (such as class names) to navigate this tree-like structure. The following code builds a dictionary with each measure name as the key and its count as the value.
from bs4 import BeautifulSoup
import re
html = '<div class="col col-1-1"><h2 class="heading">Flowers</h2><ul class="icon-list"><li class="col col-1-2 no-gutter"><svg class="icon icon--medium"><use xlink:href="https://"></use></svg>measure 1<span class="icon-list__count">81</span></li><li class="col col-1-2 no-gutter"><svg class="icon icon--medium"><use xlink:href="https://"></use></svg>measure 2 <span class="icon-list__count">52</span></li><li class="col col-1-2 no-gutter"><svg class="icon icon--medium"><use xlink:href="https://"></use></svg>measure 3<span class="icon-list__count">29</span></li></ul></div>'
soup = BeautifulSoup(html, 'lxml')
li_tags = soup.find_all('li')  # texts: ['measure 181', 'measure 2 52', 'measure 329']
span_tags = soup.find_all('span', class_='icon-list__count')  # texts: ['81', '52', '29']
li_list = []
for li in li_tags:
    li_list.append(li.text)
measure_dict = {}
for i in range(len(li_list)):
    # remove the count only from the end: 'measure 181' -> 'measure 1', 'measure 2 52' -> 'measure 2'
    li_list[i] = re.sub(re.escape(span_tags[i].text) + '$', '', li_list[i]).strip()
    measure_dict[li_list[i]] = span_tags[i].text  # use int(span_tags[i].text) if you want integer values
print(measure_dict)
# {'measure 1': '81', 'measure 2': '52', 'measure 3': '29'}
The code stays flexible as long as the identifier used here, class='icon-list__count', is present on every page you access and contains the data you want to scrape. If it isn't, you'll have to inspect the HTML tags yourself to find something that identifies your data.
If the website uses JavaScript to generate the part you want to scrape, it's better to use Selenium, which handles dynamic websites.
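To handle pages with a different set of measures, one option is to read each <li> together with its own count span instead of pairing two separate lists by index. The sketch below assumes the list keeps the icon-list and icon-list__count classes, uses Python's built-in html.parser instead of lxml, and converts counts with int(); scrape_measures is a hypothetical helper name, and the measure 2 / measure 4 sample (including the count 7) is made up for illustration.

```python
from bs4 import BeautifulSoup


def scrape_measures(html):
    """Map each measure name to its count, whatever measures the page has."""
    soup = BeautifulSoup(html, "html.parser")
    measures = {}
    for li in soup.select("ul.icon-list > li"):
        count = li.find("span", class_="icon-list__count")
        if count is None:
            continue  # skip list items that have no count
        # Text sitting directly inside the <li> (not in child tags) is the name.
        name = "".join(li.find_all(string=True, recursive=False)).strip()
        measures[name] = int(count.get_text(strip=True))
    return measures


# Hypothetical page with measure 2 and a new measure 4.
page = """<ul class="icon-list">
<li class="col col-1-2 no-gutter"><svg class="icon icon--medium"><use xlink:href="https://"></use></svg>
measure 2 <span class="icon-list__count">52</span> </li>
<li class="col col-1-2 no-gutter"><svg class="icon icon--medium"><use xlink:href="https://"></use></svg>
measure 4<span class="icon-list__count">7</span> </li>
</ul>"""
print(scrape_measures(page))
```

Since nothing is hard-coded about which measures exist, new names simply become new dictionary keys.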
Advice:
In the long run, reading a module's documentation is far more helpful than watching random YouTube videos!
Try the re module whenever you need to work with strings; it's much more flexible than the built-in string methods.
I am using Django 2.2 and psql 10.8 on Ubuntu 18.04.1.
I have a collection of items that I want to iterate over and render in a template.
They are expected to be rendered in exactly the order that they have been created in the database (by pk). However, they seem to be rendered in a random order instead.
The problem does not occur when using sqlite.
I have not found a solution to this problem; reverse-iterating through the objects doesn't give the desired behaviour either. A simplified portion of the code:
<div class="row">
<ul class="tabs">
{% for category in categories %}
<li class="tab col s3">{{category}}</li>
{% endfor %}
</ul>
</div>
Say I have created four categories A, B, C, D;
when using sqlite in dev, they would be rendered in that order on the frontend page.
With psql, I am seeing an unordered result.
Any help in the right direction is appreciated!
I have the following HTML structure:
<div id="mod_imoveis_result">
<a class="mod_res" href="#">
<div id="g-img-imo">
<div class="img_p_results">
<img src="/img/image.jpg">
</div>
</div>
</a>
</div>
This is a product results page, and each page has 7 blocks like the one above, each with the mod_imoveis_result id. I need to get the image src from all of them.
I tried:
import scrapy
from scrapy.pipelines.images import ImagesPipeline
from scrapy.exceptions import DropItem
class QuotesSpider(scrapy.Spider):
    name = "magichat"
    start_urls = ['https://magictest/results']
    def parse(self, response):
        for bimb in response.xpath('//div[@id="mod_imoveis_result"]'):
            yield {
                'img_url': bimb.xpath('//div[@id="g-img-imo"]/div[@class="img_p_results"]/img/@src').extract_first(),
                'text': bimb.css('#titulo_imovel::text').extract_first()
            }
        next_page = response.xpath('//a[contains(@class, "num_pages") and contains(@class, "pg_number_next")]/@href').extract_first()
        if next_page is not None:
            yield response.follow(next_page, self.parse)
I can't understand why the text target works, but img_url gets the first result for every block on the page. For example, each page has 7 blocks, so I get 7 texts and 7 img_urls, but the img_url is the same for all 7 blocks while the text is correct. Why?
If I change extract_first to extract, I get the other URLs, but they all come back together in the same list. For example:
text: 1aaaa
img_url : a,b,c,d,e,f,g
but i need
text: 1aaaa
img_url: a
text: 2aaaa
img_url: b
What is wrong with that loop?
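A path that starts with // searches the whole document from the root, even when you call .xpath() on bimb. So inside your loop, '//div[@id="g-img-imo"]...' matches the images of all 7 blocks, and extract_first() always returns the first one. (A path starting with ./div would not work either: <div id="g-img-imo"> is not a direct child of the block, because the <a> tag sits between them, so that path returns no data.)
A leading . refers to the current node, and .// searches the current node's descendants at any depth.
So in your case, xpath('.//div[@id="g-img-imo"]/div[@class="img_p_results"]/img/@src') starts from the current block, <div id="mod_imoveis_result">, and finds the node marked with the arrow: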
<div id="mod_imoveis_result">
<a class="mod_res" href="#">
---> <div id="g-img-imo">
<div class="img_p_results">
<img src="/img/image.jpg">
</div>
</div>
</a>
</div>
I hope that makes it clear.
If the image container has its own distinctive class (here img_p_results), you can also select it directly and extract the image URL. Inside the loop, keep the leading . so the search stays within the current block:
.//*[@class="img_p_results"]/img/@src
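The difference between a relative and a root-level search can be reproduced without Scrapy. The sketch below swaps in Python's built-in xml.etree.ElementTree, which supports only a subset of XPath and is not Scrapy's selector API: calling find() on the document root stands in for what a leading // does in Scrapy, while calling it on each block corresponds to the relative .// path. The two blocks and their image paths (/img/a.jpg, /img/b.jpg) are hypothetical, and the <img> tags are self-closed so the markup parses as XML.

```python
import xml.etree.ElementTree as ET

# Two hypothetical result blocks shaped like the question's markup.
PAGE = """
<body>
  <div id="mod_imoveis_result">
    <a class="mod_res" href="#">
      <div id="g-img-imo"><div class="img_p_results"><img src="/img/a.jpg"/></div></div>
    </a>
  </div>
  <div id="mod_imoveis_result">
    <a class="mod_res" href="#">
      <div id="g-img-imo"><div class="img_p_results"><img src="/img/b.jpg"/></div></div>
    </a>
  </div>
</body>
"""

IMG_PATH = ".//div[@id='g-img-imo']/div[@class='img_p_results']/img"


def image_urls(page):
    root = ET.fromstring(page)
    relative, from_root = [], []
    for block in root.findall(".//div[@id='mod_imoveis_result']"):
        # Relative search: only looks inside the current block.
        relative.append(block.find(IMG_PATH).get("src"))
        # Searching from the document root (like a leading // in Scrapy)
        # always hits the first image on the page.
        from_root.append(root.find(IMG_PATH).get("src"))
    return relative, from_root


relative, from_root = image_urls(PAGE)
print(relative)   # ['/img/a.jpg', '/img/b.jpg']
print(from_root)  # ['/img/a.jpg', '/img/a.jpg']
```

The second list reproduces the symptom from the question: one URL repeated for every block.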
I want to add the attributes "data-full" and "data-thumb", containing the image address, to the photo. For example:
<img data-full="images/photo.jpg" data-thumb="images/photo.jpg" src="images/photo.jpg">
Unfortunately, all my attempts have failed.
Please help me.
If you use jQuery, you can do it like this:
This is your HTML:
<img id="myPhoto" src="images/photo.jpg">
Then, in your jQuery code:
$('#myPhoto').attr('data-full', 'images/photo.jpg');
$('#myPhoto').attr('data-thumb', 'images/photo.jpg');
Sorry for the delayed response; I thought I would be notified when someone replied.
I want to add https://github.com/HemantNegi/jquery.sumogallery to my website.
By "my attempts end in failure" I mean that I can't write the code correctly.
No. My HTML is:
...
<li><img src="images/o6/photo.jpg"></li>
<li><img src="images/o7/photo2.png"></li>
<li><img src="images/23/image1245.jpg"></li>
...
Image addresses are different because they are from different directories.
I would like to get this result:
...
<li><img data-full="images/o6/photo.jpg" data-thumb="images/o6/photo.jpg" src="images/o6/photo.jpg"></li>
<li><img data-full="images/o7/photo2.png" data-thumb="images/o7/photo2.png" src="images/o7/photo2.png"></li>
<li><img data-full="images/23/image1245.jpg" data-thumb="images/23/image1245.jpg" src="images/23/image1245.jpg"></li>
...
I'm trying to scrape sites like this one on the BBC website to grab the relevant parts of the programme listing, and I've just started using BeautifulSoup to do this.
The parts of interest start with sections like:
<li about="/programmes/p013zzsl#segment" class="segment track" id="segmentevent-p013zzsm" typeof="po:MusicSegment">
<li about="/programmes/p014003v#segment" class="segment speech alt" id="segmentevent_p014003w" typeof="po:SpeechSegment">
What I've done so far is open the HTML as soup and then use soup.findAll(typeof=['po:MusicSegment', 'po:SpeechSegment']) to get a ResultSet of the parts I'm interested in, in the order in which they appear.
What I then want to do is check whether a section refers to po:MusicSegment or po:SpeechSegment in HTML that looks like:
<li about="/programmes/p01400m9#segment" class="segment track" id="segmentevent-p01400mb" typeof="po:MusicSegment"> <span class="artist-image"> <span class="depiction" rel="foaf:depiction"><img alt="" height="63" src="http://static.bbci.co.uk/programmes/2.54.3/img/thumbnail/artists_default.jpg" width="112"/></span> </span> <script type="text/javascript"> window.programme_data.tracklist.push({ segment_event_pid : "p01400mb", segment_pid : "p01400m9", playlist : "http://www.bbc.co.uk/programmes/p01400m9.emp" }); </script> <h3> <span rel="mo:performer"> <span class="artist no-image" property="foaf:name" typeof="mo:MusicArtist">Mala</span> </span> <span class="title" property="dc:title">Calle F</span> </h3></li>
I want to access the typeof attribute associated with <li>, but if this chunk of HTML (as a BS4 tag) is called section and I enter section.li, it returns None.
Note that if I do section.img instead, I get something back:
<img alt="" height="63" src="http://static.bbci.co.uk/programmes/2.54.3/img/thumbnail/artists_default.jpg" width="112"/>
and I could then do, e.g. section.img['height'] to get back u'63'
What I want is something analogous for the section.li part, so section.li['typeof'] to give me po:MusicSegment or po:SpeechSegment
Of course, I could simply convert each result to text and then do a simple string search, but searching by attribute seems more elegant.
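Each tag returned by findAll is the <li> itself, so section.li looks for another <li> nested inside it (there is none, hence None). Read the attribute directly from the tag instead, e.g. section['typeof']. I'd iterate over the list returned by findAll:
from bs4 import BeautifulSoup
soup = BeautifulSoup('<li about="/programmes/p013zzsl#segment" class="segment track" id="segmentevent-p013zzsm" typeof="po:MusicSegment"><li about="/programmes/p014003v#segment" class="segment speech alt" id="segmentevent_p014003w" typeof="po:SpeechSegment">', 'html.parser')
for elem in soup.findAll(typeof=['po:MusicSegment', 'po:SpeechSegment']):
    print(elem['typeof'])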
returns
po:MusicSegment
po:SpeechSegment
and then, inside the loop, conditionally perform your other tasks (do.something() is just a placeholder):
if elem['typeof'] == 'po:MusicSegment':
    do.something()
elif elem['typeof'] == 'po:SpeechSegment':
    do.something_else()
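Putting the pieces together, a minimal sketch of dispatching on the typeof attribute might look like the following. It assumes the artist and title are marked with property="foaf:name" and property="dc:title" as in the Mala / Calle F snippet from the question; the trimmed listing and the describe_segments helper are hypothetical, and the empty speech segment is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down listing based on the markup in the question.
LISTING = """<ul>
<li class="segment track" typeof="po:MusicSegment"><h3>
<span class="artist" property="foaf:name">Mala</span>
<span class="title" property="dc:title">Calle F</span></h3></li>
<li class="segment speech alt" typeof="po:SpeechSegment"></li>
</ul>"""


def describe_segments(html):
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for section in soup.find_all("li", typeof=["po:MusicSegment", "po:SpeechSegment"]):
        kind = section["typeof"]  # the attribute lives on the <li> itself
        if kind == "po:MusicSegment":
            artist = section.find(property="foaf:name")
            title = section.find(property="dc:title")
            results.append(("music", artist.get_text(strip=True), title.get_text(strip=True)))
        else:
            results.append(("speech",))
    return results


print(describe_segments(LISTING))
```

The segments come back in document order, so the music/speech sequence of the programme is preserved.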