I need the last 9 numbers of a list and I'm sure there is a way to do it with slicing, but I can't seem to get it. I can get the first 9 like this:
num_list[0:9]
You can use negative integers with the slicing operator for that. Here's an example using the python CLI interpreter:
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
>>> a[-9:]
[4, 5, 6, 7, 8, 9, 10, 11, 12]
The important line is a[-9:].
A negative index counts from the end of the list, so:
num_list[-9:]
Slicing
Python slicing is an incredibly fast operation, and it's a handy way to quickly access parts of your data.
Slice notation to get the last nine elements from a list (or any other sequence that supports it, like a string) would look like this:
num_list[-9:]
When I see this, I read the part in the brackets as "9th from the end, to the end." (Actually, I abbreviate it mentally as "-9, on")
Explanation:
The full notation is
sequence[start:stop:step]
But the colon is what tells Python you're giving it a slice and not a regular index. That's why the idiomatic way of copying lists in Python 2 is
list_copy = sequence[:]
And clearing them is with:
del my_list[:]
(Lists get list.copy and list.clear in Python 3.)
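As a quick illustration of the start:stop:step notation and the two idioms above, here is a minimal sketch (the list name nums and the variable names are only examples):

```python
nums = list(range(1, 13))    # [1, 2, ..., 12]

last_nine = nums[-9:]        # start 9 from the end, run to the end
every_other = nums[::2]      # whole list, step of 2
shallow_copy = nums[:]       # new list holding the same elements

del nums[:]                  # empties the original list in place

print(last_nine)             # [4, 5, 6, 7, 8, 9, 10, 11, 12]
print(every_other)           # [1, 3, 5, 7, 9, 11]
print(shallow_copy, nums)    # the copy is untouched, nums is now []
```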
Give your slices a descriptive name!
You may find it useful to separate forming the slice from passing it to the list.__getitem__ method (that's what the square brackets do). Even if you're not new to it, it keeps your code more readable so that others that may have to read your code can more readily understand what you're doing.
However, you can't just assign some integers separated by colons to a variable. You need to use the slice object:
last_nine_slice = slice(-9, None)
The second argument, None, is required so that the first argument is interpreted as the start argument; otherwise it would be treated as the stop argument.
You can then pass the slice object to your sequence:
>>> list(range(100))[last_nine_slice]
[91, 92, 93, 94, 95, 96, 97, 98, 99]
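To see why the None matters, this small sketch compares the two forms side by side (the names data and all_but_last_nine are illustrative only):

```python
data = list(range(100))

last_nine_slice = slice(-9, None)   # start=-9, stop=None -> data[-9:]
all_but_last_nine = slice(-9)       # single argument is the stop -> data[:-9]

print(data[last_nine_slice])        # [91, 92, 93, 94, 95, 96, 97, 98, 99]
print(len(data[all_but_last_nine])) # 91
```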
islice
islice from the itertools module is another possibly performant way to get this. islice doesn't take negative arguments, so you must first pass your list to reversed. Ideally your iterable has a __reversed__ special method, which list does have.
>>> from itertools import islice
>>> islice(reversed(range(100)), 0, 9)
<itertools.islice object at 0xffeb87fc>
islice allows for lazy evaluation of the data pipeline, so to materialize the data, pass it to a constructor (like list):
>>> list(islice(reversed(range(100)), 0, 9))
[99, 98, 97, 96, 95, 94, 93, 92, 91]
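Note that this yields the items back to front. If the original order is needed, one option is to reverse the small result again. This is a sketch, and the variable names are only illustrative:

```python
from itertools import islice

data = list(range(100))

# Take 9 items from the reversed view, then flip that short list back.
tail_reversed = list(islice(reversed(data), 9))
tail_in_order = tail_reversed[::-1]

print(tail_in_order)  # [91, 92, 93, 94, 95, 96, 97, 98, 99]
```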
The last 9 elements can be read from left to right using num_list[-9:], or from right to left using num_list[:-10:-1], whichever you prefer.
>>> a=range(17)
>>> print a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> print a[-9:]
[8, 9, 10, 11, 12, 13, 14, 15, 16]
>>> print a[:-10:-1]
[16, 15, 14, 13, 12, 11, 10, 9, 8]
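The -10 in the reversed form comes from the stop index being exclusive: to include the 9th item from the end, the slice has to stop one position further. A Python 3 sketch of the same session for a general n (the names n, forward and backward are illustrative):

```python
a = list(range(17))
n = 9

forward = a[-n:]             # left to right
backward = a[:-(n + 1):-1]   # right to left, stop is exclusive

print(forward)   # [8, 9, 10, 11, 12, 13, 14, 15, 16]
print(backward)  # [16, 15, 14, 13, 12, 11, 10, 9, 8]
```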
Here are several options for getting the "tail" items of an iterable:
Given
n = 9
iterable = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Desired Output
[2, 3, 4, 5, 6, 7, 8, 9, 10]
Code
We get the latter output using any of the following options:
from collections import deque
import itertools
import more_itertools
# A: Slicing
iterable[-n:]
# B: Implement an itertools recipe
def tail(n, iterable):
    """Return an iterator over the last *n* items of *iterable*.

        >>> t = tail(3, 'ABCDEFG')
        >>> list(t)
        ['E', 'F', 'G']

    """
    return iter(deque(iterable, maxlen=n))

list(tail(n, iterable))
# C: Use an implemented recipe, via more_itertools
list(more_itertools.tail(n, iterable))
# D: islice, via itertools
list(itertools.islice(iterable, len(iterable)-n, None))
# E: Negative islice, via more_itertools
list(more_itertools.islice_extended(iterable, -n, None))
Details
A. Traditional Python slicing is inherent to the language. This option works with sequences such as strings, lists and tuples. However, this kind of slicing does not work on iterators, e.g. iter(iterable).
B. An itertools recipe. It is generalized to work on any iterable and resolves the iterator issue of the previous option. This recipe must be implemented manually, as it is not officially included in the itertools module.
C. Many recipes, including the previous tool (B), have been conveniently implemented in third-party packages. Installing and importing these libraries obviates manual implementation. One of these libraries is called more_itertools (install via > pip install more-itertools); see more_itertools.tail.
D. A member of the itertools library. Note, itertools.islice does not support negative slicing.
E. Another tool is implemented in more_itertools that generalizes itertools.islice to support negative slicing; see more_itertools.islice_extended.
Which one do I use?
It depends. In most cases, slicing (option A, as mentioned in other answers) is the simplest option, as it is built into the language and supports most sequence types. For more general iterators, use option B, C, or E (option D as written calls len(), so it still needs a sized iterable). Note that options C and E require installing a third-party library, which some users may find useful.
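To make the sequence-versus-iterator distinction concrete, here is a small sketch using only the standard library. The generator squares is a made-up example, and tail is the deque-based recipe from option B:

```python
from collections import deque

def squares(limit):
    """Hypothetical generator: yields 0, 1, 4, ... lazily."""
    for i in range(limit):
        yield i * i

def tail(n, iterable):
    """Option B: keep only the last n items while consuming the iterable."""
    return iter(deque(iterable, maxlen=n))

# Slicing (option A) fails on a generator because it is not a sequence.
try:
    squares(10)[-3:]
    slicing_worked = True
except TypeError:
    slicing_worked = False

last_three = list(tail(3, squares(10)))
print(slicing_worked)  # False
print(last_three)      # [49, 64, 81]
```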
So to display a small bargraph using Django and Chart.js I constructed the following query on my model.
views.py
class BookingsView(TemplateView):
    template_name = 'orders/bookings.html'

    def get_context_data(self, **kwargs):
        context = super().get_context_data(**kwargs)
        today = datetime.date.today()
        seven_days = today + datetime.timedelta(days=7)
        bookings = dict(Booking.objects.filter(start_date__range=[today, seven_days])
                        .order_by('start_date')
                        .values_list('start_date')
                        .annotate(Count('id')))
        # Edit: set a default for missing dictionary values
        for dt in range(7):
            bookings.setdefault(today + datetime.timedelta(dt), 0)
        # Edit: reorder the dictionary before using it in a template
        context['bookings'] = OrderedDict(sorted(bookings.items()))
        return context
This led me to the following result;
# Edit; after setting the default on the dictionary and the reorder
{
datetime.date(2019, 8, 6): 12,
datetime.date(2019, 8, 7): 12,
datetime.date(2019, 8, 8): 0,
datetime.date(2019, 8, 9): 4,
datetime.date(2019, 8, 10): 7,
datetime.date(2019, 8, 11): 0,
datetime.date(2019, 8, 12): 7
}
To use the data in a chart I would like to add the missing start_dates into the dictionary but I'm not entirely sure how to do this.
So I want to update the dictionary with a value "0" for the 8th and 11th of August.
I tried to add the for statement but I got the error;
"'datetime.date' object is not iterable"
Like the error says, you can not iterate over a date object, so for start_date in seven_days will not work.
You can however use a for loop here like:
for dt in range(7):
bookings.setdefault(today+datetime.timedelta(dt), 0)
A dictionary has a .setdefault(..) method that sets a value only if the key does not yet exist in the dictionary. This is shorter and more efficient than first checking whether the key exists yourself, since Python does not have to perform two lookups.
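Here is a self-contained sketch of that loop without Django. The starting dictionary mimics the query result from the question, with the 8th and 11th of August missing, and today is pinned to a fixed date purely so the example is reproducible:

```python
import datetime

today = datetime.date(2019, 8, 6)   # fixed date, for illustration only

# Counts as they might come back from the query (two days are missing).
bookings = {
    datetime.date(2019, 8, 6): 12,
    datetime.date(2019, 8, 7): 12,
    datetime.date(2019, 8, 9): 4,
    datetime.date(2019, 8, 10): 7,
    datetime.date(2019, 8, 12): 7,
}

for offset in range(7):
    # Adds the key with 0 only when it is absent; existing counts are kept.
    bookings.setdefault(today + datetime.timedelta(days=offset), 0)

print(bookings[datetime.date(2019, 8, 8)])   # 0
print(bookings[datetime.date(2019, 8, 9)])   # 4
```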
EDIT: Since python-3.7 dictionaries are ordered in insertion order (in the CPython version of python-3.6 that was already the case, but seen as an "implementation detail"). Since python-3.7, you can thus sort the dictionaries with:
bookings = dict(sorted(bookings.items()))
Prior to python-3.7, you can use an OrderedDict [Python-doc]:
from collections import OrderedDict
bookings = OrderedDict(sorted(bookings.items()))
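As a sketch of what the sort gives you on Python 3.7+, the example below orders a small dictionary by date and then splits it into two parallel lists, which is one common shape for chart input. The names labels and values are assumptions, not something the question's template requires:

```python
import datetime

bookings = {
    datetime.date(2019, 8, 9): 4,
    datetime.date(2019, 8, 6): 12,
    datetime.date(2019, 8, 8): 0,
}

# sorted() orders the (date, count) pairs by date; dict() keeps that order.
bookings = dict(sorted(bookings.items()))

labels = [day.isoformat() for day in bookings]   # hypothetical chart labels
values = list(bookings.values())                 # hypothetical chart data

print(labels)  # ['2019-08-06', '2019-08-08', '2019-08-09']
print(values)  # [12, 0, 4]
```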
I got some code from git and I am trying to understand it. Here is part of it; I don't understand the second line:
G = nx.Graph(network_map) # Graph for the whole network
components = list(nx.connected_components(G))
What does the function connected_components do? I went through the documentation and couldn't understand it properly.
nx.connected_components(G) will return "A generator of sets of nodes, one for each component of G". A generator in Python allows iterating over values in a lazy manner (i.e., will generate the next item only when necessary).
The documentation provides the following example:
>>> import networkx as nx
>>> G = nx.path_graph(4)
>>> nx.add_path(G, [10, 11, 12])
>>> [len(c) for c in sorted(nx.connected_components(G), key=len, reverse=True)]
[4, 3]
Let's go through it:
G = nx.path_graph(4) - create the undirected path graph 0 - 1 - 2 - 3
nx.add_path(G, [10, 11, 12]) - add the path 10 - 11 - 12 to G
So, now G is a graph with 2 connected components.
[len(c) for c in sorted(nx.connected_components(G), key=len, reverse=True)] - list the sizes of all connected components in G from the largest to smallest. The result is [4, 3] since {0, 1, 2, 3} is of size 4 and {10, 11, 12} is of size 3.
So just to recap - the result is a generator (lazy iterator) over all connected components in G, where each connected component is simply a set of nodes.
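To make the idea of a connected component concrete, here is a plain-Python sketch that produces the same kind of result with a breadth-first search. It is not how networkx implements connected_components; the function below and the adjacency-dict layout are assumptions made for illustration only:

```python
from collections import deque

def connected_components(adjacency):
    """Lazily yield one set of nodes per connected component."""
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        yield component

# Same shape as the documentation example: paths 0-1-2-3 and 10-11-12.
graph = {
    0: [1], 1: [0, 2], 2: [1, 3], 3: [2],
    10: [11], 11: [10, 12], 12: [11],
}

components = list(connected_components(graph))
sizes = sorted((len(c) for c in components), reverse=True)
print(sizes)  # [4, 3]
```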
I am creating a web crawler. As a first step, I need to crawl a website and extract all its links, but my code is not looping to extract them. I tried using append, but that results in a list of lists. I'm trying to use foo and I get an error. Any help would be appreciated. Thank you
from urllib2 import urlopen
import re
def get_all_urls(url):
    get_content = urlopen(url).read()
    url_list = []
    find_url = re.compile(r'a\s?href="(.*)">')
    url_list_temp = find_url.findall(get_content)
    for i in url_list_temp:
        url_temp = url_list_temp.pop()
        source = 'http://blablabla/'
        url = source + url_temp
        url_list.append(url)
    #print url_list
    return url_list
def web_crawler(seed):
    tocrawl = [seed]
    crawled = []
    i = 0
    while i < len(tocrawl):
        page = tocrawl.pop()
        if page not in crawled:
            #tocrawl.append(get_all_urls(page))
            foo = (get_all_urls(page))
            tocrawl = foo
            crawled.append(page)
        if not tocrawl:
            break
    print crawled
    return crawled
First of all, it's a bad idea to parse HTML with regular expressions, for all reasons listed:
here: Python regular expression for HTML parsing (BeautifulSoup)
here: Python regex to match HTML
here: regexp python with parsing html page
and so on..
You should use an HTML parser to do the job. Python provides one in its standard library: HTMLParser (html.parser in Python 3), but you could also use BeautifulSoup or even lxml. I tend to favor BeautifulSoup, for its nice API.
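As a rough sketch of the standard-library route, the following Python 3 example collects href values with html.parser. The class name LinkCollector and the sample markup are made up for illustration; a real crawler would also need to resolve relative URLs:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

page = '<p><a href="/one">One</a> and <a class="x" href="/two">Two</a></p>'
collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/one', '/two']
```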
Now, back to your problem, you're modifying the list you're iterating on:
for i in url_list_temp:
    url_temp = url_list_temp.pop()
    source = 'http://blablabla/'
    ...
This is bad, because it metaphorically amounts to sawing a branch you're sitting on.
Also, you do not seem to need this removal, as there is no condition deciding whether a URL must be removed or not.
Finally, you get an error after using append because, as you said, it creates a list of lists. You should use extend instead.
>>> l1 = [1, 2, 3]
>>> l2 = [4, 5, 6]
>>> l1.append(l2)
>>> l1
[1, 2, 3, [4, 5, 6]]
>>> l1 = [1, 2, 3]
>>> l1.extend(l2)
>>> l1
[1, 2, 3, 4, 5, 6]
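Applied to the crawl loop, the same idea might look like the sketch below. To keep it runnable without network access, get_all_urls is replaced by a stand-in that reads from a made-up dictionary (fake_web); both the stand-in and the page names are assumptions for illustration:

```python
# Hypothetical link structure standing in for real pages.
fake_web = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

def get_all_urls(page):
    """Stand-in for the real fetch-and-parse step."""
    return fake_web.get(page, [])

def web_crawler(seed):
    tocrawl = [seed]
    crawled = []
    while tocrawl:                      # stop once nothing is left to visit
        page = tocrawl.pop()
        if page not in crawled:
            # extend adds each URL individually, so tocrawl stays flat.
            tocrawl.extend(get_all_urls(page))
            crawled.append(page)
    return crawled

print(web_crawler("a"))  # ['a', 'c', 'b', 'd']
```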
NB: Take a look at http://www.pythonforbeginners.com/python-on-the-web/web-scraping-with-beautifulsoup/ for additional help with scraping with beautifulsoup
I'm trying to fill out the registration for a website with python mechanize. Everything is going well but I can't figure out how to do the select controls. For example, if I'm picking my birthday month, here's the form that I need to fill out:
<SelectControl(mm=[*, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])>
I've viewed all the answers on stackoverflow already and they all seem to be some variation like this:
br.find_control(name="mm").value = ["0"]
or
form["mm"] = ["1"]
The problem here is that it gives me an error: ItemNotFoundError: insufficient items with name '0'
item = br.find_control(name="mm", type="select").get("12")
item.selected = True
Nvm I just needed to do br.form['mm'] = ["1"] <--- I selected this but could have picked any of the values they allowed.
I have used all of the following:
br['mm'] = ['9']
br['mm'] = ['9',]
br.form['mm'] = ['9']
br.form['mm'] = ['9',]
I seem to remember one case where the comma was mandatory.
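As far as plain Python is concerned, the trailing comma makes no difference inside a list literal. Where it does matter is a one-element tuple, which may be the case being remembered. A small check, independent of mechanize (variable names are illustrative):

```python
with_comma = ['9',]
without_comma = ['9']
print(with_comma == without_comma)   # True: both are one-element lists

one_tuple = ('9',)    # the comma makes this a tuple
just_a_str = ('9')    # parentheses alone leave it a plain string
print(type(one_tuple).__name__, type(just_a_str).__name__)  # tuple str
```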