Pythonic way to get some rows of a matrix - list

I was thinking about some code that I wrote a few years ago in Python. At some point it had to get just some elements, by index, of a list of lists.
I remember I did something like this:
def getRows(m, row_indices):
    tmp = []
    for i in row_indices:
        tmp.append(m[i])
    return tmp
Now that I've learnt a little bit more since then, I'd use a list comprehension like this:
[m[i] for i in row_indices]
But I'm still wondering if there's an even more pythonic way to do it. Any ideas?
I would also like to know about alternatives with NumPy or any other array libraries.

It's worth looking at NumPy for its slicing syntax. Scroll down in the linked page until you get to "Indexing, Slicing and Iterating".

The list comprehension is the clean and obvious way, so I'd say it doesn't get more Pythonic than that.
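For comparison, here is a minimal sketch of one standard-library alternative, operator.itemgetter. The function name get_rows and the sample matrix are illustrative, not from the original question. Note the special cases it needs, which is arguably a point in favour of the plain comprehension:

```python
from operator import itemgetter


def get_rows(m, row_indices):
    """Return the rows of m (a list of lists) at the given indices."""
    # itemgetter() with no arguments raises TypeError, so handle the empty case.
    if not row_indices:
        return []
    # With exactly one index, itemgetter returns the bare item, not a tuple.
    if len(row_indices) == 1:
        return [m[row_indices[0]]]
    return list(itemgetter(*row_indices)(m))


# Illustrative data (an assumption, not taken from the question).
matrix = [[0, 1], [2, 3], [4, 5], [6, 7]]
selected = get_rows(matrix, [0, 2])
```

The comprehension [m[i] for i in row_indices] handles the empty and single-index cases without any extra code.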

As Curt said, it seems that Numpy is a good tool for this. Here's an example,
import numpy as np
a = np.arange(16).reshape((4, 4))
b = a[:, [1, 2]]  # columns 1 and 2
c = a[[1, 2], :]  # rows 1 and 2
print(a)
print(b)
print(c)
gives
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]
[[ 1  2]
 [ 5  6]
 [ 9 10]
 [13 14]]
[[ 4  5  6  7]
 [ 8  9 10 11]]
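Applied to the original question, where the matrix is a plain list of lists, the same indexing might look like the sketch below. The names m and row_indices follow the question; the sample values are assumed for illustration.

```python
import numpy as np

# A list of lists, as in the question (values are illustrative).
m = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
row_indices = [1, 2]

# Indexing with a list of integers selects whole rows, in the order given.
rows = np.asarray(m)[row_indices]

# Convert back to nested lists if the rest of the code expects them.
rows_as_lists = rows.tolist()
```

Converting to an array copies the data, so for a one-off selection on a small list the comprehension is probably still the simpler choice.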


What is the code to filter a list in netlogo? Changes in Netlogo 6?

The following code given in the dictionary does not work in NetLogo 6:
show filter [ i -> i < 3 ] [1 3 2]
=> [1 2]
The error message is:
ERROR: Nothing named I has been defined.
What I want to do is pathetically simple: count the number of certain items in a list. I thought filtering for the item and then counting the number of that item in the resulting list is a reasonable way of doing it. Other ways? Or how to fix the problem?
THANKS.
I copy/pasted and ran the code you posted in NetLogo 6.1.0, show filter [ i -> i < 3 ] [1 3 2], and I get the result [1 2].
If you're using NetLogo 6.0.0, you will need to put square brackets around the parameter of the reporter passed to filter, so show filter [ [i] -> i < 3 ] [1 3 2]. The 6.0.0 docs, including filter, are still online if you need them.
You can get the count, then, by doing length filter [ [i] -> i < 3 ] [ 1 3 2 ] and get 2 as the result, as expected. Or you can upgrade to 6.1.0 and do length filter [ i -> i < 3 ] [ 1 3 2 ].

Can somebody explain what 'nx.connected_components()' does?

I got some code from git and was trying to understand it. Here's a part of it; I didn't understand the second line of this code:
G = nx.Graph(network_map) # Graph for the whole network
components = list(nx.connected_components(G))
What does this function connected_components do? I went through the documentation and couldn't understand it properly.
nx.connected_components(G) will return "A generator of sets of nodes, one for each component of G". A generator in Python allows iterating over values in a lazy manner (i.e., will generate the next item only when necessary).
The documentation provides the following example:
>>> import networkx as nx
>>> G = nx.path_graph(4)
>>> nx.add_path(G, [10, 11, 12])
>>> [len(c) for c in sorted(nx.connected_components(G), key=len, reverse=True)]
[4, 3]
Let's go through it:
G = nx.path_graph(4) - create the undirected path graph 0 - 1 - 2 - 3
nx.add_path(G, [10, 11, 12]) - add the path 10 - 11 - 12 to G
So, now G is a graph with 2 connected components.
[len(c) for c in sorted(nx.connected_components(G), key=len, reverse=True)] - list the sizes of all connected components in G from the largest to smallest. The result is [4, 3] since {0, 1, 2, 3} is of size 4 and {10, 11, 12} is of size 3.
So just to recap - the result is a generator (lazy iterator) over all connected components in G, where each connected component is simply a set of nodes.
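To make the idea concrete, here is a minimal pure-Python sketch of a generator that yields one set of nodes per component, using breadth-first search. It is only an illustration of the concept, not how networkx implements it, and the adjacency-dict representation is an assumption made for the example.

```python
from collections import deque


def connected_components(adjacency):
    """Lazily yield one set of nodes per connected component.

    adjacency maps each node to its neighbours (undirected graph).
    """
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        yield component


# The same graph as in the documentation example: two separate paths.
adjacency = {
    0: [1], 1: [0, 2], 2: [1, 3], 3: [2],
    10: [11], 11: [10, 12], 12: [11],
}
components = list(connected_components(adjacency))
sizes = [len(c) for c in sorted(components, key=len, reverse=True)]
```

Wrapping the generator in list(...), as the code in the question does, simply forces all components to be computed up front.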

how to read generator data as numpy array

def laser_callback(self, laserMsg):
    cloud = self.laser_projector.projectLaser(laserMsg)
    gen = pc2.read_points(cloud, skip_nans=True, field_names=('x', 'y', 'z'))
    self.xyz_generator = gen
    print(gen)
I'm trying to convert the laser data into pointcloud2 data, and then display them using matplotlib.pyplot. I tried traversing individual points in the generator but it takes a long time. Instead I'd like to convert them into a numpy array and then plot it. How do I go about doing that?
Take a look at some of these other posts which seem to answer the basic question of "convert a generator to an array":
How do I build a numpy array from a generator?
How to construct an np.array with fromiter
How to fill a 2D Python numpy array with values from a generator?
numpy fromiter with generator of list
Without knowing exactly what your generator is returning, the best I can do is provide a somewhat generic (but not particularly efficient) example:
#!/usr/bin/env python3
import numpy as np
# Sample generator of (x, y, z) tuples
def my_generator():
    for i in range(10):
        yield (i, i*2, i*2 + 1)
def gen_to_numpy(gen):
    # Consume the whole generator, then build a 2-D array from the rows
    return np.array(list(gen))
gen = my_generator()
array = gen_to_numpy(gen)
print(type(array))
print(array)
Output:
<class 'numpy.ndarray'>
[[ 0  0  1]
 [ 1  2  3]
 [ 2  4  5]
 [ 3  6  7]
 [ 4  8  9]
 [ 5 10 11]
 [ 6 12 13]
 [ 7 14 15]
 [ 8 16 17]
 [ 9 18 19]]
Again though, I cannot comment on the efficiency of this. You mentioned that it takes a long time to plot by reading points directly from the generator, but converting to a Numpy array will still require going through the whole generator to get the data. It would probably be much more efficient if the laser to pointcloud implementation you are using could provide the data directly as an array, but that is a question for the ROS Answers forum (I notice you already asked this there).
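The linked posts mention np.fromiter, so here is a hedged sketch of that variant. It assumes every item the generator yields has exactly three numeric fields, and it uses the same toy generator as above rather than real point cloud data. It avoids building an intermediate list of tuples, but whether it is noticeably faster for a real cloud is untested here.

```python
from itertools import chain

import numpy as np


# Toy stand-in for the point generator (an assumption, as above).
def my_generator():
    for i in range(10):
        yield (i, i * 2, i * 2 + 1)


# Flatten the (x, y, z) tuples into one stream of numbers, read them into a
# 1-D array, then view that array as rows of three columns.
flat = np.fromiter(chain.from_iterable(my_generator()), dtype=np.float64)
points = flat.reshape(-1, 3)
```

If the number of points is known in advance, passing count= to np.fromiter lets NumPy allocate the array once instead of growing it.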

Obtaining a pandas dataframe from a dict with tuples as keys

I am new to python and have been struggling with this problem for quite a while. I have a dict like this:
dict1 = {('a', 'a'): 5, ('a', 'b'): 10, ('a', 'c'): 11, ('b', 'a'): 4, ('b', 'b'): 8, ('b', 'c'): 3, ....}
What I would like to do is convert this into a pandas dataframe that looks like this:
    a   b   c
a   5  10  11
b   4   8   3
c  ..  ..  ..
After that I would like to create a multiple bar plot in the jupyter notebook. I know you can display the data as a pandas series to show the following:
dataset = pd.Series(dict1)
print(dataset)
a  a     5
   b    10
   c    11
b  a     4
   b     8
   c     3
c  a    ..
   b    ..
   c    ..
However, I was not able to create a multiple bar plot from that.
You're almost there, just need to unstack:
dataset.unstack()
I prefer to use this page for reference, rather than the official documentation.
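Put together, a minimal sketch of the conversion might look like this. It uses only the six entries that were spelled out in the question, with the keys written as strings (an assumption, since the question left them unquoted).

```python
import pandas as pd

# Only the entries given in the question; string keys are an assumption.
dict1 = {
    ("a", "a"): 5, ("a", "b"): 10, ("a", "c"): 11,
    ("b", "a"): 4, ("b", "b"): 8, ("b", "c"): 3,
}

# Tuple keys become a two-level MultiIndex on the Series.
dataset = pd.Series(dict1)

# unstack() moves the inner index level into the columns.
table = dataset.unstack()
```

From there, table.plot(kind="bar") should draw one group of bars per row, provided matplotlib is installed.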

Iterating over selection with query of an HDFStore

I have a very large table in an HDFStore of which I would like to select a subset using a query and then iterate over the subset chunk by chunk. I would like the query to take place before the selection is broken into chunks, so that all of the chunks are the same size.
The documentation here seems to indicate that this is the default behavior but is not so clear. However, it seems to me that the chunking is actually taking place before the query, as shown in this example:
In [1]: pd.__version__
Out[1]: '0.13.0-299-gc9013b8'
In [2]: df = pd.DataFrame({'number': np.arange(1,11)})
In [3]: df
Out[3]:
   number
0       1
1       2
2       3
3       4
4       5
5       6
6       7
7       8
8       9
9      10
[10 rows x 1 columns]
In [4]: with pd.get_store('test.h5') as store:
   ...:     store.append('df', df, data_columns=['number'])
In [5]: evens = [2, 4, 6, 8, 10]
In [6]: with pd.get_store('test.h5') as store:
   ...:     for chunk in store.select('df', 'number=evens', chunksize=5):
   ...:         print len(chunk)
2
3
I would expect only a single chunk of size 5 if the querying were happening before the result is broken into chunks, but this example gives two chunks of lengths 2 and 3.
Is this the intended behavior and if so is there an efficient workaround to give chunks of the same size without reading the table into memory?
I think when I wrote that, the intent was for chunksize to apply to the results of the query, but it was changed while I was implementing it. As implemented, chunksize determines the sections of the table to which the query is applied, and you then iterate over those. The problem is that you don't know a priori how many rows you are going to get.
However, there IS a way to do this. Here is a sketch: use select_as_coordinates to actually execute your query; this returns an Int64Index of the row numbers (the coordinates). Then iterate over that index in chunks, selecting the corresponding rows each time.
Something like this (this makes a nice recipe, will include in the docs I think):
In [15]: def chunks(l, n):
   ....:     return [l[i:i+n] for i in xrange(0, len(l), n)]
   ....:
In [16]: with pd.get_store('test.h5') as store:
   ....:     coordinates = store.select_as_coordinates('df','number=evens')
   ....:     for c in chunks(coordinates, 2):
   ....:         print store.select('df',where=c)
   ....:
   number
1       2
3       4
[2 rows x 1 columns]
   number
5       6
7       8
[2 rows x 1 columns]
   number
9      10
[1 rows x 1 columns]
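The difference between the two orders of operation can be illustrated without HDF5 at all. The following pure-Python sketch uses a plain list as a stand-in for the table, so it models only the behaviour described above and not pandas itself.

```python
def chunks(seq, n):
    """Split seq into consecutive pieces of at most n items."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]


numbers = list(range(1, 11))   # stand-in for the 'number' column
evens = {2, 4, 6, 8, 10}

# Chunk first, then apply the query inside each chunk (the observed behaviour).
chunk_then_filter = [
    len([x for x in chunk if x in evens]) for chunk in chunks(numbers, 5)
]

# Apply the query first, then chunk the matching rows (the workaround).
filter_then_chunk = [
    len(chunk) for chunk in chunks([x for x in numbers if x in evens], 5)
]
```

Chunking first gives pieces of sizes 2 and 3, matching the output in the question, while filtering first gives a single piece of size 5.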