I wish to convert the following DataFrame into a list of lists:

    Sl No  Vertical        Verticale Code  Org Work Location
0     1.0  IT              5               New Delhi
1     2.0  IT              5               Raipur
2     3.0  Infrastructure  7               Coimbatore
3     4.0  Telecom         3               Chennai
4     5.0  Telecom         3               Ahmedabad
5     6.0  IT              5               Chennai
6     7.0  IT              5               Chennai
7     8.0  IT Products     6               Bangalore
8     9.0  IT              5               Chennai
9    10.0  IT              5               Chennai
10   11.0  Telecom         3               Bangalore
11   12.0  IT              5               Mysore
12   13.0  IT Products     6               Navi Mumbai
13   14.0  Telecom         3               Bangalore
14   15.0  Infrastructure  7               Chennai
15   16.0  IT              5               Chennai
16   17.0  IT              5               Chennai
17   18.0  Infrastructure  7               Coimbatore
18   19.0  Telecom         3               Chennai
19   20.0  Telecom         3               Bangalore
20   21.0  Telecom         3               Bengalore
21   22.0  IT              5               Chennai

It depends on whether you want a flat list or nested lists.
Nested lists:
df.values.tolist()
[[0, 1.0, 'IT', '5', 'New', 'Delhi', nan, nan],
[1, 2.0, 'IT', '5', 'Raipur', nan, nan, nan],
[2, 3.0, 'Infrastructure', '7', 'Coimbatore', nan, nan, nan],
[3, 4.0, 'Telecom', '3', 'Chennai', nan, nan, nan],
[4, 5.0, 'Telecom', '3', 'Ahmedabad', nan, nan, nan],
[5, 6.0, 'IT', '5', 'Chennai', nan, nan, nan],
[6, 7.0, 'IT', '5', 'Chennai', nan, nan, nan],
[7, 8.0, 'IT', 'Products', '6', 'Bangalore', nan, nan],
[8, 9.0, 'IT', '5', 'Chennai', nan, nan, nan],
[9, 10.0, 'IT', '5', 'Chennai', nan, nan, nan],
[10, 11.0, 'Telecom', '3', 'Bangalore', nan, nan, nan],
[11, 12.0, 'IT', '5', 'Mysore', nan, nan, nan],
[12, 13.0, 'IT', 'Products', '6', 'Navi', 'Mumbai', nan],
[13, 14.0, 'Telecom', '3', 'Bangalore', nan, nan, nan],
[14, 15.0, 'Infrastructure', '7', 'Chennai', nan, nan, nan],
[15, 16.0, 'IT', '5', 'Chennai', nan, nan, nan],
[16, 17.0, 'IT', '5', 'Chennai', nan, nan, nan],
[17, 18.0, 'Infrastructure', '7', 'Coimbatore', nan, nan, nan],
[18, 19.0, 'Telecom', '3', 'Chennai', nan, nan, nan],
[19, 20.0, 'Telecom', '3', 'Bangalore', nan, nan, nan],
[20, 21.0, 'Telecom', '3', 'Bengalore', nan, nan, nan],
[21, 22.0, 'IT', '5', 'Chennai', nan, nan, nan]]
Flat list:
df.values.ravel().tolist()
[0, 1.0, 'IT', '5', 'New', 'Delhi', nan, nan,
 1, 2.0, 'IT', '5', 'Raipur', nan, nan, nan,
 2, 3.0, 'Infrastructure', '7', 'Coimbatore', nan, nan, nan,
 3, 4.0, 'Telecom', '3', 'Chennai', nan, nan, nan,
 4, 5.0, 'Telecom', '3', 'Ahmedabad', nan, nan, nan,
 5, 6.0, 'IT', '5', 'Chennai', nan, nan, nan,
 6, 7.0, 'IT', '5', 'Chennai', nan, nan, nan,
 7, 8.0, 'IT', 'Products', '6', 'Bangalore', nan, nan,
 8, 9.0, 'IT', '5', 'Chennai', nan, nan, nan,
 9, 10.0, 'IT', '5', 'Chennai', nan, nan, nan,
 10, 11.0, 'Telecom', '3', 'Bangalore', nan, nan, nan,
 11, 12.0, 'IT', '5', 'Mysore', nan, nan, nan,
 12, 13.0, 'IT', 'Products', '6', 'Navi', 'Mumbai', nan,
 13, 14.0, 'Telecom', '3', 'Bangalore', nan, nan, nan,
 14, 15.0, 'Infrastructure', '7', 'Chennai', nan, nan, nan,
 15, 16.0, 'IT', '5', 'Chennai', nan, nan, nan,
 16, 17.0, 'IT', '5', 'Chennai', nan, nan, nan,
 17, 18.0, 'Infrastructure', '7', 'Coimbatore', nan, nan, nan,
 18, 19.0, 'Telecom', '3', 'Chennai', nan, nan, nan,
 19, 20.0, 'Telecom', '3', 'Bangalore', nan, nan, nan,
 20, 21.0, 'Telecom', '3', 'Bengalore', nan, nan, nan,
 21, 22.0, 'IT', '5', 'Chennai', nan, nan, nan]
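Below is a minimal, self-contained sketch of both conversions. It uses a hypothetical three-row stand-in for the question's frame, so the column names and values are for illustration only. It calls DataFrame.to_numpy(), which pandas has recommended over .values since version 0.24; for this data .values behaves the same. The trailing nan columns and split names ('New', 'Delhi') in the outputs above seem to come from the answerer's frame having been read by splitting on whitespace. A frame built directly, as here, has only the real columns.

```python
import pandas as pd

# Hypothetical stand-in for the question's DataFrame (three rows only).
df = pd.DataFrame({
    "Sl No": [1.0, 2.0, 3.0],
    "Vertical": ["IT", "IT", "Infrastructure"],
    "Vertical Code": [5, 5, 7],
    "Work Location": ["New Delhi", "Raipur", "Coimbatore"],
})

# Nested lists: one inner list per row, values in column order.
nested = df.to_numpy().tolist()

# Flat list: the same values, row after row, in one list.
flat = df.to_numpy().ravel().tolist()

print(nested)
print(flat)
```

Mixed column types make to_numpy() return an object array, so each value keeps its own Python type in the resulting lists.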


I'm having trouble understanding how AVX shuffle intrinsics work for 8-bit elements

I'm trying to pack 16-bit data into 8 bits using _mm256_shuffle_epi8, but the result I get is not what I'm expecting.
auto srcData = _mm256_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32);
__m256i vperm = _mm256_setr_epi8( 0, 2, 4, 6, 8, 10, 12, 14,
16, 18, 20, 22, 24, 26, 28, 30,
-1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1);
auto result = _mm256_shuffle_epi8(srcData, vperm);
I'm expecting result to contain:
1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
But instead I get:
1, 3, 5, 7, 9, 11, 13, 15, 1, 3, 5, 7, 9, 11, 13, 15,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
I've surely misunderstood how the shuffle works.
If anyone can enlighten me, it would be much appreciated :)
This is to be expected. Look at the docs for _mm_shuffle_epi8. The 256-bit AVX2 version, _mm256_shuffle_epi8, simply applies that 128-bit instruction independently to each of the two 16-byte lanes of the YMM register.
So you can shuffle the first 16 values, or the last 16 values, but you cannot shuffle values across the 16-byte boundary. Each index byte uses only its low 4 bits to pick a byte within its own lane, and an index with the high bit set (such as -1) produces 0. (You'll notice that all the expected numbers over 16 came out as the same numbers minus 16, e.g. 19->3, 31->15, etc.)
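The following small Python model of the per-lane rule shows where the duplicated 1, 3, ..., 15 comes from. It is a hypothetical emulation written for illustration, not the intrinsic itself. It assumes the documented pshufb behaviour: an index with its high bit set zeroes the output byte, and otherwise only the low 4 bits select a byte from the same 128-bit lane.

```python
def mm256_shuffle_epi8(a, b):
    """Model of _mm256_shuffle_epi8 on two lists of 32 byte values (illustration only)."""
    out = []
    for i, idx in enumerate(b):
        lane_base = 16 if i >= 16 else 0  # each 128-bit lane is shuffled on its own
        if idx & 0x80:                     # high bit set (e.g. -1): output byte is 0
            out.append(0)
        else:
            # Only the low 4 bits are used, so 16..31 wrap back to 0..15 of the same lane.
            out.append(a[lane_base + (idx & 0x0F)])
    return out

src = list(range(1, 33))  # the bytes 1..32 from the question
vperm = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30] + [-1] * 16
result = mm256_shuffle_epi8(src, vperm)
print(result)
```

The indices 16..30 in the low lane wrap to 0..14 and select 1, 3, ..., 15 a second time. The -1 entries in the high lane give zeros.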
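You'll need to do this with an additional step. First shuffle within each lane, so that the wanted bytes of each lane end up in that lane's low 8 bytes:
__m256i vperm = _mm256_setr_epi8( 0, 2, 4, 6, 8, 10, 12, 14,
-1, -1, -1, -1, -1, -1, -1, -1,
0, 2, 4, 6, 8, 10, 12, 14,
-1, -1, -1, -1, -1, -1, -1, -1);
__m256i shuffled = _mm256_shuffle_epi8(srcData, vperm);
and then use _mm256_permute4x64_epi64 (also AVX2) to pull the 0th and 2nd 64-bit elements into the first 128 bits. _mm256_permute2f128_si256 cannot do this on its own, because it only moves whole 128-bit lanes:
__m256i result = _mm256_permute4x64_epi64(shuffled, 0xD8); // 64-bit element order 0, 2, 1, 3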
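The same kind of model can check the two-step fix. This sketch assumes the second step is _mm256_permute4x64_epi64 with control 0xD8, which reorders the 64-bit elements as 0, 2, 1, 3. A 128-bit lane permute such as _mm256_permute2f128_si256 cannot bring the low halves of both lanes together. Both functions below are hypothetical emulations, not the real intrinsics.

```python
def mm256_shuffle_epi8(a, b):
    """Model of _mm256_shuffle_epi8: per-lane byte shuffle, high index bit zeroes the byte."""
    out = []
    for i, idx in enumerate(b):
        lane_base = 16 if i >= 16 else 0
        out.append(0 if idx & 0x80 else a[lane_base + (idx & 0x0F)])
    return out

def mm256_permute4x64_epi64(a, imm):
    """Model of _mm256_permute4x64_epi64 on 32 bytes: qword k takes source qword (imm >> 2k) & 3."""
    qwords = [a[8 * q:8 * q + 8] for q in range(4)]
    out = []
    for k in range(4):
        out.extend(qwords[(imm >> (2 * k)) & 3])
    return out

src = list(range(1, 33))
half = [0, 2, 4, 6, 8, 10, 12, 14] + [-1] * 8
shuffled = mm256_shuffle_epi8(src, half + half)   # lane-local: odd values to each lane's low 8 bytes
result = mm256_permute4x64_epi64(shuffled, 0xD8)  # qwords 0, 2, 1, 3: both low halves side by side
print(result)
```

After the shuffle, each lane holds its eight packed bytes in its low half. The permute then places those two halves next to each other in the low 128 bits.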

How to construct a sobel filter for a 3d convolution?

In my code snippet, I want to construct a Sobel filter that is applied to each channel of an RGB image separately, with the filtered channels stacked back together (again RGB, but filtered) at the end.
I do not know how to construct the Sobel filter with the filter shape [filter_depth, filter_height, filter_width, in_channels, out_channels], which in my case is:
sobel_x_filter = tf.reshape(sobel_x, [1, 3, 3, 3, 3])
The entire code looks like that:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
im0 = plt.imread('../../data/im0.png')  # already divided by 255
sobel_x = tf.constant([
    [[[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
     [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
     [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]],
    [[[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
     [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
     [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]],
    [[[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
     [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
     [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]]], tf.float32)  # is this correct?
sobel_x_filter = tf.reshape(sobel_x, [1, 3, 3, 3, 3])
image = tf.placeholder(tf.float32, shape=[496, 718, 3])
image_resized = tf.expand_dims(tf.expand_dims(image, 0), 0)
filters_x = tf.nn.conv3d(image_resized, filter=sobel_x_filter, strides=[1, 1, 1, 1, 1],
                         padding='SAME', data_format='NDHWC')
with tf.Session('') as sess:
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    feed_dict = {image: im0}
    img = filters_x.eval(feed_dict=feed_dict)
plt.figure(0), plt.title('red'), plt.imshow(np.squeeze(img[..., 0])),
plt.figure(1), plt.title('green'), plt.imshow(np.squeeze(img[..., 1])),
plt.figure(2), plt.title('blue'), plt.imshow(np.squeeze(img[..., 2]))
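You can use tf.nn.depthwise_conv2d, which applies a separate filter to each input channel:
sobel_x = tf.constant([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], tf.float32)
# [3, 3] -> [3, 3, 3] (one copy per channel) -> [3, 3, 3, 1] (channel multiplier 1)
kernel = tf.tile(sobel_x[..., None], [1, 1, 3])[..., None]
# image[None, ...] adds the batch axis: [1, H, W, 3]
conv = tf.nn.depthwise_conv2d(image[None, ...], kernel, strides=[1, 1, 1, 1], padding='SAME')
With tf.nn.conv3d, move the channels onto the depth axis and use a filter of depth 1, so the channels never mix:
# [H, W, 3] -> [3, H, W] -> [1, 3, H, W]
im = tf.expand_dims(tf.transpose(image, [2, 0, 1]), 0)
sobel_x = tf.constant([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], tf.float32)
sobel_x_filter = tf.reshape(sobel_x, [1, 3, 3, 1, 1])
# conv3d: [1, 3, H, W, 1] -> [1, 3, H, W, 1]; squeeze -> [3, H, W]; transpose -> [H, W, 3]
conv = tf.transpose(tf.squeeze(tf.nn.conv3d(im[..., None], sobel_x_filter,
                                            strides=[1, 1, 1, 1, 1], padding='SAME')), [1, 2, 0])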
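Both variants filter each channel on its own. As a framework-free illustration, here is a NumPy sketch of that per-channel operation. It assumes TensorFlow's convolution ops compute cross-correlation (no kernel flip). It also assumes that 'SAME' padding with a 3x3 kernel and stride 1 amounts to one pixel of zero padding on every side. The function name sobel_x_per_channel is hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)

def sobel_x_per_channel(image):
    """Cross-correlates every channel of an H x W x C image with SOBEL_X ('SAME', zero padding)."""
    h, w, c = image.shape
    # One pixel of zero padding on each spatial side keeps the output at H x W.
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)), mode="constant")
    out = np.zeros((h, w, c), dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            # Each kernel tap weights a shifted copy of all channels; channels are never summed together.
            out += SOBEL_X[dy, dx] * padded[dy:dy + h, dx:dx + w, :]
    return out

# Hypothetical test image: a vertical edge in the red channel only.
image = np.zeros((4, 5, 3), dtype=np.float32)
image[:, 3:, 0] = 1.0
out = sobel_x_per_channel(image)
print(out[..., 0])
```

The edge shows up only in channel 0, and the green and blue outputs stay zero. This matches what the depthwise kernel (channel multiplier 1) and the depth-1 conv3d filter are meant to do.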

Fast way of summing row entries into the diagonal position of a matrix in Python

I am trying to compute the equation below, where A is a sparse matrix and ptotal is an array of numbers. I have to put the sum of all the entries in a row at the diagonal position.
A[ptotal, ptotal] = -sum(A[ptotal, :])
The code seems to give the right answer, but since my ptotal array is long (almost 100000 entries), it is computationally inefficient. Is there a faster way to solve this problem?
First a dense array version:
In [87]: A = np.arange(36).reshape(6,6)
In [88]: ptotal = np.arange(6)
Assuming ptotal is all the row indices, it can be replaced with a sum method call:
In [89]: sum(A[ptotal,:])
Out[89]: array([ 90, 96, 102, 108, 114, 120])
In [90]: A.sum(axis=0)
Out[90]: array([ 90, 96, 102, 108, 114, 120])
We can make an array with those values on the diagonal:
In [92]: np.diagflat(A.sum(axis=0))
Out[92]:
array([[ 90, 0, 0, 0, 0, 0],
[ 0, 96, 0, 0, 0, 0],
[ 0, 0, 102, 0, 0, 0],
[ 0, 0, 0, 108, 0, 0],
[ 0, 0, 0, 0, 114, 0],
[ 0, 0, 0, 0, 0, 120]])
Subtract it from the original array, and the result is a 'zero-sum' array (every column sums to 0):
In [93]: A -= np.diagflat(A.sum(axis=0))
In [94]: A
Out[94]:
array([[-90, 1, 2, 3, 4, 5],
[ 6, -89, 8, 9, 10, 11],
[ 12, 13, -88, 15, 16, 17],
[ 18, 19, 20, -87, 22, 23],
[ 24, 25, 26, 27, -86, 29],
[ 30, 31, 32, 33, 34, -85]])
In [95]: A.sum(axis=0)
Out[95]: array([0, 0, 0, 0, 0, 0])
We can do the same with a sparse matrix (after from scipy import sparse):
In [99]: M = sparse.csr_matrix(np.arange(36).reshape(6,6))
In [100]: M
Out[100]:
<6x6 sparse matrix of type '<class 'numpy.int32'>'
with 35 stored elements in Compressed Sparse Row format>
In [101]: M.sum(axis=0)
Out[101]: matrix([[ 90, 96, 102, 108, 114, 120]], dtype=int32)
A sparse diagonal matrix:
In [104]: sparse.dia_matrix((M.sum(axis=0),0),M.shape)
Out[104]:
<6x6 sparse matrix of type '<class 'numpy.int32'>'
with 6 stored elements (1 diagonals) in DIAgonal format>
In [105]: _.A
Out[105]:
array([[ 90, 0, 0, 0, 0, 0],
[ 0, 96, 0, 0, 0, 0],
[ 0, 0, 102, 0, 0, 0],
[ 0, 0, 0, 108, 0, 0],
[ 0, 0, 0, 0, 114, 0],
[ 0, 0, 0, 0, 0, 120]], dtype=int32)
Take the difference, getting a new matrix:
In [106]: M-sparse.dia_matrix((M.sum(axis=0),0),M.shape)
Out[106]:
<6x6 sparse matrix of type '<class 'numpy.int32'>'
with 36 stored elements in Compressed Sparse Row format>
In [107]: _.A
Out[107]:
array([[-90, 1, 2, 3, 4, 5],
[ 6, -89, 8, 9, 10, 11],
[ 12, 13, -88, 15, 16, 17],
[ 18, 19, 20, -87, 22, 23],
[ 24, 25, 26, 27, -86, 29],
[ 30, 31, 32, 33, 34, -85]], dtype=int32)
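The subtraction approach can be written compactly as follows. This sketch assumes SciPy's sparse.diags, which builds a sparse diagonal matrix from a 1-D array and here plays the role of the dia_matrix call. It uses np.asarray(...).ravel() instead of .A1 so it also works if sum returns a plain ndarray rather than an np.matrix.

```python
import numpy as np
from scipy import sparse

M = sparse.csr_matrix(np.arange(36).reshape(6, 6))

# Column sums as a flat 1-D array (M.sum(axis=0) itself is 2-D).
col_sums = np.asarray(M.sum(axis=0)).ravel()

# Subtract a sparse diagonal built from those sums; no Python-level loop over indices.
Z = M - sparse.diags(col_sums)

print(Z.toarray())
print(np.asarray(Z.sum(axis=0)).ravel())  # every column now sums to zero
```

sparse.diags may return a floating-point matrix for integer input, so Z can come back as float64 rather than int. The values are the same as in Out[107].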
There is also a setdiag method:
In [117]: M.setdiag(-M.sum(axis=0).A1)
/usr/local/lib/python3.5/dist-packages/scipy/sparse/compressed.py:774: SparseEfficiencyWarning: Changing the sparsity structure of a csr_matrix is expensive. lil_matrix is more efficient.
SparseEfficiencyWarning)
In [118]: M.A
Out[118]:
array([[ -90, 1, 2, 3, 4, 5],
[ 6, -96, 8, 9, 10, 11],
[ 12, 13, -102, 15, 16, 17],
[ 18, 19, 20, -108, 22, 23],
[ 24, 25, 26, 27, -114, 29],
[ 30, 31, 32, 33, 34, -120]], dtype=int32)
Out[101] is a 2-D np.matrix; .A1 flattens it into a 1-D array, which setdiag can use.
The sparse efficiency warning is aimed more at iterative use than at a one-time application like this. Still, looking at the setdiag code, I suspect the first approach is faster, but we really need to do timing tests.
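One way to run those timing tests is sketched below using timeit. The test matrix is hypothetical (random CSR, 20,000 x 20,000, about 5 stored entries per row), so the numbers will not match the question's data. Note also that the two functions are not the same operation when the diagonal is already non-zero: subtraction leaves -89 at [1, 1] (Out[107]), while setdiag writes -96 (Out[118]).

```python
import timeit
import warnings

import numpy as np
from scipy import sparse

n = 20_000
# Hypothetical test matrix: random CSR with roughly 5 stored entries per row.
M = sparse.random(n, n, density=5 / n, format="csr", random_state=0)

def by_subtraction(M):
    """Returns M minus a sparse diagonal of its column sums (new matrix)."""
    col_sums = np.asarray(M.sum(axis=0)).ravel()
    return M - sparse.diags(col_sums)

def by_setdiag(M):
    """Returns a copy of M whose diagonal is overwritten with minus the column sums."""
    M = M.copy()
    col_sums = np.asarray(M.sum(axis=0)).ravel()
    with warnings.catch_warnings():
        # Inserting new diagonal entries into CSR may raise SparseEfficiencyWarning.
        warnings.simplefilter("ignore", sparse.SparseEfficiencyWarning)
        M.setdiag(-col_sums)
    return M

for fn in (by_subtraction, by_setdiag):
    seconds = timeit.timeit(lambda: fn(M), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms per call")
```

Timings depend on the SciPy version and on how many diagonal entries are already stored, so it is worth repeating the measurement on the real matrix.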

Is AVX intrinsic _mm256_cmp_ps supposed to return NaN when true?

When I try:
__m256 a = _mm256_set_ps(1, 1, 1, 1, 1, 1, 1, 1);
__m256 b = _mm256_set_ps(0, 0, 0, 0, 0, 0, 0, 0);
__m256 c = _mm256_cmp_ps(a, b, _CMP_LT_OQ);
which is a < b, I get the output:
[0, 0, 0, 0, 0, 0, 0, 0]
But when trying:
__m256 a = _mm256_set_ps(1, 1, 1, 1, 1, 1, 1, 1);
__m256 b = _mm256_set_ps(0, 0, 0, 0, 0, 0, 0, 0);
__m256 c = _mm256_cmp_ps(b, a, _CMP_LT_OQ);
or
__m256 a = _mm256_set_ps(1, 1, 1, 1, 1, 1, 1, 1);
__m256 b = _mm256_set_ps(0, 0, 0, 0, 0, 0, 0, 0);
__m256 c = _mm256_cmp_ps(a, b, _CMP_GT_OQ);
I get
[NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN]
Is this expected behaviour? The documentation at https://software.intel.com/en-us/node/524077 just says that it returns the result, without specifying what the result looks like.
Yes. The returned value is a per-element bitmask: each 32-bit element is set to all zeroes for false, or all ones for true. 32 bits of ones (0xFFFFFFFF) happen to be an encoding of NaN when interpreted as a 32-bit float.
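The bit-pattern claim can be checked outside of AVX. This Python sketch uses the standard struct module to reinterpret a 32-bit integer as an IEEE-754 single-precision float. bits_to_float32 is a hypothetical helper name.

```python
import math
import struct

def bits_to_float32(bits):
    """Reinterprets a 32-bit unsigned integer pattern as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

true_lane = bits_to_float32(0xFFFFFFFF)   # what a 'true' comparison element holds
false_lane = bits_to_float32(0x00000000)  # what a 'false' comparison element holds

print(true_lane, false_lane)
print(math.isnan(true_lane))
```

All ones means the exponent field is all ones and the mantissa is non-zero, which IEEE-754 defines as NaN. All zeros is simply +0.0.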
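Bitmasks are useful because you can use them to mask out some results, e.g. (A & M) | (B & ~M) will select the value of A where the mask M is true (all ones) and the value of B where the mask is false (all zeroes). With intrinsics that is _mm256_or_ps(_mm256_and_ps(M, A), _mm256_andnot_ps(M, B)), or _mm256_blendv_ps(B, A, M) in a single step.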
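The select formula (A & M) | (B & ~M) can be modelled element by element on the raw 32-bit patterns. This is a hypothetical Python illustration using four elements instead of eight, with a mask built the way a comparison like a < 2.5 would produce it. The helper names are made up for the example.

```python
import struct

def f32_bits(x):
    """Raw 32-bit pattern of a single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(bits):
    """Single-precision float from a raw 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def select(a, b, mask):
    """Per element: (A & M) | (B & ~M) on the bit patterns (~M kept to 32 bits)."""
    out = []
    for x, y, m in zip(a, b, mask):
        bits = (f32_bits(x) & m) | (f32_bits(y) & ~m & 0xFFFFFFFF)
        out.append(bits_f32(bits))
    return out

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
# Comparison-style mask for a < 2.5: all ones where true, all zeros where false.
mask = [0xFFFFFFFF if x < 2.5 else 0 for x in a]
selected = select(a, b, mask)
print(selected)
```

Because each mask element is either all ones or all zeros, every output element is copied bit for bit from exactly one of the two inputs.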

Flattening a list with a nested dictionary

I have a list that includes a nested dictionary:
l =[('M', 1, 2, {'t': (2, 1.0)}), ('L', 2, 4, {'b': (4, 0.25), 'fi': (4, 0.75)}),
('J', 4, 5, {'a': (5, 0.2), 'w': (5, 0.2), 'wh': (5, 0.4), 'en': (5, 0.2)}),
('T', 4, 6, {'sl': (6, 0.5), 'f': (6, 0.1), 'pz': (6, 0.17), 'al': (6, 0.1)}),
('P', 5, 5, {'tr': (5, 0.2), 'in': (5, 0.2), 'fa': (5, 0.2), 'if': (5, 0.2)})]
I would like to flatten this list, in order to have a plain list like this:
[('M', 1, 2, 't', 2, 1.0, 'L', 2, 4, 'b', 4, 0.25, 'fi', 4, 0.75),
('J', 4, 5, 'a', 5, 0.2, 'w', 5, 0.2, 'wh', 5, 0.4, 'en', 5, 0.2)]
I tried some flatten functions, but I got confused about how to flatten the dictionary within the list. I am new to Python, as you can tell. Could anyone help me get around this?
I could only think of the brute-force method below.
Your given list:
d =[('M', 1, 2, {'t': (2, 1.0)}), ('L', 2, 4, {'b': (4, 0.25), 'fi': (4, 0.75)}),
('J', 4, 5, {'a': (5, 0.2), 'w': (5, 0.2), 'wh': (5, 0.4), 'en': (5, 0.2)}),
('T', 4, 6, {'sl': (6, 0.5), 'f': (6, 0.1), 'pz': (6, 0.17), 'al': (6, 0.1)}),
('P', 5, 5, {'tr': (5, 0.2), 'in': (5, 0.2), 'fa': (5, 0.2), 'if': (5, 0.2)})]
My solution:
l = []
for item in d:
    for i in item:
        if isinstance(i, dict):
            # each dict entry maps a key to an (int, float) tuple
            for key, value in i.items():
                l.append(key)     # e.g. 't'
                l.extend(value)   # e.g. 2, 1.0
        else:
            l.append(i)
print(l)
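Output (from the original Python 2 run; on Python 3.7+ the keys come out in insertion order and the floats print as 0.2, 0.4, 0.17 and 0.1):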
['M', 1, 2, 't', 2, 1.0, 'L', 2, 4, 'fi', 4, 0.75, 'b', 4, 0.25, 'J', 4, 5, 'a', 5, 0.20000000000000001, 'en', 5, 0.20000000000000001, 'w', 5, 0.20000000000000001, 'wh', 5, 0.40000000000000002, 'T', 4, 6, 'pz', 6, 0.17000000000000001, 'f', 6, 0.10000000000000001, 'al', 6, 0.10000000000000001, 'sl', 6, 0.5, 'P', 5, 5, 'fa', 5, 0.20000000000000001, 'if', 5, 0.20000000000000001, 'tr', 5, 0.20000000000000001, 'in', 5, 0.20000000000000001]
Hope this helps :)
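Here is a shorter variant of the same idea. It is a sketch assuming Python 3.7+, where dictionaries keep insertion order, so the output order is predictable. A generator yields each record's plain fields and expands any dict into its key followed by the items of its tuple. flatten_record is a hypothetical name, and only the first two records are used for brevity.

```python
from itertools import chain

def flatten_record(record):
    """Yields a record's plain fields; a dict becomes key, then the items of its value tuple."""
    for field in record:
        if isinstance(field, dict):
            for key, value in field.items():
                yield key
                yield from value
        else:
            yield field

d = [('M', 1, 2, {'t': (2, 1.0)}),
     ('L', 2, 4, {'b': (4, 0.25), 'fi': (4, 0.75)})]

# One flat list for everything ...
flat = list(chain.from_iterable(flatten_record(r) for r in d))
# ... or one flat tuple per record.
per_record = [tuple(flatten_record(r)) for r in d]

print(flat)
print(per_record)
```

Keeping the flattening logic in a generator lets the caller choose between one long list and one tuple per record without duplicating the loop.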