Convert a single column into 2d Matrix in python - python-2.7

I have data as shown below in a single column, and I want to split that single column into n columns, naming the rows and columns. How can I do that in Python?
-----------sample data----------
5
3
5
0
0
1
0
0
18
23
11
1
2
10
1
0
5
6
1
0
1
1
1
0
158
132
150
17
------------ The output should look like ---------
column0 column1 column2 column3 column4 column5 column6
row1 5 0 18 2 5 1 158
row2 3 1 23 10 6 1 132
row3 5 0 11 1 1 1 150
row4 0 0 1 0 0 0 17

One of the easiest ways is to use numpy and its reshape function:
import numpy as np
k = np.array(data)
k = k.reshape((row, column), order='F')
Note that reshape returns a new array rather than modifying the array in place, so the result must be assigned back. order='F' fills the matrix column by column (Fortran order), which matches your single-column layout.
As for your example: you mentioned the data is in a text file, so read it with genfromtxt and then reshape:
import numpy as np
data = np.genfromtxt("sample-data.txt", dtype=int)
data.reshape((4, 7), order='F')
output will be
Out[27]:
array([[ 5, 0, 18, 2, 5, 1, 158],
[ 3, 1, 23, 10, 6, 1, 132],
[ 5, 0, 11, 1, 1, 1, 150],
[ 0, 0, 1, 0, 0, 0, 17]])
I do not know the exact structure of your data, but assuming it is one giant column as in the sample above, importing it with open gives the following:
data = open("sample-data.txt",'r').readlines()
data
Out[64]:
['5\n',
'3\n',
'5\n',
'0\n',
'0\n',
'1\n',
'0\n',
'0\n',
'18\n',
'23\n',
'11\n',
'1\n',
'2\n',
'10\n',
'1\n',
'0\n',
'5\n',
'6\n',
'1\n',
'0\n',
'1\n',
'1\n',
'1\n',
'0\n',
'158\n',
'132\n',
'150\n',
'17']
This results in a list of strings, since each line ends with the newline character \n. Assuming the data is numerical, use the genfromtxt code above to get actual numbers.
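If numpy isn't available, the same column-major split can be done in plain Python. A minimal sketch, with the sample values inlined instead of read from the file:

```python
# the 28 values from the question's single-column file, inlined for illustration
values = [5, 3, 5, 0, 0, 1, 0, 0, 18, 23, 11, 1, 2, 10,
          1, 0, 5, 6, 1, 0, 1, 1, 1, 0, 158, 132, 150, 17]

rows, cols = 4, 7
# column-major ("Fortran") order: element (r, c) is values[c * rows + r]
matrix = [[values[c * rows + r] for c in range(cols)] for r in range(rows)]
# matrix[0] -> [5, 0, 18, 2, 5, 1, 158]
```

This reproduces numpy's order='F' behaviour with a nested list comprehension.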

Related

How to shuffle values in an array and store shuffled values into different arrays with AVX

Thanks in advance for the help. I need to perform the following shuffle pattern on an array of uint16_t data. My unprocessed array will look like the following:
0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
I have transformed my unprocessed data into the format below with _mm512_permutexvar_epi16
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
and then store the contents of the AVX register into 4 different arrays; this is the part I'm not sure how best to do:
next eight values of arrayofZero's 0 0 0 0 0 0 0 0
next eight values of arrayofOne's 1 1 1 1 1 1 1 1
next eight values of arrayofTwo's 2 2 2 2 2 2 2 2
next eight values of arrayofThree's 3 3 3 3 3 3 3 3
I need to loop through my unprocessed data and populate the arrayofZero's with all the 0 values and so on and so forth with my 1, 2, and 3 values.
NOTE: my actual data is not hardcoded 0, 1, 2, 3. It is calculated data, and I need to put the
1st value in the 1st array,
2nd value in the 2nd array,
3rd value in the 3rd processed data array,
and 4th value in the 4th processed data array
That pattern repeats for the entire unprocessed data array, such that after all processing is done:
1st Array holds all the 0 values
2nd Array holds all the 1 values
3rd array holds all the 2 values
4th array holds all the 3 values
I have been looking at _mm512_permutexvar_epi16 to get my unprocessed data into the format.
Below is the code that I have started.
#include <immintrin.h>
#include <array>
#include <cstdint>

int main()
{
    alignas(64) std::array<uint16_t, 128> unprocessedData;
    alignas(64) std::array<uint16_t, 32> processedData0, processedData1, processedData2, processedData3;
    alignas(64) constexpr std::array<uint16_t, 32> shuffleMask {
        0, 4,  8, 12, 16, 20, 24, 28,
        1, 5,  9, 13, 17, 21, 25, 29,
        2, 6, 10, 14, 18, 22, 26, 30,
        3, 7, 11, 15, 19, 23, 27, 31,
    };
    // prepare sample data
    for (uint16_t i {0}; i < unprocessedData.size(); i += 4)
    {
        unprocessedData[i]     = 0;
        unprocessedData[i + 1] = 1;
        unprocessedData[i + 2] = 2;
        unprocessedData[i + 3] = 3;
    }
    for (size_t i {0}; i < unprocessedData.size(); i += 32)
    {
        auto v { _mm512_loadu_epi16(&unprocessedData[i]) };
        auto permuted { _mm512_permutexvar_epi16(
            _mm512_load_si512((const __m512i*)shuffleMask.data()), v) };
        // One way to finish (AVX-512F): each 128-bit lane of the permuted
        // vector holds eight uint16_t values, so extract the lanes and store
        // them into the matching output arrays.
        const size_t o { i / 4 };   // 8 outputs per array per 32-element chunk
        _mm_storeu_si128((__m128i*)&processedData0[o], _mm512_extracti32x4_epi32(permuted, 0));
        _mm_storeu_si128((__m128i*)&processedData1[o], _mm512_extracti32x4_epi32(permuted, 1));
        _mm_storeu_si128((__m128i*)&processedData2[o], _mm512_extracti32x4_epi32(permuted, 2));
        _mm_storeu_si128((__m128i*)&processedData3[o], _mm512_extracti32x4_epi32(permuted, 3));
    }
    return 0;
}

Test for new id combinations in R

I am looking to create an indicator that checks whether a group takes new combinations of numbers or not. I have a dataset like this one:
combinations <- data.frame(combination_id = c(1, 1, 1, 1,
2, 2, 2,
3,
4,
5, 5, 5, 5,
6, 6, 6),
number = c(20, 10, 12, 18,
20, 10, 12,
20,
40,
20, 10, 30, 18,
18, 30, 10))
What I want is the following:
dataset_2 <- data.frame(combination_id = c(1, 1, 1, 1,
2, 2, 2,
3,
4,
5, 5, 5, 5,
6, 6, 6),
number = c(20, 10, 12, 18,
20, 10, 12,
20,
40,
20, 10, 30, 18,
18, 30, 10),
new_combination = c(1, 1, 1, 1,
0,0,0,
0,
1,
1,1, 1, 1,
0, 0, 0))
Basically, I want an indicator new_combination that is 1 if any of the possible combinations in that combination_id is new (i.e. not present under lower values of combination_id), or if the group is a single number that has not been seen before; it is 0 if the group is a single number that has been seen before (like 20 in group 3), or if all of its combinations have been seen before (as in groups 2 and 6).
So the first group takes the value 1 because none of those numbers or combinations have appeared before; group 2 takes 0 because all of its possible combinations are also in group 1; group 3 is a single number that has been seen before, so it takes 0. Group 4 has a new number (40), so it takes 1. Group 5 has new combinations involving the number 30, so it takes 1, and group 6 has no new combinations, so it takes 0.
I hope this made it clear what I am looking for.
Any ideas? Thank you so much.
library(data.table)
setDT(combinations)
combinations[, new_combinations := ifelse(
combination_id %in% combinations[rowid(number) == 1, combination_id], 1, 0)]
# combination_id number new_combinations
# 1: 1 20 1
# 2: 1 10 1
# 3: 1 12 1
# 4: 1 18 1
# 5: 2 20 0
# 6: 2 10 0
# 7: 2 12 0
# 8: 3 20 0
# 9: 4 40 1
#10: 5 20 1
#11: 5 10 1
#12: 5 30 1
#13: 5 18 1
#14: 6 18 0
#15: 6 30 0
#16: 6 10 0
dplyr approach:
require(dplyr)
combinations %>% dplyr::mutate(new_combination = !duplicated(number)) %>%
group_by(combination_id) %>%
dplyr::mutate(new_combination = as.numeric(any(new_combination))) %>%
ungroup()
combination_id number new_combination
<dbl> <dbl> <dbl>
1 1 20 1
2 1 10 1
3 1 12 1
4 1 18 1
5 2 20 0
6 2 10 0
7 2 12 0
8 3 20 0
9 4 40 1
10 5 20 1
11 5 10 1
12 5 30 1
13 5 18 1
14 6 18 0
15 6 30 0
16 6 10 0
A base R option with ave + duplicated
transform(
combinations,
new_combination = ave(+!duplicated(number), combination_id, FUN = max)
)
gives
combination_id number new_combination
1 1 20 1
2 1 10 1
3 1 12 1
4 1 18 1
5 2 20 0
6 2 10 0
7 2 12 0
8 3 20 0
9 4 40 1
10 5 20 1
11 5 10 1
12 5 30 1
13 5 18 1
14 6 18 0
15 6 30 0
16 6 10 0

pyspark spark RDD to 3D matrix

I am using pyspark to perform matrix operations, and I have been trying to convert a Spark dataframe to a 3D matrix.
My dataframe (x_input) looks like this:
Prod | Location | Day | Sales
---------------------------------
0 0 0 65.4
1 0 0 96.1
2 0 0 98.1
3 0 0 9.5
0 0 1 16.1
1 0 1 59.4
2 0 1 17
3 0 1 92.8
0 1 0 52.3
1 1 0 41.1
2 1 0 14.6
3 1 0 92.2
0 1 1 27.7
1 1 1 69
2 1 1 50.7
3 1 1 53.4
0 2 0 54
1 2 0 50.4
2 2 0 54.6
3 2 0 1.3
0 2 1 2.7
1 2 1 60.6
2 2 1 77.7
3 2 1 21.7
Prod, Location and Day columns are the indices of matrix where I want to put the Sales value and the output should be like:
[
[
[65.4, 96.1, 98.1, 9.5],
[16.1, 59.4, 17, 92.8]
],
[
[52.3, 41.1, 14.6, 92.2],
[27.7, 69, 50.7, 53.4]
],
[
[54, 50.4, 54.6, 1.3],
[2.7, 60.6, 77.7, 21.7]
]
]
I tried to follow this post and came up with this solution:
x_rdd = x_input.rdd
rows = (x_rdd
.zipWithIndex()
.groupBy(lambda (x,i): i/4)
.mapValues(lambda vals: [x.Sales for (x,i) in sorted(vals, key=lambda (x, i): i)]))
f = rows.map(lambda x: x[1])
After doing so, f is a 2D matrix; how can I convert it to the 3D matrix shown above?
A for-loop approach is not feasible, as the 3D matrix will have large dimensions.
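One possibility, assuming the data fits in driver memory, is to collect the Sales column sorted by the three index columns (e.g. x_input.orderBy("Location", "Day", "Prod")) and reshape it with numpy. The reshape step itself, sketched here with the question's sample values inlined:

```python
import numpy as np

# Sales values in (Location, Day, Prod) order, as they would come back
# from a sorted collect; inlined here from the question's sample data
sales = np.array([65.4, 96.1, 98.1, 9.5, 16.1, 59.4, 17, 92.8,
                  52.3, 41.1, 14.6, 92.2, 27.7, 69, 50.7, 53.4,
                  54, 50.4, 54.6, 1.3, 2.7, 60.6, 77.7, 21.7])

# shape = (n_locations, n_days, n_products)
cube = sales.reshape(3, 2, 4)
```

Because Prod varies fastest and Location slowest in the sorted order, a plain C-order reshape lands every Sales value at index [location, day, prod]. For data too large to collect, a distributed structure (e.g. building the index from the three columns per row) would be needed instead.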

how to write a matrix to a file in python with this format?

I need to write a matrix to a file in the format (i, j, a[i,j]), row by row, but I don't know how to get it. I tried np.savetxt(f, A, fmt='%1d', newline='\n'), but it writes only the matrix values and doesn't write i and j!
import numpy as np
a = np.arange(12).reshape(4,3)
a_with_index = np.array([idx+(val,) for idx, val in np.ndenumerate(a)])
np.savetxt('/tmp/out', a_with_index, fmt='%d')
writes to /tmp/out the contents
0 0 0
0 1 1
0 2 2
1 0 3
1 1 4
1 2 5
2 0 6
2 1 7
2 2 8
3 0 9
3 1 10
3 2 11
If your array's datatype is not an integer type, you'll probably have to write your own function to save it along with its indices, since those are integers. For example,
import numpy as np

def savetxt_with_indices(filename, arr, fmt):
    nrows, ncols = arr.shape
    indexes = np.empty((nrows * ncols, 2), dtype=int)
    indexes[:, 0] = np.repeat(np.arange(nrows), ncols)
    indexes[:, 1] = np.tile(np.arange(ncols), nrows)
    fmt = '%4d %4d ' + fmt
    flat_arr = arr.flatten()
    with open(filename, 'w') as fo:
        for i in range(nrows * ncols):
            print(fmt % (indexes[i, 0], indexes[i, 1], flat_arr[i]), file=fo)

A = np.arange(12.).reshape((4, 3))
savetxt_with_indices('test.txt', A, '%6.2f')
0 0 0.00
0 1 1.00
0 2 2.00
1 0 3.00
1 1 4.00
1 2 5.00
2 0 6.00
2 1 7.00
2 2 8.00
3 0 9.00
3 1 10.00
3 2 11.00
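The explicit loop can also be avoided entirely. A shorter vectorized sketch using np.indices, producing the same (i, j, value) layout:

```python
import numpy as np

A = np.arange(12.).reshape(4, 3)
# np.indices yields the row index and column index of every element
i, j = np.indices(A.shape)
# stack indices and values into one (nrows*ncols, 3) array
rows = np.column_stack([i.ravel(), j.ravel(), A.ravel()])
np.savetxt('test.txt', rows, fmt='%4d %4d %6.2f')
```

Since savetxt accepts a per-column format string, the integer indices and float values can be formatted differently in a single call.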

pandas pivot table using index data of dataframe

I want to create a pivot table from a pandas dataframe using dataframe.pivot() and include not only the dataframe's columns but also the data in its index.
I couldn't find any docs that show how to do that.
Any tips?
Use reset_index to make the index a column:
In [45]: df = pd.DataFrame({'y': [0, 1, 2, 3, 4, 4], 'x': [1, 2, 2, 3, 1, 3]}, index=np.arange(6)*10)
In [46]: df
Out[46]:
x y
0 1 0
10 2 1
20 2 2
30 3 3
40 1 4
50 3 4
In [47]: df.reset_index()
Out[47]:
index x y
0 0 1 0
1 10 2 1
2 20 2 2
3 30 3 3
4 40 1 4
5 50 3 4
So pivot uses the index as values:
In [48]: df.reset_index().pivot(index='y', columns='x')
Out[48]:
index
x 1 2 3
y
0 0 NaN NaN
1 NaN 10 NaN
2 NaN 20 NaN
3 NaN NaN 30
4 40 NaN 50
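If you don't want the extra column level that pivot adds when values is left unspecified, you can name the values column explicitly ('index' is the column reset_index creates from an unnamed index):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'y': [0, 1, 2, 3, 4, 4], 'x': [1, 2, 2, 3, 1, 3]},
                  index=np.arange(6) * 10)
# reset_index turns the index into a column named 'index';
# passing values='index' pivots just that column
table = df.reset_index().pivot(index='y', columns='x', values='index')
```

The result is a plain y-by-x table of the original index values, with NaN where no (y, x) pair occurs.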