pyspark spark RDD to 3D matrix - python-2.7

I am using pyspark to perform matrix operations, and I have been trying to convert a Spark DataFrame to a 3D matrix.
My DataFrame (x_input) looks like this:
Prod | Location | Day | Sales
---------------------------------
0 0 0 65.4
1 0 0 96.1
2 0 0 98.1
3 0 0 9.5
0 0 1 16.1
1 0 1 59.4
2 0 1 17
3 0 1 92.8
0 1 0 52.3
1 1 0 41.1
2 1 0 14.6
3 1 0 92.2
0 1 1 27.7
1 1 1 69
2 1 1 50.7
3 1 1 53.4
0 2 0 54
1 2 0 50.4
2 2 0 54.6
3 2 0 1.3
0 2 1 2.7
1 2 1 60.6
2 2 1 77.7
3 2 1 21.7
The Prod, Location and Day columns are the indices of the matrix position where I want to put the Sales value, and the output should look like this:
[
[
[65.4, 96.1, 98.1, 9.5],
[16.1, 59.4, 17, 92.8]
],
[
[52.3, 41.1, 14.6, 92.2],
[27.7, 69, 50.7, 53.4]
],
[
[54, 50.4, 54.6, 1.3],
[2.7, 60.6, 77.7, 21.7]
]
]
I tried to follow this post and came up with this solution:
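x_rdd = x_input.rdd
# Python 2 only: the lambdas below rely on tuple-parameter unpacking.
rows = (x_rdd
        .zipWithIndex()
        # Integer division by 4 puts every 4 consecutive rows (one per Prod) in a group.
        .groupBy(lambda (x, i): i / 4)
        # Restore the original row order inside each group and keep only Sales.
        .mapValues(lambda vals: [x.Sales for (x, i) in sorted(vals, key=lambda (x, i): i)]))
f = rows.map(lambda x: x[1])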
After doing so, f is a 2D matrix. How can I convert it to the 3D matrix shown above?
A for-loop approach is not feasible, as the 3D matrix will have large dimensions.
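One way to read the target layout is result[Location][Day][Prod] = Sales. The sketch below only illustrates that index mapping in plain Python on the first two locations of the table; it is not a Spark solution, and the helper name to_3d is made up for this example. The same idea could in principle be expressed on an RDD by keying rows on Location, then Day, and ordering by Prod, but that is an assumption and is not tested here.

```python
from collections import defaultdict

def to_3d(rows):
    """Nest (prod, location, day, sales) tuples as result[location][day][prod]."""
    # Group sales by location, then by day, keeping the product index for ordering.
    grouped = defaultdict(lambda: defaultdict(list))
    for prod, location, day, sales in rows:
        grouped[location][day].append((prod, sales))
    # Emit locations and days in ascending order, products sorted within each day.
    return [
        [[sales for _, sales in sorted(grouped[loc][day])]
         for day in sorted(grouped[loc])]
        for loc in sorted(grouped)
    ]

# First two locations from the table in the question.
rows = [
    (0, 0, 0, 65.4), (1, 0, 0, 96.1), (2, 0, 0, 98.1), (3, 0, 0, 9.5),
    (0, 0, 1, 16.1), (1, 0, 1, 59.4), (2, 0, 1, 17.0), (3, 0, 1, 92.8),
    (0, 1, 0, 52.3), (1, 1, 0, 41.1), (2, 1, 0, 14.6), (3, 1, 0, 92.2),
    (0, 1, 1, 27.7), (1, 1, 1, 69.0), (2, 1, 1, 50.7), (3, 1, 1, 53.4),
]
matrix = to_3d(rows)
print(matrix[0][1])  # day 1 of location 0
```

Sorting by the explicit index columns, rather than relying on row position as zipWithIndex does, keeps the result correct even if the rows arrive in a different order.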

Related

J how to make a shape of random numbers

I'm trying to make a shape of random numbers (0 or 1 in this case), as I'm trying to create a minesweeper field.
I've tried using the "?" symbol for random to get it, but it normally turns into a non-random, repeated pattern, which is unsatisfactory for my purposes:
5 5 $ ? 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
1 0 1 0 1
0 1 0 1 0
Because of this, I tried other ways like pulling numbers from an index (this is called roll). But this returns random decimals. Other small changes to the code also resulted in these random decimals.

I've done this a few times myself. The key thing is when you apply the ?. You get the result that you want if you apply it after the matrix has been created.
We know that ?2 returns a randomly generated 0 or 1.
? 2
0
? 2
1
? 2
0
So if we create a 5 x 5 matrix of 2s
5 5 $ 2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
2 2 2 2 2
and then apply ? to each 2 in the matrix, we get a random 1 or 0 for each position.
? 5 5 $ 2 NB. first 5 X 5 matrix of random 1's and 0's
0 0 0 1 1
1 1 1 0 1
0 0 0 0 1
1 1 1 1 0
1 1 1 0 0
? 5 5 $ 2 NB. different 5 X 5 matrix of random 1's and 0's
0 0 0 1 1
1 0 1 1 0
0 0 0 1 1
1 0 0 1 0
1 1 1 0 0
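The same principle carries over to other languages: build the shape first, then draw a random value for every cell. A minimal Python analogue, where random_field is a hypothetical name chosen for this sketch:

```python
import random

def random_field(rows, cols):
    """Return a rows x cols grid where every cell is rolled independently."""
    # randrange(2) yields 0 or 1; calling it once per cell avoids tiling a
    # short pre-generated list across the whole grid.
    return [[random.randrange(2) for _ in range(cols)] for _ in range(rows)]

field = random_field(5, 5)
for row in field:
    print(" ".join(str(cell) for cell in row))
```

Each run prints a different grid, because the random draw happens per cell rather than once before the grid is shaped.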

OneHotEncoding and lists in a pandas DataFrame

I have a pandas dataframe:
import pandas as pd
d={'col1':[[1,2,3],[4,5,6]],'col2':[[7,8,9],[10,11,12]]}
df=pd.DataFrame(d)
which results in:
        col1          col2
0  [1, 2, 3]     [7, 8, 9]
1  [4, 5, 6]  [10, 11, 12]
However, I want to implement a OneHotEncoder. It will treat each list within the cells of the DataFrame as a single string, whereas I want it to treat each value independently.
How would I implement this? My actual dataFrame contains lists of 500 items, and has 4000 unique values.

I think you can use stack to create a Series, then cast the lists to strings with astype, remove the [] with str.strip, and finally call str.get_dummies:
df = df.stack().astype(str).str.strip('[]').str.get_dummies(sep=', ')
print (df)
        1  10  11  12  2  3  4  5  6  7  8  9
0 col1  1   0   0   0  1  1  0  0  0  0  0  0
  col2  0   0   0   0  0  0  0  0  0  1  1  1
1 col1  0   0   0   0  0  0  1  1  1  0  0  0
  col2  0   1   1   1  0  0  0  0  0  0  0  0
For one column only (applied to the original df):
df = df['col1'].astype(str).str.strip('[]').str.get_dummies(sep=', ')
print (df)
   1  2  3  4  5  6
0  1  1  1  0  0  0
1  0  0  0  1  1  1
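To make explicit what this encoding computes, here is a dependency-free sketch of the same idea: one 0/1 column per distinct value, one row per list. The function name multi_hot is an assumption for illustration and is not part of pandas. Note that it orders columns numerically, whereas get_dummies on strings orders them as text (so "10" sorts before "2" in the output above).

```python
def multi_hot(lists):
    """Encode each list as a 0/1 row over the sorted set of all values seen."""
    # Collect every distinct value across all lists to form the columns.
    columns = sorted({value for items in lists for value in items})
    rows = []
    for items in lists:
        present = set(items)
        rows.append([1 if col in present else 0 for col in columns])
    return columns, rows

columns, rows = multi_hot([[1, 2, 3], [4, 5, 6]])
print(columns)
for row in rows:
    print(row)
```

With 4000 unique values and lists of 500 items, a dense table like this becomes large, so a sparse representation may be worth considering; that is a design consideration rather than something the answer above addresses.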

Distribution of M objects in N container

Given an array of size N whose elements denote the capacities of containers, in how many ways can M similar objects be distributed so that each container is filled at the end?
For example, for arr = {2,1,2,1}, N = 4 and M = 10, there turn out to be 35 ways.
Please help me out with this question.

First calculate the sum of the container sizes. In your case 2+1+2+1 = 6; let this be P. Find the number of ways of choosing P objects from M. There are M choices for the first object, M-1 for the second, M-2 for the third, etc. This gives us M * (M-1) * ... * (M-P+1), or M! / (M-P)!. This will give us more states than you want, for example:
1 2 | 3 | 4 5 | 6
2 1 | 3 | 4 5 | 6
There are q! ways of arranging q objects in q slots, so we need to divide by factorial(arr[0]), factorial(arr[1]), etc. In this case divide by 2! * 1! * 2! * 1! = 4.
I'm getting a very much larger number than 35: 10! / 4! = 151200, and dividing that by 4 gives 37800, so I'm not sure I have understood your question correctly.
Ah, so looking at the problem again, you need to find N integers n1, n2, ..., nN so that n1+n2+...+nN = M and n1 >= arr[0], n2 >= arr[1], etc.
That looks quite simple. Let P be as above. Take the first P objects and give each container its minimum number, arr[0], arr[1], etc. You will have M-P objects left; let this be R.
Essentially the problem simplifies to finding N numbers >= 0 which sum to R. This is a classic problem. As it's a challenge I won't give you the answer, but if we break the N=4, R=4 case down you may see the pattern:
4 0 0 0 - 1 case starting with 4
3 1 0 0 - 3 cases starting with 3
3 0 1 0
3 0 0 1
2 2 0 0 - 6 cases
2 1 1 0
2 1 0 1
2 0 2 0
2 0 1 1
2 0 0 2
1 3 0 0 - 10 cases
1 2 1 0
1 2 0 1
1 1 2 0
1 1 1 1
1 1 0 2
1 0 3 0
1 0 2 1
1 0 1 2
1 0 0 3
0 4 0 0 - 15 cases
0 3 1 0
0 3 0 1
0 2 2 0
0 2 1 1
0 2 0 2
0 1 3 0
0 1 2 1
0 1 1 2
0 1 0 3
0 0 4 0
0 0 3 1
0 0 2 2
0 0 1 3
0 0 0 4
You should recognise the numbers 1, 3, 6, 10, 15.
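For readers who want to check the pattern numerically, the sketch below counts the non-negative solutions both with the standard "stars and bars" binomial formula and by brute-force enumeration. It assumes the answer's reading of the question (container i must receive at least arr[i] objects); the function names are hypothetical.

```python
from itertools import product
from math import factorial

def count_distributions(capacities, m):
    """Count ways to give container i at least capacities[i] of m identical objects."""
    n = len(capacities)
    r = m - sum(capacities)  # objects left after every container gets its minimum
    if r < 0:
        return 0
    # Stars and bars: non-negative solutions of x1 + ... + xn = r.
    return factorial(r + n - 1) // (factorial(r) * factorial(n - 1))

def brute_force(n, r):
    """Enumerate all n-tuples of values in 0..r and keep those summing to r."""
    return [t for t in product(range(r + 1), repeat=n) if sum(t) == r]

print(count_distributions([2, 1, 2, 1], 10))
solutions = brute_force(4, 4)
print(len(solutions))
# Number of solutions grouped by their first component, from 4 down to 0.
print([sum(1 for t in solutions if t[0] == k) for k in range(4, -1, -1)])
```

Both counts come out to 35 for the example in the question, and the grouped counts reproduce 1, 3, 6, 10, 15.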

How to rearrange vector to be cols not rows?

I am solving systems of equations using Armadillo. I build a matrix from an array of doubles, specifying the number of rows and columns. The problem is that it doesn't read the array in the order I filled it (the data starts as a vector and is then converted to an array), so I need to rearrange the vector.
To be clear, it takes a vector with these values:
2 0 0 0 2 1 1 1 0 1 1 0 3 0 0 1 1 1 1 0 0 1 0 1 2
And it makes this matrix:
2 1 1 1 0
0 1 0 1 1
0 1 3 1 0
0 0 0 1 1
2 1 0 0 2
But I want this matrix:
2 0 0 0 2
1 1 1 0 1
1 0 3 0 0
1 1 1 1 0
0 1 0 1 2
How do I manipulate my vector to make it like this?

It looks like you want the transpose of the matrix. There is relevant documentation here.
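The matrix in the question is what you get when the 25 values are filled column by column (Armadillo stores matrices in column-major order), and the wanted matrix is its transpose. The Python sketch below only illustrates that relationship with plain lists; the helper names are made up, and in Armadillo itself the transpose would be taken with the library's own transpose operation (such as the .t() member function) rather than by hand.

```python
def from_column_major(values, n_rows, n_cols):
    """Fill a matrix column by column, the way the question's output was built."""
    return [[values[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]

def transpose(matrix):
    """Swap rows and columns."""
    return [list(col) for col in zip(*matrix)]

values = [2, 0, 0, 0, 2, 1, 1, 1, 0, 1, 1, 0, 3, 0, 0,
          1, 1, 1, 1, 0, 0, 1, 0, 1, 2]
got = from_column_major(values, 5, 5)   # the matrix the question observes
wanted = transpose(got)                 # the matrix the question asks for
for row in wanted:
    print(row)
```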

J (Tacit) Sieve Of Eratosthenes

I'm looking for a J code to do the following.
Suppose I have a list of random integers (sorted),
2 3 4 5 7 21 45 49 61
I want to start with the first element and remove any multiples of it from the list, then move on to the next element and cancel out its multiples, and so on.
Thus the output I'm looking for is 2 3 5 7 61. Basically a Sieve of Eratosthenes. I would appreciate it if someone could explain the code as well, since I'm learning J and find most code difficult to follow :(
Regards,
babsdoc

It's not exactly what you asked for, but here is a more idiomatic (and much faster) version of the sieve.
Basically, what you need is to check which number is a multiple of which. You can get this from the table of modulos: |/~
l =: 2 3 4 5 7 21 45 49 61
|/~ l
0 1 0 1 1  1  1  1  1
2 0 1 2 1  0  0  1  1
2 3 0 1 3  1  1  1  1
2 3 4 0 2  1  0  4  1
2 3 4 5 0  0  3  0  5
2 3 4 5 7  0  3  7 19
2 3 4 5 7 21  0  4 16
2 3 4 5 7 21 45  0 12
2 3 4 5 7 21 45 49  0
Every pair of multiples gives a 0 in the table. Now, we are not interested in the 0s that correspond to self-modulos (2 mod 2, 3 mod 3, etc.; the 0s on the diagonal), so we have to remove them. One way to do this is to put 1s in their place, like so:
=/~ l
1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1
(=/~l) + (|/~l)
1 1 0 1 1  1  1  1  1
2 1 1 2 1  0  0  1  1
2 3 1 1 3  1  1  1  1
2 3 4 1 2  1  0  4  1
2 3 4 5 1  0  3  0  5
2 3 4 5 7  1  3  7 19
2 3 4 5 7 21  1  4 16
2 3 4 5 7 21 45  1 12
2 3 4 5 7 21 45 49  1
This can be also written as (=/~ + |/~) l.
From this table we get the final list of numbers: every number whose column contains a 0 is excluded.
We build this list of exclusions simply by multiplying down each column. If a column contains a 0, its product is 0; otherwise it's a positive number:
*/ (=/~ + |/~) l
256 2187 0 6250 14406 0 0 0 18240
Before doing the last step, we'll have to improve this a little. There is no reason to perform long multiplications, since we are only interested in 0s and non-0s. So, when building the table, we'll keep only 0s and 1s by taking the "sign" of each number (this is signum, *):
* (=/~ + |/~) l
1 1 0 1 1 1 1 1 1
1 1 1 1 1 0 0 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 1 1
1 1 1 1 1 0 1 0 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
so,
*/ * (=/~ + |/~) l
1 1 0 1 1 0 0 0 1
Using the list of exclusions, you just copy (#) the surviving numbers to your final list:
l #~ */ * (=/~ + |/~) l
2 3 5 7 61
or,
(]#~[:*/[:*=/~+|/~) l
2 3 5 7 61
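For readers more comfortable outside J, the table-based idea can be sketched in Python: a number is kept only if its column of remainders has no zero off the diagonal. This is an illustrative translation, not the answer's code, and sieve_by_table is a made-up name.

```python
def sieve_by_table(numbers):
    """Keep the numbers that are not a multiple of any other number in the list."""
    kept = []
    for j, candidate in enumerate(numbers):
        # A zero remainder off the diagonal means `candidate` is a multiple
        # of some other element, so it is excluded.
        has_divisor = any(
            candidate % divisor == 0
            for i, divisor in enumerate(numbers)
            if i != j
        )
        if not has_divisor:
            kept.append(candidate)
    return kept

print(sieve_by_table([2, 3, 4, 5, 7, 21, 45, 49, 61]))
```

Skipping the diagonal (i != j) plays the same role as adding the identity table =/~ in the J version.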

Tacit iteration is usually done with the Power conjunction (^:). When the test for completion needs to be something other than reaching a fixpoint, the Do While construction works well.
In this solution filterMultiplesOfHead is applied repeatedly until no numbers remain that have been neither applied nor filtered out. Numbers already applied are accumulated in a partial answer. When the list to be processed is empty, the partial answer is the result, after stripping off the boxing used to separate processed from unprocessed data.
filterMultiplesOfHead=: {. (((~: >.)@%~) # ]) }.
appendHead=: (>@[ , {.@>@])/
pass=: appendHead ; filterMultiplesOfHead@>@{:
prep=: a: , <
unfinished=: [: -. a: -: {:
sieve=: [: ; [: pass^:unfinished^:_ prep
sieve 2 3 4 5 7 21 45 49 61
2 3 5 7 61
prep 2 3 4 7 9 10
┌┬────────────┐
││2 3 4 7 9 10│
└┴────────────┘
appendHead prep 2 3 4 7 9 10
2
filterMultiplesOfHead 2 3 4 7 9 10
3 7 9
pass^:2 prep 2 3 4 7 9 10
┌───┬─┐
│2 3│7│
└───┴─┘
sieve 1-.~/:~~.>:?.$~100
2 3 7 11 29 31 41 53 67 73 83 95 97
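The iterative scheme described above (move the head to the partial answer, filter its multiples from the rest, repeat until nothing is left) can be sketched in Python as follows. This is a loose translation for illustration; it uses a plain while loop and two lists instead of J's Power conjunction and boxing.

```python
def filter_multiples_of_head(numbers):
    """Drop the head and every remaining multiple of it."""
    head, tail = numbers[0], numbers[1:]
    return [n for n in tail if n % head != 0]

def sieve(numbers):
    """Repeatedly move the head to the result and filter its multiples."""
    done, todo = [], list(numbers)
    while todo:  # plays the role of the `unfinished` test
        done.append(todo[0])
        todo = filter_multiples_of_head(todo)
    return done

print(sieve([2, 3, 4, 5, 7, 21, 45, 49, 61]))
print(filter_multiples_of_head([2, 3, 4, 7, 9, 10]))
```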
