I have a list that is always binary. I want to XOR every two consecutive elements to satisfy a condition in a research paper. For instance, given list = [1, 0, 1, 1], XORing each consecutive pair should give something like this: 1 XOR 0 = 1, 0 XOR 1 = 1, 1 XOR 1 = 0. To do so, is it correct to XOR two lists where the second list is a shifted version of the original, something like numpy.bitwise_xor([1,0,1,1], [0,1,0,1])?
You can load the input list as an array (called a) and use numpy.roll to shift it, giving a second array (called b) that stores the shifted version. Then bitwise_xor can be applied to a and b.
import numpy as np
a = np.array([1,0,1,1])
b = np.roll(a, len(a)-1)
c = np.bitwise_xor(a,b)
print(' A :',a,'\n','B :',b,'\n','C :',c)
Output:
A : [1 0 1 1]
B : [0 1 1 1]
C : [1 1 0 0]
If you're using Python 2.7, make sure to change the print statement!
You can also use slices:
>>> import numpy as np
>>> a = [1, 0, 1, 1]
>>> np.bitwise_xor(a[1:], a[:-1])
array([1, 1, 0], dtype=int32)
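Note the difference between the two answers: the slice version produces exactly the three pairwise results from the question (1^0, 0^1, 1^1), while the np.roll version also XORs the last element with the first because the shift wraps around. A minimal Python 3 sketch of the slice-based approach, assuming the input is a plain list of 0s and 1s:

import numpy as np

a = np.array([1, 0, 1, 1])
# XOR each element with its right neighbour; no wrap-around pair is produced
pairwise = np.bitwise_xor(a[:-1], a[1:])
print(pairwise)  # [1 1 0]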
np.argpartition does not sort the entire array. It only guarantees that the kth element is in its sorted position and that all smaller elements are moved before it. Thus, the first k elements of the result are the k smallest elements.
>>> num = 3
>>> myBigArray=np.array([[1,3,2,5,7,0],[14,15,6,5,7,0],[17,8,9,5,7,0]])
>>> top = np.argpartition(myBigArray, num, axis=1)[:, :num]
>>> print top
[[5 0 2]
[3 5 2]
[5 3 4]]
>>> print myBigArray[np.arange(myBigArray.shape[0])[:, None], top]
[[0 1 2]
[5 0 6]
[0 5 7]]
This returns the k smallest values of each row. Note that these may not be in sorted order. I use this method because getting the top k elements in sorted order this way takes O(n + k log k) time.
I want to get the k smallest values of each column in sorted order, without increasing the time complexity.
Any suggestions?
To use np.argpartition and maintain the sorted order, we need to pass the range of positions, range(k), as the kth parameter instead of feeding in just the scalar k -
idx = np.argpartition(myBigArray, range(num), axis=1)[:, :num]
out = myBigArray[np.arange(idx.shape[0])[:,None], idx]
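For the example array from the question, this gives the num smallest values of each row in ascending order; a small self-contained sketch to check (the expected output below is worked out by hand, so worth re-running):

import numpy as np

myBigArray = np.array([[1, 3, 2, 5, 7, 0],
                       [14, 15, 6, 5, 7, 0],
                       [17, 8, 9, 5, 7, 0]])
num = 3

# passing range(num) makes argpartition place the first num positions in sorted order
idx = np.argpartition(myBigArray, range(num), axis=1)[:, :num]
out = myBigArray[np.arange(idx.shape[0])[:, None], idx]
print(out)
# [[0 1 2]
#  [0 5 6]
#  [0 5 7]]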
You can use the exact same trick that you used in the case of rows; combining with #Divakar's trick for sorting, this becomes
In [42]: num = 2
In [43]: myBigArray[np.argpartition(myBigArray, range(num), axis=0)[:num, :], np.arange(myBigArray.shape[1])[None, :]]
Out[43]:
array([[ 1, 3, 2, 5, 7, 0],
[14, 8, 6, 5, 7, 0]])
A bit of indirect indexing does the trick. Please note that I worked on rows since you started off on rows.
fdim = np.arange(3)[:, None]
so = np.argsort(myBigArray[fdim, top], axis=-1)
tops = top[fdim, so]
myBigArray[fdim, tops]
# array([[0, 1, 2],
#        [0, 5, 6],
#        [0, 5, 7]])
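Put together as a self-contained sketch (using the example array and num = 3 from the question; variable names follow the snippets above):

import numpy as np

myBigArray = np.array([[1, 3, 2, 5, 7, 0],
                       [14, 15, 6, 5, 7, 0],
                       [17, 8, 9, 5, 7, 0]])
num = 3

# unsorted positions of the num smallest values in each row
top = np.argpartition(myBigArray, num, axis=1)[:, :num]

fdim = np.arange(myBigArray.shape[0])[:, None]
# sort only the num selected values in each row, then reorder their indices
so = np.argsort(myBigArray[fdim, top], axis=-1)
tops = top[fdim, so]
print(myBigArray[fdim, tops])
# [[0 1 2]
#  [0 5 6]
#  [0 5 7]]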
A note on argpartition with a range argument: I strongly suspect that it is not O(n + k log k); in any case it is typically several-fold slower than a manual argpartition + argsort, see here.
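A rough way to compare the two approaches on your own sizes (a sketch; exact timings will vary by machine and array shape):

import numpy as np
from timeit import timeit

a = np.random.rand(2000, 2000)
k = 10

def with_range_kth():
    # single call, but argpartition has to order the first k positions itself
    return np.argpartition(a, range(k), axis=1)[:, :k]

def manual_partition_then_sort():
    part = np.argpartition(a, k, axis=1)[:, :k]
    rows = np.arange(a.shape[0])[:, None]
    order = np.argsort(a[rows, part], axis=1)
    return part[rows, order]

print('range kth:', timeit(with_range_kth, number=5))
print('manual   :', timeit(manual_partition_then_sort, number=5))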
I finally found a post that I expected would solve my problem. I have two columns in a DataFrame (height, upper) with values either 1 or 0. There are four possible combinations of these, and I am trying to create a third column containing those four combinations, but I cannot figure out what is going wrong. My code is as follows:
def quad(clasif):
    if (raw['upper']==0 and raw['height']==0):
        return 1
    if (raw['upper']==1 and raw['height']==0):
        return 2
    if (raw['upper']==0 and raw['height']==1):
        return 3
    if (raw['upper']==1 and raw['height']==1):
        return 4

raw['cuatro'] = raw.apply(lambda clasif: quad(clasif), axis=1)
I am getting the following error:
'The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index 0'
Could someone help?
Assuming that upper and height can only be 0 or 1, you can rewrite this as a simple addition:
raw['cuatro'] = 1 + raw['upper'] + 2 * raw['height']
The reason you see this error is that raw['upper'] == 0 is a Boolean Series, which you can't use with and... See the "gotcha" section of the docs.
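A quick check of the arithmetic version on a toy frame (column names as in the question):

import pandas as pd

raw = pd.DataFrame({'upper': [0, 1, 0, 1], 'height': [0, 0, 1, 1]})
# (upper, height): (0,0)->1, (1,0)->2, (0,1)->3, (1,1)->4
raw['cuatro'] = 1 + raw['upper'] + 2 * raw['height']
print(raw)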
I think you're missing the fundamentals of apply: when passed the Series clasif, your function should do something with clasif (at the moment, the function body makes no mention of it).
You have to pass the function to apply.
import pandas as pd
def quad(clasif):
    if (clasif['upper']==0 and clasif['height']==0):
        return 1
    if (clasif['upper']==1 and clasif['height']==0):
        return 2
    if (clasif['upper']==0 and clasif['height']==1):
        return 3
    if (clasif['upper']==1 and clasif['height']==1):
        return 4

raw = pd.DataFrame({'upper': [0, 0, 1, 1], 'height': [0, 1, 0, 1]})
raw['cuatro'] = raw.apply(quad, axis=1)
print raw
   height  upper  cuatro
0       0      0       1
1       1      0       3
2       0      1       2
3       1      1       4
Andy Hayden's answer is better suited for your case.
In a previous thread, a brilliant response was given to the following problem (Pandas: reshaping data).
The goal is to reshape a pandas Series containing lists into a pandas DataFrame in the following way:
In [9]: s = Series([list('ABC'),list('DEF'),list('ABEF')])
In [10]: s
Out[10]:
0 [A, B, C]
1 [D, E, F]
2 [A, B, E, F]
dtype: object
should be shaped into this:
Out[11]:
A B C D E F
0 1 1 1 0 0 0
1 0 0 0 1 1 1
2 1 1 0 0 1 1
That is, a dataframe is created where every element in the lists of the series becomes a column. For every element in the series, a row in the dataframe is created. For every element in the lists, a 1 is assigned to the corresponding dataframe column (and 0 otherwise). I know that the wording may be cumbersome, but hopefully the example above is clear.
The brilliant response by user Jeff (https://stackoverflow.com/users/644898/jeff) was to write this simple yet powerful line of code:
In [11]: s.apply(lambda x: Series(1,index=x)).fillna(0)
That turns Out[10] into Out[11].
That line of code served me extremely well, however I am running into memory issues with a series of roughly 50K elements and about 100K different elements in all lists. My machine has 16G of memory. Before resorting to a bigger machine, I would like to think of a more efficient implementation of the function above.
Does anyone know how to re-implement the above line:
In [11]: s.apply(lambda x: Series(1,index=x)).fillna(0)
to make it more efficient, in terms of memory usage?
You could try breaking your dataframe (here called df) into chunks and writing to a file as you go, something like this:
chunksize = 10000

def make_indicators(chunk):
    # same one-liner as above, applied to just one chunk of rows
    return chunk.apply(lambda x: Series(1, index=x)).fillna(0)

with open('out.csv', 'w') as out:
    out.write(df.iloc[[]].to_csv())  # write the header
    for _, chunk in df.groupby(np.arange(len(df)) // chunksize):
        out.write(make_indicators(chunk).to_csv(header=None))
If memory use is the issue, it seems like a sparse matrix solution would be better. Pandas doesn't really have sparse matrix support, but you could use scipy.sparse like this:
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

data = pd.Series([list('ABC'), list('DEF'), list('ABEF')])

# unique column labels, plus the column index of every element across all lists
cols, ind = np.unique(np.concatenate(data), return_inverse=True)
# CSR row pointer: row i owns entries indptr[i]:indptr[i+1]
indptr = np.cumsum([0] + list(map(len, data)))
vals = np.ones_like(ind)
M = csr_matrix((vals, ind, indptr))
This sparse matrix now contains the same data as the pandas solution, but the zeros are not explicitly stored. We can confirm this by converting the sparse matrix to a dataframe:
>>> pd.DataFrame(M.toarray(), columns=cols)
A B C D E F
0 1 1 1 0 0 0
1 0 0 0 1 1 1
2 1 1 0 0 1 1
Depending on what you're doing with the data from here, having it in a sparse form may help solve your problem without using excessive memory.
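If you'd rather keep working with a pandas object without densifying it, newer pandas versions (0.25+) can wrap the sparse matrix directly; a sketch, assuming that API is available in your version:

df_sparse = pd.DataFrame.sparse.from_spmatrix(M, columns=cols)
print(df_sparse.sparse.density)  # fraction of values that are explicitly stored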
I have a list of integers that looks like this when I print it:
0
1
0
1
1
I want to create a list out of it: [0, 1, 0, 1, 1]
How do I do it?
I cannot find this information anywhere :(
s = """0
1
0
1
1"""
integers = map(int, s.splitlines())  # on Python 3, wrap in list(): list(map(int, s.splitlines()))
idea taken from https://stackoverflow.com/a/27171335/1644901
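If the numbers actually come from a file rather than an in-memory string, the same idea applies; a small sketch assuming a hypothetical file numbers.txt with one value per line:

with open('numbers.txt') as fh:
    integers = [int(line) for line in fh if line.strip()]
print(integers)  # e.g. [0, 1, 0, 1, 1]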
Given a permutation of 1...n, for example 5 3 4 1 2, how can I find all ascending subsequences of length 3 in linear time?
Is it also possible to find ascending subsequences of some other length X?
I have no idea how to solve this in linear time.
Do you need the actual ascending sequences? Or just the number of ascending subsequences?
It isn't possible to generate them all in less than the time it takes to list them. Which, as has been pointed out, is O(N^X / (X-1)!). (There is a possibly unexpected factor of X because it takes time O(X) to list a data structure of size X.) The obvious recursive search for them scales not far from that.
However counting them can be done in time O(X * N^2) if you use dynamic programming. Here is Python for that.
counts = []
answer = 0
for i in range(len(perm)):
    # inner_counts[k] = number of ascending subsequences of length k+1 ending at position i
    inner_counts = [0 for k in range(X)]
    inner_counts[0] = 1
    for j in range(i):
        if perm[j] < perm[i]:
            for k in range(1, X):
                inner_counts[k] += counts[j][k-1]
    counts.append(inner_counts)
    answer += inner_counts[-1]
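The snippet assumes perm and X are already defined; for the example used below, that would be:

perm = [3, 5, 1, 2, 4, 6]
X = 3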
For your example 3 5 1 2 4 6 and X = 3 you will wind up with:
counts = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 3, 1],
    [1, 5, 5],
]
answer = 6
(You only found 5 above, the missing one is 2 4 6.)
It isn't hard to extend this answer to create a data structure that makes it easy to list them directly, to find a random one, etc.
You can't find all ascending subsequences in linear time because there may be far more subsequences than that.
For instance, in a sorted sequence every subset is an increasing subsequence, so a sorted sequence of length N (1, 2, ..., N) has N choose k = N!/((N-k)! k!) increasing subsequences of length k.
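As a quick sanity check of that count, a brute-force sketch for a tiny N (only feasible for small inputs; math.comb needs Python 3.8+):

from itertools import combinations
from math import comb

N, k = 6, 3
seq = list(range(1, N + 1))  # already sorted, so every k-subset is ascending
ascending = [c for c in combinations(seq, k) if all(a < b for a, b in zip(c, c[1:]))]
print(len(ascending), comb(N, k))  # both print 20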