Related
What I want to do is something like this:
import torch
a = torch.arange(120).reshape(2, 3, 4, 5)
b = torch.cat(list(a), dim=2)
I want to know:
I have to convert tensor to a list, will this cause performance not good?
Even performance is OK, can I do this just with tensor?
You want to:
Reduce the number of copies: in this specific scenario, copies need to be made since we are rearranging the layout of our underlying data.
Reduce or remove any torch.Tensor -> non-torch.Tensor conversions: this will be a pain point when working with a GPU since you're transferring data in and out of the device.
You can perform the same operation by permuting the axes such that axis=0 goes to axis=-2 (the before the last axis), then flattening the last two axes:
>>> a.permute(1,2,0,3).flatten(-2)
tensor([[[ 0, 1, 2, 3, 4, 60, 61, 62, 63, 64],
[ 5, 6, 7, 8, 9, 65, 66, 67, 68, 69],
[ 10, 11, 12, 13, 14, 70, 71, 72, 73, 74],
[ 15, 16, 17, 18, 19, 75, 76, 77, 78, 79]],
[[ 20, 21, 22, 23, 24, 80, 81, 82, 83, 84],
[ 25, 26, 27, 28, 29, 85, 86, 87, 88, 89],
[ 30, 31, 32, 33, 34, 90, 91, 92, 93, 94],
[ 35, 36, 37, 38, 39, 95, 96, 97, 98, 99]],
[[ 40, 41, 42, 43, 44, 100, 101, 102, 103, 104],
[ 45, 46, 47, 48, 49, 105, 106, 107, 108, 109],
[ 50, 51, 52, 53, 54, 110, 111, 112, 113, 114],
[ 55, 56, 57, 58, 59, 115, 116, 117, 118, 119]]])
I'm already new to Haskell with Cryptol dialect, also I'm irritated because I can't use loops...
I would like to implement array like this one... Initialization Matrix but I have the only idea to get every 4th element starting by [0] index and load this new list to S0. Similarly starting by 1 index of the list and load to new S1 array every 4th element.
Cryptol's type system is designed so that these sorts of bit splits that you find in crypto algorithms is almost trivial to express. And in fact, lack of loops is a plus, not a detriment, once you get used to that style.
There could be multiple ways to code your "initialization" code. But here's how I would go about it:
load : {a} [128][a] -> [4][32][a]
load(elts) = reverse (transpose cols)
where cols : [32][4][a]
cols = split elts
Note that the type here is more general than what you need, but it allows for easier testing. Here's what I get at the cryptol prompt:
Main> :set base=10
Main> load [127, 126 .. 0]
Showing a specific instance of polymorphic result:
* Using '7' for type argument 'a' of 'Main::load'
[[124, 120, 116, 112, 108, 104, 100, 96, 92, 88, 84, 80, 76, 72,
68, 64, 60, 56, 52, 48, 44, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4,
0],
[125, 121, 117, 113, 109, 105, 101, 97, 93, 89, 85, 81, 77, 73, 69,
65, 61, 57, 53, 49, 45, 41, 37, 33, 29, 25, 21, 17, 13, 9, 5, 1],
[126, 122, 118, 114, 110, 106, 102, 98, 94, 90, 86, 82, 78, 74, 70,
66, 62, 58, 54, 50, 46, 42, 38, 34, 30, 26, 22, 18, 14, 10, 6, 2],
[127, 123, 119, 115, 111, 107, 103, 99, 95, 91, 87, 83, 79, 75, 71,
67, 63, 59, 55, 51, 47, 43, 39, 35, 31, 27, 23, 19, 15, 11, 7, 3]]
This is a little hard to read, so here it is formatted:
[[124, 120, 116, 112, 108, 104, 100, 96, 92, 88, 84, 80, 76, 72, 68, 64, 60, 56, 52, 48, 44, 40, 36, 32, 28, 24, 20, 16, 12, 8, 4, 0],
[125, 121, 117, 113, 109, 105, 101, 97, 93, 89, 85, 81, 77, 73, 69, 65, 61, 57, 53, 49, 45, 41, 37, 33, 29, 25, 21, 17, 13, 9, 5, 1],
[126, 122, 118, 114, 110, 106, 102, 98, 94, 90, 86, 82, 78, 74, 70, 66, 62, 58, 54, 50, 46, 42, 38, 34, 30, 26, 22, 18, 14, 10, 6, 2],
[127, 123, 119, 115, 111, 107, 103, 99, 95, 91, 87, 83, 79, 75, 71, 67, 63, 59, 55, 51, 47, 43, 39, 35, 31, 27, 23, 19, 15, 11, 7, 3]]
which is exactly the structure you wanted. Now we can specialize:
loadBits : [128] -> [4][32]
loadBits(vector) = reverse (transpose cols)
where cols : [32][4]
cols = split vector
Note that the code is exactly the same as before, we just made it specific to the type you wanted.
Hope this gets you started!
I'm going to have trouble verbalizing this, so I'll just include some code and describe what I need to do afterward instead:
import pandas as pd
start = [1, 5, 102, 300]
end = [3, 90, 150, 304]
df1 = pd.DataFrame({'start':start, 'end':end})
df2 = pd.DataFrame([0, 3, 10, 14, 100, 101, 102, 113, 300])
df2.columns=["bp_pos"]
So, for every start-end pair, I need to check if any of my values in df2 fall within that range. If they do, I need to exclude that index from df2.
I have this working. The problem is that my I have 22 df1s, and each one is a few million rows, and my df2 is also a few million rows. This gets really slow with my solution, which looks something like:
for idx, row in df1.iterrows():
df2 = df2.loc[~((row['start'] <= df2['bp_pos']) &
(row['end'] >= df2['bp_pos']))]
I'm hoping to get a faster solution than what I have above. Are there faster solutions you can think of? I'm using Python 2.7.12, and Pandas/NumPy solutions accepted. (Sorry if the code above doesn't actually work- I don't have Python on the PC I'm posting from)
I would get an array of your values that you want to exclude, then use normal pandas indexing:
vals = np.concatenate([np.arange(x,y) for x,y in zip(start,end)])
df2[~df2['bp_pos'].isin(vals)]
bp_pos
0 0
1 3
4 100
5 101
Just as further explanation: vals ends up being an array of all your ranges:
>>> vals
array([ 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28,
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41,
42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 102, 103, 104, 105,
106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118,
119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131,
132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144,
145, 146, 147, 148, 149, 300, 301, 302, 303])
I have been using _list = np.unique(np.stack(np.meshgrid(*_load), -1).reshape(-1, len(_load)), axis=0) to generate a list of all possible combinations, which worked fine on list of list that look like,
[[1, 2, 3], [8, 4, 5, 6, 7], [8, 4, 5, 6, 7], [8, 4, 5, 6, 7], [8, 4, 5, 6, 7], [9, 10, 11, 12, 13, 14], [9, 10, 11, 12, 13, 14], [9, 10, 11, 12, 13, 14], [9, 10, 11, 12, 13, 14], [9, 10, 11, 12, 13, 14]]
Howeveer, if I want to find all possibles on something like
[[3, 4, 69, 134, 39, 42, 46, 15, 99, 20, 120, 123, 93], [130, 5, 7, 139, 14, 143, 33, 48, 50, 51, 52, 53, 55, 58, 60, 62, 67, 84, 85, 87, 91, 105, 106, 107, 111, 121, 127], [130, 5, 7, 139, 14, 143, 33, 48, 50, 51, 52, 53, 55, 58, 60, 62, 67, 84, 85, 87, 91, 105, 106, 107, 111, 121, 127], [1, 132, 133, 135, 138, 11, 12, 142, 16, 147, 24, 25, 27, 28, 29, 30, 31, 35, 36, 40, 47, 54, 57, 63, 66, 68, 70, 71, 72, 140, 76, 81, 83, 88, 90, 92, 144, 98, 100, 103, 109, 110, 112, 114, 118, 122], [1, 132, 133, 135, 138, 11, 12, 142, 16, 147, 24, 25, 27, 28, 29, 30, 31, 35, 36, 40, 47, 54, 57, 63, 66, 68, 70, 71, 72, 140, 76, 81, 83, 88, 90, 92, 144, 98, 100, 103, 109, 110, 112, 114, 118, 122], [128, 129, 2, 131, 6, 8, 9, 10, 13, 141, 17, 18, 19, 21, 22, 23, 26, 32, 34, 37, 38, 41, 43, 44, 45, 49, 137, 56, 59, 61, 64, 65, 73, 74, 75, 77, 78, 79, 80, 82, 86, 89, 94, 95, 96, 97, 101, 102, 145, 104, 108, 146, 113, 115, 116, 117, 119, 136, 124, 125, 126], [128, 129, 2, 131, 6, 8, 9, 10, 13, 141, 17, 18, 19, 21, 22, 23, 26, 32, 34, 37, 38, 41, 43, 44, 45, 49, 137, 56, 59, 61, 64, 65, 73, 74, 75, 77, 78, 79, 80, 82, 86, 89, 94, 95, 96, 97, 101, 102, 145, 104, 108, 146, 113, 115, 116, 117, 119, 136, 124, 125, 126], [128, 129, 2, 131, 6, 8, 9, 10, 13, 141, 17, 18, 19, 21, 22, 23, 26, 32, 34, 37, 38, 41, 43, 44, 45, 49, 137, 56, 59, 61, 64, 65, 73, 74, 75, 77, 78, 79, 80, 82, 86, 89, 94, 95, 96, 97, 101, 102, 145, 104, 108, 146, 113, 115, 116, 117, 119, 136, 124, 125, 126], [128, 129, 2, 131, 6, 8, 9, 10, 13, 141, 17, 18, 19, 21, 22, 23, 26, 32, 34, 37, 38, 41, 43, 44, 45, 49, 137, 56, 59, 61, 64, 65, 73, 74, 75, 77, 78, 79, 80, 82, 86, 89, 94, 95, 96, 97, 101, 102, 145, 104, 108, 146, 113, 115, 116, 117, 119, 136, 124, 125, 126]]
I get a MemoryError in python, obviously I need to change my approach, any ideas? I was thinking I would need to end up writing the intermittent events to file, but I don't know how to get these built ins to do that.
Your algorithm is fine, but your result is unrepresentable.
Note in particular that your np.unique is useless, because each load[i] contains no duplicates, so the size of the result is the product of the lengths of the lists times the number of lists
>>> np.prod([len(i) for i in second_example], dtype=np.int64) * 9
2498897217529908
Assuming optimistically that each integer is a uint8, that's 2.2 PiB (Pebibytes), which far exceeds current RAM configurations.
Even if you don't try to put the whole result in memory at once, even iterating over this is going to take a long time - assuming a generous 4GHz processor and a single clock cycle per result, you're looking at longer than a week to finish
I have a data and want to get the results stored in a python dictionary as below:
#example mydicdata[key] = values in the brackets
#example mydicdata[0] = [""]
#example mydicdata[7] = ['0', '1', '2', '3', '4', '5', '6', '7'... ]
import re
data = "{0=[], 1=[], 2=[], 3=[], 4=[], 5=[], 6=[], 7=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103], 8=[], 9=[], 10=[], 11=[], 12=[], 13=[], 14=[], 15=[], 16=[], 17=[], 18=[], 19=[], 20=[], 21=[], 22=[], 23=[], 24=[], 25=[], 26=[], 27=[], 28=[], 29=[], 30=[], 31=[], 32=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103], 33=[], 34=[], 35=[], 36=[], 37=[], 38=[], 39=[], 40=[], 41=[], 42=[], 43=[], 44=[], 45=[], 46=[], 47=[], 48=[], 49=[], 50=[], 51=[], 52=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103], 53=[], 54=[], 55=[], 56=[], 57=[], 58=[], 59=[], 60=[], 61=[], 62=[], 63=[], 64=[], 65=[], 66=[], 67=[], 68=[], 69=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103], 70=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103], 71=[], 72=[], 73=[], 74=[], 75=[], 76=[], 77=[], 78=[], 79=[], 80=[], 81=[], 82=[], 83=[], 84=[], 85=[], 86=[], 87=[], 88=[88], 89=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103], 90=[], 91=[], 92=[], 93=[], 94=[], 95=[], 96=[], 97=[], 98=[], 99=[], 100=[], 101=[], 102=[], 103=[]}"
data = data[1:-1]
temp = re.split(r'[\d=/[[0-9?,/d?/]]]',data)
I think I'm almost there, but the result I got here is a big chunk of list 0=[]....,103[].
Could you please give me the guideline?
Thank you.
this is a way to do it without regex: your data string is almost python (just need to replace the = by :); then ast.literal_eval can turn the string into a python dictionary:
from ast import literal_eval
data = '''{ 0=[], 1=[], 2=[], 3=[], 4=[], 5=[], 6=[],
7=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 89, 90, 91, 92,
93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103], 8=[], 9=[],
10=[], 11=[], 12=[], 13=[], 14=[], 15=[], 16=[], 17=[], 18=[]
# more data
}'''
dct = literal_eval(data.replace('=', ':'))
print dct[0] # -> []
print dct[7] # -> [0, 1, 2, 3, 4,...,100, 101, 102, 103]