I'm trying to implement the Grad-CAM algorithm:
https://arxiv.org/pdf/1610.02391.pdf
My arguments are:
activations: a tensor with shape torch.Size([1, 512, 14, 14])
alpha values: a tensor with shape torch.Size([512])
I want to multiply each activation map (along dimension index 1, of size 512) by its corresponding alpha value: for example, if the i'th value out of the 512 in the activations is 4 and the i'th alpha value is 5, then my new i'th activation would be 20.
The shape of the output should be torch.Size([1, 512, 14, 14])
Assuming the desired output is of shape (1, 512, 14, 14).
You can achieve this with torch.einsum:
torch.einsum('nchw,c->nchw', x, y)
Or with a plain broadcast multiplication, but you will first need to add a couple of additional dimensions to y:
x*y[None, :, None, None]
Here's an example with x.shape = (1, 4, 2, 2) and y.shape = (4,):
>>> x = torch.arange(16).reshape(1, 4, 2, 2)
>>> x
tensor([[[[ 0,  1],
          [ 2,  3]],

         [[ 4,  5],
          [ 6,  7]],

         [[ 8,  9],
          [10, 11]],

         [[12, 13],
          [14, 15]]]])
>>> y = torch.arange(1, 5)
>>> y
tensor([1, 2, 3, 4])
>>> x*y[None, :, None, None]
tensor([[[[ 0,  1],
          [ 2,  3]],

         [[ 8, 10],
          [12, 14]],

         [[24, 27],
          [30, 33]],

         [[48, 52],
          [56, 60]]]])
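Equivalently, just as a small sketch of the same broadcasting idea, you can make the extra dimensions explicit with view or unsqueeze; both produce the same (1, 512, 14, 14) result:

x * y.view(1, -1, 1, 1)                         # reshape y to (1, 512, 1, 1) so it broadcasts
x * y.unsqueeze(0).unsqueeze(-1).unsqueeze(-1)  # same shape, built one dimension at a time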
Given a list of dictionaries in Python:
my_list = [{'id': 0, 'name': 'cube0_cluster0', 'member_ids': [429, 432, 435]},
           {'id': 1, 'name': 'cube0_cluster1', 'member_ids': [0, 4, 5]},
           {'id': 0, 'name': 'cube1_cluster1', 'member_ids': [4, 706, 800]}]
I want to print all member_ids for cube{ }_cluster1.
My expected output is to print [0, 4, 5, 706, 800].
Any help would be highly appreciated.
I have tried this:
for k in my_list:
    for j in range(len(my_list)):
        if k['name'] == 'cube{}_cluster1'.format(j):
            print(k['member_ids'])
But I am getting two separate outputs, [0, 4, 5] and [4, 706, 800], instead of one combined list.
Try this one.
import re

member_ids = []
for di in my_list:
    # use a raw string so that \d is not treated as a string escape
    if re.match(r'cube\d_cluster1', di['name']):
        member_ids += di['member_ids']
print(member_ids)
You can also use list comprehension.
my_list = [{'id': 0, 'name': 'cube0_cluster0', 'member_ids': [429, 432, 435]},
           {'id': 1, 'name': 'cube0_cluster1', 'member_ids': [0, 4, 5]},
           {'id': 0, 'name': 'cube1_cluster1', 'member_ids': [4, 706, 800]}]
res = [j for i in my_list for j in i['member_ids'] if "cluster1" in i["name"]]
print(res)       # the full list
print(set(res))  # distinct values

# Result
# [0, 4, 5, 4, 706, 800]
# {0, 800, 706, 4, 5}
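Note that the expected output [0, 4, 5, 706, 800] drops the duplicate 4 but keeps the original order, which neither res nor set(res) gives you exactly. A small sketch of an order-preserving dedupe (relying on dict keys preserving insertion order in Python 3.7+):

unique_res = list(dict.fromkeys(res))
print(unique_res)  # [0, 4, 5, 706, 800]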
I hope this helps and counts!
I am trying to vertically concatenate two Dask DataFrames
I have the following Dask DataFrame:
import pandas as pd
import dask.dataframe as dd

d = [
    ['A', 'B', 'C', 'D', 'E', 'F'],
    [1, 4, 8, 1, 3, 5],
    [6, 6, 2, 2, 0, 0],
    [9, 4, 5, 0, 6, 35],
    [0, 1, 7, 10, 9, 4],
    [0, 7, 2, 6, 1, 2]
]
df = pd.DataFrame(d[1:], columns=d[0])
ddf = dd.from_pandas(df, npartitions=5)
Here is the data as a Pandas DataFrame:

   A  B  C   D  E   F
0  1  4  8   1  3   5
1  6  6  2   2  0   0
2  9  4  5   0  6  35
3  0  1  7  10  9   4
4  0  7  2   6  1   2
Here is the Dask DataFrame:

Dask DataFrame Structure:
                   A      B      C      D      E      F
npartitions=4
0              int64  int64  int64  int64  int64  int64
1                ...    ...    ...    ...    ...    ...
2                ...    ...    ...    ...    ...    ...
3                ...    ...    ...    ...    ...    ...
4                ...    ...    ...    ...    ...    ...
Dask Name: from_pandas, 4 tasks
I am trying to concatenate 2 Dask DataFrames vertically:
ddf_i = ddf + 11.5
dd.concat([ddf,ddf_i],axis=0)
but I get this error:
Traceback (most recent call last):
...
File "...", line 572, in concat
raise ValueError('All inputs have known divisions which cannot '
ValueError: All inputs have known divisions which cannot be concatenated
in order. Specify interleave_partitions=True to ignore order
However, if I try:
dd.concat([ddf,ddf_i],axis=0,interleave_partitions=True)
then it appears to work. Is there a problem with setting this to True (in terms of performance and speed)? Or is there another way to vertically concatenate two Dask DataFrames?
If you inspect the divisions of the dataframe with ddf.divisions, you will find (assuming one partition) that it holds the edges of the index: (0, 4). This is useful to Dask: when you run an operation on the data, it knows it can skip any partition that does not include the required index values. It is also why some Dask operations are much faster when the index is appropriate for the job.
When you concatenate, the second dataframe has the same index as the first. Concatenation would work without interleaving if the index values had different, non-overlapping ranges in the two partitions.
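You can see the clash directly with the ddf and ddf_i from the question:

print(ddf.divisions)    # index range covered by each partition
print(ddf_i.divisions)  # identical, since ddf_i = ddf + 11.5 keeps the same index

Because both frames cover the same index range, Dask cannot put the partitions in order without interleaving.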
mdurant's answer is correct, and this answer elaborates with MCVE code snippets using Dask v2021.08.1. The examples make it easier to understand divisions and interleaving.
Vertically concatenating DataFrames
Create two DataFrames, concatenate them, and view the results.
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame(
    {"nums": [1, 2, 3, 4, 5, 6], "letters": ["a", "b", "c", "d", "e", "f"]}
)
ddf1 = dd.from_pandas(df, npartitions=2)

df = pd.DataFrame({"nums": [88, 99], "letters": ["xx", "yy"]})
ddf2 = dd.from_pandas(df, npartitions=1)

ddf3 = dd.concat([ddf1, ddf2])
print(ddf3.compute())
   nums letters
0     1       a
1     2       b
2     3       c
3     4       d
4     5       e
5     6       f
0    88      xx
1    99      yy
Divisions metadata when vertically concatenating
Create two DataFrames, concatenate them, and illustrate that sometimes this operation will cause divisions metadata to be lost.
def print_partitions(ddf):
    for i in range(ddf.npartitions):
        print(ddf.partitions[i].compute())

df = pd.DataFrame(
    {"nums": [1, 2, 3, 4, 5, 6], "letters": ["a", "b", "c", "d", "e", "f"]}
)
ddf1 = dd.from_pandas(df, npartitions=2)
ddf1.divisions  # (0, 3, 5)

df = pd.DataFrame({"nums": [88, 99], "letters": ["xx", "yy"]})
ddf2 = dd.from_pandas(df, npartitions=1)
ddf2.divisions  # (0, 1)

ddf3 = dd.concat([ddf1, ddf2])
ddf3.divisions  # (None, None, None, None)
Set interleave_partitions=True to avoid losing the divisions metadata.
ddf3_interleave = dd.concat([ddf1, ddf2], interleave_partitions=True)
ddf3_interleave.divisions # (0, 1, 3, 5)
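The print_partitions helper defined above is useful here; printing each partition of ddf3_interleave shows how the rows were redistributed:

print_partitions(ddf3_interleave)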
When interleaving isn't necessary
Create two DataFrames without overlapping divisions, concatenate them, and confirm that the divisions metadata is not lost:
df = pd.DataFrame(
    {"nums": [1, 2, 3, 4], "letters": ["a", "b", "c", "d"], "some_index": [4, 5, 6, 7]}
)
ddf1 = dd.from_pandas(df, npartitions=2)
ddf1 = ddf1.set_index("some_index")

df = pd.DataFrame({"nums": [88, 99], "letters": ["xx", "yy"], "some_index": [10, 20]})
ddf2 = dd.from_pandas(df, npartitions=1)
ddf2 = ddf2.set_index("some_index")
ddf3 = dd.concat([ddf1, ddf2])
ddf3.divisions # (4, 6, 10, 20)
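As a quick sanity check, known_divisions (a standard Dask DataFrame attribute) confirms the metadata survived without interleaving:

print(ddf3.known_divisions)  # True, because the two index ranges don't overlap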
I wrote a blog post to explain this in more detail. Let me know if you'd like the link.
I have a list with 10 records, and each record has one or more elements, each with 3 fields, like below:
list = [('0.4', 2, 'doc4.txt'),('0.04', 13, 'doc4.txt'), ('0.5', 4, 'doc4.txt')]
[('0.5', 6, 'doc3.txt'),('0.04', 13, 'doc3.txt'), ('0.5', 4, 'doc3.txt')]
[('0.6', 8, 'doc2.txt')]
[('0.4', 2, 'doc5.txt'), ('1.0', 7, 'doc5.txt')]
[('0.2', 2, 'doc6.txt'), ('0.4', 2, 'doc6.txt'),('0.8', 2, 'doc6.txt'), ('0.34', 5, 'doc6.txt'),('0.76', 4, 'doc6.txt'), ('0.5', 3, 'doc6.txt')]
[('0.3', 7, 'doc9.txt')]
[('0.1', 8, 'doc12.txt')]
[('0.3', 9, 'doc11.txt'),('1.0', 8, 'doc11.txt')]
[('0.9', 7, 'doc22.txt')]
[('0.3', 7, 'doc24.txt')]
You may notice that the third field of every element in a record has the same text. There are 10 categories, as the list consists of 10 records.
According to the structure of the list:
For example, [('0.6', 8, 'doc2.txt')]
The first element, '0.6', represents the X-axis value, in the range [0.1 -> 1.0]
The second element, an integer, represents the Y-axis value in the graph
The third element, 'doc2.txt', represents the category name in the graph
The list should be plotted as in the image below (image not shown).
I've been trying several approaches, but still couldn't figure it out:
>>> plt.scatter(*zip(*list))
>>> plt.xlabel('X-Axis')
>>> plt.ylabel('Y-Axis')
>>> plt.show()
I think you can just keep the list as it is and iterate over it. You'd then produce a scatter plot for each sublist in the outer list, as the items from the sublist should share the same marker, color and legend label.
import matplotlib.pyplot as plt

# don't call a variable "list" or "print" or any other Python builtin's name
liste = [[('0.4', 2, 'doc4.txt'), ('0.04', 13, 'doc4.txt'), ('0.5', 4, 'doc4.txt')],
         [('0.5', 6, 'doc3.txt'), ('0.04', 13, 'doc3.txt'), ('0.5', 4, 'doc3.txt')],
         [('0.6', 8, 'doc2.txt')],
         [('0.4', 2, 'doc5.txt'), ('1.0', 7, 'doc5.txt')],
         [('0.2', 2, 'doc6.txt'), ('0.4', 2, 'doc6.txt'), ('0.8', 2, 'doc6.txt'),
          ('0.34', 5, 'doc6.txt'), ('0.76', 4, 'doc6.txt'), ('0.5', 3, 'doc6.txt')],
         [('0.3', 7, 'doc9.txt')],
         [('0.1', 8, 'doc12.txt')],
         [('0.3', 9, 'doc11.txt'), ('1.0', 8, 'doc11.txt')],
         [('0.9', 7, 'doc22.txt')],
         [('0.3', 7, 'doc24.txt')]]

# unicode mathtext markers (the ur"..." syntax was Python 2; plain strings work in Python 3)
markers = ["$\u25A1$", "$\u25A0$", "$\u25B2$", "$\u25E9$"]
colors = ["k", "crimson", "#112b77"]

fig, ax = plt.subplots()
for i, l in enumerate(liste):
    x, y, cat = zip(*l)
    # the x values are strings, so convert them to floats before plotting
    ax.scatter(list(map(float, x)), y, s=64, c=colors[(i//4) % 3],
               marker=markers[i % 4], label=cat[0])

ax.legend(bbox_to_anchor=(1.01, 1), borderaxespad=0)
plt.subplots_adjust(left=0.1, right=0.8)
plt.show()
There are multiple issues. Your assignment of list makes no sense (presumably you forgot some brackets, as only the first line is actually assigned). Also, you really shouldn't shadow built-in names like list. You should not represent floats as strings (your x coordinates). And you cannot simply unpack the list into plt.scatter and hope that all of these issues magically work themselves out.
Below is some code showing how to properly pass your data to scatter (I use plot instead of scatter, as plot accepts proper colour names).
import numpy as np
import matplotlib.pyplot as plt
# 'list' is a bad name for a variable as it overwrites the list() built-in function
# -> rename to data
data = [
    [('0.4', 2, 'doc4.txt'), ('0.04', 13, 'doc4.txt'), ('0.5', 4, 'doc4.txt')],
    [('0.5', 6, 'doc3.txt'), ('0.04', 13, 'doc3.txt'), ('0.5', 4, 'doc3.txt')],
    [('0.6', 8, 'doc2.txt')],
    [('0.4', 2, 'doc5.txt'), ('1.0', 7, 'doc5.txt')],
    [('0.2', 2, 'doc6.txt'), ('0.4', 2, 'doc6.txt'), ('0.8', 2, 'doc6.txt'),
     ('0.34', 5, 'doc6.txt'), ('0.76', 4, 'doc6.txt'), ('0.5', 3, 'doc6.txt')],
    [('0.3', 7, 'doc9.txt')],
    [('0.1', 8, 'doc12.txt')],
    [('0.3', 9, 'doc11.txt'), ('1.0', 8, 'doc11.txt')],
    [('0.9', 7, 'doc22.txt')],
    [('0.3', 7, 'doc24.txt')]
]
# flatten nested list
flat = [item for sublist in data for item in sublist]
# convert strings to numbers
numeric = [(float(x), y, label) for (x, y, label) in flat]
# create a dictionary that maps a label to a set of x,y coordinates
data = dict()
for (x, y, label) in numeric:
    if label in data:
        data[label].append((x, y))
    else:
        data[label] = [(x, y)]
# initialise figure
fig, ax = plt.subplots(1,1)
colors = ['blue', 'red', 'yellow', 'green', 'orange', 'brown', 'violet', 'magenta', 'white', 'black']
# populate figure
for color, (label, xy) in zip(colors, data.items()):  # .iteritems() was Python 2; use .items()
    x, y = np.array(xy).T
    ax.plot(x, y, 'o', label=label, color=color)
ax.set_xlim(0, 1.1)
ax.set_ylim(0, 16)
ax.legend(numpoints=1)
plt.show()
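As an aside, the label-to-coordinates grouping above can be written a little more compactly with collections.defaultdict; a minimal equivalent sketch:

from collections import defaultdict

data = defaultdict(list)
for (x, y, label) in numeric:
    data[label].append((x, y))  # missing keys start as an empty list automatically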
I am new to using Python. My problem might seem easy, but unfortunately I could not find a solution for it. I have a set of images in GeoTIFF format which are all the same size; their pixel values range between 0 and 5, and their no-data value is -9999. I would like to do a kind of image stacking using NumPy and GDAL. I am looking for a stacking algorithm in which only the pixels of each image with a value between 0 and 5 are used, and the no-data values are excluded from computing the average. For example, if I have 30 images and for two of them the value at index Image[20,20] is 2 and 3 respectively, while for the rest of the images it is -9999 at that index, I want the single-band output image to be 2.5 there. I am wondering if anyone knows a way to do this?
Any suggestions or hints are highly appreciated.
Edit:
Let me clarify a bit more. Here is a sample:
import numpy as np

myArray = np.random.randint(5, size=(3,3,3))
myArray[1,1,1] = -9999
myArray
>> array([[[    0,     2,     1],
           [    1,     4,     1],
           [    1,     1,     2]],

          [[    4,     2,     0],
           [    3, -9999,     0],
           [    1,     0,     3]],

          [[    2,     0,     3],
           [    1,     3,     4],
           [    2,     4,     3]]])
Suppose that myArray is an ndarray which contains three images, as follows:
Image_01 = myArray[0]
Image_02 = myArray[1]
Image_03 = myArray[2]
The final stacked image is:

stackedImage = myArray.mean(axis=0)
>> array([[ 2.00000000e+00,  1.33333333e+00,  1.33333333e+00],
          [ 1.66666667e+00, -3.33066667e+03,  1.66666667e+00],
          [ 1.33333333e+00,  1.66666667e+00,  2.66666667e+00]])

But I want it to be this:

array([[ 2.00000000e+00,  1.33333333e+00,  1.33333333e+00],
       [ 1.66666667e+00,  3.5,             1.66666667e+00],
       [ 1.33333333e+00,  1.66666667e+00,  2.66666667e+00]])
Masked arrays are a good way to deal with missing or invalid values. Masked arrays have a .data attribute, which contains the numerical value for each element, and a .mask attribute that specifies which values should be considered 'invalid' and ignored.
Here's a full example using your data:
import numpy as np

# your example data, with a bad value at [1, 1, 1]
M = np.array([[[    0,     2,     1],
               [    1,     4,     1],
               [    1,     1,     2]],

              [[    4,     2,     0],
               [    3, -9999,     0],
               [    1,     0,     3]],

              [[    2,     0,     3],
               [    1,     3,     4],
               [    2,     4,     3]]])

# create a masked array where all of the values in `M` that are equal to
# -9999 are masked
masked_M = np.ma.masked_equal(M, -9999)

# take the mean over the first axis
masked_mean = masked_M.mean(0)

# `masked_mean` is another `np.ma.masked_array`, whose `.data` attribute
# contains the result you're looking for
print(masked_mean.data)
# [[ 2.          1.33333333  1.33333333]
#  [ 1.66666667  3.5         1.66666667]
#  [ 1.33333333  1.66666667  2.66666667]]
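One caveat worth noting (an addition, not part of the answer above): if a pixel equals -9999 in every image, the mean at that position is fully masked, and .data will hold numpy's internal fill value there. Calling .filled() makes such pixels explicit:

# replace fully-masked positions with NaN instead of the internal fill value
result = masked_mean.filled(np.nan)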