I have converted grid1 and grid2 into arrays and am using the following function, which iterates through a table and should return the corresponding value from the table wherever the grid1 and grid2 values match. But somehow the final output contains only 4 integer values, which isn't correct. Any suggestions as to what might be wrong here?
def grid(grid1,grid2):
    table = {(10,1):61,(10,2):75,(10,3):83,(10,4):87,
             (11,1):54,(11,2):70,(11,3):80,(11,4):85,
             (12,1):61,(12,2):75,(12,3):83,(12,4):87,
             (13,1):77,(13,2):85,(13,3):90,(13,4):92,}
    grid3 = np.zeros(grid1.shape, dtype = np.int)
    for k,v in table.iteritems():
        grid3[[grid1 == k[0]] and [grid2 == k[1]]] = v
    return grid3
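I think the problem is the indexing expression, not the assignment to "k" and "v". [grid1 == k[0]] and [grid2 == k[1]] applies Python's "and" to two lists. A non-empty list is always truthy, so the whole expression evaluates to the second list and only the grid2 condition is used as the mask. Each pass therefore overwrites every cell where grid2 == k[1], and the output ends up with one value per grid2 category, which is why you see only 4 values. Combine the two boolean arrays element-wise with & instead: grid3[(grid1 == k[0]) & (grid2 == k[1])] = v.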
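For reference, here is a minimal sketch of the same lookup with the two conditions combined element-wise using &. It assumes grid1 and grid2 are NumPy integer arrays of the same shape; the function name grid_lookup, the shortened table and the sample arrays are made up for illustration. It uses dict.items() and dtype=int so that it also runs on Python 3 and on recent NumPy releases.

```python
import numpy as np

def grid_lookup(grid1, grid2, table):
    """Return an array holding table[(g1, g2)] wherever both grids match."""
    out = np.zeros(grid1.shape, dtype=int)
    for (g1, g2), value in table.items():
        # & combines the two boolean masks element by element
        out[(grid1 == g1) & (grid2 == g2)] = value
    return out

# Hypothetical sample data, a subset of the table in the question
table = {(10, 1): 61, (10, 2): 75, (11, 1): 54, (11, 2): 70}
grid1 = np.array([[10, 10], [11, 11]])
grid2 = np.array([[1, 2], [1, 2]])

result = grid_lookup(grid1, grid2, table)
print(result)
```

Cells whose (grid1, grid2) pair is not in the table keep the initial value 0.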
Sorry for the newbie question.
I have a dict like this:
{'id':'1', 'Book':'21', 'Member':'3', 'Title':'Chameleon vol. 2',
'Author':'Jason Bridge'}
I want to convert that dict to:
{'id':1, 'Book':21, 'Member':3, 'Title':'Chameleon vol. 2',
'Author':'Jason Bridge'}
I need to convert only the values of the first 3 keys to int.
Thanks in advance
dict1 = {'id':'1', 'Book':'21', 'Member':'3', 'Title':'Chameleon vol. 2', 'Author':'Jason Bridge'}
y_dict = dict(list(dict1.items())[:3])
print(y_dict)  # the first 3 items, whose values will be converted
z_dict = dict(list(dict1.items())[3:])
print(z_dict)  # the remaining items, whose values stay as they are
x_dict = {k:int(v) for k, v in y_dict.items()}
print(x_dict)  # the first 3 values converted to integers
w_dict = {**x_dict, **z_dict}
print(w_dict)  # the converted items merged with the untouched remainder
w_dict is the result you are looking for.
Let's say your dict is stored in the "book_data" variable.
What do you mean by the first 3 keys?
If the keys are fixed, you can list them explicitly:
for key in ['id', 'Book', 'Member']:
    book_data[key] = int(book_data[key])
If the keys can vary, you can take the first three items instead (this relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on):
for key, val in list(book_data.items())[:3]:
    book_data[key] = int(val)
The items() method yields each key together with its value, so you don't have to look the value up separately.
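If you would rather leave the original dict untouched, the conversion can also be sketched as a single dict comprehension. The name int_keys is made up here, and the sketch assumes that every listed key holds a string that int() can parse.

```python
book_data = {'id': '1', 'Book': '21', 'Member': '3',
             'Title': 'Chameleon vol. 2', 'Author': 'Jason Bridge'}

# Keys assumed to hold numeric strings (hypothetical name)
int_keys = {'id', 'Book', 'Member'}

# Build a new dict; values of the listed keys become ints, the rest are copied
converted = {k: int(v) if k in int_keys else v for k, v in book_data.items()}
print(converted)
```

Because a new dict is built, book_data still holds the original strings afterwards.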
I'm trying to make a for loop count up and print values from defined variables
Value1 = "X"
Value2 = "Y"
for x in range (1, 2):
    print Valuex
So I want this to print Value1, then Value2.
What is the syntax for this?
Use a list (also, don't start variable names with capital letters):
value = [ "X", "Y" ]
for x in range (0, 2):
    print(value[x])
Also, range(1, 2) yields only the single value 1; you probably wanted two elements in your range.
A simple approach is to put all the values in a list and then loop over it, e.g.
values = ["X", "Y"]
for x in values:
    print(x)
This automatically avoids accessing an invalid index, whereas with range you have to be careful to stay within bounds to avoid an IndexError.
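If the values really do need to be addressed by a numbered name, one possible sketch is a dictionary keyed by that name. The keys "value1" and "value2" are an assumption made for illustration and are not part of the answers above.

```python
# Hypothetical naming scheme: "value1", "value2", ...
values = {"value1": "X", "value2": "Y"}

# range(1, 3) yields 1 and 2, so both names are built and looked up
ordered = [values["value{}".format(n)] for n in range(1, 3)]

for item in ordered:
    print(item)
```

This keeps the count-up loop from the question while avoiding the need to construct variable names at run time.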
I have three lists a, b and c. If an element of a is also in b, I want the element of c at the same index as the match in b. That is, if a[0] == b[1], get c[1]:
a = ['ASAP','WTHK']
b = ['ABCD','ASAP','EFGH','HIJK']
c = ['1','2','3','4','5']
I hope this is what you were looking for. In a loop, you can add each b value and its corresponding c value to a dictionary if the a list contains that b value. After that you can look up the c value using the a value as the key, as in the code below.
a = ['ASAP','WTHK']
#                   b         c
dictionary_trans = {'ASAP' : '1'}
dictionary_trans['WTHK'] = '1337'
# etc. put all b values existing in a into the dict
# with their corresponding c values.
key = a[0]
c_value = dictionary_trans.get(key)
print(c_value)
My Python skills are very limited, but I think I would try to solve the problem this way.
If you use an a value that is not in the dictionary, get() returns None (and dictionary_trans[key] would raise a KeyError), so you need some logic to handle missing relations between a and c, such as inserting dummy entries into the dictionary or passing a default value to get().
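The dictionary described above can be built directly from the question's lists. This is only a sketch: the names b_to_c and matches are made up, and it assumes the values in b are unique so that each one maps to a single c value.

```python
a = ['ASAP', 'WTHK']
b = ['ABCD', 'ASAP', 'EFGH', 'HIJK']
c = ['1', '2', '3', '4', '5']

# zip stops at the shorter list, so the extra '5' in c is ignored
b_to_c = dict(zip(b, c))

# get() returns None for elements of a that have no match in b
matches = {item: b_to_c.get(item) for item in a}
print(matches)
```

Here 'ASAP' maps to '2' because it sits at index 1 in b, while 'WTHK' has no match and maps to None.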
I have two columns in a Pandas DataFrame that has datetime as its index. The two columns contain data measuring the same parameter, but neither column is complete (some rows have no data at all, some rows have data in both columns, and others have data only in column 'a' or 'b').
I've written the following code to find gaps in the columns, generate a list of the index dates where these gaps appear, and use this list to find and replace the missing data. However, I get a KeyError: Not in index on line 3, which I don't understand because the keys I'm using to index came from the DataFrame itself. Could somebody explain why this is happening and what I can do to fix it? Here's the code:
def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']
    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df
merge_func(sve)
Whenever you are assigning values, you should use .loc:
df.loc[null_index,'TOC_mg/L']=df['DOC_mg/L']
The error in your original code is the order of the subscripts in the lookup on the right-hand side:
df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
Here df[null_index] looks for columns with those labels rather than rows, which is why you get KeyError: not in index. On a toy dataset I get a different error from the same line: IndexError: indices are out-of-bounds
If you changed the order to this it would probably work:
df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index]
However, this is chained assignment and should be avoided; see the online docs.
So you should use loc:
df.loc[null_index,'TOC_mg/L']=df['DOC_mg/L']
df.loc[notnull_index, 'DOC_mg/L'] = df['TOC_mg/L']
Note that it is not necessary to use the same index on the right-hand side, because the values are aligned on the index during assignment.
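A small self-contained sketch of the .loc approach on a toy frame follows. The dates and values are invented, and the names doc_only and toc_only are hypothetical stand-ins for null_index and notnull_index; the sketch assumes the index has no duplicate labels.

```python
import numpy as np
import pandas as pd

# Toy data: invented values on a datetime index
idx = pd.to_datetime(['2004-01-14', '2004-03-04', '2004-03-30', '2004-04-12'])
df = pd.DataFrame({'DOC_mg/L': [1.0, np.nan, 3.0, np.nan],
                   'TOC_mg/L': [np.nan, 2.0, 5.0, np.nan]}, index=idx)

# Index labels of rows where exactly one of the two columns has data
doc_only = df.index[df['DOC_mg/L'].notnull() & df['TOC_mg/L'].isnull()]
toc_only = df.index[df['DOC_mg/L'].isnull() & df['TOC_mg/L'].notnull()]

# The right-hand side is the full column; .loc aligns it on the index
df.loc[doc_only, 'TOC_mg/L'] = df['DOC_mg/L']
df.loc[toc_only, 'DOC_mg/L'] = df['TOC_mg/L']
print(df)
```

Rows where both columns are missing stay NaN, and rows where both columns have data are left unchanged.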
I have time series data in two separate DataFrame columns which refer to the same parameter but are of differing lengths.
On dates where data only exist in one column, I'd like this value to be placed in my new column. On dates where there are entries for both columns, I'd like to have the mean value. (I'd like to join using the index, which is a datetime value)
Could somebody suggest a way that I could combine my two columns? Thanks.
Edit2: I've written some code that should merge the data from both of my columns, but I get a KeyError when I try to set the new values using the index generated from rows where my first column has values but my second column doesn't. Here's the code:
def merge_func(df):
    null_index = df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
    df['TOC_mg/L'][null_index] = df[null_index]['DOC_mg/L']
    notnull_index = df[(df['DOC_mg/L'].isnull() == True) & (df['TOC_mg/L'].isnull() == False)].index
    df['DOC_mg/L'][notnull_index] = df[notnull_index]['TOC_mg/L']
    df.insert(len(df.columns), 'Mean_mg/L', 0.0)
    df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
    return df
merge_func(sve)
And here's the error:
KeyError: "['2004-01-14T01:00:00.000000000+0100' '2004-03-04T01:00:00.000000000+0100'\n '2004-03-30T02:00:00.000000000+0200' '2004-04-12T02:00:00.000000000+0200'\n '2004-04-15T02:00:00.000000000+0200' '2004-04-17T02:00:00.000000000+0200'\n '2004-04-19T02:00:00.000000000+0200' '2004-04-20T02:00:00.000000000+0200'\n '2004-04-22T02:00:00.000000000+0200' '2004-04-26T02:00:00.000000000+0200'\n '2004-04-28T02:00:00.000000000+0200' '2004-04-30T02:00:00.000000000+0200'\n '2004-05-05T02:00:00.000000000+0200' '2004-05-07T02:00:00.000000000+0200'\n '2004-05-10T02:00:00.000000000+0200' '2004-05-13T02:00:00.000000000+0200'\n '2004-05-17T02:00:00.000000000+0200' '2004-05-20T02:00:00.000000000+0200'\n '2004-05-24T02:00:00.000000000+0200' '2004-05-28T02:00:00.000000000+0200'\n '2004-06-04T02:00:00.000000000+0200' '2004-06-10T02:00:00.000000000+0200'\n '2004-08-27T02:00:00.000000000+0200' '2004-10-06T02:00:00.000000000+0200'\n '2004-11-02T01:00:00.000000000+0100' '2004-12-08T01:00:00.000000000+0100'\n '2011-02-21T01:00:00.000000000+0100' '2011-03-21T01:00:00.000000000+0100'\n '2011-04-04T02:00:00.000000000+0200' '2011-04-11T02:00:00.000000000+0200'\n '2011-04-14T02:00:00.000000000+0200' '2011-04-18T02:00:00.000000000+0200'\n '2011-04-21T02:00:00.000000000+0200' '2011-04-25T02:00:00.000000000+0200'\n '2011-05-02T02:00:00.000000000+0200' '2011-05-09T02:00:00.000000000+0200'\n '2011-05-23T02:00:00.000000000+0200' '2011-06-07T02:00:00.000000000+0200'\n '2011-06-21T02:00:00.000000000+0200' '2011-07-04T02:00:00.000000000+0200'\n '2011-07-18T02:00:00.000000000+0200' '2011-08-31T02:00:00.000000000+0200'\n '2011-09-13T02:00:00.000000000+0200' '2011-09-28T02:00:00.000000000+0200'\n '2011-10-10T02:00:00.000000000+0200' '2011-10-25T02:00:00.000000000+0200'\n '2011-11-08T01:00:00.000000000+0100' '2011-11-28T01:00:00.000000000+0100'\n '2011-12-20T01:00:00.000000000+0100' '2012-01-19T01:00:00.000000000+0100'\n '2012-02-14T01:00:00.000000000+0100' '2012-03-13T01:00:00.000000000+0100'\n 
'2012-03-27T02:00:00.000000000+0200' '2012-04-02T02:00:00.000000000+0200'\n '2012-04-10T02:00:00.000000000+0200' '2012-04-17T02:00:00.000000000+0200'\n '2012-04-26T02:00:00.000000000+0200' '2012-04-30T02:00:00.000000000+0200'\n '2012-05-03T02:00:00.000000000+0200' '2012-05-07T02:00:00.000000000+0200'\n '2012-05-10T02:00:00.000000000+0200' '2012-05-14T02:00:00.000000000+0200'\n '2012-05-22T02:00:00.000000000+0200' '2012-06-05T02:00:00.000000000+0200'\n '2012-06-19T02:00:00.000000000+0200' '2012-07-03T02:00:00.000000000+0200'\n '2012-07-17T02:00:00.000000000+0200' '2012-07-31T02:00:00.000000000+0200'\n '2012-08-14T02:00:00.000000000+0200' '2012-08-28T02:00:00.000000000+0200'\n '2012-09-11T02:00:00.000000000+0200' '2012-09-25T02:00:00.000000000+0200'\n '2012-10-10T02:00:00.000000000+0200' '2012-10-24T02:00:00.000000000+0200'\n '2012-11-21T01:00:00.000000000+0100' '2012-12-18T01:00:00.000000000+0100'] not in index"
You are close, but you don't actually need to iterate over the rows when using the isnull() function. By default,
df[(df['DOC_mg/L'].isnull() == False) & (df['TOC_mg/L'].isnull() == True)].index
will return just the index of the rows where DOC_mg/L is not null and TOC_mg/L is null.
Now you can do something like this to set the values for TOC_mg/L:
null_index = df[(df['DOC_mg/L'].isnull() == False) & \
                (df['TOC_mg/L'].isnull() == True)].index
df['TOC_mg/L'][null_index] = df['DOC_mg/L'][null_index] # EDIT To switch the index position.
This uses the index of the rows where TOC_mg/L is null and DOC_mg/L is not null, and sets the values of TOC_mg/L to those found in DOC_mg/L in the same rows.
Note: This is not the accepted way for setting values using an index, but it is how I've been doing it for some time. Just make sure that when setting values, the left side of the equation is df['col_name'][index]. If col_name and index are switched you will set the values to a copy which is never set back to the original.
Now, to set the mean, you can create a new column called Mean_mg/L with a default value of 0.0, and then set it to the mean of the two columns:
# Insert a new column named 'Mean_mg/L' at the end of the dataframe's columns
# with default value 0.0
df.insert(len(df.columns), 'Mean_mg/L', 0.0)
# Set this column's value to the average of DOC_mg/L and TOC_mg/L
df['Mean_mg/L'] = (df['DOC_mg/L'] + df['TOC_mg/L']) / 2
In the columns where we filled null values with the corresponding column value, the average will be the same as the values.
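As an alternative sketch (not the approach described in the answer above), DataFrame.mean(axis=1) skips NaN values by default, so the mean column can be computed without filling either column first. The dates and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: one row with only DOC, one with only TOC, one with both
idx = pd.to_datetime(['2004-01-14', '2004-03-04', '2004-03-30'])
df = pd.DataFrame({'DOC_mg/L': [1.0, np.nan, 3.0],
                   'TOC_mg/L': [np.nan, 2.0, 5.0]}, index=idx)

# Row-wise mean; NaN entries are ignored (skipna=True is the default)
df['Mean_mg/L'] = df[['DOC_mg/L', 'TOC_mg/L']].mean(axis=1)
print(df)
```

Rows with a single value get that value as the mean, rows with two values get their average, and the original gaps in DOC_mg/L and TOC_mg/L stay visible.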