Datetime object through 'datetime.strptime is not iterable' - python-2.7

I have a CSV file containing years of data, and I need to calculate the difference between the max date and the min date. My real problem is how to determine the max value of the dates.
So I am doing this to convert my dates into datetime objects:
Temps = datetime.strptime(W['datum'][i]+' '+W['timestamp'][i],'%Y-%m-%d %H:%M:%S')
Printing this line gives me exactly the result I want, but when I try to extract the max value of these dates using this line of code:
start = max(Temps)
I get this error: 'datetime.strptime' object is not iterable
Where am I mistaken?

The expression
datetime.strptime(W['datum'][i]+' '+W['timestamp'][i],'%Y-%m-%d %H:%M:%S')
produces a single value (a scalar). When you assign it to Temps, that variable becomes a scalar, not a list: it contains only one value.
Then, when you evaluate max(Temps), max expects an argument that holds multiple values but instead finds whatever was most recently assigned to Temps.
That is a single value, which is not 'iterable'.
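One possible fix, as a minimal sketch: collect every parsed value in a list and pass that list to max and min. The sample rows are made up, and W is assumed here to behave like a mapping from column name to a list of strings; the real W in the question may be a different structure.

```python
from datetime import datetime

# Hypothetical stand-in for the CSV columns from the question.
W = {
    'datum': ['2015-01-03', '2014-06-30', '2016-02-11'],
    'timestamp': ['08:15:00', '23:59:59', '00:00:01'],
}

# Keep every parsed value in a list instead of overwriting one variable.
temps = [
    datetime.strptime(d + ' ' + t, '%Y-%m-%d %H:%M:%S')
    for d, t in zip(W['datum'], W['timestamp'])
]

start = min(temps)   # earliest datetime
end = max(temps)     # latest datetime
span = end - start   # a timedelta holding the difference
```

Because temps is a list, both min and max can iterate over it, and subtracting two datetime objects gives the difference directly.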

Related

Determining dictionary value in python based on current system time

I have a dictionary whose keys are times in %H:%M:%S format, each with an associated value.
dict = {
'06:00:01': '0x95', '06:10:01': '0x97',
'06:20:01': '0x98', '06:30:01': '0x99',
'06:40:01': '0x101', '06:50:01': '0x102',
'07:00:01': '0x104', '07:10:01': '0x105',
'07:20:01': '0x106', '07:30:01': '0x107',
'07:40:01': '0x109', '07:50:01': '0x110',
'08:00:01': '0x111', '08:10:01': '0x112',
'08:20:01': '0x113', '08:30:01': '0x114',
'08:40:01': '0x115', '08:50:01': '0x116',
'09:00:01': '0x117', '09:10:01': '0x118',
'09:20:01': '0x119', '09:30:01': '0x119',
'09:40:01': '0x120', '09:50:01': '0x121',
'10:00:01': '0x122', '10:10:01': '0x122',
'10:20:01': '0x123', '10:30:01': '0x124',
'10:40:01': '0x124', '10:50:01': '0x125',
'11:00:01': '0x125', '11:10:01': '0x126',
'11:20:01': '0x126', '11:30:01': '0x126',
'11:40:01': '0x127', '11:50:01': '0x127',
'12:00:01': '0x127', '12:10:01': '0x128',
'12:20:01': '0x128'
}
I am trying to think of logic in Python that returns the dictionary value based on the current system time. If the current system time falls between two keys of the dictionary, it should return the value of the lower key.
This solution assumes that you are not using the am/pm format:
from datetime import datetime
cur_time = datetime.now().time()
print(cur_time)
keys = sorted(date_dict.keys())
times = [datetime.strptime(i, "%H:%M:%S") for i in keys]
out = False
for idx, t in enumerate(times):
    if cur_time >= t.time():
        # Upper bound of the range, clamped so the last key stays in range.
        t1 = times[min(idx+1, len(times)-1)].time()
        if cur_time <= t1:
            out = date_dict[str(t.time())]
print(out)
I renamed the dictionary from dict to date_dict; otherwise it is exactly the dictionary from your question. Using a built-in name such as dict for a variable shadows the built-in, so it should be avoided.
cur_time is the current time, printed for convenience. keys is a sorted list of the dictionary keys, which are then turned into a list times of datetime objects. This can be done in one line but seems more readable this way.
The for loop uses enumerate to access both the datetime objects and their indices idx. If the current time is greater than or equal to the datetime object's time, the code checks whether it is also less than or equal to the time of the next object in the list. If so, the current time fits in that range and the value of the lower key (t.time()) is assigned to out. If the current time is not in the dictionary range at all, out keeps its default value, False.
The part times[min(idx+1, len(times)-1)] prevents the index from going out of range for times later than '12:20:01'.
You can easily test this program by using timedelta and generating various times, here different by hours:
from datetime import timedelta
cur_time = (datetime.now() + timedelta(hours=8)).time()
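A more compact variant of the same range lookup can be sketched with bisect from the standard library. This is an alternative technique, not the method above: the helper name lookup and the shortened dictionary are made up for illustration, and unlike the loop, this version returns the last value for times later than the last key instead of False.

```python
from bisect import bisect_right
from datetime import time

# Shortened, hypothetical version of the dictionary from the question.
date_dict = {'06:00:01': '0x95', '06:10:01': '0x97', '06:20:01': '0x98'}

def lookup(cur, table):
    """Return the value of the largest key <= cur, or False if there is none."""
    # Zero-padded HH:MM:SS strings sort in the same order as the times they encode.
    keys = sorted(table)
    idx = bisect_right(keys, cur.strftime('%H:%M:%S'))
    return table[keys[idx - 1]] if idx else False

value = lookup(time(6, 5, 0), date_dict)
```

Sorting the keys on every call keeps the sketch short; for repeated lookups the sorted list could be built once and reused.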

Randomly set one-third of na's in a column to one value and the rest to another value

I'm trying to impute missing values in a dataframe df. I have a column A with 300 NaNs. I want to randomly set two-thirds of them to value1 and the rest to value2.
Please help.
EDIT: I'm actually trying to do this in dask, which does not support item assignment. This is what I have currently. Initially, I thought I would try to convert all NAs to value1:
da.where(df.A.isnull() == True, 'value1', df.A)
I got the following error:
ValueError: need more than 0 values to unpack
As the comment suggests, you can solve this with Series.where.
The following will work, but I cannot promise how efficient it is. (I suspect it may be better to produce a whole column of replacements at once with numpy.random.choice.)
import random
df['A'] = df['A'].where(~df['A'].isnull(),
                        lambda df: df.map(
                            lambda x: random.choice(['value1', 'value1', x])))
Explanation: if the value is not null (NaN), keep the original. Where it is null, replace it with the corresponding value produced by the first lambda. That lambda maps the values of the dataframe (chunks) so that each one randomly keeps its original value with probability 1/3 and becomes 'value1' otherwise.
Note that, depending on your data, this likely has changed the data type of the column.
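If an exact two-thirds split is wanted rather than a per-value probability, the idea can be sketched in plain Python by shuffling the positions of the missing entries. This is not pandas or dask code: the function fill_missing, its parameters, and the use of None for missing values are all assumptions made for illustration.

```python
import random

def fill_missing(values, first, second, frac=2.0 / 3, seed=None):
    """Replace None entries: a `frac` share gets `first`, the rest get `second`."""
    rng = random.Random(seed)
    # Positions of the missing entries, in random order.
    missing = [i for i, v in enumerate(values) if v is None]
    rng.shuffle(missing)
    cut = int(round(len(missing) * frac))
    out = list(values)
    for i in missing[:cut]:
        out[i] = first
    for i in missing[cut:]:
        out[i] = second
    return out

column = ['a', None, None, 'b', None, None, None, None]
filled = fill_missing(column, 'value1', 'value2', seed=0)
```

With six missing entries, four receive 'value1' and two receive 'value2'; which positions get which value depends on the shuffle.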

How can I calculate mean of list of strings?

I am trying to calculate the mean of one column in a CSV file. First, I read one column from the .csv file and save it into a list. Then, when I try to get the mean, I get an error:
TypeError: 'builtin_function_or_method' object has no attribute '__getitem__'
my code is :
with open('XXXXXX.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k,v) in row.items():
            columns_95[k].append(v)
sVaR5 = columns_95['95%']
mean_95 = sum(sVaR5)/len(sVaR5)
and my csv looks like:
95% 99%
1.225 2.332
1.252 10.252
2.336 4.213
... ...
When I check my list, the output is ['1.225','1.252','2.336']. I think the quote marks may be the reason for the error, but how do I fix it? Thanks!
sum is a function. If you want to call the function sum with the argument sVaR5, you need to write:
sum(sVaR5)
If your sVaR5 is a list of strings, you could convert them to floats for the sum:
sum(map(float, sVaR5))
If you write sum[sVaR5], Python tries to call __getitem__ on the object sum, hence the error:
'builtin_function_or_method' object has no attribute '__getitem__'
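Putting both points together, a minimal sketch of the mean calculation might look like this. The list literal simply repeats the three sample values shown in the question.

```python
# Sample values as they come out of csv.DictReader: strings, not numbers.
sVaR5 = ['1.225', '1.252', '2.336']

# Convert each string to a float before summing.
values = [float(v) for v in sVaR5]

# Call sum with parentheses; square brackets would index the function object.
mean_95 = sum(values) / len(values)
```

Converting to float first also avoids integer-division surprises on Python 2.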

converting python pandas column to numpy array in place

I have a csv file in which one of the columns is a semicolon-delimited list of floating point numbers of variable length. For example:
Index List
0 900.0;300.0;899.2
1 123.4;887.3;900.1;985.3
When I read this into a pandas DataFrame, the datatype for that column is object. I want to convert it, ideally in place, to a numpy array (or just a regular float array, it doesn't matter too much at this stage).
I wrote a little function which takes a single one of those list elements and converts it to a numpy array:
def parse_list(data):
    data_list = data.split(';')
    return np.array(map(float, data_list))
This works fine, but what I want to do is do this conversion directly in the DataFrame so that I can use pandasql and the like to manipulate the whole data set after the conversion. Can someone point me in the right direction?
EDIT: I seem to have asked the question poorly. I would like to convert the following data frame:
Index List
0 900.0;300.0;899.2
1 123.4;887.3;900.1;985.3
where the dtype of List is 'object'
to the following dataframe:
Index List
0 [900.0, 300.0, 899.2]
1 [123.4, 887.3, 900.1, 985.3]
where the datatype of List is numpy array of floats
EDIT2: some progress, thanks to the first answer. I now have the line:
df['List'] = df['List'].str.split(';')
which splits the column in place into arrays, but the dtype remains object. When I then try to do
df['List'] = df['List'].astype(float)
I get the error:
return arr.astype(dtype)
ValueError: setting an array element with a sequence.
If I understand you correctly, you want to transform your data from pandas to numpy arrays.
I used this:
pandas_DataName.as_matrix(columns=None)
And it worked for me.
For more information visit here
I hope this could help you.
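For the per-row conversion described in the question, the parsing step itself can be sketched without any libraries. The version below returns plain lists of floats rather than numpy arrays, and the sample rows are copied from the question. With pandas, the same function could be passed to df['List'].apply(parse_list), although the column dtype would still be reported as object because each cell holds a list.

```python
def parse_list(data):
    """Split a semicolon-delimited string and convert each piece to float."""
    return [float(x) for x in data.split(';')]

# The two sample rows from the question's List column.
rows = ['900.0;300.0;899.2', '123.4;887.3;900.1;985.3']

# Apply the conversion to every row; rows may have different lengths.
parsed = [parse_list(r) for r in rows]
```

The list comprehension is used instead of map so the result is a real list on both Python 2 and Python 3.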

Pandas HD5-query, where expression fails

I want to query an HDF5 file. I do
df.to_hdf(pfad,'df', format='table')
to write the dataframe on disc.
To read I use
hdf = pandas.HDFStore(pfad)
I have a list that contains numpy.datetime64 values called expirations and try to read the portion of the hd5 table into a dataframe, that has values between expirations[1] and expirations[0] in column "expiration". Column expiration entries have the format Timestamp('2002-05-18 00:00:00').
I use the following command:
df = hdf.select('df',
where=['expiration<expiration[1]','expiration>=expirations[0]'])
However, this fails and produces a value error:
ValueError: The passed where expression: [expiration=expirations[0]]
contains an invalid variable reference
all of the variable refrences must be a reference to
an axis (e.g. 'index' or 'columns'), or a data_column
The currently defined references are: index,columns
Can you try this code:
df = hdf.select('df', where='expiration < expirations[1] and expiration >= expirations[0]')
or, after reading the table into a DataFrame, as a query:
df = hdf['df'].query('expiration < @expirations[1] and expiration >= @expirations[0]')
I am not sure which one fits your case best. I noticed you are trying to use 'where' to filter rows without a string or a list; does that make sense?