Error on pandas.read_hdf - python-2.7

I created an HDF5 file with:
pfad = "E:\Geld\Handelssysteme\Kursdaten\Ivolatity/Daten Monatsoptionen/ODAX_alles.h5"
df.to_hdf(pfad,'df', format='table')
Now I want to read and put a portion of the table back into a dataframe without reading all of the lines in the file.
I tried
df=pandas.read_hdf('pfad', 'df', where = ['expiration<expirations[1] and expiration>=expirations[0]'])
where expirations is a list that contains datetime64[ns] values and I want to get a dataframe where the values in column "expiration" are between expirations[1] and expirations[0].
However, I get a KeyError: 'No object named df in the file'
What would the right syntax be?

The following works instead:
hdf=pandas.HDFStore(pfad)
df=hdf.select('df')
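A possible explanation, offered as a guess: in the read_hdf call the path is written as the quoted string 'pfad', so pandas looks for a file literally named "pfad" instead of using the variable pfad. In addition, a where clause on a column generally needs that column to be declared as a data column when the table is written. The sketch below assumes Python 3, a recent pandas with PyTables installed, and uses made-up sample data, a throwaway file location and the hypothetical names start and end.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; the real table holds the option quotes.
df = pd.DataFrame({
    "expiration": pd.to_datetime(["2002-05-18", "2002-06-22", "2002-07-20"]),
    "price": [1.0, 2.0, 3.0],
})

# Throwaway location standing in for the real path.
pfad = os.path.join(tempfile.mkdtemp(), "ODAX_alles.h5")

# data_columns makes "expiration" usable in a where expression.
df.to_hdf(pfad, key="df", format="table", data_columns=["expiration"])

start = pd.Timestamp("2002-05-18")
end = pd.Timestamp("2002-07-20")

# pfad is passed as a variable here, not as the string literal 'pfad'.
part = pd.read_hdf(pfad, key="df",
                   where="expiration >= start & expiration < end")
```

Only the rows whose expiration falls in the half-open interval [start, end) should end up in part.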

Related

Convert SList to Dataframe

I am reading data from a binary .out file using a Python module "SWMMToolbox." The command to read the infiltration time series for RG1 from file.out is as follows:
x = !swmmtoolbox extract 'file.out' subcatchment,RG1,Infiltration_loss
See link for details about swmmtoolbox.
The data type of 'x' is an 'IPython.utils.text.SList'
The data looks like this:
I would like to import this SList into pandas, but am having trouble. I want to get the datetime string as one column and the value after the comma as another. However, when I use
df = pd.DataFrame(data=x)
I get the following:
I also tried to use
df = pd.DataFrame.from_records(x)
but get this:
I tried to use pd.read_csv, but I couldn't get it to work since 'x' is a variable and not a file.
Any suggestions are much appreciated.
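One approach that might work, sketched under assumptions: an SList behaves like a list of strings, so the lines can be joined and handed to pd.read_csv through an in-memory buffer instead of a file. The sample lines and the column names below are hypothetical stand-ins, since the actual output is not shown here, and the code assumes Python 3.

```python
import io

import pandas as pd

# Hypothetical stand-in for the SList returned by the `!swmmtoolbox ...` call;
# an SList behaves like a list of strings, one per output line.
x = [
    "2014-01-01 00:05:00,0.10",
    "2014-01-01 00:10:00,0.25",
    "2014-01-01 00:15:00,0.00",
]

# Join the lines into one CSV document and parse it from memory.
buffer = io.StringIO("\n".join(x))
df = pd.read_csv(
    buffer,
    header=None,                   # assumption: the output has no header row
    names=["datetime", "value"],   # hypothetical column names
    parse_dates=["datetime"],
)
```

If the real output starts with a header line, dropping header=None and names should let pandas pick the column names up from it.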

How to change value of any column of semicolon separated csv file using python

Hi, I am new to Python. I want to change the value of any column in a semicolon-separated CSV file. I have the following CSV file format:
"S. No.";"name";"number";"status";
"1";"Mac";"54";"ABC";
"2";"Jack";"34";"xyz";
I am using the following Python code:
[Image: Python code]
However, I am getting the error "list index out of range".
I have searched for similar examples, but most of them are for comma-separated CSV files. This code, without a delimiter specified, works fine for a comma-separated CSV file. I get a row value like this:
Row [" 1"; "Mac";" 54"; "ABC";]
so I am not able to access the elements of the row list. Please help me sort out the issue.
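Since the original code is not available, here is only a minimal sketch of one way this could be done with the standard csv module: passing delimiter=';' makes the reader split each line into separate fields. The in-memory file, the choice of row to change and the new value "DONE" are assumptions for illustration; the code targets Python 3.

```python
import csv
import io

# In-memory stand-in for the semicolon-separated file from the question.
source = io.StringIO(
    '"S. No.";"name";"number";"status";\n'
    '"1";"Mac";"54";"ABC";\n'
    '"2";"Jack";"34";"xyz";\n'
)

# delimiter=';' makes the reader split each line into separate fields.
# The trailing ';' on every line produces one extra empty field per row.
rows = list(csv.reader(source, delimiter=';'))

header = rows[0]
name_col = header.index("name")
status_col = header.index("status")

# Change the "status" value of the row whose "name" is "Jack".
for row in rows[1:]:
    if row[name_col] == "Jack":
        row[status_col] = "DONE"  # hypothetical new value

# Write back with the same delimiter, quoting every field.
target = io.StringIO()
writer = csv.writer(target, delimiter=';', quoting=csv.QUOTE_ALL,
                    lineterminator="\n")
writer.writerows(rows)
result = target.getvalue()
```

With a real file, open(...) calls would replace the two StringIO objects. Note that the empty trailing field is written back as "" with this quoting mode.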

Null Byte appending while reading the file through Python pandas

I have created a script that finds the matching rows between two files. After that, I return the output file to a function, which uses the file as input to create a pivot with pandas.
But something seems to be wrong. Below is the code snippet:
def CreateSummary(file):
    out_file = file
    file_df = pd.read_csv(out_file)  ## This function is appending NULL bytes at the end of the file
    #print file_df.head(2)
The above code gives me the error:
ValueError: No columns to parse from file
Tried another approach:
file_df = pd.read_csv(out_file,delim_whitespace=True,engine='python')
##This gives me error as
_csv.Error: line contains NULL byte
Any suggestions and criticism are highly appreciated.
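Where the NUL bytes come from cannot be told from the snippet alone. As a workaround sketch only, the raw content could be read as bytes, stripped of NUL bytes, and then parsed from memory. The helper name read_csv_without_nul and the sample content are hypothetical; the code assumes Python 3.

```python
import io

import pandas as pd


def read_csv_without_nul(raw_bytes):
    """Parse CSV content after dropping NUL bytes (hypothetical helper)."""
    cleaned = raw_bytes.replace(b"\x00", b"")
    return pd.read_csv(io.BytesIO(cleaned))


# Hypothetical file content with NUL padding at the end.
raw = b"id,amount\n1,10\n2,20\n" + b"\x00" * 4
file_df = read_csv_without_nul(raw)
```

For a file on disk, the bytes could be obtained with open(out_file, 'rb').read(). If the file is still open for writing elsewhere in the script, closing it before reading may be worth checking first.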

How can I calculate mean of list of strings?

I am trying to calculate the mean of one column in a CSV file. First, I read one column from the .csv file and save it into a list. Then, when I try to get the mean, I get an error:
TypeError: 'builtin_function_or_method' object has no attribute '__getitem__'
my code is :
with open('XXXXXX.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k,v) in row.items():
            columns_95[k].append(v)
sVaR5 = columns_95['95%']
mean_95 = sum(sVaR5)/len(sVaR5)
and my csv looks like:
95% 99%
1.225 2.332
1.252 10.252
2.336 4.213
... ...
When I check my list, the output is ['1.225','1.252','2.336']. I think the quotation marks may be the reason for the error, but how do I fix it? Thanks!
sum is a function. If you want to call the function sum with the argument sVaR5, you need to write:
sum(sVaR5)
If your sVaR5 is a list of strings, you could convert them to floats for the sum:
sum(map(float, sVaR5))
If you write sum[sVaR5], Python tries to call __getitem__ on the sum object, hence the error:
'builtin_function_or_method' object has no attribute '__getitem__'
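Putting the two points together, a minimal sketch of the mean calculation could look like this. It uses the three values shown in the question and introduces values as a hypothetical intermediate name.

```python
# Values as csv.DictReader returns them: strings, not numbers.
sVaR5 = ['1.225', '1.252', '2.336']

# Convert each string to a float before summing.
values = [float(v) for v in sVaR5]

# Call sum with parentheses; square brackets would index the function object.
mean_95 = sum(values) / len(values)
```

Converting once into a list keeps the numbers available for further statistics without repeating the float conversion.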

Pandas HD5-query, where expression fails

I want to query an HDF5 file. I use
df.to_hdf(pfad,'df', format='table')
to write the dataframe to disk.
To read I use
hdf = pandas.HDFStore(pfad)
I have a list called expirations that contains numpy.datetime64 values, and I try to read into a dataframe the portion of the HDF5 table whose values in column "expiration" lie between expirations[0] and expirations[1]. The entries in column expiration have the format Timestamp('2002-05-18 00:00:00').
I use the following command:
df = hdf.select('df',
                where=['expiration<expirations[1]','expiration>=expirations[0]'])
However, this fails and produces a value error:
ValueError: The passed where expression: [expiration=expirations[0]]
contains an invalid variable reference
all of the variable refrences must be a reference to
an axis (e.g. 'index' or 'columns'), or a data_column
The currently defined references are: index,columns
Can you try this code:
df = hdf.select('df', where='expiration < expirations[1] and expiration >= expirations[0]')
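or load the table first and filter it with DataFrame.query, which refers to local variables with @ (as far as I know HDFStore itself has no query method, so this reads the whole table):
df = hdf.select('df').query('expiration < @expirations[1] and expiration >= @expirations[0]')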
I'm not sure which one fits your case best. I noticed you are trying to use 'where' to filter rows without passing a single string or a list; does that make sense?
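The error message says that only index and columns can currently be referenced, which suggests that "expiration" was not stored as a data column. A minimal sketch of one way around that, assuming Python 3, a recent pandas with PyTables installed, made-up sample data, a throwaway file and the hypothetical names lo and hi:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the option table.
df = pd.DataFrame({
    "expiration": pd.to_datetime(
        ["2002-05-18", "2002-06-22", "2002-07-20", "2002-08-17"]),
    "strike": [5000, 5100, 5200, 5300],
})

path = os.path.join(tempfile.mkdtemp(), "example.h5")  # throwaway file

# Declaring "expiration" as a data column makes it queryable on disk.
df.to_hdf(path, key="df", format="table", data_columns=["expiration"])

expirations = [pd.Timestamp("2002-06-22"), pd.Timestamp("2002-08-17")]
lo, hi = expirations[0], expirations[1]

# Plain local variables can be referenced by name inside the where string.
with pd.HDFStore(path, mode="r") as hdf:
    subset = hdf.select("df", where="expiration >= lo & expiration < hi")
```

Pulling the two bounds into plain variables before the call keeps the where string simple. An existing file written without data_columns would have to be rewritten for this to apply.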