Cannot multiply a pandas "instance method" - how to change to float - python-2.7

I am using Pandas to calculate the standard deviation of a column in a data frame then multiply it by 100 to get a percentage, and then finally print it as follows:
import pandas as pd
results = pd.read_csv("MRAret.csv")
vol = results["Return"].std
print "Volatility: ",round(vol*100,2),"%"
However I am getting the following error:
File "C:/Users/Stuart/Documents/SPYDER/MRA Analysis.py", line 37, in <module>
print "Volatility: ",round(vol*100,2),"%"
TypeError: unsupported operand type(s) for *: 'instancemethod' and 'int'
So obviously the "vol" variable type is an "instancemethod", which I have never come across before (I am new to Pandas).
I have tried changing the type to float using:
vol = float(vol)
but I get the following error:
TypeError: float() argument must be a string or a number
When I just type "vol" into my IPython console, I get this output:
In [95]: vol
-> vol()
Out[95]: 0.005856992616571794
But when I type:
print vol
I get:
In [96]: print vol
<bound method Series.std of 0 0.000000
1 0.004864
2 0.001604
...
2369 0.004290
2370 0.014001
Name: Return, dtype: float64
I don't understand how it can be one single value and an array of values at the same time.
Could someone please explain how I can manipulate the vol variable of "instancemethod" type in order to carry out arithmetic calculations?
Many thanks.

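Your error came from this typo:
vol = results["Return"].std
Essentially, vol referenced the std method itself instead of calling it. You wanted to do this:
vol = results["Return"].std()
which stores the output of that method (a single float), so vol*100 then works as expected.
This also explains what you saw in IPython: typing vol on its own triggered IPython's autocall feature, which rewrote it to vol() (the "-> vol()" line) and displayed the result, whereas print vol showed the bound method object together with the Series it belongs to.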
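A minimal, self-contained sketch of the difference, written for Python 3 and assuming a small made-up Series in place of the question's MRAret.csv:

```python
import pandas as pd

# Made-up returns standing in for the "Return" column of MRAret.csv.
returns = pd.Series([0.0, 0.004864, 0.001604, 0.004290, 0.014001], name="Return")

method = returns.std   # no parentheses: a bound method, not a number
vol = returns.std()    # parentheses: calls the method and returns a float

try:
    method * 100       # reproduces the question's TypeError
except TypeError as exc:
    print("TypeError: " + str(exc))

print(callable(method))                      # True
print("Volatility: %.2f%%" % (vol * 100))    # arithmetic works on the float
```

The only change that matters is the pair of parentheses: once the method is called, vol is an ordinary number and can be multiplied and rounded.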

Related

Problem with string to float conversion of values in pandas

My pandas DataFrame column of prices is mostly in the format r'\d+\.\d+', which is what you would expect. But when I try to convert it with astype(float), it says that I have a few numbers in the format \d+\.\d+\.\d+, like '6041.60.1'.
How do I go about converting all of them to the format \d+\.\d+ with series.str.replace()? The expected result is '6041.60'.
I'd recommend using .apply:
df1["column"] = df1["column"].apply(lambda x: ".".join(x.split(".")[:2]))  # keep only the first two dot-separated parts, e.g. '6041.60.1' -> '6041.60'
df1["column"] = df1["column"].astype("float")

How to overcome this error "TypeError: unsupported operand type(s) for -: 'float' and 'instancemethod'"

I'm working with pandas and I just came across this error. Basically, I concatenated several DataFrames into one and took the 'mean' and 'std' of each column using the following commands:
df = pd.concat(df_all)
df = df.groupby('wave').agg(['mean','std']).reset_index()
      wave       num             stlines                fwhm
                mean       std      mean       std      mean       std
0  4050.32  2.700565  1.036630  0.285702  0.007247  0.073511  0.002398
1  4208.98  4.632768  0.959788  0.484906  0.007137  0.086225  0.002070
2  4374.94  8.576271  1.299520  0.714421  0.003106  0.113164  0.001426
3  4379.74  4.248588  3.469888  0.310619  0.004290  0.091814  0.002183
4  4398.01  8.632768  3.628431  0.502670  0.007020  0.094771  0.005925
Now when I tried to plot this data:
mean = df['fwhm']['mean']
std = df['fwhm']['std']
plt.errorbar(df.wave,mean, yerr = std ,fmt='o', label='original data')
I got this error: TypeError: unsupported operand type(s) for -: 'float' and 'instancemethod'
So when I checked the type of std with type(df['fwhm']['mean']), it said it's an instancemethod.
How do I solve this issue?
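One way this error can arise is attribute access such as df.fwhm.std, which resolves to the DataFrame.std method rather than to the column named "std". A sketch, assuming made-up data with the question's column names, that selects the aggregated columns with full (level 0, level 1) tuples instead; the plotting call is left as a comment so the block runs without matplotlib:

```python
import pandas as pd

# Made-up measurements using the question's column names.
raw = pd.DataFrame({
    "wave": [4050.32, 4050.32, 4208.98, 4208.98],
    "num": [2, 3, 4, 5],
    "fwhm": [0.07, 0.08, 0.09, 0.08],
})
df = raw.groupby("wave").agg(["mean", "std"]).reset_index()

# Attribute access finds the DataFrame.std method, not the "std" column.
print(callable(df["fwhm"].std))  # True

# Full tuples select the aggregated columns unambiguously.
wave = df[("wave", "")]          # reset_index() fills the empty lower level with ""
mean = df[("fwhm", "mean")]
std = df[("fwhm", "std")]
# plt.errorbar(wave, mean, yerr=std, fmt='o', label='original data')
```

Tuple keys never collide with method names, so they are a safe habit whenever aggregated columns are called "mean", "std", "sum" and so on.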

Reading in TSP file Python

I need to figure out how to read in the data from the file 'berlin52.tsp'.
This is the format I'm using:
NAME: berlin52
TYPE: TSP
COMMENT: 52 locations in Berlin (Groetschel)
DIMENSION : 52
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 565.0 575.0
2 25.0 185.0
3 345.0 750.0
4 945.0 685.0
5 845.0 655.0
6 880.0 660.0
7 25.0 230.0
8 525.0 1000.0
9 580.0 1175.0
10 650.0 1130.0
And this is my current code:
# Open input file
infile = open('berlin52.tsp', 'r')
# Read instance header
Name = infile.readline().strip().split()[1] # NAME
FileType = infile.readline().strip().split()[1] # TYPE
Comment = infile.readline().strip().split()[1] # COMMENT
Dimension = infile.readline().strip().split()[1] # DIMENSION
EdgeWeightType = infile.readline().strip().split()[1] # EDGE_WEIGHT_TYPE
infile.readline()
# Read node list
nodelist = []
N = int(intDimension)
for i in range(0, int(intDimension)):
    x,y = infile.readline().strip().split()[1:]
    nodelist.append([int(x), int(y)])
# Close input file
infile.close()
The code should read in the file and output a list of tours with the values "1, 2, 3..." and so on, while the x and y values are stored so that distances can be calculated. It does collect the headers, at least; the problem arises when creating the list of nodes.
This is the error I get, though:
ValueError: invalid literal for int() with base 10: '565.0'
What am I doing wrong here?
This is a file in TSPLIB format. To load it in Python, take a look at the Python package tsplib95, available through PyPI or on GitHub.
Documentation is available at https://tsplib95.readthedocs.io/
You can convert the TSPLIB file to a networkx graph and retrieve the necessary information from there.
You are feeding the string "565.0" into nodelist.append([int(x), int(y)]).
It is telling you it doesn't like that because that string is not a valid integer literal: the .0 at the end makes it a float.
So if you change that to nodelist.append([float(x), float(y)]), as just one possible solution, then you'll see that your problem goes away.
Alternatively, you can try removing or separating the '.0' from your string input.
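Putting the float fix together with header parsing, here is a sketch of a small parser written for Python 3. read_tsp is a hypothetical helper name, the sample text is the first few lines from the question, and stopping at a blank line or an EOF marker is an assumption about how such files end:

```python
import io

SAMPLE = """NAME: berlin52
TYPE: TSP
COMMENT: 52 locations in Berlin (Groetschel)
DIMENSION : 52
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 565.0 575.0
2 25.0 185.0
3 345.0 750.0
"""

def read_tsp(f):
    """Hypothetical helper: parse header fields and node coordinates."""
    header = {}
    for line in f:
        line = line.strip()
        if line == "NODE_COORD_SECTION":
            break
        # Split on the first ':' so both "NAME: x" and "DIMENSION : 52" work.
        key, _, value = line.partition(":")
        header[key.strip()] = value.strip()
    nodes = []
    for line in f:  # continues from where the header loop stopped
        parts = line.split()
        if not parts or parts[0] == "EOF":
            break
        # Coordinates such as "565.0" are floats, so use float(), not int().
        nodes.append([float(parts[1]), float(parts[2])])
    return header, nodes

header, nodes = read_tsp(io.StringIO(SAMPLE))
print(header["DIMENSION"])  # 52
print(nodes[0])             # [565.0, 575.0]
```

With the real file, the same function can be called as `with open('berlin52.tsp') as f: header, nodes = read_tsp(f)`.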
There are two problems with the code above. I ran the code and found problems in the following lines:
Dimension = infile.readline().strip().split()[1]
This line should be:
`Dimension = infile.readline().strip().split()[2]`
The index should be 2 instead of 1: the line "DIMENSION : 52" splits into ['DIMENSION', ':', '52'], so index 1 gives ':' and index 2 gives '52'. Both are strings.
The second problem is with the line:
N = int(intDimension)
It should be:
N = int(Dimension)
And lastly, in the line:
for i in range(0, int(intDimension)):
simply use:
for i in range(0, N):
Now everything should be alright, I think.
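To see why the index differs between header lines, compare how str.split() breaks them apart. Taking the last field with [-1], an alternative not used in the answer above, works for both single-value layouts (though not for COMMENT, whose value contains spaces):

```python
name_fields = "NAME: berlin52".split()  # ['NAME:', 'berlin52']
dim_fields = "DIMENSION : 52".split()   # ['DIMENSION', ':', '52']

# The last field is the value for both layouts.
name = name_fields[-1]
N = int(dim_fields[-1])
print(name, N)
```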
nodelist.append([int(x), int(y)])
int(x)
The function int() can't convert x (the string "565.0") to an int because of the ".". Add these lines before the append:
x = x[:-2]
y = y[:-2]
to remove the trailing ".0".

Not calculating sum for all columns in pandas dataframe

I'm pulling data from Impala using impyla and converting it to a DataFrame using as_pandas. I'm using pandas 0.18.0 and Python 2.7.9.
I'm trying to calculate the sum of every column in a DataFrame and select the columns whose sum is greater than a threshold.
self.data = self.data.loc[:,self.data.sum(axis=0) > 15]
But when I run this, I get the following error:
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
Then I tried this:
print 'length : ',len(self.data.sum(axis = 0)),' all columns : ',len(self.data.columns)
and I get different lengths:
length : 78 all columns : 83
I also get this warning:
C:\Python27\lib\decimal.py:1150: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
To achieve my goal, I tried another way:
for column in self.data.columns:
    sum = self.data[column].sum()
    if( sum < 15 ):
        self.data = self.data.drop(column,1)
Now I get these other errors:
TypeError: unsupported operand type(s) for +: 'Decimal' and 'float'
C:\Python27\lib\decimal.py:1150: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
Then I checked the data type of each column:
print 'dtypes : ', self.data.dtypes
The result shows that every column is one of int64, object or float64.
Then I tried converting the object columns like this:
self.data.convert_objects(convert_numeric=True)
I'm still getting the same errors. Please help me solve this.
Note: none of the columns contain strings (i.e. characters) or missing/empty values. I have checked this using self.data.to_csv.
As I'm new to pandas and Python, please don't mind if this is a silly question. I just want to learn.
Please review the simple code below; it should show you the reason for the error.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.random([3,3]))
df.iloc[0,0] = np.nan
print df
print df.sum(axis=0) > 1.5
print df.loc[:, df.sum(axis=0) > 1.5]
df.iloc[0,0] = 'string'
print df
print df.sum(axis=0) > 1.5
print df.loc[:, df.sum(axis=0) > 1.5]
0 1 2
0 NaN 0.336250 0.801349
1 0.930947 0.803907 0.139484
2 0.826946 0.229269 0.367627
0 True
1 False
2 False
dtype: bool
0
0 NaN
1 0.930947
2 0.826946
0 1 2
0 string 0.336250 0.801349
1 0.930947 0.803907 0.139484
2 0.826946 0.229269 0.367627
1 False
2 False
dtype: bool
Traceback (most recent call last):
...
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
In short, your data needs additional preprocessing. To find the columns that are not numeric, use:
df.select_dtypes(include=['object'])
If they contain numbers stored as strings, you can convert them with df.astype(); otherwise you should drop them.
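The errors in the question ('Decimal' and 'float') suggest the object columns hold decimal.Decimal values, which is an assumption here rather than something the answer states. A sketch, using made-up data, of converting such columns to float before filtering by column sums:

```python
from decimal import Decimal

import pandas as pd

# Made-up data: column "b" holds Decimal objects, so its dtype is object.
data = pd.DataFrame({
    "a": [10, 20],
    "b": [Decimal("1.5"), Decimal("2.5")],
    "c": [0.5, 0.25],
})

# Convert each object column to float64 (Decimal supports float()).
obj_cols = data.select_dtypes(include=["object"]).columns
for col in obj_cols:
    data[col] = data[col].astype(float)

# Every column now takes part in the sum, so the boolean mask aligns.
sums = data.sum(axis=0)
selected = data.loc[:, sums > 15]
print(list(selected.columns))  # ['a']
```

The loop over columns avoids passing a dict to astype(), which pandas 0.18 (the version in the question) does not yet accept.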

Python: function, for loop, error message

I am pretty new to Python and am practicing with Codecademy. I'm getting a strange error message with the function below. I don't understand it, as the code looks logically and syntactically correct to me. Can anyone see the issue?
def compute_bill(food):
    total = 0
    for item in food:
        total = total + item
    return total
Oops, try again.
compute_bill(['apple'])
resulted in a
TypeError: unsupported operand type(s) for +: 'int' and 'str'
You cannot add a string to an integer.
See TypeError in the Python docs.
Call the function like this:
compute_bill([1])
compute_bill([10,20,30])
OR
apple = 10
orange = 20
compute_bill([apple,orange])
As @Rilwan said in his answer, you cannot add a string to an integer. Since you are working on Codecademy (I have completed a similar assignment), I believe you have to look up the cost of each food item you pass to the function in a dictionary and then calculate the total.
food_cost = { "apples" : 20, "oranges" : 40}
def compute_bill(food):
    total = 0
    for item in food:
        total = total + food_cost[item]
    return total
compute_bill(['apples'])
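The same dictionary lookup can be written more compactly with the built-in sum() and a generator expression; a sketch using the answer's food_cost dictionary:

```python
food_cost = {"apples": 20, "oranges": 40}

def compute_bill(food):
    """Add up the price of each item, looked up by name in food_cost."""
    return sum(food_cost[item] for item in food)

print(compute_bill(["apples"]))             # 20
print(compute_bill(["apples", "oranges"]))  # 60
```

An item missing from food_cost still raises KeyError, exactly as in the loop version.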