I am using the following code to create a data frame from a list:
test_list = ['a','b','c','d']
df_test = pd.DataFrame.from_records(test_list, columns=['my_letters'])
df_test
The above code works fine. Then I tried the same approach for another list:
import pandas as pd
q_list = ['112354401', '116115526', '114909312', '122425491', '131957025', '111373473']
df1 = pd.DataFrame.from_records(q_list, columns=['q_data'])
df1
But this time it gave me the following error:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-24-99e7b8e32a52> in <module>()
1 import pandas as pd
2 q_list = ['112354401', '116115526', '114909312', '122425491', '131957025', '111373473']
----> 3 df1 = pd.DataFrame.from_records(q_list, columns=['q_data'])
4 df1
/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in from_records(cls, data, index, exclude, columns, coerce_float, nrows)
1021 else:
1022 arrays, arr_columns = _to_arrays(data, columns,
-> 1023 coerce_float=coerce_float)
1024
1025 arr_columns = _ensure_index(arr_columns)
/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _to_arrays(data, columns, coerce_float, dtype)
5550 data = lmap(tuple, data)
5551 return _list_to_arrays(data, columns, coerce_float=coerce_float,
-> 5552 dtype=dtype)
5553
5554
/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _list_to_arrays(data, columns, coerce_float, dtype)
5607 content = list(lib.to_object_array(data).T)
5608 return _convert_object_array(content, columns, dtype=dtype,
-> 5609 coerce_float=coerce_float)
5610
5611
/usr/local/lib/python3.4/dist-packages/pandas/core/frame.py in _convert_object_array(content, columns, coerce_float, dtype)
5666 # caller's responsibility to check for this...
5667 raise AssertionError('%d columns passed, passed data had %s '
-> 5668 'columns' % (len(columns), len(content)))
5669
5670 # provide soft conversion of object dtypes
AssertionError: 1 columns passed, passed data had 9 columns
Why would the same approach work for one list but not another? Any idea what might be wrong here? Thanks a lot!
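DataFrame.from_records treats each string as a sequence of characters, so it expects as many columns as the string has characters (9 here). Your first list worked only because every string in it is a single character.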
You could simply use the DataFrame constructor.
In [3]: pd.DataFrame(q_list, columns=['q_data'])
Out[3]:
q_data
0 112354401
1 116115526
2 114909312
3 122425491
4 131957025
5 111373473
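To see where the 9 in the error message comes from, here is a minimal sketch. The traceback shows `lmap(tuple, data)` being applied to the input, which the list comprehension below imitates; the names `as_rows`, `n_fields`, `df_records` and `df_plain` are only illustrative.

```python
import pandas as pd

q_list = ['112354401', '116115526', '114909312',
          '122425491', '131957025', '111373473']

# Each record is turned into a tuple of its fields. Iterating a string
# yields its characters, so a 9-character string becomes 9 fields.
as_rows = [tuple(s) for s in q_list]
n_fields = len(as_rows[0])

# Wrapping each string in a one-element tuple makes it a single-field record.
df_records = pd.DataFrame.from_records([(s,) for s in q_list], columns=['q_data'])

# The plain constructor treats a flat list as one column.
df_plain = pd.DataFrame(q_list, columns=['q_data'])
```

Both frames end up with six rows and a single q_data column; the plain constructor is the shorter route when the input is a flat list.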
In[20]: test_list = [['a','b','c'], ['AA','BB','CC']]
In[21]: pd.DataFrame(test_list, columns=['col_A', 'col_B', 'col_C'])
Out[21]:
col_A col_B col_C
0 a b c
1 AA BB CC
In[22]: pd.DataFrame(test_list, index=['col_low', 'col_up']).T
Out[22]:
col_low col_up
0 a AA
1 b BB
2 c CC
If you want to create a DataFrame from multiple lists, you can zip them. In Python 3, zip returns a 'zip' object, so convert it to a list first:
mydf = pd.DataFrame(list(zip(lstA, lstB)), columns = ['My List A', 'My List B'])
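A self-contained version of the zip approach might look like this. The contents of `lstA` and `lstB` are made up for illustration, and the dict form at the end is an alternative that is not part of the answer above.

```python
import pandas as pd

# lstA and lstB are hypothetical example lists of equal length.
lstA = ['a', 'b', 'c']
lstB = [1, 2, 3]

# zip pairs the items row by row; list() materializes the pairs.
mydf = pd.DataFrame(list(zip(lstA, lstB)), columns=['My List A', 'My List B'])

# An equivalent column-oriented form: a dict mapping column names to lists.
mydf_from_dict = pd.DataFrame({'My List A': lstA, 'My List B': lstB})
```

Note that zip stops at the shortest input, so lists of unequal length are silently truncated in the first form, while the dict form raises an error instead.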
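You can also use the concat method. Note that pd.concat only accepts pandas objects, so wrap the list in a Series first:
test_list = ['a','b','c','d']
pd.concat([pd.Series(test_list, name='my_letters')], axis=1)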
You could also take the help of numpy.
import numpy as np
df1 = pd.DataFrame(np.array(q_list),columns=['q_data'])
How is line 9 (if n + 1 == x) relevant to checking whether the number is prime?
Is there a simpler way to build this function?
def is_prime(x):
    if x == 2:
        return True
    elif x > 2:
        for n in range(2, x):
            if x % n == 0:
                return False
            else:
                if n + 1 == x:
                    return True
    else:
        return False
A prime number is an integer greater than 1 whose only divisors are 1 and itself. Here is a similar solution that may be easier to follow. It uses a pandas DataFrame and its associated 'apply' function. Remove the line that prints df and modify the output as desired. Have fun.
"""
Created on Fri Nov 18 13:32:08 2016
#author: Soya
"""
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, Series
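def isprime(x):
    # Candidate divisors 2 .. x//2 - 1 (integer division, as Python 2's x/2)
    vals = list(range(2, x // 2))
    df = DataFrame([vals]).T
    # Remainder of x divided by each candidate
    df['1'] = df[0].apply(lambda y: x % y)
    print(df)
    print('')
    # A zero remainder means a divisor exists, which makes the product zero
    if df['1'].prod() != 0:
        print('PRIME')

isprime(17)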
0 1
0 2 1
1 3 2
2 4 1
3 5 2
4 6 5
5 7 3
PRIME
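On the question about `if n + 1 == x`: that condition can only be true on the last pass of the loop, when n equals x - 1, so it is a roundabout way of saying "the loop finished without finding a divisor". A plain-Python sketch that returns after the loop instead is shown below. It is not the asker's code or the pandas answer, it stops at the square root as an optimization, and it assumes Python 3.8+ for `math.isqrt`.

```python
import math

def is_prime(x):
    """Return True if x is prime, using trial division."""
    if x < 2:
        return False
    # A composite number always has a divisor no larger than its square root.
    for n in range(2, math.isqrt(x) + 1):
        if x % n == 0:
            return False
    # Reaching this point means no divisor was found.
    return True

primes_below_20 = [n for n in range(20) if is_prime(n)]
```

Returning True after the loop removes the need for the special case x == 2 and for the check on the last value of n.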
I am a new Python user. I would like to convert the letters of the word "cook" to their ASCII values and calculate the total of the running sums. E.g., for the word "cook" the total will be (99+210+321+428)=1058. Below is my code:
import nltk
s="cook"
sum=0
for c in s:
    x=ord(c)
    sum=sum+x
    print(sum)
Output :
99
210
321
428
I want the total (1058). What do I need to add?
This appears to be the formula that you want:
x, total = 0, 0
for c in 'cook':
    x += ord(c)
    total += x
print(total)
It produces the number you want:
1058
Alternative: using numpy
>>> from numpy import sum, cumsum
>>> sum(cumsum([ord(c) for c in 'cook']))
1058
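The same calculation can also be sketched with only the standard library, using `itertools.accumulate` for the running totals. The variable names `running` and `total` are illustrative.

```python
from itertools import accumulate

# Running totals of the character codes: 99, 210, 321, 428.
running = list(accumulate(ord(c) for c in 'cook'))

# Adding up the running totals gives the requested number.
total = sum(running)
```

This keeps the intermediate values from the question's output in `running` and the final figure in `total`.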
I have a question on the usage of .loc. I couldn't find an explicit answer in the documentation.
Say I have a df like:
import pandas as pd
import numpy as np
df = pd.DataFrame({"A": np.random.random(1000), "B": np.random.random(1000)})
I want to create a 1 in a new column if a value in column A is > .1. Using some boolean logic:
crit = df['A'] > .1
Now, is using .loc this way:
df['New Column'] = 0
df['New Column'].loc[crit] = 1
Any different than:
df['New Column'] = 0
df.loc[crit, 'New Column'] = 1
Using the first way, I continually get a SettingWithCopyWarning, however the values do appear to be changing in the df.
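As far as I understand it, the two forms differ in how many indexing steps they take. `df['New Column'].loc[crit] = 1` first selects the column and then assigns into that intermediate object, which may be a view or a copy; that is what the warning is about, and with Copy-on-Write enabled (to my knowledge the default from pandas 3.0) the chained form is not expected to update df at all. `df.loc[crit, 'New Column'] = 1` is a single indexing call on df itself. The sketch below uses only the single-step form; the seeded generator and the `alt` variable are additions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the example is repeatable
df = pd.DataFrame({"A": rng.random(1000), "B": rng.random(1000)})

crit = df["A"] > .1

# One indexing call on df itself: rows selected by crit, one named column.
df["New Column"] = 0
df.loc[crit, "New Column"] = 1

# np.where builds the same values in one step, without a prior fill.
alt = np.where(crit, 1, 0)
```

Either `df.loc[crit, 'New Column'] = 1` or assigning the `np.where` result directly to the column avoids the intermediate selection.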
I am trying to clean up the data. For the first name variable, I would like to 1) assign a missing value (NaN) to entries that have only one character, 2) assign a missing value to entries that have exactly two characters when one of them is a symbol (e.g. "." or "?"), and 3) convert "wm" to the string "william".
I tried the following code and other variations, but none of them seem to work:
import pandas as pd
from pandas import DataFrame, Series
import numpy as np
import re
def CleanUp():
    data = pd.read_csv("C:\sample.csv")
    frame2 = DataFrame(data)
    frame2.columns = ["First Name", "Ethnicity"]
    # Convert weird values to missing value
    for Name in frame2["First_Name"]:
        if len(Name) == 1:
            Name == np.nan
        if (len(Name) == 2) and (Name.str.contain(".|?|:", na=False)):
            Name == np.nan
        if Name == "wm":
            Name == "william"
    print frame2["First_Name"]
You're looking for df.replace.
Make up some data:
np.random.seed(3)
n=6
df = pd.DataFrame({'Name' : np.random.choice(['wm','bob','harry','chickens'], size=n),
'timeStamp' : np.random.randint(1000, size=n)})
print df
Name timeStamp
0 harry 256
1 wm 789
2 bob 659
3 chickens 714
4 wm 875
5 wm 681
run the replace:
df.Name = df.Name.replace('wm','william')
print df
Name timeStamp
0 harry 256
1 william 789
2 bob 659
3 chickens 714
4 william 875
5 william 681
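The replace call covers the third requirement. For the first two, one possible sketch uses the vectorized `.str` methods instead of a loop; in the question's loop, `Name == np.nan` is a comparison rather than an assignment, and rebinding the loop variable would not change the frame anyway. The sample names below are made up, and the character class `[.?:]` is an assumption about which symbols count.

```python
import pandas as pd

# Hypothetical sample data standing in for the "First Name" column.
names = pd.Series(["wm", "a", "j.", "bob", "?x", "harry"], name="First Name")

# Rule 1: entries with exactly one character.
one_char = names.str.len() == 1

# Rule 2: exactly two characters, one of which is ".", "?" or ":".
# Inside a character class these symbols are matched literally.
two_with_symbol = (names.str.len() == 2) & names.str.contains(r"[.?:]", na=False)

# mask() sets the flagged entries to NaN; replace() handles rule 3.
cleaned = names.mask(one_char | two_with_symbol).replace("wm", "william")
```

With this sample, "a", "j." and "?x" become missing, "wm" becomes "william", and the other names are left unchanged.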