Replace a string containing parentheses with a float in pandas - regex

I have a dataset with a column of strings that I want to convert to floats. However, the column has a single entry containing a number within parentheses (which is meant to represent a negative number). I tried different ways, both indirect and direct, to replace the value with a representation that I could convert to float, but they all failed and I don't understand why:
Here is the row with the digit under parentheses as a string:
My code:
mask1 = purchases.Amount.str.contains('\(').fillna(False)
purchases.loc[mask1, :]['Amount'] = purchases.loc[mask1, :]['Amount'].str.replace('\(', '-').str.replace('\)', '')
purchases.loc[mask2, :]['Amount'] = purchases.loc[mask2, :]['Amount'].str.replace('\s+', '').str.replace('[a-z]+', '')
# Both fail to replace
purchases.loc[mask1, :]['Amount'] = '-29.99' # direct assignment also fails
The result:
What am I doing wrong? How can I correct it?

Use str.strip to remove the trailing ), then replace ( with - and finally convert to floats:
df = pd.DataFrame({'Amount': ['(29.29)', '(39.39)', '12.5', '340']})
df['Amount'] = df['Amount'].str.strip(')').str.replace('(', '-', regex=False).astype(float)
print (df)
Amount
0 -29.29
1 -39.39
2 12.50
3 340.00
Your solution is very close; you only need to pass the column name to loc to avoid chained indexing:
mask1 = purchases.Amount.str.contains(r'\(').fillna(False)
purchases.loc[mask1, 'Amount'] = purchases.loc[mask1, 'Amount'].str.replace(r'\(', '-', regex=True).str.replace(r'\)', '', regex=True)
purchases.loc[mask2, 'Amount'] = purchases.loc[mask2, 'Amount'].str.replace(r'\s+', '', regex=True).str.replace('[a-z]+', '', regex=True)
purchases.loc[mask1, 'Amount'] = '-29.99'
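The reason the column label has to go inside loc: purchases.loc[mask1, :]['Amount'] = ... first builds a temporary object with loc[mask1, :] and then assigns into that temporary, so the original frame is typically left unchanged. Below is a minimal sketch of the single-step assignment, assuming Python 3 and a recent pandas. The sample frame and the name mask are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's purchases frame.
purchases = pd.DataFrame({"Amount": ["(29.99)", "12.50", None, "340"]})

# na=False makes missing values count as "no match" in the mask.
mask = purchases["Amount"].str.contains("(", regex=False, na=False)

# One .loc call with both the row mask and the column label writes into
# the original frame rather than into a temporary copy.
purchases.loc[mask, "Amount"] = "-" + purchases.loc[mask, "Amount"].str.strip("()")

# Convert the cleaned strings to floats; missing values become NaN.
purchases["Amount"] = pd.to_numeric(purchases["Amount"])
print(purchases)
```

Passing regex=False treats "(" as a literal character, which avoids the escaping questions that come up with '\('.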

Why not just check whether the string is surrounded by parentheses and, if it is, strip them and negate the value?
from decimal import Decimal
def get_amount(s):
    if s[0] == '(' and s[-1] == ')':
        return -Decimal(s[1:-1])  # parentheses mark a negative amount
    else:
        return Decimal(s)

You can try:
df = pd.DataFrame({'Amount': ['(29.29)', '29.29']})
print(df)
df['Amount']=df.Amount.apply(lambda x: -float(x[1:-1]) if x[0] == '(' else float(x))
print(df)
print(df.dtypes)
Result:
Amount
0 (29.29)
1 29.29
Amount
0 -29.29
1 29.29
Amount float64
dtype: object

Related

Replace items in list if condition

I need to replace the temperature values in a list depending on whether they are negative or positive, and get rid of the decimals at the same time. For example, the value '-0.81' should become '-1' (rounded) and '0.88' should become '1'.
myList = ['-1.02', '-1.03', '-0.81', '-0.17', '-0.07', '0.22', '0.88', '0.88', '0.69']
for i in range(len(myList)):
    if myList[i][0] == '-' and int(myList[i][-2]) > 5:
        do sth...
At the end I need a new list with the new values. Thank you for any tips.
Your code is already almost there. It's not necessary to reference the elements by index.
myList = ['-1.02', '-1.03', '-0.81', '-0.17', '-0.07', '0.22', '0.88', '0.88', '0.69']
for i in myList:
    if i[0] == '-' and int(i[-2]) > 5:
        do sth...
If all you want to do is rounding then you can use a list comprehension.
roundlist = [round(float(i)) for i in myList]
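As a sketch of how that comprehension behaves in Python 3 (an assumption; the thread does not state a version): round() on a float returns an int, so wrapping it in str() gives the string form the question asks for. Python 3's round() also rounds exact halves to the nearest even integer, so the hypothetical helper round_half_up below shows one way to get "school" rounding with the decimal module if that matters for the data.

```python
from decimal import Decimal, ROUND_HALF_UP

my_list = ['-1.02', '-1.03', '-0.81', '-0.17', '-0.07', '0.22', '0.88', '0.88', '0.69']

# round() returns an int in Python 3, so str() yields '-1', '0', '1', ...
rounded = [str(round(float(x))) for x in my_list]
print(rounded)

def round_half_up(text):
    """Round a numeric string to the nearest integer, with halves going away from zero."""
    return str(int(Decimal(text).quantize(Decimal('1'), rounding=ROUND_HALF_UP)))

# Exact halves are where the two approaches differ.
print(round(0.5), round_half_up('0.5'))
```

For the values in the question the two approaches agree, because none of them sits exactly on a half.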
You could parse the string into a number, check which way it rounds (whether the fractional part is below 0.5 or not), and convert it back to a string:
import math
myList = ['-1.02', '-1.03', '-0.81', '-0.17', '-0.07', '0.22', '0.88', '0.88', '0.69']
result = [0] * len(myList)
for i in range(len(myList)):
    num = float(myList[i])
    if num - math.floor(num) < 0.5:
        result[i] = str(math.floor(num)) # round down
    else:
        result[i] = str(math.ceil(num)) # round up
print(result)

Converting to date format in pandas

I have a dataframe that contains a column which holds:
Date:
31062005
072005
12005
2012
I would like to convert these dates to the format:
Date:
31/06/2005
07/2005
01/2005
2012
What is the simplest way to do this? The fields are not in a date format yet, only strings.
Here:
df = pd.DataFrame(['30/06/2005', '07/2005', '1/2005', '2012'], columns=['Date'])
temp = pd.DataFrame(df['Date'].str.split('/').apply(reversed).tolist())\
.fillna('01')
df['Date'] = pd.to_datetime(temp[0].str.cat(temp[1].str.zfill(2))\
.str.cat(temp[2].str.zfill(2)), format='%Y%m%d')
Suppose you write a function:
def convert_date(s):
    if len(s) == 4:
        return s
    elif len(s) < 7:
        return s[:-4].zfill(2) + '/' + s[-4:]
    else:
        return s[:-6].zfill(2) + '/' + s[-6:-4].zfill(2) + '/' + s[-4:]
Then if your dates are in df.dates, you can use
>>> df.dates.apply(convert_date)
0    31/06/2005
1       07/2005
2       01/2005
3          2012
Name: dates, dtype: object
Note that this converts a string in one form to a string in a different form, meaning you can't really manipulate dates further. If you want to do that, I'd suggest you amend the preceding function to use the appropriate datetime.datetime.strptime for the format matching the length of the string. It could look something like this:
import datetime
def convert_date(s):
    if len(s) == 4:
        return datetime.datetime.strptime(s, '%Y')
    elif len(s) < 8:
        return datetime.datetime.strptime(s, '%m%Y')
    else:
        return datetime.datetime.strptime(s, '%d%m%Y')
Note that your first date (with the 31 days) seems illegal, though.
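A minimal sketch of the length-based parsing described above, assuming Python 3 and that short strings are missing only leading zeros. The name parse_date and the choice to return None for impossible dates (such as 31 June) are illustrative assumptions, not part of the original answers.

```python
from datetime import datetime

def parse_date(s):
    """Parse 'YYYY', '[M]MYYYY' or '[D]DMMYYYY' strings; return None if the date is invalid."""
    if len(s) == 4:
        fmt = '%Y'
    elif len(s) <= 6:
        s, fmt = s.zfill(6), '%m%Y'      # pad a one-digit month, e.g. '12005' -> '012005'
    else:
        s, fmt = s.zfill(8), '%d%m%Y'    # pad a one-digit day
    try:
        return datetime.strptime(s, fmt)
    except ValueError:
        # strptime rejects calendar-impossible dates such as 31 June.
        return None

for raw in ['31062005', '30062005', '072005', '12005', '2012']:
    print(raw, parse_date(raw))
```

Missing components default to 1 in strptime, so '2012' becomes 1 January 2012 and '072005' becomes 1 July 2005.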

Convert from decimal to binary - python

I'm having an issue with this piece of code I wrote. I'm trying to read an integer and print its binary equivalent. For example, for 5 it should output '101', but it prints just '10', as if it doesn't take the last digit into account. Any comments would be greatly appreciated.
T = raw_input()
for i in range(0, int(T)):
    n = raw_input()
    dec_num = int(n)
    cnv_bin = ''
    while dec_num//2 > 0:
        if dec_num%2 == 0:
            cnv_bin += '0'
        else:
            cnv_bin += '1'
        dec_num = dec_num//2
    print cnv_bin[::-1]
while dec_num//2 > 0:
should be:
while dec_num > 0:
The first time through the loop, 5//2==2, so it continues.
The second time through the loop, 2//2==1, so it continues.
The third time, 1//2==0 and the loop quits without handling the last bit.
Also, you can just do the following to display a number in binary:
print format(dec_num,'b')
Format string version:
print '{0} decimal is {0:b} binary.'.format(5)
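For reference, here is the corrected loop as a small self-contained function. It is a sketch written for Python 3 (the question itself uses Python 2), and the name to_binary and the special case for 0 are additions for illustration.

```python
def to_binary(n):
    """Return the binary digits of a non-negative integer as a string."""
    if n == 0:
        return '0'          # the loop below would otherwise produce an empty string
    digits = []
    while n > 0:            # loop until every bit, including the last one, is consumed
        digits.append(str(n % 2))
        n //= 2
    return ''.join(reversed(digits))

print(to_binary(5))
print(format(5, 'b'))
```

Both calls print the same digits, which makes format(n, 'b') a convenient check for a hand-written version.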
Why not use the built-in function bin()? For example:
bin(5)
Output:
0b101
If you don't want the 0b prefix, you can slice it off:
bin(5)[2:]
import math
def roundup(n):
    return math.ceil(n)
D = int(input("Enter The Decimal Value: "))  # int() rather than eval(): never evaluate raw user input
n = roundup(math.log2(D+1))-1
bi = 0
di = D
qi = 0
i = n
print("Binary Value:",end = " ")
while(i>=0):
    qi = math.trunc(di/2**i)
    bi = qi
    print(bi,end = "")
    di = di - bi*(2**i)
    i = i-1

Assigning variable from csv file based on header using python

I have a csv file that has a single record in it that I need to assign variables to and run through a python script. My data looks like the lines below. It has a header row.
"CompanyName","Contact","Street","CityZip","Store","DateRec","apples","appQuan","oranges","orgQuan","peaches","peaQuan","pumpkins","pumQuan","Receive",0
"American Grocers","Allison Smith","456 1st. Street","Podunk, California 00990","Store 135 Order","05/14/2015",1,10,0,4,1,4,2,0
Each value needs to be assigned a variable
1st position, "American Grocers" = CompanyName
2nd position, "Allison Smith" = Contact
3rd position = Street, etc.
After the date it gets tricky. The last 11 values are related to each other and get saved to a key.
If value 7 = 1, then variable 7 = "apples" and variable 8 = 10, else skip values 7 and 8 and go to 9
If value 9 = 1, then variable 9 = "oranges" and value 10 = the variable in position 10 (4), else skip values 9 and 10 and go to 11
If value 11 = 1, then variable 11 = "peaches" and value 12 = the variable in position 12 (4), else skip values 11 and 12 and go to 13
If value 13 = 1, then variable 13 = "pumpkins" and value 13 = the variable in position 13 (2), else skip values 13 and 14
If value 15 = 1 then variable 15 = "Delivery", else variable = "Pick up"
thus python would assign the following:
CompanyName = "American Grocers"
Contact = "Allison Smith"
Street = "456 1st. Street"
CityZip = "Podunk, California 00990"
Store = "Store 135 Order"
OrderDate (does not need to be date type) = "05/14/2015"
orderList = {"apples" : 10, "peaches" : 4, "pumpkins" : 2}
Receive = "Pick up"
I need to manipulate these variables further along in the script.
I have the following code, which outputs each value alongside its corresponding header.
import csv
MY_FILE = "C:\\tests\\DataRequestsData\\qryFruit.csv"
def parse(raw_file, delimiter):
    opened_file = open(raw_file)
    csv_data = csv.reader(opened_file, delimiter=delimiter)
    parsed_data = []
    fields = csv_data.next()
    for row in csv_data:
        parsed_data.append(dict(zip(fields, row)))
    opened_file.close()
    return parsed_data
def main():
    new_data = parse(MY_FILE, ",")
    print new_data
if __name__ == "__main__":
    main()
Output looks like this. (I am not sure why the output is not in the same order as the file ...)
[{'DateRec': '05/14/2015', 'orgQuan': '4', 'CompanyName': 'American Grocers', 'appQuan': '10', 'peaQuan': '4', 'oranges': '0', 'peaches': '1', 'Contact': 'Allison Smith', 'CityZip': 'Podunk, California 00990', 'pumpkins': '2', 'apples': '1', 'pumQuan': '0', 'Store': 'Store 135 Order', 'Street': '456 1st. Street'}]
I do not know how to take this and get the variables assigned as listed above. Suggestions? Using python 2.7
I am not sure why the output is not in the same order as the file ...
In Python 2.7, dictionary entries are displayed in an arbitrary order (dictionaries only preserve insertion order from Python 3.7 onward).
Below is the general outline of how to parse the file. The detailed logic ("if this field is this, do this, otherwise do that") is something that I hope you can do on your own. The specifics are too long and too particular to this case to be of interest or value to anyone else, and I'm guessing that is why there hasn't been much interest in this.
import csv
MY_FILE = "C:\\tests\\DataRequestsData\\qryFruit.csv"
def parse(raw_file, delimiter):
    parsed_data = []
    with open(raw_file) as opened_file:
        csv_data = csv.reader(opened_file, delimiter=delimiter)
        fields = csv_data.next()
        for row in csv_data:
            rec = {}  # fresh record per row, so rows do not share one dict
            for i, val in enumerate(row[0:6]):
                rec[fields[i]] = val
            # The part below is too specific, long, and complicated
            # for filling it out in detail to be of use or
            # interest to anyone else on stackoverflow. But to give
            # you an idea of how to proceed...
            if row[6] == '1':
                rec[fields[6]] = 'apples'
                rec[fields[7]] = 10
            else:
                # continue
                pass
            # ...
            parsed_data.append(rec)
    return parsed_data
def main():
    new_data = parse(MY_FILE, ",")
    print new_data
if __name__ == "__main__":
    main()
I finally accomplished what I was looking for with the following code: Thank you to all who offered help. Your ideas spurred me onto the tangents I needed.
import csv
MY_FILE = csv.reader(open("C:\\tests\\DataRequestsData\\qryFruit.csv", "rb"))
for row in MY_FILE:
    CompanyName, Contact, Street, CityZip, Store, DateRec, apples, appQuan, oranges, orgQuan, peaches, peaQuan, pumpkins, pumQuan, Receive = row
    s='{'
    if apples == "1":
        s = s + '"apples"' + ":" + appQuan
    if oranges == "1":
        s = s + '", "oranges"' + ":" + orgQuan
    if peaches == "1":
        s = s + '", "peaches"' + ":" + peaQuan
    if pumpkins == "1":
        s = s + '", "pumpkins"' + ":" + pumQuan
    s = s + '}'
    if Receive == "0":
        Receive = "Pick up"
    else:
        Receive = "Deliver"
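Building the order list by string concatenation leaves stray quote characters in s and yields a string rather than a dictionary. A possible alternative, sketched here for Python 3 with csv.DictReader, builds a real dict keyed by the header names. The sample text is an assumption made for illustration: its row is adjusted so that every flag is 0 or 1, the pumpkin quantity is 2, and a Receive value is present. parse_orders and ITEMS are hypothetical names.

```python
import csv
import io

# Illustrative sample only (assumption): adjusted from the question's data.
SAMPLE = (
    '"CompanyName","Contact","Street","CityZip","Store","DateRec",'
    '"apples","appQuan","oranges","orgQuan","peaches","peaQuan",'
    '"pumpkins","pumQuan","Receive"\n'
    '"American Grocers","Allison Smith","456 1st. Street",'
    '"Podunk, California 00990","Store 135 Order","05/14/2015",'
    '1,10,0,4,1,4,1,2,0\n'
)

# (flag column, quantity column) pairs taken from the header names.
ITEMS = [("apples", "appQuan"), ("oranges", "orgQuan"),
         ("peaches", "peaQuan"), ("pumpkins", "pumQuan")]

def parse_orders(text):
    """Return one dict per data row, with the fruit columns folded into 'orderList'."""
    orders = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {key: row[key] for key in
                  ("CompanyName", "Contact", "Street", "CityZip", "Store", "DateRec")}
        # Keep a fruit only when its flag column is "1".
        record["orderList"] = {flag: int(row[qty]) for flag, qty in ITEMS if row[flag] == "1"}
        record["Receive"] = "Deliver" if row["Receive"] == "1" else "Pick up"
        orders.append(record)
    return orders

orders = parse_orders(SAMPLE)
print(orders[0]["orderList"])
print(orders[0]["Receive"])
```

To read from a file instead, the io.StringIO(text) argument can be replaced by a file object opened with newline=''.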

something wrong with my password generator

I made a password generator (I'm only 16, so it's probably not the best). It outputs eight 0s and 1s, like 01100101, and then underneath that it outputs the password. When there is a "10" in the password, like FG4v10Y6, it adds another character, so instead of FG4v10Y6 it would be FG4v10Y6M. The password ends up with nine or more characters, depending on how many "10"s are in it.
I'm not sure why it's doing this. Please help. Thanks!
import pygame
import random
pygame.init()
#letters
reg = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
CAP = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
final_pass = []
num_let_list = []
new_list = []
i = 0
file = open("password_test","w")
def num_list_gen(num_list):
    for i in range(8):
        num_let_list.append(random.randint(0,1))
        i += 1
    for each in num_let_list:
        each = str(each)
        new_list.append(each)
    print ''.join(new_list)
def CAP_reg_num(final_pass,num_let_list,CAP,reg):
    for each in num_let_list:
        if each == 0:
            cap_reg = random.randint(0,1)
            if cap_reg == 0:
                let1 = random.randint(0,25)
                final_pass.append(reg[let1])
            if cap_reg == 1:
                let1 = random.randint(0,25)
                final_pass.append(CAP[let1])
        if each == 1:
            num1 = random.randint(0,10)
            num1 = str(num1)
            final_pass.append(num1)
def main(CAP,reg,num_let_list,final_pass):
    num_list_gen(num_let_list)
    CAP_reg_num(final_pass,num_let_list,CAP,reg)
    print ''.join(final_pass)
    file.write(''.join(final_pass))
    file.close()
main(CAP,reg,num_let_list,final_pass)
Your password generator is flipping a coin to choose between adding a letter or a number. When it chooses to add a number, you choose the number to add with:
num1 = random.randint(0,10)
However, this doesn't return a single digit number. It returns one of 11 possible values: the numbers between 0 and 10 inclusive. So one time in 11, it will add the number 10 to the string, which is, of course, two digits.
You want:
num1 = random.randint(0,9)
instead to add a single digit.
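A compact sketch of the same idea in Python 3, assuming the goal is a fixed-length password where each position is a letter or a single digit with equal probability. The function name make_password is hypothetical, and random.choice over string.ascii_letters and string.digits stands in for the hand-written letter lists and randint indexing.

```python
import random
import string

def make_password(length=8):
    """Build a password where each position is a random letter or a single digit."""
    chars = []
    for _ in range(length):
        if random.randint(0, 1) == 0:
            chars.append(random.choice(string.ascii_letters))  # upper- or lower-case letter
        else:
            chars.append(random.choice(string.digits))         # always exactly one character
    return ''.join(chars)

print(make_password())
```

Because every iteration appends exactly one character, the result always has the requested length. For passwords that protect anything real, the standard library's secrets module is a better source of randomness than random.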