Formatting thousand separator for numbers in a pandas dataframe

Formatting thousand separator for numbers in a pandas dataframe - python-2.7

I am trying to write a dataframe to a csv and I would like the .csv to be formatted with commas. I don't see any way on the to_csv docs to use a format or anything like this.
Does anyone know a good way to be able to format my output?
My csv output looks like this:
12172083.89 1341.4078 -9568703.592 10323.7222
21661725.86 -1770.2725 12669066.38 14669.7118
I would like it to look like this:
12,172,083.89 1,341.4078 -9,568,703.592 10,323.7222
21,661,725.86 -1,770.2725 12,669,066.38 14,669.7118

Comma is the default separator. If you want to choose your own separator you can do this by declaring the sep parameter of pandas to_csv() method.
df.to_csv(sep=',')
If you goal is to create thousand separators and export them back into a csv you can follow this example:
import pandas as pd
df = pd.DataFrame([[12172083.89, 1341.4078, -9568703.592, 10323.7222],
[21661725.86, -1770.2725, 12669066.38, 14669.7118]],columns=['A','B','C','D'])
for c in df.columns:
df[c] = df[c].apply(lambda x : '{0:,}'.format(x))
df.to_csv(sep='\t')
If you just want pandas to show separators when printed out:
pd.options.display.float_format = '{:,}'.format
print(df)

What you're looking to do has nothing to do with csv output but rather is related to the following:
print('{0:,}'.format(123456789000000.546776362))
produces
123,456,789,000,000.546776362
See format string syntax.
Also, you'd do well to pay heed to #Peter 's comment above about compromising the structure of a csv in the first place.

Related

Any ideas on Iterating over dataframe and applying regex?

This may be a rudimentary problem but I am new to pandas.
I have a csv dataframe and I want to iterate over each row to extract all the string information in a specific column through regex. . (The reason why I am using regex is because eventually I want to make a separate dataframe of that column)
I tried iterating through for loop but I got ton of errors. So far, It looks like for loop reads each input row as a list or series rather than a string (correct me if i'm wrong). My main functions are iteritems() and findall() but no good results so far. How can I approach this problem?
My dataframe looks like this:
df =pd.read_csv('foobar.csv')
df[['column1','column2, 'TEXT']]
My approach looks like this:
for Individual_row in df['TEXT'].iteritems():
parsed = re.findall('(.*?)\:\s*?\[(.*?)\], Individual_row)
res = {g[0].strip() : g[1].strip() for g in parsed}
Many thanks in advance

you can try the following instead of loop:
df['new_TEXT'] = df['TEXT'].apply(lambda x: [g[0].strip(), g[1].strip()] for g in re.findall('(.*?)\:\s*?\[(.*?)\]', x), na_action='ignore' )
This will create a new column with your resultant data.

unexpected character after line continuation character. Also to keep rows after floating point rows in pandas dataframe

I have a dataset in which I want to keep row just after a floating value row and remove other rows.
For eg, a column of the dataframe looks like this:
17.3
Hi Hello
Pranjal
17.1
[aasd]How are you
I am fine[:"]
Live Free
So in this I want to preserve:
Hi Hello
[aasd]How are you
and remove the rest. I tried it with the following code, but an error showed up saying "unexpected character after line continuation character". Also I don't know if this code will solve my purpose
Dropping extra rows
for ind in data.index:
if re.search((([1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?, ind):
ind+=1
else:
data.drop(ind)

your regex has to be a string, you can't just write it like that.
re.search((('[1-9][0-9]*\.?[0-9]*)|(\.[0-9]+))([Ee][+-]?[0-9]+)?', ind):
edit - but actually i think the rest of your code is wrong too.
what you really want is something more like this:
import pandas as pd
l = ['17.3',
'Hi Hello',
'Pranjal',
'17.1',
'[aasd]How are you',
'I am fine[:"]',
'Live Free']
data = pd.DataFrame(l, columns=['col'])
data[data.col.str.match('\d+\.\d*').shift(1) == True]
logic:
if you have a dataframe with a column that is all string type (won't work for mixed type decimal and string you can find the decimal / int entries with the regex '\d+.?\d*'. If you shift this mask by one it gives you the entries after the matches. use that to select the rows you want in your dataframe.

Convert SList to Dataframe

I am reading data from a binary .out file using a python module "SWMMToolbox." The command to read the infilration time series for RG1 from the file.out is as follows:
x = !swmmtoolbox extract 'file.out' subcatchment,RG1,Infiltration_loss
See link for details about swmmtoolbox.
The data type of 'x' is a 'IPython.utils.text.SList'
The data looks like this:
I would like to import this Slist into pandas, but am having trouble. I want to get the datetime string as one column and the value after the comma as another. However, when I use
df = pd.DataFrame(data=x)
I get the following:
I also tried to use
df = pd.DataFrame.from_records(x)
but get this:
I tried to use pd.read_csv, but I couldn't get it to work since 'x' is a variable and not a file.
Any suggestions are much appreciated.

dataframe to dictionary:python

So, i have a file
F1.txt
CDUS,CBSCS,CTRS,CTRS_ID
0,0,0.000000000375,056572
0,0,4.0746,0309044
0,0,0.6182,0971094
0,0,15.4834,075614
I want to insert the column names and its dtype into a dictionary with the column names being the key and the corresponding dtype of the column being the value.
My read statement has to be like this:
csv=pandas.read_csv('F2.txt',dtype={'CTRS_ID':str})
I'm expecting something like this:
data = {'CDUS':'int64','CBSCS':'int64','CTRS':'float64','CTRS_ID':'str'}
Can someone help me with this. Thanks in advance

You can use dtypes to find the type of each column and then transform the result to a dictionary with to_dict. Also, if you want a string representation of the type, you can convert the dtypes output to string:
csv=pandas.read_csv('F2.txt',dtype={'CTRS_ID':str})
csv.dtypes.astype(str).to_dict()
Which gives the output:
{'CBSCS': 'int64', 'CDUS': 'int64', 'CTRS': 'float64', 'CTRS_ID': 'object'}
This is actually the right result, since pandas treats string as object.
I have not enough expertise to elaborate on this, but here a couple of references:
pandas distinction between str and object types
pandas string data types
"pandas doesn't support the internal string types (in fact they are always converted to object)" [from pandas maintainer #Jeff]

'~' leading to null results in python script

I am trying to extract a dynamic value (static characters) from a csv file in a specific column and output the value to another csv.
The data element I am trying to extract is '12385730561818101591' from the value 'callback=B~12385730561818101591' located in a specific column.
I have written the below python script, but the output results are always blank. The regex '=(~[0-9]+)' was validated to successfully pull out the '12385730561818101591' value. This was tested on www.regex101.com.
When I use this in Python, no results are displayed in the output file. I have a feeling the '~' is causing the error. When I tried searching for '~' in the original CSV file, no results were found, but it is there!
Can the community help me with the following:
(1) Determine root cause of no output and validate if '~' is the problem. Could the problem also be the way I'm splitting the rows? I'm not sure if the rows should be split by ';' instead of ','.
import csv
import sys
import ast
import re
filename1 = open("example.csv", "w")
with open('example1.csv') as csvfile:
data = None
patterns = '=(~[0-9]+)'
data1= csv.reader(csvfile)
for row in data1:
var1 = row[57]
for item in var1.split(','):
if re.search(patterns, item):
for data in item:
if 'common' in data:
filename1.write(data + '\n')
filename1.close()

Here I have tried to write sample code. Hope this will help you in solving the problem:
import re
str="callback=B~12385730561818101591"
rc=re.match(r'.*=B\~([0-9A-Ba-b]+)', str)
print rc.group(1)

You regex is wrong for your example :
=(~[0-9]+) will never match callback=B~12385730561818101591 because of the B after the = and before the ~.
Also you include the ~ in the capturing group.
Not exatly sure what's your goal but this could work. Give more details if you have more restrictions.
=.+~([0-9]+)
EDIT
Following the new provided information :
patterns = '=.+~([0-9]+)'
...
result = re.search(patterns, item):
number = result.group(0)
filename1.write(number + '\n')
...
Concerning your line split on the \t (tabulation) you should show an example of the full line

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Formatting thousand separator for numbers in a pandas dataframe - python-2.7

Related

Any ideas on Iterating over dataframe and applying regex?

unexpected character after line continuation character. Also to keep rows after floating point rows in pandas dataframe

Convert SList to Dataframe

dataframe to dictionary:python

'~' leading to null results in python script

Categories

Resources