How to import money data using IMPORTXML?

I'm trying to use IMPORTXML with an XPath expression to get a currency rate, but this formula is not working:
=IMPORTXML("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"; "/gesmes:Envelope/Cube/Cube/Cube[@currency='USD']@rate")

=REGEXEXTRACT(QUERY(
IMPORTDATA("https://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml");
"where Col1 contains 'USD'"; 0); "rate='(.*)'")
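To see what the regular expression in that formula captures, here is a minimal Python sketch of the same pattern applied to a single line. The sample line and the rate value are hypothetical; it assumes the feed writes each entry with single-quoted attributes, as the pattern in the formula implies.

```python
import re

# Hypothetical line in the shape the formula's pattern expects;
# the rate value is made up for illustration.
line = "<Cube currency='USD' rate='1.0876'/>"

# Same pattern as in the REGEXEXTRACT call: capture everything
# between rate=' and the last single quote on the line.
match = re.search(r"rate='(.*)'", line)
rate = float(match.group(1))
print(rate)
```

Because `.*` is greedy, the capture runs to the last single quote on the line, which is why the pattern relies on rate being the final attribute.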

Cumsum function in python

I have the below-mentioned dataset.
https://docs.google.com/spreadsheets/d/13GCAXHp5BU4vYU6PdX40wM-Jhp--LeRd9C5oUurbVY4/edit#gid=0
I want to find the cumulative sales for different stores in one column. For example, for store 2106 the cumulative sales figure should be 176,849.
I'm using the following function:
df = df.groupby('storenumber')['sales'].cumsum()
but I am not getting the correct result.
Can someone help?
Here's what I did to solve this problem.
import pandas as pd
import numpy as np
df = pd.read_csv('data.csv') # get data frame from csv file
You won't be able to run numerical operations on your data as it is, because the Sale (Dollars) column in df is not stored as a numeric type. The following piece of code removes the dollar signs and separating commas from the Sale (Dollars) and Suggested answer columns and converts them to type float.
df[df.columns[2:]] = df[df.columns[2:]].replace(r'[\$,]', '', regex=True).astype(float)
Then, I used the following bit of code to get the total sales (the final cumulative value) for each unique Store Number.
cum_sales_by_store_number = df.groupby('Store Number')['Sale (Dollars)'].sum()
cum_sales_by_store_number = pd.DataFrame(cum_sales_by_store_number)
Output for cum_sales_by_store_number:
              Sale (Dollars)
Store Number
2106               176849.97
I hope this answers your question. Happy coding!
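The difference between a per-store total and a per-store running total can be sketched as follows. The data frame here is made up for illustration (it is not the asker's spreadsheet), and it uses .str.replace for the cleanup step rather than the replace call above.

```python
import pandas as pd

# Made-up sample data; column names follow the answer above.
df = pd.DataFrame({
    "Store Number": [2106, 2106, 2113, 2113],
    "Sale (Dollars)": ["$1,000.50", "$200.25", "$50.00", "$25.00"],
})

# Strip dollar signs and commas, then convert to float.
df["Sale (Dollars)"] = (
    df["Sale (Dollars)"].str.replace(r"[\$,]", "", regex=True).astype(float)
)

# One row per store: the total sales.
totals = df.groupby("Store Number")["Sale (Dollars)"].sum()

# One row per original row: the running total within each store.
running = df.groupby("Store Number")["Sale (Dollars)"].cumsum()

print(totals)
print(running)
```

sum() collapses each store to a single figure, while cumsum() keeps the original row count, so the last running value for a store equals its total.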

how to get formula result in excel using xlwings

What I want to do is 1) get a formula result in Excel and 2) update the values in the existing Excel file. I created and wrote the formula using "xlsxwriter". But when I tried openpyxl (or pandas) to retrieve the formula result, it returns 0. I want to use "xlwings" to solve this problem, but I have no idea how to do it. Can anyone help?
# openpyxl
import openpyxl
wb = openpyxl.load_workbook(filename=xlsx_name, data_only=True)
ws = wb["sheet1"]
print("venn_value", ws["X2"].value)
# pandas
import pandas as pd
fold_merge_data = pd.read_excel(xlsx_name, sheet_name=1)
print(fold_merge_data['Venn diagram'][:10])
Yes, xlwings can solve this problem for you because it uses pywin32 objects to interact with Excel, rather than just reading/writing xlsx or csv documents like openpyxl and pandas. This way, Excel actually executes the formula, and xlwings grabs the result.
In order to get the value you can do:
import xlwings as xw
sheet = xw.sheets.active  # if the document is open
# otherwise use sheet = xw.Book(r'C:/path/to/file.xlsx').sheets['sheetname']
result = sheet['X2'].value
Also, note that you can set the formula using, for example:
sheet['A1'].value = '=1+1'  # or '=B1*2' if you want to reference other cells
You can use this method to copy a formula or value:
import xlwings as xw
sheet = xw.sheets['Sheet1']
a2_formula = sheet.range('A2').formula
sheet.range('A2:A300').formula = a2_formula  # the formula is copied with relative references

Changing column width from excel files

I have a file like so that I am reading from excel:
Year Month Day
1 2 1
2 1 2
I want to specify the column width that Excel recognizes. I would like to do it in pandas but I don't see an option. I have tried to do it with the module StyleFrame.
This is my code:
from StyleFrame import StyleFrame
import pandas as pd
df=pd.read_excel(r'P:\File.xlsx')
excel_writer = StyleFrame.ExcelWriter(r'P:\File.xlsx')
sf=StyleFrame(df)
sf=sf.set_column_width(columns=['Year', 'Month'], width=4.0)
sf=sf.set_column_width(columns=['Day'], width=6.00)
sf=sf.to_excel(excel_writer=excel_writer)
excel_writer.save()
but the formatting isn't saved when I open the new file.
Is there a way to do it in pandas? I would even take a pure python solution to this, pretty much anything that works.
As for your question on how to remove the headers, you can simply pass header=False to to_excel:
sf.to_excel(excel_writer=excel_writer, header=False)
Note that this will still result in the first line of the table being bold.
If you don't want that behavior, you can update to 0.1.6, which I just released.
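For the column-width part of the question, one option outside StyleFrame is to set the widths with openpyxl directly. This is a minimal sketch under that assumption, not the StyleFrame approach discussed above; the file name and the temporary directory are placeholders, and the rows are the sample data from the question.

```python
import os
import tempfile

from openpyxl import Workbook, load_workbook

# Build a small sheet with the sample data from the question.
wb = Workbook()
ws = wb.active
ws.append(["Year", "Month", "Day"])
ws.append([1, 2, 1])
ws.append([2, 1, 2])

# Column widths are set per column letter.
ws.column_dimensions["A"].width = 4  # Year
ws.column_dimensions["B"].width = 4  # Month
ws.column_dimensions["C"].width = 6  # Day

# Placeholder output location for the demo.
path = os.path.join(tempfile.mkdtemp(), "widths_demo.xlsx")
wb.save(path)

# Reload the file to confirm the widths were stored in it.
saved = load_workbook(path).active
width_year = saved.column_dimensions["A"].width
width_day = saved.column_dimensions["C"].width
print(width_year, width_day)
```

Writing to a different file than the one being read also avoids overwriting the input while it is still in use.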

Python/Pandas: How do I convert from datetime64[ns] to datetime

I have a script that processes an Excel file. The department that sends it has a new system that generates it, and my script stopped working.
I suddenly got the error Can only use .str accessor with string values, which use np.object_ dtype in pandas for the following line of code:
df['DATE'] = df['Date'].str.replace(r'[^a-zA-Z0-9\._/-]', '')
I checked the type of the date columns in the file from the old system (dtype: object) vs the file from the new system (dtype: datetime64[ns]).
How do I change the date format to something my script will understand?
I saw this answer but my knowledge about date formats isn't this granular.
You can use the apply function on the dataframe column to convert it to strings. For example:
df['DATE'] = df['Date'].apply(lambda x: x.strftime('%Y-%m-%d'))
No extra import is needed for this: each value is a pandas Timestamp, which has its own strftime method.
apply() evaluates one cell at a time and applies the formatting specified in the lambda function.
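As a minimal sketch of the same idea without apply, the .dt.strftime accessor converts the whole column at once, after which the original .str.replace line works again. The sample dates are made up, and regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Made-up column with the dtype the new system produces.
df = pd.DataFrame({"Date": pd.to_datetime(["2019-01-05", "2019-02-10"])})

# Vectorized conversion from datetime64[ns] to strings.
df["DATE"] = df["Date"].dt.strftime("%Y-%m-%d")

# The string accessor is usable again on the converted column.
cleaned = df["DATE"].str.replace(r"[^a-zA-Z0-9\._/-]", "", regex=True)
print(cleaned.tolist())
```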
pd.to_datetime returns a Series of datetime64 dtype, as described here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html
To get plain Python date objects instead, use the .dt accessor:
df['DATE'] = df['Date'].dt.date
or this (after import datetime):
df['Date'].map(datetime.datetime.date)
You can use pd.to_datetime
df['DATE'] = pd.to_datetime(df['DATE'])

Pandas Read CSV with dates as DD-MMM-YY

I have a data set that looks as follows in a CSV file:
Date Sample
01-AUG-09 Sample 1
02-Aug-09 Sample 2
etc...
When I use Pandas, I read in the file with the following code:
in_file = pd.read_csv('File Name.csv', parse_dates = True)
However, it is not recognizing the date column properly. Does anybody know if the Pandas date parser can recognize dates that are in DD-MMM-YY format?
The following worked for me.
I suspect yours is much simpler to parse because it is probably tab separated? (I did an exact-width parse, which is not trivial.)
In [41]: df = pd.read_fwf(StringIO(data),widths=[9,13],parse_dates=True,index_col=0,names=['sample'],header=None,skiprows=1)
In [42]: df
Out[42]:
              sample
2009-08-01  Sample 1
2009-08-02  Sample 2
Tab separated is much simpler:
In [43]: data2 = """Data\tSample\n01-AUG-09\tSample 1\n02-Aug-09\tSample 2\n"""
In [44]: pd.read_csv(StringIO(data2),sep='\t',parse_dates=True,index_col=0)
Out[44]:
              Sample
Data
2009-08-01  Sample 1
2009-08-02  Sample 2
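If the automatic parser does not pick up the column, the format can also be spelled out. The sketch below is one way to do that, assuming tab-separated input and an English locale for the month abbreviations; parse_dd_mmm_yy is a hypothetical helper name, not part of pandas.

```python
from datetime import datetime
from io import StringIO

import pandas as pd

# Sample rows from the question, assumed to be tab separated.
data = "Date\tSample\n01-AUG-09\tSample 1\n02-Aug-09\tSample 2\n"


def parse_dd_mmm_yy(text):
    """Parse a DD-MMM-YY string such as 01-AUG-09 (hypothetical helper)."""
    # %b matches abbreviated month names regardless of case.
    return datetime.strptime(text, "%d-%b-%y")


# converters applies the function to every cell of the Date column.
df = pd.read_csv(StringIO(data), sep="\t", converters={"Date": parse_dd_mmm_yy})
print(df)
```

Stating the format explicitly avoids any ambiguity about whether the first field is the day or the year.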