I am trying to update an old Excel file with data from a new one by comparing the dates in the two files.
The objective is to update the columns for dates that already exist in the old file and to add the dates that appear only in the new file.
I also want to copy the formatting from the old columns onto the newly added columns.
So far I have tried merging the dataframes read from the two files, but I still need help with the logic.
Excel Old File
Excel New File
Could someone please help with this?
The formatting is probably worth a separate question.
Here's my approach to the merge that I think you are looking for.
In short, we use combine_first().
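import pandas as pd
# header=1: the column names are on the second row of each sheet
df_old_excel = pd.read_excel(r'C:\temp\Excel_Old_File.xlsx', header=1)
df_new_excel = pd.read_excel(r'C:\temp\Excel_New_File.xlsx', header=1)
# Index both frames by DATE so rows are aligned by date, not by position
df_old_excel = df_old_excel.set_index('DATE')
df_new_excel = df_new_excel.set_index('DATE')
# Values from the new file take priority; dates only in the old file are kept
df_merged = df_new_excel.combine_first(df_old_excel)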
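To see what combine_first() does when the dates overlap, here is a small sketch with made-up frames standing in for the two workbooks. The column name VALUE and the dates are assumptions for illustration only, not taken from the actual files.

```python
import pandas as pd

# Hypothetical stand-ins for the two workbooks, already indexed by DATE.
old = pd.DataFrame(
    {"VALUE": [10, 20]},
    index=pd.Index(["2020-01-01", "2020-01-02"], name="DATE"),
)
new = pd.DataFrame(
    {"VALUE": [25, 30]},
    index=pd.Index(["2020-01-02", "2020-01-03"], name="DATE"),
)

# Values from `new` win where both frames have the date; dates found only
# in `old` are kept, and dates found only in `new` are added.
merged = new.combine_first(old)
print(merged)
```

The result covers all three dates: 2020-01-01 keeps the old value, 2020-01-02 takes the new value, and 2020-01-03 is added from the new frame.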
We use a monthly Excel report in our Power BI project, to which we have added measure columns, and we keep the workbooks that the fields pull data from in one folder. When we get each month's updated workbook, could we delete the old one, add the new report to the folder under exactly the same name, and refresh the Power BI query to pick up the new data? All the column headers would stay the same; the only things that might change are the number of rows and the data within them. If all the names stay the same and only the data changes, would the added measure columns remain and keep working? The measure columns act as column data multipliers and filters, and it would be a pain to recreate them each month.
Thanks
Yes. If the file path and filename and sheet/table name all remain the same, Power BI won't know the difference and you shouldn't have trouble if the columns and headers stay consistent.
Additionally, if you don't want to rename the file or delete/move older files from the folder, you could do a Load from Folder query and sort by date created/modified and grab the top row instead of specifying the filename.
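The "sort by date modified and take the top row" idea can be sketched outside Power BI as well. This is a Python analogy, not Power Query code; the function name `newest_workbook` and the `*.xlsx` pattern are assumptions for illustration.

```python
from pathlib import Path
from typing import Optional


def newest_workbook(folder: str, pattern: str = "*.xlsx") -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None."""
    files = [p for p in Path(folder).glob(pattern) if p.is_file()]
    # max() with default=None handles an empty folder without raising.
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

Picking the file by modification time means older reports can stay in the folder without being renamed or moved.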
I have multiple Excel files in one folder and I loaded the entire folder into Power BI. The first Excel file serves as the sample file, and I applied some transformation steps to this table, one of which removes the top three rows. Power BI should now remove the top three rows of all the other Excel files in this folder too. However, I see that for some Excel files it only removes one row. Does anybody know what causes this? Thanks in advance.
This is probably caused by inconsistent Excel files. The most common issue that would cause the behavior you describe is hidden rows in Excel, which Power BI will read as data rows. But hey - it's Excel so the users could've done almost anything.
You can edit the Sample File query to point it at the file with issues which might give you more insight.
More generally, I would say that is a fragile query design. Instead, I would try to filter on a column, e.g. Remove Empty.
It turned out that the Excel files were merged before the top three rows were removed, so only the first three rows of the combined table were removed.
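The effect of the step order can be shown with a small pandas analogy (this is not Power Query code, and the two frames are made-up stand-ins for the Excel files):

```python
import pandas as pd

# Hypothetical files: each starts with three junk rows followed by data.
file_a = pd.DataFrame({"col": ["junk", "junk", "junk", "a1", "a2"]})
file_b = pd.DataFrame({"col": ["junk", "junk", "junk", "b1"]})

# Trim after combining: only the first three rows of the combined table go.
trim_after = pd.concat([file_a, file_b], ignore_index=True).iloc[3:]

# Trim each file first, then combine: every file loses its own top rows.
trim_before = pd.concat(
    [df.iloc[3:] for df in (file_a, file_b)], ignore_index=True
)

print(trim_after["col"].tolist())   # junk rows from file_b are still present
print(trim_before["col"].tolist())  # only the data rows remain
```

In Power BI terms, the row removal has to happen in the per-file transform rather than on the combined table.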
Will make this short and sweet - we have a massive .CSV that we are linking to PBI Desktop. Some of the header names in this .CSV were not optimal and have since been updated by the SQL backend. However, PBI is not happy with not being able to find the exact header that existed previously, and we could not find a route by which to tell the software that a header name had changed. Is there a quick solution for this?
Here is an idea to work around this issue.
As long as your headers keep changing, I recommend these steps in Power Query:
1) Find the step in your query where the headers are promoted.
2) Instead of that step, delete the first row (the one containing the headers).
3) Then add a step that renames the columns as desired.
Hope that helps
Quick.. Not so much. Easy? Relatively.
You'll need to manually edit the Power Query code in the Advanced Editor.
I recommend un-hiding the formula bar in the Query Editor and going step by step through the applied steps. Once you find a broken step, look at its code: you'll see the field names that no longer exist as plain text in the formula bar (or in the Advanced Editor view). Swap the old field names for the new ones and you should be golden.
You might even get away with a few find/replaces..
You can fix this by removing the top row as the first operation in the Query Editor. This way the code of your Power BI query will not contain any specific header name, and the columns get generic names such as 'Column1', 'Column2', etc.
You can then rename them as you need, and changing the data source will no longer cause errors.
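The same idea, naming columns by position rather than by the header text in the file, looks like this in pandas. It is an analogy to the Power Query steps, not a replacement for them; the CSV contents and column names are invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV whose header text was renamed upstream.
csv_text = "cust_nm,ord_amt\nAlice,10\nBob,20\n"

# Skip the header row and assign our own names by position, so nothing
# downstream depends on the header text stored in the file.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    skiprows=1,
    names=["customer", "amount"],
)
print(df.columns.tolist())
```

The trade-off is that this relies on the column order staying fixed, so a reordered source file would silently mislabel columns.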
Hope this helps others who run into the same problem.
I use SAS EG 6.1 to add sheets to an existing Excel file (xlsx). I use a simple PROC EXPORT with DBMS=xlsx. The data is written to the Excel file successfully.
However, it appears that the formatting in Excel is taken from the already existing sheets. There is also a difference between cells that contain numbers and cells that contain text. For instance, when I used the header 1 cell style in the existing sheet, the numbers in the exported worksheets also had this header 1 style.
Screenshot of the existing sheet: Existing Worksheet
Screenshot of the added sheet (wrong formats)
The wrongly formatted added worksheet
I tried the following things:
- Add an extra sheet without formatting and place it as the first sheet in the workbook. My thought was that the exported worksheets then wouldn't pick up the formatting either. No success.
- Add an extra sheet without formatting and place it as the last sheet in the workbook. My thought was that the exported worksheets then wouldn't pick up the formatting either. No success.
Two alternative solutions I can think of are:
1) Using the PCFILES and ranges method. I will try this and post the results.
2) Recreating the existing workbook and praying to see different results.
Has anyone experienced this and solved the problem?
Update 17-1-2016: added screenshots and tried the procedure with a fresh Excel file. The latter was not successful either.
Is there an Excel library for Django that doesn't impose a limit of 65k+ rows?
Or plan B: a dirty workaround to make xlwt produce the desired files?
You need to use the newer Excel format, xlsx, instead of the old xls format. Starting with Excel 2007, the row limit was raised from 65,536 to 1,048,576 rows per sheet. Unfortunately, I don't know Django and can't suggest a library that produces the new format.
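One Python library that writes xlsx is openpyxl. Below is a minimal sketch, assuming openpyxl is installed and acceptable for your project; the function name `build_xlsx`, the sheet name, and the column contents are made up for the example. The Django wiring (returning the bytes from a view) is not shown.

```python
import io

from openpyxl import Workbook


def build_xlsx(n_rows: int) -> bytes:
    """Write a header plus n_rows data rows to an in-memory .xlsx file."""
    wb = Workbook(write_only=True)  # streams rows instead of holding them all
    ws = wb.create_sheet("data")
    ws.append(["id", "value"])
    for i in range(n_rows):
        ws.append([i, i * 2])
    buffer = io.BytesIO()
    wb.save(buffer)  # a write-only workbook can be saved only once
    return buffer.getvalue()


payload = build_xlsx(70_000)  # more rows than the 65,536-row .xls limit
```

Write-only mode is used here because it keeps memory use low for large sheets; for small files a regular Workbook() would work just as well.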