Reading Excel with edit on read in DataFusion - google-cloud-platform

I am reading an Excel file with the Google DataFusion Wrangler plugin. The first row of the file needs to be discarded, because the headers and data start on the second row.
The problem is that when Wrangler reads a file and applies parse-as-excel, its default option is to use the first row as the header. I need help skipping the first row so that the second row becomes the header, with the data following it.
Thanks for the help!

This behavior is currently not supported by the Wrangler plugin. As you are already aware, Wrangler only looks at the first row to decode headers.
In this case, pre-processing the file to remove the first row is the easiest solution.

Yes, it is possible. You have to filter on a column that always has a value in the real data rows (never empty), and you will have to enter the column names by hand afterwards.
Inside Wrangler, go to column "A" and click the arrow to open the menu.
Then choose "Filter" and select "Remove rows" if the value is empty.
This removes the first row (assuming that row has no value in column "A").
Repeat the operation, but this time remove rows where column "A" equals the header text for column A.
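
The two filters above can be sketched in plain Python to show what they do to the rows. This is not Wrangler's API; the function name `filter_like_wrangler`, the sample rows, and the column names are all made up for illustration, and the sketch assumes the unwanted first row has an empty first cell.

```python
def filter_like_wrangler(rows, column_names, key_index=0):
    """Mimic the two filter steps on a list of rows (hypothetical helper)."""
    # Step 1: remove rows whose key column is empty (drops the unwanted first row).
    kept = [row for row in rows if row[key_index] not in ("", None)]
    # Step 2: remove rows whose key column equals that column's header text.
    header_text = column_names[key_index]
    kept = [row for row in kept if row[key_index] != header_text]
    # Step 3: attach the column names that were entered by hand.
    return [dict(zip(column_names, row)) for row in kept]


# Illustrative data: row 1 is to be discarded, row 2 holds the real headers.
rows = [
    ["", "exported sheet", ""],
    ["id", "name", "amount"],
    ["1", "alpha", "10.5"],
    ["2", "beta", "7.0"],
]
cleaned = filter_like_wrangler(rows, ["id", "name", "amount"])
```

Only the two data rows survive, each keyed by the hand-entered names.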

Related

Informatica PowerCenter can't read the file properly; all data always appears in one column

I have a problem with Informatica PowerCenter. When I import data from a flat CSV file, all the data always appears in one column. I have to edit the file first and use Define Name in Excel before Informatica can read all the data properly. How can I read the data properly in PowerCenter without defining names in Excel first?
Thank you
You need to ensure the following:
You are reading the file definition as delimited. The flat file wizard lets you define it as delimited.
While importing, set it to read the column names from the first row.
Then start reading the data from the second row.
You can check this image.
https://2.bp.blogspot.com/-enDSMKLYyRY/UXADBtNE8WI/AAAAAAAAAu8/oVfr6IsAl8Y/s1600/8.jpg
If you set the properties above, Informatica should be able to read the definition properly, and you don't have to set column names or datatypes.
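
As a rough illustration of why the delimiter setting matters, here is a small Python sketch using the standard `csv` module (this is not PowerCenter; the sample data and the `read_delimited` helper are invented). With the right delimiter the first row supplies the column names and the data starts on the second row; with the wrong one, everything lands in a single column.

```python
import csv
import io

# Invented sample file content: header on the first row, data from the second.
sample = "id,name,city\n1,Ana,Lisbon\n2,Ben,Oslo\n"


def read_delimited(text, delimiter=","):
    """Read delimited text, taking column names from the first row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


records = read_delimited(sample)                    # correct delimiter
one_column = read_delimited(sample, delimiter=";")  # wrong delimiter

print(records[0])     # three separate fields
print(one_column[0])  # the whole line ends up in one field
```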

Power BI - Model Object Names must be non empty

I made few changes in the query editor. But when I close and apply the changes. I get the error Model object names must be non empty. I tried deleting the last changes I made in query editor. The error still exists.
In Power BI, this error means you have empty columns in your data set. Delete those columns, refresh the document, and it should work.
@HuaGong is correct. Most likely one or more columns were loaded without a name.
This means that you have left one column heading blank.
Delete the column or give a name to the column.
This happens when you enable "Use First Row as a header" and there are null values in the first row.
This can also happen if the Excel file (or other source document you are using) is open on your desktop. You have to close the document and then try again.
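
To find which columns would end up without a name, one option is to check the would-be header row for blanks before loading it. The sketch below is plain Python, not Power BI or Power Query; `blank_header_positions` and the sample row are hypothetical.

```python
def blank_header_positions(header_row):
    """Return the zero-based positions of header cells that are null or blank."""
    return [
        index
        for index, name in enumerate(header_row)
        if name is None or str(name).strip() == ""
    ]


# Invented first row with a null and a whitespace-only heading.
first_row = ["Date", None, "Amount", "  "]
problem_columns = blank_header_positions(first_row)
print(problem_columns)
```

Any position reported here is a column to delete or name before using the first row as the header.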

How to deal with header names changing in CSV data source?

Will make this short and sweet - we have a massive .CSV that we are linking to PBI Desktop. Some of the header names in this .CSV were not optimal and have since been updated by the SQL backend. However, PBI is not happy with not being able to find the exact header that existed previously, and we could not find a route by which to tell the software that a header name had changed. Is there a quick solution for this?
Here is an idea to work around this issue.
As long as your headers keep changing, I recommend these steps in Power Query:
Find the step in your query where the headers are promoted.
Instead of that step, delete the first row (the one containing the headers).
Then add a step that renames the columns as desired.
Hope that helps
Quick.. Not so much. Easy? Relatively.
You'll need to manually edit the PowerQuery in the Advanced Query side.
I recommend un-hiding the formula bar in the Query Editor and going step by step through the applied steps. Once you find a broken step, check out the PowerQuery, you'll see your no longer existent fields there as plain text in the formula bar ( or advanced editor view ). Swap out the old field names in the PowerQuery with the new names and you should be golden.
You might even get away with a few find/replaces..
You can fix this by removing the top row (the header row) as the first operation in the query editor, instead of promoting it to headers. This way the code of your Power BI query will not contain any specific header name, and your columns will get default names: 'Column1', 'Column2', etc.
You can then rename them as you need, and when the data source changes nothing will go into error.
Hope this helps for people who are also looking into the problem.
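
The idea shared by these answers (discard the source header row and assign your own names by position) can be sketched in Python with the standard `csv` module. This is not Power Query; `load_with_own_names`, the sample exports, and the column names are invented, and the sketch assumes the column order stays the same when the headers are renamed.

```python
import csv
import io


def load_with_own_names(text, names):
    """Drop the source header row and label columns by position (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text)))
    body = rows[1:]  # discard whatever header the source currently uses
    return [dict(zip(names, row)) for row in body]


# Invented exports: same data, header names changed by the backend.
old_export = "cust_nm,amt\nAna,10\n"
new_export = "customer_name,amount\nAna,10\n"

names = ["Customer", "Amount"]
before = load_with_own_names(old_export, names)
after = load_with_own_names(new_export, names)
```

Because nothing in the load step refers to the source header text, both exports produce the same result.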

Excel columns made up of different merged cells

I'm trying to tidy up a sheet with the following problem, and would appreciate any advice.
My sheet has 7 "master columns" and about 4000 rows. It was compiled by converting a load of PDF documents.
The master columns are made up of merged minor columns, but at various parts of the data, the minor columns that make up each master column are different.
eg The first master column is made up of merged columns A-H for the first 30 rows, but for the next 25 rows it's made up of merged columns A-G etc.
As I said, overall there are still the same 7 master columns from top to bottom, but the merging is different throughout...
Can anyone think of a way to fix this without doing it all manually?
Copy your horrible spreadsheet into Word with Home > Clipboard – Paste, Paste Special, Unformatted Text and replace ^t^t with ^t. Replace All repeatedly, until Word has completed its search of the document and has made 0 replacements. Copy back into Excel.
This is not tested on your image so there might be some issues – perhaps column misalignments (where even Word’s limited regex may help to add back tabs where suitable). The result should be no merged cells – mind you someone on SE described these along the lines of “A creation of the Devil to test us beyond endurance” (ie best avoided).
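
The repeated Replace All can be sketched in Python to show why more than one pass is needed. This is a plain string operation, not Word; `collapse_tabs` and the sample line are invented. Note that it also removes cells that were genuinely empty, which is one way the column misalignments mentioned above can arise.

```python
def collapse_tabs(text):
    """Replace double tabs with single tabs until none remain; count the passes."""
    passes = 0
    while "\t\t" in text:
        text = text.replace("\t\t", "\t")
        passes += 1
    return text, passes


# Invented line as it might look after pasting merged cells as unformatted text.
line = "Name\t\t\t\tAddress\t\tPhone"
cleaned, passes = collapse_tabs(line)
print(repr(cleaned), passes)
```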
Try selecting the whole sheet and clicking the Unmerge button on the ribbon.
Going by the screenshot you provided, you can select all and unmerge, but getting the corresponding fields back in order might be challenging.
Try using a macro to combine these operations into a single key press.

How to add rows that include formula in spreadsheet?

I am working with a spreadsheet in OpenOffice. This spreadsheet already has the formulas for each row. I need to add additional rows to this spreadsheet, but don't know how to do so in a way that copies the formula but applies it to the new row.
For example, each row has 8 columns (A-H), and there are formulas in D, F, G, and H. The formulas apply to each row; for example, the last row on the sheet is the 6th row, so the formulas read like: =+B6*C6, =+E6*B6, etc.
Let's say I need to add a 7th row that uses the same formulas, but I don't want to have to manually enter them for each new row so that they apply (for example: =+B7*C7, etc). How would I accomplish this?
Normal copy and paste will do that. That's the beauty of a spreadsheet. Although the formula looks like it says "B6" it is actually stored internally as something like "three cells to the left" so when you copy it to the row below, it is still "three cells to the left" only it appears as B7.
You can also select (click) the cell with the formula, then drag the little black square in the bottom right hand corner of your selection, down to repeat however many times you need it to.
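
To make the "relative reference" idea concrete, here is a small Python sketch that shifts the row numbers in a formula string the way a copy to the row below appears to. It is only an illustration, not how OpenOffice stores formulas; `shift_rows` and the regular expression are assumptions of this sketch, and it would also wrongly touch function names that end in digits.

```python
import re

# Matches simple references such as B6 or AA12; references with a "$" before
# the row number (for example B$6) are not matched and stay unchanged.
CELL_REF = re.compile(r"\b([A-Z]{1,3})(\d+)\b")


def shift_rows(formula, offset):
    """Return the formula with every relative row number moved by offset."""
    def bump(match):
        column, row = match.group(1), int(match.group(2))
        return f"{column}{row + offset}"

    return CELL_REF.sub(bump, formula)


row7_formula = shift_rows("=+B6*C6", 1)
print(row7_formula)
```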
I found a way.
Select the entire row (clicking on the row number), hold Ctrl+Alt and drag the row to the line below. It will copy+insert the row.
You can also copy the entire row and "paste special" (Ctrl+Shift+V) selecting the option to shift the cells down
I just didn't find a way to insert many rows at once with the data.
(I know it's an old post, but this is to help people finding it on Google.)
Excel tables auto fill formulas as new rows are added.
As easy as it sounds to copy/fill formulas down, it is usually beyond a user's ability to either comprehend or remember to do the fill. OpenOffice, LibreOffice, etc., need to be able to do so.
Although new rows with copied formulas can be inserted anywhere for any reason, these instructions assume that you have a spreadsheet with many data rows, with one or more header rows above them and one or more total rows below them. They also assume that you want to add a row at the bottom of the data rows (immediately above the total rows), that the data rows contain formulas you would like to copy, and that the new data row is to be included in the totals below.
The first time you do it... not so easy... After that... two clicks...
Select the row with the formulas by clicking the row number (the last data row).
Copy the row (Ctrl+C).
Press the down arrow (now on footer row possibly containing summation formulas).
Begin a special paste operation (Ctrl+Shift+V).
Change the "selection" check marks so that only "Formulas" is selected.
Choose "Down" in "Shift Cells".
Hit "OK" (or press Return) to insert the row.
Edit the summation formula (F2) and make sure the summation range is still correct. If it is not, then you can manually fix the range, but you really need to change the following LibreOffice setting:
Choose menu option "Tools" (Alt-T).
Choose "Options" (O) in the "Tools" menu.
Expand "LibreOffice Calc" (Hit the disclosure triangle there).
Select "General" in the "LibreOffice Calc" expansion list.
Put a checkmark in (click) the "Expand references when new columns/rows are inserted" option.
Now that you have the row in the copy buffer and proper setup is complete, you only need to be on the first footer row, press Ctrl+Shift+V and hit Return.