Excel columns made up of different merged cells - regex

I'm trying to tidy up a sheet with the following problem, and would appreciate any advice.
My sheet has 7 "master columns" and about 4000 rows. It was compiled by converting a load of PDF documents.
The master columns are made up of merged minor columns, but at various parts of the data, the minor columns that make up each master column are different.
E.g. the first master column is made up of merged columns A-H for the first 30 rows, but for the next 25 rows it's made up of merged columns A-G, and so on.
As I said, overall there are still the same 7 master columns from top to bottom, but the merging is different throughout...
Can anyone think of a way to fix this without doing it all manually?

Copy your horrible spreadsheet into Word using Home > Clipboard > Paste > Paste Special > Unformatted Text, then replace ^t^t with ^t. Repeat Replace All until Word has searched the whole document and made 0 replacements. Then copy the result back into Excel.
This is not tested on your image, so there might be some issues, perhaps column misalignments (where even Word’s limited regex may help to add back tabs where suitable). The result should have no merged cells. Mind you, someone on SE described these along the lines of “A creation of the Devil to test us beyond endurance” (i.e. best avoided).
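As a rough sketch of what the repeated Replace All does, the same collapse of tab runs can be written in Python. It assumes the sheet has been exported as tab-separated text; the function name `collapse_tabs` and the sample row are made up for illustration. It shares the caveat above: a genuinely empty cell also appears as consecutive tabs, so columns can shift.

```python
import re

def collapse_tabs(line: str) -> str:
    """Replace every run of consecutive tabs with a single tab."""
    return re.sub(r"\t+", "\t", line)

# Hypothetical row: unmerged cells leave empty fields (extra tabs) behind.
row = "Name\t\t\tDate\t\tAmount"
print(collapse_tabs(row).split("\t"))
```

A single regex pass replaces the repeated ^t^t to ^t rounds, because `\t+` matches a run of any length.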

Try selecting the whole sheet and clicking the Unmerge button on the ribbon.
Going by the screenshot you provided, you can select all and unmerge, but getting the corresponding fields back in order might be challenging.
Try using a macro to combine these steps into a single action or key press.

Related

Excel formula that checks the text of a specific column and the text of a specific row and returns the data listed in the table

I am trying to display the outcome scores on one Excel sheet into another Excel sheet based on the outcome name and course.
If the text in Sheet1!C2=communication and Sheet1!E2=Comm 2010, then display Sheet1!D2 on Sheet2!B3.
If the text in Sheet1!C4=information* and Sheet1!E4=Comm 3000, then display Sheet1!D4 on Sheet2!C5.
Need to be able to use Wildcard when checking the text.
If the text in Sheet1!C6=communication and Sheet1!E6=Comm 2010, but there is no number in Sheet1!D6, leave Sheet2!B5 blank.
I have played around with a few different IF AND formulas, but I can't get the data displayed correctly.
Right now, I am building a pivot table from the data in Sheet1, then taking the table and formatting it to match the table on Sheet1 then using =IF(Pivot!C7="","",Pivot!C7). This works, but building a pivot table for each student and then formatting it to match Sheet1 is a time drain.
I'm really hoping there is a better way to do this.
Thank you!
Since you are compiling outcomes on a per-student basis and not in total it is safe to use the SUMPRODUCT() function:
The formula below is used in B3
=SUMPRODUCT((Sheet1!$E$2:$E$6=Sheet2!B$1)*(Sheet1!$C$2:$C$6=Sheet2!$A3)*(Sheet1!$D$2:$D$6))
and can be copied across and down throughout B3:C4
The formula used in B5 is different, because of the 'wildcard criterion'
=SUMPRODUCT((Sheet1!$E$2:$E$6=Sheet2!B$1)*(LEFT(Sheet1!$C$2:$C$6,11)="Information")*(Sheet1!$D$2:$D$6))
(Unless you are using Microsoft 365, having the formula itself suppress 0 values essentially doubles its length. As an alternative, given the small output range, a custom number format has been applied so that a cell whose formula result is 0 displays nothing.)
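To make the logic of the two SUMPRODUCT formulas easier to follow, here is a minimal Python sketch of the same idea: each condition evaluates to 0 or 1, and the product of the conditions and the score is summed over all rows. The function name `sumproduct_lookup`, the `wildcard` flag and the sample rows are assumptions for illustration, not part of the workbook.

```python
def sumproduct_lookup(rows, course, outcome, wildcard=False):
    """Sum scores of rows matching both the course and the outcome name.

    rows: (outcome, score, course) tuples standing in for Sheet1!C:E.
    wildcard=True matches on the start of the outcome name, like LEFT(...)=...
    """
    total = 0
    for row_outcome, score, row_course in rows:
        a, b = row_outcome.lower(), outcome.lower()  # Excel's = ignores case
        outcome_ok = a.startswith(b) if wildcard else a == b
        course_ok = row_course.lower() == course.lower()
        # Each condition acts as 0 or 1; a blank score counts as 0.
        total += int(course_ok) * int(outcome_ok) * (score or 0)
    return total

# Hypothetical data, not taken from the original sheet.
rows = [
    ("Communication", 3, "Comm 2010"),
    ("Information Literacy", 4, "Comm 3000"),
    ("Communication", None, "Comm 2010"),
]
print(sumproduct_lookup(rows, "Comm 2010", "communication"))
print(sumproduct_lookup(rows, "Comm 3000", "Information", wildcard=True))
```

A combination with no matching rows sums to 0, which is why the answer hides zeros with a number format instead of lengthening the formula.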

Remove Multiple Time Series from Table in Power Query Editor

I have a data set to which I want to attribute some new values, but since the table has multiple identical time series per unit, I end up with too many values. Unfortunately, removing duplicates does not work, since the time series repeat for every unit.
My question is: how could I remove the excess data?
The data set looks something like this right now:
enter image description here
And this would be my optimal outcome:
enter image description here
Thanks
Removing duplicates will absolutely work. Just select BOTH columns, then right-click and choose Remove Duplicates.
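The point of selecting both columns is that the duplicate check is then made on the combination of values, not on each column separately. A minimal Python sketch of that behaviour, assuming the two columns are called `unit` and `timestamp` (hypothetical names, since the original screenshots are not available):

```python
def remove_duplicates(rows, keys=("unit", "timestamp")):
    """Keep the first row for each distinct combination of the key columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

# Hypothetical data: the same time series repeated within a unit.
rows = [
    {"unit": "A", "timestamp": "2023-01"},
    {"unit": "A", "timestamp": "2023-02"},
    {"unit": "A", "timestamp": "2023-01"},  # repeat within unit A: dropped
    {"unit": "B", "timestamp": "2023-01"},  # same time, other unit: kept
]
print(remove_duplicates(rows))
```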

Power Bi dealing with repeated instruments from REDCap

I have data like this:
It comes from REDCap, and as you may be able to tell, the data in the far right columns are repeated variables about each "protocol_title" (the far left column). I.e. "Love it" and "I want a disc instead" are both about "study 2"
I've imported the data into Power Bi and currently I have this:
What I'd like is for the top left visual to only have one row per study (with columns such as principal investigator and method of image transfer, i.e. columns that had data in the first row) and a visual on the lower left with all the right-most columns.
By switching the top visual from a table to a matrix I can kinda accomplish this:
But it adds a bunch of unnecessary columns. As an alternative I thought I could add a filter to the top visual that would filter to "redcap_event_name"=="protocol_information" which would only be those top rows.... but given the visuals are linked, if I do that it removes everything from the bottom visual. I'd like to keep the link between the visuals so that if I select "study2" in the top visual, it'll highlight relevant study 2 information in the bottom one.
So my question is: what's the best approach for making the visuals I want? Are there special settings for visuals? Do I need to do something to the data first in the query? How should I go about this?
You might want to rework your data structure. At first glance, your flat source table could be split into two tables:
Protocol
Survey
This can be done in PowerQuery.
For Protocol :
Select columns A to R.
Filter on redcap_event (?) starts with "protocol_info"
Delete empty rows
For Survey
Select columns A (to keep the protocol ID and be able to link both tables), T and U.
Filter on redcap_event (?) starts with "survey"
Delete empty rows.
You should end up with two tables with a one-to-many relationship between Protocol[Protocol_ID] (column A) and Survey[Protocol_ID] (same).
And it should make everything much easier: visuals, calculations...
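The split described above can be sketched in Python to show the intended shape of the two tables. This is only an illustration of the idea, not Power Query code. The column names `principal_investigator` and `survey_comment`, the event value `survey` and the sample rows are assumptions; only `protocol_title` and `redcap_event_name` come from the question.

```python
def split_redcap(rows, protocol_fields, survey_fields):
    """Split flat REDCap rows into a Protocol table and a Survey table."""
    protocols, surveys = [], []
    for row in rows:
        event = row["redcap_event_name"]
        if event.startswith("protocol_info"):
            protocols.append({k: row[k] for k in protocol_fields})
        elif event.startswith("survey"):
            surveys.append({k: row[k] for k in survey_fields})
    return protocols, surveys

# Hypothetical flat export: one protocol row followed by repeated survey rows.
flat = [
    {"protocol_title": "study 2", "redcap_event_name": "protocol_information",
     "principal_investigator": "Dr. A", "survey_comment": ""},
    {"protocol_title": "study 2", "redcap_event_name": "survey",
     "principal_investigator": "", "survey_comment": "Love it"},
    {"protocol_title": "study 2", "redcap_event_name": "survey",
     "principal_investigator": "", "survey_comment": "I want a disc instead"},
]
protocols, surveys = split_redcap(
    flat,
    protocol_fields=["protocol_title", "principal_investigator"],
    survey_fields=["protocol_title", "survey_comment"],  # key kept for linking
)
print(protocols)
print(surveys)
```

Keeping the protocol key in both tables is what allows the one-to-many relationship, so selecting a study in one visual can still highlight its rows in the other.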

Is there an absolute column limit for Google's Charts?

I have finally gotten a column chart working for my data set. However, it only outputs fifteen columns, and the data set has 36 columns. It will output fifteen columns (or less if I limit the set to only items that are non-zero...but my boss wants all of the data shown) no matter what width the graph is set to.
Is there an absolute hard-coded column limit for graphs made by Google's Charts API, and if not, is there a way I can tell the graph to output everything?
I've just run into this myself, almost 7 years after the original problem report. Columns representing the right-side of my data are being silently un-drawn.
Let's look at the big picture. Somebody provides a charting library. They should be expected to show the data as best they can. In the case of a column chart, that would be to show the first and last columns, and then choose which intermediate columns to show based on an algorithm that takes available pixels into account. It would then let the user zoom in to see the full set of columns within the selected range. This gives the developer using the chart the freedom to show an unlimited amount of data and not have to worry that someday columns at the end are simply not drawn.
Google is already choosing to not print some of the column labels due to space constraints, so they're already halfway to understanding the big picture.
Nowhere in the documentation does it explain this truncation of columns due to space constraints, or for any other chart type that I've seen. But you sure can choose your background colors in great levels of detail.
If I had known this restriction going in, I would have chosen a different chart package and not wasted my time. My choices now are to break my "Lifetime" data into yearly graphs that fit in the available space, which is clunky as hell, or migrate to a different chart package. Thanks Google. :^(
P.S. I tried to post this as a comment to the OP, but after using SO for years I don't have enough points...

How to add rows that include formula in spreadsheet?

I am working with a spreadsheet in OpenOffice. This spreadsheet already has the formulas for each row. I need to add additional rows to this spreadsheet, but don't know how to do so in a way that copies the formulas and applies them to the new row.
For example, each row has 8 columns (A-H), and there are formulas in D,F,G,and H. The formulas apply to each row, for example the last row on the sheet is the 6th row, so the formulas read like: =+B6*C6, =+E6*B6, etc.
Let's say I need to add a 7th row that uses the same formulas, but I don't want to have to manually enter them for each new row (for example: =+B7*C7, etc.). How would I accomplish this?
Normal copy and paste will do that. That's the beauty of a spreadsheet. Although the formula looks like it says "B6" it is actually stored internally as something like "three cells to the left" so when you copy it to the row below, it is still "three cells to the left" only it appears as B7.
You can also select (click) the cell with the formula, then drag the little black square in the bottom right hand corner of your selection, down to repeat however many times you need it to.
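To illustrate the relative-reference idea, here is a simplified Python sketch that mimics what happens to a formula when it is copied down by some number of rows. It is not how OpenOffice stores formulas internally. The function name `shift_rows` is made up, and the regex deliberately ignores things like sheet names and function names that contain digits.

```python
import re

# Column part (optionally $-locked), optional $ row lock, row number.
CELL_REF = re.compile(r"(\$?[A-Z]+)(\$?)(\d+)")

def shift_rows(formula: str, offset: int) -> str:
    """Return the formula as it would read after copying it down `offset` rows."""
    def bump(match):
        col, row_lock, row = match.groups()
        if row_lock:  # "$6" is an absolute row, so it does not move
            return match.group(0)
        return f"{col}{int(row) + offset}"
    return CELL_REF.sub(bump, formula)

print(shift_rows("=+B6*C6", 1))    # relative references follow the row
print(shift_rows("=+E6*$B$6", 1))  # the $-locked reference stays put
```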
I found a way.
Select the entire row (clicking on the row number), hold Ctrl+Alt and drag the row to the line below. It will copy+insert the row.
You can also copy the entire row and "paste special" (Ctrl+Shift+V) selecting the option to shift the cells down
I just didn't find a way to insert many rows at once with the data.
(I know it's an old post, but this is to help people who find it via Google.)
Excel tables auto fill formulas as new rows are added.
As easy as it sounds to copy/fill formulas down, it is usually beyond a user's ability to either comprehend or remember to do the fill. OpenOffice, LibreOffice, etc. need to be able to do the same.
Although new rows with copied formulas can be inserted anywhere, these instructions assume that you have a spreadsheet with many data rows, with header rows above them and total rows below. They also assume that you want to add a row at the bottom of the data rows (immediately above the total rows), that the data rows contain formulas you would like to copy, and that the new data row should be included in the totals below.
The first time you do it... not so easy... After that... two clicks...
Select the row with the formulas by clicking the row number (the last data row).
Copy the row (Ctrl+C).
Press the down arrow (now on footer row possibly containing summation formulas).
Begin a special paste operation (Ctrl+Shift+V).
Change the "selection" check marks so that only "Formulas" is selected.
Choose "Down" in "Shift Cells".
Hit "OK" (or press Return) to insert the row.
Edit the summation formula (F2) and make sure the summation range is still correct. If it is not, then you can manually fix the range, but you really need to change the following LibreOffice setting:
Choose menu option "Tools" (Alt-T).
Choose "Options" (O) in the "Tools" menu.
Expand "LibreOffice Calc" (Hit the disclosure triangle there).
Select "General" in the "LibreOffice Calc" expansion list.
Put a checkmark in (click) the "Expand references when new columns/rows are inserted" option.
Now that you have the row in the copy buffer and proper setup is complete, you only need to be on the first footer row, press Ctrl+Shift+V and hit Return.