Calculating median on an Excel file - regex

I want to calculate the medians for a series of numbers from an Excel file.
My Excel spreadsheet looks like this:
CELLNOUN 9.32
CELLNOUN 10.62
CELLNOUN 8.42
CELLNOUN 10.64
CELLNOUN 11.51
CELLNOUN 12.01
CELLNOUN 8.83
CELLSNOUN/CELLNOUN 9.53
CELLSNOUN/CELLNOUN 9.21
CELLNOUN/CELLSNOUN 10.76
CELLNOUN/CELLSNOUN 7.01
CELLSNOUN/CELLNOUN 10.21
PLANTNOUN/PLANTSNOUN 3.62
PLANTNOUN/PLANTSNOUN 3.38
PLANTSNOUN/PLANTNOUN 3.92
PLANTSNOUN/PLANTNOUN 3.24
PLANTNOUN/PLANTSNOUN 3.83
PLANTNOUN/PLANTSNOUN 3.24
PLANTSNOUN/PLANTNOUN 3.00
PLANTSNOUN/PLANTNOUN 1.80
...
In the spreadsheet, each set of words is separated by a blank row, but the number of entries varies from set to set: for example, CELLNOUN/CELLSNOUN has 12 entries but PLANTNOUN/ has 8 entries. The numbers after the words are, in fact, the occurrences of these words. I want to find the median of the occurrences for CELLNOUN/CELLSNOUN, PLANTNOUN/PLANTSNOUN, etc. using regex instead of the MEDIAN function in Excel, because I have thousands of sets like this and I can't do them one by one in Excel. But if you know a quicker way to do it in Excel, please advise.
Thank you very much.

First of all, remove the blank rows from your data set and then create an Excel Table with Insert > Table or Ctrl-T. With an Excel Table object, all functions and commands that refer to the table automatically pick up new data when it is added to the table.
Now you can create a pivot table from your source data with Insert > PivotTable. If you drag the first column field into the rows area, you will have a list of unique values in that source data column. You can drag the values column into the Values area of the Pivot Panel, if you want to. This should now look similar to this screenshot:
I'm not sure if you are aware of the different spellings of your categories, i.e. with or without an "S". The pivot table uncovers them all.
Out of the box, Excel PivotTables do not offer the Median as an option to aggregate, but you can use a method outlined here
http://www.myonlinetraininghub.com/calculating-median-in-pivottables
to calculate a median.
The exact approach depends on whether you use regular PivotTables or Power Pivot, so check out the article.

Use an array formula as shown below, and press Ctrl+Shift+Enter to enter it as an array formula:
=MEDIAN(IF($A$1:$A$20=A1,$B$1:$B$20))
Then copy the same formula down to all rows; the formula bar in the image below shows it applied.
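If the data can be exported as plain text, the same per-label median can also be sketched outside Excel with a regex, which is what the question originally asked about. The Python below is only an illustration under stated assumptions: the row pattern, the helper names `normalize_label` and `medians_by_label`, and the decision to treat `CELLSNOUN/CELLNOUN` and `CELLNOUN/CELLSNOUN` as the same label are all mine, not something confirmed by the asker.

```python
import re
from collections import defaultdict
from statistics import median

# Assumed row layout: a label without spaces, whitespace, then a number.
ROW = re.compile(r"^(\S+)\s+(-?\d+(?:\.\d+)?)$")


def normalize_label(label):
    """Sort the '/'-separated parts so A/B and B/A become the same label."""
    return "/".join(sorted(label.split("/")))


def medians_by_label(lines):
    """Return {normalized label: median} for every line that matches ROW."""
    groups = defaultdict(list)
    for line in lines:
        match = ROW.match(line.strip())
        if match is None:  # blank separator rows and anything unparsable
            continue
        groups[normalize_label(match.group(1))].append(float(match.group(2)))
    return {label: median(values) for label, values in groups.items()}


# A few rows copied from the question, including a blank separator row.
sample = """\
CELLSNOUN/CELLNOUN 9.53
CELLSNOUN/CELLNOUN 9.21
CELLNOUN/CELLSNOUN 10.76
CELLNOUN/CELLSNOUN 7.01
CELLSNOUN/CELLNOUN 10.21

PLANTNOUN/PLANTSNOUN 3.62
PLANTSNOUN/PLANTNOUN 3.24
""".splitlines()

result = medians_by_label(sample)
print(result)
```

If the two spellings should stay separate, drop the call to `normalize_label` and group on the raw label instead.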


Excel formula that checks the text of a specific column and the text of a specific row and returns the data listed in the table

I am trying to display the outcome scores on one Excel sheet into another Excel sheet based on the outcome name and course.
If the text in Sheet1!C2=communication and Sheet1!E2=Comm 2010, then display Sheet1!D2 on Sheet2!B3.
If the text in Sheet1!C4=information* and Sheet1!E4=Comm 3000, then display Sheet1!D4 on Sheet2!C5.
I need to be able to use a wildcard when checking the text.
If the text in Sheet1!C6=communication and Sheet1!E6=Comm 2010, but there is no number in Sheet1!D6, leave Sheet2!B5 blank.
I have played around with a few different IF AND formulas, but I can't get the data displayed correctly.
Right now, I am building a pivot table from the data in Sheet1, formatting it to match the table on Sheet1, and then using =IF(Pivot!C7="","",Pivot!C7). This works, but building a pivot table for each student and then formatting it to match Sheet1 is a time drain.
I'm really hoping there is a better way to do this.
Thank you!
Since you are compiling outcomes on a per-student basis and not in total, it is safe to use the SUMPRODUCT() function.
The formula below is used in B3:
=SUMPRODUCT((Sheet1!$E$2:$E$6=Sheet2!B$1)*(Sheet1!$C$2:$C$6=Sheet2!$A3)*(Sheet1!$D$2:$D$6))
and can be copied across and down throughout B3:C4
The formula used in B5 is different, because of the 'wildcard criterion'
=SUMPRODUCT((Sheet1!$E$2:$E$6=Sheet2!B$1)*(LEFT(Sheet1!$C$2:$C$6,11)="Information")*(Sheet1!$D$2:$D$6))
Unless you are using Microsoft 365, making the formula itself suppress 0 values essentially doubles its length. As an alternative, given the small output range, a custom number format has been applied that hides 0 in any cell where it is the formula result.
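For readers who want to check the logic of the two formulas above outside Excel, here is a rough Python sketch of the same idea: add up the scores of every row whose course and outcome both match. The sample rows, the function name `outcome_score`, and the use of `fnmatch` to stand in for the `*` wildcard are assumptions made for illustration; none of them come from the workbook.

```python
from fnmatch import fnmatchcase

# Hypothetical rows mirroring Sheet1:
# (outcome in column C, score in column D, course in column E).
rows = [
    ("Communication", 3, "Comm 2010"),
    ("Information Literacy", 4, "Comm 3000"),
    ("Communication", None, "Comm 3000"),  # no score entered
]


def outcome_score(rows, outcome_pattern, course):
    """Sum scores where the course matches and the outcome fits the pattern.

    Matching ignores case. Returns "" instead of 0 so that an empty
    result can be shown as a blank cell.
    """
    total = 0
    for outcome, score, row_course in rows:
        same_course = row_course.lower() == course.lower()
        same_outcome = fnmatchcase(outcome.lower(), outcome_pattern.lower())
        if same_course and same_outcome and score is not None:
            total += score
    return total if total else ""


print(outcome_score(rows, "communication", "Comm 2010"))
print(outcome_score(rows, "information*", "Comm 3000"))
print(repr(outcome_score(rows, "communication", "Comm 3000")))
```

Like the custom number format, this sketch cannot tell a genuine score of 0 from a missing score; both come out blank.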

Binning in Power BI

I'm new to DAX in Power BI, and I want to ask whether there is a way to write a function that manually groups the data into bins of, for example, 50 mm, 25 mm, 10 mm, etc.
The intention is to create a filter in which I enter a number, and the data is then grouped based on the values of the specified bins.
Your data doesn't match your bins, but regardless, I would do this in Power Query. The easiest way is to highlight the column you're binning and then select Add Column > Column From Examples on the ribbon. Type in the first bin and Power Query will fill in the rest.
Adjust the bins if required and then click OK.
These are the steps I follow (four screenshots in the original post: first, second, third, fourth).
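The arithmetic behind fixed-width bins does not depend on the tool. As a rough illustration only (this is Python, not Power Query M or DAX, and `bin_label` and the sample measurements are made up), each value can be assigned to the bin whose lower edge is the largest multiple of the bin width that does not exceed it:

```python
import math


def bin_label(value, width):
    """Return a label such as '50-75' for the fixed-width bin containing value."""
    if width <= 0:
        raise ValueError("bin width must be positive")
    lower = math.floor(value / width) * width
    return f"{lower}-{lower + width}"


lengths_mm = [3, 12, 49, 50, 74, 101]  # hypothetical measurements

# Re-bin the same values with each of the widths mentioned in the question.
for width in (50, 25, 10):
    print(width, [bin_label(v, width) for v in lengths_mm])
```

Each bin includes its lower edge and excludes its upper edge, so 50 falls into 50-75 rather than 25-50 when the width is 25.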

How to return values from a specific pattern of rows, with a drag down function?

Hi there,
I want to be able to fill in data in the right hand table based off values entered in the log on the left.
So, for example, column GL contains data from FR13, FR40, FR67, FR94 (every 27 rows), and rather than always typing '=FR' followed by the row number, I'd like to drag a formula down the column to return the values.
I have tried using ROWS and INDEX, expecting them to select the cells I wanted, but the pattern failed.
Many thanks for your help
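The pattern described here is an arithmetic sequence of row numbers, so the n-th source row can be computed rather than typed. The Python sketch below only illustrates that arithmetic, assuming the first value sits in row 13 and the step is a constant 27 rows; `source_row` is a made-up helper name.

```python
START_ROW = 13  # first row to pick up, taken from the question
STEP = 27       # assumed constant distance between the rows of interest


def source_row(n):
    """Row in column FR that feeds the n-th cell of the target column (n starts at 1)."""
    if n < 1:
        raise ValueError("n starts at 1")
    return START_ROW + STEP * (n - 1)


print([f"FR{source_row(n)}" for n in range(1, 5)])
```

In Excel, the same arithmetic could probably be written as =INDEX($FR:$FR,13+27*(ROWS($1:1)-1)) and dragged down, though that formula has not been tested against the actual workbook.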

Showing null values for some of the Power BI rows

I have joined an Excel sheet with the world population table from Wikipedia in Power BI. When I merge these two tables, it shows the population only for the United States; other countries have null values.
I would really appreciate the help. Screenshots provided below.
It looks like your merge isn't matching the rows as you expect.
I would investigate whether there are "invisible" differences in the column values:
"Canada " (with a trailing space) will not match "Canada", for example. To check for this, go into the table you are merging and Trim the key column.
In the table you are merging into, do the same Trim operation on the key column.
Edit: Another option is to apply fuzzy matching to the merge and to limit the amount of fuzziness by setting the maximum number of matches per row and raising the similarity threshold from 0.80 to something closer to the maximum of 1.00 (= exact matching).
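To see why a stray space produces nulls, here is a small Python sketch of a left merge on a country key, with and without trimming. The data and the helper `left_merge` are made up for illustration, and the numbers are placeholders rather than real populations; inside Power BI, the Trim step in Power Query does the equivalent cleanup.

```python
def left_merge(left_keys, lookup, clean=lambda key: key):
    """Look up each left key in `lookup`; unmatched keys get None (a null)."""
    cleaned_lookup = {clean(key): value for key, value in lookup.items()}
    return {key: cleaned_lookup.get(clean(key)) for key in left_keys}


# Hypothetical data: the numbers are placeholders, not real populations.
countries = ["United States", "Canada", "Mexico"]
population = {"United States": 300, "Canada ": 40, " Mexico": 130}  # stray spaces

untrimmed = left_merge(countries, population)
trimmed = left_merge(countries, population, clean=str.strip)

print(untrimmed)  # only the exactly matching key finds a value
print(trimmed)    # every key matches once both sides are trimmed
```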
I think the issue is the left join. Try merging the tables on the country column with a join kind that keeps all rows.
Click the join kind dropdown, select Full Outer, and then expand the merged column.

How to sum entries that have same ID OpenOffice - Calc

I have a spreadsheet similar to the one in the screenshot.
From this I want to sum all the entries in Data 2 which have the same Data 1 ID and store it in another column. So something like this:
I am not able to figure out the formula that would do this. I figured out how to get a column with unique entries; I just need to figure out how to get the sum of the values that have the same Data 1 ID.
Can someone point me in the right direction?
You can use SUMIF, e.g. if I'm reading your sheet right, =SUMIF(A$2:A$7, A11, B$2:B$7), and then copy down. This sums the values from B2-B7 whenever the corresponding value in A2-A7 matches A11.
You can find more on SUMIF here.
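For comparison, the grouping that SUMIF performs can be sketched in a few lines of Python. The IDs and values below are invented, and `sum_by_id` is a hypothetical helper, not anything from OpenOffice.

```python
from collections import defaultdict


def sum_by_id(pairs):
    """Add up the Data 2 values for each distinct Data 1 ID, in first-seen order."""
    totals = defaultdict(int)
    for row_id, value in pairs:
        totals[row_id] += value
    return dict(totals)


# Hypothetical (Data 1, Data 2) rows.
rows = [("a", 10), ("b", 5), ("a", 7), ("c", 1), ("b", 2), ("a", 3)]

print(sum_by_id(rows))
```

Each key of the result corresponds to one row of the unique-ID column, and its value is what SUMIF would return for that ID.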
You may use the Subtotals function under the Data tab, although it puts the answer in the row below all the cells matching your ID; you then have to click the - and + buttons that appear on the left. You may see these in this picture of my data.
This is a nice resource when you have non-English characters (á, ó, ß, ...), spaces, dashes and points that make the data difficult to process with SQL.