I have joined an Excel sheet with the world population table from Wikipedia (loaded via web link) in Power BI. When I merge the two tables, only the United States gets a population value; all other countries show null.
Would really appreciate the help. Screenshots provided below.
It looks like your merge isn't matching the rows as you expect.
I would try to investigate if there are "invisible" differences in the column values:
"Canada " (with a trailing space) will not match "Canada", for example. To check for this, go into the table you are merging and apply Trim to the key column.
In the table you are merging into, do the same Trim operation for the key column.
Edit: Another option is to apply fuzzy matching to the merge and to limit the amount of fuzziness by setting the maximum number of matches per row and raising the similarity threshold from 0.80 to something closer to the maximum of 1.00 (= exact matching).
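The trailing-space problem described above can be reproduced outside Power BI. The following is only a minimal Python sketch of a left join, not what Power Query does internally; the `left_join` helper, the column names, and the population figures are all made-up placeholders for illustration.

```python
def left_join(left_rows, right_rows, key, normalize=lambda s: s):
    """Left-join two lists of dicts on `key`; unmatched left rows get None."""
    lookup = {normalize(r[key]): r for r in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(normalize(row[key]))
        population = match["Population"] if match else None
        joined.append({**row, "Population": population})
    return joined

# Placeholder rows (not real figures); note the trailing space in "Canada ".
sheet = [{"Country": "United States"}, {"Country": "Canada "}]
wiki = [
    {"Country": "United States", "Population": 100},
    {"Country": "Canada", "Population": 200},
]

exact = left_join(sheet, wiki, "Country")                         # keys compared as-is
trimmed = left_join(sheet, wiki, "Country", normalize=str.strip)  # keys trimmed first

print(exact[1]["Population"])    # None: "Canada " did not match "Canada"
print(trimmed[1]["Population"])  # 200: matches once both keys are trimmed
```

If only one country matches in your merge, the other keys probably differ in some way like this (whitespace, casing, or naming), which is what the Trim step is meant to rule out.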
I think the issue is the left join. Try joining the tables on the country column with a join kind that keeps the rows from both tables.
Click the Join Kind dropdown, select Full Outer, and then expand the merged column.
I'm pretty new to Power BI. I'm unsure how to approach this.
I have one visualization that displays the ten most frequently bought products in a time frame that is set by a slicer. In another visualization, I display how those products have been selling over the past few years (this time frame is not determined by the slicer). I want to display only the ten products that come from the first visualization, not the ten most common over the time frame in the second visualization.
How can I accomplish this? The approach I have in mind (and I'm open to others) is to create a true/false column that changes with the first visualization. "True" would be for products that are frequently bought as determined by the first visualization in the slicer-determined time range, and the second visualization would only look at values with a "true" in that column. How can I create a column (or table, maybe?) that changes depending on a visualization?
Clarification: most of the pages will say Top10 ... Actually, the measure used was a simple Top5 that also includes any products with the same number of orders as the 5th product. Therefore, to avoid dealing with larger images, 7 products will be seen even though it is a Top5 ranking. The idea is that you can replace it with your own custom TopN measure.
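To make the "Top5 that shows 7 products" behaviour concrete, here is a small Python sketch of a top-N ranking that keeps ties at the cutoff. It is not the [TopOrders] measure itself (that is DAX and not reproduced here); the function name and the order counts are hypothetical.

```python
def top_n_with_ties(order_counts, n):
    """Return products whose count is at least the n-th highest count."""
    if not order_counts or n <= 0:
        return []
    # Sort by count, highest first (the sort is stable, so ties keep input order).
    ranked = sorted(order_counts.items(), key=lambda kv: kv[1], reverse=True)
    cutoff = ranked[min(n, len(ranked)) - 1][1]
    return [product for product, count in ranked if count >= cutoff]

# Hypothetical order counts: E, F and G all tie with the 5th place.
counts = {"A": 9, "B": 8, "C": 7, "D": 6, "E": 5, "F": 5, "G": 5, "H": 2}
top5 = top_n_with_ties(counts, 5)
print(top5)  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

Seven products come back from a Top5 because three of them share the 5th-place count.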
What I understood:
The simplification of your model plus the disconnected help table would be:
I have one visualization that displays the ten most frequently bought products in a time frame that is set by a slicer.
The Date slicer belongs to the Dates table in the Data model.
The table viz represents the number of rows in the sales table in the current context (for each product within the Date range).
The table viz is sorted according to the [#Rows] measure in descending order.
The table viz only presents the TopN products even without the presence of the [#Rows] measure due to the presence of the [TopOrders] measure within Filters on this visual. [TopOrders] is 1.
On the second page you create:
A slicer with the Dates[Date] column (the same one used on the previous page).
A matrix with Products[ProductName] on the rows, HDates[Year] on the columns, and a measure on values.
From the View tab, you select the Sync Slicers option.
Inside the Sync Slicers pane:
In the Sync column, check the boxes related to the necessary pages.
In the Display column uncheck the box that contains the over years report.
So far all we have done is pass the time frame context from page 1 to page 2.
Since the TopN context depends on the time frame context, we can now use the [TopOrders] measure under Filters on this visual for the matrix. Again, [TopOrders] is 1.
Why do the numbers differ between rows and not between columns?
Also, in this example, the Sales table only has information up to 12/31/2020, but the visualization shows an additional year. The Sales[Amount] value for each order is $1, so [#Orders] and [SalesAmount] are the same for easy comparison.
HDates is not related to the model and for each combination of HDates[Year]-Products[ProductName], the [SalesAmount] measure is using the information coming from the previously hidden slicer and the respective Products[ProductName] because the information coming from HDates[Year] has no effect yet.
In order to complete this exercise, it only remains to modify the [SalesAmount] measure in such a way that it removes the filter on the time frame (Dates[Date]) and it recognizes HDates[Year] as Dates[Year].
SalesAmount :=
CALCULATE(
    SUM(Sales[Amount]),
    ALL(Dates),
    TREATAS(VALUES(HDates[Year]), Dates[Year])
)
And this is the final result.
I hope it works for someone or the idea can be improved.
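For anyone who finds the ALL/TREATAS combination hard to picture, here is a rough Python analogy of what one matrix cell computes. It is only a sketch under my own assumptions: the `sales_amount` function, the row layout, and the sample rows are hypothetical, and it does not model DAX filter context in general.

```python
from datetime import date

def sales_amount(sales, product, year):
    """Sum amounts for one product and one helper-table year.

    No slicer date range is passed in on purpose: that mirrors ALL(Dates)
    removing the date filter, while `year` plays the role of the HDates[Year]
    value that TREATAS maps onto Dates[Year].
    """
    amounts = [
        s["amount"]
        for s in sales
        if s["product"] == product and s["date"].year == year
    ]
    return sum(amounts) if amounts else None  # None stands in for a blank cell

# Hypothetical sales rows, each worth 1 as in the example above.
sales = [
    {"product": "A", "date": date(2019, 3, 1), "amount": 1},
    {"product": "A", "date": date(2019, 7, 9), "amount": 1},
    {"product": "A", "date": date(2020, 1, 5), "amount": 1},
    {"product": "B", "date": date(2020, 6, 2), "amount": 1},
]

print(sales_amount(sales, "A", 2019))  # 2
print(sales_amount(sales, "A", 2021))  # None: the helper year has no sales
```

The list of products shown still comes from the TopN filter driven by the synced slicer; only the value in each cell ignores the slicer's date range.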
I am using a Power BI matrix report and I want to replace the blank values in the matrix with 0. The data source is a table from SQL Server.
What options are there for filling the blank values with 0 in Power BI? Any help would be greatly appreciated.
In a given table, (Blank) often comes from "null" in a column. Under Transform data, you can select the column you want to edit, then select "Replace Values" in the Home ribbon. Then it just works like a find and replace in any editor.
As mentioned in the comments, Blank is there for a reason and replacing it with 0 may be a bad idea, depending on the data. In general, I try not to destroy any data unless entirely unavoidable.
Consider other solutions:
If you just don't want your calculated visualizations to show "(Blank)", do something like Measure = CALCULATE(<something>)+0 and it will show a result of 0 if there's nothing in the column.
If you have a slicer showing a "(Blank)" category, just filter it out in the filters sidebar.
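The `+0` idea is to coalesce a blank at the point of calculation rather than overwrite the source. A minimal Python sketch of the same principle, purely illustrative (the function names are hypothetical, and `None` stands in for a Power BI blank):

```python
def total_amount(amounts):
    """Sum the non-missing amounts; return None (a 'blank') if there are none."""
    present = [a for a in amounts if a is not None]
    return sum(present) if present else None

def as_display_value(value):
    """Coalesce a blank to 0 for display only; the stored data keeps its None."""
    return 0 if value is None else value

source = [None, None]  # hypothetical column that contains only nulls
shown = as_display_value(total_amount(source))

print(shown)   # 0
print(source)  # [None, None]: the underlying data is unchanged
```

The visual shows 0, but you can still tell "no data" apart from "a real zero" in the source if you ever need to.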
I'm trying to return "duplicates" from a range. In this case a duplicate is when there exists more than one row that has the same data in the first and last column (the data in the middle columns needs to be returned, but is irrelevant in terms of having useful data for the search to be performed on).
For a small example data set and desired output see this sheet.
My current incomplete solution path is as follows:
I use
=QUERY({SourceData!A2:E,ARRAYFORMULA(IF(LEN(SourceData!A2:A),COUNTIFS(SourceData!A2:A&SourceData!E2:E,SourceData!A2:A&SourceData!E2:E,ROW(SourceData!A2:A),"<="&ROW(SourceData!A2:A)),))},"select Col1, Col2, Col3, Col4, Col5 where Col6 > 1")
where the ARRAYFORMULA appends a rolling count column to the end of the range and then QUERY the rows of the original range where the rolling count is above 1.
However, this only gives me the subsequent rows and not the first of the duplicates. (In the example it only gives me the second row of the matching pair and not the first.)
I'm tempted to limit the QUERY output to just column 1 and then wrap that output in a JOIN to make the output conditions of another QUERY. But given the size of the actual data set and the sheer number of IMPORTRANGEs and QUERYs I've already got going I'm starting to worry about efficiency. (I've got 12 Google Sheet documents all importing from a 13th Google Sheet document then the 13th document pulls and combines data from the 12 other sheets and spits subsets of the combined data set back to each of the 12 other documents.) The whole thing won't be usable if a user has to wait multiple minutes while all the functions resolve. Plus I'm sure someone out there has a more elegant way of getting this done that would be helpfully enlightening to an amateur such as me.
Advice is appreciated! Thank you for your time.
try:
={SourceData!A1:E1;
ARRAYFORMULA(FILTER(SourceData!A2:E, REGEXMATCH(SourceData!A2:A&SourceData!E2:E,
TEXTJOIN("|", 1, FILTER(SourceData!A2:A&SourceData!E2:E,
COUNTIFS(SourceData!A2:A&SourceData!E2:E, SourceData!A2:A&SourceData!E2:E,
ROW(SourceData!A2:A), "<="&ROW(SourceData!A2:A))>=2)))))}
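As a cross-check of the logic (not a replacement for the Sheets formula), the same "return every row whose first and last columns repeat" rule can be sketched in Python. The function name and sample rows are hypothetical.

```python
from collections import Counter

def rows_with_duplicate_keys(rows):
    """Return every row whose (first, last) column pair occurs more than once."""
    key_counts = Counter((row[0], row[-1]) for row in rows)
    return [row for row in rows if key_counts[(row[0], row[-1])] > 1]

# Hypothetical five-column rows; only rows 1 and 3 share both first and last values.
rows = [
    ["a", 1, 2, 3, "x"],
    ["b", 4, 5, 6, "y"],
    ["a", 7, 8, 9, "x"],
    ["a", 0, 0, 0, "z"],
]
duplicates = rows_with_duplicate_keys(rows)
print(duplicates)  # [['a', 1, 2, 3, 'x'], ['a', 7, 8, 9, 'x']]
```

Both the first and the later occurrences come back, which is the part the rolling count alone was missing. The sketch compares the two columns as a pair rather than as one concatenated string, so keys like "ab" + "c" and "a" + "bc" cannot be confused; with the formula you may want to check whether your real data could contain such collisions or partial regex matches.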
I am not sure if my title is the correct one, but here is the deal:
I want to make a matrix visualization in Power BI Desktop. I have the fields ARTICLE_ID and ARTICLE_NAME.
I would like to have both of those fields in the COLUMNS position in the matrix data view. I need them to be in one row, with no need to drill down, because they are one and the same thing.
I need ARTICLE_ID and ARTICLE_NAME as two separate columns on the same level, without drilling. Also, I don't want to use concatenation or merge them into some third column. Is that possible? Thanks.
1) I started with this sample data.
2) I created a matrix and configured it as shown in the image below.
3) I clicked the forked arrows to show all levels.
4) In the Rows section of the formatting pane, I turned off "Stepped layout".
5) In the Subtotals section, I turned off "Row subtotals".
I don't know if this is exactly what you are looking for, but I think it is the closest I can come up with since you don't want to concatenate the columns together.
If they are from the same table then just drag and drop them into the columns.
From my understanding, a matrix in Power BI works like this:
Rows are just the headings/categories of the values.
You might also need to go into the Format tab, Values, and make sure "Show on rows" is on.
For example, let's say our value headings are rainy days and sunny days.
Your columns are months.
The rows will be the 2 categories.
The values will be the values.
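As concatenation and "&" do not work in Power BI DirectQuery, you can use the formula below instead. Note that the two conditions have to test different columns, since a single column cannot equal both "A" and "B" at once:
[New_column_name] = IF(table_name[column_name_1] = "A" && table_name[column_name_2] = "B", "AB", "NA")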
Thanks,
Sachin Kashyap
I want to calculate the medians for a series of numbers from an Excel file.
My Excel spreadsheet looks like this:
CELLNOUN 9.32
CELLNOUN 10.62
CELLNOUN 8.42
CELLNOUN 10.64
CELLNOUN 11.51
CELLNOUN 12.01
CELLNOUN 8.83
CELLSNOUN/CELLNOUN 9.53
CELLSNOUN/CELLNOUN 9.21
CELLNOUN/CELLSNOUN 10.76
CELLNOUN/CELLSNOUN 7.01
CELLSNOUN/CELLNOUN 10.21
PLANTNOUN/PLANTSNOUN 3.62
PLANTNOUN/PLANTSNOUN 3.38
PLANTSNOUN/PLANTNOUN 3.92
PLANTSNOUN/PLANTNOUN 3.24
PLANTNOUN/PLANTSNOUN 3.83
PLANTNOUN/PLANTSNOUN 3.24
PLANTSNOUN/PLANTNOUN 3.00
PLANTSNOUN/PLANTNOUN 1.80
...
In the spreadsheet, each set of words is separated by a blank row, but the number of entries in each set varies: CELLNOUN/CELLSNOUN has 12 entries but PLANTNOUN/ has 8 entries. The numbers after the words are, in fact, the occurrences of these words. I want to find the median of the occurrences for CELLNOUN/CELLSNOUN, PLANTNOUN/PLANTSNOUN, etc., by using Regex instead of the MEDIAN function in Excel, because I have thousands of sets like this and I can't do them one by one in Excel. But if you know a quicker way to do it in Excel, please advise.
Thank you very much.
First of all, remove the blank rows from your data set and then create an Excel Table with Insert > Table or Ctrl-T. With an Excel table object, all functions and commands that refer to the table will catch when more data is added to the table.
Now you can create a pivot table from your source data with Insert > PivotTable. If you drag the first column field into the rows area, you will have a list of unique values in that source data column. You can drag the values column into the Values area of the Pivot Panel, if you want to. This should now look similar to this screenshot:
I'm not sure if you are aware of the different spellings of your categories, i.e. with or without an "S". The pivot table uncovers them all.
Out of the box, Excel PivotTables do not offer the Median as an option to aggregate, but you can use a method outlined here
http://www.myonlinetraininghub.com/calculating-median-in-pivottables
to calculate a median.
The exact approach varies depending on whether or not you use Pivot tables or Power Pivot, so check out the article.
Use an array formula as shown below and press Ctrl+Shift+Enter to enter it as an array formula:
=MEDIAN((IF($A$1:$A$20=A1,$B$1:$B$20)))
Refer to the formula bar in the image below, then fill the same formula down to all the other rows.
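If a script is an option for the thousands of sets, the grouping can also be sketched in Python. This is only an illustration under stated assumptions: the rows are already available as (label, value) pairs (reading the Excel file is not shown), the function names are hypothetical, and treating "CELLSNOUN/CELLNOUN" and "CELLNOUN/CELLSNOUN" as the same set is my assumption about what is wanted.

```python
from collections import defaultdict
from statistics import median

def normalize_label(label):
    """Sort the slash-separated parts so both spellings of a pair share one label."""
    return "/".join(sorted(label.split("/")))

def medians_by_label(rows):
    """Group (label, value) pairs by normalized label and take each group's median."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[normalize_label(label)].append(value)
    return {label: median(values) for label, values in groups.items()}

# A few rows copied from the sample in the question.
rows = [
    ("CELLNOUN", 9.32), ("CELLNOUN", 10.62), ("CELLNOUN", 8.42),
    ("CELLNOUN", 10.64), ("CELLNOUN", 11.51), ("CELLNOUN", 12.01),
    ("CELLNOUN", 8.83),
    ("CELLSNOUN/CELLNOUN", 9.53), ("CELLSNOUN/CELLNOUN", 9.21),
    ("CELLNOUN/CELLSNOUN", 10.76), ("CELLNOUN/CELLSNOUN", 7.01),
    ("CELLSNOUN/CELLNOUN", 10.21),
]
result = medians_by_label(rows)
print(result["CELLNOUN"])            # 10.62
print(result["CELLNOUN/CELLSNOUN"])  # 9.53
```

Blank separator rows would need to be skipped before calling this, the same as in the table-based approach.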