Having trouble transforming this dataset for ETL

I'm playing around with some datasets on Kaggle.com, trying to learn better practices for ETL, as I tend to get stuck with specific things with the transform part. For this question, I am dealing with the survey results from Stack Overflow 2018: https://www.kaggle.com/stackoverflow/stack-overflow-2018-developer-survey - specifically the LanguageWorkedWith column.
Currently I am using a combination of RapidMiner and Excel to try to reshape the data. I am not well versed enough in R or Python to solve this problem with code.
The problem with the current column is that it lists all the languages a user has chosen, separated by semicolons. I can easily split a column on a semicolon, but one of two things happens:
1. I end up with 31 columns, LanguageWorkedWith1 through LanguageWorkedWith31, which makes a count of languages by salary unworkable.
2. I get a Cartesian effect where each row is duplicated once per language chosen. That produces a lot of duplicate rows, which definitely affects the integrity of the data. I have also tried using Power BI (the load location) to remove duplicates on the respondent ID and language, but that didn't work.
Ideally I'd like to build a language-by-salary visual in Power BI, similar to what many kernels have, but I can't figure out how to make this happen without code. I'm not sure how this would look exactly, but if I can split all the languages and count them, I can at least produce a simple count per language.
But I'm not sure whether I can relate this back to salary with the data as it is.
I just want to understand some transforming processes better! Appreciate any help!

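The key here is to split into rows instead of columns (in the Power Query editor, Split Column by Delimiter has a "Rows" choice under its advanced options), so that you end up with a table that has one row per respondent and language.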
You can keep that row expansion in its own related table in your data model so you aren't creating a giant table.
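For readers who want to see the reshaping logic outside Power BI, here is a minimal Python sketch of the same idea. The sample rows and the helper name `split_into_rows` are made up for illustration; only the column names Respondent, ConvertedSalary and LanguageWorkedWith come from the survey.

```python
# Hypothetical sample rows; the real survey file has many more columns and rows.
survey = [
    {"Respondent": 1, "ConvertedSalary": 70000, "LanguageWorkedWith": "Python;SQL"},
    {"Respondent": 2, "ConvertedSalary": 90000, "LanguageWorkedWith": "C#;SQL;JavaScript"},
    {"Respondent": 3, "ConvertedSalary": 50000, "LanguageWorkedWith": ""},
]


def split_into_rows(rows, key="Respondent", column="LanguageWorkedWith", sep=";"):
    """Return one {key, Language} row per delimited entry, skipping blanks."""
    out = []
    for row in rows:
        for value in (row[column] or "").split(sep):
            value = value.strip()
            if value:
                out.append({key: row[key], "Language": value})
    return out


# The expanded rows live in their own table; `survey` itself is untouched,
# so salaries are never duplicated in the main table.
language = split_into_rows(survey)
```

The main table keeps exactly one row per respondent, and the narrow `language` table relates back to it through Respondent, which is the same shape the related table has in the data model.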
From there it's pretty easy to make visuals provided you know a little bit of DAX. For example, I created an AvgSalary measure (after converting that column to a numeric type) like this:
AvgSalary =
CALCULATE (
    AVERAGE ( survey_results_public[ConvertedSalary] ),
    FILTER (
        survey_results_public,
        survey_results_public[Respondent] IN VALUES ( 'Language'[Respondent] )
    )
)
and was then able to create interesting charts, such as average salary by language.
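As a rough cross-check of what a measure like AvgSalary computes per language, the same calculation can be sketched in plain Python. The data and the function name `avg_salary_by_language` are hypothetical; the sketch assumes that respondents without a reported salary should be skipped, which mirrors how AVERAGE ignores blanks.

```python
from collections import defaultdict

# Hypothetical data: salary per respondent, plus the expanded Language table.
salary_by_respondent = {1: 70000.0, 2: 90000.0, 3: None}  # None = salary not reported
language = [
    (1, "Python"), (1, "SQL"),
    (2, "C#"), (2, "SQL"), (2, "JavaScript"),
    (3, "Python"),
]


def avg_salary_by_language(language_rows, salaries):
    """Average the salaries of the respondents who listed each language."""
    totals = defaultdict(lambda: [0.0, 0])  # language -> [salary sum, respondent count]
    for respondent, lang in language_rows:
        salary = salaries.get(respondent)
        if salary is None:
            continue  # skip respondents with no salary
        totals[lang][0] += salary
        totals[lang][1] += 1
    return {lang: total / count for lang, (total, count) in totals.items()}


averages = avg_salary_by_language(language, salary_by_respondent)
```

A respondent who lists several languages contributes their salary once to each of those languages, which is what a language-by-salary chart needs, while the main table still holds each salary only once.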

Related

Converting a Tableau report with different measures for each row to Power BI

I'm working on a project where we are converting a client from Tableau to PBI. One of the Tableau reports I'm converting looks like this:
Each row is a different calculation (measure). I can achieve a similar look, with regards to the column headers, in PBI by using a matrix. However, there isn't a way, that I know of, to apply a different measure for each row. The only way I can think of to do this is to create three matrix tables and stack them on top of each other. It won't look nearly as good but I can generate the same results. Does anyone have a better solution?

Put the Measure Names pill on Rows, Measure Values on Text and your date fields on Columns. That should give you what you want.

How to get Nth value from a list in Power BI?

I have a basic question about Power BI but I cannot find the answer anywhere and I don't have the time to take an entire course in DAX before I can do something so basic.
I have a column containing a list of values which are latitude and longitude, and I'm trying to define a measure for each.
In essence I just want to do something like LatitudeMeasure = Locations[0], LongitudeMeasure = Locations[1]. But it doesn't seem to be that simple.
I have no idea how to do it, I'm finding nothing online and I don't feel like completely redesigning my table structure just for something so basic.

Creating a Quick Calculation with Dates that Repeat in PowerBI

So, I'm in PowerBI Desktop. I have a table that pulls in data for various properties (website A, website B, website C) and creates a row for each property, for each day. So, for example, it'd look like this over the course of three days.
A snapshot of the data:
I need to create a single measure, showing the total number of returning users (for all properties) month-to-date.
My original plan was to do Quick Measures > Time Intelligence > Month-to-date Total.
The "Quick Measure" form I'm using:
This creates this measure:
usersReturning MTD =
IF(
    ISFILTERED('dailyLog'[date]),
    TOTALMTD(SUM('dailyLog'[usersReturning]), 'dailyLog'[date].[Date])
)
However, when I try to make this value shown in a card tile, it just shows (blank). And in the past, I know at some point it creates another kind of error. (I'm having difficulty replicating the value.) I'm wondering if this is because I don't have unique dates, but repeating dates? But I'm not getting any feedback on why.
I'm relatively new to PowerBI, and particularly to the quick measures and DAX scripting. So open to help or suggestions, and wondering if there is a way to make this work with the data schema that I'm showing here.

Based on this data...
You can use a Table visualization. Leave the date field unchecked and use Sum on the returning users to get a per-property summary.
Your "single measure" is 7565, but you can also see a summary of A, B, and C to understand how that figure came to be. If you really just want the 7565 all by itself, select Card as the visual instead.
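Repeating dates are not a problem for a month-to-date total as such: every row in the month up to the chosen day is simply summed. A likely reason the card shows (blank) is that the generated measure only returns a value when 'dailyLog'[date] is filtered, and a card with no date filter leaves the IF without a result. The Python sketch below illustrates the summing logic only; the rows and the function name `month_to_date_total` are made up for illustration.

```python
from datetime import date

# Hypothetical daily log: one row per property per day, so dates repeat.
daily_log = [
    (date(2018, 6, 30), "A", 100),
    (date(2018, 7, 1), "A", 10),
    (date(2018, 7, 1), "B", 20),
    (date(2018, 7, 2), "A", 30),
    (date(2018, 7, 2), "C", 40),
    (date(2018, 7, 3), "B", 50),
]


def month_to_date_total(rows, as_of):
    """Sum returning users for all properties from the 1st of the month through as_of."""
    return sum(
        users
        for day, _property, users in rows
        if day.year == as_of.year and day.month == as_of.month and day <= as_of
    )


mtd = month_to_date_total(daily_log, date(2018, 7, 2))
```

Here `mtd` adds up the four July rows dated on or before 2 July across properties A, B and C; the June row and the 3 July row are left out.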

DAX function to check the amount of dates that is greater than another set of dates in 2 different columns

I'm currently doing an internship where I have been asked to make a few visuals in Power BI.
I've searched around and tried a couple of things, but the truth is I am very much a beginner at coding and functions in general. I only had basic courses in different languages during my education and, to be fair, it's a bit outside my scope of work.
So I have 2 columns I need to compare in order to find out how many dates in column 2 are later than the dates in column 1.
So I'm imagining something like:
Measure = IF[(Investments(Expected closure)]<[(Investments(Actualclosure)]
Basically I want an overview of how many investments have a later closure date than expected.
Next thing would possibly be to create a boxplot showing the distribution (by how far we are off).
I know this is very basic, and possibly not formulated in the best way possible, please let me know if you need any more information.
Thanks in advance

You can use a calculated column as a flag to identify if actual date > expected date and then count the flag.
Flag = IF('Table'[Act] > 'Table'[Exp], 1, 0)
Hope this helps. Thanks.

Welcome to the community. Be sure to read this for posting questions.
For your question, you can use the following code. You filter the table with your logical expression using FILTER, and then count the rows of a column with COUNTA.
Measure = CALCULATE(COUNTA('Investments'[Actualclosure]),FILTER(ALL('Investments'),'Investments'[Actualclosure]>'Investments'[Expected closure]))
Hope this solves your problem.
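To make the filter-then-count logic concrete, and to get the "how far off" values a boxplot would need, here is a small Python sketch. The investment rows are invented for illustration, and the sketch assumes that an investment with no actual closure date yet should not be counted as late.

```python
from datetime import date

# Hypothetical investments; "actual" is None when the investment has not closed yet.
investments = [
    {"name": "X", "expected": date(2019, 1, 31), "actual": date(2019, 2, 15)},
    {"name": "Y", "expected": date(2019, 3, 31), "actual": date(2019, 3, 1)},
    {"name": "Z", "expected": date(2019, 6, 30), "actual": date(2019, 7, 10)},
    {"name": "W", "expected": date(2019, 9, 30), "actual": None},
]

# Keep only rows whose actual closure is later than expected (the FILTER step).
late = [
    row for row in investments
    if row["actual"] is not None and row["actual"] > row["expected"]
]

# Count them (the COUNTA step).
late_count = len(late)

# Days overdue per late investment, the input for a distribution chart.
days_late = [(row["actual"] - row["expected"]).days for row in late]
```

In Power BI the day difference could live in its own calculated column, so the same column can feed both the count and the distribution visual.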

How to compare two columns values in the same table PowerBI DAX

I am working in a data analysis project using MS Power BI, thankfully I'm doing good work to start. However, I'm facing a little problem with DAX syntax. I come from a web development background. Anyways, my current problem is that I have rental vehicles, which can be rented from one branch and handed in at another.
I would like to compare two columns values in the same table. 'owner_branch' and 'current_branch'. Is it a good choice to create a filter with DAX? Or should I move to R Language?

If I understood your problem correctly, you need a calculated column like this:
CompCol = IF ( [owner_branch] = [current_branch], TRUE (), FALSE () )
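Since the question mentions a web development background, the row-by-row comparison that a calculated column performs can be pictured like this in Python. The vehicle rows and the `at_owner_branch` field name are hypothetical; only `owner_branch` and `current_branch` come from the question.

```python
# Hypothetical rental vehicles with the two branch columns from the question.
vehicles = [
    {"vehicle": "V1", "owner_branch": "North", "current_branch": "North"},
    {"vehicle": "V2", "owner_branch": "North", "current_branch": "South"},
    {"vehicle": "V3", "owner_branch": "East", "current_branch": "East"},
]

# Equivalent of the calculated column: evaluated once per row, stored with the row.
for v in vehicles:
    v["at_owner_branch"] = v["owner_branch"] == v["current_branch"]

# The Boolean column can then drive a filter, e.g. vehicles handed in elsewhere.
away = [v["vehicle"] for v in vehicles if not v["at_owner_branch"]]
```

The comparison itself already yields a Boolean, so the IF wrapper in the DAX version is optional; it is mostly there for readability.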

as a temporary solution, which I don't think will be efficient for larger record sets in the future. Anyway, my solution was to create a new column of Boolean type.
