Python Data frame Find and replace column values from a list

I have a data frame with 4 columns and typically 200 or more rows; the example below shows 4 rows. One column holds an account number, and the same account number may appear multiple times in that column. I also have a separate Excel sheet with 2 columns, listing account number and account name. I want to replace each account number with the corresponding account name from the Excel sheet. I cannot type out a replace call for every account number by hand, as there are hundreds of account names and numbers. Is there a way to replace the account numbers with their account names, or perhaps to append a new column showing the account name?

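If I understand correctly, you want something like the following (you can get l1 and l2 by parsing the Excel sheet):
import pandas as pd

l1 = [100, 200]    # values to find
l2 = [1000, 2000]  # replacements, matched to l1 by position
z = pd.DataFrame({"one": [100, 100, 300, 200], "two": [100, 100, 300, 200]})
print(z)
#    one  two
# 0  100  100
# 1  100  100
# 2  300  300
# 3  200  200

# Assign the result back rather than calling replace(..., inplace=True) on z.two,
# which may not update z when pandas copy-on-write is in effect.
z["two"] = z["two"].replace(l1, l2)
print(z)
#    one   two
# 0  100  1000
# 1  100  1000
# 2  300   300
# 3  200  2000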
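Since the account numbers and names come from a two-column sheet, a lookup with Series.map may be a closer fit than two parallel lists. This is only a minimal sketch: it assumes the sheet has already been loaded into a DataFrame (for example with pd.read_excel), and the column names account_number, account_name and amount, as well as the sample names, are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet (in practice loaded with pd.read_excel).
lookup = pd.DataFrame({
    "account_number": [100, 200],
    "account_name": ["Alpha Ltd", "Beta Inc"],
})

# Hypothetical main data frame; account 300 is deliberately missing from the sheet.
df = pd.DataFrame({"account_number": [100, 100, 300, 200], "amount": [5, 7, 9, 11]})

# Build a number -> name Series indexed by account number.
name_by_number = lookup.set_index("account_number")["account_name"]

# Option 1: append a new column; numbers missing from the sheet become NaN.
df["account_name"] = df["account_number"].map(name_by_number)

# Option 2: a combined column that falls back to the number (as text) when no name exists.
df["account"] = df["account_name"].fillna(df["account_number"].astype(str))
```

Option 1 leaves the original numbers untouched, which makes unmatched accounts easy to spot before anything is overwritten.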

Related

Create numeric range slicer

I have a table containing:
VideoID Views DeviceType
1 12 Desktop
1 30 Mobile
1 95 Tablette
...
I want to create a slicer named Views Ranges, with ranges like the ones below, to filter videos by the sum of their views:
From 100 to 1000
From 1001 to 10000
...
VideoID 1 has a total of 137 views, so when I choose From 100 to 1000 in the filter it should be shown.
This is what I tried:
Range =
VAR ranges = [Total Views]
RETURN
    SWITCH(
        TRUE(),
        ranges <= 100, "From 0 to 100",
        ranges <= 10000, "From 100 to 10,000",
        ranges <= 20000, "From 10,000 to 20,000",
        ....
    )
But the filter is empty.

This appears to fit the Basic Category of the Static Segmentation pattern. It's explained in full detail here:
https://www.daxpatterns.com/static-segmentation/
To summarize: I would add a DAX Calculated Column to your first table ("Views"?) to return the Key from a table of ranges. You can then use that Key to establish a relationship to the table of ranges. With the relationship in place, you can use the table of ranges to filter in a Slicer, or categorize in any other visual.
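The idea of assigning each video a range key and filtering through it can be sketched outside DAX. The following Python is only an illustration of the logic, not Power BI code; the range table, its keys, and the "From 0 to 99" bucket are hypothetical.

```python
from collections import defaultdict

# Sample rows from the question: (VideoID, Views, DeviceType).
views = [(1, 12, "Desktop"), (1, 30, "Mobile"), (1, 95, "Tablette")]

# Hypothetical range table: (key, label, lower bound, upper bound), bounds inclusive.
ranges = [
    (1, "From 0 to 99", 0, 99),
    (2, "From 100 to 1000", 100, 1000),
    (3, "From 1001 to 10000", 1001, 10000),
]

# Total views per video, the quantity the ranges are defined on.
totals = defaultdict(int)
for video_id, n, _device in views:
    totals[video_id] += n

def range_key(total):
    """Return the key of the range containing total, or None if no range matches."""
    for key, _label, low, high in ranges:
        if low <= total <= high:
            return key
    return None

# The equivalent of the calculated column: one range key per video.
video_range = {video_id: range_key(total) for video_id, total in totals.items()}

# Choosing "From 100 to 1000" (key 2) in the slicer filters videos through that key.
selected = [video_id for video_id, key in video_range.items() if key == 2]
```

Because the key is stored per video rather than computed inside a measure, the range table can act as an ordinary filter, which is the point of the pattern.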

Why does my measure with SumX return this result?

I have 2 tables:
**Partners**
PartnerID Name
1 AAAA
2 BBBB
3 CCCC
4 DDDD
**Sales**
PartnerID SaleAmount
1 15
2 20
3 30
4 40
1 15
I have a visual with PartnerID from my Partners table and this measure:
TotalSalesMeasure: Sumx(Partners, Sum(Sales[SaleAmount]))
**Resulting table visual with measure**
PartnerID TotalSalesMeasure
1 30
2 20
3 30
4 40
What's confusing me is how the results are derived. It's my understanding that:
- The Partners table is filtered using the incoming context (PartnerID)
- The filtered Partners table is iterated through and Sum(Sales[SaleAmount]) is called for each row
- After the iteration is done, the results are summed
First row example:
- Partners is filtered to 1 row based on the PartnerID
- Since it's using row context and not filter context in the Partners table, it sums the entire Sales[SaleAmount] column one time
- That should result in 15+20+30+40+15 = 120, but it shows 30
I was basing this on a video, at around the 49 min mark, where he does a similar operation:
https://www.youtube.com/watch?v=1yWLhxYoq88
What's odd is that if I do the same thing he does later by wrapping Sum with Calculate, I get the same result (aside from the totals field). In fact, my result seems to be what Calculate would return (again, outside of the grand total).
I'm obviously confusing something, but I don't know what it is.
Edit:
I think I know what's going on now. The external filter context is applied to the Sales table before summing. I didn't realize that it did that as well.
SumX(
    Partners,                 <--- Affected by exterior context
    Sum(Sales[SaleAmount])    <--- Affected by exterior filter context (the row context alone does not filter Sales)
)
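The numbers in the question can be reproduced with a small simulation. This Python sketch is not DAX; it assumes a one-to-many relationship from Partners to Sales on PartnerID, so that a filter on Partners also filters Sales, and it assumes that a row context filters nothing unless Calculate turns it into a filter. The function names are hypothetical.

```python
# Sample data from the question.
partners = [1, 2, 3, 4]                                # Partners[PartnerID]
sales = [(1, 15), (2, 20), (3, 30), (4, 40), (1, 15)]  # (PartnerID, SaleAmount)

def sum_sales(visible_partners):
    """Sum(Sales[SaleAmount]) under a filter context limited to visible_partners."""
    return sum(amount for pid, amount in sales if pid in visible_partners)

def sumx_plain(visible_partners):
    """SumX(Partners, Sum(...)): the inner Sum sees only the outer filter context."""
    return sum(sum_sales(visible_partners) for _row in visible_partners)

def sumx_calculate(visible_partners):
    """SumX(Partners, Calculate(Sum(...))): each iterated row becomes its own filter."""
    return sum(sum_sales({row}) for row in visible_partners)

# One visual row per partner: both variants agree.
per_row_plain = {p: sumx_plain({p}) for p in partners}
per_row_calc = {p: sumx_calculate({p}) for p in partners}

# Grand total: no PartnerID filter, so all four partners are visible.
total_plain = sumx_plain(set(partners))     # 120 repeated for each of the 4 rows
total_calc = sumx_calculate(set(partners))  # each partner's own sum, added up
```

Under these assumptions the per-partner rows match (30, 20, 30, 40) while the grand totals differ, which is consistent with the observation that only the totals field changes when Sum is wrapped in Calculate.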

DAX selecting and displaying the max value of all selected records

Problem
I'm trying to calculate and display the maximum value of all selected rows alongside their actual values in a table in Power BI. When I try to do this with the measure MaxSelectedSales = MAXX(ALLSELECTED(FactSales), FactSales[Value]), the maximum value ends up being repeated.
If I add additional dimensions to the output, even more rows appear.
What I want to see is just the selected rows in the fact table, without the blank values. (i.e., only four rows would be displayed for SaleId 1 through 4).
Does anyone know how I can achieve my goal with the data model shown below?
Details
I've configured the following model.
The DimMarket and DimSubMarket tables have two rows each, you can see their names above. The FactSales table looks like this:
SaleId MarketId SubMarketId Value IsCurrent
1 1 1 100 true
2 2 1 50 true
3 1 2 60 true
4 2 2 140 true
5 1 1 30 false
6 2 2 20 false
7 1 1 90 false
8 2 2 200 false
In the table output, I've filtered FactSales to only include rows where IsCurrent = true by setting a visual level filter.

Your max value (the measure) is a scalar (a single value). If you put a scalar in a table alongside other records, the value just gets repeated. In general, mixing scalar values and records (tables) does not bring much benefit.
Measures like yours are better displayed in a KPI or Multi KPI visual (normally with the year, so that you get the max value per year).
If you just want to display the max value of the selected rows (for example, with a filter on your table), use this measure:
Max Value = MAX(FactSales[Value])
This way, all applied filters are considered in the measure's calculation.

I've found a solution to my problem, but I'm slightly concerned with query performance. Although, on my current dataset, things seem to perform fairly well.
MaxSelectedSales =
MAXX(
    FILTER(
        SELECTCOLUMNS(
            ALLSELECTED(FactSales),
            "id", FactSales[SaleId],
            "max", MAXX(ALLSELECTED(FactSales), FactSales[Value])
        ),
        [id] = MAX(FactSales[SaleId])
    ),
    [max]
)
If I understand this correctly, for every row in the output, this measure will calculate the maximum value across all selected FactSales rows, set it to a column named max and then filter the table so that only the current FactSales[SaleId] is selected. The performance hit comes from the fact that MAX needs to be executed for every row in the output and a full table scan would be done when that occurs.
Posted on behalf of the question asker
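The behaviour described above, and the cost concern, can be illustrated with plain Python. This is only a sketch of the logic on the question's data; it says nothing about how the DAX engine actually evaluates or optimizes the measure.

```python
# FactSales rows from the question: (SaleId, MarketId, SubMarketId, Value, IsCurrent).
fact_sales = [
    (1, 1, 1, 100, True),
    (2, 2, 1, 50, True),
    (3, 1, 2, 60, True),
    (4, 2, 2, 140, True),
    (5, 1, 1, 30, False),
    (6, 2, 2, 20, False),
    (7, 1, 1, 90, False),
    (8, 2, 2, 200, False),
]

# Visual-level filter: keep only the current rows (the "selected" rows).
selected = [row for row in fact_sales if row[4]]

# Naive form: rescan every selected row for each output row.
naive = [
    (sale_id, value, max(r[3] for r in selected))
    for sale_id, _market, _sub, value, _current in selected
]

# Alternative: compute the maximum once and reuse it for every output row.
max_selected = max(r[3] for r in selected)
cached = [
    (sale_id, value, max_selected)
    for sale_id, _market, _sub, value, _current in selected
]
```

Both lists hold the same four rows; the difference is that the naive form scans the selected rows once per output row, which is the pattern the performance worry refers to.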

Creating a column with lookup from another table

I have a table of sales from multiple stores, with the value of each sale in dollars, the date, and the corresponding store.
In another table I have the store name and the expected sales amount for each store.
I want to create a column in the main (first) table that evaluates the efficiency of sales based on the other table.
In other words, if store B made 500 in sales today, I want to look up its target in the lookup table, divide by it to obtain the efficiency, and then graph the efficiency of each store.
Thanks.
I tried creating some measures and columns but got stuck with circular dependencies.
I expect to add one column to my main table holding an integer from 0 to 100 that shows the efficiency.

You can merge the two tables. In the query editor, go to Merge Queries > Merge Queries as New. Choose your relationship (match on the StoreName column) and merge the two tables. You will get something like this (using just a few rows of your sample data):
StoreName ActualSaleAmount ExpectedAmount
a 500 3000
a 450 3000
b 370 3500
c 400 5000
Now you can add a calculated column with your efficiency:
StoreName ActualSaleAmount ExpectedAmount Efficiency
a 500 3000 500/3000
a 450 3000 450/3000
b 370 3500 370/3500
c 400 5000 400/5000
This would be:
Efficiency = [ActualSaleAmount] / [ExpectedAmount]
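The merge-then-divide steps above can be sketched in plain Python to show the arithmetic. This is an illustration only, not Power Query or DAX; the rounded 0 to 100 percentage is an assumption based on the question's wording, since the formula itself yields a ratio.

```python
# Sample rows from the answer: (StoreName, ActualSaleAmount).
sales = [("a", 500), ("a", 450), ("b", 370), ("c", 400)]

# Lookup table: StoreName -> ExpectedAmount.
expected = {"a": 3000, "b": 3500, "c": 5000}

# Merge step: attach each store's expected amount to its sales rows.
merged = [(store, actual, expected[store]) for store, actual in sales]

# Calculated column: efficiency as a ratio, plus an assumed whole-number percentage.
with_efficiency = [
    (store, actual, target, actual / target, round(100 * actual / target))
    for store, actual, target in merged
]

percentages = [row[4] for row in with_efficiency]
```

Doing the lookup before the division keeps the calculated column dependent only on columns of the merged table.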

AWS quicksight parseInt() returns null

I'm trying to generate a QuickSight analysis with a simple .csv file. The file contains some arbitrary data like
Yifei, 24, Male, 2
Joe, 30, Male, 3
Winston, 40, Male, 7
Emily, 18, Female, 5
Wendy, 32, Female, 4
I placed the file in an S3 bucket, and then used AWS Athena to parse it into a table. The table treats all columns as strings, and I can query it properly:
SELECT * FROM users
returns
name age gender consumed
1 Yifei 24 Male 2
2 Joe 30 Male 3
3 Winston 40 Male 7
4 Emily 18 Female 5
5 Wendy 32 Female 4
Okay, so far so good. Then, in QuickSight, I imported the table as a dataset, and it is properly displayed under fields with the correct values. The only remaining problem is that age and consumed are treated as strings, not numbers. So I created two calculated fields:
age_calc: parseInt({age})
consumed_calc: parseInt({consumed})
This works just fine: under the fields I can now see the newly created fields with correct values. However, once I try to create an actual visualization (for example, a pie chart of how much everyone consumed) using the field consumed_calc, the value of consumed_calc is just null.

I found the issue. The values in the .csv file have a space after each comma, so despite the calculated fields showing the correct result in the preview, a field such as " 23" produces an error when it is actually parsed. Removing the spaces from the original .csv file solved the issue.
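One way to remove those spaces before uploading is to rewrite the file with Python's csv module. This is a minimal sketch on two of the sample rows, using in-memory strings in place of real files; it only cleans the data and does not touch QuickSight or Athena.

```python
import csv
import io

# Two rows from the question, with a space after each comma.
raw = "Yifei, 24, Male, 2\nJoe, 30, Male, 3\n"

# Default parsing keeps the space that follows each comma.
default_rows = list(csv.reader(io.StringIO(raw)))

# skipinitialspace=True drops whitespace immediately after the delimiter.
clean_rows = list(csv.reader(io.StringIO(raw), skipinitialspace=True))

# Write the cleaned rows back out as CSV text without the stray spaces.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(clean_rows)
cleaned_csv = out.getvalue()
```

With a real file, the same reader and writer calls would be applied to file objects opened with newline="".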
