Populate df row value based on column header - regex

I'd appreciate any help. I have a poorly structured data set and am trying to make it more useful.
Below is a representative example:
import pandas as pd

df = pd.DataFrame({'State': ("Texas","California","Florida"),
                   'Q1 Computer Sales': (100,200,300),
                   'Q1 Phone Sales': (400,500,600),
                   'Q1 Backpack Sales': (700,800,900),
                   'Q2 Computer Sales': (200,200,300),
                   'Q2 Phone Sales': (500,500,600),
                   'Q2 Backpack Sales': (800,800,900)})
I would like a df with a separate column for the quarter and a separate sales column for each item, for each state.
I suspect this needs regex, str.contains, or loops?

IIUC, you can use:
df_a = df.set_index('State')
df_a.columns = pd.MultiIndex.from_arrays(zip(*df_a.columns.str.split(' ', n=1)))
df_a.stack(0).reset_index()
Output:
        State level_1  Backpack Sales  Computer Sales  Phone Sales
0       Texas      Q1             700             100          400
1       Texas      Q2             800             200          500
2  California      Q1             800             200          500
3  California      Q2             800             200          500
4     Florida      Q1             900             300          600
5     Florida      Q2             900             300          600
Or we can go further:
df_a = df.set_index('State')
df_a.columns = pd.MultiIndex.from_arrays(zip(*df_a.columns.str.split(' ', n=1)), names=['Quarters','Items'])
df_a = df_a.stack(0).reset_index()
df_a['Quarters'] = df_a['Quarters'].str.extract(r'(\d+)', expand=False)
print(df_a)
Output:
Items       State Quarters  Backpack Sales  Computer Sales  Phone Sales
0           Texas        1             700             100          400
1           Texas        2             800             200          500
2      California        1             800             200          500
3      California        2             800             200          500
4         Florida        1             900             300          600
5         Florida        2             900             300          600
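To see what the header-splitting step does on its own, here is a minimal plain-Python sketch (no pandas) applied to three of the headers above. It mirrors the split on the first space only, the zip(*...) transpose that yields one sequence per index level, and the (\d+) extraction. The variable names are illustrative and not part of the answer.

```python
import re

headers = ["Q1 Computer Sales", "Q1 Phone Sales", "Q2 Backpack Sales"]

# Split each header once, on the first space only (the n=1 in str.split).
pairs = [h.split(" ", 1) for h in headers]

# zip(*pairs) transposes the pairs into one tuple per MultiIndex level.
quarters, items = zip(*pairs)

# Same pattern as the str.extract call: keep the first run of digits.
quarter_numbers = [re.search(r"(\d+)", q).group(1) for q in quarters]

print(quarters)
print(items)
print(quarter_numbers)
```

The extracted quarter numbers are still strings; convert them with int() (or astype(int) in pandas) if numeric quarters are needed.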

Minimum per subgroup in stata

In Stata, I want to calculate the minimum and maximum for subgroups per country and year, with the result stored in every observation.
Ultimately, I want to have the difference between min and max as a separate variable.
Here is an example for my dataset:
country  year  oranges  type
USA      2021      100     1
USA      2021      200     0
USA      2021      900     0
USA      2022      500     1
USA      2022      300     0
Canada   2022      300     0
Canada   2022      400     1
The results should look like this:
country  year  oranges  type  min(type=1)  max(type=0)  distance
USA      2021      100     1          100          900       800
USA      2021      200     0          100          900       800
USA      2021      900     0          100          900       800
USA      2022      500     1          500          300      -200
USA      2022      300     0          500          300      -200
Canada   2022      300     0          400          300      -100
Canada   2022      400     1          400          300      -100
So far, I tried the following code:
bysort year country: egen smalloranges = min(oranges) if type == 1
bysort year country: egen bigoranges = max(oranges) if type == 0
gen distance = bigoranges - smalloranges
I would approach this directly, as follows:
* Example generated by -dataex-. For more info, type help dataex
clear
input str6 country int(year oranges) byte type
"USA" 2021 100 1
"USA" 2021 200 0
"USA" 2021 900 0
"USA" 2022 500 1
"USA" 2022 300 0
"Canada" 2022 300 0
"Canada" 2022 400 1
end
egen min = min(cond(type == 1, oranges, .)), by(country year)
egen max = max(cond(type == 0, oranges, .)), by(country year)
gen wanted = max - min
list, sepby(country year)
     +------------------------------------------------------+
     | country   year   oranges   type   min   max   wanted |
     |------------------------------------------------------|
  1. |     USA   2021       100      1   100   900      800 |
  2. |     USA   2021       200      0   100   900      800 |
  3. |     USA   2021       900      0   100   900      800 |
     |------------------------------------------------------|
  4. |     USA   2022       500      1   500   300     -200 |
  5. |     USA   2022       300      0   500   300     -200 |
     |------------------------------------------------------|
  6. |  Canada   2022       300      0   400   300     -100 |
  7. |  Canada   2022       400      1   400   300     -100 |
     +------------------------------------------------------+
For more discussion, see Section 9 of https://www.stata-journal.com/article.html?article=dm0055
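The trick above is that cond(type == 1, oranges, .) turns every non-matching observation into missing, which egen's min() and max() ignore, so each statistic is computed from the matching rows only but written to every row of the group. For readers who want to check that logic outside Stata, here is a minimal Python sketch under that reading. It is not Stata code, and the names group_min, group_max and result are illustrative.

```python
rows = [
    ("USA", 2021, 100, 1),
    ("USA", 2021, 200, 0),
    ("USA", 2021, 900, 0),
    ("USA", 2022, 500, 1),
    ("USA", 2022, 300, 0),
    ("Canada", 2022, 300, 0),
    ("Canada", 2022, 400, 1),
]

group_min = {}  # smallest oranges among type == 1 rows, per (country, year)
group_max = {}  # largest oranges among type == 0 rows, per (country, year)
for country, year, oranges, kind in rows:
    key = (country, year)
    if kind == 1:
        group_min[key] = min(oranges, group_min.get(key, oranges))
    elif kind == 0:
        group_max[key] = max(oranges, group_max.get(key, oranges))

# Write the group statistics back to every row; None plays the role of
# Stata's missing value when a group has no row of the required type.
result = []
for country, year, oranges, kind in rows:
    lo = group_min.get((country, year))
    hi = group_max.get((country, year))
    wanted = None if lo is None or hi is None else hi - lo
    result.append((country, year, oranges, kind, lo, hi, wanted))

for row in result:
    print(row)
```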
I am not sure if I understand the purpose of type 1 and 0, but this generates the exact result you describe in the tables. It might seem convoluted to create temporary files like this, but I think it modularizes the code into clean blocks.
* Example generated by -dataex-. For more info, type help dataex
clear
input str6 country int(year oranges) byte type
"USA" 2021 100 1
"USA" 2021 200 0
"USA" 2021 900 0
"USA" 2022 500 1
"USA" 2022 300 0
"Canada" 2022 300 0
"Canada" 2022 400 1
end
tempfile min1 max0
* Get min values for type 1 in each country-year
preserve
keep if type == 1
collapse (min) min_type_1=oranges , by(country year)
save `min1'
restore
* Get max values for type 0 in each country-year
preserve
keep if type == 0
collapse (max) max_type_0=oranges , by(country year)
save `max0'
restore
* Merge the min and the max
merge m:1 country year using `min1', nogen
merge m:1 country year using `max0', nogen
* Calculate distance
gen distance = max_type_0 - min_type_1

Power BI - Showing Top 5 records in Matrix Table but total should show for all records

I have a table with thousands of records. I want to create a table visual that shows the top 5 records for each category. I created a measure to achieve this and I am getting exactly the result I am looking for, except for one issue.
See the image below, where I am showing the top 5 records for each category; after each category there is a total.
I don't want that total to cover only the top 5 records shown in the table; instead, I want the total of all the records under each category.
How can I achieve that?
The measure I created is: Top 5 = RankX(AllSelected(table(Category), Table(account), table(name)),amount_measure,,,Dense)
On the Top 5 measure I apply a filter to keep only the top 5.
Category  Account  Name  P%   amount  country  owner
Food      A101     AA11  10%     105  India    A
Food      A102     AA12  20%     120  India    A
Food      A103     AA13  80%     100  India    A
Food      A104     AA14  30%     150  India    A
Food      A105     AA15  60%      90  India    A
Stat      B101     AA11  10%     205  India    A
Stat      B102     AA12  20%     220  India    A
Stat      B103     AA13  80%     200  India    A
Stat      B104     AA14  30%     250  India    A
Stat      B105     AA15  60%     190  India    A
Admn      D101     AD11  10%     305  India    A
Admn      D102     AD12  20%     320  India    A
Admn      D103     AD13  80%     300  India    A
Admn      D104     AD14  30%     350  India    A
Admn      D105     AD15  60%     290  India    A
Thanks,
SK
You can try this
Let's suppose you have the following measures
_sumAMT:= SUM('Table 1'[amount])
and this is your ranking measure
_sumAMTRank:= RANKX(ALLEXCEPT('Table 1','Table 1'[Category]),[_sumAMT],,DESC,Dense)
You can revise the subtotal by doing this
_sumAMT by CAT:= CALCULATE(SUM('Table 1'[amount]),ALLEXCEPT('Table 1','Table 1'[Category]))
_revisedTotal:= IF(HASONEVALUE('Table 1'[Name])=true(),[_sumAMT],[_sumAMT by CAT])
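The idea behind _revisedTotal is that detail rows keep their own amount while the subtotal row ignores the top-N filter and sums the whole category. As a rough illustration of that behaviour outside DAX, here is a plain-Python sketch on the Food rows from the question. It assumes a top 3 instead of a top 5 (the sample has only five rows per category, so a top 5 would hide nothing), and the function name is hypothetical.

```python
# Food rows from the question: account -> amount
food = {"A101": 105, "A102": 120, "A103": 100, "A104": 150, "A105": 90}

def top_n_with_full_total(amounts, n):
    """Return the rows whose dense rank is <= n, plus the total of ALL rows."""
    # Dense rank: equal amounts share a rank and no rank numbers are skipped.
    distinct = sorted(set(amounts.values()), reverse=True)
    dense_rank = {amount: position + 1 for position, amount in enumerate(distinct)}
    ordered = sorted(amounts.items(), key=lambda kv: kv[1], reverse=True)
    visible = [(account, amount) for account, amount in ordered
               if dense_rank[amount] <= n]
    # The total deliberately ignores the rank filter.
    total = sum(amounts.values())
    return visible, total

visible, total = top_n_with_full_total(food, 3)
print(visible)
print(total, sum(amount for _, amount in visible))
```

Here the visible rows add up to 375, while the reported total is 565, the sum of all five Food rows.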

How to plot % Running Difference in Google Data Studio?

I'm creating charts in Google Data Studio, but I'm unable to perform row-wise aggregations. How would I plot bar/line charts of the "% diff" of a specific column/value?
% diff = ((Today - Previous day) / Today) * 100
For example, I have the table shown below with two columns, DATE and VALUE, and I want to plot the date-wise % diff of VALUE, i.e. for 02-01-2020 I want to plot (900-400)/900 * 100 ≈ 55.6 %, for 03-01-2020 I want to plot (200-900)/200 * 100 = -350 %, and similarly for all the dates.
DATE VALUE
01-01-2020 400
02-01-2020 900
03-01-2020 200
04-01-2020 700
05-01-2020 400
06-01-2020 800
07-01-2020 900
08-01-2020 500
09-01-2020 800
10-01-2020 900
11-01-2020 600
12-01-2020 400
13-01-2020 400
14-01-2020 200
15-01-2020 300
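The formula itself can be checked with a minimal Python sketch on the first four rows of the table. This does not show how to build the chart inside Data Studio; it only illustrates the row-wise computation, for instance as a pre-processing step before the data is loaded. The function name pct_diff is illustrative, and returning None for the first date and for a zero VALUE is an assumption about how those cases should be handled.

```python
values = [
    ("01-01-2020", 400),
    ("02-01-2020", 900),
    ("03-01-2020", 200),
    ("04-01-2020", 700),
]

def pct_diff(series):
    """Return (date, % diff) pairs using (today - previous) / today * 100."""
    out = []
    previous = None
    for date, today in series:
        if previous is None or today == 0:
            # No previous day, or division by zero: leave the value undefined.
            out.append((date, None))
        else:
            out.append((date, (today - previous) / today * 100))
        previous = today
    return out

result = pct_diff(values)
for date, pct in result:
    print(date, pct)
```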

Django ORM QUERY Adjacent row sum with sqlite

In my database I'm storing data as below:
id amt
-- -------
1 100
2 -50
3 100
4 -100
5 200
I want to get output like below
id amt balance
-- ----- -------
1 100 100
2 -50 50
3 100 150
4 -100 50
5 200 250
How can I do this with the Django ORM?
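The balance column is a running sum of amt ordered by id, which SQL expresses as a window function. Here is a minimal sketch using Python's built-in sqlite3 module rather than Django itself, so it runs without a project. It assumes the bundled SQLite is version 3.25 or newer (window functions are not available before that), and the table name entry is hypothetical. The trailing comment shows what the equivalent ORM call could look like for a hypothetical model named Entry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, amt INTEGER)")
conn.executemany(
    "INSERT INTO entry (id, amt) VALUES (?, ?)",
    [(1, 100), (2, -50), (3, 100), (4, -100), (5, 200)],
)

# SUM(...) OVER (ORDER BY id) accumulates amt from the first row to the current one.
rows = conn.execute(
    "SELECT id, amt, SUM(amt) OVER (ORDER BY id) AS balance "
    "FROM entry ORDER BY id"
).fetchall()

for row in rows:
    print(row)

# Possible Django equivalent, assuming a model Entry with an amt field:
#   from django.db.models import F, Sum, Window
#   Entry.objects.annotate(
#       balance=Window(expression=Sum("amt"), order_by=F("id").asc())
#   )
```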

PowerBI - Average and Variance Calculation with conditions

I am trying to calculate Variance and Average in PowerBI. I am running into Circular dependency errors.
This is my Data,
Month Year Item Count
1 2017 Chair 100
1 2017 Chair 200
1 2017 Chair 300
1 2017 Bench 110
1 2017 Bench 140
1 2017 Bench 150
2 2017 Chair 180
2 2017 Chair 190
2 2017 Chair 250
2 2017 Bench 270
2 2017 Bench 370
3 2017 Chair 120
3 2017 Chair 150
3 2017 Bench 180
3 2017 Bench 190
4 2017 Chair 200
4 2017 Chair 210
4 2017 Bench 220
4 2017 Bench 230
.
.
.
Average = Sum of Counts for the Previous 3 months / 3
Variance = (Average - Sum(CurrentMonth)) / Sum(CurrentMonth)
So, because the average won't be meaningful for the first 3 months, I wouldn't be worried about that.
Expected Output,
Month  Year  Item   Sum(CurrentMonth)  Average  Variance
1
1
2
2
3
3
4      2017  Chair  410                497      0.21
4      2017  Bench  x                  y        z
Let's say, for Chair:
Sum of Current Month = 200 + 210 = 410
Average of Last 3 Months = (100 + 200 + 300 + 180 + 190 + 250 + 120 + 150 )/ 3 = 1490 / 3 = 497
Variance = (497 - 410) / 410 = 87 / 410 = 0.21
Kindly share your thoughts.
I started with this as Table1 (I added a couple months data to yours):
I loaded it into Power BI and added a column called "YearMonth" using this code:
YearMonth = Table1[Year]&FORMAT(Table1[Month],"00")
Then I added another column called "Sum(CurrentMonth)" using this code:
Sum(CurrentMonth) = SUMX(FILTER(FILTER(Table1,Table1[Item]=EARLIER(Table1[Item])),VALUE(Table1[YearMonth])=VALUE(EARLIER(Table1[YearMonth]))),Table1[Count])
Then I added another column called "Average" using this code:
Average = SUMX(FILTER(FILTER(FILTER(Table1,Table1[Item]=EARLIER(Table1[Item])),VALUE(Table1[YearMonth])<=VALUE(EARLIER(Table1[YearMonth]))-1),VALUE(Table1[YearMonth])>=VALUE(EARLIER(Table1[YearMonth]))-3),Table1[Count])/3
Lastly, I added a column called "Variance" using this code:
Variance = (Table1[Average]-Table1[Sum(CurrentMonth)])/Table1[Sum(CurrentMonth)]
I hope this helps you.
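To check the arithmetic independently of Power BI, here is a minimal Python sketch of the same three steps on the Chair rows from the question. The function and variable names are illustrative. One deliberate difference: it counts months as year * 12 + month, because subtracting from a YYYYMM number does not cross a year boundary (201801 - 1 is 201800, not 201712).

```python
# (month, year, item, count) rows for Chair, copied from the question
rows = [
    (1, 2017, "Chair", 100), (1, 2017, "Chair", 200), (1, 2017, "Chair", 300),
    (2, 2017, "Chair", 180), (2, 2017, "Chair", 190), (2, 2017, "Chair", 250),
    (3, 2017, "Chair", 120), (3, 2017, "Chair", 150),
    (4, 2017, "Chair", 200), (4, 2017, "Chair", 210),
]

# Sum of counts per (item, month index).
monthly = {}
for month, year, item, count in rows:
    key = (item, year * 12 + month)
    monthly[key] = monthly.get(key, 0) + count

def average_and_variance(item, year, month):
    """Return (current month sum, previous-3-month total / 3, variance)."""
    idx = year * 12 + month
    current = monthly.get((item, idx), 0)
    average = sum(monthly.get((item, idx - k), 0) for k in (1, 2, 3)) / 3
    variance = (average - current) / current
    return current, average, variance

current, average, variance = average_and_variance("Chair", 2017, 4)
print(current, round(average), round(variance, 2))
```

For Chair in month 4 this reproduces the figures worked out in the question: 410, about 497, and about 0.21.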