I have a data set which contains data on the average price of unleaded regular gasoline (per gallon), whole large eggs (per dozen), and whole milk (per gallon). The variables in this file are year, month, price, and type of commodity.
Year Month Price Commodity
2004 1 1.592 Gas
2004 2 1.672 Gas
2005 1 1.766 Gas
2005 2 1.833 Gas
2006 1 2.009 Gas
2006 2 2.041 Gas
2004 1 1.95 Egg
2004 2 1.979 Egg
2005 1 1.97 Egg
2005 2 1.951 Egg
2006 1 2.032 Egg
2006 2 2.21 Egg
2004 1 2.879 Milk
2004 2 2.814 Milk
2005 1 2.786 Milk
2005 2 2.906 Milk
2006 1 3.374 Milk
2006 2 3.574 Milk
Can anyone help me to create a data set that contains the average price per year for each commodity?
I am able to create a data set that contains the average price per year or per commodity, but unable to calculate average price per year for each commodity.
Note: I am using SAS 9.4.
From your description, this simple code does what you requested:
proc sql;
select mean(price) as average, year, commodity
from have
group by commodity, year
order by commodity, year;
quit;
This gives the output:
average year commodity
1.9645 2004 Egg
1.9605 2005 Egg
2.121 2006 Egg
1.632 2004 Gas
1.7995 2005 Gas
2.025 2006 Gas
2.8465 2004 Milk
2.846 2005 Milk
3.474 2006 Milk
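For readers who want to check the grouping logic outside SAS, here is a minimal Python sketch of the same idea (group by commodity and year, then take the mean). The function name average_by_commodity_year is made up for illustration; the rows are the sample data from the question.

```python
from collections import defaultdict

# Sample rows from the question: (year, month, price, commodity).
rows = [
    (2004, 1, 1.592, "Gas"), (2004, 2, 1.672, "Gas"),
    (2005, 1, 1.766, "Gas"), (2005, 2, 1.833, "Gas"),
    (2006, 1, 2.009, "Gas"), (2006, 2, 2.041, "Gas"),
    (2004, 1, 1.95, "Egg"), (2004, 2, 1.979, "Egg"),
    (2005, 1, 1.97, "Egg"), (2005, 2, 1.951, "Egg"),
    (2006, 1, 2.032, "Egg"), (2006, 2, 2.21, "Egg"),
    (2004, 1, 2.879, "Milk"), (2004, 2, 2.814, "Milk"),
    (2005, 1, 2.786, "Milk"), (2005, 2, 2.906, "Milk"),
    (2006, 1, 3.374, "Milk"), (2006, 2, 3.574, "Milk"),
]


def average_by_commodity_year(rows):
    """Return {(commodity, year): mean price}, like GROUP BY commodity, year."""
    groups = defaultdict(list)
    for year, _month, price, commodity in rows:
        groups[(commodity, year)].append(price)
    # Sorting the keys mirrors ORDER BY commodity, year.
    return {key: sum(p) / len(p) for key, p in sorted(groups.items())}


averages = average_by_commodity_year(rows)
for (commodity, year), avg in averages.items():
    print(f"{avg:.4f} {year} {commodity}")
```

The key point is that the group key is the pair (commodity, year), not either variable alone.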
I need to calculate the total value of a column per employee per month. Then I need to impose a limit of 177 per employee per month. This will go into a matrix with employees as rows and months as columns. Lastly, I want to add up all the amounts per month to show the total in a line chart.
I made a measure that calculates 1% of the amount, capped at 177: if(0.01 * sum[amount] > 177, 177, 0.01 * sum[amount]). I used this measure in my matrix as described above, and that worked fine. But when I make the line chart, the limit of 177 is still imposed on the monthly total, because I use the same measure.
I tested it with some dummy data! Please do it like this:
Employee Month Amount
Jack January 1500
Joe February 20000
Joe March 1600
Jack April 1800
Brad June 10000
Jack July 9500
Joe February 9500
Brad April 6500
Jack December 12000
Joe June 8000
Brad April 9500
Jack January 1000
Jack April 1100
Jack April 8000
Joe February 12000
Joe February 12500
Joe February 13000
Brad June 15000
Brad June 16000
Here is the measure (DAX code) you need to use:
your_measure =
IF(0.01 * SUM(your_table[Amount]) > 177, 177, 0.01 * SUM(your_table[Amount]))
Then let's put it on a matrix and a line chart:
If you do not want the 177 restriction applied in the line chart, why not create another simple total measure:
= 0.01 * SUM(your_table[Amount])
Update requested from Peter
Now you need to look at the whole picture. In the line chart, Employee is not part of the filter context; the model is filtered only by month. I added both measures as legends to the line chart.
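To make the filter-context point concrete, here is a rough Python sketch using the dummy data above. It compares three monthly figures: the sum of per-employee capped values (what the question asks for), the cap applied to the whole month's total (what the capped measure returns when only Month filters it), and the uncapped 1% total. The names CAP, RATE and capped are placeholders for illustration, not part of any Power BI model.

```python
from collections import defaultdict

CAP = 177    # monthly limit per employee, from the question
RATE = 0.01  # the 1% factor used in the measure

# Dummy data from the answer: (employee, month, amount).
rows = [
    ("Jack", "January", 1500), ("Joe", "February", 20000),
    ("Joe", "March", 1600), ("Jack", "April", 1800),
    ("Brad", "June", 10000), ("Jack", "July", 9500),
    ("Joe", "February", 9500), ("Brad", "April", 6500),
    ("Jack", "December", 12000), ("Joe", "June", 8000),
    ("Brad", "April", 9500), ("Jack", "January", 1000),
    ("Jack", "April", 1100), ("Jack", "April", 8000),
    ("Joe", "February", 12000), ("Joe", "February", 12500),
    ("Joe", "February", 13000), ("Brad", "June", 15000),
    ("Brad", "June", 16000),
]


def capped(amount_sum):
    """1% of the summed amount, limited to CAP (same logic as the IF measure)."""
    return min(CAP, RATE * amount_sum)


by_employee_month = defaultdict(float)
by_month = defaultdict(float)
for employee, month, amount in rows:
    by_employee_month[(employee, month)] += amount
    by_month[month] += amount

# Cap evaluated per employee and month, then added up per month.
sum_of_capped = defaultdict(float)
for (_employee, month), total in by_employee_month.items():
    sum_of_capped[month] += capped(total)

# Cap evaluated once on the month total (Employee not in the filter context).
cap_on_month_total = {m: capped(t) for m, t in by_month.items()}

# Plain 1% total without any cap.
uncapped_total = {m: RATE * t for m, t in by_month.items()}

for month in by_month:
    print(month, round(sum_of_capped[month], 2),
          cap_on_month_total[month], round(uncapped_total[month], 2))
```

For April, the per-employee version gives 109 + 160 = 269, while the cap on the month total gives 177. In DAX, one common way to get the per-employee behaviour in a month-only context is to iterate over employees (for example with SUMX over the employee values), but that is an assumption about the model rather than something shown in this thread.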
I have two tables in Power BI: a modified date table and a fact table for customer scores. The relationship uses the "Month Num" column. Score assessments take place every June, so I would like to average the scores over 12 months (June 1 to June 30). Then I will just have a card comparing the previous year's score and the current year's score. Is there a way to do this dynamically, so I do not have to change the year in the function every new year? I know the AVERAGE function will be nested into the function somehow, but I am getting confused because this is not a calendar year, and I am not seasoned enough to use Time Intelligence functions yet.
Customer Score Table
Month    Month Num    Year    Score    Customer #
June     6            2020    94.9     11111
July     7            2020    97       11111
months   continue     2020    100
June     6            2021    89       22222
July     7            2021    91       22222
months   continue     2021    100
June     6            2022    93       33333
July     7            2022    94       33333
Date Table
Month      Month Num    Month Initial
january    1            J
feb        2            F
march      3            M
other      months       continued
I have been trying to calculate the amount of turnover on executive boards in the financial sector between 2006 and 2009.
For this I have data looking like the following:
Year Bank Director DirectorID (ISIN, RoA, Size etc)
2005 Bank1 John Smith 120
2005 Bank1 Barry Pooter 160
2005 Bank1 Jack Sparrow 2070
2006 Bank1 John Smith 120
2006 Bank1 Barry Pooter 160
2006 Bank1 Jack Sparrow 2070
2007 Bank1 John Smith 120
2007 Bank1 Barry Pooter 160
2007 Bank1 Jack Sparrow 2070
2008 Bank1 John Smith 120
2008 Bank1 Carla Jansen 250
2008 Bank1 Jack Sparrow 2070
2009 Bank1 John Smith 160
2009 Bank1 Carla Jansen 250
2009 Bank1 Mike Stata 875
And this data repeats for each bank from 2005 - 2015.
Now I have already made a turnover dummy variable with 0 = no change and 1 = change by using:
collapse (sum) DirectorID, by(ISIN Year Bank)
gen interest = inrange(Year, 2006,2009)
bysort ID interest (DirectorID) : gen temp = DirectorID[1] != DirectorID[_N]
replace temp = . if interest==0
bysort ID : egen changed = max(temp)
However, I would like turnover to be an actual count of how many changes were made. For example, if Bank2 made no changes, Turnover = 0; if Bank3 made 6 changes (6 new managers came in), Turnover = 6; and if Bank4 made 4 changes (4 new managers came in), Turnover = 4:
Bank Turnover (ISIN, RoA, Size, etc)
Bank1 2
Bank2 0
Bank3 6
Bank4 4
Is this possible with Stata (or SPSS if that happens to be the case)?
ISIN codes are my ID variable as they are linked to each specific bank.
Two new people entered the board of Bank1. For now it would show as Turnover = 2 as only 2 new people entered the organization's board. Had three people joined in the previous example, in that case Turnover = 3 as each change made to the Board counts as "+1" turnover regardless of the people leaving. Only people that join (whether they replace someone or are just an addition to the board) are of interest in my thesis.
However, this could also be calculated differently if that makes it easier; it depends on how I write my methodology. It would be fine if the turnover variable says how many changes were made per year, i.e. Turnover2005: 2005-2006, Turnover2006: 2006-2007, Turnover2007: 2007-2008 and Turnover2008: 2008-2009.
Finally, it's possible that TMTs grow, i.e. 2005 bank 1 has 14 managers on the board and in 2006 they hire 3 new managers but only let 1 go. Now the board has 16 managers and made 3 changes (3 new managers)
This might help. The following code builds a dataset with four banks and five years. It is panel data. The xtset command lets you use time-series operators, which are documented here (https://www.youtube.com/watch?v=ik8r4WvrPkc). (Note: for the sake of clear exposition, in this example Bank 1 had no changes, Bank 2 had two changes, Bank 3 had three, and Bank 4 had four.)
// Clear the session and other memory.
set more off
clear all
// Input reproducible data.
input year bank_num ceo_num
2005 1 200
2006 1 200
2007 1 200
2008 1 200
2009 1 200
2005 2 222
2006 2 222
2007 2 222
2008 2 333
2009 2 444
2005 3 300
2006 3 301
2007 3 302
2008 3 302
2009 3 303
2005 4 999
2006 4 888
2007 4 777
2008 4 666
2009 4 555
end
// Declare the panel structure.
xtset bank_num year
// Gen variable indicating if ceo_num stayed same.
// Resulting variable is 0 when there was no change.
gen no_turn = (ceo_num - f1.ceo_num)
// Gen dummy to indicate if ceo_num changed.
gen is_turn = (no_turn != 0 & no_turn < .)
// Gen a variable that counts changes.
egen turn_nums = total(is_turn), by(bank_num)
// List data to inspect results.
list
Edit: Re-characterized comment for no_turn variable.
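The example above tracks one ID per bank and year. Since the question is about boards with several directors and only counts people who join, here is a hedged Python sketch of that counting rule: for each year in the window, compare the board with the previous year's board and count the members who were not there before. The function name count_joiners and the default window are illustrative choices. Directors are matched by name, using the Bank1 rows from the question.

```python
from collections import defaultdict

# Bank1 rows from the question: (year, bank, director).
records = [
    (2005, "Bank1", "John Smith"), (2005, "Bank1", "Barry Pooter"),
    (2005, "Bank1", "Jack Sparrow"),
    (2006, "Bank1", "John Smith"), (2006, "Bank1", "Barry Pooter"),
    (2006, "Bank1", "Jack Sparrow"),
    (2007, "Bank1", "John Smith"), (2007, "Bank1", "Barry Pooter"),
    (2007, "Bank1", "Jack Sparrow"),
    (2008, "Bank1", "John Smith"), (2008, "Bank1", "Carla Jansen"),
    (2008, "Bank1", "Jack Sparrow"),
    (2009, "Bank1", "John Smith"), (2009, "Bank1", "Carla Jansen"),
    (2009, "Bank1", "Mike Stata"),
]


def count_joiners(records, first_year=2006, last_year=2009):
    """Count directors who appear in a year but not in the year before."""
    boards = defaultdict(set)  # (bank, year) -> set of directors
    for year, bank, director in records:
        boards[(bank, year)].add(director)

    banks = {bank for bank, _year in boards}
    turnover = {bank: 0 for bank in banks}
    for bank in banks:
        for year in range(first_year, last_year + 1):
            prev = boards.get((bank, year - 1))
            curr = boards.get((bank, year))
            if prev is None or curr is None:
                continue  # skip years where one side is not observed
            # Set difference: members present now but absent last year.
            turnover[bank] += len(curr - prev)
    return turnover


print(count_joiners(records))
```

Because only joiners are counted, a board that grows (three hires, one departure) adds 3 to the total, which is the behaviour described in the question. People who leave do not affect the count.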
I have a dataset covering a number of companies, with a variable for each firm's number of employees. In some years the number of employees has not been reported, so some years appear blank while the years before and after contain a value.
The data is similar to:
COMPANY YEAR NO. EMPLOYEES
Company 1 2007 4
Company 1 2008 5
Company 1 2009 5
Company 1 2010 5
Company 2 2007 11
Company 2 2008 10
Company 2 2009
Company 2 2010 10
Company 3 2007 3
Company 3 2008 4
Company 3 2009
Company 3 2010 3
I would like to search the dataset for any such occurrences, create an indicator for these years, and afterwards replace each blank with the value from the year before. If there is no previous year to use as a replacement, or the previous year is also blank, I would use the year after the blank instead. I am hoping for the dataset to look like:
COMPANY YEAR NO. EMPLOYEES
Company 1 2007 4
Company 1 2008 5
Company 1 2009 5
Company 1 2010 5
Company 2 2007 11
Company 2 2008 10
Company 2 2009 10
Company 2 2010 10
Company 3 2007 3
Company 3 2008 4
Company 3 2009 4
Company 3 2010 3
To sum up, I first need to check whether I have a problem with missing values in between two years (it is important that the code does not replace missing values before the first or after the last year with a non-missing value, since some firms exit the sample). Next, if there are blank years between two non-blank years, I would like to replace these blanks as described above.
The method I would use:
1. Sort the dataset company/year.
2. Replace missing values using LAG function if the missing value is not the first observation of the company group.
3. Reverse the sort order
4. Repeat step 2 on the dataset with reversed order
5. Return the dataset to the original order
Please note, I have changed your original data for Company 3 in order to have a case for your second scenario (missing value, no previous record).
DATA HAVE;
    /* Column input: the data lines must be aligned to these column positions. */
    input COMPANY $ 1-10 YEAR 13-17 N_EMPLOYEES 24-27;
    datalines;
Company 1   2007       4
Company 1   2008       5
Company 1   2009       5
Company 1   2010       5
Company 2   2007       11
Company 2   2008       10
Company 2   2009
Company 2   2010       10
Company 3   2007
Company 3   2008       3
Company 3   2009       4
Company 3   2010       3
;
run;
PROC SORT DATA=HAVE
OUT=DOSOMEWORKHERE;
BY COMPANY YEAR;
RUN;
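/* Forward pass: fill a blank with the previous year's value within a company. */
DATA DOSOMEWORKHERE (drop=PREV_N_EMPLOYEES);
    set DOSOMEWORKHERE;
    by COMPANY;
    PREV_N_EMPLOYEES = LAG(N_EMPLOYEES);
    /* Do not carry a value over from the previous company. */
    if first.COMPANY then
    do;
        PREV_N_EMPLOYEES = .;
    end;
    if N_EMPLOYEES = . then N_EMPLOYEES = PREV_N_EMPLOYEES;
run;
/* Reverse the sort order so that LAG now looks at the following year. */
PROC SORT DATA=DOSOMEWORKHERE
    OUT=DOSOMEWORKHERE;
    BY DESCENDING COMPANY DESCENDING YEAR ;
RUN;
/* Backward pass: fill remaining blanks with the next year's value. */
DATA DOSOMEWORKHERE (drop=PREV_N_EMPLOYEES);
    set DOSOMEWORKHERE;
    by DESCENDING COMPANY;
    PREV_N_EMPLOYEES = LAG(N_EMPLOYEES);
    if first.COMPANY then
    do;
        PREV_N_EMPLOYEES = .;
    end;
    if N_EMPLOYEES = . then N_EMPLOYEES = PREV_N_EMPLOYEES;
run;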
PROC SORT DATA=DOSOMEWORKHERE
OUT=WANT;
BY COMPANY YEAR;
RUN;
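As a language-neutral illustration of the five steps, here is a small Python sketch that does a forward pass and then a backward pass within each company. It uses the modified data from this answer, with None for blanks. The function name fill_gaps is made up for this example. The sketch carries the last seen value across a run of consecutive blanks, which is a simplifying assumption of mine. Like the steps above, it also fills blanks at the start or end of a company's series, so it would need an extra check if those must stay blank.

```python
from itertools import groupby

# Modified sample data from the answer: (company, year, employees or None).
rows = [
    ("Company 1", 2007, 4), ("Company 1", 2008, 5),
    ("Company 1", 2009, 5), ("Company 1", 2010, 5),
    ("Company 2", 2007, 11), ("Company 2", 2008, 10),
    ("Company 2", 2009, None), ("Company 2", 2010, 10),
    ("Company 3", 2007, None), ("Company 3", 2008, 3),
    ("Company 3", 2009, 4), ("Company 3", 2010, 3),
]


def fill_gaps(rows):
    """Fill blanks from the previous year, then from the next year."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))  # step 1: sort
    filled = []
    for _company, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        values = [r[2] for r in group]

        # Step 2: forward pass, never crossing a company boundary.
        last = None
        for i, value in enumerate(values):
            if value is None:
                values[i] = last
            else:
                last = value

        # Steps 3-4: the same pass in reverse order.
        following = None
        for i in range(len(values) - 1, -1, -1):
            if values[i] is None:
                values[i] = following
            else:
                following = values[i]

        # Step 5: rows are emitted in the original company/year order.
        filled.extend((c, y, v) for (c, y, _old), v in zip(group, values))
    return filled


want = fill_gaps(rows)
for row in want:
    print(row)
```

Company 2's 2009 blank is filled from 2008, and Company 3's 2007 blank, which has no previous record, is filled from 2008.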
Quick question. I'm working with code that produces a spreadsheet that contains the information like the following:
year business sales profit
2001 a 5 3
2002 a 6 4
2003 a 4 2
2001 b 2 1
2002 b 6 3
2003 b 7 5
How can I get Stata to total sales and profits across years?
Thanks
Try
collapse (sum) sales profit, by(year)
or, if you want to retain your original data,
bysort year: egen tot_sales = total(sales)
egen stands for extended generate, a very useful command.
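To show the difference between the two commands, here is a short Python sketch on the sample data. The first result has one row per year (like collapse). The second keeps every original row and attaches the yearly totals (like egen with bysort). The names totals_by_year, collapsed and with_totals are illustrative only. Note that the sketch attaches both totals, whereas the egen line above creates only tot_sales.

```python
from collections import defaultdict

# Sample rows from the question: (year, business, sales, profit).
rows = [
    (2001, "a", 5, 3), (2002, "a", 6, 4), (2003, "a", 4, 2),
    (2001, "b", 2, 1), (2002, "b", 6, 3), (2003, "b", 7, 5),
]


def totals_by_year(rows):
    """Return {year: (total sales, total profit)}."""
    totals = defaultdict(lambda: [0, 0])
    for year, _business, sales, profit in rows:
        totals[year][0] += sales
        totals[year][1] += profit
    return {year: tuple(t) for year, t in sorted(totals.items())}


# collapse-style result: one entry per year, original rows are gone.
collapsed = totals_by_year(rows)

# egen-style result: original rows kept, yearly totals appended to each.
with_totals = [row + collapsed[row[0]] for row in rows]

print(collapsed)
for row in with_totals:
    print(row)
```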