Setting up negative to positive counts for time - stata

I have a data set with monthly observations, and each person started a job in a different month. For example:
person date date_started date_count
Tim 1/1/2000 3/1/2000 -2
Tim 2/1/2000 3/1/2000 -1
Tim 3/1/2000 3/1/2000 0
John 1/1/2000 7/1/2000 -6
John 2/1/2000 7/1/2000 -5
John 3/1/2000 7/1/2000 -4
John 4/1/2000 7/1/2000 -3
John 5/1/2000 7/1/2000 -2
John 6/1/2000 7/1/2000 -1
John 7/1/2000 7/1/2000 0
John 8/1/2000 7/1/2000 1
John 9/1/2000 7/1/2000 2
John 10/1/2000 7/1/2000 3
Mary 3/1/2000 3/1/2000 0
Mary 4/1/2000 3/1/2000 1
What is the most efficient way to get the date_count column? I also have a column that is 1 in each person's first month and 0 otherwise, and I would rather use that to build date_count.

I don't understand what the difficulty is here. The question seems poorly explained to me.
You mention months, but your example shows daily dates, so the role of months in the problem is a mystery.
The variable you want is just the difference between two daily dates. So long as you have two daily date variables (Dimitriy explains how to get those from string dates), it is just a subtraction.
(Added later) My uncertainty shows what happens when one assumes on an international list that local conventions are universal. There are two conventions easily confused, showing dates as day/month/year and showing dates as month/day/year. Evidently you are using the second convention. If so, the problem is to convert from daily dates to monthly dates using mofd(); then as said it is a subtraction.

I don't know if this is the optimal way, but I think it should work:
/* convert your dates from strings to Stata's date format */
gen date2 = daily(date, "MDY")
gen date_started2 = daily(date_started, "MDY")
format date2 date_started2 %td
/* this is the main code */
gen before = date_started2 > date2
bys person before: egen date_count2 = rank(abs(date_started2 - date2))
replace date_count2 = date_count2 - 1 if before == 0
replace date_count2 = -date_count2 if before == 1
drop before
Edit:
Mea culpa. I completely misunderstood your question to mean that you wanted a countdown to start date for each person-observation event. You actually want something much simpler:
gen date_count2 = mofd(daily(date, "MDY")) - mofd(daily(date_started, "MDY"))
This assumes that date and date_started are stored as string variables. daily() converts them to Stata daily dates, and mofd() converts those to monthly dates. Then it's just the difference.
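For readers who want to check the arithmetic outside Stata, here is a minimal Python sketch of the same idea: turn each month/day/year string into a running month number and subtract. The function names month_index and date_count are hypothetical, and the month origin differs from Stata's (which counts from January 1960), but the origin cancels in the difference.

```python
from datetime import datetime


def month_index(date_string):
    """Running month number for an M/D/YYYY string (assumed format)."""
    parsed = datetime.strptime(date_string, "%m/%d/%Y")
    return parsed.year * 12 + (parsed.month - 1)


def date_count(date, date_started):
    """Months between the observation month and the start month."""
    return month_index(date) - month_index(date_started)


# A few rows taken from the example table in the question.
rows = [
    ("Tim", "1/1/2000", "3/1/2000"),
    ("John", "10/1/2000", "7/1/2000"),
    ("Mary", "3/1/2000", "3/1/2000"),
]
for person, date, date_started in rows:
    print(person, date, date_started, date_count(date, date_started))
```

Because only the year and month enter the calculation, the day of the month is ignored, which matches the monthly interpretation of the data.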

Related

PowerBI DAX - Sum table by criteria and date

I am relatively new to PowerBI/PowerQuery/DAX and have become stuck on the following problem. I am unsure which road to go down to get the best outcome and would appreciate any help.
My data table is connected to a time tracking application. A User enters a time entry every time they complete a task. The task can be either a Project task or an Admin task. Each of these has multiple sub-categories beneath it, each with its own ID. This translates to my table as follows:
User ProjectID AdminID Hours Date
John 1 2 01/01/22
John 11 1 01/01/22
John 4 1 01/01/22
John 12 3 01/01/22
John 13 1 01/01/22
Pete 7 1 01/01/22
Pete 2 4 01/01/22
Pete 3 2 01/01/22
Mike 1 6 01/01/22
Mike 9 1 01/01/22
Mike 10 1 01/01/22
My objective is, for each Date in the table, to calculate the total hours spent doing either Project tasks or Admin tasks. I am not concerned about the specific breakdown (i.e., the sum for each unique ID), only the overall total. The above example covers just one day; in reality my data covers multiple years. My expected output will look like this:
User TotalProject TotalAdmin Date
John 3 5 01/01/22
John 3 4 01/02/22
John 5 2 01/03/22
Pete 5 1 01/01/22
Pete 1 8 01/02/22
Pete 6 2 01/03/22
Mike 6 2 01/01/22
Mike 6 1 01/02/22
Mike 7 2 01/03/22
I am unsure of the best method to achieve this: should I create a column in the table through PowerQuery, or a calculated column using DAX? And if so, what would the SUM syntax look like?
I am very willing to learn, so any tips would be greatly appreciated!

For your sample input, just create 2 measures.
Total Admin = CALCULATE( SUM('Table'[Hours]), NOT(ISBLANK('Table'[AdminID])))
Total Project = CALCULATE( SUM('Table'[Hours]), NOT(ISBLANK('Table'[ProjectID])))
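The logic of the two measures can be sketched in plain Python: hours count toward a total only when the matching ID is not blank, and the grouping by User and Date (which a Power BI visual supplies through filter context) is written out explicitly. The rows below are hypothetical, because the sample table lost its blank cells and it is unclear which ID column each value belongs to; the function name is also an assumption.

```python
from collections import defaultdict

# Hypothetical rows: (user, project_id, admin_id, hours, date).
# None marks a blank ID, which is what ISBLANK tests for in the measures.
entries = [
    ("John", 1, None, 2, "01/01/22"),
    ("John", None, 11, 1, "01/01/22"),
    ("John", 4, None, 1, "01/01/22"),
    ("Pete", None, 7, 1, "01/01/22"),
    ("Pete", 2, None, 4, "01/01/22"),
]


def totals_by_user_and_date(rows):
    """Sum hours per (user, date), split by which ID column is filled."""
    totals = defaultdict(lambda: {"TotalProject": 0, "TotalAdmin": 0})
    for user, project_id, admin_id, hours, date in rows:
        key = (user, date)
        if project_id is not None:
            totals[key]["TotalProject"] += hours
        if admin_id is not None:
            totals[key]["TotalAdmin"] += hours
    return dict(totals)


result = totals_by_user_and_date(entries)
for (user, date), sums in result.items():
    print(user, date, sums["TotalProject"], sums["TotalAdmin"])
```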

Using Lookup or Index - If a certain placing, then place the name

I would like to show the name of the competitor who placed first. In different cells, I would like the same for second place through fifth place.
The reason is that there are 27 divisions, each on a different worksheet. Having the top five placings for every division on one sheet would make things easier for the announcer and for passing out trophies.
I am unable to provide a picture until I have a rep of 10. Therefore, the data is provided below.
Thank you so much for your time and help!
Column B
Competitor Name
Brown, Sam
Simmons, Donald
Smith, John
Doe, John
Lee, Joe
Smith, Joey
Smith, Joey
Smith, Joey
Column C
Placings
5
4
2
6
8
7
1
3

I figured out the formula, but beforehand I had to make sure the placings were sorted in ascending order:
=LOOKUP(1,C1:C8,B1:B8)
Formula returned - Smith, Joey
=LOOKUP(2,C1:C8,B1:B8)
Formula returned - Smith, John

I figured out another formula so the numbers do not need to be in any particular order:
=INDEX(B1:B8,MATCH(1,C1:C8,0),1)
Formula returned - Smith, Joey
=INDEX(B1:B8,MATCH(2,C1:C8,0),1)
Formula returned - Smith, John
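As an illustration of why the INDEX/MATCH version does not need sorted data, here is a small Python sketch using the names and placings from the question. list.index plays the role of MATCH with exact matching, and indexing the names list plays the role of INDEX. The function name name_for_placing is hypothetical.

```python
# Data copied from the question, in the original (unsorted) order.
names = [
    "Brown, Sam", "Simmons, Donald", "Smith, John", "Doe, John",
    "Lee, Joe", "Smith, Joey", "Smith, Joey", "Smith, Joey",
]
placings = [5, 4, 2, 6, 8, 7, 1, 3]


def name_for_placing(placing):
    """Return the competitor whose placing equals the requested one."""
    # Exact-match search: position of the first equal value,
    # raising ValueError if the placing is absent.
    position = placings.index(placing)
    return names[position]


top_five = [name_for_placing(place) for place in range(1, 6)]
for place, name in enumerate(top_five, start=1):
    print(place, name)
```

An exact-match search scans for equality, so row order does not matter; the LOOKUP formula instead assumes the lookup column is sorted in ascending order.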

Turn columns to rows in pandas

I have a dataframe with the names of newborn babies per year.
"name","1998","1999","2000","2001","2002","2003","2004","2005","2006","2007","2008","2009","2010","2011","2012","2013"
"Aicha",0,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0
"Aida",15,20,16,0,10,0,10,14,13,11,12,11,13,14,13,18
"Aina",0,0,0,0,0,0,16,12,15,13,12,14,10,11,0,12
"Aisha",14,0,10,12,15,13,28,33,26,26,52,44,43,54,68,80
"Ajla",15,10,0,0,22,18,28,27,26,26,19,16,19,22,17,27
"Alba",0,0,14,14,22,14,17,19,23,15,28,32,25,33,33,33
I want to plot this in a line chart, where each line is a different name and the x axis is the years. In order to do that, I imagine I need to reshape the data into something like this:
"name","year","value"
"Aicha","1998",0
"Aicha","1999",0
"Aicha","2000",0
...
How do I reshape the data in that fashion? I've tried pivot_table but I can't seem to get it to work.

You could use pd.melt:
>>> df_melted = pd.melt(df, id_vars="name", var_name="year")
>>> df_melted.head()
name year value
0 Aicha 1998 0
1 Aida 1998 15
2 Aina 1998 0
3 Aisha 1998 14
4 Ajla 1998 15
and then sort using
>>> df_melted = df_melted.sort_values("name")
if you liked.
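To make the reshape itself concrete, here is a plain-Python sketch (standard library only, not pandas) of the wide-to-long transformation that pd.melt performs. It uses a shortened subset of the question's data, and the function name melt_rows is hypothetical. Note that this sketch emits rows name by name, whereas pd.melt emits them column by column; after sorting by name the two agree.

```python
import csv
import io

# Shortened subset of the data in the question (first three year columns only).
wide_csv = '''"name","1998","1999","2000"
"Aicha",0,0,0
"Aida",15,20,16
'''


def melt_rows(text, id_column="name"):
    """Turn one row per name into one row per (name, year) pair."""
    reader = csv.DictReader(io.StringIO(text))
    long_rows = []
    for row in reader:
        for column, value in row.items():
            if column != id_column:
                long_rows.append(
                    {"name": row[id_column], "year": column, "value": int(value)}
                )
    return long_rows


long_rows = melt_rows(wide_csv)
for record in long_rows:
    print(record["name"], record["year"], record["value"])
```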

Stata Deleting Multiple Observations

I have the following data matrix containing ideology scores in a customized dataset:
year state cdnum party name dwnom1
1946 23 10 200 WOODRUFF 0.43
1946 23 11 200 BRADLEY F. 0.534
1946 23 11 200 POTTER C. 0.278
1946 23 12 200 BENNETT J. 0.189
My unit of analysis is a given congressional district, in a given year. As one can see state #23, cdnum #11, has two observations in 1946.
What I would like to do is delete the earlier observation, in this case the observation corresponding to name: BRADLEY F. This happens when a Congressional district has two members in a given Congress. The code that I have tried is as follows:
drop if year==[_n+1] & statenum==[_n+1] & cdnum==[_n+1]
My attempt is a conditional argument: drop the observation if the year is the same as in the next observation, the statenum is the same as in the next observation, and the cdnum is the same as in the next observation. In this way, I can ensure each district has only one corresponding observation for a given year. When I attempt to run the code I get:
drop if year==[_n-1] & statenum==[_n-1] & cdnum==[_n-1]
(0 observations deleted)

Brief alternative: You should check out the duplicates command.
Detailed explanation of error:
You don't mean what you say to Stata.
Your conditions such as
if year == [_n-1]
should be
if year == year[_n-1]
and so forth.
[_n-1]
by itself is treated as if you typed
_n-1
which is the observation number, minus 1.
Here is a dopey example. Read in the auto data.
. sysuse auto
(1978 Automobile Data)
. list foreign if foreign == [_n-1], nola
     +---------+
     | foreign |
     |---------|
  1. |       0 |
     +---------+
The variable foreign is equal to _n - 1 precisely once, in observation 1 when foreign is 0 and _n is 1.
In short, [_n-1] is not to be interpreted as the previous value (of the variable I just mentioned).
help subscripting gives very basic help.
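The intended comparison (each observation against the next one) can be sketched in Python to show the logic that the corrected Stata condition expresses. This assumes the rows are already sorted so that duplicates for a district-year are adjacent with the earlier member first, as in the question's listing; the function name is hypothetical.

```python
# Rows copied from the question: (year, state, cdnum, party, name, dwnom1).
rows = [
    (1946, 23, 10, 200, "WOODRUFF", 0.43),
    (1946, 23, 11, 200, "BRADLEY F.", 0.534),
    (1946, 23, 11, 200, "POTTER C.", 0.278),
    (1946, 23, 12, 200, "BENNETT J.", 0.189),
]


def drop_earlier_duplicates(sorted_rows):
    """Keep a row only if the next row is not the same district-year."""
    kept = []
    for i, row in enumerate(sorted_rows):
        key = row[:3]  # (year, state, cdnum)
        next_is_same = (
            i + 1 < len(sorted_rows) and sorted_rows[i + 1][:3] == key
        )
        if not next_is_same:
            kept.append(row)
    return kept


deduplicated = drop_earlier_duplicates(rows)
for row in deduplicated:
    print(row)
```

The key point mirrors the explanation above: the comparison must be between a variable's value in this row and the same variable's value in the neighbouring row, not between the value and a row number.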

Convert Daily Returns to Monthly Returns in Stata

I am using Stata and I have 6 years of daily returns for stocks that individuals hold in their portfolios. I would like to aggregate the daily returns to monthly portfolio returns. In some instances, the individual may hold more than one stock in the portfolio. I am struggling with writing the code to do this.
For a visual, my data looks like this:
I would like the results to look like this:
Where individual 2's portfolio return for the month of December 1996 is calculated as: 0.3 * 0.0031 + 0.7 * 0.0076 = 0.00625.
I have tried the collapse command such as
collapse Return, by (ID Year Month)
but this does not provide the same return that I calculated out in Excel.
I am able to make a weighted portfolio return for all the days using
bysort ID year month: gen wt_return = stock_weight * monthly_return
But this gives me daily returns. My trouble is then aggregating them into one return for the corresponding month.
As for the specifics, I would like to calculate the monthly portfolio return as the product of 1 + the weighted daily returns. As a last resort, the mean return for the month could work.

You don't show the monthly portfolio return for person 2 in 1991. Your initial example data doesn't show stock weights, but the desired example data does. Your variable Monthly Return is not reproducible. You should take time to verify that your question is clear when posting. It's supposed to be clear to the public who will read it, not only to you.
I didn't bother checking whether your computations are correct, but below is what I understand you want. The procedure is simply to compute weighted returns and then add them up within person-year-month groups. (I assume the stock weights apply to stocks on a daily basis, which is what your example data implies.)
clear all
set more off
input ///
perid year month day str3 stockid return stockw
1 1991 1 1 "ABC" .01 1
1 1991 1 2 "ABC" .02 1
1 1991 1 3 "ABC" -.01 1
1 1991 1 31 "ABC" .004 1
1 1996 12 31 "ABC" .002 1
2 1991 1 1 "ABC" .01 .3
2 1991 1 2 "ABC" .02 .3
2 1996 12 31 "ABC" .004 .3
2 1991 1 1 "XYZ" .001 .7
2 1991 1 2 "XYZ" .004 .7
2 1996 12 31 "XYZ" .021 .7
end
* create weighted return
gen returnw = return * stockw
sort perid year month day
list, sepby(perid year month day)
* sum weighted returns by person, year, month
collapse (sum) returnw, by (perid year month)
list, sepby(perid)
If you want collapse to sum, then you must say so with (sum) (although I'm not clear whether this is what you want). By default, it computes the mean. Read help collapse thoroughly.
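As a cross-check on what collapse (sum) produces, here is a Python sketch that repeats the same two steps on the example data from the answer: multiply each return by its stock weight, then add the products within person-year-month groups. The function name is hypothetical, and the sketch sums rather than compounds, following the answer rather than the compounding the question mentions.

```python
from collections import defaultdict

# Example data from the answer:
# (perid, year, month, day, stockid, return, stockw).
records = [
    (1, 1991, 1, 1, "ABC", 0.01, 1.0),
    (1, 1991, 1, 2, "ABC", 0.02, 1.0),
    (1, 1991, 1, 3, "ABC", -0.01, 1.0),
    (1, 1991, 1, 31, "ABC", 0.004, 1.0),
    (1, 1996, 12, 31, "ABC", 0.002, 1.0),
    (2, 1991, 1, 1, "ABC", 0.01, 0.3),
    (2, 1991, 1, 2, "ABC", 0.02, 0.3),
    (2, 1996, 12, 31, "ABC", 0.004, 0.3),
    (2, 1991, 1, 1, "XYZ", 0.001, 0.7),
    (2, 1991, 1, 2, "XYZ", 0.004, 0.7),
    (2, 1996, 12, 31, "XYZ", 0.021, 0.7),
]


def monthly_weighted_sum(rows):
    """Sum return * weight within each (perid, year, month) group."""
    totals = defaultdict(float)
    for perid, year, month, day, stockid, ret, weight in rows:
        totals[(perid, year, month)] += ret * weight
    return dict(totals)


monthly = monthly_weighted_sum(records)
for key in sorted(monthly):
    print(key, round(monthly[key], 6))
```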
