How to identify if an observation is repeated every day in Stata - stata

I have a dataset with a date variable, an id variable and a city variable. Sometimes the same id appears more than once on the same date in the same city.
Data looks something like this:
Date ID City
2/1/2015 1 1
2/1/2015 1 1
2/1/2015 1 2
2/2/2015 1 1
2/1/2015 2 1
2/2/2015 2 1
I would like to know how many days each ID is present, identify the IDs that are present every day, and later on, those that are present every day in every city.
In the example above both ID 1&2 are present each day, but only ID 1 is present in each city each day.
Thanks!

I think I just did what I wanted to do.
All I had to do was count the distinct dates for each ID and city:
by ID city date, sort: gen nvals = _n == 1
by ID city: replace nvals = sum(nvals)
by ID city : replace nvals = nvals[_N]
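As a cross-check of what nvals should hold, here is a minimal sketch in plain Python (not Stata) that uses the six example rows from the question. The names rows, days_by_id and days_by_id_city are illustrative assumptions and do not come from the original code.

```python
from collections import defaultdict

# Example rows copied from the question: (date, id, city)
rows = [
    ("2/1/2015", 1, 1),
    ("2/1/2015", 1, 1),
    ("2/1/2015", 1, 2),
    ("2/2/2015", 1, 1),
    ("2/1/2015", 2, 1),
    ("2/2/2015", 2, 1),
]

# Every date that occurs anywhere in the data
all_dates = {date for date, _, _ in rows}

# Sets drop repeated (date, id, city) rows automatically
days_by_id = defaultdict(set)
days_by_id_city = defaultdict(set)
for date, id_, city in rows:
    days_by_id[id_].add(date)
    days_by_id_city[(id_, city)].add(date)

# Number of distinct days per ID, and per (ID, city) pair
days_present = {id_: len(dates) for id_, dates in days_by_id.items()}
days_present_by_city = {key: len(dates) for key, dates in days_by_id_city.items()}

# IDs seen on every date that occurs in the data
present_every_day = sorted(id_ for id_, dates in days_by_id.items() if dates == all_dates)
```

The per-(ID, city) counts correspond to what the Stata lines above store in nvals; comparing each count with the total number of distinct dates flags the IDs present every day.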

Related

The last day of month with a given time flow in SAS

I am bothered with a simple question.
I need to get the last business day of each month, and the dates in my dataset include only business days.
For example:
ID Date
1 20180301
1 20180302
1 20180305
...
1 20180329
1 20180330
1 20180402
...
2 20180301
2 20180302
2 20180305
And I need the output like this:
ID Date Enddate
1 20180301 20180330 (The last business day of March)
1 20180302 20180330
1 20180305 20180330
...
1 20180329 20180330
1 20180330 20180330
1 20180402 20180430 (The last business day of April)
...
2 20180301 20180330 (Same for other IDs)
2 20180302 20180330
2 20180305 20180330
I tried to use this command:
enddt=intnx('month',date,0,'E');
However, it will output 20180331 instead of 20180330.
So I was wondering whether there is a way to get the last date that actually appears in my data for a given month, rather than the last calendar day of that month.
Thank you very much for your kind help.
You can do this in a data step:
1) sort on date from latest to earliest (reverse sort)
2) create a variable based on the yearmonth = int(date/100)
3) do a datastep by yearmonth and retain enddate
4) if first.yearmonth then enddate = date;
5) drop vars you don't want
6) sort back to original order
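The steps above can be sketched outside SAS as follows. This is plain Python, assuming (as step 2 does) that the date is stored as a numeric yyyymmdd value. The rows elided with "..." in the question are left out, so in this sample April's end date is simply the latest April date listed. The names rows, last_in_month and with_enddate are illustrative.

```python
# Sample rows from the question: (id, date as a yyyymmdd integer)
rows = [
    (1, 20180301), (1, 20180302), (1, 20180305),
    (1, 20180329), (1, 20180330), (1, 20180402),
    (2, 20180301), (2, 20180302), (2, 20180305),
]

# Latest date observed in each year-month, where year-month = date // 100
last_in_month = {}
for _, date in rows:
    yearmonth = date // 100
    last_in_month[yearmonth] = max(last_in_month.get(yearmonth, date), date)

# Attach the month's last observed date to every row, keeping the original order
with_enddate = [(id_, date, last_in_month[date // 100]) for id_, date in rows]

for row in with_enddate:
    print(*row)
```

Because the lookup is keyed on year-month only, every ID gets the same end date for a given month, which matches the "Same for other IDs" note in the desired output. If each ID has its own calendar, the key would need to include the ID as well.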

Count unrepeated customer within a group and make a tabulate

I have the below data in SAS and I want to get the table with the number of customers buying a certain product in a certain time.
A customer who is repeated within a group should be counted only once.
Product customer interval
1 A Morning
1 A Morning
1 B Afternoon
1 A Evening
2 A Afternoon
2 B Morning
2 C Afternoon
What I want to get is the below table
              Customer number
Product   Morning   Afternoon   Evening   All
1         1         1           1         2
2         1         2           0         3
I believe you have to remove duplicates to make this table.
This is easily done by using the nodupkey option in a proc sort:
proc sort data = have out = want nodupkey;
by product customer interval;
run;
Here's a format that will correctly order the interval categories by putting spaces in front of the categories you want first:
proc format;
value $interval
"Morning" = "  Morning"
"Afternoon" = " Afternoon"
"Evening" = "Evening";
run;
And here's the tabulate statement:
proc tabulate data = want order = formatted;
class product interval;
tables product, interval = " " all / row = float misstext = "0" printmiss;
keylabel n = " ";
format interval $interval.;
run;
This returns the following table:
Product   Morning   Afternoon   Evening   All
1         1         1           1         3
2         1         2           0         3
If there are missing values this will be more complicated.
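One thing worth checking: the All column in this output shows 3 for product 1, while the table in the question shows 2. After de-duplication product 1 still has three rows (A in the morning, B in the afternoon, A in the evening) but only two distinct customers. The following plain-Python sketch (not SAS) computes both readings of "All" from the question's rows; the names summary, all_as_row_count and all_as_distinct_customers are illustrative assumptions.

```python
# Rows from the question: (product, customer, interval)
rows = [
    (1, "A", "Morning"),
    (1, "A", "Morning"),
    (1, "B", "Afternoon"),
    (1, "A", "Evening"),
    (2, "A", "Afternoon"),
    (2, "B", "Morning"),
    (2, "C", "Afternoon"),
]
intervals = ["Morning", "Afternoon", "Evening"]

# Dropping exact duplicates mirrors nodupkey with BY product customer interval
unique_rows = set(rows)

summary = {}
for product in sorted({p for p, _, _ in unique_rows}):
    # Customers per interval for this product, in display order
    per_interval = [
        sum(1 for p, _, i in unique_rows if p == product and i == interval)
        for interval in intervals
    ]
    summary[product] = {
        "per_interval": per_interval,
        # "All" as a sum over intervals (what the tabulate output shows)
        "all_as_row_count": sum(per_interval),
        # "All" as distinct customers across intervals (what the question shows)
        "all_as_distinct_customers": len(
            {c for p, c, _ in unique_rows if p == product}
        ),
    }
```

If the distinct-customer reading is the one wanted, the All column has to come from a separate de-duplication on product and customer only, since a customer can appear in more than one interval.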

Run a regression of countries by quartiles for a specific year

I am exploring an effect that I think will vary by GDP levels, from a data set that has, vertically, country and year (1960 to 2015), so each country label is on 55 rows. I ran
sort year
by year: egen yrank = xtile(rgdp), nquantiles(4)
which tags every year row with what quartile of GDP they were in that year. I want to run this:
xtreg fiveyearg taxratio if yrank == 1 & year==1960
which would regress my variable (tax ratio) against some averaged gdp data from countries that were in the bottom quartile of GDPs in 1960 alone. So even if later on they grew enough to change ranks, the later data would still be in the regression pool. Sadly, I cannot get this code, or any variation, to run.
My current approach is to try to generate a new variable that gives every row for country X a value of 1 if that country was in the bottom quartile in 1960, but I can't get that to work either. I have run out of ideas, so I thought I would ask!
Based on your latest comment, which describes the (un)expected behavior:
clear
set more off
*----- example data -----
input ///
country year rank
1 1960 2
1 1961 1
1 1962 2
2 1960 1
2 1961 1
2 1962 1
3 1960 3
3 1961 3
3 1962 3
end
list, sepby(country)
*----- what you want -----
// tag countries whose first observation for -rank- is 1
// (I assume the first observation for -year- is always 1960)
bysort country (year) : gen toreg = rank[1] == 1
list, sepby(country)
// run regression conditional on -toreg-
xtreg ... if toreg
Check help subscripting if in doubt.

Generating a dummy equal to 1 if the product was purchased in the previous period

I have a panel data set with product purchases identified by unique household ids, and I need to generate a dummy variable "brand loyalty" that equals 1 if the same brand was purchased by the household in the previous period. My periods are not equally spaced: for some households the gap can be 1 week, for others 10 weeks. Does this code look about right?
panid - unique household id
l5 - brand name
loy - wanted dummy
bysort panid week: egen loy=1 if l5=l5[_n-1]
I assume that the unit of the variable week is weeks. In that case you can type
tsset panid week
by panid: gen byte loy = ( L.l5 == l5 ) if !missing(L.l5,l5) & _n > 1

mysql connect all fields in two columns

I have a view with two columns: a person's ID (a number) and the sector that they belong to (given as numbers 1-5).
I want to create a view to show whether people belong to the same sector. I think this would have three columns: ID1, ID2, and SameSector. The first column would list IDs, and for each ID in column 1 the second column would list ALL of the IDs. The third column would be an if statement, 1 if the sector was the same for both IDs, 0 if it wasn't. This is made slightly more complicated because a person can belong to more than one sector.
For example:
I have:
ID Sector
1 1
2 1
2 5
3 1
I want:
ID1 ID2 SameSector
1 1 1
1 2 1
1 2 0
1 3 0
2 1 1
2 1 0
etc.
I'm guessing this involves some sort of self join and an if statement, but I can't figure out how to list every ID in the ID1 column and match it against every ID in ID2. Any ideas?
This should be what you want:
SELECT a.ID AS ID1, b.ID AS ID2, IF(a.Sector=b.Sector,1,0) AS SameSector
FROM theTable AS a, theTable AS b
http://sqlfiddle.com/#!2/f2cbc/4
I initially had a much more complicated query, but then realized you wanted a complete cross-join, including the same ID comparing to itself.
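To see what the cross join produces without running MySQL, here is a minimal plain-Python sketch using the four rows from the question. It pairs every row with every row, including itself, and flags matching sectors. The names table and result are illustrative assumptions.

```python
from itertools import product

# Rows from the question's example: (ID, Sector)
table = [(1, 1), (2, 1), (2, 5), (3, 1)]

# Cartesian product of the table with itself, like FROM theTable AS a, theTable AS b
result = [
    (a_id, b_id, 1 if a_sector == b_sector else 0)
    for (a_id, a_sector), (b_id, b_sector) in product(table, repeat=2)
]

for row in result:
    print(*row)
```

Because the join works on rows rather than on distinct IDs, a person who belongs to two sectors contributes two rows per pairing, so the four input rows yield sixteen output rows.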