Find social network components in Stata

Find social network components in Stata - grouping

[I copied part of the below example from a separate post and changed it to suit my specific needs]
pos_1 pos_2
2 4
2 5
1 2
3 9
4 2
9 3
Each row is a link: person_2 is connected to person_4, ..., person_4 is connected to person_2, and person_9 is connected to person_3.
I want to create a third, categorical variable, component, that tells me which connected component (subnetwork) of the network each observed link belongs to. In this case, there are two connected components in the network:
pos_1 pos_2 component
2 4 1
2 5 1
1 2 1
3 9 2
4 2 1
9 3 2
All nodes in component 1 are connected to each other, but not to the nodes in component 2 and vice versa. Is there a way to generate this component variable in Stata? I know there are alternative programs to do this in, but my code would be more seamless if I can integrate it into Stata.

If you reshape the data to long form, you can use group_id (from SSC) to get what you want:
clear
input pos_1 pos_2
2 4
2 5
1 2
3 9
4 2
9 3
end
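* one row per link endpoint, so every node value sits in a single variable
gen id = _n
reshape long pos_, i(id) j(n)
* start with each link as its own component
clonevar comp = id
list, sepby(comp)
* group_id is from SSC (ssc install group_id); merge components that share a node
group_id comp, match(pos)
reshape wide pos_, i(id) j(n)
* renumber the merged identifiers consecutively as 1, 2, ...
egen component = group(comp)
list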
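For readers who want to see the merging logic spelled out, here is a minimal Python sketch of the same idea using a union-find structure. It is not how group_id is implemented; the function name label_components and the numbering rule (components numbered in order of first appearance) are assumptions made for illustration.

```python
def label_components(edges):
    """Return one component label per link, numbered in order of first appearance."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Merge the two endpoints of every link into one set.
    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Give each distinct root a consecutive label.
    labels = {}
    result = []
    for a, _ in edges:
        root = find(a)
        labels.setdefault(root, len(labels) + 1)
        result.append(labels[root])
    return result


# Same links as the question's pos_1 / pos_2 columns.
edges = [(2, 4), (2, 5), (1, 2), (3, 9), (4, 2), (9, 3)]
components = label_components(edges)
print(components)
```

On the example data this reproduces the component column requested in the question.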

Related

In Stata, how can I only analyze observations with repeated measures using the mixed command?

I have a dataset on multiple outcomes for individuals in two groups that were treated (or not treated) by an intervention at two time points. However, not every individual has complete data for each measure at each time point.
id outcome outcome_value group time
1 depression 10 1 1
1 depression 8 1 2
2 depression 10 2 1
2 depression . 2 2
1 anxiety 12 1 1
1 anxiety 8 1 2
2 anxiety 12 2 1
2 anxiety 6 2 2
How do I exclude IDs that do not have an outcome in both periods? I only want to see how outcomes changed between groups over time for observations that have data in all periods. I am using the mixed command in Stata to conduct this analysis.

First, drop the rows with a missing outcome:
keep if !missing(outcome_value)
Then keep the id/outcome combinations that still have two rows (_N==2):
bysort id outcome: keep if _N==2
Output:
id outcome outco~ue group time ct
1 anxiety 8 1 2 2
1 anxiety 12 1 1 2
1 depression 10 1 1 2
1 depression 8 1 2 2
2 anxiety 6 2 2 2
2 anxiety 12 2 1 2
As @NickCox has pointed out in the comments, although the two commands above cannot simply be merged, there is still a one-line approach:
bysort id outcome (time) : keep if !missing(outcome_value[1], outcome_value[2])
Of note, we cannot do this:
bysort id outcome : keep if !missing(outcome_value) & _N==2
because _N is evaluated before any rows are dropped, so within each group it still counts the rows with a missing outcome.
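The difference between the two-step filter and the combined condition can be checked outside Stata. The sketch below is plain Python on the question's data, with None standing in for Stata's missing value; the function names keep_complete and keep_naive are hypothetical.

```python
from collections import Counter

# (id, outcome, outcome_value, group, time); None marks a missing value.
rows = [
    (1, "depression", 10, 1, 1),
    (1, "depression", 8, 1, 2),
    (2, "depression", 10, 2, 1),
    (2, "depression", None, 2, 2),
    (1, "anxiety", 12, 1, 1),
    (1, "anxiety", 8, 1, 2),
    (2, "anxiety", 12, 2, 1),
    (2, "anxiety", 6, 2, 2),
]


def keep_complete(rows, periods=2):
    """Drop missing rows first, then count rows per (id, outcome)."""
    present = [r for r in rows if r[2] is not None]
    counts = Counter((r[0], r[1]) for r in present)
    return [r for r in present if counts[(r[0], r[1])] == periods]


def keep_naive(rows, periods=2):
    """Count rows per (id, outcome) before dropping anything, as the combined condition does."""
    counts = Counter((r[0], r[1]) for r in rows)
    return [r for r in rows if r[2] is not None and counts[(r[0], r[1])] == periods]


complete = keep_complete(rows)
naive = keep_naive(rows)
print(len(complete), len(naive))
```

The naive version keeps the time-1 depression row for id 2 even though that person has no time-2 value, which is exactly the row the analysis is supposed to exclude.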

Amazon QuickSight - Working out size of network

I have a database table with one record for each connected IoT device. Each device has a unique device id and an associated network id.
For example:
device_id network_id
1 1
2 1
3 1
4 2
5 2
6 3
7 3
8 3
9 3
10 4
I would like to be able to visualise the size of each network based on its id, so the output for the above data would look like this:
network_id size
1 3
2 2
3 4
4 1
I'm not currently sure how to do this.

I found that the countOver function worked for this.
I made a calculated field called NetworkSize, defined as:
countOver({device_id}, [{network_id}])
This gives the output I was looking for.
However, I have to include device_id in the visual, which is a bit inconvenient.
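To sanity-check the numbers outside QuickSight, the same partitioned count can be sketched in Python. This only mirrors the arithmetic (one count per network_id, repeated on every device row); it does not use any QuickSight API, and the variable names are made up for the example.

```python
from collections import Counter

# (device_id, network_id) pairs from the question.
devices = [
    (1, 1), (2, 1), (3, 1),
    (4, 2), (5, 2),
    (6, 3), (7, 3), (8, 3), (9, 3),
    (10, 4),
]

# Number of devices per network: the "size" column the question asks for.
network_size = Counter(network_id for _, network_id in devices)

# The same count attached to every device row, similar to a per-row calculated field.
per_device = [(device_id, network_id, network_size[network_id])
              for device_id, network_id in devices]

for network_id in sorted(network_size):
    print(network_id, network_size[network_id])
```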

Lag in Stata generates only missing

I am having trouble using the L1. lag operator in Stata 14 to create lag variables.
The resulting lag variable is 100% missing values!
gen d = L1.equity
Thanks in advance.

There is hardly enough information in the question to know for certain, but as @Dimitriy V. Masterov suggested by asking how your data is tsset, you likely have an issue there.
As a quick example, imagine a panel with two countries, country 1 and country 3, with gdp by country measured over five years:
clear
input float(id year gdp)
1 1 5
1 2 2
1 3 7
1 4 9
1 5 6
3 1 3
3 2 4
3 3 5
3 4 3
3 5 4
end
Now, if you improperly tsset this data, you can easily generate the missing values you describe:
tsset year id
gen lag_gdp = L1.gdp
Notice that all 10 values of lag_gdp are missing. tsset expects the panel variable first and the time variable second, so the correct declaration here is tsset id year. With the two variables swapped, id is treated as the time variable, and it has gaps (period 1 and period 3, but no period 2), so no observation has a preceding period to take the lag from.
Something else I have witnessed is someone trying to tsset by their time variable and their analysis variable, which is also incorrect:
clear
input float(year gdp)
1 5
2 3
3 2
4 4
5 7
end
tsset year gdp
gen d = L1.gdp
I suspect you are having a similar issue.
Without knowing what your data looks like or how it is tsset, I cannot diagnose this for certain, but the tsset declaration is the most likely cause.
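The mechanism can be reproduced with a small Python sketch: a one-period lag is a lookup of the row with the same panel value and time minus 1. The helper lag and its argument names are hypothetical and only imitate the lookup; this is not Stata's implementation.

```python
def lag(rows, panel, time, value):
    """Return, for each row, the value at (same panel, time - 1), or None if no such row exists."""
    lookup = {(r[panel], r[time]): r[value] for r in rows}
    return [lookup.get((r[panel], r[time] - 1)) for r in rows]


# Same panel as the first example: two countries (id 1 and 3), five years each.
gdp = {1: [5, 2, 7, 9, 6], 3: [3, 4, 5, 3, 4]}
rows = [{"id": i, "year": y, "gdp": g}
        for i, series in gdp.items()
        for y, g in enumerate(series, start=1)]

correct = lag(rows, panel="id", time="year", value="gdp")   # like: tsset id year
swapped = lag(rows, panel="year", time="id", value="gdp")   # like: tsset year id

print(correct)
print(swapped)
```

With the roles swapped, the lookup asks for id 0 or id 2 within each year, neither of which exists, so every lag comes back empty.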

Retain the cluster number for each member of a cluster within an id variable

I would like to number the unique clusters of data in a longitudinal dataset and have each member of a cluster carry its cluster number. A distinct cluster is a set of observations sharing the same date within an id. The desired number is the position of each distinct cluster relative to the earlier clusters for the same id. This coding is necessary to address the problem of event ordering required for a time-dependent covariate analysis.
input id date
1 28jan2015
1 28jan2015
2 26nov2015
3 19oct2015
4 26dec2015
5 23dec2015
6 22may2015
6 23sep2015
6 23sep2015
7 14jan2015
7 27feb2015
7 30may2015
8 16apr2015
8 16apr2015
8 16apr2015
8 16apr2015
8 16apr2015
9 17jul2015
9 03oct2015
9 03oct2015
10 27jul2015
end
I have attempted:
bys id (date): gen count_obs = [_n]
bys id date: gen count_interval_obs = [_n]
egen n_interval = group(id date)
resulting in accurate counts of the total number of observations per id and enumeration of the number of observations within a date. However, the egen function group() results in identifying each unique set of dates, but numbers the groups without regard to id, giving:
id date wrong_cluster correct_cluster
1 28jan2015 1 1
1 28jan2015 1 1
2 26nov2015 2 1
3 19oct2015 3 1
4 26dec2015 4 1
5 23dec2015 5 1
6 22may2015 6 1
6 23sep2015 7 2
6 23sep2015 7 2
etc.
egen, group() cannot be used with the by: prefix.
Any assistance would be appreciated.
Todd
Edit: Added an explanation of why the cluster identification is necessary. Clarified what rule defines a cluster.

@Roberto Ferrer has given a direct approach. It follows from the logic he uses that there is also a route using egen's group() function:
egen group = group(id date2)
bysort id (group): gen clust2 = sum(group != group[_n-1])

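For each id, add 1 to a running sum whenever the date differs from the date in the preceding observation. The condition inside sum() evaluates to 1 when it is true and 0 otherwise, so the running sum only increases when a new date starts. For the first observation of each id, date2[_n-1] is missing, so the condition is true and the count starts at 1.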
clear
set more off
input id str15 date
1 28jan2015
1 28jan2015
2 26nov2015
3 19oct2015
4 26dec2015
5 23dec2015
6 22may2015
6 23sep2015
6 23sep2015
7 14jan2015
7 27feb2015
7 30may2015
8 16apr2015
8 16apr2015
8 16apr2015
8 16apr2015
8 16apr2015
9 17jul2015
9 03oct2015
9 03oct2015
10 27jul2015
end
gen date2 = date(date, "DMY")
format %td date2
drop date
list, sepby(id)
*----- what you want -----
bysort id (date2) : gen clust = sum(date2 != date2[_n-1])
list, sepby(id)
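The running-sum logic can be followed step by step in a short Python sketch on a subset of the example data (ids 6 to 9). The function cluster_numbers and the small month table are assumptions made for the illustration; the counter restarts for each id, which is what the bysort id prefix does in the Stata code.

```python
from datetime import date

MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}


def parse_dmy(text):
    """Parse strings such as 23sep2015 into a date."""
    return date(int(text[5:]), MONTHS[text[2:5].lower()], int(text[:2]))


def cluster_numbers(records):
    """Sort by (id, date) and count, within each id, how many distinct dates have started."""
    parsed = sorted((pid, parse_dmy(d)) for pid, d in records)
    result = []
    prev_id, prev_date, count = None, None, 0
    for pid, day in parsed:
        if pid != prev_id:
            count, prev_date = 0, None   # new id: restart the running sum
        if day != prev_date:
            count += 1                   # new date within the id: next cluster
        result.append((pid, day, count))
        prev_id, prev_date = pid, day
    return result


records = [
    (6, "22may2015"), (6, "23sep2015"), (6, "23sep2015"),
    (7, "14jan2015"), (7, "27feb2015"), (7, "30may2015"),
    (8, "16apr2015"), (8, "16apr2015"),
    (9, "17jul2015"), (9, "03oct2015"), (9, "03oct2015"),
]
clusters = cluster_numbers(records)
for pid, day, clust in clusters:
    print(pid, day.isoformat(), clust)
```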

Two Way EntityCollection Binding to a Two Dimension Data Matrix

I have a Day Structure table, which has the following columns that I want to display:
DoW HoD Value
1 1 1
1 2 2
1 3 2
1 4 2
1 5 2
1 6 2
1 7 2
1 8 2
1 9 2
1 10 2
1 11 4
1 12 4
1 13 4
1 14 4
1 15 4
1 16 4
1 17 4
1 18 4
1 19 4
1 20 4
1 21 1
1 22 1
1 23 1
1 24 1
DoW is the day of the week (Monday etc.), HoD is the hour of the day, and Value is the actual value.
Now I want to bind this Day Structure entity collection directly to a control so that any changes can be bound TwoWay.
The layout I want is a two-dimensional matrix, with the day of the week along one axis and the hour of the day along the other.
I think the best way to achieve this is to use a template and/or a converter, but I just don't know how ;)
I already read this article, but its lack of TwoWay binding makes it not useful for me :(
I hope you can help me.
Jonny

Again, I solved it on my own ;)
For this problem I created a Grid with a fixed number of rows and columns. Inside this Grid I put an ItemsControl bound to my list of data. Inside the DataTemplate I placed a TextBox bound to the current value, and bound the Grid.Row and Grid.Column properties to the day of the week and the hour of the day.
Pro:
The TextBox is TwoWay data-bound to a specific object or element.
Very easy to implement if the row and column properties are numeric.
Con:
Limited to a fixed number of rows/columns.
A lot of code to write in XAML (copy and paste).
Kind of "dirty" code. It does not feel like the best way to do it.
I'm still open to other suggestions.
