Assign groups based on variables - sas

In SAS, I have a dataset (have) as shown below, and I need to add a group variable based on test and visitnum. Rows with visitnum 101 and 108 need to be in the same group. The desired result is shown as data want.
data have:
test visitnum ord seq
aa 101 0 0
aa 101 0 1
aa 108 1 0
aa 108 1 1
aa 108 2 0
aa 108 2 1
aa 115 1 0
aa 115 1 1
aa 115 2 0
aa 115 2 1
bb 101 0 0
bb 101 0 1
bb 108 1 0
bb 108 1 1
bb 108 2 0
bb 108 2 1
bb 115 1 0
bb 115 1 1
bb 115 2 0
bb 115 2 1
data want:
test visitnum ord seq group
aa 101 0 0 1
aa 101 0 1 1
aa 108 1 0 1
aa 108 1 1 1
aa 108 2 0 1
aa 108 2 1 1
aa 115 1 0 2
aa 115 1 1 2
aa 115 2 0 2
aa 115 2 1 2
bb 101 0 0 3
bb 101 0 1 3
bb 108 1 0 3
bb 108 1 1 3
bb 108 2 0 3
bb 108 2 1 3
bb 115 1 0 4
bb 115 1 1 4
bb 115 2 0 4
bb 115 2 1 4

First sort your data by test and visitnum. There are two cases when we want to increment the group number:
When it's the start of a test group and the visitnum is 101 or 108
When it's the start of a visitnum group and it's not 101 or 108
Here's how this looks:
proc sort data=have;
    by test visitnum;
run;
data want;
    set have;
    by test visitnum;
    if (first.test AND visitnum IN (101, 108))
       OR (first.visitnum AND visitnum NOT IN (101, 108))
    then group+1;
run;
Output:
test visitnum ord seq group
aa 101 0 0 1
aa 101 0 1 1
aa 108 1 0 1
aa 108 1 1 1
aa 108 2 0 1
aa 108 2 1 1
aa 115 1 0 2
aa 115 1 1 2
aa 115 2 0 2
.. ... .. .. ..
bb 115 2 0 4
bb 115 2 1 4
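
For readers who want to check the rule outside SAS, here is a minimal Python sketch of the same two-case logic. The function name assign_groups and the dict-based row layout are illustrative assumptions, not part of the original answer. Like the SAS step, it assumes 101 and 108 are the lowest visit numbers within each test.

```python
def assign_groups(rows, merged_visits=(101, 108)):
    """Return rows sorted by (test, visitnum) with a running 'group' number added.

    rows: iterable of dicts with at least 'test' and 'visitnum' keys.
    merged_visits: visit numbers that share one group within a test
    (assumption: they sort before any other visit of that test).
    """
    # Equivalent of PROC SORT; sorted() is stable, so ties keep their order.
    ordered = sorted(rows, key=lambda r: (r["test"], r["visitnum"]))
    group = 0
    prev_key = None
    result = []
    for row in ordered:
        key = (row["test"], row["visitnum"])
        first_test = prev_key is None or prev_key[0] != key[0]  # first.test
        first_visit = prev_key != key                           # first.visitnum
        merged = row["visitnum"] in merged_visits
        if (first_test and merged) or (first_visit and not merged):
            group += 1                                          # group+1
        result.append({**row, "group": group})
        prev_key = key
    return result


sample = [
    {"test": "aa", "visitnum": 101},
    {"test": "aa", "visitnum": 108},
    {"test": "aa", "visitnum": 115},
    {"test": "bb", "visitnum": 101},
    {"test": "bb", "visitnum": 108},
    {"test": "bb", "visitnum": 115},
]
print([r["group"] for r in assign_groups(sample)])  # [1, 1, 2, 3, 3, 4]
```

If a test could contain a visit number lower than 101, the merged visits would be absorbed into that earlier group, so the rule would need adjusting for such data.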

Related

Stata: Changing Number Format

I am using estpost and esttab to export tabulation results in Stata.
sysuse auto, clear
estpost tabulate turn foreign
esttab ., cells("b(fmt(0))") unstack
---------------------------------------------------
(1)
Domestic Foreign Total
b b b
---------------------------------------------------
31 1 0 1
32 0 1 1
33 1 1 2
34 2 4 6
35 2 4 6
36 1 8 9
37 2 2 4
38 1 2 3
39 1 0 1
40 6 0 6
41 4 0 4
42 7 0 7
43 12 0 12
44 3 0 3
45 3 0 3
46 3 0 3
48 2 0 2
51 1 0 1
Total 52 22 74
---------------------------------------------------
N 74
---------------------------------------------------
Although I can change the format of the cells, I couldn't find a way to change the format of the number of observations (N) and the total number of observations in each column. I tried adding obs(fmt(%10.2fc)) as an esttab option, but it didn't work.

How to loop rows and columns in pandas while replacing values with a constant increment

I am trying to replace values in a dataframe with 0. In the first column I need to replace the first 3 values, in the next column the first 6 values, and so on, increasing by 3 every time.
a=np.array([133,124,156,189,132,176,189,192,100,120,130,140,150,50,70,133,124,156,189,132])
b = pd.DataFrame(a.reshape(10,2), columns= ['s','t'])
for columns in b:
    yy = 3
    for i in xrange(yy):
        b[columns][i] = 0
    yy += 3
print b
the outcome is the following
s t
0 0 0
1 0 0
2 0 0
3 189 189
4 132 132
5 176 176
6 189 189
7 192 192
8 100 100
9 120 120
I am clearly missing something really simple, to make the loop replace 6 values instead of only 3 in column t, any ideas?
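I would do it this way (note that .ix was removed in pandas 1.0; with the default integer index, .loc is a drop-in replacement here):
i = 1
for c in b.columns:
    b.ix[0 : 3*i-1, c] = 0
    i += 1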
Demo:
In [86]: b = pd.DataFrame(np.random.randint(0, 100, size=(20, 4)), columns=list('abcd'))
In [87]: %paste
i = 1
for c in b.columns:
    b.ix[0 : 3*i-1, c] = 0
    i += 1
## -- End pasted text --
In [88]: b
Out[88]:
a b c d
0 0 0 0 0
1 0 0 0 0
2 0 0 0 0
3 10 0 0 0
4 8 0 0 0
5 49 0 0 0
6 55 48 0 0
7 99 43 0 0
8 63 29 0 0
9 61 65 74 0
10 15 29 41 0
11 79 88 3 0
12 91 74 11 4
13 56 71 6 79
14 15 65 46 81
15 81 42 60 24
16 71 57 95 18
17 53 4 80 15
18 42 55 84 11
19 26 80 67 59
You need to initialize yy = 3 before the loop, otherwise it is reset to 3 for every column:
yy = 3
for columns in b:
    for i in xrange(yy):
        b[columns][i] = 0
    yy += 3
print b
Python 3 solution:
yy = 3
for columns in b:
    for i in range(yy):
        b[columns][i] = 0
    yy += 3
print (b)
s t
0 0 0
1 0 0
2 0 0
3 189 0
4 100 0
5 130 0
6 150 50
7 70 133
8 124 156
9 189 132
Another solution:
yy = 3
for i, col in enumerate(b.columns):
    b.ix[:i*yy+yy-1, col] = 0
print (b)
s t
0 0 0
1 0 0
2 0 0
3 189 0
4 100 0
5 130 0
6 150 50
7 70 133
8 124 156
9 189 132
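
The point common to all the answers above is that the count of rows to zero must persist across columns instead of being reset inside the loop. Here is a minimal sketch of that idea in plain Python, using a dict of lists as a stand-in for the dataframe. The function name zero_staircase and the dict layout are assumptions for illustration only.

```python
def zero_staircase(columns, step=3):
    """Zero the first `step` values of the first column, the first 2*step
    of the second column, and so on.

    columns: dict mapping column name -> list of values
             (insertion order is taken as column order).
    Returns a new dict; the input is not modified.
    """
    result = {}
    n_zero = step                          # initialised once, before the loop
    for name, values in columns.items():
        k = min(n_zero, len(values))       # never run past the column length
        result[name] = [0] * k + list(values[k:])
        n_zero += step                     # grows from one column to the next
    return result


data = {
    "s": [133, 156, 132, 189, 100, 130, 150, 70, 124, 189],
    "t": [124, 189, 176, 192, 120, 140, 50, 133, 156, 132],
}
for name, values in zero_staircase(data).items():
    print(name, values)
```

In current pandas the same effect should be achievable positionally with b.iloc[:n, j] = 0 inside a loop over column positions j, which avoids both the removed .ix indexer and chained assignment.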

Consecutive tagging in Stata

The task is to count, for each product (in a specific store), how many consecutive weeks it has been on promotion.
clear
input ///
upc week store promo
1 1 86 1
1 2 86 1
1 3 86 1
1 4 86 1
3 1 86 0
3 2 86 1
4 1 86 0
4 2 86 1
4 3 86 1
end
The end result should look something like this:
upc week store promo promocount
1 1 86 1 1
1 2 86 1 2
1 3 86 1 3
1 4 86 1 4
3 1 86 0 0
3 2 86 1 1
4 1 86 0 0
4 2 86 1 1
4 3 86 1 2
end
I have 800K obs., and I am encountering a problem with the real data set. When I run bysort upc week store promo: gen prcount = _n if promo==1, my data set is sorted in a different way (which, as a result, yields wrong tagging):
upc week store promo
1 1 86 1
3 1 86 0
4 1 86 0
1 2 86 1
3 2 86 1
4 2 86 1
1 3 86 1
4 3 86 1
1 4 86 1
Anyway, I now realize my code is wrong. Any suggestions?
I think
. quietly input ///
> upc week store promo
. generate promocount = 0
. bysort store upc (week): replace promocount = 1+cond(_n==1,0,promocount[_n-1]) if promo>0
(7 real changes made)
. list, clean noobs
upc week store promo promoc~t
1 1 86 1 1
1 2 86 1 2
1 3 86 1 3
1 4 86 1 4
3 1 86 0 0
3 2 86 1 1
4 1 86 0 0
4 2 86 1 1
4 3 86 1 2
does do what you want.
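
To make the logic of that replace statement explicit, here is a minimal Python sketch of the same running count: sort by store, upc and week, then extend the streak while promo is positive and reset it to 0 otherwise. The function name promo_streaks and the dict-based rows are illustrative assumptions. Like the Stata line, it does not check for gaps between week numbers.

```python
from itertools import groupby


def promo_streaks(rows):
    """Add 'promocount': the number of consecutive promo weeks so far,
    counted separately for each (store, upc) and reset when promo is 0."""
    # Equivalent of: bysort store upc (week)
    ordered = sorted(rows, key=lambda r: (r["store"], r["upc"], r["week"]))
    result = []
    for _, product_rows in groupby(ordered, key=lambda r: (r["store"], r["upc"])):
        streak = 0
        for row in product_rows:
            # 1 + previous count while on promotion, otherwise back to 0
            streak = streak + 1 if row["promo"] > 0 else 0
            result.append({**row, "promocount": streak})
    return result


# Rows in the "wrong" order from the question (sorted by week first)
raw = [(1, 1, 1), (3, 1, 0), (4, 1, 0), (1, 2, 1), (3, 2, 1),
       (4, 2, 1), (1, 3, 1), (4, 3, 1), (1, 4, 1)]
rows = [{"upc": u, "week": w, "store": 86, "promo": p} for u, w, p in raw]
for r in promo_streaks(rows):
    print(r["upc"], r["week"], r["store"], r["promo"], r["promocount"])
```

Because the sort happens inside the function, the input order does not matter, which is the property the original bysort attempt was missing.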

Replacing part of string while leaving the rest intact with Notepad++

I'm trying to edit a huge file with lines like:
1 9416 0 0 10 10 0 dropitems.drop_MFighter_m012_t91_u_m00 mfighter.MFighter_m201_t201_u 0 0 0 0 0 1 0 0 icon.armor_t91_u_i00 -1 7570 46 1 2531A298 0 1 21 1 Fighter.MFighter_m201_u 1 mfighter.MFighter_m201_t201_u 2 Fighter.MFighter_m201_hrh_ad00 104 114 Fighter.MFighter_m201_hrs_ad00 115 114 2 MFighter.MFighter_m201_HRR_ad00_t201_a MFighter.MFighter_m201_HRR_ad00_t201_a 1 Fighter.FFighter_m012_u 1 FFighter.FFighter_m012_t201_u 2 Fighter.FFighter_m012_hrr_ad00 114 114 Fighter.FFighter_m012_hra_ad00 97 114 2 MFighter.MFighter_m201_HRR_ad00_t201_a MFighter.MFighter_m201_HRR_ad00_t201_a 1 DarkElf.MDarkElf_m010_u 1 MDarkElf.MDarkElf_m201_t201_u 2 DarkElf.MDarkElf_m010_Hrr_ad00 114 114 DarkElf.MDarkElf_m010_Hra_ad00 97 114 2 MDarkElf.MDarkElf_m010_HRR_ad00_t201_a MDarkElf.MDarkElf_m010_HRR_ad00_t201_a 1 DarkElf.FDarkElf_m006_u 1 FDarkElf.FDarkElf_m006_t201_u 2 DarkElf.FDarkElf_m006_hrr_ad00 114 114 DarkElf.FDarkElf_m006_hra_ad00 97 114 2 MDarkElf.MDarkElf_m010_HRR_ad00_t201_a MDarkElf.MDarkElf_m010_HRR_ad00_t201_a 1 Dwarf.MDwarf_m008_u 1 MDwarf.MDwarf_m201_t201_u 2 Dwarf.MDwarf_m008_Hrr_ad00 114 114 Dwarf.MDwarf_m008_Hra_ad00 97 114 2 MDwarf.MDwarf_m008_HRR_ad00_t201_a MDwarf.MDwarf_m008_HRR_ad00_t201_a 1 Dwarf.FDwarf_m008_u 1 FDwarf.FDwarf_m008_t201_u 2 Dwarf.FDwarf_m008_hrr_ad00 114 114 Dwarf.FDwarf_m008_hra_ad00 97 114 2 MDwarf.MDwarf_m008_HRR_ad00_t201_a MDwarf.MDwarf_m008_HRR_ad00_t201_a 1 Elf.MElf_m011_u 1 MElf.MElf_m011_t201_u 2 Elf.MElf_m011_Hrr_ad00 114 114 Elf.MElf_m011_Hra_ad00 97 114 2 MElf.MElf_m011_HRR_ad00_t201_a MElf.MElf_m011_HRR_ad00_t201_a 1 Elf.FElf_m011_u 1 FElf.FElf_m011_t201_u 2 Elf.FElf_m011_hrr_ad00 114 114 Elf.FElf_m011_hra_ad00 97 114 2 MElf.MElf_m011_HRR_ad00_t201_a MElf.MElf_m011_HRR_ad00_t201_a 1 Magic.MMagic_m011_u 1 mMagic.MMagic_m011_t301_u 2 0 255 Magic.MMagic_m011_Rra_ad00 97 114 2 Mmagic.Mmagic_m011_Rra_ad00_t301_x 1 Magic.FMagic_m013_u 1 FMagic.FMagic_m013_t301_u 2 0 255 Magic.FMagic_m013_Rra_ad00 97 114 2 
Mmagic.Mmagic_m011_Rra_ad00_t301_x 1 Orc.MOrc_m007_u 1 MOrc.MOrc_m201_t201_u 2 Orc.MOrc_m007_hrh_ad00 104 114 Orc.MOrc_m007_hrs_ad00 115 114 2 MOrc.MOrc_m007_HRR_ad00_t201_a MOrc.MOrc_m007_HRR_ad00_t201_a 1 Orc.FOrc_m007_u 1 FOrc.FOrc_m007_t201_u 2 Orc.FOrc_m007_hrr_ad00 114 114 Orc.FOrc_m007_hra_ad00 97 114 2 MOrc.MOrc_m007_HRR_ad00_t201_a MOrc.MOrc_m007_HRR_ad00_t201_a 1 Shaman.MShaman_m007_u 1 MShaman.MShaman_m007_t301_u 2 0 255 Shaman.MShaman_m007_Rra_ad00 97 114 2 MShaman.MShaman_m007_Rra_ad00_t301_x 1 Shaman.FShaman_m007_u 1 FShaman.FShaman_m007_t301_u 2 0 255 Shaman.FShaman_m007_Rra_ad00 97 114 2 MShaman.MShaman_m007_Rra_ad00_t301_x 1 Kamael.mkamael_m005_u 2 MKamael.MKamael_m005_t101_u Mkamael.Mkamael_m005_t101_ut 3 Kamael.MKamael_m005_Lrr_ad00 114 114 Kamael.Mkamael_m000_w_ad00 119 95 Kamael.mkamael_m005_l_ad00 108 95 3 MKamael.MKamael_m005_Lrr_ad00_t101_a Mkamael.Mkamael_m000_t00_w Mkamael.Mkamael_m005_t101_ut 1 Kamael.fkamael_m009_u 2 FKamael.FKamael_m009_t101_u FKamael.FKamael_m009_t101_ut 3 Kamael.FKamael_m009_Lrr_ad00 114 114 Kamael.Fkamael_m000_w_ad00 119 95 Kamael.Fkamael_m009_l_ad00 108 95 3 MKamael.MKamael_m005_Lrr_ad00_t101_a Fkamael.Fkamael_m000_t00_w FKamael.FKamael_m009_t101_ut 1 1 0 0 LineageEffect.p_u002_a 4 ItemSound.armor_metal_alt_6 ItemSound.public_armor_04 ItemSound.shield_steel_1 ItemSound.shield_steel_8 ItemSound.itemdrop_armor_lightmetal ItemSound.itemequip_armor_lightmetal 1 0 2 5 0 226 0 0 36
I have come up with the following regular expression to find similar lines:
^.*?(_t91(.*?)1 0 2 5).*?$
Basically I have to change number 5 to number 6 and I'm trying to do so with the following regular expression:
\1t91\21 0 2 6
The result of that is a line starting in 't91' and ending as it should with the number replaced, but then it repeats itself 2-3 times (i.e: 2 6 0 226 0 0 36t91 etc.)
Do you guys have any ideas about this? Regular expressions are still a mystery to me.
Thanks in advance.
From what I see you are trying to replace 5 as a whole word in a specific environment.
To match a whole word, you'd need \b (a word boundary) so as not to change a value like 55.
To replace it just once on a whole line, you can use the regex
^(.*?_t91.*?1\s+0\s+2\s+)5(\b.*?)$
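Here the first group is everything up to and including the whitespace before the 5, and the second group is the rest of the line after it.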
And replace with ${1}6${2} (although \16\2 works here, too).
If there are multiple values like this on a single line, use
(_t91.*?1\s+0\s+2\s+)5\b
And replace with ${1}6.
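
The two patterns can also be tried outside Notepad++. Below is a minimal sketch using Python's re module; the helper names and the shortened sample line are made up for illustration. One difference from Notepad++: in a Python replacement string, \16 would be read as group 16, so the unambiguous form \g<1>6 is used instead.

```python
import re

# Same patterns as in the answer above
ONCE_PER_LINE = re.compile(r"^(.*?_t91.*?1\s+0\s+2\s+)5(\b.*?)$")
EVERY_MATCH = re.compile(r"(_t91.*?1\s+0\s+2\s+)5\b")


def bump_once(line):
    """Replace the 5 after the first '_t91 ... 1 0 2' with 6, once per line."""
    return ONCE_PER_LINE.sub(r"\g<1>6\g<2>", line, count=1)


def bump_all(line):
    """Replace every '5' that follows a '_t91 ... 1 0 2' sequence."""
    return EVERY_MATCH.sub(r"\g<1>6", line)


# Shortened, hypothetical stand-in for one of the long lines in the file
sample = "1 9416 0 dropitems.drop_MFighter_m012_t91_u_m00 ItemSound.x 1 0 2 5 0 226 0 0 36"
print(bump_once(sample))
print(bump_once("a_t91_b 1 0 2 55 0"))  # left alone: 55 is not the whole word 5
```

The word boundary is what keeps a value such as 55 from being changed, and capturing the prefix in a group is what keeps the rest of the line intact.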

In R, Train/update model with multiple datasets

In R, I'm trying to train a neural network on multiple files. I have performed the multinom function on a single dataset, but I cannot find out how to train my model with another dataset.
So I want to apply a model from a previous call to new data without re-estimating the model.
So first you build a model as explained in Sam Thomas's answer.
#load libraries
library(nnet)
library(MASS)
#Define data
example(birthwt)
# Define training and test data
set.seed(321)
index <- sample(seq_len(nrow(bwt)), 130)
bwt_train <- bwt[index, ]
bwt_test <- bwt[-index, ]
# Build model
bwt.mu <- multinom(low ~ ., data=bwt_train)
Then I have another similar dataset I want to train/update the earlier created model with. So I want to update the model with new data to improve my model.
# New data set (for example resampled bwt)
bwt2=sapply(bwt, sample)
head(bwt2,3)
low age lwt race smoke ptd ht ui ftv
[1,] 1 31 115 3 1 1 0 0 2
[2,] 1 20 95 1 0 1 0 0 3
[3,] 2 25 95 2 0 1 0 1 1
# Define training and test data with new dataset
set.seed(321)
index <- sample(seq_len(nrow(bwt2)), 130)
bwt2_train <- bwt2[index, ]
bwt2_test <- bwt2[-index, ]
Now I want to optimize the model with this new dataset. I cannot merge the two datasets, because the model should be updated over time as new data becomes available, and recalculating everything each time new data arrives is not desirable.
Thanks in advance,
Adam
Borrowed from an example in ?nnet::multinom
library(nnet)
library(MASS)
example(birthwt)
head(bwt, 2)
low age lwt race smoke ptd ht ui ftv
1 0 19 182 black FALSE FALSE FALSE TRUE 0
2 0 33 155 other FALSE FALSE FALSE FALSE 2+
set.seed(321)
index <- sample(seq_len(nrow(bwt)), 130)
bwt_train <- bwt[index, ]
bwt_test <- bwt[-index, ]
bwt.mu <- multinom(low ~ ., bwt_train)
(pred <- predict(bwt.mu, newdata=bwt_test))
[1] 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0
[39] 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0
Levels: 0 1
Or if you want the probabilities
(pred <- predict(bwt.mu, newdata=bwt_test, type="probs"))
1 5 6 16 19 23 24
0.43672841 0.65881933 0.21958026 0.39061949 0.51970665 0.01627479 0.17210620
26 27 28 29 30 37 40
0.06133368 0.31568117 0.05665126 0.26507476 0.37419673 0.18475433 0.14946268
44 46 47 51 56 58 60
0.09670367 0.72178459 0.06541529 0.37448908 0.31883809 0.09532218 0.27515734
61 64 67 69 72 74 76
0.27515734 0.09456443 0.16829037 0.62285841 0.12026718 0.47417711 0.09603950
78 87 94 99 100 106 114
0.34588019 0.30327432 0.87688323 0.21177276 0.06576210 0.19741587 0.22418653
115 117 118 120 125 126 130
0.14592195 0.19340994 0.14874536 0.30176632 0.09513698 0.08334515 0.03886775
133 134 139 140 145 147 148
0.41216817 0.85046516 0.46344537 0.34219775 0.33673304 0.26894886 0.43778705
152 163 164 165 168 174 180
0.19044485 0.27800125 0.17865143 0.86783149 0.25969355 0.60623964 0.34931986
182 183 185
0.22944657 0.08066599 0.22863967