The following code produces the picture below.
As you can see, the group statement results in different colours for the data points.
Question: How can I also have different symbols for the two groups?
proc sgplot data=test;
scatter x=time y=Y / group=group;
run;
group time Y
0 0 10085.472039
0 0 10085.472039
0 0 10085.472039
0 1 9950.3642122
0 2 9817.0663279
0 4 9555.8037259
0 6 9301.4941325
0 8 9053.9525066
0 8 9053.9525066
0 8 9053.9525066
1 0 2954.7558871
1 0 2954.7558871
1 0 2954.7558871
1 1 2987.6191302
1 2 3020.8478832
1 4 3088.4182255
1 6 3157.4999815
1 8 3228.1269586
1 8 3228.1269586
1 8 3228.1269586
0 0 3929.2678194
0 0 3929.2678194
0 0 3929.2678194
0 1 3903.7639936
0 2 3878.4257063
0 4 3828.2414563
0 6 3778.7065572
0 8 3729.8126068
0 8 3729.8126068
0 8 3729.8126068
1 0 2694.5952697
1 0 2694.5952697
1 0 2694.5952697
1 1 2580.159876
1 2 2470.5843807
1 4 2265.1962804
1 6 2076.8827929
1 8 1904.2244475
1 8 1904.2244475
1 8 1904.2244475
Using http://www.ats.ucla.edu/stat/sas/faq/gr2grps_new.htm:
symbol1 v=star c=red h=1;
symbol2 v=triangle c=blue h=1;
proc gplot data=temp;
plot y*time=group;
run;
quit;
Related
I have a dataset consisting of variables ObservationNumber, MeasurementNumber, SubjectID, and many dummy variables.
I would like to consolidate all non-zero values into one row by SubjectID GroupNumber.
Have:
ObsNum MeasurementNum SubjectID Dummy0 Dummy1 ... Dummy999
----------------------------------------------------...---------------
01 1 1 0 1 ... 0
02 2 1 0 1 ... 0
03 3 1 0 1 ... 0
04 4 1 0 0 ... 0
05 5 1 - - ... -
06 6 1 0 0 ... 0
07 1 2 1 0 ... 0
08 2 2 0 0 ... 0
09 3 2 0 1 ... 0
10 4 2 1 0 ... 0
11 4 2 0 1 ... 0
12 5 2 0 0 ... 1
13 6 2 0 0 ... 0
14 6 2 0 0 ... 1
15 6 2 0 0 ... 0
16 6 2 0 0 ... 0
17 6 2 0 1 ... 0
18 6 2 0 0 ... 0
19 6 2 0 0 ... 0
20 6 2 0 0 ... 0
21 6 2 1 0 ... 0
22 1 3 1 0 ... 0
23 2 3 0 1 ... 0
24 3 3 0 0 ... 1
25 4 3 - - ... -
26 5 3 0 0 ... 0
27 6 3 0 0 ... 0
28 1 4 - - ... -
29 2 4 0 0 ... 0
30 3 4 0 1 ... 0
31 4 4 1 0 ... 0
32 4 4 0 1 ... 0
33 4 4 0 0 ... 1
34 5 4 0 0 ... 1
35 6 4 0 1 ... 0
36 6 4 0 0 ... 1
Want:
MeasurementNum SubjectID Dummy0 Dummy1 ... Dummy999
----------------------------------------------------...---------------
1 1 0 1 ... 0
2 1 0 1 ... 0
3 1 0 1 ... 0
4 1 0 0 ... 0
5 1 - - ... -
6 1 0 0 ... 0
1 2 1 0 ... 0
2 2 0 0 ... 0
3 2 0 1 ... 0
4 2 1 1 ... 0
5 2 0 0 ... 1
6 2 1 1 ... 1
1 3 1 0 ... 0
2 3 0 1 ... 0
3 3 0 0 ... 1
4 3 - - ... -
5 3 0 0 ... 0
6 3 0 0 ... 0
1 4 - - ... -
2 4 0 0 ... 0
3 4 0 1 ... 0
4 4 1 1 ... 1
5 4 0 0 ... 1
6 4 0 1 ... 1
Each SubjectID has six measurement in which a set of dummyvariables are measured without outcome 0, 1 or missing. If a missing value occurs, all dummy variables for the respective observation are missing--and only one observation will be present in the dataset for that `MeasurementNumber.
I have tried to use the UPDATE statement, but it seems to not be able to deal with '0' and '-'.
Is there a direct way of condensing all dummyvariables in this dataset for each SubjectID grouped by MeasurementNumber?
Use Proc MEANS with BY and OUTPUT statements.
data have;
rownum = 0;
do rowid = 1 to 1000;
subjectid + 1;
do measurenum = 1 to 6;
do repeat = 1 to ceil(4 * ranuni(123));
array flags flag1-flag999;
do _n_ = 1 to dim(flags);
flags(_n_) = ranuni(123) < 0.10;
if subjectid < 7 and measurenum = subjectid then flags(_n_) = .;
end;
rownum + 1;
output;
end;
end;
end;
keep rownum measurenum subjectid flag:;
run;
proc means noprint data=have;
by subjectid measurenum;
var flag:;
output max=;
run;
I have a dataframe that looks like:
subgroup value
0 1 0
1 1 1
2 1 1
3 1 0
4 2 0
5 2 0
6 2 0
7 3 0
8 3 1
9 3 0
10 3 0
I need to add a column that add 1 whenever there is at least one value different than 0 in the different subgroups. Please, note that if the value 1 is repeated more than once in the same subgroup, it doesn't affect the count.
The result should be:
subgroup value count
0 1 0 1
1 1 1 1
2 1 1 1
3 1 1 1
4 2 0 1
5 2 0 1
6 2 0 1
7 3 0 2
8 3 1 2
9 3 0 2
10 3 0 2
Thank you in advance for your help!
Using shift with -1 and 1 and cumsum the result
mask=(df.value.ne(df.value.shift()))&(df.value.ne(df.value.shift(-1)))
mask.cumsum()
Out[18]:
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 2
9 2
10 2
Name: value, dtype: int32
Using merge and groupby
df.merge(df.groupby('subgroup').value.sum().gt(0).cumsum().reset_index(name='out'))
subgroup value out
0 1 0 1
1 1 1 1
2 1 1 1
3 1 0 1
4 2 0 1
5 2 0 1
6 2 0 1
7 3 0 2
8 3 1 2
9 3 0 2
10 3 0 2
I have a pandas dataframe:
import pandas as pd
d={'col1':[[1,2,3],[4,5,6]],'col2':[[7,8,9],[10,11,12]]}
df=pd.DataFrame(d)
which results in:
however I want to implement a onHotEncoder, which will treat each list with the cells of the dataFrame as a string, and I want it to treat each value independently.
How would I implement this? My actual dataFrame contains lists of 500 items, and has 4000 unique values.
I think you can use stack for creating Series, then cast list to string by astype, remove [] by strip and last call get_dummies:
df = df.stack().astype(str).str.strip('[]').str.get_dummies(sep=', ')
print (df)
1 10 11 12 2 3 4 5 6 7 8 9
0 col1 1 0 0 0 1 1 0 0 0 0 0 0
col2 0 0 0 0 0 0 0 0 0 1 1 1
1 col1 0 0 0 0 0 0 1 1 1 0 0 0
col2 0 1 1 1 0 0 0 0 0 0 0 0
One column only:
df = df['col1'].astype(str).str.strip('[]').str.get_dummies(sep=', ')
print (df)
1 2 3 4 5 6
0 1 1 1 0 0 0
1 0 0 0 1 1 1
I have a data set which essence is the following
data have;
input Name $ ab gh vz iz jh pq ch km eo lk;
datalines;
adam 7 8 7 0 0 0 0 0 0 0
bob 0 1 0 3 4 6 0 1 6 0
clint 0 0 0 5 4 3 1 0 0 2
;
run;
Now I would like to count how many times I have a number greater than zero in the variables iz, jh, chand km. The result should look like this
/* want
Name ab gh vz iz jh pq ch km eo lk count_of_iz_jh_ch_km
adam 7 8 7 0 2 3 0 0 0 0 1
bob 0 1 0 3 0 6 0 1 6 0 2
clint 5 0 0 5 4 3 1 2 0 2 4
*/
I would greatly appreciate any help since I wasn't successful searching the internet for a solution.
Gerit
The below code will initialize the required variables from have into an array called vars, then for each row, count every time one of these variables is > 0.
data want;
set have;
array vars[*] iz jh ch km;
count_of_iz_ch_km = 0;
do i = 1 to dim(vars);
if(vars[i] > 0) then count_of_iz_ch_km+1;
end;
drop i;
run;
I'm looking for a J code to do the following.
Suppose I have a list of random integers (sorted),
2 3 4 5 7 21 45 49 61
I want to start with the first element and remove any multiples of the element in the list then move on to the next element cancel out its multiples, so on and so forth.
Thus the output
I'm looking at is 2 3 5 7 61. Basically a Sieve Of Eratosthenes. Would appreciate if someone could explain the code as well, since I'm learning J and find it difficult to get most codes :(
Regards,
babsdoc
It's not exactly what you ask but here is a more idiomatic (and much faster) version of the Sieve.
Basically, what you need is to check which number is a multiple of which. You can get this from the table of modulos: |/~
l =: 2 3 4 5 7 21 45 49 61
|/~ l
0 1 0 1 1 1 1 1 1
2 0 1 2 1 0 0 1 1
2 3 0 1 3 1 1 1 1
2 3 4 0 2 1 0 4 1
2 3 4 5 0 0 3 0 5
2 3 4 5 7 0 3 7 19
2 3 4 5 7 21 0 4 16
2 3 4 5 7 21 45 0 12
2 3 4 5 7 21 45 49 0
Every pair of multiples gives a 0 on the table. Now, we are not interested in the 0s that correspond to self-modulos (2 mod 2, 3 mod 3, etc; the 0s on the diagonal) so we have to remove them. One way to do this is to add 1s on their place, like so:
=/~ l
1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0
0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1
(=/~l) + (|/~l)
1 1 0 1 1 1 1 1 1
2 1 1 2 1 0 0 1 1
2 3 1 1 3 1 1 1 1
2 3 4 1 2 1 0 4 1
2 3 4 5 1 0 3 0 5
2 3 4 5 7 1 3 7 19
2 3 4 5 7 21 1 4 16
2 3 4 5 7 21 45 1 12
2 3 4 5 7 21 45 49 1
This can be also written as (=/~ + |/~) l.
From this table we get the final list of numbers: every number whose column contains a 0, is excluded.
We build this list of exclusions simply by multiplying by column. If a column contains a 0, its product is 0 otherwise it's a positive number:
*/ (=/~ + |/~) l
256 2187 0 6250 14406 0 0 0 18240
Before doing the last step, we'll have to improve this a little. There is no reason to perform long multiplications since we are only interested in 0s and not-0s. So, when building the table, we'll keep only 0s and 1s by taking the "sign" of each number (this is the signum:*):
* (=/~ + |/~) l
1 1 0 1 1 1 1 1 1
1 1 1 1 1 0 0 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 0 1 1
1 1 1 1 1 0 1 0 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1
so,
*/ * (=/~ + |/~) l
1 1 0 1 1 0 0 0 1
From the list of exclusion, you just copy:# the numbers to your final list:
l #~ */ * (=/~ + |/~) l
2 3 5 7 61
or,
(]#~[:*/[:*=/~+|/~) l
2 3 5 7 61
Tacit iteration is usually done with the conjunction Power. When the test for completion needs to be something other than hitting a fixpoint, the Do While construction works well.
In this solution filterMultiplesOfHead is applied repeatedly until there are no more numbers not either applied or filtered. Numbers already applied are accumulated in a partial answer. When the list to be processed is empty the partial answer is the result, after stripping off the boxing used to segregate processed from unprocessed data.
filterMultiplesOfHead=: {. (((~: >.)# %~) # ]) }.
appendHead=: (>#[ , {.#>#])/
pass=: appendHead ; filterMultiplesOfHead#>#{:
prep=: a: , <
unfinished=: [: -. a: -: {:
sieve=: [: ; [: pass^:unfinished^:_ prep
sieve 2 3 4 5 7 21 45 49 61
2 3 5 7 61
prep 2 3 4 7 9 10
┌┬────────────┐
││2 3 4 7 9 10│
└┴────────────┘
appendHead prep 2 3 4 7 9 10
2
filterMultiplesOfHead 2 3 4 7 9 10
3 7 9
pass^:2 prep 2 3 4 7 9 10
┌───┬─┐
│2 3│7│
└───┴─┘
sieve 1-.~/:~~.>:?.$~100
2 3 7 11 29 31 41 53 67 73 83 95 97