How to plot shade red according to ratio variable using sgpanel plot - sas

I would like to plot dataset and obtain desired output with the right setup.
Plot the scatter such that the points are in shade red-color, from light red to dark red depending on the scale (ratio) of 0-1 (0=light red, 1=dark red).
Show the legend also showing the scale red color according to the ration 0-1 (point 1.)
Data explanation:
area - city (shortcut)
id - user id
var - variable
time - datetime
exit - consumer left
ratio - proportion (between 0-1)
Data sample and attempt plotting (obviously not correct):
data data;
input area $ id $ var $ time $ exit $ ratio $;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
data attrs;
input id $ risk $ fillcolor $;
datalines;
ratio 0.05 Verylightred
ratio 0.15 Lightred
ratio 0.20 Red
ratio 0.25 Darkred
ratio 0.30 Verydarkred
ratio 0.35 Verydarkstrongred
;
run;
proc sgpanel data=data dattrmap=attrs;
panelby area exit;
scatter y=id x=var / markerattrs = (symbol = squarefilled) group=ratio attrid=ratio;
run;

This will get you closer.
Ratio should be numeric to be graphed
Ratio is continuous, how should it be used to group?
For the colour on the data attribute map, the length of the colours is not long enough and risk should be numeric
I don't know exactly how to specify the ranges you'd like for the colours you'd like but this gets you closer using the automatic legend.
One way to get at this is to add the variable to the data set for each group and then you can control the colour of each group with the data attribute map. This would mean adding a column in the 'data' data set called ratio_group whcih maps to the values in the data attribute map table. Use that variable the group.
data data;
input area $ id $ var $ time $ exit $ ratio ;
datalines;
A 1 1 1 0 0.18
A 1 1 2 0 0.11
A 2 1 1 1 0.14
A 2 1 2 0 0.15
A 2 1 3 0 0.14
A 3 1 1 0 0.17
A 3 1 2 0 0.19
A 3 1 3 1 0.21
A 3 1 4 0 0.14
B 4 2 1 0 0.14
B 4 2 2 1 0.15
B 5 2 1 0 0.17
B 5 2 2 0 0.25
B 5 2 3 0 0.31
A 1 3 1 0 0.22
A 1 3 2 0 0.13
A 2 3 1 1 0.16
A 2 3 2 0 0.11
A 2 3 3 0 0.22
A 3 3 1 0 0.27
A 3 3 2 0 0.29
A 3 3 3 1 0.31
A 3 3 4 0 0.24
B 4 4 1 0 0.24
B 4 4 2 1 0.35
B 5 4 1 0 0.47
B 5 4 2 0 0.15
B 5 4 3 0 0.21
;;
run;
proc sgpanel data=data ;
panelby area exit;
scatter y=id x=var / markerattrs = (symbol = squarefilled size=10)
colorresponse=ratio
colormodel=(verylightred lightred red darkred verydarkred verydarkstrongred);
colaxis grid minorgrid;
rowaxis grid minorgrid;
run;
For marker size look at the SIZE option under the MARKERATTRS option.
For grids, look at the GRID/MINORGRID options under the COLAXIS and ROWAXIS statements.
COLAXIS documentation

Related

How to add sequential ID based on condition SAS

I have the dataset with Time and Interval variable as below. I would like to add a sequential ID (Indicator) with SAS based on a condition that Interval is greater than 0.1, as follows:
Time
Interval
Indicator
11:40:38
0.05
.
11:40:41
0.05
.
11:40:44
0.05
.
11:40:47
0.05
.
11:40:50
0.05
.
11:42:50
2
1
11:42:53
0.05
2
11:42:56
0.05
3
11:42:59
0.05
4
11:43:02
0.05
5
11:43:05
0.05
6
11:43:08
0.05
7
11:43:18
0.16667
1
11:43:21
0.05
2
11:43:24
0.05
3
11:43:27
0.05
4
11:43:30
0.05
5
11:43:33
0.05
6
If I use the code
`data out1; set out ;
by Time;
retain indicator;
if Interval > 0.1 then indicator=1;
indicator+1;
run;`
Indicator is not missing for the first five observations. I would like that it starts counting only when the condition is met (Interval > 0.1).
Thanks!
You can do it with a little modification:
data out1;
set out ;
retain indicator;
if Interval>0.1 then indicator=0;
if indicator^=. then indicator+1;
run;
The summuation will start after the condition Interval>0.1 has been met, because indicator is equal to missing value before that, so indicator+1 would not be calculated.
And you need to initial indicator as 0, not 1. If indicator is equal to 0, indicator^=. will be satisfied and indicator+1 will be calculated.
For yucks, here is a one-liner of #WhyMath logic.
data want;
set have;
retain seq;
seq = ifn(interval > 0.1, 1, ifn(seq, sum(seq,1), seq));
run;
If you want to retain INDICATOR it cannot be on the input dataset, otherwise the SET statement will overwrite the retained value with the value read from the existing dataset.
If you want INDICATOR to start as missing when using the SUM statement then you need to explicitly say so in the RETAIN statement. Otherwise the SUM statement will cause the variable to be initialized to zero.
If looks like you only want to increment when the new variable has already been assigned at least one value.
data want;
set have;
retain new .;
if interval>0.1 then new=1;
else if new > 0 then new+1;
run;
Results:
OBS Time Interval Indicator new
1 11:40:38 0.05000 . .
2 11:40:41 0.05000 . .
3 11:40:44 0.05000 . .
4 11:40:47 0.05000 . .
5 11:40:50 0.05000 . .
6 11:42:50 2.00000 1 1
7 11:42:53 0.05000 2 2
8 11:42:56 0.05000 3 3
9 11:42:59 0.05000 4 4
10 11:43:02 0.05000 5 5
11 11:43:05 0.05000 6 6
12 11:43:08 0.05000 7 7
13 11:43:18 0.16667 1 1
14 11:43:21 0.05000 2 2
15 11:43:24 0.05000 3 3
16 11:43:27 0.05000 4 4
17 11:43:30 0.05000 5 5
18 11:43:33 0.05000 6 6

How can I compare values from a pandas pivot table which have multi levels?

Below is my pivot table. I need help in comparing values and get the result based on prop1>prop0 | prop2>prop0. I used the following query...
Output[(Output.prop1 > Output.prop0) | (Output.prop2> Output.prop0)]
I am getting error. I dont know where I am going wrong. Please help!
Dose 0 1 2
dose0 prop0 dose1 prop1 dose2 prop2
Organ Diagnosis
heart xyz 1 0.05 0 0.00 0 0.00
Lung ghi 0 0.00 0 0.00 1 0.03
Kidney def 0 0.00 1 0.03 0 0.00
skin jkl 0 0.00 5 0.16 0 0.00
liver abc 8 0.42 6 0.19 6 0.19
here is the sample code...
Organ Diagnosis Dose
heart xyz 0
kidney abc 1
liver def 2
kidney qrs 1
liver dfj 2
heart gdh 0
heart hdh 1
kidney edr 2
from the above table I created a pivot table 1 with dose0 and prop0.
Column dose0 based on count of dose '0' and prop0 is the calculated based on dose0/X. X is sum integer. Then I created two more pivot tables for dose1 and dose2 and concatenated them.

pandas.DataFrame: How to div row by row [python]

I want to div row[i] by row[i+1] in pandas.DataFrame
row[i] = row[i+1] / row[i]
for example:
1 2 3 4
4 2 6 2
8 5 3 1
the result is
0.25 1 0.5 2
0.5 0.4 2 2
You can divide by div shifted DataFrame, last remove NaN row by dropna:
print (df)
a b c d
0 1 2 3 4
1 4 2 6 2
2 8 5 3 1
print (df.div(df.shift(-1), axis=1))
a b c d
0 0.25 1.0 0.5 2.0
1 0.50 0.4 2.0 2.0
2 NaN NaN NaN NaN
df = df.div(df.shift(-1), axis=1).dropna(how='all')
print (df)
a b c d
0 0.25 1.0 0.5 2.0
1 0.50 0.4 2.0 2.0
Another solution for remove last row is select by iloc:
df = df.div(df.shift(-1), axis=1).iloc[:-1]
print (df)
a b c d
0 0.25 1.0 0.5 2.0
1 0.50 0.4 2.0 2.0

Stata: Capture p-value from ranksum test

When I run return list, all after running a ranksum test, the count and z-score are available, but not the p-value. Is there any way of picking it up?
clear
input eventtime prefflag winner stakechange
1 1 1 10
1 2 1 5
2 1 0 50
2 2 0 31
2 1 1 51
2 2 1 20
1 1 0 10
2 2 1 10
2 1 0 5
3 2 0 8
4 2 0 8
5 2 0 8
5 2 1 8
3 1 1 8
4 1 1 8
5 1 1 8
5 1 1 8
end
bysort eventtime winner: tabstat stakechange, stat(mean median n) columns(statistics)
ranksum stakechange if inlist(eventtime, 1, 2) & inlist(winner, 0, .), by (eventtime)
return list, all
Try computing it after ranksum:
scalar pval = 2 * normprob(-abs(r(z)))
display pval
The answer is by #NickCox:
http://www.stata.com/statalist/archive/2004-12/msg00622.html
The Statalist archive is a valuable resource.

Hidden Markov Model Training for Dynamic Gestures?

I know there is a lot of material related to hidden markov model and I have also read all the questions and answers related to this topic. I understand how it works and how it can be trained, however I am not able to solve the following problem I am having when trying to train it for a simple dynamic gesture.
I am using HMM implementation for OpenCV
I have looked into previously asked questions and answer here. Which has really helped me in understanding and using markov models.
I have total of two dynamic gestures, which are both symmetric (swipe left and swipe right)
There are total of 5 observations in which 4 are the different stages in the gesture and 5th one is an observation when non of these stages are occuring.
Swipe left gesture consists of the following observation: 1->2->3->4 (which should trigger a swipe left state)
Likewise Swipe Right gesture consists of the following observation: 4->3->2->1
I have 25 sequences. I am taking 20 observations for each of the sequence, which are used to train hidden markov model using Baum-Welch algorithm.
The following is the input sequence:
1 0 1 1 0 2 2 2 2 0 0 2 3 3 3 0 0 4 4 4
4 4 4 4 4 0 3 3 3 3 3 0 0 1 0 0 1 1 0 1
4 4 4 4 4 4 0 3 3 3 3 3 0 0 1 0 0 1 1 0
4 4 4 4 4 4 4 0 3 3 3 3 3 0 0 1 0 0 1 1
1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 4 4 4
1 1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 4 4
0 1 1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4 4
0 0 1 1 1 1 1 0 2 2 2 0 1 0 3 3 0 0 0 4
4 4 0 0 3 0 3 3 3 3 0 0 0 0 0 1 1 1 1 1
4 4 4 0 0 3 0 3 3 3 3 0 0 0 0 0 1 1 1 1
4 4 4 4 0 0 3 0 3 3 3 3 0 0 0 0 0 1 1 1
1 1 1 1 0 0 2 2 0 3 2 3 3 3 0 0 4 4 4 4
1 1 1 1 1 0 0 2 2 0 3 2 3 3 3 0 0 4 4 4
1 1 1 1 1 1 0 0 2 2 0 3 2 3 3 3 0 0 4 4
1 3 4 4 4 0 3 0 0 0 0 0 3 2 0 0 1 1 1 1
In this sequence you can see pattern for Swipe left and Swipe right gestures.
To train the hidden markov model I am initilizing it with the following values and then calling the train function to get the output:
TRANS:
0.7 0.15 0.15
0.3 0.4 0.3
0.3 0.4 0.3
EMIS:
0.3 0.1 0.1 0.1 0.1
0.2 0.1 0.2 0.2 0.3
0.2 0.3 0.2 0.2 0.1
INIT:
0.6 0.2 0.2
After training the output is:
TRANS:
0.81611 0.0847926 0.0990979
0.398458 0.346433 0.255109
0.371391 0.35587 0.272739
EMIS:
0.534127 0.125568 0.0824495 0.200169 0.0576869
0.294653 0.0250053 0.0500311 0.200616 0.429694
0.238808 0.075001 0.0500019 0.130455 0.505733
INIT:
0.443984 0.391323 0.164693
Using this model in my recognition program, I am not getting results.
I want the system to remain in a NULL STATE unless one of the gesture is detected.
In the Transition and Emission matrix I gave my guess values for both these gesture.
What do you think I might be doing wrong? Any pointers or help?
Lastly here is the code I am using for doing this (if anyone wants to have a look)
double TRGUESSdata[] = {0.7, 0.15, 0.15,
0.3, 0.4, 0.3,
0.3, 0.4, 0.3};
cv::Mat TRGUESS = cv::Mat(3,3,CV_64F,TRGUESSdata).clone();
double EMITGUESSdata[] = {0.3, 0.1, 0.1, 0.1, 0.1,
0.2, 0.1, 0.2, 0.2, 0.3,
0.2, 0.3, 0.2, 0.2, 0.1};
cv::Mat EMITGUESS = cv::Mat(3,5,CV_64F,EMITGUESSdata).clone();
double INITGUESSdata[] = {0.6 , 0.2 , 0.2};
cv::Mat INITGUESS = cv::Mat(1,3,CV_64F,INITGUESSdata).clone();
std::cout << seq.rows << " " << seq.cols << std::endl;
int a = 0;
std::ifstream fin;
fin.open("observations.txt");
for(int y =0; y < seq.rows; y++)
{
for(int x = 0; x<seq.cols ; x++)
{
fin >> a;
seq.at<signed int>(y,x) = (signed int)a;
std::cout << a;
}
std::cout << std::endl;
}
hmm.printModel(TRGUESS,EMITGUESS,INITGUESS);
hmm.train(seq,1000,TRGUESS,EMITGUESS,INITGUESS);
hmm.printModel(TRGUESS,EMITGUESS,INITGUESS);
Here fin is used to read the observation I have from my other code.
What does the 0 mean in your model ? It seems to me in your data there are no direct transitions for both states, it always goes back to the state 0. Try something like the following in your data for a state transition sequence.
1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 4
1 2 3 4 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 2 2 3 3 4 4 0 0 0 0 0
4 4 3 3 2 2 1 1 0 0 0 0 0 0 0 0 0
4 4 4 3 3 3 2 2 2 2 2 1 1 1 1 1 1
As a general rule:
I would recommend to work with openCV only after you have a proof of concept in Matlab/octave. This has two reasons. First of all you know exactly what you want to do and how it works, and don't waste your time implementing and debugging your theory in a 'low' level language (compared to matlab). Debugging algorithms in openCV is really time-consuming.
Secondly after you know your stuff works as expected, if you implement it and hit a bug (of openCV or C++, python) you know it's not your theory, not your implementation, it's the framework. It happened to me already two times that employed computer scientists implemented directly from a paper (after being told not to do so), spending 80% of the remaining time to debug the algorithm without ANY success only to find out that: they didn't really get the theory or some submodule of openCV had a slight bug which degenerated their results.
The link you've mentioned uses a HMM toolbox in matlab. Try to implement and understand your problem there, it's really worth spending the time. Not only you can verify each step for correctness, you can use the itermediate matrices with your openCV code after you have a working model.