Using factor inputs in a caret-driven shiny app


I am trying to develop a simple shiny app which takes in patient data and predicts the probability of a disease condition using caret.
Diagnosis  Age  Creatine
------------------------
Chronic    70   765
Chronic    80   784
Chronic    72   692
Chronic    88   965
Chronic    68   1065
Chronic    75   1005
Acute      56   445
Acute      67   378
Acute      78   501
Acute      45   678
Acute      37   776
Acute      39   644
The following code works and returns a probability value.
library(shiny)
library(caret)
library(readxl)

# Train the model once, when the app starts
hepC_lite <- read_excel("hepC_lite.xlsx")
model_mars <- train(Diagnosis ~ ., data = hepC_lite, method = "earth")

ui <- fluidPage(
  numericInput("age", label = "Age", value = 50, min = 1, max = 99),
  numericInput("creatine", label = "Creatine", value = 100, min = 1, max = 2000),
  actionButton("submitButton", "Submit"),
  tableOutput("userDefinedTable"),
  textOutput('probability')
)

server <- function(input, output) {
  values <- reactiveValues()
  observeEvent(input$submitButton, {
    # Column names must match the predictor names used in training
    values$new_row <- data.frame(Age = input$age, Creatine = input$creatine)
    # Probability of the second outcome class (column 2 of the "prob" output)
    values$predicted_mars <- predict(model_mars, values$new_row, type = "prob")[, 2]
  })
  output$userDefinedTable <- renderTable(values$new_row)
  output$probability <- renderText(values$predicted_mars)
}

shinyApp(ui = ui, server = server)
To include a factor variable (Gender) in the prediction, I am first one-hot encoding the dataset.
Diagnosis  Age  Creatine  Gender
--------------------------------
Chronic    70   765       M
Chronic    80   784       M
Chronic    72   692       F
Chronic    88   965       M
Chronic    68   1065      M
Chronic    75   1005      F
Acute      56   445       F
Acute      67   378       F
Acute      78   501       F
Acute      45   678       M
Acute      37   776       M
Acute      39   644       F
#one-hot encoding
x = hepC_lite[, 2:4]
y = hepC_lite$Diagnosis
dummy_model <- dummyVars(Diagnosis ~ ., data = hepC_lite)
trainData <- predict(dummy_model, newdata = hepC_lite)
hepC_lite <- data.frame(trainData)
hepC_lite$Diagnosis <- y
How should I edit the following line to include the Gender variable?
values$new_row <- data.frame(Age = input$age, Creatine = input$creatine)
Adding Gender = input$gender to this line causes an error: Warning: Error in eval: object 'Gender.F' not found
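The error suggests that the model was trained on the encoded columns, so predict() looks for Gender.F rather than Gender. One possible fix, sketched below in base R, is to build the new row with the same dummy columns. The helper name make_new_row and the column names Gender.F and Gender.M are assumptions (the latter inferred from the error message and the two levels in the data), as is an input called input$gender, which could come from something like selectInput("gender", "Gender", choices = c("F", "M")).

```r
# Hypothetical helper: builds a one-row data frame whose columns mirror the
# one-hot encoded training data (assumed column names: Gender.F, Gender.M).
make_new_row <- function(age, creatine, gender, levels = c("F", "M")) {
  stopifnot(length(gender) == 1, gender %in% levels)
  row <- data.frame(Age = age, Creatine = creatine)
  for (lev in levels) {
    # 1 for the selected level, 0 for every other level
    row[[paste0("Gender.", lev)]] <- as.numeric(gender == lev)
  }
  row
}

# Stand-alone check with fixed values instead of Shiny inputs
new_row <- make_new_row(age = 50, creatine = 100, gender = "F")
print(new_row)

# Inside observeEvent() the call might look like this (untested sketch):
# values$new_row <- make_new_row(input$age, input$creatine, input$gender)
```

This only helps if model_mars is retrained on the encoded hepC_lite, so that the training columns and the prediction columns share the same names.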

Related

SAS: Unable to add variable to data set

I have a data set and am trying to add four new variables using the existing ones. I keep getting an error that says the code is incomplete. I'm having trouble seeing where it is incomplete. How do I fix this?
data dataset;
input ID $
Height
Weight
SBP
DBP
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
run;

You did not end your input statement with a semicolon. input reads variables from external data (in this case, in-line data with the datalines statement). New variables are not created within input in the way you've specified.
Use input to read in the five variables of your data. After that, create new variables based on those five read-in variables:
data dataset;
input ID $
Height
Weight
SBP
DBP
;
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
;
run;

Correcting 2 errors should fix this:
Add a semicolon after the last field being read in from the datalines, which is DBP.
(A previous version of this question used the ^ symbol for exponents.) Instead of ^ to raise to the power of something, use **
For reference, see the SAS documentation on arithmetic operators.
After making the 2 corrections above I ran the revised code below without any errors.
data dataset;
input ID $
Height
Weight
SBP
DBP;
WtKg = Weight/2.2;
HtCm = Height/2.4;
AveBP = DBP + (SBP - DBP)/3;
HtPolynomial = (2*Height)**2 + (1.5*Height)**3;
datalines;
001 68 150 110 70
002 73 240 150 90
003 62 101 120 80
run;

Doing Principal Components in SAS Using a Holdout and to Score New Data

I am performing Principal Components Analysis in SAS Enterprise Guide and wish to compute factor/component scores on some holdout.
KeepCombinedLR is my primary source of truth. I have another dataset, with the exact same variables, that I would like to be scored without including it in the actual factor analyses.
proc factor data = KeepCombinedLR
simple
method = prin
priors = one
rotate = varimax reorder
mineigen = 1
nfactors = 25
out = FactorScores;
var var1--var40;
run;

data Fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 40 75.07 45.313 10.07 62 185
44 85.84 54.297 8.65 45 156 42 68.15 59.571 8.17 40 166
38 89.02 49.874 9.22 55 178 47 77.45 44.811 11.63 58 176
40 75.98 45.681 11.95 70 176 43 81.19 49.091 10.85 64 162
44 81.42 39.442 13.08 63 174 38 81.87 60.055 8.63 48 170
44 73.03 50.541 10.13 45 168 45 87.66 37.388 14.03 56 186
;
proc factor data=Fitness outstat=FactOut
method=prin rotate=varimax score;
var Age Weight RunTime RunPulse RestPulse;
title 'Factor Scoring Example';
run;
proc print data=FactOut;
title2 'Data Set from PROC FACTOR';
run;
proc score data=Fitness score=FactOut out=FScore;
var Age Weight RunTime RunPulse RestPulse;
run;
proc print data=FScore;
title2 'Data Set from PROC SCORE';
run;
PROC SCORE will score your data for you. To score your holdout data set, pass it as DATA= and pass the OUTSTAT= data set from PROC FACTOR as SCORE=.
https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_score_examples01.htm&docsetVersion=14.3&locale=en

TOPN in PowerBI DAX not arranging values in proper order

I have been running into some issues with the TOPN function in DAX in PowerBI.
Below is the original dataset:
regions sales
--------------
a 1191
b 807
c 1774
d 376
e 899
f 1812
g 1648
h 6
i 1006
j 1780
k 243
l 777
m 747
n 61
o 1637
p 170
q 1319
r 1437
s 493
t 1181
u 118
v 1787
w 1396
x 102
y 104
z 656
So now, I want to get the Top 5 sales in descending order.
I used the following code:
Table = TOPN(5, SUMMARIZE(Sheet1, Sheet1[regions], Sheet1[sales]), Sheet1[sales], DESC)
The resulting table is as follows:
regions sales
--------------
g 1648
j 1780
c 1774
v 1787
f 1812
Any idea why this is happening?

According to Microsoft documentation this is working as intended.
https://msdn.microsoft.com/en-us/query-bi/dax/topn-function-dax
Remarks
TOPN does not guarantee any sort order for the results.
What you can do is to create a RANKX to sort by.

Create date variable from time (Using SAS 9.3)

Using SAS 9.3
I have files with two variables (time and pulse), one file per person.
For each person, I know the date on which measurement started.
How can I create a date variable that advances by one day each time the clock passes midnight?
Example from text files:
23:58:02 106
23:58:07 105
23:58:12 103
23:58:17 98
23:58:22 100
23:58:27 97
23:58:32 99
23:58:37 100
23:58:42 99
23:58:47 104
23:58:52 95
23:58:57 96
23:59:02 98
23:59:07 96
23:59:12 104
23:59:17 109
23:59:22 105
23:59:27 111
23:59:32 111
23:59:37 104
23:59:42 110
23:59:47 100
23:59:52 106
23:59:57 114
00:00:02 123
00:00:07 130
00:00:12 130
00:00:17 125
00:00:22 119
00:00:27 116
00:00:32 122
00:00:37 116
00:00:42 119
00:00:47 117
00:00:52 114
00:00:57 114
00:01:02 110
00:01:07 103
00:01:12 98
00:01:17 98
00:01:22 102
00:01:27 97
00:01:32 99
00:01:37 93
00:01:42 97
00:01:47 103
00:01:52 96
00:01:57 93
00:02:02 93
00:02:07 95
00:02:12 106
00:02:17 99
00:02:22 102
00:02:27 96
00:02:32 93
00:02:37 97
00:02:42 102
00:02:47 101
00:02:52 95
00:02:57 92
00:03:02 100
00:03:07 95
00:03:12 102
00:03:17 102
00:03:22 109
00:03:27 109
00:03:32 107
00:03:37 111
00:03:42 112
00:03:47 113
00:03:52 115

Regex:
\d{2}:\d{2}:\d{2} \d*
See here for an example and play around with regex:
https://regex101.com/r/xF1fQ5/1
EDIT: and have a look at the SAS regex tip sheet: http://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

Something like this:
Date lastDate = startDate;
List<NData> ListData = new ArrayList<NData>();
for (FileData fdat : ListFileData) {
    // Derive the full date for this reading from the previous one
    Date nDate = this.getDate(lastDate, fdat.gettime());
    NData nData = new NData(nDate, fdat.getMeasuring());
    ListData.add(nData);
    lastDate = nDate;
}
.
.
.
.
Date getDate(Date ld, String time) {
    Calendar cal = Calendar.getInstance();
    cal.setTime(ld);
    int year = cal.get(Calendar.YEAR);
    int month = cal.get(Calendar.MONTH) + 1; // Calendar months are 0-based
    int day = cal.get(Calendar.DAY_OF_MONTH);
    int hourOfDay = this.getHour(time);
    int minuteOfHour = this.getMinute(time);
    org.joda.time.LocalDateTime lastDate = new org.joda.time.LocalDateTime(ld);
    org.joda.time.LocalDateTime newDate = new org.joda.time.LocalDateTime(year, month, day, hourOfDay, minuteOfHour);
    // A time earlier than the previous reading means midnight was crossed
    if (newDate.isBefore(lastDate)) {
        newDate = newDate.plusDays(1);
    }
    return newDate.toDate();
}

It's hard to provide a complete answer without sample code, but the SAS lag() function might be enough to do what you need. Your data step would include lines like the following, assuming your time variable is called time and your date variable is called date:
retain date;
if time < lag(time) then date = date + 1;
This assumes you never have any 24 hour gaps (but it appears you'd have to assume that anyway).
This answer also assumes that the time field is already in a SAS time format.

Creating A Dataframe From A Text Dataset

I have a dataset that has hundreds of thousands of fields. The following is a simplified dataset
dataSet <- c("Plnt SLoc Material Description L.T MRP Stat Auto MatSG PC PN Freq Qty CFreq CQty Cur.RPt New.RPt CurRepl NewRepl Updt Cost ServStock Unit OpenMatResb DFStorLocLevel",
"0231 0002 GB.C152260-00001 ASSY PISTON & SEAL/O-RING 44 PD X A A A 18 136 30 29 50 43 24.88 51.000 EA",
"0231 0002 WH.112734 MOTOR REDUCER, THREE-PHAS 41 PD X B B A 16 17 3 3 5 4 483.87 1.000 EA X",
"0231 0002 WH.920569 SPINDLE MOTOR MINI O 22 PD X A A A 69 85 15 9 25 13 680.91 21.000 EA",
"0231 0002 GB.C150583-00001 VALVE-AIR MDI 64 PD X A A A 16 113 50 35 80 52 19.96 116.000 EA",
"0231 0002 FG.124-0140 BEARING 32 PD X A A A 36 205 35 32 50 48 21.16 55.000 EA",
"0231 0002 WP.254997 BEARING,BALL .9843 X 2.04 52 PD X A A A 18 155 50 39 100 58 2.69 181.000 EA"
)
I would like to create a dataframe out of this dataSet for further calculation. The approach I am following is as follows:
I split the dataSet by space and then recombine it.
dataSetSplit <- strsplit(dataSet, "\\s+")
The header (the first line) splits correctly into 25 character strings, as the str() function shows.
str(dataSetSplit)
I then intend to combine all the rows using the following script:
combinedData <- data.frame(do.call(rbind, dataSetSplit))
Please note that the combinedData step above errors because the split does not produce the same number of fields for every row.
For this approach to work, every row must split into exactly 25 fields.
If you think this is a sound approach, please let me know how to split the rows into 25 fields.
It is worth mentioning that I do not like the approach of splitting the data set with the function strsplit(). It is an extremely time consuming step if used with a large data set. Can you please recommend an alternate approach to create a data frame out of the supplied data?

By the looks of it, you have a header row that is actually helpful. You can easily use gregexpr to calculate your "widths" to use with read.fwf.
Here's how:
## Use gregexpr to find the position of consecutive runs of spaces
## This will tell you the starting position of each column
Widths <- gregexpr("\\s+", dataSet[1])[[1]]
## `read.fwf` doesn't need the starting position, but the width of
## each column. We can use `diff` to calculate this.
Widths <- c(Widths[1], diff(Widths))
## Since there are no spaces after the last column, we need to calculate
## a reasonable width for that column too. We can do this with `nchar`
## to find the widest row in the data. From this, subtract the `sum`
## of all the previous values.
Widths <- c(Widths, max(nchar(dataSet)) - sum(Widths))
Let's also extract the column names. We could do this in read.fwf, but it would require us to substitute the spaces in the first line with a "sep" character.
Names <- scan(what = "", text = dataSet[1])
Now, read in everything except the first line. You would use the actual file instead of textConnection, I would suppose.
read.fwf(textConnection(dataSet), widths=Widths, strip.white = TRUE,
skip = 1, col.names = Names)
# Plnt SLoc Material Description L.T MRP Stat Auto MatSG PC PN Freq Qty
# 1 231 2 GB.C152260-00001 ASSY PISTON & SEAL/O-RING 44 PD NA X A A A 18 136
# 2 231 2 WH.112734 MOTOR REDUCER, THREE-PHAS 41 PD NA X B B A 16 17
# 3 231 2 WH.920569 SPINDLE MOTOR MINI O 22 PD NA X A A A 69 85
# 4 231 2 GB.C150583-00001 VALVE-AIR MDI 64 PD NA X A A A 16 113
# 5 231 2 FG.124-0140 BEARING 32 PD NA X A A A 36 205
# 6 231 2 WP.254997 BEARING,BALL .9843 X 2.04 52 PD NA X A A A 18 155
# CFreq CQty Cur.RPt New.RPt CurRepl NewRepl Updt Cost ServStock Unit OpenMatResb
# 1 NA NA 30 29 50 43 NA 24.88 51 EA <NA>
# 2 NA NA 3 3 5 4 NA 483.87 1 EA X
# 3 NA NA 15 9 25 13 NA 680.91 21 EA <NA>
# 4 NA NA 50 35 80 52 NA 19.96 116 EA <NA>
# 5 NA NA 35 32 50 48 NA 21.16 55 EA <NA>
# 6 NA NA 50 39 100 58 NA 2.69 181 EA <NA>
# DFStorLocLevel
# 1 NA
# 2 NA
# 3 NA
# 4 NA
# 5 NA
# 6 NA
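The width calculation can be checked on a smaller table. The sketch below is a toy example, not the original data: the three columns and their values are made up, and the rows are built with sprintf so that the alignment is explicit. It uses the same gregexpr, diff and nchar steps as above and adds colClasses (an optional extra) to keep the leading zeros in Plnt.

```r
# Toy fixed-width table; sprintf pads each field to a known width (5, 12, 3)
fmt <- "%-5s%-12s%3s"
toy <- c(
  sprintf(fmt, "Plnt", "Description", "Qty"),
  sprintf(fmt, "0231", "VALVE-AIR", "17"),
  sprintf(fmt, "0231", "MOTOR MINI", "55")
)

# Start of each run of spaces in the header marks the end of a column
starts <- as.integer(gregexpr("\\s+", toy[1])[[1]])
widths <- c(starts[1], diff(starts))

# The last column gets whatever is left of the widest line
widths <- c(widths, max(nchar(toy)) - sum(widths))

parsed <- read.fwf(
  textConnection(toy[-1]),
  widths = widths,
  strip.white = TRUE,
  col.names = scan(what = "", text = toy[1], quiet = TRUE),
  colClasses = c("character", "character", "integer")
)
print(parsed)
```

Note that "MOTOR MINI" stays in one field even though it contains a space, which is what splitting on whitespace cannot do. The approach does rely on every value fitting inside the span implied by its header.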

Many thanks to Ananda Mahto, who provided many pieces of this answer.
widthMinusFirst <- diff(gregexpr('(\\s[A-Z])+', dataSet[1])[[1]])
widthFirst <- gregexpr('\\s+', dataSet[1])[[1]][1]
Width <- c(widthFirst, widthMinusFirst)
Widths <- c(Width, max(nchar(dataSet)) - sum(Width))
columnNames <- scan(what = "", text = dataSet[1])
read.fwf(textConnection(dataSet[-1]), widths = Widths, strip.white = FALSE,
skip = 0, col.names = columnNames)
