R code works in script but not R markdown - r-markdown

#The code below works fine in my scripts but not in R markdown.
library(tidyverse)
library(scales)
age <- kaggle_2020_Survey %>%
transmute(Q1 = as.factor(Q1)) %>%
filter(!is.na(Q1)) %>%
count(Q1) %>%
mutate(perc = n/sum(n)*100)
ggplot(age, aes(x = Q1, y = n)) + geom_col(fill = "darkblue", alpha =.7) +
geom_text (aes(x = Q1, y = n, label = paste0(round(perc,1), "%"),hjust = -.3), size = 3)
+
coord_flip() + labs(title = "Age of participants", x = "Percent", y = "Number",
subtitle = "Highest age group: 22-24") +
theme_classic()
#This is the error I am getting:
enter image description here

It could be that the object is in your environment but you forgot to include something like
library(tidyverse)
kaggle_2020_Survey <- read_csv("kaggle_2020_Survey.csv")
where you load in the data. When you knit a rmarkdown file it starts a new session each time so if the data isn't called in before you start doing stuff then it will throw that error
This answer goes into other solutions.

Related

Making Power Bi - R (HTMLwidgets pbiviz based) custom visuals interactive with other Power BI visuals

I have made a pbiviz custom visual using developer tools of Normal distribution curve over a Histogram plot with R - ggplot2 and plotly libraries in a pbiviz.package
The visual works fine. Now I want to add interactivity of the Histogram with other Power BI visuals.
i.e. If user clicks on a bar of the Histogram, it should filter out a Table on my PBI report with rows relevant to the histogram bar data.
Considering the limitations of using R script with Power BI, I do not know if it is possible with my current visual as I am new to scripting.
Is there a better way (Typescript, JS, Python, etc.) other than what I have done to make an Interactive Histogram & Distribution Curve in Power BI?
This is the R script along with sample data and Visual Image
Histogram represents the projects falling in different durations
There are two bell curves - One for closed projects and Other for Active Projects
source('./r_files/flatten_HTML.r')
############### Library Declarations ###############
libraryRequireInstall("ggplot2");
libraryRequireInstall("plotly");
libraryRequireInstall("tidyverse");
libraryRequireInstall("scales");
libraryRequireInstall("htmlwidgets");
library(ggplot2)
library(tidyverse)
library(scales)
library(plotly)
theme_set(theme_bw())
##### Making DataSet for All Selected Projects #####
Duration <- dataset$Duration
Status <- (rep(dataset$ProjectStatus))
da <- data.frame(Duration,Status)
lenx <- length(Duration)
meanall <- mean(da$Duration)
sdx <- sd(da$Duration)
binwidth <- 30
font_label <- list(family = "Segoe UI", size = 21, colour = "black")
hovlabel <- list(bordercolor = "black", font = font_label)
#Filtering Out Closed Projects from Dataset
#Creating Data Frame for Closed Projects
closedproj <- dataset %>%
select(Duration,ProjectStatus) %>%
filter(ProjectStatus == "Closed")
closed <- closedproj$Duration
df <- data.frame(closed)
xclosed <- closedproj$
df2 <- data.frame(xclosed)
lenc <- length(xclosed)
mean_closed <- mean(df2$xclosed)
sdc <- sd(df2$xclosed)
a <-
(ggplot(da,aes(x=Duration, fill = Status, text = paste("Duration: ",x,"-", x + binwidth,"<br />Project Count", ..count..)))+
#Histogram
geom_histogram(aes(y=..count..),alpha=0.5, position='identity',binwidth = binwidth)+
# #Distribution Curve
annotate(
geom = "line",
x = da$Duration,
y = dnorm(da$Duration, mean = meanall, sd = sdx) * lenx * binwidth,
width = 3,
color = "red"
) +
annotate(
geom = "line",
x = df2$xclosed,
y = dnorm(df2$xclosed, mean = mean_closed, sd = sdc)* lenc * binwidth,
width = 3,
color = "blue"
) +
labs(
x = "Project Duration (Days)",
y = "Project_Count",
fill = "Project Status")+
#Mean
geom_vline(aes(xintercept=meanall),color="red",linetype="dashed",size = 0.8,label=paste("Mean :",round(meanall,0)))+
geom_vline(aes(xintercept=mean_closed),color="blue",linetype="dashed",size = 0.8,label=paste("Mean (Closed):",round(mean_closed,0)))+
# 1 Sigma
geom_vline(aes(xintercept = (meanall + sdx)), color = "red", size = 1, linetype = "dashed") +
geom_vline(aes(xintercept = (meanall - sdx)), color = "red", size = 1, linetype = "dashed")+
geom_vline(aes(xintercept = (mean_closed + sdc)), color = "blue", size = 1, linetype = "dashed") +
geom_vline(aes(xintercept = (mean_closed - sdc)), color = "blue", size = 1, linetype = "dashed")+
# Theme
theme(
plot.background = element_rect(fill = "transparent"),
legend.background = element_rect(fill = "lightgray"),
axis.title.x = element_text(colour = "Black",size = 18,face = "bold"),
axis.title.y = element_text(colour = "Black",size = 18,face = "bold"),
axis.text.x = element_text(colour = "Black",size = 15),
axis.text.y = element_text(colour = "Black",size = 15),
panel.grid.major = element_blank(), panel.grid.minor = element_blank())+
scale_x_continuous(labels = comma,
breaks = seq(0, max(Duration),50)) +
scale_y_continuous(labels = comma,
breaks = seq(0,max(Duration),10)))
############# Create and save widget ###############
p = ggplotly(a, tooltip = c("text")) %>%
style(hoverlabel = hovlabel) %>%
layout(legend = list(
orientation = "h",
x = 0,
y = 1.13,
title = list(text = "Project Status",font = list(family = "Segoe UI", size = 23)),
font = font_label
),
yaxis = list(title = list(standoff = 25L)),
xaxis = list(title = list(standoff = 25L)),
annotations = list(showarrow=FALSE,align = "left",valign = "top",x = 0.95, xref = "paper",yref = "paper",y = 0.955,
font = list(family = "Segoe UI", size = 22, color = "#cc0000"),
text = paste("Max Duration: ", comma(round(max(da$Duration),0)),
"<br>Mean (Closed): ", comma(round(mean_closed,0)),
"<br>Mean (All) : ", comma(round(meanall,0))))
) %>%
config(modeBarButtonsToRemove = c("select2d","hoverClosestCartesian", "lasso2d","hoverCompareCartesian","toggleSpikelines"), displaylogo = FALSE);
internalSaveWidget(p, 'out.html');
}
####################################################
################ Reduce paddings ###################
ReadFullFileReplaceString('out.html', 'out.html', ',"padding":[0-5]*,', ',"padding":0,')
What I expect is -- If user clicks on a bar of the Histogram, it should reflect on a Table visual on my PBI report with rows relevant to the histogram bar data.
Any help will be highly appreciated !
Regards

How do I get the word-embedding matrix from ft_word2vec (sparklyr-package)?

I have another question in the word2vec universe.
I am using the 'sparklyr'-package. Within this package I call the ft_word2vec() function. I have some trouble understanding the output:
For each number of sentences/paragraphs I am providing to the ft_word2vec() function, I always get the same amount of vectors. Even, if I have more sentences/paragraphs than words. For me, that looks like I get the paragraph-vectors. Maybe a Code-example helps to understand my problem?
# add your spark_connection here as 'spark_connection = '
# create example data frame
FK_data = data.frame(sentences = c("This is my first sentence",
"It is followed by the second sentence",
"At the end there is the last sentence"))
# move the data to spark
sc_FK_data <- copy_to(spark_connection, FK_data, name = "FK_data", overwrite = TRUE)
# prepare data for ft_word2vec (sentences have to be tokenized [=list of words instead of one string in each row])
sc_FK_data <- ft_tokenizer(sc_FK_data, input_col = "icd_long", output_col = "tokens")
# split data into test and trainings sets
partitions <- sc_FK_data %>%
sdf_random_split(training = 0.7, test = 0.3, seed = 123456)
FK_train <- partitions$training
FK_test <- partitions$test
# given a trainings data set (FK_train) with a column "tokens" (for each row = a list of strings)
mymodel = ft_word2vec(
FK_train,
input_col = "tokens",
output_col = "word2vec",
vector_size = 15,
min_count = 1,
max_sentence_length = 4444,
num_partitions = 1,
step_size = 0.1,
max_iter = 10,
seed = 123456,
uid = random_string("word2vec_"))
# I tried to get the data from spark with:
myemb = mymodel %>% sparklyr::collect()
Has somebody had similar experiences? Can someone explain what exactly the ft_word2vec() function returns? Do you have an example on how to get the word embedding vectors with this function? Or does the returned column indeed contain the paragraph vectors?
my colleague found a solution! If you know how to do it, the instructions really begin to make sense!
# add your spark_connection here as 'spark_connection = '
# create example data frame
FK_data = data.frame(sentences = c("This is my first sentence",
"It is followed by the second sentence",
"At the end there is the last sentence"))
# move the data to spark
sc_FK_data <- copy_to(spark_connection, FK_data, name = "FK_data", overwrite = TRUE)
# prepare data for ft_word2vec (sentences have to be tokenized [=list of words instead of one string in each row])
sc_FK_data <- ft_tokenizer(sc_FK_data, input_col = "icd_long", output_col = "tokens")
# split data into test and trainings sets
partitions <- sc_FK_data %>%
sdf_random_split(training = 0.7, test = 0.3, seed = 123456)
FK_train <- partitions$training
FK_test <- partitions$test
# CHANGES FOLLOW HERE:
# We have to use the spark connection instead of the data. For me this was the confusing part, since i thought no data -> no model.
# maybe we can think of this step as an initialization
mymodel = ft_word2vec(
spark_connection,
input_col = "tokens",
output_col = "word2vec",
vector_size = 15,
min_count = 1,
max_sentence_length = 4444,
num_partitions = 1,
step_size = 0.1,
max_iter = 10,
seed = 123456,
uid = random_string("word2vec_"))
# now that we have our model initialized, we add the word-embeddings to the model
w2v_model = ml_fit(w2v_model, sc_FK_EMB)
# now we can collect the embedding vectors
emb = word2vecmodel$vectors %>% collect()

Layout of main panel for plot

I want to get the following layout.
In my actual plots, the two plots in the third column are same x-axis and thus I
exhibit them in one column.
The following example Shiny code has the three histograms with one column.
So, we cannot observe how the most lowest histogram changes according to the bins. Thus I want to get the above layout.
Example Shiny Code
library(shiny)
# Define UI for application that draws a histogram
ui <- fluidPage(
# Application title
titlePanel("Old Faithful Geyser Data"),
# Sidebar with a slider input for number of bins
sidebarLayout(
sidebarPanel(
sliderInput("bins",
"Number of bins:",
min = 1,
max = 50,
value = 30)
),
# Show a plot of the generated distribution
mainPanel(
plotOutput("distPlot1"),
plotOutput("distPlot2"),
plotOutput("distPlot3")
)
)
)
# Define server logic required to draw a histogram
server <- function(input, output) {
output$distPlot1 <- renderPlot({
# generate bins based on input$bins from ui.R
x <- faithful[, 2]
bins <- seq(min(x), max(x), length.out = input$bins + 1)
# draw the histogram with the specified number of bins
hist(x, breaks = bins, col = 'darkgray', border = 'white')
})
output$distPlot2 <- renderPlot({
# generate bins based on input$bins from ui.R
y <- faithful[, 2]
bins <- seq(min(y), max(y), length.out = input$bins + 1)
# draw the histogram with the specified number of bins
hist(y, breaks = bins, col = 'darkgray', border = 'white')
})
output$distPlot3 <- renderPlot({
# generate bins based on input$bins from ui.R
z <- faithful[, 2]
bins <- seq(min(z), max(z), length.out = input$bins + 1)
# draw the histogram with the specified number of bins
hist(z, breaks = bins, col = 'darkgray', border = 'white')
})
}
# Run the application
shinyApp(ui = ui, server = server)
Please let me know any idea.
Edit for comment
I understand your idea. I do not use ggplot as follows;
x <- c(1, 2, 3, 4, 5)
y1 <- c(1, 1, 2, 3, 1)
y2 <- c(2, 2, 1, 2, 4)
y3 <- c(4, 3, 2, 1, 2)
split.screen(figs = c(1, 2))
split.screen(figs = c(2, 1), screen = 2)
screen(1)
plot(x, y1, type = "l")
screen(3)
plot(x, y2, type = "l")
screen(4)
plot(x, y3, type = "l")
The result is as follows;
I would use ggplot2 and gridExtra to arrange the plots.
Here is the final output I got:
Screenshot
The main plots were done using grid.arrange to combine them together, and ggplot2 gives you more ability to control each of the subplots, named plot1, plot2, and plot3 in the codes, and plot2 and plot3 formed the 3rd column.
Since your 3rd column has different x-axis, I added a second bin width to control them together. And, to make the program a bit more dynamic, I use renderUI and uiOutput to push the data information from the server back to ui to generate the two sliderInputs.
Codes:
library(ggplot2)
library(grid)
library(gridExtra)
# Define UI for application that draws a histogram
ui <- fluidPage(
# Application title
titlePanel("Old Faithful Geyser Data"),
# Sidebar with a slider input for number of bins
sidebarLayout(
sidebarPanel(
uiOutput("bins1"),
uiOutput("bins2")
),
# Show a plot of the generated distribution
mainPanel(
plotOutput("ggplot")
)
)
)
# Define server logic required to draw a histogram
server <- function(input, output) {
## Your Data and give colnames for ggplot
x <- as.data.frame(faithful[, 2])
y <- as.data.frame(faithful[, 1])
z <- as.data.frame(faithful[, 1])
colnames(x) <- "Count"
colnames(y) <- "Count"
colnames(z) <- "Count"
## Set bin size 1 and 2
binWidth1 <- c(max(x))
binWidth2 <- c(max(y))
output$bins1 <- renderUI({
sliderInput("bins1",
h3("Bin width #1 "),
min = 1,
max = max(x),
value = (1 + max(x))/10)
})
output$bins2 <- renderUI({
sliderInput("bins2",
h3("Bin width #2 "),
min = 1,
max = max(y),
value = (1 +max(y))/10)
})
output$ggplot <- renderPlot({
# bins <- seq(min(x), max(x), length.out = input$bins + 1)
plot1 <- ggplot(x, aes(x = Count)) +
geom_histogram(binwidth = input$bins1, fill = "black", col = "grey")
plot2 <- ggplot(y, aes(x = Count)) +
geom_histogram(binwidth = input$bins2, fill = "black", col = "grey")
plot3 <- ggplot(z, aes(x = Count)) +
geom_histogram(binwidth = input$bins2, fill = "black", col = "grey")
grid.arrange(grid.arrange(plot1), grid.arrange(plot2, plot3, ncol = 1), ncol = 2, widths = c(2, 1))
})
}
# Run the application
shinyApp(ui = ui, server = server)

Shiny implementation

Hello my question is about user interactivity with my code, I developed this simple loop for time series using the great forecast Hybrid package, here is the rmd for it
The strange language is Portuguese sorry I was too lazy to translate the whole thing it shouldn't matter anyway.
```{r Primeira Vez, clique no play, include=FALSE, include=FALSE}
setwd("~/R")
install.packages("forecastHybrid")
setwd("~/R")
library(forecastHybrid)
```
```{r Inputs}
# A dataset variable in the global enviroment
Data<- SemZero
#How to save a witout dummies regression
ComoSalvar<- "Exemplo2.csv"
#How to save a with dummies regression
ComoSalvarReg<- "ExemploparaDummies.csv"
#Where to save them.
OndeSalvar <- "~/R"
#Month,Year, and Day variable for the ts
Mes<- 9
Ano<- 2012
Dia<- 1
#Frequency
Freq<- 12
#Forecast Period.
Forecast = 12
#Confidance intervals
IC<- c(0)
#Variables in the Data dataset that will be used for the lapply regression, usually I would read an excel file with headers.
VStart = 1
VFinish = 2
#Simple regressor dataset, they can be matrix as well as line since de data.fram combines everyone example data.frame(OddMonths,Christmas,WasTrumpPresident?,RainInThatSeason)
Regressores<- data.frame(0)
```
```{r Logic for rolling test and data simplification}
setwd(OndeSalvar)
if(IC[1] > 0) pi<-TRUE else
pi<- FALSE
if (nrow(Data) >= Freq*3 + Forecast*2) {
Weights <- "cv.error"
Multiplicador<-floor((nrow(Data)-Forecast*2)/Freq) } else
if (nrow(Data) <= Freq*3 + Forecast*2) {
Weights <- "cv.error"
Multiplicador<-3 } else
if (nrow(Data) >= Freq*2.5 + Forecast*2){
Weights <- "cv.error"
Multiplicador = 2.5} else
Weights <- "equal"
```
```{r The bunk of the regression process}
if (nrow(Regressores) < nrow(Data)) {
my_forecast1 <- try({function(x){
print(x)
print(summary(x))
names(x)
x[is.na(x)] <- 0; x
if(sum(abs(x)) < Freq){
Model<- "aenst"} else
if(mean(x[1:Freq]) == 0)Model<- "aenst" else Model<- "aenstf"
x<- ts(x, start = c(Ano, Mes,Dia), frequency = Freq)
hm<-hybridModel(x, models = Model, lambda = NULL,
a.args = list(trace = FALSE,test = "kpss", ic ="aicc", max.P = 2, max.p = 9,max.q=9,max.Q = 2,max.d = 2,max.D = 2,start.p = 9,start.P = 2,start.Q = 2,start.q = 9,allowdrift = TRUE,allowmean = TRUE
#Se tiver tempo apague o # abaixo para uma maior qualidade no modelo arima.
#,stepwise = FALSE,parallel = TRUE,num.cores = NULL
),
e.args = list(ic = "aicc"),
n.args = list(repeats = nrow(Data)),
s.args = NULL,
t.args = NULL,
weights = Weights,
errorMethod = "RMSE",cvHorizon = Forecast,windowSize = frequency(x)*Multiplicador, horizonAverage = FALSE,
verbose = TRUE)
lapply(seq_along(x), function(i) paste(names(x)[[i]], x[[i]]))
fcast1<- forecast(hm,h = Forecast,level = IC,PI = pi)
return(fcast1)
}})
Listas<- lapply(Data[,VStart:VFinish], try(my_forecast1))
if (pi == FALSE) ListaResultado<- as.data.frame(lapply(Listas, '[[', 'mean')) else
ListaResultado <- Listas
write.csv(ListaResultado , file = ComoSalvar)
} else {
my_forecastreg <- function(x){
print(x)
print(summary(x))
names(x)
x[is.na(x)] <- 0; x
x<- ts(x, start = c(Ano, Mes,Dia), frequency = Freq)
hmreg <- hybridModel(x ,models = "ans",
a.args = list(xreg = Regressores[1:nrow(Data),],trace = TRUE,test = "kpss", ic ="aicc", max.P = 2, max.p = 9,max.q=9,max.Q = 2,max.d = 2,max.D = 2,start.p = 9,start.P = 2,start.Q = 2,start.q = 9,allowdrift = TRUE,allowmean = TRUE
#Se tiver tempo apague o # abaixo para uma maior qualidade no modelo arima.
#,stepwise = FALSE,parallel = TRUE,num.cores = NULL
),
n.args = list(xreg = Regressores[1:nrow(Data),], repeats= nrow(Regressores)),
s.args = list(xreg = Regressores[1:nrow(Data),], method = "arima"))
fcast2<- forecast(hmreg,h = Forecast,level = IC,PI = pi, xreg = Regressores[nrow(Data):(nrow(Data)+Forecast-1),])
return(fcast2)
}
Listas2<- lapply(Data[,VStart:VFinish], try(my_forecastreg))
if (pi == FALSE) ListaResultado2<- as.data.frame(lapply(Listas2, '[[', 'mean')) else
ListaResultado2 <- Listas2
write.csv(ListaResultado2 , file = ComoSalvarReg)
}
```
I want to develop something to get the user input and run the regressions, my end user doesn't usually like how ugly a R Markdown file looks, I was looking into using shiny, but i dont know a few details
Who runs this regressions if I upload the whole thing successfully to shiny? My computer,the server, the user, I have no idea?
Can the user input go into the users own global environment so that the whole thing could be kept as a strictly offline process(using Shiny as a beautification app that substitutes this input chunk?)
Can someone please give an example of an Shiny app that does something similar?
Can the user read.xlsm into shiny server, or use his global environment to define a data for the shiny app to use as input?
Also is the thief package possible to implement on this lapply function as a way to increase the forecast quality, I would of course Drop the stlm and theta option from the model as they behave rather poorly in a wide range of simulations that I performed with toy sets, the stlm crashes on cross validation with a few observations and the theta model just doesn't work.
Can someone teach me how to on error inside the function ignore the variable and just keep applying the function to the next variable? or change the model to something less problematic my solution was to try to catch these cases where the model would crash and drop the theta model before it happens but it is just an ugly hack to the underlying problem.
Also if you see something ugly in the code itself feel free to criticize.

How to create a horizontal stacked bar box in shiny / shinydashboard

In the Shiny dashboard tutorial of the Wikimedia foundation a screenshot is shown with a kind of horizontal stacked bar (the one with red, green, and blue "Full-text...OpenSearch..Prefix):
I have been searching everywhere, but I cannot find out how to create a bar like this. Can anyone point me in the right direction?
This is not a great answer, but it works. Requires learning some ggplot2 if you want to tweak it, and I tried to get rid of the border around the edges but it isn't gone completely. Still, the basic idea is here.
library(ggplot2)
mydf <- data.frame(labels = c('This', 'that', 'the other'),
percents = c(0.31, 0.15, 0.54))
mydf$pos <- pmax(0, cumsum(mydf$percents) - (0.5 * mydf$percents))
p <- ggplot(mydf, aes(x = NA, y = percents)) +
geom_bar(stat = 'identity', aes(fill = percents)) +
geom_text(color = 'white', aes(label = labels, y = pos)) +
coord_flip() +
guides(fill = FALSE) +
scale_x_discrete(expand = c(0, 0)) +
scale_y_discrete(expand = c(0, 0)) +
theme_void()
png('this_plot.png', width = 800, height = 30)
p
dev.off()