Trouble downloading all works by an author / gutenbergr / R - r-markdown

No matter what I do, I cannot seem to get all the works by this author downloaded using the exact code specified for doing so in the documentation. I've got tidyverse loaded, with dplyr etc. There are 12 books in total. Here's the code chunk I used and the error message it keeps returning.
Thank you!
thoreau_books <- gutenberg_works(author == "Thoreau, Henry David") %>%
  gutenberg_download(meta_fields = "title")
Error message:
Error in mutate():
! Problem while computing gutenberg_id = as.integer(gutenberg_id).
✖ gutenberg_id must be size 0 or 1, not 12.
Backtrace:
gutenberg_works(author == "Thoreau, Henry David") %>% ...
dplyr:::mutate.data.frame(., gutenberg_id = as.integer(gutenberg_id))
Error in mutate(., gutenberg_id = as.integer(gutenberg_id)) :
✖ gutenberg_id must be size 0 or 1, not 12.
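This looks like a version incompatibility between gutenbergr and a newer dplyr rather than a problem with your code; if a newer gutenbergr release is available, updating it may resolve the error. As a workaround, a minimal sketch (assuming gutenberg_works() itself returns the 12 matches) is to hand gutenberg_download() a plain vector of IDs instead of piping in the whole metadata data frame:
library(gutenbergr)
library(dplyr)

# Workaround sketch: pass the IDs as a vector rather than a data frame
thoreau_meta  <- gutenberg_works(author == "Thoreau, Henry David")
thoreau_books <- gutenberg_download(thoreau_meta$gutenberg_id,
                                    meta_fields = "title")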

R Markdown "function %>% not found"

I have a problem with R Markdown. In the Rmd file, I start by loading the necessary packages (including tidyverse) in a code chunk. But in the last code chunk, an error pops up which says:
"could not find function %>% Calls: ...withVisible ->
eval_with_user_handlers -> eval -> eval Execution halted"
Can you please explain why this occurs and how I can fix it?
Thanks!
As requested:
This is the first code chunk where I load the packages:
library(tidyverse)
library(readxl)
library(psych)
options(scipen = 999)
And, this is the code chunk where the error occurs:
DE_GDP_1965_2020 <- DE_GDP_1965_1969 %>%
  full_join(DE_GDP_1970_2020)
model <- lm(GDP ~ Period, data = DE_GDP_1965_2020)
summary(model)
Thanks!
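Knitting runs in a completely fresh R session, so %>% is only available if a chunk inside the Rmd itself loads a package that provides it (magrittr, or tidyverse/dplyr which re-export it) before the failing chunk runs. If the library chunk has eval=FALSE, was only ever run interactively in the console, or sits below the chunk that uses the pipe, you get exactly this error. A minimal sketch of a setup chunk placed at the top of the document (chunk name and options are illustrative, not from the original post):
```{r setup, include=FALSE}
# This chunk runs first when knitting; later chunks can then use %>%
library(tidyverse)
library(readxl)
library(psych)
options(scipen = 999)
```
Also check that this chunk carries no eval=FALSE option and that the document doesn't stop with an error before reaching it.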

Why does my code chunk render my plot correctly when run by itself but I get an 'object not found' error when I try to knit in R Markdown?

I am relatively new at R Markdown and am having trouble when trying to knit to create a report. The error I am getting is:
Error in ggplot(data = bio1530_sci1420_summary_stats.xlsx) :
object 'bio1530_sci1420_summary_stats.xlsx' not found
Calls: ... withVisible -> eval_with_user_handlers -> eval -> eval -> ggplot
Execution halted
Here is my code thus far:
title: "NGRMarkdown"
author: "Rob McCandless"
date: "r Sys.Date()"
output: word_document
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(ggrepel)
library(tidyverse)
library(here)
read_csv("bio1530_sci1420_summary_stats.xlsx")
#ScatterPlot of mean course grade v. mean normalized gain on 1420 and 1530 data with regression lines and error bars
ggplot(data=bio1530_sci1420_summary_stats.xlsx)+
geom_errorbar(aes(x=Course_grade, y=Norm_gain, ymin=Norm_gain-CI, ymax=Norm_gain+CI), color="black", width=0.2, position=position_dodge2(10.0))+
geom_point(mapping=aes(x=Course_grade, y=Norm_gain, shape=Course, color=Course),size=3)+
geom_smooth(method=lm, se=FALSE, col='black', size=1, mapping=aes(x=Course_grade, y=Norm_gain, linetype=Course))+
geom_label_repel(aes(Course_grade, y=Norm_gain, label = Alpha), box.padding = 0.3, point.padding = 0.7, segment.color = 'grey50')+ #added point labels A-J
ylab('Mean Normalized Gain (all instructor sections)')+
xlab('Mean Course Grade (all instructor sections)')+
scale_fill_discrete(labels=c("Bio 1530", "Sci 1420"))+
labs(title="Normalized Gain v. Course Grade by Course & Instructor", subtitle="Mean and 95% CI of all sections per instructor (A-J)")+
theme(plot.title=element_text(hjust=0.5))+
theme(plot.subtitle=element_text(hjust=0.5))+
annotate("text", x=73.0, y=0.09, label="R2 = 0.68, p = 0.044")+
annotate("text", x=78.5, y=0.22, label="R2 = 0.46, p = 0.095")
And this is the plot that renders when I tell R to run this chunk only:
[Course grade v. normalized gain](https://i.stack.imgur.com/WO9S7.png)
So the code works and the dataframe the code refers to is valid, but it won't render when I try to knit in R Markdown.
I suspect it may have to do with the current and working directories not being the same, but I'm not certain of this and am not sure how to check. I have confirmed that my working directory is:
getwd()
[1] "/Users/robmccandless/Library/Mobile Documents/com~apple~CloudDocs/R Projects/Normalized_Gain_Data"
and this is where the dataframe and RMD file are both located. Can anyone give me some idea of what I am doing wrong? Any assistance will be greatly appreciated.
The error is saying that your dataset object (i.e., the data from the .xlsx file) is not found in your local environment. From the snippet above it doesn't look like the dataset is ever saved to an object, just read. One option is to try this in your markdown:
df <- read_csv("bio1530_sci1420_summary_stats.xlsx")
ggplot(data=df)+
geom_errorbar(aes(x=Course_grade, y=Norm_gain, ymin=Norm_gain-CI, ymax=Norm_gain+CI), color="black", width=0.2, position=position_dodge2(10.0))+
geom_point(mapping=aes(x=Course_grade, y=Norm_gain, shape=Course, color=Course),size=3)+
geom_smooth(method=lm, se=FALSE, col='black', size=1, mapping=aes(x=Course_grade, y=Norm_gain, linetype=Course))+
geom_label_repel(aes(Course_grade, y=Norm_gain, label = Alpha), box.padding = 0.3, point.padding = 0.7, segment.color = 'grey50')+ #added point labels A-J
ylab('Mean Normalized Gain (all instructor sections)')+
xlab('Mean Course Grade (all instructor sections)')+
scale_fill_discrete(labels=c("Bio 1530", "Sci 1420"))+
labs(title="Normalized Gain v. Course Grade by Course & Instructor", subtitle="Mean and 95% CI of all sections per instructor (A-J)")+
theme(plot.title=element_text(hjust=0.5))+
theme(plot.subtitle=element_text(hjust=0.5))+
annotate("text", x=73.0, y=0.09, label="R2 = 0.68, p = 0.044")+
annotate("text", x=78.5, y=0.22, label="R2 = 0.46, p = 0.095")

tableby in papaja with twocolumn classoption

I use the wonderful function tableby to get some summary statistics of my population, and I also use the amazing package papaja to write my article.
Here is my code for the use of tableby:
DemoGroup <- tableby(Group ~ Age + education + logMAR + QIT + IRP + ICV + AQ + otherDiag, data = demoShort)
summary(DemoGroup, digits = 1, digits.p = 3, booktabs = TRUE, title = "(\\#tab:demo) Demo")
I need my document in a two-column format (so I use the twocolumn classoption). However, when I try to knit my document, I get this error related to my table (everything works fine in a single-column document):
I was unable to find any missing LaTeX packages from the error log 2022_MMN_FS_TSA_WO.log.
! Package longtable Error: longtable not in 1-column mode.
Error: LaTeX failed to compile 2022_MMN_FS_TSA_WO.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips.
Is there a way to fix this error?
Thank you for your help,
Cheers,
Adeline
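The message itself is the key: LaTeX's longtable environment, which the generated table uses, simply cannot appear in two-column mode. One possible workaround (a sketch, assuming you keep the twocolumn classoption) is to drop to one-column mode around the table with raw LaTeX in the Rmd and return to two columns afterwards:
\onecolumn  % longtable only works in one-column mode

% ... the chunk printing summary(DemoGroup, ...) goes here ...

\twocolumn  % resume the two-column layout
Note that \onecolumn and \twocolumn each force a page break, so the table ends up on its own page; the alternative is to render the table with a non-longtable environment instead.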

NLTK regex parser's output has changed. Unable to parse phrases like verb followed by a noun

I have written a piece of code to parse the action items from a troubleshooting doc.
I want to extract phrases that start with a verb and end with a noun.
It was working as expected earlier (a month ago), but when run against the same input as before, it now misses some action items that it was catching previously.
I haven't changed the code. Has something changed on the nltk or punkt side that may be affecting my results?
Please help me figure out what needs to be changed to make it run as before.
import re
import nltk
from nltk.tokenize import PunktSentenceTokenizer
from nltk.tokenize import word_tokenize

# One-time downloads
# nltk.download('punkt')
# nltk.download('averaged_perceptron_tagger')
# nltk.download('wordnet')

custom_sent_tokenizer = PunktSentenceTokenizer()

def process_content(x):
    try:
        # sent_tag = []
        act_item = []
        for i in x:
            print('tokenized = ', i)
            words = nltk.word_tokenize(i)
            print(words)
            tagged = nltk.pos_tag(words)
            print('tagged = ', tagged)
            # sent_tag.append(tagged)
            # print('sent= ', sent_tag)

            # chunking
            chunkGram = r"""ActionItems: {<VB.>+<JJ.|CD|VB.|,|CC|NN.|IN|DT>*<NN|NN.>+}"""
            chunkParser = nltk.RegexpParser(chunkGram)
            chunked = chunkParser.parse(tagged)
            print(chunked)
            for subtree in chunked.subtrees(filter=lambda t: t.label() == 'ActionItems'):
                print('Filtered chunks= ', subtree)
                ActionItems = ' '.join([w for w, t in subtree.leaves()])
                act_item.append(ActionItems)
            chunked.draw()
        return act_item
    except Exception as e:
        # print(str(e))
        return str(e)

res = 'replaced rev 6 aeb with a rev 7 aeb. configured new board and regained activity. tuned, flooded and calibrated camera. scanned fi rst patient with no issues. made new backups. replaced aeb board and completed setup. however, det 2 st ill not showing any counts. performed all necessary tests and the y passed . worked with tech support to try and resolve the issue. we decided to order another board due to lower rev received. camera is st ill down.'
tokenized = custom_sent_tokenizer.tokenize(res)
tag = process_content(tokenized)
With the input shared in the code, the following action items were previously being parsed:
['replaced rev 6 aeb', 'configured new board', 'regained activity', 'tuned , flooded and calibrated camera', 'scanned fi rst patient', 'made new backups', 'replaced aeb board', 'completed setup', 'det 2 st ill', 'showing any counts', 'performed all necessary tests and the y', 'worked with tech support']
But now, only these are coming up:
['regained activity', 'tuned , flooded and calibrated camera', 'completed setup', 'det 2 st ill', 'showing any counts']
I finally resolved this by replacing JJ. with JJ|JJR|JJS.
So my chunk is defined as:
chunkGram = r"""ActionItems: {<VB.>+<JJ|JJR|JJS|CD|NN.|CC|IN|VB.|,|DT>*<NN|NN.>+}"""
I don't understand this change in behavior.
Dot (.) was a really convenient way of matching all the variants of a POS tag.
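A plausible explanation (an educated guess, not something confirmed by an NLTK changelog): in RegexpParser tag patterns, . matches exactly one character, so <JJ.> matches the three-character tags JJR and JJS but never the bare two-character tag JJ. If a newer averaged_perceptron_tagger model now assigns the base tags (JJ, VB, NN) to some words where it previously chose the longer variants, every pattern ending in a mandatory . silently stops matching them. Making the trailing character optional with .? covers both cases and is equivalent to the JJ|JJR|JJS fix:
chunkGram = r"""ActionItems: {<VB.?>+<JJ.?|CD|VB.?|,|CC|NN.?|IN|DT>*<NN.?>+}"""
Here <NN.?> also replaces the redundant <NN|NN.> from the original grammar.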

I am trying to run Dickey-Fuller test in statsmodels in Python but getting error

I am trying to run the Dickey-Fuller test from statsmodels in Python but am getting an error.
Running Python 2.7 and pandas 0.19.2. The dataset is from GitHub and was imported as-is.
# imports added for completeness: the original session presumably had these
import pandas as pd
import statsmodels.tsa.stattools as ts
from statsmodels.tsa.stattools import adfuller

def test_stationarity(timeseries):
    print 'Results of Dickey-Fuller Test:'
    dftest = ts.adfuller(timeseries, autolag='AIC')
    dfoutput = pd.Series(dftest[0:4], index=['Test Statistic', 'p-value', '#Lags Used', 'Number of Observations Used'])
    for key, value in dftest[4].items():
        dfoutput['Critical Value (%s)' % key] = value
    print dfoutput

test_stationarity(tr)
Gives me following error :
Results of Dickey-Fuller Test:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-15-10ab4b87e558> in <module>()
----> 1 test_stationarity(tr)
<ipython-input-14-d779e1ed35b3> in test_stationarity(timeseries)
19 #Perform Dickey-Fuller test:
20 print 'Results of Dickey-Fuller Test:'
---> 21 dftest = ts.adfuller(timeseries, autolag='AIC' )
22 #dftest = adfuller(timeseries, autolag='AIC')
23 dfoutput = pd.Series(dftest[0:4], index=['Test Statistic','p-value','#Lags Used','Number of Observations Used'])
C:\Users\SONY\Anaconda2\lib\site-packages\statsmodels\tsa\stattools.pyc in adfuller(x, maxlag, regression, autolag, store, regresults)
209
210 xdiff = np.diff(x)
--> 211 xdall = lagmat(xdiff[:, None], maxlag, trim='both', original='in')
212 nobs = xdall.shape[0] # pylint: disable=E1103
213
C:\Users\SONY\Anaconda2\lib\site-packages\statsmodels\tsa\tsatools.pyc in lagmat(x, maxlag, trim, original)
322 if x.ndim == 1:
323 x = x[:,None]
--> 324 nobs, nvar = x.shape
325 if original in ['ex','sep']:
326 dropidx = nvar
ValueError: too many values to unpack
tr must be a 1-d array-like, as you can see here. I don't know what tr is in your case. Assuming that you defined tr as the DataFrame that contains the time series data, you should do something like this:
tr = tr.iloc[:, 0].values
Then adfuller will be able to read the data.
Just change the line to:
dftest = adfuller(timeseries.iloc[:, 0].values, autolag='AIC')
It will work. adfuller requires a 1-D array-like; in your case you have a DataFrame, so select the column or edit the line as shown above.
Since you are using the Dickey-Fuller test, I am assuming you want to keep the time series (i.e., the datetime column) as the index. To do that:
tr = tr.set_index('Month')  # assuming the time series column is named 'Month'
ts = tr['othercolumnname']  # use the other column's name here; it might be a count or anything
I hope this helps.
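Putting these pieces together, a minimal end-to-end sketch (the file name 'data.csv' and the column names 'Month' and 'value' are placeholders, not from the original post):
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical input: a CSV with a 'Month' date column and a 'value' column
tr = pd.read_csv('data.csv', parse_dates=['Month'])
tr = tr.set_index('Month')

series = tr['value'].values  # adfuller needs a 1-D array-like

dftest = adfuller(series, autolag='AIC')
dfoutput = pd.Series(dftest[0:4],
                     index=['Test Statistic', 'p-value', '#Lags Used',
                            'Number of Observations Used'])
for key, value in dftest[4].items():
    dfoutput['Critical Value (%s)' % key] = value
print(dfoutput)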