Generating APA Text Tables in RMarkdown - r-markdown

I'm trying to generate this text table for a research proposal. I'm writing in RMarkdown and using the papaja package to get APA6 styling, and I generate the PDF with this command:
rmarkdown::render("appendix.Rmd")
This table will be included in landscape mode. (I used MS Word to create this version.) I'm open to whatever packages or methods would work.

If you have the contents of the table in a data.frame, papaja's apa_table() can generate landscape LaTeX tables:
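library(papaja)
df <- cars[1:5, ]
# Wrap each column name in \emph{}; the backslash must be doubled inside an R string
colnames(df) <- paste0("\\emph{", colnames(df), "}")
apa_table(
df
, caption = "This is a caption"
, landscape = TRUE
, escape = FALSE # keep the LaTeX markup in the column names from being escaped
)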

I ended up using LaTeX. I noticed there was a package available for converting the table to landscape mode, but decided to format it in portrait instead.
\begin{table}[]
\centering
\caption{Decision factors and operationalizations.}
\label{my-label}
\begin{tabular}{p{.18\linewidth}p{.1\linewidth}p{.72\linewidth}}
\hline
& Level & Operationalization \\ \hline
Expected return & High & The startup is expected to return 10x. \\
& Low & The startup is expected to return 5x. \\
Expected time horizon & High & A liquidity event is expected in 5-10 years.
\\
& Low & A liquidity event is expected in 2-3 years. \\
Coinvestment & High & The investment will be made with other well-regarded
investors. \\
& Low & The investment will be made alone. \\
Participation & High & The investment will be an active investment with
direct interaction with the founder. \\
& Low & The investment will be a passive investment with no interaction with
the founder. \\
Social impact & High & The startup is building a product that could have a
significant positive impact on society. \\
& Low & The startup is building an interesting product. \\
Investment size & High & The minimum investment required to participate is
large. \\
& Low & The minimum investment required to participate is small. \\
Entrepreneurial personality & High & The founder is charismatic,
self-confident, hard-working, but appears set in their plans and maintains a
cordial demeanor. \\
& Low & The founder does not exhibit any particular personality traits that
stand out, positively or negatively. \\ \hline
\end{tabular}
\end{table}

Related

Dutch Number & 2 Decimal Place Validation Power APP

A UK number looks like this: 233.25.
In the Power App, a Dutch number looks like this: 233,25.
The validation below works for the Power App in the UK but not in the Netherlands, because the decimal separator there is a comma. How can I change it so that it works for both? I have tried an OR statement, but it does not work. The regex I use is "\d+(.\d{0,2})?":
If(IsMatch( TextInput.Text,"\d+(.\d{0,2})?"), // check whether the input matches the format
Submit(Form1), // submit the form if the result is true
Notify("wrong format",NotificationType.Warning)) // display a notification if the result is false
You could try to format it yourself, regardless of what the user inputs:
Left(Value(TextInput.Text),Len(Value(TextInput.Text))-2) & "," & Right(Value(TextInput.Text),2) & "."
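As a sketch of the regex side of the question (written in Python's re rather than Power Apps, so whether the same pattern drops into IsMatch unchanged is an assumption to verify): the unescaped . in the original pattern matches any character, whereas the character class [.,] accepts only a point or a comma as the decimal separator. The names LOOSE, STRICT and is_valid_amount are made up for illustration.

```python
import re

# Original pattern from the question: the bare "." matches ANY character.
LOOSE = re.compile(r"\d+(.\d{0,2})?")

# Variant that accepts either "." or "," as the decimal separator,
# followed by at most two digits (same quantifier as the original).
STRICT = re.compile(r"\d+([.,]\d{0,2})?")


def is_valid_amount(text):
    """Return True if the whole input is a number with up to 2 decimals."""
    return STRICT.fullmatch(text) is not None


for candidate in ["233.25", "233,25", "233", "233,255", "233x25"]:
    print(candidate, is_valid_amount(candidate))
```

Note that the loose pattern also accepts input such as 233x25, which is probably not intended.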

Github icon in latex

I have a resume template on Overleaf, and I want to add a GitHub logo after the LinkedIn logo. How can I do that?
https://www.overleaf.com/articles/aditya-gadepallis-resume/kzdksnkdcrsr
This is where I want the GitHub logo to go: right after the LinkedIn logo.
I am not sure how to do this. Can someone please help me?
This is the tex file : https://pastebin.com/qVJ7HViy
This is the cls file : https://pastebin.com/AX27WY7e
\ProvidesClass{twentysecondcv}[2017/01/08 CV class]
\LoadClass{article}
\NeedsTeXFormat{LaTeX2e}
%----------------------------------------------------------------------------------------
% REQUIRED PACKAGES
%----------------------------------------------------------------------------------------
\RequirePackage[sfdefault]{ClearSans}
\RequirePackage[T1]{fontenc}
\RequirePackage{tikz}
\RequirePackage{xcolor}
\RequirePackage[absolute,overlay]{textpos}
\RequirePackage{ragged2e}
\RequirePackage{etoolbox}
\RequirePackage{ifmtarg}
\RequirePackage{ifthen}
\RequirePackage{pgffor}
\RequirePackage{marvosym}
\RequirePackage{parskip}
\DeclareOption*{\PassOptionsToClass{\CurrentOption}{article}}
\ProcessOptions\relax
%----------------------------------------------------------------------------------------
% COLOURS
%----------------------------------------------------------------------------------------
\definecolor{white}{RGB}{255,255,255}
\definecolor{gray}{HTML}{4D4D4D}
\definecolor{sidecolor}{HTML}{E7E7E7}
\definecolor{mainblue}{HTML}{0E5484}
\definecolor{maingray}{HTML}{B9B9B9}
%----------------------------------------------------------------------------------------
% MISC CONFIGURATIONS
%----------------------------------------------------------------------------------------
\renewcommand{\bfseries}{\color{gray}} % Make \textbf produce coloured text instead
\pagestyle{empty} % Disable headers and footers
\setlength{\parindent}{0pt} % Disable paragraph indentation
%----------------------------------------------------------------------------------------
% SIDEBAR DEFINITIONS
%----------------------------------------------------------------------------------------
\setlength{\TPHorizModule}{1cm} % Left margin
\setlength{\TPVertModule}{1cm} % Top margin
\newlength\imagewidth
\newlength\imagescale
\pgfmathsetlength{\imagewidth}{5cm}
\pgfmathsetlength{\imagescale}{\imagewidth/600}
\newlength{\TotalSectionLength} % Define a new length to hold the remaining line width after the section title is printed
\newlength{\SectionTitleLength} % Define a new length to hold the width of the section title
\newcommand{\profilesection}[1]{%
\setlength\TotalSectionLength{\linewidth}% Set the total line width
\settowidth{\SectionTitleLength}{\huge #1 }% Calculate the width of the section title
\addtolength\TotalSectionLength{-\SectionTitleLength}% Subtract the section title width from the total width
\addtolength\TotalSectionLength{-2.22221pt}% Modifier to remove overfull box warning
\vspace{8pt}% Whitespace before the section title
{\color{black!80} \huge #1 \rule[0.15\baselineskip]{\TotalSectionLength}{1pt}}% Print the title and auto-width rule
}
% Define custom commands for CV info
\newcommand{\cvdate}[1]{\renewcommand{\cvdate}{#1}}
\newcommand{\cvnumberphone}[1]{\renewcommand{\cvnumberphone}{#1}}
\newcommand{\cvaddress}[1]{\renewcommand{\cvaddress}{#1}}
\newcommand{\cvsite}[1]{\renewcommand{\cvsite}{#1}}
\newcommand{\Education}[1]{\renewcommand{\Education}{#1}}
\newcommand{\skills}[1]{\renewcommand{\skills}{#1}}
\newcommand{\COURSEWORK}[1]{\renewcommand{\COURSEWORK}{#1}}
\newcommand{\cvname}[1]{\renewcommand{\cvname}{#1}}
\newcommand{\cvjobtitle}[1]{\renewcommand{\cvjobtitle}{#1}}
% Command for printing the contact information icons
\newcommand*\icon[1]{\tikz[baseline=(char.base)]{\node[shape=circle,draw,inner sep=1pt, fill=mainblue,mainblue,text=white] (char) {#1};}}
% Command for printing skill progress bars
% Command for printing skills text
\newcommand\skillstext[1]{
\renewcommand{\skillstext}{
\begin{flushleft}
\foreach [count=\i] \x/\y in {#1}{
\x$ \star $\y
}
\end{flushleft}
}
}
%----------------------------------------------------------------------------------------
% SIDEBAR LAYOUT
%----------------------------------------------------------------------------------------
\newcommand{\makeprofile}{
\begin{tikzpicture}[remember picture,overlay]
\node [rectangle, fill=sidecolor, anchor=north, minimum width=9cm, minimum height=\paperheight+1cm] (box) at (-5cm,0.5cm){};
\end{tikzpicture}
%------------------------------------------------
\begin{textblock}{6}(0.5, 0.2)
%------------------------------------------------
\ifthenelse{\equal{\profilepic}{}}{}{
\begin{center}
\begin{tikzpicture}[x=\imagescale,y=-\imagescale]
\clip (600/2, 567/2) circle (567/2);
\node[anchor=north west, inner sep=0pt, outer sep=0pt] at (0,0) {\includegraphics[width=\imagewidth]{\profilepic}};
\end{tikzpicture}
\end{center}
}
%------------------------------------------------
{\Huge\color{mainblue}\cvname}
%------------------------------------------------
{\Large\color{black!80}\cvjobtitle}
%------------------------------------------------
\renewcommand{\arraystretch}{1.6}
\begin{tabular}{p{0.5cm} @{\hskip 0.5cm}p{5cm}}
\ifthenelse{\equal{\cvdate}{}}{}{\textsc{\Large\icon{\Info}} & \cvdate\\}
\ifthenelse{\equal{\cvaddress}{}}{}{\textsc{\Large\icon{\Letter}} & \cvaddress\\}
\ifthenelse{\equal{\cvnumberphone}{}}{}{\textsc{\Large\icon{\Telefon}} & \cvnumberphone\\}
\ifthenelse{\equal{\cvsite}{}}{}{\textsc{\Large\icon{\Mundus}} & \cvsite\\}
\ifthenelse{\equal{\cvmail}{}}{}{\textsc{\large\icon{@}} & \href{mailto:\cvmail}{\cvmail}}
\end{tabular}
%------------------------------------------------
\ifthenelse{\equal{\Education}{}}{}{
\profilesection{Education}
\begin{flushleft}
\Education
\end{flushleft}
}
%------------------------------------------------
\ifthenelse{\equal{\skills}{}}{}{
\profilesection{Skills}
\begin{flushleft}
\skills
\end{flushleft}
}
\ifthenelse{\equal{\COURSEWORK}{}}{}{
\profilesection{COURSEWORK}
\begin{flushleft}
\COURSEWORK
\end{flushleft}
}
%------------------------------------------------
\end{textblock}
}
%----------------------------------------------------------------------------------------
% COLOURED SECTION TITLE BOX
%----------------------------------------------------------------------------------------
% Command to create the rounded boxes around the first three letters of section titles
\newcommand*\round[2]{%
\tikz[baseline=(char.base)]\node[anchor=north west, draw,rectangle, rounded corners, inner sep=1.6pt, minimum size=5.5mm, text height=3.6mm, fill=#2,#2,text=white](char){#1};%
}
\newcounter{colorCounter}
\newcommand{\sectioncolor}[1]{%
{%
\round{#1}{
\ifcase\value{colorCounter}%
maingray\or%
mainblue\or%
maingray\or%
mainblue\or%
maingray\or%
mainblue\or%
maingray\or%
mainblue\or%
maingray\or%
mainblue\else%
maingray\fi%
}%
}%
\stepcounter{colorCounter}%
}
\renewcommand{\section}[1]{
{%
\color{gray}%
\Large\sectioncolor{#1}%
}
}
\renewcommand{\subsection}[1]{
\par\vspace{.5\parskip}{%
\large\color{gray} #1%
}
\par\vspace{.25\parskip}%
}
%----------------------------------------------------------------------------------------
% LONG LIST ENVIRONMENT
%----------------------------------------------------------------------------------------
\setlength{\tabcolsep}{0pt}
% New environment for the long list
\newenvironment{twenty}{%
\begin{tabular*}{\textwidth}{@{\extracolsep{\fill}}ll}
}{%
\end{tabular*}
}
\newcommand{\twentyitem}[4]{%
#1&\parbox[t]{0.83\textwidth}{%
\textbf{#2}%
\hfill%
{\footnotesize#3}\\%
#4\vspace{\parsep}%
}\\
}
%----------------------------------------------------------------------------------------
% SMALL LIST ENVIRONMENT
%----------------------------------------------------------------------------------------
\setlength{\tabcolsep}{0pt}
% New environment for the small list
\newenvironment{twentyshort}{%
\begin{tabular*}{\textwidth}{@{\extracolsep{\fill}}ll}
}{%
\end{tabular*}
}
\newcommand{\twentyitemshort}[2]{%
#1&\parbox[t]{0.83\textwidth}{%
\textbf{#2}%
}\\
}
%----------------------------------------------------------------------------------------
% MARGINS AND LINKS
%----------------------------------------------------------------------------------------
\RequirePackage[left=7.6cm,top=0.1cm,right=1cm,bottom=0.2cm,nohead,nofoot]{geometry}
\RequirePackage{hyperref}
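Quick hack: load fontawesome5 in the .tex file and redefine \makeprofile there, adding a \faGithub row right after the LinkedIn row: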
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Twenty Seconds Resume/CV
% LaTeX Template
% Version 1.1 (8/1/17)
%
% This template has been downloaded from:
% http://www.LaTeXTemplates.com
%
% Original author:
% Carmine Spagnuolo (cspagnuolo@unisa.it) with major modifications by
% Vel (vel@LaTeXTemplates.com)
%
% License:
% The MIT License (see included LICENSE file)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------------------------------------------------
% PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------
\documentclass[letterpaper]{twentysecondcv} % a4paper for A4
%----------------------------------------------------------------------------------------
% PERSONAL INFORMATION
%----------------------------------------------------------------------------------------
% If you don't need one or more of the below, just remove the content leaving the command, e.g. \cvnumberphone{}
\cvname{Aditya} % Your name
\cvjobtitle{Gadepalli} % Job title/career
\cvdate{16 February 1997} % Date of birth
\cvaddress{adityagadepalli@gmail.com} % Short address/location, use \newline if more than 1 line is required
\cvnumberphone{+91 9553336954} % Phone number
\cvsite{linkedin.com/in/a-gadepalli} % Personal website
\makeatletter
\newcommand{\cvmail}{example@mail.com}
\usepackage{fontawesome5}
\renewcommand{\makeprofile}{
\begin{tikzpicture}[remember picture,overlay]
\node [rectangle, fill=sidecolor, anchor=north, minimum width=9cm, minimum height=\paperheight+1cm] (box) at (-5cm,0.5cm){};
\end{tikzpicture}
%------------------------------------------------
\begin{textblock}{6}(0.5, 0.2)
%------------------------------------------------
\ifthenelse{\equal{\profilepic}{}}{}{
\begin{center}
\begin{tikzpicture}[x=\imagescale,y=-\imagescale]
\clip (600/2, 567/2) circle (567/2);
\node[anchor=north west, inner sep=0pt, outer sep=0pt] at (0,0) {\includegraphics[width=\imagewidth]{\profilepic}};
\end{tikzpicture}
\end{center}
}
%------------------------------------------------
{\Huge\color{mainblue}\cvname}
%------------------------------------------------
{\Large\color{black!80}\cvjobtitle}
%------------------------------------------------
\renewcommand{\arraystretch}{1.6}
\begin{tabular}{p{0.5cm} @{\hskip 0.5cm}p{5cm}}
\ifthenelse{\equal{\cvdate}{}}{}{\textsc{\Large\icon{\Info}} & \cvdate\\}
\ifthenelse{\equal{\cvaddress}{}}{}{\textsc{\Large\icon{\Letter}} & \cvaddress\\}
\ifthenelse{\equal{\cvnumberphone}{}}{}{\textsc{\Large\icon{\Telefon}} & \cvnumberphone\\}
\ifthenelse{\equal{\cvsite}{}}{}{\textsc{\Large\icon{\scalebox{0.85}{\faLinkedin}}} & \url{linkedin.com/in/a-gadepalli}\\}
\textsc{\Large\icon{\scalebox{0.8}{\faGithub}}} & \url{https://github.com}\\
\ifthenelse{\equal{\cvmail}{}}{}{\textsc{\large\icon{@}} & \href{mailto:\cvmail}{\cvmail}}
\end{tabular}
%------------------------------------------------
\ifthenelse{\equal{\Education}{}}{}{
\profilesection{Education}
\begin{flushleft}
\Education
\end{flushleft}
}
%------------------------------------------------
\ifthenelse{\equal{\skills}{}}{}{
\profilesection{Skills}
\begin{flushleft}
\skills
\end{flushleft}
}
\ifthenelse{\equal{\ExtraCurricular}{}}{}{
\profilesection{Extra-Curricular}
\begin{flushleft}
\ExtraCurricular
\end{flushleft}
}
%------------------------------------------------
\end{textblock}
}
\makeatother
\newcommand{\profilepic}{example-image-duck}
%----------------------------------------------------------------------------------------
\begin{document}
%----------------------------------------------------------------------------------------
% Education
%----------------------------------------------------------------------------------------
\Education{B.E.(Hons.) Mechanical Engineering BITS Pilani | 2018 | GPA:8.947/10
\newline \newline Class XII \newline Mahathi Jr College | 2014 | 95.7 \%
\newline \newline Class X \newline Rishi Vid. Gurukulam | 2012 | 10/10} % To have no Education section, just remove all the text and leave \Education{}
%----------------------------------------------------------------------------------------
% SKILLS
%----------------------------------------------------------------------------------------
% Skill bar section, each skill must have a value between 0 and 6 (float)
\skills{Languages: C, Java, Matlab, SQL, Gosu, Basics of R and Python
\newline WebDev : HTML, CSS, JS, JQuery, php
\newline Simulation : Ansys, Comsol, Arduino
\newline Design : ProE, AutoCAD, Solidworks, Autodesk Alias, Adobe Photoshop
\newline Other: MS Office, JIRA, Git, Adobe AE
\newline Certifications: Oracle Certified Associate, Java SE 8 Programmer}
\ExtraCurricular{Co-Founder \& Head @ Evolve \\ - Organized talks \& workshops
\\ - Brought the alumni of IITs, BITS, NITs and industry professionals to guide students
\newline\newline Graphic Designer @ Designers Anonymous \& Dept. of Technical Arts\\- Taught design softwares to students\\- Designed content to publicize fests
\newline \newline Event Manager @ BITS Embryo \\ - Organizer of conclave forums \& talks\\
- Handled logistics \& pitched speakers
\newline\newline Volunteer @ NSS \& Nirmaan NGO \\ - Created jobs for rural women\\- Co-organized Cyclone relief fund-raiser, cleanliness drives \& taught rural school kids
\newline\newline Class Committee representative \& Teaching Assistant for the courses Production Techniques \& Human Resource Development}
%------------------------------------------------
% Skill text section, each skill must have a value between 0 and 6
%----------------------------------------------------------------------------------------
\makeprofile % Print the sidebar
%----------------------------------------------------------------------------------------
% INTERESTS
%----------------------------------------------------------------------------------------
%----------------------------------------------------------------------------------------
% EDUCATION
%----------------------------------------------------------------------------------------
\section{Work Experience and Internships}
\begin{twenty} % Environment for a list with descriptions
\twentyitem{Since Aug'18}{Senior Analyst}{Capgemini, Hyderabad}{Building a Rating Engine for Insurance firms using Guidewire \& Java}
\twentyitem{Jan-Jun'18}{Research Intern}{Center for AI \& Robotics, DRDO, Bengaluru}{Designed \& built a Stair Climber bot along with its basic SDK library\\ Worked on Active Noise Cancellation algorithms for Wall Climber bot}
\twentyitem{July'17}{Summer Intern}{NTPC Limited, Solapur}{Analyzed \emph{Scope of Wind capacity installations} using Meteorological data and learnt about the practical challenges of commissioning them}
\twentyitem{June'17}{Summer Intern}{RINL, Visakhapatnam}{Learnt about steel making processes \& waste heat recovery systems\\ Analyzed workflow \& suggested optimization of few bottlenecks}
\twentyitem{Summer'16}{Summer Intern}{Century Rayon, Mumbai}{Investigated the "Heating Effects in Cake conditioning Rooms"\\
Studied TQM and DMAIC practices to address performance issues}
%\twentyitem{<dates>}{<title>}{<location>}{<description>}
\end{twenty}
%----------------------------------------------------------------------------------------
% PUBLICATIONS
%----------------------------------------------------------------------------------------
\section{Research and Projects}
\begin{twenty} % Environment for a short list with no descriptions
\twentyitem{May-Dec'17}{Advanced Materials for Energy Efficient Buildings} {}{Studied the application of Phase Change Materials in buildings to \\reduce cooling load \& energy usage across various geographies}
\twentyitem{Jan-May'17}{Design of Cleanroom for MEMS Fabrication}{}{Optimized the control parameters of an ISO 5 Cleanroom using DOE methods \& COMSOL5.1 (CFD) \& validated results with Hemair SI Ltd.}
\twentyitem{Jan-May'17}{Interactive Creation of Splines}{}{Formulated algorithms and built a GUI using Matlab to obtain splines and their respective blending functions for any given dynamic input}
\twentyitem{Aug-Nov'16}{Fabrication of Tabletop EDM Machine}{}{Led a team of 8 students to design \& build the product from genesis \&
achieved 20 micron erosion on AISI 1020 steel using brass electrode}
\twentyitem{Aug-Nov'15}{Critical analysis of Performance appraisal systems}{}{Led a team of 8 students to survey \& review the appraisal systems of NTPC and IBM to measure its influence on employee's work outputs}
%\twentyitemshort{<dates>}{<title/description>}
\end{twenty}
%----------------------------------------------------------------------------------------
% AWARDS
%----------------------------------------------------------------------------------------
\section{Achievements}
\begin{twenty} % Environment for a short list with no descriptions
\twentyitem{Dec'17}{Book Distribution Campaign}{}{Single handedly ran a campaign \& sold over 150+ books within 3 days}
\twentyitem{Mar'17}{First - VIKAS Soch Ideation Marathon}{}{Won 10k in cash for best social startup idea at Launchpad E-Summit}
\twentyitem{Mar'16}{Runners Up - Ground Reality}{}{Won 12k in cash for the best B-Plan submission at Pearl 2017}
\twentyitem{Feb'10}{Best All Rounder Award}{}{Ranked first across all campuses of school for all-round excellence}
\twentyitem{2010-11}{Olympiads(School Level)}{}{GOLD Medals won at NSO(2011), NCO(2010) \& IMO(2010))}
%\twentyitemshort{<dates>}{<title/description>}
\end{twenty}
%----------------------------------------------------------------------------------------
% EXPERIENCE
%----------------------------------------------------------------------------------------
\section{Electives and MOOCs}
\begin{twenty} % Environment for a list with descriptions
\twentyitem{Electives}{Reverse Engineering \& Rapid Prototyping, Renewable Energy, Quality Control Assurance \& Reliability, Project Appraisal, Public Policy,\\Principles of Management, HR Development, Fundamentals of\\Finance \& Accounting, Srimad Bhagavad Gita}{}{}
\twentyitem{MOOCs}{\textbf{Introduction to R,SQL \& Python Courses on Datacamp}\\Currently pursuing Deep Learning Specialization on Coursera}{}{}
%\twentyitem{<dates>}{<title>}{<location>}{<description>}
\end{twenty}
%----------------------------------------------------------------------------------------
% OTHER INFORMATION
%----------------------------------------------------------------------------------------
%----------------------------------------------------------------------------------------
% SECOND PAGE EXAMPLE
%----------------------------------------------------------------------------------------
%\newpage % Start a new page
%\makeprofile % Print the sidebar
%\section{Other information}
%\subsection{Review}
%Alice approaches Wonderland as an anthropologist, but maintains a strong sense of noblesse oblige that comes with her class status. She has confidence in her social position, education, and the Victorian virtue of good manners. Alice has a feeling of entitlement, particularly when comparing herself to Mabel, whom she declares has a ``poky little house," and no toys. Additionally, she flaunts her limited information base with anyone who will listen and becomes increasingly obsessed with the importance of good manners as she deals with the rude creatures of Wonderland. Alice maintains a superior attitude and behaves with solicitous indulgence toward those she believes are less privileged.
%\section{Other information}
%\subsection{Review}
%Alice approaches Wonderland as an anthropologist, but maintains a strong sense of noblesse oblige that comes with her class status. She has confidence in her social position, education, and the Victorian virtue of good manners. Alice has a feeling of entitlement, particularly when comparing herself to Mabel, whom she declares has a ``poky little house," and no toys. Additionally, she flaunts her limited information base with anyone who will listen and becomes increasingly obsessed with the importance of good manners as she deals with the rude creatures of Wonderland. Alice maintains a superior attitude and behaves with solicitous indulgence toward those she believes are less privileged.
%----------------------------------------------------------------------------------------
\end{document}

Regular expression to return the text which has 1 or more periods between parentheses

I have a text which has one or more periods between parentheses.
K= 'Product will be hot(These cooking instructions were developed using an 100 watt microwave oven. For lower wattage ovens, up to an additional 2 minutes cooking time may be required).'
I'd like to extract or eliminate that entire parenthesized text. I have tried
re.search(r'\((.*?)+\)',K).group(1)
and
K[K.find("(")+1:K.find(")")]
but neither of them returns the text.
IIUC, the following regex will remove any text between parentheses that contains one or more periods, as well as the parentheses themselves (note the raw string, so that \( and \. are not treated as string escapes):
re.sub(r'\(.*?\.+.*\)','', K)
Example:
>>> re.sub(r'\(.*?\.+.*\)','', K)
'Product will be hot.'
To extract the text instead of removing it, use re.findall with the same regex:
>>> re.findall(r'\(.*?\.+.*\)', K)
['(These cooking instructions were developed using an 100 watt microwave oven. For lower wattage ovens, up to an additional 2 minutes cooking time may be required)']
[Edit]: To match when there is more than one set of parentheses, the following works:
K='Product will be hot (These cooking instructions were. developed using an 100 watt microwave oven). For lower wattage ovens (up to an additional 2 minutes. cooking time may be required).'
>>> re.findall(r'\(.*?\.+.*?\)', K)
['(These cooking instructions were. developed using an 100 watt microwave oven)', '(up to an additional 2 minutes. cooking time may be required)']
>>> re.sub(r'\(.*?\.+.*?\)', '', K)
'Product will be hot . For lower wattage ovens .'
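To see why the trailing .* has to become .*? once there are several parenthesized groups, here is a small sketch. The string sample and the variable names are made up for illustration; only the two patterns come from the answer above.

```python
import re

# Hypothetical sample with two separate parenthesized groups.
sample = 'hot (a. b) and (c. d) end'

# Greedy tail: ".*" runs to the LAST ")" in the string, so both groups
# and the text between them are swallowed into a single match.
greedy = re.findall(r'\(.*?\.+.*\)', sample)

# Lazy tail: ".*?" stops at the FIRST ")" after the period,
# so each group is matched on its own.
lazy = re.findall(r'\(.*?\.+.*?\)', sample)

print(greedy)
print(lazy)
```

With the greedy version, re.sub would also delete the text between the two groups, which is usually not what is wanted.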
You can use the expression:
(?<=\()[^()]*(?=\))
Use re.findall to find the text you are interested in.
import re
K = 'Product will be hot(These cooking instructions were developed using an 100 watt microwave oven. For lower wattage ovens, up to an additional 2 minutes cooking time may be required).'
print(re.findall(r'(?<=\()[^()]*(?=\))',K))
Prints:
['These cooking instructions were developed using an 100 watt microwave oven. For lower wattage ovens, up to an additional 2 minutes cooking time may be required']
Alternatively wrap the character set in a capturing group:
import re
K = 'Product will be hot(These cooking instructions were developed using an 100 watt microwave oven. For lower wattage ovens, up to an additional 2 minutes cooking time may be required).'
print(re.search(r'(?<=\()([^()]*)(?=\))',K).group(1))
Prints:
These cooking instructions were developed using an 100 watt microwave oven. For lower wattage ovens, up to an additional 2 minutes cooking time may be required
This makes sure that no substitution is done if there are more than two periods inside the parentheses, and also that two parenthesized sections never get merged (which would eliminate the text between them):
>>> re.sub(r'\(([^.()]*\.){1,2}[^.()]*\)',"",K)
'Product will be hot.'
If you also want to remove parenthesized sections with more than two periods, you may simply replace {1,2} with a +:
>>> re.sub(r'\(([^.()]*\.)+[^.()]*\)',"",K)
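Pulling the ideas from the answers together, a minimal sketch of two helpers, one to extract and one to remove parenthesized sections that contain at least one period. The function names and the sample text are made up for illustration, and nested parentheses are assumed not to occur.

```python
import re

# One parenthesized section without nested parentheses that contains
# at least one period. [^()] keeps a match from crossing into the
# next section.
PAREN_WITH_PERIOD = re.compile(r'\([^()]*\.[^()]*\)')


def extract_parenthesized(text):
    """Return the inner text of every matching section, without the parentheses."""
    return [match[1:-1] for match in PAREN_WITH_PERIOD.findall(text)]


def remove_parenthesized(text):
    """Return the text with every matching section (and its parentheses) removed."""
    return PAREN_WITH_PERIOD.sub('', text)


text = ('Product will be hot (These instructions were. developed). '
        'For lower (no period here) ovens (up to 2 minutes. cooking).')

print(extract_parenthesized(text))
print(remove_parenthesized(text))
```

Sections without a period, such as (no period here), are left untouched by both helpers.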

Extract texts from a large character string based on a pattern

I have a large character vector and would like to extract certain information from it by matching a pattern:
str(input)
chr [1:109094] "{'asin': '0981850006', 'description': 'Steven Raichlen\'s Best of Barbecue Primal Grill DVD. The first three volumes of the si"| truncated ...
I get the following content of input[1] - description of product meta
[1] ("{'asin': '144072007X', 'related': {'also_viewed': ['B008WC0X0A', 'B000CPMOVG', 'B0046641AE', 'B00J150GAO', 'B00005AMCG', 'B005WGX97I'],
'bought_together': ['B000H85WSA']},
'title': 'Sand Shark Margare Maron Audio CD',
'price': 577.15,
'salesRank': {'Patio, Lawn & Garden': 188289},
'imUrl': 'http://ecx.images-amazon.com/images/I/31B9X0S6dqL._SX300_.jpg',
'brand': 'Tesoro',
'categories': [['Patio, Lawn & Garden', 'Lawn Mowers & Outdoor Power Tools', 'Metal Detectors']],
'description': \"The Tesoro Sand Shark metal combines time-proven PI circuits with the latest digital technology creating the first.\"}")
Now I would like to iterate over each element of the character vector and extract asin, title, price, salesRank, brand and categories, which should be saved in a data.frame for easier handling.
As you might notice, the data originally comes from a JSON file. I tried to import it using the stream_in command, but that didn't help, so I just imported it using readLines. Please help! I am a bit desperate, and any hint is appreciated.
The jsonlite package shows the following problem:
lexical error: invalid char in json text.
{'asin': '0981850006', 'descript
(right here) ------^
closing fileconnectionoldClass input connection.
Any new ideas on that?
Given the many unanswered questions on this issue, it must be very relevant for newbies ;)
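One observation on the lexical error: it points at the single quote right after the opening brace, and strict JSON only allows double-quoted strings, so a JSON parser rejects these lines. The records look like Python dict literals instead. As a minimal sketch (in Python rather than R, and assuming every line really is a valid Python literal, which the sample suggests but the full file may not honor), ast.literal_eval can parse one line and the wanted fields can then be picked out. The names FIELDS, parse_record and sample are made up for illustration; the sample is a shortened version of the record shown above.

```python
import ast

# Fields the question asks for.
FIELDS = ("asin", "title", "price", "salesRank", "brand", "categories")


def parse_record(line):
    """Parse one Python-literal record and keep only the wanted fields.

    Fields that are missing from a record come back as None.
    """
    record = ast.literal_eval(line)
    return {field: record.get(field) for field in FIELDS}


# Shortened version of the record shown in the question.
sample = (
    "{'asin': '144072007X', "
    "'title': 'Sand Shark Margare Maron Audio CD', "
    "'price': 577.15, "
    "'salesRank': {'Patio, Lawn & Garden': 188289}, "
    "'brand': 'Tesoro', "
    "'categories': [['Patio, Lawn & Garden', 'Metal Detectors']]}"
)

row = parse_record(sample)
print(row["asin"], row["price"], row["brand"])
```

Applying parse_record to every line gives a list of dicts with identical keys, which can be written out as a table or as proper JSON for further handling.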

R - does failed RegEx pattern matching originate in file conversion or use of tm package?

As a relative novice in R and programming, my first ever question in this forum is about regex pattern matching, specifically line breaks. First some background. I am trying to perform some preprocessing on a corpus of texts using R before processing them further on the NLP platform GATE. I convert the original pdf files to text as follows (the text files, unfortunately, go into the same folder):
dest <- "./MyFolderWithPDFfiles"
myfiles <- list.files(path = dest, pattern = "pdf", full.names = TRUE)
lapply(myfiles, function(i) system(paste('"C:/Program Files (x86)/xpdfbin-win-3.04/bin64/pdftotext.exe"', paste0('"', i, '"')), wait = FALSE))
Then, having loaded the tm package and physically(!) moved the text files to another folder, I create a corpus:
TextFiles <- "./MyFolderWithTXTfiles"
EU <- Corpus(DirSource(TextFiles))
I then want to perform a series of custom transformations to clean the texts. I succeeded in replacing a simple string as follows:
ReplaceText <- content_transformer(function(x, from, to) gsub(from, to, x, perl=T))
EU2 <- tm_map(EU, ReplaceText, "Table of contents", "TOC")
However, a pattern that is a 1-3 digit page number followed by two line breaks and a page break is causing me problems. I want to replace it with a blank space:
EU2 <- tm_map(EU, ReplaceText, "[0-9]{1,3}\n\n\f", " ")
The ([0-9]{1,3}) and \f alone match. The line breaks don't. If I copy text from one of the original .txt files into the RegExr online tool and test the expression "[0-9]{1,3}\n\n\f", it matches. So the line breaks do exist in the original .txt file.
But when I view one of the .txt files as read into the EU corpus in R, there appear to be no line breaks even though the lines are obviously breaking before the margin, e.g.
[3] "PROGRESS TOWARDS ACCESSION"
[4] "1"
[5] ""
[6] "\fTable of contents"
Seeing this, I tried other patterns, e.g. to detect one or more blank space ("[0-9]{1,3}\s*\f"), but no patterns worked.
So my questions are:
Am I converting and reading the files into R correctly? If so, what has happened to the line breaks?
If having no line breaks is normal, how can I pattern match the character on line 5? Is that not a blank space?
(A tangential concern:) When converting the pdf files, is there code that will put them directly in a new folder?
Apologies for extending this, but how can one print or inspect only a few lines of the text object? The tm commands and head(EU) print the entire object, each a very long text.
I know my problem(s) must appear simple and perhaps stupid, but one has to start somewhere and extensive searching has not revealed a source that explains comprehensively how to use RegExes to modify text objects in R. I am so frustrated and hope someone here will take pity and can help me.
Thanks for any advice you can offer.
Brigitte
p.s. I think it's not possible to upload attachments in this forum, therefore, here is a link to one of the original PDF documents: http://ec.europa.eu/enlargement/archives/pdf/key_documents/1998/czech_en.pdf
Because the doc is long, I created a snippet of the first 3 pages of the TXT doc, read it into the R corpus ('EU') and printed it to the console and this is it:
dput(EU[[2]])
structure(list(content = c("REGULAR REPORT", "FROM THE COMMISSION ON",
"CZECH REPUBLIC'S", "PROGRESS TOWARDS ACCESSION ***********************",
"1", "", "\fTable of contents", "A. Introduction", "a) Preface The Context of the Progress Report",
"b) Relations between the European Union and the Czech Republic The enhanced Pre-Accession Strategy Recent developments in bilateral relations",
"B. Criteria for membership", "1. Political criteria", "1.1. Democracy and the Rule of Law Parliament The Executive The judicial system Anti-Corruption measures",
"1.2. Human Rights and the Protection of Minorities Civil and Political Rights Economic, Social and Cultural Rights Minority Rights and the Protection of Minorities",
"1.3. General evaluation", "2. Economic criteria", "2.1. Introduction 2.2. Economic developments since the Commission published its Opinion",
"Macroeconomic developments Structural reforms 2.3. Assessment in terms of the Copenhagen criteria The existence of a functioning market economy The capacity to cope with competitive pressure and market forces 2.4. General evaluation",
"3. Ability to assume the obligations of Membership", "3.1. Internal Market without frontiers General framework The Four Freedoms Competition",
"3.2. Innovation Information Society Education, Training and Youth Research and Technological Development Telecommunications Audio-visual",
"3.3. Economic and Fiscal Affairs Economic and Monetary Union",
"2", "", "\fTaxation Statistics "), meta = structure(list(author = character(0),
datetimestamp = structure(list(sec = 50.1142621040344, min = 33L,
hour = 15L, mday = 3L, mon = 10L, year = 114L, wday = 1L,
yday = 306L, isdst = 0L), .Names = c("sec", "min", "hour",
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt",
"POSIXt"), tzone = "GMT"), description = character(0), heading = character(0),
id = "CZ1998ProgressSnippet.txt", language = "en", origin = character(0)), .Names = c("author",
"datetimestamp", "description", "heading", "id", "language",
"origin"), class = "TextDocumentMeta")), .Names = c("content",
"meta"), class = c("PlainTextDocument", "TextDocument"))
Yes, working with text in R is not always a smooth experience! But you can get a lot done quickly with some effort (maybe too much effort!)
If you could share one of your PDF files or the output of dput(EU), that might help to identify exactly how to capture your page numbers with regex. That would also add a reproducible example to your question, which is an important thing to have in questions here so that people can test their answers and make sure they work for your specific problem.
There's no need to put the PDF and text files in separate folders. Instead, you can use a pattern, like so:
EU <- Corpus(DirSource(pattern = "\\.txt$"))
This will read only the text files and ignore the PDF files.
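If you would still rather have the text files written into their own folder, pdftotext accepts an output file name as a second argument. Here is a rough sketch of how the target paths could be built. The folder name `txt`, the helper `txt_path()` and the example file names are all made up for illustration, and the conversion call itself is commented out because the pdftotext location differs between machines:

```r
# Hypothetical helper: map a PDF path to a .txt path inside out_dir
txt_path <- function(pdf, out_dir) {
  file.path(out_dir, sub("\\.pdf$", ".txt", basename(pdf), ignore.case = TRUE))
}

out_dir <- "txt"                                            # assumed folder name
pdfs <- c("reports/czech_en.pdf", "reports/poland_en.PDF")  # example names only
targets <- txt_path(pdfs, out_dir)
print(targets)

# To actually convert, something along these lines should work once
# pdftotext is on the PATH (untested here):
# dir.create(out_dir, showWarnings = FALSE)
# for (i in seq_along(pdfs)) {
#   system(paste("pdftotext", shQuote(pdfs[i]), shQuote(targets[i])))
# }
```

You could then point DirSource() at that folder instead of filtering by pattern.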
There is no 'snippet view' method in tm, which is annoying. I often just use names(EU) and EU[[1]] for quick looks.
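For a look at only the first few lines, one option is to pull out the character vector and pass it to head(). This is a minimal sketch using a stand-in list shaped like the dput() output in the question. The `doc` object is hypothetical and is not a real tm document:

```r
# Stand-in for one corpus element; the dput() output in the question shows
# each document is a list holding a character vector called `content`.
doc <- list(content = c("REGULAR REPORT", "FROM THE COMMISSION ON",
                        "CZECH REPUBLIC'S", "PROGRESS TOWARDS ACCESSION",
                        "1", "", "\fTable of contents"))

# First three lines only
first_lines <- head(doc$content, 3)
print(first_lines)

# Lines 4 to 7: the last heading line, the page number, the empty line
# and the form feed that starts the next page
page_break <- doc$content[4:7]
print(page_break)
```

With a real corpus the same idea would be something like head(EU[[1]]$content, 3), assuming the documents have the structure shown in the question.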
UPDATE
With the data you've just added, I'd suggest a slightly tangential approach. Do the regex work before passing the data to the tm package formats, like so:
# get the PDF
download.file("http://ec.europa.eu/enlargement/archives/pdf/key_documents/1998/czech_en.pdf", "my_pdf.pdf", mode = "wb")
# get the file name of the PDF
myfiles <- list.files(path = getwd(), pattern = "\\.pdf$", full.names = TRUE)
# convert to text (note: my pdftotext is in a different location from yours)
# wait = TRUE so the .txt file exists before we try to read it
lapply(myfiles, function(i) system(paste('"C:/Program Files/xpdf/bin64/pdftotext.exe"', paste0('"', i, '"')), wait = TRUE))
# read plain text into R
x1 <- readLines("my_pdf.txt")
# make into a single string
x2 <- paste(x1, collapse = " ")
# do some regex...
x3 <- gsub("Table of contents", "TOC", x2)
# drop each page number, plus any whitespace, that sits just before a form feed (\f)
x4 <- gsub("[0-9]{1,3}\\s*\f", "", x3)
# convert to corpus for text mining operations
x5 <- Corpus(VectorSource(x4))
With the snippet of data you provided using dput, the output from this method is:
inspect(x5)
<<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
REGULAR REPORT FROM THE COMMISSION ON CZECH REPUBLIC'S PROGRESS TOWARDS ACCESSION *********************** TOC A. Introduction a) Preface The Context of the Progress Report b) Relations between the European Union and the Czech Republic The enhanced Pre-Accession Strategy Recent developments in bilateral relations B. Criteria for membership 1. Political criteria 1.1. Democracy and the Rule of Law Parliament The Executive The judicial system Anti-Corruption measures 1.2. Human Rights and the Protection of Minorities Civil and Political Rights Economic, Social and Cultural Rights Minority Rights and the Protection of Minorities 1.3. General evaluation 2. Economic criteria 2.1. Introduction 2.2. Economic developments since the Commission published its Opinion Macroeconomic developments Structural reforms 2.3. Assessment in terms of the Copenhagen criteria The existence of a functioning market economy The capacity to cope with competitive pressure and market forces 2.4. General evaluation 3. Ability to assume the obligations of Membership 3.1. Internal Market without frontiers General framework The Four Freedoms Competition 3.2. Innovation Information Society Education, Training and Youth Research and Technological Development Telecommunications Audio-visual 3.3. Economic and Fiscal Affairs Economic and Monetary Union Taxation Statistics
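As for what happened to the line breaks: readLines() returns one vector element per line with the newline characters removed, so a regex applied element by element never sees a page number and a form feed in the same string. Here is a small sketch of that, using a few lines from your snippet (the variable names are just for illustration):

```r
lines <- c("PROGRESS TOWARDS ACCESSION", "1", "", "\fTable of contents",
           "A. Introduction")

# Applied line by line, the pattern finds nothing: no single element holds
# both the digits and the form feed
per_line <- grepl("[0-9]{1,3}\\s*\f", lines)
print(any(per_line))

# After collapsing, the page number, the empty line and the form feed
# sit next to each other, separated only by spaces
one_string <- paste(lines, collapse = " ")
cleaned <- gsub("[0-9]{1,3}\\s*\f", "", one_string)
print(cleaned)
```

So the "character on line 5" is an empty string rather than a blank space, which is why a pattern looking for a line break or a space in that element finds nothing.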