GitHub icon in LaTeX templates

I have a resume template on Overleaf, and I want to add a GitHub logo after the LinkedIn logo. How can I do that?
https://www.overleaf.com/articles/aditya-gadepallis-resume/kzdksnkdcrsr
I'm not sure how to do this; can someone please help?
This is the .tex file: https://pastebin.com/qVJ7HViy
This is the .cls file: https://pastebin.com/AX27WY7e
\ProvidesClass{twentysecondcv}[2017/01/08 CV class]
\LoadClass{article}
\NeedsTeXFormat{LaTeX2e}
%----------------------------------------------------------------------------------------
% REQUIRED PACKAGES
%----------------------------------------------------------------------------------------
\RequirePackage[sfdefault]{ClearSans}
\RequirePackage[T1]{fontenc}
\RequirePackage{tikz}
\RequirePackage{xcolor}
\RequirePackage[absolute,overlay]{textpos}
\RequirePackage{ragged2e}
\RequirePackage{etoolbox}
\RequirePackage{ifmtarg}
\RequirePackage{ifthen}
\RequirePackage{pgffor}
\RequirePackage{marvosym}
\RequirePackage{parskip}
\DeclareOption*{\PassOptionsToClass{\CurrentOption}{article}}
\ProcessOptions\relax
%----------------------------------------------------------------------------------------
% COLOURS
%----------------------------------------------------------------------------------------
\definecolor{white}{RGB}{255,255,255}
\definecolor{gray}{HTML}{4D4D4D}
\definecolor{sidecolor}{HTML}{E7E7E7}
\definecolor{mainblue}{HTML}{0E5484}
\definecolor{maingray}{HTML}{B9B9B9}
%----------------------------------------------------------------------------------------
% MISC CONFIGURATIONS
%----------------------------------------------------------------------------------------
\renewcommand{\bfseries}{\color{gray}} % Make \textbf produce coloured text instead
\pagestyle{empty} % Disable headers and footers
\setlength{\parindent}{0pt} % Disable paragraph indentation
%----------------------------------------------------------------------------------------
% SIDEBAR DEFINITIONS
%----------------------------------------------------------------------------------------
\setlength{\TPHorizModule}{1cm} % Left margin
\setlength{\TPVertModule}{1cm} % Top margin
\newlength\imagewidth
\newlength\imagescale
\pgfmathsetlength{\imagewidth}{5cm}
\pgfmathsetlength{\imagescale}{\imagewidth/600}
\newlength{\TotalSectionLength} % Define a new length to hold the remaining line width after the section title is printed
\newlength{\SectionTitleLength} % Define a new length to hold the width of the section title
\newcommand{\profilesection}[1]{%
\setlength\TotalSectionLength{\linewidth}% Set the total line width
\settowidth{\SectionTitleLength}{\huge #1 }% Calculate the width of the section title
\addtolength\TotalSectionLength{-\SectionTitleLength}% Subtract the section title width from the total width
\addtolength\TotalSectionLength{-2.22221pt}% Modifier to remove overfull box warning
\vspace{8pt}% Whitespace before the section title
{\color{black!80} \huge #1 \rule[0.15\baselineskip]{\TotalSectionLength}{1pt}}% Print the title and auto-width rule
}
% Define custom commands for CV info
\newcommand{\cvdate}[1]{\renewcommand{\cvdate}{#1}}
\newcommand{\cvnumberphone}[1]{\renewcommand{\cvnumberphone}{#1}}
\newcommand{\cvaddress}[1]{\renewcommand{\cvaddress}{#1}}
\newcommand{\cvsite}[1]{\renewcommand{\cvsite}{#1}}
\newcommand{\Education}[1]{\renewcommand{\Education}{#1}}
\newcommand{\skills}[1]{\renewcommand{\skills}{#1}}
\newcommand{\COURSEWORK}[1]{\renewcommand{\COURSEWORK}{#1}}
\newcommand{\cvname}[1]{\renewcommand{\cvname}{#1}}
\newcommand{\cvjobtitle}[1]{\renewcommand{\cvjobtitle}{#1}}
% Command for printing the contact information icons
\newcommand*\icon[1]{\tikz[baseline=(char.base)]{\node[shape=circle,draw,inner sep=1pt, fill=mainblue,mainblue,text=white] (char) {#1};}}
% Command for printing skill progress bars
% Command for printing skills text
\newcommand\skillstext[1]{
\renewcommand{\skillstext}{
\begin{flushleft}
\foreach [count=\i] \x/\y in {#1}{
\x$ \star $\y
}
\end{flushleft}
}
}
%----------------------------------------------------------------------------------------
% SIDEBAR LAYOUT
%----------------------------------------------------------------------------------------
\newcommand{\makeprofile}{
\begin{tikzpicture}[remember picture,overlay]
\node [rectangle, fill=sidecolor, anchor=north, minimum width=9cm, minimum height=\paperheight+1cm] (box) at (-5cm,0.5cm){};
\end{tikzpicture}
%------------------------------------------------
\begin{textblock}{6}(0.5, 0.2)
%------------------------------------------------
\ifthenelse{\equal{\profilepic}{}}{}{
\begin{center}
\begin{tikzpicture}[x=\imagescale,y=-\imagescale]
\clip (600/2, 567/2) circle (567/2);
\node[anchor=north west, inner sep=0pt, outer sep=0pt] at (0,0) {\includegraphics[width=\imagewidth]{\profilepic}};
\end{tikzpicture}
\end{center}
}
%------------------------------------------------
{\Huge\color{mainblue}\cvname}
%------------------------------------------------
{\Large\color{black!80}\cvjobtitle}
%------------------------------------------------
\renewcommand{\arraystretch}{1.6}
\begin{tabular}{p{0.5cm} @{\hskip 0.5cm}p{5cm}}
\ifthenelse{\equal{\cvdate}{}}{}{\textsc{\Large\icon{\Info}} & \cvdate\\}
\ifthenelse{\equal{\cvaddress}{}}{}{\textsc{\Large\icon{\Letter}} & \cvaddress\\}
\ifthenelse{\equal{\cvnumberphone}{}}{}{\textsc{\Large\icon{\Telefon}} & \cvnumberphone\\}
\ifthenelse{\equal{\cvsite}{}}{}{\textsc{\Large\icon{\Mundus}} & \cvsite\\}
\ifthenelse{\equal{\cvmail}{}}{}{\textsc{\large\icon{@}} & \href{mailto:\cvmail}{\cvmail}}
\end{tabular}
%------------------------------------------------
\ifthenelse{\equal{\Education}{}}{}{
\profilesection{Education}
\begin{flushleft}
\Education
\end{flushleft}
}
%------------------------------------------------
\ifthenelse{\equal{\skills}{}}{}{
\profilesection{Skills}
\begin{flushleft}
\skills
\end{flushleft}
}
\ifthenelse{\equal{\COURSEWORK}{}}{}{
\profilesection{COURSEWORK}
\begin{flushleft}
\COURSEWORK
\end{flushleft}
}
%------------------------------------------------
\end{textblock}
}
%----------------------------------------------------------------------------------------
% COLOURED SECTION TITLE BOX
%----------------------------------------------------------------------------------------
% Command to create the rounded boxes around the first three letters of section titles
\newcommand*\round[2]{%
\tikz[baseline=(char.base)]\node[anchor=north west, draw,rectangle, rounded corners, inner sep=1.6pt, minimum size=5.5mm, text height=3.6mm, fill=#2,#2,text=white](char){#1};%
}
\newcounter{colorCounter}
\newcommand{\sectioncolor}[1]{%
{%
\round{#1}{
\ifcase\value{colorCounter}%
maingray\or%
mainblue\or%
maingray\or%
mainblue\or%
maingray\or%
mainblue\or%
maingray\or%
mainblue\or%
maingray\or%
mainblue\else%
maingray\fi%
}%
}%
\stepcounter{colorCounter}%
}
\renewcommand{\section}[1]{
{%
\color{gray}%
\Large\sectioncolor{#1}%
}
}
\renewcommand{\subsection}[1]{
\par\vspace{.5\parskip}{%
\large\color{gray} #1%
}
\par\vspace{.25\parskip}%
}
%----------------------------------------------------------------------------------------
% LONG LIST ENVIRONMENT
%----------------------------------------------------------------------------------------
\setlength{\tabcolsep}{0pt}
% New environment for the long list
\newenvironment{twenty}{%
\begin{tabular*}{\textwidth}{@{\extracolsep{\fill}}ll}
}{%
\end{tabular*}
}
\newcommand{\twentyitem}[4]{%
#1&\parbox[t]{0.83\textwidth}{%
\textbf{#2}%
\hfill%
{\footnotesize#3}\\%
#4\vspace{\parsep}%
}\\
}
%----------------------------------------------------------------------------------------
% SMALL LIST ENVIRONMENT
%----------------------------------------------------------------------------------------
\setlength{\tabcolsep}{0pt}
% New environment for the small list
\newenvironment{twentyshort}{%
\begin{tabular*}{\textwidth}{@{\extracolsep{\fill}}ll}
}{%
\end{tabular*}
}
\newcommand{\twentyitemshort}[2]{%
#1&\parbox[t]{0.83\textwidth}{%
\textbf{#2}%
}\\
}
%----------------------------------------------------------------------------------------
% MARGINS AND LINKS
%----------------------------------------------------------------------------------------
\RequirePackage[left=7.6cm,top=0.1cm,right=1cm,bottom=0.2cm,nohead,nofoot]{geometry}
\RequirePackage{hyperref}

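Quick hack: redefine \makeprofile in the .tex file and use the fontawesome5 package for the LinkedIn and GitHub icons (replace https://github.com with the URL of your own profile):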
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Twenty Seconds Resume/CV
% LaTeX Template
% Version 1.1 (8/1/17)
%
% This template has been downloaded from:
% http://www.LaTeXTemplates.com
%
% Original author:
% Carmine Spagnuolo (cspagnuolo@unisa.it) with major modifications by
% Vel (vel@LaTeXTemplates.com)
%
% License:
% The MIT License (see included LICENSE file)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------------------------------------------------
% PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------
\documentclass[letterpaper]{twentysecondcv} % a4paper for A4
%----------------------------------------------------------------------------------------
% PERSONAL INFORMATION
%----------------------------------------------------------------------------------------
% If you don't need one or more of the below, just remove the content leaving the command, e.g. \cvnumberphone{}
\cvname{Aditya} % Your name
\cvjobtitle{Gadepalli} % Job title/career
\cvdate{16 February 1997} % Date of birth
\cvaddress{adityagadepalli@gmail.com} % Short address/location, use \newline if more than 1 line is required
\cvnumberphone{+91 9553336954} % Phone number
\cvsite{linkedin.com/in/a-gadepalli} % Personal website
\makeatletter
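\newcommand{\cvmail}{example@mail.com}
\providecommand{\ExtraCurricular}[1]{\renewcommand{\ExtraCurricular}{#1}} % setter for the Extra-Curricular section used below (the class file shown above does not define it)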
\usepackage{fontawesome5}
\renewcommand{\makeprofile}{
\begin{tikzpicture}[remember picture,overlay]
\node [rectangle, fill=sidecolor, anchor=north, minimum width=9cm, minimum height=\paperheight+1cm] (box) at (-5cm,0.5cm){};
\end{tikzpicture}
%------------------------------------------------
\begin{textblock}{6}(0.5, 0.2)
%------------------------------------------------
\ifthenelse{\equal{\profilepic}{}}{}{
\begin{center}
\begin{tikzpicture}[x=\imagescale,y=-\imagescale]
\clip (600/2, 567/2) circle (567/2);
\node[anchor=north west, inner sep=0pt, outer sep=0pt] at (0,0) {\includegraphics[width=\imagewidth]{\profilepic}};
\end{tikzpicture}
\end{center}
}
%------------------------------------------------
{\Huge\color{mainblue}\cvname}
%------------------------------------------------
{\Large\color{black!80}\cvjobtitle}
%------------------------------------------------
\renewcommand{\arraystretch}{1.6}
\begin{tabular}{p{0.5cm} @{\hskip 0.5cm}p{5cm}}
\ifthenelse{\equal{\cvdate}{}}{}{\textsc{\Large\icon{\Info}} & \cvdate\\}
\ifthenelse{\equal{\cvaddress}{}}{}{\textsc{\Large\icon{\Letter}} & \cvaddress\\}
\ifthenelse{\equal{\cvnumberphone}{}}{}{\textsc{\Large\icon{\Telefon}} & \cvnumberphone\\}
\ifthenelse{\equal{\cvsite}{}}{}{\textsc{\Large\icon{\scalebox{0.85}{\faLinkedin}}} & \url{linkedin.com/in/a-gadepalli}\\}
\textsc{\Large\icon{\scalebox{0.8}{\faGithub}}} & \url{https://github.com}\\
\ifthenelse{\equal{\cvmail}{}}{}{\textsc{\large\icon{@}} & \href{mailto:\cvmail}{\cvmail}}
\end{tabular}
%------------------------------------------------
\ifthenelse{\equal{\Education}{}}{}{
\profilesection{Education}
\begin{flushleft}
\Education
\end{flushleft}
}
%------------------------------------------------
\ifthenelse{\equal{\skills}{}}{}{
\profilesection{Skills}
\begin{flushleft}
\skills
\end{flushleft}
}
\ifthenelse{\equal{\ExtraCurricular}{}}{}{
\profilesection{Extra-Curricular}
\begin{flushleft}
\ExtraCurricular
\end{flushleft}
}
%------------------------------------------------
\end{textblock}
}
\makeatother
\newcommand{\profilepic}{example-image-duck}
%----------------------------------------------------------------------------------------
\begin{document}
%----------------------------------------------------------------------------------------
% Education
%----------------------------------------------------------------------------------------
\Education{B.E.(Hons.) Mechanical Engineering BITS Pilani | 2018 | GPA:8.947/10
\newline \newline Class XII \newline Mahathi Jr College | 2014 | 95.7 \%
\newline \newline Class X \newline Rishi Vid. Gurukulam | 2012 | 10/10} % To have no Education section, just remove all the text and leave \Education{}
%----------------------------------------------------------------------------------------
% SKILLS
%----------------------------------------------------------------------------------------
% Skill bar section, each skill must have a value between 0 and 6 (float)
\skills{Languages: C, Java, Matlab, SQL, Gosu, Basics of R and Python
\newline WebDev : HTML, CSS, JS, JQuery, php
\newline Simulation : Ansys, Comsol, Arduino
\newline Design : ProE, AutoCAD, Solidworks, Autodesk Alias, Adobe Photoshop
\newline Other: MS Office, JIRA, Git, Adobe AE
\newline Certifications: Oracle Certified Associate, Java SE 8 Programmer}
\ExtraCurricular{Co-Founder \& Head @ Evolve \\ - Organized talks \& workshops
\\ - Brought the alumni of IITs, BITS, NITs and industry professionals to guide students
\newline\newline Graphic Designer @ Designers Anonymous \& Dept. of Technical Arts\\- Taught design software to students\\- Designed content to publicize fests
\newline \newline Event Manager @ BITS Embryo \\ - Organizer of conclave forums \& talks\\
- Handled logistics \& pitched speakers
\newline\newline Volunteer @ NSS \& Nirmaan NGO \\ - Created jobs for rural women\\- Co-organized Cyclone relief fund-raiser, cleanliness drives \& taught rural school kids
\newline\newline Class Committee representative \& Teaching Assistant for the courses Production Techniques \& Human Resource Development}
%------------------------------------------------
% Skill text section, each skill must have a value between 0 and 6
%----------------------------------------------------------------------------------------
\makeprofile % Print the sidebar
%----------------------------------------------------------------------------------------
% INTERESTS
%----------------------------------------------------------------------------------------
%----------------------------------------------------------------------------------------
% EDUCATION
%----------------------------------------------------------------------------------------
\section{Work Experience and Internships}
\begin{twenty} % Environment for a list with descriptions
\twentyitem{Since Aug'18}{Senior Analyst}{Capgemini, Hyderabad}{Building a Rating Engine for Insurance firms using Guidewire \& Java}
\twentyitem{Jan-Jun'18}{Research Intern}{Center for AI \& Robotics, DRDO, Bengaluru}{Designed \& built a Stair Climber bot along with its basic SDK library\\ Worked on Active Noise Cancellation algorithms for Wall Climber bot}
\twentyitem{July'17}{Summer Intern}{NTPC Limited, Solapur}{Analyzed \emph{Scope of Wind capacity installations} using Meteorological data and learnt about the practical challenges of commissioning them}
\twentyitem{June'17}{Summer Intern}{RINL, Visakhapatnam}{Learnt about steel making processes \& waste heat recovery systems\\ Analyzed workflow \& suggested optimization of few bottlenecks}
\twentyitem{Summer'16}{Summer Intern}{Century Rayon, Mumbai}{Investigated the "Heating Effects in Cake conditioning Rooms"\\
Studied TQM and DMAIC practices to address performance issues}
%\twentyitem{<dates>}{<title>}{<location>}{<description>}
\end{twenty}
%----------------------------------------------------------------------------------------
% PUBLICATIONS
%----------------------------------------------------------------------------------------
\section{Research and Projects}
\begin{twenty} % Environment for a short list with no descriptions
\twentyitem{May-Dec'17}{Advanced Materials for Energy Efficient Buildings} {}{Studied the application of Phase Change Materials in buildings to \\reduce cooling load \& energy usage across various geographies}
\twentyitem{Jan-May'17}{Design of Cleanroom for MEMS Fabrication}{}{Optimized the control parameters of an ISO 5 Cleanroom using DOE methods \& COMSOL5.1 (CFD) \& validated results with Hemair SI Ltd.}
\twentyitem{Jan-May'17}{Interactive Creation of Splines}{}{Formulated algorithms and built a GUI using Matlab to obtain splines and their respective blending functions for any given dynamic input}
\twentyitem{Aug-Nov'16}{Fabrication of Tabletop EDM Machine}{}{Led a team of 8 students to design \& build the product from genesis \&
achieved 20 micron erosion on AISI 1020 steel using brass electrode}
\twentyitem{Aug-Nov'15}{Critical analysis of Performance appraisal systems}{}{Led a team of 8 students to survey \& review the appraisal systems of NTPC and IBM to measure its influence on employee's work outputs}
%\twentyitemshort{<dates>}{<title/description>}
\end{twenty}
%----------------------------------------------------------------------------------------
% AWARDS
%----------------------------------------------------------------------------------------
\section{Achievements}
\begin{twenty} % Environment for a short list with no descriptions
\twentyitem{Dec'17}{Book Distribution Campaign}{}{Single handedly ran a campaign \& sold over 150+ books within 3 days}
\twentyitem{Mar'17}{First - VIKAS Soch Ideation Marathon}{}{Won 10k in cash for best social startup idea at Launchpad E-Summit}
\twentyitem{Mar'16}{Runners Up - Ground Reality}{}{Won 12k in cash for the best B-Plan submission at Pearl 2017}
\twentyitem{Feb'10}{Best All Rounder Award}{}{Ranked first across all campuses of school for all-round excellence}
\twentyitem{2010-11}{Olympiads(School Level)}{}{GOLD Medals won at NSO(2011), NCO(2010) \& IMO(2010)}
%\twentyitemshort{<dates>}{<title/description>}
\end{twenty}
%----------------------------------------------------------------------------------------
% EXPERIENCE
%----------------------------------------------------------------------------------------
\section{Electives and MOOCs}
\begin{twenty} % Environment for a list with descriptions
\twentyitem{Electives}{Reverse Engineering \& Rapid Prototyping, Renewable Energy, Quality Control Assurance \& Reliability, Project Appraisal, Public Policy,\\Principles of Management, HR Development, Fundamentals of\\Finance \& Accounting, Srimad Bhagavad Gita}{}{}
\twentyitem{MOOCs}{\textbf{Introduction to R,SQL \& Python Courses on Datacamp}\\Currently pursuing Deep Learning Specialization on Coursera}{}{}
%\twentyitem{<dates>}{<title>}{<location>}{<description>}
\end{twenty}
%----------------------------------------------------------------------------------------
% OTHER INFORMATION
%----------------------------------------------------------------------------------------
%----------------------------------------------------------------------------------------
% SECOND PAGE EXAMPLE
%----------------------------------------------------------------------------------------
%\newpage % Start a new page
%\makeprofile % Print the sidebar
%\section{Other information}
%\subsection{Review}
%Alice approaches Wonderland as an anthropologist, but maintains a strong sense of noblesse oblige that comes with her class status. She has confidence in her social position, education, and the Victorian virtue of good manners. Alice has a feeling of entitlement, particularly when comparing herself to Mabel, whom she declares has a ``poky little house," and no toys. Additionally, she flaunts her limited information base with anyone who will listen and becomes increasingly obsessed with the importance of good manners as she deals with the rude creatures of Wonderland. Alice maintains a superior attitude and behaves with solicitous indulgence toward those she believes are less privileged.
%\section{Other information}
%\subsection{Review}
%Alice approaches Wonderland as an anthropologist, but maintains a strong sense of noblesse oblige that comes with her class status. She has confidence in her social position, education, and the Victorian virtue of good manners. Alice has a feeling of entitlement, particularly when comparing herself to Mabel, whom she declares has a ``poky little house," and no toys. Additionally, she flaunts her limited information base with anyone who will listen and becomes increasingly obsessed with the importance of good manners as she deals with the rude creatures of Wonderland. Alice maintains a superior attitude and behaves with solicitous indulgence toward those she believes are less privileged.
%----------------------------------------------------------------------------------------
\end{document}
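A variation on the hack above, sketched under assumptions: instead of hard-coding the GitHub URL in \makeprofile, you could add a setter in the same style as the class's \cvsite and print the row only when it is non-empty. The command name \cvgithub and the URL github.com/your-username are hypothetical placeholders. The standalone document below copies the \icon macro from twentysecondcv.cls and uses the same fontawesome5 icons (\faLinkedin, \faGithub); as with the class's other setters, \cvgithub{...} has to be called in the preamble, with an empty argument if you want to hide the row.

```latex
\documentclass{article}
\usepackage{graphicx}     % \scalebox
\usepackage{tikz}         % circular icon nodes
\usepackage{xcolor}
\usepackage{ifthen}       % \ifthenelse and \equal, as in twentysecondcv.cls
\usepackage{fontawesome5} % \faLinkedin and \faGithub
\usepackage{hyperref}     % \href and \url
\definecolor{mainblue}{HTML}{0E5484}
% Circular icon macro copied from twentysecondcv.cls
\newcommand*\icon[1]{\tikz[baseline=(char.base)]{\node[shape=circle,draw,inner sep=1pt, fill=mainblue,mainblue,text=white] (char) {#1};}}
% Hypothetical setter in the style of \cvsite:
% calling \cvgithub{...} once redefines \cvgithub to expand to its argument
\newcommand{\cvgithub}[1]{\renewcommand{\cvgithub}{#1}}
\cvgithub{github.com/your-username} % placeholder; write \cvgithub{} instead to hide the row
\begin{document}
\renewcommand{\arraystretch}{1.6}
\begin{tabular}{p{0.5cm} @{\hskip 0.5cm}p{5cm}}
\textsc{\Large\icon{\scalebox{0.85}{\faLinkedin}}} & \url{linkedin.com/in/a-gadepalli}\\
% The GitHub row is printed only when \cvgithub is non-empty
\ifthenelse{\equal{\cvgithub}{}}{}{\textsc{\Large\icon{\scalebox{0.8}{\faGithub}}} & \href{https://\cvgithub}{\cvgithub}\\}
\end{tabular}
\end{document}
```

In the resume itself, the \newcommand{\cvgithub} line would go next to the other setters (in the .cls, or in the preamble of the .tex file), and the \ifthenelse row would go into the tabular of \makeprofile, directly after the LinkedIn row.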


LaTeX CV with currvita

I found this template and would like to modify two things, but nothing I have tried works:
I would like more vertical white space before my name, since it is too close to the top of the page as it is now.
I would like the experience text (blablablablabla...) to be wider, i.e. I would like to reduce the left and right margins.
Any ideas on how to modify this template?
Thanks
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Classicthesis-Styled CV
% LaTeX Template
% Version 1.0 (22/2/13)
%
% This template has been downloaded from:
% http://www.LaTeXTemplates.com
%
% Original author:
% Alessandro Plasmati
%
% License:
% CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------------------------------------------------
% PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------
\documentclass{scrartcl}
\reversemarginpar % Move the margin to the left of the page
\newcommand{\MarginText}[1]{\marginpar{\raggedleft\itshape\small#1}} % New command defining the margin text style
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english,italian]{babel}
\usepackage[nochapters]{classicthesis} % Use the classicthesis style for the style of the document
\usepackage[LabelsAligned]{currvita} % Use the currvita style for the layout of the document
\renewcommand{\cvheadingfont}{\hspace{3.5cm}\LARGE\color{Maroon}} % Font color of your name at the top
\usepackage{hyperref} % Required for adding links and customizing them
\hypersetup{colorlinks, breaklinks, urlcolor=Maroon, linkcolor=Maroon} % Set link colors
\newlength{\datebox}\settowidth{\datebox}{Spring 2011} % Set the width of the date box in each block
\newcommand{\NewEntry}[3]{\noindent\hangindent=2em\hangafter=0 \parbox{\datebox}{\small \textit{#1}}\hspace{1.5em} #2 #3 % Define a command for each new block - change spacing and font sizes here: #1 is the left margin, #2 is the italic date field and #3 is the position/employer/location field
\vspace{0.3em}} % Add some white space after each new entry
%
\newcommand{\Description}[1]{\hangindent=1em\hangafter=0\noindent\raggedright\footnotesize{#1}\par\normalsize\vspace{1em}} % Define a command for descriptions of each entry - change spacing and font sizes here
%----------------------------------------------------------------------------------------
\date{} % Don't print the date
\begin{document}
\thispagestyle{empty} % Stop the page count at the bottom of the first page
%----------------------------------------------------------------------------------------
% CONTACT INFORMATION
%----------------------------------------------------------------------------------------
\begin{cv}{\spacedallcaps{Mario Rossi}}\vspace{1.8em} % Your name
\noindent\spacedlowsmallcaps{Contact Information}
\vspace{0.1em}
\hrule
\vspace{1em}
\NewEntry{Address}{Salita del carro, L'isola che non c'è} % Address
\NewEntry{Email}{\href{mailto:name@gmail.com}{name@gmail.com}} % Email address
\NewEntry{Linkedin}{\href{http://it.linkedin.com/pub/....}{http://it.linkedin.com/...../}} % Linkedin
\NewEntry{Phone}{+39 333\ \ $\cdotp$\ \ 11111111} % Phone number
%\vspace{1em} % Extra white space between the personal information section and goal
%\noindent\spacedlowsmallcaps{Goal}\vspace{1em} % Goal heading, could be used for a quotation or short profile instead
%\Description{Gain fundamental experience in my area of interest and expertise.}\vspace{2em} % Goal text
%----------------------------------------------------------------------------------------
% EXPERIENCE
%----------------------------------------------------------------------------------------
\vspace{0.6em}% Extra space between major sections
\noindent\spacedlowsmallcaps{Experience}
\vspace{0.1em}
\hrule
\vspace{1em}
%------------------------------------------------
\NewEntry{}{ \textsc{Somewhere,\textit{ City} }}
\Description{\MarginText{July - December 2015}blablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablablabla\\ }
%----------------------------------------------------------------------------------------
\end{cv}
\end{document}
Here is how I addressed your two requirements:
Insert an invisible vertical strut as part of \cvheadingfont. I used \rule{0pt}{100pt}; increase 100pt to move the content down, or decrease it to move the content up. I also replaced \hspace{3.5cm} with \centering so that the name is centered.
Switch the document class to the default article class, since there seems to be no need for KOMA-Script here. This also makes it easy to change the page layout with the geometry package; adjust the left and right margins (left=100pt,right=2cm below) to suit your needs.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Classicthesis-Styled CV
% LaTeX Template
% Version 1.0 (22/2/13)
%
% This template has been downloaded from:
% http://www.LaTeXTemplates.com
%
% Original author:
% Alessandro Plasmati
%
% License:
% CC BY-NC-SA 3.0 (http://creativecommons.org/licenses/by-nc-sa/3.0/)
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%----------------------------------------------------------------------------------------
% PACKAGES AND OTHER DOCUMENT CONFIGURATIONS
%----------------------------------------------------------------------------------------
\documentclass{article}
\reversemarginpar % Move the margin to the left of the page
\newcommand{\MarginText}[1]{\marginpar{\raggedleft\itshape\small#1}} % New command defining the margin text style
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage[english,italian]{babel}
\usepackage[nochapters]{classicthesis} % Use the classicthesis style for the style of the document
\usepackage[LabelsAligned]{currvita} % Use the currvita style for the layout of the document
\usepackage{lipsum}
\renewcommand{\cvheadingfont}{%
\rule{0pt}{100pt}%
\centering\LARGE\color{Maroon}} % Font color of your name at the top
\usepackage{hyperref} % Required for adding links and customizing them
\hypersetup{colorlinks, breaklinks, urlcolor=Maroon, linkcolor=Maroon} % Set link colors
\newlength{\datebox}\settowidth{\datebox}{Spring 2011} % Set the width of the date box in each block
\newcommand{\NewEntry}[3]{%
\noindent\hangindent=2em\hangafter=0
\parbox{\datebox}{\small \textit{#1}}\hspace{1.5em} #2 #3 % Define a command for each new block - change spacing and font sizes here:
% #1 is the label/date, set in small italics in a box of width \datebox,
% #2 is the main entry text (e.g. position/employer/location) and
% #3 is any extra text printed directly after #2
\vspace{0.3em}} % Add some white space after each new entry
%
\newcommand{\Description}[1]{%
\hangindent=1em\hangafter=0
\noindent\raggedright\footnotesize #1\par
\normalsize\vspace{1em}} % Define a command for descriptions of each entry - change spacing and font sizes here
\usepackage[left=100pt,right=2cm]{geometry}
%----------------------------------------------------------------------------------------
\date{} % Don't print the date
\begin{document}
\thispagestyle{empty} % Stop the page count at the bottom of the first page
%----------------------------------------------------------------------------------------
% CONTACT INFORMATION
%----------------------------------------------------------------------------------------
\begin{cv}{\spacedallcaps{Mario Rossi}}\vspace{1.8em} % Your name
\noindent\spacedlowsmallcaps{Contact Information}
\vspace{0.1em}
\hrule
\vspace{1em}
\NewEntry{Address}{Salita del carro, L'isola che non c'è} % Address
\NewEntry{Email}{\href{mailto:name@gmail.com}{name@gmail.com}} % Email address
\NewEntry{Linkedin}{\href{http://it.linkedin.com/pub/....}{http://it.linkedin.com/...../}} % Linkedin
\NewEntry{Phone}{+39 333\ \ $\cdotp$\ \ 11111111} % Phone number
%\vspace{1em} % Extra white space between the personal information section and goal
%\noindent\spacedlowsmallcaps{Goal}\vspace{1em} % Goal heading, could be used for a quotation or short profile instead
%\Description{Gain fundamental experience in my area of interest and expertise.}\vspace{2em} % Goal text
%----------------------------------------------------------------------------------------
% EXPERIENCE
%----------------------------------------------------------------------------------------
\vspace{0.6em}% Extra space between major sections
\noindent\spacedlowsmallcaps{Experience}
\vspace{0.1em}
\hrule
\vspace{1em}
%------------------------------------------------
\NewEntry{}{\textsc{Somewhere,\textit{City}}}
\Description{\MarginText{July - December 2015}\lipsum[1]}
%----------------------------------------------------------------------------------------
\end{cv}
\end{document}
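To try the two adjustments in isolation, here is a minimal sketch that assumes only the geometry and lipsum packages (not classicthesis or currvita), so the styling differs from the CV. The showframe option of geometry draws the text block and margin boundaries, which makes the effect of left and right visible while you tune them; the zero-width \rule{0pt}{100pt} strut is the same trick used in \cvheadingfont above. The name and the values 100pt and 2cm are placeholders taken from the answer.

```latex
\documentclass{article}
% showframe draws the text block and margin boundaries; remove it for the final version
\usepackage[left=100pt,right=2cm,showframe]{geometry}
\usepackage{lipsum} % dummy text
\begin{document}
\thispagestyle{empty}
% A zero-width, 100pt-tall rule is invisible but makes its line taller,
% so the name starts lower on the page; increase 100pt to move it further down
\noindent\rule{0pt}{100pt}%
{\centering\LARGE Mario Rossi\par}
\bigskip
\lipsum[1]
\end{document}
```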

R: does failed regex pattern matching originate in the file conversion or in the use of the tm package?

I am a relative novice in R and programming, and my first question in this forum is about regex pattern matching, specifically matching line breaks. First, some background. I am trying to preprocess a corpus of texts in R before processing them further on the NLP platform GATE. I convert the original PDF files to text as follows (unfortunately, the text files end up in the same folder):
dest <- "./MyFolderWithPDFfiles"
myfiles <- list.files(path = dest, pattern = "pdf", full.names = TRUE)
lapply(myfiles, function(i) system(paste('"C:/Program Files (x86)/xpdfbin-win-3.04/bin64/pdftotext.exe"', paste0('"', i, '"')), wait = FALSE))
Then, having loaded the tm package and physically(!) moved the text files to another folder, I create a corpus:
TextFiles <- "./MyFolderWithTXTfiles"
EU <- Corpus(DirSource(TextFiles))
I then want to perform a series of custom transformations to clean the texts. I succeeded in replacing a simple string as follows:
ReplaceText <- content_transformer(function(x, from, to) gsub(from, to, x, perl=T))
EU2 <- tm_map(EU, ReplaceText, "Table of contents", "TOC")
However, a pattern that is a 1-3 digit page number followed by two line breaks and a page break is causing me problems. I want to replace it with a blank space:
EU2 <- tm_map(EU, ReplaceText, "[0-9]{1,3}\n\n\f", " ")
The [0-9]{1,3} part and the \f each match on their own, but the line breaks don't. If I copy text from one of the original .txt files into the online tool RegExr and test the expression "[0-9]{1,3}\n\n\f", it matches, so the line breaks do exist in the original .txt file.
But when I view one of the .txt files as read into the EU corpus in R, there appear to be no line breaks, even though the lines obviously break before the margin, e.g.:
[3] "PROGRESS TOWARDS ACCESSION"
[4] "1"
[5] ""
[6] "\fTable of contents"
Seeing this, I tried other patterns, e.g. "[0-9]{1,3}\s*\f" to allow zero or more whitespace characters before the page break, but none of them worked.
So my questions are:
Am I converting and reading the files into R correctly? If so, what has happened to the line breaks?
If having no line breaks is normal, how can I match the element on line 5 ([5] "")? Is that not a blank space?
(A tangential concern:) when converting the PDF files, is there code that will write the text files directly to a new folder?
Apologies for extending this, but how can I print or inspect only a few lines of a text object? The tm commands and head(EU) print each entire document, and each one is a very long text.
I know my problem(s) must appear simple and perhaps stupid, but one has to start somewhere, and extensive searching has not turned up a source that explains comprehensively how to use regexes to modify text objects in R. I am so frustrated and hope someone here will take pity and help me.
Thanks for any advice you can offer.
Brigitte
P.S. I don't think it's possible to upload attachments in this forum, so here is a link to one of the original PDF documents: http://ec.europa.eu/enlargement/archives/pdf/key_documents/1998/czech_en.pdf
Because the document is long, I created a snippet of the first 3 pages of the TXT file, read it into the R corpus ('EU') and printed it to the console. This is the result:
dput(EU[[2]])
structure(list(content = c("REGULAR REPORT", "FROM THE COMMISSION ON",
"CZECH REPUBLIC'S", "PROGRESS TOWARDS ACCESSION ***********************",
"1", "", "\fTable of contents", "A. Introduction", "a) Preface The Context of the Progress Report",
"b) Relations between the European Union and the Czech Republic The enhanced Pre-Accession Strategy Recent developments in bilateral relations",
"B. Criteria for membership", "1. Political criteria", "1.1. Democracy and the Rule of Law Parliament The Executive The judicial system Anti-Corruption measures",
"1.2. Human Rights and the Protection of Minorities Civil and Political Rights Economic, Social and Cultural Rights Minority Rights and the Protection of Minorities",
"1.3. General evaluation", "2. Economic criteria", "2.1. Introduction 2.2. Economic developments since the Commission published its Opinion",
"Macroeconomic developments Structural reforms 2.3. Assessment in terms of the Copenhagen criteria The existence of a functioning market economy The capacity to cope with competitive pressure and market forces 2.4. General evaluation",
"3. Ability to assume the obligations of Membership", "3.1. Internal Market without frontiers General framework The Four Freedoms Competition",
"3.2. Innovation Information Society Education, Training and Youth Research and Technological Development Telecommunications Audio-visual",
"3.3. Economic and Fiscal Affairs Economic and Monetary Union",
"2", "", "\fTaxation Statistics "), meta = structure(list(author = character(0),
datetimestamp = structure(list(sec = 50.1142621040344, min = 33L,
hour = 15L, mday = 3L, mon = 10L, year = 114L, wday = 1L,
yday = 306L, isdst = 0L), .Names = c("sec", "min", "hour",
"mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXlt",
"POSIXt"), tzone = "GMT"), description = character(0), heading = character(0),
id = "CZ1998ProgressSnippet.txt", language = "en", origin = character(0)), .Names = c("author",
"datetimestamp", "description", "heading", "id", "language",
"origin"), class = "TextDocumentMeta")), .Names = c("content",
"meta"), class = c("PlainTextDocument", "TextDocument"))
Yes, working with text in R is not always a smooth experience! But you can get a lot done quickly with some effort (maybe too much effort!)
If you could share one of your PDF files or the output of dput(EU), that might help to identify exactly how to capture your page numbers with regex. That would also add a reproducible example to your question, which is an important thing to have in questions here so that people can test their answers and make sure they work for your specific problem.
No need to put the PDF and text files in separate folders; instead, you can use a pattern like so:
EU <- Corpus(DirSource(pattern = "\\.txt$"))
This will read only the text files and ignore the PDF files.
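DirSource's pattern argument is a regular expression, so in ".txt" the unescaped dot matches any character and the pattern can also match names that merely contain "txt". A small Python sketch of the difference (the file names are made up, and Python's re stands in for R's regex engine, which treats these two simple patterns the same way):

```python
import re

# Hypothetical file names, for illustration only
files = ["czech_en.pdf", "czech_en.txt", "notxt_report.pdf", "summary.txt.bak"]

# Unescaped and unanchored: "." matches any character, so "otxt" in
# "notxt_report.pdf" and ".txt" inside "summary.txt.bak" both match
loose = [f for f in files if re.search(r".txt", f)]

# Escaped dot plus end-of-string anchor: only names ending in ".txt"
strict = [f for f in files if re.search(r"\.txt$", f)]

print(loose)   # ['czech_en.txt', 'notxt_report.pdf', 'summary.txt.bak']
print(strict)  # ['czech_en.txt']
```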
There is no 'snippet view' method in tm, which is annoying. For quick looks I often just use names(EU) and EU[[1]].
UPDATE
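With the data you've just added, I'd suggest a slightly tangential approach. The line breaks have not disappeared: each line of the file has become a separate element of the content vector (as the dput output shows), so there are no \n characters left to match, and the "" element is simply an empty line. Do the regex work on a single string before passing the data to the tm package formats, like so: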
# get the PDF (mode = "wb" keeps the binary file intact on Windows)
download.file("http://ec.europa.eu/enlargement/archives/pdf/key_documents/1998/czech_en.pdf", "my_pdf.pdf", mode = "wb")
# get the file names of the PDFs (escaped dot, anchored at the end)
myfiles <- list.files(path = getwd(), pattern = "\\.pdf$", full.names = TRUE)
# convert to text (note: my pdftotext is in a different location to yours)
# wait = TRUE so the .txt files exist before readLines() is called
lapply(myfiles, function(i) system(paste('"C:/Program Files/xpdf/bin64/pdftotext.exe"', paste0('"', i, '"')), wait = TRUE))
# read plain text into R
x1 <- readLines("my_pdf.txt")
# make into a single string
x2 <- paste(x1, collapse = " ")
# do some regex...
x3 <- gsub("Table of contents", "TOC", x2)
# the line breaks became spaces when collapsing, so allow any whitespace before the form feed
x4 <- gsub("[0-9]{1,3}\\s*\f", "", x3)
# convert to corpus for text mining operations
x5 <- Corpus(VectorSource(x4))
With the snippet of data you provided using dput, the output from this method is
inspect(x5)
<<VCorpus (documents: 1, metadata (corpus/indexed): 0/0)>>
[[1]]
<<PlainTextDocument (metadata: 7)>>
REGULAR REPORT FROM THE COMMISSION ON CZECH REPUBLIC'S PROGRESS TOWARDS ACCESSION *********************** TOC A. Introduction a) Preface The Context of the Progress Report b) Relations between the European Union and the Czech Republic The enhanced Pre-Accession Strategy Recent developments in bilateral relations B. Criteria for membership 1. Political criteria 1.1. Democracy and the Rule of Law Parliament The Executive The judicial system Anti-Corruption measures 1.2. Human Rights and the Protection of Minorities Civil and Political Rights Economic, Social and Cultural Rights Minority Rights and the Protection of Minorities 1.3. General evaluation 2. Economic criteria 2.1. Introduction 2.2. Economic developments since the Commission published its Opinion Macroeconomic developments Structural reforms 2.3. Assessment in terms of the Copenhagen criteria The existence of a functioning market economy The capacity to cope with competitive pressure and market forces 2.4. General evaluation 3. Ability to assume the obligations of Membership 3.1. Internal Market without frontiers General framework The Four Freedoms Competition 3.2. Innovation Information Society Education, Training and Youth Research and Technological Development Telecommunications Audio-visual 3.3. Economic and Fiscal Affairs Economic and Monetary Union Taxation Statistics
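Why the "\n\n\f" pattern never matched can be reproduced in a few lines of Python. This is only a sketch: the input string is a made-up excerpt shaped like the dput output, and Python's re stands in for R's gsub. Splitting on line breaks removes every \n, and joining with spaces turns the blank line into a run of spaces before the form feed.

```python
import re

# Hypothetical text shaped like the pdftotext output: page number "1",
# a blank line, then a form feed (\f) starting the next page
raw = "PROGRESS TOWARDS ACCESSION\n1\n\n\fTable of contents\nA. Introduction"

# Reading line by line (as readLines does) splits on "\n", so the line
# breaks become element boundaries and no "\n" is left to match
lines = raw.split("\n")
print(lines)
# ['PROGRESS TOWARDS ACCESSION', '1', '', '\x0cTable of contents', 'A. Introduction']
print(any(re.search(r"[0-9]{1,3}\n\n\f", line) for line in lines))  # False

# Joining with spaces (like paste(..., collapse = " ")) turns the two line
# breaks into two spaces, so the pattern must allow any run of whitespace
joined = " ".join(lines)
cleaned = re.sub(r"[0-9]{1,3}\s*\f", "", joined)
cleaned = cleaned.replace("Table of contents", "TOC")
print(cleaned)  # PROGRESS TOWARDS ACCESSION TOC A. Introduction
```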

LibreCalc Search and Replace, Search for [] and replace it, along with its contents

I am compiling a list of video games.
At the moment, I am using Wikipedia to do so.
When I copied PS3 games over to LibreOffice Calc, the copied titles included citation brackets at the end of each line. Rather than removing them line by line, I am trying to search and replace the brackets and their contents.
I keep failing at this. Here is an example:
Rune Factory: Tides of Destiny[629]
Fight Night Champion[268]
Dragon Age II[209]
Major League Baseball 2K11[427]
MLB 11: The Show[459]
Warriors: Legends of Troy[817]
Dynasty Warriors 7[222]
Homefront[334]
Top Spin 4[773]
MotorStorm: Apocalypse[474]
Crysis 2[164]
Lego Star Wars III: The Clone Wars
The Tomb Raider Trilogy[765]
NASCAR 2011: The Game[488]
Shift 2: Unleashed[650]
Tiger Woods PGA Tour 12: The Masters[746]
WWE All Stars[839]
Michael Jackson: The Experience[448]
Rio[614]
Mortal Kombat[469]
Portal 2[563]
SOCOM 4: U.S. Navy SEALs[20]
AFL Live[16]
Operation Flashpoint: Red River[542]
Man vs. Wild[430]
Sniper: Ghost Warrior[679]
El Shaddai: Ascension of the Metatron[233]
Virtua Tennis 4[808]
Thor: God of Thunder[740]
MX vs. ATV Alive[478]
Brink[116]
Lego Pirates of the Caribbean: The Video Game[391]
Battle vs. Chess[82]
L.A. Noire[379]
Dirt 3[196]
Kung Fu Panda 2[377]
Hunted: The Demon's Forge[336]
Infamous 2[345]
Red Faction: Armageddon[599]
Yakuza: Dead Souls[849]
Duke Nukem Forever[217]
Alice: Madness Returns[29]
Child of Eden[146]
Transformers: Dark of the Moon[777]
Dungeon Siege III[218]
Cars 2: The Video Game[138]
F.E.A.R. 3[247]
Shadows of the Damned[647]
Atelier Meruru: The Apprentice of Arland[67]
Bleach: Soul Resurrección[108]
Angel Love Online[38]
Angel Senki
Air Conflicts: Secret Wars[24]
Harry Potter and the Deathly Hallows: Part II[322]
NCAA Football 12[511]
Captain America: Super Soldier[137]
Call of Juarez: The Cartel[136]
Phineas and Ferb: Across the 2nd Dimension[558]
Hyperdimension Neptunia Mk2[338]
Deus Ex: Human Revolution[191]
Bodycount[111]
Madden NFL 12[415]
Driver: San Francisco[216]
Dead Island[175]
Resistance 3[609]
Warhammer 40000: Space Marine[815]
NHL 12[526]
Tales of Xillia[718]
God of War: Origins Collection[298]
Tom Clancy's Splinter Cell Classic Trilogy HD[762]
Supremacy MMA[712]
Dark Souls[169]
Ico & Shadow of the Colossus Collection[340]
FIFA 12[263]
PES 2012: Pro Evolution Soccer[557]
Dynasty Warriors 7: Xtreme Legends[223]
Ra.One: The Game[584]
Crysis[163]
Rage[586]
Spider-Man: Edge of Time[692]
NBA 2K12[498]
The Cursed Crusade[733]
Ace Combat: Assault Horizon[12]
Skylanders: Spyro's Adventure[675]
Batman: Arkham City[79]
Ratchet & Clank: All 4 One[591]
Rocksmith[627]
The Sims 3: Pets[658]
The Adventures of Tintin: The Secret of the Unicorn[14]
Back to the Future: The Game[71]
Battlefield 3[83]
Dragon Ball Z: Ultimate Tenkaichi[212]
Puss in Boots[581]
The Idolmaster 2[736]
Uncharted 3: Drake's Deception[795]
GoldenEye 007: Reloaded[301]
The Lord of the Rings: War in the North[401]
Sonic Generations[683]
Call of Duty: Modern Warfare 3[131]
Metal Gear Solid HD Collection[445]
The Elder Scrolls V: Skyrim[236]
Lego Harry Potter: Years 5–7[388]
Assassin's Creed: Revelations[63]
Jurassic Park: The Game[358]
Cartoon Network: Punch Time Explosion XL[141]
Need for Speed: The Run[516]
Saints Row: The Third[633]
Apache: Air Assault[42]
After Hours Athletes[21]
Ni no Kuni[531]
WWE '12[838]
The King of Fighters XIII[371]
Just Dance 3[361]
Order Up![543]
Final Fantasy XIII-2[273]
Zack Zero[853]
Armored Core V[52]
NeverDead[520]
Soulcalibur V[689]
Kingdoms of Amalur: Reckoning[374]
The Darkness II[735]
Grand Slam Tennis 2[306]
Twisted Metal[787]
UFC Undisputed 3[792]
Binary Domain[95]
Asura's Wrath[64]
Syndicate[714]
Gal*Gun[288]
Naruto Shippuden: Ultimate Ninja Storm Generations[485]
SSX[698]
One Piece: Pirate Warriors[539][540]
Blades of Time[101]
Major League Baseball 2K12[428]
Mass Effect 3[436]
MLB 12: The Show[460]
Street Fighter X Tekken[706]
Top Gun: Hard Lock[771]
Mobile Suit Gundam Unicorn[465]
FIFA Street[265]
Silent Hill: Downpour[653]
Silent Hill HD Collection[654]
Ninja Gaiden 3[535]
Resident Evil: Operation Raccoon City[605]
Ridge Racer Unbounded[613]
Battleship[87]
Prototype 2[577]
Max Payne 3[438]
Dragon's Dogma[214]
Tom Clancy's Ghost Recon: Future Soldier[756]
Dirt: Showdown[197]
Inversion[350]
Tokyo Jungle[753]
Part of my problem seems to be that brackets are special characters in regular expressions.
Can someone assist me, or point me in the right direction to solve this problem?
You can escape the brackets with a backslash so they are treated as regular characters. On that basis, you could use the following regex to match all square brackets containing only digits:
\[[:digit:]*\]
With the Replace with box left empty (and the Regular expressions option enabled), a search-and-replace run should remove all the footnote marks in your example.
Since only the opening bracket is a special character for LO Calc, the following should work, too:
\[[:digit:]*]
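For comparison, the same footnote-stripping pattern can be sketched with Python's re module. Python does not support the POSIX [:digit:] class, so \d stands in for it; the list is a small excerpt of the titles above.

```python
import re

titles = ["Crysis 2[164]", "One Piece: Pirate Warriors[539][540]",
          "Lego Star Wars III: The Clone Wars", "Order Up![543]"]

# "\[" and "\]" are escaped so they match literal brackets; \d* allows
# only digits in between, so digits in titles such as "Crysis 2" are kept
footnote = re.compile(r"\[\d*\]")
cleaned = [footnote.sub("", t) for t in titles]
print(cleaned)
# ['Crysis 2', 'One Piece: Pirate Warriors', 'Lego Star Wars III: The Clone Wars', 'Order Up!']
```

Because sub replaces every match, titles with two footnotes, such as "One Piece: Pirate Warriors[539][540]", are cleaned in one pass.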

Identify subsequent event windows (or occurrences) for each individual

This question is in the context of twoway line with the by() option, but I think the bigger problem is how to identify a second (and all subsequent) event windows without a priori knowing every event window.
Below I generate some data with five countries over the 1990s and 2000s. In all countries an event occurs in 1995, and in Canada only, the event repeats in 2005. I would like to plot the outcome over the five years centered on each event in each country. If I do this using twoway line and by(), Canada's two windows are plotted in the same panel.
clear
set obs 100
generate year = 1990 + mod(_n, 20)
generate country = "United Kingdom" in 1/20
replace country = "United States" in 21/40
replace country = "Canada" in 41/60
replace country = "Australia" in 61/80
replace country = "New Zealand" in 81/100
generate event = (year == 1995) ///
| ((year == 2005) & (country == "Canada"))
generate time_to_event = 0 if (event == 1)
generate outcome = runiform()
encode country, generate(countryn)
xtset countryn year
forvalue i = 1/2 {
replace time_to_event = `i' if (l`i'.event == 1)
replace time_to_event = -`i' if (f`i'.event == 1)
}
twoway line outcome time_to_event, ///
by(country) name(orig, replace)
A manual solution adds an occurrence variable that numbers each event occurrence by country, then adds occurrence to the by() option.
generate occurrence = 1 if !missing(time_to_event)
replace occurrence = 2 if ///
(inrange(year, 2005 - 2, 2005 + 2) & (country == "Canada"))
twoway line outcome time_to_event, ///
by(country occurrence) name(attempt, replace)
This works great on the toy data, but in my real data there are many more countries and many more events. I could code this occurrence variable manually, but that is tedious (and now I'm really curious whether there's a tool or logic that works :) ).
Is there a logic to automate identifying windows? Or one that at least works with twoway line? Thanks!
You have generated a variable time_to_event which is -2 .. 2 in a window and missing otherwise. You can use tsspell from SSC, installed by
ssc inst tsspell
to label such windows. Windows are defined by spells, or runs of observations that are all non-missing on time_to_event:
tsspell, cond(time_to_event < .)
tsspell requires a prior tsset and generates three variables, explained in its help. You can then number the windows consecutively using one of those variables, _seq (the sequence number within each spell, numbered from 1 up):
gen _spell2 = (_seq > 0) * sum(_seq == 1)
and then label spells distinctly by using country and the spell identifier for each spell from _spell, another variable produced by tsspell:
egen gspell = group(country _spell) if _spell2, label
My code assumes that windows are disjoint and cannot overlap, but that seems to be one of your assumptions too. Some techniques for handling spells are given in the Stata Journal article at http://www.stata-journal.com/sjpdf.html?articlenum=dm0029; that article does not mention tsspell, which in essence is an implementation of its principles. I started explaining the principles, but the article got long enough before I could explain the program. As the help for tsspell is quite detailed, I doubt that a sequel paper is needed, or at least that it will be written.
(LATER) This code also assumes that windows don't touch. Solving that problem suggests a more direct approach not involving tsspell at all:
bysort country (year) : gen w_id = (time_to_event < .) * sum(time_to_event == -2)
egen w_label = group(country w_id) if w_id, label
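The two approaches differ only when windows touch. A Python sketch makes this concrete; it is not Stata and not tsspell itself, and the data and function names are hypothetical. Labelling runs of non-missing time_to_event merges two touching windows into one, whereas counting window starts (time_to_event == -2) within each country keeps them apart, as the bysort ... sum() line does.

```python
from itertools import groupby
from math import isnan, nan

# Hypothetical panel (country, year, time_to_event); nan = outside a window.
# Canada's two windows touch: 1997 (t = 2) is followed directly by 1998 (t = -2).
canada = [nan, -2, -1, 0, 1, 2, -2, -1, 0, 1, 2, nan]
australia = [nan, -2, -1, 0, 1, 2, nan]
rows = [("Canada", 1992 + i, t) for i, t in enumerate(canada)]
rows += [("Australia", 1992 + i, t) for i, t in enumerate(australia)]

def spell_ids(rows):
    """Run-based labelling (the idea behind tsspell): a new id starts when a
    non-missing value follows a missing one or a gap in years."""
    out = []
    for country, group in groupby(sorted(rows), key=lambda r: r[0]):
        spell, prev_year, prev_in = 0, None, False
        for _, year, t in group:
            in_win = not isnan(t)
            if in_win and (not prev_in or year != prev_year + 1):
                spell += 1
            out.append((country, year, spell if in_win else 0))
            prev_year, prev_in = year, in_win
    return out

def start_ids(rows):
    """Start-based labelling, mimicking
    bysort country (year): gen w_id = (time_to_event < .) * sum(time_to_event == -2)"""
    out = []
    for country, group in groupby(sorted(rows), key=lambda r: r[0]):
        starts = 0
        for _, year, t in group:
            starts += (t == -2)  # running sum of window starts in this country
            out.append((country, year, 0 if isnan(t) else starts))
    return out

def labels(ids):
    """Distinct country-window labels, like egen group(country w_id) if w_id."""
    return sorted({f"{c} {w}" for c, _, w in ids if w})

print(labels(spell_ids(rows)))  # ['Australia 1', 'Canada 1']
print(labels(start_ids(rows)))  # ['Australia 1', 'Canada 1', 'Canada 2']
```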

Help: Extracting data tuples from text... Regex or Machine learning?

I would really appreciate your thoughts on the best approach to the following problem. To give an idea, I am using car classified listings as an example, since they are similar in nature.
Problem: Extract a data tuple from the given text.
Here are some characteristics of the data.
The vocabulary (words) in the text is limited to a specific domain. Let's assume 100-200 words at most.
The text to be parsed is a headline, like the car ad data shown below, so each record corresponds to one tuple (row).
In some cases, some of the attributes may be missing; for example, the year is missing in raw data row #5 below.
Some words go together (bigrams). Like "Low miles".
Historical data available = 10,000 records
Incoming New Data volume = 1000-1500 records / week
The expected output should be in the form (Year, Make, Model, Feature), so the output should look like:
1 -> (2009, Ford, Fusion, SE)
2 -> (1997, Ford, Taurus, Wagon)
3 -> (2000, Mitsubishi, Mirage, DE)
4 -> (2007, Ford, Expedition, EL Limited)
5 -> ( , Honda, Accord, EX)
....
....
Raw Headline Data:
1 -> 2009 Ford Fusion SE - $7000
2 -> 1997 Ford Taurus Wagon - $800 (san jose east)
3 -> '00 Mitsubishi Mirage DE - $2499 (saratoga) pic
4 -> 2007 Ford Expedition EL Limited - $7800 (x)
5 -> Honda Accord ex low miles - $2800 (dublin / pleasanton / livermore) pic
6 -> 2004 HONDA ODASSEY LX 68K MILES - $10800 (danville / san ramon)
7 -> 93 LINCOLN MARK - $2000 (oakland east) pic
8 -> #######2006 LEXUS GS 430 BLACK ON BLACK 114KMI ####### - $19700 (san rafael) pic
9 -> 2004 Audi A4 1.8T FWD - $8900 (Sacramento) pic
10 -> #######2003 GMC C2500 HD EX-CAB 6.0 V8 EFI WHITE 4X4 ####### - $10575 (san rafael) pic
11 -> 1990 Toyota Corolla RUNS GOOD! GAS SAVER! 5SPEED CLEAN! REG 2011 O.B.O - $1600 (hayward / castro valley) pic img
12 -> HONDA ACCORD EX 2000 - $4900 (dublin / pleasanton / livermore) pic
13 -> 2009 Chevy Silverado LT Crew Cab - $23900 (dublin / pleasanton / livermore) pic
14 -> 2010 Acura TSX - V6 - TECH - $29900 (dublin / pleasanton / livermore) pic
15 -> 2003 Nissan Altima - $1830 (SF) pic
Possible choices:
A machine learning Text Classifier (Naive Bayes etc)
Regex
What I am trying to figure out is whether regex is too complicated for the job, and whether a text classifier is overkill.
If the choice is to go with a text classifier, what would you consider the easiest to implement?
Thanks in advance for your kind help.
This is a well-studied problem called information extraction. It is not straightforward to do what you want, and it is not as simple as you make it sound (i.e. machine learning is not overkill). There are several techniques; you should read an overview of the research area.
Check this IE library for writing extraction rules; I think it will work best for your problem.
There is also an example of how to create fast dictionary matching.
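As a rough illustration of dictionary matching, here is a plain-Python sketch. It is not the linked library's API; the vocabulary and the name dictionary_match are hypothetical. A greedy longest-match lookup lets bigrams such as "low miles" or "EL Limited" be recognised as single units:

```python
import re

# Hypothetical domain vocabulary mapping lower-cased phrases to fields;
# multi-word entries such as "low miles" are matched as one unit
VOCAB = {
    "ford": "make", "honda": "make", "mitsubishi": "make",
    "fusion": "model", "accord": "model", "mirage": "model",
    "expedition": "model", "taurus": "model",
    "se": "feature", "ex": "feature", "de": "feature", "wagon": "feature",
    "el limited": "feature", "low miles": "feature",
}
MAX_WORDS = max(len(k.split()) for k in VOCAB)

def dictionary_match(headline):
    """Greedy longest-match lookup of headline tokens against VOCAB.
    Returns (phrase, field) pairs in order of appearance."""
    # drop the " - $price (location)" tail, then tokenise
    text = headline.split(" - $")[0].lower()
    tokens = re.findall(r"[a-z0-9']+", text)
    found, i = [], 0
    while i < len(tokens):
        # try the longest phrase starting at token i first
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in VOCAB:
                found.append((phrase, VOCAB[phrase]))
                i += n
                break
        else:
            i += 1  # no vocabulary entry starts at this token; skip it
    return found

print(dictionary_match("Honda Accord ex low miles - $2800 (dublin / pleasanton / livermore) pic"))
# [('honda', 'make'), ('accord', 'model'), ('ex', 'feature'), ('low miles', 'feature')]
```

Tokens not in the vocabulary (years, prices, misspellings such as "ODASSEY") are skipped, so the quality depends entirely on how complete the vocabulary is.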
I think that the ARX or Phoebus systems may suit your needs if you already have annotated data and a list of words associated with each field. Their approach is a mix of information extraction and information integration.
There are a few good entity recognition libraries. Have you taken a look at Apache OpenNLP?
As a user looking for a specific model of car the task is easier. I'm pretty sure I could classify, say, most Ford Rangers since I know what to look for with regexp.
I think your best bet is to write a function for each car model with type String -> Maybe Tuple. Then run all these on each input and throw away those inputs resulting in zero or too many tuples.
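A minimal Python sketch of that per-model idea, with Optional playing the role of Maybe. The makes, models, helper names and the 19xx/20xx year rule are all assumptions for illustration; two-digit years such as '00 or 93 are not handled.

```python
import re
from typing import Callable, List, Optional, Tuple

# (year, make, model, feature); year is None when the headline has none
Record = Tuple[Optional[int], str, str, str]

YEAR = re.compile(r"\b(?:19|20)\d{2}\b")  # assumed rule: 4-digit 19xx/20xx only

def title_part(headline: str) -> str:
    """Drop the ' - $price (location) pic' tail so prices are never read as years."""
    return headline.split(" - $")[0]

def make_extractor(make: str, model: str) -> Callable[[str], Optional[Record]]:
    """Build one String -> Maybe Tuple function for a single make and model."""
    pattern = re.compile(rf"\b{re.escape(make)}\s+{re.escape(model)}\b(.*)", re.I)

    def extract(headline: str) -> Optional[Record]:
        text = title_part(headline)
        m = pattern.search(text)
        if m is None:
            return None  # this model does not appear in the headline
        year = YEAR.search(text)
        # whatever follows the model, minus any year, is treated as the feature
        feature = " ".join(YEAR.sub(" ", m.group(1)).split())
        return (int(year.group(0)) if year else None, make, model, feature)

    return extract

# Hypothetical set of extractors; a real system would have one per model
EXTRACTORS: List[Callable[[str], Optional[Record]]] = [
    make_extractor("Ford", "Fusion"),
    make_extractor("Ford", "Expedition"),
    make_extractor("Honda", "Accord"),
]

def extract_all(headline: str) -> Optional[Record]:
    """Run every extractor and keep the result only if exactly one matched."""
    results = [r for r in (f(headline) for f in EXTRACTORS) if r is not None]
    return results[0] if len(results) == 1 else None

for h in ["2009 Ford Fusion SE - $7000",
          "HONDA ACCORD EX 2000 - $4900 (dublin / pleasanton / livermore) pic",
          "2003 Nissan Altima - $1830 (SF) pic"]:
    print(extract_all(h))
# (2009, 'Ford', 'Fusion', 'SE')
# (2000, 'Honda', 'Accord', 'EX')
# None
```

Headlines that no extractor recognises, or that several recognise, come back as None and can be set aside for manual review.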
You should use a tool like Amazon Mechanical Turk for this: human microtasking. Another alternative is to hire a data entry freelancer; Upwork is a great place to look. You can get excellent quality results, and the cost is very reasonable for each.