How to make sure the words being read in from the file are how I want them to be C++ - c++

If I had to read in a word from a document (one word at a time), and then pass that word into a function until I reach the end of the file, how would I do this?
What also must be kept in mind is that a word is any consecutive string of letters and the apostrophe ( so can't or rojas' is one word). Something like bad-day should be two separate words, and something like to-be-husband should be 3 separate words. I also need to ignore periods ., semi-colons ;, and pretty much anything that isn't part of a word. I have been reading it in using file >> s; and then removing stuff from the string but it has gotten very complicated. Is there a way to store into s only alphabet characters+apostrophes and stop at the end of a word (when a space occurs)?
while (!file.eof()) {
string s;
file >> s; //this is how I am currently reading it it
passToFunction(s);
}

Yes, there is a way: write the code to do it. Read one character at a time and collect the characters in a string until you get a non-alphabetic, non-apostrophe character. You've now read one word. Skip ahead until you read the next character that's a letter or an apostrophe, then repeat from the top.
One other thing:
while (!file.eof())
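This is a bug. eof() only becomes true after a read has already failed, so the loop body runs one extra time with an unchanged or empty string. Test the read itself instead, as in while (file >> s). Fixing this should be your first order of business, before writing the rest of your code.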

OnlyLetterNumAndApp facet for a stream
#include <locale>
#include <string>
#include <fstream>
#include <iostream>
// This facet treats letters/numbers and apostrophe as alpha
// Everything else is treated like a space.
//
// This makes reading words with operator>> very easy to use
// when you want to ignore all the other characters.
class OnlyLetterNumAndApp: public std::ctype<char>
{
public:
typedef std::ctype<char> base;
typedef base::char_type char_type;
OnlyLetterNumAndApp(std::locale const& l)
: base(table)
{
std::ctype<char> const& defaultCType = std::use_facet<std::ctype<char> >(l);
for(int loop = 0;loop < 256;++loop) {
table[loop] = (defaultCType.is(base::alnum, loop) || loop == '\'')
? base::alpha
: base::space;
}
}
private:
base::mask table[256];
};
Usage
int main()
{
std::ifstream file;
file.imbue(std::locale(std::locale(), new OnlyLetterNumAndApp(std::locale())));
file.open("test.txt");
std::string word;
while(file >> word) {
std::cout << word << "\n";
}
}
Test File
> cat test.txt
This is %%% a test djkhfdkjfd
try another $gh line's
bad-people.Do bad things
Result
> ./a.out
This
is
a
test
djkhfdkjfd
try
another
gh
line's
bad
people
Do
bad
things

Related

C++ Im trying to stream a file, and replace the first letter of every line streamed. It doesn't seem to be working as expected

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <iomanip>
void add1(std::fstream& files)
{
char c;
int i=0;
int j=0;
int k=0;
int con=0;
string word;
while(files.get(c)&&!files.eof())
{
i++;
j++;
if(c=='\n'||(con>=1&&isspace(c)))
{
con++;
if(con>=2)
{
break;
}
else
{
cout<<j<<"\/"<<i<<endl;
files.seekp(i-j,files.beg);
files.write("h",1);
files.seekg(i);
// seekg ends the loop; I tried fstream::clear. I think it would work perfectly if seekg worked.
// Without seekg it works, but only for 3 lines; then it's off.
j=0;
word="";
}
}
else
{
con=0;
word=word+c;
}
}
}
*The goal is to be able stream the file, and replace the first letter of every line in the file while streaming.*
You seem to have a logical error and are making things overcomplicated.
I do not know what you want to do with your variable "word". It is not used anywhere, so I will ignore it.
Then you are playing with both the read and the write pointers. That is not necessary. You only need to manipulate the write pointer.
Then, you want to "stream" something. I do not fully understand this. Maybe it means that you always want to write something to the stream, even if you do not replace anything. In my understanding that would only make sense if you had 2 streams. But in that case it would be brutally simple, with no further thinking necessary.
If we use the same stream and do not want to replace a character, then that character is already there in the file and does not need to be overwritten with itself.
So, if there is nothing to replace, we write nothing.
Also, and that is very important, we do no replacement if we have an empty line, because then there is nothing to replace. There is no first character in an empty line.
And, most important, we cannot insert characters into the same fstream. In that case we would have to shift the rest of the file one position to the right. Therefore, 2 streams are always better. Then this problem would not occur.
So, what's the logic?
Algorithm:
We always look at the previously read character. If that was a '\n' and the current character is not, then we are at the start of a new line and can replace the first character.
That is all.
It also copes with a newline that is encoded with 2 characters (for example \r\n), because only the '\n' is checked. One caveat: if the platform does not translate \r\n while reading, an empty line still contains a '\r', and that '\r' would be replaced.
And it is easy to implement: about 10 lines of code.
Please see:
#include <iostream>
#include <fstream>
#include <string>
constexpr char ReplacementCharacter{ 'h' };
void replaceFirstCharacterOfLine(std::fstream& fileStream) {
// Here we store the previously read character. Conceptually, a file always starts
// right after a newline. Therefore we pretend that the last read character is a newline
char previouslyReadCharacter{'\n'};
// Here we store the current read character
char currentCharacter{};
// Get characters from the file as long as there are characters, so, until eof
while (fileStream.get(currentCharacter)) {
// Now check if a new line has started. We ignore empty lines!
if ((previouslyReadCharacter == '\n') && (currentCharacter != '\n')) {
// So the last character was a newline and this one is different. So, we are in a new, non-empty line
// Set replacement character
currentCharacter = ReplacementCharacter;
// Go one back with the write pointer
fileStream.seekp(-1, std::ios_base::cur);
// Write (and with that advance the file pointer again)
fileStream.put(currentCharacter);
// Write to file
fileStream.flush();
}
else {
// Do not replace the first character. So nothing to be done here
}
// Now, set the previouslyReadCharacter to the just read currentCharacter
previouslyReadCharacter = currentCharacter;
}
}
int main() {
const std::string filename{"r:\\replace.txt"};
// Open file
std::fstream fileStream{ filename };
// Check, if file could be opened
if (fileStream)
replaceFirstCharacterOfLine(fileStream);
else
std::cerr << "\n\n*** Error: Could not open file '" << filename << "'\n\n";
return 0;
}

How do I make an alphabetized list of all distinct words in a file with the number of times each word was used?

I am writing a program using Microsoft Visual C++. In the program I must read in a text file and print out an alphabetized list of all distinct words in that file with the number of times each word was used.
I have looked up different ways to alphabetize a string but they do not work with the way I have my string initialized.
// What is inside my text file
Any experienced programmer engaged in writing programs for use by others knows
that, once his program is working correctly, good output is a must. Few people
really care how much time and trouble a programmer has spent in designing and
debugging a program. Most people see only the results. Often, by the time a
programmer has finished tackling a difficult problem, any output may look
great. The programmer knows what it means and how to interpret it. However,
the same cannot be said for others, or even for the programmer himself six
months hence.
string lines;
getline(input, lines); // Stores what is in file into the string
I expect an alphabetized list of words with the number of times each word was used. So far, I do not know how to begin this process.
It's rather simple, std::map automatically sorts based on key in the key/value pair you get. The key/value pair represents word/count which is what you need. You need to do some filtering for special characters and such.
EDIT: std::stringstream is a nice way of splitting a std::string on whitespace, as that's the default delimiter. Therefore, using stream >> word you will get whitespace-separated words. However, this might not be enough due to punctuation. For example: "Often," has a comma which we need to filter out. Therefore, I used std::replace_if, which replaces punctuation and digits with whitespace.
Now a new problem arises. In your example, you have: "must.Few" which will be returned as one word. After replacing . with a space we have "must Few". So I'm using another stringstream on the filtered "word" to make sure I have only words in the final result.
In the second loop you will notice if(word == "") continue;, this can happen if the string is not trimmed. If you look at the code you will find out that we aren't trimming after replacing puncts and digits. That is, "Often," will be "Often " with trailing whitespace. The trailing whitespace causes the second loop to extract an empty word. This is why I added the condition to ignore it. You can trim the filtered result and then you wouldn't need this check.
Finally, I have added ignorecase boolean to check if you wish to ignore the case of the word or not. If you wish to do so, the program will simply convert the word to lowercase and then add it to the map. Otherwise, it will add the word the same way it found it. By default, ignorecase = true, if you wish to consider case, just call the function differently: count_words(input, false);.
Edit 2: In case you're wondering, the statement counts[word] will automatically create key/value pair in the std::map IF there isn't any key matching word. So when we call ++: if the word isn't in the map, it will create the pair, and increment value by 1 so you will have newly added word. If it exists already in the map, this will increment the existing value by 1 and hence it acts as a counter.
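The operator[] behaviour described in Edit 2 can be seen in isolation in this small sketch. The function name tally is made up for the example; the words are assumed to be already split and filtered.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: counts how often each word occurs.
std::map<std::string, std::size_t> tally(const std::vector<std::string>& words) {
    std::map<std::string, std::size_t> counts;
    for (const std::string& w : words) {
        // operator[] inserts w with a value of 0 if it is not present yet,
        // then the increment turns that into 1 (or bumps an existing count).
        ++counts[w];
    }
    return counts;  // iterating this map visits the keys in sorted order
}
```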
The program:
#include <iostream>
#include <map>
#include <sstream>
#include <cstring>
#include <cctype>
#include <string>
#include <iomanip>
#include <algorithm>
std::string to_lower(const std::string& str) {
std::string ret;
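for (char c : str) // cast first: passing a negative char to tolower is undefined behaviour
ret.push_back(static_cast<char>(std::tolower(static_cast<unsigned char>(c))));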
return ret;
}
std::map<std::string, size_t> count_words(const std::string& str, bool ignorecase = true) {
std::map<std::string, size_t> counts;
std::stringstream stream(str);
while (stream.good()) {
// wordW may have multiple words connected by special chars/digits
std::string wordW;
stream >> wordW;
// filter special chars and digits
std::replace_if(wordW.begin(), wordW.end(),
[](unsigned char c) { return std::ispunct(c) || std::isdigit(c); }, ' ');
// now wordW may have multiple words seperated by whitespaces, extract them
std::stringstream word_stream(wordW);
while (word_stream.good()) {
std::string word;
word_stream >> word;
// ignore empty words
if (word == "") continue;
// add to count.
ignorecase ? counts[to_lower(word)]++ : counts[word]++;
}
}
return counts;
}
void print_counts(const std::map<std::string, size_t>& counts) {
for (auto pair : counts)
std::cout << std::setw(15) << pair.first << " : " << pair.second << std::endl;
}
int main() {
std::string input = "Any experienced programmer engaged in writing programs for use by others knows \
that, once his program is working correctly, good output is a must.Few people \
really care how much time and trouble a programmer has spent in designing and \
debugging a program.Most people see only the results.Often, by the time a \
programmer has finished tackling a difficult problem, any output may look \
great.The programmer knows what it means and how to interpret it.However, \
the same cannot be said for others, or even for the programmer himself six \
months hence.";
auto counts = count_words(input);
print_counts(counts);
return 0;
}
I have tested this with Visual Studio 2017 and here is the part of the output:
a : 5
and : 3
any : 2
be : 1
by : 2
cannot : 1
care : 1
correctly : 1
debugging : 1
designing : 1
As others have already noted, an std::map handles the counting you care about quite easily.
Iostreams already have a tokenizer to break an input stream up into words. In this case, though, we want to "think" of only letters as characters that can make up words. A stream uses a locale to make that sort of decision, so to change how it's done, we need to define a locale that classifies characters as we see fit.
struct alpha_only: std::ctype<char> {
alpha_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
// everything is white space
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// except lower- and upper-case letters, which are classified accordingly
// (std::fill excludes its end pointer, so go one past 'z' and 'Z'):
std::fill(&rc['a'], &rc['z'] + 1, std::ctype_base::lower);
std::fill(&rc['A'], &rc['Z'] + 1, std::ctype_base::upper);
return &rc[0];
}
};
With that in place, we tell the stream to use our ctype facet, then simply read words from the file and count them in the map:
std::cin.imbue(std::locale(std::locale(), new alpha_only));
std::map<std::string, std::size_t> counts;
std::string word;
while (std::cin >> word)
++counts[to_lower(word)];
...and when we're done with that, we can print out the results:
for (auto w : counts)
std::cout << w.first << ": " << w.second << "\n";
I'd probably start by inserting all of those words into an array of strings. Then take the first element and compare it with all of the other elements; each time you find a match, add 1 to a counter. After you have gone through the array, display the word you were searching for and how many matches there were, then go on to the next element and repeat. Or, if you want, make a parallel array of integers that holds the number of matches, so you can do all the comparisons in one pass and all the displays in another.
EDIT:
Everyone's answer seems more elegant because of the map's inherent sorting. My answer functions more as a parser, that later sorts the tokens. Therefore my answer is only useful to the extent of a tokenizer or lexer, whereas Everyone's answer is only good for sorted data.
You first probably want to read in the text file. You want to use a streambuf iterator to read in the file.
You will now have a string called content, which is the content of your file. Next you will want to iterate, or loop, over the contents of this string. To do that you'll want to use an iterator. There should be a string outside of the loop that stores the current word. You will iterate over the content string, and each time you hit a letter character, you will add that character to your current word string. Then, once you hit a space character, you will take that current word string, and push it back into the wordStringV vector. (Note: that means that this will ignore non-letter characters, and that only spaces denote word separation.)
Now that we have a vector of all of our words as strings, we can use std::sort to sort the vector in alphabetical order. (Note: capitalized words take precedence over lowercase words, and therefore will be sorted first.) Then we will iterate over our wordStringV vector and convert the strings into Word objects (this is a little heavy-weight) that store the word string and its number of appearances. We will push these Word objects into a Word vector, but if we discover a repeated word string, instead of adding it to the Word vector, we'll grab the previous entry and increment its appearance count.
Finally, once this is all done, we can iterate over our Word object vector and output the word followed by its appearances.
Full Code:
#include <vector>
#include <fstream>
#include <iostream>
#include <streambuf>
#include <algorithm>
#include <functional>
#include <string>
class Word //define word object
{
public:
Word(){appearances = 1;}
~Word(){}
int appearances;
std::string mWord;
};
bool isLetter(const char x)
{
return((x >= 'a' && x <= 'z') || (x >= 'A' && x <= 'Z'));
}
int main()
{
std::string srcFile = "myTextFile.txt"; //what file are we reading
std::ifstream ifs(srcFile);
std::string content( (std::istreambuf_iterator<char>(ifs) ),
( std::istreambuf_iterator<char>() )); //read in the file
std::vector<std::string> wordStringV; //create a vector of word strings
std::string current = ""; //define our current word
for(auto it = content.begin(); it != content.end(); ++it) //iterate over our input
{
const char currentChar = *it; //make life easier
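if(currentChar == ' ')
{
if(!current.empty()) //skip empty words caused by repeated spaces
wordStringV.push_back(current);
current = "";
continue;
}
else if(isLetter(currentChar))
{
current += *it;
}
}
if(!current.empty()) //the last word is not followed by a space
wordStringV.push_back(current);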
std::sort(wordStringV.begin(), wordStringV.end(), std::less<std::string>());
std::vector<Word> wordVector;
for(auto it = wordStringV.begin(); it != wordStringV.end(); ++it) //iterate over wordString vector
{
std::vector<Word>::iterator wordIt;
//see if the current word string has appeared before...
for(wordIt = wordVector.begin(); wordIt != wordVector.end(); ++wordIt)
{
if((*wordIt).mWord == *it)
break;
}
if(wordIt == wordVector.end()) //...if not create a new Word obj
{
Word theWord;
theWord.mWord = *it;
wordVector.push_back(theWord);
}
else //...otherwise increment the appearances.
{
++((*wordIt).appearances);
}
}
//print the words out
for(auto it = wordVector.begin(); it != wordVector.end(); ++it)
{
Word theWord = *it;
std::cout << theWord.mWord << " " << theWord.appearances << "\n";
}
return 0;
}
Side Notes
Compiled with g++ version 4.2.1 with target x86_64-apple-darwin, using the compiler flag -std=c++11.
If you don't like iterators you can instead do
for(std::size_t i = 0; i < content.size(); ++i)
{
char currentChar = content[i];
}
It's important to note that if you are capitalization agnostic, simply use std::tolower in the current += *it; statement (i.e. current += std::tolower(*it);).
Also, you seem like a beginner and this answer might have been too heavyweight, but you're asking for a basic parser and that is no easy task. I recommend starting by parsing simpler strings like math equations. Maybe make a calculator app.
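Since the words are sorted before they are counted, equal words end up next to each other, so the counting step could also be done in a single pass. The following is only a sketch of that variation, not the code from the answer above; countSorted is a made-up name and a std::pair stands in for the Word class.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: sorts the words, then counts runs of equal neighbours.
std::vector<std::pair<std::string, std::size_t>>
countSorted(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());
    std::vector<std::pair<std::string, std::size_t>> result;
    for (const std::string& w : words) {
        if (!result.empty() && result.back().first == w) {
            ++result.back().second;      // same word as the previous entry
        } else {
            result.emplace_back(w, 1);   // first appearance of this word
        }
    }
    return result;
}
```

As in the full code, capitalized words sort before lowercase ones unless the words are lower-cased first.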

How to read a complex input with istream&, string& and getline in c++?

I am very new to C++, so I apologize if this isn't a good question but I really need help in understanding how to use istream.
There is a project I have to create where it takes several amounts of input that can be on one line or multiple and then pass it to a vector (this is only part of the project and I would like to try the rest on my own), for example if I were to input this...
>> aaa bb
>> ccccc
>> ddd fff eeeee
Makes a vector of strings with "aaa", "bb", "ccccc", "ddd", "fff", "eeeee"
The input can be a char or string and the program stops asking for input when the return key is hit.
I know getline() gets a line of input and I could probably use a while loop to try and get the input such as...(correct me if I'm wrong)
while(!string.empty())
getline(cin, string);
However, I don't truly understand istream and it doesn't help that my class has not gone over pointers so I don't know how to use istream& or string& and pass it into a vector. On the project description, it said to NOT use stringstream but use functionality from getline(istream&, string&). Can anyone give somewhat of a detailed explanation as to how to make a function using getline(istream&, string&) and then how to use it in the main function?
Any little bit helps!
You're on the right track already; the only problem is that you'd have to pre-fill the string with some dummy value to enter the while loop at all. More elegant:
std::string line;
do
{
std::getline(std::cin, line);
}
while(!line.empty());
This should already do the trick reading line by line (but possibly multiple words on one line!) and exiting, if the user enters an empty line (be aware that whitespace followed by newline won't be recognised as such!).
However, if anything on the stream goes wrong, you'll be trapped in an endless loop processing previous input again and again. So best check the stream state as well:
if(!std::getline(std::cin, line))
{
// this is some sample error handling - do whatever you consider appropriate...
std::cerr << "error reading from console" << std::endl;
return -1;
}
As there might be multiple words on a single line, you still have to split them. There are several ways to do so; quite an easy one is using a std::istringstream. You'll discover that it resembles what you are likely used to with std::cin:
std::istringstream s(line);
std::string word;
while(s >> word)
{
// append to vector...
}
Be aware that operator>> ignores leading whitespace and stops at the first whitespace after the word (or at the end of the stream, if reached), so you don't have to deal with it explicitly.
OK, you're not allowed to use std::stringstream (well, I used std::istringstream, but I suppose this little difference doesn't count, does it?). That changes matters a little: it gets more complex, but on the other hand, we can decide ourselves what counts as a word and what as a separator... We might consider punctuation marks as separators just like whitespace, but allow digits to be part of words, so we'd accept e. g. ab.7c d as "ab", "7c", "d":
auto begin = line.begin();
auto end = begin;
while(end != line.end()) // iterate over each character
{
if(std::isalnum(static_cast<unsigned char>(*end)))
{
// we are inside a word; don't touch begin to remember where
// the word started
++end;
}
else
{
// non-alpha-numeric character!
if(end != begin)
{
// we discovered a word already
// (i. e. we did not move begin together with end)
words.emplace_back(begin, end);
// ('words' being your std::vector<std::string> to place the input into)
}
++end;
begin = end; // skip whatever we had already
}
}
// corner case: a line might end with a word NOT followed by whitespace
// this isn't covered within the loop, so we need to add another check:
if(end != begin)
{
words.emplace_back(begin, end);
}
It shouldn't be too difficult to adjust to different interpretations of what is a separator and what counts as word (e. g. std::isalpha(...) || *end == '_' to detect underscore as part of words, but digits not). There are quite a few helper functions you might find useful...
You could input the value of the first column, then call functions based on the value:
void Process_Value_1(std::istream& input, std::string& value);
void Process_Value_2(std::istream& input, std::string& value);
int main()
{
// ...
std::string first_value;
while (input_file >> first_value)
{
if (first_value == "aaa")
{
Process_Value_1(input_file, first_value);
}
else if (first_value == "ccc")
{
Process_Value_2(input_file, first_value);
}
//...
}
return 0;
}
A sample function could be:
void Process_Value_1(std::istream& input, std::string& value)
{
std::string b;
input >> b;
std::cout << value << "\t" << b << std::endl;
input.ignore(1000, '\n'); // Ignore until newline.
}
There are other methods to perform the process, such as using tables of function pointers and std::map.

Split English text into sentences (multiple lines)

I am wondering about an efficient way to split text into sentences.
Sentences are split by a dot + space
Example text
The quick brown fox jumps
over the lazy dog. I love eating toasted cheese and tuna sandwiches.
My algorithm works like this
Read first line from text file to string
Find what is needed
Write to file
However, sometimes half of a sentence can be on an upcoming line.
So I was wondering what the best way to confront this problem is.
Yes, I tried googling "search across multiple lines", and I don't want to use regex.
Initially my idea is to check if the first line ends with a .+ space and if not grab another line and search through it. But I have a feeling I am missing out on something.
EDIT: Sorry forgot to mention that I am doing this in C++
You can use something like an accumulator.
1. Read line
2. Check the last symbols in this line.
3. If last symbols are dot or dot+space
3.1 Split it and write all strings to output
3.2 GOTO 1
ELSE
3.3 split the line, write length-1 strings to output
3.4 Keep the last piece in some variable and append the next line you read to it.
Hope my idea is clear.
Here is my approach for this problem
void to_sentences()
{
// Do not skip whitespaces
std::cin >> std::noskipws;
char c;
// Loop until there is no input
while (std::cin >> c) {
// Skip new lines
if (c == '\n')
continue;
// Output the character
std::cout << c;
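// check if there is a dot followed by a space;
// if so, consume the space and start a new line
// (peek looks at the next character without dropping it when it is not a space)
if (c == '.' && std::cin.peek() == ' ') {
std::cin.get();
std::cout << std::endl;
}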
}
// Reset skip whitespaces
std::cin >> std::skipws;
}
You can read the comments and ask if there is something unclear.
You can use std::getline() with the custom delimiter '.'
#include <sstream>
#include <string>
#include <vector>
auto split_to_sentences(std::string inp)
{
std::istringstream ss(inp); // make a stream using the string
std::vector< std::string > sentences; // return value
std::string this_sentence;
// read until the stream is exhausted; skip empty pieces (e.g. from "..")
while (std::getline(ss, this_sentence, '.')) {
if (!this_sentence.empty())
sentences.push_back(std::move(this_sentence));
}
return sentences;
}
Note that if you have the input text as a stream, then you can skip the std::stringstream step, and give the stream directly to std::getline, in the place of ss.
The use of std::move is not necessary, but might increase performance, by preventing a copy and a deletion of the dynamic parts (on heap) of std::string.

Extracting individual sentences from a text file ... I haven't got it right YET

As part of a larger program, I'm extracting individual sentences from a text file and placing them as strings into a vector of strings. I first decided to use the procedure I've commented out. But then, after a test, I realized that it's doing 2 things wrong:
(1) It's not separating sentences when they are separated by a new line.
(2) It's not separating sentences when they end in a quotation mark. (Ex. The sentences Obama said, "Yes, we can." Then the audience gave a thunderous applause. would not be separated.)
I need to fix those problems. However, I'm afraid this is going to end up as spaghetti code, if it isn't already. Am I going about this wrong? I don't want to keep going back and fixing things. Maybe there's some easier way?
// Extract sentences from Plain Text file
std::vector<std::string> get_file_sntncs(std::fstream& file) {
// The sentences will be stored in a vector of strings, strvec:
std::vector<std::string> strvec;
// Print out error if the file could not be found:
if(file.fail()) {
std::cout << "Could not find the file. :( " << std::endl;
// Otherwise, proceed to add the sentences to strvec.
} else {
char curchar;
std::string cursentence;
/* While we haven't reached the end of the file, add the current character to the
string representing the current sentence. If that current character is a period,
then we know we've reached the end of a sentence if the next character is a space or
if there is no next character; we then must add the current sentence to strvec. */
while (file >> std::noskipws >> curchar) {
cursentence.push_back(curchar);
if (curchar == '.') {
if (file >> std::noskipws >> curchar) {
if (curchar == ' ') {
strvec.push_back(cursentence);
cursentence.clear();
} else {
cursentence.push_back(curchar);
}
} else {
strvec.push_back(cursentence);
cursentence.clear();
}
}
}
}
return strvec;
}
Given your request to detect sentence boundaries by punctuation, whitespace, and certain combinations of them, using a regular expression seems to be a good solution. You can use regular expression to describe possible sequences of characters that indicate sentence boundaries, e.g.
[.!?]\s+
which means: "one of dot, exclamation mark, or question mark, followed by one or more whitespace characters".
One particularly convenient way of using regular expressions in C++ is to use the regex implementation included in the Boost library. Here is an example of how it works in your case:
#include <string>
#include <vector>
#include <iostream>
#include <iterator>
#include <boost/regex.hpp>
int main()
{
/* Input. */
std::string input = "Here is a short sentence. Here is another one. And we say \"this is the final one.\", which is another example.";
/* Define sentence boundaries. */
boost::regex re("(?: [\\.\\!\\?]\\s+" // case 1: punctuation followed by whitespace
"| \\.\\\",?\\s+" // case 2: end of quotation
"| \\s+\\\")", // case 3: start of quotation
boost::regex::perl | boost::regex::mod_x);
/* Iterate through sentences. */
boost::sregex_token_iterator it(begin(input),end(input),re,-1);
boost::sregex_token_iterator endit;
/* Copy them onto a vector. */
std::vector<std::string> vec;
std::copy(it,endit,std::back_inserter(vec));
/* Output the vector, so we can check. */
std::copy(begin(vec),end(vec),
std::ostream_iterator<std::string>(std::cout,"\n"));
return 0;
}
Notice I used the boost::regex::perl and boost::regex::mod_x options to construct the regex matcher. This allowed me to use extra whitespace inside the regex to make it more readable.
Also note that certain characters, such as . (dot), ! (exclamation mark) and others need to be escaped (i.e. you need to put \\ in front of them), because they would be meta characters with special meanings otherwise.
When compiling/linking the code above, you need to link it with the boost-regex library. Using GCC the command looks something like:
g++ -W -Wall -std=c++11 -o test test.cpp -lboost_regex
(assuming your program is stored in a file called test.cpp).
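If linking against Boost is not an option, the simple [.!?]\s+ boundary from above can also be tried with std::regex from the C++11 standard library. This is a swapped-in alternative, not the Boost code from this answer: it covers only the first case (no quotation handling), and splitOnSentenceEnd is a name made up for the sketch.

```cpp
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` wherever one of . ! ? is followed by
// one or more whitespace characters (spaces, tabs or newlines).
std::vector<std::string> splitOnSentenceEnd(const std::string& text) {
    static const std::regex boundary(R"([.!?]\s+)");
    // The -1 selects the parts between the matches instead of the matches.
    std::sregex_token_iterator first(text.begin(), text.end(), boundary, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

As with the Boost version, the matched boundary itself (punctuation plus whitespace) is not part of the returned pieces. Because \s also matches a newline, sentences separated by a line break are split as well.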