Trying to read from a file and skip punctuation in C++, tips? - c++

I'm trying to read from a file, and make a vector of all the words from the file. What I tried to do below is have the user input the filename, and then have the code open the file, and skip characters if they aren't alphanumeric, then input that to a file.
Right now it just closes immediately when I input the filename. Any idea what I could be doing wrong?
#include <vector>
#include <string>
#include <iostream>
#include <iomanip>
#include <fstream>
using namespace std;
int main()
{
string line; //for storing words
vector<string> words; //unspecified size vector
string whichbook;
cout << "Welcome to the book analysis program. Please input the filename of the book you would like to analyze: ";
cin >> whichbook;
cout << endl;
ifstream bookread;
//could be issue
//ofstream bookoutput("results.txt");
bookread.open(whichbook.c_str());
//assert(!bookread.fail());
if(bookread.is_open()){
while(bookread.good()){
getline(bookread, line);
cout << line;
while(isalnum(bookread)){
words.push_back(bookread);
}
}
}
cout << words[];
}

I think I'd do the job a bit differently. Since you want to ignore all but alphanumeric characters, I'd start by defining a locale that treats all other characters as white space:
struct digits_only: std::ctype<char> {
digits_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
std::fill(&rc['0'], &rc['9']+1, std::ctype_base::digit);
std::fill(&rc['a'], &rc['z']+1, std::ctype_base::lower);
std::fill(&rc['A'], &rc['Z']+1, std::ctype_base::upper);
return &rc[0];
}
};
That makes reading words/numbers from the stream quite trivial. For example:
int main() {
char const test[] = "This is a bunch=of-words and 2#numbers#4(with)stuff to\tseparate,them, I think.";
std::istringstream infile(test);
infile.imbue(std::locale(std::locale(), new digits_only));
std::copy(std::istream_iterator<std::string>(infile),
std::istream_iterator<std::string>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
return 0;
}
For the moment, I've copied the words/numbers to standard output, but copying to a vector just means giving a different iterator to std::copy. For real use, we'd undoubtedly want to get the data from an std::ifstream as well, but (again) it's just a matter of supplying the correct iterator. Just open the file, imbue it with the locale, and read your words/numbers. All the punctuation, etc., will be ignored automatically.

The following would read every line, skip non-alpha numeric characters and add each line as an item to the output vector. You can adapt it so it outputs words instead of lines. I did not want to provide the entire solution, as this looks a bit like a homework problem.
#include <vector>
#include <sstream>
#include <string>
#include <iostream>
#include <iomanip>
#include <fstream>
using namespace std;
int _tmain(int argc, _TCHAR* argv[])
{
string line; //for storing words
vector<string> words; //unspecified size vector
string whichbook;
cout << "Welcome to the book analysis program. Please input the filename of the book you would like to analyze: ";
cin >> whichbook;
cout << endl;
ifstream bookread;
//could be issue
//ofstream bookoutput("results.txt");
bookread.open(whichbook.c_str());
//assert(!bookread.fail());
if(bookread.is_open()){
while(!(bookread.eof())){
line = "";
getline(bookread, line);
string lineToAdd = "";
for(int i = 0 ; i < line.size(); ++i)
{
if(isalnum(line[i]) || line[i] == ' ')
{
if(line[i] == ' ')
lineToAdd.append(" ");
else
{ // just add the newly read character to the string 'lineToAdd'
stringstream ss;
string s;
ss << line[i];
ss >> s;
lineToAdd.append(s);
}
}
}
words.push_back(lineToAdd);
}
}
for(int i = 0 ; i < words.size(); ++i)
cout << words[i] + " ";
return 0;
}

Related

How to get input an array of strings with \n as delimiter?

#include<bits/stdc++.h>
using namespace std;
int main()
{
int i=0;
char a[100][100];
do {
cin>>a[i];
i++;
}while( strcmp(a[i],"\n") !=0 );
for(int j=0;j<i;i++)
{
cout<<a[i]<<endl;
}
return 0;
}
Here , i want to exit the do while loop as the users hits enter .But, the code doesn't come out of the loop..
The following reads one line and splits it on white-space. This code is not something one would normally expect a beginner to write from scratch. However, searching on Duckduckgo or Stackoverflow will reveal lots of variations on this theme. When progamming, know that you are probably not the first to need the functionality you seek. The engineering way is to find the best and learn from it. Study the code below. From one tiny example, you will learn about getline, string-streams, iterators, copy, back_inserter, and more. What a bargain!
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
#include <vector>
int main() {
using namespace std;
vector<string> tokens;
{
string line;
getline(cin, line);
istringstream stream(line);
copy(istream_iterator<string>(stream),
istream_iterator<string>(),
back_inserter(tokens));
}
for (auto s : tokens) {
cout << s << '\n';
}
return 0;
}
First of all, we need to read the line until the '\n' character, which we can do with getline(). The extraction operator >> won't work here, since it will also stop reading input upon reaching a space. Once we get the whole line, we can put it into a stringstream and use cin >> str or getline(cin, str, ' ') to read the individual strings.
Another approach might be to take advantage of the fact that the extraction operator will leave the delimiter in the stream. We can then check if it's a '\n' with cin.peek().
Here's the code for the first approach:
#include <iostream> //include the standard library files individually
#include <vector> //#include <bits/stdc++.h> is terrible practice.
#include <sstream>
int main()
{
std::vector<std::string> words; //vector to store the strings
std::string line;
std::getline(std::cin, line); //get the whole line
std::stringstream ss(line); //create stringstream containing the line
std::string str;
while(std::getline(ss, str, ' ')) //loops until the input fails (when ss is empty)
{
words.push_back(str);
}
for(std::string &s : words)
{
std::cout << s << '\n';
}
}
And for the second approach:
#include <iostream> //include the standard library files individually
#include <vector> //#include <bits/stdc++.h> is terrible practice.
int main()
{
std::vector<std::string> words; //vector to store the strings
while(std::cin.peek() != '\n') //loop until next character to be read is '\n'
{
std::string str; //read a word
std::cin >> str;
words.push_back(str);
}
for(std::string &s : words)
{
std::cout << s << '\n';
}
}
You canuse getline to read ENTER, run on windows:
//#include<bits/stdc++.h>
#include <iostream>
#include <string> // for getline()
using namespace std;
int main()
{
int i = 0;
char a[100][100];
string temp;
do {
getline(std::cin, temp);
if (temp.empty())
break;
strcpy_s(a[i], temp.substr(0, 100).c_str());
} while (++i < 100);
for (int j = 0; j<i; j++)
{
cout << a[j] << endl;
}
return 0;
}
While each getline will got a whole line, like "hello world" will be read once, you can split it, just see this post.

How to parse table of numbers in C++

I need to parse a table of numbers formatted as ascii text. There are 36 space delimited signed integers per line of text and about 3000 lines in the file. The input file is generated by me in Matlab so I could modify the format. On the other hand, I also want to be able to parse the same file in VHDL and so ascii text is about the only format possible.
So far, I have a little program like this that can loop through all the lines of the input file. I just haven't found a way to get individual numbers out of the line. I am not a C++ purest. I would consider fscanf() but 36 numbers is a bit much for that. Please suggest practical ways to get numbers out of a text file.
int main()
{
string line;
ifstream myfile("CorrOut.dat");
if (!myfile.is_open())
cout << "Unable to open file";
else{
while (getline(myfile, line))
{
cout << line << '\n';
}
myfile.close();
}
return 0;
}
Use std::istringstream. Here is an example:
#include <sstream>
#include <string>
#include <fstream>
#include <iostream>
using namespace std;
int main()
{
string line;
istringstream strm;
int num;
ifstream ifs("YourData");
while (getline(ifs, line))
{
istringstream strm(line);
while ( strm >> num )
cout << num << " ";
cout << "\n";
}
}
Live Example
If you want to create a table, use a std::vector or other suitable container:
#include <sstream>
#include <string>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
string line;
// our 2 dimensional table
vector<vector<int>> table;
istringstream strm;
int num;
ifstream ifs("YourData");
while (getline(ifs, line))
{
vector<int> vInt;
istringstream strm(line);
while ( strm >> num )
vInt.push_back(num);
table.push_back(vInt);
}
}
The table vector gets populated, row by row. Note we created an intermediate vector to store each row, and then that row gets added to the table.
Live Example
You can use a few different approaches, the one offered above is probable the quickest of them, however in case you have different delimitation characters you may consider one of the following solutions:
The first solution, read strings line by line. After that it use the find function in order to find the first position o the specific delimiter. It then removes the number read and continues till the delimiter is not found anymore.
You can customize the delimiter by modifying the delimiter variable value.
#include <iostream>
#include <string>
#include <fstream>
using namespace std;
int main()
{
string line;
ifstream myfile("CorrOut.dat");
string delimiter = " ";
size_t pos = 0;
string token;
vector<vector<int>> data;
if (!myfile.is_open())
cout << "Unable to open file";
else {
while (getline(myfile, line))
{
vector<int> temp;
pos = 0;
while ((pos = line.find(delimiter)) != std::string::npos) {
token = line.substr(0, pos);
std::cout << token << std::endl;
line.erase(0, pos + delimiter.length());
temp.push_back(atoi(token.c_str()));
}
data.push_back(temp);
}
myfile.close();
}
return 0;
}
The second solution make use of regex and it doesn't care about the delimiter use, it will search and match any integers found in the string.
#include <iostream>
#include <string>
#include <regex> // The new library introduced in C++ 11
#include <fstream>
using namespace std;
int main()
{
string line;
ifstream myfile("CorrOut.dat");
std::smatch m;
std::regex e("[-+]?\\d+");
vector<vector<int>> data;
if (!myfile.is_open())
cout << "Unable to open file";
else {
while (getline(myfile, line))
{
vector<int> temp;
while (regex_search(line, m, e)) {
for (auto x : m) {
std::cout << x.str() << " ";
temp.push_back(atoi(x.str().c_str()));
}
std::cout << std::endl;
line = m.suffix().str();
}
data.push_back(temp);
}
myfile.close();
}
return 0;
}

Read a file of strings with quotes and commas into string array

Let's say I have a file of names such as:
"erica","bosley","bob","david","janice"
That is, quotes around each name, each name separated by a comma with no space in between.
I want to read these into an array of strings, but can't seem to find the ignore/get/getline/whatever combo to work. I imagine this is a common problem but I'm trying to get better at file I/O and don't know much yet. Here's a basic version that just reads in the entire file as one string (NOT what I want, obviously):
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
fstream iFile("names.txt", ios::in);
string names[5];
int index = 0;
while(iFile)
{
iFile >> names[index];
index++;
}
for(int i = 0; i < 5; i++)
{
cout << "names[" << i << "]: " << names[i] << endl;
}
Output:
names[0]: "erica","bosley","bob","david","janice"
names[1]:
names[2]:
names[3]:
names[4]:
Also, I understand why it all gets read as a single string, but then why are the remaining elements not filled with garbage?
To be clear, I want the output to look like:
names[0]: erica
names[1]: bosley
names[2]: bob
names[3]: david
names[4]: janice
The easiest way to handle this:
Read the entire file and place it into a string, Here is an example of how to do it.
Split the string that you got from number 1. Here is an example of how to do that.
Stream extraction delimits by a space. Therefore the entire file gets read as one string. What you want instead is to split the string by commas.
#include <iostream>
#include <fstream>
#include <algorithm>
#include <sstream>
fstream iFile("names.txt", ios::in);
string file;
iFile >> file;
std::istringstream ss(file);
std::string token;
std::vector<std::string> names;
while(std::getline(ss, token, ',')) {
names.push_back(token);
}
To remove the quotes, use this code:
for (unsigned int i = 0; i < names.size(); i++) {
auto it = std::remove_if(names[i].begin(), names[i].end(), [&] (char c) { return c == '"'; });
names[i] = std::string(names[i].begin(), it);
}
remove_if returns the end iterator for the transformed string, which is why you construct the new string with (s.begin(), it).
Then output it:
for (unsigned int i = 0; i < names.size(); i++) {
std::cout << "names["<<i<<"]: " << names[i] << std::endl;
}
Live Example

How to convert a sentence from upper case to lower case that contains space and numbers

I'm trying to convert a sentence from upper case to lowercase. I also write a code but I stopper when a space is appear. How can I fix this problem and convert the whole sentence? Here is my code
#include<iostream>
#include<cstring>
using namespace std;
int main()
{
char str[100];
cin>>str;
for(int i=0;i<strlen(str);i++)
{
if(str[i]>='A'&&str[i]<='Z')
{
str[i]=str[i]+32;
}
}
cout<<str<<endl;
return 0;
}
It's because of theinput operator >>, it breaks on space. If you want to read a whole line then use std::getline to read into a std::string instead.
Then read about the C++ standard algorithms, like for example std::transform. Also, std::tolower doesn't modify anything that's not an upper-case letter, so it's a good function to use.
The error is because operator>> delimites on spaces. The alternative is to use getline. See the following example:
#include <cstring>
#include <iostream>
#include <string>
int main() {
std::string s;
std::getline(std::cin, s);
std::cout << "Original string: " << s << std::endl;
if (!std::cin.fail()) {
const int len = strlen(s.c_str());
for (size_t i = 0; len > i; ++i) {
if ((s[i] >= 'A') && (s[i] <= 'Z'))
s[i] = s[i] - 'A' + 'a';
}
}
std::cout << "New string: " << s << std::endl;
return 0;
}
The reason input stops at whitespace is because formatted input is delimited by whitespace characters (among others). You will need unformatted I/O in order to extract the entire string into str. One way to do this is to use std::istream::getline:
std::cin.getline(str, 100, '\n');
It's also useful to check if the input succeeded by using gcount:
if (std::cin.getline(str, 100, '\n') && std::cin.gcount())
{
...
}
But in practice it's recommended that you use the standard string object std::string which holds a dynamic buffer. To extract the entire input you use std::getline:
std::string str;
if (std::getline(std::cin, str)
{
...
}
Here is one of the examples of doing it using transform function.
#include <iostream>
#include <string>
#include <algorithm>
using namespace std;
int main()
{
string str;
if (getline(cin, str))
{
transform(str.begin(), str.end(), str.begin(), ptr_fun<int, int>(toupper));
}
cout << str << endl;
return 0;
}

Locate and tag words in text file

I need to read in a text file of 500 words or more(a real world article from newspaper, etc..) and locate and tag like this, <location> word <location/>, and then print the entire article on the screen. Im using boost regex right now and its working ok. I want to try and use a list or array or some other data structure to have a list of the states and major cities, and search those and compare to the aticle. right now I'm using an array but I'm willing to use anything. Any ideas or clues?
#include <boost/regex.hpp>
#include <iostream>
#include <string>
#include <boost/iostreams/filter/regex.hpp>
#include <fstream>
using namespace std;
int main()
{
string cities[389];
string states [60];
string filename, line,city,state;
ifstream file,cityfile, statefile;
int i=0;
int j=0;
cityfile.open("c:\\cities.txt");
while (!cityfile.eof())
{
getline(cityfile,city);
cities[i]=city;
i++;
//for (int i=0;i<500;i++)
//file>>cities[i];
}
cityfile.close();
statefile.open("c:\\states.txt");
while (!statefile.eof())
{
getline(statefile,state);
states[j]=state;
//for (int i=0;i<500;i++)
//cout<<states[j];
j++;
}
statefile.close();
//4cout<<cities[4];
cout<<"Please enter the path and file name "<<endl;
cin>>filename;
file.open(filename);
while (!file.eof())
{
while(getline(file, line)
{
}
while(getline(file, line))
{
//string text = "Hello world";
boost::regex re("[A-Z/]\.[A-Z\]\.|[A-Z/].*[:space:][A-Z/]|C........a");
//boost::regex re(
string fmt = "<locations>$&<locations\>";
if(boost::regex_search(line, re))
{
string result = boost::regex_replace(line, re, fmt);
cout << result << endl;
}
/*else
{
cout << "Found Nothing" << endl;
}*/
}
}
file.close();
cin.get(),cin.get();
return 0;
}
If you are after asymptotic complexity - Aho-Corasick algorithm offers a linear time complexity ( O(n+m)) (n and m are the lengths of the input strings). for searching a dictionary in a string.
An alternative is to put the tokenized words in a map (where the value is a list to the places in the stream of each string), and search for each string in the data in the tree. The complexity will be O(|S| * (nlogn + mlogn) ) (m being the number of searched words, n is the number of words in the string, and |S| is the length of the average word)
You can use any container that has a .find() method or supports std::find(). I'd use set, since set::find() runs in less than linear time.
Here is a program which does what you talk about. Note that the parsing doesn't work great, but that's not what I'm trying to demonstrate. You could continue to find the words using your parser, and use the call to set::find() to determine if they are locations.
#include <set>
#include <string>
#include <iostream>
#include <sstream>
const std::set<std::string> locations { "Springfield", "Illinois", "Pennsylvania" };
int main () {
std::string line;
while(std::getline(std::cin, line)) {
std::istringstream iss(line);
std::string word;
while(iss >> word) {
if(locations.find(word) == locations.end())
std::cout << word << " ";
else
std::cout << "<location>" << word << "</location> ";
}
std::cout << "\n";
}
}