How to read a file and get words in C++ - c++

I am curious as to how I would go about reading the input from a text file with no set structure (Such as notes or a small report) word by word.
The text for example might be structured like this:
"06/05/1992
Today is a good day;
The worm has turned and the battle was won."
I was thinking maybe getting the line using getline, and then seeing if I can split it into words via whitespace from there. Then I thought using strtok might work! However I don't think that will work with the punctuation.
Another method I was thinking of was getting everything char by char and omitting the characters that were undesired. Yet that one seems unlikely.
So to sort the thing short:
Is there an easy way to read an input from a file and split it into words?

Since it's easier to write than to find the duplicate question,
#include <iterator>
std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;
size_t wordcnt;
for ( ; word_iter != word_iter_end; ++ word_iter ) {
std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
}
The std::string argument to istream_iterator tells it to return a string when you do *word_iter. Every time the iterator is incremented, it grabs another word from its stream.
If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.

Yes. You're looking for std::istream::operator>> :) Note that it will remove consecutive whitespace but I doubt that's a problem here.
i.e.
std::ifstream file("filename");
std::vector<std::string> words;
std::string currentWord;
while(file >> currentWord)
words.push_back(currentWord);

You can use getline with a space character, getline(buffer,1000,' ');
Or perhaps you can use this function to split a string into several parts, with a certain delimiter:
string StrPart(string s, char sep, int i) {
string out="";
int n=0, c=0;
for (c=0;c<(int)s.length();c++) {
if (s[c]==sep) {
n+=1;
} else {
if (n==i) out+=s[c];
}
}
return out;
}
Notes: This function assumes that it you have declared using namespace std;.
s is the string to be split.
sep is the delimiter
i is the part to get (0 based).

You can use the scanner technique to grabb words, numbers dates etc... very simple and flexible. The scanner normally returns token (word, number, real, keywords etc..) to a Parser.
If you later intend to interpret the words, I would recommend this approach.
I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing)

Related

C++ retrieve numerical values in a line of string

Here is the content of txt file that i've managed read.
X-axis=0-9
y-axis=0-9
location.txt
temp.txt
I'm not sure whether if its possible but after reading the contents of this txt file i'm trying to store just the x and y axis range into 2 variables so that i'll be able to use it for later functions. Any suggestion? And do i need to use vectors? Here is the code for reading of the file.
string configName;
ifstream inFile;
do {
cout << "Please enter config filename: ";
cin >> configName;
inFile.open(configName);
if (inFile.fail()){
cerr << "Error finding file, please re-enter again." << endl;
}
} while (inFile.fail());
string content;
string tempStr;
while (getline(inFile, content)){
if (content[0] && content[1] == '/') continue;
cout << endl << content << endl;
depends on the style of your file, if you are always sure that the style will remain unchanged, u can read the file character by character and implement pattern recognition stuff like
if (tempstr == "y-axis=")
and then convert the appropriate substring to integer using functions like
std::stoi
and store it
I'm going to assume you already have the whole contents of the .txt file in a single string somewhere. In that case, your next task should be to split the string. Personally, yes, I would recommend using vectors. Say you wanted to split that string by newlines. A function like this:
#include <string>
#include <vector>
std::vector<std::string> split(std::string str)
{
std::vector<std::string> ret;
int cur_pos = 0;
int next_delim = str.find("\n");
while (next_delim != -1) {
ret.push_back(str.substr(cur_pos, next_delim - cur_pos));
cur_pos = next_delim + 1;
next_delim = str.find("\n", cur_pos);
}
return ret;
}
Will split an input string by newlines. From there, you can begin parsing the strings in that vector. They key functions you'll want to look at are std::string's substr() and find() methods. A quick google search should get you to the relevant documentation, but here you are, just in case:
http://www.cplusplus.com/reference/string/string/substr/
http://www.cplusplus.com/reference/string/string/find/
Now, say you have the string "X-axis=0-9" in vec[0]. Then, what you can do is do a find for = and then get the substrings before and after that index. The stuff before will be "X-axis" and the stuff after will be "0-9". This will allow you to figure that the "0-9" should be ascribed to whatever "X-axis" is. From there, I think you can figure it out, but I hope this gives you a good idea as to where to start!
std::string::find() can be used to search for a character in a string;
std::string::substr() can be used to extract part of a string into another new sub-string;
std::atoi() can be used to convert a string into an integer.
So then, these three functions will allow you to do some processing on content, specifically: (1) search content for the start/stop delimiters of the first value (= and -) and the second value (- and string::npos), (2) extract them into temporary sub-strings, and then (3) convert the sub-strings to ints. Which is what you want.

String Management C/C++ & Writing and Reading From txt File

I am facing a problem with reading and writing a string from and to a file respectively.
Purpose:
To enter a string into a text file as a complete sentence, read the string from the text file and separate all words that start from a vowel using a function and display them as a sentence. (The sentence just needs to consist of the words from the string that start with a vowel.)
Problem:
The code is working as intended but as i have used the getline() function to obtain the string from the txt file when i withdraw a substring from it, it includes the entire file after the vowel instead of just the word. I cannot understand how to make the substring only include words.
Code:
#include <fstream>
#include <string>
#include <iostream>
#include <cstring>
using namespace std;
string vowels(string a)
{
int c=sizeof(a);
string b[c];
string d;
static int n;
for(int i=1;i<=c;i++)
{
if (a.find("a")!=-1)
{
b[i]=a.substr(a.find("a",n));
d+=b[i];
n=a.find("a")+1;
}
else if (a.find("e")!=-1)
{
b[i]=a.substr(a.find("e",n));
d+=b[i];
n=a.find("e")+1;
}
else if (a.find("i")!=-1)
{
b[i]=a.substr(a.find("i",n));
d+=b[i];
n=a.find("i")+1;
}
else if (a.find("o")!=-1)
{
b[i]=a.substr(a.find("o",n));
d+=b[i];
n=a.find("o")+1;
}
else if (a.find("u")!=-1)
{
b[i]=a.substr(a.find("u",n));
d+=b[i];
n=a.find("u")+1;
}
}
return d;
}
int main()
{
string input,lne,e;
ofstream file("output.txt", ios::app);
cout<<"Please input text for text file input: ";
getline(cin,input);
file << input;
file.close();
ifstream myfile("output.txt");
getline(myfile,lne);
e=vowels(lne);
cout<<endl<<"Text inside file reads: ";
cout<<lne;
cout<<endl;
cout<<e<<endl;
system("pause");
myfile.close();
return 0;
}
I haven't read your code VERY carefully, but several things stand out:
Look up find_first_of - it'll simplify your code A LOT.
sizeof(a) certainly doesn't do what you think it does [unless you think it gives you the size of the std::string class type - which makes it rather strange as a use-case, why not use either 12 or 24?]
find (and find_first_of), technically speaking, doesn't return -1 when the function isn't finding what you want. It returns std::string::npos [which may appear to be -1, but a) is not guaranteed to be, and b) is unsingned so can't be negative].
Your program only reads one line.
x.substr(n) will give you the string of x from position n - is that what you want?
Don't repeat find, use p = x.find("X"); and then do x.substr(p) [assuming that is what you want].
There are various problems with your code.
int c = sizeof( a );
This is the number of bytes that a string takes up in memory. And you certainly don't want to create an array of this many strings as it makes no sense for what you're trying to achieve. Don't do this to yourself. You're only copying one string inside the loop, all you need is one string and you already have string d.
To get the actual size of a string, you have to call
str.size()
The string.substr(..) has a couple overloads, one of them takes only one argument, an index. This will return sub string starting at that index in the original string. (The string starting at the vowel all the way through to the end of the string)
What you are maybe looking for is the overload that takes two arguments, the start index (beginning of the word and the end of the word).
The string input will not take the newline that you enter to flush cin. And then you add it to the file in append mode, so after running the program a few times your file is a huge one-liner. Did you really intend to do this?
Maybe you should explicitly add a new line to the file after entering the input. Something like file << std::endl;
Also, the conditions in the ifs
if (a.find("a")!=-1)
Don't match what you do next,
b[i]=a.substr(a.find("a",n));
Then you use a static int,
static int n;
This is bad, because this function will only work once. You're lucky that static initializes its values to zero, but you should always initialize explicitly. In your case, you don't need this to be static.
Finally: "so i was unsure of how many loops to run"
When you don't know how many loops you have to run, then a for loop is not adequate.
You should use a while loop or a do while.
You shouldn't try to learn C++ by guessing, because that's what it looks like you're doing. You're trying to do more than you know and making some very silly mistakes. Find a good book to learn from, or at the very least google the functions you're using to see what they do and how to use them properly. (ie: http://www.cplusplus.com/reference/string/string/substr/ )
Here's a list of books from stackoverflow's FAQ: The Definitive C++ Book Guide and List
The last thing is about finding vowels. When you find a vowel, you have to make sure it's at the beginning of a word. Then you want to read it until the word ends, that is when you find a character that is not part of a word. (a whitespace, certain punctuation, ... ) This should mark the beginning and end of the word.

How to extract formatted text in C++?

This might have appeared before, but I couldn't understand how to extract formatted data. Below is my code to extract all text between string "[87]" and "[90]" in a text file.
Apparently, the position of [87] and [90] is the same as indicated in the output.
void ExtractWebContent::filterContent(){
string str, str1;
string positionOfCurrency1 = "[87]";
string positionOfCurrency2 = "[90]";
size_t positionOfText1, positionOfText2;
ifstream reading;
reading.open("file_Currency.txt");
while (!reading.eof()){
getline (reading, str);
positionOfText1 = str.find(positionOfCurrency1);
positionOfText2 = str.find(positionOfCurrency2);
cout << "positionOfCurrency1 " << positionOfText1 << endl;
cout << "positionOfCurrency2 " << positionOfText2 << endl;
//str1= str.substr (positionOfText);
cout << "String" << str1 << endl;
}
reading.close();
An Update on the currency file:
[79]More »Brent slips to $102 on worries about euro zone economy
Market Data
* Currencies
CAPTION: Currencies
Name Price Change % Chg
[80]USD/SGD
1.2606 -0.00 -0.13%
USD/SGD [81]USDSGD=X
[82]EUR/SGD
1.5242 0.00 +0.11%
EUR/SGD [83]EURSGD=X
That really depends on what 'extracting data means'. In simple cases you can just read the file into a string and then use string member functions (especially find and substr) to extract the segment you are interested in. If you are interested in data per line getline is the way to go for line extraction. Apply find and substr as before to get the segment.
Sometimes a simple find wont get you far and you will need a regular expression to do easily get to the parts you are interested in.
Often simple parsers evolve and soon outgrow even regular expressions. This often signals time for the very large hammer of C++ parsing Boost.Spirit.
Boost.Tokenizer can be helpful for parsing out a string, but it gets a little trickier if those delimiters have to be bracketed numbers like you have them. With the delimieters as described, a regex is probably adequate.
All that does is concatenate the output of reading and the strings "[1]" and "[2]". I'm guessing this code resulted from a rather literal extrapolation of similar code using scanf. scanf (as well as the rest of C) still works in C++, so if that works for you I would use it.
That said, there are various levels of sophistication at which you can do this. Using regexes is one of the most powerful/flexible ways, but it might be overkill. The quickest way in my opinion is just to do something like:
Find index of substring "[1]", i1
Find index of substring "[2]", i2
get substring between i1+3 and i2.
In code, supposing std::string line has the text:
size_t i1 = line.find("[1]");
size_t i2 = line.find("[2]");
std::string out(line.substr(i1+3, i2));
Warning: no error checking.

C++ fstream: how to know size of string when reading?

...as someone may remember, I'm still stuck on C++ strings. Ok, I can write a string to a file using a fstream as follows
outStream.write((char *) s.c_str(), s.size());
When I want to read that string, I can do
inStream.read((char *) s.c_str(), s.size());
Everything works as expected. The problem is: if I change the length of my string after writing it to a file and before reading it again, printing that string won't bring me back my original string but a shorter/longer one. So: if I have to store many strings on a file, how can I know their size when reading it back?
Thanks a lot!
You shouldn’t be using the unformatted I/O functions (read() and write()) if you just want to write ordinary human-readable string data. Generally you only use those functions when you need to read and write compact binary data, which for a beginner is probably unnecessary. You can write ordinary lines of text instead:
std::string text = "This is some test data.";
{
std::ofstream file("data.txt");
file << text << '\n';
}
Then read them back with getline():
{
std::ifstream file("data.txt");
std::string line;
std::getline(file, line);
// line == text
}
You can also use the regular formatting operator >> to read, but when applied to string, it reads tokens (nonwhitespace characters separated by whitespace), not whole lines:
{
std::ifstream file("data.txt");
std::vector<std::string> words;
std::string word;
while (file >> word) {
words.push_back(word);
}
// words == {"This", "is", "some", "test", "data."}
}
All of the formatted I/O functions automatically handle memory management for you, so there is no need to worry about the length of your strings.
Although your writing solution is more or less acceptable, your reading solution is fundamentally flawed: it uses the internal storage of your old string as a character buffer for your new string, which is very, very bad (to put it mildly).
You should switch to a formatted way of reading and writing the streams, like this:
Writing:
outStream << s;
Reading:
inStream >> s;
This way you would not need to bother determining the lengths of your strings at all.
This code is different in that it stops at whitespace characters; you can use getline if you want to stop only at \n characters.
You can write the strings and write an additional 0 (null terminator) to the file. Then it will be easy to separate strings later. Also, you might want to read and write lines
outfile << string1 << endl;
getline(infile, string2, '\n');
If you want to use unformatted I/O your only real options are to either use a fixed size or to prepend the size somehow so you know how many characters to read. Otherwise, when using formatted I/O it somewhat depends on what your strings contain: if they can contain all viable characters, you would need to implement some sort of quoting mechanism. In simple cases, where strings consist e.g. of space-free sequence, you can just use formatted I/O and be sure to write a space after each string. If your strings don't contain some character useful as a quote, it is relatively easy to process quotes:
std::istream& quote(std::istream& out) {
char c;
if (in >> c && c != '"') {
in.setstate(std::ios_base::failbit;
}
}
out << '"' << string << "'";
std::getline(in >> std::ws >> quote, string, '"');
Obviously, you might want to bundle this functionality a class.

Reading a file of mixed data into a C++ string

I need to use C++ to read in text with spaces, followed by a numeric value.
For example, data that looks like:
text1
1.0
text two
2.1
text2 again
3.1
can't be read in with 2 "infile >>" statements. I'm not having any luck with getline
either. I ultimately want to populate a struct with these 2 data elements. Any ideas?
The standard IO library isn't going to do this for you alone, you need some sort of simple parsing of the data to determine where the text ends and the numeric value begins. If you can make some simplifying assumptions (like saying there is exactly one text/number pair per line, and minimal error recovery) it wouldn't be too bad to getline() the whole thing into a string and then scan it by hand. Otherwise, you're probably better off using a regular expression or parsing library to handle this, rather than reinventing the wheel.
Why? You can use getline providing a space as line separator. Then stitch extracted parts if next is a number.
If you can be sure that your input is well-formed, you can try something like this sample:
#include <iostream>
#include <sstream>
int main()
{
std::istringstream iss("text1 1.0 text two 2.1 text2 again 3.1");
for ( ;; )
{
double x;
if ( iss >> x )
{
std::cout << x << std::endl;
}
else
{
iss.clear();
std::string junk;
if ( !(iss >> junk) )
break;
}
}
}
If you do have to validate input (instead of just trying to parse anything looking like a double from it), you'll have to write some kind of parser, which is not hard but boring.
Pseudocode.
This should work. It assumes you have text/numbers in pairs, however. You'll have to do some jimmying to get all the typing happy, also.
while( ! eof)
getline(textbuffer)
getline(numberbuffer)
stringlist = tokenize(textbuffer)
number = atof(numberbuffer)