File input and process data - c++

I would like to read from a text file and the format of the file is
Method 1
Method 2
Insert 3 "James Tan"
I am currently using ifstream to open the text file and read the items, but when I use >> to read each line, the name is not read in full as "James Tan"; only the first word is extracted.
Attached below is the code and the output.
ifstream fileInput;
if(fileInput.is_open()){
while(fileInput.good()){
fileInput >> methodName >> value >> Name;
......
Output
methodName = Method, Method, Insert
value = 1, 2, 3 (must be an unsigned integer)
Name = James
What is a better way to read the lines and process their contents?
I was told about getline, but as I understand it, getline reads a whole line rather than one word at a time.
Also, is fstream really fast? I would like to process 500000 lines of data, and if ifstream is not fast enough, what other options do I have?
Please advise.

Method 1
Method 2
Insert 3 "James Tan"
I take it you mean that the file consists of several lines. Each line either begins with the word "Method" or the word "Insert", in each case followed by a number. Additionally, lines that begin "Insert" have a multi-word name at the end.
Is that right? If so, try:
ifstream fileInput("input.txt");
std::string methodName;
int value;
while ( fileInput >> methodName >> value ) {
std::string name;
if(methodName == "Insert")
std::getline(fileInput, name); // name holds the rest of the line, including the leading space and the quotes
// Now do whatever you meant to do with the record.
records.push_back(RecordType(methodName, value, name)); // for example
}
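As a fuller illustration of the same idea, here is a minimal sketch that reads each line with std::getline, splits it with a std::istringstream, and strips the quotes from the name. The Record struct and the helper names stripQuotes and readRecords are hypothetical stand-ins (the answer leaves RecordType undefined), and the sketch assumes every line has the shape "word number [quoted name]".

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type standing in for the undefined RecordType.
struct Record {
    std::string method;
    unsigned value = 0;
    std::string name;  // empty unless the line carries a name
};

// Trim surrounding whitespace and remove one pair of double quotes, if present.
std::string stripQuotes(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t\r");
    std::string out = s.substr(first, last - first + 1);
    if (out.size() >= 2 && out.front() == '"' && out.back() == '"')
        out = out.substr(1, out.size() - 2);
    return out;
}

// Read one record per line; lines that do not start with "word number" are skipped.
std::vector<Record> readRecords(std::istream& in) {
    std::vector<Record> records;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Record r;
        if (!(fields >> r.method >> r.value)) continue;
        std::string rest;
        std::getline(fields, rest);  // whatever follows the number, possibly nothing
        r.name = stripQuotes(rest);
        records.push_back(r);
    }
    return records;
}
```

Passing an opened std::ifstream to readRecords works the same way as passing a std::istringstream, since both are std::istream objects.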

How can we take out the multiple integer numbers in the string array or string and assign them to different int data type?

I am new to C++ and I am reading in a text file. The content of text file is like:
$ (first line)
2 (second)
MY NAME IS (whatever sentence with 10 or below characters)(third)
12 21 (fourth)
22 22 (fifth)
221 (sixth)
fly jump run (seventh)
fish animal (eighth)
So I need to read all of these and store them into different variables line by line. So far I've managed to store them into a string array line by line, but how can I store the numbers like 12 21 in the fourth line into 2 different integer variables such as int b and int c?
And likewise for the last two lines:
how can I store fly jump run fish animal into 5 different string variables respectively?
Basically, I am now putting the lines into a string array one by one and trying to take them back out of the array and store them.
if (file.is_open()){
cout<<"Congratulations! Your file was successfully read!";
while (!file.eof()){
getline(file,line);
txt[i]=line;
i++;
}
}
I just want to store every line into variables based on their data type.
Streams can extract content directly into the basic data types (int, double, etc.), so istream::operator>>(int&) does the work for you.
The below small sample class demonstrates it by reading your sample file into the members -- hope that helps:
class Creature
{
public:
void read(istream& stream)
{
string line;
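// Requires <limits>. Skip line 1 (= $), however long it is.
stream.ignore(numeric_limits<streamsize>::max(), '\n');
stream >> m_integers[0]; // line 2 = single int
stream.ignore(numeric_limits<streamsize>::max(), '\n'); // skip the rest of line 2, including the newline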
getline(stream, m_sentence); // get the full sentence line ..
// and the rest ... we can read that in a single code line ...
stream >> m_integers[1] >> m_integers[2] >> m_integers[3] >> m_integers[4]
>> m_integers[5] >> m_whatCanIdDo[0] >> m_whatCanIdDo[1] >> m_whatCanIdDo[2] >> m_whatIAm[0] >> m_whatIAm[1];
}
private:
string m_sentence;
int m_integers[6];
string m_whatCanIdDo[3];
string m_whatIAm[2];
};
Calling the function:
int main()
{
ifstream file;
file.open("creature.txt");
Creature cr;
cr.read(file);
file.close();
}
There are several ways of doing this, but one of the most straightforward is to use a stringstream.
To do this, copy the lines you want to tokenize from your txt array into a stringstream. Use the stream extraction operator (>>) to read each whitespace-separated word from that line into a separate variable.
//Required headers
#include <string>
#include <sstream>
...
string word1, word2;
stringstream words(txt[lineNumber]);
words >> word1 >> word2;
//Process words
For each line you tokenize, you'll have to reset the stream.
//Read in next line
lineNumber++;
//Reset stream flags
words.clear();
//Replace the stream's input string
words.str(txt[lineNumber]);
words >> word1 >> word2;
//Process new words
You can use the same process for both integers and strings. The stream extraction operator converts the text to whatever data type you give it. However, it's up to you to make sure that the data it's trying to convert is the correct type. If you try to extract non-numeric text into an int, the stringstream sets its fail bit and you won't get any useful output.
It's a good idea to write your input to a string, and then check whether that string is, in fact, a number, before trying to write it to an integer. But that's an entirely different topic, there are many ways to do it, and there are several other questions on this site that cover it.
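One possible way to do that check, as a minimal sketch: read each token into a string first, then convert it with std::stoi and reject tokens that are not entirely numeric. The function name parseInts is a hypothetical helper, not something from the question.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Convert every whitespace-separated token on a line to an int.
// Returns false (and leaves 'out' untouched) if any token is not a whole integer.
bool parseInts(const std::string& line, std::vector<int>& out) {
    std::istringstream tokens(line);
    std::vector<int> values;
    std::string token;
    while (tokens >> token) {
        try {
            std::size_t used = 0;
            const int v = std::stoi(token, &used);
            if (used != token.size()) return false;  // trailing junk such as "12x"
            values.push_back(v);
        } catch (const std::exception&) {
            return false;  // not a number at all, or out of range for int
        }
    }
    out.swap(values);
    return true;
}
```

For the fourth line of the sample file, parseInts(txt[3], numbers) would leave 12 and 21 in numbers, which can then be assigned to int b and int c.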

C++ read different kinds of data from file until there's a string beginning with a number

In C++, I'd like to read from an input file which contains different kinds of data: first the name of a contestant (2 or more strings separated by whitespace), then an ID (a string without whitespace, always beginning with a digit), then further pairs of a string without whitespace and a number (the sports and the places achieved).
For example:
Josh Michael Allen 1063Szinyei running 3 swimming 1 jumping 1
Here is the code I started to write before I got stuck:
void ContestEnor::next()
{
string line;
getline(_f , line);
if( !(_end = _f.fail()) ){
istringstream is(line);
is >> _cur.contestant >> _cur.id; // here I don't know how to go on
_cur.counter = 0;
//...
}
}
Thank you for your help in advance.
You should look into using std::getline with a delimiter. This way, you can delimit on a space character and read until you find a string whose first character is a digit. Here is a short code example (this seems rather homework-like, so I don't want to write too much of it for you ;):
std::string temp, id;
while (std::getline(_f, temp, ' ')) {
if (!temp.empty() && temp[0] >= '0' && temp[0] <= '9') {
id = temp;
}
// you would need to add more code for the rest of the data on that line
}
/* close the file, etc. */
This code should be pretty self-explanatory. The most important thing to know is that you can use std::getline to get data up until a delimiter. The delimiter is consumed, just like the default behavior of delimiting on a newline character. Thus, the name getline isn't entirely accurate - you can still get only part of a line if you need to.
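For comparison, here is a sketch of the same "read until a token starts with a digit" idea applied to a single line, using operator>> on a std::istringstream instead of std::getline with a delimiter. The Contestant struct and parseContestant function are hypothetical names; the sketch assumes the layout from the question (name words, then an ID starting with a digit, then sport/place pairs).

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical holder for one parsed line.
struct Contestant {
    std::string name;
    std::string id;
    std::vector<std::pair<std::string, int> > results;  // (sport, place)
};

// Returns false if no token starting with a digit (the ID) is found.
bool parseContestant(const std::string& line, Contestant& out) {
    std::istringstream in(line);
    Contestant c;
    std::string token;
    // Name words run until the first token whose first character is a digit.
    while (in >> token) {
        if (std::isdigit(static_cast<unsigned char>(token[0]))) {
            c.id = token;
            break;
        }
        if (!c.name.empty()) c.name += ' ';
        c.name += token;
    }
    if (c.id.empty()) return false;
    // The rest of the line is a sequence of "sport place" pairs.
    std::string sport;
    int place;
    while (in >> sport >> place)
        c.results.push_back(std::make_pair(sport, place));
    out = c;
    return true;
}
```

Inside a function like ContestEnor::next(), the line obtained from getline(_f, line) could be handed to parseContestant.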

Running through a file to fill a map and splitting on white space

So I have a project (I don't expect anyone to do my homework for me, but I'm stuck on the first of 3 data structures and can't start the project until it works) where I need to fill a couple of maps by running through some files. My example file is set up simply: I need to extract a long as the key, and the value is a string in the map, like so:
0 A
1 B
2 C
Where my pairs would obviously be 0 is the key to A which would be a string for this project, the issue is that my instructor also said this would be a possible format:
0 W e b 1
1 W e b 2
2 W e b 3
where 0 is the key to "W e b 1". I know that I need to split on white space, but honestly I have no clue where to begin. I have tried a couple of methods, but I can only get the first character of the string in this second case.
Here is what I have so far. Don't worry about the boolean return, and I know that opening and checking the file should happen outside this function, but my professor wants all of that in this function.
bool read_index(map<long, string> &index_map, string file_name)
{
//create a file stream for the file to be read
ifstream index_file(file_name);
//if file doesn't open then return false
if(!index_file)
return false;
string line;
long n;
string token;
//read file
while(!index_file.eof())
{
getline(?)
//not sure how to handle the return from getline
}
//file read?
return !index_file.fail();
}
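You could use strtok() to split each line if you prefer a pure C approach, but there is also a C++ way to split file data: redirect the std::cin stream to your file. Formatted extraction (>>) then splits on any whitespace (spaces, tabs, newlines); if you need line numbers, keep a line counter yourself. Reading from the std::ifstream directly behaves the same way, so the redirect is mainly useful when existing code already reads from std::cin.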
std::ifstream in(file_name);
std::streambuf *cinbuf = std::cin.rdbuf(); //better save old buf, if needed later
std::cin.rdbuf(in.rdbuf()); //redirect std::cin to file_name
// <something>
std::string line;
while(std::getline(std::cin, line)) //now the input is from the file
{
// do whatever you need with line here,
// just find a way to distinguish key from value
// or some other logic
}
std::cin.rdbuf(cinbuf); // restore the original buffer before 'in' goes out of scope
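To separate the key from the value on each line, one option is to extract the number with operator>> and then take the remainder of the line as the value. The sketch below is an assumption about what the assignment needs, not the instructor's intended solution; readIndex is a hypothetical variant of read_index that takes an already opened stream, and it treats any non-empty line that does not start with a number as an error.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Fill 'index' from lines of the form "<long key> <value text>".
// The value is everything after the key, so "0 W e b 1" maps 0 to "W e b 1".
bool readIndex(std::istream& in, std::map<long, std::string>& index) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;          // tolerate blank lines
        std::istringstream fields(line);
        long key;
        if (!(fields >> key)) return false;  // line does not start with a number
        std::string value;
        std::getline(fields >> std::ws, value);  // skip the separator, keep the rest
        index[key] = value;
    }
    return true;
}
```

With the signature from the question, the body of read_index could open the std::ifstream, return false if that fails, and otherwise return readIndex(index_file, index_map).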

Reading a text file in c++

string numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
inputFile >> numbers;
inputFile.close();
cout << numbers;
And my text.txt file is:
1 2 3 4 5
basically a set of integers separated by tabs.
The problem is the program only reads the first integer in the text.txt file and ignores the rest for some reason. If I remove the tabs between the integers it works fine, but with tabs between them, it won't work. What causes this? As far as I know it should ignore any white space characters or am I mistaken? If so is there a better way to get each of these numbers from the text file?
When reading into a std::string, the input operator starts by skipping leading whitespace. Then it reads non-whitespace characters up to the next whitespace character and stops. The non-whitespace characters get stored in the std::string. If there are only whitespace characters before the stream reaches end of file (or some error, for that matter), reading fails. Thus, your program reads one "word" (in this case a number) and stops reading.
Unfortunately, you only said what you are doing and what the problems are with your approach (and your problem description failed to cover the case where reading the input fails in the first place). Here are a few things you might want to try:
If you want to read multiple words, you can do so, e.g., by reading all words:
std::vector<std::string> words;
std::copy(std::istream_iterator<std::string>(inputFile),
std::istream_iterator<std::string>(),
std::back_inserter(words));
This will read all words from inputFile and store them as a sequence of std::strings in the vector words. Since your file contains numbers, you might want to replace std::string by int to read the numbers in a readily accessible form.
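A minimal sketch of that int variant, wrapped in a hypothetical helper named readAllInts so it can be used with any input stream:

```cpp
#include <istream>
#include <iterator>
#include <sstream>  // only needed if you feed the function from a string
#include <vector>

// Read whitespace-separated integers (spaces, tabs, newlines) until
// extraction fails or the input ends.
std::vector<int> readAllInts(std::istream& in) {
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}
```

Calling readAllInts(inputFile) on the tab-separated file from the question would yield the five numbers as separate elements.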
If you want to read a line rather than a word you can use std::getline() instead:
if (std::getline(inputFile, line)) { ... }
If you want to read multiple lines, you'd put this operation into a loop. There is, unfortunately, no ready-made approach to read a sequence of lines as there is for words.
If you want to read the entire file, not just the first line, into a string, you can also use std::getline() but you'd need to know about one character value which doesn't occur in your file, e.g., the null value:
if (std::getline(inputFile, text, char())) { ... }
This approach considers a "line" a sequence of characters up to a null character. You can use any other character value as well. If you can't be sure about the character values, you can read an entire file using std::string's constructor taking iterators:
std::string text((std::istreambuf_iterator<char>(inputFile)),
std::istreambuf_iterator<char>());
Note that the extra pair of parentheses around the first argument is, unfortunately, necessary (if you are using C++11 you can avoid them by using braces instead of parentheses).
Use getline to do the reading.
string numbers;
if (inputFile.is_open())//checking if open
{
getline (inputFile,numbers); //fetches entire line into string numbers
inputFile.close();
}
Your program behaves exactly as you describe: inputFile >> numbers; extracts only the first whitespace-delimited token from the input file. If you remove the tabs, it extracts the single token 12345, not the five numbers [1,2,3,4,5].
A better method is to extract into an int in a loop:
vector<int> numbers;
string fileName = "text.txt";
ifstream inputFile(fileName.c_str());
int value;
while (inputFile >> value) // loop while an integer can be extracted; >> skips tabs, spaces, and newlines
{
    numbers.push_back(value);
}
inputFile.close();

Tokenization of a text file with frequency and line occurrence. Using C++

once again I ask for help. I haven't coded anything for sometime!
Now I have a text file filled with random gibberish. I already have a basic idea on how I will count the number of occurrences per word.
What really stumps me is how I will determine what line the word is in. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards, it will do no good.
I already am getting the words via the following code:
vector<string> words;
string currentWord;
while(!inputFile.eof())
{
inputFile >> currentWord;
words.push_back(currentWord);
}
This is for a text file with no set structure. Using the above code gives me a nice little(big) vector of words, but it doesn't give me the line they occur in.
Would I have to get the entire line, then process it into words to make this possible?
Use a std::map<std::string, int> to count the word occurrences -- the int is the number of times it exists.
If you need line-by-line input, use std::getline(std::istream&, std::string&), like this:
std::vector<std::string> lines;
std::ifstream file(...); //Fill in accordingly.
std::string currentLine;
while(std::getline(file, currentLine))
lines.push_back(currentLine);
You can split a line apart by putting it into a std::istringstream first and then using operator>>. (Alternatively, you could cobble up some sort of splitter using std::find and other algorithmic primitives.)
EDIT: This is the same thing as in @dash-tom-bang's answer, but modified to be correct with respect to error handling:
vector<pair<string, int> > words; // each word together with the line it was found on
int currentLine = 1; // or 0, however you wish to count...
string line;
while (getline(inputFile, line))
{
    istringstream inputString(line);
    string word;
    while (inputString >> word)
        words.push_back(make_pair(word, currentLine));
    ++currentLine;
}
Short and sweet.
vector< map< string, size_t > > line_word_counts;
string line, word;
while ( getline( cin, line ) ) {
line_word_counts.push_back( map< string, size_t >() );
map< string, size_t > &word_counts = line_word_counts.back();
istringstream line_is( line );
while ( line_is >> word ) ++ word_counts[ word ];
}
cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
<< " times\n";
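If you want the total frequency of each word and the lines it appears on in one structure, the two ideas (a map keyed by word, plus line-by-line reading) can be combined. This is only a sketch under that assumption; WordInfo and indexWords are hypothetical names, and a "word" here is simply any whitespace-delimited token, punctuation included.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Per-word statistics: how often it occurs and on which lines (1-based).
struct WordInfo {
    std::size_t count;
    std::set<int> lines;
    WordInfo() : count(0) {}
};

std::map<std::string, WordInfo> indexWords(std::istream& in) {
    std::map<std::string, WordInfo> index;
    std::string line, word;
    int lineNumber = 0;
    while (std::getline(in, line)) {
        ++lineNumber;                      // blank lines still advance the counter
        std::istringstream words(line);
        while (words >> word) {
            WordInfo& info = index[word];  // creates the entry on first sight
            ++info.count;
            info.lines.insert(lineNumber);
        }
    }
    return index;
}
```

After calling indexWords(inputFile), index["Hello"].count gives the frequency and index["Hello"].lines the set of line numbers.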
You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.
This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.
Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() (yes, returning by value here is good) method will:
First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
Then read non-whitespace characters, putting them into the string object you'll be returning.
If it runs out of stuff to read, read the next block and continue.
If you hit the end of file, the string you have is the whole word (which may be empty) and should be returned.
If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
Then you can call this method at the same place in your code as your inputFile >> currentWord, and the rest of the code doesn't need to know the details of your block buffering.
An alternative approach is to read things a line at a time. The istream member functions for this (such as istream::getline) require you to create a fixed-size buffer to read into beforehand, and if the line is longer than that buffer, you have to deal with it somehow; the free function std::getline, which reads into a std::string, does not have that limit. With a fixed-size buffer, it could get more complicated than the class I described.
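A rough sketch of the front-end class described above is shown here. It deliberately swaps one detail: instead of managing 4096-byte blocks itself, it reads one character at a time with get() and peek() and relies on the buffering the stream already does. The class name WordReader and the member names are hypothetical.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

class WordReader {
public:
    explicit WordReader(std::istream& in) : in_(in), line_(1) {}

    // Returns the next word, or an empty string once the input is exhausted.
    std::string getWord() {
        const int eof = std::char_traits<char>::eof();
        std::string word;
        int ch;
        // Skip whitespace, counting newlines; stop at the word's first character.
        while ((ch = in_.get()) != eof) {
            if (ch == '\n') ++line_;
            if (!std::isspace(ch)) {
                word += static_cast<char>(ch);
                break;
            }
        }
        // Collect the remaining non-whitespace characters. The whitespace that
        // ends the word is left in the stream, so lineNumber() still refers to
        // the line of the word just returned.
        while ((ch = in_.peek()) != eof && !std::isspace(ch))
            word += static_cast<char>(in_.get());
        return word;
    }

    // 1-based line number of the most recently returned word.
    int lineNumber() const { return line_; }

private:
    std::istream& in_;
    int line_;
};
```

A caller would construct WordReader reader(inputFile) and loop on reader.getWord() until it returns an empty string, reading reader.lineNumber() after each word.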