Delimiter extracting out a range of data - c++

From a text file (Range.txt):
Range:1:2:3:4:5:6....:n:
There is a list of results, but I only need to extract the digits 1, 2, 3 and so on up to the last one, and the last digit may vary.
So I read in the file, split it on the delimiter and push each piece into a vector.
ifstream myfile;
myfile.open("Range.txt");
istringstream range(temp);
getline(range, line,':');
test.push_back(line);
How do I capture all the values? I have this and it only captures one digit.

I have this and it only captures one digit
You need to use a loop:
while (getline(range, line, ':'))
test.push_back(line);
Later, you can process the vector to keep only the integers.
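As a minimal sketch of that last step, assuming the line has already been read into a string: the helper below (extractRange is a made-up name, not from the question) splits on ':' and converts only the all-digit fields, so the leading "Range" label is skipped.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line on ':' and keeps only the numeric fields.
std::vector<int> extractRange(const std::string& text)
{
    std::vector<int> values;
    std::istringstream range(text);
    std::string field;
    while (std::getline(range, field, ':'))
    {
        // Skip the "Range" label, empty fields and anything that is not all digits.
        if (field.empty() ||
            field.find_first_not_of("0123456789") != std::string::npos)
            continue;
        values.push_back(std::atoi(field.c_str()));
    }
    return values;
}
```

Calling extractRange("Range:1:2:3:4:5:6:") should give a vector holding 1 through 6, however many numbers the line contains.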

Please read this: Parsing and adding string to vector.
You just have to change the delimiter (from whitespace to :).
std::ifstream infile(filename.c_str());
std::string line;
if (infile.is_open())
{
std::cout << "Well done! File opened successfully." << std::endl;
while (std::getline(infile, line, ':'))
{
std::istringstream iss(line);
std::vector<std::string> tokens{std::istream_iterator<std::string>(iss),std::istream_iterator<std::string>()};
// Now, tokens vector stores all data.
// There is an item for each value read from the current line.
}
}

Related

How do I deal with a carriage return line feed when trying to read in file

So I am working on a file that I need to read in, which contains both commas separating words and a carriage return line feed at the end of each line, and I can't figure out a way to handle it. I am trying to read in each word before the comma and put it into a vector until it hits the carriage return line feed, but I am having problems.
Here is my text file (as seen on notepad++ so you can see the symbols. on the actual text, the things inside [] don't appear)
microwave,lamp,guitar,couch,bed,dog,cat[cr][lf]
P1:microwave,couch,bed,dog,chair,bookcase,fish[cr][lf]
I have tried multiple solutions, but nothing seems to work. Below is what I have tried so far, but it obviously isn't working. I have seen some users suggest using substring to somehow read out the comma and read in the words, but I am not sure how to do that. I couldn't find a good tutorial or example of one. In my head, I have the algorithm (or at least, the steps on how to go about it), but I am not sure how to go about implementing it.
Import file (istream)
Read until comma, take string and place it in vector1 (getline(infile, input, ','), then vector1.push_back(input))
Repeat previous step until you reach \r\n, then stop reading (getline(infile, input, '\r'))
Move on to the next line
Read until comma, take string and place it in vector2
Repeat
Read the line until \r\n
Here is the code I put in practice using part of the above steps i made.
string input;
vector<string> v1;
vector<string> v2;
ifstream infile;
infile.open("example.txt");
while(getline(infile, input)) //read until end of line
{
while(getline(infile, input, '\r')) //read until it reaches a carriage return
{
while(getline(infile, input, ',')) // read until it reaches a comma
{
v1.push_back(input); //take the word and put in vector.
}
}
}
infile.close();
Any help would be appreciated.
Edit: I forgot to mention. When I used this code, it seemed to not import anything into the vectors. I am sure all the words got lost somewhere in the getline functions, but I don't know how to just read up to comma and carriage return line feed without using it.
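You should use getline() to get a whole line first. On Windows, a stream opened in text mode translates the carriage return and line feed pair into a single '\n' for you; if a '\r' is still left at the end of the line (for example when a Windows file is read on Linux), strip it yourself. Then, put the result into a stringstream and use getline() on it to separate the line at the commas.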
My code that reads input into a vector of vectors:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
int main()
{
std::ifstream fin("input.txt");
std::vector<std::vector<std::string>> result;
for(std::string line; std::getline(fin, line);)
{
result.emplace_back();
std::stringstream ss(line);
for(std::string word; std::getline(ss, word, ',');)
{
result.back().push_back(word);
}
}
for(const auto &i : result)
{
for(const auto &j : i)
{
std::cout << j << ' ';
}
std::cout << '\n';
}
}
You can modify it to read into two vectors by removing the outer loop and using a separate loop for each of the two vectors/lines.
In your code, you first have a loop that reads line by line until the end of the file. After you read a line, you have a loop that reads from the same stream until a '\r', which is normally not visible when a Windows text file is read in text mode on Windows. Even if there are '\r's in the file, you would be overwriting what you just read in from the outer loop. The same applies to the loop inside that.
Were you taught that while(getline(fin, str)) reads from a file without knowing how it works?
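For the case where the '\r' does survive (a CRLF file read on a system that does not translate line endings), here is a hedged sketch of the same line-then-comma approach with the trailing '\r' removed. The name readRows is invented for illustration, and the function takes any std::istream so it works with an ifstream or a stringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads comma-separated rows, tolerating CRLF line endings.
std::vector<std::vector<std::string> > readRows(std::istream& in)
{
    std::vector<std::vector<std::string> > rows;
    std::string line;
    while (std::getline(in, line))
    {
        // If the stream did not translate "\r\n", the '\r' is still the last character.
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);

        std::vector<std::string> words;
        std::istringstream fields(line);
        std::string word;
        while (std::getline(fields, word, ','))
            words.push_back(word);
        rows.push_back(words);
    }
    return rows;
}
```

Each inner vector then corresponds to one line of the file, so rows[0] and rows[1] play the role of the two vectors in the question.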

How do you know when an input stream has reached the last word in a line?

I'm reading input from an ifstream in C++. The input comes as a bunch of words separated by tabs, so I'm reading in something like "word1 word2 word3" as stream >> w1 >> w2 >> w3; I need to know when I've reached the final word in the line, so how would I go about that? The number of words is variable, but it should be always even. Also, will the last word contain the \n, or will the \n be the last word?
The simplest (and usual) solution is to read lines using std::getline, and then parse the line using std::istringstream:
std::string line;
while ( std::getline( std::cin, line ) ) {
std::istringstream s( line );
std::vector<std::string> words;
std::string word;
while ( s >> word ) {
words.push_back( word );
}
// ...
}
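To connect this back to the question: once a line has been split this way, the last word is simply the final element of the vector, and the newline is never part of any word because std::getline discards it. A small sketch, with splitWords and hasEvenWordCount as made-up helper names:

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits one line into whitespace-separated words.
std::vector<std::string> splitWords(const std::string& line)
{
    std::vector<std::string> words;
    std::istringstream s(line);
    std::string word;
    while (s >> word)
        words.push_back(word);
    return words;
}

// Hypothetical check for the "number of words is always even" rule in the question.
bool hasEvenWordCount(const std::string& line)
{
    return splitWords(line).size() % 2 == 0;
}
```

For a non-empty result, words.back() is the last word on the line.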
Read from the ifstream and push the words into a vector using the algorithm std::copy, as follows:
std::ifstream stream("input.txt");
std::vector<std::string> vec;
//replace stream with std::cin for reading from console
std::copy(std::istream_iterator<std::string>(stream),
std::istream_iterator<std::string>(),
std::back_inserter(vec));
When reading from the console, this needs an EOF to terminate: Ctrl+Z on Windows or Ctrl+D on Linux.
As suggested, you can use C++11's initializer list as follows:
std::vector<std::string> vec{std::istream_iterator<std::string>{stream},
std::istream_iterator<std::string>{}};
I would advise you to mmap() the file (available on POSIX systems). After that, you have the entire file in your virtual address space and can examine it at leisure like a big array of characters. Especially for operations where you have to go back a few steps, this approach is well suited. As an added bonus, it is often also fast, because the data is not copied into a separate buffer.

Reading a text file in c++

string numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
inputFile >> numbers;
inputFile.close();
cout << numbers;
And my text.txt file is:
1 2 3 4 5
basically a set of integers separated by tabs.
The problem is the program only reads the first integer in the text.txt file and ignores the rest for some reason. If I remove the tabs between the integers it works fine, but with tabs between them, it won't work. What causes this? As far as I know it should ignore any white space characters or am I mistaken? If so is there a better way to get each of these numbers from the text file?
When reading formatted strings, the input operator starts by ignoring leading whitespace. Then it reads non-whitespace characters up to the first whitespace character and stops. The non-whitespace characters get stored in the std::string. If there are only whitespace characters before the stream reaches end of file (or some error for that matter), reading fails. Thus, your program reads one "word" (in this case a number) and stops reading.
Unfortunately, you only said what you are doing and what the problems are with your approach (where your problem description failed to cover the case where reading the input fails in the first place). Here are a few things you might want to try:
If you want to read multiple words, you can do so, e.g., by reading all words:
std::vector<std::string> words;
std::copy(std::istream_iterator<std::string>(inputFile),
std::istream_iterator<std::string>(),
std::back_inserter(words));
This will read all words from inputFile and store them as a sequence of std::strings in the vector words. Since your file contains numbers, you might want to replace std::string with int to read the numbers in a readily accessible form.
If you want to read a line rather than a word you can use std::getline() instead:
if (std::getline(inputFile, line)) { ... }
If you want to read multiple lines, you'd put this operation into a loop: there is, unfortunately, no ready-made approach to read a sequence of lines as there is for words.
If you want to read the entire file, not just the first line, into a string, you can also use std::getline() but you'd need to know about one character value which doesn't occur in your file, e.g., the null value:
if (std::getline(inputFile, text, char())) { ... }
This approach considers a "line" a sequence of characters up to a null character. You can use any other character value as well. If you can't be sure about the character values, you can read an entire file using std::string's constructor taking iterators:
std::string text((std::istreambuf_iterator<char>(inputFile)),
std::istreambuf_iterator<char>());
Note that the extra pair of parentheses around the first parameter is, unfortunately, necessary (if you are using C++ 2011 you can avoid them by using braces instead of parentheses).
Use getline to do the reading.
string numbers;
if (inputFile.is_open())//checking if open
{
getline (inputFile,numbers); //fetches entire line into string numbers
inputFile.close();
}
Your program behaves exactly as you describe: inputFile >> numbers; extracts just the first integer in the input file, so if you remove the tabs, inputFile >> will extract the number 12345, not the five numbers [1,2,3,4,5].
A better method:
vector<int> numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(), ios_base::in);
int value;
while (inputFile >> value) // operator>> skips tabs, spaces and newlines
{
    numbers.push_back(value); // store the parsed integer, not a character code
}
inputFile.close();

trying to read a text file data into an array to be manipulated then spit back out

My aim is to take the data from the file, split it up and place them into an array for future modification.
This is what the data looks like:
course1-Maths|course1-3215|number-3|professor-Mark
sam|scott|12|H|3.4|1/11/1991|3/15/2012
john|rummer|12|A|3|1/11/1982|7/15/2004
sammy|brown|12|C|2.4|1/11/1991|4/12/2006
end_Roster1|
I want to take maths, 3215, 3 and Mark and put into an array,
then sam scott 12 H 3.4 1/11/1991 3/15/2012.
This is what I have so far:
infile.open("file.txt", fstream::in | fstream::out | fstream::app);
while(!infile.eof())
{
while ( getline(infile, line, '-') )
{
if ( getline(infile, line, '|') )
{
r = new data;
r->setRcourse_name(line);
r->setRcourse_code(3);//error not a string
r->setRcredit(3);//error not a string pre filled
r->setRinstructor(line);
cout << line << endl;
}
}
}
When I then tried to view it, nothing was stored.
Firstly, line 1 is very different from the remaining lines, so you need a different parsing algorithm for it. Something like:
bool first = true;
while(!infile.eof())
{
if (first)
{
// read header line
first = false;
}
else
{
// read lines 2..n
}
}
Reading lines 2..n can be handled by making a stringstream for each line, then passing that to another getline using '|' as a delimiter, to get each token (sam, scott, 12, H, 3.4, 1/11/1991, 3/15/2012):
vector<string> tokens; // "vector" itself cannot be used as the variable name
if (getline(infile, line, '\n'))
{
    stringstream ssline(line);
    string token;
    while (getline(ssline, token, '|'))
        tokens.push_back(token);
}
Reading the header line takes the exact same concept one step further where each token is then further parsed with another getline with '-' as a delimiter. You'll ignore each time the first tokens (course1, course1, number, professor) and use the second tokens (Maths, 3215, 3, Mark).
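One way the header step could look, as a sketch rather than the answerer's own code: parseHeader is a hypothetical name, and it assumes every header token has the form label-value with the first '-' as the separator.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: for a header such as "course1-Maths|number-3",
// returns the part after the first '-' in every '|'-separated token.
std::vector<std::string> parseHeader(const std::string& line)
{
    std::vector<std::string> values;
    std::istringstream tokens(line);
    std::string token;
    while (std::getline(tokens, token, '|'))
    {
        std::istringstream parts(token);
        std::string label, value;
        // The first getline consumes the label; the second reads the rest of the token.
        if (std::getline(parts, label, '-') && std::getline(parts, value))
            values.push_back(value);
    }
    return values;
}
```

For the first line of the sample data this should yield Maths, 3215, 3 and Mark, in that order. Tokens without a '-' are skipped.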
You are completely ignoring the line that you get inside the condition of the nested while loop. You should call getline from a single spot in your while loop, and then examine its content using a sequence of if-then-else conditions.

Tokenization of a text file with frequency and line occurrence. Using C++

Once again I ask for help. I haven't coded anything for some time!
Now I have a text file filled with random gibberish. I already have a basic idea of how I will count the number of occurrences per word.
What really stumps me is how I will determine what line the word is in. Gut instinct tells me to look for the newline character at the end of each line. However, I have to do this while going through the text file the first time, right? If I do it afterwards, it will do no good.
I already am getting the words via the following code:
vector<string> words;
string currentWord;
while(!inputFile.eof())
{
inputFile >> currentWord;
words.push_back(currentWord);
}
This is for a text file with no set structure. Using the above code gives me a nice little(big) vector of words, but it doesn't give me the line they occur in.
Would I have to get the entire line, then process it into words to make this possible?
Use a std::map<std::string, int> to count the word occurrences -- the int is the number of times it exists.
If you need line-by-line input, use std::getline(std::istream&, std::string&), like this:
std::vector<std::string> lines;
std::ifstream file(...); // Fill in accordingly.
std::string currentLine;
while(std::getline(file, currentLine))
lines.push_back(currentLine);
You can split a line apart by putting it into a std::istringstream first and then using operator>>. (Alternatively, you could cobble together some sort of splitter using std::find and other algorithmic primitives.)
EDIT: This is the same thing as in @dash-tom-bang's answer, but modified to be correct with respect to error handling:
vector<pair<string, int> > words; // each word paired with its line number
int currentLine = 1; // or 0, however you wish to count...
string line;
while (getline(inputFile, line))
{
    istringstream inputString(line);
    string word;
    while (inputString >> word)
        words.push_back(make_pair(word, currentLine));
    ++currentLine; // advance to the next line
}
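Combining the two ideas in this thread (a map for the counts, line-by-line reading for the line numbers) could look roughly like the sketch below. WordInfo and indexWords are names made up for this example, and lines are counted from 1.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical record: how often a word occurs and on which lines.
struct WordInfo
{
    std::size_t count;
    std::set<int> lines;
    WordInfo() : count(0) {}
};

// Hypothetical helper: builds a word index from any input stream.
std::map<std::string, WordInfo> indexWords(std::istream& in)
{
    std::map<std::string, WordInfo> index;
    std::string line;
    int lineNumber = 0;
    while (std::getline(in, line))
    {
        ++lineNumber; // lines are counted from 1
        std::istringstream words(line);
        std::string word;
        while (words >> word)
        {
            WordInfo& info = index[word]; // creates the entry on first sight
            ++info.count;
            info.lines.insert(lineNumber);
        }
    }
    return index;
}
```

A std::set is used for the line numbers so that a word appearing twice on the same line is recorded for that line only once, while count still reflects every occurrence.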
Short and sweet.
vector< map< string, size_t > > line_word_counts;
string line, word;
while ( getline( cin, line ) ) {
line_word_counts.push_back( map< string, size_t >() );
map< string, size_t > &word_counts = line_word_counts.back();
istringstream line_is( line );
while ( line_is >> word ) ++ word_counts[ word ];
}
cout << "'Hello' appears on line 5 " << line_word_counts[5-1]["Hello"]
<< " times\n";
You're going to have to abandon reading into strings, because operator >>(istream&, string&) discards white space and the contents of the white space (== '\n' or != '\n', that is the question...) is what will give you line numbers.
This is where OOP can save the day. You need to write a class to act as a "front end" for reading from the file. Its job will be to buffer data from the file, and return words one at a time to the caller.
Internally, the class needs to read data from the file a block (say, 4096 bytes) at a time. Then a string GetWord() (yes, returning by value here is good) method will:
First, read any white space characters, taking care to increment the object's lineNumber member every time it hits a \n.
Then read non-whitespace characters, putting them into the string object you'll be returning.
If it runs out of stuff to read, read the next block and continue.
If you hit the end of file, the string you have is the whole word (which may be empty) and should be returned.
If the function returns an empty string, that tells the caller that the end of file has been reached. (Files usually end with whitespace characters, so reading whitespace characters cannot imply that there will be a word later on.)
Then you can call this method at the same place in your code as your inputFile >> currentWord and the rest of the code doesn't need to know the details of your block buffering.
An alternative approach is to read things a line at a time. The member functions istream::getline and istream::read require you to create a fixed-size buffer to read into beforehand, and if the line is longer than that buffer, you have to deal with it somehow; the free function std::getline, which reads into a std::string, has no such limit. With fixed-size buffers, it could get more complicated than the class I described.