C++: appending text to a string until a specific character is seen

I have several input files like this:
>1aab_
GKGDPKKPRGKMSSYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERWKT
MSAKEKGKFEDMAKADKARYEREMKTYIPPKGE
>1j46_A
MQDRVKRPMNAFIVWSRDQRRKMALENPRMRNSEISKQLGYQWKMLTEAE
KWPFFQEAQKLQAMHREKYPNYKYRPRRKAKMLPK
>1k99_A
MKKLKKHPDFPKKPLTPYFRFFMEKRAKYAKLHPEMSNLDLTKILSKKYK
ELPEKKKMKYIQDFQREKQEFERNLARFREDHPDLIQNAKK
>2lef_A
MHIKKPLNAFMLYMKEMRANVVAESTLKESAAINQILGRRWHALSREEQA
KYYELARKERQLHMQLYPGWSARDNYGKKKKRKREK
Here is what I have to do:
vector <string> names;
vector <string> seqs;
names.resize(total); //"total" is already known.
seqs.resize(total);
counter=0;char input;
while ((input = myInput.get()) != EOF)
{
if(input=='>')
names[counter]= take all line (>1aab_, >1j46_A, so...)
else
until the next '>' is seen, append the character to sequence[counter]
counter++;
}
Finally it will be like this:
names[0]=">1aab_"
sequence[0]="GKGDPKKPRGKMSSYAFFVQTSREEHKKKHPDASVNFSEFSKKCSERWKTMSAKEKGKFEDMAKADKARYEREMKTYIPPKGE"
and so on..
I have been thinking about this for 2 hours and I couldn't figure it out. Can anyone help with that? Thanks in advance.

There are a few ways to solve it; I'll present some examples, but I'm not testing/compiling this code, so there may be minor bugs. The logic is the important bit.
Since your pseudocode appears to be processing the input character by character, I've taken that as a requirement.
The way you seem to be thinking about it would be implemented with essentially a pair of loops - one for reading the name, the other for reading the sequence - which are enclosed in an outer loop, in order to process all records.
This would look something like the following:
// first character in file should be a '>', indicating the start
// of a record.
input = myInput.get();
if (input != '>')
{
std::cerr << "Malformed input file!" << std::endl;
return /*...*/;
}
do
{
// record name continues up until the newline
while ((input = myInput.get()) != EOF)
{
if (input == '\n' || input == '\r')
break;
names[counter].push_back(input);
}
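// read sequence until we hit a '>' or EOF
while ((input = myInput.get()) != EOF)
{
if (input == '>')
{
// advance to next record number
counter++;
break;
}
// skip line endings so the sequence is stored as one string
if (input == '\n' || input == '\r')
continue;
sequence[counter].push_back(input);
}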
} while (input != EOF && counter < total);
You'll also notice I moved the check for the initial '>' to before the loop, just as a way of ingesting (and discarding) the character, as well as a basic sanity check of the input. This is because we really use this character to mark the end of the sequence (rather than the "start of a record") - when we enter the loop, we assume we're already reading the record name.
Another way to approach it is to use a state machine. Essentially, this utilises additional variables to track the state the parser is in.
For this particular case, you only have two states: either you're reading a record name, or the sequence. So, we can just use a single boolean to track which state we're in.
Armed with the state variable, we can then make decisions about what to do with the character we just read based upon the state we're in. At the simplest level here, if we're in "read the record name" state, we add the character to the names variable, otherwise we add it to the sequence variable.
// state flag to indicate if we're currently reading a name line,
// i.e. a line starting with ">"
// This should be set true by the first record we encounter, so
// we'll set it false (to indicate we're reading a sequence) in
// order to allow us to detect bad input files.
bool reading_name = false;
// indicate we're on the first record, so we can avoid incrementing
// the record counter
bool first_record = true;
// process input character-by-character until end of file
while ((input = myInput.get()) != EOF)
{
// check for start of new record
if (input == '>')
{
// for robustness, verify we're not already reading a name,
// as this probably indicates invalid input
if (reading_name)
{
std::cerr << "Input is malformed?!" << std::endl;
break;
}
// switch to reading name state
reading_name = true;
// advance to next record, but only if it isn't the first record
if (first_record)
{
// disable the first_record flag, and explicitly set the
// record counter to 0.
first_record = false;
counter = 0;
}
else if (++counter >= total)
{
std::cerr << "Error: too many records!" << std::endl;
break;
}
}
// first character in file should start a new record
else if (first_record)
{
std::cerr << "Missing record start character at beginning of input!" << std::endl;
break;
}
// make sure we are processing a valid record number
else if (counter >= total)
{
std::cerr << "Invalid record number!" << std::endl;
break;
}
// continue reading the name
else if (reading_name)
{
// check if we've reached the end of the line; you
// may also want/need to check for \r if your input
// files may have Windows-style line endings
if (input == '\n')
{
// switch to reading sequence state
reading_name = false;
}
else
{
// add character to current name
names[counter].push_back(input);
}
}
// continue reading the sequence
else
{
// you might need to handle line ending characters here,
// maybe just skipping them?
// add character to current sequence
sequence[counter].push_back(input);
}
}
This adds a fair amount of complexity, which is of questionable value for this particular exercise, but does make adding additional states easier in future. It also has the benefit of only a single place in the code where I/O is done, which reduces the chances of errors (not checking for EOF, overflow array bounds, etc.).
In this case, we're actually using the '>' character as an indicator that a new record is starting, so we add a bit of extra logic to make sure that all works properly with the record counter. You could also just use a signed integer for your counter variable and start it at -1, so it will increment to 0 at the start of the first record, but using signed variables to index into arrays isn't a good idea.
There are more ways to approach this problem, but hopefully this gives you somewhere to start on your own solution.
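If reading character by character is not a hard requirement, a line-based version is one of those other ways. The following is only a sketch under that assumption: the Record struct and the read_records function are names made up for illustration, and it collects records into a vector that grows as needed instead of relying on a known total.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One parsed record: the header line (including '>') and the joined sequence.
// (Hypothetical type, introduced only for this sketch.)
struct Record {
    std::string name;
    std::string sequence;
};

// Reads every record from `in`. Lines before the first '>' are ignored.
std::vector<Record> read_records(std::istream& in) {
    std::vector<Record> records;
    std::string line;
    while (std::getline(in, line)) {
        // Strip a trailing '\r' left over from Windows-style line endings.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();
        if (line.empty())
            continue;
        if (line[0] == '>') {
            // A header line starts a new record.
            records.push_back(Record{line, ""});
        } else if (!records.empty()) {
            // Any other line continues the current sequence.
            records.back().sequence += line;
        }
    }
    return records;
}
```

It can be tried without a file by wrapping some sample text in a std::istringstream and passing that to read_records; an std::ifstream works the same way.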

Related

seekg() seemingly skipping characters past intended position C++

I've been having an issue with parsing through a file and the use of seekg(). Whenever a certain character is reached in a file, I want to loop until a condition is met. The loop works fine for the first iteration but, when it loops back, the file seemingly skips a character and causes the loop to not behave as expected.
Specifically, the loop works fine if it is all contained in one line in the file, but fails when there is at least one newline within the loop in the file.
I should mention I am working on this on Windows, and I feel like the issue arises from how Windows ends lines with \r\n.
Using seekg(-2, std::ios::cur) after looping back fixes the issue when the beginning loop condition is immediately followed by a newline, but does not work for a loop contained in the same line.
The code is structured by having an Interpreter class hold the file pointer and relevant variables, such as the current line and column. This class also has a functional map defined like so:
// Define function type for command map
typedef void (Interpreter::*function)(void);
// Map for all the commands
std::map<char, function> command_map = {
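{'+', &Interpreter::increment_cell},
{'-', &Interpreter::decrement_cell},
{'>', &Interpreter::increment_ptr},
{'<', &Interpreter::decrement_ptr},
{'.', &Interpreter::output},
{',', &Interpreter::input},
{'[', &Interpreter::begin_loop},
{']', &Interpreter::end_loop},
{' ', &Interpreter::next_col},
{'\n', &Interpreter::next_line}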
};
It iterates through each character, deciding if it has functionality or not in the following function:
// Iterating through the file
void Interpreter::run() {
char current_char;
if(!this->file.eof() && this->file.good()) {
while(this->file.get(current_char)) {
// Make sure character is functional command (ie not a comment)
if(this->command_map.find(current_char) != this->command_map.end()) {
// Print the current command if in debug mode
if(this->debug_mode && current_char != ' ' && current_char != '\n') {
std::cout << this->filename << ":" << this->line << ":"
<< this->column << ": " << current_char << std::endl;
}
// Execute the command
(this->*(command_map[current_char]))();
}
// If it is not a functional command, it is a comment. The rest of the line is ignored
else{
std::string temp_line = "";
std::getline(file, temp_line);
this->line++;
this->column = 0;
}
this->temp_pos = file.tellg();
this->column++;
}
}
else {
std::cout << "Unable to find file " << this->filename << "." << std::endl;
exit(1);
}
file.close();
}
The beginning of the loop (signaled by a '[' char) sets the beginning loop position to this->temp_pos:
void Interpreter::begin_loop() {
this->loop_begin_pointer = this->temp_pos;
this->loop_begin_line = this->line;
this->loop_begin_col = this->column;
this->run();
}
When the end of the loop (signaled by a ']' char) is reached, if the condition for ending the loop is not met, the file cursor position is set back to the beginning of the loop:
void Interpreter::end_loop() {
// If the cell's value is 0, we can end the loop
if(this->char_array[this->char_ptr] == 0) {
this->loop_begin_pointer = -1;
}
// Otherwise, go back to the beginning of the loop
if(this->loop_begin_pointer > -1){
this->file.seekg(this->loop_begin_pointer, std::ios::beg);
this->line = this->loop_begin_line;
this->column = this->loop_begin_col;
}
}
I was able to put in debugging information and can show stack traces for further clarity on the issue.
Stack trace with one line loop ( ++[->+<] ):
+ + [ - > + < ] [ - > + < ] done.
This works as intended.
Loop with multiple lines:
++[
-
>
+<]
Stack trace:
+ + [ - > + < ] > + < ] <- when it looped back, it "skipped" '[' and '-' characters.
This loops forever since the end condition is never met (ie the value of the first cell is never 0 since it never gets decremented).
Oddly enough, the following works:
++[
-
>+<]
It follows the same stack trace as the first example. This working and the last example not working is what has made this problem hard for me to solve.
Please let me know if more information is needed about how the program is supposed to work or its outputs. Sorry for the lengthy post, I just want to be as clear as possible.
Edit 1:
The class has the file object as std::ifstream file;.
In the constructor, it is opened with
this->file.open(filename), where filename is passed in as an argument.
For a file stream, seekg is ultimately defined in terms of fseek from the C standard library. The C standard has this to say:
7.21.9.2/4 For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET.
So for a file opened in text mode, you can't do any arithmetic on offsets. You can rewind to the beginning, position at the end, or return to the position you were at previously and captured with tellg (which ultimately calls ftell). Anything else would exhibit undefined behavior.
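As a rough illustration of the "return to a position captured with tellg" case, here is a minimal sketch. The helper names read_chars and replay_is_consistent are invented for this example, and opening the file with std::ios::binary is an assumption on my part, not something the question does: in binary mode no \r\n translation takes place, so the program sees the '\r' characters itself and positions correspond to bytes in the file.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Reads up to `n` characters from the current position of `file`.
std::string read_chars(std::ifstream& file, std::size_t n) {
    std::string out;
    char c;
    while (out.size() < n && file.get(c))
        out.push_back(c);
    return out;
}

// Bookmarks the current position, reads `n` characters, jumps back to the
// bookmark and reads them again. Returns true when both passes agree.
bool replay_is_consistent(std::ifstream& file, std::size_t n) {
    const std::ifstream::pos_type mark = file.tellg();  // value to seek back to
    const std::string first = read_chars(file, n);
    file.clear();      // clear eofbit/failbit in case the read hit the end
    file.seekg(mark);  // absolute seek to a value tellg() returned earlier
    const std::string second = read_chars(file, n);
    return first == second;
}
```

The point of the sketch is that the only offset ever passed to seekg is one that tellg returned on the same stream, with no arithmetic such as seekg(-2, std::ios::cur) applied to it.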

C++ cin.getline ignore empty lines

I have a program that is given a file of a shape and reads it line by line to be able to put the shape into a 2D array. It's of an unknown size, so I have to count the rows as we go. Everything works fine, except I'm having trouble getting it to stop when there are empty lines trailing the input.
The piece of code in question is below:
while(cin.eof() != true){
getline(cin, input);
shape = shape + input;
rows++;
}
For example this will count 3 rows:
===
===
===
and this counts 4:
===
===
===
(empty line)
I need my program to ignore the empty lines, regardless of how many there are.
I've tried quite a few different things such as
if (!input.empty()){
shape = shape + input;
rows++;
}
or
if (input != " " && input[0] != '\0' && input[0] != '\n'){
shape = shape + input;
rows++;
}
These work if there is only one empty line, but if I had multiple empty lines it will only not count the very last one.
Shape and Input are both strings.
You have made a good choice to read a line at a time with getline(), but you are controlling your read loop incorrectly. See Why !.eof() inside a loop condition is always wrong.
Instead always control the continuation of your read loop based on the stream state resulting from the read function itself. In your case, you ignore the state after getline() and assume you have valid input -- which you won't when you read EOF. Why?
When you read the last line in your file, you will have read input, but the eofbit will not yet be set as you haven't reached the end-of-file yet. You loop checking cin.eof() != true (which it isn't yet) and then call getline (cin, input) BAM! Nothing was read and eofbit is now set, yet you blindly assign shape = shape + input; even though your read with getline() failed.
Your second issue is how do you skip empty lines? Simple. If input.size() == 0 the line was empty. To "skip" empty lines, just continue and read the next. To "quit reading" when the first empty line is reached, replace continue with break.
A short example incorporating the changes above would be:
#include <iostream>
#include <string>
int main (void) {
std::string input{}, shape{};
std::size_t rows = 0;
while (getline(std::cin, input)) { /* control loop with getline */
if (input.size() == 0) /* if .size() == 0, empty line */
continue; /* get next */
shape += input; /* add input to shape */
rows++; /* increment rows */
}
std::cout << rows << " rows\n" << shape << '\n';
}
Also see: Why is “using namespace std;” considered bad practice? and avoid developing habits that will be harder to break later.
Example Use/Output
$ cat << eof | ./bin/inputshape
> ===
> ===
> ===
> eof
3 rows
=========
With a blank line at the end
$ cat << eof | ./bin/inputshape
> ===
> ===
> ===
>
> eof
3 rows
=========
Or with multiple blank lines:
$ cat << eof | ./bin/inputshape
> ===
> ===
> ===
>
>
> eof
3 rows
=========
(Note: the eof used in the input above is simply the heredoc delimiter marking the end of input; it has no connection to the stream state eofbit or .eof(). It could just as well have been bananas, but EOF or eof is traditionally used. Also, if you are not using bash or another shell that supports heredocs, just redirect a file to the program, e.g. ./bin/inputshape < yourfile)
Look things over and let me know if you have further questions.
Edit Based On No Use of continue or break
If you can't use continue or break, then just turn the conditional around and only add to shape if input.size() != 0. For example:
while (getline(std::cin, input)) { /* control loop with getline */
if (input.size() != 0) { /* if .size() != 0, good line */
shape += input; /* add input to shape */
rows++; /* increment rows */
}
}
Exact same thing, just written a bit differently. Let me know if that works for you.

How to discard from streams? .ignore() doesn't work for this purpose, any other methods?

I don't fully understand streams. The idea is to read a file through an ifstream and then work with it: extract data from the stream into a string, and discard from the stream the part that is now in the string. Is that possible? If not, how should I handle this kind of problem?
The following method takes a file that has been opened correctly with an ifstream. (It is a text file containing information about "Lost" episodes, an episode guide.) It works fine for one object of the Episode class. Every time I instantiate an Episode, I want to take the information about one episode out of the stream (the end of an episode is indicated by "****", after which the next episode starts), store that information in a string, and discard it from the stream. When I create the next Episode object, I want to do the same with the text from that "****" up to the next "****", and so on.
void Episode::read(ifstream& in) {
string contents((istreambuf_iterator<char>(in)), istreambuf_iterator<char>());
size_t episodeEndPos = contents.find("****");
if ( episodeEndPos == -1) {
in.ignore(numeric_limits<char>::max());
in.clear(), in.sync();
fullContent = contents;
}
else { // empty stream for next episode
in.ignore(episodeEndPos + 4);
fullContent = contents.substr(0, episodeEndPos);
}
// fill attributes
setNrHelper();
setTitelHelper();
setFlashbackHelper();
setDescriptionHelper();
}
I tried inFile >> words (to read the words out of the stream). Another way I was thinking about is to use .ignore() (to skip a number of characters in the stream), but that doesn't work as intended. Sorry for my bad English; hopefully it is clear what I want to do.
If your goal is for each call of Read() to read the next episode and advance in the file, then the trick is to use tellg() and seekg() to bookmark the position and update it:
void Episode::Read(ifstream& in) {
streampos pos = in.tellg(); // backup current position
string fullContent;
string contents((istreambuf_iterator<char>(in)), istreambuf_iterator<char>());
size_t episodeEndPos = contents.find("****");
if (episodeEndPos == string::npos) {
in.ignore(numeric_limits<streamsize>::max());
in.clear(), in.sync();
fullContent = contents;
}
else { // empty stream for next episode
fullContent = contents.substr(0, episodeEndPos);
in.seekg(pos + streamoff(episodeEndPos + 4)); // position file at next episode
}
}
This way, you can call your function several times, with every call reading the next episode.
However, please note that your approach is not optimised. When you construct your contents string from a stream iterator, you load the full rest of the file in the memory, starting at the current position in the stream. So here you keep on reading and reading again big subparts of the file.
Edit: streamlined version adapted to your format
You just need to read the line, check if it's not a separator line and concatenate...
void Episode::Read(ifstream& in) {
string line;
string fullContent;
while (getline(in, line) && line !="****") {
fullContent += line + "\n";
}
cout << "DATENSATZ: " << fullContent << endl; // just to verify content
// fill attributes
//...
}
The code you have reads the entire stream in one go just to use part of the text to initialize one object. For a gigantic file that is almost certainly a bad idea. The easier approach is to read only until the end marker is found. In an ideal world, the end marker is easy to find. Based on the comments it seems to be on a line of its own, which makes this quite easy:
void Episode::read(std::istream& in) {
std::string text;
for (std::string line; std::getline(in, line) && line != "****"; ) {
text += line + "\n";
}
fullContent = text;
}
If the separator isn't on a line of its own, you could use code like this instead:
void Episode::read(std::istream& in) {
std::string text;
for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
text.push_back(*it);
if (*it == '*' && 4u <= text.size() && text.substr(text.size() - 4) == "****") {
++it; // consume the final '*' before leaving the loop
break;
}
}
// strip the separator if the loop stopped on one
if (4u <= text.size() && text.substr(text.size() - 4u) == "****") {
text.resize(text.size() - 4u);
}
fullContent = text;
}
Both of these approaches simply read the file from start to end and consume the characters to be extracted in the process, stopping as soon as one record has been read.
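To show how such a function can be called repeatedly on the same stream, here is a small free-function sketch. It assumes the separator sits on a line of its own, and read_episode and read_all_episodes are hypothetical names, not part of the Episode class from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads lines up to (not including) a line equal to "****" into `text`.
// Returns false when not a single line could be read, i.e. the stream
// has nothing left to offer.
bool read_episode(std::istream& in, std::string& text) {
    text.clear();
    std::string line;
    bool got_any = false;
    while (std::getline(in, line)) {
        got_any = true;
        if (line == "****")
            break;               // separator consumed, record complete
        text += line + "\n";
    }
    return got_any;
}

// Calls read_episode until the stream is exhausted.
std::vector<std::string> read_all_episodes(std::istream& in) {
    std::vector<std::string> episodes;
    std::string text;
    while (read_episode(in, text))
        episodes.push_back(text);
    return episodes;
}
```

Because each call consumes exactly one record plus its separator, no tellg()/seekg() bookkeeping and no ignore() call is needed between records.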

Detecting space in a file in c++

Hi, I am reading characters from a file and inserting them into a map. The code works, but how do I detect a space in the file? I need to store the number of times a space occurs in the file. Any help would be great, thanks.
map<char, int> treeNodes; //character and the frequency
ifstream text("test.txt");
while(!text.eof())
{
text >> characters;
//getline(text,characters);
cout << characters;
if(treeNodes.count(characters) == 0)
{
if(isspace (characters))
{
cout << "space" << endl;
}
else
treeNodes.insert(pair<char,int>(characters,1));
}
else
{
treeNodes[characters] += 1;
}
}
Formatted input, i.e. when using the right shift operator>>() skips leading whitespace by default. You can turn this off using std::noskipws but depending on what sort of things you want to read it won't be a very happy experience. The best approach is probably using unformatted input, i.e. something like std::getline() and split the line on space within the program.
If you just want to count the number of times any particular character occurred, you probably want to use std::istreambuf_iterator<char> and just iterate over the content of the stream (this code also omits some other unnecessary clutter):
for (std::istreambuf_iterator<char> it(text), end; it != end; ++it) {
++treeNodes[*it];
}
BTW, you never want to use the result of eof() for something different than determining whether the last read failed because the stream has reached its end.
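For comparison, the same count can be written with unformatted get(), which does not skip whitespace and whose return value drives the loop instead of eof(). This is only a sketch; count_characters and count_whitespace are names chosen for illustration.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>

// Counts every character in the stream, whitespace included.
std::map<char, int> count_characters(std::istream& in) {
    std::map<char, int> counts;
    char c;
    while (in.get(c))   // loop ends when a read fails, e.g. at end of file
        ++counts[c];
    return counts;
}

// Sums the counts of all whitespace characters (space, tab, newline, ...).
int count_whitespace(const std::map<char, int>& counts) {
    int total = 0;
    for (const auto& entry : counts) {
        // Cast to unsigned char first: passing a negative char to
        // std::isspace is undefined behaviour.
        if (std::isspace(static_cast<unsigned char>(entry.first)))
            total += entry.second;
    }
    return total;
}
```

The number of plain spaces is then simply the map entry for ' ', while count_whitespace also includes tabs and line endings.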
Couldn't you just cast the char to an int and test whether it is equal to the ASCII value of a space?

How to know if the next character is EOF in C++

I need to know if the next char in an ifstream is the end of the file. I'm trying to do this with .peek():
if (file.peek() == -1)
and
if (file.peek() == file.eof())
But neither works. Is there a way to do this?
Edit: What I'm trying to do is to add a letter to the end of each word in a file. In order to do so I ask if the next char is a punctuation mark, but in this way the last word is left without an extra letter. I'm working just with char, not string.
istream::peek() returns the constant EOF (which is not guaranteed to be equal to -1) when it detects end-of-file or error. To check robustly for end-of-file, do this:
int c = file.peek();
if (c == EOF) {
if (file.eof()) {
// end of file
} else {
// error
}
} else {
// do something with 'c'
}
You should know that the underlying OS primitive, read(2), only signals EOF when you try to read past the end of the file. Therefore, file.eof() will not be true when you have merely read up to the last character in the file. In other words, file.eof() being false does not mean the next read operation will succeed.
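The behaviour described here can be observed with an in-memory stream. The following sketch uses a made-up helper, at_end, and compares against std::char_traits<char>::eof(), which is the value peek() returns when nothing can be read.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns true when no further character can be read from `in`.
// Note that peek() also returns eof() when the stream is already in an
// error state, so "true" here means "end of file or error".
bool at_end(std::istream& in) {
    return in.peek() == std::char_traits<char>::eof();
}
```

For a std::istringstream holding "ab", calling get() twice leaves eof() false even though nothing is left; the flag is only set once at_end (that is, peek()) has actually tried to look at the next character.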
This should work:
if (file.peek(), file.eof())
But why not just check for errors after making an attempt to read useful data?
file.eof() returns a flag value. It is set to true if you can no longer read from the file. EOF is not an actual character; it's a marker for the OS. So when you're there, file.eof() should be true.
So, instead of if (file.peek() == file.eof()) you should have if (true == file.eof()) after a read (or peek) to check if you reached the end of file (which is what you're trying to do, if I understand correctly).
For a stream connected to the keyboard the eof condition is that I intend to type Ctrl+D/Ctrl+Z during the next input.
peek() is totally unable to see that. :-)
Usually to check end of file I used:
if(cin.fail())
{
// Do whatever here
}
Another such way to implement that would be..
while(!cin.fail())
{
// Do whatever here
}
Additional information would be helpful so we know what you want to do.
There is no way of telling if the next character is the end of the file, and trying to do so is one of the commonest errors that new C and C++ programmers make, because there is no end-of-file character in most operating systems. What you can tell is that reading past the current position in a stream will read past the end of file, but this is in general pretty useless information. You should instead test all read operations for success or failure, and act on that status.
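Applied to the task from the question's edit (adding a letter to the end of each word), testing each read and handling the end of input after the loop could look like the sketch below. The function name append_to_words is invented here, and treating a "word" as a run of alphabetic characters is my assumption about what the question means.

```cpp
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>

// Copies `in` to `out`, writing `suffix` after the last letter of every word.
void append_to_words(std::istream& in, std::ostream& out, char suffix) {
    bool in_word = false;
    char c;
    while (in.get(c)) {  // the read itself decides when the loop ends
        const bool is_letter =
            std::isalpha(static_cast<unsigned char>(c)) != 0;
        if (in_word && !is_letter)
            out.put(suffix);  // the previous character ended a word
        out.put(c);
        in_word = is_letter;
    }
    if (in_word)
        out.put(suffix);      // input ended in the middle of a word
}
```

The check after the loop is what takes care of the last word, so there is no need to ask in advance whether the next character is the end of the file.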
You didn't show any code you are working with, so there is some guessing on my part. You don't usually need low-level facilities (like peek()) when working with streams. What you are probably interested in is istream_iterator. Here is an example:
cout << "enter value";
for(istream_iterator<double> it(cin), end;
it != end; ++it)
{
cout << "\nyou entered value " << *it;
cout << "\nTry again ...";
}
You can also use istreambuf_iterator to work on buffer directly:
cout << "Please, enter your name: ";
string name;
for(istreambuf_iterator<char> it(cin.rdbuf()), end;
it != end && *it != '\n'; ++it)
{
name += *it;
}
cout << "\nyour name is " << name;
Just use this code on Mac OS X:
if (true == file.eof())
It works for me on Mac OS X!