seekg() not working as expected - c++

I have a small program that is meant to copy a short phrase from a file, but either I misunderstand how seekg() works or there is a problem in my code that prevents the function from working as expected.
The text file contains:
//Intro
previouslyNoted=false
The code is meant to copy the word "false" into a string:
std::fstream stats("text.txt", std::ios::out | std::ios::in);
//String that will hold the contents of the file
std::string statsStr = "";
//Integer to hold the index of the phrase we want to extract
int index = 0;
//COPY CONTENTS OF FILE TO STRING
while (!stats.eof())
{
static std::string tempString;
stats >> tempString;
statsStr += tempString + " ";
}
//FIND AND COPY PHRASE
index = statsStr.find("previouslyNoted="); //index is equal to 8
//Place the get pointer where "false" is expected to be
stats.seekg(index + strlen("previouslyNoted=")); //get pointer is placed at 24th index
//Copy phrase
stats >> previouslyNotedStr;
//Output phrase
std::cout << previouslyNotedStr << std::endl;
But for whatever reason, the program outputs:
=false
What I expected to happen:
I believe that I placed the get pointer at the 24th index of the file, which is where the word "false" begins. The program should then have read from that index onward until it reached a whitespace character or the end of the file.
What actually happened:
For some reason, the get pointer starts one index earlier than expected, and I'm not sure why. An explanation of what is going wrong, or what I'm doing wrong, would be much appreciated.
Also, I do understand that I could simply make previouslyNotedStr a substring of statsStr, starting from where I wish, and I've already tried that with success. I'm really just experimenting here.

The Visual C++ tag means you are on Windows. On Windows, the end of a line takes two characters in the file (\r\n). When you read the file one string at a time, this end-of-line sequence is treated as a delimiter, and your loop replaces it with a single space character.
Therefore, after you read the file, statsStr does not match the contents of the file. Everywhere there is a newline in the file, you have replaced two characters with one. Hence, when you use seekg to position yourself in the file based on offsets you got from statsStr, you end up in the wrong place.
Even if you get the newline handling right, you will still run into problems if the file contains two or more consecutive whitespace characters, because your read loop collapses them into a single space character.
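To see the offset shift concretely, here is a small sketch that rebuilds the string the way the question's loop does and compares offsets. It assumes the file bytes are exactly `//Intro\r\npreviouslyNoted=false` (a Windows line ending); the helper name `collapseWhitespace` is made up for illustration.

```cpp
#include <sstream>
#include <string>

// Rebuilds a string the way the question's read loop does: every
// whitespace-separated word followed by exactly one space. Runs of
// whitespace, including "\r\n", shrink to a single ' '.
std::string collapseWhitespace(const std::string& raw) {
    std::istringstream in(raw);
    std::string word;
    std::string result;
    while (in >> word) {  // operator>> skips spaces, tabs, '\r' and '\n'
        result += word + " ";
    }
    return result;
}
```

With that input, `collapseWhitespace(raw).find("previouslyNoted=")` is 8, while the key really starts at byte 9 of the file, so seeking to 8 + 16 = 24 lands on the `=` instead of the `f`, which matches the `=false` output in the question.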

You are reading the file word by word. There are better methods:
while (getline(stats, statsStr))
{
// An entire line is read into statsStr.
std::string::size_type posn = statsStr.find("previouslyNoted=");
// ...
}
By reading entire text lines into a string, there is no need to reposition the file.
Also, there is a whitespace issue when reading by word. This will affect where you think the text is in the file. For example, whitespace is skipped, and there is no telling how many spaces, newlines or tabs were skipped.
By the way, don't even think about replacing the text in the same file. Replacement of text only works if the replacement text has the same length as the original text in the file. Write to a new file instead.
Edit 1:
A better method is to declare your key strings as arrays. This helps with computing positions within a string:
static const char key_text[] = "previouslyNoted=";
while (getline(stats, statsStr))
{
std::string::size_type key_position = statsStr.find(key_text);
if (key_position == std::string::npos) continue; // key is not on this line
std::string::size_type value_position = key_position + sizeof(key_text) - 1; // -1 for the nul terminator.
// value_position points to the character after the '='.
// ...
}
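Putting the getline() loop and the offset arithmetic together, a minimal sketch might look like this. The function name `findValue` and its signature are hypothetical; it takes the key as a `std::string`, so the length comes from `key.size()` instead of `sizeof(...) - 1`, and it ends the value at the first space, tab or `'\r'`.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for testing with in-memory text
#include <string>

// Scans `in` line by line for `key` (e.g. "previouslyNoted=") and copies the
// text that follows it, up to the next space, tab or '\r', into `value`.
// Returns false if no line contains the key.
bool findValue(std::istream& in, const std::string& key, std::string& value) {
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type key_position = line.find(key);
        if (key_position == std::string::npos) {
            continue;  // key not on this line
        }
        std::string::size_type value_position = key_position + key.size();
        // '\r' is a stop character in case a Windows file is read without
        // text-mode newline translation.
        std::string::size_type end = line.find_first_of(" \t\r", value_position);
        value = line.substr(value_position,
                            end == std::string::npos ? std::string::npos
                                                     : end - value_position);
        return true;
    }
    return false;
}
```

No seekg() call is needed at all: the offsets are computed inside the line that was just read, so they cannot drift from the file contents.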
You may want to save programming time by making your data file conform to an existing format, such as INI or XML, and using appropriate libraries to parse it.

Related

Reading from a file without skipping whitespaces

I'm trying to write a program that replaces one given word in a file with another one. The program copies the file word by word: if a word is a normal word, it writes it to the output file unchanged, and if it is the word I need to change, it writes the replacement instead. However, I've encountered a problem: the program does not put whitespace where it is in the input file. I don't know how to solve this, and I have no idea whether I can use noskipws, since then I wouldn't know where the file ends.
Please keep in mind I'm a complete newbie and I have no idea how things work. I don't know if the tags are visible enough, so I will mention again that I use C++
Since each word you read ends at either whitespace or the end of the file, you could simply check whether what stopped your read was the end of the file or a whitespace character:
if ( reached the end of file ) {
// What I have encountered is end of file
// My job is done
} else {
// What I have encountered is a whitespace
// I need to output a whitespace and back to work
}
The remaining problem is how to check for the end of file (eof).
Since you are using ifstream, this is quite simple.
When an ifstream hits the end of the file while reading (for example, because the last word is not followed by any whitespace), ifstream::eof() returns true.
Let's assume the ifstream instance that you have is called input.
if ( input.eof() == true ) {
// What I have encountered is end of file
// My job is done
} else {
// What I have encountered is a whitespace
// I need to output a whitespace and back to work
}
PS: ifstream::good() returns false when the end of file has been reached or an error has occurred, so checking whether input.good() == false can be a better choice here.
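The eof() check tells you that whitespace (rather than the end of the file) stopped the read, but not which whitespace character it was, so tabs and newlines still cannot be reproduced exactly. A sketch of a different technique reads one character at a time with get(), which does not skip whitespace, so every space, tab and newline is copied unchanged; the loop ends by itself when get() fails at the end of the file. The names `replaceWord`, `from` and `to` are illustrative, and only whole words equal to `from` are replaced.

```cpp
#include <cctype>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream for in-memory testing
#include <string>

// Copies `in` to `out` one character at a time, replacing every whole word
// equal to `from` with `to`. Whitespace is copied exactly as it appears.
void replaceWord(std::istream& in, std::ostream& out,
                 const std::string& from, const std::string& to) {
    std::string word;
    // Writes the pending word (replaced if it matches) and clears it.
    auto flush = [&]() {
        if (!word.empty()) {
            out << (word == from ? to : word);
            word.clear();
        }
    };
    char c;
    while (in.get(c)) {  // get() fails only at end of file or on an error
        if (std::isspace(static_cast<unsigned char>(c))) {
            flush();
            out << c;    // keep the original space, tab or newline
        } else {
            word += c;
        }
    }
    flush();             // the file may end without trailing whitespace
}
```

With an `std::ifstream` for `in` and an `std::ofstream` for `out`, this writes the modified copy to a separate file, in line with the advice below.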
First, I would advise you not to read from and write to the same file (at least not while you are still reading it), because that makes your program much more difficult to write and to understand.
Second, if you want to keep all whitespace, the easiest way is to read whole lines with getline().
A program that copies text from one file to another while modifying words could look something like the following:
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

void read_file()
{
ifstream file_read;
ofstream file_write;
// File from which you read some text.
file_read.open ("read.txt");
// File in which you will save modified text.
file_write.open ("write.txt");
string line;
// Word that you look for to modify.
string word_to_modify = "something";
string word_new = "something_new";
// You need to look in every line of the input file.
// getline() reads one line per call, from the beginning of the file to the end.
while ( getline (file_read,line) ) {
string::size_type index = line.find(word_to_modify);
// Replace every occurrence of the target word in this line.
while (index != string::npos) {
line.replace(index, word_to_modify.length(), word_new);
// Continue searching after the inserted text so it is not matched again.
index = line.find(word_to_modify, index + word_new.length());
}
cout << line << '\n';
file_write << line << '\n';
}
file_read.close();
file_write.close();
}

C++ cout char 'return' character from file appears twice

I'm trying to create a program that encrypts files based on how Nazi Germany's Enigma machine worked, but without the flaw :P.
I have a function that gets the character at position n in a file, but when it returns a return character and I cout << it, it's as if it hit Enter twice.
I.e., if I loop over positions i++ in a file and cout each character, the individual lines in the terminal are separated by more than one line break.
Here's the function:
char charN(string pathOf, int pointIn){
char r = '\0';
// '\0' so I can tell when it doesn't return a character.
int sizeOf; //to store the found size of the file.
ifstream cf; //to store the Character Found.
ifstream siz; //used later to get the size of the file
siz.open(pathOf.c_str());
siz.seekg(0, std::ios::end);
sizeOf = siz.tellg(); // these get the length of the file and put it in sizeOf.
cf.open(pathOf.c_str());
if(cf.is_open() && pointIn < sizeOf){ //if not open, or if the character to get is farther out than the size of the file, let the function return the error value '\0'.
cf.seekg(pointIn); // move to the point in the file where the character should be, get it, and get out.
cf.get(r);
cf.close();
}
return r;
}
It works correctly if I use cout << '\n', but what's different about returns from a file and '\n'?
Or is there something else I'm missing?
I've been googling about but I can't find anything remotely similar to my problem, thanks in advance.
I'm using Code::Blocks 13.12 as my compiler if that matters.
Is this on a Windows machine? On Windows, newlines in text files are represented by \r\n.
\r = carriage return
\n = line feed
Most likely each line break is being printed twice: because charN seeks to every byte offset, it visits both the \r and the \n, and each of those reads can come back as a line break, so every newline in the file shows up twice in the output.
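One way to sidestep this, sketched under the assumption that the whole file fits in memory: open it with `std::ios::binary`, read it sequentially once, drop the `'\r'` of each `"\r\n"` pair, and then index the resulting string instead of calling seekg() for every position. The helper name `readWithUnixNewlines` is hypothetical.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for testing with in-memory text
#include <string>

// Reads every character from `in` (ideally opened with std::ios::binary)
// and drops the '\r' of each "\r\n" pair, so every line break becomes a
// single '\n'. A lone '\r' that is not followed by '\n' is kept.
std::string readWithUnixNewlines(std::istream& in) {
    std::string text;
    char c;
    while (in.get(c)) {
        if (c == '\r' && in.peek() == '\n') {
            continue;  // skip the carriage return; the '\n' is read next
        }
        text += c;
    }
    return text;
}
```

For example, `std::ifstream f(pathOf.c_str(), std::ios::binary); std::string text = readWithUnixNewlines(f);` after which `text[pointIn]` plays the role of `charN(pathOf, pointIn)`, with each line break appearing once as `'\n'`. Note that positions in `text` then no longer equal byte offsets in the file.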

Count the number of unique words and occurrence of each word

CSCI-15 Assignment #2, String processing. (60 points) Due 9/23/13
You MAY NOT use C++ string objects for anything in this program.
Write a C++ program that reads lines of text from a file using the ifstream getline() method, tokenizes the lines into words ("tokens") using strtok(), and keeps statistics on the data in the file. Your input and output file names will be supplied to your program on the command line, which you will access using argc and argv[].
You need to count the total number of words, the number of unique words, the count of each individual word, and the number of lines. Also, remember and print the longest and shortest words in the file. If there is a tie for longest or shortest word, you may resolve the tie in any consistent manner (e.g., use either the first one or the last one found, but use the same method for both longest and shortest). You may assume the lines comprise words (contiguous lower-case letters [a-z]) separated by spaces, terminated with a period. You may ignore the possibility of other punctuation marks, including possessives or contractions, like in "Jim's house". Lines before the last one in the file will have a newline ('\n') after the period. In your data files, omit the '\n' on the last line. You may assume that the lines will be no longer than 100 characters, the individual words will be no longer than 15 letters and there will be no more than 100 unique words in the file.
Read the lines from the input file, and echo-print them to the output file. After reaching end-of-file on the input file (or reading a line of length zero, which you should treat as the end of the input data), print the words with their occurrence counts, one word/count pair per line, and the collected statistics to the output file. You will also need to create other test files of your own. Also, your program must work correctly with an EMPTY input file – which has NO statistics.
Test file looks like this (exactly 4 lines, with NO NEWLINE on the last line):
the quick brown fox jumps over the lazy dog.
now is the time for all good men to come to the aid of their party.
all i want for christmas is my two front teeth.
the quick brown fox jumps over a lazy dog.
Copy and paste this into a small file for one of your tests.
Hints:
Use a 2-dimensional array of char, 100 rows by 16 columns (why not 15?), to hold the unique words, and a 1-dimensional array of ints with 100 elements to hold the associated counts. For each word, scan through the occupied lines in the array for a match (use strcmp()), and if you find a match, increment the associated count, otherwise (you got past the last word), add the word to the table and set its count to 1.
The separate longest word and the shortest word need to be saved off in their own C-strings. (Why can't you just keep a pointer to them in the tokenized data?)
Remember – put NO NEWLINE at the end of the last line, or your test for end-of-file might not work correctly. (This may cause the program to read a zero-length line before seeing end-of-file.)
This is not a long program – no more than about 2 pages of code
Here is what I have so far:
#include<iostream>
#include<iomanip>
#include<fstream>
#include<string>
#include<cstring>
using namespace std;
void totalwordCount(ifstream &inputFile)
{
char words[100][16]; // Holds the unique words.
char *token;
int totalCount = 0; // Counts the total number of words.
// Read every word in the file.
while(inputFile >> words[99])
{
totalCount++; // Increment the total number of words.
// Tokenize each word and remove spaces, periods, and newlines.
token = strtok(words[99], " .\n");
while(token != NULL)
{
token = strtok(NULL, " .\n");
}
}
cout << "Total number of words in file: " << totalCount << endl;
}
void uniquewordCount(ifstream &inputFile)
{
char words[100][16]; // Holds the unique words
int counter[100];
char *tok = "0";
int uniqueCount = 0; // Counts the total number of unique words
while(!inputFile.eof())
{
uniqueCount++;
tok = strtok(words[99], " .\n");
while(tok != NULL)
{
tok = strtok(NULL, " .\n");
inputFile >> words[99];
if(strcmp(tok, words[99]) == 0)
{
counter[99]++;
}
else
{
words[99][15] += 1;
}
uniqueCount++;
}
}
cout << counter[99] << endl;
}
int main(int argc, char *argv[])
{
ifstream inputFile;
char inFile[12] = "string1.txt";
char outFile[16] = "word result.txt";
// Get the name of the file from the user.
cout << "Enter the name of the file: ";
cin >> inFile;
// Open the input file.
inputFile.open(inFile);
// If successfully opened, process the data.
if(inputFile)
{
while(!inputFile.eof())
{
totalwordCount(inputFile);
uniquewordCount(inputFile);
}
}
return 0;
}
I already took care of how to count the total number of words in the file in the totalwordCount() function, but in the uniquewordCount() function, I am having trouble counting the total number of unique words and counting the number of occurrences of each word. Is there anything that I need to change in the uniquewordCount() function?
This program contains several issues which are to be considered harmful! To prevent bad software being created based on entirely nonsensical assignments like the above, here are a number of hints:
Always test the stream for success after reading from it. Using in.eof() to determine if the stream is in a good state does not work! One of the problems is that you will get an infinite loop if the stream goes bad for a reason other than end of file, e.g., a failure to correctly parse a value (this sets std::ios_base::failbit but not std::ios_base::eofbit).
Reading into a fixed-size char array a using in >> a without having set a limit on the number of characters to be read is the C++ way to spell gets()! If you really think that using in >> a is the right way to go (see next item), you absolutely need to set the array's width, e.g., using in >> std::setw(sizeof(a)) >> a. You still need to check that this extraction was successful, of course.
From the looks of it, your teacher wants you to actually use std::istream::getline() to read the array, e.g., using in.getline(a, sizeof(a)) (which, of course, needs to be checked for success).
Note that the formatted input, i.e., in >> a already tokenizes the stream being received by spaces! There is no need to faff about with strtok() after that.
Once you have consumed a stream, it is consumed. Assuming the characters don't come from a file but rather from something like standard input, you also can't rewind the stream to read it again. I'd think you want to tokenize the values once and use them for both purposes.
This is more of a sidenote: after you created a stream, its nature should be entirely immaterial for the processing of the stream's content (although, e.g., for string streams you might want to eventually collect the result using the str() member): implement your stream processing functions in terms of std::istream rather than std::ifstream!
Since you have a concrete question ("Is there anything that I need to change in the uniquewordCount() function?"): yes, everything! Throw away this function entirely and rethink what you need to do. Basically, the structure of the functionality should be along the lines of
char buffer[100];
while (in.getline(buffer, sizeof(buffer))) {
// tokenize buffer into words
// for each word check if it already exists
// if the word does not exist, append it to the array of known words and set count to 1
// if the word exists, increment the count
// determine if the word is shorter or longer than the shortest or longest word so far
// if it is the case, remember the word's index or a pointer to it
}
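A sketch of that structure under the assignment's constraints (C strings only, at most 100 unique words of at most 15 letters, lines of at most 100 characters). The names `WordStats`, `collectStats` and `copyWord` are illustrative, not part of the assignment; ties for longest and shortest keep the first word found. The words are copied into their own arrays because strtok() writes into `buffer`, which the next getline() overwrites.

```cpp
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>  // std::istringstream, only for feeding in-memory test text

const int kMaxWords = 100;  // at most 100 unique words (assignment limit)
const int kWordSize = 16;   // 15 letters + terminating '\0'
const int kLineSize = 101;  // 100 characters + terminating '\0'

struct WordStats {
    char words[kMaxWords][kWordSize];  // unique words, in order of first appearance
    int counts[kMaxWords];             // counts[i] belongs to words[i]
    int unique = 0;
    int total = 0;
    int lines = 0;
    char longest[kWordSize] = "";
    char shortest[kWordSize] = "";
};

// Copies at most kWordSize - 1 characters of src into dst and terminates it.
static void copyWord(char* dst, const char* src) {
    std::strncpy(dst, src, kWordSize - 1);
    dst[kWordSize - 1] = '\0';
}

void collectStats(std::istream& in, WordStats& s) {
    char buffer[kLineSize];
    // Stop at end of file, on an error, or at a zero-length line.
    while (in.getline(buffer, sizeof(buffer)) && buffer[0] != '\0') {
        ++s.lines;
        // '\r' is a delimiter in case the file has Windows line endings.
        for (char* tok = std::strtok(buffer, " .\r"); tok != nullptr;
             tok = std::strtok(nullptr, " .\r")) {
            ++s.total;
            int i = 0;
            while (i < s.unique && std::strcmp(s.words[i], tok) != 0) {
                ++i;
            }
            if (i < s.unique) {
                ++s.counts[i];              // word seen before
            } else if (s.unique < kMaxWords) {
                copyWord(s.words[i], tok);  // new word: add it with count 1
                s.counts[i] = 1;
                ++s.unique;
            }
            // Strict comparisons keep the first longest/shortest word on ties.
            std::size_t len = std::strlen(tok);
            if (s.total == 1 || len > std::strlen(s.longest)) copyWord(s.longest, tok);
            if (s.total == 1 || len < std::strlen(s.shortest)) copyWord(s.shortest, tok);
        }
    }
}
```

Printing the word/count pairs is then a loop over the first `unique` entries of `words` and `counts`. For the four-line test file in the assignment, this should give 4 lines, 44 words, 29 unique words, "the" counted 5 times, "christmas" as the longest word and "i" as the shortest.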

Reading a text file in c++

string numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
inputFile >> numbers;
inputFile.close();
cout << numbers;
And my text.txt file is:
1 2 3 4 5
Basically, a set of integers separated by tabs.
The problem is that the program only reads the first integer in text.txt and ignores the rest. If I remove the tabs between the integers it works fine, but with tabs between them it doesn't. What causes this? As far as I know, it should ignore all whitespace characters, or am I mistaken? If so, is there a better way to get each of these numbers from the text file?
When reading formatted strings, the input operator starts by skipping leading whitespace. Then it reads non-whitespace characters up to the first whitespace character (space, tab or newline) and stops. The non-whitespace characters get stored in the std::string. If there are only whitespace characters before the stream reaches end of file (or some error, for that matter), reading fails. Thus, your program reads one "word" (in this case a number) and stops reading.
Unfortunately, you only said what you are doing and what the problems are with your approach (and your problem description fails to cover the case where reading the input fails in the first place). Here are a few things you might want to try:
If you want to read multiple words, you can do so, e.g., by reading all words:
std::vector<std::string> words;
std::copy(std::istream_iterator<std::string>(inputFile),
std::istream_iterator<std::string>(),
std::back_inserter(words));
This will read all words from inputFile and store them as a sequence of std::strings in the vector words. Since your file contains numbers, you might want to replace std::string by int to read the numbers in a readily usable form.
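Applied to the question's tab-separated file, the `int` version could be wrapped like this (a sketch; `readInts` is a made-up name). operator>> for `int` skips spaces, tabs and newlines alike, and the iterator stops at end of file or at the first token that is not an integer.

```cpp
#include <istream>
#include <iterator>
#include <sstream>  // std::istringstream, handy for testing with in-memory text
#include <vector>

// Reads every whitespace-separated integer from `in`, stopping at end of
// file or at the first token that cannot be parsed as an int.
std::vector<int> readInts(std::istream& in) {
    return std::vector<int>(std::istream_iterator<int>(in),
                            std::istream_iterator<int>());
}
```

Calling `readInts(inputFile)` on a file containing `1\t2\t3\t4\t5` yields the five values 1 to 5.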
If you want to read a line rather than a word you can use std::getline() instead:
if (std::getline(inputFile, line)) { ... }
If you want to read multiple lines, you'd put this operation into a loop. There is, unfortunately, no ready-made approach to reading a sequence of lines as there is for words.
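A sketch of such a loop (`readLines` is an illustrative name):

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for testing with in-memory text
#include <string>
#include <vector>

// Reads `in` line by line. std::getline() returns the stream, which converts
// to false once no further line can be read.
std::vector<std::string> readLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

A trailing newline at the end of the input does not produce an extra empty line, because the final getline() call extracts nothing and fails.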
If you want to read the entire file, not just the first line, into a string, you can also use std::getline(), but you'd need to know a character value that doesn't occur in your file, e.g., the null character:
if (std::getline(inputFile, text, char())) { ... }
This approach considers a "line" a sequence of characters up to a null character. You can use any other character value as well. If you can't be sure about the character values, you can read an entire file using std::string's constructor taking iterators:
std::string text((std::istreambuf_iterator<char>(inputFile)),
std::istreambuf_iterator<char>());
Note that the extra pair of parentheses around the first argument is, unfortunately, necessary (if you are using C++11, you can avoid them by using braces instead of parentheses).
Use getline to do the reading.
string numbers;
if (inputFile.is_open())//checking if open
{
getline (inputFile,numbers); //fetches entire line into string numbers
inputFile.close();
}
Your program behaves exactly as you describe: inputFile >> numbers; extracts only the first whitespace-separated token from the input file. If you remove the tabs, inputFile >> numbers extracts the single token 12345, not the five numbers [1, 2, 3, 4, 5].
A character-by-character method (it only handles single-digit numbers):
vector< int > numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
char c;
while (inputFile.get(c)) // loop while a character can be extracted from the file
{
if ( c >= '0' && c <= '9' ) // skip tabs, spaces, newlines and anything else that is not a digit
{
numbers.push_back( c - '0' ); // store the digit's numeric value, not its character code
}
}
inputFile.close();

Why is this char stopping my program?

Does the newline character have some kind of special significance in c++? Is it a non-ASCII character?
I'm trying to build a Markov chain for each unique n-character substring within a larger piece of text. Every time I come across a new unique substring I enter it into a map whose value is a 256-element vector (one element for each character in the extended ASCII table).
There's no problem when I print out the entire contents of the file ("lines" is a vector of lines of text built using ifstream and getline):
for(int i=0; i<lines.size(); i++) cout << lines[i] << endl;
The whole text file shows up in the console. The problem happens when I try to return the newline character to a function that's expecting a char. MOVESPACES is an integer constant that determines how many characters further ahead to move in the vector of strings on each iteration.
char GetNextChar(int row, int col){
for (int i=0; i<MOVESPACES; i++) {
if (col+1<lines[row].size()) { // If you're not at the end of the line keep going
col+=1;
} else { // Otherwise, move to the beginning of the next row
row+=1;
col=0;
}
}
return lines[row].at(col);
}
I've walked through with the debugger, and when it gets to the 1st column of the 2nd line it craps out on me – no error or anything. It fails within this function, not the calling function.
The file I'm using is A Christmas Carol (first thing that came up on Project Gutenberg). For reference here are the first few lines:
STAVE I: MARLEY'S GHOST
MARLEY was dead: to begin with. There is no doubt
whatever about that. The register of his burial was
The function breaks when it should return the first character on the second line. This doesn't happen if I get rid of the newline, or if I build the "lines" vector myself line by line in the program. Any idea what's wrong?
Your GetNextChar function is assuming that if you are at the last character in some line, there will be a character in the next line. What happens if there is no character in that next line? This can happen in two places: When you have hit end of file, or when the next line is the empty string.
The second line is the empty string.
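A sketch of a version that handles both cases, under a few assumptions not in the question: `lines` is passed in instead of being a global, the step count is a parameter instead of the MOVESPACES constant, and success is reported through the return value. Empty lines are skipped; if the Markov chain should see line breaks as characters, returning '\n' at the end of each line would be the alternative design.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Advances `moveSpaces` characters from (row, col), skipping empty lines, and
// stores the character found in `result`. Returns false if the starting
// position is invalid or the move runs past the last character of the text.
bool GetNextChar(const std::vector<std::string>& lines, std::size_t row,
                 std::size_t col, int moveSpaces, char& result) {
    if (row >= lines.size() || col >= lines[row].size()) {
        return false;
    }
    for (int i = 0; i < moveSpaces; i++) {
        if (col + 1 < lines[row].size()) {
            col += 1;  // not at the end of the line: keep going
        } else {
            // Otherwise move to the start of the next non-empty line.
            do {
                row += 1;
            } while (row < lines.size() && lines[row].empty());
            if (row == lines.size()) {
                return false;  // ran off the end of the text
            }
            col = 0;
        }
    }
    result = lines[row][col];
    return true;
}
```

With `lines = {"ab", "", "cd"}` and a start at (0, 0), moving 2 characters skips the empty second line and yields 'c', while moving 4 runs past the end and returns false.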