Avoid \r\n while reading text from a binary file - c++

I have a binary file that packs many files together (something like a .tar), containing both binary and text files.
When processing in-memory strings, line endings are usually '\n', but when I read the text part of this packed file, I get "\r\n", so processing this text gives me errors.
Here is the code for reading the text from a binary file:
FILE* _fileDescriptor; // it's always open to improve performance
fopen_s(&_fileDescriptor, _filePath.string().c_str(), "rb");
char* data = new char[size + 1]; // size is a known and correct value
fseek(_fileDescriptor, begin, SEEK_SET); // begin is another known value, where the file starts inside the packed one
fread(data, sizeof(char), size, _fileDescriptor);
data[it->second.size] = '\0';
This reads the right text into data, but the following code gives me an error when reading an empty line:
istringstream ss(data); // create a stringstream to process it in another function
delete[] data; // free the data buffer
// start processing the file
string line;
getline(infile, line); // read an empty line
if(line.size() > 0) {
/*
enters here, because the "empty" line was "\r\n", and now the value of line is '\r', therefore line.size() == 1
*/
...
So, any advice to avoid the '\r'?
I edited it in Notepad++. Changing its configuration to use '\n' instead of '\r\n' as the line ending works, but I don't want to depend on this because other people can miss it, and the problem would be very hard to spot if that happens.

Probably easiest to trim the '\r' characters out of your string and then discard blank lines. See this answer for approaches to trimming a std::string (I'm assuming that's what 'line' is):
What's the best way to trim std::string?

Related

seekg() not working as expected

I have a small program, that is meant to copy a small phrase from a file, but it appears that I am either misinformed as to how seekg() works, or there is a problem in my code preventing the function from working as expected.
The text file contains:
//Intro
previouslyNoted=false
The code is meant to copy the word "false" into a string
std::fstream stats("text.txt", std::ios::out | std::ios::in);
//String that will hold the contents of the file
std::string statsStr = "";
//Integer to hold the index of the phrase we want to extract
int index = 0;
//COPY CONTENTS OF FILE TO STRING
while (!stats.eof())
{
static std::string tempString;
stats >> tempString;
statsStr += tempString + " ";
}
//FIND AND COPY PHRASE
index = statsStr.find("previouslyNoted="); //index is equal to 8
//Place the get pointer where "false" is expected to be
stats.seekg(index + strlen("previouslyNoted=")); //get pointer is placed at 24th index
//Copy phrase
stats >> previouslyNotedStr;
//Output phrase
std::cout << previouslyNotedStr << std::endl;
But for whatever reason, the program outputs:
=false
What I expected to happen:
I believe that I placed the get pointer at the 24th index of the file, which is where the phrase "false" begins. The program would then read from that index onward until it met a whitespace character or the end of the file.
What actually happened:
For whatever reason, the get pointer started an index before expected. And I'm not sure as to why. An explanation as to what is going wrong/what I'm doing wrong would be much appreciated.
Also, I do understand that I could simply make previouslyNotedStr a substring of statsStr, starting from where I wish, and I've already tried that with success. I'm really just experimenting here.
The VisualC++ tag means you are on Windows. On Windows, the end of a line takes two characters (\r\n). When you read the file one word at a time, this end-of-line sequence is treated as a delimiter and you replace it with a single space character.
Therefore, after you read the file, your statsStr does not match the contents of the file. Everywhere there is a new line in the file, you have replaced two characters with one. Hence, when you use seekg to position yourself in the file based on offsets computed from statsStr, you end up in the wrong place.
Even if you get the new line handling correct, you will still encounter problems if the file contains two or more consecutive white space characters, because these will be collapsed into a single space character by your read loop.
You are reading the file word by word. There are better methods:
while (getline(stats, statsStr))
{
// An entire line is read into statsStr.
std::string::size_type posn = statsStr.find("previouslyNoted=");
// ...
}
By reading entire text lines into a string, there is no need to reposition the file.
Also, there is a white-space issue when reading by word. This will affect where you think the text is in the file. For example, white space is skipped, and there is no telling how many spaces, newlines or tabs were skipped.
By the way, don't even think about replacing the text in the same file. Replacement of text only works if the replacement text has the same length as the original text in the file. Write to a new file instead.
Edit 1:
A better method is to declare your key strings as arrays. This helps with positioning within a string:
static const char key_text[] = "previouslyNoted=";
while (getline(stats, statsStr))
{
std::string::size_type key_position = statsStr.find(key_text);
std::string::size_type value_position = key_position + sizeof(key_text) - 1; // - 1 excludes the nul terminator counted by sizeof.
// value_position points to the character after the '='.
// ...
}
You may want to save programming time by making your data file conform to an existing format, such as INI or XML, and using appropriate libraries to parse them.

How to read the content of a file and save it to a string variable? Why is there empty space?

This is how I get the name of the file from the command line, open the file, and save its contents line by line into a string. Everything works fine except for three empty spaces at the beginning of the output. Can anyone say why these empty spaces occur and how I can ignore them?
string filename = "input.txt";
char *a=new char[filename.size()+1];
a[filename.size()]=0;
memcpy(a,filename.c_str(),filename.size());
ifstream fin(a);
if(!fin.good()){
cout<<" = File does not exist ->> No File for reading\n";
exit(1);
}
string s;
while(!fin.eof()){
string tmp;
getline(fin,tmp);
s.append(tmp);
if(s[s.size()-1] == '.')
{
//Do nothing
}
else
{
s.append(" ");
}
cout<<s<<endl;
The most probable cause is that your file is encoded in something other than ASCII. It contains a few unprintable bytes, and the string you see on the screen is the result of your terminal interpreting those bytes. To confirm this, print the size of s after the reading is done. It should be larger than the number of characters you see on the screen.
Other issues:
string filename = "input.txt";
char *a=new char[filename.size()+1];
a[filename.size()]=0;
memcpy(a,filename.c_str(),filename.size());
ifstream fin(a);
is quite an overzealous way to go about it. Just write ifstream fin(filename.c_str());, or simply ifstream fin(filename); in C++11.
Next,
while(!fin.eof()){
is almost surely a bug. eof() does not tell you if the next read will succeed, only whether the last one reached eof or not. Using it this way will typically result in the last line seemingly being read twice.
Always, always, check for success of a read operation before you use the result. That's idiomatically done by putting getline in the loop condition: while (getline(fin, tmp))

Trying to read the whole text file

I'm trying to read the whole contents of a .txt file at once, not line by line,
and show it on screen inside a text field in Xcode.
I'm using a mix of Obj-C and C++:
while(fgets(buff, sizeof(buff), in)!=NULL){
cout << buff; // this print the whole output in the console
NSString * string = [ NSString stringWithUTF8String:buff ] ;
[Data setStringValue:string]; // but this line only print last line inside the textfield instead of printing it all
}
I'm trying to print the whole contents of the file like:
something...
something...
etc...
but instead it is just printing the last line to the text field. Please help me.
Is there a reason you aren't using Obj-C to read the file? It would be as simple as:
NSData *d = [NSData dataWithContentsOfFile:filename];
NSString *s = [[[NSString alloc] initWithData:d encoding:NSUTF8StringEncoding] autorelease];
[Data setStringValue:s];
Edit: To use the code you have now I would try something like this:
while(fgets(buff, sizeof(buff), in)!=NULL){
NSMutableString *s = [[Data stringValue] mutableCopy];
[s appendString:[NSString stringWithUTF8String:buff]];
[Data setStringValue:s];
}
Read a file, return the content as a C++ string:
// open the file
std::ifstream is;
is.open(fn.c_str(), std::ios::binary);
// put the content in a C++ string
std::string str((std::istreambuf_iterator<char>(is)),
std::istreambuf_iterator<char>());
In your code you are using the C api (FILE* from cstdio). In C, the code is more complex:
char * buffer = 0; // to be filled with the entire content of the file
long length;
FILE * f = fopen (filename, "rb");
if (f) // if the file was correctly opened
{
fseek (f, 0, SEEK_END); // seek to the end
length = ftell (f); // get the length of the file
fseek (f, 0, SEEK_SET); // seek back to the beginning
buffer = malloc (length); // allocate a buffer of the correct size
if (buffer) // if allocation succeeded
{
fread (buffer, 1, length, f); // read 'length' octets
}
fclose (f); // close the file
}
To answer the question of why your solution didn't work:
[Data setStringValue:string]; // but this line only print last line inside the textfield instead of printing it all
Assuming that Data refers to a text field, setStringValue: replaces the entire contents of the field with the string you passed in. Your loop reads and sets one line at a time, so at any given time, string is one line from the file.
Views only get told to display when you're not doing anything else on the main thread, so your loop—assuming that you didn't run it on another thread or queue—does not print one line at a time. You read each line and replace the text field's contents with that line, so when your loop finishes, the field is left with the last thing you set its stringValue to—the last line from the file.
Slurping the entire file at once will work, but a couple of problems remain:
Text fields aren't meant for displaying multiple lines. No matter how you read the file, you're still putting its contents in a place that isn't designed for such contents.
If the file is large enough, reading it will take a significant amount of time. If you do this on the main thread, then, during that time, the app will be hung.
A proper solution would be:
Use a text view, not a text field. Text views are built to work with text of any number of lines, and when you create one in a nib, it comes wrapped in a scroll view for free.
Read the file one line or other limited-size chunk at a time, but not in a for or while loop. Use NSFileHandle or dispatch_source, either of which will call a block you provide whenever they read another chunk of the file.
Append each chunk to the text view's storage instead of replacing the entire text with it.
Show a progress indicator when you start reading, then hide it when you finish reading. For extra credit, make it a determinate progress bar, showing the user how far you've gotten through the file.

Reading a text file in c++

string numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
inputFile >> numbers;
inputFile.close();
cout << numbers;
And my text.txt file is:
1 2 3 4 5
basically a set of integers separated by tabs.
The problem is the program only reads the first integer in the text.txt file and ignores the rest for some reason. If I remove the tabs between the integers it works fine, but with tabs between them, it won't work. What causes this? As far as I know it should ignore any white space characters or am I mistaken? If so is there a better way to get each of these numbers from the text file?
When reading formatted strings, the input operator starts by skipping leading whitespace. Then it reads non-whitespace characters up to the first whitespace character and stops. The non-whitespace characters get stored in the std::string. If there are only whitespace characters before the stream reaches end of file (or some error for that matter), reading fails. Thus, your program reads one "word" (in this case a number) and stops reading.
Unfortunately, you only said what you are doing and what the problems are with your approach (and your problem description failed to cover the case where reading the input fails in the first place). Here are a few things you might want to try:
If you want to read multiple words, you can do so, e.g., by reading all words:
std::vector<std::string> words;
std::copy(std::istream_iterator<std::string>(inputFile),
std::istream_iterator<std::string>(),
std::back_inserter(words));
This will read all words from inputFile and store them as a sequence of std::strings in the vector words. Since your file contains numbers, you might want to replace std::string by int to read numbers in a readily accessible form.
If you want to read a line rather than a word you can use std::getline() instead:
if (std::getline(inputFile, line)) { ... }
If you want to read multiple lines, you'd put this operation into a loop. There is, unfortunately, no ready-made approach to read a sequence of lines as there is for words.
If you want to read the entire file, not just the first line, into a string, you can also use std::getline() but you'd need to know about one character value which doesn't occur in your file, e.g., the null value:
if (std::getline(inputFile, text, char())) { ... }
This approach considers a "line" a sequence of characters up to a null character. You can use any other character value as well. If you can't be sure about the character values, you can read an entire file using std::string's constructor taking iterators:
std::string text((std::istreambuf_iterator<char>(inputFile)),
std::istreambuf_iterator<char>());
Note that the extra pair of parentheses around the first parameter is, unfortunately, necessary (if you are using C++ 2011 you can avoid them by using braces instead of parentheses).
Use getline to do the reading.
string numbers;
if (inputFile.is_open())//checking if open
{
getline (inputFile,numbers); //fetches entire line into string numbers
inputFile.close();
}
Your program behaves exactly as you describe: inputFile >> numbers; extracts only the first whitespace-delimited token from the input file, so if you remove the tabs, inputFile >> will extract the single token 12345, not the five numbers [1,2,3,4,5].
A better method:
vector< int > numbers;
string fileName = "text.txt";
ifstream inputFile;
inputFile.open(fileName.c_str(),ios_base::in);
int n;
while (inputFile >> n) // loop while extracting an integer succeeds; tabs, spaces and newlines are skipped
{
numbers.push_back(n); // store the parsed value, not a character code
}
inputFile.close();

Need help about monitoring txt file and reading new(last) entry(word) from that txt file

This is my first contact with C++. I have to make a program that monitors a .txt or .doc file and reads every new (last) entry (word) from it. The only thing I have been able to do so far is read the whole txt file, but that is not the point; I can't even get just the last word from the txt file, so I would really appreciate your help with this.
Thank you all in advance!!!
Not sure if this is homework, and in case it is, I'm trying to avoid spoiling it by "telling too much", and will instead point you to the key ideas you could use.
To avoid reading the whole file, you could first use the seekg method to position the file a certain number of bytes from the end, then perform the "read to the last word" from there.
To perform the "read to the last word" task proper (net of the optimization of not reading the whole file one word at a time, for which see first paragraph) use the >> operator with the std::ifstream as the left operand and a std::string as the right operand: just put this in a while(!thestream.eof()) { ... } so it will keep reading until it has the last word.
BTW, note that reading the text from a .doc file will be orders of magnitude harder than reading it from a text file, unless you can use a suitable ".doc-reading library" (the standard C++ library has no such functionality, per se).
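For a plain text file, the seekg idea could look roughly like the sketch below. The function name last_word and the default window of 256 bytes are assumptions made for illustration. The file is opened in binary mode so that offsets from the end are plain byte counts, and if the final word is longer than the window, only its tail is returned.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: returns the last whitespace-delimited word found in
// the final `tail_bytes` bytes of the file, or "" if there is none or the
// file cannot be opened.
std::string last_word(const std::string& path, std::streamoff tail_bytes = 256)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return std::string();

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size < 0)
        return std::string();
    // Jump to `tail_bytes` before the end, or to the start for small files.
    in.seekg(size > tail_bytes ? size - tail_bytes : 0, std::ios::beg);

    std::string word, last;
    while (in >> word) // keep the most recent successfully read word
        last = word;
    return last;
}
```

A monitoring program could call this periodically and compare the result with the previous one, although detecting that the file has changed is a separate problem not covered here.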
Reading from MS Word from C++ is a tedious task; you'll need to get through the jumble of COM interfaces. Since you are saying it's your first contact with C++, my advice is to concentrate on plain text instead, namely on getting the last line of a plain text file.
I would do something like this. Provide your implementations of ReadFromEnd and FindRightmostLineSeparator, they should be trivial, and initialize the fileSize variable.
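// Requires #include <vector>. ReadFromEnd, FindRightmostLineSeparator and fileSize are yours to provide.
int const INITIAL_BUFFER_SIZE = 64;
int bufferSize = INITIAL_BUFFER_SIZE;
char* lastLine = NULL;
// A vector frees its array correctly; std::auto_ptr would call delete instead of delete[].
std::vector<char> buffer(bufferSize);
while (true) {
    ReadFromEnd(&buffer[0], bufferSize);                // fill the buffer with the last bufferSize bytes
    lastLine = FindRightmostLineSeparator(&buffer[0]);
    if (lastLine == NULL && bufferSize == fileSize)
        lastLine = &buffer[0];                          // no separator at all: the whole file is one line
    if (lastLine)
        break;
    bufferSize *= 2;                                    // not found yet: retry with a larger window
    if (bufferSize > fileSize)
        bufferSize = fileSize;
    buffer.resize(bufferSize);
}
// lastLine contains the pointer to your last line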