C++ program crashing on weird input - c++

I'm coding Huffman compression and it works fine for all extended ASCII (0-255), but when I open a non-text file like an MP3 that has something like this inside:
ťîxł¸ H…W]`9M ČČ  ˇ˘Ł¤Ąxw
it crashes. I tested, and it is not because of the size; it is because of the input data.
It crashes on file save; here's the code:
for(int i=0;i<=contents.length();i++){
newString +=kod[contents[i]];
}
saveFile("test_nowy.txt", newString);
bool saveFile (string name, string contents)
{
ofstream file;
file.open(name.c_str());
file << contents;
file.close();
}
I should also say that despite passing all the earlier steps (calculating codes etc.), the results are wrong. It seems like my program doesn't understand those characters.

You are reading one element past the end of the string, which is undefined behavior before C++11 (since C++11 it yields '\0', which still appends a bogus code).
for(int i=0;i<=contents.length();i++)
             ^^
should be:
for(int i=0;i<contents.length();i++)
             ^
BTW, this would be a good time to learn to use a debugger: capture the exact point where the program crashes and find out why.

You're accessing negative indices of your kod array: where char is signed, bytes above 127 come out negative. Try
kod[contents[i] & 0xFFu]
See http://ideone.com/2LvmKW
Also fix the overrun that billz spotted.
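Both fixes can be combined in one loop. The following is only a sketch, assuming kod is a 256-entry table of code strings; the names byteIndex, encode and codes are made up for illustration and are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Map a char to a table index in 0..255, whether plain char is signed or not.
inline std::size_t byteIndex(char c) {
    return static_cast<unsigned char>(c);
}

// Concatenate the code of every input byte. `codes` is a hypothetical
// stand-in for the question's kod table and must have 256 entries.
std::string encode(const std::string& contents,
                   const std::vector<std::string>& codes) {
    assert(codes.size() == 256);
    std::string out;
    for (std::size_t i = 0; i < contents.size(); ++i) {  // note: <, not <=
        out += codes[byteIndex(contents[i])];
    }
    return out;
}
```

The cast to unsigned char does the same job as the & 0xFFu mask. For an MP3 the input and output files should also be opened with ios::binary, otherwise some platforms translate bytes behind your back.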

You say it works for all characters from 0 to 255 but not for these characters. That would imply you're using something other than an 8-bit char. If so, you're probably indexing outside the range of kod.

Related

C++ Null characters in string?

I want to read a txt file and convert two cells from each line to floats.
If I first run:
someString = someString.substr(1, tempLine.size());
And then:
std::stof(someString)
it only converts the first number in 'someString' to a number. The rest of the string is lost.
When I handled the string in my IDE I noticed that copying it and pasting it inside quotation marks gives me "\u00005\u00007\u0000.\u00007\u00001\u00007\u00007\u0000" and not 57.7177.
If I instead do:
std::string someOtherString = "57.7177"
std::stof(someOtherString)
I get 57.7177.
Minimal working example is:
int main() {
std::string someString = "\u00005\u00007\u0000.\u00007\u00001\u00007\u00007\u0000";
float someFloat = std::stof(someString);
return 0;
}
Same problem occurs using both UTF-8 and -16 encoding.
What is happening and what should I do differently? Should I remove the null-characters somehow?
"I want to read a txt file"
What is the encoding of the text file? "Text" is not an encoding. What I suspect is happening is that you wrote code that reads in the file as either UTF-8 or Windows-1250, and stored it in a std::string. From the bytes, I can see that the file is actually UTF-16BE, so you need to read it into a std::u16string. If your program will only ever run on Windows, then you can get by with a std::wstring.
You probably have followup questions, but your original question is vague enough that I can't predict what those questions would be.
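If the cells only ever contain digits and a decimal point, one workaround is to collapse the UTF-16BE bytes to a narrow string before calling std::stof. This is a rough sketch under that assumption, not a general decoder: utf16beToAscii and parseCell are hypothetical helper names, a byte order mark is not skipped, and every non-ASCII code unit is replaced by '?'.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Collapse UTF-16BE bytes to a narrow string. Only ASCII code units are kept;
// anything else becomes '?', which is enough for digits and '.'.
std::string utf16beToAscii(const std::string& bytes) {
    assert(bytes.size() % 2 == 0);  // UTF-16 data comes in two-byte units
    std::string out;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned char hi = static_cast<unsigned char>(bytes[i]);
        const unsigned char lo = static_cast<unsigned char>(bytes[i + 1]);
        out += (hi == 0 && lo < 0x80) ? static_cast<char>(lo) : '?';
    }
    return out;
}

// Parse one cell that was read as raw UTF-16BE bytes (requires C++11).
float parseCell(const std::string& bytes) {
    return std::stof(utf16beToAscii(bytes));
}
```

The file has to be read in binary mode for this to see the raw bytes, and the cell has to be cut on two-byte boundaries, otherwise the high and low bytes get swapped.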

The program doesn't seem to be saving the input data correctly (c++)

I want my program to read data from a file and save it into quarter1, quarter2, quarter3 or quarter4 depending on its date, but it doesn't work properly and I still don't know why. I've been trying to debug it, and I'm pretty sure it fails when saving in saveQuarters or in existeix, which is basically a dichotomic (binary) search that returns whether the code exists and, if it does, its position. This is the code:
I just skimmed through some of the stuff you had so this suggestion may not work, but you can try declaring your file as input or output. Perhaps that could be the problem.
Something like:
string fileName = "data.txt";
ifstream dataFile;
dataFile.open(fileName, ios::in);
Also, doing this:
fitxerCens >> taulaCens[i].stateName;
will read from the data file only up to the next whitespace, not a whole line. Is that what you intend?
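To make the difference concrete, here is a small sketch that uses an in-memory stream in place of the data file. The record contents and the function names are invented for illustration; they are not taken from the question's code.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// operator>> skips leading whitespace, then stops at the next whitespace.
std::string firstToken(const std::string& text) {
    std::istringstream in(text);
    std::string token;
    in >> token;
    return token;
}

// std::getline reads up to the newline and discards it.
std::string firstLine(const std::string& text) {
    std::istringstream in(text);
    std::string line;
    std::getline(in, line);
    return line;
}

// Executable illustration of the difference on a hypothetical record.
void checkTokenVsLine() {
    const std::string record = "New York 19378102\nTexas 25145561";
    assert(firstToken(record) == "New");                // cut at the space
    assert(firstLine(record) == "New York 19378102");   // whole line
}
```

So a state name containing a space would be split by >>, and every field after it would be shifted by one.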

Brought a Linux C++ Console Application to a Win32 C++ App using VS2010 and the search function from <algorithm> is no longer working

Just like the title says, I've been working on a fairly large program and have come upon this bug. I'm also open to alternatives for searching a file for a string instead of using <algorithm>. Here is my code narrowed down:
istreambuf_iterator<char> eof;
ifstream fin;
fin.clear();
fin.open(filename.c_str());
if(fin.good()){
//I outputted text to a file to make sure opening the file worked, which it does
}
//term was not found.
if(eof == search(istreambuf_iterator<char>(fin), eof, term.begin(), term.end())){
//PROBLEM: this code always executes even when the string term is in the file.
}
So just to clarify, my program worked correctly in Linux, but now that I have it in a Win32 app project in VS2010, the application builds just fine but the search function isn't working like it normally did. (By "normally" I mean that the code in the if statement didn't execute when the term was in the file, whereas now it always executes.)
NOTE: The file is a .xml file and the string term is simply "administration."
One thing that might or might not be important: filename (from the code above) is an XML file I have created in the program myself using the code below. Basically, I create a copy of the pre-existing XML file that is identical except that it is all lower case and in a new location.
void toLowerFile(string filename, string newloc, string& newfilename){
//variables
ifstream fin;
ofstream fout;
string temp = "/";
newfilename = newloc + temp + newfilename;
//open file to read
fin.open(filename.c_str());
//open file to write
fout.open(newfilename.c_str());
//loop through and read line, lower case, and write
while (fin.good()){
getline (fin,temp);
//write lower case version
toLowerString(temp);
fout << temp << endl;
}
//close files
fout.close();
fin.close();
}
void toLowerString(string& data){
std::transform(data.begin(), data.end(), data.begin(), ::tolower);
}
I'm afraid your code is invalid - the search algorithm requires forward iterators, but istreambuf_iterator is only an input iterator.
Conceptually that makes sense - the algorithm needs to backtrack on a partial match, but the stream may not support backtracking.
The actual behaviour is undefined - so the implementation is allowed to be helpful and make it seem to work, but doesn't have to.
I think you either need to copy the input, or use a smarter search algorithm (single-pass is possible) or a smarter iterator.
(In an ideal world at least one of the compilers would have warned you about this.)
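The "copy the input" option could look roughly like this. It is a sketch rather than a drop-in fix: streamContains is a made-up name, and it assumes the file is small enough to hold in memory.

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Copy the whole stream into a std::string and search the copy. Reading with
// istreambuf_iterator is fine here because construction is a single pass.
bool streamContains(std::istream& in, const std::string& term) {
    assert(!term.empty());  // an empty term would trivially match
    const std::string contents((std::istreambuf_iterator<char>(in)),
                               std::istreambuf_iterator<char>());
    return contents.find(term) != std::string::npos;
}
```

With the question's variables the call would be streamContains(fin, term). std::search over contents.begin() and contents.end() would work equally well, since std::string iterators satisfy the forward iterator requirement.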
Generally, with Microsoft's compiler, if your program compiles and links a main() function rather than a wmain() function, everything defaults to char. It would be wchar_t or WCHAR if you have a wmain(). If you have _tmain() instead, then you are at the mercy of your compiler/make settings, and it's the UNICODE macro that determines which flavor your program uses. But I doubt that a char/wchar_t mismatch is actually the issue here, because I think you would have got a warning or an error if all four of the search parameters didn't use the same character width.
This is a bit of a guess, but try this:
if(eof == search(istreambuf_iterator<char>(fin.rdbuf()), eof, term.begin(), term.end()))

Extra character when reading a file. C++

I'm writing two programs that communicate by reading files which the other one writes.
My problem is that when the second program reads a file created by the first one, it outputs a weird character at the end of the last value. This seems to happen at random, as adding data to the text file can result in normal output.
I'm utilizing C++ and Qt4. This is the part of program 1:
std::ofstream idxfile_new;
QString idxtext;
std::string fname2="some_textfile.txt"; //Imported from a file browser in the real code.
idxfile_new.open (fname2.c_str(), std::ios::out);
idxtext = ui->indexBrowser->toPlainText(); //Grabs data from a dialog of the GUI.
//See 'some_textfile.txt' below
idxfile_new<<idxtext.toStdString();
idxfile_new.clear();
idxfile_new.close();
some_textfile.txt:
3714.1 3715.1 3716.1 3717.1 3719.1 3739.1 3734.1 3738.1 3562.1 3563.1 3623.1
part of program 2:
std::string indexfile = "some_textfile.txt"; //Imported from file browser in the real code
std::ifstream file;
std::string sub;
file.open(indexfile.c_str(), std::ios::in);
while(file>>sub)
{
cerr<<sub<<"\n"; //Stores values in an array in the real code
}
This outputs:
3714.1
3715.1
3716.1
3717.1
3719.1
3739.1
3734.1
3738.1
3562.1
3563.1
3623.1�
If I add more data it works at times. Sometimes it can output data such as
3592.�
or
359�
at the end. So it is not consistent in reading the whole data either. At first I figured it wasn't reading the eof properly, and I have read and tried many solutions to similar problems but can't get it to work correctly.
Thank you guys for the help!
I managed to solve the problem by myself this morning.
For anyone with the same problem I will post my solution.
The problem was the UTF-8 encoding when creating the file. Here's my solution:
Part of program 1:
std::ofstream idxfile_new;
QString idxtext;
std::string fname2="some_textfile.txt";
idxfile_new.open (fname2.c_str(), std::ios::out);
idxtext = ui->indexBrowser->toPlainText();
QByteArray qstr = idxtext.toUtf8(); //Enables Utf8 encoding
idxfile_new<<qstr.data();
idxfile_new.clear();
idxfile_new.close();
The other program is left unchanged.
A hex converter displayed the extra character as 'ef bf bd', which is the UTF-8 form of the replacement character U+FFFD that replaces invalid bytes when encoding to UTF-8.

Having problems with 0x0A character in C++ even in binary mode. (interprets it as new file)

Hi, this might seem a bit newbie, but here we go. I'm developing a program that downloads the leaderboards of a certain game from the internet and transforms them into a proper format to work with (elaborate rankings, etc.).
The files contain the names, ordered by rank, but between each pair of names there are 7 random control codes (obviously unprintable). The txt file looks like this:
..C...hName1..)...&Name2......)Name3..é...þName4..Ü...†Name5..‘...QName6..~...bName7..H...NName8..|....Name9..v...HName10.
I checked with a hex editor and saw that the first control code after each name is always a null character (0x00). So what I do is read everything and then cout every character. When a 0x00 character is found, I skip 7 characters and keep printing. Therefore you end up with the list, right?
At first I had the problem that among those random control codes you would sometimes find a "soft EOF" (0x1A), and the program would stop reading there. So I finally figured out that I should open the file in binary mode. It worked, and then everything was printed... or that's what I thought.
But I came across another file which still didn't work, and finally found out that there was an EOF character (0x0A)! That doesn't make sense, since I'm opening the file in binary mode. But still, after reading that character, C++ interprets it as a new file and hence skips 7 characters, so the name after that character always appears cut.
Here's my current code:
#include <cstdlib>
#include <iostream>
#include <fstream>
using namespace std;
int main () {
string scores;
system("wget http://certainwebsite/001.txt"); //download file
ifstream highin ("001.txt", ios::binary);
ofstream highout ("board.txt", ios::binary);
if (highin.is_open())
{
while ( highin.good() )
{
getline (highin, scores);
for (int i=0;i<scores.length(); i++)
{
if (scores[i]==0x00){
i=i+7; //skip 7 characters if 'null' is found
cout << endl;
highout << endl;
}
cout << scores[i];
highout << scores[i]; //cout names and save them in output file
}
}
highin.close();
}
else cout << "Unable to open file";
system("pause>nul");
}
I'm not sure how to ignore that character if being in binary mode already doesn't work. Sorry for the long question, but I wanted to be detailed and specific. In this case, the EOF character is located before Name3, and hence this is how the output looks:
http://i.imgur.com/yu1NjoZ.png
By default getline() reads until the end of line and discards the newline character. However, the delimiter character could be customized (by supplying the third parameter). If you wish to read until the null character (not until the end of line), you could try using getline (highin, scores, '\0'); (and adjusting the logic of skipping the characters).
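A sketch of that adjusted logic follows. It assumes the layout described in the question (a name, then 0x00, then six more control bytes, then the next name); readNames and controlBytes are invented names, and an in-memory stream would stand in for the downloaded file in a quick test.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Extract names from a stream laid out as: name, 0x00, six more control
// bytes, name, ... `controlBytes` is the run length including the 0x00.
std::vector<std::string> readNames(std::istream& in,
                                   std::streamsize controlBytes = 7) {
    assert(controlBytes >= 1);
    std::vector<std::string> names;
    std::string name;
    while (std::getline(in, name, '\0')) {  // 0x0A is now ordinary data
        if (!name.empty()) names.push_back(name);
        in.ignore(controlBytes - 1);        // skip the rest of the control run
    }
    return names;
}
```

Because the delimiter is now '\0', a 0x0A inside the control bytes no longer ends a read; it is just skipped by ignore() along with the other five bytes. The file still needs to be opened with ios::binary so that 0x1A is not treated specially on Windows.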
I'm glad you figured it out, and it doesn't surprise me that getline() was the culprit. I had a similar issue dealing with the newline character when I was trying to read in a CSV file. There are several different getline() functions in C++ (the free function std::getline for std::string and the istream::getline member for char buffers), and it is easy to get confused about how each one handles the newline character.
As a side note, in your for loop, I'd recommend against performing a method call in your test. That adds unnecessary overhead to the loop. It'd be better to call the method once and put that value into a variable, then enter the loop and test i against the length variable. Unless you expect the length to change, calling the length() method each iteration is a waste of system resources.
Thank you all, guys; it worked. It was indeed getline() that was giving me problems. Because of the while loop, each time it found a newline character it restarted the process, hence skipping those 7 characters.