Extra bytes when generating text file - c++

I'm trying to generate a text file that has 50 lines, each line consisting of 50 spaces. However, every few lines, 9 or 10 extra bytes get added to the file.
#include <iostream>
#include <fstream>
using namespace std;
void InitializeCanvas() {
ofstream file("paint.txt");
int b = 0;
for (int i = 0; i < 50; i++) {
for (int j = 0; j < 50; j++) {
file << " ";
}
file << "\r\n";
//these lines show where the pointer is and where it should be
b += 52;
int pointer = file.tellp();
int difference = pointer - b;
cout << pointer << " (" << (difference) << ")" << endl;
}
file.close();
}
int main() {
InitializeCanvas();
return 0;
}
On line 9, 9 extra bytes are added. On line 19, there are 19 extra bytes. The same goes for lines 29, 39, and 49. No extra bytes are added anywhere else. What could be causing that? This code was compiled using CodeBlocks 13.12.

Edit: Since the question has been updated with additional information, the explanation in this answer no longer fits completely, but the solution should still work.
The extra bytes come from a doubled carriage return at the end of each row. In text mode your runtime already translates \n into \r\n, so writing "\r\n" produces \r\r\n. Take a look at the end of a line in a hex dump:
... 20 0D 0D 0A
... Space CR CR LF
The cause lies in how the ofstream is constructed: by default it opens the file in text mode.
explicit ofstream (const char* filename, ios_base::openmode mode = ios_base::out);
Either write just \n (or use endl) and let text mode add the \r, or open the file in binary mode so that nothing is translated:
ofstream file("paint.txt", std::ios_base::binary | std::ios_base::out);
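As a minimal sketch of the binary-mode approach (the function name writeCanvas and its parameters are illustrative and not part of the question's code), the canvas could be written like this. With translation disabled, each row should occupy exactly cols + 2 bytes, so tellp() can be compared directly against the expected offset:

```cpp
#include <fstream>
#include <string>

// Writes `rows` lines of `cols` spaces, each terminated by an explicit CR LF.
// Binary mode disables newline translation, so every row occupies exactly
// cols + 2 bytes regardless of platform.
// Returns the final stream position, or -1 if the file could not be written.
long long writeCanvas(const std::string& path, int rows, int cols) {
    std::ofstream file(path.c_str(), std::ios::out | std::ios::binary);
    if (!file) return -1;
    const std::string row = std::string(cols, ' ') + "\r\n";
    for (int i = 0; i < rows; ++i) {
        file << row;
    }
    const std::streamoff end = file.tellp();
    return file ? static_cast<long long>(end) : -1;
}
```

Calling writeCanvas("paint.txt", 50, 50) should then report 2600 bytes (50 rows of 52 bytes).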

On Windows, streams opened in text mode replace "\n" with "\r\n", so if you write "\r\n" you get the '\r' twice.
All you need to do is use endl instead of "\r\n".
Replace this line:
file << "\r\n";
by:
file << endl;

Related

Weird characters appear at the end of file when encrypting it

I never thought I would have to turn to SO to solve this.
Alright so for more insight I am making my own encryption program.
I'm not trying to make it good or anything it's just a personal project.
What this program is doing is that it's flipping certain bits in every single byte of the character making it unreadable.
However every time I run the program and decrypt I get weird characters on the output. These characters seem to match the amount of lines as following:
^^ text that I want to encrypt
^^ after encrypting. (a lot of the text got cut off)
^^ after decrypting. there's 10 null character corresponding to the amount of newlines. there also seems to be another weird '�' character. Where are these bytes coming from??
I've tried a lot of stuff. Here is my code if anyone needs it (it's compiled with default flags):
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#define ENCRYPTFILE "Encrypted.oskar"
typedef unsigned char BYTE;
char saltFunc(BYTE salt, char chr) {
for(int i = 0; i < 8; i++) {
if((salt >> i) & 1U) {
chr ^= 1UL << i;
}
}
return chr;
}
int main () {
std::ofstream encryptFile(ENCRYPTFILE, std::ifstream::in);
std::ifstream inputFile(ENCRYPTFILE, std::ifstream::in);
unsigned int length;
unsigned int lineLength;
BYTE salt = 0b00000001;
std::string line;
std::cin.unsetf(std::ios::dec);
std::cin.unsetf(std::ios::hex);
std::cin.unsetf(std::ios::oct);
//std::cout << "input salt in hex with a prefix 0x so for example. 0xA2" << std::endl;
//std::cin >> std::hex >> salt;
inputFile.seekg(0, inputFile.end);
length = inputFile.tellg();
inputFile.seekg(0, inputFile.beg);
std::cout << lineLength << std::endl;
char* fileBuffer = new char[length];
char* encryptFileBuffer = new char[length];
memset(fileBuffer, 0, length);
memset(encryptFileBuffer, 0, length);
while (inputFile.good()) { // just get file length in bytes.
static int i = 0;
fileBuffer[i] = inputFile.get();
i++;
}
while (std::getline(inputFile, line))
++lineLength;
inputFile.clear();
encryptFile.clear();
std::cout << "file size: " << length << std::endl;
for(int i = 0; i < length; i++) {
encryptFileBuffer[i] = saltFunc(salt, fileBuffer[i]);
encryptFile << encryptFileBuffer[i];
}
inputFile.close();
encryptFile.close();
delete[] encryptFileBuffer;
delete[] fileBuffer;
return 0;
}
The problem is that you are measuring the length of the file in bytes, which, for text files, is not the same as the length in characters. But you are then reading it as characters, so you end up reading too many characters and then writing extra garbage after the end of the output file.
Since you are getting one extra character per line, it is likely you are running on Windows, where line ending characters are two bytes in the file. That's where the extra incorrect length you are seeing is coming from.
For encryption/decryption what you probably want to do is read and write the file in binary mode, so you are reading and writing bytes not characters. You do this by adding std::ios::binary into the flags when opening the file(s):
std::ofstream encryptFile(ENCRYPTFILE, std::ifstream::in | std::ios::binary);
std::ifstream inputFile(ENCRYPTFILE, std::ifstream::in | std::ios::binary);
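A minimal sketch of the byte-oriented approach, assuming the whole file fits in memory. The helper names readBytes, flipBits and writeBytes are illustrative; flipBits is meant to have the same effect as the question's saltFunc, since flipping the bits selected by the salt is an XOR with the salt:

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Reads the whole file as raw bytes; binary mode prevents newline translation,
// so the number of bytes read matches the size of the file on disk.
std::vector<char> readBytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}

// XORs every byte with `salt`; applying it twice restores the original data.
void flipBits(std::vector<char>& data, unsigned char salt) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] = static_cast<char>(static_cast<unsigned char>(data[i]) ^ salt);
    }
}

// Writes the bytes back unchanged, again in binary mode.
bool writeBytes(const std::string& path, const std::vector<char>& data) {
    std::ofstream out(path.c_str(),
                      std::ios::out | std::ios::binary | std::ios::trunc);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    return static_cast<bool>(out);
}
```

Because the buffer is sized by what was actually read rather than by a separate length measurement, there is no room for trailing garbage.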

How does tellg() behave when reading from file?

I am trying to read from a file using fstream. The file I am trying
to read has this content:
1200
1000
980
890
760
My code:
#include <fstream>
#include <iostream>
using namespace std;
int main ()
{
fstream file("highscores.txt", ios::in | ios::out);
if (!file.is_open())
{
cout << "Could not open file!" << endl;
return 0;
}
int cur_score;
while (!file.eof())
{
file >> cur_score;
cout << file.tellg() << endl;
}
}
The output is:
9
14
18
22
26
Why does tellg() return 9 after the first read?
The first read is the number (1200), which is 4 positions,
and I know there is \r and \n, so that makes 6 positions. Also, if I add more numbers to my file, tellg() will
return a bigger number after the first read.
If you've saved your file as UTF-8 with a text editor, there might be a UTF-8 BOM at the beginning of the file. This BOM is 3 bytes long, so added to the 6, it would make 9.
If you want to be sure, check out the beginning of the file, with:
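// requires <fstream>, <iostream> and <iomanip> (for setw)
fstream file("highscores.txt", ios::in | ios::out | ios::binary);
if(file) {
char verify[16];
file.read(verify, sizeof(verify));
int rd = file.gcount();
for(int i = 0; i<rd; i++) {
// go through unsigned char so that bytes >= 0x80 (such as the BOM) are not sign-extended
cout << hex << setw(2) << (int)(unsigned char)verify[i] << " ";
}
cout <<dec << endl;
}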
Edit:
Running this on Windows with MSVC2013 on your file, I got 4, 10, 15, 20, 25 as expected, and I couldn't reproduce your figures.
I've now done a test with MinGW, and there I get exactly your numbers, along with the strange effect that increasing the number of lines increases the output.
THIS IS A BUG in MinGW when you read a Windows (CRLF line separator) file in text mode:
If I save the file in UNIX style (i.e. LF line separator), the same program gives 4, 9, 13, 17, which is again the expected result for a Linux system.
If I save the file in Windows style (i.e. CRLF line separator) and change the code to open the file with ios::binary, I get the expected 4, 10, 15, 20, 25.
Apparently it's an old problem.
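A minimal sketch of the binary-mode workaround, under the assumption that the file ends with a line terminator (otherwise the last extraction reaches end-of-file and tellg() may report -1). The function name positionsAfterEachRead is illustrative:

```cpp
#include <fstream>
#include <string>
#include <vector>

// Reads whitespace-separated integers and records the stream position after
// each successful read. The file is opened in binary mode so that positions
// are plain byte offsets, independent of the runtime's CRLF handling.
std::vector<long long> positionsAfterEachRead(const std::string& path) {
    std::vector<long long> positions;
    std::ifstream file(path.c_str(), std::ios::in | std::ios::binary);
    int score = 0;
    // Testing the extraction itself avoids the extra iteration that
    // `while (!file.eof())` can produce.
    while (file >> score) {
        const std::streamoff pos = file.tellg();
        positions.push_back(static_cast<long long>(pos));
    }
    return positions;
}
```

In binary mode the \r is still skipped as whitespace by operator>>, so the numbers parse the same way; only the reported offsets change.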

Counting characters in txt file giving wrong count

I am learning C++ and right now I have made a file that does some encrypting/decrypting. After I am done with everything, I want to find out how much a file was compressed/decompressed. So I decided to count the characters in the input and output file, but here's where it starts going wrong.
int get_compression(string file1, string file2){
string line = "";
ifstream stream1(file1.c_str());
double counter1 = 0.0;
while(getline(stream1, line)){
counter1 += line.length();
}
stream1.close();
cout << counter1 << "\n";
ifstream stream2(file2.c_str());
double counter2 = 0.0;
while(getline(stream2, line)){
counter2 += line.length();
}
stream2.close();
cout << counter2 << "\n";
return (counter2/counter1)*100;
}
I have added the two cout statements to see what it has counted, but it is telling me it has counted 496 characters in the input txt file that really has 528 characters, and 481 characters in the txt file that has 785 characters. Did I make some rookie mistake somewhere?
I believe you are not counting the newline characters: getline strips the line terminator before you take the length. On Windows each line ending is two bytes (CR LF) on disk, so the count can be off by up to 2 characters per line. I suggest you check how many lines each file has and add that to what your code has counted.
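If the goal is simply to compare file sizes, one option using only the standard library is to open each file in binary mode and ask for the position at the end. This is a minimal sketch; the names fileSizeInBytes and sizeRatioPercent are illustrative, and the ratio is returned as a double rather than the int used in the question:

```cpp
#include <fstream>
#include <string>

// Returns the file size in bytes, or -1 if the file cannot be opened.
// std::ios::ate positions the stream at the end right after opening, and
// binary mode makes tellg() report a plain byte offset.
long long fileSizeInBytes(const std::string& path) {
    std::ifstream in(path.c_str(),
                     std::ios::in | std::ios::binary | std::ios::ate);
    if (!in) return -1;
    const std::streamoff size = in.tellg();
    return static_cast<long long>(size);
}

// Size of the second file as a percentage of the first file's size.
// Returns -1.0 when the ratio is undefined (missing or empty first file).
double sizeRatioPercent(const std::string& file1, const std::string& file2) {
    const long long a = fileSizeInBytes(file1);
    const long long b = fileSizeInBytes(file2);
    if (a <= 0 || b < 0) return -1.0;
    return 100.0 * static_cast<double>(b) / static_cast<double>(a);
}
```

This counts every byte on disk, including line terminators, which is usually what a compression ratio should be based on.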
The other answers and comments are perfectly accurate, but you might want to try using Boost Filesystem because it makes things like this so much easier.
This is an example taken from the boost documentation at http://www.boost.org/doc/libs/1_49_0/libs/filesystem/v3/doc/tutorial.html#Reporting-size
#include <iostream>
#include <boost/filesystem.hpp>
using namespace boost::filesystem;
int main(int argc, char* argv[])
{
if (argc < 2)
{
std::cout << "Usage: tut1 path\n";
return 1;
}
std::cout << argv[1] << " " << file_size(argv[1]) << '\n';
return 0;
}

Extra File lines being overwritten when output is put into specified file lines

Hey I have a bit of a silly question but I am having a bit of an issue with my code. I am trying to overwrite a line of a file, which is what it does, but the problem is that it overwrites other file lines as well. I am using C++ visual studios 2010. My code is below.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
const string FILENAME = "DatabaseTest.txt";
fstream& GoToLineI(fstream& file, int num)
{
file.seekg(ios::beg);
for(int i = 0; i < num+1; i++)
file.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
return file;
}
fstream& GoToLineO(fstream& file, int num)
{
file.seekp(ios::beg);
for( int i = 0; i < num; i++)
{
//gets the length of the line.
GoToLineI(file, i);
string s;
file >> s;
long pos = file.tellp();
file.seekp( pos + s.length() );
}
return file;
}
int main()
{
fstream myfile(FILENAME.c_str(), ios::out);
myfile.close();
myfile.open(FILENAME.c_str(), ios::in | ios::out);
myfile << "Usernames:" << endl;
for( int j = 0; j < 101; j++)
myfile << j << endl;
cout << "Where do you want to grab the data from?";
int i = 0;
cin >> i;
GoToLineI(myfile, i);
string line;
myfile >> line;
cout << line << endl;
GoToLineO(myfile, i);
if( myfile.is_open() )
{
cout << "File should be writeable" << endl;
myfile << "This should be at line 75" << endl;
}
myfile.seekp(ios::end);
system("PAUSE");
myfile.close();
return 0;
}
The issue may be in my GoToLineO function, which finds the position of the output line. It calls GoToLineI to get the length of each line until it reaches the line where the output should start. The output that this code generates is as follows:
72
73
74
This should be at line 75
82
83
84
And it should look like this:
73
74
This should be at line 75
76
77
78
79
80
81
Any sort of insights or advice would be greatly appreciated.
edit: changed to only the important part of the outputs that should be shown.
If you seek to a spot in a file, and then start writing there, what you write is going to overwrite the exact same number of bytes as what you write -- a little like an editor that's always in overwrite mode instead of insert mode.
If you want your result to remain a simple text file, about all you can do is copy the data to a new file, inserting your new data in the right place, then copying the remaining data from the original file into the new file after the new data you inserted.
If you want that result to have the same name as the original, you have a few choices -- you can copy the entire result back to your existing file, or (if you aren't worried about the possibility of multiple hard links to the original file) you can delete the original file, and rename the new one to the old name.
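The copy-and-rename approach could be sketched as follows. The function name replaceLine and the ".tmp" suffix are illustrative choices; note that this version writes '\n' after every line, so a final line without a terminator gains one:

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>

// Replaces the line at zero-based index `target` by copying the file to a
// temporary file and then renaming it over the original.
// Returns false if a file cannot be opened, the line does not exist, or the
// final swap fails.
bool replaceLine(const std::string& path, std::size_t target,
                 const std::string& replacement) {
    const std::string tmpPath = path + ".tmp";  // hypothetical temp-file name
    bool replaced = false;
    {
        std::ifstream in(path.c_str());
        std::ofstream out(tmpPath.c_str(), std::ios::out | std::ios::trunc);
        if (!in || !out) return false;
        std::string line;
        for (std::size_t i = 0; std::getline(in, line); ++i) {
            if (i == target) {
                out << replacement << '\n';
                replaced = true;
            } else {
                out << line << '\n';
            }
        }
        if (!out) return false;
    }  // both streams are closed here, before the files are swapped
    if (!replaced) {
        std::remove(tmpPath.c_str());
        return false;
    }
    // std::rename may fail if the destination already exists (e.g. on
    // Windows), so the original is removed first.
    if (std::remove(path.c_str()) != 0) return false;
    return std::rename(tmpPath.c_str(), path.c_str()) == 0;
}
```

Because the replacement is written as a whole line into a fresh file, its length no longer has to match the line it replaces.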

C++ checksum reading nonexistent newline

I am doing a very basic checksum on files by reading the input file into a character array, and then iterating over that array and adding each character into the checksum. The problem is that when I do this all of my checksums are 10 too high (10 is the ascii decimal value for the newline character).
How is it newline characters are being inserted into my code, when I know for a fact there is no newline character in my text? Even a single line text file gets a newline character added in!
#include <iostream>
#include <fstream>
int main () {
int fileLength = 0;
std::ifstream inputFile;
char charArray[10000];
int checkSumValue = 0;
// open file in binary
inputFile.open("/Path/To/File", std::ios::binary);
// get file length, then return to beginning of file
inputFile.seekg(0, std::ios_base::end);
fileLength = inputFile.tellg();
inputFile.seekg(0, std::ios_base::beg);
// read all data from file into char array
inputFile.read(charArray, fileLength);
// iterate over char array, adding ascii decimal value to checksum
for (int num = 0; num <= fileLength; num++) {
std::cout << "Checksum value before iteration " << num << " is "
<< checkSumValue << std::endl;
checkSumValue += static_cast<int>(charArray[num]);
}
// properly close out the input file
inputFile.close();
inputFile.clear(std::ios_base::goodbit);
std::cout << "The checksum value is: " << checkSumValue << std::endl;
std::cout << "The file length is: " << fileLength << std::endl;
return 0;
}
Your problem is here:
num <= fileLength
It should be:
num < fileLength
For example, if the length is 1, then the only valid character is charArray[0].
Also note that doing this:
inputFile.read(charArray, fileLength);
is dangerous, as fileLength may be larger than the size of the array.
A better solution would be to use a vector (as it is sized dynamically):
std::vector<char> charArray(fileLength);
inputFile.read(&charArray[0], fileLength);
But do you really need to copy the data into an array? Why not just do the sum on the fly?
// requires <numeric> (std::accumulate) and <iterator> (std::istreambuf_iterator)
size_t checkSumValue = std::accumulate(std::istreambuf_iterator<char>(inputFile),
std::istreambuf_iterator<char>(),
size_t(0)
);
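A self-contained sketch of summing on the fly, written as an explicit loop so that each byte can be converted to unsigned char first (plain char may be signed, in which case bytes above 127 would subtract from the sum). The function name byteSum is illustrative:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Sums every byte of the file as an unsigned value, reading straight from
// the stream buffer without an intermediate array.
// Returns 0 for an empty or unreadable file.
unsigned long byteSum(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    unsigned long sum = 0;
    std::istreambuf_iterator<char> it(in), end;
    for (; it != end; ++it) {
        sum += static_cast<unsigned char>(*it);
    }
    return sum;
}
```

Since the loop stops exactly at end-of-file, there is no index to get wrong, and a file containing "AB" sums to 131 while "AB\n" sums to 141.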
Martin was also correct: you should be using (num < fileLength) in all cases.
The other possibility is that you created your file in an editor and it has silently added a trailing newline for you. That's common. Try dumping your file in a hex editor. I just ran your program (with the <= changed to <) and it works fine.