C++ checksum reading nonexistent newline

I am doing a very basic checksum on files by reading the input file into a character array, then iterating over that array and adding each character to the checksum. The problem is that when I do this, all of my checksums are 10 too high (10 is the ASCII decimal value for the newline character).
How are newline characters being inserted into my data, when I know for a fact there is no newline character in my text? Even a single-line text file gets a newline character added!
#include <iostream>
#include <fstream>
int main () {
int fileLength = 0;
std::ifstream inputFile;
char charArray[10000];
int checkSumValue = 0;
// open file in binary
inputFile.open("/Path/To/File", std::ios::binary);
// get file length, then return to beginning of file
inputFile.seekg(0, std::ios_base::end);
fileLength = inputFile.tellg();
inputFile.seekg(0, std::ios_base::beg);
// read all data from file into char array
inputFile.read(charArray, fileLength);
// iterate over char array, adding ascii decimal value to checksum
for (int num = 0; num <= fileLength; num++) {
std::cout << "Checksum value before iteration " << num << " is "
<< checkSumValue << std::endl;
checkSumValue += static_cast<int>(charArray[num]);
}
// properly close out the input file
inputFile.close();
inputFile.clear(std::ios_base::goodbit);
std::cout << "The checksum value is: " << checkSumValue << std::endl;
std::cout << "The file length is: " << fileLength << std::endl;
return 0;
}

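Your problem is here:
num <= fileLength
It should be:
num < fileLength
For example, if the length is 1, then the only valid character is charArray[0]. With <=, the loop also reads charArray[fileLength], which was never filled by read(); whatever value that uninitialized element happens to hold (apparently 10 in your case) gets added to the checksum.
Also note that doing this:
inputFile.read(charArray, fileLength);
is dangerous, as fileLength may be larger than the size of the array.
A better solution would be to use a vector (as it sizes dynamically):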
std::vector<char> charArray(fileLength);
inputFile.read(&charArray[0], fileLength);
But do you really need to copy the data into an array? Why not just compute the sum on the fly (this needs <numeric> and <iterator>):
size_t checkSumValue = std::accumulate(std::istreambuf_iterator<char>(inputFile),
std::istreambuf_iterator<char>(),
size_t(0)
);
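To make the streaming idea concrete, here is a minimal sketch. The helper name checksumOf is made up for illustration, and it assumes the checksum is a plain sum of unsigned byte values (the question's code adds plain char values, which can be negative on platforms where char is signed).

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <numeric>
#include <string>

// Hypothetical helper: sum every byte of the file at `path`.
// An empty or unreadable file yields 0; real code would report the error.
std::size_t checksumOf(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::accumulate(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>(),
        std::size_t{0},
        [](std::size_t sum, char c) {
            // Assumption: treat each byte as unsigned so values >= 0x80
            // are not added as negative numbers.
            return sum + static_cast<unsigned char>(c);
        });
}
```

Because nothing is indexed by hand, there is no fileLength to get wrong and no fixed-size buffer to overflow.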

Martin was also correct: you should be using (num < fileLength) in all cases.
The other possibility is that you created your file in an editor and it's artificially added a spurious newline for you. That's common. Try dumping your file in a hex editor. I just ran your program (with the <= removed) and it works fine.

Related

Weird characters appear at the end of file when encrypting it

I never thought I would have to turn to SO to solve this.
Alright so for more insight I am making my own encryption program.
I'm not trying to make it good or anything it's just a personal project.
What this program is doing is that it's flipping certain bits in every single byte of the character making it unreadable.
However every time I run the program and decrypt I get weird characters on the output. These characters seem to match the amount of lines as following:
^^ text that I want to encrypt
^^ after encrypting. (a lot of the text got cut off)
^^ after decrypting. there's 10 null character corresponding to the amount of newlines. there also seems to be another weird '�' character. Where are these bytes coming from??
I've tried a lot of stuff. Here is my code if anyone needs it (it's compiled with default flags):
#include <iostream>
#include <fstream>
#include <cstring>
#include <string>
#define ENCRYPTFILE "Encrypted.oskar"
typedef unsigned char BYTE;
char saltFunc(BYTE salt, char chr) {
for(int i = 0; i < 8; i++) {
if((salt >> i) & 1U) {
chr ^= 1UL << i;
}
}
return chr;
}
int main () {
std::ofstream encryptFile(ENCRYPTFILE, std::ifstream::in);
std::ifstream inputFile(ENCRYPTFILE, std::ifstream::in);
unsigned int length;
unsigned int lineLength;
BYTE salt = 0b00000001;
std::string line;
std::cin.unsetf(std::ios::dec);
std::cin.unsetf(std::ios::hex);
std::cin.unsetf(std::ios::oct);
//std::cout << "input salt in hex with a prefix 0x so for example. 0xA2" << std::endl;
//std::cin >> std::hex >> salt;
inputFile.seekg(0, inputFile.end);
length = inputFile.tellg();
inputFile.seekg(0, inputFile.beg);
std::cout << lineLength << std::endl;
char* fileBuffer = new char[length];
char* encryptFileBuffer = new char[length];
memset(fileBuffer, 0, length);
memset(encryptFileBuffer, 0, length);
while (inputFile.good()) { // just get file length in bytes.
static int i = 0;
fileBuffer[i] = inputFile.get();
i++;
}
while (std::getline(inputFile, line))
++lineLength;
inputFile.clear();
encryptFile.clear();
std::cout << "file size: " << length << std::endl;
for(int i = 0; i < length; i++) {
encryptFileBuffer[i] = saltFunc(salt, fileBuffer[i]);
encryptFile << encryptFileBuffer[i];
}
inputFile.close();
encryptFile.close();
delete[] encryptFileBuffer;
delete[] fileBuffer;
return 0;
}
The problem is that you are measuring the length of the file in bytes, which, for text files, is not the same as the length in characters. But you are then reading it as characters, so you end up reading too many characters and then writing extra garbage after the end in the output file.
Since you are getting one extra character per line, it is likely you are running on Windows, where line ending characters are two bytes in the file. That's where the extra incorrect length you are seeing is coming from.
For encryption/decryption what you probably want to do is read and write the file in binary mode, so you are reading and writing bytes not characters. You do this by adding std::ios::binary into the flags when opening the file(s):
std::ofstream encryptFile(ENCRYPTFILE, std::ifstream::in | std::ios::binary);
std::ifstream inputFile(ENCRYPTFILE, std::ifstream::in | std::ios::binary);
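As a rough sketch of what a binary-mode round trip could look like: the function name xorFile and its parameters are invented for this example, and it relies on the fact that flipping every bit of the byte where the salt has a 1 bit (what saltFunc does) is the same as XOR-ing the byte with the salt.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: read `inPath` as raw bytes, XOR each byte with `salt`,
// and write the result to `outPath`. Returns false if a file cannot be opened
// or the write fails.
bool xorFile(const std::string& inPath, const std::string& outPath, unsigned char salt) {
    std::ifstream in(inPath, std::ios::binary);
    if (!in) return false;

    // Read the whole file; no separate length measurement is needed.
    const std::string data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());

    std::ofstream out(outPath, std::ios::binary | std::ios::trunc);
    if (!out) return false;

    for (char c : data) {
        out.put(static_cast<char>(static_cast<unsigned char>(c) ^ salt));
    }
    return static_cast<bool>(out);
}
```

Running it twice with the same salt should give back the original bytes, line endings included, since binary mode performs no newline translation.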

How to read any kind of files as binary and edit it as you want (like compress it) with c++?

I'm trying to figure out a way to manipulate the binary content of any file on the computer, in order to apply a compress/decompress algorithm in C++.
I have been searching for a long time and all I found was how to read a .bin file:
#include <iostream>
#include <fstream>
using namespace std;
int main (){
streampos size;
char * memblock;
ifstream file ("name.bin", ios::in|ios::binary|ios::ate);
if (file.is_open())
{
size = file.tellg();
memblock = new char[size];
file.seekg (0, ios::beg);
file.read (memblock, size);
for(int i = 0 ; i < size ; i++){
cout << memblock[i] ;
}
file.close();
cout << "\n\n the entire file content is in memory";
delete[] memblock;
}
else cout << "Unable to open file";
return 0;
}
I just want those bytes without ASCII translation; in other words, I want the whole file as binary, not what is inside it.
<< is overloaded for char types to output the ASCII-formatted character. The data (the ones and zeros) in your memblock array is accurately read in as binary; it's just the way you're displaying it that is ASCII. Changing memblock to a uint8_t[] is not enough on its own, because uint8_t is normally an alias for unsigned char and << would still print it as a character. Instead, convert each byte to an integer when you output it (going through unsigned char first, so that bytes above 0x7F are not sign-extended):
std::cout << std::hex << std::setfill('0') << std::setw(2) << static_cast<int>(static_cast<unsigned char>(memblock[i]));
             ^           ^                    ^
             |           |                    |
             |           |                    sets the width of the next output
             |           sets the fill character (default is space)
             tells the stream to output all numbers in hexadecimal
You'll have to #include <iomanip> for the setfill and setw manipulators to work (hex is already available through <iostream>).
Note that setw only applies to the next output operation, while hex and setfill stay set until explicitly changed. So you only need to set those two once, probably outside your loop. Then when you're finished, you can set them back like:
std::cout << std::dec << std::setfill(' ');
See https://en.cppreference.com/w/cpp/io/basic_ostream/operator_ltlt2 for the list of overloaded operator<< functions for char and char arrays.
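For illustration, the formatting can be wrapped in a small function. This is only a sketch: the name toHex is made up, and it writes into a std::ostringstream rather than std::cout so that the global stream flags are left untouched. It uses std::setfill, the <iomanip> manipulator that sets the fill character.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format `size` bytes as two-digit hex values
// separated by single spaces.
std::string toHex(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::ostringstream out;
    out << std::hex << std::setfill('0'); // sticky: set once for the whole stream
    for (std::size_t i = 0; i < size; ++i) {
        if (i != 0) out << ' ';
        // setw only affects the next output, so it is repeated for every byte.
        // The cast through unsigned char to int makes << print a number, not a character.
        out << std::setw(2) << static_cast<int>(static_cast<unsigned char>(data[i]));
    }
    return out.str();
}
```

With the buffer from the question, a call such as toHex(memblock, size) would produce the hex text for the whole file.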
The answer was simpler than we thought:
#include <bitset>
for(int i = 0 ; i < size ; i++){
//changing the value of "memblock[i]" to binary byte per byte with for loop
//and of course using bitset
bitset<8> test (memblock[i]);
cout << test ;
}

Appending a text file to a char array, receiving garbage output

When I add a file to a char array, then print, I get garbage output (random ASCII symbols). The file contains only text (a paragraph).
The code is as follows:
int arraySize = 0;
string line;
while(getline(inFile, line)){
//cout << line << endl; // this will print array fine.
arraySize += line.length();
}
char message[arraySize];
char encrypted[arraySize];
//adds file to array
int i = 0;
while(inFile.good() && !inFile.eof()){
inFile.get(message[i]);
i++;
}
message[i] = '\0';
//prints array
for(int i = 0; i < arraySize; i++){
cout << message[i]; //this returns garbage values
}
I believe it's printing garbage because it thinks there's nothing in the array message, but I do not know why there is "nothing there."
The reason is that you reached the end of the file while counting the length of the text, so the read position is at the end of the file (and the stream is in a failed state) when you use it again to read the text.
To fix it, clear the stream state and move the read position back to the beginning:
inFile.clear();
inFile.seekg(0, ios::beg);
while(inFile.get(message[i])){
i++;
}
Also, don't use while (!inFile.eof()); it is considered incorrect.
What I recommend is to use std::vector, so that you do not have to worry about the file size or about allocating and deallocating memory. Your code could look like this:
std::ifstream inFile("data.txt"); // your file name here
std::string strLine;
std::vector<std::string> vecStr;
while(std::getline(inFile, strLine))
vecStr.push_back(strLine);
for(int i(0); i < vecStr.size(); i++)
std::cout << vecStr[i] << std::endl;
inFile.close();
Notice how much simpler the code above is.
NB: You got the garbage values because the array is only declared but not initialized:
The first read gets the length of the text, but it leaves the read position at the end of the file, and then you did:
while(inFile.good() && !inFile.eof()){ // Will fail because inFile.eof() is true from the previous read.
//std::cout << "Inside the reading loop" << std::endl;
inFile.get(message[i]);
i++;
}
As you can see above, the loop body is never executed because the previous read reached EOF, so the array is declared but never filled, and it contains garbage values.
To confirm that the loop is not executed, uncomment the line above and run the program. No message is printed, which means the loop body never ran.
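The two-pass pattern discussed here (one pass to measure, then rewind and read) can be sketched as follows. The function names countLines and rereadFromStart are hypothetical, and the text is kept in a std::string so that no manual sizing or terminator is needed.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// First pass: count lines with getline. This runs the stream to EOF and
// leaves eofbit/failbit set.
std::size_t countLines(std::ifstream& in) {
    std::size_t lines = 0;
    std::string line;
    while (std::getline(in, line)) ++lines;
    return lines;
}

// Second pass: reset the error state, seek back to the start, read everything.
std::string rereadFromStart(std::ifstream& in) {
    in.clear();                 // without this, every later read fails immediately
    in.seekg(0, std::ios::beg); // move the read position back to the beginning
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Calling clear() before seekg() matters: while the stream is in a failed state, the seek itself is not carried out.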

Parsing header of file

I am working with images. I would like to extract width and height of it from a header.
The width is represented at location
size
4B width
4B height
they are represented at specific index.
I tried parse it and extract it with code
ifstream f(name, ios::binary | ios:: in ); // reading a file in binary
ostringstream ob;
if ( f.fail()){ // fail test
return false;
}
f.seekg (0, f.end);
int length = f.tellg(); // length of the file
memory = new char[length]; // allocate array of chars
f.read (memory, length); // read the content of the file into an array
f.seekg (0, f.beg); // point back at the beginning of the file.
Each of them has 4B , so using for loop
for ( int i = index ; i <4 ;i++){
cout << hex<< memory[i];
}
or i even tried it converting it into number using
string a;
for ( int i = index; i < 4 ;i++){
a+=memory[i];
}
cout << atoi( a.c_str() ) << endl;
This should output a number, but it outputs something unreadable.
You're seeking to the beginning of the file after reading it, rather than before. You need to switch round the read() and the seekg():
f.seekg (0, f.beg); // point back at the beginning of the file.
f.read (memory, length); // read the content of the file into an array
You don't give very much information to go on, so there is a little guesswork here, but this is how I might approach reading the file.
The main points: don't use manual memory allocation, use a std::vector (it's what it's for). Also, copy the bytes out of the char array into variables of the correct type rather than casting a pointer into the char array; copying ensures alignment is correct. Another approach would be to read directly from the file into correctly typed variables, passing their addresses cast to char*.
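#include <algorithm> // std::copy
#include <cerrno>
#include <cstdint>
#include <cstdlib> // EXIT_FAILURE
#include <cstring> // std::strerror
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
int main(int, char* argv[])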
{
// first parameter needs to be file name
std::string name = argv[1] ? argv[1]:"";
std::ifstream ifs(name, std::ios::binary|std::ios::ate); // open at end
if(!ifs)
{
std::cerr << std::strerror(errno) << '\n';
return EXIT_FAILURE;
}
if(ifs.tellg() < 12) // too small: bytes 4..11 are read below
{
std::cerr << "Bad image file, too short" << '\n';
return EXIT_FAILURE;
}
// Don't allocate memory manually, use a container
std::vector<char> image(ifs.tellg()); // big enough for whole file
ifs.seekg(0); // back to beginning
if(!ifs.read(image.data(), image.size()))
{
std::cerr << std::strerror(errno) << '\n';
return EXIT_FAILURE;
}
// copy raw data into variables
std::uint32_t width; // 4 bytes wide integer
std::uint32_t height; // 4 bytes wide integer
std::copy(&image[4], &image[ 8], (char*)&width); // offset 4 bytes
std::copy(&image[8], &image[12], (char*)&height); // offset 8 bytes
// at this point a lot depends on the system architecture and
// how the number is stored in the file. The documentation
// should tell you if it is little-endian or big-endian
// you may have to do manual jiggery-pokery
// to change endianness
std::cout << "width : " << width << '\n';
std::cout << "height: " << height << '\n';
}
For a quick answer of the problem stated in the title, and by looking at the input file you specify, you could simply read three lines, skip the first and then extract one integer each from the following two lines.
So something like
int width, height;
std::string input;
std::istringstream is;
std::getline(stream, input); // One line for the "size" string
std::getline(stream, input); // One line for the width
is.str(input);
is >> width;
std::getline(stream, input); // One line for the height
is.clear(); // reset any eofbit left by the previous extraction
is.str(input);
is >> height;
If you have multiple entries like this in the file, then do the above in a loop.
If the file doesn't actually contain the texts you show, just the numbers, then it's even simpler:
int width, height;
stream >> width >> height;
Or if you have multiple entries
while (stream >> width >> height) { ... }
std::cout will print data passed as characters if the data is char, so try casting before printing.
for ( int i = 0 ; i < 4 ; i++) {
cout << hex<< (int)(unsigned char)memory[index + i];
}
If the number in the file is little endian, it can be converted to integer like this:
// Use #include <cstdint> for using type uint32_t and uint8_t
uint32_t num = 0; // the converted number will be here
for ( int i = 0 ; i < 4 ; i++) {
num |= (uint32_t)(uint8_t)memory[index + i] << (8 * i); // widen before shifting to avoid signed overflow
}
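Since the byte order of the header is not stated in the question, a sketch covering both cases may help. The function names readU32LE and readU32BE are invented for this example, and the offsets in the usage note (4 for width, 8 for height) are taken from the layout described in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decode 4 bytes at `offset` as a little-endian
// unsigned 32-bit integer (least significant byte first).
std::uint32_t readU32LE(const std::vector<char>& buf, std::size_t offset) {
    assert(offset + 4 <= buf.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>(buf[offset + i]);
        value |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
    return value;
}

// Hypothetical helper: same, but big-endian (most significant byte first).
std::uint32_t readU32BE(const std::vector<char>& buf, std::size_t offset) {
    assert(offset + 4 <= buf.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>(buf[offset + i]);
        value = (value << 8) | byte;
    }
    return value;
}
```

With the whole file in a std::vector<char>, the width would be readU32LE(image, 4) and the height readU32LE(image, 8), or the BE variants if the format documentation says the values are big-endian. Decoding byte by byte like this gives the same result regardless of the host machine's own byte order.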

In c++ seekg seems to include cr chars, but read() drops them

I'm currently trying to read the contents of a file into a char array.
For instance, I have the following text in a char array. 42 bytes:
{
type: "Backup",
name: "BackupJob"
}
This file is created in windows, and I'm using Visual Studio c++, so there is no OS compatibility issues.
However, executing the following code, at the completion of the for loop, I get Index: 39, with no 13 displayed prior to the 10's.
// Create the file stream and open the file for reading
ifstream fs;
fs.open("task.txt", ifstream::in);
int index = 0;
int ch = fs.get();
while (fs.good()) {
cout << ch << endl;
ch = fs.get();
index++;
}
cout << "----------------------------";
cout << "Index: " << index << endl;
return;
However, when I create a char array the length of the file, reading the file size as shown below counts the 3 additional CR chars, so length equals 42, which leaves dodgy bytes at the end of the array.
// Create the file stream and open the file for reading
ifstream fs;
fs.seekg(0, std::ios::end);
length = fs.tellg();
fs.seekg(0, std::ios::beg);
// Create the buffer to read the file
char* buffer = new char[length];
fs.read(buffer, length);
buffer[length] = '\0';
// Close the stream
fs.close();
Using a hex viewer, I have confirmed that file does indeed contain the CRLF (13 10) bytes in the file.
There seems to be a disparity with getting the end of the file, and what the get() and read() methods actually return.
Could anyone please help with this?
Cheers,
Justin
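You should open your file in binary mode. In text mode on Windows, each CRLF pair is translated to a single LF when reading, while the size obtained from seekg/tellg still counts the CR bytes. Binary mode stops read() from dropping the CRs.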
fs.open("task.txt", ifstream::in|ifstream::binary);
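A minimal sketch of reading the whole file in binary mode, so that the size reported by tellg() and the number of bytes delivered by read() agree. The function name readBinary is hypothetical; it returns a std::string, which manages its own storage, so there is no manual terminator to write past the end of the buffer.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: return the raw bytes of the file at `path`,
// or an empty string if it cannot be opened or is empty.
std::string readBinary(const std::string& path) {
    // ios::ate opens the file positioned at the end, so tellg() gives its size.
    std::ifstream fs(path, std::ios::in | std::ios::binary | std::ios::ate);
    if (!fs) return {};

    const std::streamoff length = fs.tellg();
    if (length <= 0) return {};

    std::string buffer(static_cast<std::size_t>(length), '\0');
    fs.seekg(0, std::ios::beg);
    fs.read(&buffer[0], length);

    // gcount() reports how many bytes read() actually stored; trim to that.
    buffer.resize(static_cast<std::size_t>(fs.gcount()));
    return buffer;
}
```

Trimming to gcount() is a defensive step: in binary mode it should equal length, but if a read ever delivers fewer bytes, the result contains no stale padding.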