C++ Binary file read issue

I was writing a function to read a file containing dumped data (a sequence of 1-byte values). Since the dumped values are 1 byte each, I read them as chars. I opened the file in binary mode, read the data as chars, and cast each one to int (so I get the ASCII codes). But the data read isn't correct (compared against a hex editor). Here's my code:
int** read_data(char* filename, int** data, int& height, int& width)
{
    data = new int*[height];
    int row, col;
    ifstream infile;
    infile.open(filename, ios::binary | ios::in);
    if (!infile.good())
    {
        return 0;
    }
    char* ch = new char[width];
    for (row = 0; row < height; row++)
    {
        data[row] = new int[width];
        infile.read(ch, width);
        for (col = 0; col < width; col++)
        {
            data[row][col] = int(ch[col]);
            cout << data[row][col] << " ";
        }
        cout << endl;
    }
    infile.close();
    return data;
}
Any ideas what might be wrong with this code?
My machine runs Windows, I'm using Visual Studio 2005, and the (exact) filename that I passed is:
"D:\\files\\output.dat"
EDIT: If I don't use unsigned char, the first 8 values, which are all 245, are read as -11.

I think you might have to use unsigned char and unsigned int to get correct results. In your code, the bytes you read are interpreted as signed values, which I assume you did not intend.

Your error seems to lie in the use of char* for ch: when you try to output it, chars are printed up to the first zero value.

A plain char can be either signed or unsigned, depending on the compiler. To get a consistent (and correct) result, you can cast the value to unsigned char before assigning to the int.
data[row][col]=static_cast<unsigned char>(ch[col]);
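For illustration, here is a minimal sketch of the difference, using the value 245 from the question's edit (what you get for plain char depends on your compiler):

#include <iostream>

int main()
{
    char c = static_cast<char>(245); // one raw byte, as read from the file

    // On a compiler where plain char is signed, this prints -11:
    std::cout << int(c) << std::endl;

    // Casting through unsigned char first always yields 245:
    std::cout << int(static_cast<unsigned char>(c)) << std::endl;
}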

Related

How to store Huffman codes in a binary file in C++?

I was working on a Huffman project to compress text files. I was able to generate the required codes. I read the whole file and stored the codes in a vector<char> variable. I also padded the encoded vector.
vector<char> padding(vector<char> text)
{
    int num = text.size();
    unsigned int pad_value = 32 - (num % 32);
    for (int i = 0; i < pad_value; i++) {
        text.push_back('0');
    }
    string pad_info = bitset<32>(pad_value).to_string();
    for (int i = pad_info.length() - 1; i >= 0; i--) {
        text.insert(text.begin(), pad_info[i]);
    }
    return text;
}
I padded to a multiple of 32 bits, as I was thinking of using an array of unsigned int to store the integers directly in a binary file, so that every 32 characters occupy 4 bytes. I used this function for that:
vector<unsigned int> build_byte_array(vector<char> padded_text)
{
    vector<unsigned int> byte_arr;
    for (int i = 0; i < padded_text.size(); i += 32)
    {
        string byte = "";
        for (int j = i; j < i + 32; j++) {
            byte += padded_text[j];
        }
        unsigned int b = stoul(byte, nullptr, 2);
        //cout << b << ":" << byte << endl;
        byte_arr.push_back(b);
    }
    return byte_arr;
}
Now the problem is that when I write this byte array to a binary file using
ofstream output("compressed.bin", ios::binary);
for (int i = 0; i < byte_array.size(); i++) {
    unsigned int a = byte_array[i];
    output.write((char*)(&a), sizeof(a));
}
I get a binary file which is bigger than the original text file. How do I solve that, or what error am I making?
Edit: I tried to compress a file of about 2,493 KB (for testing purposes) and it generated a compressed.bin file of 3,431 KB, so I don't think padding is the issue here.
I also tried with a 15 KB file, but the size always increases after using this algorithm.
I tried using:
for (int i = 0; i < byte_array.size(); i++) {
    unsigned int a = byte_array[i];
    char b = (char)a;
    output.write((char*)(&a), sizeof(b));
}
but after using this I am unable to recover the original byte array when decompressing the file.
unsigned int a = byte_array[i];
output.write((char*)(&a),sizeof(a));
The size of the write is sizeof(a) which is usually 4 bytes.
An unsigned int is not a byte. A more suitable type for a byte would be std::byte, uint8_t, or unsigned char.
You are expanding your data with padding, so if you're not getting much compression or there's not much data to begin with, the output could easily be larger.
You don't need to pad nearly as much as you do. First off, you are adding 32 bits when the data already ends on a word boundary (when num is a multiple of 32). Pad zero bits in that case. Second, you are inserting 32 bits at the start to record how many bits you padded, where five bits would suffice to encode 0..31. Third, you could write bytes instead of ints, so the padding on the end could be 0..7 bits, and you could prepend three bits instead of five. The padding overall could be reduced from your current 33..64 bits to 3..10 bits.
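As a rough sketch of that byte-oriented approach (not the asker's code; it assumes the codes arrive as a vector<char> of '0'/'1' characters as in the question, and for simplicity it records the pad count in a whole leading byte rather than the three bits suggested above):

#include <fstream>
#include <vector>

// Pack '0'/'1' characters into real bits, 8 per byte, and write them out.
// The final byte is zero-padded with 0..7 bits; the pad count is written
// as one leading byte.
void write_packed(const std::vector<char>& bits, std::ofstream& output)
{
    unsigned char pad = (8 - bits.size() % 8) % 8; // 0 when already byte-aligned
    output.put(static_cast<char>(pad));

    unsigned char byte = 0;
    int count = 0;
    for (char bit : bits) {
        byte = static_cast<unsigned char>((byte << 1) | (bit == '1'));
        if (++count == 8) {
            output.put(static_cast<char>(byte));
            byte = 0;
            count = 0;
        }
    }
    if (count > 0) { // flush the last partial byte, left-shifted so pad bits are zeros
        byte = static_cast<unsigned char>(byte << (8 - count));
        output.put(static_cast<char>(byte));
    }
}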

Convert a 16-bit integer to an array of char? (C++)

I need to write 16-bit integers to a file. fstream only writes characters. Thus I need to convert the integers to char: the actual integer, not the character representing the integer (i.e. 0 should be 0x00, not 0x30). I tried the following:
char * chararray = (char*)(&the_int);
However this creates a backwards array of two characters. The individual characters are not flipped, but the order of the characters is. Thus I created this function:
char * inttochar(uint16_t input)
{
    int input_size = sizeof(input);
    char * chararray = (char*)(&input);
    char * output = new char[input_size]; // allocate the buffer (the original wrote through an uninitialized pointer)
    for (int i = 0; i < input_size; i++)
    {
        output[i] = chararray[input_size - (i + 1)];
    }
    return output;
}
This seems slow. Surely there is a more efficient, less hacky way to convert it?
It's a bit hard to understand what you're asking here (perhaps it's just me, although I gather the commentators thought so too).
You write
fstream only writes characters
That's true, but doesn't necessarily mean you need to create a character array explicitly.
E.g., if you have an fstream object f (opened in binary mode), you can use the write method:
uint16_t s;
...
f.write(reinterpret_cast<const char *>(&s), sizeof(uint16_t));
As others have noted, when you serialize numbers, it often pays to use a commonly-accepted ordering. Hence, use htons (refer to the documentation for your OS's library):
uint16_t s;
...
const uint16_t ns = htons(s);
f.write(reinterpret_cast<const char *>(&ns), sizeof(uint16_t));
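If you would rather not pull in htons, you can also make the byte order explicit yourself. A minimal sketch, assuming big-endian (network) order is what you want:

#include <cstdint>
#include <fstream>

// Serialize a 16-bit value in big-endian order, one byte at a time.
// This is independent of the host's endianness, so no htons is needed.
void write_u16_be(std::ofstream& f, uint16_t s)
{
    char bytes[2];
    bytes[0] = static_cast<char>((s >> 8) & 0xFF); // high byte first
    bytes[1] = static_cast<char>(s & 0xFF);        // then low byte
    f.write(bytes, 2);
}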

C++ OpenGL BMP: how does it manage to extract width and height from the header?

This is my first post, so I am sorry if I write something wrong. I cannot understand how the height and width are being extracted from the header. Here's the code, up to the part I am interested in.
GLuint load_bmp(const char* imagepath)
{
    unsigned char header[54];
    unsigned int imageSize;
    unsigned int dataPos;
    unsigned int width;
    unsigned int height;
    unsigned char *data;
    FILE *file = fopen(imagepath, "rb");
    if (!file)
    {
        return false;
    }
    else
    {
        if (fread(header, 1, 54, file) != 54)
        {
            return false;
        }
        if ((header[0] != 'B') || (header[1] != 'M'))
        {
            return false;
        }
        dataPos = *(int*)&header[0x0A];   //This line
        imageSize = *(int*)&header[0x22]; //This line
        height = *(int*)&header[0x12];    //This line
        width = *(int*)&header[0x16];     //This line
    }
}
How do you get the right values using these 4 lines of code?
The header is read into a buffer. The lines in question then cast addresses into that buffer as if they were pointing to binary integers, and read them.
So for example the height is a four-byte integer that is represented by the bytes from header[0x12] to header[0x15]. The code casts the address of the first byte as if it's pointing to an integer, then reads the contents of that integer pointer. I don't know if C++ has more guarantees about this sort of thing than C, but if not, then the code is making some assumptions about the size and byte representation of an int that won't work in some environments.
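For what it's worth, you can sidestep those alignment and byte-representation assumptions by assembling the value from individual bytes. A sketch (read_u32_le is a made-up helper name):

#include <cstdint>

// Assemble a little-endian 32-bit value from the header buffer.
// Unlike the pointer cast, this avoids alignment and aliasing issues
// and works regardless of the host's byte order.
uint32_t read_u32_le(const unsigned char* p)
{
    return static_cast<uint32_t>(p[0])
         | static_cast<uint32_t>(p[1]) << 8
         | static_cast<uint32_t>(p[2]) << 16
         | static_cast<uint32_t>(p[3]) << 24;
}

// Usage with the question's buffer:
// dataPos   = read_u32_le(&header[0x0A]);
// imageSize = read_u32_le(&header[0x22]);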

Best way to compare input values to read values from files

I am relatively new to C++ programming and I have hit one of my first major snags.
I am trying to figure out how to read a value/character from a generic ".txt" file opened in Notepad, and based on that comparison decide whether or not to read the entire line. But I can't seem to read just a single one- or two-digit number. I got it to read the whole line using 'buffername'.getline(variable, size), but when I try to change 'size' to a specific number, it gives me a comparison error saying that it's invalid to switch to 'int' or 'char' (depending on how I declare the variable).
Any help is appreciated.
Thanks
int length = 2;
char * buffer;
ifstream is;
is.open("test.txt", ios::binary);
// allocate memory:
buffer = new char[length];
// read 2 chars
is.read(buffer, length);
// compare the characters and decide
delete[] buffer;
return 0;
You'll want to use an ifstream to get the value.
Something like the following should work. Here I use a word of type std::string, but you can replace that with other types to read them (i.e. int, double, etc.).
std::ifstream f("somefile.txt");
std::string word;
std::string line;
if (f >> word) {
    if (<the comparison>) {
        std::getline(f, line); // the free function; istream::getline doesn't return a string
    }
}
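Here is a slightly fuller sketch of the same pattern, with a concrete comparison filled in; the file name and the threshold are made up for illustration:

#include <fstream>
#include <iostream>
#include <limits>
#include <string>

int main()
{
    std::ifstream f("somefile.txt"); // hypothetical input file
    int value;
    std::string line;

    // Read the leading number on each line; only if it passes the
    // (made-up) test do we keep the rest of that line.
    while (f >> value) {
        if (value > 10) {
            std::getline(f, line); // rest of the current line
            std::cout << line << std::endl;
        } else {
            // skip the rest of this line
            f.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        }
    }
}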
First of all, for performance reasons it is a bad idea to read 1 byte at a time.
I suggest this alternative:
You would be better off reading in the whole line and then using a character array:
char variable[1000];
Read your line from the file into variable, then test individual bytes:
if (variable[1]=='c') { printf("Byte 2 (remember 0 offset) is compared for the letter c"); }
Getting a 2-digit number:
number = ((variable[3]-48)*10) + (variable[4]-48);
You have to subtract 48 because in ASCII the character '0' is 48.
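Putting those pieces together, a runnable sketch under the same assumptions (the letter sits at offset 1 and the two digits at offsets 3 and 4 of the line):

#include <cstdio>
#include <fstream>

int main()
{
    std::ifstream is("test.txt", std::ios::binary); // file name from the snippet above
    char variable[1000];

    if (is.getline(variable, sizeof(variable))) {
        if (variable[1] == 'c') {
            std::printf("Byte 2 (remember 0 offset) is compared for the letter c\n");
        }
        // 48 is the ASCII code of '0', so subtracting it converts a digit
        // character into its numeric value.
        int number = ((variable[3] - 48) * 10) + (variable[4] - 48);
        std::printf("two-digit number: %d\n", number);
    }
}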

Very strange char array behaviour

unsigned int fname_length = 0;
// fname_length equals 30
file.read((char*)&fname_length, sizeof(unsigned int));
// fname contains random data, as you would expect
char *fname = new char[fname_length];
// fname contains all the data, 30 bytes long, as you would expect, plus 18 bytes of random data on the end (IntelliSense display)
file.read((char*)fname, fname_length);
// m_material_file (std::string) contains all 48 characters
m_material_file = fname;
// count = 48
int count = m_material_file.length();
Now when trying this way, IntelliSense still shows the 18 bytes of data after setting the char array to all ' ', and I get exactly the same results, even without the file read:
char name[30];
for (int i = 0; i < 30; ++i)
{
    name[i] = ' ';
}
file.read((char*)fname, 30);
m_material_file = name;
int count = m_material_file.length();
Any idea what's going wrong here? It's probably something completely obvious, but I'm stumped!
Thanks
Sounds like the string in the file isn't null-terminated, and intellisense is assuming that it is. Or perhaps when you wrote the length of the string (30) into the file, you didn't include the null character in that count. Try adding:
fname[fname_length] = '\0';
after the file.read(). Oh yeah, you'll need to allocate an extra character too:
char * fname = new char[fname_length + 1];
I guess that IntelliSense is trying to interpret the char* as a C string and is looking for a '\0' byte.
fname is a char* so both the debugger display and m_material_file = fname will be expecting it to be terminated with a '\0'. You're never explicitly doing that, but it just happens that whatever data follows that memory buffer has a zero byte at some point, so instead of crashing (which is a likely scenario at some point), you get a string that's longer than you expect.
Use
m_material_file.assign(fname, fname + fname_length);
which removes the need for the zero terminator. Also, prefer std::vector to raw arrays.
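A sketch of that suggestion, using the question's variable names (read_material_name is a made-up wrapper):

#include <fstream>
#include <string>
#include <vector>

std::string read_material_name(std::ifstream& file)
{
    unsigned int fname_length = 0;
    file.read(reinterpret_cast<char*>(&fname_length), sizeof(fname_length));

    // The vector owns the buffer, so nothing leaks and no '\0' is needed.
    std::vector<char> fname(fname_length);
    file.read(fname.data(), fname_length);

    // assign() takes an explicit length, so missing or embedded
    // terminators don't matter.
    std::string m_material_file;
    m_material_file.assign(fname.data(), fname.size());
    return m_material_file;
}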
std::string::operator=(char const*) is expecting a sequence of bytes terminated by a '\0'. You can solve this with any of the following:
extend fname by a character and add the '\0' explicitly as others have suggested or
use m_material_file.assign(&fname[0], &fname[fname_length]); instead or
use repeated calls to file.get(ch) and m_material_file.push_back(ch)
Personally, I would use the last option since it eliminates the explicitly allocated buffer altogether. One fewer explicit new is one fewer chance of leaking memory. The following snippet should do the job:
std::string read_name(std::istream& is) {
    unsigned int name_length;
    std::string file_name;
    if (is.read((char*)&name_length, sizeof(name_length))) {
        for (unsigned int i = 0; i < name_length; ++i) {
            char ch;
            if (is.get(ch)) {
                file_name.push_back(ch);
            } else {
                break;
            }
        }
    }
    return file_name;
}
Note:
You probably don't want to use sizeof(unsigned int) to determine how many bytes to write to a binary file. The number of bytes read/written is dependent on the compiler and platform. If you have a maximum length, then use it to determine the specific byte size to write out. If the length is guaranteed to be fewer than 255 bytes, then only write a single byte for the length. Then your code will not depend on the byte size of intrinsic types.
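A sketch of that suggestion, assuming names are guaranteed to be shorter than 255 bytes (write_name is a made-up helper):

#include <ostream>
#include <string>

// Write the length as exactly one byte, then the raw characters.
// The on-disk format no longer depends on sizeof(unsigned int).
void write_name(std::ostream& os, const std::string& name)
{
    unsigned char len = static_cast<unsigned char>(name.size()); // assumes size() < 255
    os.put(static_cast<char>(len));
    os.write(name.data(), len);
}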