I'm trying to read the width and height of a BMP file in C++. The width and height of my BMP file are both 512. Byte 18 holds the width value in BMP files.
My code is shown below:
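#include <cstdlib>
#include <fstream>
#include <iostream>
using namespace std;

int main() {
    char header[54];
    ifstream bmp;
    bmp.open("image.bmp", ios::in | ios::binary);
    if (!bmp) {
        cout << "Error" << endl;
        system("PAUSE");
        exit(1);
    }
    bmp.read(header, 54);          // 14-byte file header + 40-byte info header
    char a = header[18];           // a single byte
    int b = *(int *)&header[18];   // four bytes reinterpreted as an int
    system("PAUSE");
}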
How does b become 512 when a is '\0'? Sorry for my bad English.
Because char is one byte and int is (probably) 4 bytes. 512 is 0x00000200, so as a four-byte little-endian int (the byte order BMP files and x86 machines use) it is stored as 0x00 0x02 0x00 0x00. Notice that only one of those four bytes is non-zero, and it is not the first one.
char a = header[18]; just reads the single byte at position 18, while int b = *(int *)&header[18]; interprets positions 18 through 21 as a single int value.
According to the BMP file format, the width is a signed integer comprising 4 bytes, not one byte.
So if the width is 512, then, represented as a 4-byte little-endian integer, the bytes are 0x00, 0x02, 0x00, 0x00, stored at positions 18 to 21 respectively. So position 18 is 0x00 (your value of a), whereas the width interpreted as a 4-byte signed integer over positions 18 to 21 is 512.
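To make the layout concrete, here is a small sketch that stores 512 at offset 18 of a zeroed 54-byte buffer the way a BMP file does (little-endian) and then reads it back. The helper names encodeLE32 and decodeLE32 are hypothetical, not from any library; the point is that byte 18 on its own is 0 while the four bytes together decode to 512.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: store v at buf[off..off+3], least significant byte
// first (the little-endian order BMP uses).
void encodeLE32(unsigned char* buf, std::size_t size, std::size_t off, std::uint32_t v)
{
    assert(off + 4 <= size);
    for (int i = 0; i < 4; ++i)
        buf[off + i] = static_cast<unsigned char>((v >> (8 * i)) & 0xFF);
}

// Hypothetical helper: rebuild a 32-bit value from four little-endian bytes,
// starting with the most significant byte (the last one in memory).
std::uint32_t decodeLE32(const unsigned char* buf, std::size_t size, std::size_t off)
{
    assert(off + 4 <= size);
    std::uint32_t v = 0;
    for (int i = 3; i >= 0; --i)
        v = (v << 8) | buf[off + i];
    return v;
}
```

After encodeLE32(header, 54, 18, 512), header[18] is 0x00 and header[19] is 0x02, which is exactly the a == '\0' versus b == 512 situation from the question.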
This is an important note, and I'm not sure it could be covered properly in a comment.
int b = *(int *)&header[18];
treats header[18] as though it were an int. It's not an int. Welcome to Strict Aliasing. You have accessed the data through a pointer to a type it doesn't have, and in doing so you walk into undefined behaviour. It's one of the sorts of undefined behaviour that USUALLY "works", but consider this:
header[18] is not guaranteed to be aligned on a 4- or 8-byte boundary. When reading a 32- or 64-bit integer from it, the program will either suffer a performance hit as the processor plays games to read the unaligned data, or die a painful death as the processor raises its digital middle finger and crashes outright.
The almost correct way to do this is:
int width;
memcpy(&width, &header[18], sizeof(width)); // memcpy is declared in <cstring>
But this ignores the fact that the header and the program may disagree on the size and byte order of an int.
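Looking at the BMP spec, the width is stored little-endian, but its size depends on the header version: the old BITMAPCOREHEADER stores it as an unsigned 16-bit integer, while the far more common 40-byte BITMAPINFOHEADER (which a 54-byte header implies) stores it as a signed 32-bit integer. Reading a 16-bit field into a typical 32-bit int pulls in two unrelated bytes and produces utter garbage. At the very least, use a fixed-width type that matches the field (uint16_t for the core header, int32_t for the info header) and pray for a little-endian processor (a reasonable assumption on PC hardware). For the 16-bit case: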
uint16_t width;
memcpy(&width, &header[18], sizeof(width));
Better:
uint16_t width = (uint8_t)header[19];  // high byte
width <<= 8;
width += (uint8_t)header[18];          // low byte
or something similar, which assembles the bytes in the right order regardless of the native endianness.
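For the common 54-byte header (a 14-byte file header followed by a 40-byte BITMAPINFOHEADER), width and height are signed 32-bit little-endian fields at offsets 18 and 22. The following is a sketch that reads both with the same byte-assembly idea and no int* cast. readLE32s, BmpSize and readBmpSize are hypothetical names, and the sketch assumes the header was read into an unsigned char buffer (for example with bmp.read(reinterpret_cast<char*>(header), 54)).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a little-endian signed 32-bit field from a byte buffer without
// casting the buffer to int* (no aliasing or alignment problems).
std::int32_t readLE32s(const unsigned char* buf, std::size_t size, std::size_t off)
{
    assert(off + 4 <= size);
    std::uint32_t u = static_cast<std::uint32_t>(buf[off])
                    | static_cast<std::uint32_t>(buf[off + 1]) << 8
                    | static_cast<std::uint32_t>(buf[off + 2]) << 16
                    | static_cast<std::uint32_t>(buf[off + 3]) << 24;
    // Two's-complement conversion: guaranteed since C++20 and what
    // mainstream compilers did before that.
    return static_cast<std::int32_t>(u);
}

struct BmpSize {
    std::int32_t width;
    std::int32_t height;
};

// Assumes a 54-byte header ending in a BITMAPINFOHEADER:
// width at offset 18, height at offset 22.
BmpSize readBmpSize(const unsigned char (&header)[54])
{
    return { readLE32s(header, 54, 18), readLE32s(header, 54, 22) };
}
```

The fields are signed because a negative height marks a top-down bitmap, so keep the result in a signed type rather than a uint16_t.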
I was working on a Huffman project to compress text files. I was able to generate the required codes. I read the whole file and stored the corresponding codes in a vector<char> variable. I also padded the encoded vector.
#include <bitset>
#include <string>
#include <vector>
using namespace std;

vector<char> padding(vector<char> text)
{
    int num = text.size();
    unsigned int pad_value = 32-(num%32);
    for(unsigned int i=0;i<pad_value;i++){
        text.push_back('0');
    }
    string pad_info = bitset<32>(pad_value).to_string();
    for(int i=pad_info.length()-1;i>=0;i--){
        text.insert(text.begin(),pad_info[i]);
    }
    return text;
}
I padded to a multiple of 32 bits because I was thinking of using an array of unsigned int to store the integers directly in a binary file, so that every 32 characters occupy 4 bytes. I used this function for that:
vector<unsigned int> build_byte_array(vector<char> padded_text)
{
vector<unsigned int> byte_arr;
for(int i=0;i<padded_text.size();i+=32)
{
string byte="";
for(int j=i;j<i+32;j++){
byte += padded_text[j];
}
unsigned int b = stoul(byte,nullptr,2);
//cout<<b<<":"<<byte<<endl;
byte_arr.push_back(b);
}
return byte_arr;
}
Now the problem is that when I write this byte array to a binary file using
ofstream output("compressed.bin",ios::binary);
for(int i=0;i<byte_array.size();i++){
unsigned int a = byte_array[i];
output.write((char*)(&a),sizeof(a));
}
I get a binary file that is bigger than the original text file. How do I solve that, or what mistake am I making?
Edit: I tried to compress a file of about 2,493 KB (for testing purposes) and it generated a compressed.bin file of 3,431 KB, so I don't think padding is the issue here.
I also tried a 15 KB file, but the size always increases after using this algorithm.
I tried using:
for(int i=0;i<byte_array.size();i++){
unsigned int a = byte_array[i];
char b = (char)a;
output.write((char*)(&a),sizeof(b));
}
But after using this, I am unable to recover the original byte array when decompressing the file.
unsigned int a = byte_array[i];
output.write((char*)(&a),sizeof(a));
The size of the write is sizeof(a) which is usually 4 bytes.
An unsigned int is not a byte. A more suitable type for a byte would be std::byte, uint8_t, or unsigned char.
You are expanding your data with padding, so if you're not getting much compression or there's not much data to begin with, the output could easily be larger.
You don't need to pad nearly as much as you do. First off, you are adding 32 bits when the data already ends on a word boundary (when num is a multiple of 32). Pad zero bits in that case. Second, you are inserting 32 bits at the start to record how many bits you padded, where five bits would suffice to encode 0..31. Third, you could write bytes instead of ints, so the padding on the end could be 0..7 bits, and you could prepend three bits instead of five. The padding overall could be reduced from your current 33..64 bits to 3..10 bits.
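The byte-oriented scheme from the last paragraph (a 3-bit pad count, then the data bits, then 0..7 zero bits) can be sketched as follows. packBits and unpackBits are hypothetical names; the sketch keeps the question's representation of one '0'/'1' char per bit as input and emits whole bytes, so the output is roughly one eighth the size of that char vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack '0'/'1' chars into bytes, most significant bit first. The first 3 bits
// record how many zero bits were appended at the end (0..7).
std::vector<std::uint8_t> packBits(const std::vector<char>& bits)
{
    std::size_t total = 3 + bits.size();
    unsigned pad = static_cast<unsigned>((8 - total % 8) % 8);

    std::vector<char> all;
    all.reserve(total + pad);
    for (int i = 2; i >= 0; --i)                    // 3-bit pad count
        all.push_back(((pad >> i) & 1) ? '1' : '0');
    all.insert(all.end(), bits.begin(), bits.end());
    all.insert(all.end(), pad, '0');               // trailing zero bits

    std::vector<std::uint8_t> out(all.size() / 8, 0);
    for (std::size_t i = 0; i < all.size(); ++i)
        if (all[i] == '1')
            out[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
    return out;
}

// Inverse of packBits: recover the original '0'/'1' sequence.
std::vector<char> unpackBits(const std::vector<std::uint8_t>& bytes)
{
    std::vector<char> all;
    for (std::uint8_t b : bytes)
        for (int i = 7; i >= 0; --i)
            all.push_back(((b >> i) & 1) ? '1' : '0');
    assert(all.size() >= 3);
    unsigned pad = (all[0] == '1') * 4u + (all[1] == '1') * 2u + (all[2] == '1');
    assert(all.size() >= 3 + pad);
    return std::vector<char>(all.begin() + 3,
                             all.end() - static_cast<std::ptrdiff_t>(pad));
}
```

Writing the result is then a single call such as output.write(reinterpret_cast<const char*>(packed.data()), packed.size()), so each element costs exactly one byte on disk.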
My task is to read metadata values from an unsigned char array that contains the bytes of a binary .shp file (Shapefile):
unsigned char* bytes;
The header of the file stored in the array, and the order of the information in it, looks like this:
int32_t filecode // BigEndian
int32_t skip[5] // Uninteresting stuff
int32_t filelength // BigEndian
int32_t version // LittleEndian
int32_t shapetype // LittleEndian
// Rest of the header and of the filecontent which I don't need
So my question is: how can I extract this information (except the skipped part, of course), taking the endianness into account, and read it into the corresponding variables?
I thought about using ifstream, but I couldn't figure out how to do it properly.
Example:
Read the first four bytes of the binary, ensure big-endian byte order, and store them in an int32_t. Then skip 5 * 4 bytes (5 int32s). Then read four bytes, ensure big-endian byte order, and store them in an int32_t. Then read four bytes, ensure little-endian byte order, and again store them in an int32_t, and so on.
Thanks for your help guys!
So 'reading' a byte array just means extracting the bytes from the positions where you know your data is stored. Then you just need to do the appropriate bit manipulations to handle the endianness. So, for example, filecode would be this
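filecode = (int32_t)(((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) | ((uint32_t)bytes[2] << 8) | bytes[3]);
and version would be this
version = (int32_t)(bytes[28] | ((uint32_t)bytes[29] << 8) | ((uint32_t)bytes[30] << 16) | ((uint32_t)bytes[31] << 24));
(Going by the layout you stated, version starts at byte 28: 4 bytes of filecode, then 5 * 4 skipped bytes, then 4 bytes of filelength. The casts to uint32_t keep the shifts in unsigned arithmetic; without them the unsigned char is promoted to int, and shifting a byte of 0x80 or more into the sign bit is not portable.)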
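Putting this together for the whole header, here is a sketch with one helper per byte order. Following the layout given in the question (4 bytes of filecode, 5 * 4 skipped bytes, then filelength), the fields sit at offsets 0, 24, 28 and 32. readBE32, readLE32, ShpHeader and parseShpHeader are hypothetical names, and the values in the usage check are just sample inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Big-endian: most significant byte first.
std::int32_t readBE32(const unsigned char* p)
{
    std::uint32_t u = static_cast<std::uint32_t>(p[0]) << 24
                    | static_cast<std::uint32_t>(p[1]) << 16
                    | static_cast<std::uint32_t>(p[2]) << 8
                    | static_cast<std::uint32_t>(p[3]);
    return static_cast<std::int32_t>(u);
}

// Little-endian: least significant byte first.
std::int32_t readLE32(const unsigned char* p)
{
    std::uint32_t u = static_cast<std::uint32_t>(p[0])
                    | static_cast<std::uint32_t>(p[1]) << 8
                    | static_cast<std::uint32_t>(p[2]) << 16
                    | static_cast<std::uint32_t>(p[3]) << 24;
    return static_cast<std::int32_t>(u);
}

struct ShpHeader {
    std::int32_t filecode, filelength, version, shapetype;
};

// Offsets follow the question's layout: 4 + 5 * 4 skipped + 4 + 4 + 4 bytes.
ShpHeader parseShpHeader(const unsigned char* bytes, std::size_t size)
{
    assert(size >= 36);
    ShpHeader h;
    h.filecode   = readBE32(bytes + 0);
    h.filelength = readBE32(bytes + 24);
    h.version    = readLE32(bytes + 28);
    h.shapetype  = readLE32(bytes + 32);
    return h;
}
```

Because the helpers build each value from individual bytes, the result is the same on little- and big-endian hosts, so no check of the machine's byte order is needed.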
Below is the code I use to read 16-bit and 32-bit-per-sample WAVs, which works just fine.
My question is: how can I read the remaining 8-bit unsigned, 24-bit signed, and 32-bit float WAVs?
Read one sample of a 16-bit signed WAV:
short buffer;
file.read( ( char * ) &buffer, 2 );
Read one sample of a 32-bit signed WAV:
int buffer;
file.read( ( char * ) &buffer, 4 );
You're making a few assumptions about the target machine. According to the Microsoft WAV format, all sample data is little-endian. You're also expecting the various data types to be the sizes you want, which may not always be the case.
But as your current routines work for you, we won't focus on that for the moment (though you should probably fix these things at some point).
32 bit float
If we forget about the scary endianness and non-standard type sizes, the 32-bit float case becomes relatively straightforward using your other code as a template:
float buffer;
file.read( ( char * ) &buffer, 4 );
This question covers reading floats from a binary source in a lot more detail.
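If you do want the float case to be independent of the host's byte order, one option is to assemble the four little-endian bytes into a uint32_t and copy its bit pattern into a float with memcpy. This sketch assumes float is 32-bit IEEE 754 and that floats use the same byte order as integers on the host, which holds on mainstream platforms; floatFromLE is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Decode one 32-bit float sample stored little-endian at p.
float floatFromLE(const unsigned char* p)
{
    assert(p != nullptr);
    std::uint32_t bits = static_cast<std::uint32_t>(p[0])
                       | static_cast<std::uint32_t>(p[1]) << 8
                       | static_cast<std::uint32_t>(p[2]) << 16
                       | static_cast<std::uint32_t>(p[3]) << 24;
    static_assert(sizeof(float) == sizeof(bits), "float must be 32 bits");
    float f;
    std::memcpy(&f, &bits, sizeof f);   // copy the bit pattern; no aliasing issue
    return f;
}
```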
x bit unsigned
Since we know that your machine correctly interprets the 16- and 32-bit cases, we can assume it is little-endian. That means you can read everything into an unsigned int that has been initialized to zero, and the remaining high bytes are already correctly padded for you:
unsigned int buffer = 0;
file.read( ( char * ) &buffer, 1 ); // 8bit unsigned integer
buffer = 0;
file.read( ( char * ) &buffer, 3 ); // 24bit unsigned integer
x bit signed
Finally, if you're reading a signed integer, you need to pad the remaining bytes of your buffer variable depending on the value of the number you just read:
If the number was positive you can just pad with 0
If the number was negative (the highest bit of the most significant byte is set), pad with \xFF bytes.
This code works on a 24 bit signed integer:
long buffer;
int number_of_bytes = 3; // 24 bit signed integer
file.read( (char *) &buffer, number_of_bytes);
// Determine the padding byte
unsigned char padding_byte = 0;
if ( ((char*) &buffer)[number_of_bytes - 1] & 128) {
padding_byte = 255;
}
// Pad the data
for (int i = number_of_bytes; i < sizeof(buffer); i++) {
((char*) &buffer)[i] = padding_byte;
}
Again, I should point out that this code will fail on some machines because it doesn't check endianness. To fix that, check the endianness of the machine running the code and reverse the order of the bytes if you're on a big-endian machine.
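An alternative that sidesteps the machine's endianness entirely is to read the sample bytes into an unsigned char buffer and assemble them arithmetically. This sketch (sampleFromU8 and sampleFromS24LE are hypothetical names) treats 8-bit samples as unsigned and 24-bit samples as signed little-endian, and performs the sign extension by subtracting 2^24 when bit 23 is set, which has the same effect as padding with 0xFF bytes.

```cpp
#include <cassert>
#include <cstdint>

// 8-bit WAV samples are unsigned (0..255); a single byte has no byte order.
int sampleFromU8(const unsigned char* p)
{
    return p[0];
}

// 24-bit WAV samples are signed little-endian. Assemble the three bytes,
// then sign-extend from bit 23 without depending on host byte order.
std::int32_t sampleFromS24LE(const unsigned char* p)
{
    assert(p != nullptr);
    std::int32_t v = static_cast<std::int32_t>(p[0])
                   | static_cast<std::int32_t>(p[1]) << 8
                   | static_cast<std::int32_t>(p[2]) << 16;
    if (v & 0x800000)       // top bit of the most significant byte: negative
        v -= 0x1000000;     // same effect as padding with 0xFF bytes
    return v;
}
```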
My background is PHP, so getting the hang of low-level concepts (chars are bytes, bytes are bits, bits are binary values, etc.) is taking some time.
What I am trying to do here is send some values from an Arduino board to openFrameworks (both are C++).
When asked to send the data, this script currently does the following (and it works well for one sensor, I might add):
int value_01 = analogRead(0); // returns a value between 0 and 1023
unsigned char val1;
unsigned char val2;
//some Complicated bitshift operation
val1 = value_01 &0xFF;
val2 = (value_01 >> 8) &0xFF;
//send both bytes
Serial.print(val1, BYTE);
Serial.print(val2, BYTE);
Apparently this is the most reliable way of getting the data across.
So now that it is sent via the serial port, the bytes are added to a char string and converted back with:
int num = ( (unsigned char)bytesReadString[1] << 8 | (unsigned char)bytesReadString[0] );
So, to recap: I'm trying to get four sensors' worth of data (which I assume means eight of those Serial.print calls?) and end up with int num_01 through num_04.
I'm assuming this (as with most things) might be quite easy for someone with experience in these concepts.
Write a function to abstract sending the data (I've gotten rid of your temporary variables because they don't add much value):
void send16(int value)
{
//send both bytes
Serial.print(value & 0xFF, BYTE);
Serial.print((value >> 8) & 0xFF, BYTE);
}
Now you can easily send any data you want:
send16(analogRead(0));
send16(analogRead(1));
...
Just send them one after the other.
Note that the serial driver lets you send one byte (8 bits) at a time. A value between 0 and 1023 inclusive (which looks like what you're getting) fits in 10 bits. So 1 byte is not enough. 2 bytes, i.e. 16 bits, are enough (there is some extra space, but unless transfer speed is an issue, you don't need to worry about this wasted space).
So, the first two bytes can carry the data for your first sensor. The next two bytes carry the data for the second sensor, the next two bytes for the third sensor, and the last two bytes for the last sensor.
I suggest you use the function that R Samuel Klatchko suggested on the sending side, and hopefully you can work out what you need to do on the receiving side.
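On the receiving side, each reading arrives as a low byte followed by a high byte, so four sensors give eight bytes. A minimal sketch, assuming the eight bytes have already been collected in a char buffer in the order they were sent (decodeFourSensors is a hypothetical name, and no openFrameworks serial API is shown), could look like this:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Decode four 16-bit readings sent low byte first (as send16 does)
// from an 8-byte receive buffer.
std::array<int, 4> decodeFourSensors(const char* bytesRead, std::size_t n)
{
    assert(n >= 8);
    std::array<int, 4> values{};
    for (std::size_t i = 0; i < 4; ++i) {
        unsigned lo = static_cast<unsigned char>(bytesRead[2 * i]);
        unsigned hi = static_cast<unsigned char>(bytesRead[2 * i + 1]);
        values[i] = static_cast<int>(hi << 8 | lo);
    }
    return values;
}
```

This assumes the receiver starts reading at the first byte of a group; if the two sides can get out of step, you will need some way to mark where each group of eight bytes begins.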
int num = ( (unsigned char)bytesReadString[1] << 8 |
(unsigned char)bytesReadString[0] );
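That code actually does what you expect, although it's easy to doubt it.
Before the shift, the unsigned char is promoted to int, so no bits are lost:
(unsigned char)0xFF << 8 == 0xFF00
Bits would only be lost if the result were squeezed back into 8 bits, e.g. by assigning it to an unsigned char:
(unsigned char)(0xFF << 8) == 0x00
If you prefer to make every conversion explicit, you can write: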
typedef unsigned uint;
typedef unsigned char uchar;
uint num = (static_cast<uint>(static_cast<uchar>(bytesReadString[1])) << 8 ) |
static_cast<uint>(static_cast<uchar>(bytesReadString[0]));
You might get the same result from:
typedef unsigned short ushort;
uint num = *reinterpret_cast<ushort *>(bytesReadString);
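provided the byte order matches: it should work on little-endian machines (x86, x64) but not on big-endian ones (classic PowerPC, SPARC, etc.). Be aware that this cast also breaks strict aliasing and may perform a misaligned read, so it is undefined behaviour even when it appears to work.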
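If you do want the host-order shortcut, copying the bytes with memcpy avoids the aliasing and alignment problems of the cast while keeping the same endianness caveat. load16Host is a hypothetical name in this sketch:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy two bytes into a uint16_t in host byte order. Unlike
// *reinterpret_cast<ushort *>(buf), this has no aliasing or alignment
// problems, but the value still depends on the host's endianness.
std::uint16_t load16Host(const char* buf)
{
    assert(buf != nullptr);
    std::uint16_t v;
    std::memcpy(&v, buf, sizeof v);
    return v;
}
```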
To generalise the Send code a bit:
void SendBuff(const void *pBuff, size_t nBytes)
{
const char *p = reinterpret_cast<const char *>(pBuff);
for (size_t i=0; i<nBytes; i++)
Serial.print(p[i], BYTE);
}
template <typename T>
void Send(const T &t)
{
SendBuff(&t, sizeof(T));
}
I can't find a way to do the following in C/C++.
Input: hexadecimal values, for example: ffffffffff...
I've tried the following code to read the input:
uint16_t twoBytes;
scanf("%x",&twoBytes);
That works fine, but how do I split the two bytes into one-byte uint8_t values (or maybe read only the first byte)? I would like to read the first byte from the input and store it in a byte matrix at a position of my choosing.
uint8_t matrix[50][50];
Since I'm not very skilled at formatting / reading input in C/C++ (and have only used scanf so far), any other ideas on how to do this easily (and quickly, if possible) are greatly appreciated.
Edit: I found an even better method, the fread function, which lets you specify how many bytes to read from the stream (stdin in this case) and store in a variable or array.
size_t fread ( void * ptr, size_t size, size_t count, FILE * stream );
Parameters
ptr - Pointer to a block of memory with a minimum size of (size*count) bytes.
size - Size in bytes of each element to be read.
count - Number of elements, each one with a size of size bytes.
stream - Pointer to a FILE object that specifies an input stream.
cplusplus ref
%x reads an unsigned int, not a uint16_t (though they may be the same on your particular platform).
To read only one byte, try this:
unsigned int byteTmp;
scanf("%2x", &byteTmp);
uint8_t byte = byteTmp;
This reads an unsigned int, but stops after reading two characters (two hex characters equal eight bits, or one byte).
You should be able to split the variable like this:
uint8_t LowerByte = twoBytes & 0xFF;
uint8_t HigherByte = twoBytes >> 8;
A couple of thoughts:
1) Read it as characters and convert it manually. Painful.
2) If you know that there is a multiple of 4 hexits (hex digits), you can just read into twoBytes and then split it into one-byte values with high = twoBytes >> 8; low = twoBytes & 0xFF;
3) Use %2x to read one byte (two hex digits) at a time.
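Combining the manual-conversion option with the matrix from the question, here is a sketch that converts hex text to bytes by hand and stores them row by row. hexDigit, hexToBytes and fillMatrix are hypothetical names, and the sketch assumes the text has already been read (for example with std::getline). Note that fread, mentioned in the question's edit, copies raw bytes: reading the text "ff" with it gives two 'f' characters (0x66 each), not the value 0xFF, so text input still needs a conversion like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Value of one hex digit, or -1 if c is not a hex digit.
int hexDigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Convert text like "ff01a0" into bytes {0xff, 0x01, 0xa0}.
// Stops at the first malformed pair; a trailing odd digit is ignored.
std::vector<std::uint8_t> hexToBytes(const std::string& text)
{
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < text.size(); i += 2) {
        int hi = hexDigit(text[i]);
        int lo = hexDigit(text[i + 1]);
        if (hi < 0 || lo < 0) break;
        out.push_back(static_cast<std::uint8_t>(hi << 4 | lo));
    }
    return out;
}

// Fill a 50x50 matrix row by row; cells without input stay unchanged.
void fillMatrix(std::uint8_t (&matrix)[50][50], const std::vector<std::uint8_t>& bytes)
{
    constexpr std::size_t N = 50;
    for (std::size_t k = 0; k < bytes.size() && k < N * N; ++k)
        matrix[k / N][k % N] = bytes[k];
}
```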