I'm trying to get information from a TIFF image file. The output for Endian is correct, but the rest of the values are all wrong. The first 8 bytes of the TIFF file are:
4d 4d 00 2a 00 02 03 60
The magicno I'm getting is 10752, which is 2A00 in hex. But I should be reading the third and fourth bytes, which are 002a. Need help please!!
Here's my code.
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
    char buffer[3];
    short magicno;
    int ifdaddress;
    short ifdcount;

    ifstream imfile;
    imfile.open("pooh.tif", ios::binary);
    imfile.seekg(0, ios::beg);
    imfile.read(buffer, 2);
    imfile.read((char*)&magicno, 2);
    imfile.read((char*)&ifdaddress, 4);
    imfile.seekg(ifdaddress, ios::beg);
    imfile.read((char*)&ifdcount, 2);
    imfile.close();

    buffer[2] = '\0';
    cout << "Endian: " << buffer << endl;
    cout << "Magic: " << magicno << endl;
    cout << "IFD Address: " << ifdaddress << endl;
    cout << "IFD Count: " << ifdcount << endl;
    return 0;
}
My output is:
Endian: MM
Magic: 10752
IFD Address: 1610809856
IFD Count: 0
You read the endianness marker correctly but you do not act upon it. From Adobe's "TIFF 6":
Bytes 0-1:
The byte order used within the file. Legal values are:
“II” (4949.H)
“MM” (4D4D.H)
In the “II” format, byte order is always from the least significant byte to the most significant byte, for both 16-bit and 32-bit integers. This is called little-endian byte order. In the “MM” format, byte order is always from most significant to least significant, for both 16-bit and 32-bit integers. This is called big-endian byte order.
(https://partners.adobe.com/public/developer/en/tiff/TIFF6.pdf)
You need two sets of routines to read a short integer from a TIFF file (and also to read the longer integral types): one that reads Motorola ("MM") big-endian numbers, and one that reads Intel ("II") little-endians.
As it is, you must be on a little-endian system while attempting to natively read big-endian numbers.
The code to correctly read a word can be as simple as
unsigned char d1, d2;
unsigned short word;
imfile.read((char*)&d1, 1);
imfile.read((char*)&d2, 1);
if (buffer[0] == 'I')        // "II": file data is little-endian
    word = d1 + (d2 << 8);
else                         // "MM": file data is big-endian
    word = (d1 << 8) + d2;
Untested, but the general idea should be clear. Best make it a function, because you need a similar setup for the "LONG" data type, which in turn is needed for the "RATIONAL" data type.
Ultimately, for TIFF files, you may want a generalized read_data function which first checks what data type is stored in the file and then calls the correct routine.
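A hedged sketch of those per-type helpers (illustrative names, untested like the snippet above; the bigendian flag would be set once from the "II"/"MM" marker):

#include <cstdint>
#include <fstream>

// Read a 16-bit TIFF SHORT, honoring the byte order recorded in the header.
uint16_t read_word(std::ifstream& f, bool bigendian) {
    unsigned char d[2];
    f.read(reinterpret_cast<char*>(d), 2);
    return bigendian ? (d[0] << 8) | d[1]
                     : (d[1] << 8) | d[0];
}

// Read a 32-bit TIFF LONG the same way; a RATIONAL is just two LONGs.
uint32_t read_long(std::ifstream& f, bool bigendian) {
    uint32_t w1 = read_word(f, bigendian);
    uint32_t w2 = read_word(f, bigendian);
    return bigendian ? (w1 << 16) | w2
                     : (w2 << 16) | w1;
}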
Related
I was working on a Huffman project to compress text files. I was able to generate the required codes. I read the whole file and stored the codes accordingly in a vector<char> variable. I also padded the encoded vector.
vector<char> padding(vector<char> text)
{
    int num = text.size();
    unsigned int pad_value = 32 - (num % 32);
    for (int i = 0; i < pad_value; i++) {
        text.push_back('0');
    }
    string pad_info = bitset<32>(pad_value).to_string();
    for (int i = pad_info.length() - 1; i >= 0; i--) {
        text.insert(text.begin(), pad_info[i]);
    }
    return text;
}
I padded to a multiple of 32 bits, as I was thinking of using an array of unsigned int to directly store the integers in a binary file so that they occupy 4 bytes for every 32 characters. I used this function for that:
vector<unsigned int> build_byte_array(vector<char> padded_text)
{
    vector<unsigned int> byte_arr;
    for (int i = 0; i < padded_text.size(); i += 32)
    {
        string byte = "";
        for (int j = i; j < i + 32; j++) {
            byte += padded_text[j];
        }
        unsigned int b = stoul(byte, nullptr, 2);
        //cout << b << ":" << byte << endl;
        byte_arr.push_back(b);
    }
    return byte_arr;
}
Now the problem is that when I write this byte array to a binary file using
ofstream output("compressed.bin", ios::binary);
for (int i = 0; i < byte_array.size(); i++) {
    unsigned int a = byte_array[i];
    output.write((char*)(&a), sizeof(a));
}
I get a binary file which is bigger than the original text file. How do I solve that, and what error am I making?
Edit: I tried to compress a file of about 2,493 KB (for testing purposes) and it generated a compressed.bin file of 3,431 KB. So I don't think padding is the issue here.
I also tried with a 15 KB file, but the size always increases after using this algorithm.
I tried using:
for (int i = 0; i < byte_array.size(); i++) {
    unsigned int a = byte_array[i];
    char b = (char)a;
    output.write((char*)(&a), sizeof(b));
}
but after using this I am unable to recover the original byte array when decompressing the file.
unsigned int a = byte_array[i];
output.write((char*)(&a),sizeof(a));
The size of the write is sizeof(a), which is usually 4 bytes.
An unsigned int is not a byte. A more suitable type for a byte would be std::byte, uint8_t, or unsigned char.
You are expanding your data with padding, so if you're not getting much compression or there's not much data to begin with, the output could easily be larger.
You don't need to pad nearly as much as you do. First off, you are adding 32 bits when the data already ends on a word boundary (when num is a multiple of 32). Pad zero bits in that case. Second, you are inserting 32 bits at the start to record how many bits you padded, where five bits would suffice to encode 0..31. Third, you could write bytes instead of ints, so the padding on the end could be 0..7 bits, and you could prepend three bits instead of five. The padding overall could be reduced from your current 33..64 bits to 3..10 bits.
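A hedged sketch of that byte-oriented scheme (pack_bits is an illustrative name, not code from the question): it packs the '0'/'1' characters into bytes, with a 3-bit pad count up front so the decoder knows how many trailing bits of the last byte to discard.

#include <cstdint>
#include <vector>

std::vector<uint8_t> pack_bits(const std::vector<char>& bits)
{
    // Total payload is 3 header bits plus the data bits; pad to a byte.
    unsigned pad = (8 - (3 + bits.size()) % 8) % 8;   // 0..7 pad bits
    std::vector<uint8_t> out;
    unsigned buf = pad, nbits = 3;                    // seed with pad count
    for (char c : bits) {
        buf = (buf << 1) | (c == '1');
        if (++nbits == 8) { out.push_back(uint8_t(buf)); buf = 0; nbits = 0; }
    }
    if (nbits)                                        // flush, zero-padded
        out.push_back(uint8_t(buf << (8 - nbits)));
    return out;
}

The decoder reads the first 3 bits back as the pad count, then knows exactly where the real data ends.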
I have implemented the Huffman coding algorithm in C++, and it's working fine. I want to create a text compression algorithm.
Behind every file or piece of data in the digital world, there are 0s and 1s.
I want to persist the sequence of bits (0/1) generated by the Huffman encoding algorithm in a file.
My goal is to save on the number of bits used to store the file. I'm storing the metadata for decoding in a separate file. I want to write the data to a file bit by bit, and then read it back bit by bit in C++.
The problem I'm facing with binary mode is that it doesn't allow me to write data bit by bit.
I want to write "10101" to the file as individual bits, but binary mode writes the ASCII value (8 bits) of each character instead.
code
#include "iostream"
#include "fstream"
using namespace std;
int main(){
ofstream f;
f.open("./one.bin", ios::out | ios::binary);
f<<"10101";
f.close();
return 0;
}
output: the file contains one byte per character (hex 31 30 31 30 31), five bytes instead of five bits.
Any help or pointers would be appreciated. Thank you.
"Binary mode" means only that you have requested that the actual bytes you write are not corrupted by end-of-line conversions. (This is only a problem on Windows. No other system has the need to deliberately corrupt your data.)
You are still writing a byte at a time in binary mode.
To write bits, you accumulate them in an integer. For convenience, in an unsigned integer. This is your bit buffer. You need to decide whether to accumulate them from the least to most or from the most to least significant positions. Once you have eight or more bits accumulated, you write out one byte to your file, and remove those eight bits from the buffer.
When you're done, if there are bits left in your buffer, you write out those last one to seven bits to one byte. You need to carefully consider how exactly you do that, and how to know how many bits there were, so that you can properly decode the bits on the other end.
The accumulation and extraction are done using the bit operations in your language. In C++ (and many other languages), those are & (and), | (or), >> (right shift), and << (left shift).
For example, to insert one bit, x, into your buffer, and later three bits in y, ending up with the earliest bits in the most significant positions:
unsigned buf = 0, bits = 0;
...
// some loop
{
...
// write one bit (don't need the & if you know x is 0 or 1)
buf = (buf << 1) | (x & 1);
bits++;
...
// write three bits
buf = (buf << 3) | (y & 7);
bits += 3;
...
// write bytes from the buffer before it fills the integer length
if (bits >= 8) { // the if could be a while if expect 16 or more
// out is an ostream -- must be in binary mode if on Windows
bits -= 8;
out.put(buf >> bits);
}
...
}
...
// write any leftover bits (it is assumed here that bits is in 0..7 --
// if not, first repeat if or while from above to clear out bytes)
if (bits) {
out.put(buf << (8 - bits));
bits = 0;
}
...
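The question also asks about reading the bits back. A matching sketch for that direction (BitReader is an illustrative name, not an established API), handing out the most significant bit first to mirror the writer above:

#include <istream>

struct BitReader {
    std::istream& in;
    unsigned buf = 0, bits = 0;

    int get_bit() {                 // returns 0 or 1, or -1 at end of stream
        if (bits == 0) {
            int c = in.get();
            if (c < 0) return -1;   // EOF
            buf = unsigned(c);
            bits = 8;
        }
        return (buf >> --bits) & 1; // most significant remaining bit
    }
};

Construct it with BitReader r{in}; and call r.get_bit() until it returns -1, using your recorded bit count or padding scheme to know where the real data ends.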
I'm trying to read the width and height of a BMP file in C++. The width and height of my BMP file are both 512. The width value starts at byte 18 of a BMP file.
My code show below:
int main() {
    char header[54];
    ifstream bmp;
    bmp.open("image.bmp", ios::in | ios::binary);
    if (!bmp) {
        cout << "Error" << endl;
        system("PAUSE");
        exit(1);
    }
    bmp.read(header, 54);

    char a = header[18];
    int b = *(int *)&header[18];
    system("PAUSE");
}
How does b become 512 when a is '\0'? Sorry for my bad English.
Because char is one byte and int is (probably) 4 bytes. 512 is 0x200, so as a four-byte little-endian int in the file it looks like 0x00 0x02 0x00 0x00. Notice that only one of those four bytes is non-zero.
char a = header[18]; just reads the single byte at position 18, while int b = *(int *)&header[18]; interprets positions 18 through 21 as a single int value.
According to the BMP file format, the width is a signed integer comprising 4 bytes, not one byte.
So if the width is 512, then, when represented as a 4-byte little-endian integer, the bytes are 0x00 - 0x02 - 0x00 - 0x00, stored at positions 18 to 21 respectively. So position 18 is 0x00 (your value of a), whereas the width when interpreted as a 4-byte signed integer over positions 18 to 21 is 512.
Important note that I'm not certain can be expanded upon in a comment.
int b = *(int *)&header[18];
treats header[18] as though it were an int. It's not an int. Welcome to strict aliasing. You have converted between types that are not compatible, and what you do here is walk into undefined behaviour. It's the sort of undefined behaviour that usually "works", but consider this:
header[18] is not 32- or 64-bit aligned. Reading a 32- or 64-bit integer there, the program will either suffer a performance hit as the processor plays games to read the unaligned data, or die a painful death as the processor raises its digital middle finger and outright crashes.
The almost correct way to do this is
int width;
memcpy(&width, &header[18], sizeof(width));
But this ignores the fact that the header and the program may disagree on the size and byte order of an int.
Looking at the BMP spec, in the old BITMAPCOREHEADER the image width is a little-endian unsigned 16-bit integer (in the far more common BITMAPINFOHEADER it is a little-endian signed 32-bit value, which is why your 4-byte read produced 512). Reading a 16-bit field into the typical 32-bit int is going to result in utter garbage. For that case, at the very least use a uint16_t instead and pray for a little-endian processor (a reasonable assumption on PC hardware).
uint16_t width;
memcpy(&width, &header[18], sizeof(width));
Better
uint16_t width = (uint8_t)header[19];
width <<= 8;
width += (uint8_t)header[18];
or similar, to make sure the bytes go in the right order regardless of the native endianness.
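Since the asker's file evidently uses BITMAPINFOHEADER, the same byte-by-byte idea extends to its four-byte width field; a sketch (w and width are illustrative names):

#include <cstdint>

// Assemble the 4-byte little-endian width at offsets 18..21, independent
// of the host's byte order; the final cast restores the field's signedness.
uint32_t w = (uint32_t)(uint8_t)header[18]
           | (uint32_t)(uint8_t)header[19] << 8
           | (uint32_t)(uint8_t)header[20] << 16
           | (uint32_t)(uint8_t)header[21] << 24;
int32_t width = (int32_t)w;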
I'm building some code to read a RIFF wav file and I've bumped into something odd.
The first 4 bytes of the file header are the word RIFF in big-endian ASCII coding:
0x5249 0x4646
I read this first element using:
char *fileID = new char[4];
filestream.read(fileID,4);
When I write this to screen the results are as expected:
std::cout << fileID << std::endl;
>> RIFF
Now, the next 4 bytes give the size of the file, but crucially they're little-endian.
So, I write a little function to flip the bytes, based on a union:
int flip4bytes(char* input){
    union flip { int flip_int; char flip_char[4]; } f;
    f.flip_char[0] = input[3];
    f.flip_char[1] = input[2];
    f.flip_char[2] = input[1];
    f.flip_char[3] = input[0];
    return f.flip_int;
}
This looks good to me, except when I call it, the value returned is totally wrong. Interestingly, the following code (where the bytes are not reversed!) works correctly:
int flip4bytes(char* input){
    union flip { int flip_int; char flip_char[4]; } f;
    f.flip_char[0] = input[0];
    f.flip_char[1] = input[1];
    f.flip_char[2] = input[2];
    f.flip_char[3] = input[3];
    return f.flip_int;
}
This has thoroughly confused me. Is the union somehow reversing the bytes for me?! If not, how are the bytes being converted to int correctly without being reversed?
I think there's some facet of endianness here that I'm ignorant of...
You are simply on a little-endian machine, and the "RIFF" string is just a string and thus neither little- nor big-endian, but just a sequence of chars. You don't need to reverse the bytes on a little-endian machine, but you do need to when operating on a big-endian one.
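If you want code that is correct on either kind of machine, a sketch (read_le32 is an illustrative name) that assembles the little-endian size field byte by byte, so no swapping is ever needed:

#include <cstdint>

// Build a 32-bit value from little-endian bytes; each byte is placed
// explicitly, so the host's own byte order never matters.
uint32_t read_le32(const unsigned char b[4]) {
    return (uint32_t)b[0]
         | (uint32_t)b[1] << 8
         | (uint32_t)b[2] << 16
         | (uint32_t)b[3] << 24;
}

Read the four size bytes into an unsigned char buffer and pass it through this instead of type-punning through a union.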
You need to figure out the endianness of your machine. #include <sys/param.h> will help you do that on many Unix-like systems (it defines BYTE_ORDER).
You could also use the fact that network byte order is big-endian. In that case you can use the ntohl function (ntohs for 16-bit values), which converts big-endian data to the host's order and works on any machine you compile the code on. Note, though, that the RIFF size field is little-endian, so this trick only helps for big-endian fields.
I can't exactly find a way to do the following in C/C++.
Input: hexadecimal values, for example: ffffffffff...
I've tried the following code in order to read the input :
uint16_t twoBytes;
scanf("%x",&twoBytes);
That works fine and all, but how do I split the two bytes into one-byte uint8_t values (or maybe even read just the first byte)? I would like to read the first byte from the input and store it in a byte matrix at a position of my choosing.
uint8_t matrix[50][50]
Since I'm not very skilled at formatting/reading input in C/C++ (and have only used scanf so far), any other ideas on how to do this easily (and fast, if possible) are greatly appreciated.
Edit: I found an even better method using the fread function, as it lets one specify how many bytes to read from the stream (stdin in this case) and save into a variable/array.
size_t fread ( void * ptr, size_t size, size_t count, FILE * stream );
Parameters
ptr - Pointer to a block of memory with a minimum size of (size*count) bytes.
size - Size in bytes of each element to be read.
count - Number of elements, each one with a size of size bytes.
stream - Pointer to a FILE object that specifies an input stream.
(cplusplus.com reference)
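A usage sketch of that fread idea, with a caveat the edit glosses over: fread returns raw bytes, so if stdin actually holds hex text such as ffff..., each byte read is an ASCII character code, not a parsed hex value.

#include <cstdio>
#include <cstdint>

int main(void) {
    uint8_t matrix[50][50];
    // Fill the matrix row by row with raw bytes from stdin.
    size_t rows = 0;
    while (rows < 50 && fread(matrix[rows], 1, 50, stdin) == 50)
        rows++;
    return 0;
}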
%x reads an unsigned int, not a uint16_t (though they may be the same on your particular platform).
To read only one byte, try this:
uint32_t byteTmp;
scanf("%2x", &byteTmp);
uint8_t byte = byteTmp;
This reads an unsigned int, but stops after reading two characters (two hex characters equals eight bits, or one byte).
You should be able to split the variable like this:
uint8_t LowerByte = twoBytes & 0xFF;
uint8_t HigherByte = twoBytes >> 8;
A few thoughts:
1) Read it as characters and convert it manually - painful.
2) If you know that there is a multiple of 4 hex digits, you can just read in two bytes at a time and then convert to one-byte values with high = twobytes >> 8; low = twobytes & 0xFF;
3) %2x
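Tying the %2x idea to the matrix from the question, a sketch (SCNx8 from <cinttypes> expands to the right conversion specifier for uint8_t):

#include <cstdio>
#include <cstdint>
#include <cinttypes>

int main(void) {
    uint8_t matrix[50][50] = {};
    // Consume the hex stream two characters (one byte) per matrix cell.
    for (int r = 0; r < 50; r++)
        for (int c = 0; c < 50; c++)
            if (scanf("%2" SCNx8, &matrix[r][c]) != 1)
                return 0;   // stop early at end of input
    return 0;
}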