I can't find a way to do the following in C/C++.
Input: hexadecimal values, for example: ffffffffff...
I've tried the following code to read the input:
uint16_t twoBytes;
scanf("%x",&twoBytes);
That works fine, but how do I split the two bytes into one-byte uint8_t values (or even read only the first byte)? I would like to read the first byte from the input and store it in a byte matrix at a position of my choosing:
uint8_t matrix[50][50];
Since I'm not very skilled in formatting and reading input in C/C++ (I have only used scanf so far), any other ideas on how to do this easily (and quickly, if possible) are greatly appreciated.
Edit: I found an even better method: the fread function lets you specify how many bytes it should read from a stream (stdin in this case) and save to a variable or array.
size_t fread ( void * ptr, size_t size, size_t count, FILE * stream );
Parameters
ptr - Pointer to a block of memory with a minimum size of (size*count) bytes.
size - Size in bytes of each element to be read.
count - Number of elements, each one with a size of size bytes.
stream - Pointer to a FILE object that specifies an input stream.
(Source: cplusplus.com reference)
%x expects a pointer to an unsigned int, not to a uint16_t (though the two may happen to be the same size on your particular platform). For a uint16_t, use %hx or the SCNx16 macro from <inttypes.h>.
To read only one byte, try this:
unsigned int byteTmp;          // %x always stores into an unsigned int
scanf("%2x", &byteTmp);        // read at most two hex characters
uint8_t byte = (uint8_t)byteTmp;
This reads an unsigned int, but stops after two characters (two hex characters equal eight bits, or one byte).
You should be able to split the variable like this:
uint8_t LowerByte = twoBytes & 0xFF;  // mask with 0xFF (255), not 256
uint8_t HigherByte = twoBytes >> 8;
A couple of thoughts:
1) Read it as characters and convert them manually (painful).
2) If you know that the input has a multiple of 4 hex digits, you can read in twoBytes and then split it into one-byte values with high = twoBytes >> 8; low = twoBytes & 0xFF;
3) Use %2x.
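Option 1 above (reading characters and converting them manually) is less painful than it sounds. A minimal sketch, assuming the whole hex line has already been read into a std::string; hexDigitValue and parseHexBytes are hypothetical helper names, not library functions:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit.
int hexDigitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: converts pairs of hex digits into bytes, e.g. "ff0a" -> {0xff, 0x0a}.
// Stops at the first invalid digit; a trailing unpaired digit is ignored.
std::vector<uint8_t> parseHexBytes(const std::string& text) {
    std::vector<uint8_t> bytes;
    for (std::size_t i = 0; i + 1 < text.size(); i += 2) {
        const int high = hexDigitValue(text[i]);     // first digit = upper 4 bits
        const int low = hexDigitValue(text[i + 1]);  // second digit = lower 4 bits
        if (high < 0 || low < 0) break;
        bytes.push_back(static_cast<uint8_t>((high << 4) | low));
    }
    return bytes;
}
```

Each resulting byte can then be stored wherever you like, for example matrix[row][col] = bytes[k];.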
I was working on a Huffman project to compress text files. I was able to generate the required codes. I read the whole file and stored the codes in a vector<char> variable accordingly. I also padded the encoded vector:
vector<char> padding(vector<char> text)
{
int num = text.size();
unsigned int pad_value = 32-(num%32);
for(int i=0;i<pad_value;i++){
text.push_back('0');
}
string pad_info = bitset<32>(pad_value).to_string();
for(int i=pad_info.length()-1;i>=0;i--){
text.insert(text.begin(),pad_info[i]);
}
return text;
}
I padded to a multiple of 32 bits because I was thinking of using an array of unsigned int to store the integers directly in a binary file, so that every 32 characters occupy 4 bytes. I used this function for that:
vector<unsigned int> build_byte_array(vector<char> padded_text)
{
vector<unsigned int> byte_arr;
for(int i=0;i<padded_text.size();i+=32)
{
string byte="";
for(int j=i;j<i+32;j++){
byte += padded_text[j];
}
unsigned int b = stoul(byte,nullptr,2);
//cout<<b<<":"<<byte<<endl;
byte_arr.push_back(b);
}
return byte_arr;
}
Now the problem is when I write this byte array to binary file using
ofstream output("compressed.bin",ios::binary);
for(int i=0;i<byte_array.size();i++){
unsigned int a = byte_array[i];
output.write((char*)(&a),sizeof(a));
}
I get a binary file that is bigger than the original text file. How do I solve that, or what mistake am I making?
Edit: I tried to compress a file of about 2,493 KB (for testing purposes) and it generated a compressed.bin file of 3,431 KB. So I don't think padding is the issue here.
I also tried a 15 KB file, but the size always increases after using this algorithm.
I tried using:
for(int i=0;i<byte_array.size();i++){
unsigned int a = byte_array[i];
char b = (char)a;
output.write((char*)(&a),sizeof(b));
}
but after using this I am unable to recover the original byte array when decompressing the file.
unsigned int a = byte_array[i];
output.write((char*)(&a),sizeof(a));
The size of the write is sizeof(a) which is usually 4 bytes.
An unsigned int is not a byte. A more suitable type for a byte would be std::byte, uint8_t, or unsigned char.
You are expanding your data with padding, so if you're not getting much compression or there's not much data to begin with, the output could easily be larger.
You don't need to pad nearly as much as you do. First off, you are adding 32 bits when the data already ends on a word boundary (when num is a multiple of 32). Pad zero bits in that case. Second, you are inserting 32 bits at the start to record how many bits you padded, where five bits would suffice to encode 0..31. Third, you could write bytes instead of ints, so the padding on the end could be 0..7 bits, and you could prepend three bits instead of five. The padding overall could be reduced from your current 33..64 bits to 3..10 bits.
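The byte-oriented layout suggested in the last paragraph (a 3-bit pad count at the front, then the data bits, then 0..7 zero bits) could look roughly like this. It is a sketch that assumes the encoded bits are held as '0'/'1' characters in a vector<char>, as in the question; packBits and unpackBits are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Packs '0'/'1' characters into bytes, most significant bit first.
// Assumed layout: 3-bit header = number of zero bits padded at the end (0..7),
// then the data bits, then the padding.
std::vector<uint8_t> packBits(const std::vector<char>& bits) {
    const std::size_t total = 3 + bits.size();
    const unsigned pad = static_cast<unsigned>((8 - total % 8) % 8);

    std::vector<uint8_t> out((total + pad) / 8, 0);
    std::size_t pos = 0;  // index of the next bit to write
    auto putBit = [&](bool bit) {
        if (bit) out[pos / 8] |= static_cast<uint8_t>(0x80u >> (pos % 8));
        ++pos;
    };
    for (int i = 2; i >= 0; --i) putBit((pad >> i) & 1u);  // 3-bit header
    for (char c : bits) putBit(c == '1');
    // The remaining `pad` bits are already zero.
    return out;
}

// Reverses packBits: returns the original '0'/'1' characters.
std::vector<char> unpackBits(const std::vector<uint8_t>& bytes) {
    if (bytes.empty()) throw std::invalid_argument("missing header");
    auto getBit = [&](std::size_t pos) {
        return (bytes[pos / 8] >> (7 - pos % 8)) & 1u;
    };
    const unsigned pad = (getBit(0) << 2) | (getBit(1) << 1) | getBit(2);
    const std::size_t totalBits = bytes.size() * 8;
    if (3 + pad > totalBits) throw std::invalid_argument("bad padding");
    std::vector<char> bits;
    for (std::size_t pos = 3; pos < totalBits - pad; ++pos)
        bits.push_back(getBit(pos) ? '1' : '0');
    return bits;
}
```

The packed bytes can be written with output.write(reinterpret_cast<const char*>(packed.data()), packed.size()); so that each byte on disk carries eight encoded bits. Whether the file ends up smaller than the input still depends on the Huffman codes themselves.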
My task is to read metadata values from an unsigned char array, which contains the bytes of a binary .shp file (Shapefile):
unsigned char* bytes;
The header of the file is stored in the array, and the order of the information in it looks like this:
int32_t filecode // BigEndian
int32_t skip[5] // Uninteresting stuff
int32_t filelength // BigEndian
int32_t version // LittleEndian
int32_t shapetype // LittleEndian
// Rest of the header and of the filecontent which I don't need
So my question is: how can I extract this information (except the skip part, of course), taking endianness into account, and read it into the corresponding variables?
I thought about using ifstream, but I couldn't figure out how to do it properly.
Example:
Read the first four bytes of the binary, ensure big-endian byte order, and store them in an int32_t. Then skip 5 * 4 bytes (5 * int32). Then read four bytes, ensure big-endian byte order, and store them in an int32_t. Then read four bytes, ensure little-endian byte order, store them in an int32_t again, and so on.
Thanks for your help guys!
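So 'reading' a byte array just means extracting the bytes from the positions in the array where you know your data is stored. Then you just need to do the appropriate bit manipulations to handle the endianness. For example, filecode would be this (casting to uint32_t first, so that a byte of 0x80 or more is not shifted into the sign bit of an int):
filecode = (int32_t)(((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) | ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3]);
and version would be this:
version = (int32_t)((uint32_t)bytes[28] | ((uint32_t)bytes[29] << 8) | ((uint32_t)bytes[30] << 16) | ((uint32_t)bytes[31] << 24));
(With the layout above, filecode starts at offset 0, the five skipped int32 values occupy bytes 4 to 23, filelength starts at offset 24, version at 28, and shapetype at 32.)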
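The same idea wrapped in small helpers, as a sketch: readBE32, readLE32, ShpHeader and parseShpHeader are hypothetical names, and the offsets assume the header layout given in the question (the buffer must hold at least 36 bytes):

```cpp
#include <cstddef>
#include <cstdint>

// Reads a 32-bit big-endian value starting at bytes[offset].
// Assembling in uint32_t avoids shifting into the sign bit of an int.
int32_t readBE32(const unsigned char* bytes, std::size_t offset) {
    const uint32_t v = (uint32_t{bytes[offset]} << 24) |
                       (uint32_t{bytes[offset + 1]} << 16) |
                       (uint32_t{bytes[offset + 2]} << 8) |
                       uint32_t{bytes[offset + 3]};
    return static_cast<int32_t>(v);
}

// Reads a 32-bit little-endian value starting at bytes[offset].
int32_t readLE32(const unsigned char* bytes, std::size_t offset) {
    const uint32_t v = uint32_t{bytes[offset]} |
                       (uint32_t{bytes[offset + 1]} << 8) |
                       (uint32_t{bytes[offset + 2]} << 16) |
                       (uint32_t{bytes[offset + 3]} << 24);
    return static_cast<int32_t>(v);
}

// Hypothetical struct for the header fields listed in the question.
struct ShpHeader {
    int32_t filecode;
    int32_t filelength;
    int32_t version;
    int32_t shapetype;
};

// 4 bytes filecode + 5 * 4 skipped bytes puts filelength at offset 24.
ShpHeader parseShpHeader(const unsigned char* bytes) {
    return ShpHeader{readBE32(bytes, 0), readBE32(bytes, 24),
                     readLE32(bytes, 28), readLE32(bytes, 32)};
}
```

Because the values are built with shifts, the result does not depend on the byte order of the machine running the code.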
What I want to do is store the data from a std::vector<short> in a std::vector<uint8_t>, splitting each short into two uint8_t values. I need to do this because I have a network application that will only send std::vector<uint8_t>'s, so I need to convert to uint8_t to send and then convert back when I receive the uint8_t vector.
Normally what I would do (and what I saw when I looked up the problem) is:
std::vector<uint8_t> newVec(oldvec.begin(),oldvec.end());
However, if I understand correctly, this will take each individual short value, truncate it to the size of a uint8_t, and make a new vector with the same number of entries and half the amount of data, when what I want is the same amount of data with twice as many entries.
Solutions that include a way to reverse the process and that avoid copying as much as possible would help a lot. Thanks!
To split something at the 8-bit boundary, you can use right shifts and masks, i.e.:
uint16_t val;
uint8_t low = val & 0xFF;
uint8_t high = (val >> 8) & 0xFF;
Now you can put high and low into the second vector in whatever order you choose.
For splitting and merging, you would have the following:
unsigned short oldShort;
uint8_t char1 = oldShort & 0xFF; // lower byte
uint8_t char2 = oldShort >> 8; // upper byte
Then push the two parts onto the vector, and send it off to your network library. On the receiving end, during re-assembly, you would read the next two bytes off of the vector and combine them back into the short.
Note: Make sure that the received vector has an even number of elements, as a basic sanity check that the data was not truncated or modified in transit.
// Read off the next two characters and merge them again
unsigned short mergedShort = (char2 << 8) | char1;
"I need to do this because I have a network application(1) that will only send std::vector<uint8_t>'s"
Besides masking and bit shifting, you should take endianness into account when sending data over the wire.
The network representation of data is usually big-endian, so you can always put the MSB first. Provide a simple function like:
std::vector<uint8_t> networkSerialize(const std::vector<uint16_t>& input) {
std::vector<uint8_t> output;
output.reserve(input.size() * sizeof(uint16_t)); // Pre-allocate for sake of
// performance
for(auto snumber : input) {
output.push_back((snumber & 0xFF00) >> 8); // Extract the MSB
output.push_back((snumber & 0xFF)); // Extract the LSB
}
return output;
}
and use it like
std::vector<uint8_t> newVec = networkSerialize(oldvec);
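The question also asks how to reverse the process. A possible counterpart to networkSerialize, sketched under the same MSB-first assumption (networkDeserialize is a hypothetical name), rejects odd-length input and rebuilds each value from its two bytes:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Rebuilds uint16_t values from big-endian (MSB first) byte pairs.
std::vector<uint16_t> networkDeserialize(const std::vector<uint8_t>& input) {
    if (input.size() % 2 != 0)
        throw std::invalid_argument("odd number of bytes");
    std::vector<uint16_t> output;
    output.reserve(input.size() / 2);  // one value per byte pair
    for (std::size_t i = 0; i < input.size(); i += 2) {
        // input[i] is the MSB, input[i + 1] the LSB
        output.push_back(static_cast<uint16_t>((input[i] << 8) | input[i + 1]));
    }
    return output;
}
```

If the original data was a std::vector<short>, each element can be converted back with static_cast<short>(value).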
(1) Emphasis mine.
Disclaimer: People are talking about "network byte order". If you send something larger than 1 byte, of course you need to take network endianness into account. However, as far as I understand, the constraint "network application that will only send std::vector<uint8_t>" explicitly says "I don't want to mess with any of that endianness stuff". A uint8_t is already a single byte, and if you send a sequence of bytes in one order, you should get them back in exactly the same order. A related question that may help: sending an array through a socket. The client and server machines could have different endianness, but the OP said nothing about that, so that is a different story...
Regarding the answer:
Assuming all "endianness" questions are closed.
If you just want to send a vector of shorts, I believe VTT's answer will perform best. However, if std::vector<short> is just a particular case, you can use the pack() function from my answer to a similar question. It packs any iterable container, string, C-string and more into a vector of bytes and does not perform any endianness shenanigans. Just include byte_pack.h, and then you can use it like this:
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <vector>
#include "byte_pack.h"  // provides pack(), from the answer mentioned above
void cout_bytes(const std::vector<std::uint8_t>& bytes)
{
for(unsigned byte : bytes) {
std::cout << "0x" << std::setfill('0') << std::setw(2) << std::hex
<< byte << " ";
}
std::cout << std::endl;
}
int main()
{
std::vector<short> test = { (short) 0xaabb, (short) 0xccdd };
std::vector<std::uint8_t> test_result = pack(test);
cout_bytes(test_result); // -> 0xbb 0xaa 0xdd 0xcc on a little-endian machine
return 0;
}
Just copy everything in one go:
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>
::std::vector<short> shorts;
// populate shorts...
::std::vector<uint8_t> bytes;
::std::size_t const bytes_count(shorts.size() * sizeof(short) / sizeof(uint8_t));
bytes.resize(bytes_count);
::std::memcpy(bytes.data(), shorts.data(), bytes_count);  // raw copy, host byte order
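Going the other way with the same memcpy approach might look like this. It is a sketch (bytesToShorts is a hypothetical name), and it assumes sender and receiver use the same sizeof(short) and byte order, since the bytes are copied exactly as they sit in memory:

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Copies raw bytes back into shorts in one go (host byte order).
std::vector<short> bytesToShorts(const std::vector<uint8_t>& bytes) {
    if (bytes.size() % sizeof(short) != 0)
        throw std::invalid_argument("size is not a multiple of sizeof(short)");
    std::vector<short> shorts(bytes.size() / sizeof(short));
    if (!bytes.empty())  // avoid passing null data() pointers to memcpy
        std::memcpy(shorts.data(), bytes.data(), bytes.size());
    return shorts;
}
```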
I have to copy the following structure to a char[] buffer.
struct AMG_ANGLES {
unsigned char bIsEnCrypted;
unsigned char bIsError;
unsigned short usErrorFlag;
unsigned char byteNumDABs;
unsigned short usBagId;
unsigned short usKvMa;
unsigned char byteDataType;
};
AMG_ANGLES struct_data;
struct_data.bIsEnCrypted = 1;
struct_data.bIsError = 1;
struct_data.usErrorFlag = 2;
struct_data.byteNumDABs = 1;
struct_data.usBagId =10;
struct_data.usKvMa=20;
struct_data.byteDataType = 1;
// here I am copying the structure to a char buffer
char sendbuf[sizeof(struct_data)] = "";
memcpy(sendbuf,(char*)&struct_data, sizeof(struct_data));
After the copy, the buffer seems to contain only the first two unsigned char fields and the short (1, 1, 2), and its size is only 3 bytes. The remaining data was not copied.
Please help me find what I am doing wrong.
I also tried the following:
memcpy(sendbuf+0, &struct_data.bIsEnCrypted, sizeof(struct_data.bIsEnCrypted));
memcpy(sendbuf+1, &struct_data.bIsError, sizeof(struct_data.bIsError));
memcpy(sendbuf+2, &struct_data.usErrorFlag, sizeof(struct_data.usErrorFlag));
memcpy(sendbuf+4, &struct_data.byteNumDABs, sizeof(struct_data.byteNumDABs));
memcpy(sendbuf+6, &struct_data.usBagId, sizeof(struct_data.usBagId));
memcpy(sendbuf+8, &struct_data.usKvMa, sizeof(struct_data.usKvMa));
memcpy(sendbuf+10, &struct_data.byteDataType, sizeof(struct_data.byteDataType));
I get the same result.
Your code is fine; your approach to determine whether the contents of the buffer are correct is flawed.
You have not told us how you have determined that the contents of the buffer are wrong, but from your description I suspect that you did something like printf( "%s\n", sendbuf ). Well, that won't work, because your buffer does not really contain characters, it contains binary data.
Specifically, your short usErrorFlag is two bytes long, and since the value you store in it is 2, this means that it will be stored in sendbuf in two consecutive bytes, one byte will have the value of 0x02 and the next byte will have the value of 0x00. (Assuming, from hints in your description, that your hardware is "Little Endian".) So, when you try to view the contents of your sendbuf as a string, printf() will stop printing as soon as it encounters the 0x00 byte.
So, your code is correct. Proceed with sending your sendbuf to your UDP socket.
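To check what the buffer really contains, print it byte by byte instead of as a string. A minimal sketch; hexDump is a hypothetical helper name:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats each byte as two hex digits so embedded 0x00 bytes stay visible,
// unlike printf("%s"), which stops at the first zero byte.
std::string hexDump(const char* buf, std::size_t len) {
    std::string out;
    char tmp[4];  // "xx " plus the terminating '\0'
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(tmp, sizeof tmp, "%02x ", static_cast<unsigned char>(buf[i]));
        out += tmp;
    }
    return out;
}
```

For example, std::printf("%s\n", hexDump(sendbuf, sizeof sendbuf).c_str()); should show every byte of the buffer, including the zero bytes and any padding.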
When I read "sendbuf", I immediately assume that you are sending data from one computer to another. Those computers may use different compilers and CPU architectures, which can lay out the struct differently (padding) and store multi-byte values in a different byte order. So a memcpy of the whole struct isn't going to work reliably across all platforms.
I suggest you find where the contents of sendbuf is documented, and assign the individual bytes accordingly. For example
sendbuf [0] = struct_data.bIsEnCrypted;
sendbuf [1] = struct_data.bIsError;
sendbuf [2] = struct_data.usErrorFlag >> 8;
sendbuf [3] = struct_data.usErrorFlag & 0xff;
This makes your code independent of byte ordering, independent of struct padding, independent of reordering of items once you are not using a POD, and so on. In your case I would bet money that there is at least padding between byteNumDABs and usBagId, and at the end.
(Bytes 2 and 3 might be exactly the other way round; that's why you need to find a spec for that data structure.)
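Filling the buffer field by field might look like the sketch below. The wire format here is an assumption made for illustration (fields in declaration order, no padding, 16-bit values most significant byte first, 10 bytes total); kAmgWireSize, put16 and serializeAmgAngles are hypothetical names, and the real layout must come from the spec for this data:

```cpp
#include <cstddef>
#include <cstdint>

struct AMG_ANGLES {
    unsigned char bIsEnCrypted;
    unsigned char bIsError;
    unsigned short usErrorFlag;
    unsigned char byteNumDABs;
    unsigned short usBagId;
    unsigned short usKvMa;
    unsigned char byteDataType;
};

// Assumed wire size: 1 + 1 + 2 + 1 + 2 + 2 + 1 bytes, no padding.
constexpr std::size_t kAmgWireSize = 10;

// Writes a 16-bit value MSB first and returns the next write position.
static unsigned char* put16(unsigned char* p, unsigned short v) {
    *p++ = static_cast<unsigned char>((v >> 8) & 0xFF);
    *p++ = static_cast<unsigned char>(v & 0xFF);
    return p;
}

// Serializes `s` into `buf`, which must hold kAmgWireSize bytes.
void serializeAmgAngles(const AMG_ANGLES& s, unsigned char* buf) {
    unsigned char* p = buf;
    *p++ = s.bIsEnCrypted;
    *p++ = s.bIsError;
    p = put16(p, s.usErrorFlag);
    *p++ = s.byteNumDABs;
    p = put16(p, s.usBagId);
    p = put16(p, s.usKvMa);
    *p = s.byteDataType;  // last byte, offset 9
}
```

With the values from the question (1, 1, 2, 1, 10, 20, 1) this layout yields the bytes 01 01 00 02 01 00 0a 00 14 01, independent of the host's padding and byte order.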
I'm building some code to read a RIFF wav file and I've bumped into something odd.
The first 4 bytes of the file header are the word RIFF in big-endian ASCII coding:
0x5249 0x4646
I read this first element using:
char *fileID = new char[4];
filestream.read(fileID,4);
When I write this to screen the results are as expected:
std::cout << fileID << std::endl;
>> RIFF
Now, the next 4 bytes give the size of the file, but crucially they're little-endian.
So, I write a little function to flip the bytes, based on a union:
int flip4bytes(char* input){
union {int flip_int; char flip_char[4];} flip;
flip.flip_char[0] = input[3];
flip.flip_char[1] = input[2];
flip.flip_char[2] = input[1];
flip.flip_char[3] = input[0];
return flip.flip_int;
}
This looks good to me, except when I call it, the value returned is totally wrong. Interestingly, the following code (where the bytes are not reversed!) works correctly:
int flip4bytes(char* input){
union {int flip_int; char flip_char[4];} flip;
flip.flip_char[0] = input[0];
flip.flip_char[1] = input[1];
flip.flip_char[2] = input[2];
flip.flip_char[3] = input[3];
return flip.flip_int;
}
This has thoroughly confused me. Is the union somehow reversing the bytes for me? If not, how are the bytes being converted to int correctly without being reversed?
I think there's some facet of endianness here that I'm ignorant of.
You are simply on a little-endian machine, and the "RIFF" string is just a string and thus neither little- nor big-endian, but just a sequence of chars. You don't need to reverse the bytes on a little-endian machine, but you need to when operating on a big-endian.
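One way to avoid depending on the host's byte order at all is to assemble the little-endian size from individual bytes with shifts. A sketch, assuming the first 8 bytes of the file have been read into a char array (for example char raw[8]; filestream.read(raw, 8);); parseRiffHeader is a hypothetical name:

```cpp
#include <cstdint>
#include <string>

// Parses the first 8 bytes of a RIFF file: the 4-character tag followed by
// the chunk size stored little-endian. Building the size with shifts gives
// the same result on little- and big-endian hosts, so no union is needed.
void parseRiffHeader(const char raw[8], std::string& tag, uint32_t& size) {
    tag.assign(raw, 4);  // the tag is not null-terminated in the file
    auto byteAt = [raw](int i) {
        return static_cast<uint32_t>(static_cast<unsigned char>(raw[i]));
    };
    size = byteAt(4) | (byteAt(5) << 8) | (byteAt(6) << 16) | (byteAt(7) << 24);
}
```

Copying the tag into a std::string also sidesteps a problem in the question's code: fileID holds exactly 4 chars with no terminating '\0', so std::cout << fileID reads past the end of the array.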
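You need to figure out the endianness of your machine. On many Unix-like systems, #include <sys/param.h> provides byte-order macros such as BYTE_ORDER that help you do that (in C++20, std::endian from <bit> does the same portably).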
You could also use the fact that network byte order is big-endian. In that case, reorder the bytes into big-endian order and use the ntohl function (ntohs is the 16-bit variant). That should work on any machine you compile the code on.
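The ntohl route could be sketched as follows (POSIX only, since ntohl comes from <arpa/inet.h>; le32ViaNtohl is a hypothetical name). The four little-endian bytes are first reordered into big-endian (network) order, and ntohl then converts that to host order:

```cpp
#include <arpa/inet.h>  // ntohl (POSIX)
#include <cstdint>
#include <cstring>

// Converts 4 little-endian bytes to a host uint32_t via network byte order.
uint32_t le32ViaNtohl(const unsigned char le[4]) {
    const unsigned char be[4] = {le[3], le[2], le[1], le[0]};  // now big-endian
    uint32_t networkOrder;
    std::memcpy(&networkOrder, be, sizeof networkOrder);  // bytes as laid out on the wire
    return ntohl(networkOrder);  // network (big-endian) -> host order
}
```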