Best way to merge hex strings in c++? [heavily edited] - c++

I have two hex strings, each accompanied by a mask, that I would like to merge into a single string value/mask pair. The strings may have bytes that overlap, but after applying the masks, no overlapping bits should contradict each other. For example, value1 = 0x0A, mask1 = 0xFE and value2 = 0x0B, mask2 = 0x0F basically says that the resulting merge must have an upper nibble of all 0s and a lower nibble of 1011.
I've done this already using straight c, converting strings to byte arrays and memcpy'ing into buffers as a prototype. It's tested and seems to work. However, it's ugly and hard to read and doesn't throw exceptions for specific bit requirements that contradict. I've considered using bitsets, but is there another way that might not demand the conversion overhead? Performance would be nice, but not crucial.
EDIT: More detail, although writing this makes me realize I've made a simple problem too difficult. But, here it is, anyway.
I am given a large number of inputs that are binary searches of a mixed-content document. The document is broken into pages, and pages are provided by an API that delivers a single page at a time. Each page needs to be searched with the provided search terms.
I have all the search terms prior to requesting pages. The inputs are strings representing hex digits (this is what I mean by hex strings), as well as a mask to indicate which bits are significant in the input hex string. Since I'm given all input up-front, I wanted to improve the search of each page returned by merging these hex strings together in a pre-processing step. To make the problem more interesting, every string has an optional offset into the page where it must appear, and a lack of an offset indicates that the string can appear anywhere in a requested page. So, something like this:
class Input {
public:
int input_id;
std::string value;
std::string mask;
bool offset_present;
unsigned int offset;
};
If a given Input object has offset_present = false, then any value assigned to offset is ignored, and that input clearly can't be merged with other inputs.
To make the problem more interesting, I want to report an output that provides information about what was found (input_id that was found, where the offset was, etc). Merging some input (but not others) makes this a bit more difficult.
I had considered defining a CompositeInput class with a bitset as the underlying merger, but further reading about bitsets made me realize they weren't what I thought they were. My inexperience made me give up on the composite idea and go brute force. I necessarily skipped some details about other input types and additional information to be collected for the output (say, page number, paragraph number) when an input is found. Here's an example output class:
class Output {
public:
Output();
int id_result;
unsigned int offset_result;
};
I would want to produce N of these if I merge N hex strings, keeping any merger details hidden from the user.

I don't know what a hexstring is... but other than that it should be like this:
outcome = (value1 & mask1) | (value2 & mask2);

it sounds like |, & and ~ would work?

const size_t prefix = 2; // "0x"
const size_t bytes = 2;
const char* value1 = "0x0A";
const char* mask1 = "0xFE";
const char* value2 = "0x0B";
const char* mask2 = "0x0F";
char output[prefix + bytes + 1] = "0x";
uint8_t char2int[] = { /*zeroes until index '0'*/ 0,1,2,3,4,5,6,7,8,9 /*...*/ 10,11,12,13,14,15 };
char int2char[] = { '0', /*...*/ 'F' };
for (size_t ii = prefix; ii != prefix + bytes; ++ii)
{
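uint8_t result1 = char2int[value1[ii]] & char2int[mask1[ii]];
uint8_t result2 = char2int[value2[ii]] & char2int[mask2[ii]];
uint8_t both = char2int[mask1[ii]] & char2int[mask2[ii]]; // bits constrained by both inputs
if ((result1 ^ result2) & both) // conflict: both masks cover a bit but the values disagree on it
throw std::invalid_argument("conflicting bits");
output[ii] = int2char[result1 | result2];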
}
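A fuller version of the loop above can be sketched as follows. This is only an illustration under some assumptions: the names hex_digit_value and merge_masked_hex are hypothetical, all four strings are taken to be equally long runs of hex digits without a "0x" prefix, and offsets are not handled. A conflict is treated as a bit that both masks cover but on which the two values disagree.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: numeric value of one hex digit; throws on anything else.
inline std::uint8_t hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: merges two value/mask pairs one hex digit at a time.
// Assumes all four strings have the same length and carry no "0x" prefix.
// Returns {merged value, merged mask}.
inline std::pair<std::string, std::string> merge_masked_hex(
    const std::string& value1, const std::string& mask1,
    const std::string& value2, const std::string& mask2) {
    static const char digits[] = "0123456789ABCDEF";
    const std::size_t n = value1.size();
    if (mask1.size() != n || value2.size() != n || mask2.size() != n)
        throw std::invalid_argument("length mismatch");
    std::string value(n, '0'), mask(n, '0');
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint8_t v1 = hex_digit_value(value1[i]), m1 = hex_digit_value(mask1[i]);
        const std::uint8_t v2 = hex_digit_value(value2[i]), m2 = hex_digit_value(mask2[i]);
        // Conflict: a bit covered by both masks on which the values disagree.
        if ((v1 ^ v2) & m1 & m2)
            throw std::invalid_argument("conflicting bits");
        value[i] = digits[(v1 & m1) | (v2 & m2)];
        mask[i] = digits[m1 | m2];
    }
    return {value, mask};
}
```

With the values from the question, merge_masked_hex("0A", "FE", "0B", "0F") yields the value "0B" and the mask "FF", while merging "01"/"0F" with "00"/"0F" throws because the lowest bit is required to be both 1 and 0.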

Related

Comparing a negative and positive integer's byte array values [duplicate]

I am pretty much new to C++, and especially to signed and unsigned conversions. Currently I am doing an exercise where I need to check whether a value lies between two other values (a minimum and a maximum).
For example
minimum value = -319 - its byte array (of int8) will be {254, 193}
maximum value = 400 - its byte array (of int8) will be {1, 144}
Assume the value to be compared is -200 {255, 56}, which should be valid, but I am not able to get it to work correctly.
Assume each byte of the compare value, the minimum and the maximum is ANDed with a 255 unsigned mask byte, stored in an int8 byte and then compared against the min and max values. Due to the sign bit I am confused and not able to get the comparison right, although it works fine for all positive values.
The methods I tried were,
First I tried to convert int8_t to int16_t
int16_t compare = (int16_t)(inp[i] & 0xFF);
int16_t minVal = (int16_t)(minimum[i] & 0xFF);
int16_t maxVal = (int16_t)(maximum[i] & 0xFF);
return (compare>=minVal&&compare<=maxVal);
Second method was I set 0 to MS bit in mask byte and then comparing each byte, since I thought comparison with that bit is not needed, but I am not sure if it is right as the data might get wrong.
Third try was to set 0 the MS Bit while comparing . I tried it and this is also a failure.
Please let me know how to compare in this scenario. Thank you
So to clarify the question: I get three 2-element byte arrays, and the function should return true if the value byte array is in between the min and max byte arrays.
So the comparison algorithm above is run in a for loop:
bool isValueInBetween(uint8_t* min, uint8_t* max, uint8_t* val){
bool is_match = true;
for(int i = 0; i < 2; i++){
int8_t compare = (val[i] & 0xFF); //masking is used just because in the book it was mentioned, in case different mask occurs, but I think in this case it is not needed
int8_t minVal = (min[i] & 0xFF);
int8_t maxVal = (max[i] & 0xFF);
if (!(compare>=minVal&&compare<=maxVal))
is_match = false;
}
return is_match;
}
The above algorithm works for positive values, but doesn't work in negative or mixed value scenarios(say min in negative and max in positive). Please let me know any good books for C++.
As written, the question is a little hard to understand without the exact problem statement. As of this posting, it reads as if the question was "Write a function to compare two integer values." and you took it way beyond the extreme and assumed they meant down to the binary level...
What's wrong with using the logical comparison operators directly like so:
bool isValueInBetween(int16_t val, int16_t min, int16_t max) { // int16_t, since -319 and 400 do not fit in an int8_t
return min <= val && val <= max;
}
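If the values really do arrive as 2-element byte arrays, one way to use the direct comparison is to rebuild each signed 16-bit number first and compare whole numbers instead of comparing byte by byte. The sketch below assumes the arrays hold a two's-complement value with the most significant byte first, which matches the examples in the question ({254, 193} for -319); the helper names to_signed16 and is_value_in_between are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: rebuilds a signed 16-bit value from two bytes,
// most significant byte first (an assumption based on the question's examples).
inline int to_signed16(const std::uint8_t* bytes) {
    const int raw = (bytes[0] << 8) | bytes[1];   // 0..65535
    return raw >= 0x8000 ? raw - 0x10000 : raw;   // undo two's complement
}

// Compares the reassembled numbers rather than the individual bytes.
inline bool is_value_in_between(const std::uint8_t* min, const std::uint8_t* max,
                                const std::uint8_t* val) {
    const int v = to_signed16(val);
    return to_signed16(min) <= v && v <= to_signed16(max);
}
```

Comparing byte by byte fails because the low byte of a number inside the range does not have to lie between the low bytes of the bounds: for -200 the low byte is 56, which is outside 144..193 in either order.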

Add a bit value to a string

I am trying to send a packet over a network and so want it to be as small as possible (in terms of size).
Each of the inputs can contain a common prefix substring, like ABCD. In such cases, I just want to send a single bit, say 1, denoting that the current string has the prefix ABCD, followed by the remainder of the string. So, if the string was ABCDEF, I would send 1EF; if it was LKMPS, I would send the string LKMPS as is.
Could someone please point out how I could add a bit to a string?
Edit: I get that adding a 1 to a string does not mean that this 1 is a bit - it is just a character that I added to the string. And that exactly is my question - for each string, how do I send a bit denoting that the prefix matches? And then send the remaining part of the string that is different?
In common networking hardware, you won't be able to send individual bits. And most architectures cannot address individual bits, either.
However, you can still minimize the size as you want by using one of the bits that you may not be using. For instance, if your strings contain only 7-bit ASCII characters, you could use the highest bit to encode the information you want in the first byte of the string.
For example, if the first byte is:
0b01000001 == 0x41 == 'A'
Then set the highest bit using |:
(0b01000001 | 0x80) == 0b11000001 == 0xC1
To test for the bit, use &:
(0b01000001 & 0x80) == 0
(0b11000001 & 0x80) != 0
To remove the bit (in the case where it was set) to get back the original first byte:
(0b11000001 & 0x7F) == 0b01000001 == 0x41 == 'A'
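Put together, the three operations above could look roughly like the sketch below. The function names encode_with_flag and decode_with_flag are hypothetical, and the code assumes that every character is 7-bit ASCII and that at least one character remains after the prefix is removed. A std::string is used only to keep the example short.

```cpp
#include <string>

// Hypothetical encoder: if s starts with prefix, drop the prefix and set the
// top bit of the first remaining byte. Assumes 7-bit ASCII input.
inline std::string encode_with_flag(const std::string& s, const std::string& prefix) {
    if (s.size() > prefix.size() && s.compare(0, prefix.size(), prefix) == 0) {
        std::string out = s.substr(prefix.size());
        out[0] = static_cast<char>(static_cast<unsigned char>(out[0]) | 0x80);
        return out;
    }
    return s;  // no common prefix: send unchanged, top bit stays clear
}

// Hypothetical decoder: if the top bit of the first byte is set, clear it and
// put the prefix back in front.
inline std::string decode_with_flag(const std::string& wire, const std::string& prefix) {
    if (!wire.empty() && (static_cast<unsigned char>(wire[0]) & 0x80)) {
        std::string rest = wire;
        rest[0] = static_cast<char>(static_cast<unsigned char>(rest[0]) & 0x7F);
        return prefix + rest;
    }
    return wire;
}
```

With the prefix ABCD, the string ABCDEF goes out as two bytes (0xC5 followed by 'F') and LKMPS goes out unchanged, so the flag costs no extra byte.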
If you're working with a buffer for use in your communications protocol, it should generally not be an std::string. Standard strings are not intended for use as buffers; and they can't generally be prepended in-place with anything.
It's possible that you may be better served by an std::vector<std::byte>; or by a (compile-time-fixed-size) std::array. Or, again, a class of your own making. That is especially true if you want your "in-place" prepending of bits or characters to not merely keep the same span of memory for your buffer, but to actually not move any of the existing data. For that, you'd need twice the maximum length of the buffer, and start it at the middle, so you can either append or prepend data without shifting anything, while maintaining bit-resolution "pointers" to the effective start and end of the full part of the buffer. This would be readily achievable with, yes you guessed it, your own custom buffer class.
I think the smallest amount of memory you can work with is 8 bits.
If you wanted to operate with bits, you could specify 8 prefixes as follows:
#include <cstdint>
#include <iostream>
using namespace std;
enum message_header {
prefix_on = 1 << 0,
bitdata_1 = 1 << 1,
bitdata_2 = 1 << 2,
bitdata_3 = 1 << 3,
bitdata_4 = 1 << 4,
bitdata_5 = 1 << 5,
bitdata_6 = 1 << 6,
bitdata_7 = 1 << 7,
};
int main() {
uint8_t a(0);
a ^= prefix_on; // toggle the prefix flag (it starts cleared, so this sets it)
if(a & prefix_on) {
std::cout << "prefix_on" << std::endl;
}
}
That being said, networks are pretty fast nowadays, so I wouldn't do it.

Get 32-bit hash value from boost::hash

I am using boost::hash to get hash value for a string.
But it is giving different hash values for same string on Windows 32-bit and Debian 64-bit systems.
So how can I get same hash value (32-bit or 64-bit) using boost::hash irrespective of platform?
What is the guarantee concerning boost::hash? I don't see any
guarantees that a generated hash code is usable outside of the
process which generates it. (This is frequently the case with
hash functions.) If you need a hash value for external data,
valid over different programs and different platforms (e.g. for
a hashed access to data on disk), then you'll have to write your
own. Something like:
uint32_t
hash( std::string const& key )
{
uint32_t results = 12345;
for ( auto current = key.begin(); current != key.end(); ++ current ) {
results = 127 * results + static_cast<unsigned char>( *current );
}
return results;
}
should do the trick, as long as you don't have to worry about
porting to some exotic mainframes (which might not support
uint32_t).
Use one of the well-known universal hash functions such as SHA instead, because those are supposed to guarantee that the same string will have the same hash everywhere. Note that if you are doing something security-related, SHA might be too fast. It's a strange thing to say, but sometimes fast does not mean good, as it opens up the possibility of a brute force attack. For that case there are other, slower hash functions, some of which basically re-apply SHA many times in a row. Another thing: if you are hashing passwords, remember to salt them (I won't go into details, but the information is readily accessible online).
Hash-function above is simple, but weak and vulnerable.
For example, pass that function strings like "bb", "bbbb", "bbddbb" or "ddffbb" (any sequence of doubled characters with even ASCII codes) and watch the low byte of the result.
It will always be 57.
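The claim can be checked with a few lines. The sketch below repeats the multiply-and-add hash from the earlier answer under the hypothetical name simple_hash (renamed only to avoid clashing with std::hash) and adds a hypothetical low_byte helper. The reason for the pattern is that 127 * 127 leaves a remainder of 1 when divided by 256, so after two identical characters c the low byte changes by 128 * c, which is a multiple of 256 whenever c is even. The seed 12345 has the low byte 57, and that byte therefore survives.

```cpp
#include <cstdint>
#include <string>

// Copy of the multiply-and-add hash from the earlier answer, renamed
// (hypothetical name) so it does not clash with std::hash.
inline std::uint32_t simple_hash(const std::string& key) {
    std::uint32_t results = 12345;
    for (char c : key)
        results = 127 * results + static_cast<unsigned char>(c);
    return results;
}

// Hypothetical helper: lowest 8 bits of the hash.
inline std::uint8_t low_byte(const std::string& key) {
    return static_cast<std::uint8_t>(simple_hash(key) & 0xFF);
}
```

Calling low_byte on "bb", "bbbb", "bbddbb" and "ddffbb" gives 57 each time, the same as for the empty string, whereas a string such as "ab" gives a different low byte.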
Instead, I recommend using my hash function, which is relatively lightweight
and does not have such easy vulnerabilities:
#define NLF(h, c) (rand[(uint8_t)(c ^ h)])
uint32_t rand[0x100] = { 256 random non-equal values };
uint32_t oleg_h(const char *key) {
uint32_t h = 0x1F351F35;
char c;
while(c = *key++)
h = ((h >> 11) | (h << (32 - 11))) + NLF(h, c);
h ^= h >> 16;
return h ^ (h >> 8);
}

Ultra Quick Way To Concatenate Byte Values

Given 3 different bytes such as say x = 64, y = 90, z = 240 I am looking to concatenate them into say a string like 6490240. It would be lovely if this worked but it doesn't:
string xx = (string)x + (string)y + (string)z;
I am working in C++, and would settle for a concatenation of the bytes as a 24 bit string using their 8-bit representations.
It needs to be ultra fast because I am using this method on a lot of data, and it seems, frustratingly, like there isn't a way to just say "treat this byte as if it were a string".
Many thanks for your help
To clarify, the reason why I'm particular about using 3 bytes is because the original data pertains to RGB values which are read via pointers and are stored of course as bytes in memory.
I want a way really to treat each color independently so you can think of this as a hashing function if you like. So any fast representation that does it without collisions is desired. This is the only way I can think of to avoid any collisions at all.
Did you consider instead just packing the color elements into three bytes of an integer?
uint32_t full_color = (x << 16) | (y << 8) | z;
Easiest way to turn numbers into a string is to use ostringstream
#include <sstream>
#include <string>
std::ostringstream os;
os << +x << +y << +z; // unary + promotes a byte to int, so it prints as a number rather than a character
std::string str = os.str(); // 6490240
You can even make use of manipulators to do this in hex or octal:
os << std::hex << +x << +y << +z;
Update
Since you've clarified what you really want to do, I've updated my answer. You're looking to take RGB values as three bytes, and use them as a key somehow. This would be best done with a long int, not as a string. You can still stringify the int quite easily, for printing to the screen.
unsigned long rgb = 0;
unsigned char* b = reinterpret_cast<unsigned char*>(&rgb);
b[0] = x;
b[1] = y;
b[2] = z;
// the first three bytes of rgb in memory are now { x, y, z }; the remaining bytes stay 0
Then you can use the long int rgb as your key, very efficiently. Whenever you want to print it out, you can still do that:
std::cout << std::hex << rgb;
Depending on the endian-ness of your system, you may need to play around with which bytes of the long int you set. My example overwrites bytes 0-2, but you might want to write bytes 1-3. And you might want to write the order as z, y, x instead of x, y, z. That kind of detail is platform dependent. Although if you never want to print the RGB value, but simply want to consider it as a hash, then you don't need to worry about which bytes you write or in what order.
try sprintf(xx,"%d%d%d",x,y,z);
Use a 3 character character array as your 24 bit representation, and assign each char the value of one of your input values.
Converting 3 bytes to bits and storing the result in an array can be done easily as below:
void bytes2bits(unsigned char x, unsigned char y, unsigned char z, char * res)
{
res += 24; *res-- = 0;
unsigned xyz = (x<<16)+(y<<8)+z;
for (size_t l = 0 ; l < 24 ; l++){
*res-- = '0'+(xyz & 1); xyz >>= 1;
}
}
However, if you are looking for a way to store three byte values in an unambiguous and compact way, you should probably settle for hexadecimal (each group of four bits of the binary representation matches a digit from 0 to 9 or a letter from A to F). It's ultra simple to encode and decode, and the output is also human readable.
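A minimal sketch of the hexadecimal variant, assuming the three inputs are unsigned char values; the function name rgb_to_hex is hypothetical. Each byte always becomes exactly two digits, so the six-character result cannot be ambiguous the way a decimal concatenation such as 6490240 can.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats three bytes as six uppercase hex digits.
inline std::string rgb_to_hex(unsigned char x, unsigned char y, unsigned char z) {
    char buf[7];  // six digits plus the terminating '\0'
    std::snprintf(buf, sizeof buf, "%02X%02X%02X",
                  static_cast<unsigned>(x), static_cast<unsigned>(y),
                  static_cast<unsigned>(z));
    return std::string(buf);
}
```

For the values from the question, rgb_to_hex(64, 90, 240) returns "405AF0".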
If you never need to print out the result, just combining the values into a single integer and using it as a key, as Mark proposed, is certainly the fastest and simplest solution. Assuming your native integer is 32 bits or more on the target system, just do:
unsigned int key = (x<< 16)|(y<<8)|z;
You can as easily get back the initial values from key if needed:
unsigned char x = (key >> 16) & 0xFF;
unsigned char y = (key >> 8) & 0xFF;
unsigned char z = key & 0xFF;

Bit Operators to append two unsigned char in C++

If I have two things which are hex, can I somehow append their binary together to get a value?
In C++,
say I have
unsigned char t = 0xc2; // 11000010
unsigned char q = 0xa3; // 10100011
What I want is somehow,
1100001010100011, is this possible using bit-wise operators?
I want to extract the binary form of t and q and append them...
Yes it's possible.
Just use the left-bitshift operator, shifting to the left by 8, using at least a 16-bit integer. Then binary OR the 2nd value to the integer.
unsigned char t = 0xc2; // 11000010
unsigned char q = 0xa3; // 10100011
unsigned short s = (((unsigned short)t)<<8) | q; // 11000010 10100011
Alternatively, putting both values in a union containing 2 chars (careful of big endian vs. little endian) would have the same bit-level result. Another option is a char[2].
Concatenating two chars:
unsigned char t = 0xc2; // 11000010
unsigned char q = 0xa3; // 10100011
int result = t; // Put into object that can hold the fully concatenated data;
result <<= 8; // Shift it left
result |= q; // Or the bottom bits into place;
Your example doesn't really work too well because the width (usually 8 bits) of the input values isn't defined. For example, why isn't your example 0000000100000010, which would be truly appending 1 (00000001) and 2 (00000010) bitwise?
If each value does have a fixed width, then it can be answered with bit shifting and ORing values.
EDIT: if your "width" is defined as the full width with all leading zeros removed, then it is possible to do with shifting and ORing, but more complicated.
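The "more complicated" case might be sketched like this: first work out how many bits the second value occupies once leading zeros are stripped, then shift the first value left by that amount. The names significant_bits and append_bits are hypothetical, and the sketch assumes that 0 counts as one digit and that unsigned is at most 32 bits wide so the result fits in an unsigned long long.

```cpp
// Hypothetical helper: number of bits in v without leading zeros (0 counts as one bit).
inline unsigned significant_bits(unsigned v) {
    unsigned width = 1;
    while (v >>= 1) ++width;
    return width;
}

// Hypothetical helper: appends the significant bits of b to the right of a.
// Assumes unsigned is at most 32 bits, so the shifted value fits in 64 bits.
inline unsigned long long append_bits(unsigned a, unsigned b) {
    return (static_cast<unsigned long long>(a) << significant_bits(b)) | b;
}
```

Under these assumptions, append_bits(1, 2) gives binary 110 (decimal 6), and append_bits(0xc2, 0xa3) gives 0xc2a3 because 0xa3 already uses all eight of its bits.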
I'd go with the char array.
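unsigned short s;
unsigned char * sPtr = reinterpret_cast<unsigned char *>(&s);
sPtr[0] = t; sPtr[1] = q;
The bytes land in memory in the order t, q on any machine, although the numeric value of s still depends on endianness.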
I'm not sure why you'd want to do this but this would work.
The problem with the bit methods is that you're not sure what size you've got.
If you know the size, I'd go with Brian's answer.
There is no append in binary/hex because you are dealing with Numbers (can you append 1 and 2 and not confuse the resulting 12 with the "real" 12?)
You could delimit them with some special symbol, but you can't just "concatenate" them.
Appending as an operation doesn't really make sense for numbers, regardless of what base they're in. Using . as the concatenation operator: in your example, 0x1 . 0x2 becomes 0x12 if you concat the hex, and 0b110 if you concat the binary. But 0x12 and 0b110 aren't the same value (in base 10, they're 18 and 6 respectively). In general, A O B (where A and B are numbers and O is an operator) should result in the same value no matter what base you're operating in.