I am trying to send a packet over a network and so want it to be as small as possible (in terms of size).
Each of the input can contain a common prefix substring, like ABCD. In such cases, I just wanna send a single bit say, 1 denoting that the current string has the same prefix ABCD and append it to the remaining string. So, if the string was ABCDEF, I will send 1EF; if it was LKMPS, I wish to send the string LKMPS as is.
Could someone please point out how I could add a bit to a string?
Edit: I get that adding a 1 to a string does not mean that this 1 is a bit - it is just a character that I added to the string. And that exactly is my question - for each string, how do I send a bit denoting that the prefix matches? And then send the remaining part of the string that is different?
In common networking hardware, you won't be able to send individual bits. And most architectures cannot address individual bits, either.
However, you can still minimize the size as you want by using one of the bits that you may not be using. For instance, if your strings contain only 7-bit ASCII characters, you could use the highest bit to encode the information you want in the first byte of the string.
For example, if the first byte is:
0b01000001 == 0x41 == 'A'
Then set the highest bit using |:
(0b01000001 | 0x80) == 0b11000001 == 0xC1
To test for the bit, use &:
(0b01000001 & 0x80) == 0
(0b11000001 & 0x80) != 0
To remove the bit (in the case where it was set) to get back the original first byte:
(0b11000001 & 0x7F) == 0b01000001 == 0x41 == 'A'
If you're working with a buffer for use in your communications protocol, it should generally not be an std::string. Standard strings are not intended for use as buffers; and they can't generally be prepended in-place with anything.
It's possible that you may be better served by an std::vector<std::byte>; or by a (compile-time-fixed-size) std::array. Or, again, a class of your own making. That is especially true if you want your "in-place" prepending of bits or characters to not merely keep the same span of memory for your buffer, but to actually not move any of the existing data. For that, you'd need twice the maximum length of the buffer, and start it at the middle, so you can either append or prepend data without shifting anything - while maintaining bit-resolution "pointers" to the effective start and end of the full part of the buffer. This is would be readily achievable with, yes you guessed it, your own custom buffer class.
I think the smallest amount of memory you can work with is 8 bits.
If you wanted to operate with bits, you could specify 8 prefixes as follows:
#include <iostream>
using namespace std;
enum message_header {
prefix_on = 1 << 0,
bitdata_1 = 1 << 1,
bitdata_2 = 1 << 2,
bitdata_3 = 1 << 3,
bitdata_4 = 1 << 4,
bitdata_5 = 1 << 5,
bitdata_6 = 1 << 6,
bitdata_7 = 1 << 7,
};
int main() {
uint8_t a(0);
a ^= prefix_1;
if(a & prefix_on) {
std::cout << "prefix_on" << std::endl;
}
}
That being said, networks pretty fast nowadays so I wouldn't do it.
Related
A file which contains the buffer value. The first 16 bits contain the type. The next 32 bits gives the length of the data. The remaining value in the data.
How can I find the type from the 16 bits (find if it is int or char...)
I'm super stuck in my though process here. Not able to find a way to convert bits to types.
Say you have the homework assignment:
You are given a file where the first bit encodes the type, the
next 7 bits encode the length, and the rest is the data.
The types are encoded in the following way:
0 is for int
1 is for char
Print the ints or chars separated by newlines.
You just use the given information! Since 1 bit is used to encode the type there are two possible types. So you just read the first bit, then do:
if (bit == 0) {
int *i = ...
}
else if (bit == 1) {
char *c = ...
}
I have to use an encryption algorithm which takes unsigned int as an input. For this I want to convert my password which is alpha numeric 8 character string to int.
I am using the below code and not sure if it works right. I want to convert my characters say "test" to unsigned integer.
I do get an output value. But I'm not sure if this is the right way of doing this and if there can be any side effects.
Can you please explain what actually is happening here?
unsigned int ConvertStringToUInt(CString Input)
{
unsigned int output;
output = ((unsigned int)Input[3] << 24);
output += ((unsigned int)Input[2] << 16);
output += ((unsigned int)Input[1] << 8);
output += ((unsigned int)Input[0]);
return output;
}
For an input of "ABCD" the output of ConvertStringToUInt will be 0x44434241 because:
0x41 is the ASCII code of 'A'
0x42 is the ASCII code of 'B'
0x43 is the ASCII code of 'C'
0x44 is the ASCII code of 'D'
<< being the shift left operator.
So we have:
0x44 << 24 = 0x44000000
0x43 << 16 = 0x00430000
0x42 << 8 = 0x00004200
output =
0x44000000
+ 0x00430000
+ 0x00004200
+ 0x00000041
============
0x44434241
Be aware that your ConvertStringToUInt function only works if the length of the provided string is exactly 4, so this function is useless for your case because the length of your password is 8.
You can't do a unique mapping of a 8 character alphanumeric string to a 32 bit integer.
(10 + 26 + 26) ^ 8 is 218,340,105,584,896 (digits + upper-case and lower-case letters)
(10 + 26) ^ 8 is 2,821,109,907,456 (digits + case-insensitive letters)
2 ^ 32 is 4,294,967,296 (a 32 bit unsigned int)
So if you need to convert your 8 characters into a 32 bit number, you will need to use hashing. And that means that multiple passwords will map to the same key.
Note that this NOT encryption because the mapping is not reversible. It cannot be reversible. This can be proven mathematically.
The Wikipedia page on hash functions is a good place to start learning about this. Also the page on the pigeonhole principle.
However, it should also be noted that 8 character passwords are too small to be secure. And if you are hashing to a 32 bit code, brute-force attacks will be easy.
What you are trying to do is to reinvent a hashing algorithm, very poorly. I strongly recommend to use SHA-256 or some equivalent hashing algorithm available by the libs of your system, which is best practice and usually sufficient to transmit and compare passwords.
You should start reading the basics on that issue before writing any more code, else the security level of your application won't be much better than no hashing/encryption at all, but with the false sense of being on the safe side. Start here, for instance.
I have a coordinate pair of values that each range from [0,15]. For now I can use an unsigned, however since 16 x 16 = 256 total possible coordinate locations, this also represents all the binary and hex values of 1 byte. So to keep memory compact I'm starting to prefer the idea of using a BYTE or an unsigned char. What I want to do with this coordinate pair is this:
Let's say we have a coordinate pair with the hex value [0x05,0x0C], I would like the final value to be 0x5C. I would also like to do the reverse as well, but I think I've already found an answer with a solution to the reverse. I was thinking on the lines of using & or | however, I'm missing something for I'm not getting the correct values.
However as I was typing this and looking at the reverse of this: this is what I came up with and it appears to be working.
byte a = 0x04;
byte b = 0x0C;
byte c = (a << 4) | b;
std::cout << +c;
And the value that is printing is 76; which converted to hex is 0x4C.
Since I have figured out the calculation for this, is there a more efficient way?
EDIT
After doing some testing the operation to combine the initial two is giving me the correct value, however when I'm doing the reverse operation as such:
byte example = c;
byte nibble1 = 0x0F & example;
byte nibble2 = (0xF0 & example) >> 4;
std::cout << +nibble1 << " " << +nibble2 << std::endl;
It is printout 12 4. Is this correct or should this be a concern? If worst comes to worst I can rename the values to indicate which coordinate value they are.
EDIT
After thinking about this for a little bit and from some of the suggestions I had to modify the reverse operation to this:
byte example = c;
byte nibble1 = (0xF0 & example) >> 4;
byte nibble2 = (0x0F & example);
std:cout << +nibble1 << " " << +nibble2 << std::endl;
And this prints out 4 12 which is the correct order of what I am looking for!
First of all, be careful about there are in fact 17 values in the range 0..16. Your values are probably 0..15, because if they actually range both from 0 to 16, you won't be able to uniquely store every possible coordinate pair into a single byte.
The code extract you submitted is pretty efficient, you are using bit operators, which are the quickest thing you can ask a processor to do.
For the "reverse" (splitting your byte into two 4-bit values), you are right when thinking about using &. Just apply a 4-bit shift at the right time.
I'm writing a program in C++ to listen to a stream of tcp messages from another program to give tracking data from a webcam. I have the socket connected and I'm getting all the information in but having difficulty splitting it up into the data I want.
Here's the format of the data coming in:
8 byte header:
4 character string,
integer
32 byte message:
integer,
float,
float,
float,
float,
float
This is all being stuck into a char array called buffer. I need to be able to parse out the different bytes into the primitives I need. I have tried making smaller sub arrays such as headerString that was filled by looping through and copying the first 4 elements of the buffer array and I do get the the correct hear ('CCV ') printed out. But when I try the same thing with the next for elements (to get the integer) and try to print it out I get weird ascii characters being printed out. I've tried converting the headerInt array to an integer with the atoi method from stdlib.h but it always prints out zero.
I've already done this in python using the excellent unpack method, is their any alternative in C++?
Any help greatly appreciated,
Jordan
Links
CCV packet structure
Python unpack method
The buffer only contains the raw image of what you read over the
network. You'll have to convert the bytes in the buffer to whatever
format you want. The string is easy:
std::string s(buffer + sOffset, 4);
(Assuming, of course, that the internal character encoding is the same
as in the file—probably an extension of ASCII.)
The others are more complicated, and depend on the format of the
external data. From the description of the header, I gather than the
integers are four bytes, but that still doesn't tell me anything about
their representation. Depending on the case, either:
int getInt(unsigned char* buffer, int offset)
{
return (buffer[offset ] << 24)
| (buffer[offset + 1] << 16)
| (buffer[offset + 2] << 8)
| (buffer[offset + 3] );
}
or
int getInt(unsigned char* buffer, int offset)
{
return (buffer[offset + 3] << 24)
| (buffer[offset + 2] << 16)
| (buffer[offset + 1] << 8)
| (buffer[offset ] );
}
will probably do the trick. (Other four byte representations of
integers are possible, but they are exceedingly rare. Similarly, the
conversion of the unsigned results of the shifts and or's into a int
is implementation defined, but in practice, the above will work almost
everywhere.)
The only hint you give concerning the representation of the floats is in
the message format: 32 bytes, minus a 4 byte integer, leave 28 bytes for
5 floats; but 28 doesn't go into five, so I cannot even guess as to the
length of the floats (except that there must be some padding in there
somewhere). But converting floating point can be more or less
complicated if the external format isn't exactly like the internal
format.
Something like this may work:
struct {
char string[4];
int integers[2];
float floats[5];
} Header;
Header* header = (Header*)buffer;
You should check that sizeof(Header) == 32.
I am learning C/C++ programming & have encountered the usage of 'Bit arrays' or 'Bit Vectors'. Am not able to understand their purpose? here are my doubts -
Are they used as boolean flags?
Can one use int arrays instead? (more memory of course, but..)
What's this concept of Bit-Masking?
If bit-masking is simple bit operations to get an appropriate flag, how do one program for them? is it not difficult to do this operation in head to see what the flag would be, as apposed to decimal numbers?
I am looking for applications, so that I can understand better. for Eg -
Q. You are given a file containing integers in the range (1 to 1 million). There are some duplicates and hence some numbers are missing. Find the fastest way of finding missing
numbers?
For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?
I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.
I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1,2,3,4... which makes sense to your mind, but then how do you, given just one number, work out which errors have occured?
Now, try labelling the errors 1,2,4,8,16... increasing powers of two, basically. Why does this work? Well, when you work base 2 you are manipulating a number like 00000000 where each digit corresponds to a power of 2 multiplied by its position from the right. So let's say errors 1, 4 and 8 occur. Well, then that could be represented as 00001101. In reverse, the first digit = 1*2^0, the third digit 1*2^2 and the fourth digit 1*2^3. Adding them all up gives you 13.
Now, we are able to test if such an error has occured by applying a bitmask. By example, if you wanted to work out if error 8 has occured, use the bit representation of 8 = 00001000. Now, in order to extract whether or not that error has occured, use a binary and like so:
00001101
& 00001000
= 00001000
I'm sure you know how an and works or can deduce it from the above - working digit-wise, if any two digits are both 1, the result is 1, else it is 0.
Now, in C:
int func(...)
{
int retval = 0;
if ( sometestthatmeans an error )
{
retval += 1;
}
if ( sometestthatmeans an error )
{
retval += 2;
}
return retval
}
int anotherfunc(...)
{
uint8_t x = func(...)
/* binary and with 8 and shift 3 plaes to the right
* so that the resultant expression is either 1 or 0 */
if ( ( ( x & 0x08 ) >> 3 ) == 1 )
{
/* that error occurred */
}
}
Now, to practicalities. When memory was sparse and protocols didn't have the luxury of verbose xml etc, it was common to delimit a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) to a certain meaning and apply binary operations to deduce if they are set, then operate on these.
I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.
regarding the usage the bits array :
if you know there are "only" 1 million numbers - you use an array of 1 million bits. in the beginning all bits will be zero and every time you read a number - use this number as index and change the bit in this index to be one (if it's not one already).
after reading all numbers - the missing numbers are the indices of the zeros in the array.
for example, if we had only numbers between 0 - 4 the array would look like this in the beginning: 0 0 0 0 0.
if we read the numbers : 3, 2, 2
the array would look like this: read 3 --> 0 0 0 1 0. read 3 (again) --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. check the indices of the zeroes: 0,1,4 - those are the missing numbers
BTW, of course you can use integers instead of bits but it may take (depends on the system) 32 times memory
Sivan
Bit Arrays or Bit Vectors can be though as an array of boolean values. Normally a boolean variable needs at least one byte storage, but in a bit array/vector only one bit is needed.
This gets handy if you have lots of such data so you save memory at large.
Another usage is if you have numbers which do not exactly fit in standard variables which are 8,16,32 or 64 bit in size. You could this way store into a bit vector of 16 bit a number which consists of 4 bit, one that is 2 bit and one that is 10 bits in size. Normally you would have to use 3 variables with sizes of 8,8 and 16 bit, so you only have 50% of storage wasted.
But all these uses are very rarely used in business aplications, the come to use often when interfacing drivers through pinvoke/interop functions and doing low level programming.
Bit Arrays of Bit Vectors are used as a mapping from position to some bit value. Yes it's basically the same thing as an array of Bool, but typical Bool implementation is one to four bytes long and it uses too much space.
We can store the same amount of data much more efficiently by using arrays of words and binary masking operations and shifts to store and retrieve them (less overall memory used, less accesses to memory, less cache miss, less memory page swap). The code to access individual bits is still quite straightforward.
There is also some bit field support builtin in C language (you write things like int i:1; to say "only consume one bit") , but it is not available for arrays and you have less control of the overall result (details of implementation depends on compiler and alignment issues).
Below is a possible way to answer to your "search missing numbers" question. I fixed int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could only be made faster using >> 5 instead of / 32 and & 31 instead of % 32, but that is just to give the idea.
#include <stdio.h>
#include <errno.h>
#include <stdint.h>
int main(){
/* put all numbers from 1 to 1000000 in a file, except 765 and 777777 */
{
printf("writing test file\n");
int x = 0;
FILE * f = fopen("testfile.txt", "w");
for (x=0; x < 1000000; ++x){
if (x == 765 || x == 777760 || x == 777791){
continue;
}
fprintf(f, "%d\n", x);
}
fprintf(f, "%d\n", 57768); /* this one is a duplicate */
fclose(f);
}
uint32_t bitarray[1000000 / 32];
/* read file containing integers in the range [1,1000000] */
/* any non number is considered as separator */
/* the goal is to find missing numbers */
printf("Reading test file\n");
{
unsigned int x = 0;
FILE * f = fopen("testfile.txt", "r");
while (1 == fscanf(f, " %u",&x)){
bitarray[x / 32] |= 1 << (x % 32);
}
fclose(f);
}
/* find missing number in bitarray */
{
int x = 0;
for (x=0; x < (1000000 / 32) ; ++x){
int n = bitarray[x];
if (n != (uint32_t)-1){
printf("Missing number(s) between %d and %d [%x]\n",
x * 32, (x+1) * 32, bitarray[x]);
int b;
for (b = 0 ; b < 32 ; ++b){
if (0 == (n & (1 << b))){
printf("missing number is %d\n", x*32+b);
}
}
}
}
}
}
That is used for bit flags storage, as well as for parsing different binary protocols fields, where 1 byte is divided into a number of bit-fields. This is widely used, in protocols like TCP/IP, up to ASN.1 encodings, OpenPGP packets, and so on.