I read the IP RFC and in there it says the 4 first bits of the IP header is the version. In the drawing it also shows that bits 0 to 3 are the version.
https://www.rfc-editor.org/rfc/rfc791#section-3.1
But when I look at the first byte of the header (as captured using pcap lib) I see this byte:
0x45
This is a version 4 IP header but obviously bits 4 to 7 are equal to 4 and not bits 0 to 3 as I expected.
I expected doing a bitwise and on first byte and 0x0F will get me the version but it seems that I need to and with 0xF0.
Am I missing something? Understanding something incorrectly?
You should read Appendix B of the RFC:
Whenever an octet represents a numeric quantity the left most bit in the
diagram is the high order or most significant bit. That is, the bit
labeled 0 is the most significant bit. For example, the following
diagram represents the value 170 (decimal).
0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|1 0 1 0 1 0 1 0|
+-+-+-+-+-+-+-+-+
Which means everything is correct except for your assumption that the “first four bits” are the least-significant, while those are the most-significant.
E.g. in the 7th and 8th bytes, containing the flags and the fragment offset, you can separate those as follows (consider that pseudocode, even though it is working C#):
byte flagsAndFragmentHi = packet[6];
byte fragmentLo = packet[7];
bool flagReserved0 = (flagsAndFragmentHi & 0x80) != 0;
bool flagDontFragment = (flagsAndFragmentHi & 0x40) != 0;
bool flagMoreFragments = (flagsAndFragmentHi & 0x20) != 0;
int fragmentOffset = ((flagsAndFragmentHi & 0x1F) << 8) | (fragmentLo);
Note that the more significant (left-shifted 8 bits) portion of the fragment offset is in the first byte (because IP works in big endian). Generally: bits on the left in the diagram are always more significant.
Related
Given 15 random hexadecimal numbers (60 bits) where there is always at least 1 duplicate in every 20 bit run (5 hexdecimals).
What is the optimal way to compress the bytes?
Here are some examples:
01230 45647 789AA
D8D9F 8AAAF 21052
20D22 8CC56 AA53A
AECAB 3BB95 E1E6D
9993F C9F29 B3130
Initially I've been trying to use Huffman encoding on just 20 bits because huffman coding can go from 20 bits down to ~10 bits but storing the table takes more than 9 bits.
Here is the breakdown showing 20 bits -> 10 bits for 01230
Character Frequency Assignment Space Savings
0 2 0 2×4 - 2×1 = 6 bits
2 1 10 1×4 - 1×2 = 2 bits
1 1 110 1×4 - 1×3 = 1 bits
3 1 111 1×4 - 1×3 = 1 bits
I then tried to do huffman encoding on all 300 bits (five 60bit runs) and here is the mapping given the above example:
Character Frequency Assignment Space Savings
---------------------------------------------------------
a 10 101 10×4 - 10×3 = 10 bits
9 8 000 8×4 - 8×3 = 8 bits
2 7 1111 7×4 - 7×4 = 0 bits
3 6 1101 6×4 - 6×4 = 0 bits
0 5 1100 5×4 - 5×4 = 0 bits
5 5 1001 5×4 - 5×4 = 0 bits
1 4 0010 4×4 - 4×4 = 0 bits
8 4 0111 4×4 - 4×4 = 0 bits
d 4 0101 4×4 - 4×4 = 0 bits
f 4 0110 4×4 - 4×4 = 0 bits
c 4 1000 4×4 - 4×4 = 0 bits
b 4 0011 4×4 - 4×4 = 0 bits
6 3 11100 3×4 - 3×5 = -3 bits
e 3 11101 3×4 - 3×5 = -3 bits
4 2 01000 2×4 - 2×5 = -2 bits
7 2 01001 2×4 - 2×5 = -2 bits
This yields a savings of 8 bits overall, but 8 bits isn't enough to store the huffman table. It seems because of the randomness of the data that the more bits you try to encode with huffman the less effective it works. Huffman encoding seemed to work best with 20 bits (50% reduction) but storing the table in 9 or less bits isnt possible AFAIK.
In the worst-case for a 60 bit string there are still at least 3 duplicates, the average case there are more than 3 duplicates (my assumption). As a result of at least 3 duplicates the most symbols you can have in a run of 60 bits is just 12.
Because of the duplicates plus the less than 16 symbols, I can't help but feel like there is some type of compression that can be used
If I simply count the number of 20-bit values with at least two hexadecimal digits equal, there are 524,416 of them. A smidge more than 219. So the most you could possibly save is a little less than one bit out of the 20.
Hardly seems worth it.
If I split your question in two parts:
How do I compress (perfect) random data: You can't. Every bit is some new entropy which can't be "guessed" by a compression algorithm.
How to compress "one duplicate in five characters": There are exactly 10 options where the duplicate can be (see table below). This is basically the entropy. Just store which option it is (maybe grouped for the whole line).
These are the options:
AAbcd = 1 AbAcd = 2 AbcAd = 3 AbcdA = 4 (<-- cases where first character is duplicated somewhere)
aBBcd = 5 aBcBd = 6 aBcdB = 7 (<-- cases where second character is duplicated somewhere)
abCCd = 8 abCdC = 9 (<-- cases where third character is duplicated somewhere)
abcDD = 0 (<-- cases where last characters are duplicated)
So for your first example:
01230 45647 789AA
The first one (01230) is option 4, the second 3 and the third option 0.
You can compress this by multiplying each consecutive by 10: (4*10 + 3)*10 + 0 = 430
And uncompress it by using divide and modulo: 430%10=0, (430/10)%10=3, (430/10/10)%10=4. So you could store your number like that:
1AE 0123 4567 789A
^^^ this is 430 in hex and requires only 10 bit
The maximum number for the three options combined is 1000, so 10 bit are enough.
Compared to storing these 3 characters normally you save 2 bit. As someone else already commented - this is probably not worth it. For the whole line it's even less: 2 bit / 60 bit = 3.3% saved.
If you want to get rid of the duplicates first, do this, then look at the links at the bottom of the page. If you don't want to get rid of the duplicates, then still look at the links at the bottom of the page:
Array.prototype.contains = function(v) {
for (var i = 0; i < this.length; i++) {
if (this[i] === v) return true;
}
return false;
};
Array.prototype.unique = function() {
var arr = [];
for (var i = 0; i < this.length; i++) {
if (!arr.contains(this[i])) {
arr.push(this[i]);
}
}
return arr;
}
var duplicates = [1, 3, 4, 2, 1, 2, 3, 8];
var uniques = duplicates.unique(); // result = [1,3,4,2,8]
console.log(uniques);
Then you would have shortened your code that you have to deal with. Then you might want to check out Smaz
Smaz is a simple compression library suitable for compressing strings.
If that doesn't work, then you could take a look at this:
http://ed-von-schleck.github.io/shoco/
Shoco is a C library to compress and decompress short strings. It is very fast and easy to use. The default compression model is optimized for english words, but you can generate your own compression model based on your specific input data.
Let me know if it works!
PROGRAMMING LANGUAGE: C
I've a 8 bit data with only 3 bit used, for example:
0110 0001
Where 0 indicate unused bit that are always set to 0 and 1 indicate bits that change.
I want to convert this 0110 0001 8 bit to 3 bit that indicate this 3 used bits.
For example
0110 0001 --> 111
0010 0001 --> 011
0000 0000 --> 000
0100 0001 --> 101
How I can do that with minimal operations?
You can achieve this with a couple of bitwise operations:
((a >> 4) & 6) | (a & 1)
Assuming you start from xYYx xxxY, where x is a bit you don't care about and Y a bit to keep:
left shift by 4 of a will result in xYYx, then masking with 6 (binary 110) will make sure only the second and third bit are retained, resulting in YY0 and preventing flipped x bits from messing up.
a & 1 selects the LSB, resulting in Y.
the two parts, YY0 and Y are combined using a | bitwise or, resulting in YYY.
Now you have the 3 bits you asked. But keep in mind that you can't address single bits, so it will still be byte-aligned as 00000YYY
You can get the k'th bit of n: (where n is 011000001)
(n & ( 1 << k )) >> k
(More details about that at StackOverflow)
so you use that to get bit 1,6 and 7 and just add those:
r=bit1+bit6*16+bit7*32
for(unsigned int h=0; h<ImageBits.iHeight; h++)
{
for(unsigned int w=0; w<ImageBits.iWidth; w++)
{
// So in this loop - if our data isn't aligned to 4 bytes, then its been padded
// in the file so it aligns...so we check for this and skip over the padded 0's
// Note here, that the data is read in as b,g,r and not rgb as you'd think!
unsigned char r,g,b;
fread(&b, 1, 1, fp);
fread(&g, 1, 1, fp);
fread(&r, 1, 1, fp);
ImageBits.pARGB[ w + h*ImageBits.iWidth ] = (r<<16 | g<<8 | b);
}// End of for loop w
//If there are any padded bytes - we skip over them here
if( iNumPaddedBytes != 0 )
{
unsigned char skip[4];
fread(skip, 1, 4 - iNumPaddedBytes, fp);
}// End of if reading padded bytes
}// End of for loop h
I do not understand this statement and how does it store the rgb value of the pixel
ImageBits.pARGB[ w + h*ImageBits.iWidth ] = (r<<16 | g<<8 | b);
i did a read up on the << bitwise shift operator but i still do not understand how it works.Can someone help me out here.
You need to convert separate values for Red, Green and Blue into a single variable, so you push them 16 and 8 bits to the "left" respectively, so they align 8 bits for Red (begin - 16), then you get 8 bits for Green (begin - 8) and the remaining color.
Consider the following:
Red -> 00001111
Green -> 11110000
Blue -> 10101010
Then RGB -> that has 24 bits capacity would look like this initially ->
-> 00000000 00000000 00000000
(there would actually be some random rubbish but it's easier to
demonstrate like this)
Shift the Red byte 16 places to the left, so we get 00001111 00000000 00000000.
Shift the Green byte 8 places to the left, so we have 00001111 11110000 00000000.
Don't shift the Blue byte, so we have 00001111 11110000 10101010.
You could achieve a similar result with unions. Here's an ellaboration as to why we do it like this. The only way for you to access a variable is to have it's address (usually bound to a variable name, or an alias).
That means that we have an address of the first byte only and also a guarantee that if it's a variable that is 3 bytes wide, the following two bytes that are next to our addressed byte belong to us. So we can literally "push the bits" to the left (shift them) so they "flow" into the remaining bytes of the variable. We could also pointer-arithmetic a pointer there or as I've mentioned already, use a union.
Bit shifting moves the bits that make up the value along by the number you specify.
In this case it's done with colour values so that you can store multiple 1 byte components (such as RGBA which are in the range 0-255) in a single 4 byte structure such as an int
Take this byte:
00000011
which is equal to 3 in decimal. If we wanted to store the value 3 for the RGB and A channel, we would need to store this value in the int (the int being 32 bits)
R G B A
00000011 00000011 00000011 00000011
As you can see the bits are set in 4 groups of 8, and all equal the value 3, but how do you tell what the R value is when it's stored this way?
If you got rid of the G/B/A values, you'd be left with
00000011 00000000 00000000 00000000
Which still doesn't equal 3 - (in fact it's some massive number - 12884901888 I think)
In order to get this value into the last byte of the int, you need to shift the bits 24 places to the right. e.g.
12884901888 >> 24
Then the bits would look like this:
00000000 00000000 00000000 00000011
And you would have your value '3'
Basically it's just a way of moving bits around in a storage structure so that you can better manipulate the values. Putting the RGBA values into a single value is usually called stuffing the bits
let's visualize this and break it into several steps, and you'll see how simple it is.
let's say we have the ARGB 32 bit variable, that can be viewed as
int rgb = {a: 00, r: 00, g: 00, b: 00} (this is not valid code, of course, and let's leave the A out of this for now).
the value in each of these colors is 8 bit of course.
now we want to place a new value, and we have three 8 bit variables for each color:
unsigned char r = 0xff, g=0xff, b=0xff.
what we're essentially doing is taking a 32 bit variable, and then doing this:
rgb |= r << 16 (shifting the red 16 bit left. everything to right of it will remain 0)
so now we have
rgb = [a: 00, r: ff, g: 00, b: 00]
and now we do:
rgb = rgb | (g << 8) (meaning taking the existing value and OR'ing it with green shifted to its place)
so we have [a: 00, r: ff, g: ff, b: 00]
and finally...
rgb = rgb | b (meaning taking the value and ORing it with the blue 8 bits. the rest remains unchanged)
leaving us with [a: 00, r: ff, g: f, b: ff]
which represents a 32 bit (24 actually since the Alpha is irrelevant to this example) color.
I'm currently learning about bitset, and in one paragraph it says this about their interactions with strings:
"The numbering conventions of strings and bitsets are inversely related: the rightmost character in the string--the one with the highest subscript--is used to initialize the low order bit in the bitset--the bit with subscript 0."
however later on they give an example + diagram which shows something like this:
string str("1111111000000011001101");
bitset<32> bitvec5(str, 5, 4); // 4 bits starting at str[5], 1100
value of str:
1 1 1 1 1 (1 1 0 0) 0 0 0 ...
value of bitvec5:
...0 0 0 0 0 0 0 (1 1 0 0)
This example shows it taking the rightmost bit and putting it so the last element from the string is the last in the bitset, not the first.
Which is right?(or are both wrong?)
They are both right.
Traditionally the bits in a machine word are numbered from right to left, so the lowest bit (bit 0) is to the right, just like it is in the string.
The bitset looks like this
...1100 value
...3210 bit numbers
and the string that looks the same
"1100"
will have string[0] == '1' and string[3] == '0', the exact opposite!
string strval("1100"); //1100, so from rightmost to leftmost : 0 0 1 1
bitset<32> bitvec4(strval); //bitvec4 is 0 0 1 1
So whatever you are reading is correct(both text and example) :
the rightmost character in the string--the one with the highest
subscript--is used to initialize the low order bit in the bitset--the
bit with subscript 0.
I have the following code for self learning:
#include <iostream>
using namespace std;
struct bitfields{
unsigned field1: 3;
unsigned field2: 4;
unsigned int k: 4;
};
int main(){
bitfields field;
field.field1=8;
field.field2=1e7;
field.k=18;
cout<<field.k<<endl;
cout<<field.field1<<endl;
cout<<field.field2<<endl;
return 0;
}
I know that unsigned int k:4 means that k is 4 bits wide, or a maximum value of 15, and the result is the following.
2
0
1
For example, filed1 can be from 0 to 7 (included), field2 and k from 0 to 15. Why such a result? Maybe it should be all zero?
You're overflowing your fields. Let's take k as an example, it's 4 bits wide. It can hold values, as you say, from 0 to 15, in binary representation this is
0 -> 0000
1 -> 0001
2 -> 0010
3 -> 0011
...
14 -> 1110
15 -> 1111
So when you assign 18, having binary representation
18 -> 1 0010 (space added between 4th and 5th bit for clarity)
k can only hold the lower four bits, so
k = 0010 = 2.
The equivalent holds true for the rest of your fields as well.
You have these results because the assignments overflowed each bitfield.
The variable filed1 is 3 bits, but 8 takes 4 bits to present (1000). The lower three bits are all zero, so filed1 is zero.
For filed2, 17 is represented by 10001, but filed2 is only four bits. The lower four bits represent the value 1.
Finally, for k, 18 is represented by 10010, but k is only four bits. The lower four bits represent the value 2.
I hope that helps clear things up.
In C++ any unsigned type wraps around when you hit its ceiling[1]. When you define a bitfield of 4 bits, then every value you store is wrapped around too. The possible values for a bitfield of size 4 are 0-15. If you store '17', then you wrap to '1', for '18' you go one more to '2'.
Mathematically, the wrapped value is the original value modulo the number of possible values for the destination type:
For the bitfield of size 4 (2**4 possible values):
18 % 16 == 2
17 % 16 == 1
For the bitfield of size 3 (2**3 possible values):
8 % 8 == 0.
[1] This is not true for signed types, where it is undefined what happens then.