How to define a formula for steganography message hiding? - bit-manipulation

From three pixels I computed their LSBs (least significant bits); for example, these are the three consecutive LSBs for three pixels: 010
Then I XOR the first and second LSBs: 1
The same operation, XOR, for the first and third LSBs: 0
These two binary values, 1 and 0, are used to hide a message composed of binary values.
Suppose the three pixels have the LSBs 000. Then a table is created to hide/insert the two message bits:
+----------+
|   000    |
+----+-----+
| 00 | 000 |
+----+-----+
| 01 | 001 |
+----+-----+
| 10 | 010 |
+----+-----+
| 11 | 100 |
+----+-----+
When the two bits from the message are 00, none of the three pixels' LSBs is changed... but when the message bits are 01, the last LSB is changed, giving 001.
Now suppose that the three pixels have the LSBs 001; then the table for LSB replacement is:
+----------+
|   001    |
+----+-----+
| 00 | 000 |
+----+-----+
| 01 | 001 |
+----+-----+
| 10 | 101 |
+----+-----+
| 11 | 011 |
+----+-----+
I need to do the same for the remaining LSB combinations: 010, 011, 100, 101, 110, 111
I have tried different logical operations to create a table such as the two presented.

Basically, a triplet of bits, abc, can be reduced to a pair of bits, de, using a set of specific computations, which are
d = a XOR b
e = a XOR c
For each de pair you're looking to derive the abc triplet that is closest to the triplet of pixel LSBs, ijk.
Approach
This is a table of the XOR operations:

result | from
-------+--------
   0   | 00, 11
   1   | 01, 10
The important part here is that you can get the same result from two possible combinations, which are complements of each other.
In your case you have an independent condition, a XOR b, and a dependent one, a XOR c, because a appears in both of them. a (and b) can take either of the two values, but c has only one option once a is fixed.
The number of abc triplets that reduce to a specific de combination can be calculated by taking a factor of 2 for each independent restriction and 1 for each dependent one and multiplying them together. Therefore, 2 x 1 = 2, and each of the two is the complement of the other.
A more complicated example would have been abcde -> fgh, with
f = a XOR b
g = a XOR c
h = d XOR e
Since the restrictions are independent, dependent, and independent, you get 2 x 1 x 2 = 4 combinations of abcde that reduce to the same fgh. Again, half are the complements of the other half.
Anyway, for each de pair compute the two abc triplets that reduce to it, and then calculate the Hamming distance (HD) between each of these triplets and your pixel triplet ijk. The triplet with the lower HD is the one you'd want to modify your pixels to, so that they reduce to that specific de pair.
For example, the triplets 000 and 111 reduce to the pair 00. If the LSBs from your pixels are 000, 001, 010 or 100, you want to modify them to 000. And if they are 110, 101, 011 or 111, modify them to 111.
The HD can obviously be a value between 0 and 3. Since the triplets are complements of each other, if the HD between one triplet and your actual pixels is, for example, 1, the HD to the other triplet will be 2, so that both add up to 3. In a similar vein, the table you build for the pixels 000 will be the complement of the one for 111.
   | 000 | 111
---+-----+-----
00 | 000 | 111
01 | 001 | 110
10 | 010 | 101
11 | 100 | 011

Related

Bit order in struct is not what I would have expected

I have a framework which uses 16-bit floats, and I wanted to separate its components to then use them for 32-bit floats. In my first approach I used bit shifts and similar operations, and while that worked, it was wildly chaotic to read.
I then wanted to use structs with custom bit sizes instead, and use a union to write to that struct.
The code to reproduce the issue:
#include <cstdint>
#include <cstdio>

union float16_and_int16
{
    struct
    {
        uint16_t Mantissa : 10;
        uint16_t Exponent : 5;
        uint16_t Sign     : 1;
    } Components;

    uint16_t bitMask;
};

int main()
{
    uint16_t input = 0x153F;
    float16_and_int16 result;
    result.bitMask = input;
    printf("Mantissa: %#010x\n", result.Components.Mantissa);
    printf("Exponent: %#010x\n", result.Components.Exponent);
    printf("Sign:     %#010x\n", result.Components.Sign);
    return 0;
}
In the example I would expect my Mantissa to be 0x00000054, the Exponent to be 0x0000001F, and the Sign 0x00000001.
Instead I get Mantissa: 0x0000013f, Exponent: 0x00000005, Sign: 0x00000000.
Which means that from my bit mask the Sign was taken first (the first bit), the next 5 bits went to the Exponent, and then 10 bits to the Mantissa, so the order is the inverse of what I wanted. Why is that happening?
The worst part is that a different compiler could give the expected order. The standard has never specified the implementation details of bitfields, and specifically not their order. The rationale, as usual, is that this is an implementation detail and that programmers should not rely nor depend on it.
The downside is that it is not possible to use bitfields in cross-language programs, and that programmers cannot use bitfields for processing data with well-known bit layouts (for example in network protocol headers), because it is too complex to make sure how the implementation will process them.
For that reason I have always thought that it was just an unusable feature, and I only use bitmasks on unsigned types instead of bitfields. But that last part is no more than my own opinion...
I would say your input is incorrect, for this compiler anyway. This is what the float16_and_int16 order looks like.
sign exponent mantissa
[15] [14:10] [9:0]
or
SGN | E X P O N E N T| M A N T I S S A |
15 | 14 | 13 | 12 | 11 | 10 | 09 | 08 | 07 | 06 | 05 | 04 | 03 | 02 | 01 | 00 |
if input = 0x153F then bitMask ==
SGN | E X P O N E N T| M A N T I S S A |
15 | 14 | 13 | 12 | 11 | 10 | 09 | 08 | 07 | 06 | 05 | 04 | 03 | 02 | 01 | 00 |
0 0 0 1 0 1 0 1 0 0 1 1 1 1 1 1
so
MANTISSA == 0100111111 (0x13F)
EXPONENT == 00101 (0x5)
SIGN == 0 (0x0)
If you want mantissa to be 0x54, exponent 0x1f and sign 0x1 you need
SGN | E X P O N E N T| M A N T I S S A |
15 | 14 | 13 | 12 | 11 | 10 | 09 | 08 | 07 | 06 | 05 | 04 | 03 | 02 | 01 | 00 |
1 1 1 1 1 1 0 0 0 1 0 1 0 1 0 0
or
input = 0xFC54
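If the goal is just to pull the fields out of a known layout, plain shifts and masks avoid the bitfield-order problem entirely. A minimal sketch, assuming the conventional half-float layout above with the sign in bit 15:

#include <cstdint>
#include <cstdio>

int main()
{
    uint16_t input = 0x153F;
    // Extract the fields by bit position, independent of how the
    // compiler happens to lay out bitfields.
    uint16_t sign     = (input >> 15) & 0x1;   // bit 15
    uint16_t exponent = (input >> 10) & 0x1F;  // bits 14:10
    uint16_t mantissa =  input        & 0x3FF; // bits 9:0
    printf("Sign: %#x  Exponent: %#x  Mantissa: %#x\n",
           sign, exponent, mantissa);
    // With input = 0xFC54 this prints Sign: 0x1, Exponent: 0x1f,
    // Mantissa: 0x54, the values the question expected.
    return 0;
}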

What does i+=(i&-i) do? Is it portable?

Let i be a signed integer type. Consider
i += (i&-i);
i -= (i&-i);
where initially i>0.
What do these do? Is there an equivalent code using arithmetic only?
Is this dependent on a specific bit representation of negative integers?
Source: setter's code of an online coding puzzle (w/o any explanation/comments).
The expression i & -i is based on Two's Complement being used to represent negative integers. Simply put, it returns a value k where each bit except the least significant non-zero bit of i is set to 0, while that particular bit keeps its own value of 1.
As long as the expression you provided executes on a system where Two's Complement is used to represent negative integers, it is portable. So, the answer to your second question is that the expression does depend on the representation of negative integers.
To answer your first question: since arithmetic expressions are dependent on the data types and their representations, I do not think there is a solely arithmetic expression that would be equivalent to i & -i. In essence, the code below is equivalent in functionality to that expression (assuming that i is of type int). Notice, though, that I had to make use of a loop to produce the same functionality, not just arithmetic.
int tmp = 0, k = 0;
while (tmp < 32)            // assumes a 32-bit int
{
    if (i & (1 << tmp))
    {
        k = i & (1 << tmp); // isolate the lowest set bit
        break;
    }
    tmp++;
}
i += k;
On a Two's Complement architecture, with 4-bit signed integers:
|  i | bin  | comp | -i | i&-i | dec |
+----+------+------+----+------+-----+
|  0 | 0000 | 0000 | -0 | 0000 |  0  |
|  1 | 0001 | 1111 | -1 | 0001 |  1  |
|  2 | 0010 | 1110 | -2 | 0010 |  2  |
|  3 | 0011 | 1101 | -3 | 0001 |  1  |
|  4 | 0100 | 1100 | -4 | 0100 |  4  |
|  5 | 0101 | 1011 | -5 | 0001 |  1  |
|  6 | 0110 | 1010 | -6 | 0010 |  2  |
|  7 | 0111 | 1001 | -7 | 0001 |  1  |
| -8 | 1000 | 1000 | -8 | 1000 |  8  |
| -7 | 1001 | 0111 |  7 | 0001 |  1  |
| -6 | 1010 | 0110 |  6 | 0010 |  2  |
| -5 | 1011 | 0101 |  5 | 0001 |  1  |
| -4 | 1100 | 0100 |  4 | 0100 |  4  |
| -3 | 1101 | 0011 |  3 | 0001 |  1  |
| -2 | 1110 | 0010 |  2 | 0010 |  2  |
| -1 | 1111 | 0001 |  1 | 0001 |  1  |
Remarks:
You can conjecture that i&-i has only one bit set (it's a power of 2) and that it matches the least significant set bit of i.
i + (i&-i) has the interesting property of being one step closer to the next power of two.
i += (i&-i) clears the lowest run of set bits of i and sets the next bit above that run.
So, doing i += (i&-i); will eventually make you jump to the next power of two:
| i | i&-i | sum |     | i | i&-i | sum |
+---+------+-----+     +---+------+-----+
| 1 |  1   |  2  |     | 5 |  1   |  6  |
| 2 |  2   |  4  |     | 6 |  2   | -8  |
| 4 |  4   | -8  |     |-8 | -8   | UB  |
|-8 | -8   | UB  |

| i | i&-i | sum |     | i | i&-i | sum |
+---+------+-----+     +---+------+-----+
| 3 |  1   |  4  |     | 7 |  1   | -8  |
| 4 |  4   | -8  |     |-8 | -8   | UB  |
|-8 | -8   | UB  |
UB: overflow of signed integer exhibits undefined behavior.
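Here is a small self-contained demo of that jump to the next power of two; it uses an unsigned counter so the overflow marked UB above cannot occur (the starting value is arbitrary):

#include <cstdio>

int main()
{
    unsigned i = 5;
    while (i & (i - 1))   // loop until i is a power of two
    {
        printf("%u -> ", i);
        i += i & -i;      // well-defined for unsigned types
    }
    printf("%u is a power of two\n", i);
    return 0;
}

This prints "5 -> 6 -> 8 is a power of two".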
If i has unsigned type, the expressions are completely portable and well-defined.
If i has signed type, it's not portable, since & is defined in terms of representations but unary -, +=, and -= are defined in terms of values. If the next version of the C++ standard mandates two's complement, though, it will become portable, and will do the same thing as in the unsigned case.
In the unsigned case (and the two's complement case), it's easy to confirm that i&-i is a power of two (has only one bit nonzero), and has the same value as the lowest-place bit of i (which is also the lowest-place bit of -i). Therefore:
i -= i&-i; clears the lowest-set bit of i.
i += i&-i; increments (clearing, but with carry to higher bits) the lowest-set bit of i.
For unsigned types there is never overflow for either expression. For signed types, i -= i&-i overflows taking -i when i initially has the minimum value of the type, and i += i&-i overflows in the += when i initially has the max value of the type.
Here is what I researched, prompted by the other answers. The bit manipulations
i -= (i&-i); // strips off the LSB (least-significant bit)
i += (i&-i); // adds the LSB
are used, predominantly, in traversing a Fenwick tree. In particular, i&-i gives the LSB if signed integers are represented via two's complement. As already pointed out by Peter Fenwick in his original proposal, this is not portable to other signed integer representations. However,
i &= i-1; // strips off the LSB
is portable (it also works with one's complement and sign-magnitude representations) and has one fewer operation.
However, there appears to be no simple portable alternative for adding the LSB.
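For context, here is a minimal sketch of the Fenwick tree traversals those two expressions drive (1-based indexing; the struct and member names are mine, not Fenwick's original notation):

#include <vector>

// Fenwick (binary indexed) tree over n elements, 1-based.
struct Fenwick
{
    std::vector<long long> t;
    explicit Fenwick(int n) : t(n + 1, 0) {}

    // Add delta at position i: walk upward with i += i & -i.
    void update(int i, long long delta)
    {
        for (; i < (int)t.size(); i += i & -i)
            t[i] += delta;
    }

    // Sum of positions 1..i: walk downward with i -= i & -i.
    long long query(int i) const
    {
        long long s = 0;
        for (; i > 0; i -= i & -i)
            s += t[i];
        return s;
    }
};

Each loop touches O(log n) nodes, because every i += i & -i or i -= i & -i step moves past exactly one set bit of i.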
i & -i is the easiest way to get the least significant bit (LSB) for an integer i.
You can read more here.
A1: You can read more about 'Mathematical Equivalents' here.
A2: If the negative integer representation is not the usual two's complement form, then i & -i might not yield the LSB.
The easiest way to think of it is in terms of the mathematical equivalence:
-i == (~i + 1)
So -i inverts the bits of the value and then adds 1. The significance of this is that all the low 0 bits of i are turned into 1s by the ~i operation, so adding 1 to the value flips all those low 1 bits back to 0 while carrying the 1 upwards until it lands in a 0 bit, which happens to be the same position as the lowest 1 bit in i.
Here's an example for the number 6 (0110 in binary):
i = 0110
~i == 1001
(~i + 1) == 1010
i & (~i + 1) == 0010
You may need to do each operation manually a few times before you realise the patterns in the bits.
Here are two more examples:
i = 1000
~i == 0111
(~i + 1) == 1000
i & (~i + 1) == 1000
i = 1100
~i == 0011
(~i + 1) == 0100
i & (~i + 1) == 0100
See how the + 1 causes a sort of 'bit cascade' carrying the one up to the first open 0 bit?
So if (i & -i) is a means of extracting the lowest 1 bit, then it follows that the use cases of i += (i & -i) and i -= (i & -i) are attempts to add and subtract the lowest 1 bit of a value.
Subtracting the lowest 1 bit of a value from itself serves as a means to zero out that bit.
Adding the lowest 1 bit of a value to itself doesn't appear to have any special purpose, it just does what it says on the tin.
It should be portable on any system using two's complement.
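To see the equivalence for yourself, here is a tiny sketch that prints both forms side by side; unsigned arithmetic is used so every operation is well-defined:

#include <cstdio>

int main()
{
    for (unsigned i = 1; i <= 12; ++i)
        printf("i = %2u   i & -i = %2u   i & (~i + 1) = %2u\n",
               i, i & -i, i & (~i + 1));
    return 0;
}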

relation between size of types and their range of values? [closed]

In C++ (or any other language), what is the relation between the size of types and the range of values they can take?
E.g. char has a size of 1 byte, which means the number of values it can store is 2^8. So why can it take values ranging from -128 to 127 only, and not larger values?
Is it related to the bit pattern? Or am I misunderstanding this? I am new to programming and I grasp concepts fast, but I am stuck on this concept!
Please explain this in relation to floating point types too! Thanks in advance.
Start with the basic idea of the number of states. A bit has two states - 0 and 1. Two bits have four possible states: 00, 01, 10, and 11. For three bits the number of states is eight:
000 001 010 011 100 101 110 111
The pattern should emerge by now: adding an extra bit doubles the number of states that a group of bits can take. This is easy to see: if the number of states of k bits is N, then for k+1 bits there are N states for when the added bit is 0 and N more states for when it is 1, or N+N altogether. Hence, k bits can have 2^k states.
Bytes are groups of 8 bits, so the number of states a byte can have is 2^8, which is 256. If you use a byte to represent an unsigned value, its range is 0..255, inclusive. For signed values one bit is taken to represent the sign. In two's complement representation the value range becomes -128..127. The negative side gets one extra value because the non-negative part of the range also has to include zero, while the negative part does not.
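If you want the actual ranges on your platform rather than working them out by hand, the implementation can report them; a small sketch using std::numeric_limits and CHAR_BIT:

#include <climits>
#include <cstdio>
#include <limits>

int main()
{
    printf("bits per byte (CHAR_BIT): %d\n", CHAR_BIT);
    printf("signed char:   %d .. %d\n",
           (int)std::numeric_limits<signed char>::min(),
           (int)std::numeric_limits<signed char>::max());
    printf("unsigned char: %u .. %u\n",
           (unsigned)std::numeric_limits<unsigned char>::min(),
           (unsigned)std::numeric_limits<unsigned char>::max());
    printf("int:           %d .. %d\n",
           std::numeric_limits<int>::min(),
           std::numeric_limits<int>::max());
    // Floating point types trade those bits between a sign, an exponent
    // and a mantissa, which is why their range and precision differ:
    printf("float max:     %g, decimal digits: %d\n",
           std::numeric_limits<float>::max(),
           std::numeric_limits<float>::digits10);
    return 0;
}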
It's easy: a variable of a given datatype has 2^(sizeof(datatype) * CHAR_BIT) possible values. It then depends on whether the datatype is signed or unsigned.
unsigned has the values 0 .. (2^(sizeof(datatype) * CHAR_BIT)) - 1.
signed has the values -((2^(sizeof(datatype) * CHAR_BIT))/2) .. ((2^(sizeof(datatype) * CHAR_BIT))/2) - 1.
For the char datatype, 2^8 is 256: signed char has the range -128..127, which is 256 values, and unsigned char has the range 0..255, still 256 values.
A byte is a sequence of 8 bits.
+---+---+---+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+---+---+---+---+---+---+---+---+
2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
The highest bit (the most significant bit) indicates the sign: 0 for positive, 1 for negative; the rest of the bits hold the value.
Then you have
+---+---+---+---+---+---+---+---+
| 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | < Max positive number
+---+---+---+---+---+---+---+---+
2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
and
+---+---+---+---+---+---+---+---+
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | < Most negative number
+---+---+---+---+---+---+---+---+
2^7 2^6 2^5 2^4 2^3 2^2 2^1 2^0
The zeros are there because numbers are usually represented in two's complement.
Conversion from two's complement works as follows:
1. Invert all bits -> |0|1|1|1|1|1|1|1| -> 127
2. Add 1           -> |1|0|0|0|0|0|0|0| -> 128
3. Change sign     ->                     -128
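Here is a tiny sketch of that reinterpretation in code. On a two's complement machine the bit pattern 1000 0000 reads back as -128; note that the unsigned-to-signed conversion is implementation-defined before C++20, so this illustrates the idea rather than guaranteeing it:

#include <cstdio>

int main()
{
    unsigned char bits = 0x80;             // bit pattern 1000 0000
    signed char value = (signed char)bits; // reinterpret as signed
    printf("pattern 0x%02X read as signed char = %d\n", bits, value);
    return 0;
}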

C++ Inverted Weighted Shuffle/Random

I have a list of weighted objects i.e.:
A->1 B->1 C->3 D->2 E->3
Is there an efficient algorithm in C++ to pick random elements according to their weight?
For example, the probability that element A or B (with a lower weight) is picked should be higher (30% each) than the probability that the algorithm selects element C or E (10% each) or D (20%).
As @Dukeling said, we need more info, like how you interpret and use the selection chance.
At least in the field of evolutionary algorithms, fitness scaling (or selection chance scaling) is a sizable topic.
Suppose you start with badness score
B[i] = how badly you don't want to select the i-th item
And the objective is to calculate a fitness/selection score S[i], which I assume you will use in roulette-wheel fashion.
As you say, one obvious way is to use multiplicative inverse:
S[i] = 1 / B[i]
However, there might be a little problem with that.
The same amount of change in B[i] has much more impact when B[i] is low than when B[i] is already high.
Ask yourself this:
Say
B[1] = 1 -> S[1] = 1
B[2] = 2 -> S[2] = 0.5
So item 1 is twice as likely to be selected as item 2.
But with the same amount of change
B[3] = 1000 -> S[3] = 0.001
B[4] = 1001 -> S[4] = 0.000999001
Item 3 is only 1.001 times as likely to be selected as item 4.
I'll just throw one possible alternative scheme here for now.
S[i] = max(B) - B[i] + 1
The + 1 part helps so no item has zero chance to be selected.
This ends the part of calculating selection score.
Next, let's clear up how to use the selection score in roulette wheel fashion.
Assume we decided to use the additive inverse scheme.
B[1] = 1 -> S[1] = 1001
B[2] = 2 -> S[2] = 1000
B[3] = 1000 -> S[3] = 2
B[4] = 1001 -> S[4] = 1
Then imagine that each point of the score corresponds to a lottery ticket.
Let's assign the tickets running IDs.
| Item | Score = #tickets | Ticket IDs   | Win chance              |
|  1   |             1001 | 0 to 1000    | 1001/2004 ~ 0.499500998 |
|  2   |             1000 | 1001 to 2000 | 1000/2004 ~ 0.499001996 |
|  3   |                2 | 2001 to 2002 |    2/2004 ~ 0.000998004 |
|  4   |                1 | 2003 to 2003 |    1/2004 ~ 0.000499002 |
There are 2004 tickets in total.
To do a selection, pick the winning ticket ID at random, i.e. from the range [0, 2004).
Binary search can be used to quickly look up which item owns the winning ticket, as you have already seen in this question. What needs to be looked up with binary search are the boundary ticket IDs, which are 1001, 2001, 2003, rather than the scores themselves.
For comparison, here is the selection chance in case the multiplicative inverse scheme is used.
| Item | Win chance                             |
|  1   | 1/1.501999001           ~ 0.665779404  |
|  2   | 0.5/1.501999001         ~ 0.332889702  |
|  3   | 0.001/1.501999001       ~ 0.000665779  |
|  4   | 0.000999001/1.501999001 ~ 0.000665114  |
You can notice that in the additive inverse scheme, 1 unit of badness consistently corresponds to a difference of around 0.0005 in selection chance.
Whereas in the multiplicative inverse scheme, 1 unit of badness results in a varying difference in selection chance.
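A minimal C++ sketch of the whole pipeline under the additive inverse scheme (the names badness, score, and cum are mine): it builds the cumulative ticket boundaries and binary-searches them with std::upper_bound, exactly as described above.

#include <algorithm>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main()
{
    std::vector<double> badness = {1, 2, 1000, 1001};

    // Additive inverse scheme: S[i] = max(B) - B[i] + 1.
    double maxB = *std::max_element(badness.begin(), badness.end());
    std::vector<double> score(badness.size());
    for (std::size_t i = 0; i < badness.size(); ++i)
        score[i] = maxB - badness[i] + 1;

    // Cumulative ticket boundaries for the roulette wheel.
    std::vector<double> cum(score.size());
    std::partial_sum(score.begin(), score.end(), cum.begin());

    std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> ticket(0.0, cum.back());

    // Draw a few winners; upper_bound finds the owning item in O(log n).
    for (int n = 0; n < 5; ++n) {
        std::size_t item =
            std::upper_bound(cum.begin(), cum.end(), ticket(rng)) - cum.begin();
        printf("selected item %zu\n", item);
    }
    return 0;
}

In practice, std::discrete_distribution can perform the weighted draw for you once the scores are computed; the manual version above just makes the ticket mechanics explicit.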

Combinational Circuit with LED Lighting

Combinational Circuit design question.
      A
     ____
    |    |
  F |    | B
    |    |
     ____
    |  G |
  E |    | C
    |    |
     ____
      D
Suppose this is an LED display. It takes a 4-bit input,
(0000)-(1111), and displays the hex digit for it. For example,
if (1100) comes in, it displays C by turning on AFED and turning off BCG.
If (1010) comes in, it displays A by turning on ABCEFG
and turning off D.
The displayed characters are all capital letters, so there is no visual
difference between 0 and D, or between 8 and B.
Develop a truth table and an optimized expression using Karnaugh Maps.
I'm not exactly sure how to begin. For the truth table, would I use (w,x,y,z) as the input variables, or just the ABCDEFG variables, since those are the ones turning on and off?
input (1010)-->A--> ABCEFG~D (~ stand for NOT)
input (1011)-->B--> ABCDEFG
input (1100)-->C--> ADEF~B~C~G
So would I do this for all hex digits 0-F? That would give me the canonical minterm form, which I could then optimize with a Karnaugh Map. Any help would be appreciated!
1) Map your lights to bits:
ABCDEFG, so the truth table will be:
ABCDEFG
input (1010)-->A-->1110111
and so on.
You will have a big table (with 16 rows).
2) Then follow the sample on Wikipedia for every output light.
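A small sketch that generates that 16-row table, assuming the conventional 7-segment patterns (with capital letters, so B shares 8's pattern and D shares 0's, as the question notes); the array name seg is mine:

#include <cstdio>

int main() {
    // One bit per segment, ordered ABCDEFG (A is the most significant
    // of the 7 bits).
    static const unsigned char seg[16] = {
        0b1111110, // 0: ABCDEF
        0b0110000, // 1: BC
        0b1101101, // 2: ABDEG
        0b1111001, // 3: ABCDG
        0b0110011, // 4: BCFG
        0b1011011, // 5: ACDFG
        0b1011111, // 6: ACDEFG
        0b1110000, // 7: ABC
        0b1111111, // 8: ABCDEFG
        0b1111011, // 9: ABCDFG
        0b1110111, // A: ABCEFG
        0b1111111, // B: same as 8
        0b1001110, // C: ADEF
        0b1111110, // D: same as 0
        0b1001111, // E: ADEFG
        0b1000111, // F: AEFG
    };
    printf("wxyz  ABCDEFG\n");
    for (int v = 0; v < 16; ++v) {
        printf("%d%d%d%d  ", (v >> 3) & 1, (v >> 2) & 1, (v >> 1) & 1, v & 1);
        for (int b = 6; b >= 0; --b)
            printf("%d", (seg[v] >> b) & 1);
        printf("\n");
    }
    return 0;
}

Each column of the printed table is the output function for one segment; that column is what you transfer into the K-map for that segment.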
You need to do 7 of these: Each for one segment in the 7-segment display.
This figure is for illustration only. It doesn't necessarily map to any segment in your problem.
      cd=00  01  11  10
ab=00   1    1    1    1
ab=01   1    1    1    0
ab=11   0    1    1    1
ab=10   1    1    1    0

Each cell holds '1' if the light is on and '0' if it is off for the given segment, with abcd = 0000 for hex 0, 0001 for 1, 0010 for 2, ... 1111 for F. The two middle columns (cd = 01 and 11) form the d==1 region, and the two rightmost columns (cd = 11 and 10) form the c==1 region; likewise, the two middle rows represent the b==1 region and the two last rows the a==1 region.
From that map, find maximum-size rectangles (of size [1, 2 or 4] x [1, 2 or 4]); they can overlap. The middle 2x4 region is coded as 'd'. The top row is '~a~b'. The top-left 2x2 square is '~a~c'. A bottom-left square that wraps from row 4 to row 1 is '~b~c'. Finally, the small 2x1 region covering positions x=3-4, y=3 is 'abc'.
This function would thus be 'd + ~a~b + ~a~c + ~b~c + abc'. If there are no redundant rectangles (ones completely covered by other rectangles), this formula should be the optimal canonical form (not counting XOR operations). Repeat 7 times for the real data!
Any selection/permutation of the variables should give the same logical circuit, whether you use abcd or dcba or acbd etc.
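For what it's worth, here is a tiny brute-force check of the example formula against the illustrative map above (the variable names are mine); the same loop can verify each of the 7 real segment formulas:

#include <cstdio>

int main() {
    // The illustrative K-map, flattened to plain binary order of abcd
    // (a is the MSB): rows ab = 00,01,10,11; within each, cd = 00,01,10,11.
    const int truth[16] = {
        1, 1, 1, 1,   // ab = 00
        1, 1, 0, 1,   // ab = 01
        1, 1, 0, 1,   // ab = 10
        0, 1, 1, 1,   // ab = 11
    };
    bool ok = true;
    for (int v = 0; v < 16; ++v) {
        int a = (v >> 3) & 1, b = (v >> 2) & 1, c = (v >> 1) & 1, d = v & 1;
        // The minimized expression read off the map:
        int f = (d || (!a && !b) || (!a && !c) || (!b && !c) || (a && b && c))
                    ? 1 : 0;
        if (f != truth[v]) {
            printf("mismatch at abcd = %d%d%d%d\n", a, b, c, d);
            ok = false;
        }
    }
    if (ok)
        printf("formula matches the map for all 16 inputs\n");
    return 0;
}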