I am currently working on sending data to a receiving party using a mod-96 encoding scheme. The following is the request structure to be sent from my side:
Field               Size   Type
1. Message Type     2      "TT"
2. Firm             2      Mod-96
3. Identifier Id    1      Alpha String
4. Start Sequence   3      Mod-96
5. End Sequence     3      Mod-96
My concern is that a sequence number can be longer than 3 digits. Suppose I have to send the numbers 123 and 123456 as the start and end sequence numbers: how do I encode them in mod-96 format? I have sent this query to the receiving party, but they are yet to answer it. Can somebody please shed some light on how to go about encoding numbers in mod-96 format?
Granted, there's a lot of missing detail about what you really need, but here's how Mod-96 encoding works:
You just use printable characters as if they were digits of a number:
when you encode in base 10 you know that 123 is 10^2*1 + 10^1*2 + 10^0*3
(note that you arbitrarily chose that the digit '1' has the value one: value('1') = 1)
when you encode in base 96 you know that 123 is
96^2*value('1') + 96^1*value('2') + 96^0*value('3')
since '1' is ASCII character #49, value('1') = 49 - 32 = 17
Encoding 3 printable characters into a number
unsigned int encode(char a, char b, char c) {
    return (a - 32) * 96 * 96 + (b - 32) * 96 + (c - 32);
}
Encoding 2 printable characters into a number
unsigned int encode(char a, char b) {
    return (a - 32) * 96 + (b - 32);
}
Decoding a number into 2 printable characters
void decode(char* a, char* b, unsigned int k) {
    *b = k % 96 + 32;
    *a = k / 96 + 32;
}
Decoding a number into 3 printable characters
void decode(char* a, char* b, char* c, unsigned int k) {
    *c = k % 96 + 32;
    k /= 96;
    *b = k % 96 + 32;
    *a = k / 96 + 32;
}
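To tie this back to the question: here is a minimal round trip for the asker's values 123 and 123456, both of which fit in 3 characters since 96^3 = 884736 (the encode3/decode3 names are mine; this repeats the 3-character functions above so the sketch is self-contained):

#include <cstdio>

// The 3-character encode/decode from above, repeated for completeness.
unsigned int encode3(char a, char b, char c) {
    return (a - 32) * 96 * 96 + (b - 32) * 96 + (c - 32);
}

void decode3(char* a, char* b, char* c, unsigned int k) {
    *c = k % 96 + 32;
    k /= 96;
    *b = k % 96 + 32;
    *a = k / 96 + 32;
}

int main() {
    char a, b, c;
    decode3(&a, &b, &c, 123);     // 123 -> ' ', '!', ';'
    printf("123    -> \"%c%c%c\" -> %u\n", a, b, c, encode3(a, b, c));
    decode3(&a, &b, &c, 123456);  // 123456 -> '-', 'F', ' '
    printf("123456 -> \"%c%c%c\" -> %u\n", a, b, c, encode3(a, b, c));
    return 0;
}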
You also of course need to check that the characters are in the encodable range (between 32 and 127 inclusive, i.e. 96 values) and that the numbers you are going to decode are less than 9216 (= 96^2, for 2 encoded characters) and 884736 (= 96^3, for 3 encoded characters).
You can also work out the storage sizes:
Size 2 => max of 9215 => needs 14 bits of storage (values 9216 to 16383 unused)
Size 3 => max of 884735 => needs 20 bits of storage (values 884736 to 1048575 unused)
Your packet needs 14 + 20 + 20 = 54 bits (which rounds up to 7 bytes) of storage just for the Mod-96 fields.
Observation:
Instead of 3 fields of sizes (2 + 3 + 3) we could have used one field of size 8 => 96^8 possible values => ceil(log2(96^8)) = 53 bits (which still rounds up to 7 bytes).
If you instead store each encoded number in a whole number of bytes, you use exactly the same amount of memory as storing the characters directly (14 bits fits into 2 bytes, 20 bits fits into 3 bytes).
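To illustrate where those 54 bits go, here is a sketch that packs the three Mod-96 fields into one 64-bit word (the layout and names are my own, not part of the protocol):

#include <cstdio>
#include <cstdint>

// Firm needs 14 bits (< 9216); Start/End need 20 bits each (< 884736).
uint64_t pack_fields(uint32_t firm, uint32_t start, uint32_t end) {
    return ((uint64_t)firm << 40) | ((uint64_t)start << 20) | (uint64_t)end;
}

int main() {
    uint64_t packed = pack_fields(9215, 123, 123456);
    printf("%014llx\n", (unsigned long long)packed);  // 54 significant bits -> 7 bytes
    return 0;
}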
Why does this take up 21 bytes on a 32-bit system? Three pointers plus two numbers is 5 * 4 = 20, so it should be 20 bytes.
https://redis.com/ebook/part-2-core-concepts/01chapter-9-reducing-memory-use/9-1-short-structures/9-1-1-the-ziplist-representation/
Your book counts the terminating \0 byte at the end of "one\0" as overhead: three pointers plus two integers is 5 × 4 = 20 bytes, and the trailing \0 brings the total to 21.
Given 15 random hexadecimal digits (60 bits) where there is always at least 1 duplicate in every 20-bit run (5 hex digits):
What is the optimal way to compress the bytes?
Here are some examples:
01230 45647 789AA
D8D9F 8AAAF 21052
20D22 8CC56 AA53A
AECAB 3BB95 E1E6D
9993F C9F29 B3130
Initially I tried Huffman coding on just 20 bits, because Huffman can take 20 bits down to ~10 bits, but storing the table takes more than 9 bits.
Here is the breakdown showing 20 bits -> 10 bits for 01230:
Character  Frequency  Assignment  Space Savings
------------------------------------------------
0          2          0           2×4 - 2×1 = 6 bits
2          1          10          1×4 - 1×2 = 2 bits
1          1          110         1×4 - 1×3 = 1 bit
3          1          111         1×4 - 1×3 = 1 bit
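That 20 -> 10 figure can be double-checked with the standard trick that the Huffman-encoded size equals the sum of the combined weights over all tree merges; a quick sketch:

#include <cstdio>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <vector>

int main() {
    std::string group = "01230";
    std::map<char, int> freq;
    for (char ch : group) freq[ch]++;

    // Min-heap of subtree weights; each merge adds one bit per covered symbol,
    // so the total encoded size is the sum of all merged weights.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (auto& f : freq) heap.push(f.second);

    int total_bits = 0;
    while (heap.size() > 1) {
        int x = heap.top(); heap.pop();
        int y = heap.top(); heap.pop();
        total_bits += x + y;
        heap.push(x + y);
    }
    printf("%d bits (raw: %zu bits)\n", total_bits, group.size() * 4);  // 10 vs 20
    return 0;
}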
I then tried Huffman coding on all 300 bits (five 60-bit runs); here is the mapping for the above example:
Character  Frequency  Assignment  Space Savings
------------------------------------------------
a          10         101         10×4 - 10×3 = 10 bits
9          8          000         8×4 - 8×3 = 8 bits
2          7          1111        7×4 - 7×4 = 0 bits
3          6          1101        6×4 - 6×4 = 0 bits
0          5          1100        5×4 - 5×4 = 0 bits
5          5          1001        5×4 - 5×4 = 0 bits
1          4          0010        4×4 - 4×4 = 0 bits
8          4          0111        4×4 - 4×4 = 0 bits
d          4          0101        4×4 - 4×4 = 0 bits
f          4          0110        4×4 - 4×4 = 0 bits
c          4          1000        4×4 - 4×4 = 0 bits
b          4          0011        4×4 - 4×4 = 0 bits
6          3          11100       3×4 - 3×5 = -3 bits
e          3          11101       3×4 - 3×5 = -3 bits
4          2          01000       2×4 - 2×5 = -2 bits
7          2          01001       2×4 - 2×5 = -2 bits
This yields a savings of 8 bits overall, but 8 bits isn't enough to store the Huffman table. It seems that, because of the randomness of the data, the more bits you try to encode with Huffman the less effective it gets. Huffman coding seemed to work best on 20 bits (50% reduction), but storing the table in 9 or fewer bits isn't possible AFAIK.
In the worst case for a 60-bit string there are still at least 3 duplicates; in the average case there are more than 3 (my assumption). As a result of at least 3 duplicates, the most distinct symbols you can have in a run of 60 bits is just 12.
Because of the duplicates, plus having fewer than 16 symbols, I can't help but feel like there is some type of compression that can be used.
If I simply count the number of 20-bit values with at least two hexadecimal digits equal, there are 524,416 of them. A smidge more than 2^19. So the most you could possibly save is a little less than one bit out of the 20.
Hardly seems worth it.
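That count is easy to verify by brute force over all 2^20 values; a sketch:

#include <cstdio>

int main() {
    // Count 20-bit values (5 hex digits) with at least two equal digits.
    int count = 0;
    for (int v = 0; v < (1 << 20); v++) {
        int seen = 0;                       // bitmask of digits seen so far
        bool dup = false;
        for (int i = 0; i < 5 && !dup; i++) {
            int d = (v >> (4 * i)) & 0xF;
            if (seen & (1 << d)) dup = true;
            seen |= 1 << d;
        }
        if (dup) count++;
    }
    printf("%d\n", count);                  // 524416, a smidge over 2^19 = 524288
    return 0;
}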
If I split your question into two parts:
How do I compress (perfectly) random data: You can't. Every bit is new entropy which can't be "guessed" by a compression algorithm.
How do I compress "one duplicate in five characters": There are exactly 10 options for where the duplicate can be (see the table below). This is basically the entropy. Just store which option it is (maybe grouped for the whole line).
These are the options:
AAbcd = 1 AbAcd = 2 AbcAd = 3 AbcdA = 4 (<-- cases where first character is duplicated somewhere)
aBBcd = 5 aBcBd = 6 aBcdB = 7 (<-- cases where second character is duplicated somewhere)
abCCd = 8 abCdC = 9 (<-- cases where third character is duplicated somewhere)
abcDD = 0 (<-- case where the last two characters are duplicates)
So for your first example:
01230 45647 789AA
The first group (01230) is option 4, the second is option 3, and the third is option 0.
You can compress this by multiplying each consecutive option by 10: (4*10 + 3)*10 + 0 = 430
And uncompress it by using divide and modulo: 430 % 10 = 0, (430/10) % 10 = 3, (430/10/10) % 10 = 4. So you could store your number like this:
1AE 0123 4567 789A
^^^ this is 430 in hex and requires only 10 bits
The maximum number for the three options combined is 999 (1000 possible values), so 10 bits are enough.
Compared to storing these 3 option digits as plain hex characters (12 bits) you save 2 bits. As someone else already commented, this is probably not worth it. For the whole line it's even less: 2 bits / 60 bits = 3.3% saved.
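Here is a sketch of just the pack/unpack step described above (finding which option a group is, and removing/reinserting the duplicate, is left out; the names are mine):

#include <cstdio>

// Pack three option digits (each 0..9) into one number; the max, 999, fits in 10 bits.
unsigned pack_options(int first, int second, int third) {
    return (first * 10 + second) * 10 + third;
}

void unpack_options(unsigned packed, int* first, int* second, int* third) {
    *third  = packed % 10;
    *second = (packed / 10) % 10;
    *first  = (packed / 100) % 10;
}

int main() {
    unsigned p = pack_options(4, 3, 0);  // options for "01230 45647 789AA"
    printf("%u = 0x%X\n", p, p);         // 430 = 0x1AE
    int a, b, c;
    unpack_options(p, &a, &b, &c);
    printf("%d %d %d\n", a, b, c);       // 4 3 0
    return 0;
}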
If you want to get rid of the duplicates first, do this, then look at the links below. If you don't want to get rid of the duplicates, still look at the links below:
Array.prototype.contains = function(v) {
for (var i = 0; i < this.length; i++) {
if (this[i] === v) return true;
}
return false;
};
Array.prototype.unique = function() {
var arr = [];
for (var i = 0; i < this.length; i++) {
if (!arr.contains(this[i])) {
arr.push(this[i]);
}
}
return arr;
}
var duplicates = [1, 3, 4, 2, 1, 2, 3, 8];
var uniques = duplicates.unique(); // result = [1,3,4,2,8]
console.log(uniques);
That shortens the data you have to deal with. Then you might want to check out Smaz:
Smaz is a simple compression library suitable for compressing strings.
If that doesn't work, then you could take a look at this:
http://ed-von-schleck.github.io/shoco/
Shoco is a C library to compress and decompress short strings. It is very fast and easy to use. The default compression model is optimized for English words, but you can generate your own compression model based on your specific input data.
Let me know if it works!
How do I find the max and min values that can be represented with a 5-digit number in base 13, assuming only positive integers are represented? The answer then needs to be in base 10.
Does "5-digit number" mean 5 bits? Isn't the smallest number that can be represented zero and the largest 2^(N-1)?
This sounds like homework, but I'll bite anyway :)
5 digits probably means 5 digits, as in 12345.
Base 13 means there are 13 possible digits, whereas we as humans are used to calculating with 10.
We could represent the extra 3 digits with A, B, C so that the full range of possible digits is 0123456789ABC. With this representation, it's clear that the smallest 5-digit value is 00000 and the largest CCCCC.
To convert CCCCC in base 13 to base 10 you do
((((C * 13) + C) * 13 + C) * 13 + C) * 13 + C
=
((((12 * 13) + 12 ) * 13 + 12 ) * 13 + 12 ) * 13 + 12
=
371,292 (which is 13^5 - 1, as you'd expect for the largest 5-digit base-13 value)
00000 is of course zero in any base.
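The same Horner-style evaluation in code, for anyone who wants to check other values (the function name and the digit convention 0-9, A-C are my own):

#include <cstdio>

// Convert a base-13 string (digits 0-9 and A-C) to base 10, Horner-style.
unsigned base13_to_decimal(const char* s) {
    unsigned value = 0;
    for (; *s; s++) {
        unsigned digit = (*s <= '9') ? (*s - '0') : (*s - 'A' + 10);
        value = value * 13 + digit;
    }
    return value;
}

int main() {
    printf("%u\n", base13_to_decimal("CCCCC"));  // 371292
    printf("%u\n", base13_to_decimal("00000"));  // 0
    return 0;
}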
I have a std::string that is base32 encoded and I have a function that decodes it. The function takes a char* input, a char* destination and a length. How do I know what length I will need for the destination? I need to know what array to allocate for the destination. How do I determine the size?
Base32 encodes each 5 bits of input (since 32 = 2^5) as a single character.
It means that for encoding you need an output buffer of size:
dst_size = src_size * 8 / 5 (1.6 times larger)
rounded up:
dst_size = (src_size * 8 + 4) / 5
and, since a base32 string must be a multiple of 40 bits (8 characters), padded:
dst_size = (src_size + 4) / 5 * 8
Thus, for decoding (base32 -> binary) the required buffer size is accordingly
dst_size = ceil( src_size / 1.6 )
Actually, the encoded base32 string length is computed as follows:
ceil(bytesLength / 5.0) * 8
bytesLength / 5.0 because we want to know how many chunks of 5 bytes we have, and ceil because 0.1 chunk is still 1 chunk
ceil(bytesLength / 5.0) * 8 because a chunk is encoded as 8 characters.
For the input 'a' the encoded result will be ME====== because we have 1 chunk of 8 characters: two 5-bit-encoded characters (ME) and 6 padding characters (======).
In the same fashion, the decoded length is:
bytesLength * 5 / 8
But here bytesLength does not include the padding characters, thus for ME====== bytesLength is 2, giving 2 * 5 / 8 == 1: we only have 1 byte to decode.
For a visual explanation, see RFC 4648 section 9 (page 11).
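Both formulas side by side as a sketch, using integer arithmetic only (the function names are mine):

#include <cstdio>

// Encoded length for n input bytes, padded to a multiple of 8 characters.
unsigned base32_encoded_len(unsigned n) {
    return (n + 4) / 5 * 8;   // ceil(n / 5) chunks of 8 characters each
}

// Decoded length for n base32 characters, '=' padding not counted in n.
unsigned base32_decoded_len(unsigned n) {
    return n * 5 / 8;
}

int main() {
    printf("%u\n", base32_encoded_len(1));  // 8: "a" -> "ME======"
    printf("%u\n", base32_decoded_len(2));  // 1: "ME" -> "a"
    return 0;
}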
I have code in C++ which converts a 2-digit octal number to a binary number. For testing the validity of the code I used several online conversion sites like
this and
this
When I enter 58 or 59 as an octal value they say invalid octal value, but when I enter 58 in my code it gives the binary number 101000. Again for testing, I entered 101000 as a binary number in the above websites' calculators and they gave me 50 as the octal value.
I need some explanation of why this is so.
Here is the C++ code -
#include <iostream.h>
#include <conio.h>

void octobin(int);

void main()
{
    clrscr();
    int a;
    cout << "Enter a 2-digit octal number : ";
    cin >> a;
    octobin(a);
    getch();
}

void octobin(int oct)
{
    long bnum = 0;
    int A[6];
    // Each octal digit is converted into 3 bits, 2 octal digits = 6 bits.
    int a1, a2, quo, rem;
    a2 = oct / 10;
    a1 = oct - a2 * 10;
    for (int x = 0; x < 6; x++)
    {
        A[x] = 0;
    }
    // Storing the remainders of the one's octal digit in the array.
    for (x = 0; x < 3; x++)
    {
        quo = a1 / 2;
        rem = a1 % 2;
        A[x] = rem;
        a1 = quo;
    }
    // Storing the remainders of the ten's octal digit in the array.
    for (x = 3; x < 6; x++)
    {
        quo = a2 / 2;
        rem = a2 % 2;
        A[x] = rem;
        a2 = quo;
    }
    // Obtaining the binary number from the remainders.
    for (x = x - 1; x >= 0; x--)
    {
        bnum *= 10;
        bnum += A[x];
    }
    cout << "The binary number for the octal number " << oct << " is " << bnum << "." << endl;
}
Octal numbers have digits that are all in the range [0,7]. Thus, 58 and 59 are not octal numbers, and your method should be expected to give erroneous results.
The reason that 58 evaluates to 101000 is because the first digit of the octal number expands to the first three digits of the binary number. 5 = 101_2. Same story for the second part, but 8 = 1000_2, so you only get the 000 part.
An alternate explanation is that 8 = 0 (mod 8) (I am using the = sign for congruency here), so both 8 and 0 will evaluate to 000 in binary using your code.
The best solution would be to do some input validation. For example, while converting you could check to make sure each digit is in the range [0,7].
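A minimal sketch of that check, to be run before octobin() (written against modern C++; is_valid_octal is my own name):

#include <iostream>

// Reject any input whose decimal digits fall outside the octal range [0,7].
bool is_valid_octal(int n) {
    do {
        if (n % 10 > 7) return false;
        n /= 10;
    } while (n > 0);
    return true;
}

int main() {
    int a;
    std::cout << "Enter a 2-digit octal number : ";
    std::cin >> a;
    if (!is_valid_octal(a)) {
        std::cout << a << " is not a valid octal number." << std::endl;
        return 1;
    }
    // ...proceed with the conversion as in octobin()...
    return 0;
}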
You cannot use 58 or 59 as an input value. It's octal, for Christ's sake.
Valid digits are from 0 to 7 inclusive.
If you're encoding a number in base 8, none of the digits can be 8 or greater. If you're going to do the conversion digit by digit, there needs to be a test for whether a digit is 8 or 9, and an error thrown if so. Right now your code isn't checking this, so 8 and 9 silently wrap around (8 ≡ 0 mod 8, as noted above).
58 and 59 aren't valid octal values indeed: the maximum digit you can use is your base - 1:
Decimal => base = 10 => digits from 0 to 9
Hexadecimal => base = 16 => digits from 0 to 15 (well, 0 to F)
Octal => base = 8 => digits from 0 to 7