Number format construct? - c++

Outside the binary 01010, octal 01122, decimal integer 1234 and hexadecimal 0xFF, does any one have any idea or trick how to build a new number format? For example, 0x11AEGH has it's range from 0 - 9 to A - H. I'm going to build password generator, so it would be very helpful if anyone can put something on it that might help.
Firstly, Is there any function which can do this? Basically I want to convert 0x11AEGH to binary, octal, integer and so on...

Formatting a number in an N-ary system requires two things: an alphabet, and an ability to obtain results of integer division + the remainder.
Consider formatting a number in a base-26 system using the Latin alphabet. Repeatedly obtain the remainder R of division by 26, pick letter number R, and add it to the front of the number that you are formatting. Integer-divide the number by 26, and use it in the next step of the algorithm. Stop when you reach zero.
For example, if you print 1234 in base-26, you can do it like this:
1234 % 26 is 12. Add M; 1234/26 is 47
47 % 26 is 21. Add V; 47 / 26 is 1
1 % 26 is 1. Add B. 1 / 26 is zero; stop.
So 1234 in base-26 is BVM.
To convert back, start from the front, and sequentially subtract the designated "zero" (A in case of the above example) from each digit, like this:
B-A is 1. Result is 1
V-A is 21. Result is 1*26+21, which is 47
M-A is 12. Result is 47*26+12, which is 1234.

You can probably use functions like Long.parseLong(String representation, int radix) and Long.toString(long value, int radix) as they are in Java. Other languages may also have the similar features.
This approach restricts the maximal possible radix (base) and you have no control over the characters representing different digits (alphanumeric chars are used). However it is a ready solution without any extra coding.

Related

Binary to Octal

How to write code that converters a input file of binary numbers into octal. I had to write a code that converts it to decimal which I did, but now I'm supposed to write a code that converts it to octal by grouping the binary numbers in groups of 3 and then calling the binary to decimal function.
For example 10100 would be grouped 10 | 100. So I'd call binary to decimal on 10 and 100 and get 2 for 10 and 4 for 100, then place the numbers together to get 24, which is 10100 decimal in octal.
However, I cannot figure out how to group the numbers. (The number is type string by the way). Any tips would help thanks.
Simply add leading zeroes to your string if its length is not divisble by three then group them and convert to octal.

Minimum description length and Huffman coding for two symbols?

I am confused about the interpretation of the minimum description length of an alphabet of two symbols.
To be more concrete, suppose that we want to encode a binary string where 1's occur with probability 0.80; for instance, here is a string of length 40, with 32 1's and 8 0's:
1 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 0 1 1 0 0 1
Following standard MDL analysis, we can encode this string using prefix codes (like Huffman's) and the code of encoding this string would be (-log(0.8) * 32 - log(0.2) * 8), which is lower than duplicating the string without any encoding.
Intuitively, it is "cheaper" to encode this string than some string where 1's and 0's occur with equal probability. However, in practice, I don't see why this would be the case. At the very least, we need one bit to distinguish between 1's and 0's. I don't see how prefix codes could do better than just writing the binary string without encoding.
Can someone help me clarify this, please?
I don't see how prefix codes could do better than just writing the
binary string without encoding.
You can't with prefix codes, unless you combine bits to make more symbols. For example, if you code every two bits, you now have four symbols with probabilities 0.64, 0.16, 0.16, and 0.04. That would be coded with 0, 10, 110, 111. That gives an average of 1.56 bits per symbol, or 0.7800 bits per original bit. We're getting a little closer to the optimal 0.7219 bits per bit (-0.2 log20.2 - 0.8 log20.8).
Do that for three-bit groupings, and you get 0.7280 bits per bit. Surprisingly close to the optimum. In this case, the code lengths just happen to group really nicely with the probabilities. The code is 1 bit (0) for the symbol with probability 0.512, 3 bits (100, 101, 110) for the three symbols with probability 0.128, and 5 bits (11100, 11101, 11110, 11111) for both the three symbols with probability 0.032 and the one symbol with probability 0.008.
You can keep going and get asymptotically close to the optimal 0.7219 bits per bit. Though it becomes more inefficient in time and space for larger groupings. The Pareto Front turns out to be at multiples of three bits up through 15. 6 bits gives 0.7252 bits per bit, 9 gives 0.7251, 12 is 0.7250, and 15 is 0.7249. The approach is monumentally slow, where you need to go to 28 bits to get to 0.7221. So you might as well stop at 6. Or even just 3 is pretty good.
Alternatively you can use something other than prefix coding, such as arithmetic coding, range coding, or asymmetric numeral system coding. They effectively use fractional bits for each symbol.

Discard part of lines and save arbitrary long numbers

I have a file.txt like:
bqnt := 31
hqnt := 159
tqnt := 2
source := (6,53)
speed := 59
Where 6 and 53 are a coordinates and all the numbers that appear can have an arbitrary number of digits.
How can I save all the numbers that appear in a file.txt like this?
Next to the obvious solution, where you just keep the numbers in their string format, you can use an arbitrary precision integer library like the GNU Multiple Precision Arithmetic Library, which allows you to manipulate integers of arbitrary length.
The problem seems poorly worded, do you want to keep 6 and 53 or just one of them? If you just want to accumulate strings of numeric characters, isidigit comes to mind.
You read each of the characters one at a time, if isdigit(c) you keep it, if not, you keep going, when you see a non-digit character, you move to the next line in your output, or you move to the next index in your array.

Create a sequence which is ordered by bits set

I'm looking for a reversible function unsigned f(unsigned) for which the number of bits set in f(i) increases with i, or at least does not decrease. Obviously, f(0) has to be 0 then, and f(~0) must come last. In between there's more flexibility. After f(0), the next 32* values must be 1U<<0 to 1U<<31, but I don't care a lot about the order (they all have 1 bit set).
I'd like an algorithm which doesn't need to calculate f(0)...f(i-1) in order to calculate f(i), and a complete table is also unworkable.
This is similar to Gray codes, but I can't see a way to reuse that algorithm. I'm trying to use this to label a large data set, and prioritize the order in which I search them. The idea is that I have a key C, and I'll check labels C ^ f(i). Low values of i should give me labels similar to C, i.e. differing in only a few bits.
[*] Bonus points for not assuming that unsigned has 32 bits.
[example]
A valid initial sequence:
0, 1, 2, 4, 16, 8 ... // 16 and 8 both have one bit set, so they compare equal
An invalid initial sequence:
0, 1, 2, 3, 4 ... // 3 has two bits set, so it cannot precede 4 or 2147483648.
Ok, seems like I have a reasonable answer. First let's define binom(n,k) as the number of ways in which we can set k out of n bits. That's the classic Pascal triangle:
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
...
Easily calculated and cached. Note that the sum of each line is 1<<lineNumber.
The next thing we'll need is the partial_sum of that triangle:
1 2
1 3 4
1 4 7 8
1 5 11 15 16
1 6 16 26 31 32
1 7 22 42 57 63 64
1 8 29 64 99 120 127 128
1 9 37 93 163 219 247 255 256
...
Again, this table can be created by summing two values from the previous line, except that the new entry on each line is now 1<<line instead of 1.
Let's use these tables above to construct f(x) for an 8 bits number (it trivially generalizes to any number of bits). f(0) still has to be 0. Looking up the 8th row in the first triangle, we see that next 8 entries are f(1) to f(9), all with one bit set. The next 28 entries (7+6+5+4+3+2+1) all have 2 bits set, so that's f(10) to f(37). The next 56 entries, f(38) to f(93) have 3 bits, and there are 70 entries with 4 bits set. From symmetry we can see that they're centered around f(128), in particular they're f(94) to f(163). And obviously, the only number with 8 bits set sorts last, as f(255).
So, with these tables we can quickly determine how many bits must be set in f(i). Just do a binary search in the last row of your table. But that doesn't answer exactly which bits are set. For that we need the previous rows.
The reason that each value in the table can be created from the previous line is simple. binom(n,k) == binom(k, n-1) + binom(k-1, n-1). There are two sorts of numbers with k bits set: Those that start with a 0... and numbers which start with 1.... In the first case, the next n-1 bits must contain those k bits, in the second case the next n-1 bits must contain only k-1 bits. Special cases are of course 0 out of n and n out of n.
This same stucture can be used to quickly tell us what f(16) must be. We already had established that it must contain 2 bits set, as it falls in the range f(10) - f(37). In particular, it's number 6 with 2 bits set (starting as usual with 0). It's useful to define this as an offset in a range as we'll try to shrink the length this range from 28 down to 1.
We now subdivide that range into 21 values which start with a zero and 7 which start a one. Since 6 < 21, we know that the first digit is a zero. Of the remaining 7 bits, still 2 need to be set, so we move up a line in the triangle and see that 15 values start with two zeroes, and 6 start with 01. Since 6 < 15, f(16) starts with 00. Going further up, 7 <= 10 so it starts with 000. But 6 == 6, so it doesn't start with 0000 but 0001. At this point we change the start of the range, so the new offset becomes 0 (6-6)
We know need can focus only on the numbers that start with 0001 and have one extra bit, which are f(16)...f(19). It should be obvious by know that the range is f(16)=00010001, f(17)=00010010, f(18)=00010100, f(19)=00011000.
So, to calculate each bit, we move one row up in the triangle, compare our "remainder", add a zero or one based on the comparison possibly go left one column. That means the computational complexity of f(x) is O(bits), or O(log N), and the storage needed is O(bits*bits).
For each given number k we know that there are binom(n, k) n-bit integers that have exactly k bits of value one. We can now generate a lookup table of n + 1 integers that store for each k how many numbers have less one bits. This lookup table can then be used to find the number o of one bits of f(i).
Once we know this number we subtract the lookup table value for this number of bits from i which leaves us with the permutation index p for numbers with the given number of one bits. Altough I have not done research in this area I am quite sure that there exists a method for finding the pth permutation of a std::vector<bool> which is initialized with zeros and o ones in the lowest bits.
The reverse function
Again the lookup table comes in handy. We can directly calculate the number of preceding numbers with less one bits by counting the one bits in the input integer and reading in the lookup table. Then you "only" need to determine the permutation index and add it to the looked up value and you are done.
Disclaimer
Of course this is only a rough outline and some parts (especially involving the permutations) might take longer than it sounds.
Addition
You stated yourself
I'm trying to use this to label a large data set, and prioritize the order in which I search them.
Which sounds to me as if you would be going from the low hamming distance to the high hamming distance. In this case it would be enough to have an incremental version which generates the next number from the previous:
unsigned next(unsigned previous)
{
if(std::next_permutation(previous))
return previous;
else
return (1 << (1 + countOneBits(previous))) - 1;
}
Of course std::next_permutation permutation does not work this way but I think it is clear how I mean to use it.

Prime backwards in order

An emirp (prime spelled backwards) is a pime number whose reversal is also prime. Ex. 17 & 71. I have to write a program that displays the first 100 emirps. It has to display 10 numbers per line and align the numbers properly:
2 3 5 7 11 13 17 31 37 71
73 79 97 101 107 113 131 149 151 157.
I have no cue what I am doing and would love if anyone could dump this down for me.
It sounds like there are two general problems:
Finding the emirps.
Formatting the output as required.
Break down your tasks into smaller parts, and then you'll be able to see more clearly how to get the whole task done.
To find the emirps, first write some helper functions:
is_prime() to determine whether a number is prime or not
reverse_digits() to reverse the digits of any number
Combining these two functions, you can imagine a loop that finds all the numbers that are primes both forward and reversed. Your first task is complete when you can simply generate a list of those numbers, printing them to the console one per line.
Next, work out what format you want to use (it looks like a fixed format of some number of character spaces per number is what you need). You know that you have 100 numbers, 10 per line, so working out how to format the numbers should be straightforward.
Break the problem down into simpler sub-problems:
Firstly, you need to check whether a number is prime. This is such a common task that you can easily Google it - or try a naive implementation yourself, which may be better given that this is homework.
Secondly, you need to reverse the digits of a number. I'd suggest you figure out an algorithm for this on a piece of paper first, then implement it in code.
Put the two together - it shouldn't be that hard.
Format the results properly. Printing 10 numbers per line is something you should be able to figure out easily once the rest is done.
Once you have a simple version working you might be able to optimise it in some way.
A straight forward way of checking if a number is prime is by trying all known primes less than it and seeing if it divides evenly into that number.
Example: To find the first couple of primes
Start off with the number 2, it is prime because the only divisors are itself and 1, meaning the only way to multiple two numbers to get 2 is 2 x 1. Likewise for 3.
So that starts us off with two known primes 2 and 3. To check the next number we can check if 4 modulo 2 equals 0. This means when divide 2 into 4 there is no remainder, which means 2 is a factor of 4. Specifically we know 2 x 2 = 4. Thus 4 is not prime.
Moving on to the next number: 5. To check five we try 5 modulo 2 and 5 modulo 3, both of which equals one. So 5 is prime, add it to our list of known primes and then continue on checking the next number. This rather tedious process is great for a computer.
So on and so forth - check the next number by looping through all previous found primes and check if they divide evenly, if all previously found primes don't divide evenly, you have a new prime. Repeat.
You can speed this up by counting by 2's, since all even numbers are divisible by two. Also, another nice trick is you don't have to check any primes greater than the square root of the number, since anything larger would need a smaller prime factor. Cuts your loops in half.
So that is your algorithm to generate a large list of primes.
Collect a good chunk of them in an array, say the first 10,000 or so. And then loop through them, reverse the numbers and see if the result is in your array. If so you have a emirp, continue until you get the first 100 emirps
If the first 10,000 primes don't return 100 emirps. Move on to the next 10,000. Repeat.
For homework, I would use a fairly simplistic isPrime function, pseudo-code along the lines of:
def isPrime (num):
set testDiv1 to 2
while testDiv1 multiplied by testDiv1 is less than or equal to num:
testDiv2 = integer part of (num divided by testDiv1)
if testDiv1 multiplied by testDiv2 is equal to num:
return true
Add 1 to testDiv1
return false
This basically checks whether the number is evenly divisible by any number between 2 and the square root of the number, a primitive primality check. The reson you stop at the square root is because you would have already found a match below it if there was one above it.
For example 100 is 2 times 50, 4 times 25, 5 time 20 and 10 times 10. The next one after that would be 20 times 5 but you don't need to check 20 since it would have been found when you checked 5. Any positive number can be expressed as a product of two other positive numbers, one below the square root and one above (other than the exact square root case of course).
The next tricky bit is the reversal of digits. C has some nice features which will make this easier for you, the pseudo-code is basically:
def reverseDigits (num):
set newNum to zero
while num is not equal to zero:
multiply newnum by ten
add (num modulo ten) to newnum
set num to the integer part of (num divided by ten)
return newNum
In C, you can use int() for integer parts and % for the modulo operator (what's left over when you divide something by something else - like 47 % 10 is 7, 9 % 4 is 1, 1000 % 10 is 0 and so on).
The isEmirp will be a fairly simplistic:
def isEmirp (num):
if not isPrime (num):
return false
num2 = reverseDigits (num)
if not isPrime (num2):
return false
return true
Then at the top level, your code will look something like:
def mainProg:
create array of twenty emirps
set currEmirp to zero
set tryNum to two
while currEmirp is less than twenty
if isEmirp (tryNum):
put tryNum into emirps array at position currEmirp
add 1 to currEmirp
for currEmirp ranging from 0 to 9:
print formatted emirps array at position currEmirp
print new line
for currEmirp ranging from 10 to 19:
print formatted emirps array at position currEmirp
print new line
Right, you should be able to get some usable code out of that, I hope. If you have any questions of the translation, leave a comment and I'll provide pointers for you, rather than solving it or doing the actual work.
You'll learn a great deal more if you try yourself, even if you have a lot of trouble initially.