Write a function to take 8 bits from an array and turn it into Decimal - c++

I'm trying to write a function that takes 8 bits from an array that is 6x24 (just consider it taking a byte 1 bit at a time) and convert it to a decimal integer. Meaning there should be 18 numbers in total. Here is my code
int bitArray[6][24]; //the Array of bits, can only be a 1 or 0
int ex=0; //ex keeps track of the current exponent to use to calculate the decimal value of a binary digit
int decArray[18]; //array to store decimals
int byteToDecimal(int pos, int row) //takes two variables so you can give it an array column and row
{
numholder=0; //Temporary number for calculations
for(int x=pos; x<pos+8;x++) //pos is used to adjust where we start and stop looking at 1's and 0's in a row
{
if(bitArray[row][x] != 0)//if the row and column is a 1
{
numholder += pow(2, 7-ex);//2^(7-ex), meaning the first bit is worth 2^7, and the last is 2^0
}
ex++;
}
ex=0;
return numholder;
}
Then you can call the function like so
decArray[0]=byteToDecimal(0,0);
decArray[1]=byteToDecimal(8,0);
decArray[2]=byteToDecimal(16,0);
decArray[3]=byteToDecimal(0,1);
decArray[4]=byteToDecimal(8,1);
decArray[5]=byteToDecimal(16,1);
etc. When I place a single 1 into bitArray[0][0], calling the function gives me the number 127, when it should be 128.

Apparently bitArray (or at least the bytes involved) is not filled with zeros. The reason for that may vary. Most likely you have some leftovers from previous operations on it. The second (insane) reason is that maybe the Arduino C compiler doesn't initialize static objects with zeros (I've never had experience with Arduino, so I can't tell for sure).
In any case, try to call memset(bitArray, 0, sizeof(bitArray)) before you perform other operations with it.
Here is a demo written in plain C, demonstrating that normally your code should work fine.
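A minimal sketch of the same conversion done with shifts instead of pow, assuming the 6x24 layout from the question and a zero-initialized global array. Besides being simpler, it sidesteps pow, which returns a floating-point value; truncating that to int is another possible source of an off-by-one result such as 127. The bounds check is an addition for illustration, not part of the original code.

```cpp
#include <cassert>

// Global array of bits; "= {}" guarantees every element starts at 0.
int bitArray[6][24] = {};

// Reads 8 bits starting at column pos in the given row,
// most significant bit first, and returns their value (0..255).
int byteToDecimal(int pos, int row)
{
    assert(row >= 0 && row < 6 && pos >= 0 && pos + 8 <= 24);
    int value = 0;
    for (int x = pos; x < pos + 8; ++x)
    {
        // Shift what we have so far and append the next bit.
        value = (value << 1) | (bitArray[row][x] != 0 ? 1 : 0);
    }
    return value;
}
```

With a single 1 stored in bitArray[0][0], byteToDecimal(0, 0) yields 128, since that bit ends up shifted left seven times.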

Related

How to convert an arbitrary length unsigned int array to a base 10 string representation?

I am currently working on an arbitrary size integer library for learning purposes.
Each number is represented as uint32_t *number_segments.
I have functional arithmetic operations, and the ability to print the raw bits of my number.
However, I have struggled to find any information on how I could convert my arbitrarily long array of uint32 into the correct, and also arbitrarily long base 10 representation as a string.
Essentially I need a function along the lines of:
std::string uint32_array_to_string(uint32_t *n, size_t n_length);
Any pointers in the right direction would be greatly appreciated, thank you.
You do it the same way as you do with a single uint64_t, except on a larger scale (bringing this into modern C++ is left to the reader):
char * to_str(uint64_t x) {
static char buf[23] = {0}; // leave space for a minus sign added by the caller
char *p = &buf[22];
do {
*--p = '0' + (x % 10);
x /= 10;
} while(x > 0);
return p;
}
The function fills a buffer from the end with the lowest digits and divides the number by 10 in each step and then returns a pointer to the first digit.
Now with big nums you can't use a static buffer but have to adjust the buffer size to the size of your number. You probably want to return a std::string; creating the digits in reverse and then copying them into a result string is the way to go. You also have to deal with negative numbers.
Since a long division of a big number is expensive, you probably don't want to divide by 10 in the loop. Rather, divide by 1'000'000'000 and convert the remainder into 9 digits. This should be the largest power of 10 for which you can do the long division by a single integer, rather than bignum / bignum. You might only be able to use 10'000 if you don't use uint64_t in the division.
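The approach above could be sketched roughly as follows. This assumes the segments are stored least significant first (n[0] holds the lowest 32 bits) and that the number is unsigned; both are assumptions, since the question does not say. The function name follows the signature from the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Sketch: repeatedly divide the big number by 10^9 and emit the remainder
// as up to 9 decimal digits. Assumes n[0] is the least significant segment.
std::string uint32_array_to_string(const std::uint32_t *n, std::size_t n_length)
{
    std::vector<std::uint32_t> work(n, n + n_length); // working copy we can destroy
    std::size_t len = work.size();
    while (len > 0 && work[len - 1] == 0) --len;      // strip leading zero segments
    if (len == 0) return "0";

    std::string digits;                               // built least significant digit first
    while (len > 0)
    {
        std::uint64_t rem = 0;
        for (std::size_t i = len; i-- > 0;)           // long division, top segment first
        {
            std::uint64_t cur = (rem << 32) | work[i];
            work[i] = static_cast<std::uint32_t>(cur / 1000000000u);
            rem = cur % 1000000000u;
        }
        while (len > 0 && work[len - 1] == 0) --len;
        for (int k = 0; k < 9; ++k)
        {
            if (len == 0 && rem == 0) break;          // no zero padding on the top chunk
            digits.push_back(static_cast<char>('0' + rem % 10));
            rem /= 10;
        }
    }
    return std::string(digits.rbegin(), digits.rend());
}
```

Each pass of the outer loop costs one sweep over the remaining segments, so the whole conversion is quadratic in the number of segments, which is usually acceptable for a learning project.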

Shift left/right adding zeroes/ones and dropping first bits

I've got to program a function that receives
a binary number like 10001, and
a decimal number that indicates how many shifts I should perform.
The problem is that if I use the C++ operator <<, the zeroes are pushed from behind but the first numbers aren't dropped... For example
shifLeftAddingZeroes(10001,1)
returns 100010 instead of 00010 that is what I want.
I hope I've made myself clear =P
I assume you are storing that information in an int. Take into consideration that this number actually has more leading zeroes than what you see; if your int is 16 bits wide, for instance, the number is really 00000000 00010001. Maybe try AND-ing it with a number that has as many 1 bits as the number of digits you want to keep after shifting? (Assuming you want to stick to bitwise operations.)
What you want is to bit shift and then limit the number of output bits which can be active (hold a value of 1). One way to do this is to create a mask for the number of bits you want, then AND the bitshifted value with that mask. Below is a code sample for doing that; just replace int_type with the type of value you're using, or make it a template type.
int_type shiftLeftLimitingBitSize(int_type value, int numshift, int_type numbits=some_default) {
int_type mask = 0;
for (unsigned int bit=0; bit < numbits; bit++) {
mask |= int_type(1) << bit; // cast first so the shift happens in int_type, not int
}
return (value << numshift) & mask;
}
Your output for 10001,1 would now be shiftLeftLimitingBitSize(0b10001, 1, 5) == 0b00010.
Realize that unless your numbits is exactly the length of your integer type, you will always have excess 0 bits on the 'front' of your number.
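A concrete version of the same idea, fixed to 32-bit unsigned values so the mask can be computed without a loop. The name shiftLeftKeepWidth and its parameters are hypothetical, chosen only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Shifts value left by shift bits and keeps only the lowest width bits,
// so bits pushed past the chosen width are dropped.
std::uint32_t shiftLeftKeepWidth(std::uint32_t value, unsigned shift, unsigned width)
{
    assert(width >= 1 && width <= 32);
    // Shifting a 32-bit value by 32 or more is undefined, so handle both edges explicitly.
    std::uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((std::uint32_t{1} << width) - 1u);
    if (shift >= 32) return 0;
    return (value << shift) & mask;
}
```

For the example from the question, shiftLeftKeepWidth(0x11, 1, 5) gives 0x02, that is 10001 becoming 00010.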

C++ Radix sort algorithm

Trying to understand radix sort for my data structures class. My teacher showed us a sample of radix sort in C++. I don't understand what the for loop for the digits does, she said something about maximum digits. Also when I try this in VS it says log10 is an ambiguous call to an overloaded function.
void RadixSort(int A[], int size)
{
int d = 1;
for(int i = 0; i < size; ++i)
{
int digits_temp;
digits_temp=(int)log10(abs(A[i]!=0 ? abs(A[i]) : 1)) +1;
if(digits_temp > d)
d = digits_temp;
}
d += 1;
*rest of the implementation*
}
Can anyone explain what this for loop does and why I get that ambiguous call error? Thanks
That piece of code is just a search for the number of digits needed for the "longest" integer; that's probably needed to allocate some buffer later.
log10 gives you the power of ten that corresponds to its argument; truncating it with the (int) cast and then adding 1 rounds it up to the next integer, which is the number of digits required for the number.
The argument of log10 is a bit of a mess, since abs is called twice when just once would suffice. Still, the idea is to pass to log10 the absolute value of the number being examined if it's not zero, or 1 if it is zero - this because, if the argument were zero, the logarithm would diverge to minus infinity (which is not desirable in this case, I think that the conversion to int would lead to strange results).
The rest of the loop is just the search for the maximum: at each iteration it calculates the digits needed for the current int being examined, checks if it's bigger than the "current maximum" (d) and, if it is, it replaces the "current maximum".
The d+=1 may be for cautionary purposes (?) or for the null-terminator of the string being allocated, it depends on how d is used afterward.
As for the "ambiguous call" error: you get it because you are calling log10 with an int argument, which can be converted equally to float, double and long double (all types for which log10 is overloaded), so the overload to choose is not clear to the compiler. Just stick a (double) cast before the whole log10 argument.
By the way, that code could have been simplified/optimized by just looking for the maximum int (in absolute value) and then taking the base-10 logarithm to discover the number of digits needed.
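An integer-only way of getting the same digit count, which avoids log10 (and its overload ambiguity) altogether. countDigits is a hypothetical helper, not part of the teacher's code.

```cpp
// Returns the number of decimal digits of value, ignoring the sign.
// Zero counts as one digit, matching the special case in the original loop.
int countDigits(int value)
{
    int digits = 1;
    // Dividing the signed value directly means no abs() is needed,
    // which also keeps INT_MIN safe.
    while (value / 10 != 0)
    {
        value /= 10;
        ++digits;
    }
    return digits;
}
```

The maximum over the array can then be found by calling countDigits(A[i]) in the loop in place of the log10 expression.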
Log base 10 + 1 gives you the total number of digits present in a number.
Essentially here, you are checking every element in the array A[] and if the element is == 0 you store 1 in the digits_temp variable.
You initialize d = 1 because a number has at least 1 digit, and if a number has more than d digits you replace d with the number of digits calculated.
Hope that helps.
There are three overloads of the log10 function, taking float, double and long double input.
log10( static_cast<double> (abs(A[i]!=0 ? abs(A[i]) : 1)) );
So you need to static cast it as double to avoid the error.
(int)log10(x)+1 gives the number of digit present in that number.
Rest is simple implementation of Radix Sort
You see the error because log10 is defined for float, double and long double but not for integers, and it's being called with an integer. The compiler can convert the int into any of those types, so the call is ambiguous.
The for loop is doing a linear search for the maximum number of digits in any of the numbers in the array. It is unnecessarily complicated and slow, because you could simply search for the largest absolute value in A and then take the log10 of that.
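#include <cmath>
#include <cstdlib>

void RadixSort(int A[], int size)
{
    int max_abs = 1; // start at 1 so log10 never sees 0
    for (int i = 0; i < size; ++i)
    {
        if (abs(A[i]) > max_abs)
            max_abs = abs(A[i]);
    }
    // number of decimal digits of the largest absolute value
    int d = (int)log10((double)max_abs) + 1;
    /* rest of the implementation */
}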
The rest of the code is missing, so the exact usage can't be determined.
But basically radix sort goes over all the integers and sorts them digit by digit, starting from the least significant digit and working upwards.
The first part of the code only determines the maximum digit count + 1 of the integers in the array; this could be used to normalize all numbers to the same length for easy handling,
i.e. (1, 239, 2134) to (0001, 0239, 2134)
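Since the rest of the teacher's implementation is not shown, here is one possible sketch of a least-significant-digit radix sort in base 10. It is not the missing code; it assumes non-negative values only, and radixSortLSD is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of an LSD radix sort (base 10). Assumes every value is >= 0.
void radixSortLSD(std::vector<int>& a)
{
    if (a.empty()) return;
    assert(*std::min_element(a.begin(), a.end()) >= 0);
    const int maxVal = *std::max_element(a.begin(), a.end());
    std::vector<int> out(a.size());

    // One stable counting-sort pass per decimal digit of the largest value.
    for (long long exp = 1; maxVal / exp > 0; exp *= 10)
    {
        std::size_t count[10] = {0};
        for (int v : a) ++count[(v / exp) % 10];
        for (int d = 1; d < 10; ++d) count[d] += count[d - 1]; // prefix sums = end positions
        for (std::size_t i = a.size(); i-- > 0;)               // walk backwards to stay stable
        {
            const int d = static_cast<int>((a[i] / exp) % 10);
            out[--count[d]] = a[i];
        }
        a.swap(out);
    }
}
```

Here the number of passes is driven by the largest value rather than by a precomputed digit count, which plays the same role as d in the question's code.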

Analysis of the usage of prime numbers in hash functions

I was studying hash-based sort and I found that using prime numbers in a hash function is considered a good idea, because multiplying each character of the key by a prime number and adding the results up would produce a unique value (because primes are unique) and a prime number like 31 would produce better distribution of keys.
key(s) = s[0]*31^(len-1) + s[1]*31^(len-2) + ... + s[len-1]
Sample code:
public int hashCode( )
{
int h = hash;
if (h == 0)
{
for (int i = 0; i < chars.length; i++)
{
h = MULT*h + chars[i];
}
hash = h;
}
return h;
}
I would like to understand why the use of even numbers for multiplying each character is a bad idea in the context of this explanation below (found on another forum; it sounds like a good explanation, but I'm failing to grasp it). If the reasoning below is not valid, I would appreciate a simpler explanation.
Suppose MULT were 26, and consider
hashing a hundred-character string.
How much influence does the string's
first character have on the final
value of 'h'? The first character's value
will have been multiplied by MULT 99
times, so if the arithmetic were done
in infinite precision the value would
consist of some jumble of bits
followed by 99 low-order zero bits --
each time you multiply by MULT you
introduce another low-order zero,
right? The computer's finite
arithmetic just chops away all the
excess high-order bits, so the first
character's actual contribution to 'h'
is ... precisely zero! The 'h' value
depends only on the rightmost 32
string characters (assuming a 32-bit
int), and even then things are not
wonderful: the first of those final 32
bytes influences only the leftmost bit
of 'h' and has no effect on the
remaining 31. Clearly, an even-valued
MULT is a poor idea.
I think it's easier to see if you use 2 instead of 26. They both have the same effect on the lowest-order bit of h. Consider a 33 character string of some character c followed by 32 zero bytes (for illustrative purposes). Since the string isn't wholly null you'd hope the hash would be nonzero.
For the first character, your computed hash h is equal to c[0]. For the second character, you take h * 2 + c[1]. So now h is 2*c[0]. For the third character h is now h*2 + c[2] which works out to 4*c[0]. Repeat this 30 more times, and you can see that the multiplier uses more bits than are available in your destination, meaning effectively c[0] had no impact on the final hash at all.
The end math works out exactly the same with a different multiplier like 26, except that the intermediate hashes will modulo 2^32 every so often during the process. Since 26 is even it still adds one 0 bit to the low end each iteration.
This hash can be described like this (here ^ is exponentiation, not xor).
hash(string) = sum_over_i(s[i] * MULT^(strlen(s) - i - 1)) % (2^32).
Look at the contribution of the first character. It's
(s[0] * MULT^(strlen(s) - 1)) % (2^32).
If MULT is even and the string is long enough (strlen(s) > 32) then this is zero, because MULT^(strlen(s) - 1) is then a multiple of 2^32.
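The effect is easy to check with a small sketch of the same polynomial hash in 32-bit unsigned arithmetic. polyHash is a hypothetical stand-in for the hashCode in the question, with the multiplier passed as a parameter.

```cpp
#include <cstdint>
#include <string>

// h = mult*h + c for every character, with wraparound modulo 2^32.
std::uint32_t polyHash(const std::string& s, std::uint32_t mult)
{
    std::uint32_t h = 0;
    for (unsigned char c : s)
    {
        h = mult * h + c; // unsigned overflow wraps, i.e. reduces modulo 2^32
    }
    return h;
}
```

Take two 33-character strings that differ only in their first character. With mult = 26 they hash to the same value, because the first character is multiplied by 26^32, which is a multiple of 2^32. With mult = 31 the hashes differ, because 31^32 is odd and so never becomes zero modulo 2^32.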
Other people have posted the answer -- if you use an even multiplier, then only the last characters in the string matter for computing the hash, as the early characters' influence will have shifted out of the register.
Now let's consider what happens when you use a multiplier like 31. Well, 31 is 32-1 or 2^5 - 1. So when you use that, your final hash value will be roughly:
sum_over_i(c[i] * 2^(5*(len-i))) - sum_over_i(c[i])
(this treats each 31^k as 2^(5k) - 1, which is exact only for k = 1, so read it as an approximation). These are two summations over the characters in the string, where the first one shifts each character by 5 more bits for each subsequent character in the string. So on a 32-bit machine, that will shift off the top for all except the last seven characters of the string.
The upshot of this is that using a multiplier of 31 means that while characters other than the last seven have an effect on the hash, it's completely independent of their order. If you take two strings that have the same last 7 characters, and whose other characters are also the same but in a different order, you'll get the same hash for both. You'll also get the same hash for things like "az" and "by" other than in the last 7 chars.
So using a prime multiplier, while much better than an even multiplier, is still not very good. Better is to use a rotate instruction, which shifts the bits back into the bottom when they shift out the top. Something like:
#include <string>

unsigned hashCode(const std::string& chars)
{
    unsigned h = 0;
    for (std::size_t i = 0; i < chars.length(); i++) {
        h = (h << 5) + (h >> 27); // ROL by 5, assuming 32 bits here
        h += (unsigned char)chars[i];
    }
    return h;
}
Of course, this depends on your compiler being smart enough to recognize the idiom for a rotate instruction and turn it into a single instruction for maximum efficiency.
This also still has the problem that swapping 32-character blocks in the string will give the same hash value, so it's far from strong, but probably adequate for most non-cryptographic purposes.
would produce a unique value
Stop right there. Hashes are not unique. A good hash algorithm will minimize collisions, but the pigeonhole principle assures us that perfectly avoiding collisions is not possible (for any datatype with non-trivial information content).

C/C++ Bit Array or Bit Vector

I am learning C/C++ programming and have encountered the usage of 'Bit arrays' or 'Bit Vectors'. I am not able to understand their purpose. Here are my doubts:
Are they used as boolean flags?
Can one use int arrays instead? (more memory of course, but..)
What's this concept of Bit-Masking?
If bit-masking is just simple bit operations to get an appropriate flag, how does one program for them? Is it not difficult to do this operation in your head to see what the flag would be, as opposed to decimal numbers?
I am looking for applications, so that I can understand better. For example:
Q. You are given a file containing integers in the range (1 to 1 million). There are some duplicates and hence some numbers are missing. Find the fastest way of finding the missing numbers.
For the above question, I have read solutions telling me to use bit arrays. How would one store each integer in a bit?
I think you've got yourself confused between arrays and numbers, specifically what it means to manipulate binary numbers.
I'll go about this by example. Say you have a number of error messages and you want to return them in a return value from a function. Now, you might label your errors 1,2,3,4... which makes sense to your mind, but then how do you, given just one number, work out which errors have occurred?
Now, try labelling the errors 1,2,4,8,16... increasing powers of two, basically. Why does this work? Well, when you work in base 2 you are manipulating a number like 00000000 where each digit corresponds to a power of 2 determined by its position from the right. So let's say errors 1, 4 and 8 occur. Well, then that could be represented as 00001101. Reading from the right, the first digit = 1*2^0, the third digit = 1*2^2 and the fourth digit = 1*2^3. Adding them all up gives you 13.
Now, we are able to test if such an error has occurred by applying a bitmask. For example, if you wanted to work out if error 8 has occurred, use the bit representation of 8 = 00001000. Now, in order to extract whether or not that error has occurred, use a binary and like so:
  00001101
& 00001000
= 00001000
I'm sure you know how an and works or can deduce it from the above - working digit-wise, if any two digits are both 1, the result is 1, else it is 0.
Now, in C:
int func(...)
{
int retval = 0;
if ( sometestthatmeans an error )
{
retval += 1;
}
if ( sometestthatmeans an error )
{
retval += 2;
}
return retval;
}
int anotherfunc(...)
{
uint8_t x = func(...);
/* binary and with 8 and shift 3 places to the right
* so that the resultant expression is either 1 or 0 */
if ( ( ( x & 0x08 ) >> 3 ) == 1 )
{
/* that error occurred */
}
}
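A compilable variant of the pseudocode above, for illustration. The flag names and the boolean parameters are hypothetical stand-ins for the "sometestthatmeans an error" placeholders, and it uses |= in place of +=, so setting the same flag twice cannot corrupt the others.

```cpp
#include <cstdint>

// Each error gets its own bit, i.e. its own power of two.
enum ErrorFlag : std::uint8_t {
    ERR_ONE   = 1, // 00000001
    ERR_TWO   = 2, // 00000010
    ERR_FOUR  = 4, // 00000100
    ERR_EIGHT = 8  // 00001000
};

// Combines whichever errors occurred into a single byte.
std::uint8_t collectErrors(bool one, bool two, bool four, bool eight)
{
    std::uint8_t flags = 0;
    if (one)   flags |= ERR_ONE;
    if (two)   flags |= ERR_TWO;
    if (four)  flags |= ERR_FOUR;
    if (eight) flags |= ERR_EIGHT;
    return flags;
}

// Applies the bitmask: true if the given error's bit is set.
bool hasError(std::uint8_t flags, ErrorFlag which)
{
    return (flags & which) != 0;
}
```

With errors 1, 4 and 8 set, collectErrors returns 13 (00001101), and hasError(flags, ERR_EIGHT) is true while hasError(flags, ERR_TWO) is false. Comparing the masked value against zero makes the extra shift by 3 unnecessary.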
Now, to practicalities. When memory was scarce and protocols didn't have the luxury of verbose XML etc., it was common to delimit a field as being so many bits wide. In that field, you assign various bits (flags, powers of 2) to a certain meaning and apply binary operations to deduce if they are set, then operate on these.
I should also add that binary operations are close in idea to the underlying electronics of a computer. Imagine if the bit fields corresponded to the output of various circuits (carrying current or not). By using enough combinations of said circuits, you make... a computer.
Regarding the usage of the bit array:
If you know there are "only" 1 million numbers, you use an array of 1 million bits. In the beginning all bits will be zero, and every time you read a number you use this number as an index and set the bit at this index to one (if it's not one already).
After reading all the numbers, the missing numbers are the indices of the zeros in the array.
For example, if we had only numbers between 0 and 4, the array would look like this in the beginning: 0 0 0 0 0.
If we read the numbers: 3, 2, 2
the array would look like this: read 3 --> 0 0 0 1 0. read 2 --> 0 0 1 1 0. read 2 (again) --> 0 0 1 1 0. Check the indices of the zeroes: 0, 1, 4 - those are the missing numbers.
BTW, of course you can use integers instead of bits, but it may take (depending on the system) 32 times the memory.
Sivan
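The walk-through above can be sketched with std::vector<bool>, which the standard library typically stores as packed bits. findMissing and its parameters are hypothetical names; range is the number of possible values, counted from 0.

```cpp
#include <cstddef>
#include <vector>

// Marks every value seen in input, then reports the indices never marked.
std::vector<std::size_t> findMissing(const std::vector<std::size_t>& input, std::size_t range)
{
    std::vector<bool> seen(range, false);  // one flag per possible value, all zero at first
    for (std::size_t v : input)
    {
        if (v < range) seen[v] = true;     // duplicates simply set the same bit again
    }
    std::vector<std::size_t> missing;
    for (std::size_t i = 0; i < range; ++i)
    {
        if (!seen[i]) missing.push_back(i);
    }
    return missing;
}
```

For the input 3, 2, 2 with a range of 5, this returns 0, 1 and 4, matching the example.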
Bit Arrays or Bit Vectors can be thought of as an array of boolean values. Normally a boolean variable needs at least one byte of storage, but in a bit array/vector only one bit is needed.
This gets handy if you have lots of such data so you save memory at large.
Another usage is if you have numbers which do not exactly fit in standard variables, which are 8, 16, 32 or 64 bits in size. This way you could store, in a bit vector of 16 bits, one number that is 4 bits, one that is 2 bits and one that is 10 bits in size. Normally you would have to use 3 variables with sizes of 8, 8 and 16 bits, so 50% of the storage would be wasted.
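A sketch of that 4 + 2 + 10 bit packing. The order of the fields inside the 16-bit word and all the names are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout: bits 0-3 hold a (4 bits), bits 4-5 hold b (2 bits),
// bits 6-15 hold c (10 bits).
std::uint16_t packFields(unsigned a, unsigned b, unsigned c)
{
    assert(a <= 0xFu && b <= 0x3u && c <= 0x3FFu); // each value must fit its field
    return static_cast<std::uint16_t>(a | (b << 4) | (c << 6));
}

// Each getter shifts its field down to bit 0 and masks off the rest.
unsigned fieldA(std::uint16_t w) { return w & 0xFu; }
unsigned fieldB(std::uint16_t w) { return (w >> 4) & 0x3u; }
unsigned fieldC(std::uint16_t w) { return (w >> 6) & 0x3FFu; }
```

Packing 9, 2 and 1000 and reading the fields back returns the same three values, all from a single 16-bit variable.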
But all these uses are very rare in business applications; they come up often when interfacing with drivers through pinvoke/interop functions and when doing low-level programming.
Bit Arrays or Bit Vectors are used as a mapping from position to some bit value. Yes, it's basically the same thing as an array of Bool, but a typical Bool implementation is one to four bytes long, which uses too much space.
We can store the same amount of data much more efficiently by using arrays of words and binary masking operations and shifts to store and retrieve them (less overall memory used, less accesses to memory, less cache miss, less memory page swap). The code to access individual bits is still quite straightforward.
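A minimal sketch of such word-based access, assuming 32-bit words. The BitArray class and its member names are hypothetical, and there is no bounds checking.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stores nbits flags packed into 32-bit words.
class BitArray {
public:
    explicit BitArray(std::size_t nbits) : words_((nbits + 31) / 32, 0u) {}

    // i / 32 selects the word, i % 32 selects the bit inside that word.
    void set(std::size_t i)   { words_[i / 32] |=  (std::uint32_t{1} << (i % 32)); }
    void clear(std::size_t i) { words_[i / 32] &= ~(std::uint32_t{1} << (i % 32)); }
    bool test(std::size_t i) const { return ((words_[i / 32] >> (i % 32)) & 1u) != 0; }

private:
    std::vector<std::uint32_t> words_;
};
```

Setting, clearing and testing a bit are each one shift and one mask operation on a single word, which is why the overhead compared to a plain array of bool stays small.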
There is also some bit field support built into the C language (you write things like int i:1; to say "only consume one bit"), but it is not available for arrays and you have less control over the overall result (the details of the implementation depend on the compiler and on alignment issues).
Below is a possible way to answer your "search missing numbers" question. I fixed the int size to 32 bits to keep things simple, but it could be written using sizeof(int) to make it portable. And (depending on the compiler and target processor) the code could be made faster by using >> 5 instead of / 32 and & 31 instead of % 32, but this is just to give the idea.
#include <stdio.h>
#include <errno.h>
#include <stdint.h>
int main(){
/* put all numbers from 0 to 999999 in a file, except 765, 777760 and 777791 */
{
printf("writing test file\n");
int x = 0;
FILE * f = fopen("testfile.txt", "w");
for (x=0; x < 1000000; ++x){
if (x == 765 || x == 777760 || x == 777791){
continue;
}
fprintf(f, "%d\n", x);
}
fprintf(f, "%d\n", 57768); /* this one is a duplicate */
fclose(f);
}
uint32_t bitarray[1000000 / 32] = {0}; /* must start zeroed: a local array is not initialized by default */
/* read file containing integers in the range [0,999999] */
/* any non number is considered as separator */
/* the goal is to find missing numbers */
printf("Reading test file\n");
{
unsigned int x = 0;
FILE * f = fopen("testfile.txt", "r");
while (1 == fscanf(f, " %u",&x)){
bitarray[x / 32] |= 1u << (x % 32); /* 1u: shifting a signed 1 into bit 31 is undefined */
}
fclose(f);
}
/* find missing number in bitarray */
{
int x = 0;
for (x=0; x < (1000000 / 32) ; ++x){
uint32_t n = bitarray[x];
if (n != (uint32_t)-1){
printf("Missing number(s) between %d and %d [%x]\n",
x * 32, (x+1) * 32, bitarray[x]);
int b;
for (b = 0 ; b < 32 ; ++b){
if (0 == (n & (1u << b))){
printf("missing number is %d\n", x*32+b);
}
}
}
}
}
}
That is used for bit flags storage, as well as for parsing different binary protocols fields, where 1 byte is divided into a number of bit-fields. This is widely used, in protocols like TCP/IP, up to ASN.1 encodings, OpenPGP packets, and so on.