Hash function for a well-defined string in C++

I have a string that consists of exactly one number between 1 and 30 followed by one of the characters 'R', 'T' or 'M'. Let me illustrate with some examples.
string a="15T","1R","12M","24T","24M" ... // they are all valid for my string
Now I need a hash function that gives me a unique hash value for every input string. Since my input comes from a finite set, I think this is possible.
Can anyone tell me what kind of hash function I could define?
By the way, I'll build my hash table on a vector, so I guess size is not an important issue, but I'll set 10000 as an upper bound. I mean, I assume I cannot have more than 10000 such strings.
Thanks in advance.

Just use a large enough integer type and pack the (at most) three characters into the integer:
std::size_t hash(const char* s) {
std::size_t result = 0;
while(*s) {
result <<= 8;
result |= *s++;
}
return result;
}
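As a quick sanity check of the packing idea, here is a minimal sketch that hashes every valid input and counts the distinct results. The names packed_hash and count_distinct_hashes are made up for this illustration and are not part of the answer.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Same packing as above: shift in one byte per character.
std::size_t packed_hash(const char* s) {
    std::size_t result = 0;
    while (*s) {
        result <<= 8;
        result |= static_cast<unsigned char>(*s++);
    }
    return result;
}

// Hypothetical helper: hashes every valid input ("1R" .. "30M")
// and returns how many distinct hash values were produced.
std::size_t count_distinct_hashes() {
    const char letters[] = {'R', 'T', 'M'};
    std::set<std::size_t> seen;
    for (int n = 1; n <= 30; ++n) {
        for (char letter : letters) {
            const std::string key = std::to_string(n) + letter;
            seen.insert(packed_hash(key.c_str()));
        }
    }
    return seen.size();
}
```

All 90 inputs should come out distinct. Note that three-character inputs such as "15T" pack to values above three million in ASCII, so these hashes cannot be used directly as indices into a 10000-element vector.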

You could define an algebraic function:
result = string[0] * 0x010000
+ string[1] * 0x000100
+ string[2];
Basically, each character fits into a uint8_t, which has 256 possible values. So each column is a power of 256.
Yes, there are big gaps, but this ensures a unique hash.
You could compress the gaps by using a different radix for each character column.
Given "15T" (assuming the number is always written with two digits, e.g. "01R"):
result = ((string[0] - '0') * 10 // 10 == number of digits in the 2nd column
+ (string[1] - '0')) * 3; // 3 == number of choices in the letter column
switch (string[2])
{
case 'T' : result += 0; break;
case 'M' : result += 1; break;
case 'R' : result += 2; break;
}
It's a number / counting system where each column has a different number of digits.

Something along the line of:
unsigned myhash(const char * str)
{
int n = 0;
// Parse the number part
for ( ; *str >= '0' && *str <= '9'; ++str)
n = n * 10 + (*str - '0');
int c = *str == 'R' ? 0 :
*str == 'T' ? 1 :
*str == 'M' ? 2 :
3;
// Check for invalid strings
if ( c == 3 || n <= 0 || n > 30 || *(++str) != 0 )
{
// Some error or anything
// (Or replace the if condition with an assert)
throw std::runtime_error("Invalid string");
}
// Since 0 <= c < 3 and 0 <= (n-1) < 30
// There are only 90 possible values
return c * 30 + (n-1);
}
In my experience, whenever you have to deal with something like this, it is often better to do the opposite: work with integers and have a function that converts back to the string form when necessary.
You can rebuild the original string with:
int n = hash % 30 + 1;
int c = hash / 30; // 0 is 'R', 1 is 'T', 2 is 'M'
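The two lines above can be wrapped into a small inverse function. This is only a sketch: the name unhash and the range check are assumptions added for illustration, and the letter order follows the comment above (0 is 'R', 1 is 'T', 2 is 'M').

```cpp
#include <stdexcept>
#include <string>

// Hypothetical inverse of myhash: maps 0..89 back to strings such as "15T".
std::string unhash(unsigned hash) {
    if (hash >= 90)
        throw std::out_of_range("hash must be in [0, 90)");
    static const char letters[] = {'R', 'T', 'M'};  // 0 is 'R', 1 is 'T', 2 is 'M'
    const unsigned n = hash % 30 + 1;               // number part, 1..30
    const unsigned c = hash / 30;                   // letter index, 0..2
    return std::to_string(n) + letters[c];
}
```

For example, myhash("15T") is 1 * 30 + 14 = 44, and unhash(44) gives back "15T".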

Related

I got stuck in this syntax of c++. "Check if the frequency of all the digits in a number is same"

I'm practicing a coding problem on "Check if the frequency of all the digits in a number is same"
#include<bits/stdc++.h>
using namespace std;
// returns true if the number
// passed as the argument
// is a balanced number.
bool isNumBalanced(int N)
{
string st = to_string(N);
bool isBalanced = true;
// frequency array to store
// the frequencies of all
// the digits of the number
int freq[10] = {0};
int i = 0;
int n = st.size();
for (i = 0; i < n; i++)
// store the frequency of
// the current digit
freq[st[i] - '0']++;
for (i = 0; i < 9; i++)
{
// if freq[i] is not
// equal to freq[i + 1] at
// any index 'i' then set
// isBalanced to false
if (freq[i] != freq[i + 1])
isBalanced = false;
}
// return true if
// the string is balanced
if (isBalanced)
return true;
else
return false;
}
// Driver code
int main()
{
int N = 1234567890;
bool flag = isNumBalanced(N);
if (flag)
cout << "YES";
else
cout << "NO";
}
but I can't understand this code:
// store the frequency of
// the current digit
freq[st[i] - '0']++;
How does this part actually work and store the frequency?
And what else could I write instead of this line?
st is a string and thus, a sequence of chars. st[i] is the ith char in this string.
Chars are actually small integers (a char has 256 possible values), so you can use them with mathematical operations, such as -. These integers are assigned to characters according to the ASCII table. For example, the char '0' is assigned 48 and the char '7' is assigned 55 (note: in the following, I write 'x' to denote the character x).
Their order makes such operations meaningful: the char '7' and the char '0' are exactly 7 positions apart, so '0' + 7 = 48 + 7 = 55 = '7'. So: '7' - '0' = 7.
So, you get the position in the freq array according to the digit, i.e., position 0 for '0' or position 7 for '7'. The ++ operator increments that value in place.
This line is several things condensed into one expression
freq[st[i] - '0']++;
The individual parts are rather simple, and taken together it isn't too difficult either:
st[i] - '0' - character digits do not map 1 to 1 to integers. There is an offset. The integer value of '1' is 1 + '0', '2' is 2 + '0'. Hence to get the integer from the digit you need to subtract '0'.
freq[ ... ] - accesses the element of the array. Element at index i stores frequency of digit i.
()++ - increments that frequency by one.
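To answer "what else can I write", the three parts listed above can be spelled out as separate statements. The following is a minimal sketch; the function name digit_frequencies is made up for this illustration, and it assumes the string contains only the characters '0' to '9'.

```cpp
#include <array>
#include <string>

// Counts how often each decimal digit occurs in st.
// Assumes st contains only the characters '0'..'9'.
std::array<int, 10> digit_frequencies(const std::string& st) {
    std::array<int, 10> freq{};          // all ten counters start at 0
    for (char ch : st) {
        const int digit = ch - '0';      // '7' - '0' == 7
        int& counter = freq[digit];      // the slot for this digit
        counter = counter + 1;           // same effect as freq[digit]++
    }
    return freq;
}
```

Each loop iteration does exactly what the one-liner freq[st[i] - '0']++ does, just in three visible steps.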
Subtracting the '0' character from the single string character results in the actual number you're looking for. This gives you the digit whose frequency you are tracking in your code. This works because of the way characters are stored as ASCII values. Say that the integer value N that is passed in is 1221. The first character observed in this example is '1', which corresponds to an ASCII value of 49. The ASCII value of '0' is 48. Subtracting the two: 49 - 48 = 1. This lets you use each digit of the string produced from the 'int' value as an index into the array.
The code of
for (i = 0; i < n; i++)
// store the frequency of
// the current digit
freq[st[i] - '0']++;
traverses the string and, for each character, subtracts '0', which has the value 48: character code 48 represents 0, character code 49 represents 1, and so on.
However, this code is wasteful: it spends memory on storing a string and time on converting a number to a string. This is better:
bool isNumBalanced(int N)
{
//We create an array of 10 for each digit
int digits[10];
//Initialize the digits
for (int i = 0; i < 10; i++) digits[i] = 0;
//If the input is 0, then we have a trivial case
if (N == 0) return true;
//We loop the digits
do {
//N % 10 is the last digit
//We increment the frequency of that digit
digits[N % 10]++;
} while ((N /= 10) != 0); //We don't stop until we reach the trivial case, see above
//Using the transitivity of equality, we compare all values to the first
//We return false upon the first difference
for (int j = 1; j < 10; j++)
if (digits[0] != digits[j]) return false;
//Otherwise we return true
return true;
}
For those who don't understand it:
int arr[5]={0}; // stores 0 in all places
for(int i=0;i<5;i++){
arr[i]++;
} // Now the array is [1 1 1 1 1]
What happens here: first i=0, so arr[0]++ runs. arr[0] was 0, and ++ increments it from 0 to 1,
so arr[0] is now 1.
Now let
st="1221";
for (i = 0; i < 4; i++) {
freq[st[i] - '0']++;
}
for i=0, the freq location is: freq[49-48]++ = freq[1]++, so the value of freq[1] is 1
for i=1, the freq location is: freq[50-48]++ = freq[2]++, so the value of freq[2] is 1
for i=2, the freq location is: freq[50-48]++ = freq[2]++, so the value of freq[2] is 2
for i=3, the freq location is: freq[49-48]++ = freq[1]++, so the value of freq[1] is 2
ASCII value of '0' is 48
ASCII value of '1' is 49
ASCII value of '2' is 50

Logical error-encrypting a message in C++

The code should count each character. If the character is a digit, it should count the previous character as many times as the digit says.
So if the input is 'a', it should count 'a' once and assign it to acounter, which is then equal to 1.
But if 'a' is followed by 3, it means 'aaa', and it should count 'a' three times and assign it to acounter, which is then equal to 3.
Note: the program is meant for all letters of the alphabet, but if this one isn't solved then what's the point of writing the rest?
I've tried adding another loop exclusively for the digits, but it didn't work.
char secret_message[1000];
int counter,number_counter;
int acounter=0;
gets(secret_message);
for (counter = 0 ; secret_message[counter] != NULL ; counter++)
{
if (secret_message[counter]=='a')
acounter++;
if (secret_message[counter] >= '0' && secret_message[counter] <= '9')
{
for(number_counter=1;number_counter<=secret_message[counter];number_counter++)
{
if (secret_message[counter-1]=='a')
acounter++;
}
}
}
cout<<endl<<"acounter is:"<<acounter;
If the input is a3, the output should be 3, but it's 52!
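You'll want to convert the digit from text to a number, then use addition. Since the 'a' itself was already counted once, add one less than the digit:
if (counter > 0 && secret_message[counter - 1] == 'a' && isdigit(secret_message[counter]))
{
const int value = secret_message[counter] - '0';
acounter += value - 1; // the 'a' before the digit was already counted
}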
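Put together as a standalone function, the idea might look like the sketch below. The name count_letter is made up for this illustration, and the code assumes single-digit repeat counts, as in the question's "a3" example.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: counts `letter` in a message where a digit d
// after a character means that character appears d times in total.
// Assumes single-digit repeat counts, as in the question's "a3".
int count_letter(const std::string& msg, char letter) {
    int total = 0;
    for (std::size_t i = 0; i < msg.size(); ++i) {
        if (msg[i] != letter)
            continue;
        int repeat = 1;  // a letter with no digit after it counts once
        if (i + 1 < msg.size() &&
            std::isdigit(static_cast<unsigned char>(msg[i + 1])))
            repeat = msg[i + 1] - '0';  // text digit to number
        total += repeat;
    }
    return total;
}
```

Looking ahead from the letter, rather than back from the digit, avoids counting the letter once too often.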

C++ How to output the letters or numbers from input of letters or numbers

So let's say we have the following case: for "12323465723", possible answers would be "abcbcdfegbc" (1 2 3 2 3 4 6 5 7 2 3), "awwdfegw" (1 23 23 4 6 5 7 23), "lcwdefgw" (12 3 23 4 6 5 7 23). In this case, the user inputs numbers from 1 to 26, not separated by spaces, and the program itself suggests 3 ways of interpreting the numbers, making the most of the combinations from 1 to 26, which stand for the letters a to z.
As you can see this is edited, as this is the last part of the problem, Thank you all who have helped me this far, I've managed to solve half of my problem, only the above mentioned one is left.
SOLVED -> Thank you
This involves a choice between 0 to 2 outcomes at each step. The base cases are that there are no more characters, or that none of them can be used. In the latter case, we backtrack, which is how the entire tree gets output. We keep the word built so far in memory, much as in dynamic programming. This naturally leads to a recursive algorithm.
#include <stdlib.h> /* EXIT */
#include <stdio.h> /* (f)printf */
#include <errno.h> /* errno */
#include <string.h> /* strlen */
static char word[2000];
static size_t count;
static void recurse(const char *const str) {
/* Base case when it hits the end of the string. */
if(*str == '\0') { printf("%.*s\n", (int)count, word); return; }
/* Bad input. */
if(*str < '0' || *str > '9') { errno = ERANGE; return; }
/* Zero is not a valid start; backtrack without output. */
if(*str == '0') return;
/* Recurse with one digit. */
word[count++] = *str - '0' + 'a' - 1;
recurse(str + 1);
count--;
/* Maybe recurse with two digits. */
if((*str != '1' && *str != '2')
|| (*str == '1' && (str[1] < '0' || str[1] > '9'))
|| (*str == '2' && (str[1] < '0' || str[1] > '6'))) return;
word[count++] = (str[0] - '0') * 10 + str[1] - '0' + 'a' - 1;
recurse(str + 2);
count--;
}
int main(int argc, char **argv) {
if(argc != 2)
return fprintf(stderr, "Usage: a.out <number>\n"), EXIT_FAILURE;
if(strlen(argv[1]) > sizeof word)
return fprintf(stderr, "Too long.\n"), EXIT_FAILURE;
recurse(argv[1]);
return errno ? (perror("numbers"), EXIT_FAILURE) : EXIT_SUCCESS;
}
When run on your original input, ./a.out 12323465723, it gives,
abcbcdfegbc
abcbcdfegw
abcwdfegbc
abcwdfegw
awbcdfegbc
awbcdfegw
awwdfegbc
awwdfegw
lcbcdfegbc
lcbcdfegw
lcwdfegbc
lcwdfegw
(I think you have made a transposition in lcwdefgw.)
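If only the number of possible decodings is needed rather than the full list, the same one-digit or two-digit choice can be tabulated with dynamic programming. This is a sketch under that assumption; the function name count_decodings is made up for illustration.

```cpp
#include <string>
#include <vector>

// Counts the ways to split a digit string into numbers 1..26.
// Returns 0 if the string contains a non-digit.
unsigned long long count_decodings(const std::string& s) {
    const std::size_t n = s.size();
    // ways[i] = number of decodings of the suffix starting at index i
    std::vector<unsigned long long> ways(n + 1, 0);
    ways[n] = 1;  // empty suffix: one (empty) decoding
    for (std::size_t i = n; i-- > 0;) {
        if (s[i] < '0' || s[i] > '9') return 0;
        if (s[i] == '0') continue;          // no letter starts with 0
        ways[i] = ways[i + 1];              // take one digit
        if (i + 1 < n) {
            const int two = (s[i] - '0') * 10 + (s[i + 1] - '0');
            if (two <= 26) ways[i] += ways[i + 2];  // take two digits
        }
    }
    return ways[0];
}
```

For the input 12323465723 this gives 12, which matches the twelve lines printed above.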
According to the ASCII table, the codes 65 to 90 are A to Z.
So below is some simple logic to achieve what you're trying to do.
int main(){
int n;
cin>>n;
n=n+64;
char a=(char) n;
if (a>=65 && a<=90)
cout<<a;
else cout<<"Error";
return 0;
}
If you want to count the occurrences of "ab" then this will do it:
int main()
{
char line[150];
int grup = 0;
cout << "Enter a line of string: ";
cin.getline(line, 150);
for (int i = 0; line[i] != '\0'; ++i)
{
if (line[i] == 'a' && line[i+1] == 'b')
{
++grup;
}
}
cout << "Occurrences of ab: " << grup << endl;
return 0;
}
If you want to convert an int to an ASCII-value you can do that using this code:
// Output ASCII-values
int nr;
do {
cout << "\nEnter a number: ";
cin >> nr;
nr += 96; // + 96 because the ASCII-values of lower case letters start after 96
cout << (char) nr;
} while (nr > 96 && nr < 123);
Here I use the C style of casting values to keep things simple.
Also bear in mind ASCII-values: ASCII Table
Hope this helps.
This could be an interesting problem, but you probably tagged it wrong: there's nothing specific to C++ here; it is more about the algorithm.
First of all, the "decode" method that you described, from numerical to alphabetical strings, is ambiguous. E.g., 135 could be interpreted as either "ace" or "me". Is this simply an oversight, or is it the intended question?
Suppose the ambiguity is just an oversight, and the user will enter numbers properly separated by say a white space (eg., either "1 3 5" or "13 5"). Let nstr be the numerical string, astr be the alphabetical string to count, then you would
1. Set i=0, cnt=0.
2. Read the next integer k from nstr (like in this answer).
3. Decode k into character ch.
4. If ch == astr[i], increment i.
5. If i == astr.length(), set i=0 and increment cnt.
6. Repeat from 2 until reaching the end of nstr.
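The steps above can be sketched as follows. The function name count_word is made up for this illustration, and the code assumes that nstr holds whitespace-separated integers in the range 1 to 26 and that astr is lowercase. It follows the steps literally, so a mismatch leaves i unchanged.

```cpp
#include <sstream>
#include <string>

// Counts how often astr is completed while reading nstr,
// following the listed steps literally. Assumes nstr holds
// whitespace-separated integers in 1..26 and astr is lowercase.
int count_word(const std::string& nstr, const std::string& astr) {
    if (astr.empty()) return 0;
    std::istringstream in(nstr);
    std::size_t i = 0;                      // step 1
    int cnt = 0;
    int k = 0;
    while (in >> k) {                       // step 2: next integer
        if (k < 1 || k > 26) continue;      // skip values with no letter
        const char ch = static_cast<char>('a' + k - 1);  // step 3
        if (ch == astr[i]) ++i;             // step 4
        if (i == astr.size()) {             // step 5
            i = 0;
            ++cnt;
        }
    }
    return cnt;
}
```

Because nothing resets i on a mismatch, "1 3 2" also counts as one "ab". If only contiguous matches should count, step 4 needs a reset rule, which the steps do not specify.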
On the other hand, suppose the ambiguous decode is intended (the numerical string is supposed to have multiple ways to be decoded), further clarification is needed in order to write a solution. For example, how many k's are there in "1111"? Is it 1 or 2, given "1111" can be decoded either as aka or kk, or maybe even 3, if the counting of k doesn't care about how the entire "1111" is decoded?

Unary operations in C++

I came across a programming question of which I knew only a part of the answer.
int f( char *p )
{
int n = 0 ;
while ( *p != 0 )
n = 10*n + *p++ - '0' ;
return n ;
}
This is what I think the program is doing.
p is a pointer and the while loop keeps dereferencing it until the value equals 0. However, I don't understand the n assignment line: what is '0' doing? I am assuming the value of p is initially negative; that is the only way it will reach 0 after the increment.
You are confusing the number zero (none, nothing) with the character 0 (a circle, possibly with a slash through it). Notice that zero is in tick marks, so it's the character "0", not the number zero.
'0' - '0' = 0
'1' - '0' = 1
'2' - '0' = 2
...
So by subtracting the character zero from a digit, you get the number that corresponds to that digit.
So, say you have this sequence of digits: '4', '2', '1'. How do you get the number four hundred and twenty-one from that? You turn the '4' into four. Then you multiply by ten. Now you have forty. Convert the '2' into two and add it. Now you have forty-two. Multiply by ten. Convert the '1' into one and add it; now you have four hundred and twenty-one.
That's how you convert a sequence of digits into a number.
The n local variable accumulates the value of the decimal number that is passed to this function in the string. This is an implementation of atoi, without the validity checks.
Here is the workings of the loop body:
n = 10*n + *p++ - '0';
Assign to n the result of multiplying the prior value of n by ten plus the current character code at the pointer p less the code of zero; increment p after dereferencing.
Since digit characters are encoded sequentially, the *p-'0' expression represents a decimal value of a digit.
Let's say that you are parsing the string "987". As you go through the loop, n starts at zero; then it gets assigned the following values:
n = 10*0 + 9; // That's 9
n = 10*9 + 8; // That's 98
n = 10*98 + 7; // That's 987
It's poorly written, to say the least.
0) Use formatting!:
int f(char* p)
{
int n = 0;
while (*p != 0)
n = 10*n + *p++ - ‘0?;
return n;
}
1) ? there is syntactically invalid. It should probably be a ' as noted by chris (and your existing ‘ is wrong too, but that's probably because you copied it from a website and not a source file), giving:
int f(char* p)
{
int n = 0;
while (*p != 0)
n = 10 * n + *p++ - '0';
return n;
}
2) The parameter type isn't as constrained as it should be. Because *p is never modified (per our goals), we should enforce that to make sure we don't make any mistakes:
int f(const char* p)
{
int n = 0;
while (*p != 0)
n = 10 * n + *p++ - '0';
return n;
}
3) The original programmer was obviously allergic to readable code. Let's split up our operations:
int f(const char* p)
{
int n = 0;
for (; *p != 0; ++p)
{
const int digit = *p - '0';
n = 10 * n + digit;
}
return n;
}
4) Now that the operations are a bit more visible, we can see some independent functionality embedded in this function; this should be factored out (this is called refactoring) into a separate function.
Namely, we see the operation of converting a character to a digit:
int todigit(const char c)
{
// this works because the literals '0', '1', '2', etc. are
// all guaranteed to be in order. Ergo '0' - '0' will be 0,
// '1' - '0' will be 1, '2' - '0' will be 2, and so on.
return c - '0';
}
int f(const char* p)
{
int n = 0;
for (; *p != 0; ++p)
n = 10 * n + todigit(*p);
return n;
}
5) So now it's clear the function reads a string character by character and generates a number digit by digit. This functionality already exists under the name atoi, and this function is an unsafe implementation:
int todigit(const char c)
{
// this works because the literals '0', '1', '2', etc. are
// all guaranteed to be in order. Ergo '0' - '0' will be 0,
// '1' - '0' will be 1, '2' - '0' will be 2, and so on.
return c - '0';
}
int atoi_unsafe(const char* p)
{
int n = 0;
for (; *p != 0; ++p)
n = 10 * n + todigit(*p);
return n;
}
It's left as an exercise to the reader to check for overflow, invalid characters (those that aren't digits), and so on. But this should make it much clearer what's going on, and is how such a function should have been written in the first place.
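One possible shape for that exercise is sketched below. The name parse_digits and the bool-plus-output-parameter interface are assumptions made for this illustration; the sketch handles unsigned decimal input only.

```cpp
#include <climits>

// Hypothetical checked variant: returns false on empty input,
// a non-digit character, or a value that would not fit in int.
bool parse_digits(const char* p, int& out) {
    if (p == nullptr || *p == '\0') return false;
    int n = 0;
    for (; *p != '\0'; ++p) {
        if (*p < '0' || *p > '9') return false;        // invalid character
        const int digit = *p - '0';
        if (n > (INT_MAX - digit) / 10) return false;  // 10*n + digit would overflow
        n = 10 * n + digit;
    }
    out = n;
    return true;
}
```

The overflow test is done before the multiplication, because signed overflow in 10 * n + digit would itself be undefined behaviour.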
This is a string to number conversion function. Similar to atoi.
A string is a sequence of characters. So "123" in memory would be :
'1','2','3',NULL
p points to it.
Now, according to ASCII, digits are encoded from '0' to '9'. '0' being assigned the value 48 and '9' being assigned the value 57. As such, '1','2','3',NULL in memory is actually : 49, 50, 51, 0
If you wanted to convert from the character '0' to the integer 0, you would have to subtract 48 from the value in memory. Do you see where this is going?
Now, instead of subtracting the number 48, you subtract '0', which makes the code easier to read.

String (const char*, size_t) to int?

What's the fastest way to convert a string represented by (const char*, size_t) to an int?
The string is not null-terminated.
Both these ways involve a string copy (and more) which I'd like to avoid.
And yes, this function is called a few million times a second. :p
int to_int0(const char* c, size_t sz)
{
return atoi(std::string(c, sz).c_str());
}
int to_int1(const char* c, size_t sz)
{
return boost::lexical_cast<int>(std::string(c, sz));
}
Given a counted string like this, you may be able to gain a little speed by doing the conversion yourself. Depending on how robust the code needs to be, this may be fairly difficult though. For the moment, let's assume the easiest case -- that we're sure the string is valid, containing only digits, (no negative numbers for now) and the number it represents is always within the range of an int. For that case:
int to_int2(char const *c, size_t sz) {
int retval = 0;
for (size_t i=0; i<sz; i++) {
retval *= 10;
retval += c[i] -'0';
}
return retval;
}
From there, you can get about as complex as you want -- handling leading/trailing whitespace, '-' (but doing so correctly for the maximally negative number in 2's complement isn't always trivial [edit: see Nawaz's answer for one solution to this]), digit grouping, etc.
Another slow version, for uint32:
void str2uint_aux(unsigned& number, unsigned& overflowCtrl, const char*& ch)
{
unsigned digit = *ch - '0';
++ch;
number = number * 10 + digit;
unsigned overflow = (digit + (256 - 10)) >> 8;
// if digit < 10 then overflow == 0
overflowCtrl += overflow;
}
unsigned str2uint(const char* s, size_t n)
{
unsigned number = 0;
unsigned overflowCtrl = 0;
// for VC++10 the Duff's device is faster than loop
switch (n)
{
default:
throw std::invalid_argument(__FUNCTION__ " : `n' too big");
case 10: str2uint_aux(number, overflowCtrl, s);
case 9: str2uint_aux(number, overflowCtrl, s);
case 8: str2uint_aux(number, overflowCtrl, s);
case 7: str2uint_aux(number, overflowCtrl, s);
case 6: str2uint_aux(number, overflowCtrl, s);
case 5: str2uint_aux(number, overflowCtrl, s);
case 4: str2uint_aux(number, overflowCtrl, s);
case 3: str2uint_aux(number, overflowCtrl, s);
case 2: str2uint_aux(number, overflowCtrl, s);
case 1: str2uint_aux(number, overflowCtrl, s);
}
// here we can check that all chars were digits
if (overflowCtrl != 0)
throw std::invalid_argument(__FUNCTION__ " : `s' is not a number");
return number;
}
Why is it slow? Because it processes chars one by one. If we had a guarantee that we can access bytes up to s+16, we could vectorize *ch - '0' and digit + 246.
Like in this code:
uint32_t digitsPack = *(uint32_t*)s - '0000';
overflowCtrl |= digitsPack | (digitsPack + 0x06060606); // if one byte is not in range [0;10), high nibble will be non-zero
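// the parentheses matter: + binds tighter than &
// the digit order assumes s[0] ends up in the most significant byte (big-endian load)
number = number * 10 + ((digitsPack >> 24) & 0xFF);
number = number * 10 + ((digitsPack >> 16) & 0xFF);
number = number * 10 + ((digitsPack >> 8) & 0xFF);
number = number * 10 + (digitsPack & 0xFF);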
s += 4;
Small update for range checking:
the first snippet has redundant shift (or mov) on every iteration, so it should be
unsigned digit = *s - '0';
overflowCtrl |= (digit + 256 - 10);
...
if (overflowCtrl >> 8 != 0) throw ...
Fastest:
int to_int(char const *s, size_t count)
{
int result = 0;
size_t i = 0 ;
if ( s[0] == '+' || s[0] == '-' )
++i;
while(i < count)
{
if ( s[i] >= '0' && s[i] <= '9' )
{
//see Jerry's comments for explanation why I do this
int value = (s[0] == '-') ? ('0' - s[i] ) : (s[i]-'0');
result = result * 10 + value;
}
else
throw std::invalid_argument("invalid input string");
i++;
}
return result;
}
Since in the above code, the comparison (s[0] == '-') is done in every iteration, we can avoid this by calculating result as negative number in the loop, and then return result if s[0] is indeed '-', otherwise return -result (which makes it a positive number, as it should be):
int to_int(char const *s, size_t count)
{
size_t i = 0 ;
if ( s[0] == '+' || s[0] == '-' )
++i;
int result = 0;
while(i < count)
{
if ( s[i] >= '0' && s[i] <= '9' )
{
result = result * 10 - (s[i] - '0'); //assume negative number
}
else
throw std::invalid_argument("invalid input string");
i++;
}
return s[0] == '-' ? result : -result; //-result is positive!
}
That is an improvement!
In C++11, however, you could use any function from the std::stoi family (these take a std::string). There is also the std::to_string family.
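Going beyond what the answers here mention: since C++17, std::from_chars from <charconv> parses a character range directly, so it needs neither a null terminator nor a string copy. A minimal sketch follows; the wrapper name to_int_from_chars and the choice to throw on bad input are assumptions for this illustration.

```cpp
#include <charconv>
#include <cstddef>
#include <stdexcept>
#include <system_error>

// Parses the whole range [c, c + sz) as a base-10 int without copying.
// Throws if the range is not entirely a valid int.
int to_int_from_chars(const char* c, std::size_t sz) {
    int value = 0;
    const char* last = c + sz;
    const std::from_chars_result res = std::from_chars(c, last, value);
    // ec is set on a parse error or out-of-range value;
    // ptr != last means trailing characters were left unparsed.
    if (res.ec != std::errc() || res.ptr != last)
        throw std::invalid_argument("invalid input string");
    return value;
}
```

std::from_chars accepts a leading '-' but not a leading '+', so inputs such as "+5" are rejected by this sketch.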
llvm::StringRef s(c,sz);
int n;
s.getAsInteger(10,n);
return n;
http://llvm.org/docs/doxygen/html/classllvm_1_1StringRef.html
You'll have to either write custom routine or use 3rd party library if you're dead set on avoiding string copy.
You probably don't want to write atoi from scratch (it is still possible to make a bug here), so I'd advise to grab existing atoi from public domain or BSD-licensed code and modify it. For example, you can get existing atoi from FreeBSD cvs tree.
If you run the function that often, I bet you parse the same number many times. My suggestion is to BCD encode the string into a static char buffer (you know it's not going to be very long, since atoi only can handle +-2G) when there's less than X digits (X=8 for 32 bit lookup, X=16 for 64 bit lookup) then place a cache in a hash map.
When you're done with the first version, you can probably find nice optimizations, such as skipping the BCD encoding entirely and just using X characters in the string (when length of string <= X) for lookup in the hash table. If the string is longer, you fall back to atoi.
Edit: ... or fallback instead of atoi to Jerry Coffin's solution, which is as fast as they come.