sorting a string using counting sort - c++

I looked at the counting sort algorithm to sort a string here: https://www.geeksforgeeks.org/counting-sort/. I have a few questions:
#define RANGE 255
What is the function of RANGE? Why do we have to specifically define the RANGE to 255?
int count[RANGE + 1], i;
Why do we have to declare the size of count[] as RANGE+1? Why couldn't it be just 256?
// Store count of each character
for(i = 0; arr[i]; ++i)
++count[arr[i]];
The array stores the count of the specified digit, but here we have characters in a string, so how does the above code convert the characters to numeric equivalents to be stored in the array?

I would not use that code as a learning example. There are so many errors in it that I wasn't even able to properly compile it.
What is the function of RANGE?
When you compile your code, a special program called a preprocessor runs beforehand. The preprocessor essentially performs text replacement. It usually does this based on statements called preprocessor directives, which begin with the "#" symbol. In this case, #define RANGE 255 tells the preprocessor to replace every occurrence of "RANGE" in the code with "255". For example, the line int count[RANGE + 1], i becomes int count[255 + 1], i.
Why do we have to specifically define the RANGE to be 255?
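To be completely honest I'm not sure why the code decided to use 255 for RANGE. I've tested the code and it appeared to work with RANGE equal to 114 but not with smaller numbers. What actually matters is the largest character code in the input: every character is used as an index into count, so RANGE must be at least that large. The largest character in "geeksforgeeks" is 's', which is 115 in ASCII, so relying on any value below 115 is an out-of-bounds access even if it seems to work. Making the input string longer does not require a larger RANGE; using characters with larger codes does.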
How does the code convert the characters to numeric equivalents to be stored in the count array?
The char data type is actually an integer type. Every character we use (A to Z, 0 to 9, punctuation, etc.) has a corresponding number. For example, the code below prints the number that corresponds to 'a', which is 97 in ASCII.
#include <iostream>
int main()
{
char name[] = "abc";
int a = name[0];
std::cout << a;
return 0;
}
The line ++count[arr[i]]; reads the character at position i in the array arr, uses its numeric value as an index into the count integer array, and increments the counter stored at that index. It can do this because char is an integer type. The values in count are ordinary integers, so when we print them to the console we see numbers rather than characters.

RANGE defines the possible keys for the counters, which are 0..RANGE. The value could be arbitrary, but 255 gives exactly 256 possible keys, the same as the number of distinct values an 8-bit char can hold.
So we have possible keys 0..255, which is exactly 256 of them. You could hardcode 256 as the array size. But since RANGE is arbitrary, you may want to change it to 512, for example, and with a hardcoded size you would then have to change the size too.
From a logical point of view a string consists of characters, but that is only our mental representation. For C++, a C-style string is an array of char, which is an integer type. Since the standard 7-bit part of the ASCII table uses only the values 0..127, we can safely use these values as array indexes.
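To make the indexing idea concrete, here is a minimal sketch of a counting sort over the characters of a string. It is a simplified variant, not the GeeksforGeeks code: the function name countingSortString is made up for illustration, and it assumes characters are treated as unsigned char values 0..255 so that the index is never negative on platforms where plain char is signed.

```cpp
#include <cstddef>
#include <string>

// Sorts the characters of `text` by counting how often each byte value occurs.
// Assumption: characters are treated as unsigned char values 0..255.
std::string countingSortString(const std::string& text) {
    const int RANGE = 255;              // largest possible unsigned char value
    std::size_t count[RANGE + 1] = {};  // one counter per value 0..RANGE, all zero

    // The cast keeps the index non-negative even where plain char is signed.
    for (char c : text)
        ++count[static_cast<unsigned char>(c)];

    // Emit each character value as many times as it was counted, in ascending order.
    std::string sorted;
    sorted.reserve(text.size());
    for (int value = 0; value <= RANGE; ++value)
        sorted.append(count[value], static_cast<char>(value));
    return sorted;
}
```

Calling countingSortString("geeksforgeeks") should give "eeeefggkkorss". Because the counters cover every possible byte value, the length of the input never affects the size of count.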

Related

C++ Turning Character types into int type

So I read and was taught that subtracting '0' from my given character turns it into an int, however my Visual Studio isn't recognizing that here, saying a value of type "const char*" cannot be used to initialize an entity of type int in C++ programming here.
bigint::bigint(const char* number) : bigint() {
int number1 = number - '0'; // error code
for (int i = 0; number1 != 0 ; ++i)
{
digits[i] = number1 % 10;
number1 /= 10;
digits[i] = number1;
}
}
The goal of the first half is simply to turn the given number into an int. The second half outputs that number backwards with no leading zeroes. Please note this function is a part of the class declared in a header file here:
class bigint {
public:
static const int MAX_DIGITS = 50;
private:
int digits[MAX_DIGITS];
public:
// constructors
bigint();
bigint(int number);
bigint(const char * number);
};
Is there any way to convert the char parameter to an int so I can then output an int? Without using the std library or strlen, since I know there is a way to use the '0' char but I can't seem to be doing it right.
You can turn a single character in the range '0'..'9' into a single digit 0..9 by subtracting '0', but you cannot turn a string of characters into a number by subtracting '0'. You need a parsing function like std::stoi() to do the conversion work character-by-character.
But that's not what you need here. If you convert the string to a number, you then have to take the number apart. The string is already in pieces, so:
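bigint::bigint(const char* number) : bigint() {
int i = 0; // index of the next free slot in digits
while (*number && i < MAX_DIGITS) // keep looping until we hit the string's null terminator or run out of room
{
digits[i] = *number - '0'; // store the digit for the current character
number++; // advance the string to the next character
i++; // advance to the next slot in digits
}
}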
There could be some extra work involved in a more advanced version, such as sizing digits appropriately to fit the number of digits in number. Currently we have no way to know how many slots are actually in use in digits, and this will lead to problems later when the program has to figure out where to stop reading digits.
I don't know what your understanding is, so I will go over everything I see in the code snippet.
First, what you're passing to the function is a pointer to a char, with const keyword making the char immutable or "read only" if you prefer.
A char is actually an 8-bit integer (see footnote 1). It can store a numerical value in binary form, which can also be interpreted as a character.
Fundamental types - cppreference.com
The standard also expects char to be a "type for character representation". It could be represented in ASCII, but it could be something else, like EBCDIC. For future reference, just remember that ASCII is not guaranteed, although you are unlikely ever to use a system where it is not ASCII. It is not so much that char itself enforces an encoding; it is the functions you pass chars and char pointers to that interpret their content as characters in ASCII, while on some obscure or legacy platforms they could interpret them in a less common encoding. The standard does, however, demand that the encoding has this property: the codes for the characters '0' to '9' are consecutive, and thus '9' - '0' means: subtract the code of '0' from the code of '9'. The result is 9, because the code for '9' is 9 positions after the code for '0'. In ASCII the ranges 'a'-'z' and 'A'-'Z' are contiguous as well, although the standard does not guarantee that for every encoding (in EBCDIC they are not). This matters if your input is in a base higher than 10, such as the popular base 16, called hexadecimal.
A pointer stores an address, so the most basic functionality for it is to "point" to a variable. But it can be used in various ways, one of which, very frequent in C, is to store address of the beginning of an array of variables of the same type. Those could be chars. We could interpret such an array as a line of text, or a string (a concept, not to be confused with C++ specific string class).
Since a pointer does not contain information on the length or end of such an array, we need to get that information across to the function we pass the pointer to. Sometimes we can just provide the length, sometimes we provide the end pointer. When dealing with "lines of text" or C-style strings, we use (and C standard library functions expect) what is called a null-terminated string. In such a string, the first char after the last one used for a line is a null character, which is, to simplify, basically a 0. A 0, but not a '0'.
So what you're passing to the function, and what you interpret as, say, 416, is actually a pointer to a place in memory where '4' is encoded and stored as a number, followed by '1' and then '6', taking up three bytes. And depending on how you obtained this line of text, '6' is probably followed by a null character, that is, a zero.
NULL - cppreference.com
Conversion of such a string to a number first requires a data type able to hold it. In case of 416 it could be anything from short upwards. If you wanted to do that on your own, you would need to iterate over entire line of text and add the numbers multiplied by proper powers of 10, take care of signedness too and maybe check if there are any edge cases. You could however use a standard function like int atoi (const char * str);
atoi - cplusplus.com
Now, that would be nice of course, but you're trying to work with "bigints". However you define them, it means your class's purpose is to deal with numbers too big to be stored in built-in types. So there is no way you can convert them just like that.
What you're trying to write right now seems to be a constructor that creates a bigint out of a number represented as a C-style string. You want to store your bigint internally as an array of its digits in base 10 (a good choice for code simplicity, readability and maintainability, as well as interoperation with the base 10 textual representation, but it doesn't make efficient use of memory and processing power), and your input is also an array of digits in base 10, except that internally you're storing numbers as numbers, while your input is encoded characters. You need to:
sanitize the input: you need criteria for what kind of input is acceptable, e.g. whether there can be any leading or trailing whitespace, whether the number can be followed by non-numerical characters to be discarded, how to represent signedness, whether + for positive numbers is optional or forbidden, etc., and throw an exception if the input is invalid.
convert whatever standard you enforce for your input into whatever uniform standard you employ internally, e.g. strip leading whitespace, remove the + sign if it's optional and you don't use it internally, etc.
when you know which positions in your internal array correspond to which positions in the input string, you can iterate over it and copy every digit, decoding it first from its character code.
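The three steps above can be sketched as a small free function. This is only an illustration under assumptions that are not in the original post: the helper name fillDigits is hypothetical, the accepted input is restricted to plain decimal digits with no sign or whitespace, leading zeros are dropped, and the digits are stored least significant first. It avoids strlen and the standard library, as the question asked.

```cpp
// Hypothetical helper (not part of the original class): fills `digits` with the
// decimal digits of `text`, least significant first, and returns how many
// slots were used, or -1 if the input is invalid or does not fit.
int fillDigits(const char* text, int digits[], int maxDigits) {
    if (text == nullptr || *text == '\0')
        return -1;

    // Skip leading zeros, keeping one zero for inputs such as "0" or "000".
    while (*text == '0' && text[1] != '\0')
        ++text;

    // Find the end of the string by hand and validate every character.
    int length = 0;
    while (text[length] != '\0') {
        if (text[length] < '0' || text[length] > '9')
            return -1;
        ++length;
    }
    if (length > maxDigits)
        return -1;

    // Walk backwards so digits[0] holds the ones place.
    for (int i = 0; i < length; ++i)
        digits[i] = text[length - 1 - i] - '0';
    return length;
}
```

Returning the number of used slots addresses the problem of not knowing where the stored number ends; a constructor could keep that count in an extra member.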
A side note - I can't be sure as to what exactly it is that you expect your input to be, because it's only likely that it is a textual representation - as it could just as easily be an array of unencoded chars. Of course it's obviously the former, which I know because of your post, but the function prototype (the line with return type and argument types) does not assure anyone about that. Just another thing to be aware of.
Hope this answer helped you understand what is happening there.
PS. I cannot emphasize strongly enough that the biggest problem with your code is that even if this line worked:
int number1 = number - '0'; // error code
You'd be trying to store a number on the order of 10^50 into a variable capable of holding on the order of 10^9
The crucial part in this problem, which I have a vague feeling you may have found on spoj.com is that you're handling BIGints. Integers too big to be stored in a trivial manner.
1) The standard does not directly require char to be this size, but indirectly it requires it to be at least 8 bits, possibly more on obscure platforms. And yes, I think there were some platforms where it was indeed more than 8 bits. The same goes for pointers, which may behave strangely on obscure architectures.

Character array and its memory allocation in C++

I am a bit confused after reading a textbook. Consider a character array ar[10] in C++. The textbook says that 10 bytes will be allocated for the array.
Starting from subscript ar[0], how many elements can I store in the given array? Is it 10? If yes can I store data at ar[10]? I want to know how many bytes will be allocated for the array in total since I came to know that every string ends with \0. Will overflow happen if I try to store a character into ar[10]?
If yes can I store data at ar[10]
No.
In your example, ar is an array with ten values. The first value is index #0, so you have ar[0] through ar[9], inclusively. That's the ten values in this array. Count them. Most of us conveniently have exactly ten fingers. Start counting on your fingers, starting with ar[0], and stop when you've used all your ten fingers. You'll stop on ar[9].
Attempting to access ar[10] is undefined behavior.
It will store 10 items in total, including the '\0'. So, 9 characters, and one '\0' null terminator at ar[9].
You can store ten values, from index 0 to index 9. This seems really wrong at first, but remember that 0 is technically a value and must be counted as one. It's sort of like how unsigned ints will hold 2^32 values, but the highest usable number is actually (2^32)-1.
Note that if you want to have the array be null-terminated you will only be able to store 9 characters, as ar[9] will hold '\0'. You could store another character there instead, but will have to write your code around the fact that your C-string is not null-terminated.
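As a rough illustration of the "nine characters plus terminator" rule, the sketch below copies a C-string into a 10-byte array and truncates it so that the '\0' always lands at index 9 or earlier. The function name copyIntoTen is made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// Copies `src` into a 10-byte buffer, truncating so that one byte
// is always left for the '\0' terminator.
// Returns the number of characters stored, excluding the terminator.
std::size_t copyIntoTen(char (&ar)[10], const char* src) {
    std::size_t n = std::strlen(src);
    if (n > sizeof(ar) - 1)
        n = sizeof(ar) - 1;      // at most 9 characters fit
    std::memcpy(ar, src, n);
    ar[n] = '\0';                // n <= 9, so this never touches ar[10]
    return n;
}
```

Here sizeof(ar) is 10, the valid indexes are 0 through 9, and the longest string that fits has 9 visible characters.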
That all said, it's generally considered bad practice to use character arrays for strings in C++. It's a lot more error-prone than just using std::string from the standard library.
More info: http://www.cplusplus.com/reference/string/string/
You have declared a[10], so it holds 10 values. Since it is a char array that contains a string, and a string is terminated by '\0', remember that '\0' is also a value.
So if your string length is n, your array size must be n+1 to hold it. Otherwise, overflow will occur.
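Observe the following example
#include <stdio.h>
int main(){
char a[1], r, t;
printf("Size %zu Byte\n", sizeof(a)); // prints 1
a[0] ='a';
a[1] ='b'; // out of bounds: undefined behavior
a[2] ='c'; // out of bounds: undefined behavior
printf("%c\n",r); // printed c on my machine, not guaranteed
printf("%c\n",t); // printed b on my machine, not guaranteed
}
The array size is 1, so the writes to a[1] and a[2] go past the end of the array. On my machine they happened to land in the memory of t and r, which is why those variables appear to be assigned even though the code never assigns them. This is undefined behavior: another compiler or optimization level may print something else or crash.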

Is it possible to Store 0 as the first digit in integer variable

I want to assign a value of 097 to an integer variable. I don't want it to get implicitly converted to 97. Is this possible?
int i=097;
cout<<i;
OUTPUT as 097 : Possible?
I need to put the value in a linked list in reverse order. So if the user inputs 097, I need to parse it digit by digit and store it in the linked list as 7->9->0. It's not the exact program, but it's something I am trying to achieve. There can be other ways, like using arrays. But I was just wondering if I can keep the leading 0 by using an int variable.
No, this is not possible.
Integers are stored in binary, not as individual digits. Therefore, all information not related to the value of the integer is not stored.
Perhaps you would like to store your value in a string instead?
It doesn't make sense to say that the value of an int is 97 or 097. What you want is a way to format your output. To do that, you can use std::setw and std::setfill from <iomanip>.
cout << setw(3) << setfill('0') << i;
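For reference, here is a small self-contained sketch of the same formatting idea. Note that the standard manipulator is spelled std::setw. The helper name zeroPadded is invented for this example, and it assumes a non-negative value, since with a negative number the fill characters would be placed before the minus sign.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Formats a non-negative `value` right-aligned in `width` columns, padded with '0'.
std::string zeroPadded(int value, int width) {
    std::ostringstream out;
    out << std::setw(width) << std::setfill('0') << value;
    return out.str();
}
```

With this, zeroPadded(97, 3) yields "097", while a value that is already wider than the requested width is printed unchanged.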
Not with an integer variable. To achieve this, you could either use a string or some other method of tracking how many leading zeros the number should have.
Incidentally, a leading zero in a C++ integer literal turns it into an octal literal. This makes your program ill-formed, since it tries to use the non-octal digit 9 in an octal literal.
As others have suggested, try using a string to display the value. Use the to_string function to convert the integer to a string and just insert a leading zero at the beginning.
int i = 97;
std::string s = std::to_string(i);
s.insert(0,"0");
EDIT: You can then store the digits into the list by iterating through the string.
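A minimal sketch of that last step, assuming the input is kept as a string so the leading zero survives. The function name reversedDigits and the use of std::list as the linked list are choices made for this example only.

```cpp
#include <list>
#include <string>

// Reads the digits of `text` (e.g. "097") and returns them in reverse
// order, so "097" becomes 7 -> 9 -> 0. Non-digit characters are skipped.
std::list<int> reversedDigits(const std::string& text) {
    std::list<int> digits;
    for (char c : text)
        if (c >= '0' && c <= '9')
            digits.push_front(c - '0');  // push_front reverses the order
    return digits;
}
```

Because each digit is pushed to the front, the last character of the input ends up first in the list, which is the 7->9->0 order the question asks for.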

how compilers detect overflow in numbers while compiling?

Compilers deal with source code as strings, so in C++, for example, when one encounters a statement like unsigned char x = 150; it knows from the type limits that an unsigned char must be in the range 0 to 255.
My question is: while the number 150 is still a string, what algorithm does the compiler use to compare the digit sequence (150 in this case) against the type limits?
I made a simple algorithm to do that for type 'int' for decimal, octal, hexadecimal and little-endian binary, but I don't think compilers do anything like that to detect overflow in numbers.
The algorithm I made is coded in C++:
typedef signed char int8;
typedef signed int int32;
#define DEC 0
#define HEX 1
#define OCT 2
#define BIN 3
bool isOverflow(const char* value, int32 base)
{
// left-most digit for maximum and minimum number
static const char* max_numbers[4][2] =
{
// INT_MAX INT_MIN
{ "2147483647", "2147483648" }, // decimal
{ "7fffffff", "80000000" }, // hexadecimal
{ "17777777777", "20000000000" }, // octal
{ "01111111111111111111111111111111", "10000000000000000000000000000000" } // binary
};
// size of strings in max_numbers array
static const int32 number_sizes[] = { 10, 8, 11, 32 };
// input string size
int32 str_len = strlen(value);
// is sign mark exist in input string
int32 signExist = ((base == DEC || base == OCT) && *value == '-');
// first non zero digit in input number
int32 non_zero_index = signExist;
// locate first non zero index
while(non_zero_index < str_len && value[non_zero_index] == '0') non_zero_index++;
// if non_zero_index equal length then all digits are zero
if (non_zero_index == str_len) return false;
// get number of digits that actually represent the number
int32 diff = str_len - non_zero_index;
// if difference less than 10 digits then no overflow will happened
if (diff < number_sizes[base]) return false;
// if difference greater than 10 digits then overflow will happened
if (diff > number_sizes[base]) return true;
// left digit in input and search strings
int8 left1 = 0, left2 = 0;
// if digits equal to 10 then loop over digits from left to right and compare
for (int32 i = 0; non_zero_index < str_len; non_zero_index++, i++)
{
// get input digit
left1 = value[non_zero_index];
// get match digit
left2 = max_numbers[base][signExist][i];
// if digits not equal then if left1 is greater overflow will occurred, false otherwise
if (left1 != left2) return left1 > left2;
}
// overflow won't happened
return false;
}
This algorithm can be generalized to work with all integer types, but for floating point I would have to write a new one that works with the IEEE floating-point representation.
I think compilers use a more efficient algorithm than mine to detect overflow, don't you?
Compilers handle it pretty much the easiest possible way: they convert the number to an integer or float as appropriate. There's no law that says the compiler can't convert from strings to some other representation as appropriate.
But now, consider your original problem; what about if you took the digits and just built routines to treat them as numbers? Say, for example, an algorithm that could take
6 + 5
and compute the sum as a two-digit string 11? Extend that to other operations and you could compute whether 32769 is greater than 32768 directly.
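As a rough sketch of treating digit strings as numbers directly, the comparison mentioned above can be done without any conversion. The function name compareDecimal is hypothetical, and the sketch assumes both inputs are non-empty strings of decimal digits with no sign.

```cpp
#include <algorithm>
#include <string>

// Compares two non-negative decimal strings without converting them to integers.
// Returns -1 if a < b, 0 if they are equal, 1 if a > b.
int compareDecimal(std::string a, std::string b) {
    // Strip leading zeros but keep at least one digit.
    a.erase(0, std::min(a.find_first_not_of('0'), a.size() - 1));
    b.erase(0, std::min(b.find_first_not_of('0'), b.size() - 1));

    // After stripping, a longer number is always the larger one.
    if (a.size() != b.size())
        return a.size() < b.size() ? -1 : 1;

    // Same length: digit characters are ordered, so a lexicographic
    // comparison gives the numeric order.
    int c = a.compare(b);
    return (c > 0) - (c < 0);
}
```

This is essentially what the question's isOverflow function does against fixed limit strings: compare lengths first, then compare digit by digit from the left.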
It seems simplest for the compiler to convert the string representation into an integer in one step, and then compare against upper and lower bounds of the type in a secondary step.
I can't imagine why it would be better to compare strings.
For floats, the problem is harder due to precision and rounding.
I'm not sure what particular algorithms most compilers employ to do this, but here are a few options that could work:
The compiler could try using an existing library (for example, in C++, a stringstream) to try to convert the string into the number of the appropriate type. This could then be used to check for errors.
The compiler could convert the string into a very high-precision number format (for example, a 128-bit integer) and then check, whenever an assignment is made from a numeric literal to a primitive type, whether the value could fit in that range without a cast.
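The second option can be sketched as follows. This is not how any particular compiler is implemented; it only illustrates "convert to a wide type first, then range-check against the target type". The name literalFits is invented here, and the sketch assumes the target type T is an integer type narrower than long long.

```cpp
#include <cerrno>
#include <cstdlib>
#include <limits>

// Returns true if the decimal text denotes a value representable in T.
// Assumption: T is an integer type narrower than long long.
template <typename T>
bool literalFits(const char* text) {
    errno = 0;
    char* end = nullptr;
    long long value = std::strtoll(text, &end, 10);  // convert to the wide type
    if (end == text || *end != '\0') return false;    // not a complete number
    if (errno == ERANGE) return false;                // too big even for long long
    return value >= static_cast<long long>(std::numeric_limits<T>::min())
        && value <= static_cast<long long>(std::numeric_limits<T>::max());
}
```

For example, literalFits<unsigned char>("150") is true while literalFits<unsigned char>("256") is false, which is the check the question describes for unsigned char x = 150;.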
Seeing that compilers have to convert to the integral/numeric type anyway, they can just as well have their own atoi-, atol- and atof-style conversion routines raise an error when the destination capacity is exceeded.
There is no need to operate on strings beforehand, and convert in a separate step.
Most likely, I'd think, compilers will convert to integral types directly in their (highly optimized) parser's semantic actions.
In most compiler theory, the text of a program (translation unit) is converted into tokens and values. For example, the text "150" would be converted into a token of constant integer with a value of 150. This is of course, after the preprocessor has run.
The compiler then begins the process of syntax and semantic checking. So an assignment statement is evaluated for syntax (correct spelling and format), then checked for semantics.
The compiler can either complain about a value that is out of range (such as -150 for unsigned char) or apply some transformation. In the case of -150, the conversion to an 8-bit unsigned type is done modulo 256, so the variable ends up holding 106. I am not a language lawyer, so I don't know exactly how much freedom the compiler has in this respect, nor whether a warning is required.
In summary, the compiler has some freedoms when evaluating statements and checking the semantics. All text is converted into an internal representation for tokens and values (a more compact data structure). Checking for whether a constant integer literal is within range for an assignment statement takes place during the semantics stage of the compilation process. Semantics are decided from the language standard or company policy. Some semantics are turned into compiler options and left for the programmer.

Non-Integer numbers in an String and using atoi

If there are non-number characters in a string and you call atoi [I'm assuming wtoi will do the same]. How will atoi treat the string?
Lets say for an example I have the following strings:
"20234543"
"232B"
"B"
I'm sure that 1 will return the integer 20234543. What I'm curious about is whether 2 will return 232. [That's what I need to solve my problem.] Also, 3 should not return a value. Are these beliefs false? Also, if 2 does act as I believe, how does it handle an e character at the end of the string? [That's typically used in exponential notation.]
You can test this sort of thing yourself. I copied the code from the Cplusplus reference site. It looks like your intuition about the first two examples is correct, but the third example returns 0. 'E' and 'e' are also treated just like 'B' is in the second example.
So the rules are
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.
According to the standard, "The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined." (7.20.1, Numeric conversion functions in C99).
So, technically, anything could happen. Even for the first case, since INT_MAX is guaranteed to be at least 32767, and since 20234543 is greater than that, it could fail as well.
For better error checking, use strtol:
const char *s = "232B";
char *eptr;
long value = strtol(s, &eptr, 10); /* 10 is the base */
/* now, value is 232, eptr points to "B" */
s = "20234543";
value = strtol(s, &eptr, 10);
s = "123456789012345";
value = strtol(s, &eptr, 10);
/* If there was no overflow, value will contain 123456789012345,
otherwise, value will contain LONG_MAX and errno will be ERANGE */
If you need to parse numbers with "e" in them (exponential notation), then you should use strtod. Of course, such numbers are floating-point, and strtod returns double. If you want to make an integer out of it, you can do a conversion after checking for the correct range.
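The checks described above can be wrapped into one function. This is a sketch with a made-up name, parseInt, and one deliberate design choice: it rejects trailing characters such as the B in "232B". If you want the atoi-like behavior of accepting the numeric prefix, drop the check on *end.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper: parses a whole string as a base-10 int.
// Returns true and stores the result in `out` only if the entire
// string is a number that fits in int.
bool parseInt(const char* s, int& out) {
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(s, &end, 10);
    if (end == s) return false;          // no digits at all, e.g. "B"
    if (*end != '\0') return false;      // trailing characters, e.g. "232B"
    if (errno == ERANGE) return false;   // outside the range of long
    if (value < INT_MIN || value > INT_MAX) return false;  // fits long but not int
    out = static_cast<int>(value);
    return true;
}
```

Unlike atoi, this lets the caller tell a genuine 0 apart from a failed conversion, and it never relies on undefined behavior for out-of-range input.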
atoi reads digits from the buffer until it can't any more. It stops when it encounters any character that isn't a digit, except whitespace (which it skips) or a '+' or a '-' before it has seen any digits (which it uses to select the appropriate sign for the result). It returns 0 if it saw no digits.
So to answer your specific questions: 1 returns 20234543. 2 returns 232. 3 returns 0. The character 'e' is not whitespace, a digit, '+' or '-' so atoi stops and returns if it encounters that character.
See also here.
If atoi encounters a non-number character, it returns the number formed up until that point.
I tried using atoi() in a project, but it wouldn't work if there were any non-digit characters in the mix and they came before the digit characters - it'll return zero. It seems to not mind if they come after the digits, for whatever reason.
Here's a pretty bare bones string to int converter I wrote up that doesn't seem to have that problem (bare bones in that it doesn't work with negative numbers and it doesn't incorporate any error handling, but it might be helpful in specific instances). Hopefully it might be helpful.
#include <string>
int stringToInt(const std::string& newIntString)
{
unsigned int dataElement = 0;
for (char c : newIntString)
{
// skip anything that is not a decimal digit
if (c >= '0' && c <= '9')
{
// shift the digits collected so far one place left and append the new one;
// this avoids pow() and its floating-point rounding
dataElement = dataElement * 10 + static_cast<unsigned int>(c - '0');
}
}
return static_cast<int>(dataElement);
}
I ran into this atoi behaviour while learning to program: I was writing a program that computes a factorial, taking the integer argument from the command line.
atoi returns 0 if the value is not numeric, and "3asdf" returns 3. In C, command-line arguments arrive as an array of char pointers, as we all know.
I was told that the book "Linux Hater's Handbook" discusses why programmers dislike atoi: it is considered a poor function because there is no way to check whether the input was valid.
Someone asked me why I don't bother to use the strtol function from stdlib.h, and gave me an example that fit my recursive factorial function. However, I don't care that the factorial result can exceed the range of int when the base number is too large; in my program that simply produces negative values.
I solved my atoi problem by first checking that the user's input really is numeric, and only then calculating the factorial.
The check uses the isdigit() function from ctype.h:
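#include <ctype.h>
int checkInput(const char *str) {
// returns 1 if any character is not a digit, 0 if all of them are digits
for (int x = 0; str[x] != '\0'; ++x)
{
// the cast avoids undefined behavior for negative char values
if (!isdigit((unsigned char)str[x])) return 1;
}
return 0;
}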
A friend on another Linux programming forum told me that with strtol I could handle out-of-range values, or even parse into an unsigned long so that -0 and other negative values are not accepted.
In the code above it is important that the test is negated, i.e. that it checks whether a character is not a digit. Without the negation the function would report failure as soon as it reached the first digit in the string (or char array, in C).
Writing simple code and looking to see what it does is magical and illuminating.
On point #3, it won't return "nothing." It can't. It'll return something, but that something won't be useful to you.
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.