Enumerating digits - c++

For my book class I'm storing an ISBN number and I need to validate the data entered, so I decided to use enumeration. (First three inputs must be single digits, last one must be a letter or digit.) However, I'm wondering if it is even possible to enumerate numbers. I've tried putting them in as regular integers, string style with double quotes, and char style with single quotes.
Example:
class Book{
public:
enum ISBN_begin{
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
};
enum ISBN_last{
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', a, b, c, d, e, f,
g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z};
The compiler error says "expected identifier", which means that it's seeing the numbers as values for the identifiers and not as identifiers themselves. So is enumerating digits possible?

I think you're going about this the wrong way...why not just use a simple regex that will validate the entire thing a bit more simply?
(yes, I know, that wasn't the original question, but it might make your life a lot easier.)
This page and this page provide some good examples on using regex to validate isbn numbers.
Creating an enumeration whose values are equal to the entities they're enumerating means doing a lot more than you have to.

Why would you want to enumerate numbers? Enums exist to give numbers a name. If you want to store a number, store an integer - or in this case a char (as you also need characters to be stored). For validation, accept a string and write a function like this:
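#include <string>

// expectedLength is whatever length your ISBN format requires.
bool ISBN_Validate(const std::string& val, std::size_t expectedLength)
{
    if (val.length() != expectedLength) return false;
    // the first character must be a digit
    if (val[0] < '0' || val[0] > '9') return false;
    for (char ch : val)
    {
        // crude range check; assumes an ASCII-like character set
        if (ch < '0' || ch > 'z') return false;
    }
    return true;
}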
Easy - and no silly enumerations ;)

#include <ctype.h>
Don't forget the basics. The above include file gives you isalpha(), isdigit(), etc.
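As a minimal sketch of how those classification functions could do the validation: the helper name looks_like_isbn and the four-character layout (three digits, then one letter or digit, no separators) are assumptions based on the question, not a real ISBN format.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: accepts exactly four characters,
// three digits followed by one letter or digit (assumed layout).
bool looks_like_isbn(const std::string& s)
{
    if (s.size() != 4) return false;
    for (std::size_t i = 0; i < 3; ++i) {
        // Cast to unsigned char first: passing a negative char to isdigit is undefined.
        if (!std::isdigit(static_cast<unsigned char>(s[i]))) return false;
    }
    return std::isalnum(static_cast<unsigned char>(s[3])) != 0;
}
```

Calling looks_like_isbn("123x") accepts the input, while "12a4" is rejected because its third character is not a digit.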

I would suggest using a string for each of the begin/end criteria, i.e.:
string BeginCriteria = "0123456789";
string EndCriteria = "0123456789abcd... so forth";
// Now to validate the input
// for the first three input characters:
if ( BeginCriteria.find( chInput ) != string::npos )
// Then it's good.
// For the last character:
if ( EndCriteria.find( chInput ) != string::npos )
// Then it's good.

enums really aren't what you want to use. They aren't sets like that. The members of an enum have to be symbols like variables or function names and you can give them values.
enum numbers { One = 1, Two, Three };
One after this is equivalent to a named constant with integer value 1. numbers is equivalent to a new type with a subrange of integer values.
What you probably want is to use a regular expression.
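A rough sketch of the regular-expression route with std::regex (C++11). The pattern assumes the four-character layout from the question (three digits, then one letter or digit, no separators), and matches_isbn_pattern is a made-up name.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole string is three digits
// followed by one ASCII letter or digit (assumed layout).
bool matches_isbn_pattern(const std::string& s)
{
    static const std::regex pattern("[0-9]{3}[0-9A-Za-z]");
    // regex_match only succeeds if the entire string matches the pattern.
    return std::regex_match(s, pattern);
}
```

Because regex_match must consume the whole input, no ^ or $ anchors are needed.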

You would have to do something like this:
enum ISBN_begin {
ZERO, ONE, TWO // etc.
};

If you can't use regular expressions, why not just use an array of char instead? You could even use the same array, and just have a const index number where the ISBN_last chars begin in the array.
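One way the single-array idea might look. All names are hypothetical, and lowercase-only letters are an assumption carried over from the question's enum.

```cpp
#include <algorithm>
#include <cstddef>

// One shared table: the digits come first, the letters follow.
const char kIsbnChars[] = "0123456789abcdefghijklmnopqrstuvwxyz";
const std::size_t kDigitCount = 10;                      // digits occupy [0, 10)
const std::size_t kTotalCount = sizeof(kIsbnChars) - 1;  // exclude the '\0'

// True if c appears among the first `count` characters of the table.
bool in_table(char c, std::size_t count)
{
    const char* end = kIsbnChars + count;
    return std::find(kIsbnChars, end, c) != end;
}

bool is_begin_char(char c) { return in_table(c, kDigitCount); }
bool is_last_char(char c)  { return in_table(c, kTotalCount); }
```

The const index only bounds how much of the table is searched, so no second array is needed.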

enums are not defined with literals; their members must be identifiers

Some languages (Ada for one) allow what you want, so your request is not too silly. You are simply forgetting that in C and C++, character literals are just another form of integer literals (of type int in C, of type char in C++).
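Since a character literal is just an integer value, it can be used as the value of an enumerator even though it cannot be its name. A small sketch, with enumerator names invented for illustration:

```cpp
// Enumerator names must be identifiers, but their values may be character literals.
enum IsbnDigit {
    Digit0 = '0',
    Digit1,        // '0' + 1; the standard guarantees the decimal digits are contiguous
    Digit2,
    Digit9 = '9'
};

static_assert(Digit1 == '1', "decimal digits are contiguous");
static_assert(Digit2 == '2', "decimal digits are contiguous");
```

This still does not validate anything by itself: any int can be cast to the enum type, so a range check is needed either way.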

Related

How do I check whether character constants conform to ASCII?

A comment on an earlier version of this answer of mine alerted me to the fact that I can't assume that 'A', 'B', 'C' etc. have successive numeric values. I had sort of assumed the C or C++ language standards guarantee that this is the case.
So, how should I determine whether consecutive letter characters' values are themselves consecutive? Or rather, how can I determine whether the character constants I can express within single quotes have their ASCII codes for a numeric value?
I'm asking how to do this both in C and in C++. Obviously the C way would work in C++ also, but if there's a C++ish facility for doing this I'm interested in that as well. Also, I'm asking about the newest relevant standards (C11, C++17).
You can use the preprocessor to check if a particular character maps to the charset:
#include <iostream>
using namespace std;
int main() {
#if ('A' == 65 && 'Z' - 'A' == 25)
std::cout << "ASCII" << std::endl;
#else
std::cout << "Other charset" << std::endl;
#endif
return 0;
}
The drawback is, you need to know the mapped values in advance.
The numeric chars '0' - '9' are guaranteed to appear in consecutive order BTW.
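That guarantee about the digits is what makes the usual c - '0' conversion portable. A minimal sketch (digit_value is a made-up name):

```cpp
// Returns 0..9 for '0'..'9', or -1 for any other character.
// Relies only on the guarantee that the decimal digits are contiguous.
constexpr int digit_value(char c)
{
    return (c >= '0' && c <= '9') ? c - '0' : -1;
}

static_assert(digit_value('7') == 7, "digits are contiguous");
```

No comparable guarantee exists for letters, which is why the same trick with 'A' is not portable.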
... (2) I expect to be able to obtain the distance in number-of-letters between two letters ...
This comment specifying your goal makes much more sense than your actual question! Why didn't you ask about that? You can use strchr on an array of characters, and strchr doesn't care what the native character set is, meaning your code won't care what the native character set is... For example:
char alphabet[] = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz";
ptrdiff_t fubar = strchr(alphabet, 'y') - strchr(alphabet, 'X');
printf("'X' and 'y' have a distance of %td and a case difference of %td\n", fubar / 2, fubar % 2);
... how should I determine whether consecutive letter characters' values are themselves consecutive?
Consecutive letter characters' values are consecutive, by definition, because they're consecutive letter characters. I know this isn't what you meant, but your actual question illustrates a lack of planning and thought, and... a stupid question warrants a stupid answer.
You're much better off programming in such a way that you don't care what values they have. Nonetheless, create an array containing the characters you care about, loop through the elements and test for inconsistencies. For example:
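int is_consecutive(char const *alphabet) {
    size_t x = 0;
    if (!alphabet[0]) return 1;  /* an empty list is trivially consecutive */
    /* advance while the next element exists and is exactly one greater */
    while (alphabet[x + 1] && alphabet[x] + 1 == alphabet[x + 1]) x++;
    return !alphabet[x + 1];     /* true only if we reached the terminator */
}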
... how can I determine whether the character constants I can express within single quotes have their ASCII codes for a numeric value?
Again with the lack of sense, and again with the caring about values... Alternatively, build two translation tables, native_to_ascii and ascii_to_native, and work it out from there. I won't help you with this, as it's a silly exercise involving the use of magic numbers that most likely aren't necessary for your actual goal.

The meaning of char *p_c = new char['1', '2', '3', '4'];

Consider
char *p_c = new char['1', '2', '3', '4'];
Is this syntax correct? If yes, then what does it do?
I don’t know why, but compiler allows this syntax! What will it do with regards to memory? I am not able to access the variable by *p_c. How does one determine the size of and the number of elements present?
Your code is syntactically valid C++, if rather strange, and I don't think it does what you intended:
new char['1', '2', '3', '4'] is evaluated as new char['4'] due to the way the comma operator works. (The preceding elements are evaluated from left to right, but the value of the expression is that of the rightmost element.)
So your statement is equivalent to char *p_c = new char['4'];
'4' is a char type with a numeric value that depends on the encoding that your platform uses (ASCII, EBCDIC &c. although the former is most likely on a desktop system.).
So the number of elements in the array is whatever '4' evaluates to when converted to a size_t. On an ASCII system the number of elements would be 52.
The syntax for the new expression you used is something like:
identifier = new Type[<expression>];
In the <expression> above, C++ allows any expression whose result is convertible to std::size_t. And for your own expression, you used the comma operator.
<expression> := '1', '2', '3', '4'
which evaluates every operand in the comma list and yields the last one, '4'. That result is converted to its std::size_t value (probably 52), so the code is equivalent to:
char* p_c = new char['4'];
char *p_c = new char['1', '2', '3', '4'];
is functionally equivalent to:
char *p_c = new char['4'];
because of the comma operator. The comma operator evaluates its operands left to right and discards all of them except the last one (the rightmost one).
The character literal '4' has value 52 in ASCII (but your system doesn't have to use ASCII and neither is it required by C or C++ standards-- but almost all modern systems do use ASCII).
So, it's as if you used:
char *p_c = new char[52];
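The comma-operator behaviour can be checked at compile time. This is only a sketch with made-up names; compilers will typically warn that the left-hand operands have no effect.

```cpp
#include <cstddef>

// The comma operator yields its rightmost operand, so this is just '4'.
constexpr char last_operand = ('1', '2', '3', '4');

// The element count that new char['1', '2', '3', '4'] would request.
constexpr std::size_t element_count = static_cast<std::size_t>(last_operand);

static_assert(last_operand == '4', "comma expression yields its last operand");
```

On an ASCII system element_count is 52; on another encoding it is whatever code '4' has there.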

Convert a single character to lowercase in C++ - tolower is returning an integer

I'm trying to convert a string to lowercase, and am treating it as a char* and iterating through each index. The problem is that the tolower function I read about online is not actually converting a char to lowercase: it's taking char as input and returning an integer.
cout << tolower('T') << endl;
prints 116 to the console when it should be printing t.
Is there a better way for me to convert a string to lowercase?
I've looked around online, and most sources say to "use tolower and iterate through the char array", which doesn't seem to be working for me.
So my two questions are:
What am I doing wrong with the tolower function that's making it return 116 instead of 't' when I call tolower('T')?
Are there better ways to convert a string to lowercase in C++ other than using tolower on each individual character?
That's because there are two different tolower functions. The one that you're using is this one, which returns an int. That's why it's printing 116. That's the ASCII value of 't'. If you want to print a char, you can just cast it back to a char.
Alternatively, you could use this one, which actually returns the type you would expect it to return:
std::cout << std::tolower('T', std::locale()); // prints t
In response to your second question:
Are there better ways to convert a string to lowercase in C++ other than using tolower on each individual character?
Nope.
116 is indeed the correct value. This is simply an issue of how std::cout handles integers; use char(tolower(c)) to achieve your desired result:
std::cout << char(tolower('T')); // print it like this
It's even weirder than that - it takes an int and returns an int. See http://en.cppreference.com/w/cpp/string/byte/tolower.
You need to ensure the value you pass it is representable as an unsigned char - no negative values allowed, even if char is signed.
So you might end up with something like this:
char c = static_cast<char>(tolower(static_cast<unsigned char>('T')));
Ugly isn't it? But in any case converting one character at a time is very limiting. Try converting 'ß' to upper case, for example.
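For a whole string, the same cast can be wrapped in a lambda and applied with std::transform. A minimal sketch: to_lower_copy is a made-up name, and it only handles single-byte characters under the current C locale, so it does not address cases like 'ß'.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a lowercased copy of s.
std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   // Taking unsigned char keeps the argument to tolower non-negative.
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}
```

Taking the parameter by value lets the function modify and return its own copy, leaving the caller's string untouched.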
tolower takes an int, so it returns an int. If you check <ctype.h> (<cctype> in C++) you will see that the declaration is int tolower ( int c ); You can use a loop to go through the string and change every single char to lower case. For example:
while (str[i]) // going through the string
{
    c=str[i]; // giving c the value of the current char in the string
    putchar (tolower(c)); // printing it in lower case
    i++; // incrementing
}
The documentation of int tolower(int ch) mandates that ch must either be representable as an unsigned char or must be equal to EOF (which is usually -1, but don't rely on that).
It's not uncommon for character manipulation functions that have been inherited from the C standard library to work in terms of ints. There are two reasons for this:
In the early days of C, all arguments were promoted to int (function prototypes did not exist).
For consistency these functions need to handle the EOF case, which for obvious reasons cannot be a value representable by a char, since that would mean we'd have to lose one of the legitimate encodings for a character.
http://en.cppreference.com/w/cpp/string/byte/tolower
The answer is to cast the result to a char before printing.
e.g.:
std::cout << static_cast<char>(std::tolower('A'));
On an ASCII system, you can convert an uppercase letter to lowercase by adding 32, as this number is the ASCII code difference between lowercase and uppercase letters, e.g., 'a'-'A'=97-65=32.
char c = 'B';
c += 32; // c is now 'b'
printf("c=%c\n", c);
Another easy way would be to first map the uppercase character to an offset within the English alphabet, 0-25 (i.e. 'A' maps to index 0 and 'Z' to index 25 inclusive), and then remap that offset to a lowercase character. This assumes the letters are contiguous, as they are in ASCII.
char c = 'B';
c = c - 'A' + 'a'; // c is now 'b'
printf("c=%c\n", c);

Why do "strings", i.e. character arrays, have a null-terminating element, whereas integer arrays don't?

From what I understand, character arrays in C/C++ have a null-terminating character for the purpose of denoting an off-the-end element of that array, while integer arrays don't; they have some internal mechanism that is hidden from the user, but they obviously know their own size since the user can do sizeof(myArray)/sizeof(int) (Is that technically a hack?). Wouldn't it make sense for an integer array to have some null-terminating int -- call it i or something?
Why is this? It has never made any sense to me.
Because, in C, strings are not the same as character arrays, they exist at a level above arrays in much the same way as a linked list exists at a level above structures.
This is an example of a string:
"pax is great"
This is an example of a character array:
{ 'p', 'a', 'x' }
This is an example of a character array that just happens to be equivalent to a string:
{ 'p', 'a', 'x', '\0' }
In other words, C strings are built on top of character arrays.
If you look at it another way, neither integer arrays nor "real" character arrays (like {'a', 'b', 'c'} for example) have a terminating character.
You can quite easily do the same thing (have a terminator) with an integer array of people's ages, using -1 (or any negative number) as the terminator.
The only difference is that you'll write your own code to handle it rather than using code helpfully provided in the C standard library, things like:
size_t agelen (int *ages) {
size_t len = 0;
while (*ages++ >= 0)
len++;
return len;
}
int *agecpy (int *src, int *dst) {
    int *d = dst;
    while (*src >= 0)   /* copy until the negative terminator */
        *d++ = *src++;
    *d = -1;            /* terminate the copy */
    return dst;
}
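To make the contrast with sizeof concrete, here is a small self-contained sketch. sentinel_length is a made-up name, and using a negative value as the terminator is the same assumption as in the age example.

```cpp
#include <cstddef>

// Counts the elements before the first negative value (the assumed terminator).
// Works through a plain pointer, where sizeof can no longer recover the array size.
std::size_t sentinel_length(const int* values)
{
    std::size_t len = 0;
    while (values[len] >= 0) ++len;
    return len;
}
```

Given int ages[] = {31, 4, 67, -1};, sizeof(ages)/sizeof(ages[0]) is 4 where the array type is visible, while sentinel_length(ages) is 3 and keeps working after the array has decayed to a pointer.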
Because strings do not exist as a built-in type in C.
Because the null terminator is there to mark the end of the input and it doesn't have to be the length of the given array.
This is by convention, treating null as a non-character. Other major system software languages of the time, e.g. PL/1, had a leading integer to denote the length of a variable-length character string. C was designed to treat strings as simply character arrays and did not want the overhead, and in particular any portability issues (such as sizeof int) or any limitations (what about very long strings?). The convention has stuck because it worked out rather well.
To denote end of an int array as you have suggested would require a non-Int marker. That could be rather difficult to arrange. And sizeof an int array as you are figuring out is merely taking advantage of your knowledge of *alloc - there is absolutely nothing in C to prevent you from cobbling together an "array" by clever management of allocated memory. Modern compilers of course contain many convenience checks on wayward code and someone with better knowledge of compilers could clarify/rectify my comments here. C++ Vector contains an explicit knowledge of array capacity, for example.
In a lot of places you can see a different field separator (FS) character used to separate out strings, e.g., CSV. But if you were to do that, you would need to write your own standard libraries: thousands and thousands of lines of good, tested code.
A C-Style string is a collection of characters terminated by '\0'. It is not an array.
The collection can be indexed like an array.
Because the length of the collection can vary, the length must be determined by counting the number of characters in the collection.
A convenient representation is an array because an array is also a collection.
One difference is that an array is a fixed sized data structure. The collection of characters may not be a fixed size; for example, it can be concatenated.
If you think about the problem of how to represent strings, you have two choices: 1) store a count of letters followed by the letters or 2) store the letters followed by some unique special character used as an end of string marker.
End of string marker is more flexible - longer strings possible, easier to use, etc.
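The two choices might be sketched like this. The CountedString layout is purely hypothetical and only illustrates the "count plus letters" option.

```cpp
#include <cstddef>
#include <cstring>

// Option 1 (hypothetical layout): an explicit count stored alongside the characters.
struct CountedString {
    std::size_t length;
    const char* data;   // does not need a terminator
};

// Option 2: the C convention, where the length is found by scanning for '\0'.
inline std::size_t terminated_length(const char* s)
{
    return std::strlen(s);
}
```

One trade-off shows up immediately: a counted string can contain '\0' as ordinary data, while terminated_length("pa\0x") stops at the embedded terminator and reports 2.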
BTW you can have a terminator on an int array if you want... Nothing is stopping you from saying that a -1, for example, means the end of the list, as long as you are sure that the -1 is unique.

C/C++: Inherent ambiguity of "\xNNN" format in literal strings

Consider these two strings:
wchar_t* x = L"xy\x588xla";
wchar_t* y = L"xy\x588bla";
Upon reading this you would expect that both string literals are the same except one character - an 'x' instead of a 'b'.
It turns out that this is not the case. The first string compiles to:
x = {'x', 'y', 0x588, 'x', 'l', 'a' }
and the second is actually:
y = {'x', 'y', 0x588b, 'l', 'a' }
They are not even the same length!
Yes, the 'b' is eaten up by the hex representation ('\xNNN') character.
At the very least, this could cause confusion and subtle bugs in hand-written strings (you could argue that unicode strings don't belong in the code body).
But the more serious problem, and the one I am facing, is in auto-generated code. There just doesn't seem to be any way to express this: {'x', 'y', 0x588, 'b', 'l', 'a' } as a literal string without resorting to writing the entire string in hex representation, which is wasteful and unreadable.
Any idea of a way around this?
What's the sense in the language behaving like this?
A simple way is to use compile time string literal concatenation, thus:
wchar_t const* y = L"xy\x588" L"bla";
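The effect of splitting the literal can be verified by comparing array lengths. The variable names below are made up for this sketch.

```cpp
#include <cstddef>

const wchar_t merged[] = L"xy\x588bla";      // 'b' is absorbed into the hex escape
const wchar_t split[]  = L"xy\x588" L"bla";  // the escape ends at the literal boundary

// Lengths excluding the terminating L'\0'.
constexpr std::size_t merged_len = sizeof(merged) / sizeof(merged[0]) - 1;
constexpr std::size_t split_len  = sizeof(split) / sizeof(split[0]) - 1;

static_assert(merged_len == 5, "escape swallowed the 'b'");
static_assert(split_len == 6, "concatenation keeps the 'b'");
```

This works because escape sequences are converted in each literal separately, before adjacent literals are concatenated. That also makes it convenient for generated code: the generator can close and reopen the literal after every hex escape.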