Using isalnum with signed character inputs - Visual C++ - c++

I have a very simple program where I am using the isalnum function to check if a string contains alpha-numeric characters. The code is:
#include "stdafx.h"
#include <iostream>
#include <string>
#include <locale>
using namespace std;
int _tmain(int argc, _TCHAR* argv[]) {
string test = "(…….";
for ( unsigned int i = 0; i < test.length(); i++) {
if (isalnum(test[i])) {
cout << "True: " << test[i] << " " << (int)test[i] << endl;
}
else {
cout << "False: " << isalnum(test[i]) << test[i] << " " << (int)test[i] << endl;
}
}
return 0;
}
I am using Visual Studio Desktop Edition 2013 for this snippet.
The issue(s):
1. When this program is run in Debug mode, the program fails with a debug assertion that says: "Expression c >= -1 && c <= 255"
Printing the character at the ith position results in a negative integer (-123). Converting all calls to isalnum to accept unsigned char as input causes the above error to disappear.
I checked the documentation for isalnum and it accepts arguments of type char. Then why does this code snippet fail? I am sure I am missing something trivial here but any help is welcome.

The isalnum function is declared in <cctype> (the C++ version of <ctype.h>) -- which means you really should have #include <cctype> at the top of your source file. You're getting away with calling it without the #include directive because either "stdafx.h" or one of the standard headers (likely <locale>) includes it -- but it's a bad idea to depend on that.
isalnum and friends come from C. The isalnum function takes an argument of type int, which must be either within the range of unsigned char or equal to EOF (which is typically -1). If the argument has any other value, the behavior is undefined.
Annoyingly, this means that if plain char happens to be signed, passing a char value to isalnum causes undefined behavior if the value happens to be negative and not equal to EOF. The signedness of plain char is implementation-defined; it seems to be signed on most modern systems.
C++ adds a template function isalnum that takes an argument of any character type and a second argument of type std::locale. Its declaration is:
template <class charT> bool isalnum (charT c, const locale& loc);
I'm fairly sure that this version of isalnum doesn't suffer from the same problem as the one in <cctype>. You can pass it a char value and it will handle it correctly. You can also pass it an argument of some wide character type like wchar_t. But it requires two arguments. Since you're only passing one argument to isalnum(), you're not using this version; you're using the isalnum declared in <cctype>.
If you want to use this version, you can pass the default locale as the second argument:
std::isalnum(test[i], std::locale())
Or, if you're sure you're only working with narrow characters (type char), you can cast the argument to unsigned char:
std::isalnum(static_cast<unsigned char>(test[i]))
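To put the two options side by side, here is a minimal sketch. The helper names is_alnum_cctype and is_alnum_locale are made up for illustration, and the sketch assumes the classic "C" locale rather than whatever locale your program installs.

```cpp
#include <cctype>   // std::isalnum(int)
#include <locale>   // std::isalnum(charT, const std::locale&)

// Hypothetical helper: classify a plain char with the <cctype> function.
// The cast maps a negative char value into the range of unsigned char
// before it is converted to int, which keeps the call well defined.
bool is_alnum_cctype(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Hypothetical helper: the same check with the two-argument <locale>
// template, here using the classic "C" locale (an assumption).
bool is_alnum_locale(char c) {
    return std::isalnum(c, std::locale::classic());
}
```

Either helper can replace the bare isalnum(test[i]) call in the question; which one fits better probably depends on whether the program needs locale-aware classification.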

The problem is that char is signed by default in Visual C++, so any character above 0x7f becomes a negative number when it is passed to isalnum. Make this simple change:
if (isalnum((unsigned char)test[i])) {
Microsoft's documentation clearly states that the parameter is int, not char. I believe you're getting confused with a different version of isalnum that comes from the locale header. I don't know why the function doesn't accept sign-extended negative numbers, but suspect that it's based on wording in the standard.

Related

Sign & Unsigned Char is not working in C++

In C++ Primer 5th Edition I saw this
when I tried to use it---
It didn't work as I expected: the program printed a weird symbol, the signed one was completely blank, and the compiler gave some warnings. But C++ Primer and many websites say it should work, so I don't think they give wrong information. Did I do something wrong?
I am a newbie, by the way :)
But C++ primer ... said it should work
No it doesn't. The quote from C++ primer doesn't use std::cout at all. The output that you see doesn't contradict with what the book says.
So I don't think they give the wrong information
No [1].
did I do something wrong?
It seems that you've possibly misunderstood what the value of a character means, or possibly misunderstood how character streams work.
Character types are integer types (but not all integer types are character types). The values of unsigned char are 0..255 (on systems where the size of a byte is 8 bits). Each [2] of those values represents some textual symbol. The mapping from a set of values to a set of symbols is called a "character set" or "character encoding".
std::cout is a character stream. << is the stream insertion operator. When you insert a character into a stream, the behaviour is not to show the numerical value. Instead, the behaviour is to show the symbol that the value is mapped to [3] in the character set that your system uses. In this case, it appears that the value 255 is mapped to whatever strange symbol you saw on the screen.
If you wish to print the numerical value of a character, what you can do is convert to a non-character integer type and insert that to the character stream:
int i = c;
std::cout << i;
[1] At least, there's no wrong information regarding your confusion. The quote is a bit inaccurate and outdated in the case of c2. Before C++20, the value was "implementation defined" rather than "undefined". Since C++20, the value is actually defined, and it is 0, which is the null terminator character that signifies the end of a string. If you try to print this character, you'll see no output.
[2] This was a bit of a lie for simplicity's sake. Some characters are not visible symbols. For example, there is the null terminator character as well as other control characters. The situation becomes even more complex in the case of variable-width encodings of the ubiquitous Unicode, such as UTF-8, where symbols may consist of a sequence of several char. In such an encoding, an individual char cannot necessarily be interpreted correctly without the other char that are part of the sequence.
[3] And this behaviour should feel natural once you grok the purpose of character types. Consider the following program:
unsigned char c = 'a';
std::cout << c;
It would be highly confusing if the output would be a number that is the value of the character (such as 97 which may be the value of the symbol 'a' on the system) rather than the symbol 'a'.
For extra meditation, think about what this program might print (and feel free to try it out):
char c = 57;
std::cout << c << '\n';
int i = c;
std::cout << i << '\n';
c = '9';
std::cout << c << '\n';
i = c;
std::cout << i << '\n';
This is due to the behavior of the << operator on the char type and the character stream cout. Note that << is known as formatted output, which means it does some implicit formatting.
We can say that the value of a variable is not the same as its representation in certain contexts. For example:
int main() {
bool t = true;
std::cout << t << std::endl; // Prints 1, not "true"
}
Think of it this way: why would we need char if it still behaved like a number when printed? Why not just use int or unsigned? In essence, we have different types so that we get different behaviors, which can be deduced from these types.
So the underlying numeric value of a char is probably not what we are looking for when we print one.
Check this for example:
int main() {
unsigned char c = -1;
int i = c;
std::cout << i << std::endl; // Prints 255
}
If I recall correctly, you're somewhat close in the Primer to the topic of built-in type conversions; things will become clearer once you get to know these rules better. Anyway, I'm sure you will benefit greatly from looking into this article, especially the "Printing chars as integers via type casting" part.
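As a small sketch of the cast-before-printing idea discussed in these answers: the helper name describe is hypothetical, and the numeric values mentioned below assume an ASCII-compatible character set.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: format a character as its symbol and as its number.
std::string describe(unsigned char c) {
    std::ostringstream out;
    out << c;                           // inserted as a character: the symbol
    out << " = ";
    out << static_cast<int>(c);         // inserted as an int: the numeric value
    return out.str();
}
```

On an ASCII system, describe('a') should yield "a = 97". The same stream and the same value produce different text purely because of the type that is inserted.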

C++ toupper Syntax

I've just been introduced to toupper, and I'm a little confused by the syntax; it seems like it's repeating itself. What I've been using it for is for every character of a string, it converts the character into an uppercase character if possible.
for (int i = 0; i < string.length(); i++)
{
if (isalpha(string[i]))
{
if (islower(string[i]))
{
string[i] = toupper(string[i]);
}
}
}
Why do you have to list string[i] twice? Shouldn't this work?
toupper(string[i]); (I tried it, so I know it doesn't.)
toupper is a function that takes its argument by value. It could have been defined to take a reference to character and modify it in-place, but that would have made it more awkward to write code that just examines the upper-case variant of a character, as in this example:
// compare chars case-insensitively without modifying anything
if (std::toupper(*s1++) == std::toupper(*s2++))
...
In other words, toupper(c) doesn't change c for the same reasons that sin(x) doesn't change x.
To avoid repeating expressions like string[i] on the left and right side of the assignment, take a reference to a character and use it to read and write to the string:
for (size_t i = 0; i < string.length(); i++) {
char& c = string[i]; // reference to character inside string
c = std::toupper(c);
}
Using range-based for, the above can be written more briefly (and executed more efficiently) as:
for (auto& c: string)
c = std::toupper(c);
According to the documentation, the character is passed by value.
Because of that, the answer is no, it shouldn't.
The prototype of toupper is:
int toupper( int ch );
As you can see, the character is passed by value, transformed and returned by value.
If you don't assign the returned value to a variable, it is lost.
That's why in your example it is assigned back, so that it replaces the original character.
As many of the other answers already say, the argument to std::toupper is passed and the result returned by-value which makes sense because otherwise, you wouldn't be able to call, say std::toupper('a'). You cannot modify the literal 'a' in-place. It is also likely that you have your input in a read-only buffer and want to store the uppercase-output in another buffer. So the by-value approach is much more flexible.
What is redundant, on the other hand, is your checking for isalpha and islower. If the character is not a lower-case alphabetic character, toupper will leave it alone anyway so the logic reduces to this.
#include <cctype>
#include <iostream>
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
for (auto s = text; *s != '\0'; ++s)
*s = std::toupper(*s);
std::cout << text << '\n';
}
You could further eliminate the raw loop by using an algorithm, if you find this prettier.
#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator> // std::begin, std::cbegin, std::cend
int
main()
{
char text[] = "Please send me 400 $ worth of dark chocolate by Wednesday!";
std::transform(std::cbegin(text), std::cend(text), std::begin(text),
[](auto c){ return std::toupper(c); });
std::cout << text << '\n';
}
toupper takes an int by value and returns, as an int, the value of the corresponding uppercase character. Whenever a function does not take a pointer or reference as a parameter, the argument is passed by value: the parameter is a copy of the variable passed to the function, so changes made inside the function cannot be seen from outside. The way to catch the change is to save what the function returns, in this case the upper-cased character.
Note that there is a nasty gotcha in isalpha(), which is the following: the function only works correctly for inputs in the range 0-255 + EOF.
So what, you think.
Well, if your char type happens to be signed, and you pass a value greater than 127, this is considered a negative value, and thus the int passed to isalpha will also be negative (and thus outside the range of 0-255 + EOF).
In Visual Studio, this will crash your application. I have complained about this to Microsoft, on the grounds that a character classification function that is not safe for all inputs is basically pointless, but received an answer stating that this was entirely standards conforming and I should just write better code. Ok, fair enough, but nowhere else in the standard does anyone care about whether char is signed or unsigned. Only in the isxxx functions does it serve as a landmine that could easily make it through testing without anyone noticing.
The following code crashes a program built with Visual Studio 2015 (and, as far as I know, all earlier versions):
int x = toupper ('é');
So not only is the isalpha() in your code redundant, it is in fact actively harmful, as it will cause any strings that contain characters with values greater than 127 to crash your application.
See http://en.cppreference.com/w/cpp/string/byte/isalpha: "The behavior is undefined if the value of ch is not representable as unsigned char and is not equal to EOF."
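One way to sidestep the gotcha described above is to route every character through unsigned char before it reaches toupper. The following is only a sketch; the function name to_upper_in_place is hypothetical, and the result for bytes above 127 depends on the active C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: upper-case a std::string in place.
// The lambda takes unsigned char, so a negative char is converted into the
// 0..255 range before std::toupper sees it.
void to_upper_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
}
```

In the default "C" locale, bytes above 127 are left unchanged, so a string such as "caf\xE9" would become "CAF\xE9" without tripping the debug assertion.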

isdigit raises a debug assertion when entering £ and ¬

The below code works for every character I type in except for £ or ¬.
Why do I get a "debug assertion fail"?
#include <iostream>
#include <string>
#include <cctype>
using namespace std;
int main() {
string input;
while (1) {
cout << "Input number: ";
getline(cin, input);
if (!isdigit(input[0]))
cout << "not a digit\n";
}
}
The Microsoft docs say:
The C++ compiler treats variables of type char, signed char, and unsigned char as having different types. Variables of type char are promoted to int as if they are type signed char by default, unless the /J compilation option is used. In this case they are treated as type unsigned char and are promoted to int without sign extension.
And they also say:
The behavior of isdigit and _isdigit_l is undefined if c is not EOF or in the range 0 through 0xFF, inclusive. When a debug CRT library is used and c is not one of these values, the functions raise an assertion.
So char is signed by default, which means that those two characters, which are not ASCII, have negative values in your ANSI character set, and thus you get the assertion.
The Microsoft docs say:
The behavior of isdigit and _isdigit_l is undefined if c is not EOF or in the range 0 through 0xFF, inclusive. When a debug CRT library is used and c is not one of these values, the functions raise an assertion.
(I'm guessing Microsoft because of the comment about an "error window", but docs for other implementations place the same limit on argument values.)
EDIT: as Deduplicator observed, the error probably arises from default char being signed on this platform, so that you are passing negative values (different from EOF). std::string uses char, not wide characters, so my original conclusion cannot be correct.
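If only the ten decimal digits matter, one possible alternative is to skip <cctype> entirely and compare against '0' and '9', which C++ guarantees to be contiguous. The helper names below are hypothetical; this is a sketch, not a drop-in replacement for locale-aware classification.

```cpp
#include <string>

// Hypothetical helper: true only for the characters '0' through '9'.
// A plain comparison is well defined for every char value, negative or not.
bool is_ascii_digit(char c) {
    return c >= '0' && c <= '9';
}

// Hypothetical helper: the check from the question, made safe for empty
// lines and for characters such as the pound sign.
bool starts_with_digit(const std::string& input) {
    return !input.empty() && is_ascii_digit(input.front());
}
```

In the question's loop, the condition would then read if (!starts_with_digit(input)).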

Converting Const char * to Unsigned long int - strtoul

I am using the following code to convert a const char * to an unsigned long int, but the output is always 0. What am I doing wrong? Please let me know.
Here is my code:
#include <iostream>
#include <vector>
#include <stdlib.h>
using namespace std;
int main()
{
vector<string> tok;
tok.push_back("2");
const char *n = tok[0].c_str();
unsigned long int nc;
char *pEnd;
nc=strtoul(n,&pEnd,1);
//cout<<n<<endl;
cout<<nc<<endl; // it must output 2 !?
return 0;
}
Use base-10:
nc=strtoul(n,&pEnd,10);
or allow the base to be auto-detected:
nc=strtoul(n,&pEnd,0);
The third argument to strtoul is the base to be used and you had it as base-1.
You need to use:
nc=strtoul(n,&pEnd,10);
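You used base 1, which is not a valid base: strtoul accepts only 0 or a base from 2 to 36. With an invalid base, implementations typically perform no conversion and return 0.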
If you need info about integer bases you can read this
The C standard library function strtoul takes as its third argument the base/radix of the number system to be used in interpreting the char array pointed to by the first argument.
Where am I doing wrong?
nc=strtoul(n,&pEnd,1);
You're passing 1 as the base, which is not a valid value: the base must be 0 or lie between 2 and 36. With an invalid base, implementations typically perform no conversion and return 0 (POSIX also allows errno to be set to EINVAL), hence the output you see. If you need decimal interpretation, pass 10 instead of 1.
Alternatively, passing 0 lets the function auto-detect the base from the prefix: if the number starts with 0x or 0X it is taken as hexadecimal, if it starts with any other 0 it is interpreted as octal, and otherwise it is assumed to be decimal.
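The prefix rules for base 0 can be checked with a tiny wrapper. The name parse_auto is made up for this sketch; the only library call is std::strtoul itself.

```cpp
#include <cstdlib>

// Hypothetical helper: parse with base 0 so that the prefix selects the radix.
// A null end pointer is passed because the stop position is not needed here.
unsigned long parse_auto(const char* text) {
    return std::strtoul(text, nullptr, 0);
}
```

With this helper, "2" and "42" are read as decimal, "017" as octal (15), and "0x1F" as hexadecimal (31).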
Aside:
If you don't need to know the character up to which the conversion was considered, then passing a dummy second parameter is not required; you can pass NULL instead.
When you're using a C standard library function in a C++ program, it's recommended that you include the C++ version of the header; with the prefix c, without the suffix .h e.g. in your case, it'd be #include <cstdlib>
using namespace std; is considered bad practice
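When the end pointer is wanted after all, it can be used to reject partial conversions. The sketch below assumes base 10; ParseResult and parse_decimal are hypothetical names. Note that strtoul itself accepts leading whitespace and a sign, and this sketch does not filter those out.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical result type for this sketch.
struct ParseResult {
    bool ok;              // true if the whole string was converted
    unsigned long value;  // meaningful only when ok is true
};

// Hypothetical helper: convert text as base 10 and report failure when no
// digits were consumed, when trailing characters remain, or on overflow.
ParseResult parse_decimal(const char* text) {
    char* end = nullptr;
    errno = 0;
    const unsigned long parsed = std::strtoul(text, &end, 10);
    if (end == text)     return {false, 0};  // nothing was converted
    if (*end != '\0')    return {false, 0};  // trailing characters remain
    if (errno == ERANGE) return {false, 0};  // value does not fit
    return {true, parsed};
}
```

For the question's input, parse_decimal(tok[0].c_str()) would report ok with the value 2, while something like "12abc" would be rejected instead of silently yielding 12.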

In C++, I thought you could do "string times 2" = stringstring?

I'm trying to figure out how to print a string several times. I'm getting errors. I just tried the line:
cout<<"This is a string. "*2;
I expected the output: "This is a string. This is a string.", but I didn't get that. Is there anything wrong with this line? If not, here's the entire program:
#include <iostream>
using namespace std;
int main()
{
cout<<"This is a string. "*2;
cin.get();
return 0;
}
My compiler isn't open because I am doing virus scans, so I can't give the error message. But given the relative simplicity of this code for this website, I'm hoping someone will know if I am doing anything wrong by simply looking.
Thank you for your feedback.
If you switch to std::string, you can define this operation yourself:
std::string operator*(std::string const &s, size_t n)
{
std::string r; // empty string
r.reserve(n * s.size());
for (size_t i=0; i<n; i++)
r += s;
return r;
}
If you try
std::cout << (std::string("foo") * 3) << std::endl
you'll find it prints foofoofoo. (But "foo" * 3 is still not permitted.)
There is an operator+() defined for std::string, so that string + string gives stringstring, but there is no operator*().
You could do:
#include <iostream>
#include <string>
using namespace std;
int main()
{
std::string str = "This is a string. ";
cout << str+str;
cin.get();
return 0;
}
As the other answers pointed out, there's no multiplication operation defined for strings in C++ regardless of their 'flavor' (char arrays or std::string). So you're left with implementing it yourself.
One of the simplest solutions available is to use the std::fill_n algorithm:
#include <iostream> // for std::cout & std::endl
#include <sstream> // for std::stringstream
#include <algorithm> // for std::fill_n
#include <iterator> // for std::ostream_iterator
// if you just need to write it to std::cout
std::fill_n( std::ostream_iterator< const char* >( std::cout ), 2, "This is a string. " );
std::cout << std::endl;
// if you need the result as a std::string (directly)
// or a const char* (via std::string's c_str())
std::stringstream ss;
std::fill_n( std::ostream_iterator< const char* >( ss ), 2, "This is a string. " );
std::cout << ss.str();
std::cout << std::endl;
Indeed, your code is wrong.
C++ compilers treat a sequence of characters enclosed in " as an array of characters (which can be multibyte or single-byte, depending on your compiler and configuration).
So, your code is the same as:
char str[19] = "This is a string. ";
cout<<str * 2;
Now, if you check the second line of the above snippet, you'll clearly spot something wrong. "Is this code multiplying an array by two?" should pop into your mind. What is the definition of multiplying an array by two? There is no good one.
Furthermore, when an array is used in an expression, C++ compilers usually treat it as a pointer to its first element. So, supposing the array lives at address 0xFF001234, the code behaves as if it were:
char str[19] = "This is a string. ";
cout<<0xFF001234 * 2;
Written with a literal number like this, the line compiles because it is plain integer arithmetic; written with the array itself it does not compile, because C++ does not define multiplication of a pointer by an integer. If it did, your code would output a number that is double the address of your array in memory.
That's not to say you simply can't multiply a string. You can't multiply built-in C++ strings, but you can, with OOP, create your own string type that supports multiplication. The reason you need to do that yourself is that even standard strings (std::string) don't define multiplication. After all, one could argue that string multiplication could do different things than the behavior you expect.
In fact, if need be, I'd write a member function that duplicates my string, with a friendlier name that informs any reader of its purpose. Using non-standard ways to do a certain thing will ultimately lead to unreadable code.
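A named function along those lines might look like the sketch below. It is written as a free function rather than a member, since std::string cannot be extended with new members; the name repeat is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: return s concatenated with itself n times.
// The descriptive name replaces an overloaded operator*.
std::string repeat(const std::string& s, std::size_t n) {
    std::string result;
    result.reserve(s.size() * n);  // one allocation up front
    for (std::size_t i = 0; i < n; ++i) {
        result += s;
    }
    return result;
}
```

The line from the question would then become cout << repeat("This is a string. ", 2); which states its intent without relying on operator overloading.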
Well, ideally, you would use a loop to do that in C++. Check the for/while/do-while loops.
#include <iostream>
#include <string>
using namespace std;
int main()
{
int count;
for (count = 0; count < 5; count++)
{
//repeating this 5 times
cout << "This is a string. ";
}
return 0;
}
Outputs:
This is a string. This is a string. This is a string. This is a string. This is a string.
Hey there, I'm not sure that would compile. (I know it would not be valid in C# without a cast, and even then you still would not receive the desired output.)
Here would be a good example of what you are trying to accomplish. The OP is using a char instead of a string, but will essentially function the same with a string.
Give this a whirl:
Multiply char by integer (c++)
cout<<"This is a string. "*2;
You want to print your output twice. But the compiler is a machine, not a human: it reads this as an expression, and the expression is invalid, so it generates an error:
error: invalid operands of types
'const char [20]' and 'int' to binary
'operator*'