Isn't it generally a bad idea to convert from a larger integral type to a smaller signed type if there is any possibility that an overflow could occur? I was surprised by this code in C++ Primer (17.5.2) demonstrating low-level IO operations:
int ch;
while ((ch = cin.get()) != EOF)
    cout.put(ch); // overflow could occur here
Given that
cin.get() converts the character it obtains to unsigned char, and then to int. So ch will be in the range 0-255 (excluding EOF). All is good.
But then in the put(ch) expression ch gets converted back to char. If char is signed then any value of ch from 128-255 is going to cause an overflow, surely?
Would such code generally be bad practice if I'm expecting input outside the ordinary 0-127 range, since there are no guarantees about how the overflow is treated?
There are rules for integer demotion.
When a long integer is cast to a short, or a short is cast to a char,
the least-significant bytes are retained.
As seen in: https://msdn.microsoft.com/en-us/library/0eex498h.aspx
So the least significant byte of ch will be retained. All good. (Strictly, the C++ standard makes this narrowing conversion to a signed type implementation-defined before C++20, but mainstream implementations all keep the low byte.)
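A minimal sketch of that round trip (the byte value 0xE9 is just an illustration):

#include <iostream>

int main()
{
    int ch = 0xE9;                      // 233, as cin.get() might return for a non-ASCII byte
    char c = ch;                        // low byte retained; typically -23 if char is signed
    unsigned char back = c;             // modular conversion recovers 233 (guaranteed)
    std::cout << static_cast<int>(back) << '\n';  // prints 233
    return 0;
}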
Use itoa if you want to convert the integer into a null-terminated string representing it.
char * itoa ( int value, char * str, int base );
or you can convert it to a string, then to char:
#include <sstream>
#include <string>

std::string tostr(int x)
{
    std::stringstream str;
    str << x;
    return str.str();
}
To convert the string to a char array:

#include <cstring>

std::string fname;
char *lname = new char[fname.length() + 1];
strcpy(lname, fname.c_str());
If MSVC reports strcpy as unsafe (warning C4996), you can silence it with #pragma warning(disable: 4996) or by defining _CRT_SECURE_NO_WARNINGS.
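As an aside, itoa is not part of standard C or C++. Where it is unavailable, std::to_string from <string> covers the same ground:

#include <string>

std::string s = std::to_string(128);  // "128", no manual buffer management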
Related
I have the following code:
int some_array[256] = { ... };
int do_stuff(const char* str)
{
    int index = *str;
    return some_array[index];
}
Apparently the above code causes a bug on some platforms, because *str can in fact be negative.
So I thought of two possible solutions:
Casting the value on assignment (unsigned int index = (unsigned char)*str;).
Passing const unsigned char* instead.
Edit: The rest of this question was not addressed, so I moved it to a new thread.
The signedness of char is indeed platform-dependent, but what you do know is that there are as many values of char as there are of unsigned char, and the conversion is injective. So you can absolutely cast the value to associate a lookup index with each character:
unsigned char idx = *str;
return arr[idx];
You should of course make sure that arr has at least UCHAR_MAX + 1 elements. (This may cause hilarious edge cases when sizeof(unsigned long long int) == 1, which is fortunately rare.)
Characters are allowed to be signed or unsigned, depending on the platform. An assumption of unsigned range is what causes your bug.
Your do_stuff code does not treat const char* as a string representation. It uses it as a sequence of byte-sized indexes into a look-up table. Therefore, there is nothing wrong with forcing unsigned char type on the characters of your string inside do_stuff (i.e. use your solution #1). This keeps re-interpretation of char as an index localized to the implementation of do_stuff function.
Of course, this assumes that other parts of your code do treat str as a C string.
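Putting the two answers together, a minimal sketch of solution #1 (sizing the table as UCHAR_MAX + 1 is the assumption flagged above):

#include <climits>

int some_array[UCHAR_MAX + 1] = { /* ... */ };

int do_stuff(const char* str)
{
    // reinterpret the byte as 0..UCHAR_MAX before using it as an index
    unsigned char index = static_cast<unsigned char>(*str);
    return some_array[index];
}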
I wrote some code to verify a serial number is alphanumeric in C using isalnum. I wrote the code assuming isalnum takes a char. Everything worked. However, after reviewing isalnum later, I see that it wants its input as an int. Is my code okay the way it is, or should I change it?
If I do need to change it, what would be the proper way? Should I just declare an int, set it to the char, and pass that to isalnum? Is this considered bad programming practice?
#include <stdio.h>
#include <string.h>
#include <stdbool.h>
#include <ctype.h>

bool VerifySerialNumber( char *serialNumber ) {
    char* charPtr = serialNumber;
    if( strlen( serialNumber ) < 10 ) {
        printf("The entered serial number seems incorrect.");
        printf("It's less than 10 characters.\n");
        return false;
    }
    while( *charPtr != '\0' ) {
        if( !isalnum(*charPtr) ) {
            return false;
        }
        charPtr++;
    }
    return true;
}
int main() {
    char* str1 = "abcdABCD1234";
    char* str2 = "abcdef##";
    char* str3 = "abcdABCD1234$#";
    bool result;

    result = VerifySerialNumber( str1 );
    printf("str= %s, result=%d\n\n", str1, result);
    result = VerifySerialNumber( str2 );
    printf("str= %s, result=%d\n\n", str2, result);
    result = VerifySerialNumber( str3 );
    printf("str= %s, result=%d\n\n", str3, result);
    return 0;
}
Output:
str= abcdABCD1234, result=1
The entered serial number seems incorrect.It's less than 10 characters.
str= abcdef##, result=0
str= abcdABCD1234$#, result=0
You don't need to change it. The compiler will implicitly convert your char to an int before passing it to isalnum. Functions like isalnum take int arguments because functions like fgetc return int values, which allows for special values like EOF to exist.
Update: As others have mentioned, be careful with negative values of your char. Your version of the C library might be implemented carefully so that negative values are handled without causing any run-time errors. For example, glibc (the GNU implementation of the standard C library) appears to handle negative numbers by adding 128 to the int argument.* However, you won't always be able to count on having isalnum (or any of the other <ctype.h> functions) quietly handle negative numbers, so getting in the habit of not checking would be a very bad idea.
* Technically, it's not adding 128 to the argument itself, but rather it appears to be using the argument as an index into an array, starting at index 128, such that passing in, say, -57 would result in an access to index 71 of the array. The result is the same, though, since array[-57+128] and (array+128)[-57] point to the same location.
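A tiny sketch of that pointer-arithmetic equivalence (the table size is a hypothetical stand-in for glibc's real classification table):

#include <cassert>

int main()
{
    static int array[384] = {};  // hypothetical: 128 slots for negative arguments + 256 for byte values
    assert(&array[-57 + 128] == &(array + 128)[-57]);  // both name element 71
    return 0;
}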
Usually it is fine to pass a char value to a function that takes an int. It will be converted to the int with the same value. This isn't a bad practice.
However, there is a specific problem with isalnum and the other C functions for character classification and conversion. Here it is, from the ISO/IEC 9899:TC2 7.4/1 (emphasis mine):
In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the
macro EOF. If the argument has any other value, the behavior is
undefined.
So, if char is a signed type (this is implementation-dependent), and if you encounter a char with negative value, then it will be converted to an int with negative value before passing it to the function. Negative numbers are not representable as unsigned char. The numbers representable as unsigned char are 0 to UCHAR_MAX. So you have undefined behavior if you pass in any negative value other than whatever EOF happens to be.
For this reason, you should write your code like this in C:
if( !isalnum((unsigned char)*charPtr) )
or in C++ you might prefer:
if( !isalnum(static_cast<unsigned char>(*charPtr)) )
The point is worth learning because at first encounter it seems absurd: do not pass a char to the character functions.
Alternatively, in C++ there is a two-argument version of isalnum in the header <locale>. This function (and its friends) do take a char as input, so you don't have to worry about negative values. You will be astonished to learn that the second argument is a locale ;-)
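A sketch of that overload in use (a default-constructed locale is a copy of the global locale, which is the classic "C" locale unless you have changed it; the helper name is made up):

#include <locale>

bool is_alnum_cxx(char c)
{
    return std::isalnum(c, std::locale());  // takes char directly; no unsigned char cast needed
}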
So, string comes with the value type char. I want a string with value type unsigned char. The reason I want such a thing is that I am writing a program which converts large hexadecimal input to decimal, and I am using strings to hold the intermediate results. But the range of char, -128 to 127, is too small; unsigned char, with its range of 0 to 255, would work perfectly instead. Consider this code:
#include <iostream>
#include <string>
using namespace std;

int main()
{
    typedef basic_string<unsigned char> u_string;
    u_string x = "Hello!";  // does not compile
    return 0;
}
But when I try to compile, it shows two errors: one is invalid conversion from 'const char*' to 'const unsigned char*', and the other is initializing argument 1 of 'std::basic_string<_CharT, _Traits, _Alloc>::basic_string...' (it goes on).
EDIT:
"Why does the problem "converts large input of hexadecimal to decimal" require initializing a u_string with a string literal?"
While calculating, each time I shift the number one hexadecimal digit to the left, I multiply by 16. A stored decimal digit is at most 9, so the result can reach 16 x 9 = 144, which surpasses the limit of 127 and wraps to a negative value.
Also, I have to initialize it like this:
x = "0"; x[0] -= '0';
because I want it to hold the value 0. If the string is empty, I can't perform operations on it; if it holds a 0, I can.
So, what should I do?
String literals are arrays of const char, and you are trying to use one to initialize a string of const unsigned char.
You have two solutions:
First, copy the characters from a standard string into yours, element by element.
Second, write your own user-defined literal for your string type:
#include <cstddef>

// note: the length parameter of a string literal operator must be std::size_t,
// and the pointer cast rules out constexpr here
inline const unsigned char * operator"" _us(const char *s, std::size_t)
{
    return (const unsigned char *) s;
}

// OR

u_string operator"" _us(const char *s, std::size_t len)
{
    return u_string(s, s + len);
}
u_string x = "Hello!"_us;
An alternative solution would be to make your compiler treat char as unsigned. There are compiler flags for this:
MSVC: /J
GCC, Clang, ICC: -funsigned-char
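If you rely on that flag, it may be worth pinning the assumption down in code; a sketch (requires C++11 for the static_assert):

#include <limits>

static_assert(!std::numeric_limits<char>::is_signed,
              "build with -funsigned-char (GCC/Clang/ICC) or /J (MSVC)");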
Is there a way to convert numeric string to a char containing that value? For example, the string "128" should convert to a char holding the value 128.
Yes... atoi from C.
char mychar = (char)atoi("128");
A more C++ oriented approach would be...
#include <sstream>
#include <string>

template<class T>
T fromString(const std::string& s)
{
    std::istringstream stream(s);
    T t;
    stream >> t;
    return t;
}
char mychar = (char)fromString<int>(mycppstring);
There's the C-style atoi, but it converts to an int. You'll have to cast to char yourself.
For a C++-style solution (which is also safer) you can do
string input("128");
stringstream ss(input);
int num;
if((ss >> num).fail()) {
    // invalid format or other error
}
char result = (char)num;
It depends. If char is signed and 8 bits, you cannot convert "128" to a char in base 10. The maximum positive value of a signed 8-bit value is 127.
This is a really pedantic answer, but you should probably know this at some point.
You can use atoi. That will get you the integer 128. You can just cast that to a char and you're done.
char c = (char) atoi("128");
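If silent truncation is not acceptable, here is a sketch of a range-checked conversion using strtol (the helper name is made up):

#include <cstdlib>
#include <climits>

bool to_char_checked(const char* s, char& out)
{
    char* end = 0;
    long v = std::strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return false;                       // not a complete number
    if (v < CHAR_MIN || v > CHAR_MAX)
        return false;                       // "128" fails here when char is signed and 8-bit
    out = static_cast<char>(v);
    return true;
}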
The title is pretty self explanatory.
char c = std::cin.peek(); // sets c equal to character in stream
I just realized that perhaps native type char can't hold the EOF.
Short answer: No. Use int instead of char.
Slightly longer answer: No. If you can get either a character or the value EOF from a function, such as C's getchar and C++'s peek, clearly a normal char variable won't be enough to hold both all valid characters and the value EOF.
Even longer answer: It depends, but it will never work as you might hope.
C and C++ have three character types (not counting the "wide" types): char, signed char and unsigned char. Plain char can be signed or unsigned, and this varies between compilers.
The value EOF is a negative integer, usually -1, so clearly you can't store it in an unsigned char or in a plain char that is unsigned. Assuming that your system uses 8-bit characters (which nearly all do), EOF will be converted to (decimal) 255, and your program will not work.
But if your char type is signed, or if you use the signed char type, then yes, you can store -1 in it, so yes, it can hold EOF. But what happens then when you read a character with code 255 from the file? It will be interpreted as -1, that is, EOF (assuming that your implementation uses -1). So your code will stop reading not just at the end of the file, but also as soon as it finds a 255 character.
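A short sketch of that failure mode, assuming 8-bit signed char and EOF == -1 (the file name is hypothetical):

#include <cstdio>

int main()
{
    std::FILE* f = std::fopen("data.bin", "rb");  // hypothetical binary input
    if (!f)
        return 1;
    char c;                                   // BUG: should be int
    while ((c = std::fgetc(f)) != EOF)        // a 0xFF byte truncates to -1 and looks like EOF
        std::putchar(c);
    std::fclose(f);
    return 0;
}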
Note that the return value of std::cin.peek() is actually of type std::basic_ios<char>::int_type, which is the same as std::char_traits<char>::int_type, which is an int and not a char.
More important than that, the value returned in that int is not necessarily a simple cast from char to int but is the result of calling std::char_traits<char>::to_int_type on the next character in the stream or std::char_traits<char>::eof() (which is defined to be EOF) if there is no character.
Typically, this is all implemented in exactly the same way as fgetc casts the character to an unsigned char and then to an int for its return value so that you can distinguish all valid character values from EOF.
If you store the return value of std::cin.peek() in a char then there is the possibility that reading a character with a positive value (say ÿ in an ISO-8859-1 encoded file) will compare equal to EOF.
The pedantic thing to do would be:
typedef std::istream::traits_type traits_type;

traits_type::int_type ch;
traits_type::char_type c;
while (!traits_type::eq_int_type((ch = std::cin.peek()), traits_type::eof()))
{
    c = traits_type::to_char_type(ch);
    // ...
}
This would probably be more usual:
int ch;
char c;
while ((ch = std::cin.peek()) != EOF)
{
    c = std::iostream::traits_type::to_char_type(ch);
    // ...
}
Note that it is important to convert the character value correctly. If you perform a comparison like this: if (ch == '\xff') ... where ch is an int as above, you may not get the correct results. You need to use std::char_traits<char>::to_char_type on ch or std::char_traits<char>::to_int_type on the character constant to get a consistent result. (You are usually safe with members of the basic character set, though.)
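For instance, a sketch of the safe comparison:

#include <iostream>
#include <string>

int main()
{
    typedef std::char_traits<char> traits;
    int ch = std::cin.peek();
    // ch == '\xff' would compare 255 against -1 when char is signed; convert first
    if (ch == traits::to_int_type('\xff'))
    {
        // ...
    }
    return 0;
}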