Conversion from int to char may alter its value - c++

This is giving me an error on line 524:35 where it says
strcap[i] = tolower (str [i]);
The message says that the conversion to char from int may alter its value. I have a few of these errors in the source code, so if I can fix this one then the others will be a piece of cake. Can anyone please explain this to me in simple terms? I am quite new to this. Thanks!
char *imccapitalize( const char *str )
{
static char strcap[LGST];
int i;
for( i = 0; str[i] != '\0'; i++ )
strcap[i] = tolower( str[i] );
strcap[i] = '\0';
strcap[0] = toupper( strcap[0] );
return strcap;
}

Many C "character" functions in fact take and return ints as characters. (This is so they can return special values such as EOF, and also because this practice got set in stone before people noticed that type safety was a good thing.) toupper and tolower are two of these functions. The declaration for tolower is:
int tolower(int c);
So to get rid of these warnings, you have to typecast the return value:
strcap[i] = (char) tolower( str[i] );
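As a rough sketch of the same idea in C++, the conversion can be made explicit in both directions. The function name `capitalize` and the use of `std::string` are assumptions for illustration, not part of the original code; the extra cast to `unsigned char` on the way in guards against negative `char` values on platforms where `char` is signed.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): returns a copy of the
// input with the first character upper-cased and the rest lower-cased.
std::string capitalize(const std::string& str) {
    std::string out(str);
    for (std::size_t i = 0; i < out.size(); ++i) {
        // tolower takes and returns int. Cast to unsigned char on the way in
        // so a negative char is never passed, and cast back to char on the
        // way out so the narrowing is explicit and the warning goes away.
        out[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(out[i])));
    }
    if (!out.empty()) {
        out[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(out[0])));
    }
    return out;
}
```

Called as `capitalize("hELLO wORLD")`, this should yield `Hello world`.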

It should be a warning, not an error (however, most compilers have an option to treat warnings as errors, so if that option is being used an error will be produced).
The warning is generated because tolower returns int, which is common for the C runtime library character manipulation functions. Assigning an int to a char can truncate the value, since a char is one byte and an int is two or more bytes on practically every machine, hence the warning.
A typecast will eliminate the warning:
strcap[i] = (char)tolower(str[i]);

An int is represented using more bytes than a char. Therefore, if you convert an int to a char, the extra bytes have to be discarded, potentially altering the value.
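To see how discarding the extra bytes can change a value, here is a small sketch. The helper name `low_byte` is made up for this example, and it assumes 8-bit bytes (checked with a `static_assert`). It converts to `unsigned char` because that conversion is fully defined by the language: the result is the original value modulo 256.

```cpp
#include <climits>

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

// Hypothetical helper: keeps only the low-order byte of an int.
// Converting to an unsigned type reduces the value modulo 2^CHAR_BIT.
int low_byte(int value) {
    return static_cast<unsigned char>(value);
}
```

Values that already fit in one byte, such as 65, come back unchanged, while 300 comes back as 44 (300 - 256). That possible change is what the compiler is warning about.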

Related

What does `(c = *str) != 0` mean?

int equiv (char, char);
int nmatches(char *str, char comp) {
char c;
int n=0;
while ((c = *str) != 0) {
if (equiv(c,comp) != 0) n++;
str++;
}
return (n);
}
What does "(c = *str) != 0" actually mean?
Can someone please explain it to me or help give me the correct terms to search for an explanation myself?
This expression has two parts:
c = *str - this is a simple assignment of c from dereferencing a pointer,
val != 0 - this is a comparison to zero.
This works, because assignment is an expression, i.e. it has a value. The value of the assignment is the same as the value being assigned, in this case, the char pointed to by the pointer. So basically, you have a loop that traces a null-terminated string to the end, assigning each individual char to c as it goes.
Note that the != 0 part is redundant in C, because the control expression of a while loop is implicitly compared to zero:
while ((c = *str)) {
...
}
The second pair of parentheses is optional from the syntax perspective, but it's kept in assignments like that in order to indicate that the assignment is intentional. In other words, it tells the readers of your code that you really meant to write an assignment c = *str, and not a comparison c == *str, which is a lot more common inside loop control blocks. The second pair of parentheses also suppresses the compiler warning.
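For a version of the loop that can actually be compiled, here is a sketch. The question only declares `equiv`, so the case-insensitive definition below is purely an assumption made to have something to link against.

```cpp
#include <cctype>

// Hypothetical definition: the original post only declares equiv().
// Here two chars are "equivalent" if they match ignoring case.
int equiv(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Counts the characters of str that are equivalent to comp.
int nmatches(const char* str, char comp) {
    char c;
    int n = 0;
    // (c = *str) copies the current character into c; the value of that
    // assignment is then compared against the terminating '\0'.
    while ((c = *str) != '\0') {
        if (equiv(c, comp) != 0) n++;
        str++;
    }
    return n;
}
```

With this definition of `equiv`, `nmatches("Banana", 'a')` counts three matches.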
Confusingly,
while ((c = *str) != 0) {
is equivalent to the more compact
while (c = *str) {
This also has the effect of assigning the character at *str to c, and the loop will terminate once *str is \0; i.e. when the end of the string has been reached.
Assignments within conditionals such as the above can be confusing on first glance, (cf. the behaviour of the very different c == *str), but they are such a useful part of C and C++, you need to get used to them.
(c = *str) is an expression and that has a value in itself. It is an assignment, the value of an assignment is the assigned value. So the value of (c = *str) is the value of *str.
The code basically checks, whether the value of *str, which just has been assigned to c is not 0. In case it isn't, then it will call the function equiv with that value.
Once the 0 is assigned, this is the end of the string. The function has to stop reading from the memory, which it does.
It's looping over every character in the string str, assigning them to c and then seeing if c is equal to 0 which would indicate the end of the string.
Although really the code should use '\0' as that is more obviously a NUL character.
The while loop walks through str one char at a time until it reaches a char equal to zero, which by convention marks the end of a C string.
Here is a 'for' loop that visits the same characters (it prints each one rather than calling equiv):
for (size_t i = 0; str[i] != '\0'; ++i)
    std::cout << str[i];
It is just sloppily written code. The intention is to copy a character from the string str into c and then check if it was the null terminator.
The idiomatic way to check for the null terminator in C is an explicit check against '\0':
if(c != '\0')
This is so-called self-documenting code, since the de facto standard way to write the null terminator in C is by using the octal escape sequence \0.
Another mistake is to use assignment inside conditions. This has long been regarded as bad practice, and most compilers can warn about such code ("possibly incorrect assignment" or similar). It is bad practice because assignment includes a side effect and expressions with side effects should be kept as simple as possible. It is also bad practice because it is easy to mix up = and ==.
The code could easily be rewritten as something more readable and safe:
c = *str;
while (c != '\0')
{
if(equiv(c, comp) != 0)
{
n++;
}
str++;
c = *str;
}
You don't need char c since you already have the pointer char *str. You can also replace != 0 with != '\0' for better readability (if not compatibility):
while (*str != '\0')
{
    if (equiv(*str, comp) != 0)
    { n++; }
    str++;
}
To understand what the code does, you can read it like this
while ( <str> pointed-to value is-not <end_of_string> )
{
if (function <equiv> with parameters( <str> pointed-to value, <comp> )
returned non-zero integer value)
then { increment <n> by 1 }
increment pointer <str> by 1 x sizeof(char) so it points to next adjacent char
}

Taking an index out of const char* argument

I have the following code:
int some_array[256] = { ... };
int do_stuff(const char* str)
{
int index = *str;
return some_array[index];
}
Apparently the above code causes a bug on some platforms, because *str can in fact be negative.
So I thought of two possible solutions:
Casting the value on assignment (unsigned int index = (unsigned char)*str;).
Passing const unsigned char* instead.
Edit: The rest of this question did not get a treatment, so I moved it to a new thread.
The signedness of char is indeed platform-dependent, but what you do know is that there are as many values of char as there are of unsigned char, and the conversion is injective. So you can absolutely cast the value to associate a lookup index with each character:
unsigned char idx = *str;
return arr[idx];
You should of course make sure that the arr has at least UCHAR_MAX + 1 elements. (This may cause hilarious edge cases when sizeof(unsigned long long int) == 1, which is fortunately rare.)
Characters are allowed to be signed or unsigned, depending on the platform. An assumption of unsigned range is what causes your bug.
Your do_stuff code does not treat const char* as a string representation. It uses it as a sequence of byte-sized indexes into a look-up table. Therefore, there is nothing wrong with forcing unsigned char type on the characters of your string inside do_stuff (i.e. use your solution #1). This keeps re-interpretation of char as an index localized to the implementation of do_stuff function.
Of course, this assumes that other parts of your code do treat str as a C string.
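A minimal sketch of solution #1 might look like the following. The table contents are invented for the example (each entry simply holds its own index), since the original initializer is elided; the table size is written as `UCHAR_MAX + 1` so every possible `unsigned char` value has a slot.

```cpp
#include <array>
#include <climits>
#include <cstddef>

constexpr std::size_t kTableSize = UCHAR_MAX + 1;

// Hypothetical table contents: entry i holds the value i.
std::array<int, kTableSize> make_table() {
    std::array<int, kTableSize> table{};
    for (std::size_t i = 0; i < table.size(); ++i) {
        table[i] = static_cast<int>(i);
    }
    return table;
}

int do_stuff(const char* str) {
    static const std::array<int, kTableSize> table = make_table();
    // Reinterpret the (possibly negative) char as unsigned char before
    // using it as an index, so the index is always in 0..UCHAR_MAX.
    const unsigned char index = static_cast<unsigned char>(*str);
    return table[index];
}
```

Assuming 8-bit chars, a byte such as `'\xE9'` then indexes entry 233 whether plain `char` is signed or unsigned on the platform.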

Converting from int to char where overflow errors are concerned

Isn't it generally a bad idea to convert from a larger integral type to a smaller signed type if there is any possibility that overflow errors could occur? I was surprised by this code in C++ Primer (17.5.2) demonstrating low-level IO operations:
int ch;
while((ch = cin.get()) != EOF)
cout.put(ch); //overflow could occur here
Given that
cin.get() converts the character it obtains to unsigned char, and then to int. So ch will be in the range 0-255 (excluding EOF). All is good.
But then in the put(ch) expression ch gets converted back to char. If char is signed then any value of ch from 128-255 is going to cause an overflow, surely?
Would such code be generally bad practice if I'm expecting something outside of ordinary input 0-127, since there are no guarantees how overflow is treated?
There are rules for integer demotion.
When a long integer is cast to a short, or a short is cast to a char,
the least-significant bytes are retained.
As seen in: https://msdn.microsoft.com/en-us/library/0eex498h.aspx
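So the least significant byte of ch will be retained. All good. Note that this is a conversion rather than an arithmetic overflow, so the behaviour is not undefined: before C++20 the result of converting an out-of-range value to a signed type is implementation-defined, and mainstream compilers keep the low-order byte, which is the same bit pattern the character had when it was read.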
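The round trip can be checked with a small sketch. The function name is made up for illustration, and the expected results assume 8-bit chars and a compiler that keeps the low-order byte when narrowing (implementation-defined before C++20, guaranteed modular from C++20 on).

```cpp
// Hypothetical helper: narrows an int in the range 0..255 to char, as
// cout.put(ch) would, then reads the stored byte back as unsigned char.
int round_trip_through_char(int ch) {
    const char narrowed = static_cast<char>(ch);
    // Even if narrowed is negative (signed char), converting it to
    // unsigned char recovers the original byte value.
    return static_cast<unsigned char>(narrowed);
}
```

Under those assumptions every value from 0 to 255 survives unchanged, which is why the C++ Primer loop copies bytes faithfully.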
Use itoa (non-standard, but provided by some compilers) if you want to convert the integer into a null-terminated string that represents it:
char * itoa ( int value, char * str, int base );
Or you can convert it to a std::string, then to a char array:
std::string tostr(int x) {
    std::stringstream str;
    str << x;
    return str.str();
}
To convert the string to a char array:
std::string fname;
char *lname;
lname = new char[fname.length() + 1];
strcpy(lname, fname.c_str());
If the compiler reports a "secure" warning for strcpy (as MSVC does), it can be disabled with a #pragma.
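If the goal really is the decimal text of an integer, a sketch using only standard C++11 facilities avoids both itoa and manual `new`/`strcpy`. The function name `to_c_buffer` and the choice of `std::vector<char>` as the owning buffer are assumptions for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the decimal text of value as a
// null-terminated char buffer that cleans up after itself.
std::vector<char> to_c_buffer(int value) {
    const std::string text = std::to_string(value);
    // Copy the characters plus the terminating '\0' from c_str().
    return std::vector<char>(text.c_str(), text.c_str() + text.size() + 1);
}
```

The buffer's `data()` pointer can be handed to any function expecting a `char*`, and the vector frees the memory when it goes out of scope.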

Is char and int interchangeable for function arguments in C?

I wrote some code in C to verify that a serial number is alphanumeric using isalnum. I wrote the code assuming that isalnum takes a char as input. Everything worked. However, after reviewing isalnum later, I see that it expects an int. Is my code okay the way it is, or should I change it?
If I do need to change it, what would be the proper way? Should I just declare an int, set it to the char, and pass that to isalnum? Is this considered bad programming practice?
Thanks in advance.
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
bool VerifySerialNumber( char *serialNumber ) {
int num;
char* charPtr = serialNumber;
if( strlen( serialNumber ) < 10 ) {
printf("The entered serial number seems incorrect.");
printf("It's less than 10 characters.\n");
return false;
}
while( *charPtr != '\0' ) {
if( !isalnum(*charPtr) ) {
return false;
}
charPtr++;
}
return true;
}
int main() {
char* str1 = "abcdABCD1234";
char* str2 = "abcdef##";
char* str3 = "abcdABCD1234$#";
bool result;
result = VerifySerialNumber( str1 );
printf("str= %s, result=%d\n\n", str1, result);
result = VerifySerialNumber( str2 );
printf("str= %s, result=%d\n\n", str2, result);
result = VerifySerialNumber( str3 );
printf("str= %s, result=%d\n\n", str3, result);
return 0;
}
Output:
str= abcdABCD1234, result=1
The entered serial number seems incorrect.It's less than 10 characters.
str= abcdef##, result=0
str= abcdABCD1234$#, result=0
You don't need to change it. The compiler will implicitly convert your char to an int before passing it to isalnum. Functions like isalnum take int arguments because functions like fgetc return int values, which allows for special values like EOF to exist.
Update: As others have mentioned, be careful with negative values of your char. Your version of the C library might be implemented carefully so that negative values are handled without causing any run-time errors. For example, glibc (the GNU implementation of the standard C library) appears to handle negative numbers by adding 128 to the int argument.* However, you won't always be able to count on having isalnum (or any of the other <ctype.h> functions) quietly handle negative numbers, so getting in the habit of not checking would be a very bad idea.
* Technically, it's not adding 128 to the argument itself, but rather it appears to be using the argument as an index into an array, starting at index 128, such that passing in, say, -57 would result in an access to index 71 of the array. The result is the same, though, since array[-57+128] and (array+128)[-57] point to the same location.
Usually it is fine to pass a char value to a function that takes an int. It will be converted to the int with the same value. This isn't a bad practice.
However, there is a specific problem with isalnum and the other C functions for character classification and conversion. Here it is, from the ISO/IEC 9899:TC2 7.4/1 (emphasis mine):
In all cases the argument is an int, the value of which shall be
representable as an unsigned char or shall equal the value of the
macro EOF. If the argument has any other value, the behavior is
undefined.
So, if char is a signed type (this is implementation-dependent), and if you encounter a char with negative value, then it will be converted to an int with negative value before passing it to the function. Negative numbers are not representable as unsigned char. The numbers representable as unsigned char are 0 to UCHAR_MAX. So you have undefined behavior if you pass in any negative value other than whatever EOF happens to be.
For this reason, you should write your code like this in C:
if( !isalnum((unsigned char)*charPtr) )
or in C++ you might prefer:
if( !isalnum(static_cast<unsigned char>(*charPtr)) )
The point is worth learning because at first encounter it seems absurd: do not pass a char to the character functions.
Alternatively, in C++ there is a two-argument version of isalnum in the header <locale>. This function (and its friends) do take a char as input, so you don't have to worry about negative values. You will be astonished to learn that the second argument is a locale ;-)
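Putting the advice together, a sketch of the serial-number check with the cast applied could look like this. The function name and the decision to drop the diagnostic printing are choices made for the example; the length rule of at least 10 characters is taken from the question.

```cpp
#include <cctype>
#include <cstring>

// Hypothetical rewrite of VerifySerialNumber: true only if the string has
// at least 10 characters and every one of them is alphanumeric.
bool is_alnum_serial(const char* serial) {
    if (std::strlen(serial) < 10) {
        return false;
    }
    for (const char* p = serial; *p != '\0'; ++p) {
        // Cast to unsigned char so a negative char value is never passed
        // to isalnum, which would be undefined behavior.
        if (!std::isalnum(static_cast<unsigned char>(*p))) {
            return false;
        }
    }
    return true;
}
```

With the cast in place, bytes above 127 are simply classified (as non-alphanumeric in the default "C" locale) instead of triggering undefined behavior.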

Different results using atoi

Could someone explain why those calls are not returning the same expected result?
unsigned int GetDigit(const string& s, unsigned int pos)
{
// Works as intended
char c = s[pos];
return atoi(&c);
// doesn't give expected results
return atoi(&s[pos]);
return atoi(&static_cast<char>(s[pos]));
return atoi(&char(s[pos]));
}
Remark: I'm not looking for the best way to convert a char to an int.
None of your attempts are correct, including the "works as intended" one (it just happened to work by accident). For starters, atoi() requires a NUL-terminated string, which you are not providing.
How about the following:
unsigned int GetDigit(const string& s, unsigned int pos)
{
return s[pos] - '0';
}
This assumes that you know that s[pos] is a valid decimal digit. If you don't, some error checking is in order.
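One possible shape for that error checking is sketched below. The name `get_digit_checked` and the convention of returning -1 on failure are assumptions for this example, not something the answer prescribes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical checked variant: returns the digit value at s[pos], or -1
// if pos is out of range or the character is not a decimal digit.
int get_digit_checked(const std::string& s, std::size_t pos) {
    if (pos >= s.size()) {
        return -1;
    }
    const unsigned char c = static_cast<unsigned char>(s[pos]);
    if (!std::isdigit(c)) {
        return -1;
    }
    // The digits '0'..'9' are guaranteed to have consecutive codes.
    return c - '0';
}
```

A caller can then tell a real digit (0 to 9) from a failure (-1) without risking an out-of-range access.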
What you are doing is taking a std::string, getting one character from its internal representation, and feeding a pointer to it into atoi, which expects a const char* that points to a NUL-terminated string. Before C++11, a std::string was not guaranteed to store its characters with a terminating zero, so it's just luck that your C++ implementation seems to do this.
The correct way would be to ask std::string for a zero-terminated version of its contents using s.c_str(), then call atoi using a pointer to it.
Your code contains another problem, you are casting the result of atoi to an unsigned int, while atoi returns a signed int. What if your string is "-123"?
Since int atoi(const char* s) accepts a pointer to an array of characters, your last three uses return a number corresponding to the consecutive digits beginning at &s[pos]; e.g. it can give 123 for a string like "123", starting at position 0. Where the data inside a std::string is not required to be null-terminated, the result can be anything at all on some implementations, i.e. undefined behaviour.
Your "working" approach also uses undefined behaviour.
It's different from the other attempts since it copies the value of s[pos] to another location.
It seems to work only as long as the byte in memory next to the character c happens to be a zero or a non-digit character, which is not guaranteed. So follow the advice given by @aix.
To make it work really you could do the following:
char c[2] = { s[pos], '\0' };
return atoi(c);
If you want to access the data as a C string, use s.c_str(), and then pass it to atoi.
atoi expects a C-style string; std::string is a C++ class with different behavior and characteristics. For starters, it doesn't have to be NUL-terminated.
atoi takes a pointer to char as its argument. In the first attempt, where you use the char c, it receives a pointer to only one character, hence you happen to get the answer you want. In the other attempts, however, what you pass is a pointer to a char that is the beginning of a string of chars, so I assume what you are getting from atoi in the later attempts is a number converted from the chars at positions pos, pos+1, pos+2 and so on up to the end of the string s.
If you really want to convert just a single char in the string at the position (as opposed to a substring starting at that position and ending at the end of the string), you can do it these ways:
int GetDigit(const string& s, const size_t& pos) {
return atoi(string(1, s[pos]).c_str());
}
int GetDigit2(const string& s, const size_t& pos) {
const char n[2] = {s[pos], '\0'};
return atoi(n);
}
for example.
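The difference between parsing from a position and parsing a single character can be seen side by side in the sketch below. Both function names are invented for the example, and both assume that pos is less than s.size().

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper: parses every digit from pos up to the first
// non-digit, because atoi reads until the string stops looking numeric.
int parse_from(const std::string& s, std::size_t pos) {
    return std::atoi(s.c_str() + pos);
}

// Hypothetical helper: parses only the character at pos by copying it
// into its own two-element, null-terminated buffer first.
int parse_single(const std::string& s, std::size_t pos) {
    const char buf[2] = {s[pos], '\0'};
    return std::atoi(buf);
}
```

For the string "1234" and pos 1, the first function sees "234" while the second sees only "2".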