How can I fix this problem? Please look at the program and the extern C/C++ code below.
The program looks correct to me. How do you write C and C++ code that work together properly?
#include <stdio.h>
#include "myccode.h"
void dchar(char c)
{
printf ("%d\n",(int)c);
}
void main()
{
dchar(128);
}
The output should be 128, but you get -128. I was told to write extern "C" code. My friend and I compiled it with both a C and a C++ compiler, but nothing changed; we still get -128.
char is typically a signed type holding one byte, though whether it is signed or unsigned is up to the compiler (thanks, commenters, for pointing that out). A signed 8-bit char can therefore hold numbers from -128 to +127.
Thus, 128 is first converted to a char (which cannot represent it, so it typically wraps to -128), and then back to int.
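A minimal sketch of that conversion, assuming char is a signed 8-bit type (as it evidently is on your compiler):
#include <stdio.h>
int main()
{
    char c = 128;           /* 128 does not fit in a signed 8-bit char...  */
    printf("%d\n", (int)c); /* ...so this typically prints -128, not 128   */
    return 0;
}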
To print 128, you have to use a type which can hold this value. This could be for example an unsigned char:
#include <cstdio>
void dchar(unsigned char c)
{
printf ("%u\n",c);
}
int main()
{
dchar(128);
return 0;
}
Output:
128
The range of values a char can support, for your compiler, is -128 to 127. It is implementation-defined whether a char is equivalent to an unsigned char or a signed char, and your compiler vendor has chosen the latter.
Converting the value 128 to a signed char that cannot represent it gives an implementation-defined result (commonly -128 on two's-complement machines), so you cannot rely on your code printing 128.
Your only option is to change your function so that it accepts an integral argument of a type that can represent the value 128. Options include unsigned char (the standard guarantees it can represent values between 0 and 255) or int (which is guaranteed to be able to represent values between -32767 and 32767 - and, depending on the compiler, may support a larger range).
The "right" solution depends on the range of values for which you require your function to produce correct output. You have not given any information about what values you need your code to work with other than 128, so that is an answer you will need to work out for yourself.
And, BTW, main() returns int, not void in standard C++.
First, I want to thank you for your time and your solution. Both programs work.
I would like to ask you to look at the program once more, this time with the complete description.
This is the complete program, including its description.
#include <stdio.h>
void dchar(char c)
{printf ("%d\n",(int)c);}
The dchar function is simple: it prints the numeric value of the character passed to it.
What do you think can go wrong in such a simple function, especially when type safety is provided?
You can check this by compiling this short source file separately. Then enter the following source, compile it separately as well, and link the two together.
extern void dchar(unsigned char); /* I forgot this declaration last time, sorry */
void main()
{dchar(128);}
Everything seems normal until you try to run the linked program. Instead of the expected value 128, the value -128 is displayed. An incompatible implicit type conversion has taken place, so the numeric value was interpreted incorrectly. Function name mangling did not even get a chance to prevent this.
This problem can be avoided by using well-written header files. Write the external declaration in a .h header file and use #include to pull that header into every file that needs it. Naturally, errors can still occur, but with this method their probability is dramatically reduced.
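A sketch of that arrangement, choosing unsigned char consistently so the declaration and the definition can no longer disagree (the file names are only illustrative):
/* myccode.h -- the one shared declaration */
#ifndef MYCCODE_H
#define MYCCODE_H
void dchar(unsigned char c);
#endif

/* myccode.c -- the definition includes its own header, so any mismatch
   with the declaration is caught at compile time */
#include <stdio.h>
#include "myccode.h"
void dchar(unsigned char c)
{
    printf("%u\n", c);
}

/* main.c -- the caller includes the same header instead of writing its
   own extern declaration */
#include "myccode.h"
int main()
{
    dchar(128);
    return 0;
}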
The function std::isdigit is:
int isdigit(int ch);
The documented return value (non-zero if the character is a numeric character, zero otherwise) smells like the function was inherited from C, but even that does not explain why the parameter type is int rather than char, while at the same time...
The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF.
Is there any technical reason why isdigit takes an int and not a char?
The reason is to allow EOF as input. And EOF is (from here):
EOF integer constant expression of type int and negative value
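A minimal sketch of the idiom this enables - reading with getchar() until EOF and classifying each character (echoing only the digits is just for illustration):
#include <ctype.h>
#include <stdio.h>
int main()
{
    int ch;                          /* int, so EOF fits alongside 0..UCHAR_MAX   */
    while ((ch = getchar()) != EOF)  /* getchar() returns unsigned-char values or EOF */
    {
        if (isdigit(ch))             /* safe: ch is already in the required range */
            putchar(ch);
    }
    return 0;
}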
The accepted answer is correct, but I believe the question deserves more detail.
A char in C++ is either signed or unsigned depending on your implementation (and, yet, it's a distinct type from signed char and unsigned char).
Where C grew up, char was typically unsigned and assumed to be an n-bit byte that could represent [0..2^n-1]. (Yes, there were some machines that had byte sizes other than 8 bits.) In fact, chars were considered virtually indistinguishable from bytes, which is why functions like memcpy take char * rather than something like uint8_t *, why sizeof char is always 1, and why CHAR_BIT isn't named BYTE_BIT.
But the C standard, which was the baseline for C++, only promised that char could hold any value in the execution character set. They might hold additional values, but there was no guarantee. The source character set (basically 7-bit ASCII minus some control characters) required something like 97 values. For a while, the execution character set could be smaller, but in practice it almost never was. Eventually there was an explicit requirement that a char be large enough to hold an 8-bit byte.
But the range was still uncertain. If unsigned, you could rely on [0..255]. Signed chars, however, could--in theory--use a sign+magnitude representation that would give you a range of [-127..127]. Note that's only 255 unique values, not 256 values ([-128..127]) like you'd get from two's complement. If you were language lawyerly enough, you could argue that you cannot store every possible value of an 8-bit byte in a char even though that was a fundamental assumption throughout the design of the language and its run-time library. I think C++ finally closed that apparent loophole in C++17 or C++20 by, in effect, requiring that a signed char use two's complement even if the larger integral types use sign+magnitude.
When it came time to design fundamental input/output functions, they had to think about how to return a value or a signal that you've reached the end of the file. It was decided to use a special value rather than an out-of-band signaling mechanism. But what value to use? The Unix folks generally had [128..255] available and others had [-128..-1].
But that's only if you're working with text. The Unix/C folks thought of textual characters and binary byte values as the same thing. So getc() was also for reading bytes from a binary file. All 256 possible values of a char, regardless of its signedness, were already claimed.
K&R C (before the first ANSI standard) didn't require function prototypes. The compiler made assumptions about parameter and return types. This is why C and C++ have the "default promotions," even though they're less important now than they once were. In effect, you couldn't return anything smaller than an int from a function. If you did, it would just be converted to int anyway.
The natural solution was therefore to have getc() return an int containing either the character value or a special end-of-file value, imaginatively dubbed EOF, a macro for -1.
The default promotions not only mandated a function couldn't return an integral type smaller than an int, they also made it difficult to pass in a small type. So int was also the natural parameter type for functions that expected a character. And thus we ended up with function signatures like int isdigit(int ch).
If you're a Posix fan, this is basically all you need.
For the rest of us, there's a remaining gotcha: If your chars are signed, then -1 might represent a legitimate character in your execution character set. How can you distinguish between them?
The answer is that functions don't really traffic in char values at all. They're really using unsigned char values dressed up as ints.
int x = getc(source_file);
if (x == EOF) { /* reached end of file */ }
else if (0 <= x && x < 128) { /* plain 7-bit character */ }
else if (128 <= x && x < 256) {
// Here it gets interesting.
bool b1 = isdigit(x); // OK
bool b2 = isdigit(static_cast<char>(x)); // NOT PORTABLE
bool b3 = isdigit(static_cast<unsigned char>(x)); // CORRECT!
}
I have written the following code to test if the given input is a digit or not.
#include<iostream>
#include<ctype.h>
#include<stdio.h>
using namespace std;
int main()
{
char c;
cout<<"Please enter a digit: ";
cin>>c;
if(isdigit(c)) //int isdigit(int c) or char isdigit(char c)
{
cout<<"You entered a digit"<<endl;
}
else
{
cout<<"You entered a non-digit value"<<endl;
}
}
My question is: what should be the input variable type? char or int?
The situation is unfortunately a bit more complex than has been told by the other answers.
First of all: the first part of your code is correct (disregarding multi-byte encodings); if you want to read a single char with cin, you have to use a char variable with the >> operator.
Now, about isdigit: why does it take an int instead of a char?
It all comes from C; isdigit and its companions were born to be used along with functions like getchar(), which read a character from the stream and return an int. This in turn was done to provide both the character and an error code: getchar() can return EOF (which is defined as some implementation-defined negative constant) through its return value to signify that the input stream has ended.
So, the basic idea is: negative = error code; positive = actual character code.
Unfortunately, this poses interoperability problems with "regular" chars.
Short digression: char ultimately is just an integral type with a very small range, but a particularly stupid one. On most occasions - when working with bytes or character codes - you'd want it to be unsigned by default; OTOH, for consistency with the other integral types (int, short, long, ...), you could argue that plain char ought to be signed. The Standard chose the most stupid way: plain char is either signed or unsigned, depending on whatever the implementor of the compiler decides.
So, you have to be prepared for char being either signed or unsigned; in most implementations it's signed by default, which poses a problem with the getchar() arrangement above.
If char is used to read bytes and is signed, all bytes with the high bit set (i.e. bytes that, read with an unsigned 8-bit type, would be >127) turn out to be negative values. This obviously isn't compatible with getchar() using negative values for EOF - there could be overlap between actual "negative" characters and EOF.
So, when C functions talk about receiving/providing characters in int variables, the contract is always that the character is assumed to be a char that has been cast to an unsigned char (so that it is always positive, negative values wrapping into the top half of the range) and then put into an int. Which brings us back to the isdigit function, which, along with its companion functions, has this contract as well:
The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
(C99, §7.4, ¶1)
So, long story short: your if should be at the very least:
if(isdigit((unsigned char)c))
The problem is not just a theoretical one: several widespread C library implementations use the provided value straight as an index into a lookup table, so negative values will read into unallocated memory and segfault your program.
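To see why that can crash, here is a sketch of such a hypothetical table-based implementation (not any particular library's actual code; the table contents and the 0x04 "digit" bit are made up):
/* Hypothetical ctype-style implementation: a 257-entry table indexed by
   the argument plus one, so that EOF (-1) lands on index 0. */
static const unsigned short ctype_table[257] = { 0 /* real tables are fully populated */ };

int my_isdigit(int ch)
{
    return ctype_table[ch + 1] & 0x04;  /* ch == -95 reads ctype_table[-94]: out of bounds */
}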
Also, you are not taking into account the fact that the stream may be closed, in which case >> returns without touching your variable (which will then hold an indeterminate value); to handle this, you should check that the stream is still in a valid state before working on c.
Of course this is a bit of an unfair rant; as @Pete Becker noted in a comment, it's not that the committee were all morons, but rather that the standard mostly tried to stay compatible with existing implementations, which were probably evenly split between unsigned and signed char. Traces of this split can be found in most modern compilers, which can generally change the signedness of char through command-line options (-fsigned-char/-funsigned-char for gcc/clang, /J in VC++).
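Putting the cast and the stream check together, a sketch based on the question's code:
#include <iostream>
#include <cctype>
int main()
{
    char c;
    std::cout << "Please enter a digit: ";
    if (std::cin >> c)  // only look at c if the extraction actually succeeded
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
            std::cout << "You entered a digit" << std::endl;
        else
            std::cout << "You entered a non-digit value" << std::endl;
    }
    else
    {
        std::cout << "No input available" << std::endl;
    }
}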
If you want to read a single character and check whether it is a digit or not, then the variable should be a char.
If you declare it as int, operator>> will try to read a whole number rather than a single character, and the value you then pass to isdigit will not be the character code you intended.
My program does the common task of writing binary data to a file, conforming to a certain non-text file format. Since the data I'm writing is not already in existing chunks but instead is put together byte by byte at runtime, I use std::ostream::put() instead of write(). I assume this is normal procedure.
The program works just fine. It uses both std::stringstream::put() and std::ofstream::put() with two-digit hex integers as the arguments. But I get compiler warning C4309: "truncation of constant value" (in VC++ 2010) whenever the argument to put() is greater than 0x7f. Obviously the compiler is expecting a signed char, and the constant is out of range. But I don't think any truncation is actually happening; the byte gets written just like it's supposed to.
Compiler warnings make me think I'm not doing things in the normal, accepted way. The situation I described has to be a common one. Is there a common way to avoid such a compiler warning? Or is this an example of a pointless compiler warning that should just be ignored?
I thought of two inelegant ways to avoid it. I could use syntax like mystream.put( char(0xa4) ) on every call. Or instead of using std::stringstream I could use std::basic_stringstream< unsigned char >, but I don't think that trick would work with std::ofstream, which is not a templated type. I feel like there should be a better solution here, especially since ofstream is meant for writing binary files.
Your thoughts?
--EDIT--
Ah, I was mistaken about std::ofstream not being a templated type. It is actually std::basic_ofstream<char>, but I tried that method and realized it won't work anyway for lack of defined methods and polymorphic incompatibility with std::ostream.
Here's a code sample:
stringstream ss;
int a, b;
/* Do stuff */
ss.put( 0 );
ss.put( 0x90 | a ); // oddly, no warning here...
ss.put( b ); // ...or here
ss.put( 0xa4 ); // C4309
I found solution that I'm happy with. It's more elegant than explicitly casting every constant to unsigned char. This is what I had:
ss.put( 0xa4 ); // C4309
I thought that the "truncation" was happening in implicitly casting unsigned char to char, but Cong Xu pointed out that integer constants are assumed to be signed, and any one greater than 0x7f gets promoted from char to int. Then it has to actually be truncated (cut down to one byte) if passed to put(). By using the suffix "u", I can specify an unsigned integer constant, and if it's no greater than 0xff, it will be an unsigned char. This is what I have now, without compiler warnings:
ss.put( 0xa4u );
std::stringstream ss;
ss.put(0x7f);
ss.put(0x80); //C4309
As you've guessed, the problem is that ostream::put() expects a char, and 0x7F is the largest value a (signed) char can hold; a larger constant is an int whose value changes when converted to char, hence the warning. You can cast to unsigned char, which is the same width as char, so it can hold any byte value safely, and any truncation warning that remains is then legitimate:
ss.put(static_cast<unsigned char>(0x80)); // OK
ss.put(static_cast<unsigned char>(0xFFFF)); //C4309
The C programming language says that the functions from <ctype.h> follow a common requirement:
ISO C99, 7.4p1:
In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
This means that the following code is unsafe:
int upper(const char *s, size_t index) {
return toupper(s[index]);
}
If this code is executed on an implementation where char has the same value space as signed char and there is a character with a negative value in the string, this code invokes undefined behavior. The correct version is:
int upper(const char *s, size_t index) {
return toupper((unsigned char) s[index]);
}
Nevertheless I see many examples in C++ that don't care about this possibility of undefined behavior. So is there anything in the C++ standard that guarantees that the above code will not lead to undefined behavior, or are all the examples wrong?
[Additional Keywords: ctype cctype isalnum isalpha isblank iscntrl isdigit isgraph islower isprint ispunct isspace isupper isxdigit tolower]
For what it's worth, the Solaris Studio compilers (using stlport4) are one such compiler suite that produces an unexpected result here. Compiling and running this:
#include <stdio.h>
#include <cctype>
int main() {
char ch = '\xa1'; // '¡' in latin-1 locales + UTF-8
printf("is whitespace: %i\n", std::isspace(ch));
return 0;
}
gives me:
kevin@solaris:~/scratch
$ CC -library=stlport4 whitespace.cpp && ./a.out
is whitespace: 8
For reference:
$ CC -V
CC: Studio 12.5 Sun C++ 5.14 SunOS_i386 2016/05/31
Of course, this behavior is as documented in the C++ standard, but it's definitely surprising.
EDIT: Since it was pointed out that the above version contained undefined behavior in the attempt to assign char ch = '\xa1' due to integer overflow, here's a version that avoids that and still retains the same output:
#include <stdio.h>
#include <cctype>
int main() {
char ch = -95;
printf("is whitespace: %i\n", std::isspace(ch));
return 0;
}
And that does still print 8 on my Solaris VM:
kevin@solaris:~/scratch
$ CC -library=stlport4 whitespace.cpp && ./a.out
is whitespace: 8
EDIT 2: And here's a program that might otherwise look sane but gives an unexpected result due to UB in the use of std::isspace():
#include <cstdio>
#include <cstring>
#include <cctype>
static int count_whitespace(const char* str, int n) {
int count = 0;
for (int i = 0; i < n; i++)
if (std::isspace(str[i])) // oops!
count += 1;
return count;
}
int main() {
const char* batman = "I am batman\xa1";
int n = std::strlen(batman);
std::printf("%i\n", count_whitespace(batman, n));
return 0;
}
And, on my Solaris machine:
kevin@solaris:~/scratch
$ CC whitespace.cpp && ./a.out
3
Note that depending on how you permute this program, you'll probably get the expected result of two whitespace characters; that is, there is almost certainly some compiler optimization kicking in that takes advantage of this UB to give you the wrong result faster.
You could imagine this biting you in the face if you were, for example, attempting to tokenize a UTF-8 string by searching for (non-multibyte) whitespace characters in the string. Such a program would behave correctly when casting str[i] to unsigned char.
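For reference, a sketch of the corrected helper - the same function as above with only the cast added:
static int count_whitespace(const char* str, int n) {
    int count = 0;
    for (int i = 0; i < n; i++)
        if (std::isspace(static_cast<unsigned char>(str[i]))) // value stays in [0, 255]
            count += 1;
    return count;
}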
Sometimes most people are wrong. I think that's so here. Having said that, there's nothing to stop a standard library implementor from defining the behaviour that most people expect. So maybe that's why most people don't care, since they've never actually seen a bug resulting from this error.
The history behind the char type is that it was originally the type used to describe 7-bit ASCII characters. At the same time, C lacked a separate 8 bit integer type. So in the pre-standard days of the eighties, some compilers made char unsigned - since it doesn't make sense to have negative indices in a symbol table, while other compilers made char signed, to make it consistent with all the other integer types.
When the time came to standardize C, both versions existed. Unfortunately, the committee decided to let it remain that way, leaving the decision to the compiler. Instead they added two other types: signed char and unsigned char. signed char is part of the signed integer types, unsigned char is part of the unsigned integer types, and char is part of neither, though it must have the same representation as either signed char or unsigned char. (This is all described in C11 6.2.5)
Notably, char never was anything but 8 bits on all known implementations, save for some exotic oddball DSPs that work with 16-bit bytes. When "extended" symbol tables were used, either the implementation changed from 7- to 8-bit characters, or wchar_t was used. Please note that wchar_t has been in the C language since the beginning, so assuming that char was at some point used for things like UTF-8 is probably incorrect (though theoretically possible).
Now if char is signed, and you store a value larger than CHAR_MAX or smaller than CHAR_MIN inside it, you invoke undefined behavior, as per C11 6.5 §5. Period. So if you have an array of char and any item inside it violates the type's boundaries, you have undefined behavior there already. Even though character types have no trap representations, undefined behavior could still cause the code to misbehave in other ways, such as incorrect optimizations.
The ctype.h functions accept EOF as an argument, but should otherwise behave as if working with character types; the parameter is int precisely so that EOF fits. The text from 7.4 §1 is mostly saying that "if you pass some random int to this function, which neither has the same representation as a char nor equals EOF, the behavior is undefined".
But if you pass a char where you have already invoked signed integer overflow/underflow, you already have undefined behavior even before calling the function - this has nothing to do with the ctype.h functions or any other function. Thus your assumption that the posted "upper" function is unsafe is incorrect - this code is no different from any other code using the char type.
An example of undefined behavior caused by the cited ctype.h restrictions in 7.4 would rather be something like toupper(666).
Let us consider this snippet:
int s;
scanf("%c",&s);
Here I have used int, not char, for the variable s. To use s safely as a character I have to convert it back to char, because when scanf reads a character it only overwrites one byte of the variable it assigns to, not all four bytes that an int has.
For the conversion I could use s = (char)s; as the next line, but is it possible to achieve the same thing by subtracting something from s?
What you've done is technically undefined behaviour. The %c format calls for a char*, you've passed it an int* which will (roughly speaking) be reinterpreted. Even assuming that the pointer value is still good after reinterpreting, storing an arbitrary character to the first byte of an int and then reading it back as int is undefined behaviour. Even if it were defined, reading an int when 3 bytes of it are uninitialized, is undefined behaviour.
In practice it probably does something sensible on your machine, and you just get garbage in the top 3 bytes (assuming little-endian).
Writing s = (char)s converts the value from int to char and then back to int again. This is implementation-defined behaviour: converting an out-of-range value to a signed type. On different implementations it might clean up the top 3 bytes, it might return some other result, or it might raise a signal.
The proper way to use scanf is:
char c;
scanf("%c", &c);
And then either int s = c; or int s = (unsigned char)c;, according to whether you want negative-valued characters to result in a negative integer, or a positive integer (up to 255, assuming 8-bit char).
I can't think of any good reason for using scanf improperly. There are good reasons for not using scanf at all, though:
int s = getchar();
Are you trying to convert a digit to its decimal value? If so, then
char c = '8';
int n = c - '0';
n should be 8 at this point.
That's probably not a good idea; GCC gives me a warning for that code:
main.c:10: warning: format ‘%c’ expects type ‘char *’, but
argument 2 has type ‘int *’
In this case you're ok since you're passing a pointer to more space than you need (for most systems), but what if you did it the other way around? Could be crash city. If you really want to do something like what you have there, just do the typecast or mask it - the mask will be endian-dependent.
As written this won't work reliably. The argument &s to scanf is a pointer to int, but scanf is expecting a pointer to char. The two data types (int and char) have different sizes (at least on most architectures), so the data may get put in the wrong spot in memory, and the rest of s may not get properly cleared.
The answers suggesting manipulation of the result after using a pointer to int rely on unspecified behavior (i.e. that scanf will put the character value it has in the least significant byte of the int you're pointing to), and are not safe.
No, but you could use the following:
s = s & 0xFF;
That will zero out everything except the lowest byte. But in general all these ideas (and the ones above) are bad ideas, since not all systems store the lowest part of the integer first in memory. So if you ever have to port this code to a big-endian system, you'll be screwed.
True, you may never have to port the code, but why write unportable code to begin with?
See this for more info:
http://en.wikipedia.org/wiki/Endianness