I am trying to figure out why an array of char is assigned to a int value, now I am a little confused in using cast operator.
I didn't get what is in do statement, I hope somebody can explain
char *readword()
{
int c,i;
char t[255];
char *p;
//jump over chars who aren't letters
while ((c=getchar())<'A'|| (c>'Z' && c<'a') || c>'z')
if (c==EOF) return 0;
i=0;
do {
t[i++]=c;// shouldn't be like (char)c
} while ((c=getchar())>='A' && c<='Z' || c>='a' && c<='z');
//keep the word in heap memory
if ( c==EOF)
return 0;
t[i++]='\0';
if ((p=(char *)malloc(i))==0)
{
printf(" not enough memory\n");
exit(1);
}
strcpy(p,t);
return p;
}
The getchar() function returns an int type; and it is important to use an int to capture the getchar() return value. This is due to if getchar() fails, it returns an (int)(EOF)(as per chux comment. When it successfully returns, it will return a value that is suitable for a char.
The question code is building a char string or array, one char at a time:
t[i++]=c;
The above line could be written:
t[i++]=(char)c;
Either is suitable due to the compiler automatically converting the first case.
The mixture of char and int is fairly simple: EOF is intended as a file that can be distinguished from any value you could have read from the file.
To support that, you need to initially read the data from the file into something larger than a char, so it can accommodate at least one value that couldn't possibly have come from the file. The type they chose for that purpose was int.
So, you read a character from the file, into an int. You compare that to EOF to see if it's really a character that came from the file or not. If (and only if) you verify that it really came from the file, you save the value into a char, because you now know that's what it really represents.
That said, I'd consider it pretty poor code as it stands right now. Just for one particularly obvious example, instead of the c<'A'|| (c>'Z' && c<'a') || c>'z') type of code, you almost certainly want to use isalpha(c) instead.
It's also a lot easier to do this with scanf instead.
You can assign any int value to a char. Only the lowest 8 bits will be used. A cast would be more "correct" in terms of communicating your intent - people might not otherwise remember that anything larger than an 8-bit value will get truncated and results are likely to be unexpected.
Note that since you didn't say "unsigned char t[255]" that you actually get 7 bits and the most significant (8th) bit will be interpreted as a sign. So for example if you were to say
char t = 0xFF;
then you would in fact get -1 assigned to t.
If you assign numbers > 0xFF then all bits higher than the 8th bit will get stripped. So if you were to say:
char t = 0x101;
Than in fact you'd get the value 1 assigned to t.
The code in question is correct because getchar() returns an int and -1 is an error value so it's important to check it. For non-error cases the return will fit in an 8-bit char.
Related
The function std::isdigit is:
int isdigit(int ch);
The return (Non-zero value if the character is a numeric character, zero otherwise.) smells like the function was inherited from C, but even that does not explain why the parameter type is int not char while at the same time...
The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF.
Is there any technical reason why isdigitstakes an int not a char?
The reaons is to allow EOF as input. And EOF is (from here):
EOF integer constant expression of type int and negative value
The accepted answer is correct, but I believe the question deserves more detail.
A char in C++ is either signed or unsigned depending on your implementation (and, yet, it's a distinct type from signed char and unsigned char).
Where C grew up, char was typically unsigned and assumed to be an n-bit byte that could represent [0..2^n-1]. (Yes, there were some machines that had byte sizes other than 8 bits.) In fact, chars were considered virtually indistinguishable from bytes, which is why functions like memcpy take char * rather than something like uint8_t *, why sizeof char is always 1, and why CHAR_BITS isn't named BYTE_BITS.
But the C standard, which was the baseline for C++, only promised that char could hold any value in the execution character set. They might hold additional values, but there was no guarantee. The source character set (basically 7-bit ASCII minus some control characters) required something like 97 values. For a while, the execution character set could be smaller, but in practice it almost never was. Eventually there was an explicit requirement that a char be large enough to hold an 8-bit byte.
But the range was still uncertain. If unsigned, you could rely on [0..255]. Signed chars, however, could--in theory--use a sign+magnitude representation that would give you a range of [-127..127]. Note that's only 255 unique values, not 256 values ([-128..127]) like you'd get from two's complement. If you were language lawyerly enough, you could argue that you cannot store every possible value of an 8-bit byte in a char even though that was a fundamental assumption throughout the design of the language and its run-time library. I think C++ finally closed that apparent loophole in C++17 or C++20 by, in effect, requiring that a signed char use two's complement even if the larger integral types use sign+magnitude.
When it came time to design fundamental input/output functions, they had to think about how to return a value or a signal that you've reached the end of the file. It was decided to use a special value rather than an out-of-band signaling mechanism. But what value to use? The Unix folks generally had [128..255] available and others had [-128..-1].
But that's only if you're working with text. The Unix/C folks thought of textual characters and binary byte values as the same thing. So getc() was also for reading bytes from a binary file. All 256 possible values of a char, regardless of its signedness, were already claimed.
K&R C (before the first ANSI standard) didn't require function prototypes. The compiler made assumptions about parameter and return types. This is why C and C++ have the "default promotions," even though they're less important now than they once were. In effect, you couldn't return anything smaller than an int from a function. If you did, it would just be converted to int anyway.
The natural solution was therefore to have getc() return an int containing either the character value or a special end-of-file value, imaginatively dubbed EOF, a macro for -1.
The default promotions not only mandated a function couldn't return an integral type smaller than an int, they also made it difficult to pass in a small type. So int was also the natural parameter type for functions that expected a character. And thus we ended up with function signatures like int isdigit(int ch).
If you're a Posix fan, this is basically all you need.
For the rest of us, there's a remaining gotcha: If your chars are signed, then -1 might represent a legitimate character in your execution character set. How can you distinguish between them?
The answer is that functions don't really traffic in char values at all. They're really using unsigned char values dressed up as ints.
int x = getc(source_file);
if (x != EOF) { /* reached end of file */ }
else if (0 <= x && x < 128) { /* plain 7-bit character */ }
else if (128 <= x && x < 256) {
// Here it gets interesting.
bool b1 = isdigit(x); // OK
bool b2 = isdigit(static_cast<char>(x)); // NOT PORTABLE
bool b3 = isdigit(static_cast<unsigned char>(x)); // CORRECT!
}
I have written the following code to test if the given input is a digit or not.
#include<iostream>
#include<ctype.h>
#include<stdio.h>
using namespace std;
main()
{
char c;
cout<<"Please enter a digit: ";
cin>>c;
if(isdigit(c)) //int isdigit(int c) or char isdigit(char c)
{
cout<<"You entered a digit"<<endl;
}
else
{
cout<<"You entered a non-digit value"<<endl;
}
}
My question is: what should be the input variable type? char or int?
The situation is unfortunately a bit more complex than has been told by the other answers.
First of all: the first part of your code is correct (disregarding multiple-byte encodings); if you want to read a single char with cin, you'll have to use a char variable with >> operator.
Now, about isdigit: why does it take an int instead of a char?
It all comes from C; isdigit and its companion were born to be used along with functions like getchar(), which read a character from the stream and return an int. This in turn was done to provide the character and an error code: getchar() can return EOF (which is defined as some implementation-defined negative constant) through its return code to signify that the input stream has ended.
So, the basic idea is: negative = error code; positive = actual character code.
Unfortunately, this poses interoperability problems with "regular" chars.
Short digression: char ultimately is just an integral type with a very small range, but a particularly stupid one. In most occasions - when working with bytes or character codes - you'd want it to be unsigned by default; OTOH, for coherency reasons with other integral types (int, short, long, ...), you may say that the right thing would be that plain char should be signed. The Standard chose the most stupid way: plain char is either signed or unsigned, depending from whatever the implementor of the compiler decides1.
So, you have to be prepared for char being either signed or unsigned; in most implementations it's signed by default, which poses a problem with the getchar() arrangement above.
If char is used to read bytes and is signed it means that all bytes with the high bit set (AKA bytes that, read with an unsigned 8-bit type would be >127) turn out to be negative values. This obviously isn't compatible with the getchar() using negative values for EOF - there could be overlap between actual "negative" characters and EOF.
So, when C functions talk about receiving/providing characters into int variables the contract is always that the character is assumed to be a char that has been cast to an unsigned char (so that it is always positive, negative values overflowing into the top half of its range) and then put into an int. Which brings us back to the isdigit function, which, along its companion functions, has this contract as well:
The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
(C99, §7.4, ¶1)
So, long story short: your if should be at the very least:
if(isdigit((unsigned char)c))
The problem is not just a theoretical one: several widespread C library implementations use the provided value straight as an index into a lookup table, so negative values will read into unallocated memory and segfault your program.
Also, you are not taking into account the fact that the stream may be closed, and thus >> returning without touching your variable (which will be at an uninitialized value); to take this into account, you should check if the stream is still in a valid state before working on c.
Of course this is a bit of an unfair rant; as #Pete Becker noted in the comment below, it's not like they were all morons, but just that the standard mostly tried to be compatible with existing implementations, which were probably evenly split between unsigned and signed char. Traces of this split can be found in most modern compilers, which can generally change the signedness of char through command line options (-fsigned-char/-funsigned-char for gcc/clang, /J in VC++).
If you want to read a single character and check whether it is a digit or not then it should be char.
If you set it as int then multiple characters will be read and the result of isDigit will always be true.
I am currently working through Accelerated C++ and have come across an issue in exercise 2-3.
A quick overview of the program - the program basically takes a name, then displays a greeting within a frame of asterisks - i.e. Hello ! surrounded framed by *'s.
The exercise - In the example program, the authors use const int to determine the padding (blank spaces) between the greeting and the asterisks. They then ask the reader, as part of the exercise, to ask the user for input as to how big they want the padding to be.
All this seems easy enough, I go ahead ask the user for two integers (int) and store them and change the program to use these integers, removing the ones used by the author, when compiling though I get the following warning;
Exercise2-3.cpp:46: warning: comparison between signed and unsigned integer expressions
After some research it appears to be because the code attempts to compare one of the above integers (int) to a string::size_type, which is fine. But I was wondering - does this mean I should change one of the integers to unsigned int? Is it important to explicitly state whether my integers are signed or unsigned?
cout << "Please enter the size of the frame between top and bottom you would like ";
int padtopbottom;
cin >> padtopbottom;
cout << "Please enter size of the frame from each side you would like: ";
unsigned int padsides;
cin >> padsides;
string::size_type c = 0; // definition of c in the program
if (r == padtopbottom + 1 && c == padsides + 1) { // where the error occurs
Above are the relevant bits of code, the c is of type string::size_type because we do not know how long the greeting might be - but why do I get this problem now, when the author's code didn't get the problem when using const int? In addition - to anyone who may have completed Accelerated C++ - will this be explained later in the book?
I am on Linux Mint using g++ via Geany, if that helps or makes a difference (as I read that it could when determining what string::size_type is).
It is usually a good idea to declare variables as unsigned or size_t if they will be compared to sizes, to avoid this issue. Whenever possible, use the exact type you will be comparing against (for example, use std::string::size_type when comparing with a std::string's length).
Compilers give warnings about comparing signed and unsigned types because the ranges of signed and unsigned ints are different, and when they are compared to one another, the results can be surprising. If you have to make such a comparison, you should explicitly convert one of the values to a type compatible with the other, perhaps after checking to ensure that the conversion is valid. For example:
unsigned u = GetSomeUnsignedValue();
int i = GetSomeSignedValue();
if (i >= 0)
{
// i is nonnegative, so it is safe to cast to unsigned value
if ((unsigned)i >= u)
iIsGreaterThanOrEqualToU();
else
iIsLessThanU();
}
else
{
iIsNegative();
}
I had the exact same problem yesterday working through problem 2-3 in Accelerated C++. The key is to change all variables you will be comparing (using Boolean operators) to compatible types. In this case, that means string::size_type (or unsigned int, but since this example is using the former, I will just stick with that even though the two are technically compatible).
Notice that in their original code they did exactly this for the c counter (page 30 in Section 2.5 of the book), as you rightly pointed out.
What makes this example more complicated is that the different padding variables (padsides and padtopbottom), as well as all counters, must also be changed to string::size_type.
Getting to your example, the code that you posted would end up looking like this:
cout << "Please enter the size of the frame between top and bottom";
string::size_type padtopbottom;
cin >> padtopbottom;
cout << "Please enter size of the frame from each side you would like: ";
string::size_type padsides;
cin >> padsides;
string::size_type c = 0; // definition of c in the program
if (r == padtopbottom + 1 && c == padsides + 1) { // where the error no longer occurs
Notice that in the previous conditional, you would get the error if you didn't initialize variable r as a string::size_type in the for loop. So you need to initialize the for loop using something like:
for (string::size_type r=0; r!=rows; ++r) //If r and rows are string::size_type, no error!
So, basically, once you introduce a string::size_type variable into the mix, any time you want to perform a boolean operation on that item, all operands must have a compatible type for it to compile without warnings.
The important difference between signed and unsigned ints
is the interpretation of the last bit. The last bit
in signed types represent the sign of the number, meaning:
e.g:
0001 is 1 signed and unsigned
1001 is -1 signed and 9 unsigned
(I avoided the whole complement issue for clarity of explanation!
This is not exactly how ints are represented in memory!)
You can imagine that it makes a difference to know if you compare
with -1 or with +9. In many cases, programmers are just too lazy
to declare counting ints as unsigned (bloating the for loop head f.i.)
It is usually not an issue because with ints you have to count to 2^31
until your sign bit bites you. That's why it is only a warning.
Because we are too lazy to write 'unsigned' instead of 'int'.
At the extreme ranges, an unsigned int can become larger than an int.
Therefore, the compiler generates a warning. If you are sure that this is not a problem, feel free to cast the types to the same type so the warning disappears (use C++ cast so that they are easy to spot).
Alternatively, make the variables the same type to stop the compiler from complaining.
I mean, is it possible to have a negative padding? If so then keep it as an int. Otherwise you should probably use unsigned int and let the stream catch the situations where the user types in a negative number.
The primary issue is that underlying hardware, the CPU, only has instructions to compare two signed values or compare two unsigned values. If you pass the unsigned comparison instruction a signed, negative value, it will treat it as a large positive number. So, -1, the bit pattern with all bits on (twos complement), becomes the maximum unsigned value for the same number of bits.
8-bits: -1 signed is the same bits as 255 unsigned
16-bits: -1 signed is the same bits as 65535 unsigned
etc.
So, if you have the following code:
int fd;
fd = open( .... );
int cnt;
SomeType buf;
cnt = read( fd, &buf, sizeof(buf) );
if( cnt < sizeof(buf) ) {
perror("read error");
}
you will find that if the read(2) call fails due to the file descriptor becoming invalid (or some other error), that cnt will be set to -1. When comparing to sizeof(buf), an unsigned value, the if() statement will be false because 0xffffffff is not less than sizeof() some (reasonable, not concocted to be max size) data structure.
Thus, you have to write the above if, to remove the signed/unsigned warning as:
if( cnt < 0 || (size_t)cnt < sizeof(buf) ) {
perror("read error");
}
This just speaks loudly to the problems.
1. Introduction of size_t and other datatypes was crafted to mostly work,
not engineered, with language changes, to be explicitly robust and
fool proof.
2. Overall, C/C++ data types should just be signed, as Java correctly
implemented.
If you have values so large that you can't find a signed value type that works, you are using too small of a processor or too large of a magnitude of values in your language of choice. If, like with money, every digit counts, there are systems to use in most languages which provide you infinite digits of precision. C/C++ just doesn't do this well, and you have to be very explicit about everything around types as mentioned in many of the other answers here.
or use this header library and write:
// |notEqaul|less|lessEqual|greater|greaterEqual
if(sweet::equal(valueA,valueB))
and don't care about signed/unsigned or different sizes
If there are non-number characters in a string and you call atoi [I'm assuming wtoi will do the same]. How will atoi treat the string?
Lets say for an example I have the following strings:
"20234543"
"232B"
"B"
I'm sure that 1 will return the integer 20234543. What I'm curious is if 2 will return "232." [Thats what I need to solve my problem]. Also 3 should not return a value. Are these beliefs false? Also... if 2 does act as I believe, how does it handle the e character at the end of the string? [Thats typically used in exponential notation]
You can test this sort of thing yourself. I copied the code from the Cplusplus reference site. It looks like your intuition about the first two examples are correct, but the third example returns '0'. 'E' and 'e' are treated just like 'B' is in the second example also.
So the rules are
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.
According to the standard, "The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined." (7.20.1, Numeric conversion functions in C99).
So, technically, anything could happen. Even for the first case, since INT_MAX is guaranteed to be at least 32767, and since 20234543 is greater than that, it could fail as well.
For better error checking, use strtol:
const char *s = "232B";
char *eptr;
long value = strtol(s, &eptr, 10); /* 10 is the base */
/* now, value is 232, eptr points to "B" */
s = "20234543";
value = strtol(s, &eptr, 10);
s = "123456789012345";
value = strtol(s, &eptr, 10);
/* If there was no overflow, value will contain 123456789012345,
otherwise, value will contain LONG_MAX and errno will be ERANGE */
If you need to parse numbers with "e" in them (exponential notation), then you should use strtod. Of course, such numbers are floating-point, and strtod returns double. If you want to make an integer out of it, you can do a conversion after checking for the correct range.
atoi reads digits from the buffer until it can't any more. It stops when it encounters any character that isn't a digit, except whitespace (which it skips) or a '+' or a '-' before it has seen any digits (which it uses to select the appropriate sign for the result). It returns 0 if it saw no digits.
So to answer your specific questions: 1 returns 20234543. 2 returns 232. 3 returns 0. The character 'e' is not whitespace, a digit, '+' or '-' so atoi stops and returns if it encounters that character.
See also here.
If atoi encounters a non-number character, it returns the number formed up until that point.
I tried using atoi() in a project, but it wouldn't work if there were any non-digit characters in the mix and they came before the digit characters - it'll return zero. It seems to not mind if they come after the digits, for whatever reason.
Here's a pretty bare bones string to int converter I wrote up that doesn't seem to have that problem (bare bones in that it doesn't work with negative numbers and it doesn't incorporate any error handling, but it might be helpful in specific instances). Hopefully it might be helpful.
int stringToInt(std::string newIntString)
{
unsigned int dataElement = 0;
unsigned int i = 0;
while ( i < newIntString.length())
{
if (newIntString[i]>=48 && newIntString[i]<=57)
{
dataElement += static_cast<unsigned int>(newIntString[i]-'0')*(pow(10,newIntString.length()-(i+1)));
}
i++;
}
return dataElement;
}
I blamed myself up to this atoi-function behaviour when I was learning-approached coding program with function calculating integer factorial result given input parameter by launching command line parameter.
atoi-function returns 0 if value is something else than numeral value and "3asdf" returns 3. C -language handles command line input parameters in char -array pointer variable as we all already know.
I was told that down at the book "Linux Hater's Handbook" there's some discussion appealing for computer geeks doesn't really like atoi-function, it's kind of foolish in reason that there's no way to check validity of given input type.
Some guy asked me why I don't brother to use strtol -function located on stdlib.h -library and he gave me an example attached to my factorial-calculating recursive method but I don't care about factorial result is bigger than integer primary type value -range, out of ranged (too large base number). It will result in negative values in my program.
I solved my problem with atoi-function first checking if given user's input parameter is truly numerical value and if that matches, after then I calculate the factorial value.
Using isdigit() -function located on chtype.h -library is following:
int checkInput(char *str[]) {
for (int x = 0; x < strlen(*str); ++x)
{
if (!isdigit(*str[x])) return 1;
}
return 0;
}
My forum-pal down in other Linux programming forum told me that if I would use strtol I could handle the situations with out of ranged values or even parse signed int to unsigned long -type meaning -0 and other negative values are not accepted.
It's important upper on my code check if charachter is not numerical value. Negotation way to check this one the function returns failed results when first numerical value comes next to check in string. (or char array in C)
Writing simple code and looking to see what it does is magical and illuminating.
On point #3, it won't return "nothing." It can't. It'll return something, but that something won't be useful to you.
http://www.cplusplus.com/reference/clibrary/cstdlib/atoi/
On success, the function returns the converted integral number as an int value.
If no valid conversion could be performed, a zero value is returned.
If the correct value is out of the range of representable values, INT_MAX or INT_MIN is returned.
Lets us consider this snippet:
int s;
scanf("%c",&s);
Here I have used int, and not char, for variable s, now for using s for character conversion safely I have to make it char again because when scanf reads a character it only overwrites one byte of the variable it is assigning it to, and not all four that int has.
For conversion I could use s = (char)s; as the next line, but is it possible to implement the same by subtracting something from s ?
What you've done is technically undefined behaviour. The %c format calls for a char*, you've passed it an int* which will (roughly speaking) be reinterpreted. Even assuming that the pointer value is still good after reinterpreting, storing an arbitrary character to the first byte of an int and then reading it back as int is undefined behaviour. Even if it were defined, reading an int when 3 bytes of it are uninitialized, is undefined behaviour.
In practice it probably does something sensible on your machine, and you just get garbage in the top 3 bytes (assuming little-endian).
Writing s = (char)s converts the value from int to char and then back to int again. This is implementation-defined behaviour: converting an out-of-range value to a signed type. On different implementations it might clean up the top 3 bytes, it might return some other result, or it might raise a signal.
The proper way to use scanf is:
char c;
scanf("%c", &c);
And then either int s = c; or int s = (unsigned char)c;, according to whether you want negative-valued characters to result in a negative integer, or a positive integer (up to 255, assuming 8-bit char).
I can't think of any good reason for using scanf improperly. There are good reasons for not using scanf at all, though:
int s = getchar();
Are you trying to convert a digit to its decimal value? If so, then
char c = '8';
int n = c - '0';
n should 8 at this point.
That's probably not a good idea; GCC gives me a warning for that code:
main.c:10: warning: format ‘%c’ expects type ‘char *’, but
argument 2 has type ‘int *’
In this case you're ok since you're passing a pointer to more space than you need (for most systems), but what if you did it the other way around? Could be crash city. If you really want to do something like what you have there, just do the typecast or mask it - the mask will be endian-dependent.
As written this won't work reliably . The argument, &s, to scanf is a pointer to int and scanf is expecting a pointer to char. The two data type (int and char) have different sizes (at least on most architectures) so the data may get put in the wrong spot in memeory, and the other part of s may not get properly cleared.
The answers suggesting manipulation of the result after using a pointer to int rely on unspecified behavior (i.e. that scanf will put the character value it has in the least significant byte of the int you're pointing to), and are not safe.
Not but you could use the following:
s = s & 0xFF
That will blank out all of the data except the first byte. But in general all these ideas (and the ones above) are bad ideas, since not all systems store the lowest part of the integer in memory first. So if you ever have to port this code to a big endian system, you'll be screwed.
True, you may never have to port the code, but why write unportable code to begin with?
See this for more info:
http://en.wikipedia.org/wiki/Endianness