Consider
char *p_c = new char['1', '2', '3', '4'];
Is this syntax correct? If yes, then what does it do?
I don't know why, but the compiler allows this syntax! What does it do with regard to memory? I am not able to access the variable by *p_c. How does one determine the size and the number of elements present?
Your code is syntactically valid C++, if rather strange, and I don't think it does what you intended:
new char['1', '2', '3', '4'] is evaluated as new char['4'] due to the way the comma operator works. (The preceding elements are evaluated from left to right, but the value of the expression is that of the rightmost element.)
So your statement is equivalent to char *p_c = new char['4'];
'4' is a char whose numeric value depends on the encoding that your platform uses (ASCII, EBCDIC, etc., although the former is by far the most likely on a desktop system).
So the number of elements in the array is whatever '4' evaluates to when converted to a size_t. On an ASCII system the number of elements would be 52.
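If you want to see the value your own platform uses, here is a quick sketch (on an ASCII system it prints 52):
#include <iostream>
#include <cstddef>

int main()
{
    // The character literal '4' converted to the size type used by new[].
    std::cout << static_cast<std::size_t>('4') << '\n';
    return 0;
}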
The syntax for the new expression you used is something like:
identifier = new Type[<expression>];
In the <expression> above, C++ allows any expression whose result is convertible to std::size_t. And for your own expression, you used the comma operator.
<expression> := '1', '2', '3', '4'
which evaluates every item in the comma list and yields the last one, '4'. That result is then converted to std::size_t, most likely 52 on an ASCII system. So the code is equivalent to:
char* p_c = new char['4'];
char *p_c = new char['1', '2', '3', '4'];
is functionally equivalent to:
char *p_c = new char['4'];
because of the comma operator. The comma operator evaluates its operands left to right and discards all of them except the last one (the rightmost one).
The character literal '4' has the value 52 in ASCII (your system doesn't have to use ASCII, and neither the C nor the C++ standard requires it, but almost all modern systems do).
So, it's as if you used:
char *p_c = new char[52];
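If the intent was to allocate four chars holding '1' through '4', a minimal sketch of what was probably meant (assuming C++11 or later for the braced initializer of new[]):
#include <iostream>

int main()
{
    // Allocate exactly four chars and initialize them explicitly.
    char *p_c = new char[4]{'1', '2', '3', '4'};

    std::cout << p_c[0] << p_c[1] << p_c[2] << p_c[3] << '\n';  // prints 1234

    delete[] p_c;  // free what new[] allocated
    return 0;
}
Note that an array allocated with new[] carries no size information you can query later; your code has to remember the element count itself, which also answers the "how does one determine the number of elements" part of the question.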
Related
I'm trying to convert a string to lowercase, and am treating it as a char* and iterating through each index. The problem is that the tolower function I read about online is not actually converting a char to lowercase: it's taking char as input and returning an integer.
cout << tolower('T') << endl;
prints 116 to the console when it should be printing T.
Is there a better way for me to convert a string to lowercase?
I've looked around online, and most sources say to "use tolower and iterate through the char array", which doesn't seem to be working for me.
So my two questions are:
What am I doing wrong with the tolower function that makes it return 116 instead of 't' when I call tolower('T')?
Are there better ways to convert a string to lowercase in C++ other than using tolower on each individual character?
That's because there are two different tolower functions. The one that you're using is the one from <cctype>, which takes and returns an int. That's why it's printing 116, the ASCII value of 't'. If you want to print a char, you can just cast the result back to char.
Alternatively, you could use the overload from <locale>, which actually returns the type you would expect it to return:
std::cout << std::tolower('T', std::locale()); // prints t
In response to your second question:
Are there better ways to convert a string to lowercase in C++ other than using tolower on each individual character?
Nope.
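That said, the per-character call is easy to apply to a whole std::string with std::transform; a minimal sketch (assuming C++11; the pass through unsigned char avoids undefined behavior for negative char values):
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

int main()
{
    std::string s = "Hello World";
    // Lower-case each byte in place.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    std::cout << s << '\n';  // prints "hello world"
    return 0;
}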
116 is indeed the correct value; this is simply a matter of how std::cout handles integers. Use char(tolower(c)) to get the output you want:
std::cout << char(tolower('T')); // print it like this
It's even weirder than that - it takes an int and returns an int. See http://en.cppreference.com/w/cpp/string/byte/tolower.
You need to ensure the value you pass it is representable as an unsigned char - no negative values allowed, even if char is signed.
So you might end up with something like this:
char c = static_cast<char>(tolower(static_cast<unsigned char>('T')));
Ugly isn't it? But in any case converting one character at a time is very limiting. Try converting 'ß' to upper case, for example.
tolower takes an int and returns an int. If you check <ctype.h> (<cctype> in C++) you will see that the declaration is int tolower(int c);. You can use a loop to go through the string and change every single char to lower case. For example:
int i = 0;
char c;
while (str[i])            // go through the string
{
    c = str[i];           // take the current character
    putchar(tolower(c));  // print its lower-case version
    i++;                  // move to the next character
}
the documentation of int tolower(int ch) mandates that ch must either be representable as an unsigned char or be equal to EOF (which is usually -1, but don't rely on that).
It's not uncommon for character manipulation functions that have been inherited from the C standard library to work in terms of ints. There are two reasons for this:
In the early days of C, all arguments were promoted to int (function prototypes did not exist).
For consistency these functions need to handle the EOF case, which for obvious reasons cannot be a value representable by a char, since that would mean we'd have to lose one of the legitimate encodings for a character.
http://en.cppreference.com/w/cpp/string/byte/tolower
The answer is to cast the result to a char before printing.
e.g.:
std::cout << static_cast<char>(std::tolower('A'));
Generally speaking, to convert an uppercase character to lowercase you only need to add 32 to the uppercase character, as this number is the ASCII code difference between lowercase and uppercase letters, e.g. 'a'-'A' = 97-65 = 32.
char c = 'B';
c += 32; // c is now 'b'
printf("c=%c\n", c);
Another easy way is to first map the uppercase character to an offset within the range of the English alphabet, 0-25 (index 0 for the first letter through index 25 for the last, inclusive), and then remap that offset to a lowercase character.
char c = 'B';
c = c - 'A' + 'a'; // c is now 'b'
printf("c=%c\n", c);
In the following example:
cout<<"\n"[a==N];
I have no clue about what the [] option does in cout, but it does not print a newline when the value of a is equal to N.
I have no clue about what the [] option does in cout
This is actually not a cout option; what is happening is that "\n" is a string literal. A string literal has the type array of n const char, and the [] is simply an index into that array of characters, which in this case contains:
\n\0
Note that a \0 is appended to every string literal.
The == operator results in either true or false, so the index will be:
0 if false, i.e. if a does not equal N, selecting '\n'
1 if true, i.e. if a equals N, selecting '\0'
This is rather cryptic and could have been replaced with a simple if.
For reference, the C++14 standard (Lightness confirmed the draft matches the actual standard, with the closest draft being N3936) says in section 2.14.5 String literals [lex.string] (emphasis mine):
string literal has type “array of n const char”, where n is the
size of the string as defined below, and has static storage duration
(3.7).
and:
After any necessary concatenation, in translation phase 7 (2.2),
’\0’ is appended to every string literal so that programs that scan a string can find its end.
section 4.5 [conv.prom] says:
A prvalue of type bool can be converted to a prvalue of type int, with
false becoming zero and true becoming one.
Writing a null character to a text stream
The claim was made that writing a null character(\0) to a text stream is undefined behavior.
As far as I can tell this is a reasonable conclusion; cout is defined in terms of a C stream, as we can see from 27.4.2 [narrow.stream.objects], which says:
The object cout controls output to a stream buffer associated with the object stdout, declared in
<cstdio> (27.9.2).
and the C11 draft standard in section 7.21.2 Streams says:
[...]Data read in from a text stream will necessarily compare equal to the data
that were earlier written out to that stream only if: the data consist only of printing
characters and the control characters horizontal tab and new-line;
and printing characters are covered in 7.4 Character handling <ctype.h>:
[...]the term control character
refers to a member of a locale-specific set of characters that are not printing
characters.199) All letters and digits are printing characters.
with footnote 199 saying:
In an implementation that uses the seven-bit US ASCII character set, the printing characters are those
whose values lie from 0x20 (space) through 0x7E (tilde); the control characters are those whose
values lie from 0 (NUL) through 0x1F (US), and the character 0x7F (DEL).
and finally, since the result of sending a null character is not specified, this is undefined behavior according to section 4 Conformance, which says:
[...]Undefined behavior is otherwise
indicated in this International Standard by the words ‘‘undefined behavior’’ or by the
omission of any explicit definition of behavior.[...]
We can also look to the C99 rationale which says:
The set of characters required to be preserved in text stream I/O are those needed for writing C
programs; the intent is that the Standard should permit a C translator to be written in a maximally
portable fashion. Control characters such as backspace are not required for this purpose, so their
handling in text streams is not mandated.
cout<<"\n"[a==N];
I have no clue about what the [] option does in cout
In the C++ operator precedence table, operator [] binds tighter than operator <<, so your code is equivalent to:
cout << ("\n"[a==N]); // or cout.operator <<("\n"[a==N]);
Or in other words, operator [] does nothing directly with cout. It is used only to index the string literal "\n".
For example for(int i = 0; i < 3; ++i) std::cout << "abcdef"[i] << std::endl; will print characters a, b and c on consecutive lines on the screen.
Because string literals in C++ are always terminated with a null character ('\0', L'\0', char16_t(), etc.), the string literal "\n" is a const char[2] holding the characters '\n' and '\0'.
In memory layout this looks like:
+--------+--------+
| '\n' | '\0' |
+--------+--------+
0 1 <-- Offset
false true <-- Result of condition (a == n)
a != n a == n <-- Case
So if a == N is true (promoted to 1), expression "\n"[a == N] results in '\0' and '\n' if result is false.
It is functionally similar (but not identical) to:
char anonymous[] = "\n";
int index;
if (a == N) index = 1;
else index = 0;
cout << anonymous[index];
The value of "\n"[a==N] is either '\n' or '\0'.
The type of "\n"[a==N] is const char.
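Both claims can be checked at compile time; a minimal sketch (assuming C++14, as in the standard quotes earlier):
#include <type_traits>

// Index 0 selects the newline, index 1 selects the terminator,
// and the element type of a narrow string literal is const char.
static_assert("\n"[0] == '\n', "index 0 (a != N) is the newline");
static_assert("\n"[1] == '\0', "index 1 (a == N) is the terminator");
static_assert(std::is_same<std::remove_reference<decltype("\n"[0])>::type,
                           const char>::value, "element type is const char");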
If the intention is to print nothing (which may be different from printing '\0', depending on platform and purpose), prefer the following line of code:
if(a != N) cout << '\n';
Even if your intention is to write either '\0' or '\n' on the stream, prefer a readable code for example:
cout << (a == N ? '\0' : '\n');
It's probably intended as a bizarre way of writing
if ( a != N ) {
cout<<"\n";
}
The [] operator selects an element from an array. The string "\n" is actually an array of two characters: a new line '\n' and a string terminator '\0'. So cout<<"\n"[a==N] will print either a '\n' character or a '\0' character.
The trouble is that you're not allowed to send a '\0' character to an I/O stream in text mode. The author of that code might have noticed that nothing seemed to happen, so he assumed that cout<<'\0' is a safe way to do nothing.
In C and C++, that is a very poor assumption because of the notion of undefined behavior. If the program does something that is not covered by the specification of the standard or the particular platform, anything can happen. A fairly likely outcome in this case is that the stream will stop working entirely — no more output to cout will appear at all.
In summary, the effect is,
"Print a newline if a is not equal to N. Otherwise, I don't know. Crash or something."
… and the moral is, don't write things so cryptically.
It is not an option of cout but an array index of "\n"
The array index [a==N] evaluates to [0] or [1], and indexes the character array represented by "\n" which contains a newline and a nul character.
However passing nul to the iostream will have undefined results, and it would be better to pass a string:
cout << &("\n"[a==N]) ;
However, the code in either case is not particularly advisable and serves no particular purpose other than to obfuscate; do not regard it as an example of good practice. The following is preferable in most instances:
cout << (a != N ? "\n" : "") ;
or just:
if( a != N ) cout << '\n' ;
Each of the following lines will generate exactly the same output:
cout << "\n"[a==N]; // Never do this.
cout << (a==N)["\n"]; // Or this.
cout << *((a==N)+"\n"); // Or this.
cout << *("\n"+(a==N)); // Or this.
As the other answers have specified, this has nothing to do with std::cout. It instead is a consequence of
How the primitive (non-overloaded) subscripting operator is implemented in C and C++.
In both languages, if array is a C-style array of primitives, array[42] is syntactic sugar for *(array+42). Even worse, there's no difference between array+42 and 42+array. This leads to interesting obfuscation: Use 42[array] instead of array[42] if your goal is to utterly obfuscate your code. It goes without saying that writing 42[array] is a terrible idea if your goal is to write understandable, maintainable code.
How booleans are transformed to integers.
Given an expression of the form a[b], either a or b must be a pointer expression, and the other must be an integer expression. Given the expression "\n"[a==N], the "\n" represents the pointer part of that expression and the a==N represents the integer part. Here, a==N is a boolean expression that evaluates to false or true. The integer promotion rules specify that false becomes 0 and true becomes 1 on promotion to an integer.
How string literals degrade into pointers.
When a pointer is needed, arrays in C and C++ readily degrade into a pointer that points to the first element of the array.
How string literals are implemented.
Every C-style string literal is appended with the null character '\0'. This means the internal representation of your "\n" is the array {'\n', '\0'}.
Given the above, suppose a==N evaluates to false. In this case, the behavior is well-defined across all systems: You'll get a newline. If, on the other hand, a==N evaluates to true, the behavior is highly system dependent. Based on comments to answers to the question, Windows will not like that. On Unix-like systems where std::cout is piped to the terminal window, the behavior is rather benign. Nothing happens.
Just because you can write code like that doesn't mean you should. Never write code like that.
I'm reading the C++ Primer Plus by Stephen Prata. He gives this example:
char dog[8] = { 'b', 'e', 'a', 'u', 'x', ' ', 'I', 'I'}; // not a string!
char cat[8] = {'f', 'a', 't', 'e', 's', 's', 'a', '\0'}; // a string!
with the comment that:
Both of these arrays are arrays of char, but only the second is a string. The null character plays a fundamental role in C-style strings. For example, C++ has many functions that handle strings, including those used by cout. They all work by processing a string character-by-character until they reach the null character. If you ask cout to display a nice string like cat in the preceding example, it displays the first seven characters, detects the null character, and stops. But if you are ungracious enough to tell cout to display the dog array from the preceding example, which is not a string, cout prints the eight letters in the array and then keeps marching through memory byte-by-byte, interpreting each byte as a character to print, until it reaches a null character. Because null characters, which really are bytes set to zero, tend to be common in memory, the damage is usually contained quickly; nonetheless, you should not treat nonstring character arrays as strings.
Now, if I declare my variables globally, like this:
#include <iostream>
using namespace std;
char a[8] = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'};
char b[8] = {'1', '2', '3', '4', '5', '6', '7', '8'};
int main(void)
{
    cout << a << endl;
    cout << b << endl;
    return 0;
}
the output will be:
abcdefgh12345678
12345678
So, indeed, cout "keeps marching through memory byte-by-byte", but only to the end of the second character array. The same thing happens with any combination of char arrays. I'm thinking that all the other addresses are initialized to 0 and that's why cout stops. Is this true? If I do something like:
for (int i = 0; i < 100; ++i)
{
    cout << *(&a + i) << endl;
}
I'm getting mostly empty space at output (like 95%, perhaps), but not everywhere.
If, however, I declare my char arrays a little bit shorter, like:
char a[3] = {'a', 'b', 'c'};
char b[3] = {'1', '2', '3'};
keeping all other things the same, I'm getting the following output:
abc
123
Now cout doesn't even get past the first char array, let alone the second. Why is this happening? I've checked the memory addresses and they are sequential, just like in the first scenario. For example,
cout << &a << endl;
cout << &b << endl;
gives
003B903C
003B9040
Why is the behavior different in this case? Why doesn't it read beyond the first char array?
And, lastly, if I do declare my variables inside main, then I do get the behavior suggested by Prata, namely, a lot of junk gets printed before a null character is eventually reached somewhere.
I'm guessing that in the first case the char array is allocated on the heap and that this memory is initialized to 0 (but not everywhere, why?), and that cout behaves differently based on the length of the char array (why?).
I'm using Visual Studio 2010 for these examples.
It looks like your C++ compiler is allocating space in 4-byte chunks, so that every object has an address that is a multiple of 4 (the hex addresses in your dump are divisible by 4). Compilers like to do this to make sure larger data types such as int and float (4 bytes wide) are aligned to 4-byte boundaries, because some kinds of computer hardware take longer to load/move/store unaligned int and float values.
In your first example each array needs 8 bytes of memory (a char fills a single byte), so the compiler allocates exactly 8 bytes. In the second example each array is 3 bytes, so the compiler allocates 4 bytes, fills the first 3 bytes with your data, and leaves the 4th byte unused.
Now in this second case it appears the unused byte was filled with a null which explains why cout stopped at the end of the string. But as others have pointed out, you cannot depend on unused bytes to be initialized to any particular value, so the behaviour of the program cannot be guaranteed.
If you change your sample arrays to have 4 bytes the program will behave as in the first example.
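If the goal is well-defined output rather than an accident of padding, here is a sketch of two safer options, reusing arrays like those in the question:
#include <iostream>

int main()
{
    // Option 1: make the array a genuine string by reserving room for the terminator.
    char a[4] = {'a', 'b', 'c', '\0'};
    char b[] = "123";                  // a string literal adds the '\0' for you
    std::cout << a << '\n' << b << '\n';

    // Option 2: if an array really isn't null-terminated, tell the stream exactly
    // how many characters to write instead of letting it hunt for a '\0'.
    char c[3] = {'x', 'y', 'z'};
    std::cout.write(c, sizeof c);
    std::cout << '\n';
    return 0;
}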
The contents of memory out of bounds is indeterminate. Accessing memory you do not own, even just for reading, leads to undefined behavior.
It's undefined behaviour; you cannot say what will happen.
Try it on some other system and you may get different output.
The answer to your question is that it is undefined behaviour and its output cannot be explained.
In addition to the explanation above, in your particular case you have declared the arrays globally.
Therefore, in your second example a \0 ends up in the fourth byte of the four-byte allocation, as explained by Peter Raynham.
The '\0' is just one way to record how long a string is. (Another approach would be to store a length value before the string.)
But in your case, where you intentionally leave it out, the standard functions (and normally your own code as well) will keep searching for the delimiter, which is the null character.
What lies beyond the bounds of the memory you specified is undefined, and it varies greatly.
With MinGW in debug mode under gdb it's usually zeroed out; without gdb it's just junk... although this is just my experience.
Locally declared variables usually live on the stack, so what you are reading is probably your call stack.
Consider these two strings:
wchar_t* x = L"xy\x588xla";
wchar_t* y = L"xy\x588bla";
Upon reading this you would expect both string literals to be the same except for one character: an 'x' instead of a 'b'.
It turns out that this is not the case. The first string compiles to:
x = {'x', 'y', 0x588, 'x', 'l', 'a' }
and the second is actually:
y = {'x', 'y', 0x588b, 'l', 'a' }
They are not even the same length!
Yes, the 'b' is eaten up by the hex escape: \x consumes as many hex digits as follow it, and 'b' happens to be one.
At the very least, this could cause confusion and subtle bugs in hand-written strings (you could argue that Unicode strings don't belong in the code body).
But the more serious problem, and the one I am facing, is in auto-generated code. There just doesn't seem to be any way to express this: {'x', 'y', 0x588, 'b', 'l', 'a' } as a literal string without resorting to writing the entire string in hex representation, which is wasteful and unreadable.
Any idea of a way around this?
What's the sense in the language behaving like this?
A simple way is to use compile time string literal concatenation, thus:
wchar_t const* y = L"xy\x588" L"bla";
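A quick compile-time sanity check of the result (a sketch; it assumes C++11 for static_assert and a wchar_t wide enough for 0x588, which any common platform provides):
// "xy" + the single character 0x588 + "bla" = 6 characters plus the terminator.
static_assert(sizeof(L"xy\x588" L"bla") / sizeof(wchar_t) == 7,
              "escapes are parsed per literal, before adjacent literals are concatenated");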
For my book class I'm storing an ISBN number and I need to validate the data entered, so I decided to use enumeration. (The first three inputs must be single digits; the last one must be a letter or a digit.) However, I'm wondering if it is even possible to enumerate numbers. I've tried putting them in as regular integers, string style with double quotes, and char style with single quotes.
Example:
class Book{
public:
enum ISBN_begin{
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'
};
enum ISBN_last{
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', a, b, c, d, e, f,
g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z};
The compiler error says "expected identifier", which means that it's seeing the numbers as values for the identifiers and not as identifiers themselves. So is enumerating digits possible?
I think you're going about this the wrong way...why not just use a simple regex that will validate the entire thing a bit more simply?
(yes, I know, that wasn't the original question, but it might make your life a lot easier.)
This page and this page provide some good examples on using regex to validate isbn numbers.
By creating an enumeration whose values are equal to the entities they're enumerating, I think you're doing a lot more work than you have to.
Why would you want to enumerate numbers? Enums exist to give numbers a name. If you want to store a number, store an integer - or in this case a char (as you also need characters to be stored). For validation, accept a string and write a function like this:
bool ISBN_Validate(const std::string& val, std::string::size_type isbn_length /* <length of ISBN> */)
{
    if (val.length() != isbn_length) return false;   // wrong length
    if (val[0] < '0' || val[0] > '9') return false;  // must start with a digit
    for (char ch : val)                              // check every character
    {
        if (ch < '0' || ch > 'z') return false;      // outside the '0'..'z' range
    }
    return true;
}
Easy - and no silly enumerations ;)
#include <ctype.h>
Don't forget the basics. The above include file gives you isalpha(), isdigit(), etc.
I would suggest using a string for each of the begin/end criteria, i.e.:
string BeginCriteria = "0123456789";
string EndCriteria = "0123456789abcd... so forth";
// Now to validate the input
// for the first three input characters:
if ( BeginCriteria.find( chInput ) != string::npos )
    // Then it's good.
// For the last character:
if ( EndCriteria.find( chInput ) != string::npos )
    // Then it's good.
enums really aren't what you want to use. They aren't sets like that. The members of an enum have to be identifiers, like variable or function names, and you can give them values.
enum numbers { One = 1, Two, Three };
After this, One is equivalent to a named constant with integer value 1, and numbers is equivalent to a new type whose values are a subrange of the integers.
What you probably want is to use a regular expression.
You would have to do something like this:
enum ISBN_begin {
ZERO, ONE, TWO // etc.
};
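If the actual character codes matter, the enumerators can also be given those values explicitly, since character literals are just integral constants. A minimal sketch (the enumerator names are made up for illustration):
enum ISBN_last_char {
    Digit0 = '0',   // enumerator named Digit0 whose value is the code of '0'
    Digit1 = '1',
    LetterA = 'a',
    LetterB = 'b'
    // ...and so on for the remaining digits and letters
};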
If you can't use regular expressions, why not just use an array of char instead? You could even use the same array, and just have a const index number where the ISBN_last chars begin in the array.
enums are not defined with literals; they are defined with identifiers (the enumerator names).
Some languages (Ada, for one) allow what you want, so your request is not too silly. You are simply forgetting that in C and C++, character literals are just another form of integer literal (of type int in C, of type char in C++).