Understand C++ - "character literal" vs "string literal" [duplicate]

I was reading a textbook that was talking about "character literal" vs "string literal." It said the following:
'A' is stored as 65
"A" is stored as 65 0
char letter;
letter = 'A'; // this will work
letter = "A"; // this will not work!
The textbook's explanation confused me. It said "because char variables are only large enough to hold one character, you cannot assign string literals to them." Can anyone explain further? It's not clicking in my head. Thank you for your time.

You should see this:
Single quotes vs. double quotes in C or C++
As everyone has said here, think about arrays.
A character is only one letter, digit or symbol, and it is written in single quotes. However, when you use double quotes, you are indicating a string, that is, an array of characters. Thus, you should declare your variable as an array. For instance:
char letter[] = "A";
Or
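const char *letter = "A";
If you want a fixed-size array, you could try something like this (note that it has no null terminator, so it is an array of characters rather than a C string):
char letter[5] = {'H','E','L','L','O'};
If you want to see another point of view, you could read this: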
http://www.cplusplus.com/doc/tutorial/ntcs/
Hope I was helpful.

What you might be missing is the fact that strings can be of arbitrary length. The compiler places the string somewhere in the program / memory the way you type it, but it needs to know where the string ends! Strings of this kind are known as zero- or null-terminated. It simply means that the string is the actual string data followed by a single byte with the value 0.
So in the example, 'A' is the character A. In memory, it may immediately be followed by some garbage / unrelated data, but it's fine, because the compiler knows to only ever use that one byte.
"A" is the string A. In memory, it must be followed by a null terminator, otherwise the program could get confused because there might be garbage data immediately after the string.

Think of a string as an array of characters, where each element of the array is simply a "character literal".


C++ - How to parse a char* substring into int? [duplicate]

I am working on a simulation file. The file has many rows with numbers, letters and characters.
Example:
GR123,7894.5444,A,4687.5643,P
GR456,1234.6556,A,9657.5686,P
GR789,2344.3422,A,9786.8465,P
GR987,6522.6354,A,3245.5754,P
I need to take the values before A and P (for the first row, 7894.5444 and 4687.5643).
How can I parse by position this string into an int? Already tried with:
double exampleA = stoi(row.substr(6,9));
double exampleP = stoi(row.substr(18,9));
But it gives me this error: request for member ‘substr’ in ‘row’, which is of non-class type ‘char*’
Also tried:
char exampleA[9];
char exampleP[9];
memcpy(&exampleA, &row[6],sizeof(exampleA));
memcpy(&exampleP, &row[6],sizeof(exampleP));
This was meant to separate the values from the row before converting them, but the exampleP buffer always comes out with the value of exampleA attached, like:
A : 7894.5444
P : 4687.5643?7894.5444
request for member ‘substr’ in ‘row’, which is of non-class type ‘char*’
Yeah, you can't call substr on a pointer. Perhaps you meant to store these things in a std::string instance?
As to your second attempt, there are multiple big problems with what you wrote:
The buffers you're using are 9 bytes long, but the largest number in your example looks to be 10 bytes (4 + 1 + 4 + 1). You're overflowing your buffer and overwriting random memory in your process. Edit: or you would be if you wrote proper null-terminated strings.
Related to the above, because it sort of hides the problem, don't use memcpy to copy strings. A much better first attempt would be strncpy to copy a substring to a buffer, though you still have to manually null-terminate it.
The reason why the P line shows both entries is that your compiler seems to put exampleA after exampleP on the stack, separated by a guard character, and since you don't null-terminate your strings, trying to display exampleP spills over into exampleA, which out of sheer luck seems to be followed by a null character.
Since the data seems to be in a C string anyway, you might as well use std::strtod which has the advantage of not requiring you to copy the substring:
char* endp;
double exampleA = std::strtod(row+6, &endp);
double exampleP = std::strtod(row+18, &endp);
In both cases, you might want to double check that *endp == ',' and that no conversion error was signalled.
You could use std::strtod even with a std::string, of course. It's often the most efficient solution.
Note that strtod is locale-aware, so this might not work if the input doesn't correspond to your current locale (e.g. files using a period for decimals and commas for field separators while the locale uses comma and semicolon, respectively).
I'm not sure what you mean by converting a floating point number to an integer (truncate? round? multiply by 10,000?). So I didn't address that part of the question.

Storing single characters

If I want to store a single character, say 'c', am I better off using
std::string myChar = 'c';
rather than the built in char type?
char myChar = 'c';
Is there any safety gained by storing single characters as string?
There is a little safety gained as you won't accidentally use the string for calculations.
int a = 5+myChar;
This will give a compiler error if it is a string and won't if it's a char, because chars are seen as numbers.
Please note, that the first example doesn't compile. It has to be
std::string myChar = "c";
(with double quotes). I see more disadvantages in this approach:
It will consume way more memory than required. With short-string optimization the data will not be stored on the heap, but a std::string object is still three or four machine words long (24 or 32 bytes on common 64-bit implementations) compared to one byte [1] when using char.
Access to that char is really inconvenient: you would always have to use .front(), .back() or [0] to get at it.
It doesn't convey the meaning of your variables. It's like replacing all int variables in your program with a std::vector<int> with a single element.
The only "safety" I can see is, as AlexGeorg already mentioned, that you can't mistakenly use it in calculations. But that's it, and this could also be seen as a disadvantage.
So, no, you're most likely not better off using a string to store a single character, unless you have some really specific circumstances.
[1] Plus maybe some padding.
The positive thing about using a string is the compile-time error you get when you try to use the variable in a mathematical expression, for example:
int sum = 15 + myChar;
On the other hand, there are some negatives to take into consideration:
The first is performance: a string is more expensive in both memory and execution time.
The second is that a string does not guarantee that the variable holds a single character, so you have to be careful when you use it.

C++ string copy() gives me extra characters at the end, why is that?

I am studying C++. A blog introduced the copy function, but when I tried it on my system the result did not match what I expected. Please let me know what I did wrong in the code below.
#include <iostream>
#include <string>
int main(){
std::string statement = "I like to work in Google";
char compName[6];
statement.copy(compName, 6, 18);
std::cout<<compName;
}
I expected Google but actual output is Googlex
I am using windows - (MinGW.org GCC-6.3.0-1)
You are confusing a sequence of characters, C style string, and std::string. Let's break them down:
A sequence of characters is just that, one character after another in some container (in your case a C style array). To a human being several characters may look like a string, but there is nothing in your code to make it such.
A C style string is an array of characters terminated by the symbol \0. It is a carry-over from C, so the compiler and standard library will treat an array of characters as potentially being such a string, even if you never said so.
C++ string (std::string) is a template class that stores strings. There is no need to worry how it does so internally. Although there are functions for interoperability with the first two categories, it is a completely different thing.
Now, let's figure out how a compiler sees your code:
char compName[6];
This creates an array of characters with enough space to store 6 symbols. You can write C style strings into it as long as they are 5 symbols or less, since you will need to also write '\0' at the end. Since in C++ C style arrays are unsafe, they will allow you to write more characters into them, but you cannot predict in advance where those extra characters will be written into memory (or even if your program will continue to execute). You can also potentially read more characters from the array... But you cannot even ask the question where that data will be coming from, unless you are simply playing around with your compiler. Never do that in your code.
statement.copy(compName, 6, 18);
This line writes 6 characters. It does not make it into a C style string, it is simply 6 characters in an array.
std::cout<<compName;
You are trying to output to the console a C style string... which you have not provided to the compiler. So operator<< receives a char[] (which decays to a char*), assumes that you knew what you were doing, and works as if you gave it a C string. It displays one character after another until it reaches '\0'. When will it get such a character? I have no idea, since you never gave it one. But because C style arrays are unsafe, it will have no problem trying to read characters past the end of the array, reading some memory and treating it as a continuation of your non-existent C style string.
Here you got "lucky" and you only got a single byte that appeared as an 'x', and then you got a byte with 0 written in it, and the output stopped. If you run your program at a different time, with a different compiler, or compiled with different optimisations you might get a completely different data displayed.
So what should you have done?
You can try this:
#include <iostream>
#include <string>
int main()
{
std::string statement = "I like to work in Google";
char compName[7]{};
statement.copy(compName, 6, 18);
std::cout<<compName;
return 0;
}
What did I change? I made the array able to hold 7 characters (leaving enough space for a C style string of 6 characters) and I provided an empty initialisation list {}, which fills the array with \0 characters. This means that when you replace the first 6 of them with your data, there will still be a terminating character at the very end.
Another approach would be to do this:
#include <iostream>
#include <string>
int main()
{
std::string statement = "I like to work in Google";
char compName[7];
auto length = statement.copy(compName, 6, 18);
compName[length] = '\0';
std::cout<<compName;
return 0;
}
Here I do not initialise the array, but I get the length of the data written there from the .copy method and then add the needed terminator in the correct position.
What approach is best depends on your particular application.
When inserting pointer to a character into the stream insertion operator, the pointer is required to point to null terminated string.
compName does not contain the null terminator character. Therefore inserting (a pointer to an element of) it into a character stream violates the requirement above.
Please let me know what I did wrong here
You violate the requirement above. As a consequence, the behaviour of your program is undefined.
I expected Google but actual output is Googlex
This is because the behaviour of the program is undefined.
How to terminate it?
Firstly, make sure that there is room in the array for the null terminator character:
char compName[7];
Then, assign the null terminator character:
compName[6] = '\0';

What's the purpose of the char data type in C++? Why not just use strings? [closed]

I don't see the point of having the use of single quotes reserved for single characters. So, when is this used? i.e char a = 'a'; Coming from Javascript and PHP, I'm used to being able to use single quotes for entire strings, but after learning that single quotes are reserved for characters in C++, I'm curious to know why? Why would you need a single character?
PHP and JavaScript are languages that operate at quite a high level. This means that the basic types are essentially just a few different types, whose implementation is hidden inside a set of functions in the actual script engine.
C and C++, as well as most other low level languages expose more of "how the machine works". A string, in C, is a sequence of characters. If you want to deal with strings, you need to be able to deal with their components, which is char. A single character becomes useful when you want to build strings, compare the contents of strings, etc. Naturally, for normal string operations in C++, you'd use std::string, and then, like in script languages, most aspects of how the string is actually represented is hidden inside the std::string class implementation, so you don't really need to care about it. But if you were to "look inside" a std::string, it would somewhere sooner or later, become a char *, which is a pointer to a piece of memory that contains a sequence of characters, individual char elements.
One could look at it like going from having "ready made big lumps of Lego" to having only small pieces to work with. You can still build the same things, but it requires more pieces, and requires a bit more effort to construct. What you win is flexibility and speed. A char is really easy to deal with for the processor, where a single character in PHP is still represented as a string - it just happens to be one element long. As such, there is extra overhead in keeping track of this one character string, where it's stored, how long it is, etc, because the functionality in the language doesn't make any distinction between a single character and a string of a megabyte.
The purpose of C, and to a large degree also C++, is to closely represent the hardware. So your basic types are much closer to what the actual hardware representation is, and this is something you will need to learn more about if you are going to understand C and C++ well. Unfortunately, to cover ALL of that would be far beyond a single answer in SO. You will need to get yourself a good C and/or C++ book.
To make things run faster and more efficiently, you have to learn to use no more space than needed. When you have an 'a', this is actually a number (see the ASCII table), but when you have "a" it is an array of 2 characters {'a','\0'}. The zero is there to mark where your string ends, because otherwise the computer has no way of knowing. Do you want to add a length property like in JavaScript, to know the string's length directly? Then you use more space that may not be needed. Somehow you have to distinguish these two kinds of values to run efficient code. By learning C/C++ first, you actually learn how things work at the low level of your computer and understand more than by learning PHP/JavaScript/Ruby/Python first. C++ is more customizable than higher level programming languages.
Javascript and PHP are scripting languages while C++ (and especially its predecessor C) is quite low-level native programming language where you have to consider how your variables are stored in memory.
char a = 'a'; creates an 8-bit numeric variable (char is 8 bits on virtually all current platforms) that can hold a character value (its ASCII code) and puts the value of the character a into it. So on an ASCII system char a = 97; does the same job.
const char* s = "a"; creates a null terminated string which is an array with two elements: the value of character a and the terminating 0 character (just number 0 or the character '\0'). The * means we create a pointer to the array. Its type is const char because it contains string literal which is constant. We could create an identical array using const char s[2] = { 97, 0 }; or const char s[2] = { 'a', '\0' };.
By the way, single quotes are not reserved exclusively for single characters. You can put a few characters into single quotes. See What do single quotes do in C++ when used on multiple characters?.
The language C++ inherited scalar types from the language C. So you should get a good book about C++ or C to get the details.
The type char is a numeric type. It's an integer that can hold a character. Whether plain char is signed is implementation-defined (it is signed on many common platforms); you can ask for a specific signedness with signed char or unsigned char, and many compilers have an option to change the default. The 'A' literal defines a number that is identical to the code of the character A.
The "a" string is an array of chars and has a zero termination. In the example you have two bytes 'a','\0'. When you use the literal, the compiler passes the address as a pointer to the array. This pointer can be assigned to a pointer variable
const char *s = "A";
or passed to a function
foo("A");
Additionally, all the usual pointer arithmetic is possible, which you can grasp once you understand what a char array is.
Edit: Both the character literal 'a' and the char array "A" are constants. It's obvious that you can't assign anything to a number like
'a' = 23; // wrong!
When you have assigned the literal to a variable, you can change the variable later though.
But if you store the pointer in a non-const pointer variable (C allows this; C++ deprecated the conversion and removed it in C++11), modifying the char array through it is undefined behavior:
char *s = "A";
*s = 'B'; // try to change the first byte of the char array, causes undefined behavior.
To express this it's good style to use a pointer to const char variable:
const char *s = "A";
*s = 'B'; // compiler diagnostic says: not allowed

What does it mean to be "terminated by a zero"?

I am getting into C/C++ and a lot of terms are popping up unfamiliar to me. One of them is a variable or pointer that is terminated by a zero. What does it mean for a space in memory to be terminated by a zero?
Take the string Hi in ASCII. Its simplest representation in memory is two bytes:
0x48
0x69
But where does that piece of memory end? Unless you're also prepared to pass around the number of bytes in the string, you don't know - pieces of memory don't intrinsically have a length.
So C has a standard that strings end with a zero byte, also known as a NUL character:
0x48
0x69
0x00
The string is now unambiguously two characters long, because there are two characters before the NUL.
It's a reserved value to indicate the end of a sequence of (for example) characters in a string.
More correctly known as null (or NUL) terminated. This is because the value used is zero, rather than being the character code for '0'. To clarify the distinction check out a table of the ASCII character set.
This is necessary because languages like C have a char data type, but no string data type. Therefore it is left to the developer to decide how to manage strings in their application. The usual way of doing this is to have an array of chars with a null value used to terminate (i.e. signify the end of) the string.
Note that there is a distinction between the length of the string, and the length of the char array that was originally declared.
char name[50];
This declares an array of 50 characters. However, these values will be uninitialised. So if I want to store the string "Hello" (5 characters long) I really don't want to bother setting the remaining 45 characters to spaces (or some other value). Instead I store a NUL value after the last character in my string.
Other languages such as Pascal, Java and C# have a specific string type defined. These have a header value to indicate the number of characters in the string. This has a couple of benefits: firstly, you don't need to walk to the end of the string to find out its length; secondly, your string can contain null characters.
Wikipedia has further information in the String (computer science) entry.
In C, arrays and strings are usually handled through a pointer to their first element. The pointer tells you where the array starts, but not where it ends. For a character array holding a string, the end is marked by a zero byte.
So, in memory string hello is written as:
68 65 6c 6c 6f 00 |hello|
It refers to how C strings are stored in memory. The NUL character, represented by \0 in string literals, is present at the end of a C string in memory. There is no other metadata associated with a C string, such as its length. Note the different spelling of the NUL character and the NULL pointer.
There are two common ways to handle arrays that can have varying-length contents (like Strings). The first is to separately keep the length of the data stored in the array. Languages like Fortran and Ada and C++'s std::string do this. The disadvantage to doing this is that you somehow have to pass that extra information to everything that is dealing with your array.
The other way, is to reserve an extra non-data element at the end of the array to serve as a sentinel. For the sentinel you use a value that should never appear in the actual data. For strings, 0 (or "NUL") is a good choice, as that is unprintable and serves no other purpose in ASCII. So what C (and many languages copied from C) do is to assume that all strings end (or "are terminated by") a 0.
There are several drawbacks to this. For one thing, it is slow. Any time a routine needs to know the length of the string, it is an O(n) operation (searching through the entire string looking for the 0). Another problem is that you may one day want to put a 0 in your string for some reason, so now you need a whole second set of routines that ignore the null and use a separate length anyway (e.g. the mem* functions such as memcpy() and memcmp()). The third big problem is that if someone forgets to put that 0 at the end (or it gets wiped out somehow), the next string operation to do a length check will go merrily marching through memory until it either happens to randomly find another 0, crashes, or the user loses patience and kills it. Such bugs can be a serious PITA to track down.
For all these reasons, the C approach is generally viewed with disfavor.
C-style strings are terminated by a NUL character ('\0'). This provides a marker for functions that operate on strings (e.g. strlen, strcpy) to use to identify the end of the string.
While the classic example of "terminated by a zero" is that of strings in C, the concept is more general. It can be applied to any list of things stored in an array, the size of which is not known explicitly.
The trick is simply to avoid passing around an array size by appending a sentinel value to the end of the array. Typically, some form of a zero is used, but it can be anything else (like a NAN if the array contains floating point values).
Here are three examples of this concept:
C strings, of course. A single zero character is appended to the string: "Hello" is encoded as 48 65 6c 6c 6f 00.
Arrays of pointers naturally allow zero termination, because the null pointer (the one that points to address zero) is defined to never point to a valid object. As such, you might find code like this:
Foo *list[] = { somePointer, anotherPointer, NULL };
bar(list);
instead of
Foo *list[] = { somePointer, anotherPointer };
bar(sizeof(list)/sizeof(*list), list);
This is why the execvpe() only needs three arguments, two of which pass arrays of user defined length. Since all that's passed to execvpe() are (possibly lots of) strings, this little function actually sports two levels of zero termination: null pointers terminating the string lists, and null characters terminating the strings themselves.
Even when the element type of the array is a more complex struct, it may still be zero terminated. In many cases, one of the struct members is defined to be the one that signals the end of the list. I have seen such function definitions, but I can't unearth a good example of this right now, sorry. Anyway, the calling code would look something like this:
Foo list[] = {
{ someValue, somePointer },
{ anotherValue, anotherPointer },
{ 0, NULL }
};
bar(list);
or even
Foo list[] = {
{ someValue, somePointer },
{ anotherValue, anotherPointer },
{} // An empty initializer list zeroes the element in C++ and C23; older C needs {0}.
};
bar(list);