String array length C++ issue?


I've understood that string arrays end with a '\0' symbol. So, the following code should print 0, 1, 2 and 3. (Notice I'm using a range-based for() loop).
$ cat app.cpp
#include <iostream>
int main(){
char s[]="0123\0abc";
for(char c: s) std::cerr<<"-->"<<c<<std::endl;
return 0;
}
But it does print the whole array, including '\0's.
$ ./app
-->0
-->1
-->2
-->3
-->
-->a
-->b
-->c
-->
$ _
What is happening here? Why is the string not considered to end with '\0'? Do C++ collections consider (I imagine C++11) strings differently than in classical C++?
Moreover, the number of characters in "0123\0abc" is 8. Notice the printout makes 9 lines!
(I know that std::cout<< runs fine, as well as strlen(), as well as for(int i=0; s[i]; i++), etc. I know about the end terminator; that's not the question!)

s is of type char [9], i.e. an array containing 9 chars (including the null terminator char '\0'). The range-based for loop simply iterates over all 9 elements; the null terminator char '\0' gets no special treatment.
Executes a for loop over a range.
Used as a more readable equivalent to the traditional for loop
operating over a range of values, such as all elements in a container.
for(char c: s) std::cerr<<"-->"<<c<<std::endl; produces code prototype equivalent to
{
auto && __range = s ;
auto __begin = __range ; // get the pointer to the beginning of the array
auto __end = __range + __bound ; // get the pointer to the end of the array ( __bound is the number of elements in the array, i.e. 9 )
for ( ; __begin != __end; ++__begin) {
char c = *__begin;
std::cerr<<"-->"<<c<<std::endl;
}
}

When you declare a char[] as char s[] = "0123\0abc" (a string literal), s becomes a char[9]. The \0 is included because it needs space too.
The range-based for-loop you use does not consider the char[9] as anything else than an array containing char with the extent 9 and will happily provide every element in the array to the inner workings of your loop. The \0 is just one of the chars in this context.
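To make the difference concrete, here is a minimal sketch (the helper names elementsVisited and elementsBeforeNull are made up for illustration, they are not standard functions). The first helper counts what a range-based for loop really visits; the second shows one way to get the behaviour the question expected, by leaving the loop at the first '\0'.

```cpp
#include <cstddef>

// Counts how many elements a range-based for loop visits for a char array.
// The array is taken by reference so that its extent N is preserved.
template <std::size_t N>
std::size_t elementsVisited(const char (&arr)[N]) {
    std::size_t count = 0;
    for (char c : arr) {
        (void)c;  // the value does not matter here, '\0' is counted too
        ++count;
    }
    return count;
}

// Same loop, but it stops at the first '\0', the way C string functions do.
template <std::size_t N>
std::size_t elementsBeforeNull(const char (&arr)[N]) {
    std::size_t count = 0;
    for (char c : arr) {
        if (c == '\0') break;  // treat the null character as the end
        ++count;
    }
    return count;
}
```

For char s[] = "0123\0abc"; the first helper reports the full extent of the array, while the second reports only the characters in front of the embedded '\0'.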

Be aware that a char does not necessarily hold a character; it can store any small integer value. It is 8 bits wide on most platforms, though on some machines char is wider (I have already encountered one with a 16-bit char, in which case no int8_t is available). For plain numeric data, signed char or unsigned char (according to specific needs) should be preferred, as the signedness of plain char is implementation-defined; better still, use int8_t or uint8_t from the cstdint header, provided they are available.
So your string literal actually is just an array of nine integral values (just as if you had created an int-array, only the type usually is narrower). A range based for loop will iterate over all of these nine 8-bit integers, and you get the output in your example.
These integral values only get a special meaning in specific contexts (functions), such as printf, puts or even operator>>, where they are then interpreted as characters. When used as C-strings, a 0 value inside such an array marks the end of the string – but this 0-character still is part of that string. For illustration: puts might look like this:
int puts(char const* str)
{
while(*str) // loops until the null character is encountered
{
char c = *str;
// + get pixel representation of c for console, e. g 'B' for 66
// + print this pixel representation to current console position
// + advance by one position on console
++str;
}
return 0; // non-negative for success, could return number of
// characters output as well...
}

Here s is an array of char, so it includes \0 too.
When you use for(char c: s), the loop visits every char in the array, including the \0 characters.
But in C, the definition tells us:
A string is a contiguous sequence of characters terminated by and including the first null character.
And
[...] The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters...
So, when you use C standard functions to print the array s as a string, you will see the result that you wanted. Example: printf("%s", s);
"the number of characters in "0123\0abc" is 8. Notice the printout makes 9 lines!"
Again, printf("%s; Len = %zu", s, strlen(s)); runs fine! (strlen returns a size_t, so the matching format specifier is %zu rather than %d.)

Related

char string doesn't work what the problem with "\0"

I don't know what the problem is? My code doesn't work. :(
#include <iostream>
#include <string.h>
using namespace std;
bool isPalindrome(char* word){
int len = strlen(word);
if(len <= 1){
return true;
}else
if(word[0] == word[len-1]){
char n[len-1];
n[len-1]= "\0";
return isPalindrome(n);
}
return false;
}
int main(){
char *a = "alla";
bool b = isPalindrome(a);
cout<<b<<endl;
return 0;}
The error is "\0" , I don't know why.
My main function should be not right too.

I believe "\0" is a null-terminated 2 character C string consisting of a zero followed by a zero. Use '\0' instead, which is just a zero. Remember: use double quotes for strings and single quotes for single characters. You cannot assign a string to a string index location, so double quotes don't work like that, but you can assign a character to a string index location.
2nd error: read the documentation on strlen(). Create n with char n[len + 1];, NOT char n[len-1];, in order to make n the same length as the other string. Then, null terminate with n[len]= '\0';, not n[len - 1]= '\0';, since strlen doesn't count the null terminator in the string. I'm confused by your code though: what is the purpose of this n string? Lastly, you're writing outside your n array as you have it written! Since you made n have size len - 1, you would need to null terminate at index len - 2. len - 1 in your case is outside the array! Always null terminate inside the array, at the index 1 smaller than the size of the array. When the compiler knows the size of the array, as is the case with n, do it like this instead: n[sizeof(n) - 1] = '\0';.
By the way, char n[len-1]; is a variable-length array. C has supported these since C99, but standard C++ does not (some compilers accept them as an extension); in standard C++ the array size must be a compile-time constant.
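As a side note, the check itself does not need the temporary array n at all. Below is a minimal sketch of one alternative, assuming the goal is simply to test whether a null-terminated string reads the same in both directions; it walks two indices towards each other instead of recursing on a copy. This is not the asker's algorithm, just one way to avoid the copy.

```cpp
#include <cstddef>
#include <cstring>

// Returns true if the null-terminated string reads the same forwards and backwards.
bool isPalindrome(const char* word) {
    std::size_t len = std::strlen(word);  // does not count the terminating '\0'
    if (len < 2) return true;             // empty and one-character strings qualify
    std::size_t left = 0;
    std::size_t right = len - 1;
    while (left < right) {
        if (word[left] != word[right]) return false;
        ++left;
        --right;
    }
    return true;
}
```

Taking a const char* also lets you pass a string literal such as "alla" directly, which the char* parameter in the question only allows through a deprecated conversion.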

The simple answer is that you should
use '\0' instead of "\0"
because n is a character array, so its elements must be assigned with single quotes, not double quotes.
Hope this helps.

An element of a character array must be assigned a character, but in your case you are assigning a string. Note that "\0" is a string. Try assigning '\0' instead.

setting char arrays equal to eachother with isdigit and isalpha

Im trying to set a char array equal to 2 other arrays depending on if the element in the first array is a number or a letter. The code makes logical sense to me but the output for the 2 other strings after the for loop doesn't correspond to the logic. Is it because of a missing null value somewhere in the other 2 loops or is the code itself invalid? arrayAlpha, arrayNum, and palind are all char arrays set to a length of 30 elements while string length was already determined before the for loop began.
for(int k=0; k<=stringLength; k++)
{
if( isalpha(palind[k])){
arrayAlpha[k]=palind[k];}
if ( isdigit(palind[k]))
{
arrayNum[k]=palind[k];
}
}

Given the input:
char palind[30] = "12345abcde";
arrayAlpha is garbage.
arrayNum is "12345"
However,
char palind[30] = "abcde12345";
arrayAlpha is "abcde".
arrayNum is garbage.
Thus, using [k] as the index is the problem: whichever of arrayNum or arrayAlpha does not receive the first characters of palind ends up not starting at index 0.
A simple change is to subtract the length of the other array:
arrayAlpha[k - strlen(arrayNum)] = palind[k];
arrayNum[k - strlen(arrayAlpha)] = palind[k];
This works because lengthOfPalind = lengthOfArrayAlpha + lengthOfArrayNum, assuming palind contains only letters or digits. Note that strlen is only well defined here if both output arrays start out zero-filled.

C++: How does a for loop like this one work (using pointers)?

I just came across this for loop in a reference book but I'm not really sure what's going on in the comparison since it's just a lone pointer.
char input[300], *p, *q[300], **r = q;
cin.getline(input, 300);
for (p = input; *p; p++)
How would it work?

input is a null-terminated string. cin.getline will place a 0 after the last character of the read string or at input[299] if the line exceeds 299 characters.
The char value gets implicitly converted to a boolean. It becomes true if it's not zero and consequently false if it's zero. Thus the loop condition is equivalent to *p != 0.
Therefore the loop will iterate over the array until it comes across a zero, the end of the string.

The second expression in a for statement is evaluated for "truthiness". A character is true if it is not equal to 0. The for loop could be changed to the following equivalent one:
for (p = input; *p != 0; p++)
Since strings are null-terminated, this is how one iterates through the characters in a string and stops at the end.
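For illustration, here is a small sketch that uses the same loop shape to do some work on each character. The function countSpaces is a made-up example and has nothing to do with what the reference book does with q and r.

```cpp
#include <cstddef>

// Counts the space characters in a null-terminated string.
// The condition *p is false exactly when p points at the terminating '\0'.
std::size_t countSpaces(const char* input) {
    std::size_t spaces = 0;
    for (const char* p = input; *p; p++) {
        if (*p == ' ') ++spaces;
    }
    return spaces;
}
```

Anything stored after the first '\0' is never looked at, which is exactly why the terminator matters for this kind of loop.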

Garbage characters in C

Edited question
I understood my mistake in the code I had given in the original question, and the characters I was getting are garbage characters. Although, I still have a few questions about garbage characters in C:
Why can't the character be copied?
Do garbage characters have some pattern? Meaning that can you predict for an empty string what character can come, for an empty integer what will come, and so on.
When a variable is declared, why does it have a garbage character instead of being blank? Is there a specific reason of storing it with a garbage character?
For a string which is not null-terminated, will the same garbage character be printed on every OS? If yes, which one?
Are there the same garbage characters on every OS? Or are they different?
Is there a way to print these characters on the stdout buffer in C / C++?
If you see carefully in the character (image), there are some characters and numbers in it. Do they represent something?
Is there a list of garbage characters which can be printed in C / C++?
Original Question
Title of original question: Mysterious character output in C
I had come across this code in K & R:
int scanline (char str [], int lim) /* Line will be read in 'str []', while lim is the maximum characters to be read */
{
int c, len, j; /* 'len' will have the length of the read string */
j = 0; /* Initializing 'j' */
for (len = 0; (c = getchar ()) != EOF && c != '\n'; ++len) /* Reading a character one by one, till the user enters '\n', and checking for failure of 'getchar' */
{
if (len < (lim -2)) /* Checking that string entered has not gone beyond it's boundaries. '-2' for '\n' and '\0' */
{
str [j] = c; /* Copying read character into 'string [j]' */
++ j; /* Incrementing 'j' by 1 */
}
}
if (c == '\n') /* Checking if user has finished inputting the line */
{
str [j] = c; /* Copying newline into string */
++j;
++ len;
}
return len; /* Returning number of characters read */
}
In the K & R, it is known as getline, but I made changes, added comments, and thus defined it as scanline. To test this, I made a demo program:
#include <mocl/cancel.h>
int main (int argc, char **argv)
{
int len;
char str [50];
len = scanline (str, 50);
printf ("len = %d\n str = %s\n", len, str);
return 0;
}
The required headers and the function was in my own library, cancel.h. Then when I compiled my program, it was successful. Although, when I ran the executable, I got unexpected output (I cannot type it as I get a character which when I copy, it just gets pasted as 'm'):
The mysterious character is (image), which, when I copy it, gets copied as the letter m. Also, when I run my program with different inputs, I get different mysterious outputs:
In another case, I get perfect output, just that a blank line is printed:
I had also come across this question, in which the user gets the same symbol.
What have I done till now?
I searched a lot, and I could not find any clue about this character, but if you look carefully, in the second image, I get more characters when I enter "hi this is ashish". One of them is the slash, and one is (image). But I get another character (image). I got this link which showed how to reproduce it, and explained it, although I could not understand it. When you run the code given there, you get a lot of characters, and one of them is (image). However, even the author of that article could not copy it, and has not posted it. So here's the output:
That was the actual output, as that's not clear, here's a cut out version:
So basically I got to know that both the characters (image) and (image) are extended characters from a string. At that point, I actually figured out what was causing the problem in scanline.
The lines
if (c == '\n') /* Checking if user has finished inputting the line */
{
str [j] = c; /* Copying newline into string */
++j;
++ len;
}
were causing the problem, as they copy a newline into the string. Removing them worked, but I'm not sure why, as doing that was just a guess. I searched but still could not find the reason.
My Questions
How does removing those lines make the program work properly?
What are the characters (image) and (image)? What are they supposed to do and how did they appear over here?
Are there any more characters like this?
Why can't those characters be copied?
Is it Undefined Behavior?

There's some confusion here regarding the term garbage characters. What it refers to is any byte that resides in a variable that wasn't assigned in some well-defined way. The character A can be a garbage character if it happens to appear in (for example) a block of memory returned by malloc or an uninitialized char variable.
This is distinct from unprintable characters which are any character that does not have a well-defined representation when printed as characters. For example, ASCII codes 0 - 31 and 127 (0 - 1F and 7F hex) are control characters and therefore unprintable. There are also multibyte characters for which a particular terminal may not know how to render them.
To get into your specific questions:
Why can't the character (image) be copied?
As an unprintable character, its screen representation is not well defined. So attempting to copy and paste it from a terminal will yield unexpected results.
Do garbage characters have some pattern? Meaning that can you
predict for an empty string what character can come, for an empty
integer what will come, and so on.
The nature of garbage characters is that their contents are undefined. Trying to predict what uninitialized data will contain is a futile effort. The same piece of code compiled with two different compilers (or the same compiler with different optimization settings) can have completely different contents for any uninitialized data.
The standard doesn't say what values should go there, so implementations are free to handle it however they want. They could choose to leave whatever values happen to be at those memory addresses, they could choose to write 0 to all addresses, they could choose to write the values 0, 1, 2, 3, etc. in sequence. In other words, the contents are undefined.
When a variable is declared, why does it have a garbage character
instead of being blank? Is there a specific reason of storing it with
a garbage character?
Global variables and static local variables are initialized with all bytes zero, which is what the standard dictates. That is something that is done easily at compile time. Local variables on the other hand reside on the stack. So their values are whatever happens to be on the stack at the time the function is called.
Here's an interesting example:
void f1()
{
char str[10];
strcpy(str, "hello");
}
int main()
{
f1();
f1();
return 0;
}
Here is what a particular implementation might do:
The first time f1 is called, the local variable str is uninitialized. Then strcpy is called, which copies in the string "hello". This takes up the first 6 bytes of the variable (5 for the string and 1 for the null terminator). The remaining 4 bytes are still garbage. When this function returns, the memory that the variable str resided at is free to be used for some other purpose.
Now f1 gets called again immediately after the first call. Since no other function was called, the stack for this invocation of f1 happens to sit at the exact same place as the last invocation. So if you were to examine str at this time, you would find it contains h, e, l, l, o, and a null byte (i.e. the string "hello") for the first 6 bytes. But, this string is garbage. It wasn't specifically stored there. If some other function was called before calling f1 a second time, most likely those values would not be there.
Again, garbage means the contents are undefined. The compiler doesn't explicitly put "garbage" (or unprintable characters) in variables.
For a string which is not null-terminated, will the same garbage
character be printed on every OS? If yes, which one?
Here's one of those places you're confusing garbage and unprintable. In your specific case, the garbage character happens to be unprintable, but it doesn't have to be. Here's another example:
void f3()
{
char str1[5], str2[5];
strcpy(str1, "hello");
strcpy(str2, "test");
printf("str1=%s\n", str1);
}
Let's suppose the compiler decides to place str2 immediately after str1 in memory (although it doesn't have to). The first call to strcpy will write the string "hello" into str1, but this variable doesn't have enough room for the null terminating byte. So it gets written to the next byte in memory, which happens to be the first byte of str2. Then when the next call to strcpy runs, it puts the string "test" in str2, but in doing so it overwrites the null terminating byte put there when str1 was written to.
Then when printf gets called, you'll get this as output:
str1=hellotest
When printing str1, printf looks for the null terminator, but there isn't one inside of str1. So it keeps reading until it finds one. In this case there happens to be another string right after it, so it prints that as well until it finds the null terminator that was properly stored in that string.
But again, this behavior is undefined. A seemingly minor change in this function could result in str2 appearing in memory first. The compiler is free to do as it wishes in this regard, so there's no way to predict what will happen.
Are there the same garbage characters on every OS? Or are they
different?
I believe you're actually referring to unprintable characters in this case. This really depends on the character set of the OS and/or terminal in question. For example, Chinese characters are represented with multiple bytes. If your terminal can't print Chinese characters, you'll see some type of code similar to what you saw for each of the bytes. But if it can, it will display it in a well-defined manner.
Is there a way to print these characters on the stdout buffer in C /
C++?
Not as characters. You can however print out their numerical representations. For example:
void f4()
{
char c;
printf("c=%02hhX\n", (unsigned char)c);
}
The contents of c are undefined, but the above will print whatever value happens to be there in hexadecimal format.
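The same idea extends to a whole buffer. Here is a minimal sketch, assuming you want to inspect bytes whose contents you do know (reading an uninitialized variable, as f4 does, only shows whatever happens to be there). The helper name toHex is made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats each byte of a buffer as two uppercase hex digits separated by spaces,
// so control characters and other unprintable bytes become visible.
std::string toHex(const char* data, std::size_t size) {
    std::string out;
    char buf[4];  // two hex digits plus the terminator fit comfortably
    for (std::size_t i = 0; i < size; ++i) {
        // Go through unsigned char first so negative char values do not sign-extend.
        unsigned value = static_cast<unsigned char>(data[i]);
        std::snprintf(buf, sizeof buf, "%02X", value);
        if (i != 0) out += ' ';
        out += buf;
    }
    return out;
}
```

A buffer holding 'A', the DELETE character and a newline comes out as three readable hex pairs rather than as symbols the terminal may or may not be able to draw.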
If you see carefully in the character (image),
there are some characters and numbers in it. Do they represent
something?
Some terminals will display unprintable characters by printing a box containing the Unicode codepoint of the character so the reader can know what it is.
Unicode is a standard for text where each character is assigned a numerical code point. Besides the typical set of characters in the ASCII range, Unicode also defines other characters, such as accented letters, other alphabets like Greek, Hebrew, Cyrillic, Chinese, and Japanese, as well as various symbols. Because there are thousands of characters defined by Unicode, multiple bytes are needed to represent them. The most common encoding for Unicode is UTF-8, which allows regular ASCII characters to be encoded with one byte, and other characters to be encoded with two or more bytes as needed.
In this case, the codepoint in question is 007F. This is the DELETE control character, which is typically generated when the Delete key is pressed. Since this is a control character, your terminal is displaying it as a box with the Unicode point for the character instead of attempting to "print" it.
Is there a list of garbage characters which can be printed in C /
C++?
Again, assuming you really mean unprintable characters here, that has more to do with the terminal that's displaying the characters than with the language. Generally, control characters are unprintable, while certain multibyte characters may or may not display properly depending on the font / character set of the terminal.

For starters, the function returns an incorrect value of len. Let's assume that lim is equal to 2.
In this case nothing will be written to the array in the loop, due to the condition
if (len < (lim -2))
However after the first iteration of the loop len will be increased.
for (len = 0; (c = getchar ()) != EOF && c != '\n'; ++len)
^^^^^
In the second iteration, again nothing will be written to the array, due to the same condition
if (len < (lim -2))
but len will be increased.
for (len = 0; (c = getchar ()) != EOF && c != '\n'; ++len)
^^^^^
Thus nothing will be written to the array, but len will keep increasing until, for example, the newline character is encountered.
So the function is invalid. Moreover, the function is expected to append a terminating zero to the string it reads, but it does not do so. So you may not output the character array as a string.
The function can be written the following way:
int scanline( char str [], int lim )
{
int len = 0;
int c;
while ( len < lim - 1 && ( c = getchar () ) != EOF && c != '\n' )
{
str[len++] = c;
}
if ( len < lim - 1 && c == '\n' ) str[len++] = c;
if ( len < lim ) str[len++] = '\0';
return len;
}

remove non alphabet characters from string c++ [duplicate]

This question already has answers here:
How to strip all non alphanumeric characters from a string in c++?
(12 answers)
Closed 6 years ago.
I'm trying to remove all non alphabet characters from an inputed string in c++ and don't know how to. I know it probably involves ascii numbers because that's what we're learning about. I can't figure out how to remove them. We only learned up to loops and haven't started arrays yet. Not sure what to do.
If the string is Hello 1234 World&*
It would print HelloWorld

If you use std::string and STL, you can:
string s("Hello 1234 World&*");
s.erase(remove_if(s.begin(), s.end(), [](unsigned char c) { return !isalpha(c); } ), s.end());
http://ideone.com/OIsJmb
Note: If you need to handle text in languages other than English, or your program uses a locale other than the default, you can use the locale-aware overload std::isalpha(c, loc) from the <locale> header.
PS: If you use a c-style string such as char *, you can convert it to std::string by its constructor, and convert back by its member function c_str().
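For completeness, here is the same erase/remove_if idea as a small self-contained sketch, with the headers it needs. The wrapper function keepLetters is made up for illustration; the lambda takes an unsigned char because std::isalpha from the cctype header expects a value in that range.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns a copy of s with every non-alphabetic character removed.
std::string keepLetters(std::string s) {
    // remove_if moves the characters to keep to the front and returns the new
    // logical end; erase then drops the leftover tail.
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return !std::isalpha(c); }),
            s.end());
    return s;
}
```

Calling keepLetters("Hello 1234 World&*") gives back a string containing only the letters of the input, in their original order.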

If you're working with C-style strings (e.g. const char* str = "foobar"), then you can't "remove" characters from a string trivially: a string is just a sequence of characters stored sequentially in memory, so removing a character means shifting the bytes that follow it to fill the gap left by the deleted character.
You'd have to allocate space for a new string and copy characters into it as-needed. The problem is, you have to allocate memory before you fill it, so you'd over-allocate memory unless you do an initial pass to get a count of the number of characters remaining in the string.
Like so:
#include <string.h> // strlen, size_t
bool IsAlphabet(char c); // defined below; declared here so it can be called first
void BlatentlyObviousHomeworkExercise() {
const char* str = "someString"; // string literals are const in C++
size_t strLength = strlen(str); // how `strLength` is set depends on how `str` gets its value: `strlen` works for any null-terminated string; `sizeof` (minus 1) is only correct for an array initialized from a literal, not for a pointer.
size_t finalLength = 0;
for(size_t i = 0; i < strLength; i++ ) {
char c = str[i]; // get the ith element of the `str` array.
if( IsAlphabet(c) ) finalLength++;
}
char* filteredString = new char[ finalLength + 1 ]; // note I use `new[]` instead of `malloc` as this is C++, not C. Use the right idioms :) The +1 is for the null-terminator.
size_t filteredStringI = 0;
for(size_t i = 0; i < strLength; i++ ) {
char c = str[i];
if( IsAlphabet(c) ) filteredString[ filteredStringI++ ] = c;
}
filteredString[ filteredStringI ] = '\0'; // set the null terminator
delete[] filteredString; // release the buffer once you are done with the result
}
bool IsAlphabet(char c) { // `IsAlphabet` rather than `IsNonAlphabet` to avoid negatives in function names/behaviors for simplicity
return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

I do not want to spoil the solution so I will not type out the code, only describe the solution. For your problem think of iterating through your string. Start with that. Then you need to decide if the currently selected character is part of the alphabet or not. You can do this numerous different ways. Checking ASCII values? Comparing against a string of the alphabet? Once you decide if it is a letter, then you need to rebuild the new string with that letter plus the valid letters before and after that you found or will find. Finally you need to display your new string.

If you look at an ASCII table, you can see that A-Z is between 65-90 and a-z is between 97-122.
So, assuming that you only need to keep those characters (not accented letters, and not characters from other languages that are not represented in ASCII), all you need to do is loop over the string, check whether each char falls within these ranges, and remove it if it does not.
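A minimal sketch of that loop, assuming an ASCII-compatible character set and that building a new string is acceptable (the function name lettersOnly is made up for illustration):

```cpp
#include <string>

// Builds a new string containing only the characters A-Z and a-z from input.
std::string lettersOnly(const std::string& input) {
    std::string result;
    for (char c : input) {
        bool upper = (c >= 'A' && c <= 'Z');  // 65-90 in ASCII
        bool lower = (c >= 'a' && c <= 'z');  // 97-122 in ASCII
        if (upper || lower) {
            result += c;  // keep letters, skip everything else
        }
    }
    return result;
}
```

Appending the letters to a fresh string sidesteps the question of how to delete characters in place, which is probably the easier route if you have only covered loops so far.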
