If I write:
char lili [3];
cout<<strlen(lili)<<endl;
then what is printed is: 11
but if I write:
char lili [3];
lili [3]='\0';
cout<<strlen(lili)<<endl;
then I get 3.
I don't understand why it returns 11 in the first case.
Isn't strlen supposed to return 3, since I allocated 3 chars for lili?
This happens because strlen works with "C-style" null-terminated strings. If you give it a plain pointer or an uninitialised buffer, as you did in your first example, it will keep marching through memory until it either a) finds a \0, at which point it returns the length of that "string", or b) reaches a protected memory location and generates an error.
Given that you've tagged this C++, perhaps you should consider using std::array or, better yet, std::string. Both provide a length-returning function (size()), and both offer bounds-checked element access through at(), which helps prevent your code from wandering into uninitialised memory regions as it does here.
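To make that suggestion concrete, here is a minimal sketch. The helper names string_length and index_is_rejected are made up for illustration; only size() and at() come from the standard library.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>

// size() reports the stored length; no search for a terminator is involved.
std::size_t string_length(const std::string& s) {
    return s.size();
}

// at() checks the index and throws std::out_of_range instead of
// reading past the end, unlike operator[] on a raw array.
bool index_is_rejected(const std::array<char, 3>& a, std::size_t i) {
    try {
        (void)a.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a three-element std::array, index 2 is accepted and index 3 is rejected, which is the check the raw lili[3] assignment never gets.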
The strlen function searches for a byte set to \0. If you run it on an uninitialized array then the behavior is undefined.
You have to initialize your array first. Otherwise there is random data in it.
strlen is looking for a string terminator and will keep counting until it finds one.
strlen calculates the number of characters till it reaches '\0' (which denotes "end-of-string").
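Roughly speaking, strlen behaves like the loop below. This is only a sketch for illustration: count_until_nul is a made-up name, and real library implementations are usually more optimized.

```cpp
#include <cstddef>

// Hypothetical stand-in for strlen, for illustration only.
// Precondition: s points to a buffer that contains a '\0' somewhere.
std::size_t count_until_nul(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {  // no bounds check: the array size is unknown here
        ++n;
    }
    return n;
}
```

Nothing in the loop knows how big the array is, which is why the result depends entirely on where the first zero byte happens to be.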
When you pass a char array to a function in C or C++, it decays to a char * pointing at its first element. strlen therefore receives lili as a pointer to char and walks the memory it points to until it reaches a terminating '\0'. It just so happened that there was a 0 byte 11 bytes past the start of your array. You could have got a much stranger result.
In fact, when you write lili[3] = '\0'
you access memory outside your array. The valid indices for a 3-element array in C/C++ are 0-2.
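A sketch of how the example could look with room for the terminator. The characters 'a', 'b', 'c' and the function name terminated_length are placeholders of mine, not from the question.

```cpp
#include <cstddef>
#include <cstring>

// Length of a 3-character string stored in a correctly sized buffer.
std::size_t terminated_length() {
    char lili[4] = {};  // 3 characters + 1 terminator; every byte starts as '\0'
    lili[0] = 'a';
    lili[1] = 'b';
    lili[2] = 'c';
    // lili[3] is a valid index here and still holds '\0', so strlen stops there.
    return std::strlen(lili);
}
```

Here strlen returns 3 because the array has four elements and the last one is a real terminator inside the array.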
Related
I am recreating the string class using char arrays. My problem is, when I allocate memory for a larger array, it generates an array that is completely the wrong size.
For example:
int allocated = 4;
char * reservedString = new char[allocated];
cout << strlen(reservedString);
Instead of creating a character array of size 4, reservedString points to a character array with 14 spots containing random characters.
This is what the debug shows me. Reserved string is now the wrong size with a bunch of random characters in it. When I try to use strcpy or strcpy_s it is writing memory out of bounds because the new array sizes are wrong.
How can I create a char array of the right size when its length is not known in advance and is provided by a variable?
I cannot use the std::string class or std::vector.
When you allocate an array of char with new char[n], its elements are left uninitialized. What you get is just a block of bytes with indeterminate contents.
The documentation about strlen says:
computes the length of the string str up to, but not including the terminating null character.
There is no null terminator here.
You should do:
int allocated = 4;
char * reservedString = new char[allocated]();
This value-initializes the array, setting every element to \0, so strlen(reservedString) returns 0 until you copy something in.
strlen expects a null-terminated string, which means a string that ends in a null character (\0). You're passing to it a pointer pointing to newly allocated memory, which contains uninitialized values and reading it causes undefined behavior. So when strlen searches for a null character in order to determine the length of the string, stuff is going to go wrong.
You cannot determine the size of an array given only a pointer to it unless you know it's going to be terminated by a null character or something similar. So either properly initialize the array with a null-terminated string or keep track of the length yourself.
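Since the question is about rebuilding a string class, "keep track of the length yourself" could look something like the sketch below. SimpleString and its members are hypothetical names, copying is disabled to keep it short, and this is one possible layout rather than the way to do it.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical minimal string type: the object remembers both the
// capacity it allocated and the current length, so it never has to
// ask strlen about uninitialized memory.
class SimpleString {
public:
    explicit SimpleString(std::size_t capacity)
        : capacity_(capacity),
          length_(0),
          data_(new char[capacity + 1]()) {}  // +1 for '\0'; () zero-fills

    ~SimpleString() { delete[] data_; }

    // Copying is disabled to keep the sketch short.
    SimpleString(const SimpleString&) = delete;
    SimpleString& operator=(const SimpleString&) = delete;

    // Copies src if it fits; otherwise returns false and changes nothing.
    bool assign(const char* src) {
        const std::size_t n = std::strlen(src);
        if (n > capacity_) return false;
        std::memcpy(data_, src, n + 1);  // n characters plus the terminator
        length_ = n;
        return true;
    }

    std::size_t length() const { return length_; }
    std::size_t capacity() const { return capacity_; }
    const char* c_str() const { return data_; }

private:
    std::size_t capacity_;
    std::size_t length_;
    char* data_;
};
```

The capacity is stored separately from the length, so the class can refuse a copy that would not fit instead of writing out of bounds.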
As I worked through the Lippman C++ Primer (5th ed, C++11), I came across this code:
char ca[] = {'C', '+', '+'}; //not null terminated
cout << strlen(ca) << endl; //disaster: ca isn't null terminated
Calling the library strlen function on ca, which is not null-terminated, results in undefined behavior. Lippman et al say that "the most likely effect of this call is that strlen will keep looking through the memory that follows ca until it encounters a null character."
A later exercise asks what the following code does:
const char ca[] = {'h','e','l','l','o'};
const char *cp = ca;
while (*cp) {
cout << *cp << endl;
++cp;
}
My analysis: ca is a char array that is not null-terminated. cp, a pointer to char, initially holds the address of ca[0]. The condition of the while loop dereferences pointer cp, contextually converts the resulting char value to bool, and executes the loop block only if the conversion results in 'true.' Since any non-null char converts to a bool value of 'true,' the loop block executes, incrementing the pointer by the size of a char. The loop then steps through memory, printing each char until a null character is reached. Since ca is not null-terminated, the loop may continue well past the address of ca[4], interpreting the contents of later memory addresses as chars and writing their values to cout, until it happens to come across a chunk of bits that happen to represent the null character (all 0's). This behavior would be similar to what Lippman et al suggested that strlen(ca) does in the earlier example.
However, when I actually execute the code (again compiling with g++ -std=c++11), the program consistently prints:
'h'
'e'
'l'
'l'
'o'
and terminates. Why?
Most likely explanation: On modern desktop/server operating systems like Windows and Linux, memory is zeroed out before it is mapped into the address space of a program. So as long as the program doesn't use the adjacent memory locations for something else, it will look like a null-terminated string.
In your case, the adjacent bytes are probably just padding, as most variables are at least 4-Byte aligned.
As far as the language is concerned this is just one possible realization of undefined behavior.
Are list-initialized char arrays still null-terminated?
There is no implicit null-terminator.
A list-initialized char array contains a null-terminated string, if at least one of the characters is initialized with the null-terminator.
If none of the characters are the null-terminator, then the array does not contain a null-terminated string.
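If you want to check this for a particular array, a small helper along these lines would do. contains_terminator is a name I made up; it relies only on std::find, std::begin and std::end.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Returns true if any element of the array is '\0'.
// Works on real arrays only (the size N is deduced), not on pointers,
// so it never reads outside the array.
template <std::size_t N>
bool contains_terminator(const char (&arr)[N]) {
    return std::find(std::begin(arr), std::end(arr), '\0') != std::end(arr);
}
```

For {'C', '+', '+'} this returns false, while for the string literal "C++" it returns true, because the literal carries its own terminator.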
the program consistently prints ... and terminates. Why?
You analyzed that the array would be accessed out of bounds. Your analysis is correct. You should also know that accessing an array out of bounds has undefined behaviour. So, the answer to why does it behave like this is: Because the behaviour is undefined.
As I already mentioned, your analysis is correct, apart from your (implied) assumption that when the memory is accessed out of bounds, the first value read must be non-zero. That assumption is wrong, because nothing guarantees it.
can we declare size to a pointer
#include<iostream>
#include<cstring>
using namespace std;
int main()
{
char (*ptr)=new char[3];
strcpy(ptr,"ert");
cout<<ptr<<endl;
return 0;
}
What is the meaning of the line char *ptr=new char[3]? Does it allocate a size to ptr? Since I have given the size as 3 and the string as "ert", it should show an error because the string is too long, but it doesn't. Can we allocate a size to pointers, and if so, how?
You need 4 characters:
char *ptr=new char[4];
strcpy(ptr,"ert");
One extra space for the nul terminator:
|e|r|t|\0|
It's not the size of the pointer that you've declared, but the size of the character array that the pointer points to.
strcpy() does not know the length of the array that the pointer points to - it just knows it's got a pointer to the first byte it can copy into, and trusts that you know there's enough room for the copy to be made. Thus it's very fast, but it's also rather dangerous and should be used only when you're sure the destination is large enough.
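strncpy() is worth looking into for some extra safety, but you still have to know that the target pointer points to something large enough for the size you specify (it protects more against the size of the source than the size of the target). Note also that strncpy() does not write a terminating \0 when the source is at least as long as the count you pass, so you may have to add one yourself.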
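One way to take the guesswork out is to pass the destination size along with the pointer. The helper below is a sketch under that assumption; copy_bounded is a hypothetical name, not a standard function.

```cpp
#include <cstddef>
#include <cstring>

// Copies as much of src as fits into dst and always terminates dst.
// Returns true if the whole string fit, false if it was truncated
// (or if dst_size is 0, in which case nothing is written).
bool copy_bounded(char* dst, std::size_t dst_size, const char* src) {
    if (dst_size == 0) return false;
    const std::size_t len = std::strlen(src);
    const std::size_t n = (len < dst_size - 1) ? len : dst_size - 1;
    std::memcpy(dst, src, n);
    dst[n] = '\0';
    return n == len;
}
```

With a 3-byte destination, copying "ert" stores "er" and reports truncation; with 4 bytes the whole string fits.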
The lesson to learn here is that C and C++ won't give you any help - the compiler trusts you to get your buffer sizes right, and won't do any checking on your behalf either at compile time or runtime. This allows programs to run extremely fast (no runtime checking) but also requires the programmer to be a lot more careful. If you're writing in C++ which your tags suggest, for normal string handling you should definitely be using the std::string class unless you have a specific reason to need C-style string handling. You may well have such a reason from time to time, but don't do it unless you have to.
This statement
char (*ptr)=new char[3];
first allocates an unnamed character array with 3 elements on the heap, and then assigns the address of the array's first element to the pointer ptr.
The size of the pointer does not change whether you initialize it as in the statement above or in the following way
char (*ptr)=new char;
That is, sizeof( ptr ) will be the same, usually either 4 or 8 bytes depending on the environment the program is compiled for.
C++ does not check bounds of arrays. So in this statement
strcpy(ptr,"ert");
you have undefined behaviour of the program because string literal "ert" has four elements including the terminating zero.
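The "four elements" point can be checked by the compiler, and the required allocation size can be computed rather than guessed. A small sketch, where duplicate is a hypothetical helper name:

```cpp
#include <cstddef>
#include <cstring>

// A string literal is an array that already includes its terminator.
static_assert(sizeof("ert") == 4, "three characters plus the terminator");

// Allocates exactly enough room for a copy of src, terminator included.
// The caller owns the result and must release it with delete[].
char* duplicate(const char* src) {
    const std::size_t bytes = std::strlen(src) + 1;  // +1 for '\0'
    char* copy = new char[bytes];
    std::memcpy(copy, src, bytes);
    return copy;
}
```

The result of duplicate("ert") points to a 4-byte array, and it has to be freed with delete[] once you are done with it.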
I would appreciate some C++ expert advice on this, please. I have a char array
unsigned char ch1[100];
Data (ASCII codes) gets filled into it (at most 6 or 8 elements; the rest is empty). I want to process only the valid bytes in the array, converting them either to hex or to another char array. I tried
memcpy(ch1, ch2, sizeof(ch1));
but all the garbage values are copied as well.
strcpy gives me an error.
Also, the number of bytes copied is dynamic (first time: 4; second time: 6; ...).
Do you know how many valid bytes you have in your array? If yes, you can pass that number as the 3rd argument of memcpy.
Otherwise you can zero-initialize the array and use strcpy which will stop on the first zero:
char ch1[100];
char ch2[100];
// zero out the array so we'll know where to stop copying
memset(ch1, 0, sizeof(ch1));
// ... data gets filled here (fewer than 100 bytes, none of them zero) ...
strcpy(ch2, ch1);
// zero out the array again so we'll catch the next characters that come in
memset(ch1, 0, sizeof(ch1));
// ... life goes on ...
So only copy the chars that are actually initialized. You as a programmer are responsible for tracking what's initialized and what's not.
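For the hex part of the question, once you track the number of valid bytes yourself, a conversion could look roughly like this. to_hex is a made-up helper name, and count is assumed to be the number of bytes the caller recorded when the data arrived.

```cpp
#include <cstddef>
#include <string>

// Converts the first `count` bytes of `data` to uppercase hex text.
// Bytes beyond `count` are never read, so garbage in the rest of the
// buffer does not matter.
std::string to_hex(const unsigned char* data, std::size_t count) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(count * 2);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(digits[data[i] >> 4]);    // high nibble
        out.push_back(digits[data[i] & 0x0F]);  // low nibble
    }
    return out;
}
```

If the array starts with the ASCII characters 'A', 'B', '1', '2', then to_hex(ch1, 4) gives "41423132".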
char p[4]={'h','g','y'};
cout<<strlen(p);
This code prints 3.
char p[3]={'h','g','y'};
cout<<strlen(p);
This prints 8.
char p[]={'h','g','y'};
cout<<strlen(p);
This again prints 8.
Please help me as I can't figure out why three different values are printed by changing the size of the array.
strlen starts at the given pointer and advances until it reaches the character '\0'. If you don't have a '\0' in your array, it could be any number until a '\0' is reached.
Another way to reach the number you're looking for (in the case you've shown) is by using: int length = sizeof(p)/sizeof(*p);, which will give you the length of the array. However, that is not strictly the string length as defined by strlen.
As @John Dibling mentions, the reason that strlen gives the correct result on your first example is that you've allocated space for 4 characters, but only used 3; the remaining 1 character is automatically initialized to 0, which is exactly the '\0' character that strlen looks for.
Only your first example has a null terminated array of characters - the other two examples have no null termination, so you can't use strlen() on them in a well-defined manner.
char p[4]={'h','g','y'}; // p[3] is implicitly initialized to '\0'
char p[3]={'h','g','y'}; // no room in p[] for a '\0' terminator
char p[]={'h','g','y'}; // p[] implicitly sized to 3 - also no room for '\0'
Note that in the last case, if you used a string literal to initialize the array, you would get a null terminator:
char p[]= "hgy"; // p[] has 4 elements, last one is '\0'
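The element counts in the comments above can be verified at compile time. In this sketch the arrays are given distinct names (with_room, exact, deduced, literal) only so that all four can live in one file.

```cpp
#include <cstddef>

const char with_room[4] = {'h', 'g', 'y'};  // 4 elements, last is '\0'
const char exact[3]     = {'h', 'g', 'y'};  // 3 elements, no terminator
const char deduced[]    = {'h', 'g', 'y'};  // 3 elements, no terminator
const char literal[]    = "hgy";            // 4 elements, last is '\0'

// sizeof(array) / sizeof(*array) is the element count of a real array.
static_assert(sizeof(with_room) / sizeof(*with_room) == 4, "room for '\\0'");
static_assert(sizeof(exact) / sizeof(*exact) == 3, "no room for '\\0'");
static_assert(sizeof(deduced) / sizeof(*deduced) == 3, "sized to the list");
static_assert(sizeof(literal) / sizeof(*literal) == 4, "literal adds '\\0'");
```

Only with_room and literal are safe to pass to strlen; for the other two the call has no well-defined result.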
That will get you a random number. strlen requires that strings be terminated with a '\0' to work.
try this:
char p[4]={'h','g','y', '\0'};
strlen is a standard library function that works with strings (in the C sense of the term). A string is defined as an array of char values that ends with a \0 value. If you supply something that is not a string to strlen, the behavior is undefined: the code might crash, the code might produce meaningless results, etc.
In your examples only the first one supplies strlen with a string, which is why it works as expected. In the second and the third case, what you supply is not a string (not terminated with \0), which is why the results expectedly make no sense.
Terminate your char buffer with '\0'.
char p[4]={'h','g','y', '\0'};
This is because strlen() expects to find a null-terminator for the string. In this case, you don't have it, so strlen() keeps counting until it finds a \0 or gives a memory access violation and your program dies. RIP!