Explanation of the Function that Defines strlen

I'm learning about pointers in c++.
I have researched and found the manual function that defines strlen to be something like this.
int strlen(const char *a){
const char *b;
for (b=a;*b;++b);
return b-a;
}
Would anyone be able to explain this block of code in plain english? In particular, why is *b set as the terminating condition in the for loop?
This is not an answer to homework. It's just a question that arose while I was researching. Thanks.

This is a particularly terse piece of C code, with a for loop that does not have a body.
The idea is to set pointer b to the beginning of the string a, and keep advancing it until you hit the character '\0', which indicates the end of the string (i.e. serves as the null terminator). Nothing else needs to be done in that loop, hence its body is empty.
Once the loop is over, subtracting a from b yields the number of characters between the initial character of the string and its null terminator, i.e. the length of the string.
Here is a more readable way to write the same loop:
for (b=a ; *b != '\0' ; ++b) // Use explicit comparison to zero
; // Put semicolon on a separate line
When a C expression is used in a context that requires a logical expression, an implicit comparison to zero is applied. Hence, *b != '\0' is the same as *b.

In both C and C++ strings are really called null terminated byte strings. That null terminator is equal to zero. And in both C and C++ the value zero is equivalent to false.
What the loop does is to iterate until the "current character" (pointed to by b) becomes equal to the terminator.
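To make the answers above concrete, here is a minimal sketch of the same function with the comparison to the terminator written out. The name my_strlen is hypothetical (chosen so it does not clash with the real strlen), and returning std::size_t instead of int is my assumption about what a careful version would do, not something the question's code does.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical name, chosen to avoid clashing with the standard strlen.
// Precondition: a points to a null-terminated string.
std::size_t my_strlen(const char* a) {
    assert(a != nullptr);
    const char* b = a;        // b starts at the first character
    while (*b != '\0') {      // same test as the terse "*b" condition
        ++b;                  // advance one character at a time
    }
    // b now points at the terminator; the distance from a is the length.
    return static_cast<std::size_t>(b - a);
}
```

Calling my_strlen("hello") walks b five positions forward before it reaches the terminator, so the pointer difference is 5.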

Related

Char pointer giving me some really strange characters

When I run the example code, the wordLength is 7 (hence the output 7). But my char array gets some really weird characters in the end of it.
wordLength = word.length();
cout << wordLength;
char * wordchar = new char[wordLength]; //new char[7]; ??
for (int i = 0; i < word.length(); i++) //0-6 = 7
{
wordchar[i] = 'a';
}
cout << wordchar;
The output: 7 aaaaaaa²²²²¦¦¦¦¦ÂD╩2¦♀
Desired output is: aaaaaaa... What is the garbage behind it?? And how did it end up there?

You should add \0 at the end of wordchar.
char * wordchar = new char[wordLength +1];
//add chars as you have done
wordchar[wordLength] = '\0';
The reason is that C-strings are null terminated.

C strings are terminated with a '\0' character that marks their end (in contrast, C++ std::string just stores the length separately).
In copying the characters to wordchar you didn't terminate the string, thus, when operator<< outputs wordchar, it goes on until it finds the first \0 character that happens to be after the memory location pointed to by wordchar, and in the process it prints all the garbage values that happen to be in memory in between.
To fix the problem, you should:
make the allocated string 1 char longer;
add the \0 character at the end.
Still, in C++ you'll normally just want to use std::string.
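The two steps listed above can be sketched roughly as follows, next to the std::string alternative. Both helper names (make_filled_cstring, make_filled_string) are hypothetical, and using std::unique_ptr to own the buffer is my own choice so that the sketch does not leak; the question used a raw new[].

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper: a heap buffer holding `count` copies of `fill`,
// followed by the null terminator.
std::unique_ptr<char[]> make_filled_cstring(std::size_t count, char fill) {
    assert(fill != '\0');                                // fill must not be the terminator itself
    std::unique_ptr<char[]> buffer(new char[count + 1]); // step 1: one char longer
    std::memset(buffer.get(), fill, count);
    buffer[count] = '\0';                                // step 2: terminate the string
    return buffer;
}

// The std::string version: the length is stored separately,
// so there is no terminator to forget.
std::string make_filled_string(std::size_t count, char fill) {
    return std::string(count, fill);
}
```

With the terminator in place, passing the buffer to operator<< or strlen stops after the last 'a' instead of running into whatever follows in memory.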

Use:
char * wordchar = new char[wordLength+1]; // 1 extra for null character
before the for loop and
wordchar[wordLength] = '\0';
after the for loop, because C strings are null terminated.
Without this, printing continues until the first null character is found, and all the garbage values in between are printed too.

You omit the trailing zero, that's the cause.
In C and C++ the whole ecosystem determines string length by assuming a trailing zero ('\0' or simply 0 numerically). This is different from, for example, Pascal strings, where the memory representation starts with a number that tells how many of the following characters make up the string.
So if you have string content that you want to store, you have to allocate one additional byte for the trailing zero. If you manipulate the memory content, you always have to keep the trailing zero in mind and preserve it. Otherwise string functions run past the end of the array: searching functions such as strstr keep reading the following memory, and writing functions such as strcat can overwrite it. Without the trailing zero, strlen will also give a wrong result, because it too counts until it encounters the first zero.
You are not the only one making this mistake; it often plays an important role in security vulnerabilities and their exploits. The exploit takes advantage of the side effect that functions go off track and manipulate things other than what was originally intended. This is a very important and dangerous part of C.
In C++ (as you tagged your question) you are better off using std::string and its member functions instead of C-style manipulation.

C++ interview function [duplicate]

Possible Duplicate:
How does “while(*s++ = *t++)” work?
I had the following question during an interview. Can someone please explain it to me?
void question( char *s, char *t)
{
while (*s++ = *t++);
}

It introduces a massive security vulnerability into your program. Do not write, or use, code like this under any circumstances.
If we break the code down, we get:
*t++ reads the character pointed to by t, and increments t; the expression's value is the character that was read.
*s++ = expression writes that character to where s points, and increments s; the expression's value is the character that was written.
while (expression); keeps looping as long as the expression's value is non-zero; in this case, until we wrote a character with the value zero.
So the function keeps copying characters from t to s until it reaches a zero-valued character. There is no way to tell whether s points to a large enough array to hold these, so in general it will write beyond the end of the array and cause undefined behaviour; anything from subtle behaviour with no unwanted effects, to a crash, to the execution of malicious code.
You can only call this function if you know in advance (an upper bound for) how many characters will be copied; if you know that, then there are (usually) more efficient ways to copy the data than to check the value of each. Therefore, you should (almost) never use this function, or the C library function (strcpy) that it approximates.
This use of a zero-valued character to terminate a string is a common idiom in C; in C++ it is usually more convenient to use the std::string class to represent strings instead. In that case, the equivalent code would be simply s = t, which would manage the strings' memory safely.
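As a rough illustration of the point about knowing an upper bound in advance, here is a sketch of a copy loop that takes the destination capacity as a parameter. The name bounded_copy and its truncate-and-report behaviour are assumptions made for this example; it is not a standard library function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: copies at most capacity - 1 characters from src
// into dst and always terminates dst. Returns true if all of src fit.
bool bounded_copy(char* dst, std::size_t capacity, const char* src) {
    assert(dst != nullptr && src != nullptr && capacity > 0);
    std::size_t i = 0;
    // Stop at the terminator OR when only the last slot of dst is left.
    for (; i + 1 < capacity && src[i] != '\0'; ++i) {
        dst[i] = src[i];
    }
    dst[i] = '\0';            // never written past dst[capacity - 1]
    return src[i] == '\0';    // false means the copy was truncated
}
```

Unlike the interview function, this one cannot write beyond the array as long as the caller passes the true capacity, for example sizeof buf for a local char array.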

It copies the string pointed to by t into the memory pointed to by s.
operator= yields the assigned value. t is supposed to point to a null-terminated string, and s should point to memory large enough to store that string.
So the while loop stops when \0 is hit, which is the end of the string pointed to by t. During the loop, all chars in t are copied into s, and the final \0 is copied too, because the assignment happens before the test.
Expanded a little, it's the same as:
while( *t != '\0' ) // while the current char is not the null terminator
{
*s = *t; // copy it into s
++s; // increment s, to point to the next byte
++t; // increment t, to point to the next char, that will be copied
}
*s = *t; // copy the last char of t - the '\0'

It copies the null-terminated string t into s. It has the same semantics as strcpy.

Testing conditions using Character pointers

int main()
{
char *p,c;
for(p="Hello World";c=*p;++p)
{
printf("%c",c);
}
}
In the above code, I know that ++p will make the pointer 'p' point to the next character in "Hello World". I also know that there is no bounds checking performed on arrays in C or C++. The output of the program is 'Hello World'. How am I able to test a condition using
c=*p;
What does 'c=*p' return? As far as my understanding goes, when '++p' reaches the end of 'Hello World', the pointer 'p' should point to some garbage value and the loop should print some garbage values.

c=*p is not a function call, so it doesn't "return" anything; it's an expression whose value is the value assigned to c. The for loop tests that value after the assignment.
"when '++p' reaches the end of the 'hello world', pointer 'p' should point to some garbage value"
Not really. Before running off the end, it reaches the null terminating character '\0', which is 0. That 0 is assigned to c and makes the loop condition false, so the loop finishes before it goes out of bounds.

C strings are by definition terminated by a null character '\0'; if it is a string, it has to end in one. Therefore *p will eventually be the null character, which in your case is the character immediately after 'd'. The null character has the integer value 0, which evaluates to false and ends the for loop.
Note that if a char sequence does not end in a null character (so it is not really a C string), there is no way to detect where it ends, because it is stored as a plain sequence of bytes. Whether it is a byte array or a string depends only on how we interpret it.
Also note that c = *p does not "return" anything in the function-call sense; it is an expression that gets evaluated. c = *p copies the value pointed to by p into the variable c, and that value is the result of the expression.

Strings are terminated by a null character (aka with \0), so then at the end p should point to the null character that terminates the string, thus c would be \0 which is considered a false condition.
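A small sketch of the same loop shape may help: the assignment c = *p is wrapped in an explicit comparison so the test is visible. The function name collect_chars is hypothetical, and gathering the characters into a std::string instead of printing them is my own change to make the result easy to inspect.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: visits the same characters the question's loop
// prints, but appends them to a std::string.
std::string collect_chars(const char* p) {
    assert(p != nullptr);
    std::string out;
    char c;
    // (c = *p) assigns first; the value of the assignment is then compared
    // with '\0'. Writing just "c = *p" as the condition does the same test.
    while ((c = *p) != '\0') {
        out += c;
        ++p;
    }
    return out;  // the terminator stopped the loop before any garbage was read
}
```

For "Hello World" the loop body runs once per visible character and stops when p reaches the terminator that the compiler stored after the 'd'.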

String going crazy if I don't give it a little extra room. Can anyone explain what is happening here?

First, I'd like to say that I'm new to C / C++, I'm originally a PHP developer so I am bred to abuse variables any way I like 'em.
C is a strict country, compilers don't like me here very much, I am used to breaking the rules to get things done.
Anyway, this is my simple piece of code:
char IP[15] = "192.168.2.1";
char separator[2] = "||";
puts( separator );
Output:
||192.168.2.1
But if I change the definition of separator to:
char separator[3] = "||";
I get the desired output:
||
So why did I need to give the man extra space, so he doesn't sleep with the man before him?

That's because you get a not null-terminated string when separator length is forced to 2.
Always remember to allocate an extra character for the null terminator. For a string of length N you need N+1 characters.
Once you violate this requirement any code that expects null-terminated strings (puts() function included) will run into undefined behavior.
Your best bet is to not force any specific length:
char separator[] = "||";
will allocate an array of exactly the right size.

Strings in C are NUL-terminated. This means that a string of two characters requires three bytes (two for the characters and the third for the zero byte that denotes the end of the string).
In your example it is possible to omit the size of the array and the compiler will allocate the correct amount of storage:
char IP[] = "192.168.2.1";
char separator[] = "||";
Lastly, if you are coding in C++ rather than C, you're better off using std::string.
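The size arithmetic above can be checked with a short sketch. The helper name is_terminated is hypothetical; it relies on std::memchr from <cstring> to look for a zero byte inside the array bounds only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The literal "||" occupies three bytes: '|', '|' and the terminator.
static_assert(sizeof("||") == 3, "two characters plus the null terminator");

// Hypothetical helper: true if a zero byte exists within the first `size`
// bytes of data, i.e. the array really holds a terminated string.
bool is_terminated(const char* data, std::size_t size) {
    assert(data != nullptr);
    return std::memchr(data, '\0', size) != nullptr;
}
```

An array declared as char separator[] = "||"; passes this check with sizeof separator equal to 3, while a two-element array holding only the two '|' characters does not, which is why puts kept reading into the neighbouring memory.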

If you're using C++ anyway, I'd recommend using the std::string class instead of C strings - much easier and less error-prone IMHO, especially for people with a scripting language background.

There is a hidden nul character '\0' at the end of each string. You have to leave space for that.
If you do
char seperator[] = "||";
you will get an array of size 3 (two characters plus the terminator), not size 2.

Because in C strings are nul terminated (their end is marked with a 0 byte). If you declare separator to be an array of two characters, and give them both non-zero values, then there is no terminator! Therefore when you puts the array pretty much anything could be tacked on the end (whatever happens to sit in memory past the end of the array - in this case, it appears that it's the IP array).
Edit: the following is incorrect. See comments below.
When you make the array length 3, the extra byte happens to have 0 in it, which terminates the string. However, you probably can't rely on that behavior - if the value is uninitialized it could really contain anything.

In C strings are ended with a special '\0' character, so your separator literal "||" is actually one character longer. puts function just prints every character until it encounters '\0' - in your case one after the IP string.

In C, strings include a (invisible) null byte at the end. You need to account for that null byte.
char ip[15] = "1.2.3.4";
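In the code above, ip has enough space for 15 chars: 14 "regular characters" and the null byte. That is too short for the longest dotted IPv4 address, such as 255.255.255.255, which has 15 characters; to hold any address the declaration should be char ip[16] = "1.2.3.4";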
ip[0] == '1';
ip[1] == '.';
/* ... */
ip[6] == '4';
ip[7] == '\0';

Since no one pointed it out so far: If you declare your variable like this, the strings will be automagically null-terminated, and you don't have to mess around with the array sizes:
const char* IP = "192.168.2.1";
const char* seperator = "||";
Note however, that I assume you don't intend to change these strings.
But as already mentioned, the safe way in C++ would be using the std::string class.

A C "string" always ends in a null character, but you do not leave room for it if you write
char separator[2] = "||". puts expects this \0 at the end, so in the first case it keeps writing until it finds one, and here you can see that it finds it at the end of the IP address. Interestingly enough, you can even see how the local variables are laid out on the stack.

The line: char seperator[2] = "||"; should get you undefined behaviour since the length of that character array (which includes the null at the end) will be 3.
Also, what compiler have you compiled the above code with? I compiled with g++ and it flagged the above line as an error.

Strings in C/C++ are null terminated, i.e. they have a hidden zero at the end.
So your separator string would be:
{'|', '|', '\0'} = "||"

'\0' related issue

Looking at this loop that copies one c-string to another:
void strcpyr(char *s, char *t)
{
while(*s++=*t++)// Why does this work?
;
}
Why do we not check for the '\0' character in the while loop, like this?
while((*s++=*t++)!='\0')..
How does the first loop terminate?

The statement *s++=*t++ not only assigns the next character from t to s but also returns the current value of *t as the result of the expression. The while loop terminates on any false value, including '\0'.
Think of it this way. If you did:
char c = *s++ = *t++;
in addition to copying a char from *t to *s and incrementing both, it would also set c to the current value of *t.

When we hit the '\0' in the string initially pointed to by t, the *s++=*t++, which does the assignment, also returns the value that's assigned to the position pointed to by s, or '\0', which evaluates to false and terminates the loop.
In your second example, you explicitly rely on the fact that the assignment returns the assigned character, while the first example uses this fact implicitly, together with the fact that the 0 character (also written '\0') is considered false while all other characters evaluate to true. So the expression c != '\0' yields the same result as c.

The loop is going to terminate because '\0' is effectively 0, and what the "while" evaluates is not the result of an equality test (==) but the value of the assignment expression.

The reason we are not explicitly checking for zero is that in C 0 is false.
Therefore the loop
while(*s++=*t++)
;
will terminate when the character pointed to by t is 0.
-Adam

I think you mean to write this:
void strcpyr(char *s, char *t) {
while (*s++ = *t++);
}
The loop terminates when the value pointed to by "t" is zero. For C (and C++) loops and conditionals, any integer that is non-zero is true.

The while loop is testing the result of the assignment.
The result of an assignment is the value assigned into the left-hand side of the statement. On the last iteration, when *t == '\0', the '\0' is assigned into s, which becomes the value the while loop considers before deciding it is time to quit.

In C, a=b is actually an expression whose value is the value assigned to a. It is easier to write:
if(a=b) {
//some block
}
than:
a=b;
if(a!=0) {
//some block
}
In C, the check made by an if, while, or for statement is: is the expression non-zero?
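To see that the loop really tests the value of the assignment, here is a sketch that stores that value in a named variable before testing it. The function name copy_and_count and the counting are additions for illustration only; like the original loop, it assumes the caller guarantees that s points to enough memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: same copying as while(*s++ = *t++); but the value
// of each assignment is kept in `assigned` and tested explicitly.
// Returns how many characters were written, terminator included.
// Assumption: the caller guarantees s has room for the whole string.
std::size_t copy_and_count(char* s, const char* t) {
    assert(s != nullptr && t != nullptr);
    std::size_t written = 0;
    char assigned;
    do {
        assigned = (*s++ = *t++);   // value of the assignment expression
        ++written;
    } while (assigned != '\0');     // zero is false, so '\0' ends the loop
    return written;
}
```

Copying "abc" performs four assignments: three non-zero characters that keep the loop going, then the '\0' that is written and ends it.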
