Erase unwanted characters - c++

We have created a char array with a fixed length and write a word or a sentence into it. The message is shorter than the array, so when we print it with printf, a number of garbage characters are printed as well. We would like to get rid of all these characters, even though the length of the message varies.
Thank you!

C strings are terminated by a NUL byte ('\0'). If you don't have this terminator then printf doesn't know that your string has ended. The solution is to put a \0 after your word in the array.
Note: learn to use std::string which manages this for you.
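A minimal sketch of "put a \0 after your word", assuming the word arrives as an ordinary C string. The helper name store_word is hypothetical and only serves the illustration:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `word` into `buffer` (which holds `capacity`
// bytes) and always writes a terminating '\0' right after the copied text.
// Returns the number of characters stored, not counting the terminator.
std::size_t store_word(char* buffer, std::size_t capacity, const char* word) {
    if (capacity == 0) return 0;              // no room even for the terminator
    std::size_t n = std::strlen(word);
    if (n > capacity - 1) n = capacity - 1;   // truncate, leaving room for '\0'
    std::memcpy(buffer, word, n);
    buffer[n] = '\0';                         // printf("%s") stops here
    return n;
}
```

Whatever garbage sits in the rest of the array after the terminator is never printed by printf("%s", buffer).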

Have you considered not using a fixed array size if that is not what you want? You could just use a char* instead and assign its size dynamically.
If you want a fixed size for some reason the only solution I can come up with is to track in a separate variable how long the word is and then only print the n first characters. There is no way to determine which chars are valid and which are not in the array as far as I know.
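The "track the length separately and print only the first n characters" idea could look roughly like this. The function name first_n and the 128-byte scratch buffer are assumptions made for the sketch; the "%.*s" conversion is standard printf behaviour:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats only the first `length` characters of `data`.
// `data` does not need a '\0' as long as it holds at least `length` chars.
std::string first_n(const char* data, int length) {
    if (length < 0) length = 0;   // a negative precision would be ignored by printf
    char out[128];                // scratch buffer; longer output is truncated
    // "%.*s" takes the maximum number of characters to read as an int argument.
    std::snprintf(out, sizeof out, "%.*s", length, data);
    return std::string(out);
}
```

The same format string works directly with printf when the text should go to stdout instead of into a std::string.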

Other than that, if your buffer is supposed to be just a byte array and not a (null terminated) string, then you can use fwrite instead of fprintf to dump the contents.
But in general, I agree with others that it could be better to use std::string.

Related

Please explain me the need for appending '\0' while converting string to char

While using proc/mysql for C++, I take a string as user input and convert it to a char array via strcpy(c,s.c_str());, where c is the binding variable through which I add the value to the database table and s is the string (user input). It works fine, but my teacher is asking me to append '\0' at the end. I can't understand why I would need to.
Your teacher is deluded.
c_str() in itself appends a zero [or rather, std::string reserves space for an extra character when creating the string, and makes sure this is zero at least at the point of c_str() returning - in C++11, it is guaranteed that there is an extra character space filled with zero at the end of the string, always].
You DO need a zero at the end of a string to mark the end of the string in a C-style string, such as those used by strcpy.
[As others have pointed out, you should also check that the string fits before copying, and I would suggest rejecting it if it won't fit, as truncating it will lead to other problems - as well as checking that there aren't any SQL-injection attacks and a multitude of other things required for "good practice in an SQL environment"]
While the teacher is deluded on the appending '\0' to the string, your code exhibits another very bad bug.
You should never use strcpy in such a fashion. You should always use some routine which controls the number of characters copied, like strncpy(), or other alternatives, and provide it with the size of the receiving variable. Otherwise you are just asking for trouble.
Just guessing, it's a protection against buffer overflow. If c is only N bytes long and s.c_str() returns a pointer to a N+k length string, you'd write k bytes after c, which is bad.
Now let's say (if you didn't SEGFAULT already) you pass this NUL-terminated string c to a C function: you have no guarantee that the \0 you wrote after c is still there. This C function will then read an undefined number of bytes after c, which is worse.
Anyway, use ::strncpy():
char c[64];
::strncpy(c, s.c_str(), sizeof(c));
c[sizeof(c)-1] = '\0';
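If truncation is not acceptable, as suggested earlier in this thread, one option is to reject strings that do not fit. This is only a sketch; copy_if_fits is a hypothetical name, not part of any library:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: copies `s` into `dest` only when the whole string,
// including its terminating '\0', fits into `dest_size` bytes.
bool copy_if_fits(char* dest, std::size_t dest_size, const std::string& s) {
    if (s.size() + 1 > dest_size) return false;   // would overflow: reject
    // c_str() already ends in '\0', so copying size() + 1 bytes includes it.
    std::memcpy(dest, s.c_str(), s.size() + 1);
    return true;
}
```

No explicit '\0' has to be appended by hand here, because the terminator supplied by c_str() is part of what gets copied.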

Now when we get a string from the user using gets(), where does the '\0' terminating character go?

Now when we declare a string, the last character is the null character, right?
(Please see the image of the code and its output that I have attached.)
As you can see in the attached image, I am getting the null character at the 7th position! What is happening?
According to the book I refer to (see the other attached image), a string always has an extra character associated with it, at the end of the string, called the null character, which adds to the size of the string.
But with the above code I am getting the null character at the 7th position, although according to the book I should get it at the 6th position.
Can someone please explain the output?
Any help is really appreciated!!
Thank You!
Do not use gets() - ever! It is entirely immaterial what gets() does as it has no place in any reasonably written code! It is certainly removed from the C++ standard and, as far as I know, also from C (I think C removed it first). gets() happily overruns the buffer provided as it doesn't even know the size of the storage provided. It was blamed as the primary reason for most hacks of systems.
In the code you linked to there is such a buffer overrun. Also note that sizeof() determines the size of a variable. It does not consider its content in any shape or form: sizeof(str) will not change unless you change the type of str. If you want to determine the size of the string in that array you'll need to use strlen(str).
If you really need to read a string into a C array using FILE* functions, you should use fgets(), which, in addition to the pointer to the storage and the stream (e.g. stdin for the default input stream), also takes the size of the array as a parameter. fgets() stores at most size - 1 characters and null-terminates what it read; it returns a null pointer if nothing could be read.
You declare a char array that can hold up to 5 chars, however, dummy\0 is 6 characters long, resulting in buffer overflow.
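To make the sizeof/strlen distinction and the fgets() suggestion concrete, here is a small sketch. The names Sizes, measure_dummy and read_line are made up for the example:

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

struct Sizes {
    std::size_t storage;  // what sizeof reports: the size of the array type
    std::size_t length;   // what strlen reports: characters before the '\0'
};

Sizes measure_dummy() {
    char str[6] = "dummy";  // 5 letters plus '\0': 6 bytes are required
    Sizes s = { sizeof(str), std::strlen(str) };
    return s;
}

// Reads one line into buf (at most size - 1 characters) and strips the
// trailing newline if there is one. Returns false when nothing was read.
bool read_line(std::FILE* in, char* buf, int size) {
    if (std::fgets(buf, size, in) == nullptr) return false;
    buf[std::strcspn(buf, "\n")] = '\0';
    return true;
}
```

A call such as read_line(stdin, str, sizeof str) cannot write past the array, which is exactly what gets() is unable to promise.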

how to make a not null-terminated c string?

I am wondering: given char *cs = .....; what will happen to strlen() and printf("%s",cs) if cs points to a huge memory block with no '\0' in it?
I wrote these lines:
char s2[3] = {'a','a','a'};
printf("str is %s,length is %d",s2,strlen(s2));
I get the result "aaa", "3", but I think this is only because a '\0' (a 0 byte) happens to reside at location s2+3.
How do I make a non-null-terminated C string? strlen and the other C string functions rely heavily on the '\0' byte; what if there is no '\0'? I just want to understand this rule more deeply.
PS: my curiosity was aroused by studying the following post on SO:
How to convert a const char * to std::string
and these words in that post:
"This is actually trickier than it looks, because you can't call strlen unless the string is actually nul terminated."
If it's not null-terminated, then it's not a C string, and you can't use functions like strlen - they will march off the end of the array, causing undefined behaviour. You'll need to keep track of the length some other way.
You can still print a non-terminated character array with printf, as long as you give the length:
printf("str is %.3s",s2);
printf("str is %.*s",s2_length,s2);
or, if you have access to the array itself, not a pointer:
printf("str is %.*s", (int)(sizeof s2), s2);
You've also tagged the question C++: in that language, you usually want to avoid all this error-prone malarkey and use std::string instead.
A "C string" is, by definition, null-terminated. The name comes from the C convention of having null-terminated strings. If you want something else, it's not a C string.
So if you have a string that is not null-terminated, you cannot use the C string manipulation routines on it. You can't use strlen, strcpy or strcat. Basically, any function that takes a char* but no separate length is not usable.
Then what can you do? If you have a string that is not null-terminated, you will have the length separately. (If you don't, you're screwed. You need some way to find the length, either by a terminator or by storing it separately.) What you can do is allocate a buffer of the appropriate size, copy the string over, and append a null. Or you can write your own set of string manipulation functions that work with pointer and length. In C++ you can use std::string's constructor that takes a char* and a length; that one doesn't need the terminator.
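Both options from the paragraph above, sketched under the assumption that the length is known separately. The function names are hypothetical; the std::string(const char*, size_t) constructor is standard:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds a std::string from a pointer and a length; no terminator is needed
// in the source because exactly `len` characters are copied.
std::string to_string_n(const char* data, std::size_t len) {
    return std::string(data, len);
}

// Copies the characters into a buffer of the appropriate size and appends
// the '\0', so the result can be handed to C string functions.
std::vector<char> to_c_buffer(const char* data, std::size_t len) {
    std::vector<char> buf(data, data + len);
    buf.push_back('\0');
    return buf;
}
```

With to_c_buffer, buf.data() is a valid C string for as long as the vector is alive.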
Your supposition is correct: your strlen is returning the correct value out of sheer luck, because there happens to be a zero on the stack right after your improperly terminated string. It probably helps that the string is 3 bytes, and the compiler is likely aligning stuff on the stack to 4-byte boundaries.
You cannot depend on this. C strings need NUL characters (zeroes) at the end to work correctly. C string handling is messy, and error-prone; there are libraries and APIs that help make it less so… but it's still easy to screw up. :)
In this particular case, your string could be initialized as one of these:
A: char s2[4] = { 'a','a','a', 0 }; // good if string MUST be 3 chars long
B: const char *s2 = "aaa"; // if you don't need to modify the string after creation (string literals must not be modified)
C: char s2[]="aaa"; // if you DO need to modify the string afterwards
Also note that declarations B and C are 'safer' in the sense that if someone comes along later and changes the string declaration in a way that alters the length, B and C are still correct automatically, whereas A depends on the programmer remembering to change the array size and keeping the explicit null terminator at the end.
What happens is that strlen keeps going, reading memory values until it eventually gets to a null. It then assumes that is the terminator and returns a length that could be massively large. If you're using strlen in an environment that expects C-strings to be used, you could then copy this huge buffer of data into another one that is just not big enough - causing buffer overrun problems, or at best, you could copy a large amount of garbage data into your buffer.
Copying a non-null-terminated C string into a std::string will do this. If you then decide that you know this string is only 3 characters long and discard the rest, you will still have a massively long std::string that contains the first 3 good characters and then a load of wastage. That's inefficient.
The moral is, if you're using the CRT functions to operate on C strings, they must be null-terminated. It's no different to any other API: you must follow the rules that API sets down for correct usage.
Of course, there is no reason you cannot use the CRT functions if you always use the specific-length versions (eg strncpy) but you will have to limit yourself to just those, always, and manually keep track of the correct lengths.
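As an illustration of a length-limited routine, a bounded replacement for strlen can be built on std::memchr. bounded_length is a hypothetical name for this sketch, and `max_len` must not exceed the size of the array that `data` points into:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the position of the first '\0' within the
// first `max_len` bytes, or `max_len` if there is none. It never reads
// beyond `max_len` bytes, unlike strlen.
std::size_t bounded_length(const char* data, std::size_t max_len) {
    const void* nul = std::memchr(data, '\0', max_len);
    if (nul == nullptr) return max_len;
    return static_cast<std::size_t>(static_cast<const char*>(nul) - data);
}
```

For the unterminated three-character array from the question, bounded_length(s2, sizeof s2) yields 3 without depending on a stray zero byte after the array.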
Convention states that a char array with a terminating \0 is a null terminated string. This means that all str*() functions expect to find a null-terminator at the end of the char-array. But that's it, it's convention only.
By convention also strings should contain printable characters.
If you create an array like you did char arr[3] = {'a', 'a', 'a'}; you have created a char array. Since it is not terminated by a \0 it is not called a string in C, although its contents can be printed to stdout.
The C standard does not define the term string until the section 7 - Library functions. The definition in C11 7.1.1p1 reads:
A string is a contiguous sequence of characters terminated by and including the first null character.
(emphasis mine)
If the definition of string is a sequence of characters terminated by a null character, a sequence of non-null characters not terminated by a null is not a string, period.
What you have done is undefined behavior.
strlen and printf read past the end of the array, into memory that is not yours.
Change it to
char s2[] = {'a','a','a','\0'};

single character c-style string full of junk

It's a shame I can't figure out such basic thing about c++, but c-style strings are acting as I wouldn't expect. For example, I create it like this:
char* cstr = new char[1];
It's initialized to: Íýýýýý««««««««îţ . As usual, I can set just the first char because the others don't really exist (or so I thought). While working with C-style strings all this junk is ignored and everything works fine.
Now I mixed std::string with those C-style ones and what I get is a mess. With this code:
std::string str = "aaa";
str += cstr;
I end up with: aaaÍýýýýý««««««««îţ , and now those characters actually exist, as string.size() returns a length that includes this junk.
I can't find out why this is happening, but it must be connected with how the string is created, because something like char* cstr = "aaa" results in aaa without any additional junk, but trying to change a string initialized this way results in a memory access violation. Could someone explain this behavior to me, please? Thanks!
PS: My JavaScript Failed to load so if someone could format this post properly, I'd be glad!
Answer: Oh god! How could I forget about that... thanks to all for the, well, immediate answers. The best one was from minitech, so I'll mark it as the answer as soon as my JavaScript loads up :/
All C-style strings are null-terminated. So, a string initialized using new char[1] leaves you space for no characters. You can't set the first character to anything but \0, otherwise normal string operations will keep reading into memory until they find a zero. So use new char[2] instead.
When working with C-style strings you need to have a null terminator:
char* cstr = new char[2];
cstr[0] = 'X';
cstr[1] = '\0';
Having said all that, it is really bad code to do the above. Just use std::string unless you have a very good reason not to. It takes care of the memory allocations and deallocations for you.
C-style strings require a NUL ('\0') terminator; they don't have a length associated with them like C++ strings do. So your single-character string must be new char[2]; it will not be initialized; and you will need to make sure it's terminated with \0.
When you use new char[1], you request space for an array of characters. There is no request that said characters are initialized. Thus, the "junk" that you see is uninitialized memory. Before treating the array as a C-style string, you should do this:
cstr[0] = '\0';
C-style strings are NUL-terminated. So, to ignore any junk in memory you need to place a NUL byte ('\0') in the string body. Otherwise, library functions will look at all bytes from the start of your string until they meet a NUL byte in memory (which will be at some random position).
This also means that to have a C-style string of one character you actually need to allocate 2 bytes: one for the meaningful character and a second one for '\0'.
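Putting the answers together, a sketch of the question's scenario with the terminator in place might look like this. append_single_char is a hypothetical wrapper so that the allocation and the append sit in one place:

```cpp
#include <string>

// Hypothetical wrapper around the question's code: a one-character C-style
// string needs two bytes, the character itself and the '\0' after it.
std::string append_single_char(const std::string& base, char ch) {
    char* cstr = new char[2];   // new char[2] does not initialize the bytes
    cstr[0] = ch;
    cstr[1] = '\0';             // without this, += would read into junk
    std::string result = base;
    result += cstr;             // appends characters up to the '\0'
    delete[] cstr;              // manual cleanup that std::string spares you
    return result;
}
```

In real code the same effect is simply str += ch; the manual allocation is kept here only to mirror the question.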

How to get a C-string out of a string that contains \0 without losing the \0

I currently have a pretty huge string. I NEED to convert it into a C-string (char*), because the function I want to use only takes a C-string as a parameter.
My problem is that everything I tried made the final C-string way smaller than the original string, because my string contains many \0. Those \0 are essential, so I can't simply remove them :(...
I tried various ways to do so, but the most common were:
myString.c_str();
myString.data();
Unfortunately the C-string was always only the content of the original string that was before the first \0.
Any help will be greatly appreciated!
You cannot create a C-string which contains '\0' characters, because a C-string is, by definition, a sequence of characters terminated by '\0' (also called a "zero-terminated string"), so the first '\0' in the sequence ends the string.
However, there are interfaces that take a pointer to the first character and the length of the character sequence. These might be able to deal with character sequences including '\0'.
Watch out for myString.data(): before C++11 it returned a pointer to a character sequence that might not be zero-terminated, while myString.c_str() always returns a zero-terminated C-string. (Since C++11 both return the same zero-terminated buffer.)
This is not possible. The null is the end of a null terminated string. If you take a look at your character buffer (use &myString[0]), you'll see that the NULLs are still there. However, no C functions are going to interpret those NULLs correctly because NULL is not a valid value in the middle of a string in C.
Well, myString has probably been truncated at construction/assignment time. You can try std::basic_string::assign which takes two iterators as arguments or simply use std::vector <char>, the latter being more usual in your use case.
And your API taking that C string must actually support taking a char pointer together with a length.
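A short sketch of the difference between the string's real size and what a C function sees, assuming the string is built with the pointer-plus-length constructor. count_nuls is a hypothetical stand-in for "an API that takes a pointer together with a length":

```cpp
#include <cstddef>
#include <string>

// The (pointer, length) constructor keeps embedded '\0' bytes; constructing
// from the literal alone would stop at the first '\0'.
std::string make_payload() {
    return std::string("ab\0cd\0ef", 8);
}

// Hypothetical length-aware API: it walks exactly `len` bytes, so embedded
// '\0' bytes are just data to it.
std::size_t count_nuls(const char* data, std::size_t len) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (data[i] == '\0') ++n;
    }
    return n;
}
```

Calling count_nuls(p.data(), p.size()) on the payload sees all eight bytes, whereas strlen(p.c_str()) stops at the first '\0' and reports only the leading "ab".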
I'm a bit confused, but:
string x("abc");
if (x.c_str()[3] == '\0')
{ cout << "there it is" << endl; }
This may not meet your needs, you did say 'Those \0 are essential', but how about escaping or replacing the '\0' chars?
Would one of these ideas work?
replace the '\0' chars with a '\t' (tab char, decimal 9).
replace the '\0' with some rarely used char value like decimal 1, or decimal 255.
Create an escape code, say by replacing each '\0' char with a coded substring, (like octal as in "\000"). (Be sure to replace any original '\' with a coded value as well (like "\134")).
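The escape-code idea from the last item could be sketched as follows, assuming the three-digit octal codes mentioned above ("\000" for the zero byte, "\134" for the backslash). Both function names are hypothetical, and the receiving side has to apply the matching decoder:

```cpp
#include <cstddef>
#include <string>

// Replaces every '\0' with the four characters \000 and every backslash
// with \134, so the result contains no zero bytes at all.
std::string escape_nuls(const std::string& in) {
    std::string out;
    for (char ch : in) {
        if (ch == '\0')      out += "\\000";
        else if (ch == '\\') out += "\\134";
        else                 out += ch;
    }
    return out;
}

// Reverses escape_nuls: a backslash is always followed by three octal digits.
std::string unescape_nuls(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 3 < in.size()) {
            int value = (in[i + 1] - '0') * 64 + (in[i + 2] - '0') * 8 + (in[i + 3] - '0');
            out += static_cast<char>(value);
            i += 3;   // skip the three digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}
```

The escaped text can travel through any C-string interface; the cost is that the string grows by three characters per escaped byte.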