Making the middle of a pointer NUL; does it work? [duplicate] - c++

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
What happens to memory after ‘\0’ in a C string?
Is it possible to make, say, the third element a pointer points at NUL ('\0'), thereby "erasing" the pointer's remaining elements?

Assuming you're talking about C-style strings then yes, kind of:
char s[] = "abcdefgh"; // s = "abcdefgh"
// (actually contains 9 characters: "abcdefgh\0")
s[3] = '\0'; // s = "abc"
// (still contains 9 characters, but now: "abc\0efgh\0")
Note that the characters beyond s[3] haven't magically disappeared at this point - it's just that displaying the string, or passing it to any function that takes a C-style string, results in the string only appearing to contain three characters. You can still do e.g.
s[3] = 'x'; // s = "abcxefgh"
// (still contains 9 characters, but now: "abcxefgh\0")

Related

Can someone tell me the output and provide explanation of the following c++ code? [duplicate]

This question already has answers here:
C++ character concatenation with std::string behavior. Please explain this
(3 answers)
Closed 1 year ago.
cout<<"#" + 'a'<<endl;
string s = "#";
s += 'a';
cout<<s<<endl;
I am not able to figure out how the typecasting is working in the case "#" + 'a'
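A string literal such as "#" is not a std::string; it is a const char[2] array that decays to a const char* inside an expression. In "#" + 'a' the char 'a' is promoted to int (97 in ASCII), so the expression is pointer arithmetic: it yields a pointer 97 bytes past the start of the literal. That is out of bounds, so the behavior is undefined and the first line may print garbage or nothing at all.
In the second case s is a std::string, and s += 'a' calls std::string::operator+=(char), which appends the character:
s = "#"; // s is "#"
s += 'a'; // s is "#a"
finally, s is:
#a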

Strange behaviour while adding string to another string in C++ [duplicate]

This question already has answers here:
How can I repeat a string a variable number of times in C++?
(10 answers)
Closed 2 years ago.
one is 2, and ans is "000000".
string ans = "000000";
ans += string("1", one);
cout<<ans<<endl;
The output is:
0000001�
But I want the output:
00000011
What am I doing wrong?
string("1", one) does not do what you think it does. It does not repeat the "1" string one times. Instead it copies the first one characters starting at "1"; with one equal to 2 that is the '1' character and the '\0' null terminator that follows it, which is where the � in the output comes from. (Asking for more characters than the literal holds would be undefined behavior.) That is not what you want.
Use string(one, '1') instead. That constructor repeats the '1' character one times, like you want, e.g.:
ans = "000000";
ans += string(one, '1');
cout << ans << endl;
Just use C++ strings and the + operator to concatenate them.

remove non alphabet characters from string c++ [duplicate]

This question already has answers here:
How to strip all non alphanumeric characters from a string in c++?
(12 answers)
Closed 6 years ago.
I'm trying to remove all non-alphabet characters from an inputted string in C++ and don't know how to. I know it probably involves ASCII numbers because that's what we're learning about. We have only learned up to loops and haven't started arrays yet. Not sure what to do.
If the string is Hello 1234 World&*
It would print HelloWorld
If you use std::string and STL, you can:
string s("Hello 1234 World&*");
s.erase(remove_if(s.begin(), s.end(), [](unsigned char c) { return !isalpha(c); } ), s.end()); // unsigned char: isalpha is undefined for negative values
http://ideone.com/OIsJmb
Note: isalpha from <cctype> classifies characters according to the current C locale. If you need to classify characters under a specific locale, you can use the two-argument overload std::isalpha(c, loc) from <locale>.
PS: If you use a c-style string such as char *, you can convert it to std::string by its constructor, and convert back by its member function c_str().
If you're working with C-style strings (e.g. const char* str = "foobar"), then you can't trivially "remove" characters from a string: a string is just a sequence of characters stored contiguously in memory, so removing a character means shifting the following bytes down to fill the gap left by the deleted character. (A string literal cannot be modified at all.)
You'd have to allocate space for a new string and copy characters into it as-needed. The problem is, you have to allocate memory before you fill it, so you'd over-allocate memory unless you do an initial pass to get a count of the number of characters remaining in the string.
Like so:
#include <cstring> // strlen

bool IsAlphabet(char c); // defined below

void BlatentlyObviousHomeworkExercise() {
    const char* str = "someString"; // string literals are const in C++
    // strlen works for any null-terminated string. sizeof only gives the
    // size of an actual array, not of what a pointer points at.
    size_t strLength = strlen(str);
    // First pass: count the characters that will be kept.
    size_t finalLength = 0;
    for (size_t i = 0; i < strLength; i++) {
        char c = str[i]; // get the ith element of the `str` array.
        if (IsAlphabet(c)) finalLength++;
    }
    char* filteredString = new char[finalLength + 1]; // note I use `new[]` instead of `malloc` as this is C++, not C. Use the right idioms :) The +1 is for the null-terminator.
    // Second pass: copy the kept characters.
    size_t filteredStringI = 0;
    for (size_t i = 0; i < strLength; i++) {
        char c = str[i];
        if (IsAlphabet(c)) filteredString[filteredStringI++] = c;
    }
    filteredString[filteredStringI] = '\0'; // set the null terminator
    // ... use filteredString here ...
    delete[] filteredString; // every `new[]` needs a matching `delete[]`
}
bool IsAlphabet(char c) { // `IsAlphabet` rather than `IsNonAlphabet` to avoid negatives in function names/behaviors for simplicity
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
I do not want to spoil the solution so I will not type out the code, only describe the solution. For your problem think of iterating through your string. Start with that. Then you need to decide if the currently selected character is part of the alphabet or not. You can do this numerous different ways. Checking ASCII values? Comparing against a string of the alphabet? Once you decide if it is a letter, then you need to rebuild the new string with that letter plus the valid letters before and after that you found or will find. Finally you need to display your new string.
If you look at an ASCII table, you can see that A-Z occupies 65-90 and a-z occupies 97-122.
So, assuming you only need to keep those characters (no accented letters or letters from other alphabets, which ASCII does not represent), all you need to do is loop over the string, check whether each char falls in one of those ranges, and remove it if it does not.

C++ getline() behaves strangely when reading stringstream containing \0 [duplicate]

This question already has answers here:
How do you construct a std::string with an embedded null?
(11 answers)
Closed 8 years ago.
I'm trying to read a large buffer from a socket which uses \0 to delimit pieces of data and \n to delimit lines.
I thought getline() would be an easy way to get each line but it's behaving strangely.
I'm using \n as the delimiter in getline().
string line;
string test1 = "aaa,123\nbbb\nccc,456\n";
stringstream ss1(test1);
while(std::getline(ss1, line, '\n')) {
cout << line << endl;
}
// outputs:
// aaa,123
// bbb
// ccc,456
string test2 = "aaa\0123\0\nbbb\0\nccc\0456\0\n";
stringstream ss2(test2);
while(std::getline(ss2, line, '\n')) {
cout << line << endl;
}
// outputs:
// aaa
// 3
Why is this happening in test2? Where is the 3 coming from? Must I remove the \0 to make this work? Is there an easier/better way to mark strings in my buffer when I do a socket recv()?
\0 is a special character. It marks where a C-style string ends.
For example, if you write "a string", the compiler automatically adds a \0 at the end, which marks the end of the string. It is legal to have a \0 in the middle of a string literal; it just means that anything treating the data as a C-style string ignores everything after it.
So constructing a std::string from the literal, as your code does, stops at the first \0 that is found. By that logic you would expect just "aaa". But...
As @Fred Larson points out
Oh, I see where the 3 comes from. The first \0 isn't a null, it's the start of \012, which is an octal escape for a newline (line feed, decimal 10). Then the 3 follows.
So actually, the string is being treated as "aaa\n3", which ends at the first real \0. That is why you get the output you do.
Edit: And thanks to Galik, I will also add that these rules apply to string literals / C-strings. A std::string stores its own length, so it can hold embedded \0 characters, provided it is constructed with an explicit length rather than from a bare const char*.
\0 is the standard string terminator symbol. As such, you may either read character by character or avoid using \0 as a delimiter.

Length of String

String manipulation problem
http://www.ideone.com/qyTkL
In the above program (from C++ Primer, Third Edition, by Stanley B. Lippman and Josée Lajoie, Exercise 3.14) the character array is allocated with length len+1:
char *pc2 = new char[ len + 1];
http://www.ideone.com/pGa6c
However, in this program I have allocated the character array with length len:
char *pc2 = new char[ len ];
Why do we need to make the new string's buffer one char longer when we get the same result? Please explain.
Note that the programs shown here are slightly altered; they are not exactly the same as the ones in the book.
To store a string of length n in C, you need n+1 chars. This is because a string in C is simply an array of chars terminated by the null character \0. Thus, the memory that stores the string "hello" looks like
'h' 'e' 'l' 'l' 'o' '\0'
and consists of 6 chars even though the word hello is only 5 letters long.
The inconsistency you're seeing could be a semantic one; some would say that length of the word hello is len = 5, so we need to allocate len+1 chars, while some would say that since hello requires 6 chars we should say its length (as a C string) is len=6.
Note, by the way, that the C way of storing strings is not the only possible one. For example, one could store a string as an integer (giving the string's length) followed by characters. (I believe this is what Pascal does?). If one doesn't use a length field such as this, one needs another way to know when the string stops. The C way is that the string stops whenever a null character is reached.
To get a feel for how this works, you might want to try the following:
const char* string = "hello, world!";
printf("%s\n", string);
const char* string2 = "hello\0, world!";
printf("%s\n", string2);
(The declaration const char* string = "foo"; makes string point at a static array of 4 chars: the first is 'f', the second 'o', the third 'o', and the fourth '\0'.)
It's a convention that the string is terminated by an extra null character so whoever allocates storage has to allocate len + 1 characters.
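Allocating only len chars does cause a problem: writing the terminator goes one byte past the end of the buffer, which is undefined behavior. In practice the memory allocator often rounds the requested size up, leaving a few spare bytes after the buffer, so the problem stays hidden and the program appears to work.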