C++ exam on string class implementation

I just took an exam where I was asked the following:
Write the function body of each of the methods GetStrLen, InsertChar and StrReverse for the code given below. You must take the following into consideration:
How strings are constructed in C++
The string must not overflow
Insertion of character increases its length by 1
An empty string is indicated by StrLen = 0
class Strings {
private:
char str[80];
int StrLen;
public:
// Constructor
Strings() {
StrLen=0;
};
// A function for returning the length of the string 'str'
int GetStrLen(void) {
};
// A function to inser a character 'ch' at the end of the string 'str'
void InsertChar(char ch) {
};
// A function to reverse the content of the string 'str'
void StrReverse(void) {
};
};
The answer I gave was something like this (see below). One of my problems is that I used many extra variables, which makes me believe I am not doing it the best possible way; the other is that it doesn't work...
class Strings {
private:
char str[80];
int StrLen;
int index; // *** Had to add this ***
public:
Strings(){
StrLen=0;
}
int GetStrLen(void){
for (int i=0 ; str[i]!='\0' ; i++)
index++;
return index; // *** Here am getting a weird value, something like 1829584505306 ***
}
void InsertChar(char ch){
str[index] = ch; // *** Not sure if this is correct cuz I was not given int index ***
}
void StrRevrse(void){
GetStrLen();
char revStr[index+1];
for (int i=0 ; str[i]!='\0' ; i++){
for (int r=index ; r>0 ; r--)
revStr[r] = str[i];
}
}
};
I would appreciate it if anyone could explain to me roughly what the best way to answer the question would have been, and why. Also, how come my professor closes each class function with " }; "? I thought that was only used for ending classes and constructors.
Thanks a lot for your help.

First, the trivial }; question is just a matter of style. I do that too when I put function bodies inside class declarations. In that case the ; after the function body is allowed by the grammar and simply ignored, so it doesn't change the meaning of the program. It can be left out at the end of the functions (but not at the end of the class).
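A small sketch showing that both spellings compile and behave identically; only the semicolon after the class itself is required. The struct name Demo and its members are made up for illustration.

```cpp
#include <cassert>

// Hypothetical class used only to compare the two styles.
struct Demo {
    int withSemicolon() { return 1; };    // the extra ';' is accepted and ignored
    int withoutSemicolon() { return 2; }  // no ';' is needed after a function body
};                                        // this ';' is required: it ends the class
```

Some compilers can warn about the extra semicolon (for example with an "extra semicolon" warning enabled), but it is not an error.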
Here are some major problems with what you wrote:
You never initialize the contents of str. It's not guaranteed to start out with \0 bytes.
You never initialize index; you only increment it within GetStrLen. It could hold any value (say -19281281) when the object is created. And what if someone calls InsertChar before they call GetStrLen?
You never update index in InsertChar. What if someone calls InsertChar twice in a row?
In StrReverse, you create a reversed string called revStr, but then you never do anything with it. The string in str stays the same afterwards.
The confusing part to me is why you created a new variable called index, presumably to track the index of one-past-the-last character of the string, when there was already a variable called StrLen for this purpose, which you totally ignored. The index of one-past-the-last character is the length of the string, so you should just have kept the length of the string up to date and used that, e.g.
int GetStrLen(void){
return StrLen;
}
void InsertChar(char ch){
if (StrLen < 80) {
str[StrLen] = ch;
StrLen = StrLen + 1; // Update the length of the string
} else {
// Do not allow the string to overflow. Normally, you would throw an exception here
// (std::overflow_error is declared in <stdexcept>), but if you don't know what that is,
// your instructor was probably just expecting you to return without inserting the character.
throw std::overflow_error("Strings::InsertChar: string is full");
}
}
Your algorithm for string reversal, however, is just completely wrong. Think through what that code says (assuming index is initialized and updated correctly elsewhere). It says "for every character in str, overwrite the entirety of revStr, backwards, with this character". If str started out as "Hello World", revStr would end up as "ddddddddddd", since d is the last character in str.
What you should do is something like this:
void StrReverse() {
char revStr[80];
for (int i = 0; i < StrLen; ++i) {
revStr[(StrLen - 1) - i] = str[i];
}
}
Take note of how that works. Say that StrLen = 10. Then we're copying position 0 of str into position 9 of revStr, then position 1 of str into position 8 of revStr, and so on, until we copy position StrLen - 1 of str into position 0 of revStr.
But then you've got a reversed string in revStr and you're still missing the part where you put that back into str, so the complete method would look like
void StrReverse() {
char revStr[80];
for (int i = 0; i < StrLen; ++i) {
revStr[(StrLen - 1) - i] = str[i];
}
for (int i = 0; i < StrLen; ++i) {
str[i] = revStr[i];
}
}
And there are cleverer ways to do this where you don't have to have a temporary string revStr, but the above is perfectly functional and would be a correct answer to the problem.
By the way, you really don't need to worry about null bytes (\0s) at all in this code. The fact that you are (or at least should be) tracking the length of the string with the StrLen variable makes the end sentinel unnecessary, since StrLen already tells you the point beyond which the contents of str should be ignored.
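Putting the pieces of this answer together, a complete class might look like the sketch below. It keeps the exam's names, tracks the length in StrLen without any '\0', throws std::overflow_error on overflow as the InsertChar above does, and reverses through a temporary buffer. The CharAt accessor is a hypothetical addition, not part of the exam, only there so the result can be inspected.

```cpp
#include <cassert>
#include <stdexcept>

// Sketch of the exam class with the three bodies filled in.
// The length is tracked in StrLen, so no '\0' terminator is stored.
class Strings {
private:
    char str[80];
    int StrLen;
public:
    Strings() : StrLen(0) {}

    // O(1): the length is always kept up to date by InsertChar.
    int GetStrLen() const { return StrLen; }

    // Append ch, refusing to write past the 80-char buffer.
    void InsertChar(char ch) {
        if (StrLen >= 80) {
            throw std::overflow_error("Strings: buffer is full");
        }
        str[StrLen] = ch;
        ++StrLen;
    }

    // Reverse using a temporary copy, then copy the result back into str.
    void StrReverse() {
        char revStr[80];
        for (int i = 0; i < StrLen; ++i) {
            revStr[(StrLen - 1) - i] = str[i];
        }
        for (int i = 0; i < StrLen; ++i) {
            str[i] = revStr[i];
        }
    }

    // Hypothetical read-only accessor, not part of the exam's interface.
    char CharAt(int i) const { return str[i]; }
};
```

Usage: inserting 'H', 'e', 'l', 'l', 'o' and calling StrReverse() leaves "olleH" in the first five slots, and an 81st InsertChar call throws.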

int GetStrLen(void){
for (int i=0 ; str[i]!='\0' ; i++)
index++;
return index; // *** Here am getting a weird value, something like 1829584505306 ***
}
You are getting a weird value because you never initialized index, you just started incrementing it.

Your GetStrLen() function doesn't work because the str array is uninitialized. It probably doesn't contain any zero elements.
You don't need the index member. Just use StrLen to keep track of the current string length.

There are lots of interesting lessons to learn from this exam question. Firstly, the examiner does not appear to be a fluent C++ programmer themselves! You might want to look at the style of the code, including whether the variable and method names are meaningful, as well as some of the other comments you've been given about the use of (void), const, etc. Do the method names really need "Str" in them? We are operating on a "Strings" class, after all!
For "How strings are constructed in C++", well, C-style strings (as in C) are null-terminated and don't store their length, unlike Pascal strings (and this class), which do. [#Gustavo, strlen() will not work here, since the string is not null-terminated.] In the "real world" we'd use the std::string class.
"The string must not overflow", but how does the user of the class find out that they tried to overflow the string? #Tyler's suggestion of throwing a std::overflow_error (perhaps with a message) would work, but if you are writing your own string class (purely as an exercise; you're very unlikely to need to do so in real life), then you should probably provide your own exception class.
"Insertion of character increases its length by 1", this implies that GetStrLen() doesn't calculate the length of the string, but purely returns the value of StrLen initialised at construction and updated with insertion.
You might also want to think about how you're going to test your class. For illustrative purposes, I added a Print() method so that you can look at the contents of the class, but you should probably take a look at something like Cpp Unit Lite.
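On the exception-class point: one option is to derive your own type from std::overflow_error, so callers can catch either the specific type or the standard one. The names StringOverflow and BoundedString below are hypothetical, chosen for this sketch.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type reporting a full buffer.
class StringOverflow : public std::overflow_error {
public:
    explicit StringOverflow(int capacity)
        : std::overflow_error("string is full (capacity " + std::to_string(capacity) + ")"),
          capacity_(capacity) {}
    int capacity() const { return capacity_; }
private:
    int capacity_;
};

// Minimal bounded string that reports overflow through StringOverflow.
class BoundedString {
public:
    static constexpr int kCapacity = 80;
    BoundedString() : m_length(0) {}
    void InsertChar(char ch) {
        if (m_length >= kCapacity) {
            throw StringOverflow(kCapacity);  // nothing is written on failure
        }
        m_buffer[m_length++] = ch;
    }
    int GetLength() const { return m_length; }
private:
    char m_buffer[kCapacity];
    int m_length;
};
```

Deriving from a standard exception keeps existing catch (const std::exception&) handlers working while still letting callers single out this particular failure.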
For what it's worth, I'm including my own implementation. Unlike the other implementations so far, I have chosen to use raw pointers in the reverse function and its swap helper. I have presumed that using things like std::swap and std::reverse is outside the scope of this examination, but you will want to familiarise yourself with the Standard Library so that you can get on and program without re-inventing wheels.
#include <iostream>
void swap_chars(char* left, char* right) {
char temp = *left;
*left = *right;
*right = temp;
}
class Strings {
private:
char m_buffer[80];
int m_length;
public:
// Constructor
Strings()
:m_length(0)
{
}
// A function for returning the length of the string 'm_buffer'
int GetLength() const {
return m_length;
}
// A function to insert a character 'ch' at the end of the string 'm_buffer'
void InsertChar(char ch) {
if (m_length < sizeof m_buffer) {
m_buffer[m_length++] = ch;
}
}
// A function to reverse the content of the string 'm_buffer'
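void Reverse() {
if (m_length < 2) {
return; // nothing to swap; also avoids forming &m_buffer[-1] when the string is empty
}
char* left = &m_buffer[0];
char* right = &m_buffer[m_length - 1];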
for (; left < right; ++left, --right) {
swap_chars(left, right);
}
}
void Print() const {
for (int index = 0; index < m_length; ++index) {
std::cout << m_buffer[index];
}
std::cout << std::endl;
}
};
int main(int, char**) {
Strings test_string;
char test[] = "This is a test string!This is a test string!This is a test string!This is a test string!\000";
for (char* c = test; *c; ++c) {
test_string.InsertChar(*c);
}
test_string.Print();
test_string.Reverse();
test_string.Print();
// The output of this program should look like this...
// This is a test string!This is a test string!This is a test string!This is a test
// tset a si sihT!gnirts tset a si sihT!gnirts tset a si sihT!gnirts tset a si sihT
return 0;
}
Good luck with the rest of your studies!

void InsertChar(char ch){
str[index] = ch; // *** Not sure if this is correct cuz I was not given int index ***
}
This should be something more like
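if (StrLen < 79) { // keep one byte free for the terminating '\0'
str[StrLen] = ch; // overwrite the null (it sits at index StrLen) with ch
str[StrLen + 1] = '\0'; // re-add the null right after it
StrLen++;
}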

Your teacher gave you very good hints in the question; read it again and try answering it yourself. Here's my untested solution:
class Strings {
private:
char str[80];
int StrLen;
public:
// Constructor
Strings() {
StrLen=0;
str[0]=0;
};
// A function for returning the length of the string 'str'
int GetStrLen(void) {
return StrLen;
};
// A function to inser a character 'ch' at the end of the string 'str'
void InsertChar(char ch) {
if(StrLen < 80)
str[StrLen++]=ch;
};
// A function to reverse the content of the string 'str'
void StrReverse(void) {
for(int i=0; i<StrLen / 2; ++i) {
char aux = str[i];
str[i] = str[StrLen - i - 1];
str[StrLen - i - 1] = aux;
}
};
};

When you create the char array, you should set its first element to 0, and do the same for index. Because you don't, you get a weird length from GetStrLen: it is up to the gods where (or whether) you find the 0 you are looking for.
[Update] In C/C++ if you do not explicitly initialize your variables, you usually get them filled with random garbage (the content of the raw memory allocated to them). There are some exceptions to this rule, but the best practice is to always initialize your variables explicitly. [/Update]
In InsertChar, you should (after checking for overflow) use StrLen to index the array (as the comment specifies "inser a character 'ch' at the end of the string 'str'"), then set the new terminating 0 character and increment StrLen.

You don't need index as member data. You can make it a local variable in GetStrLen() if you like: just declare it there rather than in the class body. The reason you get a weird value when you return index is that you never initialized it. To fix that, initialize index to zero in GetStrLen().
But there's a better way to do things: when you insert a character via InsertChar() increment the value of StrLen, so that GetStrLen() need only return that value. This will make GetStrLen() much faster: it will run in constant time (the same performance regardless of the length of string).
In InsertChar() you can use StrLen as your index instead of index, which we already determined is redundant. But remember that you must make sure the string terminates with a '\0' value. Also remember to maintain StrLen by incrementing it, to make GetStrLen()'s life easier. In addition, you must take the extra step in InsertChar() of avoiding a buffer overflow. This happens when the user inserts a character into the string when its length is already 79 characters. (Yes, 79: you must spend one character on the terminating null.)
I don't see an instruction as to how to behave when that happens, so it's a judgment call. If the user tries to add the 80th character, you might ignore the request and return, or you might set an error flag; it's up to you.
In your StrReverse() function you have a few mistakes. First, you call GetStrLen() but ignore its return value, so why call it? Second, you create a temporary string and work on that rather than on the string member of the class, so your function doesn't change the string member, when it should in fact reverse it. And last, you could reverse the string faster by iterating through only half of it.
Work on the member data string. To reverse a string you can swap the first element (character) of the string with its last (not the terminating null, the character just before that!), the second element with the second-to-last and so on. You're done when you arrive at the middle of the string. Don't forget that the string must terminate with a '\0' character.
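A sketch of the null-terminated variant described here, assuming the same 80-byte buffer: at most 79 characters are stored so there is always room for the '\0', which lets the buffer be handed to strlen() or std::cout directly. This sketch reports overflow by returning false (one of the options mentioned above; the exam's InsertChar returns void), and the c_str() accessor is a hypothetical addition.

```cpp
#include <cassert>
#include <cstring>

class Strings {
private:
    char str[80];
    int StrLen;
public:
    Strings() : StrLen(0) { str[0] = '\0'; }   // empty string: terminator at index 0

    int GetStrLen() const { return StrLen; }

    // Returns false (and changes nothing) if 79 characters are already stored.
    bool InsertChar(char ch) {
        if (StrLen >= 79) {                    // one byte is reserved for '\0'
            return false;
        }
        str[StrLen] = ch;                      // overwrite the old terminator
        ++StrLen;
        str[StrLen] = '\0';                    // re-add the terminator after ch
        return true;
    }

    // Swap first/last, second/second-to-last, ... and stop at the middle.
    // The terminator at str[StrLen] is never moved.
    void StrReverse() {
        for (int i = 0; i < StrLen / 2; ++i) {
            char tmp = str[i];
            str[i] = str[StrLen - 1 - i];
            str[StrLen - 1 - i] = tmp;
        }
    }

    const char* c_str() const { return str; }  // hypothetical accessor for this sketch
};
```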
While you were solving the exam, it would also have been a good opportunity to teach your instructor a thing or two about C++: we don't write f(void), because that is a leftover from C89 (it is legal in C++, but redundant). In C++ we write f(). We also strive in C++ to use constructor member initializer lists whenever we can. Also remind your instructor how important const-correctness is: when a function shouldn't change the object, it should be marked as such. int GetStrLen(void) should be int GetStrLen() const.
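To illustrate those style points, a short sketch: the constructor uses a member initializer list, the functions use () rather than (void), and because GetStrLen() is marked const it can be called through a const reference. The helper describe is hypothetical; without the const on GetStrLen() it would fail to compile.

```cpp
#include <cassert>

class Strings {
private:
    char str[80];
    int StrLen;
public:
    Strings() : StrLen(0) {}                   // member initializer list
    int GetStrLen() const { return StrLen; }   // const: does not modify the object
    void InsertChar(char ch) {
        if (StrLen < 80) {                     // silently ignore overflow in this sketch
            str[StrLen++] = ch;
        }
    }
};

// Takes the object by const reference, so it may only call const members.
int describe(const Strings& s) {
    return s.GetStrLen();
}
```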

You don't need to figure out the length. You already know it: it is StrLen. Also, there was nothing in the original question to indicate that the buffer should contain a null-terminated string.
int GetStrLen(void){
return StrLen;
}
I'm just using an assertion here (assert comes from <cassert>), but another option is to throw an exception.
void InsertChar(char ch){
assert(StrLen < 80);
str[StrLen++] = ch;
}
Reversing the string is just a matter of swapping the elements in the str buffer.
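void StrReverse(void){
int n = StrLen >> 1; // half the length; a middle character (if any) stays put
for (int i = 0; i < n; i++) {
char c = str[i];
str[i] = str[StrLen - 1 - i]; // the last character is at index StrLen - 1
str[StrLen - 1 - i] = c;
}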
}

I would use StrLen to track the length of the string. Since the length also indicates the end of the string, we can use that for inserting:
int GetStrLen(void) {
return StrLen;
}
void InsertChar(char ch)
{
if (StrLen < static_cast<int>(sizeof(str)))
{
str[StrLen] = ch;
++StrLen;
}
}
void StrReverse(void) {
for (int n = 0; n < StrLen / 2; ++n)
{
char tmp = str[n];
str[n] = str[StrLen - n - 1];
str[StrLen - n - 1] = tmp;
}
}

First of all, why don't you use strlen() from <cstring> (string.h) for the string length?
strlen() takes a pointer to a null-terminated char array and returns its length (as a size_t).
Your function returns a weird value because you never initialize index, and the array is not guaranteed to contain any zero values; initialize both first, then call your method.

Related

Logical error. Elements in std::string not replaced properly with for loop

I'm currently doing a programming exercise from a C++ book for beginners. The task reads as follows: "Write a function that reverses the characters in a text string by using two pointers. The only function parameter shall be a pointer to the string."
My issue is that I haven't been able to make the characters swap properly, see the output below. (And I also made the assumption that the function parameter doesn't count, hence why I'm technically using three pointers).
I am almost certain that the problem has to do with the for loop. I wrote this pseudocode:
Assign value of element number i in at_front to the 1st element in transfer_back.
Assign value of element number elem in at_back to element number i in at_front.
Assign value of the 1st element in transfer_back to element number elem in at_back.
Increment i, decrement elem. Repeat loop until !(i < elem)
I wasn't sure whether or not I was supposed to take the null terminator into account. I tried writing (elem - 1), but that messed up the characters even more, so I've left it as it is for now.
#include <iostream>
#include <string>
using namespace std;
void strrev(string *at_front) {
string *transfer_back = at_front, *at_back = transfer_back;
int elem = 0;
while(at_back->operator[](elem) != '\0') {
elem++;
}
for(int i = 0; i < elem; i++) {
transfer_back->operator[](0) = at_front->operator[](i);
at_front->operator[](i) = at_back->operator[](elem);
at_back->operator[](elem) = transfer_back->operator[](0);
elem--;
}
}
int main() {
string str = "ereh txet yna";
string *point_str = &str;
strrev(point_str);
cout << *point_str << endl;
return 0;
}
Expected output: "any text here"
Terminal window: "xany text her"
The fact that the 'x' has been assigned to the first element is something I haven't been able to grasp.
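Here is a corrected version:
void strrev(string *at_front) {
string *at_back = at_front;
char transfer_back; // a plain char as temporary storage, not another pointer to the string
int elem = 0;
while(at_back->operator[](elem) != '\0') {
elem++;
}
elem--; // elem now indexes the last character, not the terminator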
for(int i = 0; i <elem; i++) {
transfer_back = at_front->operator[](i);
at_front->operator[](i) = at_back->operator[](elem);
at_back->operator[](elem) = transfer_back;
elem--;
}
}
Let me explain why you get that error. string *transfer_back = at_front; makes both pointers point to the same string, so the assignment transfer_back->operator[](0) = at_front->operator[](i); overwrites the first character of the very string you are reversing. The stray 'x' is simply the last character that was copied there.
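The aliasing is easy to see in isolation. In this sketch (both function names are hypothetical), a and b are two pointers to the same std::string, so writing through one is visible through the other; a swap only needs a plain char as its temporary.

```cpp
#include <cassert>
#include <string>

// Returns the string after writing through an alias of it.
std::string write_through_alias() {
    std::string text = "any";
    std::string* a = &text;
    std::string* b = a;        // b does NOT copy the string; it points to the same one
    (*b)[0] = 'x';             // modifies text, which a also sees
    return *a;                 // "xny"
}

// Swapping two characters needs a temporary char, not a second string pointer.
void swap_first_last(std::string* s) {
    if (s->size() < 2) return;
    char tmp = (*s)[0];
    (*s)[0] = (*s)[s->size() - 1];
    (*s)[s->size() - 1] = tmp;
}
```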
"Write a function that reverses the characters in a text string by using two pointers. The only function parameter shall be a pointer to the string."
This sounds to me like the question addresses C strings but not std::string.
Assuming my feeling is right, this could look like:
#include <iostream>
#include <utility> // std::swap
void strrev(char *at_front) {
char *at_back = at_front;
if (!*at_back) return; // early out in edge case
// move at_back to end (last char before 0-terminator)
while (at_back[1]) ++at_back;
// reverse by swapping contents of front and back
while (at_front < at_back) {
std::swap(*at_front++, *at_back--);
}
}
int main() {
char str[] = "ereh txet yna";
strrev(str);
std::cout << str << '\n';
return 0;
}
Output:
any text here
Note:
I stored the original string in a char str[].
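If I had used char *str = "ereh txet yna";, I would have assigned the address of a constant string (a string literal) to str. This feels very wrong, as I want to modify the contents of str, which must not be done to constants. (Since C++11, the standard no longer allows converting a string literal to a plain char* at all.)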
strrev():
at_back[1] reads the char right after the one at_back points to. For a valid C string this is always possible, because I excluded the empty string (consisting only of the 0-terminator) beforehand.
The swapping loop moves at_front as well as at_back. As the pointer is given as value, this has no "destructive" effect outside of strrev().
Concerning std::swap(*at_front++, *at_back--);:
The swapping combines access to pointer contents with pointer increment/decrement, using postfix-increment/-decrement. IMHO, one of the rare cases where the postfix operators are useful somehow.
Alternatively, I could have written:
std::swap(*at_front, *at_back); ++at_front; --at_back;
Please note that std::string is a container class. A pointer to the container cannot be used to address its contained raw string directly. For this, std::string provides various access methods, e.g.
std::string::operator[]()
std::string::at()
std::string::data()
etc.
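If the exercise is meant for std::string after all, one way to keep the two-pointer approach is to take pointers to the characters rather than to the container. This sketch assumes C++11 or later, where a std::string's characters are stored contiguously, so &(*s)[0] through &(*s)[size() - 1] form a valid char range.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Reverses *s in place using two char pointers into its buffer.
void strrev(std::string* s) {
    if (s->size() < 2) return;              // nothing to do (and avoids size() - 1 on empty)
    char* front = &(*s)[0];                 // first character
    char* back = front + (s->size() - 1);   // last character, not the terminator
    while (front < back) {
        std::swap(*front++, *back--);
    }
}
```

Unlike the C-string version, no terminator scan is needed, because the container already knows its size.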

How can I find the size of a (* char) array inside of a function?

I understand how to find the size using a string type array:
char * shuffleStrings(string theStrings[])
{
int sz = 0;
while(!theStrings[sz].empty())
{
sz++;
}
sz--;
printf("sz is %d\n", sz);
char * shuffled = new char[sz];
return shuffled;
}
One of my questions in the above example also is, why do I have to decrement the size by 1 to find the true number of elements in the array?
So if the code looked like this:
char * shuffleStrings(char * theStrings[])
{
//how can I find the size??
//I tried this and got a weird continuous block of printing
int i = 0;
while(!theStrings)
{
theStrings++;
i++;
}
printf("sz is %d\n", i);
char * shuffled = new char[i];
return shuffled;
}
In the first snippet, you should not decrement the counter to get the real size: if you have two elements followed by one empty element, the loop ends with sz == 2, which is correct.
In the second snippet, you are working with a pointer to a pointer, so the while condition should be *theStrings (supposing that a null pointer is the end-of-table marker).
Note that in both cases, if the table did not contain the end-of-table marker, you would risk going out of bounds. Why not work with vector<string>? Then you could get the size without any loop, and you would not risk going out of bounds.
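A sketch of both options, assuming the array of char* ends with a null pointer as its end-of-table marker (the overloaded function name count_strings is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts entries in an array of C strings terminated by a null pointer.
std::size_t count_strings(char* const* theStrings) {
    std::size_t n = 0;
    while (theStrings[n] != nullptr) {   // stop at the sentinel pointer
        ++n;
    }
    return n;
}

// With a vector the size is stored, so no sentinel or loop is needed.
std::size_t count_strings(const std::vector<std::string>& theStrings) {
    return theStrings.size();
}
```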
What you are seeing here is the "termination" character of the string, '\0'.
You can see this better when you use a char* array instead of a string.
Here is an example of a size calculator that I have made.
int getSize(const char* s)
{
unsigned int i = 0;
char x = ' ';
while ((x = s[i++]) != '\0');
return i - 1;
}
As you can see, the char* is terminated with a '\0' character to indicate the end of the string. That is the character that you are counting in your algorithm and that is why you are getting the extra character.
As to your second question, you seem to want to create a new array the size of all of the strings combined.
To do this, you could calculate the length of each string and then add them together to create a new array.
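A sketch of that idea, assuming a null-pointer-terminated table of C strings; the name concat_all is made up. It sums strlen() of each entry, allocates one extra byte for the '\0', and copies the pieces in. The caller owns the result and must delete[] it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Concatenates every string in a null-pointer-terminated table.
char* concat_all(char* const* theStrings) {
    std::size_t total = 0;
    for (std::size_t i = 0; theStrings[i] != nullptr; ++i) {
        total += std::strlen(theStrings[i]);
    }
    char* result = new char[total + 1];      // +1 for the terminating '\0'
    char* out = result;
    for (std::size_t i = 0; theStrings[i] != nullptr; ++i) {
        std::size_t len = std::strlen(theStrings[i]);
        std::memcpy(out, theStrings[i], len);
        out += len;
    }
    *out = '\0';
    return result;
}
```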

Character pointers messed up in simple Boyer-Moore implementation

I am currently experimenting with a very simple Boyer-Moore variant.
In general my implementation works, but if I try to utilize it in a loop the character pointer containing the haystack gets messed up. And I mean that characters in it are altered, or mixed.
The result is consistent, i.e. running the same test multiple times yields the same screw up.
This is the looping code:
string src("This haystack contains a needle! needless to say that only 2 matches need to be found!");
string pat("needle");
const char* res = src.c_str();
while((res = boyerMoore(res, pat)))
++res;
This is my implementation of the string search algorithm (the above code calls a convenience wrapper which pulls the character pointer and length of the string):
unsigned char*
boyerMoore(const unsigned char* src, size_t srcLgth, const unsigned char* pat, size_t patLgth)
{
if(srcLgth < patLgth || !src || !pat)
return nullptr;
size_t skip[UCHAR_MAX]; //this is the skip table
for(int i = 0; i < UCHAR_MAX; ++i)
skip[i] = patLgth; //initialize it with default value
for(size_t i = 0; i < patLgth; ++i)
skip[(int)pat[i]] = patLgth - i - 1; //set skip value of chars in pattern
std::cout<<src<<"\n"; //just to see what's going on here!
size_t srcI = patLgth - 1; //our first character to check
while(srcI < srcLgth)
{
size_t j = 0; //char match ct
while(j < patLgth)
{
if(src[srcI - j] == pat[patLgth - j - 1])
++j;
else
{
//since the number of characters to skip may be negative, I just increment in that case
size_t t = skip[(int)src[srcI - j]];
if(t > j)
srcI = srcI + t - j;
else
++srcI;
break;
}
}
if(j == patLgth)
return (unsigned char*)&src[srcI + 1 - j];
}
return nullptr;
}
The loop produced this output (i.e. these are the haystacks the algorithm received):
This haystack contains a needle! needless to say that only 2 matches need to be found!
eedle! needless to say that only 2 matches need to be found!
eedless to say that eed 2 meed to beed to be found!
As you can see the input is completely messed up after the second run. What am I missing? I thought the contents could not be modified, since I'm passing const pointers.
Is the way of setting the pointer in the loop wrong, or is my string search screwing up?
Btw: This is the complete code, except for includes and the main function around the looping code.
EDIT:
The missing nullptr in the first return was due to a copy/paste error; in the source it is actually there.
For clarification, this is my wrapper function:
inline char* boyerMoore(const string &src, const string &pat)
{
return (const char*) boyerMoore((const unsigned char*) src.c_str(), src.size(),
(const unsigned char*) pat.c_str(), pat.size());
}
In your boyerMoore() function, the first return isn't returning a value (you have just return; rather than return nullptr;). GCC doesn't always warn about missing return values, and not returning anything is undefined behavior. That means that when you store the return value in res and call the function again, there's no telling what will print out.
Also, you have omitted your convenience function that calculates the length of the strings that you are passing in. I would recommend double checking that logic to make sure the sizes are correct - I'm assuming you are using strlen or similar.
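Another likely culprit, given the wrapper shown in the question: it takes const string&, so calling it with the const char* res constructs a temporary std::string, and the returned pointer points into that temporary's buffer, which is destroyed at the end of the statement. The next iteration then reads freed memory, which could explain haystacks that change from one call to the next. Separately, the question's skip table has only UCHAR_MAX entries, so the byte value UCHAR_MAX would index one past its end. The sketch below avoids both problems by passing a pointer plus the remaining length straight to the search, and by sizing the table UCHAR_MAX + 1. It is a Boyer-Moore-Horspool style bad-character search, not the question's exact algorithm, and the names horspool and count_matches are hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Returns a pointer to the first match of pat[0..patLen) inside
// src[0..srcLen), or nullptr if there is none.
const char* horspool(const char* src, std::size_t srcLen,
                     const char* pat, std::size_t patLen) {
    if (!src || !pat || patLen == 0 || srcLen < patLen) return nullptr;

    std::size_t skip[UCHAR_MAX + 1];              // one slot per unsigned char value
    for (std::size_t i = 0; i <= UCHAR_MAX; ++i) skip[i] = patLen;
    for (std::size_t i = 0; i + 1 < patLen; ++i)  // last pattern char keeps the default
        skip[static_cast<unsigned char>(pat[i])] = patLen - 1 - i;

    std::size_t pos = 0;                          // candidate start of a match
    while (pos + patLen <= srcLen) {
        std::size_t j = patLen;                   // compare right to left
        while (j > 0 && src[pos + j - 1] == pat[j - 1]) --j;
        if (j == 0) return src + pos;             // every character matched
        // Shift by the skip value of the haystack char under the pattern's last slot.
        pos += skip[static_cast<unsigned char>(src[pos + patLen - 1])];
    }
    return nullptr;
}

// Counts matches. Every pointer stays inside text's own buffer: no temporaries.
int count_matches(const std::string& text, const std::string& pat) {
    int count = 0;
    const char* end = text.data() + text.size();
    const char* res = text.data();
    while ((res = horspool(res, static_cast<std::size_t>(end - res),
                           pat.data(), pat.size())) != nullptr) {
        ++count;
        ++res;                                    // resume just after this match's start
    }
    return count;
}
```

Usage: on the question's haystack, count_matches(src, "needle") finds the two expected matches ("needle!" and "needless").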

Why does my array element retrieval function return random value?

I am trying to make an own simple string implementation in C++. My implementation is not \0 delimited, but uses the first element in my character array (the data structure I have chosen to implement the string) as the length of the string.
In essence, I have this as my data structure: typedef char * arrayString; and I have got the following as the implementation of some primal string manipulating routines:
#include "stdafx.h"
#include <iostream>
#include "new_string.h"
// Our string implementation will store the
// length of the string in the first byte of
// the string.
int getLength(const arrayString &s1) {
return s1[0] - '0';
}
void append_str(arrayString &s, char c) {
int length = getLength(s); // get the length of our current string
length++; // account for the new character
arrayString newString = new char[length]; // create a new heap allocated string
newString[0] = length;
// fill the string with the old contents
for (int counter = 1; counter < length; counter++) {
newString[counter] = s[counter];
}
// append the new character
newString[length - 1] = c;
delete[] s; // prevent a memory leak
s = newString;
}
void display(const arrayString &s1) {
int max = getLength(s1);
for (int counter = 1; counter <= max; counter++) {
std::cout << s1[counter];
}
}
void appendTest() {
arrayString a = new char[5];
a[0] = '5'; a[1] = 'f'; a[2] = 'o'; a[3] = 't'; a[4] = 'i';
append_str(a, 's');
display(a);
}
My issue is with the implementation of my function getLength(). I have tried to debug my program inside Visual Studio, and all seems nice and well in the beginning.
The first time getLength() is called, inside the append_str() function, it returns the correct value for the string length (5). When it gets called inside display(), my own custom string-displaying function (to prevent a bug with std::cout), it reads the value (6) correctly, but returns -42? What's going on?
NOTES
Ignore my comments in the code. It's purely educational and it's just me trying to see what level of commenting improves the code and what level reduces its quality.
In getLength(), I had to do first_element - '0' because otherwise the function would return the ASCII code of the digit stored there. For instance, for the digit 6, it returned 54.
This is an educational endeavour, so if you see anything else worth commenting on, or fixing, by all means, let me know.
Since you read the length with return s1[0] - '0'; in getLength(), you should store it with newString[0] = length + '0'; instead of newString[0] = length;
As an aside, why are you storing the size of the string in the array? Why not have some sort of integer member to store the size in? A couple of extra bytes really isn't going to hurt, and then you can have a string that is more than 256 characters long.
You are accessing your array out of bounds in a couple of places.
In append_str
for (int counter = 1; counter < length; counter++) {
newString[counter] = s[counter];
}
In the example you presented, the starting string is "5foti" -- without the terminating null character. The maximum valid index is 4. In the above function, length has already been set to 6 and you are accessing s[5].
This can be fixed by changing the conditional in the for statement to counter < length-1;
And in display.
int max = getLength(s1);
for (int counter = 1; counter <= max; counter++) {
std::cout << s1[counter];
}
Here again, you are accessing the array out of bounds by using counter <= max in the loop.
This can be fixed by changing the conditional in the for statement to counter < max;
Here are some improvements, that should also cover your question:
Instead of a typedef, define a class for your string. The class should have an int for the length and a char* for the string data itself.
Use operator overloads in your class "string" so you can append them with + etc.
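The - '0' gives me pain. You subtract the ASCII value of '0' (48) from the length when reading it, but you never add it when storing the length; 6 - 48 is exactly the -42 you are seeing. Also, where plain char is signed (whether it is signed is implementation-defined), the length can be 127 at most, because a signed char only goes from -128 to +127. See point #1.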
append_str changes the pointer of your object. That's very bad practice!
Ok, thank you everyone for helping me out.
The problem turned out to be inside the appendTest() function, where I was storing in the first element of the array the character code of the value I wanted as the size (i.e. storing '5' instead of just 5). It seems I hadn't correctly updated some earlier code, and that's what caused the issues.
As an aside to what many of you are asking (why am I not using classes or a better design?): I want to implement a basic string structure under many constraints, such as no classes. I basically want to use only arrays, and the most I am allowing myself is to make them dynamically allocated, i.e. resizable.
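Within those constraints (plain dynamically allocated arrays, no classes), a corrected version could store the length as a number rather than as a digit character, so it is never mixed up with '0'. This sketch assumes a string is laid out as [length][char 1]...[char length], so the array is length + 1 bytes, and that the length fits in an unsigned char (at most 255). make_str is a hypothetical helper; getLength, append_str and display mirror the question's functions.

```cpp
#include <cassert>
#include <iostream>

typedef char* arrayString;

// The length lives in s[0] as a plain number (0..255), read back as unsigned.
int getLength(const arrayString& s) {
    return static_cast<unsigned char>(s[0]);
}

// Builds a length-prefixed copy of a C string (hypothetical helper).
arrayString make_str(const char* text) {
    int len = 0;
    while (text[len] != '\0') ++len;
    assert(len <= 255);
    arrayString s = new char[len + 1];
    s[0] = static_cast<char>(len);
    for (int i = 0; i < len; ++i) s[i + 1] = text[i];
    return s;
}

void append_str(arrayString& s, char c) {
    int length = getLength(s);
    assert(length < 255);                          // the prefix byte must not overflow
    arrayString newString = new char[length + 2];  // prefix + old chars + new char
    newString[0] = static_cast<char>(length + 1);
    for (int i = 1; i <= length; ++i) newString[i] = s[i];  // old chars are s[1..length]
    newString[length + 1] = c;
    delete[] s;                                    // prevent a memory leak
    s = newString;
}

void display(const arrayString& s) {
    int max = getLength(s);
    for (int i = 1; i <= max; ++i) std::cout << s[i];  // valid indices are 1..max
}
```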

Can someone explain how this function works?

I was wondering if anyone could help me understand this code by talking me through a trace of it, because I get confused about what the int pos variable actually does, or what it is used for, when it gets to the if statement. My confusion is mostly about the if statement. For this code I'll give the function the string "hello", and after it runs, data2 should hold the string "helo" with one "l", because it removed the duplicates.
I was just hoping someone could walk me through an in-depth trace of the code so I can understand it better, because my C++ book doesn't explain the .find() function very well. I hope this is easy to explain, because I really want to learn this!
Thank you!
string data("hello");
string duplicates(string &data)
{
string data2;
int pos;
for(int i=0;i<data.length();i++)
{
if((pos=data2.find(data[i]))<0){
data2 += data[i];
}
}
return data2;
}
As written, it relies on implementation-defined behavior. data2.find() returns a std::size_t, and when the character isn't found it returns std::string::npos, the largest std::size_t value, which is outside int's range. While signed-to-unsigned conversions are always well defined, the round trip back to int is only portable for values both types can represent: before C++20, converting an out-of-range value to int gives an implementation-defined result (C++20 defines it as wrap-around, which yields -1 here).
On a particular implementation of C++ it may do something in particular.
The programmer who wrote this bit of code screwed up in a way that probably worked every time it was tested. If they had eliminated pos as a variable, removed the assignment (keeping the right-hand side) in the if condition, and replaced < 0 with == std::string::npos, the code would become both easier to understand and probably what the coder intended.
I am assuming that the strings in question are std::string:
duplicates takes a std::string called data by non-const reference (so it could modify it, although it never does) and returns another std::string:
std::string duplicates(std::string &data)
{
It has two local variables, data2 and pos. data2 is a std::string and pos is an int (the "default" signed integer for the compiler, usually 2s complement, at least 16 bits, usually 32 bits):
std::string data2;
int pos;
We loop the local variable i from 0 to 1 less than data.length(), so i varies over the valid indexes into data. Note that if data's length is longer than the max value an int can store, this invokes undefined behavior (but that is unlikely unless int is very small or the string is very, very long, i.e. billions of characters):
for(int i=0;i<data.length();i++)
{
Here we do two things at once. This is generally considered bad coding practice. We assign the return value of data2.find(data[i]) to pos (converting the std::size_t implicitly to an int), then check if it is less than 0 and branch based on that.
There are a few problems here. When find returns std::string::npos, that value is far outside int's range, and converting it to int was implementation-defined before C++20 (only C++20 guarantees modular wraparound). Many compilers do a two's complement truncating conversion, so npos becomes -1 and the branch is taken, but the standard did not guarantee this, so the test is not portable:
if((pos=data2.find(data[i]))<0) {
The other problem is needlessly doing two things at once. Putting this on two lines causes no harm.
The programmer intended this line of code to run if and only if data[i] was not found in data2. If so, it appends that character to data2. As noted, this only works if the compiler happens to turn npos into -1 (their tests all got "lucky", I'm certain):
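To see what the assignment to pos actually produces, here is a small sketch. find_as_int is a hypothetical name, and the -1 result assumes a compiler such as GCC or Clang that wraps out-of-range values modulo 2^N (guaranteed by the standard only since C++20).

```cpp
#include <string>

// Hypothetical helper mirroring "pos = data2.find(c)" from the question:
// the unsigned std::string::size_type result is stored in an int.
// For npos this conversion is implementation-defined before C++20;
// GCC and Clang wrap modulo 2^N, which yields -1.
int find_as_int(const std::string& s, char c) {
    int pos = static_cast<int>(s.find(c));
    return pos;
}
```

Writing s.find(c) < 0 directly, without going through an int, is always false because the result is unsigned; GCC typically warns about that comparison with -Wextra (-Wtype-limits).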
data2 += data[i];
}
}
We then return what the programmer intended to be a string containing each unique character in data:
return data2;
}
I will rewrite the code to be non-horrible. First, C++03 style:
std::string duplicates(std::string const&data) {
std::string data2;
for(std::string::size_type i=0;i<data.length();++i) {
std::size_t pos = data2.find(data[i]);
if(pos == std::string::npos) {
data2 += data[i];
}
}
return data2;
}
Next, C++11 style:
std::string duplicates(std::string const& data) {
std::string data2;
for( char c : data ) {
auto pos = data2.find(c);
if(pos == std::string::npos) {
data2 += c;
}
}
return data2;
}
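Both rewrites still rescan data2 with find for every input character. A different technique (swapped in here, not taken from the answers) is to remember which byte values have already been seen. This sketch assumes 8-bit chars, and unique_chars is a hypothetical name.

```cpp
#include <array>
#include <string>

// Alternative sketch: track already-seen byte values in a 256-entry table
// instead of searching the output string each time.
std::string unique_chars(const std::string& data) {
    std::array<bool, 256> seen{};  // value-initialized: all false
    std::string result;
    for (char c : data) {
        // Go through unsigned char so negative char values index correctly.
        unsigned char byte = static_cast<unsigned char>(c);
        if (!seen[byte]) {
            seen[byte] = true;
            result += c;  // keep the first occurrence only
        }
    }
    return result;
}
```

It produces the same output as the find-based versions, e.g. "helo" for "hello" and "misp" for "mississippi".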
The key code is this:
if((pos=data2.find(data[i]))<0){
data2 += data[i];
data[i] is going to get each character of the source string. Pretend we store this in 'C'.
data2.find(C) is now going to search for this character inside the NEW string. Call this result 'R'
if ((pos=R)<0) { } checks whether the character was NOT found. find returns std::string::npos when the character is not found, which on typical implementations becomes -1 once it is stored in the int pos. If it is not found, the character is appended to the destination string with data2 += data[i];
Note that this is complicated by the fact that they are storing pos, but it's not used, so it could read:
string duplicates(string &data)
{
string data2;
for(int i=0;i<data.length();i++)
{
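// find returns string::npos (an unsigned value), never a negative number,
// so compare against npos instead of testing < 0
if(data2.find(data[i]) == string::npos)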
{
data2 += data[i];
}
}
return data2;
}
What's happening here is pretty straightforward. A string is passed in to the function, and we loop over this string, examining each character in the string individually. The only complication is at the statement if((pos=data2.find(data[i]))<0). Let's break it down:
data2.find(data[i]) - this calls the find function from the string class. Given some sequence, it finds the index of the first occurrence of that sequence within the string. If the sequence is not found, it returns a special "not found" value. For our purposes, we can assume this value is -1. It's actually a bit more complicated than that, but for now this is enough. So to recap: 0 and up means the sequence was found, -1 means not found.
pos=data2.find(data[i]) - we assign that index to a variable.
(pos=data2.find(data[i]))<0 - we check the index that was returned. Since -1 is returned when the sequence is not found, this check evaluates to true exactly when the character is not yet in data2.
...And that's basically it. It's easier to read the code if we restructure the loop like so:
for(int i=0;i<data.length();i++)
{
pos=data2.find(data[i]);
if(pos < 0) {
data2 += data[i];
}
}
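To make the behavior of find more concrete, here is a short sketch. The find calls are standard std::string members; index_or_minus_one is a hypothetical helper that produces the "-1 means not found" convention explicitly instead of relying on an implicit conversion to int.

```cpp
#include <string>

// Hypothetical helper: index of the first occurrence of needle in
// haystack, or -1 if it does not occur. The npos check is explicit,
// so no unsigned-to-signed conversion of npos ever happens.
long long index_or_minus_one(const std::string& haystack, const std::string& needle) {
    std::string::size_type pos = haystack.find(needle);
    if (pos == std::string::npos) {
        return -1;
    }
    return static_cast<long long>(pos);
}
```

For s = "hello": s.find('l') is 2, s.find('l', 3) starts searching at index 3 and returns 3, s.find("lo") is 3, and s.find('z') is std::string::npos.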
Basically, this function makes a string containing each unique character of the input string, in order of first appearance.
First, you need to know that .find returns the position of the character given as a parameter, or string::npos (-1 converted to an unsigned type, i.e. its largest value) if the character wasn't found.
So if you pass "hello", it goes like this:
data2 = ""
data2.find('h') == -1
data2 = "h"
data2.find('e') == -1
data2 = "he"
data2.find('l') == -1
data2 = "hel"
data2.find('l') == 2
data2 = "hel"
data2.find('o') == -1
data2 = "helo"
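The trace above can be reproduced with a small instrumented version of the function. trace_duplicates is a hypothetical name; it records data2 after each input character is processed, and it compares against npos rather than relying on the int conversion.

```cpp
#include <string>
#include <vector>

// Hypothetical tracing variant of duplicates(): returns a snapshot of
// data2 after each character of the input has been handled.
std::vector<std::string> trace_duplicates(const std::string& data) {
    std::vector<std::string> steps;
    std::string data2;
    for (char c : data) {
        if (data2.find(c) == std::string::npos) {
            data2 += c;  // not seen yet: append it
        }
        steps.push_back(data2);  // snapshot after this character
    }
    return steps;
}
```

For "hello" the snapshots are "h", "he", "hel", "hel", "helo", matching the steps listed above.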
So what's the point of pos? .find returns an unsigned integer (std::string::size_type), so checking whether it is below 0 would always be false; storing it in an int first makes the comparison possible.
But wait, there's more. You may ask how an unsigned int can be equal to -1. You can read this while waiting for a more complete answer from someone more competent.
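A short sketch of how an unsigned value "equals" -1: std::string::npos is defined as static_cast<size_type>(-1), and converting -1 to an unsigned type is well defined (it wraps modulo 2^N to the largest representable value). not_found is a hypothetical helper name.

```cpp
#include <string>

// Converting -1 to an unsigned type always yields that type's maximum
// value, which is exactly how std::string::npos is defined.
bool not_found(std::string::size_type result) {
    return result == static_cast<std::string::size_type>(-1);
}
```

This is also why a comparison like data2.find('h') == -1 in the trace is true: the -1 is converted to the unsigned type before comparing.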