c++ string capacity change during copy assignment - c++

As I understand it, std::string follows an exponential growth policy, so I expected the capacity() of a string to grow only when necessary during concatenation. However, when I ran test.cpp, I found that in the for-loop the capacity() shrinks back to length() on every second assignment.
Why does this behavior depend on how often I change the string rather than on the length of the string? Is it some kind of optimization?
The following code was tested with g++ -std=c++11.
test.cpp:
#include <iostream>
#include <string>
int main(int argc, char **argv) {
std::string s = "";
for (int i = 1; i <= 1000; i++) {
//s += "*";
s = s + "*";
std::cout << s.length() << " " << s.capacity() << std::endl;
}
return 0;
}
And the output will be like this:
1 1
2 2
3 4
4 4
5 8
6 6 // why is capacity shrunk?
7 12
8 8 // and again?
9 16
10 10 // and again?
11 20
12 12 // and again?
13 24
14 14 // and again?
15 28
16 16 // and again?
17 32
...
996 996
997 1992
998 998 // and again?
999 1996
1000 1000 // and again?

When you do this:
s = s + "*";
You're doing two separate things: making a new temporary string, consisting of "*" concatenated onto the end of the contents of s, and then copy-assigning that new string to s.
It's not the + that's shrinking, it's the =. When copy-assigning from one string to another, there's no reason to copy the capacity, just the actual used bytes.
Your commented-out code, on the other hand:
s += "*";
is only doing one thing: appending "*" onto the end of s. So there's nowhere for the "optimization" to happen (and if it did happen, it would be a pessimization that defeats the entire purpose of the exponential growth).

The C++ standard actually does not cover what happens to capacity() when strings are moved, assigned, etc. This could be a defect. The only constraints are those derivable from the time complexity specified for the operation.
See here for similar discussion about vectors.

Related

Moving through text file c++

I'm trying to save the numbers from a first txt file to a second one in reversed order.
To be clear, inside the 1st txt I have typed the numbers from 1 to 10 (decimal notation). When I try to count them, I get 5 or 7, depending on what's between them (space or enter).
Then, another error is that the program writes as many 0s to the 2nd txt as the value of dl, instead of the loaded numbers in reversed order.
I'm pasting the whole code, because I don't know the file operation rules well enough to determine which exact part could be the source of the problem. Thank you in advance.
#include <fstream>
#include <iostream>
using namespace std;
int main() {
fstream plik1;
plik1.open("L8_F3_Z2a.txt", ios::in | ios::binary);
fstream plik2;
plik2.open("L8_F3_Z2b.txt", ios::out);
if(!plik1.good() || !plik2.good()) {
cout << "file(s) invalid" << endl;
return 1;
}
plik1.seekg(0, ios::end);
int dl = plik1.tellg() / sizeof(int);
cout << "length = " << dl << endl;
int a;
for(int i = 0; i < dl; i++) {
plik1.seekg((i + 1) * sizeof(int), ios::end);
plik1 >> a;
plik2 << a;
cout << i + 1 << ". a = " << a << endl;
}
plik1.close();
plik2.close();
return 0;
}
Edit: the output is:
length = 7
1. a = 0
2. a = 0
3. a = 0
4. a = 0
5. a = 0
6. a = 0
7. a = 0
--------------------------------
Process exited after 0.03841 seconds with return value 0
Press any key to continue . . .
Problem
When a file is encoded as text the binary size of the data is irrelevant.
int dl = plik1.tellg() / sizeof(int);
will get you the size of the file in integers, but the file isn't storing integers. It is storing a stream of characters. Say for example the file holds one number:
12345
which is five characters long. Assuming the file is using good ol ASCII, that's 5 bytes. When 12345 is converted to an int it will probably be 4 or 8 bytes and almost certainly not 5 bytes. Assuming the common 32 bit (4 byte) int
int dl = plik1.tellg() / sizeof(int);
int dl = 5 / 4;
int dl = 1;
Yay! It worked! But only by the grace of whatever deity or cosmic entity you worship. Or don't worship. I'm not going to judge. To show why you can't count on this, let's look at
123
this is three characters and 3 bytes, so
int dl = plik1.tellg() / sizeof(int);
int dl = 3 / 4;
int dl = 0;
Whoops.
Similarly
1 2 3 4 5
is five numbers. The file length will probably be the sum of one byte per digit and one byte per space, 9 bytes.
Where this gets weird is some systems, looking at you Windows, use a two character end of line marker, carriage return and a line feed. This means
1
2
3
4
5
will sum up to 13 bytes.
This is why you see a different size depending on whether the numbers are separated with spaces or newlines.
Solution
The only way to find out how many numbers are in the file is to read the file, convert the contents to numbers, and count the numbers as you find them.
How to do that:
int num;
int count = 0;
while (plik1 >> num) // read numbers until we can't read any more
{
count++;
}
From this you can determine the size of the array you need. Then you rewind the file, seek back to the beginning, allocate the array and read the file AGAIN into the array. This is dumb. File IO is painfully slow. You don't want to do it twice. You want to read the file once and store as you go without caring how many numbers are in the file.
Fortunately there are a number of tools built into C++ that do exactly that. I like std::vector
std::vector<int> nums;
int num;
while (plik1 >> num)
{
nums.push_back(num);
}
vector even keeps count for you.
Next you could
std::reverse(nums.begin(), nums.end());
and write the result back out.
for (int num: nums)
{
plik2 << num << ' ';
}
Documentation for std::reverse
If your instructor has a no vector policy, and unfortunately many do, your best bet is to write your own simple version of vector. There are many examples of how to do this already on Stack Overflow.
Addendum
In binary 5 integers will likely be 20 or 40 bytes no matter how many digits are used and no separators are required.
It sounds like storing data as binary is the bees knees, right? Like it's going to be much easier.
But it's not. Different computers and different compilers use different sizes for integers. All you are guaranteed is an int is at least 2 bytes and no larger than a long. All of the integer types could be exactly the same size at 64 bits. Blah. Worse, not all computers store integers in the same order. Because it's easier to do some operations if the number is stored backwards, guess what? Often the number is stored backwards. You have to be very, very careful with binary data and establish a data protocol (search term for more on this topic: Serialization) that defines how the data is to be interpreted by everyone.

Undefined Behavior quirk: reading outside a buffer causes a loop to never terminate?

I wrote a very trivial program to try to examine the undefined behavior attached to buffer overflows. Specifically, regarding what happens when you perform a read on data outside the allocated space.
#include <iostream>
#include<iomanip>
int main() {
int values[10];
for (int i = 0; i < 10; i++) {
values[i] = i;
}
std::cout << values << " ";
std::cout << std::endl;
for (int i = 0; i < 11; i++) {
//UB occurs here when values[i] is executed with i == 10
std::cout << std::setw(2) << i << "(" << (values + i) << "): " << values[i] << std::endl;
}
system("pause");
return 0;
}
When I run this program on Visual Studio, the results aren't terribly surprising: reading index 10 produces garbage:
000000000025FD70
0(000000000025FD70): 0
1(000000000025FD74): 1
2(000000000025FD78): 2
3(000000000025FD7C): 3
4(000000000025FD80): 4
5(000000000025FD84): 5
6(000000000025FD88): 6
7(000000000025FD8C): 7
8(000000000025FD90): 8
9(000000000025FD94): 9
10(000000000025FD98): -1966502944
Press any key to continue . . .
But when I fed this program into Ideone.com's online compiler, I got extremely bizarre behavior:
0xff8cac48
0(0xff8cac48): 0
1(0xff8cac4c): 1
2(0xff8cac50): 2
3(0xff8cac54): 3
4(0xff8cac58): 4
5(0xff8cac5c): 5
6(0xff8cac60): 6
7(0xff8cac64): 7
8(0xff8cac68): 8
9(0xff8cac6c): 9
10(0xff8cac70): 1
11(0xff8cac74): -7557836
12(0xff8cac78): -7557984
13(0xff8cac7c): 1435443200
14(0xff8cac80): 0
15(0xff8cac84): 0
16(0xff8cac88): 0
17(0xff8cac8c): 1434052387
18(0xff8cac90): 134515248
19(0xff8cac94): 0
20(0xff8cac98): 0
21(0xff8cac9c): 1434052387
22(0xff8caca0): 1
23(0xff8caca4): -7557836
24(0xff8caca8): -7557828
25(0xff8cacac): 1432254426
26(0xff8cacb0): 1
27(0xff8cacb4): -7557836
28(0xff8cacb8): -7557932
29(0xff8cacbc): 134520132
30(0xff8cacc0): 134513420
31(0xff8cacc4): 1435443200
32(0xff8cacc8): 0
33(0xff8caccc): 0
34(0xff8cacd0): 0
35(0xff8cacd4): 346972086
36(0xff8cacd8): -29697309
37(0xff8cacdc): 0
38(0xff8cace0): 0
39(0xff8cace4): 0
40(0xff8cace8): 1
41(0xff8cacec): 134514984
42(0xff8cacf0): 0
43(0xff8cacf4): 1432277024
44(0xff8cacf8): 1434052153
45(0xff8cacfc): 1432326144
46(0xff8cad00): 1
47(0xff8cad04): 134514984
...
//The heck?! This just ends with a Runtime Error after like 200 lines.
So apparently, with their compiler, overrunning the buffer by a single index causes the program to enter an infinite loop!
Now, to reiterate: I realize that I'm dealing with undefined behavior here. But despite that, I'd like to know what on earth is happening behind the scenes to cause this. The code that physically performs the buffer overrun is still performing a read of 4 bytes and writing whatever it reads to a (presumably better protected) buffer. What is the compiler/CPU doing that causes these issues?
There are two execution paths leading to the condition i < 11 being evaluated.
The first is before the initial loop iteration. Since i had been initialised to 0 just before the check, this is trivially true.
The second is after a successful loop iteration. Since the loop iteration caused values[i] to be accessed, and values only has 10 elements, this can only be valid if i < 10. And if i < 10, after i++, i < 11 must also be true.
This is what Ideone's compiler (GCC) is detecting. There is no way the condition i < 11 can ever be false unless you have an invalid program, therefore it can be optimised away. At the same time, your compiler doesn't go out of its way to check whether you might have an invalid program unless you provide additional options to tell it to do so (such as -fsanitize=undefined in GCC/clang).
This is a trade off implementations must make. They can favour understandable behaviour for invalid programs, or they can favour raw speed for valid programs. Or a mix of both. GCC definitely focuses greatly on the latter, at least by default.

find all palindromes inside a string

I am stuck and can't seem to figure out where I should go from here.
I would appreciate any hints or tips on how I should approach this problem. I've been trying to figure this out for over 9 hours with no luck.
The question is as follows:
A string s is said to be palindromic if it reads the same backwards and forwards. A decomposition of s is a set of non-overlapping sub-strings of s whose concatenation is s.
Write a C++ program that takes as input a string and computes all its palindromic decompositions. For example, if s is the string 0204451881 then the decomposition 020, 44, 5, 1881 is a palindromic decomposition. So is
0 2 0 4 4 5 1 8 8 1
0 2 0 4 4 5 1 88 1
0 2 0 4 4 5 1881
0 2 0 44 5 1 8 8 1
0 2 0 44 5 1 88 1
020 4 4 5 1 8 8 1
020 4 4 5 1 88 1
020 4 4 5 1881
020 44 5 1 8 8 1
020 44 5 1 88 1
020 44 5 1881
this is a class project.
so far I have:
#include <iostream>
#include <string>
using namespace std;
void palDecom(string str1);
bool isPal(const string &str);
void subPal(string str1);
int main()
{
string s = "0204451881";
palDecom(s);
subPal(s);
return 0;
}
//shows the decomposition as the single char of the string
//takes a string as input
void palDecom(string str1)
{
int stringLastIndex = (str1.length());
for (int i = 0; i < stringLastIndex; i++)
{
cout<< str1[i] <<" ";
}
cout<<endl;
}
void subPal(string str1)
{
int stringLastIndex = (str1.length());
for (int curIndx = 0; curIndx < stringLastIndex; curIndx++)
{
for(int comparIndx = 1; comparIndx < stringLastIndex; comparIndx++)
{
//cout<< "i was in this loop"<<endl;
if (isPalindrome((str1,curIndx,comparIndx)))
//cout<<str1.substr(0,curIndx-1)<<" "<< str1.substr(curIndx,comparIndx) <<" "<< str1.substr(comparIndx,stringLastIndex)<<endl;
}
}
}
bool isPal(const string &str)
{
int start=0, end=str.length()-1;
while (start < end) {
if (str[start++] != str[end--])
return false;
}
return true;
}
Actually, I just managed to realize this:
Palindromes decompose to combination splits.
What this means is that each palindrome will "split" into additional sub-palindromes based on how many "layers" of palindrome it possesses.
For example: The sequence
12213443
-> 1221 + 3443
-> 1 + 22 + 1 + 3 + 44 + 3
-> 1 + 2 + 2 + 1 + 3 + 44 + 3
As you parse down the string, the possibilities will just increase by the amount of palindromes a larger one can decompose to until you have palindromes of 1 character width.
Granted, I realize that palindromes can overlap:
1221221
-> 1221 + 22 + 1
OR -> 1 + 22 + 1221
This is an additional quandary, but is definitely solvable.
Additionally, you can choose to think about smaller palindromes coming together to create larger ones.
Personally, I think this line of thought will lead to a better algorithm than the one above, as composing new palindromes while iterating in one direction is probably easier than decomposing them in just one direction.
I think the best course is to start playing with palindromes and map out the possible decompositions. By analysing this, you should be able to find a repetitive pattern that can then be mapped to a recursive solution.
Regardless, this answer definitely can use recursion. There is a clear pattern here; you just need to explore it more and find it.
I wish I had a more definitive answer but I myself am struggling with the problem. I hope someone else can edit this and pick up the threads?
Use recursion to solve this problem by scanning the string from left to right.
Keep a stack of the previous palindrome partitions that have already been found "to the left" of the "current position" in the overall string. This stack could be an array or std::vector of pointers to the ends (i.e. - one past the last character) of each previously found palindrome. In this case, the "current position" is indicated by the top element of the stack, or the beginning of the string if the stack is empty.
The base/exit case of the recursion is when the current position refers to the end of the entire string. In that case you've already exhausted the string. Print out the palindromes as indicated by the palindrome stack (starting from the bottom) and then return. (Hint: Don't alter the original string to insert nul terminators to print each palindrome as a string. Instead, just print each palindrome character-by-character according to the partitions on the stack, print spaces between the palindromes and a newline at the end of the stack.)
Otherwise, have a loop that goes from 1 up through the number of characters remaining in the string starting from the current position. At each iteration, test if the current position is a palindrome of length equal to your loop index. If it is such a palindrome, then push a partition for that palindrome onto the stack and recurse down to one level deeper.
That should do it.
I wouldn't use a std::stack to implement the stack. Instead use a std::vector or an array. If you use std::vector, then don't do structural operations (e.g. - push_back, pop_back, etc.) on it in the recursion. Instead, just resize() it to hold up to strlen(str) partition elements before you begin recursing because the deepest stack will be when each character of the string is a palindrome. Then in your recursion, you simply pass the logical, current size of the stack. This tells you the index where the next palindrome partition should be placed (i.e. - at index size) and allows you to access any previously existing top element of the stack (i.e. - at index size - 1). This approach will work for an array or a std::vector.
If you do want to use std::vector::push_back() (or std::stack), then you just need to remember to std::vector::pop_back() after you return from each recursion. This approach would allow you to not need to pass the "logical" size of the stack explicitly around as the vector itself would know its correct size.
#include <iostream>
#include <cstdlib>
#include <cctype>
#include <cstring>
#include <iomanip>
using std:: cin;
using std:: cout;
using std:: endl;
using std:: setw;
const int MAX_LEN = 100;
int palDecom(const char str[]);
bool isPal(const char str[], int start, int end);
int main()
{
char str[MAX_LEN];
cin >> setw(MAX_LEN) >> str;
cout << palDecom(str) ;
return EXIT_SUCCESS;
}
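// Counts the palindromic substrings of str (single characters included).
int palDecom(const char str[])
{
    int counter = 0;
    const int len = static_cast<int>(strlen(str));
    for (int start = 0; start < len; start++)
        for (int end = start; end < len; end++)
        {
            if (isPal(str, start, end))
                counter++;
        }
    return counter;
}
// Returns true if str[start..end] (inclusive) reads the same in both directions.
bool isPal(const char str[], int start, int end)
{
    if (start >= end)
        return true;
    if (str[start] != str[end])
        return false;
    // Compare the next pair of characters, moving inwards from both ends.
    return isPal(str, start + 1, end - 1);
}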

Modulo gives impossible value

I want a table of four values between 1 to 6.
I'm using: rand() % 6 + 1;
This should give values between 1 and 6.
Except if rand() generates the value 0.
I keep getting 7's. I don't want any 7's
What is the range of rand? How do I prevent it from generating any 0 values?
Alternative solutions are quite welcome.
My teacher gave us the clue of using "random".
We use Borland C++ Builder 5 at school.
I am using Dev-C++ 5.3.0.3 at home.
I find there are a few differences to how they work, which I find strange..
I can't use random(), it gives me not declared in scope...
int main (){
int I;
int Fasit[3];
srand (time(NULL) );
for(I=0; I<4; I++) {
Fasit[I]=rand() % 6 + 1;
}
std::cout << Fasit[0] << " " << Fasit[1] << " " << Fasit[2] << " " << Fasit[3] << " ";
return 0;
}
Some values I get:
2 6 1 7
5 2 1 4
5 2 1 4
5 2 1 4
1 3 1 6
5 3 3 7
5 3 3 7
5 3 3 7
7 shouldn't be possible, should it?
PS: I know my print is ham fisted, I will make it a bit more elegant once the number generation works.
Consider these lines:
int Fasit[3];
for(I=0; I<4; I++) {
Fasit[I]
You declare an array of three entries, which you write to four times.
Try your program again, but with:
int Fasit[4];
Declaring int Fasit[3] gives you only 3 elements. When you write to Fasit[3], you are in the realm of undefined behavior, which in this case manifests itself as an apparent contradiction.
Declaring int Fasit[3] allows you to access only Fasit[0], Fasit[1], and Fasit[2].
Accessing Fasit[3], either for reading or writing, is undefined behavior. Your code both writes to and reads from Fasit[3] :-). The program is accessing the array out of bounds. Fix it!
As to why 7 is printed, that is just coincidence. Note that Fasit[0] through Fasit[2] are always printed in the range 1-6, as you expected.
See also:
Array Index out of bound in C
Bounds checking
int Fasit[3];
You are creating an array of size 3, which can be accessed with indexes 0, 1 or 2 only.
You are writing to and reading from Fasit[3], which is undefined behaviour. When behaviour is undefined, you are bound to obtain weird results. This is it.

array in C++ inside forloop

What is happening when I write array[i] = '\0' inside a for loop?
char arrayPin[256];
for(int i = 0; i<256; i++)
{
arrayPin[i] = '\0';
}
The program attempts to access memory at the location of <base address of 'array'> + (<sizeof array element> * 'i') and assign the value 0 to it (binary 0, not character '0'). This operation may or may not succeed, and may even crash the application, depending upon the state of 'array' and 'i'.
If your array is of type char* or char[] and the assignment operation succeeds, then inserting the binary 0 at position 'i' will truncate the string at that position when it is used with things that understand C-style strings (printf() being one example).
So if you do this in a for loop across the entire length of the string, you will wipe out any existing data in the string and cause it to be interpreted as an empty/zero-length string by things that process C-style strings.
char arrayPin[256];
After the line above, arrayPin is an uninitialized array whose contents are unknown (assuming it is not a global).
----------------------------
|?|?|?|?|?|?|?|?|?|?|...|? |
----------------------------
byte: 0 1 2 3 4 5 6 7 8 9 255
Following code:
for(int i = 0; i<256; i++)
{
arrayPin[i] = '\0';
}
initializes every arrayPin element to 0:
----------------------------
|0|0|0|0|0|0|0|0|0|0|...|0 |
----------------------------
byte: 0 1 2 3 4 5 6 7 8 9 255
I suppose you have something like char *array. In this case it will write the character with the code 0x00 into the ith position.
This is quite useful when you work with C-style (null-terminated) strings, where \0 indicates the end of the string. For example:
char str[] = "Hello world";
cout << str << endl; // Output "Hello world"
str[5] = '\0';
cout << str << endl; // Output just "Hello"