C++: How does string vectors' random access time work? - c++

I know a simple int vector has O(1) random access time, since it is easy to compute the position of the xth element, given that all elements have the same size.
Now what's up with a string vector?
Since the string lengths vary, it can't have O(1) random access time, can it? If it can, what is the logic behind it?
Thanks.
Update:
The answers are very clear and concise, thank you all for the help.
I accepted Joey's answer because it is simple and easy to understand.

The vector does have O(1) access time.
String objects are all the same size (on a given implementation), regardless of the size of the string they represent. Typically the string object contains a pointer to allocated memory which holds the string data.
So, if s is a std::string, then sizeof s is constant and equal to sizeof(std::string), but s.size() depends on the string value. The vector only cares about sizeof(std::string).
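As a rough illustration of the point above, here is a small sketch (the helper name element_offset is made up for this example) that measures where element i really sits inside the vector's buffer. If the reasoning holds, the offset should always be i * sizeof(std::string), no matter how long the individual strings are.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: byte offset of element i from the start of the
// vector's buffer, computed from the real addresses of the elements.
std::size_t element_offset(const std::vector<std::string>& v, std::size_t i) {
    const char* base = reinterpret_cast<const char*>(v.data());
    const char* elem = reinterpret_cast<const char*>(&v[i]);
    return static_cast<std::size_t>(elem - base);
}
```

Calling it on a vector holding a one-character string, a 1000-character string and an empty string should give 0, sizeof(std::string) and 2 * sizeof(std::string): the lengths never enter the address computation.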

The string references are stored in one location. The strings may be stored anywhere in memory. So, you still get O(1) random access time.
---------------------------
| 4000 | 4200 | 5000 | 6300 | <- data
---------------------------
[1000] [1004] [1008] [1012] <- address
[4000] [4200] [5000] [6300] <- starting address
"string1" "string2" "string3" "string4" <- string

Because the string object has a fixed size, just like any other type. The difference is that a string object stores its characters on the heap and keeps a pointer to them, and that pointer is fixed in size.

What a std::string actually holds is usually just a pointer to the characters. sizeof a string is always the same, even if the length of the string it holds varies.

You've gotten a number of answers (e.g., Steve Jessop's and AraK's) that are mostly correct already. I'll add just one minor detail: many current implementations of std::string use what's called a short string optimization (SSO), which means they allocate a small, fixed, amount of space in the string object itself that can be used to store short strings, and only when/if the length exceeds what's allocated in the string object itself does it actually allocate separate space on the heap to store the data.
As far as a vector of strings goes, this makes no real difference: each string object has a fixed size regardless of the length of the string itself. The difference is that with SSO that fixed size is larger -- and in many cases the string object does not have a block allocated on the heap to hold the actual data.
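If you want to see whether your implementation does this, one way is to check whether the character buffer lies inside the bytes of the string object itself. This is only a sketch: stored_inline is a made-up helper name, and whether a short string is reported as inline depends entirely on the standard library you use.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: true if the characters of s live inside the
// std::string object itself (what SSO looks like from the outside),
// false if they live in a separate block of memory.
bool stored_inline(const std::string& s) {
    const auto obj  = reinterpret_cast<std::uintptr_t>(&s);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= obj && data < obj + sizeof(std::string);
}
```

A 1000-character string cannot fit inside the object, so it should always report false. A string such as "hi" will probably report true on an implementation with SSO, but that is not guaranteed.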

Related

C++17 what is the fastest way to convert list ( or other container ) to string cross platform

I have a list of strings:
std::list<std::string> list = { "--version", "--setcolor", "--setint" };
list.push_back("--command1");
list.push_back("--command2");
list.push_back("--command3");
It is basically a list of commands, which in the end I'd like to convert to a single string.
The list, in the end, can contain up to 100 commands.
What is the fastest way to do it in C++17?
UPDATE
It can also be an array or another container.
The end result should look like this:
std::string s = "--version --setcolor --setint --command1 --command2 --command3";
Thanks
Using std::string as an accumulator (e.g., "string buffer", as seen in Java or C#) is standard practice in C++ and there's nothing C++11/14/17/20 adds to make it "faster".
Simply add up the length of the substrings - and the delimiters between them - string::reserve enough capacity in your target string, then string::append away ...
[@Justin answered in a comment, but putting it here as well because comments are ephemeral]
In C++ strings are modifiable (unlike other languages like Java and C# where they're immutable). So you can change them, including adding characters to them (e.g., via string::append).
There are two relevant concepts: the string's size - which is how many characters it has right now, and the string's capacity - which is how much it could hold given the storage already allocated to it. If the capacity is the same as the size then adding anything to the string will require a new allocation of storage from the heap, and a copy of the current contents of the string into it (and a deallocation of the old storage back to the heap.) (*) If the capacity is larger than the size then appending new characters will not require that allocation/copy/deallocation if the new characters will fit in the capacity available.
If you're doing multiple appends to a string this might mean multiple alloc/copy/dealloc cycles. Which is expensive.
So instead you reserve additional capacity up front: By calling string::reserve the capacity is increased to the amount specified in one shot. Then you append all your stuff bit by bit (byte by byte ..., substring by substring) and it'll all fit (if you calculated the necessary capacity correctly) without any additional alloc/copy/dealloc.
string::reserve is discussed here at cppreference. While you're there check out string::capacity - to find the current capacity of the string, and why not also look at string::shrink_to_fit - used to ask the string to reduce its capacity until it is just as large as needed for the current contents (the request is non-binding, so the library may ignore it).
(*) Yes I'm leaving out the small string optimization ...

Memory usage is the same as data size?

I'm doing a performance test of the SHA3 algorithm on a variable: I'm checking the execution time of the algorithm for different sizes of the variable. For this I am using the char type and increasing its size, but I do not know if I am doing it right. I will use the line of code below to explain my doubt.
char text[1000] = "A text";
I know that each char has a size of 1 byte. My question is: when I predefine an array, will the size of the variable be the declared length of the array, in this case 1000? Or will the size of the variable be given by the content inside it, in this case by the text, which would be 6 bytes?
Is the test that I'm doing right? Or does the allocated memory size not affect the performance of SHA3? (I ask this because I intend to do the same test with larger values. If I want to, for example, do this test with 20 KBytes, will I have to fill the variable with 20000 characters?)
I'm using C++.
The amount of memory allocated on the stack by that line of code will be 1,000 bytes. However, what you send to your SHA3 code may only be the number of bytes of the string "A text", depending on how you're calling it, and how it uses the data. If it calculates the length of the string using a function like strlen(), then it will likely only iterate over the 6 characters (and 1 NUL byte) of the string and ignore the remaining 993 bytes. So it really depends on how you're using it and how you're calculating the size for your tests.
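To make the two sizes concrete, here is a small sketch. The names BufferSizes, measure and make_payload are made up for this example, and no SHA3 library is involved. It only shows which number you would end up passing to the hash function.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical record of the two different "sizes" of the buffer.
struct BufferSizes {
    std::size_t allocated;    // bytes reserved for the array
    std::size_t text_length;  // characters before the first NUL byte
};

BufferSizes measure() {
    char buffer[1000] = "A text";  // the remaining bytes are zero-filled
    return BufferSizes{ sizeof buffer, std::strlen(buffer) };
}

// Hypothetical helper for a timing test: a payload of exactly n bytes,
// so the amount of data that gets hashed is known and not inferred
// from a terminator.
std::string make_payload(std::size_t n) {
    return std::string(n, 'A');
}
```

Here measure() reports 1000 allocated bytes and a text length of 6. For a 20 KByte test you would build the payload with make_payload(20000) and pass its data() and size() to whatever hashing call you use.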

How is a vector<string> stored in memory

I am working on a project where I absolutely need to have data contiguous in memory.
I want to store some (maximum 100) strings (I don't know the actual size of each string). So I will create a vector of strings with 100 elements.
std::vector<std::string> vect;
vect.reserve(100);
But a string can be of any size. So how does it work? Is my vector reallocated every time I change a string? Or is a std::string simply like a pointer to the first character of the string, like a char* would be for a C string?
Each string will be an instance of class string and that instance will contain a char*.
The string objects in the vector will be in contiguous memory.
The chars of each string will be in contiguous memory.
Taken together, the chars of all the strings will not be in contiguous memory, unless you define a custom std::allocator for the strings.
The location in memory of the strings may change when you increase the size of the vector or call shrink_to_fit
The location in memory of the chars of each string may change when you increase the size of the string
The vector will not be reallocated if you modify or remove one of the strings
There is something called Small String Optimization. If that comes into play the chars of each string will be stored within the string instead of another location pointed to by char*
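A couple of the points above can be checked directly. This is only a sketch with made-up function names. It verifies that the string objects sit back to back in the vector, and that growing one string does not move the string object itself (only its character buffer may move).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if each string object starts exactly one std::string after the
// previous one, i.e. the objects are contiguous in the vector's buffer.
bool objects_are_contiguous(const std::vector<std::string>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        if (&v[i] != &v[i - 1] + 1) {
            return false;
        }
    }
    return true;
}

// Appends 1000 characters to v[i] and reports whether the string
// object stayed at the same address inside the vector.
bool object_stays_put_when_string_grows(std::vector<std::string>& v,
                                        std::size_t i) {
    const std::string* before = &v[i];
    v[i].append(1000, 'x');  // may reallocate the characters, not the vector
    return before == &v[i];
}
```

Both functions should return true for any vector of strings. What they do not show is the location of the characters, which is up to the string implementation.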
The data in std::vector is laid out contiguously. However, std::string's implementation does not guarantee that the memory holding the character array is stored locally to the class itself. How could it? Like you said, you don't know how large the string will be.
A lot of array-like structures have a layout like the following:
template <typename T>
class string
{
    T* begin;    // start of the stored elements
    T* end;      // one past the last element in use
    T* capacity; // one past the end of the allocated storage
};
Which means that your vector of 100 strings will have 100 instances of a class layout that POINTS to the memory where the string is stored.
Now if you need to pack memory allocations as tightly as possible and still want to use std::string you can write a custom allocator.
Maybe you can write the string data into a char array and have a second container that stores the length of each individual string plus its NUL terminator.
The implementation of string is implementation defined and has actually changed between different versions of certain compilers (for example from gcc 4.9 to 5.0). There is absolutely no guarantee that the chars in consecutive strings are contiguous in memory, even if you use a custom allocator.
So if you really need the chars to be contiguous in memory, you must use just a vector<char>.
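One possible shape for that idea, as a sketch only: a single vector<char> holding every string back to back, plus a second vector recording where each one starts. The class name PackedStrings and its member names are made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container: all characters of all strings live in one
// contiguous buffer, each string followed by a NUL terminator.
class PackedStrings {
public:
    void add(const std::string& s) {
        offsets_.push_back(chars_.size());  // where this string starts
        chars_.insert(chars_.end(), s.begin(), s.end());
        chars_.push_back('\0');
    }

    std::size_t count() const { return offsets_.size(); }

    // Pointer to the i-th string as a C string. The pointer is only
    // valid until the next call to add(), which may reallocate.
    const char* get(std::size_t i) const {
        return chars_.data() + offsets_[i];
    }

    // Total bytes used, terminators included.
    std::size_t total_bytes() const { return chars_.size(); }

private:
    std::vector<char> chars_;
    std::vector<std::size_t> offsets_;
};
```

After adding "ab", "" and "cde", the buffer holds 8 bytes (2 + 1, 0 + 1, 3 + 1) and get(1) points exactly 3 bytes after get(0). Calling reserve on the character buffer up front would avoid reallocations if you know an upper bound on the total size.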

(Why) does an empty string have an address?

I guessed no, but the output of something like this shows that it does:
string s="";
cout<<&s;
What is the point of an empty string having an address?
Don't you think it should not cost any memory at all?
Yes, every variable that you keep in memory has an address. As for what the "point" is, there may be several:
Your (literal) string is not actually "empty", it contains a single '\0' character. The std::string object that is created to contain it may allocate its own character buffer for holding this data, so it is not necessarily empty either.
If you are using a language in which strings are mutable (as is the case in C++), then there is no guarantee that an empty string will remain empty.
In an object-oriented language, a string instance with no data associated with it can still be used to call various instance methods on the string class. This requires a valid object instance in memory.
There is a difference between an empty string and a null string. Sometimes the distinction can be important.
And yes, I very much agree with the implementation of the language that an "empty" variable should still exist in and consume memory. In an object-oriented language an instance of an object is more than just the data that it stores, and there's nothing wrong with having an instance of an object that is not currently storing any actual data.
Following your logic, int i; would also not allocate any memory space, since you are not assigning any value to it. But how is it possible then, that this subsequent operation i = 10; works after that?
When you declare a variable, you are actually allocating memory space of a certain size (depending on the variable's type) to store something. Whether you use this space right away or not is up to you, but the declaration of the variable is what triggers memory allocation for it.
Some coding practices say you shouldn't declare a variable until the moment you need to use it.
An 'empty' string object is still an object - there may be more to its internal implementation than just the memory required to store the literal string itself. Besides that, most C-style strings (like the ones used in C++) are null-terminated, meaning even that "empty" string still uses one byte for the terminator.
Every named object in C++ has an address. There is even a specific requirement that the size of every type be at least 1, so that consecutive array elements a[N] and a[N+1] have different addresses, or so that in T a, b; both variables have distinct addresses.
In your case, s is a named object of type std::string, so it has an address. The fact that you constructed s from a particular value is immaterial. What matters is that s has been constructed, so it is an object, so it has an address.
s is a string object so it has an address. It has some internal data structures keeping track of the string. For example, current length of the string, current storage reserved for string, etc.
More generally, the C++ standard requires all objects to have a nonzero size. This helps ensure that every object has a unique address.
9 Classes
Complete objects and member subobjects of class type shall have nonzero size.
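A small sketch of what that requirement means in practice. The type Empty and the two function names are made up for this example.

```cpp
#include <string>

// A class with no data members at all.
struct Empty {};

// Even so, a complete object of that type occupies at least one byte.
static_assert(sizeof(Empty) >= 1, "complete objects have nonzero size");

// Two distinct objects therefore get two distinct addresses.
bool distinct_addresses() {
    Empty a;
    Empty b;
    return &a != &b;
}

// An empty std::string is still a full std::string object: it has the
// usual object size, reports length 0, and c_str() yields a valid
// pointer to a NUL terminator.
bool empty_string_is_a_real_object() {
    std::string s = "";
    return s.empty()
        && s.size() == 0
        && s.c_str()[0] == '\0'
        && sizeof(s) == sizeof(std::string);
}
```

Both functions should return true on any conforming implementation. The address printed by cout<<&s in the question is the address of this object, not of its characters.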
In C++, all classes are a specific, unchanging size (varying by compiler and library, but fixed at compile time). A std::string usually consists of a pointer, a length of allocation, and a length used. With 4-byte pointers and sizes that's ~12 bytes (more on 64-bit platforms or with a built-in short-string buffer), no matter how long the string is, and you have allocated std::string s on the call stack. When you display the address of the std::string, cout displays the location of the std::string in memory.
If the string doesn't point at anything, it won't allocate any space from the heap, which is like what you're thinking. But all C strings end in a trailing NUL, so the C string "" is one character long, not zero. This means that when you assign the C string "" to the std::string, the std::string may allocate 1 (or more) bytes and store the trailing NUL character ('\0', with value zero) there.
If there truly was no point to the empty string, then the programmer would not write the instruction at all. The language is loyal and trusting! And will never assume memory you allocate to be "wasted". Even if you are lost and heading over a cliff, it will hold your hand to the bitter end.
I think it'd be interesting to know, just as a curiosity though, that if you create a variable that isn't 'used' later, such as your empty string, the compiler may very well optimize it away so it incurs no cost to begin with. I guess compilers aren't as trusting...

Can C++ automatic variables vary in size?

In the following C++ program:
#include <string>
using namespace std;
int main()
{
string s = "small";
s = "bigger";
}
is it more correct to say that the variable s has a fixed size or that the variable s varies in size?
It depends on what you mean by "size".
The static size of s (as returned by sizeof(s)) will be the same.
However, the size occupied on the heap will vary between the two cases.
What do you want to do with the information?
I'll say yes and no.
s will be the same string instance, but its internal buffer (which is preallocated depending on your STL implementation) will contain a copy of the constant string you wanted to assign to it.
Should the constant string (or any other char* or string) be bigger than the internal preallocated buffer of s, the buffer of s will be reallocated according to the string buffer reallocation algorithm of your STL implementation.
This is going to lead to a dangerous discussion because the concept of "size" is not well defined in your question.
The size of the class of s is known at compile time: it's simply the sum of the sizes of its members plus whatever extra information needs to be kept for classes (I'll admit I don't know all the details). The important thing to get out of this, however, is that sizeof(s) will NOT change between assignments.
HOWEVER, the memory footprint of s can change during runtime through the use of heap allocations. So as you assign the bigger string to s, its memory footprint will increase because it will probably need more space allocated on the heap. You should probably try and specify what you want.
The std::string variable never changes its size. It just refers to a different piece of memory with a different size and different data.
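To see both notions of "size" side by side, here is a small sketch (Snapshot and snapshot are made-up names). It records sizeof, the logical length and the capacity of a string, so the values before and after an assignment can be compared.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record of the different "sizes" of one string.
struct Snapshot {
    std::size_t object_bytes;  // sizeof the std::string object
    std::size_t length;        // number of characters stored
    std::size_t capacity;      // characters it can hold without reallocating
};

Snapshot snapshot(const std::string& s) {
    return Snapshot{ sizeof(s), s.size(), s.capacity() };
}
```

Taking a snapshot after string s = "small"; and another after s = "bigger"; should give the same object_bytes both times, while length goes from 5 to 6. Whether capacity changes depends on the implementation.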
Neither, exactly. The variable s is referring to a string object.
#include <string>
using namespace std;
int main()
{
    string s = "small"; // s is constructed as a string object containing "small"
    s = "bigger";       // the overloaded operator= replaces the contents of s
}
Edit, corrected some details and clarified point
See: http://www.cplusplus.com/reference/string/string/ and in particular http://www.cplusplus.com/reference/string/string/operator=/
The assignment results in the original content being dropped and the content of the right side of the operation being copied into the object. It is similar to doing s.assign("bigger"), but assign has a broader range of acceptable parameters.
To get to your original question, the contents of the object s can have variable size. See http://www.cplusplus.com/reference/string/string/resize/ for more details on this.
A variable is an object we refer to by a name. The "physical" size of an object -- sizeof(s) in this case -- doesn't change, ever. The type is still std::string and the size of a std::string is always constant. However, things like strings and vectors (and other containers for that matter) have a "logical size" that tells us how many elements of some type they store. A string "logically" stores characters. I say "logically" because a string object doesn't really contain the characters directly. Usually it has only a couple of pointers as "physical members". Since the string object manages a dynamically allocated array of characters and provides proper copy semantics and convenient access to the characters, we can think of those characters as members ("logical members"). Since growing a string is a matter of reallocating memory and updating pointers, we don't even need sizeof(s) to change.
I would say this is a string object, and it has the capability to grow and shrink dynamically.