Somewhere in my brainstem a voice whispers:
In C++, an array does not need more memory than the number of elements need.
std::string str = "aabbcc";
std::array<std::string, 3> str_array = {"aa", "bb", "cc"};
Accordingly, both should have the same size, because (unlike in Java), there is no separate size field or similar. But I haven't found a reference.
Is this true? Under which circumstances is it not?
Storing strings in any language is more complicated than you might think. A C++ std::string must provide contiguous storage for its contents. Beyond that, a std::string can hold additional bookkeeping, such as a pointer or iterator to the end of the character data, the number of characters, and so on. std::string::size is required to run in O(1), so the object must store more than just a buffer. Most standard library implementations also provide SSO (small string optimization): the std::string object itself contains a small buffer, which avoids unnecessary dynamic allocations for short strings. You can also reserve more memory than you need. Let's say you need to collect 800-1000 characters in a loop. You can do it like this:
std::string str;
for(...)
str += some_character;
But this will cause unnecessary memory allocations and deallocations. If you can estimate the number of characters you want to store, you should reserve memory first.
std::string str;
str.reserve(1000);
for(...)
str.push_back(some_character);
Afterwards, you can always call shrink_to_fit to release unused capacity (the request is non-binding):
str.shrink_to_fit();
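To make the effect of reserve visible, here is a minimal sketch that counts how often capacity() changes while characters are appended. The function name count_capacity_changes is made up for this illustration, and a change in capacity() is only used as a proxy for a reallocation.

```cpp
#include <cstddef>
#include <string>

// Appends `count` characters one at a time and reports how often capacity()
// changed, which serves as a proxy for buffer reallocations.
std::size_t count_capacity_changes(std::size_t count, bool reserve_first) {
    std::string str;
    if (reserve_first) {
        str.reserve(count);  // request room for all characters up front
    }
    std::size_t changes = 0;
    std::size_t last_capacity = str.capacity();
    for (std::size_t i = 0; i < count; ++i) {
        str.push_back('x');
        if (str.capacity() != last_capacity) {
            ++changes;
            last_capacity = str.capacity();
        }
    }
    return changes;
}
```

With reserve_first set to true the count should be zero, because the capacity already covers every character. Without it, the exact number depends on the library's growth strategy.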
There are also other things you must be aware of:
reserve increases capacity, but size stays the same. This means that std::string must also store (or be able to calculate) how many more characters the buffer can hold.
string literals are null terminated
std::basic_string::c_str must return a null-terminated array of characters, so it is likely that std::string also stores a null terminator (unfortunately I am not sure how it is done)
there are more encodings and character sets; ASCII is just one of them. UTF-8 and UTF-16 encoded strings may need several stored elements (code units) to represent one code point, but this is more complicated.
In C++, an array does not need more memory than the number of elements need.
This is true. A raw array has a size equal to the size of its element type times the number of elements. So,
int array[10];
has a size of sizeof(int) * std::size(array). std::array is the same but it is allowed to have padding so
std::array<int, 10> array;
has the size of sizeof(int) * std::size(array) + P where P is some integer amount of padding.
Your example though isn't quite the same thing.
A std::string is a container. It has its own size that is separate from what it contains. So sizeof(std::string) will always be the same thing regardless of how many characters are in the string. So ignoring short string optimization
std::string str = "aabbcc";
takes up sizeof(std::string) plus however much memory the string allocated for the underlying C string. That is not the same value as
std::array<std::string, 3> str_array = {"aa", "bb", "cc"};
Since you now have 3 * sizeof(std::string) plus whatever each string allocated.
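The size relationships described above can be checked at compile time and at run time. This is a small sketch; the function names are invented for the example, and for std::array only a lower bound is checked, since padding is permitted.

```cpp
#include <array>
#include <cstddef>
#include <string>

// A raw array is exactly as large as its elements.
static_assert(sizeof(int[10]) == 10 * sizeof(int),
              "raw arrays carry no extra fields");
// std::array may in principle add padding, so only a lower bound is checked.
static_assert(sizeof(std::array<int, 10>) >= 10 * sizeof(int),
              "std::array holds at least its elements");

// sizeof looks at the type, not at the characters currently stored.
bool string_object_size_ignores_contents() {
    std::string short_str = "aa";
    std::string long_str(1000, 'x');
    return sizeof(short_str) == sizeof(long_str);
}

// Size of the array object itself, excluding anything the strings allocate.
std::size_t string_array_object_size() {
    return sizeof(std::array<std::string, 3>);
}
```

The array of three strings therefore occupies at least three times sizeof(std::string), regardless of whether each string holds two characters or two thousand.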
Accordingly, both should have the same size (e.g. 6 bytes),
Not a correct deduction.
The memory used by a std::string, if you want to call its size, consists of at least a pointer and the memory allocated to hold the data.
The memory allocated to hold the data can also include the space required to hold the terminating null character.
Given
std::string s = "aabbcc";
std::string a = "aa";
std::string b = "bb";
std::string c = "cc";
mem(s) != mem(a) + mem(b) + mem(c)
Virtually every string can hold following info:
The size of the string i.e. num of chars it contains.
The capacity of memory holding the string's chars.
The value of the string.
Additionally it may also hold:
A copy of its allocator and a reference count for the value.
They don’t have the same size. Strings are saved null-terminated, giving you an extra byte for each string.
I am looking for something that gives me the size of the memory that the character pointer str points to.
int main()
{
char * str = (char *) malloc(sizeof(char) * 100);
int size = 0;
size = /* library function or anything use to find size */
printf("Total size of str array - %d\n", size);
}
I want to prove that the given memory is 100 bytes.
Does anyone have any idea about this?
A raw pointer only knows that it points to a single element of its type. If the thing it points to happens to be part of an array, the pointer doesn't know, and there's no way to get that information from it.
Instead, use types that do know their size, for example std::string, std::array, or std::vector.
The C and C++ standards do not provide a way to get, from an address, the amount of memory that was requested in the call to malloc that returned that address.
Some C or C++ implementations provide a way to get the amount of memory that was provided at the given address, such as malloc_size. The amount provided may be greater than the amount that was requested.
If the memory contains a string, which is an array of characters terminated by a null character, then you can determine the length of the string by counting characters up to the null character. This function is provided by the standard strlen function. This length is different from the space allocated unless, of course, the string happens to fill the space.
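The difference between the allocated space and the string length can be sketched as follows. BufferInfo and measure are names invented for this example; the buffer is a std::vector<char> so that the allocated size is known without any non-standard call.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

struct BufferInfo {
    std::size_t allocated;      // bytes in the buffer
    std::size_t string_length;  // characters before the first null character
};

// Copies `text` into a zero-filled buffer of `buffer_size` bytes and reports
// both the buffer size and the length of the string stored in it.
BufferInfo measure(const char* text, std::size_t buffer_size) {
    if (buffer_size == 0) {
        return {0, 0};
    }
    std::vector<char> buffer(buffer_size, '\0');
    // Copy at most buffer_size - 1 characters so a terminator always remains.
    std::size_t n = std::min(std::strlen(text), buffer_size - 1);
    std::memcpy(buffer.data(), text, n);
    return {buffer.size(), std::strlen(buffer.data())};
}
```

For a 100-byte buffer holding "hello", strlen reports 5 while the buffer itself still has 100 bytes.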
There is no (good, standard, portable) way to tell from a pointer value alone whether it's the first element of an array or not, nor how many elements follow it. That information has to be tracked separately.
If you're writing in C++, don't do your own memory management if you can help it. Use a standard container type like std::vector or std::map (or std::string for text). If you must do your own memory management, use the new and delete operators instead of the *alloc and free library functions, and wrap a class around those operations that also keeps track of how many elements have been allocated (which, like std::vector and std::map, is returned via a read-only size() method).
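As a rough illustration of that advice, here is a minimal wrapper that remembers how many elements it allocated, next to the std::vector equivalent. CharBuffer and vector_size_for are hypothetical names, and std::unique_ptr<char[]> is used here so that delete[] is called automatically; this is one possible design, not the only one.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical wrapper: owns a char array and remembers its element count.
class CharBuffer {
public:
    explicit CharBuffer(std::size_t count)
        : data_(new char[count]()), size_(count) {}

    std::size_t size() const { return size_; }  // read-only element count
    char* data() { return data_.get(); }

private:
    std::unique_ptr<char[]> data_;  // releases the array with delete[]
    std::size_t size_;
};

// The standard container already tracks its size.
std::size_t vector_size_for(std::size_t count) {
    std::vector<char> buffer(count);
    return buffer.size();
}
```

In both cases the size is tracked alongside the memory, which is exactly the information a bare char * cannot provide.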
I would like to concatenate 2 strings in C or C++ without new memory allocation and copying. Is it possible?
Possible C code:
char* str1 = (char*)malloc(100);
char* str2 = (char*)malloc(50);
char* str3 = /* some code that concatenates these 2 strings
without copying to occupy a continuous memory region */
Then, when I don't need them any more, I just do:
free(str1);
free(str2);
Or if possible, I would like to achieve the same in C++, using std::string or maybe char*, but using new and delete (possibly void operator delete ( void* ptr, std::size_t sz ) operator (C++14) on the str3).
There are a lot of questions about strings concatenation, but I haven't found one that asks the same.
No, it is not possible
In C, malloc operations return blocks of memory that have no relationship to each other. But in C, strings must be a continuous array of bytes. So there is no way to extend str1 without copying, let alone concatenate.
For C++, perhaps ropes may be of interest: See this answer.
Ropes are allocated in chunks that do not have to be contiguous. This supports O(1) concatenation. However, the accessors make it appear as a single string of bytes. I'm certain that to convert ropes back to std::string or C style strings will take a copy however, but this is probably the closest to what you want.
Also, it is probably a premature optimization to worry about the costs of copying a few strings around. Unless you are moving lots of data, it won't matter
Text concatenation is possible by writing your own string data structure. Easier in C++ than C.
struct My_String
{
std::vector<char *> text_fragments;
};
You would have to implement all the text manipulation and searching algorithms based on this data structure. Nothing in the C library could be applied to the My_String structure. The std::string in C++ would not be compatible.
One of the issues is how to handle text modification. If one of the text fragments is a constant literal (that can't be modified), it would need to be copied before it could be modified. But copying is against the requirements. :-(
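A slightly fuller sketch of such a fragment-based string might look like this. FragmentString and concat_views are made-up names; the class only stores pointers to text owned elsewhere, so the caller has to keep that text alive, and only indexing and flattening are implemented.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

class FragmentString {
    struct Fragment {
        const char* text;    // not owned; must outlive this object
        std::size_t length;  // characters in this fragment
    };

public:
    // Records a pointer to existing text; no characters are copied.
    void append(const char* text) {
        fragments_.push_back({text, std::strlen(text)});
        size_ += fragments_.back().length;
    }

    std::size_t size() const { return size_; }

    // Walks the fragments to find the character at a logical index.
    char at(std::size_t index) const {
        for (const Fragment& f : fragments_) {
            if (index < f.length) return f.text[index];
            index -= f.length;
        }
        return '\0';  // out of range
    }

    // Producing one contiguous string does require a copy.
    std::string flatten() const {
        std::string result;
        result.reserve(size_);
        for (const Fragment& f : fragments_) result.append(f.text, f.length);
        return result;
    }

private:
    std::vector<Fragment> fragments_;
    std::size_t size_ = 0;
};

// Joins two existing C strings without copying their characters.
FragmentString concat_views(const char* first, const char* second) {
    FragmentString s;
    s.append(first);
    s.append(second);
    return s;
}
```

Concatenation only appends a pointer and a length, but every lookup has to walk the fragment list, and anything that expects a char * or std::string still needs flatten().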
A "string" in C is an array of chars with a null char at the end. And an array is "a data structure that lets you store one or more elements consecutively in memory". GNU C reference
You cannot concatenate two arrays that are not in consecutive memory blocks without copying one of them. You can do it however without allocating new memory. E.g.
char* str1 = malloc(100); // size 100 bytes, uninitialised
str1[0] = '\0'; // string length 0, size of str1 100
strcat(str1, "a"); // string length 1, size of str1 still 100
strcat(str1, "b"); // string length 2, size of str1 still 100
You could if you want retrieve chars of 2 strings as if they were one without copying or reallocating. Here is an example function to do that (simple example, don't use in production code)
char* str1 = (char*)malloc(100);
char* str2 = (char*)malloc(50);
char get_char(int i) {
if (i >= 0 && i < 100) {
return str1[i];
}
if (i >= 100 && i < 150) {
return str2[i-100];
}
return 0;
}
But in such a case you couldn't have a char* str3 to perform pointer arithmetic with and access all 150 chars.
Tags C and C++ are contradictory. In C, I'd recommend exploring realloc. You can code something along following lines:
char* str = malloc(50);
str = realloc(str, 55);
If you are lucky, the realloc call will not allocate new memory and will just extend the already allocated segment, but there is no guarantee of this. This way you at least have a shot at avoiding reallocation of the string. You will still have to copy the contents of the second string into the newly allocated memory.
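Spelled out a little further, the realloc approach could look like the sketch below. It is written as C++ using the C allocation functions from <cstdlib>; concat_with_realloc is an invented name, and the result is returned as a std::string only to keep the example leak-free.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Grows the first buffer with realloc and copies the second string behind it.
// Returns an empty string if an allocation fails.
std::string concat_with_realloc(const char* first, const char* second) {
    std::size_t first_len = std::strlen(first);
    std::size_t second_len = std::strlen(second);

    char* str = static_cast<char*>(std::malloc(first_len + 1));
    if (!str) return {};
    std::memcpy(str, first, first_len + 1);

    // Keep the old pointer until realloc succeeds, otherwise it would leak.
    char* grown = static_cast<char*>(
        std::realloc(str, first_len + second_len + 1));
    if (!grown) {
        std::free(str);
        return {};
    }

    // The second string still has to be copied, terminator included.
    std::memcpy(grown + first_len, second, second_len + 1);
    std::string result(grown);
    std::free(grown);
    return result;
}
```

Whether realloc extends the block in place or moves it is up to the allocator; the code is the same either way.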
When I call gethostname with a char array, the resulting length is 25, but when I use a std::string, the length is 64. I'm not really sure why. In both cases I declare the same size, HOST_NAME_MAX.
char hostname[HOST_NAME_MAX];
BOOL host = gethostname(hostname, sizeof hostname);
expectedComputerName = hostname;
int size2 = expectedComputerName.length();
std::string test(HOST_NAME_MAX, 0);
host = gethostname(&test[0], test.length());
int testSize = test.length();
An std::string object can contain NULs (i.e. '\0' characters). You are storing the name in the first bytes of a string object that was created with a size of HOST_NAME_MAX length.
Storing something at the beginning of the string data won't change the length of the string, which therefore remains HOST_NAME_MAX.
When creating a string from a char pointer instead the std::string object created will contain up to, but excluding, the first NUL character (0x00). The reason is that a C string cannot contain NULs because the first NUL is used to mark the end of the string.
Consider what you're doing in each case. In the former code snippet, you're declaring a character array capable of holding HOST_NAME_MAX-1 characters (1 for the null terminator). You then load some string data into that buffer via the call to gethostname and then print out the length of buffer by assigning it to a std::string object using std::string::operator= that takes a const char *. One of the effects of this is that it will change an internal size variable of std::string to be strlen of the buffer, which is not necessarily the same as HOST_NAME_MAX. A call to std::string::length simply returns that variable.
In the latter case, you're using the std::string constructor that takes a size and initial character to construct test. This constructor sets the internal size variable to whatever size you passed in, which is HOST_NAME_MAX. The fact that you then copy in some data to std::string's internal buffer has no bearing on its size variable. As with the other case, a call to the length() member function simply returns the size - which is HOST_NAME_MAX - regardless of whether or not the actual length of the underlying buffer is smaller than HOST_NAME_MAX.
As @MattMcNabb mentioned in the comments, you could fix this by:
test.resize( strlen(test.c_str()) );
Why might you want to do this? Consistency with the char buffer approach might be a reason, but another reason may be performance oriented. In the latter case you're not only outright setting the length of the string to HOST_NAME_MAX, but also its capacity (omitting the SSO for brevity), which you can find starting on line 242 of libstdc++'s std::string implementation. What this means in terms of performance is that even though only, say, 25 characters are actually in your test string, the next time you append to that string (via +=, std::string::append, etc.), it's more than likely to have to reallocate and grow the string, as shown here, because the internal size and internal capacity are equal. Following @MattMcNabb's suggestion, however, the string's internal size is reduced down to the length of the actual payload, while keeping the capacity the same as before, and you avoid the almost immediate re-growth and re-copy of the string, as shown here.
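The whole pattern can be tried without a network call. In the sketch below, fake_gethostname is a hypothetical stand-in that behaves like gethostname in the one respect that matters here (it writes a null-terminated name into the caller's buffer), and the buffer size of 64 is an arbitrary choice for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for gethostname: writes a null-terminated name into
// the caller's buffer and returns 0 on success, -1 if the buffer is too small.
int fake_gethostname(char* buffer, std::size_t len) {
    const char name[] = "example-host";
    if (len < sizeof name) return -1;
    std::memcpy(buffer, name, sizeof name);
    return 0;
}

std::string read_name_into_string() {
    std::string test(64, '\0');  // size() is 64 from the start
    if (fake_gethostname(&test[0], test.size()) != 0) return {};
    // Writing into the buffer did not change size(); trim it to the payload.
    test.resize(std::strlen(test.c_str()));
    return test;
}
```

Before the resize call, test.length() is 64; afterwards it matches the length of the name that was written.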
I am just curious, how are strings stored in memory? for example, when I do this:
string testString = "asd";
it allocates 4 bytes, right? a + s + d + \0.
But later, when I want to assign some new text to this string, it works, but I don't understand how. For example I do this:
testString = "123456789";
Now it should be 10 bytes long. But what if there wasn't space for such a string? Let's say that the fifth and sixth bytes from the beginning of the string are taken by some other 2 chars. How does the CPU handle it? Does it find a completely new position in memory where that string fits?
This is implementation dependent, but the general idea is that the string class will contain a pointer to a region of memory where the actual contents of the string are stored. Two common implementations are storing 3 pointers (begin of the allocated region and data, end of data, end of allocated region) or a pointer (begin of allocated region and data) and two integers (number of characters in the string and number of allocated bytes).
When new data is appended to the string, if it fits the allocated region it will just be written and the size/end of data pointer will be updated accordingly. If the data does not fit in the region a new buffer will be created and the data copied.
Also note that many implementations have optimizations for small strings, where the string class does contain a small buffer. If the contents of the string fit in the buffer, then no memory is dynamically allocated and only the local buffer is used.
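One way to observe where the characters live is to check whether data() points inside the string object itself. This is only a probing sketch: stores_inline is an invented helper, and what it reports for short strings depends on the standard library in use.

```cpp
#include <functional>
#include <string>

// Returns true if the character data lies inside the std::string object,
// which is what a small-string buffer would look like from the outside.
bool stores_inline(const std::string& s) {
    const char* object_begin = reinterpret_cast<const char*>(&s);
    const char* object_end = object_begin + sizeof(std::string);
    // std::less gives a well-defined ordering even for unrelated pointers.
    std::less<const char*> before;
    return !before(s.data(), object_begin) && before(s.data(), object_end);
}
```

A string of 1000 characters cannot fit inside the object, so it must report false; for something like "asd" the answer is typically true on implementations with a small-string buffer.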
string is not a simple datatype like char *. It's a class, which has implementation details that aren't necessarily visible.
Among other things, string includes a counter to keep track of how big it really is.
char test[] = "asd"; // allocates exactly 4 bytes
string testString = "asd"; // who knows?
testString = "longer"; // allocates more if necessary
Suggestion: write a simple program and step through it using a debugger. Examine the string, and see how the private members change as the value is changed.
string is an object, not just some memory location. It dynamically allocates memory as needed.
The = operator is overloaded; when you say testString = "123456789"; a method is being called and deals with the const char * you passed in.
It's stored with a size. If you store a new string, it will optionally deallocate the existing memory and allocate new memory to cope with the change in size.
And it doesn't necessarily allocate 4 bytes the first time you assign a string of 4 bytes to it. It may allocate more space than that (it won't allocate less).
I would like to get the bytes a std::string's string occupies in memory, not the number of characters. The string contains a multibyte string. Would std::string::size() do this for me?
EDIT: Also, does size() also include the terminating NULL?
std::string operates on bytes, not on Unicode characters, so std::string::size() will indeed return the size of the data in bytes (without the overhead that std::string needs to store the data, of course).
No, std::string stores only the data you tell it to store (it does not need the trailing NULL character). So it will not be included in the size, unless you explicitly create a string with a trailing NULL character.
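Both points can be illustrated with a short sketch. It assumes the string holds valid UTF-8; count_code_points, sample_text and with_embedded_null are names made up for the example.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}

// "h", then U+00E9 encoded as the two bytes C3 A9, then "llo".
std::string sample_text() { return "h\xC3\xA9llo"; }

// A null character is only part of the string if you put it there explicitly.
std::string with_embedded_null() { return std::string("a\0b", 3); }
```

Here sample_text().size() is 6 bytes for 5 code points, and with_embedded_null().size() is 3 because the null character was stored on purpose.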
You could be pedantic about it:
std::string x("X");
std::cout << x.size() * sizeof(std::string::value_type);
But std::string::value_type is char and sizeof(char) is defined as 1.
This only becomes important if you typedef the string type (because it may change in the future or because of compiler options).
// Some header file:
typedef std::basic_string<T_CHAR> T_string;
// Source a million miles away
T_string x("X");
std::cout << x.size() * sizeof(T_string::value_type);
std::string::size() is indeed the size in bytes.
To get the amount of memory in use by the string you would have to sum the capacity() with the overhead used for management. Note that it is capacity() and not size(). The capacity determines the number of characters (charT) allocated, while size() tells you how many of them are actually in use.
In particular, std::string implementations don't usually *shrink_to_fit* the contents, so if you create a string and then remove elements from the end, the size() will be decremented, but in most cases (this is implementation defined) capacity() will not.
Some implementations might not allocate the exact amount of memory required, but rather obtain blocks of given sizes to reduce memory fragmentation. In an implementation that used power of two sized blocks for the strings, a string with size 17 could be allocating as much as 32 characters.
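A rough way to look at this is to compare size() and capacity() after shrinking a string, and to estimate the footprint from the capacity. Both helpers are hypothetical, and estimated_footprint assumes a heap-allocated buffer of capacity() + 1 elements with no small-string optimization and no allocator rounding, so treat its result as a ballpark figure.

```cpp
#include <cstddef>
#include <string>

struct Usage {
    std::size_t size;      // characters in use
    std::size_t capacity;  // characters the current buffer can hold
};

// Builds a string of `initial` characters, resizes it to `kept`,
// and reports what remains.
Usage usage_after_truncation(std::size_t initial, std::size_t kept) {
    std::string s(initial, 'x');
    s.resize(kept);
    return {s.size(), s.capacity()};
}

// Assumption: the buffer is heap-allocated and holds capacity() + 1 elements
// (one extra for the terminator); real allocators may round this up.
std::size_t estimated_footprint(const std::string& s) {
    return sizeof(std::string) +
           (s.capacity() + 1) * sizeof(std::string::value_type);
}
```

After truncating from 100 to 10 characters, size() is 10, while capacity() usually still reflects the larger buffer.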
Yes, size() will give you the number of char elements in the string. One character in a multibyte encoding takes up multiple char elements.
There is inherent conflict in the question as written: std::string is defined as std::basic_string<char,...> -- that is, its element type is char (1-byte), but later you stated "the string contains a multibyte string" ("multibyte" == wchar_t?).
The size() member function does not count a trailing null. Its value represents the number of character elements (not bytes).
Assuming you intended to say your multibyte string is std::wstring (alias for std::basic_string<wchar_t,...>), the memory footprint for the std::wstring's characters, including the null-terminator is:
std::wstring myString;
...
size_t bytesCount = (myString.size() + 1) * sizeof(wchar_t);
It's instructive to consider how one would write a reusable template function that would work for ANY potential instantiation of std::basic_string<> like this**:
// Return number of bytes occupied by null-terminated inString.c_str().
// Elem is the character type, e.g. char or wchar_t.
template <typename Elem>
inline size_t stringBytes(const std::basic_string<Elem>& inString, bool bCountNull)
{
return (inString.size() + (bCountNull ? 1 : 0)) * sizeof(Elem);
}
** For simplicity, ignores the traits and allocator types rarely specified explicitly for std::basic_string<> (they have defaults).