String efficiency with small strings - c++

I think I heard somewhere that strings have something called "small string optimization", a way of avoiding allocations. Can I avoid allocations altogether by doing something like this:
auto s = "hello" + "world!"s;
Instead of:
auto s = "hello, world!"s;

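No, that won't work. SSO means that short strings are stored directly inside the string object, in the space that would otherwise hold the pointer to a heap buffer. Splitting a literal does not help, because what matters is the length of the result: as soon as the concatenation of two short strings exceeds the internal buffer, it no longer fits and has to be allocated. There are string classes with larger internal buffers, in case you need one that does SSO for, say, 31 characters.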
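To make this concrete, here is a minimal sketch of how one might check where a string's characters live. The helper names is_stored_inline and split_and_whole_agree are hypothetical (nothing like them exists in the standard library), and the outcome is implementation-specific: the standard does not mandate SSO or any particular buffer size.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not a standard facility: reports whether the
// characters of `s` are stored inside the std::string object itself
// (SSO in effect) rather than in a separately allocated buffer.
bool is_stored_inline(const std::string& s) {
    const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const auto object_end = object_begin + sizeof(std::string);
    const auto data = reinterpret_cast<std::uintptr_t>(s.data());
    return data >= object_begin && data < object_end;
}

// Hypothetical check: splitting a literal into two pieces does not change
// where the result ends up, because only the final length matters.
bool split_and_whole_agree() {
    using namespace std::string_literals;
    const std::string split = "hello" + "world!"s;
    const std::string whole = "helloworld!"s;
    assert(split == whole);
    return is_stored_inline(split) == is_stored_inline(whole);
}
```

On an implementation with SSO, is_stored_inline would be expected to return true for a very short string and false for one of, say, 100 characters. The exact threshold differs between standard libraries.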

Related

Storing single characters

If I want to store a single character, say 'c', am I better off using
std::string myChar = 'c';
rather than the built in char type?
char myChar = 'c';
Is there any safety gained by storing single characters as a string?
There is a little safety gained as you won't accidentally use the string for calculations.
int a = 5+myChar;
This gives a compiler error if myChar is a string, but not if it is a char, because chars are treated as numbers.
Please note that the first example doesn't compile. It has to be
std::string myChar = "c";
(with double quotes). I see more disadvantages in this approach:
It will consume way more memory than required. With the short-string optimization the data will not be stored on the heap, but the string object itself is still typically at least 3 words long (12 bytes with 4-byte words, and 24 or 32 bytes on common 64-bit implementations), compared to one byte[1] when using char.
Access to that char is really inconvenient: you would always have to use .front(), .back() or [0] to get at it.
It doesn't convey the meaning of your variables. It's like replacing all int variables in your program with a std::vector<int> holding a single element.
The only "safety" I can see is, as AlexGeorg already mentioned, that you can't mistakenly use it in calculations. But that's it, and this could also be seen as a disadvantage.
So, no, you're most likely not better off using a string to store a single character, unless you have some really specific circumstances.
[1] Plus maybe some padding.
The upside of using a string is the compile-time error you get when you try to use the variable in an arithmetic expression, for example:
int sum = 15 + myChar;
There are also some downsides to take into consideration:
The first is performance: a string is more expensive than a char in both memory footprint and execution time.
The second is that a string does not guarantee that the variable holds exactly one character, so you have to pay attention when you use it.
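The trade-off described in these answers can be checked at compile time. The following is a minimal sketch assuming C++17; the trait name can_add_to_int is made up for illustration and is not part of any library.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical trait (the name is an assumption): true when the
// expression `5 + value_of_T` compiles, false otherwise.
template <typename T, typename = void>
struct can_add_to_int : std::false_type {};

template <typename T>
struct can_add_to_int<T, std::void_t<decltype(5 + std::declval<T>())>>
    : std::true_type {};

// A char silently takes part in arithmetic; a std::string does not.
static_assert(can_add_to_int<char>::value, "char is an integer type");
static_assert(!can_add_to_int<std::string>::value,
              "int + std::string does not compile");

// A char is exactly one byte, while the string object carries bookkeeping
// even when it holds a single character.
static_assert(sizeof(char) == 1, "guaranteed by the language");
static_assert(sizeof(std::string) > sizeof(char),
              "the string object is larger than the char it stores");
```

If the file compiles, all four claims hold for the compiler and standard library in use. Printing sizeof(std::string) shows the concrete footprint on a given platform.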

Are raw strings faster than normal strings?

I wanted to ask if raw strings are faster than normal strings at compile-time.
Let me explain what I mean by "raw" and "normal" strings.
We know there is an R prefix for raw string literals.
const char * raw = R"(Hello\nWorld!)"; will output
Hello\nWorld!
const char * normal = "Hello\nWorld!"; will output
Hello
World!
So what's actually faster? I think using the R prefix for strings like "Hi, how are you?" is faster than the "normal" way we write strings.
OK, as you're asking for the compile time impact of "normal" or raw string literals, it could be that raw string literals can be handled faster, since the compiler won't need to handle escape character parsing and translation.
Though I believe that the difference won't be really significant.
The major advantage of raw string literals is that you don't need to care about escaping special characters when writing the source code.
Raw strings might be slightly faster or slower to parse in a particular compiler, but the difference will almost certainly be too small to notice.
The purpose of raw strings isn't to improve compilation speed. It's to let you write string literals that contain lots of special characters (like backslashes and quotes) in a more-readable way, without having to insert lots of additional backslashes for escaping.
Use normal string literals unless your string needs lots of escaping that makes it look awkward in the source code. Use raw string literals just for those cases.
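As a small illustration of the readability point, here is a sketch assuming C++17 (for constexpr std::string_view). The variable names and the example pattern are made up; the first two literals are the ones from the question.

```cpp
#include <string_view>

// The two literals from the question. The raw one keeps the backslash and
// the 'n' as two separate characters; the normal one turns "\n" into a
// single newline character.
constexpr std::string_view raw_hello = R"(Hello\nWorld!)";
constexpr std::string_view normal_hello = "Hello\nWorld!";
static_assert(raw_hello.size() == 13, "backslash and 'n' are both kept");
static_assert(normal_hello.size() == 12, "the escape became one character");

// With enough escaping, a normal literal holds exactly the same characters
// as a raw one. Only the spelling in the source file differs.
constexpr std::string_view escaped_pattern = "\\d+\\.\\d+ \"quoted\"";
constexpr std::string_view raw_pattern = R"(\d+\.\d+ "quoted")";
static_assert(escaped_pattern == raw_pattern,
              "same characters, different spelling");
```

Since both spellings yield the same characters, the choice between them comes down to how the literal reads in the source.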

Adding string variables C++

Why can't we add two C-strings in C++? This is what I know, please correct me, and add to it.
Is it because the '+' operator is not overloaded for this operation? The compiler essentially treats the variable name as a pointer. Since we cannot add two pointers, is that why we can't add two string variables like this: str = str + "str"?
First, realize that concatenating strings, as operator+ and operator+= do for std::string, is a conceptually different operation from addition.
The things you're asking about "adding" are C-style strings, which come from C, so C++ supports them. There is no operator overloading in C.
A C-style string is just a pointer to an array of characters in memory, which is (hopefully) terminated by a null character '\0'. A char * may point to a C-string, or it may not. You have to manage the memory yourself. The memory may be allocated statically, with only enough room for the characters you put there to start with, as when you define char myString[] = "blah";. There are no built-in, automatic memory reallocation mechanisms, so even if such a concatenation operator were defined, it couldn't guarantee that the target buffer has room for whatever you want to append.
By contrast, in C++, std::string is a class that dynamically allocates memory as needed. std::string objects are always std::strings; they don't point to other things. Concatenation operators are defined (+, +=), and they handle memory (re)allocation for you.
...So now you know the two main types of strings used in C++, and you can do some further research on "C-style strings" or "C strings"[1], and on std::string, which is a "C++ string". Maybe start at Wikipedia: String (C++).
[1] FYI, "C-String" is also the name of a type of revealing clothing, so you may get some NSFW search results mixed in when using that term.
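The contrast between the two kinds of strings can be sketched as follows. Both function names (concat and append_checked) are hypothetical helpers written for this illustration. The C-style version assumes dest already holds a null-terminated string within dest_size bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// std::string: operator+ allocates whatever memory the result needs.
std::string concat(const char* left, const char* right) {
    return std::string(left) + right;
}

// C-style: the caller owns a fixed buffer, so appending has to check for
// room first. Returns false and leaves dest untouched if src does not fit.
bool append_checked(char* dest, std::size_t dest_size, const char* src) {
    assert(dest != nullptr && src != nullptr);
    const std::size_t used = std::strlen(dest);
    const std::size_t extra = std::strlen(src);
    if (used + extra + 1 > dest_size) {
        return false;  // no room for the characters plus the terminator
    }
    std::memcpy(dest + used, src, extra + 1);  // +1 copies the terminator
    return true;
}
```

With char buf[8] = "blah";, appending "abc" fills the buffer exactly (seven characters plus the terminator), and any further append is rejected. The std::string version never has to refuse.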

Overlapping strings

I have a problem with overlapping char*.
I'm working in a low-memory environment, namely Arduino and I would like to use the least memory possible. I want to be able to prepend a string with another and to do it without any copying of variables which wastes memory.
This is standard C or C++.
char* bigPacket = (char*)malloc(25); //Makes a big string of length 25
char* payload = bigPacket + 2; //This is part of the big string, 2 chars in.
bigPacket[0] = 72; // Letter 'H'
bigPacket[1] = 72; //I'm expecting the final bigPacket to read "HHHello, world"
payload = "Hello, World";
print(bigPacket);
But the problem is that it does not print "HHHello, world" as it should. Instead, it just prints "HH". Is there a proper way to make it be able to overlap these strings to print "HHHello, world"?
You changed where payload points. What you needed to do was leave payload alone and change the data it points to.
strcpy(payload, "Hello World");
Edit: If you really want to avoid copies you'd end up with something like the SGI Rope class. But you'd pay a lot in code complexity.
If you want to do this without either very complicated code or multiple copies of data, destroying the benefit, you need to have the complete string as one literal in your program: "HHHelloWorld". You can then play with pointers and lengths to access various parts of it, but remember there is only one null byte, at the end of the string.
However, I suspect that this is an over-optimization. Arduino programming rarely involves a lot of very long strings. It is important to keep the code simple and direct.
You should not mess with pointers for something like that. Instead, you should store string literals in flash rather than in SRAM. This is usually done with the help of the PROGMEM macros, though often the F() macro is sufficient. Then you can copy your strings, as and if needed, into a suitable buffer.
Simplest example:
Serial.println(F("this is text from flash memory"));
You only reassign the payload pointer so that it points to the constant string; you do not copy the string into the memory it currently points to.
In order to copy the string you need to use strcpy or memcpy:
char *bigPacket = (char *)malloc(25);
bigPacket[0] = bigPacket[1] = 72; // 'H'
strcpy(bigPacket + 2, "Hello, World");
print(bigPacket);
Note that this is rather unlikely to save memory, since "Hello, World" will still exist as a constant string in your code. To save memory, it is probably most efficient to call print multiple times.
However, I guess that is not possible in this case.
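Putting the answers above together, a minimal sketch of the "header plus payload in one buffer" idea might look like this. The function name fill_packet and the two-byte "HH" header length are assumptions taken from the question's example, not an established API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: writes a two-byte "HH" header followed by `text`
// into `packet`. Returns a pointer to the payload, which aliases
// packet + 2, or nullptr if the buffer is too small.
char* fill_packet(char* packet, std::size_t capacity, const char* text) {
    assert(packet != nullptr && text != nullptr);
    const std::size_t header_len = 2;
    const std::size_t text_len = std::strlen(text);
    if (capacity < header_len + text_len + 1) {
        return nullptr;  // header + text + terminator would not fit
    }
    packet[0] = 'H';
    packet[1] = 'H';
    char* payload = packet + header_len;       // points into the same buffer
    std::memcpy(payload, text, text_len + 1);  // copy the data, keep the pointer
    return payload;
}
```

The point is that payload keeps pointing into the packet buffer and only the bytes behind it change, so printing the packet shows the header and the payload together.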

c++ best way to call function with const char* parameter type

What is the best way to call a function with the following declaration?
string Extract(const char* pattern,const char* input);
I use
string str=Extract("something","input text");
Is there a problem with this usage?
Should I use the following instead?
char pattern[]="something";
char input[]="input";
//or use pointers with new operator and copy then free?
Both work, and I like the first one, but I want to know the best practice.
A literal string (e.g. "something") works just fine as a const char* argument to a function call.
The first method, i.e. passing them literally in, is usually preferable.
There are occasions, though, where you don't want your strings hard-coded into the text. In some ways you can say that, a bit like magic numbers, they are magic words or phrases. So you prefer to use a constant identifier to store the values and pass those in instead.
This would happen often when:
1. a word has a special meaning, and is passed in many times in the code to have that meaning.
or
2. the word may be cryptic in some way and a constant identifier may be more descriptive
Unless you plan to have duplicates of the same strings, or to alter those strings, I'm a fan of the first way (passing the literals directly). It means less hunting around the code to find what the parameters actually are, and it also means less work in passing parameters.
Seeing as this is tagged for C++, passing the literals directly allows you to easily switch the function parameters to std::string with little effort.
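The options discussed in these answers can be sketched side by side. The body of Extract is not shown in the question, so the one below is a stand-in invented for illustration (it returns the text after the first occurrence of the pattern). kKeyValueSeparator and ExtractFromString are likewise hypothetical names.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Stand-in body (an assumption, the original implementation is not shown):
// returns the text following the first occurrence of `pattern` in `input`,
// or an empty string if the pattern does not occur.
std::string Extract(const char* pattern, const char* input) {
    assert(pattern != nullptr && input != nullptr);
    const char* hit = std::strstr(input, pattern);
    return hit ? std::string(hit + std::strlen(pattern)) : std::string();
}

// A named constant for a pattern that is reused or would otherwise be
// cryptic at the call site.
constexpr const char* kKeyValueSeparator = "=";

// The same operation with std::string parameters. Call sites that pass
// string literals compile unchanged against this signature.
std::string ExtractFromString(const std::string& pattern,
                              const std::string& input) {
    return Extract(pattern.c_str(), input.c_str());
}
```

Calls such as Extract("=", "key=value"), Extract(kKeyValueSeparator, "key=value") and ExtractFromString("=", "key=value") all behave the same way here. No array copies or new/delete are needed for any of them.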