Is there a way to omit the empty string literals ("") in the argument list of the fmt::format function?
I have the below snippet which gives the desired output:
#include <string>
#include <fmt/core.h>
int main( )
{
const std::string application_layer_text_head { fmt::format( "{:*<5}[Application Layer]{:*<51}\n\n", "", "" ) };
fmt::print( "{}", application_layer_text_head );
}
Output:
*****[Application Layer]***************************************************
So instead of writing this: fmt::format( "{:*<5}[Application Layer]{:*<51}\n\n", "", "" ) can we remove the empty literals and write this: fmt::format( "{:*<5}[Application Layer]{:*<51}\n\n" )? I tried it but it failed to compile. Those two literals don't really serve any purpose so I want to find a way to not write them.
Just to clarify, I only want to have 5 * in the beginning and then [Application Layer] and then 51 * and then 2 \n.
Formatting markup is meant for formatting a string with some piece of user-provided data. The particulars of the specialized syntax within formatting can adjust how this formatting works, even inserting characters and the like. But this functionality is meant to be a supplement to the basic act: taking some user-provided object and injecting it into a string.
So no, format doesn't have a mechanism to allow you to avoid providing the user-provided data that is the entire reason format exists in the first place.
It should also be noted that the very meaning of the text after the : in a format specifier is defined based on the type of the object being formatted. The "*<5" means "align to 5 characters using '*' characters to fill in the blanks" only because you provided a string for that particular format parameter. So not only does format not provide a way to do this, it functionally cannot. You have to tell it what type is being used because this is an integral part of processing what "*<5" means.
As noted already, format can't do this. But your worries about string concatenation being expensive are misplaced; repeated application of operator+ is expensive (performs new allocations, copies all existing data and new data, discards old data, over and over), but in-place concatenation with operator+= and append is cheap, especially if you pre-reserve (so you're allocating once up-front and populating, not relying on amortized growth patterns in reallocation to save you). Even without pre-reserve, std::string follows amortized growth patterns, so repeated in-place concatenation is amortized O(1) (per character added), not O(n) in the size of the data so far.
The following should be, essentially by definition, at least as fast as formatting, though at the expense of a larger number of lines of code to perform the pre-reserve to prevent reallocation:
#include <string>
#include <string_view>
#include <fmt/core.h>
using namespace std::literals;
int main( )
{
// Make empty string, and reserve enough space for final form
auto application_layer_text_head = std::string();
application_layer_text_head.reserve(5 + "[Application Layer]"sv.size() + 51 + "\n\n"sv.size());
// append is in-place, returning reference to original string, so it can be chained
// Using string_view literals to avoid need for any temporary runtime allocated strings,
// while still allowing append to use known length concatenation to save scanning for NUL
application_layer_text_head.append(5, '*').append("[Application Layer]"sv).append(51, '*').append("\n\n"sv);
fmt::print("{}", application_layer_text_head);
}
If you were okay with some of the concatenations potentially performing reallocation, and a final move construction to move the resources from the temporary reference to a real string, it simplifies to a one-liner:
const auto application_layer_text_head = std::move(std::string(5, '*').append("[Application Layer]"sv).append(51, '*').append("\n\n"sv));
or, given that 5 asterisks is short enough to type, the even shorter/simpler version:
const auto application_layer_text_head = std::move("*****[Application Layer]"s.append(51, '*').append("\n\n"sv));
But keeping it to a two-liner avoids the move construction and is a little safer:
auto application_layer_text_head = "*****[Application Layer]"s;
application_layer_text_head.append(51, '*').append("\n\n"sv);
Yeah, none of those are quite as pretty as a single format literal, even with "ugly" empty placeholders. If you prefer the look of the format string, just pass along the empty placeholders the way you're already doing.
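If the empty placeholders are the only objection, one option is to hide the concatenation behind a small helper. The sketch below is only an illustration: make_banner is a hypothetical name, not part of fmt or the standard library, and it assumes C++17 for std::string_view.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of fmt): builds
// "<left fill chars><title><right fill chars>\n\n" with one up-front reserve.
std::string make_banner(std::string_view title, std::size_t left,
                        std::size_t right, char fill = '*')
{
    std::string out;
    out.reserve(left + title.size() + right + 2);  // +2 for the two newlines
    // append works in place and returns a reference, so the calls chain
    out.append(left, fill).append(title).append(right, fill).append("\n\n");
    return out;
}
```

With this, the call site becomes make_banner("[Application Layer]", 5, 51), and the result can be passed to fmt::print("{}", ...) as before.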
I have a big read-only string that I scan for syntax and based on that simple syntax I extract a bunch of smaller strings that I use later for further processing. Based on testing, creating and copying most of the big string into the small strings is kind of a performance bottleneck (there are thousands of them per each big string).
I figured that I don't actually need to allocate memory for the data or copy it, though. What I really need is a sort of string snippet type that only stores a pointer to the start of the relevant data and its length, but that is otherwise a drop-in replacement for std::string and all the standard library interactions it has.
That would be the easiest to implement anyways, I could roll my own class for that and implement the functions I need but if there is already something like it in the standard library then why bother.
So basically, is there a substring sort of class in STL?
Yes, since C++17 you have std::string_view.
Example:
#include <iostream>
#include <string>
#include <string_view>
int main() {
std::string foo = "Hello world";
std::string_view a(foo.c_str(), 5);
std::string_view b(foo.c_str() + 6, 5);
std::cout << a << '\n' // prints Hello
<< b << '\n'; // prints world
}
This is where using std::string_view instead of std::string is very beneficial in reducing copies of those original strings and being able to use std::string_view::substr.
Instead of copying the strings you are operating on, a string view provides a view to the underlying string - pretty much just the pointer to the start of the string and the size of it.
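As a rough sketch of how this applies to the scanning use case in the question, the function below cuts a large string into pieces without copying any characters. The name split_views and the single-character delimiter are assumptions made for illustration; the real syntax being scanned is not shown in the question.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: splits text at every delim and returns views into
// the original buffer. No character data is allocated or copied.
std::vector<std::string_view> split_views(std::string_view text, char delim)
{
    std::vector<std::string_view> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = text.find(delim, start);
        if (end == std::string_view::npos) {
            parts.push_back(text.substr(start));  // last piece, may be empty
            break;
        }
        parts.push_back(text.substr(start, end - start));
        start = end + 1;  // skip past the delimiter
    }
    return parts;
}
```

Each returned view is only valid while the big string it points into is alive and unmodified, which matches the read-only situation described in the question.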
The Quick C++ Benchmarks example:
static void StringCopyFromLiteral(benchmark::State& state) {
// Code inside this loop is measured repeatedly
for (auto _ : state) {
std::string from_literal("hello");
// Make sure the variable is not optimized away by compiler
benchmark::DoNotOptimize(from_literal);
}
}
// Register the function as a benchmark
BENCHMARK(StringCopyFromLiteral);
static void StringCopyFromString(benchmark::State& state) {
// Code before the loop is not measured
std::string x = "hello";
for (auto _ : state) {
std::string from_string(x);
}
}
// Register the function as a benchmark
BENCHMARK(StringCopyFromString);
http://quick-bench.com/IcZllt_14hTeMaB_sBZ0CQ8x2Ro
What if I understand assembly...
More results:
http://quick-bench.com/39fLTvRdpR5zdapKSj2ZzE3asCI
The answer is simple. In the case where you construct an std::string from a small string literal, the compiler optimizes this case by directly populating the contents of the string object using constants in assembly. This avoids expensive looping as well as tests to see whether small string optimization (SSO) can be applied. In this case it knows SSO can be applied so the code the compiler generates simply involves writing the string directly into the SSO buffer.
Note this assembly code in the StringCreation case:
// Populate SSO buffer (each set of 4 characters is backwards since
// x86 is little-endian)
19.63% movb $0x6f,0x4(%r15) // "o"
19.35% movl $0x6c6c6568,(%r15) // "lleh"
// Set size
20.26% movq $0x5,0x10(%rsp) // size = 5
// Probably set heap pointer. 0 (nullptr) = use SSO buffer
20.07% movb $0x0,0x1d(%rsp)
You're looking at the constant values right there. That's not very much code, and no loop is required. In fact, the std::string constructor doesn't even have to be invoked! The compiler is just putting stuff in memory in the same places where the std::string constructor would.
If the compiler cannot apply this optimization, the results are quite different -- in particular, if we "hide" the fact that the source is a string literal by first copying the literal into a char array, the results flip:
char x[] = "hello";
for (auto _ : state) {
std::string created_string(x);
benchmark::DoNotOptimize(created_string);
}
Now the "from-char-pointer" case takes twice as long! Why?
I suspect that this is because the "copy from char pointer" case cannot simply check how long the string is by looking at a value. It needs to know whether small string optimization can be performed. There are a few ways it could go about this:
Measure the length of the string first, make an allocation (if needed), then copy the source to the destination. In the case where SSO does apply (it almost certainly does here) I'd expect this to take twice as long since it has to walk the source twice -- once to measure, once to copy.
Copy from the source character-by-character, appending to the new string. This requires testing on each append operation whether the string is now too long for SSO and needs to be copied into a heap-allocated char array. If the string is currently in a heap-allocated array, it needs to instead test if the allocation needs to be resized. This would also take quite a bit longer since there is at least one test for each character in the source string.
Copy from the source in chunks to lower the number of tests that need to be performed and to avoid walking the source twice. This would be faster than the character-by-character approach both because the number of tests would be lower and, because the source is not being walked twice, the CPU memory cache is going to be more effective. This would only show significant speed improvements for long strings, which we don't have here. For short strings it would work about the same as the first approach (measure, then copy).
Contrast this to the case when it's copying from another string object: it can simply look at the size() of the other string and immediately know whether it can perform SSO, and if it can't perform SSO then it also knows exactly how much memory to allocate for the new string.
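The difference in available information can be seen in the constructor signatures themselves. The sketch below only illustrates which overload is involved in each case; the wrapper names are hypothetical, and it makes no claim about how any particular standard library implements them or how fast they are.

```cpp
#include <cstddef>
#include <string>

// Length unknown: the constructor has to find the NUL terminator itself
// before it can decide between the SSO buffer and a heap allocation.
std::string from_pointer(const char* text)
{
    return std::string(text);
}

// Length known up front (as it is when copying from another std::string):
// the constructor can pick SSO or allocate the exact size immediately.
std::string from_pointer_and_length(const char* text, std::size_t length)
{
    return std::string(text, length);
}
```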
This is an old problem that I have run into in the past, so I thought I'd get clarification once and for all. There are many standard C library functions that deal only with C-style strings. For example, my current implementation looks like this:
std::string TimeStamp (const time_t seconds) // version-1
{
auto tm = *std::localtime(&seconds); // <ctime>
char readable[30] = {};
std::strftime(&readable[0], sizeof(readable) - 1, "%Y-%h-%d %H:%M:%S:", &tm);
return readable;
}
The above works as expected. But as you can see, readable is copied from a stack array into a std::string. This function is called very frequently, for logging and other purposes.
Hence, I converted it to following:
std::string TimeStamp (const time_t seconds) // version-2
{
auto tm = *std::localtime(&seconds); // <ctime>
std::string readable(30,0);
std::strftime(&readable[0], readable.length(), "%Y-%h-%d %H:%M:%S:", &tm);
return readable;
}
At the unit-test level, it seems to work. But in the overall logging of my much larger code, it somehow gets messed up: a newline character appears after this output, and many of the output strings produced outside this function are not printed. The issue happens only when "version-1" is changed to "version-2".
Even the following modification doesn't help:
readable.resize(1 + std::strftime(&readable[0], readable.length(), "%Y-%h-%d %H:%M:%S:", &tm));
Is there anything wrong in my code? What is the correct way of directly using std::string in the C-style string functions?
Your first function is correct. There is no point mucking around with the troublesome details in the second function because even once you get it right, it is no improvement on the first function.
In fact it might even perform worse, because of the need to over-allocate the string and resize it down. For example perhaps the size 30 exceeds the size for Small String Optimization but the actual length of data doesn't.
std::string can have \0 in it.
so
std::string s1 = "ab\0\0cd"; // s1 contains "ab" -> size = 2
std::string s2{"ab\0\0cd", 6}; // s2 contains "ab\0\0cd" -> size = 6
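Your first snippet uses the first form (construction from a C string, which stops at the first \0), whereas your second snippet behaves like the second form: it creates a string of size 30 filled with \0.
So you have to resize the string to the number of characters actually written, to avoid trailing \0 characters.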
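A minimal sketch of that fix, keeping the question's format string and buffer size: std::strftime returns the number of characters written, not counting the terminating \0, so that value can be passed to resize directly (without the "1 +" from the question, which keeps one \0 in the string). The null check on std::localtime is an addition for safety, not something from the original code.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

std::string TimeStamp(const std::time_t seconds)
{
    // std::localtime uses shared static storage and is not thread-safe
    const std::tm* tm = std::localtime(&seconds);
    if (tm == nullptr) {
        return std::string();
    }
    std::string readable(30, '\0');
    // Returns the count of characters written, excluding the NUL,
    // or 0 if the result did not fit in the buffer.
    const std::size_t written =
        std::strftime(&readable[0], readable.size(), "%Y-%h-%d %H:%M:%S:", tm);
    readable.resize(written);  // drop the terminator and the unused padding
    return readable;
}
```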
Edit: Solutions must compile against Microsoft Visual Studio 2012.
I want to use a known string length to declare another string of the same length.
The reasoning is the second string will act as a container for operation done to the first string which must be non volatile with regards to it.
e.g.
const string messy = "a bunch of letters";
string dostuff(string sentence) {
string organised NNN????? // Idk, just needs the same size.
for ( x = 0; x < NNN?; x++) {
organised[x] = sentence[x]++; // Doesn't matter what this does.
}
}
In both cases above, the declaration and the exit condition, the NNN? stands for the length of 'messy'.
How do I discover the length at compile time?
std::string has two constructors which could fit your purposes.
The first, a copy constructor:
string organised(sentence);
The second, a constructor which takes a character and a count. You could initialize a string with a temporary character.
string organised(sentence.length(), '_');
Alternatively, you can:
Use an empty string and append (+=) text to it as you go along, or
Use a std::stringstream for the same purpose.
The stringstream will likely be more efficient.
Overall, I would prefer the copy constructor if the length is known.
std::string isn't a compile time type (it can't be a constexpr), so you can't use it directly to determine the length at compile time.
You could initialize a constexpr char[] and then use sizeof on that:
constexpr char messychar[] = "a bunch of letters";
// - 1 to avoid including NUL terminator which std::string doesn't care about
constexpr size_t messylen = sizeof(messychar) / sizeof(messychar[0]) - 1;
const string messy(messychar);
and use that, but frankly, that's pretty ugly; the length would be compile time, but organised would need to use the count and char constructor, which would still run on each call, allocating and initializing only to have the contents replaced in the loop.
While it's not compile time, you'd avoid that initialization cost by just using reserve and += to build the new string, which with the constexpr length could be done in an ugly but likely efficient way as:
constexpr char messychar[] = "a bunch of letters";
constexpr size_t messylen = sizeof(messychar) / sizeof(messychar[0]) - 1;
// messy itself may not be needed, but if it is, it's initialized optimally
// by using the compile time calculated length, so there is no need to scan for
// NUL terminators, and it can reserve the necessary space in the initial alloc
const string messy(messychar, messylen);
string dostuff(string sentence) {
string organised;
organised.reserve(messylen);
for (size_t x = 0; x < messylen; x++) {
organised += sentence[x]++; // Doesn't matter what this does.
}
return organised; // hand the built string back to the caller
}
This avoids setting organised's values more than once, allocating more than once (well, possibly twice if initial construction performs it) per call, and only performs a single read/write pass of sentence, no full read followed by read/write or the like. It also makes the loop constraint a compile time value, so the compiler has the opportunity to unroll the loop (though there is no guarantee of this, and even if it happens, it may not be helpful).
Also note: In your example, you mutate sentence, but it's accepted by value, so you're mutating the local copy, not the caller copy. If mutation of the caller value is required, accept it by reference, and if mutation is not required, accept by const reference to avoid a copy on every call (I understand the example code was filler, just mentioning this).
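A sketch of the const-reference variant, written without constexpr so that it does not rely on compiler features the question's toolchain may lack. The "+ 1" on each character is a stand-in for the real operation, which the question leaves unspecified, and the length is taken from the argument at run time rather than from messy.

```cpp
#include <cstddef>
#include <string>

// Takes the input by const reference: no copy is made on each call and
// the caller's string cannot be modified.
std::string dostuff(const std::string& sentence)
{
    std::string organised;
    organised.reserve(sentence.size());  // one allocation up front
    for (std::size_t x = 0; x < sentence.size(); ++x) {
        // Placeholder operation (assumption): shift each character by one
        organised += static_cast<char>(sentence[x] + 1);
    }
    return organised;
}
```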
Just had an interesting argument in the comment to one of my questions. My opponent claims that the statement "" does not contain "" is wrong.
My reasoning is that if "" contained another "", that one would also contain "" and so on.
Who is wrong?
P.S.
I am talking about a std::string
P.S. P.S
I was not talking about substrings, but even if I add to my question " as a substring", it still makes no sense. An empty substring is nonsense. If you allow empty substrings to be contained in strings, that means you have an infinity of empty substrings. What is the point of that?
Edit:
Am I the only one that thinks there's something wrong with the function std::string::find?
C++ reference clearly says
Return Value: The position of the first character of the first match.
Ok, let's assume it makes sense for a minute and run this code:
string empty1 = "";
string empty2 = "";
int position = empty1.find(empty2);
cout << "found \"\" at index " << position << endl;
The output is: found "" at index 0
Nonsense part: how can there be index 0 in a string of length 0? It is nonsense.
To be able to even have a 0th position, the string must be at least 1 character long.
And C++ throws an exception in this case, which proves my point:
cout << empty2.at( empty1.find(empty2) ) << endl;
If it really contained an empty string, it would have had no problem printing it out.
It depends on what you mean by "contains".
The empty string is a substring of the empty string, and so is contained in that sense.
On the other hand, if you consider a string as a collection of characters, the empty string can't contain the empty string, because its elements are characters, not strings.
Relating to sets, the set
{2}
is a subset of the set
A = {1, 2, 3}
but {2} is not a member of A - all A's members are numbers, not sets.
In the same way, {} is a subset of {}, but {} is not an element in {} (it can't be because it's empty).
So you're both right.
C++ agrees with your "opponent":
#include <iostream>
#include <string>
using namespace std;
int main()
{
bool contains = string("").find(string("")) != string::npos;
cout << "\"\" contains \"\": "
<< boolalpha << contains;
}
Output: "" contains "": true
It's easy. String A contains sub-string B if there is an argument offset such that A.substr(offset, B.size()) == B. No special cases for empty strings needed.
So, let's see. std::string("").substr(0,0) turns out to be std::string(""). And we can even check your "counter-example". std::string("").substr(0,0).substr(0,0) is also well-defined and empty. Turtles all the way down.
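That definition translates almost word for word into code. The function below is a deliberately naive sketch of it (the name contains is chosen here for illustration, and real code would simply call find); note that it has no branch for empty strings.

```cpp
#include <cstddef>
#include <string>

// A contains B if some offset exists with A.substr(offset, B.size()) == B.
bool contains(const std::string& a, const std::string& b)
{
    // offset == a.size() is still a valid argument to substr and yields ""
    for (std::size_t offset = 0; offset <= a.size(); ++offset) {
        if (a.substr(offset, b.size()) == b) {
            return true;
        }
    }
    return false;
}
```

Under this definition contains("", "") is true, because the only offset tried is 0 and substr(0, 0) of an empty string is empty.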
The first thing that is unclear is whether you are talking about std::string or null-terminated C strings; the second is why it should matter. I will assume std::string.
The requirements on std::string determine how the component must behave, not what its internal representation must be (although some of the requirements affect the internal representation). As long as the requirements for the component are met, whether it holds something internally is an implementation detail that you might not even be able to test.
In the particular case of an empty string, there is nothing that mandates that it holds anything. It could just hold a size member set to 0 and a pointer (for the dynamically allocated memory if/when not empty) also set to 0. The requirement in operator[] requires that it returns a reference to a character with value 0, but since that character cannot be modified without causing undefined behavior, and since strict aliasing rules allow reading from an lvalue of char type, the implementation could just return a reference to one of the bytes in the size member (all set to 0) in the case of an empty string.
Some implementations of std::string use small object optimizations, in those implementations there will be memory reserved for small strings, including an empty string. While the std::string will obviously not contain a std::string internally, it might contain the sequence of characters that compose an empty string (i.e. a terminating null character)
empty string doesn't contain anything - it's EMPTY. :)
Of course an empty string does not contain an empty string. It'll be turtles all the way down if it did.
Take string empty = ""; that declares a string initialized from an empty literal. If you want a string whose contents represent an empty string literal, you would need string representsEmpty = """"; but of course you need to escape the quotes, giving you string actuallyRepresentsEmpty = "\"\"";
ps, I am taking a pragmatic approach to this. Leave the maths nonsense at the door.
Thinking about your amendment, it is possible that what your 'opponent' meant was that an 'empty' std::string still has internal storage for characters, which itself holds no characters. That would be an implementation detail, I am sure; it could perhaps keep an array of a certain size (say 10 characters) 'just in case', so it would technically not be empty.
Of course, there is the trick question answer that 'nothing' fits into anything infinite times, a sort of 'divide by zero' situation.
Today I had the same question since I'm currently bound to a lousy STL implementation (dating back to the pre-C++98 era) that differs from C++98 and all following standards:
TEST_ASSERT(std::string().find(std::string()) == std::string::npos); // WRONG!!! (non-standard)
This is especially bad if you try to write portable code because it's so hard to prove that no feature depends on that behaviour. Sadly in my case that's actually true: it does string processing to shorten phone numbers input depending on a subscriber line spec.
On Cppreference, I see in std::basic_string::find an explicit description about empty strings that I think matches exactly the case in question:
an empty substring is found at pos if and only if pos <= size()
The referred pos defines the position where to start the search, it defaults to 0 (the beginning).
A standard-compliant C++ Standard Library will pass the following tests:
TEST_ASSERT(std::string().find(std::string()) == 0);
TEST_ASSERT(std::string().substr(0, 0).empty());
TEST_ASSERT(std::string().substr().empty());
This interpretation of "contain" answers the question with yes.
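The "pos <= size()" rule quoted from Cppreference can also be checked for non-empty strings and non-zero start positions. The wrapper below is a hypothetical helper added only to make the rule easy to try out.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: searches text for the empty string starting at pos.
// Per the rule above, the result is pos when pos <= text.size(),
// and std::string::npos otherwise.
std::size_t find_empty_from(const std::string& text, std::size_t pos)
{
    return text.find(std::string(), pos);
}
```

For "abc" this yields 0, 1, 2 and 3 for the start positions 0 to 3, and std::string::npos for start position 4.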