I am writing a lot of parser code where string_view excels, and have grown fond of the type. I recently read Arthur O'Dwyer's article "std::string_view is a borrow type", where he concludes that string_view (and other 'borrow types') are fine to use as long as they "... appear only as function parameters and for-loop control variables." (with a couple of exceptions).
However, I have lately started to use string_view as the return type for functions that convert an enum to a string (which I use a lot), like this (Compiler Explorer):
#include <iostream>
#include <string>
#include <string_view>
#include <array>
#include <algorithm>
#include <utility>
enum class Color
{
red, green, blue, yellow,
last // Must be kept last
};
constexpr std::string_view toString(Color color);
// The rest would normally be in a .cpp file
using cts = std::pair<Color, std::string_view>;
constexpr std::array colorNames = {cts{Color::red, "red color"},
cts{Color::green, "green color"},
cts{Color::blue, "blue color"},
cts{Color::yellow, "yellow color"}};
static_assert(colorNames.size() == static_cast<size_t>(Color::last));
constexpr std::string_view toString(Color color)
{
// Normally calling a library function (which also checks for no match), instead of this:
return std::ranges::find(colorNames, color, &cts::first)->second;
}
int main()
{
auto s1 = toString(Color::green);
auto s2 = toString(Color::blue);
std::cout << s1 << ' ' << s2 << std::endl;
}
The reasons I have for doing it this way are:
By having it stored in an array as string_view, I can make the entire table constexpr.
By returning the string_view directly, there is no need of converting the string representation, so the entire function can be constexpr, or at least avoid creating unnecessary strings even when called with a non-constexpr parameter.
A side effect of having the table constexpr is that I can use static_assert to check that all elements of the enum are in the table, which is really great for catching additions to the enum. I really don't like having to put the 'last' enum value in there, but I don't see a better solution.
So my question is really, is returning the string_view this way unsafe (or UB) in any way, or can I keep on doing this with good conscience?
Alternatively, is there a better (faster/safer) way of solving this general problem of enum-to-string?
Addition: After reading G. Sliepen's very good answer, I'd like to add upon my comment to his answer: I often have the opposite function as well, e.g.:
constexpr Color fromString(std::string_view str)
{
// No-match handling omitted
return std::ranges::find(colorNames, str, &cts::second)->first;
}
In those situations I really do need the translation as a separate table so that it can be used by both functions. But in many other cases, the function containing a switch statement is the simplest and best.
is returning the string_view this way unsafe (or UB) in any way, or can I keep on doing this with good conscience?
No, it is not unsafe: the way you use it is perfectly OK. The string_view returned by your toString function is a view on data with static storage duration, which remains intact until the program terminates.
Alternatively, is there a better (faster/safer) way of solving this general problem of enum-to-string?
You could make a constexpr function with a switch-statement inside it, like so:
constexpr std::string_view toString(Color color)
{
switch (color) {
case Color::red: return "red";
case Color::green: return "green";
...
}
}
There should be no difference in efficiency if the function is evaluated at compile-time. But the compiler can check if you added case-statements for all the possible Colors, and if not it will give a warning. There's also no need for a Color::last this way.
Keeping both the enum and the std::array or switch-statement in sync can be annoying, especially if you have lots of enumeration values. X macros might help here.
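As a rough sketch of the X-macro idea, assuming C++17: the list of enumerators and their texts is written once and expanded twice. COLOR_LIST and X are made-up macro names, not something from the question:

```cpp
#include <string_view>

// Hypothetical list macro: each entry is X(enumerator, display text).
#define COLOR_LIST(X)        \
    X(red,    "red color")   \
    X(green,  "green color") \
    X(blue,   "blue color")  \
    X(yellow, "yellow color")

// First expansion: the enumerators.
enum class Color {
#define X(name, text) name,
    COLOR_LIST(X)
#undef X
};

// Second expansion: one case label per enumerator.
constexpr std::string_view toString(Color color)
{
    switch (color) {
#define X(name, text) case Color::name: return text;
        COLOR_LIST(X)
#undef X
    }
    return "unknown"; // only reached for values outside the list
}

static_assert(toString(Color::green) == "green color");
```

The enum and the switch cannot drift apart this way, at the cost of code that is harder to read and to navigate in an IDE.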
I'm reading the documentation of std::experimental::optional and I have a good idea about what it does, but I don't understand when I should use it or how I should use it. The site doesn't contain any examples as of yet which leaves it harder for me to grasp the true concept of this object. When is std::optional a good choice to use, and how does it compensate for what was not found in the previous Standard (C++11).
The simplest example I can think of:
std::optional<int> try_parse_int(std::string s)
{
//try to parse an int from the given string,
//and return "nothing" if you fail
}
The same thing might be accomplished with a reference argument instead (as in the following signature), but using std::optional makes the signature and usage nicer.
bool try_parse_int(std::string s, int& i);
Another way that this could be done is especially bad:
int* try_parse_int(std::string s); //return nullptr if fail
This requires dynamic memory allocation, worrying about ownership, etc. - always prefer one of the other two signatures above.
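For illustration, one way the optional-returning version could be implemented, assuming C++17 and std::from_chars. Taking a std::string_view instead of a std::string is my choice here, not part of the signature above:

```cpp
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Parses the whole of `s` as a base-10 int; an empty optional means failure.
std::optional<int> try_parse_int(std::string_view s)
{
    int value = 0;
    const char* const first = s.data();
    const char* const last = s.data() + s.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) {
        return std::nullopt; // not a number, out of range, or trailing characters
    }
    return value;
}
```

Note that std::from_chars accepts a leading minus sign but neither leading whitespace nor a leading plus sign, so this sketch rejects inputs such as " 42" and "+42".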
Another example:
class Contact
{
std::optional<std::string> home_phone;
std::optional<std::string> work_phone;
std::optional<std::string> mobile_phone;
};
This is extremely preferable to instead having something like a std::unique_ptr<std::string> for each phone number! std::optional gives you data locality, which is great for performance.
Another example:
template<typename Key, typename Value>
class Lookup
{
std::optional<Value> get(Key key);
};
If the lookup doesn't have a certain key in it, then we can simply return "no value."
I can use it like this:
Lookup<std::string, std::string> location_lookup;
std::string location = location_lookup.get("waldo").value_or("unknown");
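A sketch of how such a Lookup might be implemented, assuming it is backed by a std::map. The set member and the entries_ name are invented for the example:

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Illustrative only: backs the Lookup interface with a std::map.
template<typename Key, typename Value>
class Lookup
{
public:
    // Hypothetical helper so the example has something to look up.
    void set(Key key, Value value)
    {
        entries_.insert_or_assign(std::move(key), std::move(value));
    }

    // Returns a copy of the stored value, or an empty optional if the key is absent.
    std::optional<Value> get(const Key& key) const
    {
        const auto it = entries_.find(key);
        if (it == entries_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<Key, Value> entries_;
};
```

With this in place, get("waldo").value_or("unknown") yields "unknown" whenever no entry for "waldo" was stored.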
Another example:
std::vector<std::pair<std::string, double>> search(
std::string query,
std::optional<int> max_count,
std::optional<double> min_match_score);
This makes a lot more sense than, say, having four function overloads that take every possible combination of max_count (or not) and min_match_score (or not)!
It also eliminates the accursed "Pass -1 for max_count if you don't want a limit" or "Pass std::numeric_limits<double>::min() for min_match_score if you don't want a minimum score"!
Another example:
std::optional<int> find_in_string(std::string s, std::string query);
If the query string isn't in s, I want "no int" -- not whatever special value someone decided to use for this purpose (-1?).
For additional examples, you could look at the boost::optional documentation. boost::optional and std::optional will basically be identical in terms of behavior and usage.
An example is quoted from New adopted paper: N3672, std::optional:
optional<int> str2int(string); // converts string to int if possible
int get_int_from_user()
{
string s;
for (;;) {
cin >> s;
optional<int> o = str2int(s); // 'o' may or may not contain an int
if (o) { // does optional contain a value?
return *o; // use the value
}
}
}
but I don't understand when I should use it or how I should use it.
Consider when you are writing an API and you want to express that "not having a return value" is not an error. For example, you need to read data from a socket, and when a data block is complete, you parse it and return it:
class YourBlock { /* block header, format, whatever else */ };
std::optional<YourBlock> cache_and_get_block(
some_socket_object& socket);
If the appended data completed a parsable block, you can process it; otherwise, keep reading and appending data:
void your_client_code(some_socket_object& socket)
{
char raw_data[1024]; // max 1024 bytes of raw data (for example)
while(socket.read(raw_data, 1024))
{
if(auto block = cache_and_get_block(raw_data))
{
// process *block here
// then return or break
}
// else [ no error; just keep reading and appending ]
}
}
Edit: regarding the rest of your questions:
When is std::optional a good choice to use
When you compute a value and need to return it, it makes for better semantics to return by value than to take a reference to an output value (that may not be generated).
When you want to ensure that client code has to check the output value (whoever writes the client code may not check for errors: if you attempt to use an uninitialized pointer you get a core dump at best; if you call value() on an empty std::optional, you get a catchable std::bad_optional_access exception, although dereferencing an empty optional with * or -> is still undefined behavior).
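To make the checked access concrete, here is a small sketch, assuming C++17. The function name describe is invented for the example:

```cpp
#include <optional>
#include <string>

// Hypothetical helper: describes what is stored in an optional<int>.
std::string describe(const std::optional<int>& maybe)
{
    // Note: *maybe performs no check; only value() reports an empty optional.
    try {
        // value() throws std::bad_optional_access when the optional is empty.
        return "value: " + std::to_string(maybe.value());
    } catch (const std::bad_optional_access&) {
        return "no value";
    }
}
```

In everyday code, testing the optional with if (maybe) or has_value() before using it is usually preferable to relying on the exception.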
[...] and how does it compensate for what was not found in the previous Standard (C++11).
Previous to C++11, you had to use a different interface for "functions that may not return a value" - either return by pointer and check for NULL, or accept an output parameter and return an error/result code for "not available".
Both impose extra effort and attention from the client implementer to get it right and both are a source of confusion (the first pushing the client implementer to think of an operation as an allocation and requiring client code to implement pointer-handling logic and the second allowing client code to get away with using invalid/uninitialized values).
std::optional nicely takes care of the problems arising with previous solutions.
I often use optionals to represent optional data pulled from configuration files, that is to say where that data (such as with an expected, yet not necessary, element within an XML document) is optionally provided, so that I can explicitly and clearly show if the data was actually present in the XML document. Especially when the data can have a "not set" state, versus an "empty" and a "set" state (fuzzy logic). With an optional, set and not set is clear, also empty would be clear with the value of 0 or null.
This can show how the value of "not set" is not equivalent to "empty". In concept, a pointer to an int (int * p) can show this, where a null (p == 0) is not set, a value of 0 (*p == 0) is set and empty, and any other value (*p != 0) is set to a value.
For a practical example, I have a piece of geometry pulled from an XML document that had a value called render flags, where the geometry can either override the render flags (set), disable the render flags (set to 0), or simply not affect the render flags (not set), an optional would be a clear way to represent this.
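A sketch of that three-state render-flags case, assuming C++17. Geometry, render_flags and effective_render_flags are illustrative names, not taken from any real code base:

```cpp
#include <optional>

// Hypothetical type: the flags are plain bit masks read from the XML document.
struct Geometry
{
    std::optional<unsigned> render_flags; // empty means "not set" in the document
};

// Resolves the flags to render with, given the flags that apply otherwise.
unsigned effective_render_flags(const Geometry& geometry, unsigned inherited)
{
    if (!geometry.render_flags) return inherited; // not set: leave the flags alone
    return *geometry.render_flags;                // set: override (0 disables them)
}
```

The same function body could be written as geometry.render_flags.value_or(inherited); the longer form just spells out the states.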
Clearly a pointer to an int, in this example, can accomplish the goal, or better, a shared pointer, as it can offer a cleaner implementation. However, I would argue it's about code clarity in this case. Is a null always a "not set"? With a pointer, it is not clear, as null literally means not allocated or created, though it could, yet might not necessarily, mean "not set". It is worth pointing out that a pointer must be released, and in good practice set to 0. Like a shared pointer, however, an optional doesn't require explicit cleanup, so there isn't a concern of mixing up the cleanup with the optional having not been set.
I believe it's about code clarity. Clarity reduces the cost of code maintenance, and development. A clear understanding of code intention is incredibly valuable.
Use of a pointer to represent this would require overloading the concept of the pointer. To represent "null" as "not set", typically you might see one or more comments through code to explain this intention. That's not a bad solution instead of an optional, however, I always opt for implicit implementation rather than explicit comments, as comments are not enforceable (such as by compilation). Examples of these implicit items for development (those articles in development that are provided purely to enforce intention) include the various C++ style casts, "const" (especially on member functions), and the "bool" type, to name a few. Arguably you don't really need these code features, so long as everyone obeys intentions or comments.
There are several questions around concerning this topic (e.g. here and here). I am a bit surprised how lengthy the proposed solutions are. Also, I am a bit lazy and would like to avoid maintaining an extra list of strings for my enums.
I came up with the following and I wonder if there is anything fundamentally wrong with my approach...
class WEEKDAY : public std::string{
public:
static const WEEKDAY MONDAY() {return WEEKDAY("MONDAY");}
static const WEEKDAY TUESDAY(){return WEEKDAY("TUESDAY");}
/* ... and so on ... */
private:
WEEKDAY(std::string s):std::string(s){};
};
Still I have to type the name/string representation more than once, but at least now it's all on a single line for each possible value, and in total it does not take many more lines than a plain enum. Using these WEEKDAYs looks almost identical to using enums:
bool isAWorkingDay(WEEKDAY w){
if (w == WEEKDAY::MONDAY()){return true;}
/* ... */
return false;
}
and it's straightforward to get the "string representation" (well, in fact it is just a string):
std::cout << WEEKDAY::MONDAY() << std::endl;
I am still relatively new to C++ (not in writing but in understanding ;), so maybe there are things that can be done with enums that cannot be done with such kind of constants.
You could use the preprocessor to avoid duplicating the names:
#define WEEKDAY_FACTORY(DAY) \
static const WEEKDAY DAY() {return WEEKDAY(#DAY);}
WEEKDAY_FACTORY(MONDAY)
WEEKDAY_FACTORY(TUESDAY)
// and so on
Whether the deduplication is worth the obfuscation is a matter of taste. It would be more efficient to use an enumeration rather than a class containing a string in most places; I'd probably do that, and only convert to a string when needed. You could use the preprocessor to help with that in a similar way:
char const * to_string(WEEKDAY w) {
switch (w) {
#define CASE(DAY) case DAY: return #DAY;
CASE(MONDAY)
CASE(TUESDAY)
// and so on
}
return "UNKNOWN";
}
I'm working on an old network engine and the type of package sent over the network is made up of 2 bytes.
This is more or less human readable form, for example "LO" stands for Login.
In the part that reads the data there is an enormous switch, like this:
short sh=(((int)ad.cData[p])<<8)+((int)ad.cData[p+1]);
switch(sh)
{
case CMD('M','D'):
..some code here
break;
where CMD is a define:
#define CMD(a,b) ((a<<8)+b)
I know there are better ways but just to clean up a bit and also to be able to search for the tag (say "LO") more easily (and not search for different types of "'L','O'" or "'L' , 'O'" or the occasional "'L', 'O'" <- spaces make it hard to search) I tried to make a MACRO for the switch so I could use "LO" instead of the define but I just can't get it to compile.
So here is the question: how do you change the #define to a macro that I can use like this instead:
case CMD("MD"):
..some code here
break;
It started out as a little subtask to make life a little bit easier but now I can't get it out of my head, thanks for any help!
Cheers!
[edit] The code works, it's the world that's wrong! I.e., Visual Studio 2010 has a bug concerning this. No wonder I cut my teeth on it.
Macro-based solution
A string-literal is really an instance of char const[N] where N is the length of the string, including the terminating null-byte. With this in mind you can easily access any character within the string-literal by using string-literal[idx] to specify that you'd like to read the character stored at offset idx.
#define CMD(str) ((str[0]<<8)+str[1])
CMD("LO") => (("LO"[0]<<8)+"LO"[1]) => (('L'<<8)+'O')
You should, however, keep in mind that nothing prevents you from using the above macro with a string shorter than two characters. A one-character string silently picks up the terminating null byte as its second character, and with an empty string, str[1] reads past the end of the literal, which is undefined behavior.
RECOMMENDED: C++11, use a constexpr function
You could create a function usable in constant-expressions (and with that, in case-labels), with a parameter of reference to const char[3], which is the "real" type of your string-literal "FO".
constexpr short cmd (char const(&ref)[3]) {
return (ref[0]<<8) + ref[1];
}
int main () {
short data = ...;
switch (data) {
case cmd("LO"):
...
}
}
C++11 and user-defined literals
In C++11 we were granted the possibility to define user-defined literals. This will make your code far easier to maintain and interpret, as well as having it be safer to use:
#include <cstddef>
#include <stdexcept>
constexpr short operator""_cmd (char const * s, std::size_t len) {
return len != 2 ? throw std::invalid_argument ("") : ((s[0]<<8)+s[1]);
}
int main () {
short data = ...;
switch (data) {
case "LO"_cmd:
...
}
}
The value associated with a case-label must be yielded by a constant-expression. It might look as if the above could throw an exception at runtime, but since a case-label requires a constant-expression, the compiler must be able to evaluate "LO"_cmd during translation.
If this is not possible, as in "FOO"_cmd, the compiler will issue a diagnostic saying that the code is ill-formed.
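The compile-time evaluation can be checked with static_assert. This is a sketch under a few assumptions of mine: the literal returns int rather than short to avoid an implicit narrowing, the length parameter is std::size_t as the language requires, and command_name is an invented dispatcher:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Packs a two-character tag into an int; any other length is rejected.
constexpr int operator""_cmd(const char* s, std::size_t len)
{
    return len != 2 ? throw std::invalid_argument("command tags have two characters")
                    : ((s[0] << 8) + s[1]);
}

// Evaluated during translation, so a wrong tag length is a compile error.
static_assert("LO"_cmd == (('L' << 8) + 'O'), "tag packs high byte first");
// static_assert("FOO"_cmd != 0, ""); // would not compile: the throw is not a constant expression

// Hypothetical dispatcher showing the literal used as a case label.
std::string_view command_name(int tag)
{
    switch (tag) {
    case "LO"_cmd: return "login";
    default:       return "unknown";
    }
}
```

Searching the code base for "LO"_cmd then finds every place the tag is handled, which was the original motivation.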
The struct is as follows:
struct padData
{
enum buttonsAndAxes
{
select,
start,
ps
};
};
The object of the struct:
padData pad;
I am accessing this enum as follows:
printf ("\n%d", pad.buttonsAndAxes[0]);
Error:
error: invalid use of ‘enum padData::buttonsAndAxes’
Then, I tried:
printf ("\n%d", pad::buttonsAndAxes[0]);
Error:
error: ‘pad’ is not a class or namespace
Now what? Please guide.
Compiler: gcc version 4.5.0
EDIT 1:
printf ("\nemit: %d", padData::(select)0);
results in:
error: expected unqualified-id before ‘(’ token
My aim is to fetch the word "select" through its value 0. How to achieve that? Also, is the word "select" a string?
The enum values become names in the scope of the class. So you would use padData::select from outside the class, or just select from inside the class.
In C++11 you can qualify the enumerators with the name of the enum, giving padData::buttonsAndAxes::select from the outside and buttonsAndAxes::select from inside.
Printing the name of an enumerator is not easily done in C++, because the names are gone after compilation. You need to set up a table mapping the values to their strings by hand. If you don't supply explicit values like in your example, you can simply use an array:
enum buttonsAndAxes
{
select,
start,
ps
};
const char* buttonsAndAxesNames[] = {
"select",
"start",
"ps"
};
And then you index into that array:
printf("%s", buttonsAndAxesNames[select]);
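Since indexing the array with an arbitrary integer can read out of bounds, a range check is worth adding. A minimal sketch, where buttonName is a made-up helper and the enum is kept inside padData as in the question:

```cpp
#include <cstddef>

struct padData
{
    enum buttonsAndAxes { select, start, ps };
};

// Must list the names in the same order as the enumerators.
const char* const buttonsAndAxesNames[] = { "select", "start", "ps" };

// Hypothetical helper: maps a raw value to its name, or "unknown" if out of range.
const char* buttonName(int value)
{
    const std::size_t count =
        sizeof(buttonsAndAxesNames) / sizeof(buttonsAndAxesNames[0]);
    if (value < 0 || static_cast<std::size_t>(value) >= count) {
        return "unknown";
    }
    return buttonsAndAxesNames[value];
}
```

A call such as printf("%s", buttonName(0)) then gives the word for the value 0 without risking an invalid index.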
If you want some more sophisticated approach, you can find a bunch of tricks in previous questions.
printf ("\n%d", padData::select);
An enum is not an array; it is used without an index.
Enums are mainly used for better readability of code rather than as calculation facilitators. Enumerators are named constants which are assigned the values 0, 1, 2, etc. unless specified otherwise. So you should use them with "::" qualification rather than as an array.
You seem to need a good C++ book.
Enumerations, in C and C++, are a convenient way to:
map an integral value to a "smart" name
group together values that belong together
The syntax is quite simple (in C++03):
enum <enum-name> {
<value-name-0> [= <value-0>],
<value-name-1> [= <value-1>],
...
};
Where:
<enum-name> is the name of the type that is introduced
<value-name-X> is the name of a value of the enum
<value-X> is the value given to the name, and is optional
If no value is given to a name:
if it is the first, it is set to 0
else, it is set to the value of the previous name, + 1
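These numbering rules can be verified at compile time. The enum below is invented purely to illustrate them:

```cpp
// Hypothetical enum mixing implicit and explicit values.
enum Level {
    Low,            // first, no value given: 0
    Medium = 10,    // explicit value
    High,           // previous + 1: 11
    Critical = 20,  // explicit value
    Fatal           // previous + 1: 21
};

static_assert(Low == 0, "the first name defaults to 0");
static_assert(High == 11, "an unvalued name is the previous value + 1");
static_assert(Fatal == 21, "counting resumes after each explicit value");
```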
Here is a small example demonstrating the use of enums:
enum Color {
Blue,
Green,
Red
};
char const* name(Color c) {
switch(c) {
case Blue: return "Blue";
case Green: return "Green";
case Red: return "Red";
}
assert(0 && "Who stored crap in my enum ?");
}
This illustrates a few important points at once:
Color is a type, like a struct type or a class type. It can be typedefed and all.
an enum "value-name" is an integral constant, it can be used as template parameter or in switch cases.
an enum "value-name" is injected in the scope in which the type is declared, and not nested within. (C++11 allows to scope the values with the enum class syntax)
something else entirely could be stored in the enum, while this should not happen in well behaved applications, you can do it through casting...
What is not shown, is that an enum is under the hood a plain integer. The exact underlying type though is determined at the discretion of the compiler. There are a few rules in this choice, that should not matter to you, all you should know is that the type chosen is wide enough to contain all the values of the enum (and possibly signed if required). What it implies is that the type chosen is not necessarily a plain int.
Therefore: printf("%d", Green); is a programming error. It should be printf("%d", (int)Green);.
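If you want to see what the compiler picked, C++11 offers std::underlying_type. A small sketch, with colorValueText as a made-up helper name:

```cpp
#include <string>
#include <type_traits>

enum Color { Blue, Green, Red };

// The exact underlying type is the compiler's choice; query it, do not assume it.
using ColorRep = std::underlying_type<Color>::type;
static_assert(std::is_integral<ColorRep>::value, "enums are stored as integers");

// Hypothetical helper: the explicit cast fixes the type before formatting.
std::string colorValueText(Color c)
{
    return std::to_string(static_cast<int>(c));
}
```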
Another important point, is that enum names do not appear in the final binary. The names are substituted for their values directly, no runtime overhead at all. Debuggers typically retrieve the names from the debug information (if available) and substitute them back in when presenting the information to you.
Below code is used to get a std::string representation from ASCII code.
string Helpers::GetStringFromASCII(const int asciiCode) const
{
return string(1,char(asciiCode));
}
It works well. But in my application, I know the ASCII codes at compile time. So I will be calling it like
string str = GetStringFromASCII(175); // I know 175 at compile time
Question
Is there any way to make the GetStringFromASCII method a template so that the processing happens at compile time and I can avoid calling the function each time at runtime.
Any thoughts?
This kind of template meta programming works well when you're dealing with primitive data types like ints and floats. If you necessarily need a string object, you can't avoid calling the std::string constructor and there's no way that call can happen at compile time. Also, I don't think you can drag the cast to char to compile time either, which, in all, means that templates cannot help you here.
Instead of feeding an int constant to a string conversion function, use a string constant directly:
string str("\xAF"); // 0xAF = 175
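If an owning std::string is not strictly required, the character data itself can be produced at compile time. This is only a sketch, assuming C++17 and that a std::string_view is acceptable to the caller; AsciiLiteral is an invented name:

```cpp
#include <string_view>

// Hypothetical helper: one static, null-terminated buffer per character code.
template <unsigned char Code>
struct AsciiLiteral
{
    static constexpr char chars[2] = {static_cast<char>(Code), '\0'};
    static constexpr std::string_view view{chars, 1};
};

// The text is available in constant expressions; no constructor runs at runtime.
static_assert(AsciiLiteral<65>::view == "A");
static_assert(AsciiLiteral<175>::view.size() == 1);
```

Converting the view to a std::string still allocates or copies at runtime, so this only helps where a view or a const char* is enough.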
By the way, except for heavy performance needs in a loop, trading code readability for some CPU cycles is rarely cost-effective overall.
Why are you even bothering with a helper function?
string s( 1, char(175) );
That's all you need and it's the quickest you're going to get.
How about something like this:
#include <iostream>
#include <string>
using namespace std;
template <int asciiCode>
inline string const &getStringFromASCII()
{
static string s(1,char(asciiCode));
return s;
}
int main(int, char const**) {
cout << getStringFromASCII<65>() << endl;
}
EDIT: returns a ref now