I am writing a Python library in C++ using the Python C API. It has about 25 functions that each accept two strings. Python may store a string with 1, 2, or 4 bytes per character (the moment one character requires a bigger size, the whole string uses the bigger size). When checking which kind a string is, you get an enum value between 0 and 4: 0 and 4 should be handled as 32-bit characters, 1 as 8-bit characters, and 2 as 16-bit characters. So I currently have a nested switch for each combination:
The following example shows how the elements are handled in my code. random_func is different for each of my functions and is a template that accepts a string_view of any character type. Writing the code this way results in about 100 lines of boilerplate for each function that accepts two strings.
Is there a way to handle all these cases without this immense code duplication and without sacrificing performance?
double result = 0;
Py_ssize_t len_s1 = PyUnicode_GET_LENGTH(py_s1);
void* s1 = PyUnicode_DATA(py_s1);
Py_ssize_t len_s2 = PyUnicode_GET_LENGTH(py_s2);
void* s2 = PyUnicode_DATA(py_s2);
int s1_kind = PyUnicode_KIND(py_s1);
int s2_kind = PyUnicode_KIND(py_s2);
switch (s1_kind) {
case PyUnicode_1BYTE_KIND:
switch (s2_kind) {
case PyUnicode_1BYTE_KIND:
result = random_func(
basic_string_view<char>(static_cast<char*>(s1), len_s1),
basic_string_view<char>(static_cast<char*>(s2), len_s2));
break;
case PyUnicode_2BYTE_KIND:
result = random_func(
basic_string_view<char>(static_cast<char*>(s1), len_s1),
basic_string_view<char16_t>(static_cast<char16_t*>(s2), len_s2));
break;
default:
result = random_func(
basic_string_view<char>(static_cast<char*>(s1), len_s1),
basic_string_view<char32_t>(static_cast<char32_t*>(s2), len_s2));
break;
}
break;
case PyUnicode_2BYTE_KIND:
switch (s2_kind) {
case PyUnicode_1BYTE_KIND:
result = random_func(
basic_string_view<char16_t>(static_cast<char16_t*>(s1), len_s1),
basic_string_view<char>(static_cast<char*>(s2), len_s2));
break;
case PyUnicode_2BYTE_KIND:
result = random_func(
basic_string_view<char16_t>(static_cast<char16_t*>(s1), len_s1),
basic_string_view<char16_t>(static_cast<char16_t*>(s2), len_s2));
break;
default:
result = random_func(
basic_string_view<char16_t>(static_cast<char16_t*>(s1), len_s1),
basic_string_view<char32_t>(static_cast<char32_t*>(s2), len_s2));
break;
}
break;
default:
switch (s2_kind) {
case PyUnicode_1BYTE_KIND:
result = random_func(
basic_string_view<char32_t>(static_cast<char32_t*>(s1), len_s1),
basic_string_view<char>(static_cast<char*>(s2), len_s2));
break;
case PyUnicode_2BYTE_KIND:
result = random_func(
basic_string_view<char32_t>(static_cast<char32_t*>(s1), len_s1),
basic_string_view<char16_t>(static_cast<char16_t*>(s2), len_s2));
break;
default:
result = random_func(
basic_string_view<char32_t>(static_cast<char32_t*>(s1), len_s1),
basic_string_view<char32_t>(static_cast<char32_t*>(s2), len_s2));
break;
}
break;
}
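Put the complexity away in a function using variants:
using python_string_view = std::variant<std::basic_string_view<char>,
                                        std::basic_string_view<char16_t>,
                                        std::basic_string_view<char32_t>>;
python_string_view decode_python_string(python_string py_str)
{
    Py_ssize_t len_s = PyUnicode_GET_LENGTH(py_str);
    void* s = PyUnicode_DATA(py_str);
    int s_kind = PyUnicode_KIND(py_str);
    // Same mapping as in the question: 1-byte, 2-byte, everything else is 4-byte.
    switch (s_kind) {
    case PyUnicode_1BYTE_KIND:
        return std::basic_string_view<char>(static_cast<char*>(s), len_s);
    case PyUnicode_2BYTE_KIND:
        return std::basic_string_view<char16_t>(static_cast<char16_t*>(s), len_s);
    default:
        return std::basic_string_view<char32_t>(static_cast<char32_t*>(s), len_s);
    }
}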
int main()
{
python_string s1 = ..., s2 = ...;
auto v1 = decode_python_string(s1);
auto v2 = decode_python_string(s2);
std::visit([](auto&& val1, auto&& val2) {
random_func(val1, val2);
}, v1, v2);
}
I'm unsure about the performance though.
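To make the idea concrete without the Python headers, here is a minimal, self-contained sketch of the same double dispatch. Everything in it is an assumption for illustration: Kind stands in for the Python kind constants, make_view for decode_python_string, and count_equal_positions for one of the random_func templates. It needs C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <variant>

// Hypothetical stand-in for Python's string kinds (1, 2, or 4 bytes per character).
enum class Kind { OneByte = 1, TwoByte = 2, FourByte = 4 };

using any_string_view = std::variant<std::basic_string_view<char>,
                                     std::basic_string_view<char16_t>,
                                     std::basic_string_view<char32_t>>;

// Wrap a raw buffer in the string_view type that matches its kind.
inline any_string_view make_view(const void* data, std::size_t len, Kind kind)
{
    switch (kind) {
    case Kind::OneByte:
        return std::basic_string_view<char>(static_cast<const char*>(data), len);
    case Kind::TwoByte:
        return std::basic_string_view<char16_t>(static_cast<const char16_t*>(data), len);
    default:
        return std::basic_string_view<char32_t>(static_cast<const char32_t*>(data), len);
    }
}

// Convert any of the three character types to a numeric code point.
template <typename C>
char32_t code_point(C c)
{
    if constexpr (sizeof(C) == 1)
        return static_cast<unsigned char>(c);  // avoid sign extension of plain char
    else
        return static_cast<char32_t>(c);
}

// Hypothetical stand-in for random_func: counts positions where both strings agree.
template <typename C1, typename C2>
double count_equal_positions(std::basic_string_view<C1> a, std::basic_string_view<C2> b)
{
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    double matches = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (code_point(a[i]) == code_point(b[i]))
            ++matches;
    return matches;
}

// std::visit instantiates the lambda for all nine type combinations
// and selects the right one at run time.
inline double dispatch_pair(const any_string_view& v1, const any_string_view& v2)
{
    return std::visit([](auto a, auto b) { return count_equal_positions(a, b); }, v1, v2);
}

inline void usage_example()
{
    const char a[] = "abc";
    const char16_t b[] = u"abd";
    // "abc" and "abd" agree in the first two positions.
    assert(dispatch_pair(make_view(a, 3, Kind::OneByte), make_view(b, 3, Kind::TwoByte)) == 2.0);
}
```

The nested switch still exists, but only once: one switch per string in make_view, and the pairing is generated by std::visit. Whether that matches the hand-written switch in speed depends on the compiler and standard library, so it is worth measuring.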
For what it is worth:
The different char types only matter at the moment you extract the character values inside random_func (requiring nine template instantiations, if I am right).
You would be close to a solution by fetching the chars in all cases using the largest type and masking out or shifting out the extra bytes where necessary. Instead of templating, you would pass a suitable mask and a stride. Something like
char32_t c;
for (char* p = (char*)s1; (c = *(char32_t*)p & mask) != 0; p += stride)
{
…
}
Unfortunately, not counting the extra masking operation, you hit a wall because you may have to fetch too many bytes at one end of the string, causing an illegal memory access.
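A variation that avoids the over-read is to select the load width per element instead of always loading four bytes. The sketch below is only an illustration under that assumption: read_unit and sum_units are hypothetical names, and the element width is passed as a plain number. CPython's PyUnicode_READ macro works along similar lines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read the i-th code unit from a buffer whose elements are `width` bytes wide
// (1, 2, or 4). Only the bytes that belong to element i are touched.
inline std::uint32_t read_unit(const void* data, std::size_t width, std::size_t i)
{
    switch (width) {
    case 1:  return static_cast<const std::uint8_t*>(data)[i];
    case 2:  return static_cast<const std::uint16_t*>(data)[i];
    default: return static_cast<const std::uint32_t*>(data)[i];
    }
}

// Hypothetical non-templated routine: sums all code units of a string.
inline std::uint64_t sum_units(const void* data, std::size_t width, std::size_t len)
{
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < len; ++i)
        total += read_unit(data, width, i);
    return total;
}

inline void usage_example()
{
    const std::uint8_t narrow[] = {1, 2, 3};
    const std::uint16_t wide[] = {300, 1};
    assert(sum_units(narrow, 1, 3) == 6);
    assert(sum_units(wide, 2, 2) == 301);
}
```

The price is a branch per character, so this trades the nine instantiations for a slower inner loop; it is not free in the way the templates are.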
I have a switch statement that runs like this
switch (abc) {
case FILE_0:
lf = m_a->olf[0];
kf = m_a->pkf[0];
break;
case FILE_1:
lf = m_a->olf[1];
kf = m_a->pkf[1];
break;
.
.
default:
LOG_ERR << "Wrong type to check";
return 0;
}
This happens about 30 times, so I end up with 30 cases in this single switch.
Is there any way to shorten it in C++11, e.g. using templates?
Your snippet is too short to be sure about the intent, but from what I can see you actually want to convert the symbolic value into an index. (Can I assume this is an enum?)
What I would do is move that code into a separate function:
int fileEnumToIndex(FileEnum file) {  // int rather than auto: return type deduction needs C++14
    switch (file) {
    case FILE_0: return 0;
    case FILE_1: return 1;
    default: __builtin_unreachable();  // GCC/Clang builtin; only safe if no other value can occur
    }
}
Your code then changes to:
auto index = fileEnumToIndex(abc);
lf = m_a->olf[index];
kf = m_a->pkf[index];
If FileEnum is a real enum whose enumerators are numbered consecutively from 0, you can replace the body of fileEnumToIndex with a simple static_cast.
To cover the default case, you could return a std::optional (C++17) and use the std::nullopt case to do some error handling. However, when FileEnum is an actual enum, I would expect the error handling to happen where that value is determined.
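As a rough sketch of the std::optional variant: the enum definition below is an assumption (the question does not show it), including the FILE_COUNT sentinel, and the function takes an int so that an out-of-range value can be represented at all. It requires C++17, so it does not fit a strict C++11 code base.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical enum; the original question does not show its definition.
enum FileEnum { FILE_0, FILE_1, FILE_2, FILE_COUNT };

// Returns the array index for a valid enumerator, or std::nullopt otherwise.
inline std::optional<std::size_t> fileEnumToIndex(int file)
{
    if (file < 0 || file >= FILE_COUNT)
        return std::nullopt;
    return static_cast<std::size_t>(file);
}

inline void usage_example()
{
    assert(fileEnumToIndex(FILE_1).value() == 1);
    assert(!fileEnumToIndex(42).has_value());  // caller logs the error and returns
}
```

The caller then checks the optional once and does its logging there, instead of repeating a default branch in a 30-case switch.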
You can create a map from abc to the index and use that to determine the index.
// somewhere, maybe outside functions
static const std::unordered_map<abc_type, int> table = {
{FILE_0, 0},
{FILE_1, 1},
...
};
// inside function
auto idx_itr = table.find(abc);
if (idx_itr != table.end()) {
lf = m_a->olf[idx_itr->second];
kf = m_a->pkf[idx_itr->second];
} else {
// default case
}
I have this code:
#include <iostream>
using namespace std;
int main() {
int square; char state;
cout<<"Write a number"; cin>>square;
square *= square;
cout<<square;
switch(square) {
case 1: state = 'h';
case 3: state = 'm';
case 7: state = 'j';
case (square > 10): state = 'u'; // I tried this, but it does not work
}
return 0;
}
I would like to know how to put a condition inside a switch in C++.
The expression following case must be a compile-time constant. Hence, you cannot use what you are trying.
Change that to default: and then use if.
default:
if (square > 10)
state = 'u';
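Put together, the advice above could look like the sketch below. The function name state_for and the '?' placeholder for unhandled values are assumptions added for illustration; the break statements are also added, since the switch in the question falls through every case.

```cpp
#include <cassert>

// Maps a squared number to a state character. '?' is a hypothetical
// placeholder for values that no branch handles.
inline char state_for(int square)
{
    char state = '?';
    switch (square) {
    case 1: state = 'h'; break;
    case 3: state = 'm'; break;
    case 7: state = 'j'; break;
    default:
        // A range test cannot be a case label, so it goes here.
        if (square > 10)
            state = 'u';
        break;
    }
    return state;
}

inline void usage_example()
{
    assert(state_for(1) == 'h');
    assert(state_for(16) == 'u');  // 16 > 10, handled in default
}
```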
If you have lots of items you should use a switch. If not, if else is better.
When a switch has many cases, compilers typically implement it with a jump table or a binary search instead of a chain of comparisons, so every case is reached in roughly the same time. In a list of ifs, by contrast, the last item takes longer to reach because every previous condition has to be evaluated first. The exact threshold is up to the compiler.
If you want to use condition or combine some:
switch(square) {
case 1: state = 'h'; break;
case 3: state = 'm';break;
case 7: state = 'j'; break;
case 8:
case 9: state = 'u';
case 10: state = 'z';break;
default: state = 'd';
}
Cases 8, 9 and 10 are combined: if square is 8 or 9, control falls through into case 10, so state ends up as 'z'.
If you don't have a break at the end of a case region, control passes along to the next case label.
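To see the fall-through in action, here is the same switch wrapped in a function so the result can be checked. The function name and the '?' initial value are assumptions; the [[fallthrough]] attribute is C++17 and only documents that the missing break is deliberate.

```cpp
#include <cassert>

inline char state_with_fallthrough(int square)
{
    char state = '?';
    switch (square) {
    case 1: state = 'h'; break;
    case 3: state = 'm'; break;
    case 7: state = 'j'; break;
    case 8:
    case 9:
        state = 'u';
        [[fallthrough]];  // no break: execution continues into case 10
    case 10:
        state = 'z';
        break;
    default:
        state = 'd';
    }
    return state;
}

inline void usage_example()
{
    assert(state_with_fallthrough(8) == 'z');  // 'u' is overwritten by 'z'
    assert(state_with_fallthrough(2) == 'd');
}
```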
I have implemented my own typesafe bitwise enum operators following this article: http://blog.bitwigglers.org/using-enum-classes-as-type-safe-bitmasks/
Here is the enum I am talking about:
enum class OutputStream : unsigned int
{
None = 0,
// Using bitshift operator (always one bit set to 1)
Console = 1 << 0,
File = 1 << 1,
Other = 1 << 2
};
In case you wonder, it's for a logging function.
Problem:
I want to use the enum in a switch statement such as
switch(stream)
{
case OutputStream::Console:
//Do this
case OutputStream::File:
//Do that
default:
break;
}
Note that there shouldn't be a break; in between the case statements since more than one case can be true.
However, this doesn't seem to work. More precisely, when I use OutputStream::Console | OutputStream::File neither case is executed.
My only solution to this problem was this awkward looking if statement:
if((stream & OutputStream::Console) != OutputStream::None) { /*Do this*/ }
if((stream & OutputStream::File) != OutputStream::None) { /*Do that*/ }
But for me, this defeats the point of a neat enum-based solution. What am I doing wrong?
As others said in the comments, switch is not the best way, but it is still possible:
for (unsigned int bit = 1; bit <= (unsigned int) OutputStream::Other; bit <<= 1) // Other is the highest flag
{
    switch ((OutputStream) (bit & (unsigned int) stream))
    {
    case OutputStream::Console:
        //Do this
        break;
    case OutputStream::File:
        //Do that
        break;
    // etc...
    // no default case and no case for 0!
    }
}
So basically you iterate over all individual bits, test for each one whether it is present in the stream variable, and jump to the appropriate case, or jump nowhere if the result is 0.
But in my opinion the individual ifs are better. At least you have better control over the order in which the bits are evaluated.
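The reason the plain switch never fires is that Console | File has the value 3, which equals neither the Console label (1) nor the File label (2); a switch only tests for equality. If the ifs look too noisy, a small helper can hide the masking. The sketch below is an assumption-laden illustration: has_flag and describe are hypothetical names, and operator| is re-declared here only to keep the example self-contained.

```cpp
#include <cassert>
#include <string>

enum class OutputStream : unsigned int {
    None    = 0,
    Console = 1 << 0,
    File    = 1 << 1,
    Other   = 1 << 2
};

constexpr OutputStream operator|(OutputStream a, OutputStream b)
{
    return static_cast<OutputStream>(static_cast<unsigned int>(a) |
                                     static_cast<unsigned int>(b));
}

// Hypothetical helper: true if every bit of `flag` is set in `value`.
constexpr bool has_flag(OutputStream value, OutputStream flag)
{
    return (static_cast<unsigned int>(value) & static_cast<unsigned int>(flag)) ==
           static_cast<unsigned int>(flag);
}

// Stands in for the "Do this" / "Do that" branches: lists the active targets.
inline std::string describe(OutputStream stream)
{
    std::string out;
    if (has_flag(stream, OutputStream::Console)) out += "console ";
    if (has_flag(stream, OutputStream::File))    out += "file ";
    return out;
}

inline void usage_example()
{
    assert(describe(OutputStream::Console | OutputStream::File) == "console file ");
    assert(describe(OutputStream::Other).empty());
}
```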
While going through the parser code, e.g. Parser.cpp inside the clang/Parse directory of the Clang compiler, I found this:
switch (Close) {
default: break;
case tok::r_paren : LHSName = "("; DID = diag::err_expected_rparen; break;
case tok::r_brace : LHSName = "{"; DID = diag::err_expected_rbrace; break;
case tok::r_square: LHSName = "["; DID = diag::err_expected_rsquare; break;
case tok::greater: LHSName = "<"; DID = diag::err_expected_greater; break;
case tok::greatergreatergreater:
LHSName = "<<<"; DID = diag::err_expected_ggg; break;
}
I see that the default is at the beginning. Is there any reason for keeping it that way? Usually we keep the default at the end, so I am a bit confused.
The order makes no difference, as long as you have included your breaks.
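A quick way to convince yourself is a small function with the default label listed first. The function below is a made-up illustration, not taken from Clang; it returns from every label, which has the same effect as a break after each one.

```cpp
#include <cassert>

// Returns the opening bracket that matches a closing one, or "?" if unknown.
inline const char* opener_for(char close)
{
    switch (close) {
    default:  return "?";  // listed first, still chosen only when no case matches
    case ')': return "(";
    case '}': return "{";
    case ']': return "[";
    }
}

inline void usage_example()
{
    assert(opener_for(')')[0] == '(');
    assert(opener_for('x')[0] == '?');
}
```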
As an aside, I like to put the break immediately before every case or default. It's much easier to verify this rule has been followed than to try to look ahead to the end of each case statement.
switch (Close) {
break; default:
break; case tok::r_paren : LHSName = "("; DID = diag::err_expected_rparen;
break; case tok::r_brace : LHSName = "{"; DID = diag::err_expected_rbrace;
break; case tok::r_square: LHSName = "["; DID = diag::err_expected_rsquare;
break; case tok::greater: LHSName = "<"; DID = diag::err_expected_greater;
break; case tok::greatergreatergreater: LHSName = "<<<"; DID = diag::err_expected_ggg;
}
You might find this easier to understand if you interpret break to mean "Don't fall through into this case from any other case." instead of "Don't fall through from this case into any succeeding case."
In this layout it is very easy to see if a break is missing, which forces the writer (and the reader) to ask "do I want a fall-through here?". All the breaks line up nicely, and it's very obvious if one is missing.
Clarification: There is no 'magic' in my answer. I'm just placing my whitespace in a manner that is much more readable to me. And whitespace doesn't matter, therefore I'm free to do this. A break at the very end of the switch is redundant. If missing, the compiler is not allowed to loop around to the top of the switch, as if it was some kind of while loop. Equally, a redundant break at the very start of a switch changes nothing, and must be accepted (and ignored) by the compiler.
The only time I see the position of a default matter is when doing something like this, which is a Duff's Device type construct and should probably never be attempted on modern platforms. :)
void copyAligned4Bytes(u32 *in, u32 *out, int numBytes)
{
assert((numBytes & 0x03) == 0);
while(numBytes)
{
switch(numBytes)
{
default: *out++ = *in++; numBytes -= 4;
case 12: *out++ = *in++; numBytes -= 4;
case 8: *out++ = *in++; numBytes -= 4;
case 4: *out++ = *in++; numBytes -= 4;
}
}
}
Ah, the joys of coding in C as close to assembler as possible to get a certain type of jump table.
The situation is as follows:
My system gets a hardware signal and writes a time value to a buffer in my signal handler routine. Afterwards, a (software) signal is sent with the time value as argument to the appropriate slot function.
The slot routine gets called correctly, but here is my problem:
In the slot function I have a simple switch-case statement like this:
switch(id) {
case 1:
do something..
id = 2;
break;
case 2:
start_time = val;
id = 3;
break;
case 3:
end_time = val;
id = 1;
break;
}
In cases 2 and 3 I store a start and an end time value, and from those I determine the elapsed time between the hardware signals. This works fine, but now I sometimes have to measure the time for longer, depending on a parameter. This means I can't stop the measurement at case 3; instead I need cases 4, 5, 6 and so on. What is an elegant and optimal solution for this "problem" instead of writing:
if (param < xy) {
    switch(id) {
    case 1:
        ...
        break;
    case 2:
        ...
        break;
    }
} else if (param > xy) {
    switch(id) {
    case 1:
        ...;
        break;
    case 2:
        ...;
        break;
    case 3:
        ...;
        break;
    case 4:
        ...;
        break;
    case 5:
        ...;
        break;
    }
}
What you are describing is called a finite state machine. There are a large number of excellent state machine libraries out there that will take care of the heavy lifting for you.
Take a look at this question and some of the others that it references.
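If a library feels like too much, a hand-rolled machine where the number of steps is a run-time parameter may already be enough. The sketch below does not use any particular library; the class name Measurement, the long time type, and the assumption that the start time is always taken at step 2 and the end time at the last step are all mine.

```cpp
#include <cassert>

// Hypothetical measurement state machine: step 1 arms it, step 2 records the
// start time, and the step numbered `last_step` records the end time.
// Steps in between are simply counted.
class Measurement {
public:
    explicit Measurement(int last_step) : last_step_(last_step) {}

    // Feed one time value per hardware signal.
    // Returns true when a measurement has just been completed.
    bool on_signal(long val)
    {
        bool finished = false;
        if (id_ == 2) {
            start_time_ = val;
        } else if (id_ == last_step_) {
            end_time_ = val;
            finished = true;
        }
        id_ = (id_ == last_step_) ? 1 : id_ + 1;  // wrap around after the last step
        return finished;
    }

    long elapsed() const { return end_time_ - start_time_; }

private:
    int last_step_;  // 3 reproduces the original switch; larger values measure longer
    int id_ = 1;
    long start_time_ = 0;
    long end_time_ = 0;
};

inline void usage_example()
{
    Measurement m(3);
    assert(!m.on_signal(10));  // step 1: armed
    assert(!m.on_signal(20));  // step 2: start time
    assert(m.on_signal(50));   // step 3: end time
    assert(m.elapsed() == 30);
}
```

The parameter that currently selects between the two switch statements would then only choose the value passed to the constructor.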
You can try the following:
switch ((param - xy) >= 0 ? id : -id) {
// param >= xy cases
case 1:
...
break;
case 2:
...
break;
...
// param < xy cases
case -1:
...
break;
case -2:
...
break;
...
}
Or, for something fun and exciting, you could write some self-modifying code to dynamically change your switch jump table as the parameters it receives differ. You'd have to allocate a large enough area for the largest table size and play around with function pointers or assembler, but it could be done.
Try using a std::map of function pointers, a.k.a. jump table, rather than a switch statement. The map allows flexibility during run-time.
Store a pointer to the function, along with the case value. Search the map for the case value, retrieve the pointer and dereference to call the function.
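A minimal sketch of that idea, under assumptions: the Context struct, the three handlers, and the names make_table and run_handler are invented here to mirror the three cases from the question. Extra steps can be added to the map at run time.

```cpp
#include <cassert>
#include <map>

// Hypothetical state shared by the handlers.
struct Context {
    int id = 1;
    long start_time = 0;
    long end_time = 0;
};

using Handler = void (*)(Context&, long);

inline void on_idle(Context& c, long)      { c.id = 2; }
inline void on_start(Context& c, long val) { c.start_time = val; c.id = 3; }
inline void on_end(Context& c, long val)   { c.end_time = val; c.id = 1; }

// Case value -> function pointer; this replaces the case labels.
inline std::map<int, Handler> make_table()
{
    return { {1, &on_idle}, {2, &on_start}, {3, &on_end} };
}

// Look up the handler for the current id and call it.
inline bool run_handler(const std::map<int, Handler>& table, Context& ctx, long val)
{
    auto it = table.find(ctx.id);
    if (it == table.end())
        return false;  // plays the role of the switch's default case
    it->second(ctx, val);
    return true;
}

inline void usage_example()
{
    const auto table = make_table();
    Context ctx;
    run_handler(table, ctx, 5);   // id 1 -> 2
    run_handler(table, ctx, 10);  // start time
    run_handler(table, ctx, 25);  // end time
    assert(ctx.end_time - ctx.start_time == 15);
}
```

A map lookup is slower than a compiled jump table, so this mainly buys flexibility, not speed.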