Why does this switch statement work that way? [duplicate] - c++

I've read the Wikipedia article on Duff's device, and I don't get it. I am really interested, but I've read the explanation there a couple of times and I still don't get how Duff's device works.
What would a more detailed explanation be?

There are some good explanations elsewhere, but let me give it a try. (This is a lot easier on a whiteboard!) Here's the Wikipedia example with some notations.
Let's say you're copying 20 bytes. The flow control of the program for the first pass is:
int count; // Set to 20
{
int n = (count + 7) / 8; // n is now 3. (The "while" is going
// to be run three times.)
switch (count % 8) { // The remainder is 4 (20 modulo 8) so
// jump to the case 4
case 0: // [skipped]
do { // [skipped]
*to = *from++; // [skipped]
case 7: *to = *from++; // [skipped]
case 6: *to = *from++; // [skipped]
case 5: *to = *from++; // [skipped]
case 4: *to = *from++; // Start here. Copy 1 byte (total 1)
case 3: *to = *from++; // Copy 1 byte (total 2)
case 2: *to = *from++; // Copy 1 byte (total 3)
case 1: *to = *from++; // Copy 1 byte (total 4)
} while (--n > 0); // N = 3 Reduce N by 1, then jump up
// to the "do" if it's still
} // greater than 0 (and it is)
}
Now start the second pass; we run just the indicated code:
int count; //
{
int n = (count + 7) / 8; //
//
switch (count % 8) { //
//
case 0: //
do { // The while jumps to here.
*to = *from++; // Copy 1 byte (total 5)
case 7: *to = *from++; // Copy 1 byte (total 6)
case 6: *to = *from++; // Copy 1 byte (total 7)
case 5: *to = *from++; // Copy 1 byte (total 8)
case 4: *to = *from++; // Copy 1 byte (total 9)
case 3: *to = *from++; // Copy 1 byte (total 10)
case 2: *to = *from++; // Copy 1 byte (total 11)
case 1: *to = *from++; // Copy 1 byte (total 12)
} while (--n > 0); // N = 2 Reduce N by 1, then jump up
// to the "do" if it's still
} // greater than 0 (and it is)
}
Now, start the third pass:
int count; //
{
int n = (count + 7) / 8; //
//
switch (count % 8) { //
//
case 0: //
do { // The while jumps to here.
*to = *from++; // Copy 1 byte (total 13)
case 7: *to = *from++; // Copy 1 byte (total 14)
case 6: *to = *from++; // Copy 1 byte (total 15)
case 5: *to = *from++; // Copy 1 byte (total 16)
case 4: *to = *from++; // Copy 1 byte (total 17)
case 3: *to = *from++; // Copy 1 byte (total 18)
case 2: *to = *from++; // Copy 1 byte (total 19)
case 1: *to = *from++; // Copy 1 byte (total 20)
} while (--n > 0); // N = 1 Reduce N by 1, then jump up
// to the "do" if it's still
} // greater than 0 (and it's not, so bail)
} // continue here...
20 bytes are now copied.
Note: The original Duff's Device (shown above) copied to an I/O device at the to address. Thus, it wasn't necessary to increment the pointer to. When copying between two memory buffers you'd need to use *to++.
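To make the memory-to-memory case from the note concrete, here is a minimal sketch. The name duff_copy and the early return for count == 0 are my own assumptions, not part of the original device:

```cpp
#include <cstddef>

// Hypothetical helper (not from the original answer): Duff's device
// adapted to copy between two ordinary memory buffers, so both
// pointers advance after every byte.
void duff_copy(unsigned char* to, const unsigned char* from, std::size_t count)
{
    if (count == 0) return;            // the classic form assumes count > 0
    std::size_t n = (count + 7) / 8;   // number of passes through the do-while
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

Calling duff_copy(dst, src, 20) should walk through exactly the three passes traced above, except that the destination moves along with the source.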

The explanation in Dr. Dobb's Journal is the best that I found on the topic.
This being my AHA moment:
for (i = 0; i < len; ++i) {
HAL_IO_PORT = *pSource++;
}
becomes:
int n = len / 8;
for (i = 0; i < n; ++i) {
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
}
n = len % 8;
for (i = 0; i < n; ++i) {
HAL_IO_PORT = *pSource++;
}
becomes:
int n = (len + 8 - 1) / 8;
switch (len % 8) {
case 0: do { HAL_IO_PORT = *pSource++;
case 7: HAL_IO_PORT = *pSource++;
case 6: HAL_IO_PORT = *pSource++;
case 5: HAL_IO_PORT = *pSource++;
case 4: HAL_IO_PORT = *pSource++;
case 3: HAL_IO_PORT = *pSource++;
case 2: HAL_IO_PORT = *pSource++;
case 1: HAL_IO_PORT = *pSource++;
} while (--n > 0);
}

There are two key things to Duff's device. First, which I suspect is the easier part to understand, the loop is unrolled. This trades larger code size for more speed by avoiding some of the overhead involved in checking whether the loop is finished and jumping back to the top of the loop. The CPU can run faster when it's executing straight-line code instead of jumping.
The second aspect is the switch statement. It allows the code to jump into the middle of the loop the first time through. The surprising part to most people is that such a thing is allowed. Well, it's allowed. Execution starts at the calculated case label, and then it falls through to each successive assignment statement, just like any other switch statement. After the last case label, execution reaches the bottom of the loop, at which point it jumps back to the top. The top of the loop is inside the switch statement, so the switch is not re-evaluated anymore.
The original loop is unwound eight times, so the number of iterations is divided by eight. If the number of bytes to be copied isn't a multiple of eight, then there are some bytes left over. Most algorithms that copy blocks of bytes at a time will handle the remainder bytes at the end, but Duff's device handles them at the beginning. The function calculates count % 8 for the switch statement to figure what the remainder will be, jumps to the case label for that many bytes, and copies them. Then the loop continues to copy groups of eight bytes.
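One way to see that the remainder is handled first is to replace the copies with a counter and record how many "copies" each pass performs. This is only an illustrative sketch; duff_pass_sizes is a hypothetical name:

```cpp
#include <vector>

// Hypothetical tracing helper: same control flow as Duff's device,
// but each copy is replaced by ++done, and the size of every pass
// through the do-while is recorded.
std::vector<int> duff_pass_sizes(int count)
{
    std::vector<int> passes;
    if (count <= 0) return passes;     // the device itself assumes count > 0
    int n = (count + 7) / 8;
    int done = 0;                      // copies performed in the current pass
    switch (count % 8) {
    case 0: do { ++done;
    case 7:      ++done;
    case 6:      ++done;
    case 5:      ++done;
    case 4:      ++done;
    case 3:      ++done;
    case 2:      ++done;
    case 1:      ++done;
                 passes.push_back(done);
                 done = 0;
            } while (--n > 0);
    }
    return passes;
}
```

For count = 20 this yields the pass sizes 4, 8, 8: the short pass comes first, followed by full groups of eight.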

The point of Duff's device is to reduce the number of comparisons done in a tight memcpy implementation.
Suppose you want to copy 'count' bytes from b to a. The straightforward approach is to do the following:
do {
*a = *b++;
} while (--count > 0);
How many times do you need to compare count to see if it's above 0? 'count' times.
Now, Duff's device uses a nasty side effect of how switch cases fall through, which lets you reduce the number of comparisons needed to roughly count / 8 (rounded up).
Now suppose you want to copy 20 bytes using Duff's device. How many comparisons would you need? Only 3, since you copy eight bytes at a time except on the first pass, where you copy just 4.
UPDATED: You don't have to use 8 case labels in the switch statement, but it's a reasonable trade-off between function size and speed.

When I read it for the first time, I autoformatted it to this
void dsend(char* to, char* from, int count) {
int n = (count + 7) / 8;
switch (count % 8) {
case 0: do {
*to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while (--n > 0);
}
}
and I had no idea what was happening.
Maybe not when this question was asked, but Wikipedia now has a very good explanation:
The device is valid, legal C by virtue of two attributes in C:
Relaxed specification of the switch statement in the language's definition. At the time of the device's invention this was the first edition of The C Programming Language which requires only that the controlled statement of the switch be a syntactically valid (compound) statement within which case labels can appear prefixing any sub-statement. In conjunction with the fact that, in the absence of a break statement, the flow of control will fall-through from a statement controlled by one case label to that controlled by the next, this means that the code specifies a succession of count copies from sequential source addresses to the memory-mapped output port.
The ability to legally jump into the middle of a loop in C.

1: Duff's device is a particular implementation of loop unrolling. Loop unrolling is an optimisation technique applicable if you have an operation to perform N times in a loop: you can trade program size for speed by executing the loop N/n times and inlining (unrolling) the loop code n times inside the loop, e.g. replacing:
for (int i=0; i<N; i++) {
// [The loop code...]
}
with
for (int i=0; i<N/n; i++) {
// [The loop code...]
// [The loop code...]
// [The loop code...]
...
// [The loop code...] // n times!
}
Which works great if N % n == 0 - no need for Duff! If that is not true then you have to handle the remainder - which is a pain.
2: How does Duff's device differ from this standard loop unrolling?
Duff's device is just a clever way of dealing with the remainder loop cycles when N % n != 0. When N % n == 0, case 0 applies and the whole do / while executes N / n times, as per standard loop unrolling. Otherwise, on the first run through the loop the switch jumps to the matching case and we run the loop code only the 'remainder' number of times; the remaining runs through the loop run 'normally'.
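For comparison, the conventional remainder handling that Duff's device avoids can be sketched like this. The function name, the element type and the unroll factor of 4 are all assumptions chosen for illustration:

```cpp
#include <cstddef>

// Illustrative only: add 1 to every element, unrolled by a factor of 4,
// with a separate clean-up loop for the N % 4 leftover elements.
void increment_all(int* data, std::size_t N)
{
    std::size_t i = 0;
    for (; i + 4 <= N; i += 4) {   // main unrolled loop, N / 4 iterations
        data[i]     += 1;
        data[i + 1] += 1;
        data[i + 2] += 1;
        data[i + 3] += 1;
    }
    for (; i < N; ++i) {           // remainder loop, at most 3 iterations
        data[i] += 1;
    }
}
```

The second loop is the "pain" mentioned above; Duff's device folds it into the first pass of the unrolled loop instead.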

Though I'm not 100% sure what you're asking for, here goes...
The issue that Duff's device addresses is one of loop unwinding (as you'll no doubt have seen on the Wiki link you posted). What this basically equates to is an optimisation of run-time efficiency, over memory footprint. Duff's device deals with serial copying, rather than just any old problem, but is a classic example of how optimisations can be made by reducing the number of times that a comparison needs to be done in a loop.
As an alternative example, which may make it easier to understand, imagine you have an array of items you wish to loop over, and add 1 to them each time... ordinarily, you might use a for loop, and loop around 100 times. This seems fairly logical and, it is... however, an optimisation can be made by unwinding the loop (obviously not too far... or you may as well just not use the loop).
So a regular for loop:
for(int i = 0; i < 100; i++)
{
myArray[i] += 1;
}
becomes
for(int i = 0; i < 100; i += 10)
{
myArray[i] += 1;
myArray[i+1] += 1;
myArray[i+2] += 1;
myArray[i+3] += 1;
myArray[i+4] += 1;
myArray[i+5] += 1;
myArray[i+6] += 1;
myArray[i+7] += 1;
myArray[i+8] += 1;
myArray[i+9] += 1;
}
What Duff's device does is implement this idea, in C, but (as you saw on the Wiki) with serial copies. What you're seeing above, with the unwound example, is 10 comparisons compared to 100 in the original - this amounts to a minor, but possibly significant, optimisation.

Here's a non-detailed explanation which is what I feel to be the crux of Duff's device:
The thing is, C is basically a nice facade for assembly language (PDP-7 assembly to be specific; if you studied that you would see how striking the similarities are). And, in assembly language, you don't really have loops - you have labels and conditional-branch instructions. So the loop is just a part of the overall sequence of instructions with a label and a branch somewhere:
instruction
label1: instruction
instruction
instruction
instruction
jump to label1 if some_condition
and a switch instruction is branching/jumping ahead somewhat:
evaluate expression into register r
compare r with first case value
branch to first case label if equal
compare r with second case value
branch to second case label if equal
etc....
first_case_label:
instruction
instruction
second_case_label:
instruction
instruction
etc...
In assembly it's easily conceivable how to combine these two control structures, and when you think of it that way, their combination in C doesn't seem so weird anymore.
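To illustrate that view in C++ itself, here is a sketch of the same control flow written with nothing but labels and conditional jumps. The function and label names are hypothetical, and this version copies memory to memory:

```cpp
#include <cstddef>

// Illustrative rewrite: Duff's device expressed with explicit labels and
// gotos, mirroring the "compare and branch" picture of switch and loop.
void copy_with_gotos(unsigned char* to, const unsigned char* from, std::size_t count)
{
    if (count == 0) return;
    std::size_t n = (count + 7) / 8;
    std::size_t r = count % 8;

    // The "switch": compare r with each case value and branch.
    if (r == 7) goto c7;
    if (r == 6) goto c6;
    if (r == 5) goto c5;
    if (r == 4) goto c4;
    if (r == 3) goto c3;
    if (r == 2) goto c2;
    if (r == 1) goto c1;
    // r == 0 falls straight into the top of the loop.

top: *to++ = *from++;
c7:  *to++ = *from++;
c6:  *to++ = *from++;
c5:  *to++ = *from++;
c4:  *to++ = *from++;
c3:  *to++ = *from++;
c2:  *to++ = *from++;
c1:  *to++ = *from++;
    if (--n > 0) goto top;   // the "while": conditional branch back
}
```

Seen this way, the case labels are just extra entry points into a straight run of instructions that happens to end in a backward branch.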

This is an answer I posted to another question about Duff's Device that got some upvotes before the question was closed as a duplicate. I think it provides a bit of valuable context here on why you should avoid this construct.
"This is Duff's Device. It's a method of unrolling loops that avoids having to add a secondary fix-up loop to deal with times when the number of loop iterations isn't known to be an exact multiple of the unrolling factor.
Since most answers here seem to be generally positive about it I'm going to highlight the downsides.
With this code a compiler is going to struggle to apply any optimization to the loop body. If you just wrote the code as a simple loop a modern compiler should be able to handle the unrolling for you. This way you maintain readability and performance and have some hope of other optimizations being applied to the loop body.
The Wikipedia article referenced by others even says that when this 'pattern' was removed from the XFree86 source code, performance actually improved.
This outcome is typical of blindly hand optimizing any code you happen to think might need it. It prevents the compiler from doing its job properly, makes your code less readable and more prone to bugs and typically slows it down. If you were doing things the right way in the first place, i.e. writing simple code, then profiling for bottlenecks, then optimizing, you'd never even think to use something like this. Not with a modern CPU and compiler anyway.
It's fine to understand it, but I'd be surprised if you ever actually use it."
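As a sketch of the "simple code" alternative the quoted answer recommends, the copy can be written as a plain loop or handed to the standard library. The function names are hypothetical, and whether a particular compiler unrolls or vectorises these depends on the compiler and its flags:

```cpp
#include <algorithm>
#include <cstddef>

// Plain loop: easy to read, and easy for an optimiser to analyse.
void plain_copy(unsigned char* to, const unsigned char* from, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i) {
        to[i] = from[i];
    }
}

// Library version: std::copy from <algorithm> does the same job.
void library_copy(unsigned char* to, const unsigned char* from, std::size_t count)
{
    std::copy(from, from + count, to);
}
```

Both handle count == 0 without a special case, which the hand-unrolled form does not.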

Here is a working example for 64-bit memcpy with Duff's Device:
#include <cstddef>   // size_t
#include <cstring>   // memset, memcmp (used by the test in main)
#include <iostream>
#include <memory>
inline void __memcpy(void* to, const void* from, size_t count)
{
size_t numIter = (count + 56) / 64; // gives the number of iterations; bit shift actually, not division
size_t rest = count & 63; // % 64
size_t rest7 = rest&7;
rest -= rest7;
// Duff's device with zero case handled:
switch (rest)
{
case 0: if (count < 8)
break;
do { *(((unsigned long long*&)to)++) = *(((unsigned long long*&)from)++);
case 56: *(((unsigned long long*&)to)++) = *(((unsigned long long*&)from)++);
case 48: *(((unsigned long long*&)to)++) = *(((unsigned long long*&)from)++);
case 40: *(((unsigned long long*&)to)++) = *(((unsigned long long*&)from)++);
case 32: *(((unsigned long long*&)to)++) = *(((unsigned long long*&)from)++);
case 24: *(((unsigned long long*&)to)++) = *(((unsigned long long*&)from)++);
case 16: *(((unsigned long long*&)to)++) = *(((unsigned long long*&)from)++);
case 8: *(((unsigned long long*&)to)++) = *(((unsigned long long*&)from)++);
} while (--numIter > 0);
}
switch (rest7)
{
case 7: *(((unsigned char*)to)+6) = *(((unsigned char*)from)+6);
case 6: *(((unsigned short*)to)+2) = *(((unsigned short*)from)+2); goto case4;
case 5: *(((unsigned char*)to)+4) = *(((unsigned char*)from)+4);
case 4: case4: *((unsigned int*)to) = *((unsigned int*)from); break; // assumes a 4-byte unsigned int; unsigned long is 8 bytes on LP64 systems such as 64-bit Linux
case 3: *(((unsigned char*)to)+2) = *(((unsigned char*)from)+2);
case 2: *((unsigned short*)to) = *((unsigned short*)from); break;
case 1: *((unsigned char*)to) = *((unsigned char*)from);
}
}
int main()
{
static const size_t NUM = 1024;
std::unique_ptr<char[]> str1(new char[NUM+1]);
std::unique_ptr<char[]> str2(new char[NUM+1]);
// Fill the source buffer with a repeating a-z, A-Z, 0-9 pattern.
for (size_t i = 0 ; i < NUM ; ++ i)
{
size_t idx = (i % 62);
if (idx < 26)
str1[i] = 'a' + idx;
else
if (idx < 52)
str1[i] = 'A' + idx - 26;
else
str1[i] = '0' + idx - 52;
}
// Copy every length from 0 to NUM-1 and check nothing past the end was written.
for (size_t i = 0 ; i < NUM ; ++ i)
{
memset(str2.get(), ' ', NUM);
__memcpy(str2.get(), str1.get(), i);
if (memcmp(str1.get(), str2.get(), i) || str2[i] != ' ')
{
std::cout << "Test failed for i=" << i;
}
}
return 0;
}
It handles the zero-length case (the original Duff's Device assumes count > 0).
Function main() contains simple test cases for __memcpy.

Just experimenting, found another variant getting along without interleaving switch statement and do-while-loop:
int n = count / 8; // number of full groups of eight after the remainder
switch (count % 8)
{
LOOP:
case 0:
if(n-- == 0)
break;
putchar('.');
case 7:
putchar('.');
case 6:
putchar('.');
case 5:
putchar('.');
case 4:
putchar('.');
case 3:
putchar('.');
case 2:
putchar('.');
case 1:
putchar('.');
default:
goto LOOP;
}
Technically, the goto still implements a loop, but this variant might be slightly more readable.
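A variant like this is easy to get wrong by one group, so it is worth checking mechanically. The sketch below has the same shape but counts steps instead of printing dots. It assumes count >= 0 and n initialised to count / 8; count_steps is a hypothetical name:

```cpp
// Hypothetical check: same control flow as the goto variant, with each
// putchar('.') replaced by ++steps so the total can be compared to count.
int count_steps(int count)
{
    int steps = 0;
    int n = count / 8;       // full groups of eight after the remainder
    switch (count % 8)
    {
    LOOP:
    case 0:
        if (n-- == 0)
            break;           // leaves the switch, ending the loop
        ++steps;
    case 7: ++steps;
    case 6: ++steps;
    case 5: ++steps;
    case 4: ++steps;
    case 3: ++steps;
    case 2: ++steps;
    case 1: ++steps;
    default:
        goto LOOP;
    }
    return steps;
}
```

If the arithmetic is right, count_steps(c) equals c for every non-negative c, including 0.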

Related

How does this switch block execute?

#include<bits/stdc++.h>
using namespace std;
void show(int errorCause)
{
switch(errorCause)
{
case 1:
{
cout<<"in 1\n";
break;
}
case 2: break;
case 3:
{
cout<<"in 3\n";
break;
case 4:
{
cout<<"in 4\n";
case 5: cout<<"in 5\n";
break;
}
}
break;
default:
{
cout<<"in deafult\n";
break;
}
}
return;
}
int main()
{
show(5);
return 0;
}
I used this sample code and could not figure out its flow. I expected it to hit the default case, since errorCause does not appear to match anything, but its output is:
in 5
I don't understand why it is not going to the default case.
Here is my build environment details:
compiler:
g++ version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
System:
Ubuntu 14.04(64-bit)
You pass 5, why should the switch statement not go into 'case 5'?
To make it clear: remove all the curly braces inside the switch block; none of them is necessary. Then re-align and format the code, and it should be clear.
case/default labels for a switch statement may appear anywhere within that switch statement, except within a nested switch statement.
A famous example of this usage is Duff's device for unrolling loops:
void copy(unsigned char *to, const unsigned char *from, size_t count)
{
size_t n;
if (!count)
return;
n = (count + 7) / 8;
switch (count % 8) {
case 0:
do {
*to++ = *from++;
case 7:
*to++ = *from++;
case 6:
*to++ = *from++;
case 5:
*to++ = *from++;
case 4:
*to++ = *from++;
case 3:
*to++ = *from++;
case 2:
*to++ = *from++;
case 1:
*to++ = *from++;
} while (--n > 0);
}
}
(adapted from the original).
At first glance, that doesn't make any sense (and it is somewhat redundant if you allow the compiler to unroll loops for you), but it illustrates that case labels can be placed more or less where you like within the switch statement.
First, don't write code like that. <g>
Second, the reason that it gets to case 5: is simply that there's a case 5: inside the switch statement. It doesn't matter that it's nested inside two levels of curly braces; it's just a label for the code to jump to. It doesn't have to be at the outer level of the switch statement.
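To see the "it's just a label" point in isolation, here is a minimal sketch. The function name trace and the idea of recording visited labels in a string are my own illustration, not part of the question's code; the nesting of the braces mirrors the question.

```cpp
#include <string>

// Hypothetical helper: records which labelled statements run for a value.
std::string trace(int value)
{
    std::string visited;
    switch (value)
    {
    case 3:
    {
        visited += "3 ";
        break;
    case 4:                 // label nested one block deep
        {
            visited += "4 ";
    case 5:                 // label nested two blocks deep
            visited += "5 ";
            break;
        }
    }
        break;
    default:
        visited += "default ";
        break;
    }
    return visited;
}
```

With this sketch, trace(5) returns "5 ", trace(4) returns "4 5 " (it falls through from case 4 into case 5), and trace(6) returns "default ". The braces only group statements; they do not hide the labels from the switch.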
It's because the switch statement is "relaxed" about where its labels sit, so the braces do not matter there. Only the case labels matter, and a case can jump right into the middle of a scope (or even into the middle of a loop; see Duff's device).
Because the value you passed is 5, which exactly matches the case 5 label:
case 5: cout<<"in 5\n";
break;
If you want to reach the default branch, modify the main function as shown below:
int main()
{
show(6);
return 0;
}
hope this helps.

Switch statement c++

I am writing code to assign a scoring system to the values of a card. I have a member function that takes an int and changes its value based on the scoring system. I can't seem to get it to output anything besides 10:
int Obj::eval(int b)
{
switch (b)
{
case 0:
b = 11; //automatically assigns ace value of 11
case 1:
b = 2;
case 2:
b = 3;
case 3:
b = 4;
case 4:
b = 5;
case 5:
b = 6;
case 6:
b = 7;
case 7:
b = 8;
case 8:
b = 9;
case 9:
b = 10;
case 10:
b = 10;
case 11:
b = 10;
case 12:
b = 10;
}
return b;
}
Insert break at the end of each case. C's switch is "fall-through": if you don't prevent it, execution simply continues with the next line, so if b is 0, all the assignments get executed in order. break jumps out of the switch.
I.e. your code needs to look like this:
switch (b)
{
case 0:
b = 11; //automatically assigns ace value of 11
break;
case 1:
b = 2;
break;
/* ... */
A switch case should end with a break;. Otherwise there will be a fall through and all the subsequent cases will be executed.
Your code should look something similar to this.
switch(b)
{
case 0:
//bodyhere
break;
case 1:
//bodyhere
break;
}
It's important not to miss the break statements unless you intend to execute the following cases too.
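If you do intend to fall through, C++17 lets you say so explicitly with the [[fallthrough]] attribute, which also silences compiler warnings about missing breaks. A small sketch, assuming a C++17 compiler; the function stepsRemaining and its "stages" are made up for illustration:

```cpp
// Hypothetical example: count how many steps are still to run from a stage.
int stepsRemaining(int stage)
{
    int steps = 0;
    switch (stage)
    {
    case 0:
        ++steps;            // stage 0 still needs this step...
        [[fallthrough]];    // ...and deliberately continues into case 1
    case 1:
        ++steps;
        [[fallthrough]];    // deliberate again
    case 2:
        ++steps;
        break;              // stop here so default does not run
    default:
        steps = -1;         // unknown stage
        break;
    }
    return steps;
}
```

Here stepsRemaining(0) is 3, stepsRemaining(1) is 2, stepsRemaining(2) is 1, and any other stage gives -1.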
You should always use the break statement after each case because C++ will continue to execute the next case. For example:
switch(b)
{
case 0:
// body
// body
break;
case 1:
// body
// body
break;
case 2:
// body
// body
break;
}
You can also add a default as the last case. When default comes last it does not need a break statement, because there is nothing after it to fall into.
switch(b)
{
case 0:
// body
break;
case 1:
// body
break;
case 2:
// body
break;
default:
// body
}
Switch statements in C and C++ have a "feature" called fallthrough, where if you don't actually break out of the cases, execution will just continue through to the next case (thus always resulting in b receiving 10).
Add break statements after each case.
case 0:
b = 11; //automatically assigns ace value of 11
break;
case 1:
b = 2;
break;
case 2:
b = 3;
break;
// etc.
In languages derived from C, control moves to the matching case, and then execution continues or "falls through" to the statements associated with the next case in the source text. You should use a break to avoid it.
case 0:
b = 11;
break;
case 1:
b = 2;
break;
In your case return would be fine too,
case 0:
b = 11;
return b;
case 1:
b = 2;
return b;
In addition to the case fallthrough problem described above, there are more compact ways than switch to map between values, such as std::map. (Be aware that myMap[b] inserts a zero entry for any key that is not already in the map.)
static map<int, int> myMap = { { 0, 11 }, { 1, 2 }, { 2, 3 }, ... { 11, 10 }, { 12, 10 } };
int Obj::eval(int b)
{
return myMap[b];
}
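Since the keys here are just 0 to 12, a plain array works as a lookup table too. This is a sketch using std::array rather than the std::map above; the names kScores and evalCard are hypothetical (evalCard stands in for Obj::eval), and the values are the ones from the question.

```cpp
#include <array>
#include <cstddef>

// Scores from the question: index 0 is the ace (11), 1..8 map to 2..9,
// and 9..12 all score 10.
constexpr std::array<int, 13> kScores = {11, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10};

// Hypothetical free function standing in for Obj::eval.
// An out-of-range input is returned unchanged, which is what the switch
// (with breaks added) does when no case matches.
int evalCard(int b)
{
    if (b < 0 || static_cast<std::size_t>(b) >= kScores.size())
        return b;
    return kScores[static_cast<std::size_t>(b)];
}
```

For example, evalCard(0) gives 11, evalCard(9) gives 10, and evalCard(13) gives back 13. Unlike operator[] on a map, the bounds check means an unexpected input never modifies the table.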
You need to add a break statement at the end of each branch. Otherwise the flow of control will continue to the next branch.

Revising the syntax of Duff's device - Is this legal C/C++?

Only last night I encountered the curious Duff's device for the first time. I've been doing some reading on it, and I don't think it's that daunting to understand. What I am curious about is the strange syntax (from Wikipedia):
register short *to, *from;
register int count;
{
register int n = (count + 7) / 8;
switch(count % 8) {
case 0: do { *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while(--n > 0);
}
}
I was reading the C++ standard definition of a switch statement (please let me know if that's outdated, I'm not familiar with Open-Std.org). So far as I can understand case statements are just simplified jump statements for use by the switch statement.
The switch itself completely ignores the nested do-while, and the loop ignores the case statements. Since the switch jumps inside of the loop, the loop is executed. The switch is there to cover the remainder (from the division by 8), and the loop handles the portion that is evenly divisible. This all makes sense.
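To check that split between remainder and full passes, here is an instrumented sketch of the same shape. Instead of copying, it counts how often the copy statement would run. The function name copiesPerformed is hypothetical, and like the original it assumes count > 0.

```cpp
#include <cassert>

// Hypothetical instrumented variant of the device: each "copy" statement
// is replaced by a counter increment.
int copiesPerformed(int count)
{
    assert(count > 0);            // same precondition as the original
    int copies = 0;
    int n = (count + 7) / 8;      // number of passes through the do-while
    switch (count % 8) {          // jump into the middle of the first pass
    case 0: do { ++copies;
    case 7:      ++copies;
    case 6:      ++copies;
    case 5:      ++copies;
    case 4:      ++copies;
    case 3:      ++copies;
    case 2:      ++copies;
    case 1:      ++copies;
            } while (--n > 0);
    }
    return copies;
}
```

For count = 20, n is 3 and the remainder is 4, so the first pass runs 4 statements and the next two passes run 8 each: 4 + 8 + 8 = 20.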
My question then is why the awkward syntax? It occurs to me that the loop could be written such that all of the case statements are contained within, yes? I see nothing in the standard that prohibits this behavior, and it compiles correctly under GCC 4.7, so is the following considered legal?
register short *to, *from;
register int count;
{
register int n = (count + 7) / 8;
switch (count <= 0 ? 8 : count % 8)
{
do
{
case 0: *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
default: ; // invalid count, suppress warning, etc.
} while(--n > 0);
}
}
To me this makes the intent of the code much clearer. Thanks for any feedback. ;)
Edit: As noted below, the original code was written for C and had implicit int for the count and n variables. Since I tagged it C++ I've modified that.
Edit 2: Modified the revised example code to account for invalid count values.
Looking at the C++11 Standard, the part of the code I think you're asking about would be allowed. Have you tried it?
The rule I see that's most applicable is:
Note: Usually, the substatement that is the subject of a switch is compound and case and default labels appear on the top-level statements contained within the (compound) substatement, but this is not required.
In fact, this means you could get rid of the braces around the do-while, and write
int n = (count + 7) / 8;
switch (count % 8) do
{
case 0: *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while(--n > 0);
However, this line is NOT valid C++:
register n = (count + 7) / 8;
C++ does not allow default-int, the type of a variable must be specified or inferred.
Oh here, fix the number of iterations without breaking formatting:
int n = 1 + count / 8;
switch (count % 8) do
{
*to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
case 0: ;
} while(--n > 0);
The code is certainly legal: there are no requirements at all about blocks and/or loops. It is worth noting, though, that count == 0 isn't properly handled by the above loop. It is, however, properly handled by this one:
int count = atoi(ac == 1? "1": av[1]);
switch (count % 4)
{
case 0: while (0 < count) { std::cout << "0\n";
case 3: std::cout << "3\n";
case 2: std::cout << "2\n";
case 1: std::cout << "1\n";
count -= 4;
}
}
Putting the case 0 label inside the loop would incorrectly execute the nested statements when count is 0. Although I have only ever seen Duff's device written with a do-while loop, the code above seems a more natural way to deal with the boundary condition.
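The same while-based shape can be turned into an actual copy routine. This is only a sketch adapted from the printing example: the name copyUnrolled is hypothetical, the unroll factor is 4 as in that example, and the destination pointer is incremented (unlike the original device, which wrote to a single output register).

```cpp
// Hypothetical copy routine using the while-based variant (unrolled by 4).
// A count of 0 copies nothing, because case 0 lands on the loop test itself.
void copyUnrolled(char* to, const char* from, int count)
{
    switch (count % 4)
    {
    case 0: while (0 < count) { *to++ = *from++;
    case 3:                     *to++ = *from++;
    case 2:                     *to++ = *from++;
    case 1:                     *to++ = *from++;
                                count -= 4;
                              }
    }
}
```

With count = 5 the switch enters at case 1 and copies one byte, count drops to 1, and the loop then runs one full pass of four copies, for five in total.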
Yes, it is legal. The labels of a switch statement are typically written at the same level as the switch statement. However, it is legal to write them inside compound statements, e.g. in the middle of a loop.
EDIT: There is no requirement that the body of the switch statement must start with a label, any code is legal. However, there is no way into it from the switch statement itself, so unless it is a loop, or a plain label, the code will be unreachable.
Another interesting thing about the switch statement is that the braces are optional. However, in that case the body is limited to a single (labeled) statement:
switch (i)
case 5: printf("High five\n");

Is there any reason to use switch statement instead of strings of if and elseif? [duplicate]

Possible Duplicate:
Advantage of switch over if-else statement
Why the switch statement and not if-else?
The switch statement seems to be totally useless. Anything it can do can be done by if and else if.
They probably even compile to the same code.
So why bother having it?
The break statements in switch drive me crazy, and that label: format reminds me of goto.
This is for Objective-C, C, and C++. I am not sure if VB.NET has a switch statement, but even if it does I must have forgotten, because I never use it.
They may well compile to the same code. But the intent is not necessarily to provide better compiled code so much as it is to provide better source code.
You can do while or for loops with if and goto as well but that doesn't make while and for useless. Would you rather have:
for (i = 0; i < 10; i++)
doSomethingWith (i);
or:
i = 0;
loop12:
if (! (i < 10))
goto skip12;
doSomethingWith (i);
i++;
goto loop12;
skip12: ;
if (color == WHITE)
{
}
else if (color == BLACK)
{
}
else if (color == GREY)
{
}
else if ((color == ORANGE) || (color == GREEN) || (color == BLUE))
{
}
else
{
}
vs
switch(color)
{
case WHITE:
break;
case BLACK:
break;
case GREY:
break;
case ORANGE:
case GREEN:
case BLUE:
break;
default:
break;
}
Isn't the latter more readable, and doesn't it require fewer keystrokes?
Apart from readability there's another unique use of switch-case: Duff's Device. This technique exploits the goto-ness of switch-case coupled with while.
void dsend(char* to, char* from, int count) {
int n = (count + 7) / 8;
switch (count % 8) {
case 0: do {
*to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while (--n > 0);
}
}
The performance of switch is the same as that of if and else if blocks in the worst case, and it may be better. This has been discussed before: Advantage of switch over if-else statement
Pros:
switch can be a cleaner way to write a multi-way branch than a chain of if statements.
switch can be faster than if, because the compiler may generate a jump table at compile time that selects the matching case directly, rather than testing each condition in turn.
Cons:
A case label must be an integer constant expression (for example an int, char, or enum constant); strings and floating-point values are not allowed.