__m256d TRANSPOSE4 Equivalent? - c++

Intel provides _MM_TRANSPOSE4_PS to transpose a 4x4 matrix of floats held in four __m128 rows. I want to do the equivalent with __m256d, but I can't figure out how to get _mm256_shuffle_pd to work in the same manner.
_MM_TRANSPOSE4_PS Code
#define _MM_TRANSPOSE4_PS(row0, row1, row2, row3) { \
__m128 tmp3, tmp2, tmp1, tmp0; \
\
tmp0 = _mm_shuffle_ps((row0), (row1), 0x44); \
tmp2 = _mm_shuffle_ps((row0), (row1), 0xEE); \
tmp1 = _mm_shuffle_ps((row2), (row3), 0x44); \
tmp3 = _mm_shuffle_ps((row2), (row3), 0xEE); \
\
(row0) = _mm_shuffle_ps(tmp0, tmp1, 0x88); \
(row1) = _mm_shuffle_ps(tmp0, tmp1, 0xDD); \
(row2) = _mm_shuffle_ps(tmp2, tmp3, 0x88); \
(row3) = _mm_shuffle_ps(tmp2, tmp3, 0xDD); \
}
My attempt at a _MM_TRANSPOSE4_PD, inside the loop where I need it:
for (int copy = i; copy < m2.size();)
{
__m256d row0 = _mm256_load_pd(m2data + copy);
copy += m2.col();
__m256d row1 = _mm256_load_pd(m2data + copy);
copy += m2.col();
__m256d row2 = _mm256_load_pd(m2data + copy);
copy += m2.col();
__m256d row3 = _mm256_load_pd(m2data + copy);
copy += m2.col();
__m256d tmp3, tmp2, tmp1, tmp0;
tmp0 = _mm256_shuffle_pd(row0,row1, 0x44);
tmp2 = _mm256_shuffle_pd(row0,row1, 0xEE);
tmp1 = _mm256_shuffle_pd(row2,row3, 0x44);
tmp3 = _mm256_shuffle_pd(row2,row3, 0xEE);
row0 = _mm256_shuffle_pd(tmp0, tmp1, 0x88);
row1 = _mm256_shuffle_pd(tmp0, tmp1, 0xDD);
row2 = _mm256_shuffle_pd(tmp2, tmp3, 0x88);
row3 = _mm256_shuffle_pd(tmp2, tmp3, 0xDD);
_mm256_store_pd(reinterpret_cast<double*>(buffer + counter++),row0);
_mm256_store_pd(reinterpret_cast<double*>(buffer + counter++),row1);
_mm256_store_pd(reinterpret_cast<double*>(buffer + counter++),row2);
_mm256_store_pd(reinterpret_cast<double*>(buffer + counter++),row3);
}

Here is the macro equivalent of the solution I found.
#define _MM_TRANSPOSE4_PD(row0,row1,row2,row3) \
{ \
__m256d tmp3, tmp2, tmp1, tmp0; \
\
tmp0 = _mm256_shuffle_pd((row0),(row1), 0x0); \
tmp2 = _mm256_shuffle_pd((row0),(row1), 0xF); \
tmp1 = _mm256_shuffle_pd((row2),(row3), 0x0); \
tmp3 = _mm256_shuffle_pd((row2),(row3), 0xF); \
\
(row0) = _mm256_permute2f128_pd(tmp0, tmp1, 0x20); \
(row1) = _mm256_permute2f128_pd(tmp2, tmp3, 0x20); \
(row2) = _mm256_permute2f128_pd(tmp0, tmp1, 0x31); \
(row3) = _mm256_permute2f128_pd(tmp2, tmp3, 0x31); \
}
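The reason the _mm_shuffle_ps immediates do not carry over is that _mm256_shuffle_pd works inside each 128-bit lane and reads only one selector bit per output element, so a second step (_mm256_permute2f128_pd) is needed to move 128-bit halves between rows. The sketch below is a plain scalar model of those two intrinsics, not the intrinsics themselves; Vec4, shuffle_pd, permute2f128_pd and transpose4 are illustrative names, and the model follows the documented lane semantics as I understand them.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<double, 4>;  // stand-in for one __m256d register

// Scalar model of _mm256_shuffle_pd: bit i of imm picks the low or high
// element of the matching 128-bit lane; even slots read a, odd slots read b.
Vec4 shuffle_pd(const Vec4& a, const Vec4& b, int imm) {
    return { a[(imm >> 0) & 1],       b[(imm >> 1) & 1],
             a[2 + ((imm >> 2) & 1)], b[2 + ((imm >> 3) & 1)] };
}

// Scalar model of _mm256_permute2f128_pd: each nibble of imm selects a
// 128-bit half (0 = a.lo, 1 = a.hi, 2 = b.lo, 3 = b.hi, bit 3 = zero).
Vec4 permute2f128_pd(const Vec4& a, const Vec4& b, int imm) {
    auto half = [&](int sel) -> std::array<double, 2> {
        if (sel & 8) return {0.0, 0.0};
        const Vec4& src = (sel & 2) ? b : a;
        const int base = (sel & 1) ? 2 : 0;
        return {src[base], src[base + 1]};
    };
    const auto lo = half(imm & 0xF);
    const auto hi = half((imm >> 4) & 0xF);
    return {lo[0], lo[1], hi[0], hi[1]};
}

// Same sequence of steps as the macro above, applied to the model.
void transpose4(Vec4& r0, Vec4& r1, Vec4& r2, Vec4& r3) {
    const Vec4 t0 = shuffle_pd(r0, r1, 0x0);  // a0 b0 a2 b2
    const Vec4 t2 = shuffle_pd(r0, r1, 0xF);  // a1 b1 a3 b3
    const Vec4 t1 = shuffle_pd(r2, r3, 0x0);  // c0 d0 c2 d2
    const Vec4 t3 = shuffle_pd(r2, r3, 0xF);  // c1 d1 c3 d3
    r0 = permute2f128_pd(t0, t1, 0x20);       // a0 b0 c0 d0
    r1 = permute2f128_pd(t2, t3, 0x20);       // a1 b1 c1 d1
    r2 = permute2f128_pd(t0, t1, 0x31);       // a2 b2 c2 d2
    r3 = permute2f128_pd(t2, t3, 0x31);       // a3 b3 c3 d3
}
```

Feeding the rows 0..3, 4..7, 8..11, 12..15 through transpose4 should leave the first row holding 0, 4, 8, 12, which is a quick way to convince yourself the immediates are right before switching to the real intrinsics.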

Related

How can I merge two X-Macros together?

I have a lot of repetitive code that needs to apply different sets of data in some function or operation, as shown below (the numbers and letters are just placeholders; all I need to do is string two sets of data together using X-macros):
a = 1
a = 2
a = 3
a = 4
.
.
.
then
b = 1
b = 2
b = 3
.
.
.
and
c = 1
c = 2
c = 3
.
.
.
I was trying to create an X-macro that combines the following two X-macros into one
//X-macro 1
#define SET_1 \
X(a) \
X(b) \
X(c) \
//X-macro 2
#define SET_2 \
X(1) \
X(2) \
X(3) \
X(4)
Any help?
How about this approach:
#define X_abc(X,X2) \
X(a,X2) \
X(b,X2) \
X(c,X2)
#define X_1234(x,X2) \
X2(x,1) \
X2(x,2) \
X2(x,3) \
X2(x,4)
#define SET(x,y) x = y;
#define DEFINE(x,y) int x = y;
X_abc(X_1234,DEFINE)
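To see what the nested expansion produces, here is a small self-contained sketch of the same two lists. RECORD and expandAllPairs are hypothetical names added for illustration: instead of assigning or declaring variables, RECORD stores each generated (name, value) pair so the order of expansion can be inspected at run time.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Outer list: applies X to each name and forwards the inner macro X2.
#define X_abc(X, X2) \
    X(a, X2)         \
    X(b, X2)         \
    X(c, X2)

// Inner list: applies X2 to one name x paired with each value.
#define X_1234(x, X2) \
    X2(x, 1)          \
    X2(x, 2)          \
    X2(x, 3)          \
    X2(x, 4)

// Hypothetical inner macro: records the pair instead of using it.
#define RECORD(x, y) out.emplace_back(#x, y);

// Expands to twelve emplace_back calls: (a,1) ... (a,4), (b,1) ... (c,4).
std::vector<std::pair<std::string, int>> expandAllPairs() {
    std::vector<std::pair<std::string, int>> out;
    X_abc(X_1234, RECORD)
    return out;
}
```

Passing SET instead of RECORD yields the a = 1; a = 2; ... sequence from the question, provided a, b and c are already declared.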

C++ Boost::Spirit parsing complex boolean expressions and constructing an equivalent tree

Our input expressions are similar to this (even more complex):
( ( ?var1 <= (?var2 + 125) && ?var1 > (?var2 + 10) ) || !(?var1 == ?var3) )
Note: variables are always started by either '?' or '_'
Our desired output:
                      ||
                    /    \
                  /        \
                /            \
              &&              !
            /    \            |
          /        \          ==
        /            \       /  \
      <=              >   ?var1  ?var3
     /  \            /  \
    /    \          /    \
?var1     +     ?var1     +
         / \             / \
        /   \           /   \
    ?var2   125     ?var2    10
Any help is really appreciated.
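For reference, the shape of the desired tree can be reproduced without Boost.Spirit. The following is a hand-written recursive-descent sketch, not a Spirit grammar; Node, Parser, makeNode and toSExpr are hypothetical names, and the operator precedence (||, then &&, then !, then comparisons, then + and -) is an assumption based on the example input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct Node {
    std::string label;                        // operator, variable, or literal
    std::vector<std::unique_ptr<Node>> kids;  // operands, left to right
};
using NodePtr = std::unique_ptr<Node>;

NodePtr makeNode(std::string label, NodePtr a = nullptr, NodePtr b = nullptr) {
    auto n = std::make_unique<Node>();
    n->label = std::move(label);
    if (a) n->kids.push_back(std::move(a));
    if (b) n->kids.push_back(std::move(b));
    return n;
}

class Parser {
public:
    explicit Parser(std::string text) : s_(std::move(text)) {}
    NodePtr parse() { return parseOr(); }

private:
    std::string s_;
    std::size_t pos_ = 0;

    // Skips whitespace, then consumes tok if it is next in the input.
    bool accept(const std::string& tok) {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        if (s_.compare(pos_, tok.size(), tok) != 0) return false;
        pos_ += tok.size();
        return true;
    }
    NodePtr parseOr() {
        NodePtr lhs = parseAnd();
        while (accept("||")) { NodePtr rhs = parseAnd(); lhs = makeNode("||", std::move(lhs), std::move(rhs)); }
        return lhs;
    }
    NodePtr parseAnd() {
        NodePtr lhs = parseNot();
        while (accept("&&")) { NodePtr rhs = parseNot(); lhs = makeNode("&&", std::move(lhs), std::move(rhs)); }
        return lhs;
    }
    NodePtr parseNot() {
        if (accept("!")) { NodePtr operand = parseNot(); return makeNode("!", std::move(operand)); }
        return parseCompare();
    }
    NodePtr parseCompare() {
        NodePtr lhs = parseSum();
        // Two-character operators are tried first so "<=" is not read as "<".
        for (const char* op : {"<=", ">=", "==", "!=", "<", ">"}) {
            if (accept(op)) { NodePtr rhs = parseSum(); return makeNode(op, std::move(lhs), std::move(rhs)); }
        }
        return lhs;
    }
    NodePtr parseSum() {
        NodePtr lhs = parsePrimary();
        for (;;) {
            const char* op = accept("+") ? "+" : accept("-") ? "-" : nullptr;
            if (!op) return lhs;
            NodePtr rhs = parsePrimary();
            lhs = makeNode(op, std::move(lhs), std::move(rhs));
        }
    }
    NodePtr parsePrimary() {
        if (accept("(")) {
            NodePtr inner = parseOr();
            if (!accept(")")) throw std::runtime_error("expected ')'");
            return inner;
        }
        // accept() has already skipped leading whitespace at this point.
        const std::size_t start = pos_;
        if (pos_ < s_.size() && (s_[pos_] == '?' || s_[pos_] == '_')) ++pos_;
        while (pos_ < s_.size() && (std::isalnum(static_cast<unsigned char>(s_[pos_])) || s_[pos_] == '_')) ++pos_;
        if (pos_ == start) throw std::runtime_error("expected operand");
        return makeNode(s_.substr(start, pos_ - start));
    }
};

// Prefix rendering of the tree, e.g. "(+ ?var2 125)".
std::string toSExpr(const Node& n) {
    if (n.kids.empty()) return n.label;
    std::string out = "(" + n.label;
    for (const auto& k : n.kids) out += " " + toSExpr(*k);
    return out + ")";
}
```

Calling Parser(input).parse() on the example expression gives a root labelled || whose children are the && and ! subtrees; a Spirit grammar would express the same precedence levels as one rule per level.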

Extracting first parameter of __VA_ARGS__

Suppose that I have a macro:
#define FOO(a, ...) if (a) foo(a, ## __VA_ARGS__)
This works well:
FOO(a) will be transformed to if (a) foo(a)
FOO(a, <some_parameters>) will be transformed to if (a) foo(a, <some_parameters>)
Is it possible to modify this macro, so only the first parameter of __VA_ARGS__ (if exists) passed to foo? So, I need:
FOO(a) to be transformed to if (a) foo(a)
FOO(a, b, <some_parameters>) to be transformed to if (a) foo(a, b)
I've tried to solve this with the same idea as BOOST_PP_VARIADIC_SIZE has, but it turned out this macro returns 1 for BOOST_PP_VARIADIC_SIZE() (empty arguments), which is not expected (I expected 0).
Note that I need a solution where b and <some_parameters> are evaluated only when bool(a) is true.
I propose a variadic macro with a generic lambda as a solution.
The important points are as follows:
It is difficult to pass both a and __VA_ARGS__ to a lambda as call arguments inside the macro, because when __VA_ARGS__ is empty
[](){...}(a, __VA_ARGS__)
becomes
[](){...}(a,)
and the trailing comma causes a compilation error.
Thus we split the arguments of FOO: the first one is captured by the lambda, and the remaining ones are passed to it as call arguments.
Then we can use a generic lambda in the macro even if __VA_ARGS__ is empty:
[a](){...}(__VA_ARGS__)
The number of arguments in __VA_ARGS__ can be evaluated at compile time as constexpr auto N, and if constexpr can then select the matching call to foo.
We can also use the if statement with initializer, introduced in C++17, for if(a).
The proposed macro is then as follows:
#include <tuple>
#include <utility>
#define FOO(a, ...) \
if(const bool a_ = (a); a_) \
[a_](auto&&... args) \
{ \
const auto t = std::make_tuple(std::forward<decltype(args)>(args)...); \
constexpr auto N = std::tuple_size<decltype(t)>::value; \
\
if constexpr( N==0 ) { \
return foo(a_); \
} \
else { \
return foo(a_, std::get<0>(t)); \
} \
}(__VA_ARGS__)
Based on this answer, I could solve the problem:
#define PRIVATE_CONCAT(a, b) a ## b
#define CONCAT(a, b) PRIVATE_CONCAT(a, b)
#define GET_100TH( \
_01, _02, _03, _04, _05, _06, _07, _08, _09, _10, \
_11, _12, _13, _14, _15, _16, _17, _18, _19, _20, \
_21, _22, _23, _24, _25, _26, _27, _28, _29, _30, \
_31, _32, _33, _34, _35, _36, _37, _38, _39, _40, \
_41, _42, _43, _44, _45, _46, _47, _48, _49, _50, \
_51, _52, _53, _54, _55, _56, _57, _58, _59, _60, \
_61, _62, _63, _64, _65, _66, _67, _68, _69, _70, \
_71, _72, _73, _74, _75, _76, _77, _78, _79, _80, \
_81, _82, _83, _84, _85, _86, _87, _88, _89, _90, \
_91, _92, _93, _94, _95, _96, _97, _98, _99, PAR, \
...) PAR
#define HAS_PARAMETER(...) GET_100TH(placeholder, ##__VA_ARGS__, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, \
1, 1, 1, 1, 1, 1, 1, 1, 0)
#define FIRST_PARAMETER_WITH_PREPENDED_COMMA0(...)
#define FIRST_PARAMETER_WITH_PREPENDED_COMMA1(a, ...) , a
#define FIRST_PARAMETER_WITH_PREPENDED_COMMA(...) CONCAT(FIRST_PARAMETER_WITH_PREPENDED_COMMA, HAS_PARAMETER(__VA_ARGS__))(__VA_ARGS__)
#define FOO(a, ...) if (a) foo(a FIRST_PARAMETER_WITH_PREPENDED_COMMA(__VA_ARGS__))

Stop knitr from stripping padding spaces in kable

I have the following table that is generated from this reprex:
---
output:
pdf_document:
keep_tex: true
---
```{r, include = FALSE}
library(tidyverse)
library(knitr)
library(kableExtra)
```
```{r}
df <- data_frame(
level = c("Level 1", "Level 2", "Level 3", "Level 4"),
cat1_1 = c("39.5", "25.1", "28.9", "6.6"),
cat2_1 = c("37.7", "26.1", "30.0", "6.2"),
cat3_1 = c("30.3", "23.3", "29.7", "16.7"),
cat1_2 = c("58.7", "29.1", "9.9", "2.3"),
cat2_2 = c("56.4", "30.3", "10.7", "2.5"),
cat3_2 = c("43.6", "31.4", "16.8", "8.1")
)
kable(df, format = "latex", align = "c", booktabs = TRUE, escape = TRUE,
col.names = c("Level", "Cat1", "Cat2", "Cat3", "Cat4", "Cat5", "Cat6"),
caption = "Percentage of Respondents in Each Category by Level and Group") %>%
kable_styling(latex_options = "HOLD_position", full_width = TRUE) %>%
column_spec(1, width = "12em") %>%
row_spec(0, bold = TRUE, align = "c") %>%
add_header_above(c(" " = 1, "Group 1 (%)" = 3,
"Group 2 (%)" = 3), bold = TRUE)
```
What I want is for the columns to be centered but aligned on the decimal point. I understand from here and here that the "S" column alignment is not currently supported, which is fine. As a workaround, I intended to right-align the columns and then pad the values with extra spaces on the end to push the numbers back toward the center:
---
output:
pdf_document:
keep_tex: true
---
```{r, include = FALSE}
library(tidyverse)
library(knitr)
library(kableExtra)
```
```{r}
df <- data_frame(
level = c("Level 1", "Level 2", "Level 3", "Level 4"),
cat1_1 = c("39.5", "25.1", "28.9", "6.6"),
cat2_1 = c("37.7", "26.1", "30.0", "6.2"),
cat3_1 = c("30.3", "23.3", "29.7", "16.7"),
cat1_2 = c("58.7", "29.1", "9.9", "2.3"),
cat2_2 = c("56.4", "30.3", "10.7", "2.5"),
cat3_2 = c("43.6", "31.4", "16.8", "8.1")
) %>%
mutate_at(vars(contains("_")), funs(paste0(., paste(rep(" ", 4), collapse = ""))))
df
```
```{r}
kable(df, format = "latex", align = c("l", rep("r", 6)), booktabs = TRUE,
col.names = c("Level", "Cat1", "Cat2", "Cat3", "Cat4", "Cat5", "Cat6"),
caption = "Percentage of Respondents in Each Category by Level and Group") %>%
kable_styling(latex_options = "HOLD_position", full_width = TRUE) %>%
column_spec(1, width = "7em") %>%
row_spec(0, bold = TRUE, align = "c") %>%
add_header_above(c(" " = 1, "Group 1 (%)" = 3,
"Group 2 (%)" = 3), bold = TRUE)
```
Now the columns are successfully right-aligned, but they have not been pushed back toward the middle. Looking at the raw LaTeX, we can see why:
\begin{table}[H]
\caption{\label{tab:unnamed-chunk-3}Percentage of Respondents in Each Category by Level and Group}
\centering
\begin{tabu} to \linewidth {>{\raggedright\arraybackslash}p{7em}>{\raggedleft}X>{\raggedleft}X>{\raggedleft}X>{\raggedleft}X>{\raggedleft}X>{\raggedleft}X}
\toprule
\multicolumn{1}{c}{\textbf{ }} & \multicolumn{3}{c}{\textbf{Group 1 (\%)}} & \multicolumn{3}{c}{\textbf{Group 2 (\%)}} \\
\cmidrule(l{3pt}r{3pt}){2-4} \cmidrule(l{3pt}r{3pt}){5-7}
\multicolumn{1}{>{\centering\arraybackslash}p{7em}}{\textbf{Level}} & \multicolumn{1}{c}{\textbf{Cat1}} & \multicolumn{1}{c}{\textbf{Cat2}} & \multicolumn{1}{c}{\textbf{Cat3}} & \multicolumn{1}{c}{\textbf{Cat4}} & \multicolumn{1}{c}{\textbf{Cat5}} & \multicolumn{1}{c}{\textbf{Cat6}}\\
\midrule
Level 1 & 39.5 & 37.7 & 30.3 & 58.7 & 56.4 & 43.6\\
Level 2 & 25.1 & 26.1 & 23.3 & 29.1 & 30.3 & 31.4\\
Level 3 & 28.9 & 30.0 & 29.7 & 9.9 & 10.7 & 16.8\\
Level 4 & 6.6 & 6.2 & 16.7 & 2.3 & 2.5 & 8.1\\
\bottomrule
\end{tabu}
\end{table}
Where the data is entered, the padding spaces have been stripped out. I have tried to escape the trailing spaces:
df <- data_frame(
level = c("Level 1", "Level 2", "Level 3", "Level 4"),
cat1_1 = c("39.5", "25.1", "28.9", "6.6"),
cat2_1 = c("37.7", "26.1", "30.0", "6.2"),
cat3_1 = c("30.3", "23.3", "29.7", "16.7"),
cat1_2 = c("58.7", "29.1", "9.9", "2.3"),
cat2_2 = c("56.4", "30.3", "10.7", "2.5"),
cat3_2 = c("43.6", "31.4", "16.8", "8.1")
) %>%
mutate_at(vars(contains("_")), funs(paste0(., paste(rep(" ", 4), collapse = "")))) %>%
mutate_at(vars(contains("_")), str_replace_all, " ", fixed("\\\\ "))
df
#> # A tibble: 4 x 7
#> level cat1_1 cat2_1 cat3_1 cat1_2 cat2_2 cat3_2
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Level… "39.5\\ \… "37.7\\ \… "30.3\\ \… "58.7\\ \… "56.4\\ \… "43.6\\ \…
#> 2 Level… "25.1\\ \… "26.1\\ \… "23.3\\ \… "29.1\\ \… "30.3\\ \… "31.4\\ \…
#> 3 Level… "28.9\\ \… "30.0\\ \… "29.7\\ \… "9.9\\ \\… "10.7\\ \… "16.8\\ \…
#> 4 Level… "6.6\\ \\… "6.2\\ \\… "16.7\\ \… "2.3\\ \\… "2.5\\ \\… "8.1\\ \\…
But I get this error when trying to knit the document:
! Misplaced \noalign.
\bottomrule ->\noalign
{\ifnum 0=`}\fi \#aboverulesep =\aboverulesep \global...
l.202 \end{tabu}
Error: Failed to compile add_space.tex. See add_space.log for more info.
Again, looking at the raw LaTeX is somewhat informative:
\begin{table}[H]
\caption{\label{tab:unnamed-chunk-3}Percentage of Respondents in Each Category by Level and Group}
\centering
\begin{tabu} to \linewidth {>{\raggedright\arraybackslash}p{7em}>{\raggedleft}X>{\raggedleft}X>{\raggedleft}X>{\raggedleft}X>{\raggedleft}X>{\raggedleft}X}
\toprule
\multicolumn{1}{c}{\textbf{ }} & \multicolumn{3}{c}{\textbf{Group 1 (\%)}} & \multicolumn{3}{c}{\textbf{Group 2 (\%)}} \\
\cmidrule(l{3pt}r{3pt}){2-4} \cmidrule(l{3pt}r{3pt}){5-7}
\multicolumn{1}{>{\centering\arraybackslash}p{7em}}{\textbf{Level}} & \multicolumn{1}{c}{\textbf{Cat1}} & \multicolumn{1}{c}{\textbf{Cat2}} & \multicolumn{1}{c}{\textbf{Cat3}} & \multicolumn{1}{c}{\textbf{Cat4}} & \multicolumn{1}{c}{\textbf{Cat5}} & \multicolumn{1}{c}{\textbf{Cat6}}\\
\midrule
Level 1 & 39.5\ \ \ \ & 37.7\ \ \ \ & 30.3\ \ \ \ & 58.7\ \ \ \ & 56.4\ \ \ \ & 43.6\ \ \ \\\
Level 2 & 25.1\ \ \ \ & 26.1\ \ \ \ & 23.3\ \ \ \ & 29.1\ \ \ \ & 30.3\ \ \ \ & 31.4\ \ \ \\\
Level 3 & 28.9\ \ \ \ & 30.0\ \ \ \ & 29.7\ \ \ \ & 9.9\ \ \ \ & 10.7\ \ \ \ & 16.8\ \ \ \\\
Level 4 & 6.6\ \ \ \ & 6.2\ \ \ \ & 16.7\ \ \ \ & 2.3\ \ \ \ & 2.5\ \ \ \ & 8.1\ \ \ \\\
\bottomrule
\end{tabu}
\end{table}
It appears that the final trailing space is still being truncated, resulting in a line ending of \\\. Is there a way to preserve padding spaces when knitting PDF documents? Or is there a better way to achieve a centered, decimal-aligned column?

glBegin/glEnd to glDrawElements()

I've been trying to port this immediate-mode (glBegin/glEnd) code to vertex arrays (VAs, called "direct mode" below) for rendering a plane. Please let me know whether the direct-mode code will behave exactly like the immediate-mode code.
Note: consider a 50x50 mesh.
Immediate mode code:
int once=0, a=0,b=0;
for(int j=0; j<50-1; j++)
{
once=0;
for(int i=0; i<50; i++)
{
a=i+j*(50);
b=i+(j+1)*50;
if(!once)
{
glBegin(GL_TRIANGLE_STRIP);
once=1;
}
glTexCoord2f(Texture[a].x, Texture[a].y);
glVertex2f(Mesh[a].x, Mesh[a].y);
glTexCoord2f(Texture[b].x, Texture[b].y);
glVertex2f(Mesh[b].x, Mesh[b].y);
}
if(once)
{
glEnd();
}
}
Direct mode code:
unsigned int indexArray[50*50];
int idx=0;
for(int j=0; j<50-1; j++)
{
for(int i=0; i<50; i++)
{
a=i+j*(50);
b=i+(j+1)*50;
indexArray[idx]=a;
indexArray[idx+1]=b;
idx+=2;
}
}
glEnableClientState(GL_TEXTURE_COORD_ARRAY);
glEnableClientState(GL_VERTEX_ARRAY);
glTexCoordPointer(2, GL_FLOAT, sizeof(2dPoint), Texture);
glVertexPointer(3, GL_FLOAT, sizeof(2dPoint), Mesh);
glDrawElements(GL_TRIANGLE_STRIP, (50-1)*(50-1)*2, GL_UNSIGNED_INT, indexArray);
glDisableClientState(GL_VERTEX_ARRAY);
glDisableClientState(GL_TEXTURE_COORD_ARRAY);
Note: 2dPoint is a structure holding two floating-point values, x and y.
Update
After correcting the glVertexPointer() call for 2D coordinates, I observed the triangulation happening the following way:
With glBegin()-glEnd():
/\ /\ /\ /\ /
/ \ / \ / \ / \ /
\ / \ / \ / \ / \ /
\ / \ / \ / \/ \ /
\/ \ \ / /\ \ /
/\ / \ \/ / \ \/
/ \ / \ /\ / \ /\
/ \ / \ / \ / \ / \
\ / \ / \ / \ / \
\ / \ / \/ \/ \
\ / \ \ /\ /\ \
\ / \ / \ / \ / \ \
\/ \ / \ / \ / \ \
/\ \ / \ / \ / \ \
/ \ \ / / \ / \ \
/ \ / / \ \/ \ \
/ \ / \ / \ /\ \ \
\ / \ / \ / \ \ \
\/ \ / \ / \ \ \
/\ / \ / \ \ \
With glDrawElements():
/\ /\ /\ /\ /
/ \ / \ / \ / \ /
\ / \ / \ / \ / \ /
\ / \ / \ / \/ \ /
--\/--------\--------\--/-------/\-------\--/
/\ / \ \/ / \ \/
/ \ / \ /\ / \ /\
/ \ / \ / \ / \ / \
------\-/-------\---/----\--/--------\--/----\
\ / \ / \/ \/ \
\ / \ \ /\ /\ \
\ / \ / \ / \ / \ \
\/ \ / \ / \ / \ \
/\ \ / \ / \ / \ \
/ \ \ / / \ / \ \
-/----\----- \-------/-\--------\/----------\-------\
/ \ / \ / \ /\ \ \
\ / \ / \ / \ \ \
\/ \ / \ / \ \ \
/\ / \ / \ \ \
Sorry for the alignment issues in the illustration, but as you can see, with the index array and glDrawElements() the number of triangles increased. How can I modify the index array so that the triangulation matches the result of glBegin()/glEnd()?
Since you only have 2D coordinates, the glVertexPointer call is wrong.
glVertexPointer(3, GL_FLOAT, sizeof(2dPoint), Mesh);
This line tells OpenGL to always read 3 floats per vertex, so if you only have 2 of them you have to change it to:
glVertexPointer(2, GL_FLOAT, sizeof(2dPoint), Mesh);
                ^
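The follow-up about the extra triangles is not covered above. One common way to get one strip per row out of a single glDrawElements() call is to join the rows with degenerate (zero-area) triangles by repeating the last index of one row and the first index of the next. A minimal sketch, assuming the row-major vertex layout from the question; buildStripIndices is a hypothetical helper, not part of OpenGL:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds triangle-strip indices for a cols x rows vertex grid stored row by
// row. Consecutive row strips are stitched with two repeated indices, which
// produce only zero-area triangles and keep the winding parity unchanged.
std::vector<unsigned int> buildStripIndices(int cols, int rows) {
    std::vector<unsigned int> indices;
    if (cols < 2 || rows < 2) return indices;

    // 2 * cols indices per strip, plus 2 stitching indices between strips.
    indices.reserve(static_cast<std::size_t>((rows - 1) * cols * 2 + (rows - 2) * 2));

    for (int j = 0; j < rows - 1; ++j) {
        if (j > 0) {
            indices.push_back(indices.back());                      // repeat last vertex of previous strip
            indices.push_back(static_cast<unsigned int>(j * cols)); // repeat first vertex of this strip
        }
        for (int i = 0; i < cols; ++i) {
            indices.push_back(static_cast<unsigned int>(i + j * cols));        // a: current row
            indices.push_back(static_cast<unsigned int>(i + (j + 1) * cols));  // b: next row
        }
    }
    return indices;
}
```

For the 50x50 mesh this yields 4996 indices, so the count passed to glDrawElements() would be indices.size() and the pointer indices.data(), rather than a fixed 50*50 array.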