xlib integrated debugging (Tracing) - gdb

Are there any debugging options built into Xlib (libX11.so)? Can I get a list of the Xlib calls a program makes?
I want a full trace of Xlib function calls from a heavily multithreaded, closed-source program. It runs on a non-public embedded platform, so I can't use gdb for multithreaded debugging, and there is no ltrace on the platform.
Also, this program can't connect to the X server over TCP/IP, only over a Unix socket. I want to trace the Xlib calls from within Xlib itself.
P.S. The Xlib is from a fairly modern XFree86 or even X.Org release, on GNU/Linux.

You may be able to use xscope to monitor the requests sent over a Unix socket, even when you can't send the X protocol over TCP to be able to use network monitoring tools like Wireshark.

You might look into xlibtrace, which traces at the interface between Xlib and your code, rather than the X Windows wire protocol. I've executed a couple of the examples, and it seems to work.
The source is available at http://kev.pulo.com.au/xlibtrace
I had to modify it to get it to compile:
diff -u src/libxlibtrace-functions.h.sh.orig src/libxlibtrace-functions.h.sh
--- src/libxlibtrace-functions.h.sh.orig 2009-01-19 23:43:46.000000000 -0500
+++ src/libxlibtrace-functions.h.sh 2016-02-24 13:49:25.155556294 -0500
@@ -81,7 +81,7 @@
return (t ~ /^[cC][oO][nN][sS][tT][ ]/);
}
-function isarray(t) {
+function our_isarray(t) {
return (t ~ /\[.*\]$/);
}
@@ -90,7 +90,7 @@
return sprintf("%s", t);
} else if (isfunctionpointer(t)) {
return gensub("^(.*\\(\\*)(\\).*)$", "\\1"n"\\2", "", t);
- } else if (isarray(t)) {
+ } else if (our_isarray(t)) {
return gensub("^(.*)(\\[.*\\])$", "\\1"n"\\2", "", t);
} else {
return sprintf("%s %s", t, n);
diff -u src/libxlibtrace-print-x.h.orig src/libxlibtrace-print-x.h
--- src/libxlibtrace-print-x.h.orig 2009-01-19 22:30:06.000000000 -0500
+++ src/libxlibtrace-print-x.h 2016-02-24 14:27:08.681352710 -0500
@@ -2415,6 +2415,20 @@
dofflush(f);
})
+// XGenericEventCookie
+#define __REALTYPE_XGenericEventCookie__ XGenericEventCookie
+#define __REALTYPE_XGenericEventCookie_p__ XGenericEventCookie *
+#define __REALTYPE_XGenericEventCookie_pp__ XGenericEventCookie **
+#define __TRACE_PRINT_TYPE_STRUCT_BODY_XGenericEventCookie__(safetype) \
+ __TRACE_PRINT_STRUCT_MEMBER__(f, safetype, *value, int, type) __PRINT_COMMA__(f) \
+ __TRACE_PRINT_STRUCT_MEMBER__(f, safetype, *value, unsigned_long, serial) __PRINT_COMMA__(f) \
+ __TRACE_PRINT_STRUCT_MEMBER__(f, safetype, *value, Bool, send_event) __PRINT_COMMA__(f) \
+ __TRACE_PRINT_STRUCT_MEMBER__(f, safetype, *value, Display_p, display) __PRINT_COMMA__(f) \
+ __TRACE_PRINT_STRUCT_MEMBER__(f, safetype, *value, int, extension) __PRINT_COMMA__(f) \
+ __TRACE_PRINT_STRUCT_MEMBER__(f, safetype, *value, int, evtype) __PRINT_COMMA__(f) \
+ __TRACE_PRINT_STRUCT_MEMBER__(f, safetype, *value, unsigned_int, cookie) __PRINT_COMMA__(f) \
+ __TRACE_PRINT_STRUCT_MEMBER__(f, safetype, *value, void_p, data)
+__INDIRECT_CALL_3__(__TRACE_PRINT_TYPE_STRUCT,__LIBXLIBTRACE_PRINT_X_SUFF__,__)(XGenericEventCookie)
#undef __LIBXLIBTRACE_PRINT_X_BODY__


Why does GTK2's gtk_widget_add_accelerator not add a shortcut sometimes?

I'm seeing an issue where gtk_widget_add_accelerator intermittently fails to work. Some menu items get a shortcut and some don't, in a GTK2 application running on Ubuntu 16.04. I installed the symbols and source to see what is going on, and the signal check inside gtk_widget_add_accelerator appears to be failing:
g_signal_query (g_signal_lookup (accel_signal, G_OBJECT_TYPE (widget)), &query);
if (!query.signal_id ||
!(query.signal_flags & G_SIGNAL_ACTION) ||
query.return_type != G_TYPE_NONE ||
query.n_params)
{
/* hmm, should be elaborate enough */
g_warning (G_STRLOC ": widget `%s' has no activatable signal \"%s\" without arguments",
G_OBJECT_TYPE_NAME (widget), accel_signal);
return;
}
So I copied that block of code to just before I call gtk_widget_add_accelerator like this:
const char *Signal = "activate";
GSignalQuery query;
g_signal_query (g_signal_lookup (Signal, G_OBJECT_TYPE (w)), &query);
if (!stricmp(Sc, "Ctrl+C")) // This is the short cut that doesn't work
{
if (!query.signal_id)
printf("Bad sig id\n");
else if (!(query.signal_flags & G_SIGNAL_ACTION))
printf("No sig act\n");
else if (query.return_type != G_TYPE_NONE)
printf("Ret type err\n");
else if (query.n_params)
printf("Param err.\n");
else
printf("Pre-cond ok.\n");
}
gtk_widget_add_accelerator( w,
Signal,
Menu->AccelGrp,
GtkKey,
mod,
Gtk::GTK_ACCEL_VISIBLE
);
And it prints 'Pre-cond ok.', which means the signal is valid at my application level but apparently NOT inside the GTK2 library. So is there a build problem? Mismatched headers? I don't know.
So I started looking at exactly what I'm building against. The make file uses these flags:
Libs = \
-lmagic \
-static-libgcc \
`pkg-config --libs gtk+-2.0`
Inc = \
-I./include/common \
-I./include/linux/Gtk \
-I/usr/include/gstreamer-1.0 \
-I/usr/lib/x86_64-linux-gnu/gstreamer-1.0/include \
`pkg-config --cflags gtk+-2.0` \
-Iinclude/common \
-Iinclude/linux \
-Iinclude/linux/Gtk
Seems pretty standard stuff right?
So I'm somewhat at a loss as to why this isn't working. Any ideas?
For reference this is the code that creates the signal:
Info = GTK_MENU_ITEM(gtk_menu_item_new_with_mnemonic(Txt));
Gtk::gulong ret = Gtk::g_signal_connect_data(Info,
"activate",
(Gtk::GCallback) MenuItemCallback,
this,
NULL,
Gtk::G_CONNECT_SWAPPED);
And the contents of 'query' in my app:
(gdb) p query
$1 = {signal_id = 132, signal_name = 0x7ffff76c4859 "activate",
itype = 17714704,
signal_flags = (Gtk::G_SIGNAL_RUN_FIRST | Gtk::G_SIGNAL_ACTION),
return_type = 4, n_params = 0, param_types = 0x0}
And when I step into the GTK2 library:
(gdb) p query
$3 = {signal_id = 132, signal_name = 0x7ffff76c4859 "activate",
itype = 17714704, signal_flags = (G_SIGNAL_RUN_FIRST | G_SIGNAL_ACTION),
return_type = 4, n_params = 0, param_types = 0x0}
But it only gets to that stage after the g_warning line. Maybe that's due to compiler optimization though.
Ok. It seems the debug symbols for Gtk were leading me down the wrong path. I built and installed GTK2 from source and it was much better.
gtk_widget_add_accelerator is actually doing the right thing and not falling into its signal error handler. The real issue is that I'm replacing the menu item with a different object in LgiMenuItem::Icon when I convert the item to one that supports an icon as well. This deletes the accelerator that I added earlier. I noticed this when I saw that all the items with missing accelerators had icons.
So the "solution" as such is to re-add the shortcut after converting the menu item to one that supports an icon. Maybe down the track I'll re-factor the code to be more efficient but at this point I'm just happy it works.

How can I build jcal for Windows?

I need Windows executables of this project:
https://github.com/ashkang/jcal
I tried adding all the .h files and a .c file with a main to a single non-Qt C++ console project in Qt Creator, but it produces a lot of errors like: \jdate\jdate.c:-1 error: undefined reference to `jlocaltime_r'.
I have also added unknown files like jasctime_r.3 to the project.
How can I compile this project in Windows?
OK, it does indeed work if you use the GNU build system it is intended for, but you do need some code changes to get around functions that are missing on Windows.
localtime_r() and gmtime_r() can be replaced with localtime_s() and gmtime_s(), with the arguments swapped.
For strptime(), you need to provide an implementation; here's an open source one that works if you fix the #includes and add a #define TM_YEAR_BASE 1900.
Put the fixed strptime.c in the sources/libjalali directory and apply the following patch:
diff --git a/sources/configure.ac b/sources/configure.ac
index e623ec9..409d9e7 100644
--- a/sources/configure.ac
+++ b/sources/configure.ac
@@ -66,4 +66,9 @@ if test $installpyjalali = "yes"; then
fi
AM_CONDITIONAL([WANT_PYJALALI], [test $installpyjalali = "yes"])
+case $host in
+ *mingw*) targetismingw=yes ;;
+esac
+AM_CONDITIONAL([MINGW], [test x$targetismingw = xyes])
+
AC_OUTPUT
diff --git a/sources/libjalali/Makefile.am b/sources/libjalali/Makefile.am
index 078c68a..64ef85d 100644
--- a/sources/libjalali/Makefile.am
+++ b/sources/libjalali/Makefile.am
@@ -5,6 +5,9 @@
lib_LTLIBRARIES = libjalali.la
libjalali_la_SOURCES = jalali.c jtime.c
+if MINGW
+libjalali_la_SOURCES += strptime.c
+endif
# 0:0:0
# 0 -> interface version, changes whenever you change the API
diff --git a/sources/libjalali/jalali.c b/sources/libjalali/jalali.c
index 49fc43f..6e3bdd9 100644
--- a/sources/libjalali/jalali.c
+++ b/sources/libjalali/jalali.c
@@ -28,6 +28,10 @@
#include "jalali.h"
#include "jconfig.h"
+#ifdef _WIN32
+#define localtime_r(timep, result) localtime_s((result), (timep))
+#endif
+
/*
* Assuming *factor* numbers of *lo* make one *hi*, cluster *lo*s and change
* *hi* appropriately. In the end:
@@ -49,7 +53,9 @@ const int jalali_month_len[] = { 31, 31, 31, 31, 31, 31, 30, 30, 30, 30,
const int accumulated_jalali_month_len[] = { 0, 31, 62, 93, 124, 155, 186,
216, 246, 276, 306, 336 };
+#ifndef _WIN32
extern char* tzname[2];
+#endif
/*
* Jalali leap year indication function. The algorithm used here
diff --git a/sources/libjalali/jtime.c b/sources/libjalali/jtime.c
index 319dbdd..ba8ec1a 100644
--- a/sources/libjalali/jtime.c
+++ b/sources/libjalali/jtime.c
@@ -27,6 +27,11 @@
#include "jalali.h"
#include "jtime.h"
+#ifdef _WIN32
+#define localtime_r(timep, result) localtime_s((result), (timep))
+#define gmtime_r(timep, result) gmtime_s((result), (timep))
+#endif
+
const char* GMT_ZONE = "UTC";
const char* GMT_ZONE_fa = "گرینویچ";
const char* jalali_months[] = { "Farvardin", "Ordibehesht", "Khordaad",
@@ -65,7 +70,9 @@ const char* tzname_fa[2] = { "زمان زمستانی", "زمان تابستان
static char in_buf[MAX_BUF_SIZE] = {0};
static struct jtm in_jtm;
+#ifndef _WIN32
extern char* tzname[2];
+#endif
extern const int jalali_month_len[];
void in_jasctime(const struct jtm* jtm, char* buf)
diff --git a/sources/libjalali/jtime.h b/sources/libjalali/jtime.h
index fd658f1..48b4d9a 100644
--- a/sources/libjalali/jtime.h
+++ b/sources/libjalali/jtime.h
@@ -56,6 +56,10 @@ extern struct jtm* jlocaltime_r(const time_t* timep, struct jtm* result);
extern int jalali_to_farsi(char* buf, size_t n, int padding, char* pad, int d);
+#ifdef _WIN32
+extern char *strptime(const char *buf, const char *fmt, struct tm *tm);
+#endif
+
#ifdef __cplusplus
}
#endif
diff --git a/sources/src/jdate.c b/sources/src/jdate.c
index 8a47e19..cae0329 100644
--- a/sources/src/jdate.c
+++ b/sources/src/jdate.c
@@ -34,6 +34,11 @@
#include "../libjalali/jtime.h"
#include "jdate.h"
+#ifdef _WIN32
+#define localtime_r(timep, result) localtime_s((result), (timep))
+#define gmtime_r(timep, result) gmtime_s((result), (timep))
+#endif
+
extern char* optarg;
/*
Then, get msys2, start its shell and install packages:
pacman -S mingw32/mingw-w64-i686-gcc msys/autoconf msys/automake msys/libtool
This should be enough I guess.
Now you can use the following commands in the sources directory to build it:
autoreconf -i
./configure
make
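The localtime_r()/localtime_s() swap described above can also be written as a small function instead of a macro. The macro version loses the POSIX return convention (a pointer to the result, or NULL on failure), because localtime_s() on Windows returns an error code instead. What follows is only a minimal sketch and not part of jcal; the name portable_localtime_r is made up for illustration.

```cpp
#include <cassert>
#include <time.h>

// Hypothetical helper (not part of jcal): behaves like POSIX localtime_r()
// on both platforms, returning `result` on success and NULL on failure.
inline struct tm* portable_localtime_r(const time_t* timep, struct tm* result) {
#ifdef _WIN32
    // Microsoft's localtime_s takes (struct tm*, const time_t*) and
    // returns 0 on success, so the arguments are swapped here.
    return localtime_s(result, timep) == 0 ? result : NULL;
#else
    return localtime_r(timep, result);
#endif
}
```

Code that checks the return value of localtime_r() keeps working with a wrapper like this, whereas with the plain #define it would end up comparing an error code against NULL.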

Using libunwind on ARMV7HF or ARMV5TE daemon app has no stack trace depth

I have written a C++ application (a daemon) that sometimes crashes with segmentation faults.
To find out where in the code the crashes happen, I use libunwind. It works very well on x86_64 Linux systems, where I get a nice stack trace.
But on ARM architectures, the stack trace only covers the stack up to the daemon code itself and does not reach into the shared libraries I use.
I am using gcc 4.9 (for ARM: arm-linux-gnueabihf-g++4.9).
I am building the libraries and the daemon in debug mode (-g) with the following parameters to get the tables for stack tracing:
-funwind-tables -fasynchronous-unwind-tables -mapcs-frame -g
For the linker, I use the following for the shared libs:
-shared -pthread -static-libgcc -static-libstdc++ -rdynamic
Here is the sample stack trace of x86
1 0x0000000000403f68 sp=0x00007f0c2be8d630 crit_err_hdlr(int, siginfo_t*, void*) + 0x28
2 0x00007f0c2dfa33d0 sp=0x00007f0c2be8d680 __restore_rt + 0x0
3 0x00007f0c2dfa32a9 sp=0x00007f0c2be8dc18 raise + 0x29
4 0x00007f0c2e958bab sp=0x00007f0c2be8dc20 Raumserver::Request::RequestAction_Crash::crashLevel4() + 0x2b
5 0x00007f0c2e958d9c sp=0x00007f0c2be8dc60 Raumserver::Request::RequestAction_Crash::executeAction() + 0x12c
6 0x00007f0c2e96f045 sp=0x00007f0c2be8dce0 Raumserver::Request::RequestAction::execute() + 0x55
7 0x00007f0c2e940c83 sp=0x00007f0c2be8de00 Raumserver::Manager::RequestActionManager::requestProcessingWorkerThread() + 0x113
8 0x00007f0c2ee86380 sp=0x00007f0c2be8df00 execute_native_thread_routine + 0x20
9 0x00007f0c2df996fa sp=0x00007f0c2be8df20 start_thread + 0xca
10 0x00007f0c2dccfb5d sp=0x00007f0c2be8dfc0 clone + 0x6d
11 0x0000000000000000 sp=0x00007f0c2be8dfc8 + 0x6d
And here is the trace from the ARM devices. It seems that the stack information for the shared libs is not present?!
1 0x0000000000013403 sp=0x00000000b49fe930 crit_err_hdlr(int, siginfo_t*, void*) + 0x1a
2 0x00000000b6a3b6a0 sp=0x00000000b49fe960 __default_rt_sa_restorer_v2 + 0x0
3 0x00000000b6b5774c sp=0x00000000b49fecd4 raise + 0x24
Here is a summary from the code for building the stacktrace which is called from the signal handler:
#ifdef __arm__
#include <libunwind.h>
#include <libunwind-arm.h>
#else
#include <libunwind.h>
#include <libunwind-x86_64.h>
#endif
...
void backtrace()
{
unw_cursor_t cursor;
unw_context_t context;
unw_getcontext(&context);
unw_init_local(&cursor, &context);
int n=0;
int err = unw_step(&cursor);
while ( err > 0 ) // unw_step returns >0 while frames remain, 0 at the end, <0 on error
{
unw_word_t ip, sp, off;
unw_get_reg(&cursor, UNW_REG_IP, &ip);
unw_get_reg(&cursor, UNW_REG_SP, &sp);
char symbol[256] = {"<unknown>"};
char buffer[256];
char *name = symbol;
if ( !unw_get_proc_name(&cursor, symbol, sizeof(symbol), &off) )
{
int status;
if ( (name = abi::__cxa_demangle(symbol, NULL, NULL, &status)) == 0 )
name = symbol;
}
sprintf(buffer, "#%-2d 0x%016" PRIxPTR " sp=0x%016" PRIxPTR " %s + 0x%" PRIxPTR "\n",
++n,
static_cast<uintptr_t>(ip),
static_cast<uintptr_t>(sp),
name,
static_cast<uintptr_t>(off));
if ( name != symbol )
free(name);
...
err = unw_step(&cursor);
}
}
And here is the summary for the daemon itself:
int main(int argc, char *argv[])
{
Raumserver::Raumserver raumserverObject;
//Set our Logging Mask and open the Log
setlogmask(LOG_UPTO(LOG_NOTICE));
openlog(DAEMON_NAME, LOG_CONS | LOG_NDELAY | LOG_PERROR | LOG_PID, LOG_USER);
pid_t pid, sid;
//Fork the Parent Process
pid = fork();
if (pid < 0) { syslog (LOG_NOTICE, "Error forking the parent process"); exit(EXIT_FAILURE); }
//We got a good pid, Close the Parent Process
if (pid > 0) { syslog (LOG_NOTICE, "Got pid, closing parent process"); exit(EXIT_SUCCESS); }
//Change File Mask
umask(0);
//Create a new Signature Id for our child
sid = setsid();
if (sid < 0) { syslog (LOG_NOTICE, "Signature ID for child process could not be created!"); exit(EXIT_FAILURE); }
// get the working directory of the executable
std::string workingDirectory = getWorkingDirectory();
//Change Directory
//If we cant find the directory we exit with failure.
if ((chdir("/")) < 0) { exit(EXIT_FAILURE); }
//Close Standard File Descriptors
close(STDIN_FILENO);
close(STDOUT_FILENO);
close(STDERR_FILENO);
// Add some system signal handlers for crash reporting
//raumserverObject.addSystemSignalHandlers();
AddSignalHandlers();
// set the log adapters we want to use (because we do not want to use the standard ones which includes console output)
std::vector<std::shared_ptr<Raumkernel::Log::LogAdapter>> adapters;
auto logAdapterFile = std::shared_ptr<Raumkernel::Log::LogAdapter_File>(new Raumkernel::Log::LogAdapter_File());
logAdapterFile->setLogFilePath(workingDirectory + "logs/");
adapters.push_back(logAdapterFile);
// create raumserver object and do init
raumserverObject.setSettingsFile(workingDirectory + "settings.xml");
raumserverObject.initLogObject(Raumkernel::Log::LogType::LOGTYPE_ERROR, workingDirectory + "logs/", adapters);
raumserverObject.init();
// go into an endless loop and wait until daemon is killed by the system
while(true)
{
sleep(60); //Sleep for 60 seconds
}
//Close the log
closelog ();
}
Does anybody have an idea whether I have missed some compiler flags? Is something missing?
I have found a lot of posts, and all of them refer to the compiler flags mentioned above or to libunwind. I have tried several other code samples I found, but they only give me a stack depth of 1.
Thanks!
EDIT 11.11.2016:
For testing, I built static libs instead of shared ones and linked them into the application, so there are no dynamic links anymore. Unfortunately, the stack trace is just as bad.
I have even tried compiling everything in Thumb mode, and changed the ARM compiler to gcc5/g++5:
-mthumb -mtpcs-frame -mtpcs-leaf-frame
Same issue :-(

C++ assert with time stamp

Is it possible to log information with a timestamp when an assert fails?
For example:
int a = 10;
assert( a > 100 );
This will fail, and I would like the output to include the timestamp, like this:
2013-12-02 , 17:00:05 assert failed !! (a > 100) line : 22
Thank you
assert is a macro (it has to be one, to give __LINE__ and __FILE__ information).
You could define your own. I would name it something else like tassert for readability reasons, perhaps like (untested code)
#ifdef NDEBUG
#define tassert(Cond) do {if (0 && (Cond)) {}; } while(0)
#else
#define tassert_at(Cond,Fil,Lin) do { if (!(Cond)) { \
time_t now##Lin = time(NULL); \
char tbuf##Lin [64]; struct tm tm##Lin; \
localtime_r(&now##Lin, &tm##Lin); \
strftime (tbuf##Lin, sizeof(tbuf##Lin), \
"%Y-%m-%d,%T", &tm##Lin); \
fprintf(stderr, "tassert %s failure: %s %s:%d\n", \
#Cond, tbuf##Lin, Fil, Lin); \
abort(); }} while(0)
#define tassert(Cond) tassert_at(Cond,__FILE__,__LINE__)
#endif /*NDEBUG*/
I am using cpp concatenation ## with Lin to lower the probability of name collisions, and cpp stringification # to make a string out of the Cond macro formal. Cond is always expanded, even when tassert is disabled with NDEBUG (the same switch assert(3) uses), to make sure the compiler catches syntax errors in it.
One could put most of the code in the above macro in some function, e.g.
void tassert_at_failure (const char* cond, const char* fil, int lin) {
time_t now = time(NULL);
char tbuf[64]; struct tm tm;
localtime_r (&now, &tm);
strftime (tbuf, sizeof(tbuf), "%Y-%m-%d,%T", &tm);
fprintf (stderr, "tassert %s failure: %s %s:%d\n",
cond, tbuf, fil, lin);
abort();
}
and then just define (a bit like <assert.h> does...)
#define tassert_at(Cond,Fil,Lin) do { if (!(Cond)) { \
tassert_at_failure(#Cond, Fil, Lin); }} while(0)
but I don't like much that approach, because for debugging with gdb having  abort() being called in the macro is much easier (IMHO size of code for debugging executables does not matter at all; calling abort in a macro is much more convenient inside gdb - making shorter backtraces and avoiding one down command...). If you don't want libc portability and just use recent GNU libc you could simply redefine the Glibc specific __assert_fail function (see inside <assert.h> header file). YMMV.
BTW, in real C++ code I prefer to use << for assertion-like debug outputs. This enables usage of my own operator << outputting routines (if you give it as an additional macro argument) so I am thinking of (untested code!)
#define tassert_message_at(Cond,Out,Fil,Lin) \
do { if (!(Cond)) { \
time_t now##Lin = time(NULL); \
char tbuf##Lin [64]; struct tm tm##Lin; \
localtime_r(&now##Lin, &tm##Lin); \
strftime (tbuf##Lin, sizeof(tbuf##Lin), \
"%Y-%m-%d,%T", &tm##Lin); \
std::clog << "assert " << #Cond << " failed " \
<< tbuf##Lin << " " << Fil << ":" << Lin \
<< Out << std::endl; \
abort (); } } while(0)
#define tassert_message(Cond,Out) \
tassert_message_at(Cond,Out,__FILE__,__LINE__)
and then I would use tassert_message(i>5,"i=" << i);
BTW, you might want to use syslog(3) instead of fprintf in your tassert_at macro.
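As a rough illustration of the function-based variant discussed above, here is a minimal sketch that separates message formatting from aborting, so the formatting can be exercised on its own. The names tassert_format and TASSERT are invented for this example, the message layout is only one possible choice, and localtime_r() assumes a POSIX system.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <time.h>

// Hypothetical helper: builds
// "YYYY-MM-DD,HH:MM:SS tassert (cond) failed at file:line".
inline std::string tassert_format(const char* cond, const char* file,
                                  int line, time_t when) {
    struct tm tmv;
    char tbuf[64] = "unknown-time";
    if (localtime_r(&when, &tmv) != NULL)
        strftime(tbuf, sizeof tbuf, "%Y-%m-%d,%H:%M:%S", &tmv);
    return std::string(tbuf) + " tassert (" + cond + ") failed at " +
           file + ":" + std::to_string(line);
}

// The macro only captures #Cond, __FILE__ and __LINE__; note that the
// message is printed when the condition is FALSE. abort() stays in the
// macro so a debugger stops in the caller's frame.
#define TASSERT(Cond)                                                   \
    do {                                                                \
        if (!(Cond)) {                                                  \
            std::fprintf(stderr, "%s\n",                                \
                tassert_format(#Cond, __FILE__, __LINE__,               \
                               time(NULL)).c_str());                    \
            std::abort();                                               \
        }                                                               \
    } while (0)
```

With this sketch, TASSERT(a > 100) with a == 10 would print one timestamped line to stderr and then abort, while a passing condition has no effect.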

CUDA kernel launch macro with templates

I made a macro to simplify CUDA kernel calls:
#define LAUNCH LAUNCH_ASYNC
#define LAUNCH_ASYNC(kernel_name, gridsize, blocksize, ...) \
LOG("Async kernel launch: " #kernel_name); \
kernel_name <<< (gridsize), (blocksize) >>> (__VA_ARGS__);
#define LAUNCH_SYNC(kernel_name, gridsize, blocksize, ...) \
LOG("Sync kernel launch: " #kernel_name); \
kernel_name <<< (gridsize), (blocksize) >>> (__VA_ARGS__); \
cudaDeviceSynchronize(); \
// error check, etc...
Usage:
LAUNCH(my_kernel, 32, 32, param1, param2)
LAUNCH(my_kernel<int>, 32, 32, param1, param2)
This works fine; by changing the first define I can enable synchronous calls and error checking for debugging.
However it does not work with multiple template arguments like below:
LAUNCH(my_kernel<int,float>, 32, 32, param1, param3)
The error message I get in the line where I call the macro:
error : expected a ">"
Is it possible to make this macro work with multiple template arguments?
The problem is that the preprocessor knows nothing about angle bracket nesting, so it interprets the comma between them as a macro argument separator.
If the kernel-launch syntax supports parentheses around the kernel name (I can't check now, not on a CUDA machine), you could do this:
LAUNCH((my_kernel<int, float>), 32, 32, param1, param3)
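The argument splitting can be reproduced without CUDA. The sketch below uses an ordinary function template as a stand-in for a kernel and a CALL macro shaped like LAUNCH minus the <<< >>> part; add_as, CALL and COMMA are names invented for this example. It shows two workarounds in plain C++: parenthesizing the name, and hiding the comma behind a macro. Whether nvcc accepts the parenthesized form in front of <<< >>> is not verified here.

```cpp
#include <cassert>

// Hypothetical stand-in for a kernel: a template with two type parameters.
template <typename A, typename B>
int add_as(A a, B b) { return static_cast<int>(a) + static_cast<int>(b); }

// Same shape as LAUNCH, without the launch configuration.
#define CALL(fn, ...) fn(__VA_ARGS__)

// CALL(add_as<int, float>, 1, 2.5f) does not compile: the preprocessor
// splits it into fn = `add_as<int` and arguments `float>, 1, 2.5f`.

// Workaround 1: parentheses protect the comma from the preprocessor.
inline int call_parenthesized() {
    return CALL((add_as<int, float>), 1, 2.5f);
}

// Workaround 2: arguments are split before COMMA is expanded, so the
// template argument list arrives as a single macro argument.
#define COMMA ,
inline int call_with_comma_macro() {
    return CALL(add_as<int COMMA float>, 1, 2.5f);
}
```

Both helper functions return 3 here (1 plus 2.5f truncated to 2). The COMMA trick only holds as long as the argument is not forwarded to another function-like macro after expansion.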
Something else you could try that I have used (based on the macro you posted) is wrapping the kernel block size and grid size arguments in their own macro:
#define KERNEL_ARGS2(grid, block) <<< grid, block >>>
#define KERNEL_ARGS3(grid, block, sh_mem) <<< grid, block, sh_mem >>>
#define KERNEL_ARGS4(grid, block, sh_mem, stream) <<< grid, block, sh_mem, stream >>>
Now you should be able to use your macro like so:
#define CUDA_LAUNCH(kernel_name, gridsize, blocksize, ...) \
kernel_name KERNEL_ARGS2(gridsize, blocksize)(__VA_ARGS__);
You can use it like:
CUDA_LAUNCH(my_kernel, grid_size, block_size, input, output, size);
This will launch the kernel called 'my_kernel' with the given grid and block size and the input arguments.
Consider this solution, which also reports launch errors:
inline void echoError(cudaError_t e, const char *strs) {
char a[255];
if (e != cudaSuccess) {
strncpy(a, strs, sizeof(a) - 1);
a[sizeof(a) - 1] = '\0'; /* strncpy does not terminate on truncation */
fprintf(stderr, "Failed to %s,errorCode %s",
a, cudaGetErrorString(e));
exit(EXIT_FAILURE);
}
}
#define CUDA_KERNEL_DYN(kernel, bpg, tpb, shd, ...){ \
kernel<<<bpg,tpb,shd>>>( __VA_ARGS__ ); \
cudaError_t err = cudaGetLastError(); \
echoError(err, #kernel); \
}