How to statically link to TBB?


How can I statically link Intel's TBB libraries to my application?
I know all the caveats, such as the scheduler's unfair load distribution, but I don't need the scheduler, just the containers, so it's OK.
Anyway, I know this can be done, although it's undocumented; I just can't find the way to do it right now (although I've seen it somewhere before).
So does anyone know how, or have any clues?
Thanks

This is strongly discouraged:
Is there a version of TBB that provides statically linked libraries?
TBB is not provided as a statically linked library, for the following reasons*:
Most libraries operate locally. For example, an Intel(R) MKL FFT transforms an array. It is irrelevant how many copies of the FFT there are. Multiple copies and versions can coexist without difficulty. But some libraries control program-wide resources, such as memory and processors. For example, garbage collectors control memory allocation across a program. Analogously, TBB controls scheduling of tasks across a program. To do their job effectively, each of these must be a singleton; that is, have a sole instance that can coordinate activities across the entire program. Allowing k instances of the TBB scheduler in a single program would cause there to be k times as many software threads as hardware threads. The program would operate inefficiently, because the machine would be oversubscribed by a factor of k, causing more context switching, cache contention, and memory consumption. Furthermore, TBB's efficient support for nested parallelism would be negated when nested parallelism arose from nested invocations of distinct schedulers.
The most practical solution for creating a program-wide singleton is a dynamic shared library that contains the singleton. Of course if the schedulers could cooperate, we would not need a singleton. But that cooperation requires a centralized agent to communicate through; that is, a singleton!
Our decision to omit a statically linkable version of TBB was strongly influenced by our OpenMP experience. Like TBB, OpenMP also tries to schedule across a program. A static version of the OpenMP run-time was once provided, and it has been a constant source of problems arising from duplicate schedulers. We think it best not to repeat that history. As an indirect proof of the validity of these considerations, we could point to the fact that Microsoft Visual C++ only provides OpenMP support via dynamic libraries.
Source: http://www.threadingbuildingblocks.org/faq/11#sthash.t3BrizFQ.dpuf
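
The FAQ's oversubscription arithmetic can be made concrete with a small sketch. The function names are hypothetical and not part of TBB. The sketch assumes that every independent scheduler instance sizes its worker pool to the hardware thread count, without knowing about the other instances.

```cpp
#include <cstddef>

// Hypothetical model (not TBB code): every independent scheduler instance
// sizes its worker pool to the machine's hardware threads, unaware of the others.
std::size_t total_software_threads(std::size_t scheduler_instances,
                                   std::size_t hardware_threads) {
    return scheduler_instances * hardware_threads;
}

// Oversubscription factor: software threads per hardware thread.
// With a single (singleton) scheduler this is 1; with k copies it is k.
double oversubscription_factor(std::size_t scheduler_instances,
                               std::size_t hardware_threads) {
    if (hardware_threads == 0) return 0.0;  // avoid dividing by zero
    return static_cast<double>(total_software_threads(scheduler_instances, hardware_threads)) /
           static_cast<double>(hardware_threads);
}
```

For example, three statically linked copies of the scheduler on an 8-thread machine give total_software_threads(3, 8) == 24, an oversubscription factor of 3. A shared singleton keeps the factor at 1.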

EDIT - Changed to use extra_inc. Thanks Jeff!
Build with the following parameter:
make extra_inc=big_iron.inc
The static libraries will be built. See the caveats in build/big_iron.inc.

Build static libraries from source
After acquiring the source code from https://www.threadingbuildingblocks.org/, build TBB like this:
make extra_inc=big_iron.inc
If you need extra options, then instead build like this:
make extra_inc=big_iron.inc <extra options>
Running multiple TBB programs per node
If you run a multiprocessing application, e.g. using MPI, you may need to explicitly initialize the TBB scheduler with the appropriate number of threads to avoid oversubscription.
An example of this in a large application can be found in https://github.com/m-a-d-n-e-s-s/madness/blob/master/src/madness/world/thread.cc.
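Here is a minimal sketch of the thread-count arithmetic for several processes per node. The helper names are hypothetical. It assumes the number of processes sharing the node is already known, and it simply divides the hardware threads evenly among them.

```cpp
#include <algorithm>
#include <thread>

// Divide a node's hardware threads evenly among the processes sharing it
// (e.g. MPI ranks on the same node). Always returns at least 1.
unsigned threads_per_process(unsigned hardware_threads, unsigned processes_per_node) {
    if (hardware_threads == 0) hardware_threads = 1;      // hardware_concurrency() may report 0
    if (processes_per_node == 0) processes_per_node = 1;  // treat "unknown" as one process
    return std::max(1u, hardware_threads / processes_per_node);
}

// Convenience wrapper using the standard library's hardware thread count.
unsigned threads_for_this_process(unsigned processes_per_node) {
    return threads_per_process(std::thread::hardware_concurrency(), processes_per_node);
}
```

The result would then be handed to TBB before any parallel work starts. In older TBB releases that is tbb::task_scheduler_init init(n); in oneTBB it is tbb::global_control gc(tbb::global_control::max_allowed_parallelism, n). With MPI-3, one way to get processes_per_node is to split MPI_COMM_WORLD with MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ...) and take the size of the resulting communicator.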
Comment on documentation
This feature has been available for many years (since at least 2013), although it is not documented for the reasons described in other answers.
Historical note
This feature was originally developed because IBM Blue Gene and Cray supercomputers either did not support shared libraries or did not perform well when using them, due to the lack of a locally mounted filesystem.

Using the opensource version:
After running "make tbb", go to the build/linux_xxxxxxxx_release folder.
Then run:
ar -r libtbb.a concurrent_hash_map.o concurrent_queue.o concurrent_vector.o \
    dynamic_link.o itt_notify.o cache_aligned_allocator.o pipeline.o queuing_mutex.o \
    queuing_rw_mutex.o reader_writer_lock.o spin_rw_mutex.o spin_mutex.o critical_section.o \
    task.o tbb_misc.o tbb_misc_ex.o mutex.o recursive_mutex.o condition_variable.o \
    tbb_thread.o concurrent_monitor.o semaphore.o private_server.o rml_tbb.o \
    task_group_context.o governor.o market.o arena.o scheduler.o observer_proxy.o \
    tbb_statistics.o tbb_main.o concurrent_vector_v2.o concurrent_queue_v2.o \
    spin_rw_mutex_v2.o task_v2.o
And you should get libtbb.a as output.
Note that your program must be linked against both libtbb.a and libdl ("-ldl").

Although not officially endorsed by the TBB team, it is possible to build your own statically linked version of TBB with make extra_inc=big_iron.inc.
I have not tested it on Windows or macOS, but on Linux it worked (source):
wget https://github.com/01org/tbb/archive/2017_U6.tar.gz
tar xzfv 2017_U6.tar.gz
cd tbb-2017_U6
make extra_inc=big_iron.inc
The generated files are in tbb-2017_U6/build/linux*release.
When you link your application to the static TBB version:
Call g++ with the -static switch
Link against tbb (-ltbb) and pthread (-lpthread)
In my test, I also needed to explicitly reference all .o files from the manually built TBB version. Depending on your project, you might also need to pass -pthread to gcc.
I have created a toy example to document all the steps in this Github repository:
tbb-static-linking-tutorial
It also contains test code to make sure that the generated binary is portable on other Linux distributions.

Unfortunately, it does not appear to be possible. From the TBB site.
One suggestion on the Intel forum was to compile it manually if you really need static linking. From the Intel forum.

Just link the object files; I just did it and it works. Here's the SConscript file. There are two minor issues. First, a symbol with the same name exists in both tbb and tbbmalloc, and I had to keep it from being multiply defined. Second, I disabled ITT_NOTIFY, since it creates another symbol with the same name in both libraries.
Import('g_CONFIGURATION')
import os
import SCons.Errors  # needed for StopError below
import SCutils
import utils
tbb_basedir = os.path.join(
    g_CONFIGURATION['basedir'],
    '3rd-party/tbb40_233oss/')
#print 'TBB base:', tbb_basedir
#print 'CWD: ', os.getcwd()
ccflags = []
cxxflags = [
    '-m64',
    '-march=native',
    '-I{0}'.format(tbb_basedir),
    '-I{0}'.format(os.path.join(tbb_basedir, 'src')),
    #'-I{0}'.format(os.path.join(tbb_basedir, 'src/tbb')),
    '-I{0}'.format(os.path.join(tbb_basedir, 'src/rml/include')),
    '-I{0}'.format(os.path.join(tbb_basedir, 'include')),
]
cppdefines = [
    # 'DO_ITT_NOTIFY',  # disabled: ITT notify defines a same-named symbol in tbb and tbbmalloc
    'USE_PTHREAD',
    '__TBB_BUILD=1',
]
linkflags = []
if g_CONFIGURATION['build'] == 'debug':
    ccflags.extend([
        '-O0',
        '-g',
        '-ggdb2',
    ])
    cppdefines.extend([
        'TBB_USE_DEBUG',
    ])
else:
    ccflags.extend([
        '-O2',
    ])
tbbenv = Environment(
    platform = 'posix',
    CCFLAGS=ccflags,
    CXXFLAGS=cxxflags,
    CPPDEFINES=cppdefines,
    LINKFLAGS=linkflags
)
############################################################################
# Build verbosity
if not SCutils.has_option('verbose'):
    SCutils.setup_quiet_build(tbbenv, True if SCutils.has_option('colorblind') else False)
############################################################################
tbbmallocenv = tbbenv.Clone()
tbbmallocenv.Append(CCFLAGS=[
    '-fno-rtti',
    '-fno-exceptions',
    '-fno-schedule-insns2',
])
#tbbenv.Command('version_string.tmp', None, '')
# Write version_string.tmp (the with-block closes fd automatically)
with open(os.path.join(os.getcwd(), 'version_string.tmp'), 'wb') as fd:
    (out, err, ret) = utils.xcall([
        '/bin/bash',
        os.path.join(g_CONFIGURATION['basedir'], '3rd-party/tbb40_233oss/build/version_info_linux.sh')
    ])
    if ret:
        raise SCons.Errors.StopError('version_info_linux.sh execution failed')
    fd.write(out)
    #print 'put version_string in', os.path.join(os.getcwd(), 'version_string.tmp')
    #print out
result = []
def setup_tbb():
    print('CWD: {0}'.format(os.getcwd()))
    tbb_sources = SCutils.find_files(os.path.join(tbb_basedir, 'src/tbb'), r'^.*\.cpp$')
    tbb_sources.extend([
        'src/tbbmalloc/frontend.cpp',
        'src/tbbmalloc/backref.cpp',
        'src/tbbmalloc/tbbmalloc.cpp',
        'src/tbbmalloc/large_objects.cpp',
        'src/tbbmalloc/backend.cpp',
        'src/rml/client/rml_tbb.cpp',
    ])
    print(tbb_sources)
    result.append(tbbenv.StaticLibrary(target='libtbb', source=tbb_sources))
setup_tbb()
Return('result')

Related

non-hermetic Bazel action to enable remote caching

I've been iterating on a Bazel rule for a tool that depends on a "custom" toolchain (Verilator, if you're familiar with it). The tool reads arguments and inputs and generates C++ files. The action that invokes Verilator is defined below:
ctx.actions.run(
    arguments = [args],
    executable = verilator_toolchain.verilator_bin,
    inputs = inputs,
    outputs = [verilator_output],
    progress_message = "[Verilator] Compiling {}".format(ctx.label),
)
The problem is that the executable given to this action is not exactly the same across platforms: the macOS and Linux binaries differ slightly in size and have different hashes.
I can trust that the output will be the same, and I'd like to share a remote cache for this action across both platforms. Is there a "best practice" for rewriting this action to be non-hermetic, so that the toolchain binary isn't considered an input to the cache? I think the C++ rules do something similar.

No. Short of writing an incorrect, non-hermetic rule, there's no way to prevent Bazel from putting all action inputs into the hash key.

What are those folders in SDL-1.2.15

I'm trying to understand the source code of SDL-1.2.15 and find out how it renders on Windows, but I can't find where the rendering happens. I looked inside the SDL-1.2.15/src/video folder, and there are a ton of subfolders whose names mean nothing to me. See for yourself:
aalib/ directfb/ ipod/ os2fslib/ quartz/ windib/
ataricommon/ dummy/ maccommon/ photon/ riscos/ windx5/
bwindow/ fbcon/ macdsp/ picogui/ svga/ wscons/
caca/ gapi/ macrom/ ps2gs/ symbian/ x11/
dc/ gem/ nanox/ ps3/ vgl/ xbios/
dga/ ggi/ nds/ qtopia/ wincommon/ Xext/
Is this documented somewhere? This is a pretty popular library, so it probably is documented, right? Right? What's the point of having source code if you can't understand it or even find the functions you are using?

While not all the names are self-explanatory, they contain some hints.
directfb, fbcon (framebuffer console) and X (x11, Xext) are output layers on Linux (Unix).
The ones starting with win indicate they are for Windows. More specifically, windib should be about device independent bitmaps (DIBs), dx5 about DirectX 5, and wincommon about some common stuff. Indeed, using grep shows that (only) these folders contain Windows-specific code:
grep -r windows.h src/video/*
[ lists files in the win* folders ]
You could also just compile the package on Windows and see which files were compiled (i.e. which folders contain object files).
However, to find out what it actually does, you should rather study the function you're interested in (e.g. SDL_BlitSurface): look at its implementation, and then at the implementations of the functions it uses. Start in SDL_video.h (and notice that SDL_BlitSurface is just a define).
You should use some tool to search the code base. Grep or some IDE. Or both.
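Because SDL_BlitSurface is a macro (in SDL 1.2 it expands to SDL_UpperBlit), searching for a function with that name finds nothing. The sketch below reproduces the pattern with hypothetical names. It also shows a two-level stringizing trick that reveals what a macro expands to. Running the preprocessor (gcc -E) over a file that includes SDL_video.h is another way to see the same thing.

```cpp
#include <cstring>

// Hypothetical names modelled on the pattern in SDL's headers: the "public"
// name is only a macro, so the function that actually runs has another name.
static int upper_blit_calls = 0;
int upper_blit() { return ++upper_blit_calls; }
#define blit_surface upper_blit

// Two-level stringizing: the argument is macro-expanded before # is applied,
// so this yields the name the macro stands for, not the macro's own name.
#define STRINGIZE_(x) #x
#define STRINGIZE(x) STRINGIZE_(x)
const char* blit_surface_target = STRINGIZE(blit_surface);
```

Here, calling blit_surface() really calls upper_blit(), and blit_surface_target holds the string "upper_blit".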

First of all, why not SDL2?
These are SDL's different video drivers. You can find out which driver your program uses by calling SDL_VideoDriverName. The driver is chosen based on the target platform (e.g. the operating system, since most drivers are platform-specific), the SDL_VIDEODRIVER environment variable, or the calling code.
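Here is a hypothetical, simplified model of that selection, to show how the three factors interact. It is not SDL's actual code or data structures. The assumptions are a table of drivers compiled in for the platform, an optional requested name, and "first available driver wins" when nothing is requested.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model of a video-driver table (not SDL's actual
// data structures): a driver name plus whether it can run on this system.
struct VideoDriver {
    std::string name;
    bool available;
};

// Assumed selection rule: if a driver was requested (e.g. via the
// SDL_VIDEODRIVER environment variable or by the caller), use it only if it
// is available; otherwise take the first available driver in table order.
// Returns an empty string when no driver can be used.
std::string choose_video_driver(const std::vector<VideoDriver>& drivers,
                                const char* requested) {
    for (const VideoDriver& d : drivers) {
        if (!d.available) continue;
        if (requested == nullptr || d.name == requested) return d.name;
    }
    return "";
}
```

Calling it as choose_video_driver(table, std::getenv("SDL_VIDEODRIVER")) mimics the environment override. To see what the real library picked, call SDL_VideoDriverName(namebuf, maxlen) after initialization.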

What does 'f' suffix mean on a C++ library name and how do I load it?

I'm using gperftools v2.3rc and would like to use the improved profiling of threads feature. The release notes state in part:
new cpu profiling mode on Linux is now implemented. It sets up separate profiling timers for separate threads. ... [It] is enabled if both librt.f is loaded and CPUPROFILE_PER_THREAD_TIMERS environment variable is set. ...
My C++ application is linked with librt.so (-lrt — the POSIX.1b Realtime Extensions library), but I have not heard of a library with a .f suffix before. What does the .f mean, where can I find this library, and how do I load it in my application?

I suspect temporary arthritis brought on by lack of coffee (it's a typo). What is meant is librt.so. From the middle of src/profile-handler.cc:
// We use weak alias to timer_create to avoid runtime dependency on
// -lrt and in turn -lpthread.
//
// At runtime we detect if timer_create is available and if so we
// can enable linux-sigev-thread mode of profiling
and further down in the code:
#if HAVE_LINUX_SIGEV_THREAD_ID
  if (getenv("CPUPROFILE_PER_THREAD_TIMERS")) {
    if (timer_create && pthread_once) { // <-- note this bit here.
      timer_sharing_ = TIMERS_SEPARATE;
      CreateThreadTimerKey(&thread_timer_key);
      per_thread_timer_enabled_ = true;
    } else {
      RAW_LOG(INFO,
              "Not enabling linux-per-thread-timers mode due to lack of timer_create."
              " Preload or link to librt.so for this to work");
    }
  }
#endif
It's checking if the envvar is set and librt has been loaded. It's about librt.so.
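The runtime check in that snippet relies on weak symbols. Here is a minimal sketch of the technique, assuming GCC or Clang on an ELF platform such as Linux. The function name is hypothetical. A weak declaration that nothing defines resolves to a null address instead of causing a link error, so the program can test for it before calling it. Preloading or linking the library that defines the real function makes the check succeed, which is what the log message suggests.

```cpp
// Hypothetical optional function, declared as a weak reference. If nothing
// linked into the program defines it, the linker resolves its address to
// null instead of reporting an undefined symbol (GCC/Clang on ELF targets).
extern "C" int hypothetical_optional_feature(void) __attribute__((weak));

// Runtime check in the same spirit as gperftools' "timer_create && pthread_once".
bool optional_feature_available() {
    return hypothetical_optional_feature != nullptr;
}

// Use the feature when present, otherwise fall back to a default value.
int feature_or_default(int fallback) {
    return optional_feature_available() ? hypothetical_optional_feature() : fallback;
}
```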

FreeRTOS with C++ main file

I am trying to use a C++ application with FreeRTOS.
I came across this post: https://sourceforge.net/p/freertos/discussion/382005/thread/5d5201c0 but I am not sure how and where to add this TaskCPP.h file.
Right now I have a very simple main.cpp file, something like this:
int main(void)
{
    //Set priority bits to preempt priority
    NVIC_PriorityGroupConfig(NVIC_PriorityGroup_4);
    for( ;; );
    return 0;
}
And this gives me an error:
/usr/bin/../lib/gcc/arm-none-eabi/4.7.4/../../../../arm-none-eabi/bin/ld: error: STM32F4_FreeRTOS.axf uses VFP register arguments, /usr/bin/../lib/gcc/arm-none-eabi/4.7.4/libgcc.a(unwind-arm.o) does not
/usr/bin/../lib/gcc/arm-none-eabi/4.7.4/../../../../arm-none-eabi/bin/ld: failed to merge target specific data of file /usr/bin/../lib/gcc/arm-none-eabi/4.7.4/libgcc.a(unwind-arm.o)
I am not sure what is wrong with my settings.

That error is related to your toolchain. Your target triple indicates a more generic toolchain, but FreeRTOS seems to use more specific ARM features. You may want to read this question: ARM compilation error, VFP registered used by executable, not object file
As a workaround, call your compiler with -print-multi-lib and check whether the libraries required by FreeRTOS are available. If they are, you'll have to enable them. If they are not, you'll have to use another toolchain.

Why does ICU's Locale::getDefault() return "root"?

Using the ICU library with C++ I'm doing:
char const *lang = Locale::getDefault().getLanguage();
If I write a small test program and run it on my Mac, I get en for lang. However, inside a larger group project I'm working on, I get root. Does anybody have any idea why? I did find this:
http://userguide.icu-project.org/locale/resources
so my guess is that, when running under the larger system, some ICU resources aren't being found, but I don't know what resources, why they're not being found, or how to fix it.
Additional Information
/usr/bin/locale returns:
LANG="en_US.ISO8859-1"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"
If I write a small C program:
char const *lang = setlocale( LC_ALL, "" );
I get en_US.ISO8859-1.
OS: Mac OS X 10.6.4 (Snow Leopard)
ICU version: 4.3.4 (latest available via MacPorts).
A little help? Thanks.

root is surely an odd default locale - you don't see many native root-speakers these days.
But seriously, is it safe to assume on the larger system that someone hasn't called one of the variants of setDefault("root")?
What does something like /usr/bin/locale return on this system (if you can run that)?
ICU 4.4 now has a test program called 'icuinfo', does it also return root as the default locale?
What OS/platform is this on, and which version of ICU?
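The question compares ICU's default with what setlocale(LC_ALL, "") reports. Here is a minimal sketch of the C-library side only; the function names are hypothetical. It shows that a program starts in the "C" locale and picks up LANG/LC_* only after setlocale(LC_ALL, ""). It does not show how ICU derives its own default, which is computed separately.

```cpp
#include <clocale>
#include <string>

// At startup every C and C++ program runs in the "C" locale, whatever LANG or
// LC_* say. Passing nullptr only queries the current setting.
std::string current_c_locale() {
    const char* name = std::setlocale(LC_ALL, nullptr);
    return name ? name : "";
}

// Passing "" adopts the locale described by the environment (LC_ALL, LC_*, LANG).
// setlocale returns nullptr, leaving the locale unchanged, if that locale is
// not supported by the C library; this function then returns an empty string.
std::string adopt_environment_locale() {
    const char* name = std::setlocale(LC_ALL, "");
    return name ? name : "";
}
```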
