How to tell if Suitesparse/CHOLMOD is using GPU?

How to tell if Suitesparse/CHOLMOD is using GPU? - build

I built Julia, which incorporates SuiteSparse, from scratch. When building the SuiteSparse dependency I ensured the instructions were followed for setting the relevant parts of the SuiteSparse_config.mk file.
However, having completed the build the execution time for c = A\b with 220k unknowns (very regular structure for A) isn't changed.
How can I test whether CHOLMOD is actively using the GPU or not?

I did notice that something similar was asked here. It was for a C/CUDA environment, but perhaps it applies.
From that answer:
Only the long integer version of CHOLMOD can leverage GPU acceleration.
The long integer version is distinguished by api calls like cholmod_l_start instead of cholmod_start.
It may be the case that Julia does not use the "long integer" version of CHOLMOD calls. I see no evidence for it in cholmod.jl.
As I said earlier, perhaps one of the Julia Language developers will pipe up if you file the issue in the repo. Otherwise, you may need to build Julia after changing cholmod.jl first.

Related

Creating a limited use version of a program in VC++

Our company helps migrate client software from other languages to C++. We provide them C++ source code for their application along with header files and compiled libraries for runtime support functions. We charge for both the migration as well as the runtime. Recently a potential client asked to migrate one of a number of systems they have. This system contains 7 programs and we would like to limit the runtime so only these 7 programs can acess it. We can time limit the runtime by putting an encrypted expiration date in the object library but, since we have to provide the source code for the converted programs, we are having difficult coming up with a way to limit the access to a specific set of programs. Obviously, anything we put into the source code to identify the program could be copied to any other program so the only hope seems to be having the run time library discover some set of characteristics about the programs and then validating them against a set of characteristics embedded in the run time library. As I understand it, C++ has very little reflection capability (RTTI is all I could find) so I wanted to ask if anyone has faced a similar problem and found a way to solve it. Thanks in advance for any suggestions.
Based on the two answers a little clarification seems in order. We fully expect the client to modify the source code and normally we provide them an unrestricted version of the runtime libraries. This particular client requested a version that was limited to a single system and is happy to enter into a license that restricts the use of the runtime library to that system. Therefore a discussion of the legal issues isn't relevant. The issue is a technical one -- given a license that is limited to a single system and given that the client has the source to the calling programs but not the runtime, is there a way to limit access to the runtime to the set of programs comprising that system thus enforcing the terms of the license.

If they're not supposed to make further changes to the programs, why did you give them the source code? And if they are expected to continue changing the programs (i.e. maintenance), who decides whether a change constitutes a new program that's not allowed to use the library?
There's no technical way to enforce that licensing model.
There's possibly a legal way -- in the code that loads/enables the library, write a comment "This is a copy protection measure". Then DMCA forbids them from including that code into other programs (in the USA). But IANAL, and I don't think DMCA is valid anyway.
Consult a lawyer to find out what rights you have under the contract/bill of sale to restrict their use.

The most obvious answer I could think of is to get the name and/or path of the calling process-- simply compare this name to the 7 "allowed" programs in your support library. Certainly, they could create a new process with the same name, but they might not know to do so.
Another level could be to further compare the executable size against the known size for that application. (You'll likely want to allow a reasonably wide range around the expected size, in case they make changes to the source code, and/or compile with different options.)
As another thought, you might try adding some seemingly benign strings into the app's resources. ("Copyright 2011 ~Your Corporation Name~")-- You can then scan the parent executable for the magic strings. If they create a new product, they might not think to create this resource.
Finally, as already noted by Ben, if you are giving them the source code, there are likely no foolproof solutions to this problem. (As he said, at what point does "modified" code become a new application?) The best you will likely be able to do is to add enough small roadblocks that they won't bother trying to use that lib for another product. It likely depends on how determined and/or lucky they are.

Why not just technically limit the use of the runtime to one system? There are many software protection solutions out there, one that comes to my mind is SmartDongle.
Now the runtime could still be used by any other program on that machine, but I think this should be a minor concern, no?

Open-source C++ scanning library

Rationale: In my day-to-day C++ code development, I frequently need to
answer basic questions such as who calls what in a very large C++ code
base that is frequently changing. But, I also need to have some
automated way to exactly identify what the code is doing around a
particular area of code. "grep" tools such as Cscope are useful (and
I use them heavily already), but are not C++-language-aware: They
don't give any way to identify the types and kinds of lexical
environment of a given use of a type or function a such way that is
conducive to automation (even if said automation is limited to
"read-only" operations such as code browsing and navigation, but I'm
asking for much more than that below).
Question: Does there exist already an open-source C/C++-based library
(native, not managed, not Microsoft- or Linux-specific) that can
statically scan or analyze a large tree of C++ code, and can produce
result sets that answer detailed questions such as:
What functions are called by some supplied function?
What functions make use of this supplied type?
Ditto the above questions if C++ classes or class templates are involved.
The result set should provide some sort of "handle". I should be able
to feed that handle back to the library to perform the following types
of introspection:
What is the byte offset into the file where the reference was made?
What is the reference into the abstract syntax tree (AST) of that
reference, so that I can inspect surrounding code constructs? And
each AST entity would also have file path, byte-offset, and
type-info data associated with it, so that I could recursively walk
up the graph of callers or referrers to do useful operations.
The answer should meet the following requirements:
API: The API exposed must be one of the following:
C or C++ and probably is "C handle" or C++-class-instance-based
(and if it is, must be generic C o C++ code and not Microsoft- or
Linux-specific code constructs unless it is to meet specifics of
the given platform), or
Command-line standard input and standard output based.
C++ aware: Is not limited to C code, but understands C++ language
constructs in minute detail including awareness of inter-class
inheritance relationships and C++ templates.
Fast: Should scan large code bases significantly faster than
compiling the entire code base from scratch. This probably needs to
be relaxed, but only if Incremental result retrieval and Resilient
to small code changes requirements are fully met below.
Provide Result counts: I should be able to ask "How many results
would you provide to some request (and no don't send me all of the
results)?" that responds on the order of less than 3 seconds versus
having to retrieve all results for any given question. If it takes
too long to get that answer, then wastes development time. This is
coupled with the next requirement.
Incremental result retrieval: I should be able to then ask "Give me
just the next N results of this request", and then a handle to the
result set so that I can ask the question repeatedly, thus
incrementally pulling out the results in stages. This means I
should not have to wait for the entire result set before seeing
some subset of all of the results. And that I can cancel the
operation safely if I have seen enough results. Reason: I need to
answer the question: "What is the build or development impact of
changing some particular function signature?"
Resilient to small code changes: If I change a header or source
file, I should not have to wait for the entire code base to be
rescanned, but only that header or source file
rescanned. Rescanning should be quick. E.g., don't do what cscope
requires you to do, which is to rescan the entire code base for
small changes. It is understood that if you change a header, then
scanning can take longer since other files that include that header
would have to be rescanned.
IDE Agnostic: Is text editor agnostic (don't make me use a specific
text editor; I've made my choice already, thank you!)
Platform Agnostic: Is platform-agnostic (don't make me only use it
on Linux or only on Windows, as I have to use both of those
platforms in my daily grind, but I need the tool to be useful on
both as I have code sandboxes on both platforms).
Non-binary: Should not cost me anything other than time to download
and compile the library and all of its dependencies.
Not trial-ware.
Actively Supported: It is likely that sending help requests to mailing lists
or associated forums is likely to get a response in less than 2
days.
Network agnostic: Databases the library builds should be able to be used directly on
a network from 32-bit and 64-bit systems, both Linux and Windows
interchangeably, at the same time, and do not embed hardcoded paths
to filesystems that would otherwise "root" the database to a
particular network.
Build environment agnostic: Does not require intimate knowledge of my build environment, with
the notable exception of possibly requiring knowledge of compiler
supplied CPP macro definitions (e.g. -Dmacro=value).

I would say that CLang Index is a close fit. However I don't think that it stores data in a database.
Anyway the CLang framework offer what you actually need to build a tool tailored to your needs, if only because of its C, C++ and Objective-C parsing / indexing capabitilies. And since it's provided as a set of reusable libraries... it was crafted for being developed on!

I have to admit that I haven't used either because I work with a lot of Microsoft-specific code that uses Microsoft compiler extensions that i don't expect them to understand, but the two open source analyzers I'm aware of are Mozilla Pork and the Clang Analyzer.

If you are looking for results of code analysis (metrics, graphs, ...) why not use a tool (instead of API) to do that? If you can, I suggest you to take a look at Understand.
It's not free (there's a trial version) but I found it very useful.

Maybe Doxygen with GraphViz could be the answer of some of your constraints but not all,for example the analysis of Doxygen is not incremental.

Matlab to C or C++

I am working on an image processing project using Matlab. We should run our program (intended to be an application) on a cell phone.We were then asked to convert our code into C or C++ language so we get a feel of how long it would take for execution and then choose a platform. So far we didn't figure out how to do this conversion.. Any ideas of what to do to convert Matlab to C or C++??

The first thing you need to realise is that porting code from one language to another (especially languages as different as Matlab and C++) is generally non-trivial and time-consuming. You need to know both languages well, and you need to have similar facilities available in both. In the case of Matlab and C++, Matlab gives you a lot of stuff that you just won't have available in C++ without using libraries. So the first thing to do is identify which libraries you're going to need to use in C++. (You can write some of the stuff yourself, but you'll be there a long time if you write all of it yourself.)
If you're doing image processing, I highly recommend looking into something like ITK at http://www.itk.org -- I've written my image processing software twice in C++, once without ITK (coding everything myself) and once with, and the version that used ITK was finished faster, performed better and was ten times more fun to work on. FWIW.

Matlab can gererate C code for you.
See:
http://www.mathworks.com/products/featured/embeddedmatlab/
The generated code does however depend on matlab libraries. So you probably can't use it for a cell phone. But it might save you some time anyways.

I also used the MATLAB Coder to convert some functions consisting of a few hundred lines of MATLAB into C. This included using MATLAB's eigenvalue solver and matrix inversion functions.
Although Coder was able to produce C code (which theoretically was identical), it was very convoluted, bloated, impossible to decipher, and appeared to be extremely inefficient. It literally created about 10x as many lines of code as it should have needed. I ended up converting it all by hand so that I would actually be able to comprehend the C code later and make further changes/updates. This task however, can be very tedious/dangerous, as the array indexing in Matlab is 1-based and in C it's 0-based. You're likely to add bugs into the code, as I experienced. you'll also have to convert any vector/matrix arithmetic into loops that handle scalars (or use some type of C matrix algebra package)

The MathWorks provides a product called MATLAB Coder that claims to generate "readable and portable C and C++ code from MATLAB® code". I haven't tried it myself, so I can't comment on how well it accomplishes these goals.
With regard to the Image Processing Toolbox, this list (presumably for R2016b) shows which functions have been enabled for code generation and any limitations they may have.

Matlab has a tool called "Matlab Coder" which can convert your matlab file to C code or mex file. My code is relatively simple so it works fine. Speed up gain is about 10 times faster. This saves me time coding a few hundreds lines. Hope it's helpful for you too
Quick Start Guide for MATLAB Coder Confirmation
The links describe the process of converting your code in 3 major steps:
First you need to make a few simplifications in your present code so that it would be simple enough for the Coder to translate.
Second, you will use the tool to generate a mex file and test if everything is actually working.
Finally you would change some setting and generate the C code. In my case, the C code has about 700 lines including all the original matlab code (about 150 lines) as comments. I think it's quite readable and could be improve upon. However, I already get a 10 times speed up gain from the mex file anyway. So this is definitely a good thing.
We can't not be sure that this will work in all case but it's definitely worth trying.

I remember there is a tool to export m-files as c(++)-files. But I could never get that running. You need to add some obscure MATLAB-headers in the c/c++code, ... And I think it is also not recommended.
If you have running MATLAB-code, it shouldn't take too much effort to do the conversion "by hand". I have been working on several project where MATLAB was used and it was never consider to use any tools to convert the code to C/C++. It was always done "by hand".
I believe to have been the only one who ever investigate into using a tool.

Well there is not straight conversion from matlab to c/c++ You will need to understand the language and the differences between matlab and c/c++ and then start coding it in c/c++. Code a little test a little until it works.

Does an R compiler to C/C++ exist?

I'm wondering about the best way to deploy R. Matlab has the "matlab compiler" (MCR). There has been discussion about something similar in the past for R that would compile R into C or C++. Does anyone have any experience with the R to C Compiler (RCC) that was developed by John Garvin at Rice?
I've looked into it, and it seems to be the only project that worked on compiling R code into executable code. And as far as I can tell, it isn't still being used.
[Edit 1:]: To be clear, I know that there are C and C++ (and Java, Python, etc.) interfaces to R (rJava, rcpp, Rpy, etc.). I'm wondering about specific ways to compile and deploy R code without installing R in advance.
[Edit 2:]: John Mellor-Crummey tells me that they're still working on RCC and hope to make it available in 4 months or so (at the earliest). I'll update this further if I find anything else out.

A byte code compiler will be part of the R 2.13 release. By default it is not used in this release but it is available; I expect the 2.14 release will by default byte compile all base and recommended packages. The compiler::compile help page and the R Installation and Administration Manual give some more details.

I had forgotten about the Rice project, it has been a while. I think the operational term here is stated at the top of the project page: Last Updated 3/8/06.
And we all know R changes a lot. So I have only the standard few pointers for you:
Luke Tierney, who not only knows a lot about R internals but equally about byte compilers, has been working on such a project. Nothing ready yet, and it would still work in conjunction with the standard R engine.
Stephen Milborrow has the Ra extension to R that works with his just-in-time compiler package jit
my Introduction to High-Performance Computing with R tutorials (most recent tutorial slides from UseR! 2009) covers the profiling, compiling extentions, parallel computing with R, ... part, including
Rcpp and and a bit about
RInside.
In short: there is no way have what you desire specific ways to compile and deploy R code without installing R in advance. Sorry.
Edit/Update (April 2011): Luke's new compiler package will be part of R 2.13.0 (to be released April 2011) but not 'activated' by default which is expected for R 2.14.0 expected for October 2011.
Edit/Update (December 2011): Prof Tierney just release a massive 100+ page paper on the byte-code compiler.

Why do people get the fear when deploying R? I'm fairly sure I've seen this question before.
Installing R is a piece of cake (you don't actually say which OS you care about). For Windows its one .exe. file, run it, say "yes" a few times and its done. I suspect the installer exe probably has flags for unattended installation too.

You may check out the P compiler which implements a subset of R. Especially, lists, matrices, vectors etc. are implemented as well as lsfit, chol, svd, ...
You can download a free version at
www.ptechnologies.org
It speeds up computations substantially.
Best,
AS

I haven't used Garvin's package and don't know what is possible along those lines. However:
Typically people just write computationally intensive functions directly in C/C++/Fortran, after profiling to find the bottlenecks. See the RCpp interface or Calling C functions from R using .C and .Call for examples. The Scythe Statistical Library is also very nice for R users since the syntax/function names are similar.

Compile and optimize for different target architectures

Summary: I want to take advantage of compiler optimizations and processor instruction sets, but still have a portable application (running on different processors). Normally I could indeed compile 5 times and let the user choose the right one to run.
My question is: how can I can automate this, so that the processor is detected at runtime and the right executable is executed without the user having to chose it?
I have an application with a lot of low level math calculations. These calculations will typically run for a long time.
I would like to take advantage of as much optimization as possible, preferably also of (not always supported) instruction sets. On the other hand I would like my application to be portable and easy to use (so I would not like to compile 5 different versions and let the user choose).
Is there a possibility to compile 5 different versions of my code and run dynamically the most optimized version that's possible at execution time? With 5 different versions I mean with different instruction sets and different optimizations for processors.
I don't care about the size of the application.
At this moment I'm using gcc on Linux (my code is in C++), but I'm also interested in this for the Intel compiler and for the MinGW compiler for compilation to Windows.
The executable doesn't have to be able to run on different OS'es, but ideally there would be something possible with automatically selecting 32 bit and 64 bit as well.
Edit: Please give clear pointers how to do it, preferably with small code examples or links to explanations. From my point of view I need a super generic solution, which is applicable on any random C++ project I have later.
Edit I assigned the bounty to ShuggyCoUk, he had a great number of pointers to look out for. I would have liked to split it between multiple answers but that is not possible. I'm not having this implemented yet, so the question is still 'open'! Please, still add and/or improve answers, even though there is no bounty to be given anymore.
Thanks everybody!

Yes it's possible. Compile all your differently optimised versions as different dynamic libraries with a common entry point, and provide an executable stub that that loads and runs
the correct library at run-time, via the entry point, depending on config file or other information.

Can you use script?
You could detect the CPU using script, and dynamically load the executable that is most optimized for architecture. It can choose 32/64 bit versions too.
If you are using a Linux you can query the cpu with
cat /proc/cpuinfo
You could probably do this with a bash/perl/python script or windows scripting host on windows. You probably don't want to force the user to install a script engine. One that works on the OS out of the box IMHO would be best.
In fact, on windows you probably would want to write a small C# app so you can more easily query the architecture. The C# app could just spawn whatever executable is fastest.
Alternatively you could put your different versions of code in a dll's or shared object's, then dynamically load them based on the detected architecture. As long as they have the same call signature it should work.

If you wish this to cleanly work on Windows and take full advantage in 64bit capable platforms of the additional 1. Addressing space and 2. registers (likely of more use to you) you must have at a minimum a separate process for the 64bit ones.
You can achieve this by having a separate executable with the relevant PE64 header. Simply using CreateProcess will launch this as the relevant bitness (unless the executable launched is in some redirected location there is no need to worry about WoW64 folder redirection
Given this limitation on windows it is likely that simply 'chaining along' to the relevant executable will be the simplest option for all different options, as well as making testing an individual one simpler.
It also means you 'main' executable is free to be totally separate depending on the target operating system (as detecting the cpu/OS capabilities is, by it's nature, very OS specific) and then do most of the rest of your code as shared objects/dlls.
Also you can 'share' the same files for two different architectures if you currently do not feel that there is any point using the differing capabilities.
I would suggest that the main executable is capable of being forced into making a specific choice so you can see what happens with 'lesser' versions on a more capable machine (or what errors come up if you try something different).
Other possibilities given this model are:
Statically linking to different versions of the standard runtimes (for ones with/without thread safety) and using them appropriately if you are running without any SMP/SMT capabilities.
Detect if multiple cores are present and whether they are real or hyper threading (also whether the OS knows how the schedule effectively in those cases)
checking the performance of things like the system timer/high performance timers and using code optimized to this behaviour, say if you do anything where you look for a certain amount of time to expire and thus can know your best possible granularity.
If you wish to optimize you choice of code based on cache sizing/other load on the box. If you are using unrolled loops then more aggressive unrolling options may depend on having a certain amount level 1/2 cache.
Compiling conditionally to use doubles/floats depending on the architecture. Less important on intel hardware but if you are targetting certain ARM cpu's some have actual floating point hardware support and others require emulation. The optimal code would change heavily, even to the extent you just use conditional compilation rather than using the optimizing compiler(1).
Making use of co-processor hardware like CUDA capable graphics cards.
detect virtualization and alter behaviour (perhaps trying to avoid file system writes)
As to doing this check you have a few options, the most useful one on Intel being the the cpuid instruction.
Windows
Use someone else's implementation but you'll have to pay
Use a free open source one
Linux
Use the built in one
You could also look at open source software doing the same thing
Pixman does a fair amount of this and is a permissive licence.
Alternatively re-implement/update an existing one using available documentation on the features you need.
Quite a lot of separate documents to work out how to detect things:
Intel:
SSE 4.1/4.2
SSE3
MMX
A large part of what you would be paying for in the CPU-Z library is someone doing all this (and the nasty little issues involved) for you.
be careful with this - it is hard to beat decent optimizing compilers on this

Have a look at liboil: http://liboil.freedesktop.org/wiki/ . It can dynamically select implementations of multimedia-related computations at run-time. You may find you can liboil itself and not just its techniques.

Since you mention you are using GCC, I'll assume your code is in C (or C++).
Neil Butterworth already suggested making separate dynamic libraries, but that requires some non-trivial cross-platform considerations (manually loading dynamic libraries is different on Linux, Windows, OSX, etc., and getting it right will likely take some time).
A cheap solution is to simply write all of your variants using unique names, and use a function pointer to select the proper one at runtime.
I suspect the extra dereference caused by the function pointer will be amortized by the actual work you are doing (but you'll want to confirm that).
Also, getting different compiler optimizations will likely require different .c/.cpp files, as well as some twiddling of your build tool. But it's probably less overall work than separate libraries (which needed this already in one form or another).

Since you didn't specify whether you have limits on the number of files, I propose another solution: compile 5 executables, and then create a sixth executable that launches the appropriate binary. Here is some pseudocode, for Linux
int main(int argc, char* argv[])
{
char* target_path[MAXPATH];
char* new_argv[];
char* specific_version = determine_name_of_specific_version();
strcpy(target_path, "/usr/lib/myapp/versions");
strcat(target_path, specific_version);
/* append NULL to argv */
new_argv = malloc(sizeof(char*)*(argc+1));
memcpy(new_argv, argv, argc*sizeof(char*));
new_argv[argc] = 0;
/* optionally set new_argv[0] to target_path */
execv(target_path, new_argv);
}
On the plus side, this approach allows to provide the user transparently with both 32-bit and 64-bit binaries, unlike any library methods that have been proposed. On the minus side, there is no execv in Win32 (but a good emulation in cygwin); on Windows, you have to create a new process, rather than re-execing the current one.

Lets break the problem down to its two constituent parts. 1) Creating platform dependent optimized code and 2) building on multiple platforms.
The first problem is pretty straightforward. Encapsulate the platform dependent code in a set of functions. Create a different implementation of each function for each platform. Put each implementation in its own file or set of files. It's easiest for the build system if you put each platform's code in a separate directory.
For part two I suggest you look at Gnu Atuotools (Automake, AutoConf, and Libtool). If you've ever downloaded and built a GNU program from source code you know you have to run ./configure before running make. The purpose of the configure script is to 1) verify that your system has all of the required libraries and utilities need to build and run the program and 2) customize the Makefiles for the target platform. Autotools is the set of utilities for generating the configure script.
Using autoconf, you can create little macros to check that the machine supports all of the CPU instructions your platform dependent code needs. In most cases, the macros already exists, you just have to copy them into your autoconf script. Then, automake and autoconf can set up the Makefiles to pull in the appropriate implementation.
All this is a bit much for creating an example here. It takes a little time to learn. But the documentation is all out there. There is even a free book available online. And the process is applicable to your future projects. For multi-platform support, this is really the most robust and easiest way to go, I think. A lot of the suggestions posted in other answers are things that Autotools deals with (CPU detection, static & shared library support) without you have to think about it too much. The only wrinkle you might have to deal with is finding out if Autotools are available for MinGW. I know they are part of Cygwin if you can go that route instead.

You mentioned the Intel compiler. That is funny, because it can do something like this by default. However, there is a catch. The Intel compiler didn't insert checks for the approopriate SSE functionality. Instead, they checked if you had a particular Intel chip. There would still be a slow default case. As a result, AMD CPUs would not get suitable SSE-optimized versions. There are hacks floating around that will replace the Intel check with a proper SSE check.
The 32/64 bits difference will require two executables. Both the ELF and PE format store this information in the exectuables header. It's not too hard to start the 32 bits version by default, check if you are on a 64 bit system, and then restart the 64 bit version. But it may be easier to create an appropriate symlink at installation time.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js