Clojurescript very slow on compiling - clojure

I am trying clojurescript and find it takes a long time to compile a very simple clojurescript source file into js. I can't believe this.
time cljsc hello.cljs '{:optimizations :advanced}' > hello.js
real 1m27.324s
user 1m2.412s
sys 0m0.676s
The snippet is from Clojurescript's github quick start page:
(ns hello)
(defn ^:export greet [n]
(str "Hello " n))
Leaving out the :optimizations option, I still find it takes a long time:
time cljsc hello.cljs > hello.js
real 0m10.867s
user 0m22.301s
sys 0m0.412s
Is that normal ? Or how can I speed up that ?

By calling cljsc you are firing up a JVM every single time you compile, which has to load tons of code, and then do the actual compiling. The JVM startup time alone is painful.
The general workflow is to not use cljsc, but keep a JVM open and compile with it every time. A common way to do this is to use lein-cljsbuild, which I highly recommend.

Taking ten seconds to compile doesn't seem totally crazy to me, because it has to compile all of the standard cljs library to javascript as well as your trivial program. As for the advanced optimizations, that's probably just how long it takes to do anything. It's reading and optimizing thousands of lines of cljs.core javascript, and probably also the google closure libraries that (I think) are used by cljs.core.

Related

How to set a breakpoint in a Clojure program using Calva?

I am working with a Clojure code and need to make changes in some parts of it. For that, I need to stop it in a breakpoint to see what are the values of atoms, variables, .... one step before new changes. The code I am working with is:
https://github.com/lspector/clojush
I need to stop it in breed.clj,defn perform-genetic-operator-list, at the end of line 99 (num-parents).
I am using Calva in VS code and run the code via the terminal with this command:
"lein run clojush.problems.demos.simple-regression".
I tried putting #break inside "num-parents (:parents (get genetic-operators operator))" as:
num-parents (:parents (get genetic-operators #break operator))
but it did not work and program stops showing some errors. Also I tried instrumentation and again did not work.
Any help is highly appreciated. Thanks!

Is it possible to mark tests as taking long time in googletest runs

Most of my tests finish quickly, the time taken is unnoticeable. But a few of them take a few seconds. I would like to print a hint to the user:
TEST(something, thing) {
std::cout << "This might take a few seconds\n";
ASSERT_EQ(expected_result, long_computation());
}
This doesn't blend in well with what is printed. Is there a feature for this in googletest? I couldn't find anything related. Any way to make goggliest understand it, print a hint to the user, and even report error in case the test runs too long? Or any plugin that does this? Thx
TEST(something, thing, max_time: 3 seconds) {
ASSERT_EQ(expected_result, long_computation());
}
I'm not sure anything like this exists but I believe you can tailor some of the features you want at least partially.
You can measure the time of execution of your test yourself and add statements like ASSERT_GT(time_limit, measured_time).
Consider using https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#logging-additional-information for writing the timings or "long"/"fast" attributes.
Consider using https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#running-a-subset-of-the-tests for running only fast, or only long tests, f.e. run only fast tests (provided you named all long tests like *Long): ./foo_test --gtest_filter=-*Long

How to properly debug a binary generated by `go test -c` using GDB?

The go test command has support for the -c flag, described as follows:
-c Compile the test binary to pkg.test but do not run it.
(Where pkg is the last element of the package's import path.)
As far as I understand, generating a binary like this is the way to run it interactively using GDB. However, since the test binary is created by combining the source and test files temporarily in some /tmp/ directory, this is what happens when I run list in gdb:
Loading Go Runtime support.
(gdb) list
42 github.com/<username>/<project>/_test/_testmain.go: No such file or directory.
This means I cannot happily inspect the Go source code in GDB like I'm used to. I know it is possible to force the temporary directory to stay by passing the -work flag to the go test command, but then it is still a huge hassle since the binary is not created in that directory and such. I was wondering if anyone found a clean solution to this problem.
Go 1.5 has been released, and there is still no officially sanctioned Go debugger. I haven't had much success using GDB for effectively debugging Go programs or test binaries. However, I have had success using Delve, a non-official debugger that is still undergoing development: https://github.com/derekparker/delve
To run your test code in the debugger, simply install delve:
go get -u github.com/derekparker/delve/cmd/dlv
... and then start the tests in the debugger from within your workspace:
dlv test
From the debugger prompt, you can single-step, set breakpoints, etc.
Give it a whirl!
Unfortunately, this appears to be a known issue that's not going to be fixed. See this discussion:
https://groups.google.com/forum/#!topic/golang-nuts/nIA09gp3eNU
I've seen two solutions to this problem.
1) create a .gdbinit file with a set substitute-path command to
redirect gdb to the actual location of the source. This file could be
generated by the go tool but you'd risk overwriting someone's custom
.gdbinit file and would tie the go tool to gdb which seems like a bad
idea.
2) Replace the source file paths in the executable (which are pointing
to /tmp/...) with the location they reside on disk. This is
straightforward if the real path is shorter then the /tmp/... path.
This would likely require additional support from the compiler /
linker to make this solution more generic.
It spawned this issue on the Go Google Code issue tracker, to which the decision ended up being:
https://code.google.com/p/go/issues/detail?id=2881
This is annoying, but it is the least of many annoying possibilities.
As a rule, the go tool should not be scribbling in the source
directories, which might not even be writable, and it shouldn't be
leaving files elsewhere after it exits. There is next to nothing
interesting in _testmain.go. People testing with gdb can break on
testing.Main instead.
Russ
Status: Unfortunate
So, in short, it sucks, and while you can work around it and GDB a test executable, the development team is unlikely to make it as easy as it could be for you.
I'm still new to the golang game but for what it's worth basic debugging seems to work.
The list command you're trying to work can be used so long as you're already at a breakpoint somewhere in your code. For example:
(gdb) b aws.go:54
Breakpoint 1 at 0x61841: file /Users/mat/gocode/src/github.com/stellar/deliverator/aws/aws.go, line 54.
(gdb) r
Starting program: /Users/mat/gocode/src/github.com/stellar/deliverator/aws/aws.test
[snip: some arnings about BinaryCache]
Breakpoint 1, github.com/stellar/deliverator/aws.imageIsNewer (latest=0xc2081fe2d0, ami=0xc2081fe3c0, ~r2=false)
at /Users/mat/gocode/src/github.com/stellar/deliverator/aws/aws.go:54
54 layout := "2006-01-02T15:04:05.000Z"
(gdb) list
49 func imageIsNewer(latest *ec2.Image, ami *ec2.Image) bool {
50 if latest == nil {
51 return true
52 }
53
54 layout := "2006-01-02T15:04:05.000Z"
55
56 amiCreationTime, amiErr := time.Parse(layout, *ami.CreationDate)
57 if amiErr != nil {
58 panic(amiErr)
This is just after running the following in the aws subdir of my project:
go test -c
gdb aws.test
As an additional caveat, it does seem very selective about where breakpoints can be placed. Seems like it has to be an expression but that conclusion is only via experimentation.
If you're willing to use tools besides GDB, check out godebug. To use it, first install with:
go get github.com/mailgun/godebug
Next, insert a breakpoint somewhere by adding the following statement to your code:
_ = "breakpoint"
Now run your tests with the godebug test command.
godebug test
It supports many of the parameters from the go test command.
-test.bench string
regular expression per path component to select benchmarks to run
-test.benchmem
print memory allocations for benchmarks
-test.benchtime duration
approximate run time for each benchmark (default 1s)
-test.blockprofile string
write a goroutine blocking profile to the named file after execution
-test.blockprofilerate int
if >= 0, calls runtime.SetBlockProfileRate() (default 1)
-test.count n
run tests and benchmarks n times (default 1)
-test.coverprofile string
write a coverage profile to the named file after execution
-test.cpu string
comma-separated list of number of CPUs to use for each test
-test.cpuprofile string
write a cpu profile to the named file during execution
-test.memprofile string
write a memory profile to the named file after execution
-test.memprofilerate int
if >=0, sets runtime.MemProfileRate
-test.outputdir string
directory in which to write profiles
-test.parallel int
maximum test parallelism (default 4)
-test.run string
regular expression to select tests and examples to run
-test.short
run smaller test suite to save time
-test.timeout duration
if positive, sets an aggregate time limit for all tests
-test.trace string
write an execution trace to the named file after execution
-test.v
verbose: print additional output

my c++ extension behaves differently with faulthandler

Background
I have a C++ extension which runs a 3D watershed pass on a buffer. It's got a nice Cython wrapper to initialise a massive buffer of signed chars to represent the voxels. I initialise some native data structures in python (in a compiled cython file) and then call one C++ function to initialise the buffer, and another to actually run the algorithm (I could have written these in Cython too, but I'd like it to work as a C++ library as well without a python.h dependancy.)
Weirdness
I'm in the process of debugging my code, trying different image sizes to gauge RAM usage and speed, etc, and I've noticed something very strange about the results - they change depending on whether I use python test.py (specifically /usr/bin/python on Mac OS X 10.7.5/Lion, which is python 2.7) or python and running import test, and calling a function on it (and indeed, on my laptop (OS X 10.6.latest, with macports python 2.7) the results are also deterministically different - each platform/situation is different, but each one is always the same as itself.). In all cases, the same function is called, loads some input data from a file, and runs the C++ module.
A note on 64-bit python - I am not using distutils to compile this code, but something akin to my answer here (i.e. with an explicit -arch x86_64 call). This shouldn't mean anything, and all my processes in Activity Monitor are called Intel (64-bit).
As you may know, the point of watershed is to find objects in the pixel soup - in 2D it's often used on photos. Here, I'm using it to find lumps in 3D in much the same way - I start with some lumps ("grains") in the image and I want to find the inverse lumps ("cells") in the space between them.
The way the results change is that I literally find a different number of lumps. For exactly the same input data:
python test.py:
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 1242 cells from 1434 original grains!
...
however,
python, import test, test.run():
grain count: 1434
seemed to have 8000000 voxels, with average value 0.8398655
find cells:
running watershed algorithm...
found 927 cells from 1434 original grains!
...
This is the same in the interactive python shell and bpython, which I originally thought was to blame.
Note the "average value" number is exactly the same - this indicates that the same fraction of voxels have initially been marked as in the problem space - i.e. that my input file was initialised in (very very probably) exactly the same way both times in voxel-space.
Also note that no part of the algorithm is non-deterministic; there are no random numbers or approximations; subject to floating point error (which should be the same each time) we should be performing exactly the same computations on exactly the same numbers both times. Watershed runs using a big buffer of integers (here signed chars) and the results are counting clusters of those integers, all of which is implemented in one big C++ call.
I have tested the __file__ attribute of the relevant module objects (which are themselves attributes of the imported test), and they're pointing to the same installed watershed.so in my system's site-packages.
Questions
I don't even know where to begin debugging this - how is it possible to call the same function with the same input data and get different results? - what about interactive python might cause this (e.g. by changing the way the data is initialised)? - Which parts of the (rather large) codebase are relevant to these questions?
In my experience it's much more useful to post ALL the code in a stackoverflow question, and not assume you know where the problem is. However, that is thousands of lines of code here, and I have literally no idea where to start! I'm happy to post small snippets on request.
I'm also happy to hear debugging strategies - interpreter state that I can check, details about the way python might affect an imported C++ binary, and so on.
Here's the structure of the code:
project/
clibs/
custom_types/
adjacency.cpp (and hpp) << graph adjacency (2nd pass; irrelevant = irr)
*array.c (and h) << dynamic array of void*s
*bit_vector.c (and h) << int* as bitfield with helper functions
polyhedron.cpp (and hpp) << for voxel initialisation; convex hull result
smallest_ints.cpp (and hpp) << for voxel entity affiliation tracking (irr)
custom_types.cpp (and hpp) << wraps all files in custom_types/
delaunay.cpp (and hpp) << marshals calls to stripack.f90
*stripack.f90 (and h) << for computing the convex hulls of grains
tensors/
*D3Vector.cpp (and hpp) << 3D double vector impl with operators
watershed.cpp (and hpp) << main algorithm entry points (ini, +two passes)
pywat/
__init__.py
watershed.pyx << cython class, python entry points.
geometric_graph.py << python code for post processing (irr)
setup.py << compile and install directives
test/
test.py << entry point for testing installed lib
(files marked * have been used extensively in other projects and are very well tested, those suffixed irr contain code only run after the problem has been caused.)
Details
as requested, the main stanza in test/test.py:
testfile = 'valid_filename'
if __name__ == "__main__":
# handles segfaults...
import faulthandler
faulthandler.enable()
run(testfile)
and my interactive invocation looks like:
import test
test.run(test.testfile)
Clues
when I run this at the straight interpreter:
import faulthandler
faulthandler.enable()
import test
test.run(test.testfile)
I get the results from the file invocation (i.e. 1242 cells), although when I run it in bpython, it just crashes.
This is clearly the source of the problem - hats off to Ignacio Vazquez-Abrams for asking the right question straight away.
UPDATE:
I've opened a bug on the faulthandler github and I'm working towards a solution. If I find something that people can learn from I'll post it as an answer.
After debugging this application extensively (printf()ing out all the data at multiple points during the run, piping outputs to log files, diffing the log files) I found what seemed to cause the strange behaviour.
I was using uninitialised memory in a couple of places, and (for some bizarre reason) this gave me repeatable behaviour differences between the two cases I describe above - one without faulthandler and one with.
Incidentally, this is also why the bug disappeared from one machine but continued to manifest itself on another, part way through debugging (which really should have given me a clue!)
My mistake here was to assume things about the problem based on a spurious correlation - in theory the garbage ram should have been differently random each time I accessed it (ahh, theory.) In this case I would have been quicker finding the problem with a printout of the main calculation function and a rubber duck.
So, as usual, the answer is the bug is not in the library, it is somewhere in your code - in this case, it was my fault for malloc()ing a chunk of RAM, falsely assuming that other parts of my code were going to initialise it (which they only did sometimes.)

Timing the runtime of a program via the Terminal

I've written a C++ program of which i would like to time the length of time it takes to complete - is there some terminal command i could use?
You can use the "time" command available in most (maybe all) the linux distributions. It will print the time spent as system, as user, and the total time.
For example
bash-4.1$ time (sleep 1; sleep 1)
will output something like
real 0m2.020s
user 0m0.014s
sys 0m0.005s
As you can see with the parenthesis you can launch every command chain you wish.
It's called time in *nix
Iterate over the function several times (1000's probably) so you can get a large enough number. Then use time.h to create two variables of type time_t - one before execution, one after. Subtract the two and divide by the iterations.
Or Measure-Command in PowerShell.
I try to better explain :)
If you have compiled your code using g++, for example:
g++ -std=c++14 c++/dijkstra_shortest_reach_2.cpp -o dsq
In order to run it, you type:
./dsq
In order to run it with a file content as an input, you type:
./dsq < input07Dijkstra.txt
Now for the answer.
In order to get the duration of the program output to the screen, just type:
time(./dsq < input07Dijkstra.txt)
Or without an input:
time(./dsq)
For the first command my output is:
real 0m16.082s
user 0m15.968s
sys 0m0.089s
Hope it helps!