Ant build failure

I am trying to build my project using Ant but it keeps failing. It says:
"java.lang.OutOfMemoryError: Java heap space ..."
and sometimes:
"java.lang.OutOfMemoryError: gc overhead limit exceeded "
I tried to increase the heap space by setting ANT_OPTS to:
-Xms256m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
Still the same problem. I am not able to find any other solution. Is there anything else I can do?
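For reference, this is how I set it and check that it actually reaches the Ant JVM (assuming a bash shell; the build target name is just a placeholder):
export ANT_OPTS="-Xms256m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"
ant build
# in another terminal, while the build is running, the Ant JVM should list the -Xms/-Xmx flags:
jps -lvm | grep -i ant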

Related

Memcheck (valgrind) reporting inconsistent results on different docker hosts

I have a fairly robust set of CI tests for a C++ library; these tests (around 50) run over the same Docker image but on different machines.
On one machine ("A") all the memcheck (valgrind) tests pass (i.e. no memory leaks).
On the other ("B"), all tests produce the same valgrind error shown below.
51/56 MemCheck #51: combinations.cpp.x ....................***Exception: SegFault 0.14 sec
valgrind: m_libcfile.c:66 (vgPlain_safe_fd): Assertion 'newfd >= VG_(fd_hard_limit)' failed.
Cannot find memory tester output file: /builds/user/boost-multi/build/Testing/Temporary/MemoryChecker.51.log
The machines are very similar; both are Intel i7.
The only difference I can think of is that one is:
A. Ubuntu 22.10, Linux 5.19.0-29, docker 20.10.16
and the other:
B. Fedora 37, Linux 6.1.7-200.fc37.x86_64, docker 20.10.23
and perhaps some configuration of docker I don't know about.
Is there some configuration of Docker that might generate the difference? Or of the kernel? Or some option in Valgrind to work around this problem?
I know for a fact that on real machines (not Docker) Valgrind doesn't produce any such error.
The options I use for Valgrind are always --leak-check=yes --num-callers=51 --trace-children=yes --leak-check=full --track-origins=yes --gen-suppressions=all.
Valgrind version in the image is 3.19.0-1 from the debian:testing image.
Note that this isn't an error reported by valgrind, it is an error within valgrind.
Perhaps, after all, the only difference is that the Ubuntu version of valgrind is compiled in release mode and the error is just ignored. (<-- this doesn't make sense, valgrind is the same in both cases because the docker image is the same.)
I tried removing --num-callers=51 or setting it at 12 (default value), to no avail.
I found a difference between the images and the real machine and a workaround.
It has to do with the number of file descriptors.
(This was pointed out briefly in one of the valgrind bug-tracker threads about Mac OS: https://bugs.kde.org/show_bug.cgi?id=381815#c0)
Inside the docker image running in Ubuntu 22.10:
ulimit -n
1048576
Inside the docker image running in Fedora 37:
ulimit -n
1073741816
(which looks like a ridiculous number or an overflow)
In the Fedora 37 and the Ubuntu 22.10 real machines:
ulimit -n
1024
So, doing this in the CI recipe, "solved" the problem:
- ulimit -n # reports current value
- ulimit -n 1024 # workaround needed by valgrind in docker running in Fedora 37
- ctest ... (with memcheck)
I have no idea why this workaround works.
For reference:
$ ulimit --help
...
-n the maximum number of open file descriptors
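An alternative, if you control how the container is started, may be to cap the limit from the Docker side instead of inside the CI recipe (image name and test invocation here are placeholders):
docker run --rm --ulimit nofile=1024:1024 my-ci-image ctest -T MemCheck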
First off, "you are doing it wrong" with your Valgrind arguments. For CI I recommend a two stage approach. Use as many default arguments as possible for the CI run (--trace-children=yes may well be necessary but not the others). If your codebase is leaky then you may need to check for leaks, but if you can maintain a zero leak policy (or only suppressed leaks) then you can tell if there are new leaks from the summary. After your CI detects an issue you can run again with the kitchen sink options to get full information. Your runs will be significantly faster without all those options.
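As a sketch (the test executable name is a placeholder), the two stages could look like this:
# CI stage: defaults plus only what you really need; fail the job on any error
valgrind --error-exitcode=1 --trace-children=yes ./my_test
# diagnosis stage: rerun locally with the kitchen-sink options once CI flags a problem
valgrind --trace-children=yes --leak-check=full --track-origins=yes --num-callers=51 --gen-suppressions=all ./my_test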
Back to the question.
Valgrind is trying to dup() some file (the guest exe, a tempfile or something like that). The fd that it gets is higher than what it thinks the nofile rlimit is, so it is asserting.
A billion files is ridiculous.
Valgrind will try to call prlimit RLIMIT_NOFILE, with a fallback call to rlimit, and a second fallback to setting the limit to 1024.
To really see what is going on you need to modify the Valgrind source (m_main.c, setup_file_descriptors, set the local show variable to True). With this change I see
fd limits: host, before: cur 65535 max 65535
fd limits: host, after: cur 65535 max 65535
fd limits: guest : cur 65523 max 65523
Otherwise with strace I see
2049 prlimit64(0, RLIMIT_NOFILE, NULL, {rlim_cur=65535, rlim_max=65535}) = 0
2049 prlimit64(0, RLIMIT_NOFILE, {rlim_cur=65535, rlim_max=65535}, NULL) = 0
(all the above on RHEL 7.6 amd64)
EDIT: Note that the above show Valgrind querying and setting the resource limit. If you use ulimit to lower the limit before running Valgrind, then Valgrind will try to honour that limit. Also note that Valgrind reserves a small number (8) of files for its own use.
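For example, lowering the limit in a subshell keeps the change local to the Valgrind invocation (the test binary name is a placeholder):
( ulimit -n 1024; valgrind --trace-children=yes ./my_test )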

HTCondor - Partitionable slot not working

I am following the tutorials Center for High Throughput Computing and Introduction to Configuration on the HTCondor website to set up a partitionable slot. Before any configuration I run
condor_status
and get the following output.
I update the file 00-minicondor in /etc/condor/config.d by adding the following lines at the end of the file.
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=4
SLOT_TYPE_1_PARTITIONABLE = TRUE
and reconfigure
sudo condor_reconfig
Now with
condor_status
I get this output as expected. Now, I run the following command to check everything is fine
condor_status -af Name Slotype Cpus
and find slot1@ip-172-31-54-214.ec2.internal undefined 1 instead of slot1@ip-172-31-54-214.ec2.internal Partitionable 4 61295, which is what I would expect. Moreover, when I try to submit a job that asks for more than 1 CPU, it does not allocate space for it as it should (it stays waiting forever).
I don't know if I made some mistake during the installation process or what could be happening. I would really appreciate any help!
EXTRA INFO: In case it can be of any help, I have installed HTCondor with the command
curl -fsSL https://get.htcondor.org | sudo /bin/bash -s -- --no-dry-run
on Ubuntu 18.04 running on an old p2.xlarge instance (it has 4 cores).
UPDATE: After rebooting the whole thing it seems to be working. I can now send jobs with different CPU requests and it will start them properly.
The only issue I would say persists is that Memory allocation is not showing properly, for example:
But in reality it is allocating enough memory for the job (in this case around 12 GB).
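For reference, the kind of submit file I am testing with looks roughly like this (the executable name is a placeholder):
universe       = vanilla
executable     = my_job.sh
request_cpus   = 4
request_memory = 12 GB
queue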
If I run again
condor_status -af Name Slotype Cpus
I still get something I am not supposed to
But at least it is showing the correct number of CPUs (even if it just says undefined).
What is the output of condor_q -better when the job is idle?
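Also double-check the attribute name in the -autoformat query: the slot ClassAd attribute is SlotType (the command above queries Slotype), which is why it prints undefined. For example:
condor_status -af Name SlotType Cpus Memory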

WSO2 EI log about Java heap space

I have called an endpoint and it responds with a large payload; unfortunately the error message below shows up in the WSO2 Carbon log. How can I solve it? Thank you.
TID: [-1] [] [2018-02-26 17:48:47,869] ERROR {org.wso2.carbon.das.messageflow.data.publisher.data.MessageFlowObserverStore} - Error occurred while notifying the statistics observer {org.wso2.carbon.das.messageflow.data.publisher.data.MessageFlowObserverStore}
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at com.esotericsoftware.kryo.io.Output.flush(Output.java:181)
at com.esotericsoftware.kryo.io.Output.require(Output.java:160)
at com.esotericsoftware.kryo.io.Output.writeString_slow(Output.java:462)
at com.esotericsoftware.kryo.io.Output.writeString(Output.java:363)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:191)
at com.esotericsoftware.kryo.serializers.DefaultSerializers$StringSerializer.write(DefaultSerializers.java:184)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:113)
at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:39)
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534)
at org.wso2.carbon.das.messageflow.data.publisher.publish.StatisticsPublisher.addEventData(StatisticsPublisher.java:116)
at org.wso2.carbon.das.messageflow.data.publisher.publish.StatisticsPublisher.process(StatisticsPublisher.java:67)
at org.wso2.carbon.das.messageflow.data.publisher.observer.DASMediationFlowObserver.updateStatistics(DASMediationFlowObserver.java:55)
at org.wso2.carbon.das.messageflow.data.publisher.data.MessageFlowObserverStore.notifyObservers(MessageFlowObserverStore.java:71)
at org.wso2.carbon.das.messageflow.data.publisher.services.MessageFlowReporterThread.processAndPublishEventList(MessageFlowReporterThread.java:225)
at org.wso2.carbon.das.messageflow.data.publisher.services.MessageFlowReporterThread.run(MessageFlowReporterThread.java:95)
By looking at the out-of-memory error alone it is hard to say anything about the culprit. In order to find the actual root cause we have to analyze the heap dump (WSO2 servers automatically create one at CARBON_HOME/repository/logs/heap-dump.hprof) with an analysis tool such as Eclipse MAT or JProfiler.
However, if the response message is large, there is a possibility that the server goes OOM because it may keep and build the whole response message in memory. If you want to process large messages, you can tune the heap memory allocation as described in the documentation.
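As a rough sketch (the script name and values are illustrative and depend on the EI/ESB version), the heap settings live in the JVM arguments of the server startup script and can be raised there:
grep -n "Xmx" "$CARBON_HOME"/bin/*.sh      # locate the default -Xms/-Xmx values
# raise them, e.g. -Xms256m -Xmx1024m -> -Xms512m -Xmx2048m, then restart the server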

grails FAILURE: Build failed with an exception. Execution failed for task ':bootRun'

I got this error while I tried to run my very first Grails app. :(
Some steps you can follow to resolve it:
Check the JDK and Grails installations: both need to be 32-bit or both need to be 64-bit.
Increase the JVM heap available to Grails, e.g. by passing -Xmx2048m -Xms256m via the GRAILS_OPTS (or JAVA_OPTS) environment variable (see the example after these steps).
Then rebuild and run the app.
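For example (assuming a bash shell; on Windows use set instead of export):
export GRAILS_OPTS="-Xmx2048m -Xms256m"
grails run-app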
The initial memory allocated to your JVM (-Xms) is bigger than the maximum JVM memory size you allocated via your -Xmx parameter.
see What are the Xms and Xmx parameters when starting JVMs?
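A quick way to see the constraint in action (any recent JDK):
java -Xms256m -Xmx2048m -version   # fine: initial heap <= maximum heap
java -Xms2048m -Xmx256m -version   # refuses to start: initial heap larger than the maximum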

Combining temp/lat/lon netcdf files via nco

I am trying to calculate the mean daily temperature from daily maximum and daily minimum NetCDF files, so I performed the following steps, but it is not giving me the result. Could you please help me with this?
C:\nco>ncks -A G:\CORDEX\ACCESS1-0\RCP45\tasmin.nc G:\CORDEX\ACCESS1-0\RCP45\tasmax.nc
1 file(s) copied.
1 file(s) moved.
C:\nco>ncap2 -s "tasavg=(tasmin+tasmax)/2" G:\CORDEX\ACCESS1-0\RCP45\tasmax.nc G:\CORDEX\ACCESS1-0\RCP45\tasavg.nc
ncap2: ERROR malloc() returns error on Unable to malloc() value buffer when retrieving variable from disk request for 985675200 B = 962573 kB = 940 MB = 0 GB
ncap2: malloc() error is "Not enough space"
ncap2: User-supplied supplemental error message is "nco_var_get()"
ncap2: INFO NCO has reported a malloc() failure. malloc() failures usually indicate that your machine does not have enough free memory (RAM+swap) to perform the requested operation.
It clearly indicates that the RAM is not enough. You might need to free up some memory and then try running the above command again.
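If freeing memory is not enough, one possible workaround (assuming the record dimension is time; the index ranges and shortened paths are illustrative) is to hyperslab the merged file into smaller pieces, compute the average on each piece, and concatenate the results:
ncks -d time,0,4999 tasmax.nc part1.nc
ncks -d time,5000, tasmax.nc part2.nc
ncap2 -s "tasavg=(tasmin+tasmax)/2" part1.nc avg1.nc
ncap2 -s "tasavg=(tasmin+tasmax)/2" part2.nc avg2.nc
ncrcat avg1.nc avg2.nc tasavg.nc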