Out of memory opencv haartraining - c++

I have 30 positive images and 60 negatives images.
When I tried to execute the haartraining with 4GB of memory and Quadcore processor machine, I get this error message:
OpenCV ERROR: Insufficient memory (Out of memory)
in function cvAlloc, cxalloc.cpp(111)
Terminating the application...
called from cvUnregisterType, cxpersistence.cpp(4933)
The command is:
./opencv-haartraining -vec vector/myvector.vec -bg negatives.txt -npos 24 -nneg 55 -mem 2048 -mode ALL -w 86 -h 150
The computer has only 765 MB used, but the process exceeds the given limit, and uses a lot of memory in swap until the overflow occurs. Any suggestions of what can be done to solve this problem?
Regards

Maybe your "myvector.vec" is too big. All these pictures are loaded to RAM.
Try to resize the images.

Related

Limiting Java 8 Memory Consumption

I'm running three Java 8 JVMs on a 64 bit Ubuntu VM which was built from a minimal install with nothing extra running other than the three JVMs. The VM itself has 2GB of memory and each JVM was limited by -Xmx512M which I assumed would be fine as there would be a couple of hundred MB spare.
A few weeks ago, one crashed and the hs_err_pid dump showed:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 196608 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
I restarted the JVM with a reduced heap size of 384MB and so far everything is fine. However when I currently look at the VM using the ps command and sort in descending RSS size I see
RSS %MEM VSZ PID CMD
708768 35.4 2536124 29568 java -Xms64m -Xmx512m ...
542776 27.1 2340996 12934 java -Xms64m -Xmx384m ...
387336 19.3 2542336 6788 java -Xms64m -Xmx512m ...
12128 0.6 288120 1239 /usr/lib/snapd/snapd
4564 0.2 21476 27132 -bash
3524 0.1 5724 1235 /sbin/iscsid
3184 0.1 37928 1 /sbin/init
3032 0.1 27772 28829 ps ax -o rss,pmem,vsz,pid,cmd --sort -rss
3020 0.1 652988 1308 /usr/bin/lxcfs /var/lib/lxcfs/
2936 0.1 274596 1237 /usr/lib/accountsservice/accounts-daemon
..
..
and the free command shows
total used free shared buff/cache available
Mem: 1952 1657 80 20 213 41
Swap: 0 0 0
Taking the first process as an example, there is an RSS size of 708768 KB even though the heap limit would be 524288 KB (512*1024).
I am aware that extra memory is used over the JVM heap but the question is how can I control this to ensure I do not run out of memory again ? I am trying to set the heap size for each JVM as large as I can without crashing them.
Or is there a good general guideline as to how to set JVM heap size in relation to overall memory availability ?
There does not appear to be a way of controlling how much extra memory the JVM will use over the heap. However by monitoring the application over a period of time, a good estimate of this amount can be obtained. If the overall consumption of the java process is higher than desired, then the heap size can be reduced. Further monitoring is needed to see if this impacts performance.
Continuing with the example above and using the command ps ax -o rss,pmem,vsz,pid,cmd --sort -rss we see usage as of today is
RSS %MEM VSZ PID CMD
704144 35.2 2536124 29568 java -Xms64m -Xmx512m ...
429504 21.4 2340996 12934 java -Xms64m -Xmx384m ...
367732 18.3 2542336 6788 java -Xms64m -Xmx512m ...
13872 0.6 288120 1239 /usr/lib/snapd/snapd
..
..
These java processes are all running the same application but with different data sets. The first process (29568) has stayed stable using about 190M beyond the heap limit while the second (12934) has reduced from 156M to 35M. The total memory usage of the third has stayed well under the heap size which suggests the heap limit could be reduced.
It would seem that allowing 200MB extra non heap memory per java process here would be more than enough as that gives 600MB leeway total. Subtracting this from 2GB leaves 1400MB so the three -Xmx parameter values combined should be less than this amount.
As will be gleaned from reading the article pointed out in a comment by Fairoz there are many different ways in which the JVM can use non heap memory. One of these that is measurable though is the thread stack size. The default for a JVM can be found on linux using java -XX:+PrintFlagsFinal -version | grep ThreadStackSize In the case above it is 1MB and as there are about 25 threads, we can safely say that at least 25MB extra will always be required.

Can not use -w -h in traincascade

I have sone problem with size of sample traincascade. I use opencv_traincascade.exe to train rear of car. Size of rear of car which i need to have a size 72x48. Now i use createsample to create 2000 sample size 72x48 with command
D:\Project_Android\Classifier\bin\opencv_createsamples.exe -info positive.txt -bg negative.txt -vec vector.vec -num 2000 -w 72 -h 48
Now, i have a vector file and negative.txt (sure negative.txt have absolute direction to background images). i use opencv_traincascade.exe with command.
D:\Project_Android\Classifier\bin\opencv_traincascade.exe -data HaarTraining -vec vector.vec -bg negative.txt -numPos 250 -numNeg 1500 -numStages 10 -nonsym -minhitrate 0.999 -maxfalsealarm 0.5 -mode ALL -w 72 -h 48 -precalcValBufSize 2046 -precalcIdxBufSize 2046 PAUSE
I get an error.
When i use size 24x24 (-w 24 -h 24) for createsample.exe step and cascadetraining step, everything is ok, traincasecade is run.
it seem to be size 24x24 is a default size of sample and it still remain. and i seem to be cannot config it in file .bat (i write command in file bat and click file bat to run)
And one thing i see in command window in above picture that: -precalcValBufSize and -precalcIdxBufSize still remain default value 1024. No change with -w -h -precalcValBufSize -precalcIdxBufSize.
So what problem with it, i don't understand why don't i choose size of sample for train. I just can use size 24x24, all size other always give error "Assertion failed <_img.row*_img.height == vecSize> in CvcascadeImageReader...pla pla"
Please help me to solve this problem. So many thank to you

Not enough space to cache rdd in memory warning

I am running a spark job, and I got Not enough space to cache rdd_128_17000 in memory warning. However, in the attached file, it obviously saying only 90.8 G out of 719.3 G is used. Why is that? Thanks!
15/10/16 02:19:41 WARN storage.MemoryStore: Not enough space to cache rdd_128_17000 in memory! (computed 21.4 GB so far)
15/10/16 02:19:41 INFO storage.MemoryStore: Memory use = 4.1 GB (blocks) + 21.2 GB (scratch space shared across 1 thread(s)) = 25.2 GB. Storage limit = 36.0 GB.
15/10/16 02:19:44 WARN storage.MemoryStore: Not enough space to cache rdd_129_17000 in memory! (computed 9.4 GB so far)
15/10/16 02:19:44 INFO storage.MemoryStore: Memory use = 4.1 GB (blocks) + 30.6 GB (scratch space shared across 1 thread(s)) = 34.6 GB. Storage limit = 36.0 GB.
15/10/16 02:25:37 INFO metrics.MetricsSaver: 1001 MetricsLockFreeSaver 339 comitted 11 matured S3WriteBytes values
15/10/16 02:29:00 INFO s3n.MultipartUploadOutputStream: uploadPart /mnt1/var/lib/hadoop/s3/959a772f-d03a-41fd-bc9d-6d5c5b9812a1-0000 134217728 bytes md5: qkQ8nlvC8COVftXkknPE3A== md5hex: aa443c9e5bc2f023957ed5e49273c4dc
15/10/16 02:38:15 INFO s3n.MultipartUploadOutputStream: uploadPart /mnt/var/lib/hadoop/s3/959a772f-d03a-41fd-bc9d-6d5c5b9812a1-0001 134217728 bytes md5: RgoGg/yJpqzjIvD5DqjCig== md5hex: 460a0683fc89a6ace322f0f90ea8c28a
15/10/16 02:42:20 INFO metrics.MetricsSaver: 2001 MetricsLockFreeSaver 339 comitted 10 matured S3WriteBytes values
This is likely to be caused by the configuration of spark.storage.memoryFraction being too low. Spark will only use this fraction of the allocated memory to cache RDDs.
Try either:
increasing the storage fraction
rdd.persist(StorageLevel.MEMORY_ONLY_SER) to reduce memory usage by serializing the RDD data
rdd.persist(StorageLevel.MEMORY_AND_DISK) to partially persist onto disk if memory limits are reached.
This could be due to the following issue if you're loading lots of avro files:
https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCANx3uAiJqO4qcTXePrUofKhO3N9UbQDJgNQXPYGZ14PWgfG5Aw#mail.gmail.com%3E
With a PR in progress at:
https://github.com/databricks/spark-avro/pull/95
I have a Spark-based batch application (a JAR with main() method, not written by me, I'm not a Spark expert) that I run in local mode without spark-submit, spark-shell, or spark-defaults.conf. When I tried to use IBM JRE (like one of my customers) instead of Oracle JRE (same machine and same data), I started getting those warnings.
Since the memory store is a fraction of the heap (see the page that Jacob suggested in his comment), I checked the heap size: IBM JRE uses a different strategy to decide default heap size and it was too small, so I simply added appropriate -Xms and -Xmx params and the problem disappeared: now the batch works fine both with IBM and Oracle JRE.
My usage scenario is not typical, I know, however I hope this can help someone.

Windows Get list of ALL files on volume with size

Question: how to list all files on volume with size they occupy on disk?
Applicable solutions:
cmd script
free tool with sqlite/txt/xls/xml/json output
C++ / winapi code
The problem:
There are many tools and apis to list files, but their results dont match chkdsk and actual free space info:
Size Count (x1000)
chkdsk c: 67 GB 297
dir /S 42 GB 267
FS Inspect 47 GB 251
Total Commander (Ctrl+L) 47 GB 251
explorer (selection size) 44 GB 268
explorer (volume info) 67 GB -
WinDirStat 45 GB 245
TreeSize couldn't download it - site unavailable
C++ FindFirstFile/FindNextFile 50 GB 288
C++ GetFileInformationByHandleEx 50 GB 288
Total volume size is 70 GB, about 3 GB is actually free.
I'm aware of:
File can occupy on disk, more than its actual size, i need the size it occupies (i.e. greater one)
Symlinks, Junctions etc - that would be good to see them (though i don't think this alone can really give 20 GB difference in my case)
Filesystem uses some space for indexes and system info (chkdisk shows negligible, don't give 20 GB)
I run all tools with admin privileges, hidden files are shown.
FindFirstFile/FindNextFile C++ solution - this dont give correct results, i don't know because of what, but this gives the same as Total commander NOT the same as chkdsk
Practical problem:
I have 70 GB SSD disk, all the tools report about 50 GB is occupied, but in fact it's almost full.
Format all and reinstall - is not an option since this will happens again quite soon.
I need a report about filesizes. Report total must match actual used and free space. I'm looking for an existing solution - a tool, a script or a C++ library or C++ code.
(Actual output below)
chkdsk c:
Windows has scanned the file system and found no problems.
No further action is required.
73715708 KB total disk space.
70274580 KB in 297259 files.
167232 KB in 40207 indexes.
0 KB in bad sectors.
463348 KB in use by the system.
65536 KB occupied by the log file.
2810548 KB available on disk.
4096 bytes in each allocation unit.
18428927 total allocation units on disk.
702637 allocation units available on disk.
dir /S
Total Files Listed:
269966 File(s) 45 071 190 706 bytes
143202 Dir(s) 3 202 871 296 bytes free
FS Inspect http://sourceforge.net/projects/fs-inspect/
47.4 GB 250916 Files
Total Commander
49709355k, 48544M 250915 Files
On a Posix system, the answer would be to use the stat function. Unfortunately, it does not give the number of allocated blocs in Windows so it does not meet your requirements.
The correct function from Windows API is GetFileInformationByHandleEx. You can use FindFirstFile, FindNextFile to browse the full disk, and ask for FileStandardInfo to get a FILE_STANDARD_INFO that contains for a file (among other fields): LARGE_INTEGER AllocationSize for the allocated size and LARGE_INTEGER EndOfFile for the used size.
Alternatively, you can use directly GetFileInformationByHandleEx on directories, asking for FileIdBothDirectoryInfo to get a FILE_ID_BOTH_DIR_INFO structure. This allows you to get information on many files in a single call. My advice would be to use that one, even if it is of less common usage.
To get list of all files (including hidden and system files), sorted within directories with descending size, you can go to your cmd.exe and type:
dir /s/a:-d/o:-s C:\* > "list_of_files.txt"
Where:
/s lists files within the specified directory and all subdirectories,
/a:-d lists only files (no directories),
/o:-s put files within directory in descending size order,
C:\* means all directories on disk C,
> "list_of_files.txt" means save output to list_of_files.txt file
Listing files grouped by directory may be a little inconvenient, but it's the easiest way to list all files. For more information, take a look at technet.microsoft.com
Checked on Win7 Pro.

Dealing with large TIFF images C++/Magick/libtiff

I'm writing C++ software based on OpenCV to analyse a large quantity of large TIFF images - feature extraction and characterisation. My current approach is to attempt to split each image into smaller cropped sections, analyse each one individually and amalgamate the results. I am attempting to use Image Magick convert function to do the cropping, with flags:
convert input.tif -crop 4000x4000 +repage -scene 0 "output%d.tif"
This, however, runs for hours, utilising 100% of RAM and 100% of IO throughput without producing any output.
The image details as given by magick's identify input.tif are:
input.tif TIFF64 63488x49408 63488x49408+0+0 8-bit sRGB 9.4112GB 0.000u 0:00.031
The same instruction ran on a smaller (roughly 20k by 20k) downsampled version of the image returns output within 5 seconds.
The computer has 8GB RAM, windows 8 x64 and a 2.00GHz CPU.
Would anyone be able to advise me on how to establish what is going wrong? Otherwise, could anyone advise me on an alternative way to tackle this problem?
EDIT 1: More info
c:\test>identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 14.888GP
Memory: 6.9326GiB
Map: 13.865GiB
Disk: unlimited
File: 1536
Thread: 4
Throttle: 0
Time: unlimited
I think my other answer here should help you considerably, but one other thing you could check, is that you are allowing ImageMagick to use your 8GB of RAM. Try this command
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 2GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited
and check the Memory option.
If it is low, you can increase it on the command line with something like this
convert -limit memory 6GiB ...
or with an environment variable like this:
export MAGICK_MEMORY_LIMIT=8000000000
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 7.4506GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited