Dealing with large TIFF images C++/Magick/libtiff - c++

I'm writing C++ software based on OpenCV to analyse a large quantity of large TIFF images - feature extraction and characterisation. My current approach is to attempt to split each image into smaller cropped sections, analyse each one individually and amalgamate the results. I am attempting to use Image Magick convert function to do the cropping, with flags:
convert input.tif -crop 4000x4000 +repage -scene 0 "output%d.tif"
This, however, runs for hours, utilising 100% of RAM and 100% of IO throughput without producing any output.
The image details as given by magick's identify input.tif are:
input.tif TIFF64 63488x49408 63488x49408+0+0 8-bit sRGB 9.4112GB 0.000u 0:00.031
The same instruction ran on a smaller (roughly 20k by 20k) downsampled version of the image returns output within 5 seconds.
The computer has 8GB RAM, windows 8 x64 and a 2.00GHz CPU.
Would anyone be able to advise me on how to establish what is going wrong? Otherwise, could anyone advise me on an alternative way to tackle this problem?
EDIT 1: More info
c:\test>identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 14.888GP
Memory: 6.9326GiB
Map: 13.865GiB
Disk: unlimited
File: 1536
Thread: 4
Throttle: 0
Time: unlimited

I think my other answer here should help you considerably, but one other thing you could check, is that you are allowing ImageMagick to use your 8GB of RAM. Try this command
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 2GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited
and check the Memory option.
If it is low, you can increase it on the command line with something like this
convert -limit memory 6GiB ...
or with an environment variable like this:
export MAGICK_MEMORY_LIMIT=8000000000
identify -list resource
Resource limits:
Width: 214.7MP
Height: 214.7MP
Area: 4.295GP
Memory: 7.4506GiB <---
Map: 4GiB
Disk: unlimited
File: 192
Thread: 1
Throttle: 0
Time: unlimited

Related

Solaris 12.3 C++ compiler out of memory

I have a swig generated C++ code file of 24MB, nearly 5,00,000 lines of code. I am able to compile it when set the compiler Optimization level to xO0,but fails as soon as i add any other C++ compiler flags(like xprofile ...). I am using Solaris Studio 12.3 C++ compiler.
Below is the console error:
Element size (in bytes): 48
Table size (in elements): 2560000
Table maximum size: 134217727
Table size increment: 5000
Bytes written to disk: 0
Expansions required: 9
Segments used: 1
Max Segments used: 1
Max Segment offset: 134217727
Segment offset size:: 27
Resizes made: 0
Copies due to expansions: 4
Reset requests: 0
Allocation requests: 2827527
Deallocation requests: 267537
Allocated element count: 4086
Free element count: 2555914
Unused element count: 0
Free list size (elements): 0
ir2hf: error: Out of memory
Thanks in Advance.
I found this article suggesting that it has to do with the fact that Solaris the amount of memory for data segments.
Following the steps in the blog, try to remove the limit.
$ usermod -K defaultpriv=basic,sys_resource karel
Now logoff and logon again and change the limit:
$ ulimit -d unlimited
Then check that the limit has changed
$ ulimit -d
The output should be unlimited

Convert VERY large ppm files to JPEG/JPG/PNG?

So I wrote a C++ program that produces very high resolution pictures (fractals).
I use fstream to save all the data in a .ppm file.
Everything works fine, but when I go into really high resolution (38400x21600) the ppm file has ~8 Gigabytes.
With my 16 Gigabytes of Ram, however, I am still not able to convert that picture. I downloaded couple of converters, but they couldn't handle it. Even Gimp crashed when I try to "export as...".
So, does anyone know a good converter that can handle really large ppm files? In fact, I even want to go above 100 Gigabytes. I don't care if it's slow, it should just work.
If there is no such converter: Is there a way to std::ofstream in a better way? Like maybe, is there a library that automaticly produces a PNG file?
Thanks for you help !
Edit: also I asked myself what might be the best format for saving these large images. I researched and JPEG looks quite pretty (small size, still good quality). But may be there a better format? Let me know. Thanks
A few thoughts...
An 8-bit PPM file of 38400x21600 should take 2.3GB. A 16-bit PPM file of the same dimensions requires twice as much, i.e. 4.6GB so I am not sure where you got 8GB from.
VIPS is excellent for processing large images, and if I take a 38400x21600 PPM file, and use the following command in Terminal (i.e. at the command-line), I can see it peaks at 58MB of RAM to do the conversion from PPM to JPEG...
vips jpegsave fractal.ppm fractal.jpg --vips-leak
memory: high-water mark 58.13 MB
That takes 31 seconds on a reasonable spec iMac and produces a 480MB file from my (random) data, so you would expect your result to be much smaller, since mine is pretty incompressible.
ImageMagick, on the other hand, takes 1.1GB peak working set of memory and does the same conversion in 74 seconds:
/usr/bin/time -l convert fractal.ppm fractal.jpg
73.81 real 69.46 user 4.16 sys
11616595968 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
4051124 page reclaims
4 page faults
0 swaps
0 block input operations
106 block output operations
0 messages sent
0 messages received
0 signals received
9 voluntary context switches
11791 involuntary context switches
Go to the Baby X resource compiler and download the JPEG encoder, savejpeg.c. It takes an rgb buffer which has to be flat in memory. Hack into it and replace with a version that accepts a stream of 16x16 blocks. Then write your own ppm loader that loads in a 16 pixel high strip at a time.
Now the system will scale up to huge images which don't fit in memory. How you're going to display them I don't know. But the JPEG will be to specification.
https://github.com/MalcolmMcLean/babyxrc
I'd suggest that a more efficient and faster solution would be to simply get more RAM - 128GB is not prohibitively expensive these days (or add swap space).

Not enough space to cache rdd in memory warning

I am running a spark job, and I got Not enough space to cache rdd_128_17000 in memory warning. However, in the attached file, it obviously saying only 90.8 G out of 719.3 G is used. Why is that? Thanks!
15/10/16 02:19:41 WARN storage.MemoryStore: Not enough space to cache rdd_128_17000 in memory! (computed 21.4 GB so far)
15/10/16 02:19:41 INFO storage.MemoryStore: Memory use = 4.1 GB (blocks) + 21.2 GB (scratch space shared across 1 thread(s)) = 25.2 GB. Storage limit = 36.0 GB.
15/10/16 02:19:44 WARN storage.MemoryStore: Not enough space to cache rdd_129_17000 in memory! (computed 9.4 GB so far)
15/10/16 02:19:44 INFO storage.MemoryStore: Memory use = 4.1 GB (blocks) + 30.6 GB (scratch space shared across 1 thread(s)) = 34.6 GB. Storage limit = 36.0 GB.
15/10/16 02:25:37 INFO metrics.MetricsSaver: 1001 MetricsLockFreeSaver 339 comitted 11 matured S3WriteBytes values
15/10/16 02:29:00 INFO s3n.MultipartUploadOutputStream: uploadPart /mnt1/var/lib/hadoop/s3/959a772f-d03a-41fd-bc9d-6d5c5b9812a1-0000 134217728 bytes md5: qkQ8nlvC8COVftXkknPE3A== md5hex: aa443c9e5bc2f023957ed5e49273c4dc
15/10/16 02:38:15 INFO s3n.MultipartUploadOutputStream: uploadPart /mnt/var/lib/hadoop/s3/959a772f-d03a-41fd-bc9d-6d5c5b9812a1-0001 134217728 bytes md5: RgoGg/yJpqzjIvD5DqjCig== md5hex: 460a0683fc89a6ace322f0f90ea8c28a
15/10/16 02:42:20 INFO metrics.MetricsSaver: 2001 MetricsLockFreeSaver 339 comitted 10 matured S3WriteBytes values
This is likely to be caused by the configuration of spark.storage.memoryFraction being too low. Spark will only use this fraction of the allocated memory to cache RDDs.
Try either:
increasing the storage fraction
rdd.persist(StorageLevel.MEMORY_ONLY_SER) to reduce memory usage by serializing the RDD data
rdd.persist(StorageLevel.MEMORY_AND_DISK) to partially persist onto disk if memory limits are reached.
This could be due to the following issue if you're loading lots of avro files:
https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCANx3uAiJqO4qcTXePrUofKhO3N9UbQDJgNQXPYGZ14PWgfG5Aw#mail.gmail.com%3E
With a PR in progress at:
https://github.com/databricks/spark-avro/pull/95
I have a Spark-based batch application (a JAR with main() method, not written by me, I'm not a Spark expert) that I run in local mode without spark-submit, spark-shell, or spark-defaults.conf. When I tried to use IBM JRE (like one of my customers) instead of Oracle JRE (same machine and same data), I started getting those warnings.
Since the memory store is a fraction of the heap (see the page that Jacob suggested in his comment), I checked the heap size: IBM JRE uses a different strategy to decide default heap size and it was too small, so I simply added appropriate -Xms and -Xmx params and the problem disappeared: now the batch works fine both with IBM and Oracle JRE.
My usage scenario is not typical, I know, however I hope this can help someone.

Windows Get list of ALL files on volume with size

Question: how to list all files on volume with size they occupy on disk?
Applicable solutions:
cmd script
free tool with sqlite/txt/xls/xml/json output
C++ / winapi code
The problem:
There are many tools and apis to list files, but their results dont match chkdsk and actual free space info:
Size Count (x1000)
chkdsk c: 67 GB 297
dir /S 42 GB 267
FS Inspect 47 GB 251
Total Commander (Ctrl+L) 47 GB 251
explorer (selection size) 44 GB 268
explorer (volume info) 67 GB -
WinDirStat 45 GB 245
TreeSize couldn't download it - site unavailable
C++ FindFirstFile/FindNextFile 50 GB 288
C++ GetFileInformationByHandleEx 50 GB 288
Total volume size is 70 GB, about 3 GB is actually free.
I'm aware of:
File can occupy on disk, more than its actual size, i need the size it occupies (i.e. greater one)
Symlinks, Junctions etc - that would be good to see them (though i don't think this alone can really give 20 GB difference in my case)
Filesystem uses some space for indexes and system info (chkdisk shows negligible, don't give 20 GB)
I run all tools with admin privileges, hidden files are shown.
FindFirstFile/FindNextFile C++ solution - this dont give correct results, i don't know because of what, but this gives the same as Total commander NOT the same as chkdsk
Practical problem:
I have 70 GB SSD disk, all the tools report about 50 GB is occupied, but in fact it's almost full.
Format all and reinstall - is not an option since this will happens again quite soon.
I need a report about filesizes. Report total must match actual used and free space. I'm looking for an existing solution - a tool, a script or a C++ library or C++ code.
(Actual output below)
chkdsk c:
Windows has scanned the file system and found no problems.
No further action is required.
73715708 KB total disk space.
70274580 KB in 297259 files.
167232 KB in 40207 indexes.
0 KB in bad sectors.
463348 KB in use by the system.
65536 KB occupied by the log file.
2810548 KB available on disk.
4096 bytes in each allocation unit.
18428927 total allocation units on disk.
702637 allocation units available on disk.
dir /S
Total Files Listed:
269966 File(s) 45 071 190 706 bytes
143202 Dir(s) 3 202 871 296 bytes free
FS Inspect http://sourceforge.net/projects/fs-inspect/
47.4 GB 250916 Files
Total Commander
49709355k, 48544M 250915 Files
On a Posix system, the answer would be to use the stat function. Unfortunately, it does not give the number of allocated blocs in Windows so it does not meet your requirements.
The correct function from Windows API is GetFileInformationByHandleEx. You can use FindFirstFile, FindNextFile to browse the full disk, and ask for FileStandardInfo to get a FILE_STANDARD_INFO that contains for a file (among other fields): LARGE_INTEGER AllocationSize for the allocated size and LARGE_INTEGER EndOfFile for the used size.
Alternatively, you can use directly GetFileInformationByHandleEx on directories, asking for FileIdBothDirectoryInfo to get a FILE_ID_BOTH_DIR_INFO structure. This allows you to get information on many files in a single call. My advice would be to use that one, even if it is of less common usage.
To get list of all files (including hidden and system files), sorted within directories with descending size, you can go to your cmd.exe and type:
dir /s/a:-d/o:-s C:\* > "list_of_files.txt"
Where:
/s lists files within the specified directory and all subdirectories,
/a:-d lists only files (no directories),
/o:-s put files within directory in descending size order,
C:\* means all directories on disk C,
> "list_of_files.txt" means save output to list_of_files.txt file
Listing files grouped by directory may be a little inconvenient, but it's the easiest way to list all files. For more information, take a look at technet.microsoft.com
Checked on Win7 Pro.

Out of memory opencv haartraining

I have 30 positive images and 60 negatives images.
When I tried to execute the haartraining with 4GB of memory and Quadcore processor machine, I get this error message:
OpenCV ERROR: Insufficient memory (Out of memory)
in function cvAlloc, cxalloc.cpp(111)
Terminating the application...
called from cvUnregisterType, cxpersistence.cpp(4933)
The command is:
./opencv-haartraining -vec vector/myvector.vec -bg negatives.txt -npos 24 -nneg 55 -mem 2048 -mode ALL -w 86 -h 150
The computer has only 765 MB used, but the process exceeds the given limit, and uses a lot of memory in swap until the overflow occurs. Any suggestions of what can be done to solve this problem?
Regards
Maybe your "myvector.vec" is too big. All these pictures are loaded to RAM.
Try to resize the images.