Distribute the nodes of an ElastiCache cluster across different AZs?

The following listing shows that there are 10 shards in the ElastiCache cluster. Each shard has two nodes, and they all belong to one ReplicationGroup. Auto-failover is enabled and Multi-AZ is disabled.
However, only us-east-1a and us-east-1b are used; us-east-1c and us-east-1d are not used. How can I evenly distribute the nodes across the four AZs (AWS CLI or console)?
"CacheClusterId","PreferredAvailabilityZone","ReplicationGroupId"
'abcde1234-0001-001', 'us-east-1a', 'abcde1234'
'abcde1234-0002-001', 'us-east-1a', 'abcde1234'
'abcde1234-0001-002', 'us-east-1b', 'abcde1234'
'abcde1234-0002-002', 'us-east-1b', 'abcde1234'
'abcde1234-0003-001', 'us-east-1b', 'abcde1234'
'abcde1234-0003-002', 'us-east-1a', 'abcde1234'
'abcde1234-0004-001', 'us-east-1b', 'abcde1234'
'abcde1234-0004-002', 'us-east-1a', 'abcde1234'
'abcde1234-0005-001', 'us-east-1a', 'abcde1234'
'abcde1234-0005-002', 'us-east-1b', 'abcde1234'
'abcde1234-0006-001', 'us-east-1b', 'abcde1234'
'abcde1234-0006-002', 'us-east-1a', 'abcde1234'
'abcde1234-0007-001', 'us-east-1a', 'abcde1234'
'abcde1234-0007-002', 'us-east-1b', 'abcde1234'
'abcde1234-0008-001', 'us-east-1b', 'abcde1234'
'abcde1234-0008-002', 'us-east-1a', 'abcde1234'
'abcde1234-0009-001', 'us-east-1b', 'abcde1234'
'abcde1234-0009-002', 'us-east-1a', 'abcde1234'
'abcde1234-0010-001', 'us-east-1a', 'abcde1234'
'abcde1234-0010-002', 'us-east-1b', 'abcde1234'
Should Multi-AZ be enabled?


Can anyone trace this program to help me better grasp how recursion works

The program solves the Tower of Hanoi puzzle. The objective of the puzzle is to move an entire stack of disks to another rod, obeying the following simple rules:
Only one disk can be moved at a time.
Each move consists of taking the upper disk from one of the stacks and placing it on top of another stack or on an empty rod.
No larger disk may be placed on top of a smaller disk.
With 3 disks, the puzzle can be solved in 7 moves. The minimal number of moves required to solve a Tower of Hanoi puzzle is 2^n − 1, where n is the number of disks.
#include <stdio.h>

void tower(int n, char start, char end, char help)
{
    if (n == 0)
    {
        return;
    }
    tower(n - 1, start, help, end);
    printf("\nDisk %d has been moved from tower %c to tower %c", n, start, end);
    tower(n - 1, help, end, start);
}

int main()
{
    tower(3, 'A', 'C', 'B');
    return 0;
}
In many environments, running through a debugger just is not available: in particular on embedded systems, or for jobs that run for N days in production before an error happens.
In these scenarios, logging the flow of the program with a simple printf() or a more sophisticated logging function can be one of the only ways to work out what happened.
Similarly, to trace the recursive flow of execution, simply add a print to your function:
void tower(int n, char start, char end, char help)
{
    printf("tower(n=%d, start=%c, end=%c, help=%c)\n", n, start, end, help);
    ...
Giving:
tower(n=3, start=A, end=C, help=B)
tower(n=2, start=A, end=B, help=C)
tower(n=1, start=A, end=C, help=B)
tower(n=0, start=A, end=B, help=C)
Disk 1 has been moved from tower A to tower C
tower(n=0, start=B, end=C, help=A)
Disk 2 has been moved from tower A to tower B
tower(n=1, start=C, end=B, help=A)
tower(n=0, start=C, end=A, help=B)
Disk 1 has been moved from tower C to tower B
tower(n=0, start=A, end=B, help=C)
Disk 3 has been moved from tower A to tower C
tower(n=2, start=B, end=C, help=A)
tower(n=1, start=B, end=A, help=C)
tower(n=0, start=B, end=C, help=A)
Disk 1 has been moved from tower B to tower A
tower(n=0, start=C, end=A, help=B)
Disk 2 has been moved from tower B to tower C
tower(n=1, start=A, end=C, help=B)
tower(n=0, start=A, end=B, help=C)
Disk 1 has been moved from tower A to tower C
tower(n=0, start=B, end=C, help=A)
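If the flat trace above is hard to follow, the same idea can be taken one step further (a sketch, not part of the original program) by passing the recursion depth down and indenting each trace line by it, so the output mirrors the call tree:

#include <stdio.h>

/* Same algorithm as above; the extra depth parameter is used only for
 * indentation in the trace output. */
void tower(int n, char start, char end, char help, int depth)
{
    printf("%*stower(n=%d, start=%c, end=%c, help=%c)\n",
           depth * 2, "", n, start, end, help);
    if (n == 0)
    {
        return;
    }
    tower(n - 1, start, help, end, depth + 1);
    printf("%*sDisk %d has been moved from tower %c to tower %c\n",
           depth * 2, "", n, start, end);
    tower(n - 1, help, end, start, depth + 1);
}

int main()
{
    tower(3, 'A', 'C', 'B', 0);
    return 0;
}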
There are also the handy compiler macros __FILE__, __FUNCTION__ and __LINE__ (and a couple more, depending on your compiler), which can be embedded into log/print statements:
printf( "Something eldritch happened in %s at %s:%d\n", __FUNCTION__, __FILE__, __LINE__ );

Why does this Deque destructor have a memory leak?

I use a doubly linked list to implement a Deque in C++.
Destructor:
Deque::~Deque()
{
    while (this->left_p)
    {
        node *temp = this->left_p;
        this->left_p = this->left_p->next;
        delete temp;
    }
    this->right_p = NULL;
}
When I use valgrind --leak-check=full ./a.out to check for memory leaks, just to test my destructor, I get the following output:
==2636==
==2636== HEAP SUMMARY:
==2636== in use at exit: 72,704 bytes in 1 blocks
==2636== total heap usage: 1,003 allocs, 1,002 frees, 97,760 bytes allocated
==2636==
==2636== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1
==2636== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2636== by 0x4EC3EFF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==2636== by 0x40106B9: call_init.part.0 (dl-init.c:72)
==2636== by 0x40107CA: call_init (dl-init.c:30)
==2636== by 0x40107CA: _dl_init (dl-init.c:120)
==2636== by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==2636==
==2636== LEAK SUMMARY:
==2636== definitely lost: 0 bytes in 0 blocks
==2636== indirectly lost: 0 bytes in 0 blocks
==2636== possibly lost: 0 bytes in 0 blocks
==2636== still reachable: 72,704 bytes in 1 blocks
==2636== suppressed: 0 bytes in 0 blocks
==2636==
==2636== For counts of detected and suppressed errors, rerun with: -v
==2636== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I can't figure out why one out of the 1,003 allocs is still not freed.
Why do I have one memory leak? What is wrong with my destructor?
Test code here:
/* Deque Test Program 6 */
#include <cstring>
#include <iostream>
#include "Deque.h"

using namespace std;

int main(int argc, char * const argv[]) {
    cout << "\n\nDeque Class Test Program 6 - START\n\n";
    // Make a Deque
    Deque * dq1 = new Deque();
    for (int i = 0; i < 1; i++) {
        dq1->push_left(1);
        // dq1->display();
    }
    cout << "Size=" << dq1->size() << endl;
    // The destructor should delete all the nodes.
    delete dq1;
    cout << "\n\nDeque Class Test Program 6 - DONE\n\n";
    return 0;
}
Edit: removed the implementation code.
Essentially, it's not your code's fault; it's valgrind's.
Check this other question, which ran into the same problem:
Valgrind: Memory still reachable with trivial program using <iostream>
Quoting from the post:
First of all: relax, it's probably not a bug, but a feature. Many implementations of the C++ standard libraries use their own memory pool allocators. Memory for quite a number of destructed objects is not immediately freed and given back to the OS, but kept in the pool(s) for later re-use. The fact that the pools are not freed at the exit of the program cause Valgrind to report this memory as still reachable. The behaviour not to free pools at the exit could be called a bug of the library though.
Hope that helps :)
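If you want to convince yourself that those 72,704 bytes have nothing to do with your Deque, a trivial program (a sketch; the exact numbers depend on your libstdc++ build) typically shows the same single "still reachable" block under valgrind:

// still_reachable.cpp - makes no allocations of its own, yet valgrind
// usually reports one "still reachable" block left over from libstdc++ start-up.
#include <iostream>

int main() {
    std::cout << "hello" << std::endl;
    return 0;
}

Run it with valgrind --leak-check=full exactly as above and compare the HEAP SUMMARY.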
The memory leak reported by valgrind does not appear to be in your code:
==2636== 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1
==2636== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2636== by 0x4EC3EFF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21)
==2636== by 0x40106B9: call_init.part.0 (dl-init.c:72)
==2636== by 0x40107CA: call_init (dl-init.c:30)
==2636== by 0x40107CA: _dl_init (dl-init.c:120)
This appears to be a heap allocation from within a constructor of a global object. (In theory, it could still come from your code if operator new is called as a tail call, so that it does not show up in the backtrace, but I don't see such an object declaration in your code.)
It is also not an actual leak; it is just some data allocated on the heap at program start. If you install debugging information for libstdc++, you might get a hint of what is actually being allocated. You could also set a breakpoint on call_init and step through the early process initialization, to see the constructors that are called.

Copy a file in a safe and efficient way in a thread

I'm trying to copy the contents of tempFile to CacheFile in a thread, in a safe and efficient way.
What happens is that myscript takes 1-2 minutes to write to a file (i.e. tempFile). After it completes successfully, I need to copy that file to CacheFile, so that whoever calls the function (getStudentDetails) is served from CacheFile rather than tempFile, avoiding the delay (though the first call will always be delayed by 1-2 minutes, since CacheFile will be empty).
Note: I NEED TO USE A FILE, NOT PRIMARY STORAGE.
Below is my approach. It works fine, but can it be more efficient and safe?
bool unlock = true;
pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t condition_var = PTHREAD_COND_INITIALIZER;

void *runScriptAndUpdateCacheFile(void *arg)
{
    std::cout << "Inside thread lock" << endl;
    unlock = false; // to avoid multiple calls to this thread until it completes
    char tempFile[32] = {0};
    strncpy(tempFile, "/tmp/smbXXXXXX", 14); // tempFile is the main file where my script will dump its output
    if (mkstemp(tempFile) < 1)
    {
        cout << "fail to create temp file" << endl;
        return (NULL);
    }
    if (tempFile != NULL)
    {
        char command[256] = {0};
        sprintf(command, " myscript > %s", tempFile);
        int status = system(command); // write output of script to tempFile
        if (status < 0)
        {
            std::cout << "Error: " << strerror(errno) << '\n';
            return (NULL);
        }
        else
        {
            if (WIFEXITED(status)) // fetch exit code
            {
                std::cout << "Program returned normally, exit code " << WEXITSTATUS(status) << '\n';
                std::ifstream src(tempFile, std::ios::binary);
                std::ofstream CacheFile((char *)arg, std::ios::binary);
                CacheFile << src.rdbuf(); // Copy to CacheFile
            }
            else
            {
                std::cout << "Program exited abnormally\n";
                return (NULL);
            }
        }
    }
    cout << " thread is unlocked" << endl;
    unlock = true;
    return (NULL);
}
getStudentDetails can be called multiple times
void getStudentDetails()
{
    pthread_t threads;
    char CacheFile[32] = {0};
    strncpy(CacheFile, "/tmp/CacheFile", 19);
    // Serve from CacheFile rather than the original file, because that file
    // takes 1-2 min to be written
    std::ifstream fin(CacheFile);
    if (unlock) // to avoid multiple calls to the thread when multiple calls are made to getStudentDetails;
                // unlock is set to false when the thread starts and back to true when it exits
    {
        pthread_mutex_lock(&count_mutex);
        int rc = pthread_create(&threads, NULL, runScriptAndUpdateCacheFile, CacheFile); // pass the cache file to the thread;
                                                                                         // it gets updated from the original file created by the script
        pthread_mutex_unlock(&count_mutex);
    }
    std::string line;
    while (getline(fin, line))
    {
        // reading CacheFile and serving it when getStudentDetails is called by a client
    }
}
Valgrind complains with the error below:
==3535== HEAP SUMMARY:
==3535== in use at exit: 864 bytes in 3 blocks
==3535== total heap usage: 169 allocs, 166 frees, 78,323 bytes allocated
==3535==
==3535== Searching for pointers to 3 not-freed blocks
==3535== Checked 25,373,616 byte
==3535==
==3535== Thread 1:
==3535== 864 bytes in 3 blocks are possibly lost in loss record 1 of 1
==3535== at 0x4C2CC70: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==3535== by 0x4012E54: allocate_dtv (dl-tls.c:296)
==3535== by 0x4012E54: _dl_allocate_tls (dl-tls.c:460)
==3535== by 0x5359DA0: allocate_stack (allocatestack.c:589)
==3535== by 0x5359DA0: pthread_create@@GLIBC_2.2.5 (pthread_create.c:500)
==3535==
==3535== LEAK SUMMARY:
==3535== definitely lost: 0 bytes in 0 blocks
==3535== indirectly lost: 0 bytes in 0 blocks
==3535== possibly lost: 864 bytes in 3 blocks
==3535== still reachable: 0 bytes in 0 blocks
==3535== suppressed: 0 bytes in 0 blocks
==3535==
==3535== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)
==3535==
==3535== 2 errors in context 1 of 2:
==3535== Thread 2:
==3535== Syscall param open(filename) points to unaddressable byte(s)
==3535== at 0x565B4CD: ??? (syscall-template.S:81)
==3535== by 0x55E9E07: _IO_file_open (fileops.c:228)
==3535== by 0x55E9E07: _IO_file_fopen@@GLIBC_2.2.5 (fileops.c:333)
==3535== by 0x55DE2E3: __fopen_internal (iofopen.c:90)
==3535== by 0x4EB29BF: std::__basic_file<char>::open(char const*, std::_Ios_Openmode, int) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
==3535== by 0x4EEAEA9: std::basic_filebuf<char, std::char_traits<char> >::open(char const*, std::_Ios_Openmode) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
==3535== by 0x4EEC747: std::basic_ofstream<char, std::char_traits<char> >::basic_ofstream(char const*, std::_Ios_Openmode) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
==3535== by 0x40200B: runScriptAndUpdateCacheFile(void*) (in ***)
==3535== by 0x5359181: start_thread (pthread_create.c:312)
==3535== by 0x566A30C: clone (clone.S:111)
==3535== Address 0xffefffba0 is on thread 1's stack
==3535== 384 bytes below stack pointer
==3535==
==3535== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)
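One thing the second error points at (my own reading of the report, not from the original post): the filename passed to open() lives on thread 1's stack, below the stack pointer. CacheFile is a local array in getStudentDetails, so if that function returns before the worker thread opens the file, the thread reads a dead stack buffer. A minimal sketch of handing the thread a copy of the path that outlives the caller (startCacheUpdate and pathCopy are hypothetical names, not from the question):

#include <cstdlib>
#include <cstring>
#include <pthread.h>

void *runScriptAndUpdateCacheFile(void *arg); // thread function as in the question

void startCacheUpdate(const char *cachePath)
{
    // Give the thread its own heap copy of the path; the thread should
    // free(arg) once it is done with it (or use a string with static
    // storage duration instead of the heap).
    char *pathCopy = strdup(cachePath);
    pthread_t tid;
    if (pthread_create(&tid, NULL, runScriptAndUpdateCacheFile, pathCopy) != 0)
    {
        free(pathCopy);
        return;
    }
    pthread_detach(tid); // or keep tid around and pthread_join() it later
}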

OpenCV Segmentation Fault during Feature Matching

This function detects the query.keypoints and computes their descriptors:
// extract features
analyzer->analyze(query);
like this:
bool Analyzer::analyze(Query &query) {
    // detect keypoints
    query.keypoints.clear();
    assert(query.keypoints.empty());
    detector->detect(query.grayImage, query.keypoints);

    // no keypoints detected!
    if (query.keypoints.empty()) {
        cout << "no keypoints detected!" << endl;
        return false;
    }

    // compute descriptors
    query.descriptors.release();
    assert(query.descriptors.empty());
    extractor->compute(query.grayImage, query.keypoints, query.descriptors);

    // note: keypoints for which a descriptor cannot be computed are removed
    if (query.keypoints.empty()) {
        cout << "cannot compute keypoints!" << endl;
        return false;
    }
    return true;
}
After this I match the query.keypoints against the pattern/training keypoints:
// if analyze() is ok, match descriptors
analyzer->match(query);
Like this:
void Analyzer::match(Query &query) {
    assert(!query.descriptors.empty());
    // query.matches.clear();
    matcher->match(query.descriptors, query.matches);
}
Now I want to analyze the matching set:
double max_dist = 200; double min_dist = 100;

//-- Quick calculation of max and min distances between keypoints
for (int i = 0; i < pattern->descriptors.rows; i++) {
    double dist = query.matches[i].distance;
    if (dist < min_dist) min_dist = dist;
    if (dist > max_dist) max_dist = dist;
}

//-- Localize the object
std::vector<cv::Point2f> obj;
std::vector<cv::Point2f> scene;

for (int i = 0; i < pattern->descriptors.rows; i++) {
    if (query.matches[i].distance <= max(2 * min_dist, 0.02)) {
        // get keypoints from good matches
        // SEGMENTATION FAULT HERE!
        // query.matches[i].queryIdx seems to be negative? possible?
        cout << query.keypoints[query.matches[i].queryIdx].pt << endl;
        // BASICALLY THIS FAILS
        // obj.push_back( pattern->keypoints[ query.matches[i].queryIdx ].pt );
        // scene.push_back( query.keypoints[ query.matches[i].trainIdx ].pt );
    }
}
But each time I try, I get a segmentation fault in the code above on this line:
// SEGMENTATION FAULT HERE!
// query.matches[i].queryIdx seems to be negative? possible?
cout << query.keypoints[query.matches[i].queryIdx].pt <<endl;
I think that there are missing keypoints, so I am not able to retrieve query.keypoints by queryID.
cv::DMatch match = query.matches[i];
cout << match.queryIdx << endl;
cout << query.matches.size() << endl;
int queryID = match.queryIdx;
cv::KeyPoint test = query.keypoints[queryID]; // FAILS
cout << test.pt << endl;
What am I doing wrong? I am stuck! Please enlighten me. ;)
--- EDIT ---
Here is the Valgrind Memcheck output:
[Result] Features: 420; Matches: 181; Time(ms): 757.842
Creating Query instance..
-- Max dist : 224,000000
-- Min dist : 0,000000
==11770== Invalid read of size 4
==11770== at 0x41759C: cv::Point_<float>::Point_(cv::Point_<float> const&) (operations.hpp:1623)
==11770== by 0x41A166: cv::KeyPoint::KeyPoint(cv::KeyPoint const&) (features2d.hpp:69)
==11770== by 0x41AFA2: Controller::detectObject(om::Query&) (Controller.cpp:169)
==11770== by 0x41B9B7: Controller::displayFunction(cv::Mat&, cv::Mat&) (Controller.cpp:261)
==11770== by 0x414909: processVideo() (App.cpp:109)
==11770== by 0x414C4F: main (App.cpp:144)
==11770== Address 0x2c9d9620 is not stack'd, malloc'd or (recently) free'd
==11770==
==11770==
==11770== Process terminating with default action of signal 11 (SIGSEGV)
==11770== Access not within mapped region at address 0x2C9D9620
==11770== at 0x41759C: cv::Point_<float>::Point_(cv::Point_<float> const&) (operations.hpp:1623)
==11770== by 0x41A166: cv::KeyPoint::KeyPoint(cv::KeyPoint const&) (features2d.hpp:69)
==11770== by 0x41AFA2: Controller::detectObject(om::Query&) (Controller.cpp:169)
==11770== by 0x41B9B7: Controller::displayFunction(cv::Mat&, cv::Mat&) (Controller.cpp:261)
==11770== by 0x414909: processVideo() (App.cpp:109)
==11770== by 0x414C4F: main (App.cpp:144)
==11770== If you believe this happened as a result of a stack
==11770== overflow in your program's main thread (unlikely but
==11770== possible), you can try to increase the size of the
==11770== main thread stack using the --main-stacksize= flag.
==11770== The main thread stack size used in this run was 8388608.
==11770==
==11770== HEAP SUMMARY:
==11770== in use at exit: 37,488,211 bytes in 27,473 blocks
==11770== total heap usage: 121,456 allocs, 93,983 frees, 60,575,106 bytes allocated
==11770==
==11770== LEAK SUMMARY:
==11770== definitely lost: 16,496 bytes in 35 blocks
==11770== indirectly lost: 2,140,260 bytes in 623 blocks
==11770== possibly lost: 22,477,384 bytes in 1,566 blocks
==11770== still reachable: 12,635,159 bytes in 24,368 blocks
==11770== suppressed: 0 bytes in 0 blocks
==11770== Rerun with --leak-check=full to see details of leaked memory
==11770==
==11770== For counts of detected and suppressed errors, rerun with: -v
==11770== Use --track-origins=yes to see where uninitialised values come from
==11770== ERROR SUMMARY: 1502 errors from 47 contexts (suppressed: 0 from 0)
Killed
With --leak-check=full:
==11938== Process terminating with default action of signal 11 (SIGSEGV)
==11938== Access not within mapped region at address 0x2C774CA4
==11938== at 0x41759C: cv::Point_<float>::Point_(cv::Point_<float> const&) (operations.hpp:1623)
==11938== by 0x41A166: cv::KeyPoint::KeyPoint(cv::KeyPoint const&) (features2d.hpp:69)
==11938== by 0x41AFA2: Controller::detectObject(om::Query&) (Controller.cpp:169)
==11938== by 0x41B9B7: Controller::displayFunction(cv::Mat&, cv::Mat&) (Controller.cpp:261)
==11938== by 0x414909: processVideo() (App.cpp:109)
==11938== by 0x414C4F: main (App.cpp:144)
==11938== If you believe this happened as a result of a stack
==11938== overflow in your program's main thread (unlikely but
==11938== possible), you can try to increase the size of the
==11938== main thread stack using the --main-stacksize= flag.
==11938== The main thread stack size used in this run was 8388608.
==11938==
==11938== HEAP SUMMARY:
==11938== in use at exit: 35,574,636 bytes in 27,138 blocks
==11938== total heap usage: 97,784 allocs, 70,646 frees, 58,484,719 bytes allocated
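One detail worth checking in the loops above (my own observation, not a confirmed diagnosis): both run up to pattern->descriptors.rows, but matcher->match(query.descriptors, query.matches) produces one DMatch per query descriptor. If the pattern has more descriptors than the query, the loops read past the end of query.matches and pick up garbage indices, which would be consistent with the seemingly negative queryIdx. A hedged sketch of a bounds-checked version of the second loop (same member names as in the question; it assumes obj should hold pattern/train points and scene the query points):

for (size_t i = 0; i < query.matches.size(); i++) {
    const cv::DMatch &m = query.matches[i];
    if (m.distance > max(2 * min_dist, 0.02)) continue;
    // queryIdx indexes query.keypoints; trainIdx indexes pattern->keypoints
    if (m.queryIdx < 0 || m.queryIdx >= (int)query.keypoints.size()) continue;
    if (m.trainIdx < 0 || m.trainIdx >= (int)pattern->keypoints.size()) continue;
    scene.push_back(query.keypoints[m.queryIdx].pt);
    obj.push_back(pattern->keypoints[m.trainIdx].pt);
}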

Memory allocation failed even when there is still enough memory

I am working on Linux (Ubuntu 13.04, to be exact) and currently I have a question: why does memory allocation fail even when there is still enough memory?
I wrote a simple test application today and I encountered this issue when running it. Below is the code snippet I used for the test:
#include <stdio.h>
#include <unistd.h>
#include <list>
#include <vector>
#include <strings.h>

using namespace std;

unsigned short calcrc(unsigned char *ptr, int count)
{
    unsigned short crc;
    unsigned char i;
    // high cpu-consumption code
    // implements CRC algorithm: Cyclic
    // Redundancy Code
}

void* CreateChild(void* param){
    vector<unsigned char*> MemoryVector;
    pid_t PID = fork();
    if (PID == 0){
        const int MEMORY_TO_ALLOC = 1024 * 1024;
        unsigned char* buffer = NULL;
        while(1){
            buffer = NULL;
            try{
                buffer = new unsigned char [MEMORY_TO_ALLOC]();
                calcrc(buffer, MEMORY_TO_ALLOC);
                MemoryVector.push_back(buffer);
            } catch(...){
                printf("an exception was thrown!\n");
                continue;
            } //try ... catch
        } //while
    } // if pid == 0
    return NULL;
}

int main(){
    int children = 4;
    while(--children >= 0){
        CreateChild(NULL);
    };
    while(1) sleep(3600);
    return 0;
}
During my test, the above code starts throwing exceptions when there is around 220 MB of RAM available. And from that moment on, it looks like the application is not able to get any more memory,
because the free memory shown by the top command remains above 210 MB. So why would this happen?
UPDATE
1. Software && Hardware Information
The RAM is 4 GB and swap is around 9 GB. Running "uname -a" gives: Linux steve-ThinkPad-T410 3.8.0-30-generic #44-Ubuntu SMP Thu Aug 22 20:54:42 UTC 2013 i686 i686 i686 GNU/Linux
2. Statistics During the Test
Right after Test App Starts Throwing Exception
steve@steve-ThinkPad-T410:~$ free
total used free shared buffers cached
Mem: 3989340 3763292 226048 0 2548 79728
-/+ buffers/cache: 3681016 308324
Swap: 9760764 9432896 327868
10 minutes after Test App Starts Throwing Exception
steve@steve-ThinkPad-T410:~$ free
total used free shared buffers cached
Mem: 3989340 3770808 218532 0 3420 80632
-/+ buffers/cache: 3686756 302584
Swap: 9760764 9436168 324596
20 minutes after Test App Starts Throwing Exception
steve@steve-ThinkPad-T410:~$ free
total used free shared buffers cached
Mem: 3989340 3770960 218380 0 4376 104716
-/+ buffers/cache: 3661868 327472
Swap: 9760764 9535700 225064
40 minutes after Test App Starts Throwing Exception
steve@steve-ThinkPad-T410:~$ free
total used free shared buffers cached
Mem: 3989340 3739168 250172 0 2272 139108
-/+ buffers/cache: 3597788 391552
Swap: 9760764 9556292 204472
Maybe you have no contiguous 1 MB region left in your address space; you have free-space fragmentation.
During my test, the above code starts throwing exception when there is around 220M memory. And from the moment on, it looks like the application is not able to get more memory any more because the free memory shown by TOP command remains to be above 210M. So why would this happen?
The output of top is updated every N seconds (configurable), and doesn't really show the current status.
Memory allocation, on the other hand, is very fast.
What happens is that your program eats memory, and at a certain point (when top shows about 200 MB free) it starts failing.
You are running on x86-32, so your processes are 32-bit. Even with more memory + swap, they will be limited by their address space. Run:
grep "HIGHMEM" /boot/config-`uname -r`
grep "VMSPLIT" /boot/config-`uname -r`
to see how your kernel is configured.
Maybe your 4 child processes are each limited to 3 GB and are using 12 GB, plus ~700 MB for other processes, to reach the numbers you are seeing.
EDIT:
So your kernel is configured to give each user-space process 3 GB of address space, some of which will be taken up by the program, libraries and initial runtime memory (which will be shared due to the fork).
Therefore you have 4 children using ~3 GB each, i.e. ~12 GB, plus ~780 MB of other programs. That leaves about 220 MB free once the children start reporting errors.
You could just run another child process, or you could reinstall with the AMD64/x86-64 version of Ubuntu, where each process will be able to allocate much more memory.
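To see that ceiling directly, a single 32-bit process can count how many 1 MB blocks it gets before new starts throwing (a minimal sketch of my own, not from the answer; the exact figure depends on VMSPLIT and on what is already mapped):

#include <cstdio>
#include <new>
#include <vector>

int main() {
    const std::size_t CHUNK = 1024 * 1024; // 1 MB per allocation
    std::vector<unsigned char*> blocks;
    try {
        for (;;) {
            // Zero-initialize so the pages are actually written, much like calcrc() touching the buffer.
            blocks.push_back(new unsigned char[CHUNK]());
        }
    } catch (const std::bad_alloc&) {
        // On a 32-bit build this typically stops somewhere below 3 GB,
        // no matter how much free RAM + swap the machine still has.
        std::printf("allocated %zu MB before bad_alloc\n", blocks.size());
    }
    for (unsigned char* p : blocks) delete[] p;
    return 0;
}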