Python program unexpectidely dies - python-2.7

I am training an LSTM network using lasagne, but my program dies without any error during the training. If I use a much smaller dataset ( only 13 data points) the entire code runs.
I was initially using my GPU and thought that the memory was running out. But even though I used CPU It dies.
import resource
rsrc = resource.RLIMIT_DATA
resource.setrlimit(rsrc,(1024, resource.RLIM_INFINITY ))
My final attempt was to try this but to no avail.

Related

pytorch dataloader stucked if using opencv resize method

I can run all the cells of the tutorial notebook of Pytorch about dataloading (pytorch tutorial).
But when I use OpenCV in place of Skimage to resize the image, the dataloader gets stuck, i.e nothing happens.
In the Rescale class:
class Rescale(object):
.....
def __call__(self, sample):
....
#img = transform.resize(image, (new_h, new_w))
img = cv2.resize(image, (new_h, new_w))
.....
The dataloader and the for loop are defined with:
dataloader = DataLoader(transformed_dataset, batch_size=4,
shuffle=True, num_workers=4)
for i_batch, sample_batched in enumerate(dataloader):
print(i_batch, sample_batched['image'].size(),
sample_batched['landmarks'].size())
I can get the iterator to print something if num_workers=0. It looks like opencv does not play well with the Multiprocessing of pytorch.
I would really prefer to use same package to transform the images at train time and test time (and I am already using OpenCV for the image rescale at test time).
Any suggestions would be greatly appreciated.
I had a very similar problem and that's how I solved it:
when you import cv2 set cv2.setNumThreads(0)
and then you can set num_workers>0 in the dataloader in PyTorch.
Seems like OpenCV tries to multithread and somewhere something goes into a deadlock.
Hope it helps.
Except for cv2.setNumThreads(0),
import multiprocessing
multiprocessing.set_start_method('spawn')
can also solve this problem.
From Pytorch issues these five may help (not recommended):
1. time.sleep(0.003)
2. pin_memory = True/False
3. num_workers = 0/1
4. from torch.utils.data.dataloader import DataLoader
5. writing 8192 to /proc/sys/kernel/shmmni
OpenCV and Pytorch multiprocessing don't play well together, sometimes. When running a code with OpenCV functions calls embedded in a homebrewed function parallelized in a "multiprocessing" pool, the code eventually ends up with idle processors after several calls to the pool than can fluctuate from run to run.
Forking could be the problem, forking only clones the current thread. In this case, the thread pool may wrongly assume that it has more threads than it has.
The code is waiting infinitely for a condition never signaled because the number of threads is not as expected after forking.
Spawning resets the memory after forking, forcing a re-initialization of all data structures (as the thread pool) within OpenCV.
More discussion in Github of OpenCV issues and Pytorch issues.

fine tuning vgg raise memory error

Hi i'm trying to fine tuning vgg on my problem but when i try to train the net i get this error.
OOM when allocating tensor with shape[25088,4096]
The net has this structure:
I take this tensorflow pretrained vgg implementation code from this site.
I only add this procedure to train the net:
with tf.name_scope('joint_loss'):
joint_loss = ya_loss+yb_loss+yc_loss+yd_loss+ye_loss+yf_loss+yg_loss+yh_loss+yi_loss+yl_loss+ym_loss+yn_loss
# Loss with weight decay
l2_loss = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
self.joint_loss = joint_loss + self.weights_decay * l2_loss
self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(joint_loss)
i try to reduce the batch size to 2 but not works i get the same error. The error is due to the big tensor that cannot be allocated in memory. I get this error only in train cause if i feed a value without minimize the net works. How i can avoid this error? how can i save memory of graphic card(Nvidia GeForce GTX 970)?
UPDATE: if i use the GradientDescentOptimizer the training process start, instead if i use AdamOptimizer i get the memory error, seems that the GradientDescentOptimizer use less memory.
Without a backward pass ("feed a value without minimizing"), TensorFlow can immediately de-allocate intermediate activations. With a backward pass, the graph has a giant U-shape, where activations from the forward pass need to be kept in memory for the backward pass. There are some tricks (such as swapping to host memory), but in general backprop means that memory usage will be higher.
Adam does keep some extra bookkeeping variables around, so it will increase memory usage proportional to the amount of memory your weight variables are already using. If your training steps take quite a while (in which case having the variable updates on the GPU isn't important), you could instead locate the optimization ops in host memory.
If you need a larger batch size and can't reduce image resolution or model size, combining gradients from multiple workers/GPUs using something like SyncReplicasOptimizer can be a good option. Looking at the paper associated with this model, it looks like they were training on 4 GPUs each with 12GB of memory.

Python 2.7 : How to track declining RAM?

Data is updated every 5 min. Every 5 min a python script I wrote is run. This data is related to signals, and when the data says a signal is True, then the signal name is shown in a PyQt Gui that I have.
In other words, the Gui is always on my screen, and every 5 min its "main" function is triggered and the "main" function's job is to check the database of signals against the newly downloaded data. I leave this GUI open for hours and days at a time and the computer always crashes. Random python modules get corrupted (pandas can't import this or numpy can't import that) and I have to reinstall python and all the packages.
I have a hypothesis that this is related to the program being open for a long time and using up more and more memory which eventually crashes the computer when the memory runs out.
How would I test this hypothesis? If I can just show that with every 5-min run the available memory decreases, then it would suggest that my hypothesis might be correct.
Here is the code that reruns the "main" function every 5 min:
class Editor(QtGui.QMainWindow):
# my app
def main():
app = QtGui.QApplication(sys.argv)
ex = Editor()
milliseconds_autocheck_frequency = 300000 # number of milliseconds in 5 min
timer = QtCore.QTimer()
timer.timeout.connect(ex.run)
timer.start(milliseconds_autocheck_frequency)
sys.exit(app.exec_())
if __name__ == '__main__':
main()

Text classification process kills when I am using linear SVM for 10000 rows

I am programming in python 2.7 with NLTK library for both text prepossessing and classification in sentiment analysis. I am using nltk wrapper of scikit-learn algorithms. bellow code is after prepossessing and separation to train and test sets.
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
training_set = nltk.classify.util.apply_features(extractFeatures, trainTweets)
testing_set = nltk.classify.util.apply_features(extractFeatures, testTweets)
#LinearSVC
LinearSVC_classifier = SklearnClassifier(LinearSVC())
LinearSVC_classifier.train(training_set)
LinearSVCAccuracy = nltk.classify.accuracy(LinearSVC_classifier, testing_set)*100
print "LinearSVC accuracy percentage:" + str(LinearSVCAccuracy)
it works fine when the number of rows are like 4000 tweets for training, but when it increases to for example 10000 tweets, possess is getting killed with following error.
Memory cgroup out of memory: Kill process 24293 (python) score 848 or
sacrifice child
Killed process 24293, UID 29091, (python) total-vm:14569168kB,
anon-rss:14206656kB, file-rss:3412kB
Clocksource tsc unstable (delta = -17179861691 ns). Enable clocksource
failover by adding clocksource_failover kernel parameter.
RAM of my pc is 8 Gig but I even tried with 16 Gig RAM and still has problem. How can I Classify this amount for tweets without any problem?
Which OS are you running? Which python distribution? Try to install cython and/or using scikit-learn directly. Have a look at scikit-learn optimization techniques

Getting OpenCV Error: Insufficient memory while running OpenCV Sample Program: "stitching_detailed.cpp"

I recently starting working with OpenCV with the intent of stitching large amounts of images together to create massive panoramas. To begin my experimentation, I looked into the sample programs that come with the OpenCV files to get an idea about how to implement the OpenCV libraries. Since I was interested in image stitching, I went straight for the "stitching_detailed.cpp." The code can be found at:
https://code.ros.org/trac/opencv/browser/trunk/opencv/samples/cpp/stitching_detailed.cpp?rev=6856
Now, this program does most of what I need it to do, but I ran into something interesting. I found that for 9 out of 15 of the optional projection warpers, I receive the following error when I try to run the program:
Insufficient memory (Failed to allocate XXXXXXXXXX bytes) in unknown function,
file C:\slave\winInstallerMegaPack\src\opencv\modules\core\src\alloc.cpp,
line 52
where the "X's" mark integer that change between the different types of projection (as though different methods require different amounts of space). The full source code for "alloc.cpp" can be found at the following website:
https://code.ros.org/trac/opencv/browser/trunk/opencv/modules/core/src/alloc.cpp?rev=3060
However, the line of code that emits this error in alloc.cpp is:
static void* OutOfMemoryError(size_t size)
{
--HERE--> CV_Error_(CV_StsNoMem, ("Failed to allocate %lu bytes", (unsigned long)size));
return 0;
}
So, I am simply lost as to the possible reasons that this error may be occurring. I realize that this error would normally occur if the system is out of memory, but I when running this program with my test images I am never using more that ~3.5GB of RAM, according to my Task Manager.
Also, since the program was written as an sample of the OpenCV stitching capabilities BY OpenCV developers I find it hard to believe that there is a drastic memory error present within the source code.
Finally, the program works fine if I use some of the warping methods:
- spherical
- fisheye
- transverseMercator
- compressedPlanePortraitA2B1
- paniniPortraitA2B1
- paniniPortraitA1.5B1)
but as ask the program to use any of the others (through the command line tag
--warp [PROJECTION_NAME]):
- plane
- cylindrical
- stereographic
- compressedPlaneA2B1
- mercator
- compressedPlaneA1.5B1
- compressedPlanePortraitA1.5B1
- paniniA2B1
- paniniA1.5B1
I get the error mentioned above. I get pretty good results from the transverseMercator project warper, but I would like to test the stereographic in particular. Can anyone help me figure this out?
The pictures that I am trying to process are 1360 x 1024 in resolution and my computer has the following stats:
Model: HP Z800 Workstation
Operating System: Windows 7 enterprise 64-bit OPS
Processor: Intel Xeon 2.40GHz (12 cores)
Memory: 14GB RAM
Hard Drive: 1TB Hitachi
Video Card: ATI FirePro V4800
Any help would be greatly appreciated, thanks!
When I run OpenCV's APP traincascade, i get just the same error as you:
Insufficient memory (Failed to allocate XXXXXXXXXX bytes) in unknown function,
file C:\slave\winInstallerMegaPack\src\opencv\modules\core\src\alloc.cpp,
line 52
at the time, only about 70% pecent of my RAM(6G) was occupied. And when runnig trainscascade step by step, I found that the error would be thrown.when it use about more than 1.5G RAM space.
then, I found the are two arguments which can control how many memory should be used:
-precalcValBufSize
-precalcIdxBufSize
so i tried to set these two to 128, it run. I hope my experience can help you.
I thought this problem is nothing about memory leak, it is just relate to how many memory the OS limits a application occupy. I expect someone can check my guess.
I've recently had a similar issue with OpenCV image stitching. I've used create method to create stitcher instance and provided 5 images in vertical order to stitch method, but I've received insufficient memory error.
Panorama was successfully created after setting:
setWaveCorrection(false)
This solution will not be applicable if you need wave correction.
This may be related to the sequence of the stitching, I split a big picture into 3*3, and firstly I stitch them row by row and there is no problem, when I stitch them column by column, there is the problem same as you.