I am trying to debug a problem with Sidekiq processing jobs slowly, while on the development machine the jobs start immediately and run fast.
My setup saves the original image to S3; the worker pulls it down, processes the styles, and saves them back to S3. The jobs eventually finish, but when one job ends the next does not start immediately, which makes processing a large batch of images really slow.
This problem happens no matter how many Sidekiq workers I start (tested with 1 to 20).
I run ImageMagick with -limit memory 64 -limit map 128 because of Heroku's memory-limited dynos.
The latest Heroku cedar-14 stack has an ImageMagick version that supports multi-threaded processing.
Is there any special configuration I need to take into consideration when dealing with Sidekiq + Heroku + Paperclip/ImageMagick?
Heroku's 1X dynos are quite slow; they don't have the luxury of an SSD or several cores, unlike your laptop. Use -c 3 (Sidekiq's concurrency option) and expect to wait, or upgrade to a PX dyno.
Related
We have an app that works perfectly fine in production but runs very slowly on the dev machines.
Django==2.2.4
I'm using Ubuntu 20.04 but other devs are using macOS and even Windows.
Our production server is very small compared to the dev laptops (the app runs very slowly in every dev environment; we are 5 developers).
The app makes several requests, since it's a single-page application that uses Django REST Framework with React.js on the front end.
We have tried different databases locally (currently PostgreSQL, also tried MySQL and SQLite3), with Docker and without, but it does not change the performance.
Each individual request takes a few seconds to execute, but when they all run together the whole thing gets very slow. The more requests are executed, the more the performance drops.
It takes the app between 2 and 3 minutes to load in the dev environment, while in any production or staging environment it takes a little over 10 seconds.
We also tried disabling DEBUG in the back end and the front end; nothing changes.
My opinion is that one of the causes is that the dev server is single-threaded and does not process a request until the previous one has finished.
This makes the dev environment very hard to work with.
I've seen alternatives (plugins) to make the dev server multi-threaded, but those solutions do not work with the latest versions of Django.
What alternatives could we try to improve this?
Looks like posting this question helped me think of an alternative: using gunicorn in the dev environment really helps.
Installed it with
pip install gunicorn
And then execute it using this:
venv/bin/gunicorn be-app.wsgi --access-logfile - --workers 2 --bind localhost:8000
Of course, if I want to access the static and media files I'll have to set up a local nginx, but it's not a big deal.
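If setting up a local nginx feels like overkill just for development, a possible shortcut (a minimal sketch using standard Django helpers, assuming DEBUG=True in the dev settings) is to let Django itself serve the static and media files from urls.py:

# urls.py (dev only): these helpers return URL patterns only when DEBUG=True
from django.conf import settings
from django.conf.urls.static import static
from django.contrib.staticfiles.urls import staticfiles_urlpatterns

urlpatterns = [
    # ... existing API / app routes ...
]

if settings.DEBUG:
    urlpatterns += staticfiles_urlpatterns()                                      # serves /static/
    urlpatterns += static(settings.MEDIA_URL, document_root=settings.MEDIA_ROOT)  # serves /media/

This only works with DEBUG enabled, so for anything production-like nginx (or similar) is still the right tool.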
Project Details
I am running the open-source code of a GAN-based research paper named "Investigating Generative Adversarial Networks based Speech Dereverberation for Robust Speech Recognition".
source code: here
The dependencies include:
Python 2.7
TensorFlow 1.4.0
I pulled a Docker image of TensorFlow 1.4.0 with Python 2.7 onto my GPU virtual machine (which I connect to over SSH) with this command:
docker pull tensorflow/tensorflow:1.4.0-gpu
I am running
bash rsrgan/run_gan_rnn_placeholder.sh
according to the README of the source code.
Issue Details
Everything works: the model trains and the loss decreases. The only issue is that after some iterations the terminal shows no further output; the GPU still shows the PID but no memory is freed, and sometimes GPU utilization drops to 0%. The same thing happens whether I train on the VM's GPU or its CPU.
It is not a memory issue, because the model uses 5,400 MB of GPU memory out of 11,000 MB, and the machine also has plenty of CPU RAM.
When I ran 21 iterations on my local computer (1st-gen i5, 4 GB RAM), each iteration took 0.09 hours and all of them completed. But whenever I run it over SSH inside Docker, the issue happens again and again, on both GPU and CPU.
Just keep in mind that the issue happens inside Docker on a machine I connect to over SSH, and the SSH connection itself does not drop very often.
Exact numbers
If an iteration takes 1.5 hours, the issue happens after two to three iterations; if a single iteration takes 0.06 hours, the issue happens after exactly 14 of 25 iterations.
Perform operations inside the Docker container
The first thing you can try is to build the Docker image and then get a shell inside the container by passing the -ti flags and/or /bin/bash as the command in your docker run invocation.
Clone the repository inside the container, and while building the image also copy your training data from the local machine into the image. Run the training there and commit the changes so that you don't have to repeat these steps in future runs; once you exit the container, all uncommitted changes are lost.
You can find the reference for docker commit here.
$ docker commit <container-id> <image-name:tag>
While training is going on, check the GPU and CPU utilization of the VM and see whether everything is working as expected.
Use an Anaconda environment on your VM
Anaconda is a great package manager. You can install Anaconda, create a virtual environment, and run your code inside that environment.
$ wget <url_of_anaconda.sh>
$ bash <path_to_sh>
$ source anaconda3/bin/activate   # or: source anaconda2/bin/activate
$ conda create -n <env_name> python=2.7
$ conda activate <env_name>
Install all the dependencies via conda (recommended) or pip.
Run your code.
Q1: GAN training with TensorFlow 1.4 inside Docker stops without any message
Although Docker provides OS-level virtualization, some processes that run with ease on the host can run into issues inside a container. To debug this, go inside the container and perform the steps above.
Q2: Training stops without releasing memory when connected to the VM over SSH
Yes, this is an issue I have also faced. The best way to release the memory is to stop the Docker container. You can find more resource-allocation options here.
Also, earlier versions of TensorFlow had issues with allocating and clearing memory properly. You can find some references here and here. These issues have been fixed in recent versions of TensorFlow.
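If you want to rule out the TensorFlow allocator itself, a minimal sketch for TF 1.x (assuming the training script builds its own tf.Session) is to enable incremental GPU memory growth instead of letting the process reserve the whole card up front:

import tensorflow as tf

# TF 1.x (e.g. 1.4): allocate GPU memory on demand instead of all at once
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# config.gpu_options.per_process_gpu_memory_fraction = 0.5  # optional hard cap

with tf.Session(config=config) as sess:
    pass  # build and run the training graph here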
Additionally, check for Nvidia bug reports
Step 1: Install nvidia-utils via the following command. You can find the driver version in the nvidia-smi output (also mentioned in the question).
$ sudo apt install nvidia-utils-<driver-version>
Step 2: Run the nvidia-bug-report.sh script
$ sudo /usr/bin/nvidia-bug-report.sh
A log file named nvidia-bug-report.log.gz will be generated in your current working directory. You can also find the installer log at /var/log/nvidia-installer.log.
You can find additional information about Nvidia logs at these links:
Nvidia Bug Report Reference 1
Nvidia Bug Report Reference 2
Log GPU load
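If you want a record of what the GPU was doing when training stalls, here is a small sketch (a hypothetical helper script, assuming nvidia-smi is on the PATH) that samples utilization once a minute:

# log_gpu.py: append one GPU sample per minute to gpu_load.log
import subprocess
import time

with open("gpu_load.log", "a") as log:
    while True:
        sample = subprocess.check_output([
            "nvidia-smi",
            "--query-gpu=timestamp,utilization.gpu,memory.used",
            "--format=csv,noheader",
        ]).decode()
        log.write(sample)
        log.flush()
        time.sleep(60)

Run it in the background (e.g. under nohup or in a tmux session) alongside the training job, then check the log around the time the terminal goes silent.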
Hope this helps.
I'm running my Django project with Apache + mod_wsgi in daemon mode. When I need the server to pick up changes in the source code, I touch the wsgi.py file, but I have an issue with this approach.
Some tasks triggered from the front end take 10 minutes to complete. If I touch the wsgi file while one of these long tasks is running, it gets killed by the restart.
Is there any way to make the server reload the code while keeping previously started, unfinished tasks running until they are done?
Thanks!
Don't run long-running tasks in web processes. Use an offline task manager like Celery.
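For example, a minimal Celery sketch (hypothetical names, assuming a Redis broker) that moves the 10-minute work out of the web process:

# tasks.py: the long-running work lives in a Celery worker, not in mod_wsgi
from celery import Celery

app = Celery("myproject", broker="redis://localhost:6379/0")

@app.task
def long_running_job(item_id):
    # ... the work that currently takes ~10 minutes ...
    pass

The view then only enqueues the job with long_running_job.delay(item_id) and returns immediately, so touching wsgi.py to reload the web code no longer kills the task; the Celery worker process is restarted on its own schedule.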
This is a silly question, but I couldn't find an answer. I'm running a Rails app on JRuby, and I use Sidekiq to process background jobs. Do I really have to run Sidekiq in another instance of the JVM (is that what happens when running bundle exec sidekiq)?
JRuby consumes a lot of RAM, so this is not possible on my AWS t2.micro instance.
Due to the high memory consumption, your AWS micro instance will eventually choke. There are a few ways to keep both the Ruby app and the Sidekiq background process running.
You can boot another EC2 micro instance just for Sidekiq. Your app instance and the Sidekiq server will share the same Redis instance, so your background processing won't be disrupted.
Another way is to boot up a free Heroku dyno.
Or you can move to CRuby (MRI).
Hope this helps.
I have a Django app running on Heroku with Gunicorn, on a 1X dyno, and I just found out that you can set WEB_CONCURRENCY.
What's the optimal WEB_CONCURRENCY?
The article Deploying Python Applications with Gunicorn describes Gunicorn's various parameters and their effect on Heroku.
Below is the text from that article regarding WEB_CONCURRENCY:
Gunicorn forks multiple system processes within each dyno to allow a Python app to support multiple concurrent requests without requiring them to be thread-safe. In Gunicorn terminology, these are referred to as worker processes (not to be confused with Heroku worker processes, which run in their own dynos).
Each forked system process consumes additional memory. This limits how many processes you can run in a single dyno. With a typical Django application memory footprint, you can expect to run 2–4 Gunicorn worker processes on a 1X dyno. Your application may allow for a variation of this, depending on your application’s specific memory requirements.
We recommend setting a configuration variable for this setting, so you can tweak it without editing code and redeploying your application.
$ heroku config:set WEB_CONCURRENCY=3
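If you prefer the setting to be explicit in code rather than relying on defaults, here is a minimal sketch (hypothetical gunicorn.conf.py) that reads the config var with a fallback:

# gunicorn.conf.py: pick up WEB_CONCURRENCY from the environment, default to 3
import os

workers = int(os.environ.get("WEB_CONCURRENCY", 3))

Then start Gunicorn with gunicorn -c gunicorn.conf.py yourapp.wsgi (yourapp is a placeholder for your project), so the worker count follows whatever you set with heroku config:set.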