Django on WebFaction & Apache memory load

I am developing a new application and am still in the development stage. However, whenever I restart Apache, my application immediately takes about 140 MB of memory, whereas my other (older and more complex) application takes about 40 MB. This results in WebFaction sending me messages about memory usage. Apache by default starts with 2 processes, resulting in more than 300 MB of memory usage. I changed this to 1 process like this:
MaxSpareThreads 3
MinSpareThreads 1
ServerLimit 1
SetEnvIf X-Forwarded-SSL on HTTPS=1
ThreadsPerChild 5
WSGIDaemonProcess tipleaders processes=1 threads=6 python-path=/home/<<USERNAME>>/webapps/<<WEBSITE>>:/home/<<USERNAME>>/webapps/<<WEBSITE>>/lib/python2.7:/home/<<USERNAME>>/webapps/<<WEBSITE>>/<<WEBSITE>> maximum-requests=10
Memory usage does not increase with every request (so I guess it is not a memory leak problem).
The process just starts with a very high footprint (~150 MB).
Any ideas what I should do?
PS: these are my main imports in views.py http://dpaste.com/744785/
some other imports are here: http://dpaste.com/744786/
models.py, urls.py and settings.py are here: http://dpaste.com/744787/
EDIT
PS2: My whole site is using SSL
EDIT
As requested: the project does not deal with media. No images, no videos. It is a simple website which parses 2 XML files of matches (events and results) and displays them to the user. There are no ads on the site (just one from the affiliate) and no big images whatsoever. At this dev stage the site has only 10 users, and no more than 5000 sporting events have been inserted into the database.
EDIT:
I installed django-devserver (https://github.com/dcramer/django-devserver)
and this is what I get:
>python manage.py runserver
[profile] heap size is 7.9 MB
[profile] heap size is 7.9 MB
[sql] (219ms) 2 queries with 0 duplicates
[profile] Total time to render was 0.14s
[profile] 5.3 MB allocated, 13.0 KB deallocated, heap size is 13.3 MB
[sql] (219ms) 2 queries with 0 duplicates
[profile] Total time to render was 1.08s
[profile] 404.8 KB allocated, 18.6 KB deallocated, heap size is 13.7 MB
[12/May/2012 12:42:38] "GET / HTTP/1.1" 200 146587 (time: 6.93s; sql: 219ms (2q)
I am still puzzled as to why Apache starts with 140 MB allocated for my application.
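One hedged way to track down a high baseline like this is to measure the process RSS before and after each heavy module-level import from views.py; those imports are loaded once per WSGI process and often account for most of the startup footprint. A minimal sketch (lxml.etree is just a hypothetical stand-in for whatever views.py actually pulls in):
import resource

def rss_mb():
    # ru_maxrss is reported in kilobytes on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

print("baseline: %.1f MB" % rss_mb())
import lxml.etree  # hypothetical heavy import; try each views.py import in turn
print("after import: %.1f MB" % rss_mb())
Running this for each import in turn shows which one is responsible for the jump, and whether it could be deferred into the view function that needs it.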

Related

Gunicorn worker, threads for GPU tasks to increase concurrency/parallelism

I'm using Flask with Gunicorn to implement an AI server. The server takes in HTTP requests and calls the algorithm (built with pytorch); the computation runs on an NVIDIA GPU.
I need some input on how I can achieve concurrency/parallelism in this case. The machine has 8 vCPUs and 20 GB of memory, plus 1 GPU with 12 GB of memory.
Each worker occupies 4 GB of memory and 2.2 GB of GPU memory.
The maximum number of workers I can run is 5 (because of GPU memory: 2.2 GB * 5 workers = 11 GB).
1 worker = 1 HTTP request (so max simultaneous requests = 5).
The specific questions are:
How can I increase the concurrency/parallelism?
Do I have to specify the number of threads for computation on the GPU?
Currently my gunicorn command is:
gunicorn --bind 0.0.0.0:8002 main:app --timeout 360 --workers=5 --worker-class=gevent --worker-connections=1000
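One option that might be worth testing (an illustration, not from the original post): each worker process holds its own model and a ~2.2 GB CUDA context, and pytorch calls block the gevent event loop, so adding threads inside fewer workers can raise concurrency without multiplying GPU memory. A sketch using the gthread worker class with a per-worker lock, so only one thread at a time runs the GPU step; the Linear layer is a stand-in for the real model:
gunicorn --bind 0.0.0.0:8002 main:app --timeout 360 --workers=2 --threads=4 --worker-class=gthread

# main.py (sketch)
import threading
import torch
from flask import Flask, request, jsonify

app = Flask(__name__)
model = torch.nn.Linear(4, 2)   # stand-in for the real model, loaded once per worker
if torch.cuda.is_available():
    model = model.cuda()
model.eval()
gpu_lock = threading.Lock()     # serialize GPU work within this worker

@app.route("/predict", methods=["POST"])
def predict():
    x = torch.tensor(request.get_json()["input"], dtype=torch.float32)
    if torch.cuda.is_available():
        x = x.cuda()
    with gpu_lock, torch.no_grad():  # one inference at a time; no autograd buffers
        y = model(x)
    return jsonify(y.cpu().tolist())
With this layout, CPU-side preprocessing overlaps across threads while GPU calls stay serialized; whether it beats 5 single-threaded workers depends on how much of each request is CPU-bound.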
Fast tokenizers are apparently not thread-safe.
AutoTokenizer seems to be a wrapper that uses either the fast or the slow implementation internally. The default is fast (not thread-safe), so you have to switch to the slow (thread-safe) one; that is why you add the use_fast=False flag.
I was able to solve this by:
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
Best,
Chirag Sanghvi
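An alternative, not from the answer above, if the fast tokenizer's speed matters: avoid sharing a single instance across threads by giving each thread its own tokenizer via threading.local(). A sketch:
import threading
from transformers import AutoTokenizer

_local = threading.local()

def get_tokenizer(model_name):
    # lazily create one fast tokenizer per thread, so no instance
    # is ever shared across threads
    if not hasattr(_local, "tokenizer"):
        _local.tokenizer = AutoTokenizer.from_pretrained(model_name)
    return _local.tokenizer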

Limiting Java 8 Memory Consumption

I'm running three Java 8 JVMs on a 64-bit Ubuntu VM built from a minimal install, with nothing extra running other than the three JVMs. The VM itself has 2 GB of memory and each JVM was limited by -Xmx512M, which I assumed would be fine as there would be a couple of hundred MB spare.
A few weeks ago, one crashed and the hs_err_pid dump showed:
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 196608 bytes for committing reserved memory.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
I restarted that JVM with a reduced heap size of 384 MB and so far everything is fine. However, when I now look at the VM using the ps command, sorted in descending RSS size, I see:
RSS %MEM VSZ PID CMD
708768 35.4 2536124 29568 java -Xms64m -Xmx512m ...
542776 27.1 2340996 12934 java -Xms64m -Xmx384m ...
387336 19.3 2542336 6788 java -Xms64m -Xmx512m ...
12128 0.6 288120 1239 /usr/lib/snapd/snapd
4564 0.2 21476 27132 -bash
3524 0.1 5724 1235 /sbin/iscsid
3184 0.1 37928 1 /sbin/init
3032 0.1 27772 28829 ps ax -o rss,pmem,vsz,pid,cmd --sort -rss
3020 0.1 652988 1308 /usr/bin/lxcfs /var/lib/lxcfs/
2936 0.1 274596 1237 /usr/lib/accountsservice/accounts-daemon
..
..
and the free command shows
total used free shared buff/cache available
Mem: 1952 1657 80 20 213 41
Swap: 0 0 0
Taking the first process as an example, there is an RSS size of 708768 KB even though the heap limit would be 524288 KB (512*1024).
I am aware that extra memory is used over the JVM heap, but the question is: how can I control this to ensure I do not run out of memory again? I am trying to set the heap size for each JVM as large as I can without crashing them.
Or is there a good general guideline as to how to set JVM heap size in relation to overall memory availability ?
There does not appear to be a way of controlling how much extra memory the JVM will use over the heap. However, by monitoring the application over a period of time, a good estimate of this amount can be obtained. If the overall consumption of the Java process is higher than desired, the heap size can be reduced. Further monitoring is then needed to see whether this impacts performance.
Continuing with the example above, and using the command ps ax -o rss,pmem,vsz,pid,cmd --sort -rss, we see usage as of today is:
RSS %MEM VSZ PID CMD
704144 35.2 2536124 29568 java -Xms64m -Xmx512m ...
429504 21.4 2340996 12934 java -Xms64m -Xmx384m ...
367732 18.3 2542336 6788 java -Xms64m -Xmx512m ...
13872 0.6 288120 1239 /usr/lib/snapd/snapd
..
..
These Java processes are all running the same application but with different data sets. The first process (29568) has stayed stable, using about 190 MB beyond the heap limit, while the second (12934) has come down from 156 MB over the limit to 35 MB over. The total memory usage of the third has stayed well under the heap size, which suggests the heap limit could be reduced.
It would seem that allowing 200 MB of extra non-heap memory per Java process here would be more than enough, as that gives 600 MB of leeway in total. Subtracting this from 2 GB leaves 1400 MB, so the three -Xmx parameter values combined should be less than this amount.
As can be gleaned from the article pointed out in a comment by Fairoz, there are many different ways in which the JVM can use non-heap memory. One of these that is measurable, though, is the thread stack size. The default for a JVM can be found on Linux using:
java -XX:+PrintFlagsFinal -version | grep ThreadStackSize
In the case above it is 1 MB, and as there are about 25 threads, we can safely say that at least 25 MB extra will always be required.
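If a finer breakdown is needed, Java 8 also ships Native Memory Tracking, which itemizes non-heap usage (thread stacks, code cache, GC structures) per category. It must be enabled at JVM start and adds roughly 5-10% overhead, so it is usually turned on only while diagnosing:
java -XX:NativeMemoryTracking=summary -Xms64m -Xmx512m ...
jcmd <pid> VM.native_memory summary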

Not enough space to cache rdd in memory warning

I am running a Spark job, and I got a "Not enough space to cache rdd_128_17000 in memory" warning. However, the attached file clearly says only 90.8 GB out of 719.3 GB is used. Why is that? Thanks!
15/10/16 02:19:41 WARN storage.MemoryStore: Not enough space to cache rdd_128_17000 in memory! (computed 21.4 GB so far)
15/10/16 02:19:41 INFO storage.MemoryStore: Memory use = 4.1 GB (blocks) + 21.2 GB (scratch space shared across 1 thread(s)) = 25.2 GB. Storage limit = 36.0 GB.
15/10/16 02:19:44 WARN storage.MemoryStore: Not enough space to cache rdd_129_17000 in memory! (computed 9.4 GB so far)
15/10/16 02:19:44 INFO storage.MemoryStore: Memory use = 4.1 GB (blocks) + 30.6 GB (scratch space shared across 1 thread(s)) = 34.6 GB. Storage limit = 36.0 GB.
15/10/16 02:25:37 INFO metrics.MetricsSaver: 1001 MetricsLockFreeSaver 339 comitted 11 matured S3WriteBytes values
15/10/16 02:29:00 INFO s3n.MultipartUploadOutputStream: uploadPart /mnt1/var/lib/hadoop/s3/959a772f-d03a-41fd-bc9d-6d5c5b9812a1-0000 134217728 bytes md5: qkQ8nlvC8COVftXkknPE3A== md5hex: aa443c9e5bc2f023957ed5e49273c4dc
15/10/16 02:38:15 INFO s3n.MultipartUploadOutputStream: uploadPart /mnt/var/lib/hadoop/s3/959a772f-d03a-41fd-bc9d-6d5c5b9812a1-0001 134217728 bytes md5: RgoGg/yJpqzjIvD5DqjCig== md5hex: 460a0683fc89a6ace322f0f90ea8c28a
15/10/16 02:42:20 INFO metrics.MetricsSaver: 2001 MetricsLockFreeSaver 339 comitted 10 matured S3WriteBytes values
This is likely to be caused by the configuration of spark.storage.memoryFraction being too low: Spark will only use this fraction of the allocated memory to cache RDDs.
Try one of the following (sketched after this list):
increasing the storage fraction
rdd.persist(StorageLevel.MEMORY_ONLY_SER) to reduce memory usage by serializing the RDD data
rdd.persist(StorageLevel.MEMORY_AND_DISK) to partially persist onto disk when the memory limit is reached
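For illustration, a minimal PySpark sketch of both knobs (assuming Spark 1.x, where spark.storage.memoryFraction still applies; the input path is hypothetical):
from pyspark import SparkConf, SparkContext, StorageLevel

conf = (SparkConf()
        .setAppName("cache-tuning")
        .set("spark.storage.memoryFraction", "0.8"))  # legacy setting, default 0.6
sc = SparkContext(conf=conf)

rdd = sc.textFile("hdfs:///path/to/input")  # hypothetical input
rdd.persist(StorageLevel.MEMORY_AND_DISK)   # spill blocks to disk instead of dropping them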
This could be due to the following issue if you're loading lots of Avro files:
https://mail-archives.apache.org/mod_mbox/spark-user/201510.mbox/%3CCANx3uAiJqO4qcTXePrUofKhO3N9UbQDJgNQXPYGZ14PWgfG5Aw#mail.gmail.com%3E
With a PR in progress at:
https://github.com/databricks/spark-avro/pull/95
I have a Spark-based batch application (a JAR with a main() method, not written by me; I'm not a Spark expert) that I run in local mode without spark-submit, spark-shell, or spark-defaults.conf. When I tried to use the IBM JRE (like one of my customers does) instead of the Oracle JRE (same machine, same data), I started getting those warnings.
Since the memory store is a fraction of the heap (see the page that Jacob suggested in his comment), I checked the heap size: the IBM JRE uses a different strategy to decide the default heap size, and it was too small. I simply added appropriate -Xms and -Xmx params and the problem disappeared: now the batch works fine with both the IBM and the Oracle JRE.
My usage scenario is not typical, I know, but I hope this can help someone.
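For example, something along these lines (the sizes and JAR name are hypothetical; the point is simply to pin the heap explicitly rather than rely on the JRE's default):
java -Xms1g -Xmx4g -jar batch-app.jar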

Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 5 bytes)...in mysqli.php on line 483

I am trying to upgrade my Joomla website from 1.5 to 2.5 using jUpgrade, but I have been stuck on this problem for a long time. The jUpgrade component starts upgrading, downloads and extracts the new Joomla, but while migrating the categories or the content it gets stuck and gives this error:
==========
[undefined] [undefined]
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 5 bytes) in /home/daneshna/public_html/up/libraries/joomla/database/database/mysqli.php on line 483
I tried allocating more memory in the server's php.ini, going as high as 1028M (128M was the default), but the problem persists and I can't get past it. I have tried everything I could find online, but it's still there.
Can anyone please help me out with this issue?
(P.S. My website is www.daneshnamah.com, a Persian educational website run from Afghanistan for almost 4 years, with over 6000 articles and over 19M visitors so far; for security reasons I now want to upgrade it to 2.5.)
Thank you,
Ebtihaj
Looks like something (that you added recently?) is eating up massive amounts of memory.
Things like that happen often with PHP, which is why it shouldn't be used for much more than hypertext.
Did you restart your webserver after changing php.ini? Note that the error still reports a limit of 33554432 bytes (32 MB), so your raised value is evidently not taking effect.
What you can do to identify the hungry parts of your website is use Xdebug to see where the memory is being eaten up.
If you recently added code/plugins, you can simply remove half of your new additions and see if it happens again; if it does, comment out half of that, and keep bisecting until you find the culprit.
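One quick check worth adding (not from the answer above): confirm which php.ini is actually loaded and what limit is in effect. From the CLI:
php --ini
php -r "echo ini_get('memory_limit');"
Keep in mind the CLI often loads a different php.ini than mod_php/FastCGI, so a phpinfo() page is the authoritative view of the web server's effective value.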

Django, low requests per second with gunicorn 4 workers

I'm trying to see why my Django website (gunicorn, 4 workers) is slow under heavy load. I did some profiling (http://djangosnippets.org/snippets/186/) without any clear answer, so I started some load tests from scratch using ab -n 1000 -c 100 http://localhost:8888/
A simple HttpResponse("hello world"), no middleware ==> 3600 req/s
A simple HttpResponse("hello world") with middleware (cached session, cached authentication) ==> 2300 req/s
A simple render_to_response that only prints a form (cached template) ==> 1200 req/s (response time was divided by 2)
A simple render_to_response with 50 memcache queries ==> 157 req/s
Shouldn't memcache queries be much faster than that (I'm using PyLibMCCache)?
Is template rendering really as slow as these results suggest?
I tried different profiling techniques without any success.
$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 46936
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 400000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 46936
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
$ sysctl -p
fs.file-max = 700000
net.core.somaxconn = 5000
net.ipv4.tcp_keepalive_intvl = 30
I'm using Ubuntu 12.04 (6 GB of RAM, Core i5).
Any help please?
It really depends on how long it takes to do a memcached request and to open a new connection (Django closes the connection as the request finishes). Both your worker and memcached are able to handle much more stress, but of course if it takes 5-10 ms to do a memcached call, then 50 of them become the bottleneck, as you have the network latency multiplied by the call count.
Right now you are just benchmarking Django, gunicorn, your machine and your network.
Unless you have something extremely wrong at this level, these tests are not going to point you to very interesting discoveries.
What is slowing down your app is very likely to be related to the way you use your database and memcached (and maybe template rendering).
For this reason I really suggest you install django-debug-toolbar and see what's happening in your real pages.
If it turns out that opening a connection to memcached is the bottleneck, you can try using a connection pool and keeping the connection open.
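A sketch of that idea using pylibmc directly (the host, key and behaviors are illustrative; Django's cache backend manages its own client, so this applies mainly to custom code):
import pylibmc

# one master client; the pool gives each thread its own clone of it
mc = pylibmc.Client(["127.0.0.1"], binary=True,
                    behaviors={"tcp_nodelay": True, "ketama": True})
pool = pylibmc.ThreadMappedPool(mc)

with pool.reserve() as client:
    client.set("greeting", "hello")
    print(client.get("greeting"))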
You could investigate memcached performance.
$ python manage.py shell
>>> from django.core.cache import cache
>>> cache.set("unique_key_name_12345", "some value with a size representative of the real world memcached usage", timeout=3600)
>>> from datetime import datetime
>>> def how_long(n):
...     start = datetime.utcnow()
...     for _ in xrange(n):
...         cache.get("unique_key_name_12345")
...     return (datetime.utcnow() - start).total_seconds()
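For example, to estimate the average per-lookup latency in milliseconds:
>>> how_long(10000) / 10000 * 1000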
With this kind of round-trip test I am seeing that 1 memcached lookup will take about 0.2 ms on my server.
The problem with django.core.cache and pylibmc is that the calls are blocking, so 50 sequential lookups potentially add 50 times that latency to a single HTTP request; 50 times 0.2 ms is already 10 ms.
If you were achieving 1200 req/s on 4 workers without memcached, the average HTTP round-trip time was 1/(1200/4) = 3.33 ms. Add 10 ms to that and it becomes 13.33 ms. The throughput with 4 workers would then drop to 300 req/s (which happens to be in the ballpark of your 157 number).
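If the 50 keys are known up front, one mitigation (a sketch, not from the original answer; key names are hypothetical) is to batch the lookups with Django's cache.get_many, which issues a single round trip:
>>> keys = ["item_%d" % i for i in xrange(50)]
>>> values = cache.get_many(keys)  # one network round trip instead of 50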