I have a Python/Django app that will require database load balancing at some point in the near future. In the meantime I'm trying to learn to implement pgpool on a local virtual machine setup.
I have 4 Ubuntu 12.04 VMs:
192.168.1.80 <- pool, pgppool2 installed and accessible
192.168.1.81 <- db1 master
192.168.1.82 <- db2 slave
192.168.1.83 <- db3 slave
I have pgpool-II version 3.1.1 and my database servers are running
PostgreSQL 9.1.
I have my app's db connection pointed to 192.168.1.80:9999 and it works fine.
The problem is when I use Apache ab to throw some load at it, none of
SELECT queries appear to be balanced. All the load goes to my db1
master. Also, quite concerning is the load on the pool server itself,
it is really high compared to db1, maybe an average of 8-10 times
higher. Meanwhile my db2 and db3 servers have a load of nearly zero,
they appear to only be replicating from db1, which isn't very load
intensive for my tests with ab.
ab -n 300 -c 4 -C 'sessionid=80a5fd3b6bb59051515e734326735f80' http://192.168.1.17:8000/contacts/
That drives the load on my pool server up to about 2.3. Load on db1
is about 0.4 and load on db2 and db3 is nearly zero.
Can someone take a look at my config and see if what I'm doing wrong?
backend_hostname0 = '192.168.1.81'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.1/main'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = '192.168.1.82'
backend_port1 = 5433
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.1/main'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = '192.168.1.83'
backend_port2 = 5434
backend_weight2 = 1
backend_data_directory2 = '/var/lib/postgresql/9.1/main'
backend_flag2 = 'ALLOW_TO_FAILOVER'
load_balance_mode = on
My entire config is here:
http://pastebin.com/raw.php?i=wzBc0aSp
I needed
replication_mode = off
master_slave_mode = on
Thanks to Tatsuo Ishii:
http://www.pgpool.net/pipermail/pgpool-general/2013-January/001309.html
Related
I'm using GCP Composer with newest image version composer-1.16.1-airflow-1.10.15.
Mine webservers are dying from time to time because of some missing cache files
{cli.py:1050} ERROR - [Errno 2] No such file or directory
Does anybody know how to solve it?
Additional info:
Workers:
Node count 3 Disk size (GB) 20 Machine type n1-standard-1
Web server configuration:
Machine type composer-n1-webserver-8 (8 vCPU, 7.6 GB memory)
Configuration overrides:
UPDATE 27.04.2021
I've managed to find the place responsible for killing the web-server
https://github.com/apache/airflow/blob/4aec433e48dcc66c9c7b74947c499260ab6be9e9/airflow/bin/cli.py#L1032-L1138
GCP Composer is using Celery Executor underneath - soo during the check it tries to read some cache files that are already removed by workers?
I've found it! Aaand I'll report the bug to GCP Composer team
So if the config webserver.reload_on_plugin_change=True then cli is going into that section:
https://github.com/apache/airflow/blob/4aec433e48dcc66c9c7b74947c499260ab6be9e9/airflow/bin/cli.py#L1118-L1138
# if we should check the directory with the plugin,
if self.reload_on_plugin_change:
# compare the previous and current contents of the directory
new_state = self._generate_plugin_state()
# If changed, wait until its content is fully saved.
if new_state != self._last_plugin_state:
self.log.debug(
'[%d / %d] Plugins folder changed. The gunicorn will be restarted the next time the '
'plugin directory is checked, if there is no change in it.',
num_ready_workers_running, num_workers_running
)
self._restart_on_next_plugin_check = True
self._last_plugin_state = new_state
elif self._restart_on_next_plugin_check:
self.log.debug(
'[%d / %d] Starts reloading the gunicorn configuration.',
num_ready_workers_running, num_workers_running
)
self._restart_on_next_plugin_check = False
self._last_refresh_time = time.time()
self._reload_gunicorn()
def _generate_plugin_state(self):
"""
Generate dict of filenames and last modification time of all files in settings.PLUGINS_FOLDER
directory.
"""
if not settings.PLUGINS_FOLDER:
return {}
all_filenames = []
for (root, _, filenames) in os.walk(settings.PLUGINS_FOLDER):
all_filenames.extend(os.path.join(root, f) for f in filenames)
plugin_state = {f: self._get_file_hash(f) for f in sorted(all_filenames)}
return plugin_state
It is generating files to check by calling os.walk(settings.PLUGINS_FOLDER) function.
In the same time gcsfuse is deciding to delete part of these files
And an error happens - file is not found.
So disabling webserver.reload_on_plugin_change is making the work - but this option is really convenient so I'll create the bug ticket for google
So I am experimenting with OMNeT++ and I have created a DataCenter with Fat Tree topology. Now I want to observe how a UDP application works in (as-real-as it gets) conditions. I have used the INET Framework and implemented the VideoStream Client-Server application.
So my question is that:
My network seems to work perfectly and I don't want to. I want to measure the datarate received by the client and compare it to the server's broadcast datarate. But even though I put much traffic into the network (several UDP and TCP applications) the datarate received is exactly the same as the datarate broadcasted and I am guessing in real conditions this traffic load is expected to vary in such highly dynamic enviroments. So how can I achieve the right conditions on OMNeT++ to simulate a real communication (maybe with packet losses and queuing time etc.) so I can measures these traffic loads?
There is the .ini file used:
[Config UDPStreamMultiple]
**.Pod[2].racks[1].servers[1].vms[2].numUdpApps = 1
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].typename = "UDPVideoStreamSvr"
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].localPort = 1000
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].sendInterval = 1s
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].packetLen = 20480B
**.Pod[2].racks[1].servers[1].vms[2].udpApp[0].videoSize = 512000B
**.Pod[3].racks[0].servers[0].vms[0].numUdpApps = 1
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].typename = "UDPVideoStreamSvr"
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].localPort = 1000
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].sendInterval = 1s
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].packetLen = 2048B
**.Pod[3].racks[0].servers[0].vms[0].udpApp[0].videoSize = 51200B
**.Pod[0].racks[0].servers[0].vms[0].numUdpApps = 1
**.Pod[0].racks[0].servers[0].vms[0].udpApp[0].typename = "UDPVideoStreamCli"
**.Pod[0].racks[0].servers[0].vms[0].udpApp[0].serverAddress = "20.0.0.47"
**.Pod[0].racks[0].servers[0].vms[0].udpApp[0].serverPort = 1000
**.Pod[1].racks[0].servers[0].vms[1].numUdpApps = 1
**.Pod[1].racks[0].servers[0].vms[1].udpApp[0].typename = "UDPVideoStreamCli"
**.Pod[1].racks[0].servers[0].vms[1].udpApp[0].serverAddress = "20.0.0.49"
**.Pod[1].racks[0].servers[0].vms[1].udpApp[0].serverPort = 1000
**.Pod[2].racks[0].servers[0].vms[1].numUdpApps = 1
**.Pod[2].racks[0].servers[0].vms[1].udpApp[0].typename = "UDPVideoStreamCli"
**.Pod[2].racks[0].servers[0].vms[1].udpApp[0].serverAddress = "20.0.0.49"
**.Pod[2].racks[0].servers[0].vms[1].udpApp[0].serverPort = 1000
**.Pod[2].racks[1].servers[0].vms[1].numUdpApps = 1
**.Pod[2].racks[1].servers[0].vms[1].udpApp[0].typename = "UDPVideoStreamCli"
**.Pod[2].racks[1].servers[0].vms[1].udpApp[0].serverAddress = "20.0.0.49"
**.Pod[2].racks[1].servers[0].vms[1].udpApp[0].serverPort = 1000
Thanks in advance.
So after a lot of research, thinking and tests I managed to achieve my desired goal. What I did is this:
Because the whole topology of the Datacenter is built that way so it will reduce the load that the routers have and balance the traffic, to take the necessary measures I created a smaller network. Just simple 3 nodes (StandardHost) from the INET Framework and a simple UDP Application that goes from node A to node B through midHost (e.g nodeA ---> midHost ---> nodeB). In the .ini file there should be a couple of commands like:
**.ppp[*].queueType = "DropTailQueue"
**.ppp[*].queue.frameCapacity = 50
**.ppp[*].numOutputHooks = 1
**.ppp[*].outputHook[*].typename = "ThruputMeter"
These commands are managing the links between the nodes and can be scaled according to someone needs (adjust the frame capacity maybe or the queue type). By making this small network you can easily customize it and take the metrics you need. I hope I can help anyone who wanted to do the same to get some feel about how to do it.
when I run unit tests for my application, first tests are successful, then around 100, tests start to fail, due to PDOException (Too many connections). I have already searched about this problem, but was not able to solve it.
My config is as follows:
<phpunit
backupGlobals = "false"
backupStaticAttributes = "false"
colors = "true"
convertErrorsToExceptions = "true"
convertNoticesToExceptions = "true"
convertWarningsToExceptions = "true"
processIsolation = "false"
stopOnFailure = "false"
syntaxCheck = "false"
bootstrap = "bootstrap.php.cache" >
If I change processIsolation to "true", all tests generate an error (E):
Caused by ErrorException: unserialize(): Error at offset 0 of 79 bytes
For that I tried setting "detect_unicode = Off" inside php.ini file.
If I run tests in smaller batches, like with "--group something", all tests are successful.
Can someone help me solve the issue when running all the tests at once? I really want to get rid of the PDOException.
Thanks in advance!
You should increase the maximum number of concurrent connections in your DB server.
If you're using MySQL, edit /etc/mysql/my.cnf and set the max_connections parameter to the number of concurrent connections you need. Then restart the MySQL server.
Keep in mind: In theory, the physical limits are very high. But if your queries cause a high CPU load or memory consumption, your DB server could eat up the resources required for other processes. This means, you could run out of memory, or your system can become overloaded.
For people who are having the same issue, here is more specific steps to config the my.cnf file.
If you are sure you are on the right my.cnf file, put max_connections = 500 (default is 151) to [mysqld] section in the my.cnf. Don't put it in the [client] section.
To make sure you are on the right my.cnf, if you have multiple mysqld installed from Homebrew or XAMMPP, find the right mysqld, for XAMMPP using /Applications/XAMPP/xamppfiles/sbin/mysqld --verbose --help | grep -A 1 "Default options" and you will get something like this:
Default options are read from the following files in the given order:
/Applications/XAMPP/xamppfiles/etc/xampp/my.cnf /Applications/XAMPP/xamppfiles/etc/my.cnf ~/.my.cnf
Normally it's at /Applications/XAMPP/xamppfiles/etc/my.cnf.
I have a django+uwsgi based website. some of the tables have almost 1 million rows.
After a few website usages, the VIRT memory used the uwsgi process reaches almost 20GB...almost kill my server...
Could you someone tell what may caused this problem? is it my table rows too big? (unlikely. Pinterest has much more data). now, i had to use reload-on-as= 10024
reload-on-rss= 4800 to kill the workers every few minutes....it is painful...
any help?
Here is my uwsgi.ini file
[uwsgi]
chdir = xxx
module = xxx.wsgi
master = true
processes = 2
socket =127.0.0.1:8004
chmod-socket = 664
no-orphans = true
#limit-as=256
reload-on-as= 10024
reload-on-rss= 4800
max-requests=250
uid = www-data
gid = www-data
#chmod-socket = 777
chown-socket = www-data
# clear environment on exit
vacuum = true
After some digging on stackflow and google search, here is the solution.
read this how django memory works and why it keeps going up
read this django app profiling
then I figured out the major parameter to set in uwsgi.ini is max_request. originally, I set it as 2000. now set it as 50. so it will respawn workers when memory goes up too much.
Then i try to figure out which request causes huge data query results from database. I ended up finding this little line:
amount=sum(x.amount for x in Project.objects.all())
While Project table has over 1 million complex entries.Occupying huge memory.... since I commented this out... everything runs smooth now.
So it is good to understand how the [django query works with database]
(Sorry I don't have enough reputation to comment - so apologies if this answer doesn't help in your case)
I had the same issue running Django on uwsgi/gninx and uwsgi controlled via supervisor. uwsgi-supervisor process started using lots of memory and consuming 100% CPU so only option was to repeatedly restart uwsgi.
Turned out the solution was to set up logging in the uwsgi.ini file:
logto = /var/log/uwsgi.log
There is some discussion on this here: https://github.com/unbit/uwsgi/issues/296
Common situation: I have a client on my server who may update some of the code in his python project. He can ssh into his shell and pull from his repository and all is fine -- but the code is stored in memory (as far as I know) so I need to actually kill the fastcgi process and restart it to have the code change.
I know I can gracefully restart fcgi but I don't want to have to manually do this. I want my client to update the code, and within 5 minutes or whatever, to have the new code running under the fcgi process.
Thanks
First off, if uptime is important to you, I'd suggest making the client do it. It can be as simple as giving him a command called deploy-code. Using your method, if there is an error in their code, your method requires a 10 minute turnaround (read: downtime) for fixing it, assuming he gets it correct.
That said, if you actually want to do this, you should create a daemon which will look for files modified within the last 5 minutes. If it detects one, it will execute the reboot command.
Code might look something like:
import os, time
CODE_DIR = '/tmp/foo'
while True:
if restarted = True:
restarted = False
time.sleep(5*60)
for root, dirs, files in os.walk(CODE_DIR):
if restarted=True:
break
for filename in files:
if restared=True:
break
updated_on = os.path.getmtime(os.path.join(root, filename))
current_time = time.time()
if current_time - updated_on <= 6 * 60: # 6 min
# 6 min could offer false negatives, but that's better
# than false positives
restarted = True
print "We should execute the restart command here."