Okay, strange setup, strange question. We've got a Client and an Admin web application for our SaaS app, running on ASP.NET 2.0 / IIS 6. The Admin application can change options displayed in the Client application. When those options are saved in the Admin, we call a web service on the Client from the Admin to flush our cache of the options for that specific account.
Recently we started giving our Client application more than one worker process, which causes the cache of options to be cleared on only one of the currently running worker processes.
So I obviously have other avenues for fixing this problem (input is appreciated, however), but my question is: is there any way to target/iterate through each worker process via a web request?
I'm making some assumptions here for this answer....
I'm assuming the client app is using one of the .NET caching classes to store your application's options?
When you say 'flush' do you mean flush them back to a configuration file or db table?
Because the cache objects and data won't be shared between processes, you need a mechanism to signal the code running in the other worker process that it needs to re-read its options into its cache, or you need to force the process to restart (which is not exactly convenient and most likely undesirable).
If you don't have access to the client source to modify it to watch either the options config file or the DB table (say, using a SqlCacheDependency), I think you're kinda stuck with this behaviour.
I have full access to both Admin and Client. By cache, I mean .NET's Cache object. By flush I mean removing the item from the Cache object.
I'm aware that the two worker processes don't share cache data. That's sort of my conundrum.
The system is the way it is to remove the need to hit SQL on every new session that comes in. So I'm trying to find a solution that can just tell each worker process that the cache needs to be cleared without getting SQL involved.
I'm working on a long request to a Django app (nginx reverse proxy, MySQL DB, Celery-RabbitMQ-Redis stack) and have some doubts about the solution I should apply:
How it works: one feature of the app allows users to migrate thousands of objects from one system to another. Each migration is logged in a DB, and users are given the option to download the migration history in CSV format: which objects were migrated, and with which status (success, errors, ...).
To get the history, a GET request is sent to a Django view, which serializes and renders the data into CSV and returns the download response.
Problem: for a large set of objects (e.g. 160,000), serialization and rendering take quite a long time and the request times out.
Some solutions I was thinking about or found through previous searches are:
Increasing the amount of time before timeout: easy, but I saw everywhere that this is a global nginx setting and would affect every request to the server.
Using an asynchronous task handled by Celery: the idea would be to make an initial request to the server, which would launch the serializing and rendering task with Celery and return a special HttpResponse to the client. Then the client would regularly ask the server whether the job is done, and the server would deliver the history at the end of processing. I like this one, but I'm not sure how to implement it technically.
Creating and temporarily storing the CSV file on the server, and giving the user a way to access and download it. I'm not a big fan of that one.
So my question is: has anyone already faced a similar problem? Do you have advice on the technical implementation of solution #2, or a better solution to propose?
Thanks!
Clearly you should use Celery + RabbitMQ/Redis. If you look at the docs, it's not that hard to set up.
The first question is whether to use RabbitMQ or Redis. There are many SO questions about this with good information about pros/cons.
The implementation in Django is really simple. You can just wrap Django functions with Celery tasks (using the @task decorator) and they'll become async, so this is the easy part.
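For instance, here is a minimal sketch of how the pieces could fit together. The function build_history_csv stands in for your existing serialization/rendering code, and the names and URL wiring are placeholders, not the actual implementation:

# tasks.py
from celery import shared_task

@shared_task
def export_history(migration_id):
    # build_history_csv is your existing (slow) serialization + CSV rendering code
    return build_history_csv(migration_id)  # the result is kept in the result backend

# views.py
import json
from celery.result import AsyncResult
from django.http import HttpResponse

def start_export(request, migration_id):
    # Launch the task and return immediately with its id
    task = export_history.delay(migration_id)
    return HttpResponse(json.dumps({"task_id": task.id}),
                        content_type="application/json")

def poll_export(request, task_id):
    # The client calls this repeatedly (e.g. every few seconds via JavaScript)
    result = AsyncResult(task_id)
    if not result.ready():
        return HttpResponse(json.dumps({"status": "pending"}),
                            content_type="application/json")
    response = HttpResponse(result.get(), content_type="text/csv")
    response["Content-Disposition"] = "attachment; filename=history.csv"
    return response

For very large exports you may prefer to have the task write the CSV to a file or object store and return only its location, rather than pushing the whole payload through the result backend.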
The problem I see in your project is that the server handling HTTP traffic is the same server running the long process. That can affect performance and user experience even if Celery is running in the background. Of course, that depends on how much traffic you are expecting on that machine and how many migrations can run at the same time.
One of the things you set up in Celery is the number of workers (concurrent processing units) available, so the number of cores in your machine will matter.
If you need to handle HTTP calls quickly, I would suggest delegating the migration process to another machine. Celery/Redis can be configured that way. Let's say you've got two servers: one would handle only normal Django calls (no Celery) and trigger Celery tasks on the other server (the one that actually runs the migration process). Both servers can connect to the same database.
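A rough sketch of that setup, assuming Celery 3.x-style settings; the task path and queue name below are placeholders:

# settings.py -- send the heavy task to a dedicated "migrations" queue
CELERY_ROUTES = {
    "myapp.tasks.export_history": {"queue": "migrations"},
}

# On the web server, run a worker for the default queue only (or none at all):
#   celery -A myproj worker -Q celery
# On the migration server, consume only the heavy queue:
#   celery -A myproj worker -Q migrations -c 4   # -c sets the number of worker processes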
But this is just an infrastructure optimization and you may not need it.
I hope this answers your question. If you have specific Celery issues it would be better to create another question.
I am trying to set up GeoServer as a backend to our MVC app. GeoServer works great... except it only lets me do one thing at a time. If I am processing a shapefile, the REST interface and GUI lock up until the job is done processing.
I know that there is the option to cluster a GeoServer configuration, but that would only be load balancing, so instead of only one read/write operation I would have two... but we need to scale this up to at least 20 concurrent tasks at a time.
All of the references I've seen on the internet talk about locking down the number of concurrent connections, but in my case only one is allowed the whole time.
Obviously GeoServer is used in production environments that have more than 1 request at the same time. I am just stumped about how to make it happen.
A few weeks ago, my colleague sent this email to the GeoServer development team; the problem was described as a configuration lock, and we were told that by changing a variable we could release it. The only place I saw this variable was in the source code on GitHub.
Is there a way, in one of GeoServer's config files, to turn these locks off so I can do concurrent reads/writes? If anybody out there has encountered this before PLEASE HELP!!! Thanks!
On Fri, May 16, 2014 at 7:34 PM, Sean Winstead wrote:
Hi,
We are using GeoServer 2.5 RC2. When uploading a shape file via the REST API, the server does not respond to other requests until after the shape file has been processed.
For example, if I start a file upload and then click on the Layers menu item in the web app, the response for the Layers page is not received until after the file upload and processing have completed.
I researched the issue but did not find a suitable cause/answer. I did install the control flow extension and created a controlflow.properties file in the data directory, but this did not appear to have any effect.
How do I diagnose the cause of this behavior?
Simple, it's the configuration lock. Our configuration subsystem is not able to handle concurrent writes correctly, or reads during writes, so there is a whole-instance read/write lock that is taken every time you use the REST API or the user interface; nothing can be done while the lock is in place.
If you want, you can disable it using the system variable GeoServerConfigurationLock.enabled:
-DGeoServerConfigurationLock.enabled=false
but of course we cannot predict what will happen to the configuration if you do that.
Cheers
Andrea
-DGeoServerConfigurationLock.enabled=false refers to a startup parameter given to the java command when GeoServer is first started. Looking at GeoServer's bin/startup.sh and bin\startup.bat, the approved way to set this is via an environment variable named JAVA_OPTS. You will see lines like
if [ -z "$JAVA_OPTS" ]; then
  export JAVA_OPTS="-XX:MaxPermSize=128m"
fi
in startup.sh and
if "%JAVA_OPTS%" == "" (set JAVA_OPTS=-XX:MaxPermSize=128m)
in startup.bat. You will need to make those
... JAVA_OPTS="-DGeoServerConfigurationLock.enabled=false -XX:MaxPermSize=128m"
or define that JAVA_OPTS environment variable similarly before GeoServer is started.
The development team's response of "of course we cannot predict what will happen to the configuration if you do that", however, suggests to me that there may be concurrency issues lurking, which are likely to surface more frequently as you scale up. You may want to think about decoupling the backend processing of those shapefiles from the REST requests that trigger it, using some queueing mechanism, instead of disabling GeoServer's configuration lock.
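As a rough illustration of that idea (not a recommendation of specific tooling), the sketch below funnels zipped shapefile uploads through a single worker thread so that only one GeoServer REST write happens at a time; the host, credentials, workspace and store names are placeholders:

import queue
import threading
import requests

GEOSERVER = "http://localhost:8080/geoserver"
AUTH = ("admin", "geoserver")

jobs = queue.Queue()

def worker():
    while True:
        store_name, zip_path = jobs.get()
        try:
            with open(zip_path, "rb") as f:
                # GeoServer's REST endpoint for uploading a zipped shapefile
                requests.put(
                    "%s/rest/workspaces/myws/datastores/%s/file.shp"
                    % (GEOSERVER, store_name),
                    data=f,
                    headers={"Content-type": "application/zip"},
                    auth=AUTH,
                ).raise_for_status()
        finally:
            jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# Incoming requests just enqueue work and return immediately:
jobs.put(("roads", "/tmp/roads.zip"))

In a real deployment you would likely use a proper job queue rather than an in-process thread, but the point is the same: GeoServer only ever sees one configuration write at a time.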
Thank you, I figured it out. We didn't even need to do this: the problem was that we were using a single login (admin) for the REST interface instead of making a new user for each repository. Now the locking issue doesn't happen.
I have an API which opens an Access database for reading and writing. The API opens the connection when it's constructed and closes the connection when it's destructed. When the DB is opened, an .ldb file is created, and when it is closed, the file is removed (or disappears).
There are multiple applications using the API to read and write to the access db. I want to know:
Is the .ldb file used to track multiple connections?
Does calling db.Close() close all connections or just one instance?
Will there be any sync issues with the above approach?
db.Close() closes one connection. The .ldb file is automatically removed when all connections are closed.
Keep in mind that while Jet databases (i.e. Access) do support multiple simultaneous users, they're not extremely well suited for a very large concurrent user base; for one thing, they are easily corrupted when there are network issues. I'm actually dealing with that right now. If it comes to that, you will want to use a database server.
That said, I've used Jet databases in that way many times.
Not sure what you mean when you say "sync issues".
Yes, it's required for opening the database in shared mode with multiple users. It seems to stand for "Lock Database". See more info on MSDN: Introduction to .ldb files in Access 2000.
Close() closes only one connection, others are unaffected.
Yes, it's possible if you try to write records that another user has locked. However, the data will remain consistent; you will just receive an error about a write conflict.
Actually, MS Access is not the best solution for a multi-connection usage scenario.
You may take a look at SQL Server Compact, which is a lightweight version of MS SQL Server. It runs in-process and supports multiple connections, multithreading, most of the robust T-SQL features (excluding stored procs), etc.
As an additional note to otherwise good answers, I would strongly recommend keeping a connection to a dummy table open for the lifetime of the client application.
Closing connections too often and allowing the lock file to be created/deleted every time is a huge performance bottleneck and, in some cases of rapid access to the database, can actually cause queries and inserts to fail.
You can read a bit more in this answer I gave a while ago.
When it comes to performance and reliability, you can get quite a lot out of Access databases providing that you keep some things in mind:
Keep a connection open to a dummy table for the duration of the life of the client (or at least use some timeout that closes the connection after, say, 20 seconds of inactivity if you don't want to keep it open all the time); see the sketch at the end of this answer.
Engineer your client apps to properly close all connections (including the dummy one when it's time to do it), whatever happens (e.g. crash, user shutdown, etc.).
Leaving locks in place is not good, as it could mean that the client has left the database in an unknown state, and could increase the likelihood of corruption if other clients keep leaving stale locks.
Compact and repair the database regularly. Make it a nightly task.
This will ensure that the database is optimised, and that any stale data is removed and open locks properly closed.
Good, stable network connectivity is paramount to data integrity for a file-based database: avoid WiFi like the plague.
Have a way to kick out all clients from the database server itself.
For instance, have a table with a MaintenanceLock field that clients poll regularly. If the field is set, the client should disconnect, after giving the user an opportunity to save their work.
Similarly, when a client app starts, check this field in the database to allow or disallow the client to connect to it.
Now you can kick out clients at any time without having to go to each user and ask them to close the app. It's also very useful for ensuring that no clients left open overnight are still connected to the database when you run Compact & Repair maintenance on it.
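As a sketch of the dummy-connection idea, here it is in Python with pyodbc; the driver string, database path and table name are placeholders, and the same pattern applies to ADO/DAO or whatever data-access layer your API uses:

import atexit
import pyodbc

CONN_STR = (
    r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
    r"DBQ=C:\data\app.accdb;"
)

# Opened once at client start-up; keeping it open keeps the .ldb lock file
# alive so it is not created and deleted around every query.
_dummy_conn = pyodbc.connect(CONN_STR)
_dummy_conn.execute("SELECT 1 FROM DummyTable")  # touch the dummy table

# Close it cleanly at shutdown so no stale lock is left behind.
atexit.register(_dummy_conn.close)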
We are using Django 1.3.1 and Postgres 9.1
I have a view which just fires multiple selects to get data from the database.
The Django documentation mentions that when a request completes, a ROLLBACK is issued if only SELECT statements were fired during the call to the view. But I am seeing a lot of "idle in transaction" entries in the log, especially when I have more than 200 requests. I don't see any COMMIT or ROLLBACK statements in the Postgres log.
What could be the problem? How should I handle this issue?
First, I would check out the related post What does it mean when a PostgreSQL process is “idle in transaction”? which covers some related ground.
One cause of "idle in transaction" can be developers or sysadmins who have entered "BEGIN;" in psql and forgot to "commit" or "rollback". I've been there. :)
However, you mentioned your problem is related to having a lot of concurrent connections. It sounds like investigating the "locks" tip from the post above may be helpful to you.
A couple more suggestions: this problem may be secondary. The primary problem might be that 200 connections is more than your hardware and tuning can comfortably handle, so everything gets slow, and when things get slow, more things are waiting for other things to finish.
If you don't have a reverse proxy like nginx in front of your web app, consider adding one. It can run on the same host without additional hardware. The reverse proxy will serve to regulate the number of connections to the backend Django web server, and thus the number of database connections. I've been here before with having too many database connections, and this is how I solved it!
With Apache's prefork model, there is a 1:1 correspondence between the number of Apache workers and the number of database connections, assuming something like Apache::DBI is in use. Imagine someone connects to the web server over a slow connection. The web and database servers take care of the request relatively quickly, but then the request is held open on the web server unnecessarily long as the content is dribbled back to the client. Meanwhile, the database connection slot is tied up.
By adding a reverse proxy, the backend server can quickly deliver a reply back to the reverse proxy and then free the backend worker and database slot. The reverse proxy is then responsible for getting the content back to the client, possibly holding its own connection open for longer. You may have 200 connections to the reverse proxy up front, but you'll need far fewer workers and DB slots on the backend.
If you graph the DB slots with MRTG or similar, you'll see how many slots you are actually using, and can tune down max_connections in PostgreSQL, freeing those resources for other things.
You might also look at pg_top to help monitor what your database is up to.
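For instance, a quick way to count how many backends are sitting "idle in transaction" from Python; connection parameters are placeholders, and note that on PostgreSQL 9.1 this state shows up in the current_query column, while 9.2+ uses the state column instead:

import psycopg2

conn = psycopg2.connect(host="localhost", dbname="mydb", user="postgres")
cur = conn.cursor()
cur.execute("""
    SELECT count(*)
    FROM pg_stat_activity
    WHERE current_query = '<IDLE> in transaction'
""")
print("idle in transaction:", cur.fetchone()[0])
cur.close()
conn.close()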
I understand this is an older question, but this article may describe the problem of idle transactions in Django.
Essentially, Django's TransactionMiddleware will not explicitly COMMIT a transaction if it is not marked dirty (which usually happens when data is written). Yet it still BEGINs a transaction for all queries, even if they're read-only. So Postgres is left waiting to see if any more commands are coming, and you get idle transactions.
The linked article shows a small modification to the transaction middleware to always commit (basically remove the condition that checks if the transaction is_dirty). I'll be trying this fix in a production environment shortly.
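For reference, the idea boils down to something like the sketch below, written against Django 1.3's old transaction API; the class name is made up, and this is an illustration of the article's approach rather than a copy of it:

from django.db import transaction

class AlwaysCommitTransactionMiddleware(object):
    """Like TransactionMiddleware, but commits even when nothing was written,
    so read-only requests don't leave connections idle in transaction."""

    def process_request(self, request):
        transaction.enter_transaction_management()
        transaction.managed(True)

    def process_exception(self, request, exception):
        if transaction.is_dirty():
            transaction.rollback()
        transaction.leave_transaction_management()

    def process_response(self, request, response):
        if transaction.is_managed():
            # The stock middleware only commits when is_dirty() is True;
            # committing unconditionally also ends read-only transactions.
            transaction.commit()
            transaction.leave_transaction_management()
        return response

You would then list this class in MIDDLEWARE_CLASSES in place of django.middleware.transaction.TransactionMiddleware.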
My website is using Django, and now I want to port part of the logic to Redis, so I need a Redis connection for my views.py code. Obviously I can't write the connect-to-Redis code in views.py itself, because it might be called multiple times, so I need to put the connection somewhere in Django, perhaps middleware?
But I don't want to make this complicated; I just want a global object for the Redis connection, in the same place where the MySQL database is connected. Perhaps later for an XMPP connection and ZeroMQ as well.
How to do this?
Any idea is appreciated. Thanks in advance :)
In typical Django server setups, multiple requests will be handled by the same worker process.
You can simply put a global variable to hold the connection at the top of views.py and use that connection in each view function/class. The connection will be established when the worker process starts and closed when the worker process gets recycled. It's a semi-permanent connection, but good enough.
The MySQL connection works the same way in Django: it's not one DB connection per request, but one per worker-process lifespan.
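A minimal sketch of that module-level approach with redis-py; the host/port/db values and the module layout are placeholders:

# e.g. myapp/redis_client.py
import redis

# The pool is created once per worker process, at import time, and its
# connections are shared by everything running in that process.
REDIS_POOL = redis.ConnectionPool(host="localhost", port=6379, db=0)

def get_redis():
    return redis.Redis(connection_pool=REDIS_POOL)

# In views.py:
# from myapp.redis_client import get_redis
# def my_view(request):
#     value = get_redis().get("some-key")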
It isn't obvious you want to do that. It isn't obvious why you would want to do that.
So why not connect in views.py? Using a single "global connection" will mean adding locking/serialization code to ensure that your connection is safe to use across many calls to your views. I actually create and connect right in the method in my various and sundry views.py files. Sometimes I connect to one instance or another. I've seen no performance issues and also don't have to worry about concurrency safety; I let Redis figure that out.
Another aspect of a global shared connection is degraded performance: you'll have one page view waiting on another's to finish before it can run. By allowing each to have its own connection, you avoid one view slowing down another while waiting for access to a global connector.
Consider this: if your queries are so small and fast that you don't expect to see a performance hit from serializing every page that accesses Redis, then you won't see any performance degradation from a connection per page as you connect, query, and close. I highly doubt that the cost of setting up the connection is significantly more than serializing all page accesses that connect to Redis.
So my suggestion is to just try it. If and only if you see an issue should you worry about implementing something you will probably not need.
There is a great piece of code for this already. http://github.com/andymccurdy/redis-py
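For completeness, the connect-inside-the-view approach described above looks roughly like this with redis-py; the key name and connection details are placeholders:

import redis
from django.http import HttpResponse

def product_count(request):
    # Create the client right in the view, as this answer suggests;
    # each request gets its own client and underlying connection.
    r = redis.Redis(host="localhost", port=6379, db=0)
    count = r.get("product:count") or 0
    return HttpResponse("Products: %s" % count)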