GeoServer is unable to accept concurrent requests when processing files - concurrency

I am trying to set up GeoServer as a backend to our MVC app. GeoServer works great... except it only lets me do one thing at a time. If I am processing a shapefile, the REST interface and GUI lock up until the job is done processing.
I know there is the option to cluster a GeoServer configuration, but that would only be load balancing: instead of one read/write operation at a time, I would have two. We need to scale up to at least 20 concurrent tasks at a time.
All of the references I've found online talk about limiting the number of concurrent connections, but in my case only one is ever allowed.
Obviously GeoServer is used in production environments that handle more than one request at a time, so I am stumped about how to make that happen.
A few weeks ago, my colleague sent the email below to the GeoServer development team. The problem was described as a configuration lock that could be released by changing a variable; the only place I have seen this variable is in the source code on GitHub.
Is there a way to specify, in one of GeoServer's config files, that these locks should be turned off so I can do concurrent reads/writes? If anybody out there has encountered this before, PLEASE HELP!!! Thanks!
On Fri, May 16, 2014 at 7:34 PM, Sean Winstead wrote:
Hi,
We are using GeoServer 2.5 RC2. When uploading a shapefile via the REST API, the server does not respond to other requests until after the shapefile has been processed.
For example, if I start a file upload and then click on the Layers menu item in the web app, the response for the Layers page is not received until after the file upload and processing have completed.
I researched the issue but did not find a suitable cause/answer. I did install the control-flow extension and created a controlflow.properties file in the data directory, but this did not appear to have any effect.
How do I diagnose the cause of this behavior?
Simple, it's the configuration lock. Our configuration subsystem is not able to handle concurrent writes correctly, or reads during writes, so a whole-instance read/write lock is taken every time you use the REST API or the user interface; nothing can be done while the lock is in place.
If you want, you can disable it using the system variable GeoServerConfigurationLock.enabled:
-DGeoServerConfigurationLock.enabled=false
but of course we cannot predict what will happen to the configuration if you do that.
Cheers
Andrea

-DGeoServerConfigurationLock.enabled=false refers to a startup parameter passed to the java command when GeoServer is first started. Looking at GeoServer's bin/startup.sh and bin\startup.bat, the approved way to set it is via an environment variable named JAVA_OPTS. You will see lines like
if [ -z "$JAVA_OPTS" ]; then
  export JAVA_OPTS="-XX:MaxPermSize=128m"
fi
in startup.sh and
if "%JAVA_OPTS%" == "" (set JAVA_OPTS=-XX:MaxPermSize=128m)
in startup.bat. You will need to change those to
... JAVA_OPTS="-DGeoServerConfigurationLock.enabled=false -XX:MaxPermSize=128m"
or define the JAVA_OPTS environment variable similarly before GeoServer is started.
The development team's response of "of course we cannot predict what will happen to the configuration if you do that", however, suggests to me that there may be concurrency issues lurking, and they are likely to surface more frequently as you scale up. Rather than disabling GeoServer's configuration lock, you may want to think about disconnecting the backend processing of those shapefiles from the REST requests by using some queueing mechanism; a sketch of that idea follows.
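To make the queueing suggestion concrete, here is a minimal sketch in Python (not part of the original answer): upload requests are accepted immediately and a single background worker feeds them to GeoServer's REST API one at a time, so the configuration lock is never contended. The URL, workspace, store, and credentials are placeholder assumptions.

import queue
import threading
import requests  # third-party: pip install requests

GEOSERVER_URL = "http://localhost:8080/geoserver/rest"  # hypothetical server
AUTH = ("admin", "geoserver")  # placeholder credentials

upload_queue = queue.Queue()

def worker():
    # Consume queued shapefile uploads and send them to GeoServer serially.
    while True:
        workspace, store, zip_path = upload_queue.get()
        try:
            with open(zip_path, "rb") as f:
                requests.put(
                    f"{GEOSERVER_URL}/workspaces/{workspace}/datastores/{store}/file.shp",
                    data=f,
                    headers={"Content-type": "application/zip"},
                    auth=AUTH,
                )
        finally:
            upload_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

# Callers enqueue and return immediately instead of blocking on GeoServer:
upload_queue.put(("myworkspace", "mystore", "/tmp/parcels.zip"))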

Thank you, I figured it out. We didn't even need to do this: the problem was that we were using only one login for the REST interface (admin) instead of making a new user for each repository. Now the locking issue doesn't happen.

Related

Handling long requests

I'm working on a long request in a Django app (nginx reverse proxy, MySQL db, Celery-RabbitMQ-Redis set) and have some doubts about the solution I should apply:
How it works: one feature of the app allows users to migrate thousands of objects from one system to another. Each migration is logged in a db, and users are given the possibility to download, in CSV format, the history of the migration: which objects have been migrated, with which status (success, errors, ...).
To get the history, a GET request is sent to a Django view, which returns, after serialization and rendering into CSV, the download response.
Problem: the serialization and rendering processes, for a large set of objects (e.g. 160,000), are quite long and the request times out.
Some solutions I was thinking about/found thanks to previous searches are:
Increasing the amount of time before timeout: easy, but I saw everywhere that this is a global nginx setting and would affect every request on the server.
Using an asynchronous task handled by Celery: the concept would be to make an initial request to the server, which would launch the serializing and rendering task with Celery and give a special HttpResponse to the client. Then the client would regularly ask the server whether the job is done, and the server would deliver the history at the end of processing. I like this one, but I'm not sure how to technically implement it.
Creating and temporarily storing the CSV file on the server, and giving the user a way to access and download it. I'm not a big fan of that one.
So my question is: has anyone already faced a similar problem? Do you have advice on the technical implementation of solution #2, or a better solution to propose?
Thanks!
Clearly you should use Celery + RabbitMQ/Redis. If you look at the docs it's not that hard to set up.
The first question is whether to use RabbitMQ or Redis. There are many SO questions about this with good information about pros/cons.
The implementation in Django is really simple. You can just wrap Django functions with Celery tasks (using the @task decorator) and they become async, so this is the easy part; a minimal sketch follows.
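As an illustration (assumed names throughout; Celery is taken to be configured with a result backend), solution #2 could look roughly like this: one view starts the task and returns its id, and a second view is polled by the client until the CSV is ready.

# tasks.py
import csv
import io
from celery import shared_task

@shared_task
def render_history_csv(migration_id):
    # Hypothetical model: serialize the migration log rows to CSV text.
    from myapp.models import MigrationLog
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["object_id", "status"])
    for row in MigrationLog.objects.filter(migration_id=migration_id):
        writer.writerow([row.object_id, row.status])
    return buf.getvalue()

# views.py
from celery.result import AsyncResult
from django.http import HttpResponse, JsonResponse

def start_export(request, migration_id):
    result = render_history_csv.delay(migration_id)
    return JsonResponse({"task_id": result.id})  # client keeps this id

def poll_export(request, task_id):
    result = AsyncResult(task_id)
    if not result.ready():
        return JsonResponse({"state": result.state})  # e.g. PENDING, STARTED
    response = HttpResponse(result.get(), content_type="text/csv")
    response["Content-Disposition"] = "attachment; filename=history.csv"
    return response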
The problem I see in your project is that the server that handles HTTP traffic is the same server running the long process. That can affect performance and user experience even if Celery is running in the background. Of course it depends on how much traffic you are expecting on that machine and how many migrations can run at the same time.
One of the things you setup on Celery is the number of workers (concurrent processing units) available. So the number of cores in your machine will matter.
If you need to handle HTTP calls quickly, I would suggest delegating the migration process to another machine. Celery/Redis can be configured that way. Let's say you've got 2 servers: one would handle only normal Django calls (no Celery) and trigger Celery tasks on the other server (the one that actually runs the migration process). Both servers can connect to the same database.
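One way to arrange that, sketched under assumed names (the queue name, project name, and task path are examples): route the heavy task to a dedicated queue and run a worker for that queue only on the second machine.

# settings.py: send the heavy task to its own queue
CELERY_TASK_ROUTES = {
    "myapp.tasks.render_history_csv": {"queue": "migrations"},
}

# On the web server: run no Celery worker at all.
# On the worker machine: consume only the "migrations" queue, e.g.
#   celery -A myproject worker -Q migrations --concurrency=4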
But this is just an infrastructure optimization and you may not need it.
I hope this answers your question. If you have specific Celery issues it would be better to create another question.

Tips to reduce message traffic and size in order to have a smaller download amount

I have a mobile application integrated with a server where users can see the tasks assigned to them and close a task request after the work is done. In this project timing is very important: at least once a minute the program should check whether a task has been assigned. Moreover, the mobile app should also check the server for changes to tasks it has already downloaded.
Because of the nature of the project, the download amount is high. How can we reduce it? Should we use another technology for server communication (we currently use an ASP.NET Web Service Application)?
Thanks in advance.
Use JSON instead of XML.
Try using selective sync options: sync only the changed tasks instead of doing a complete task sync, which would become slow with a higher number of tasks.
Mark task changes locally on the mobile: mark entities dirty and then upload only the marked tasks to the cloud/server.
As SLaks suggested, use push instead of pull; it will save the mobile battery and the user's data package. The sketch after this list shows the selective-sync idea.
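A rough sketch of the selective (delta) sync, written in Python purely for illustration; the endpoint paths and field names are assumptions, not your actual web service:

import json
import time
import requests  # third-party: pip install requests

SERVER = "https://example.com/api"  # hypothetical endpoint
last_sync = 0   # epoch seconds of the last successful sync
dirty = {}      # task_id -> locally changed task data

def apply_remote_change(task):
    pass  # merge the changed task into the local store (omitted)

def sync():
    global last_sync
    # Pull: ask only for tasks changed since the last sync, not the full list.
    changed = requests.get(f"{SERVER}/tasks", params={"since": last_sync}).json()
    for task in changed:
        apply_remote_change(task)
    # Push: upload only the entities marked dirty locally.
    if dirty:
        requests.post(f"{SERVER}/tasks/batch",
                      data=json.dumps(list(dirty.values())),
                      headers={"Content-Type": "application/json"})
        dirty.clear()
    last_sync = int(time.time())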
Here is what can help you:
Microsoft Sync Framework.
http://msdn.microsoft.com/en-us/sync/bb887608.aspx
http://weblogs.aspnet05.orcsweb.com/sbehera/archive/2009/04/10/sync-framework-for-windows-mobile-devices-amp-some-use-full-links.aspx

Long running tasks with Django

My goal is to create an application that will be able to do long-lasting mainly system tasks, such as:
checking out code from the repositories,
copying directories between various locations,
etc.
The problem is that I need it to work independently of the web browser. I mean that, for example, after starting the checkout/copy action, closing the web browser should not interrupt the action. So when I come back to the site I can see that the copying is still going on, or that another action was started while the browser was closed...
I was searching through various tools, like RabbitMQ + Celery, Twisted, Pyro, XML-RPC but I don't know if any of these will be suitable for me. Has anyone encountered similar needs when creating Django app? Please let me know if there are any methods/packages that I should know. Code samples also will be more than welcome!
Thank you in advance for your suggestions!
(And sorry for my bad English. I'm working on it.)
Basically you need to have a process that runs outside of the request. The absolute simplest way to do this (on a Unix-like operating system, at least) is to fork():

import os
import sys

if os.fork() == 0:
    # Child process: runs detached from the request.
    do_long_thing()
    sys.exit(0)
# Parent process: … continue with request …
This has some downsides, though (e.g., if the server crashes, the "long thing" will be lost)… which is where, e.g., Celery can come in handy. It will keep track of the jobs that need to be done and of their results (success/failure/whatever), and it makes it easy to run the jobs on other machines.
Using Celery with a Redis backend (see Kombu's Redis transport) is very simple, so I would recommend looking there first.
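For reference, a minimal sketch of that setup, assuming a local Redis instance (the app name, task name, and database numbers are arbitrary examples):

# celery_app.py: a minimal Celery app using Redis as broker and result backend
from celery import Celery

app = Celery("proj",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task
def checkout_code(repo_url):
    pass  # the long-running work (checkout, copy, ...) goes here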
You might need to have a process outside the request / response cycle. If that is the case, Celery with a Redis backend is what I would suggest looking into, as that integrates nicely with Django (as David Wolever suggested).
Another option is to create Django management commands, and then use cron to execute them at scheduled intervals.
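A sketch of that alternative, with a hypothetical command name; the commented crontab line would run it nightly:

# myapp/management/commands/run_system_tasks.py
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Run pending long system tasks (checkout, copy, ...)"

    def handle(self, *args, **options):
        # do the long-running work here, outside any web request
        self.stdout.write("done")

# crontab entry (every night at 2am):
#   0 2 * * * /path/to/venv/bin/python /path/to/manage.py run_system_tasks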

How to download and execute a file every month invisibly to the user?

I need a way to download (from a server) and execute a file every month, automatically and invisibly to the user.
How can I do that?
Don't blindly check for updates on a schedule. Instead, check when the user starts your application (every time, or every 10th time, or every 30th day, but only when the app gets used).
Users hate it when an application they aren't running is taking up resources.
As Steve Jessop points out, it may also be good to occasionally check again if the app stays running for a long time.
Installing a "Scheduled Task" is still the way to go, but set it to run manually instead of on a periodic schedule, then your app can trigger it. An app can trigger a task that executes with higher permissions than the app itself (creating the task in the first place requires full admin rights). The task also remembers the last time it ran which is useful for keeping traffic down.
An application that you build as a Windows service will run in the background and can do what you want.
Microsoft Windows services, formerly known as NT services, enable you to create long-running executable applications that run in their own Windows sessions. These services can be automatically started when the computer boots, can be paused and restarted, and do not show any user interface. These features make services ideal for use on a server or whenever you need long-running functionality that does not interfere with other users who are working on the same computer.
The least intrusive way is just to check when the application starts (every time, or every 10th time, or whatever). I know other large apps don't do it that way, but as a user I really hate that, perceive those apps as "bloat", and avoid them when possible (example: iTunes). Anything else just clogs up my computer when I'm trying to do something else.
Also, you'd better make sure the code is safe to run: use a digital signature so you can be sure the code really came from you. Otherwise you are vulnerable to a man-in-the-middle attack: I could set up an imitation server and send evil code to your users (or hack your server and upload evil code to it for your users to get, etc.). A sketch of such a check follows.
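A minimal sketch of that signature check, assuming the Python cryptography package and an RSA key pair you control (the file names are hypothetical):

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.exceptions import InvalidSignature

def is_update_authentic(update_path, sig_path, pubkey_path):
    # Verify the downloaded file against a detached signature before running it.
    with open(pubkey_path, "rb") as f:
        public_key = serialization.load_pem_public_key(f.read())
    with open(update_path, "rb") as f:
        data = f.read()
    with open(sig_path, "rb") as f:
        signature = f.read()
    try:
        public_key.verify(signature, data,
                          padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                                      salt_length=padding.PSS.MAX_LENGTH),
                          hashes.SHA256())
        return True
    except InvalidSignature:
        return False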

Target IIS Worker Processes on Request

Ok, strange setup, strange question. We've got a Client and an Admin web application for our SaaS app, running on asp.net-2.0/iis-6. The Admin application can change options displayed on the Client application. When those options are saved in the Admin we call a Webservice on the Client, from the Admin, to flush our cache of the options for that specific account.
Recently we started giving our Client application >1 Worker Processes, thus causing the cache of options to only be cleared on 1 of the currently running Worker Processes.
So, I obviously have other avenues for fixing this problem (input is appreciated, however), but my question is: is there any way to target/iterate through each worker process via a web request?
I'm making some assumptions here for this answer....
I'm assuming the client app is using one of the .NET caching classes to store your application's options?
When you say 'flush' do you mean flush them back to a configuration file or db table?
Because the cache objects and data won't be shared between processes, you need a mechanism to signal to the code running in the other worker process that it needs to re-read its options into its cache, or you must force the process to restart (which is not exactly convenient and most likely undesirable).
If you don't have access to the client source, so that you could modify it to watch either the options config file or the DB table (say using a SqlCacheDependency), I think you're kinda stuck with this behaviour.
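The same signaling idea, sketched in Python purely for illustration (in ASP.NET the analogue would be a file-based CacheDependency): every worker checks a shared version stamp before trusting its cache, and the admin app bumps the stamp to invalidate all processes at once, with no SQL round-trip per session. The file path is an assumption.

import os

VERSION_FILE = "/shared/options.version"  # hypothetical file visible to all processes

_cache = {"stamp": None, "options": None}

def load_options_from_store():
    return {}  # stand-in for the real config-file or DB lookup

def get_options():
    # Reload only when the shared version stamp has changed.
    stamp = os.path.getmtime(VERSION_FILE)
    if _cache["stamp"] != stamp:
        _cache["options"] = load_options_from_store()
        _cache["stamp"] = stamp
    return _cache["options"]

def invalidate_all_processes():
    # Called by the admin app: touching the file signals every worker at once.
    os.utime(VERSION_FILE, None)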
I have full access to admin and client. By cache I mean .NET's Cache object, and by flush I mean removing the item from the Cache object.
I'm aware that the two worker processes don't share cache data; that's sort of my conundrum.
The system is built this way to remove the need to hit SQL for every new session that comes in, so I'm trying to find a solution that can just tell each worker process that the cache needs to be cleared without getting SQL involved.