Process Handling in cgi - c++

I am doing simple web programming using C++ under Apache on Linux. I created a CGI script called signup.cgi. This program gets input from the browser and writes the data to a file called users.txt.
My question is: when two users access signup.cgi, will that create two different processes or only one process?
Case 1: Will two different processes access users.txt?
User1 ----> signup.cgi -----> Pid1 ----> users.txt
User2 ----> signup.cgi -----> Pid2 ----> users.txt
(or)
Case 2: Will only one process access users.txt?
User1 ----> signup.cgi -----> Pid1 ----> users.txt
User2 ----> signup.cgi -----> Pid1 ----> users.txt
If two different processes access users.txt at the same time, the data in users.txt will be corrupted. How can I handle this issue?
If only one process accesses users.txt, what problems might I run into?

Apache's use of processes depends on how your server and your CGI "script" are configured. According to RFC 3875:
The CGI "script" that is invoked by the server can be a standalone program, a dynamically-loaded or shared library, a subroutine in the (Apache) server, or an interpreted script (see section 1.4).
"The most common implementation of CGI invokes the script as a child process using the same user and group as the server process." (see section 9.5). But there is also the FastCGI variant, which manages a pool of processes to avoid the overhead of launching a new process for each request.
The script should be stateless. See section 9.7:
"The stateless nature of the Web makes each script execution and resource retrieval independent of all others even when multiple requests constitute a single conceptual Web transaction. Because of this, a script should not make any assumptions about the context of the user-agent submitting a request."
To reinforce this fundamental recommendation, consider that in scalable deployments you may have a load balancer that routes incoming HTTP requests to one of many Apache servers, and each Apache server could route requests to one of several FastCGI services. If you think stateless, you'll be safe!
So in conclusion: in your CGI program you can't assume anything about processes and users (you have to manage sessions if you need to tie related requests together).
And yes, if several processes write to the same file at the same moment, you could get real garbage in your file. You'll have to use OS-level interprocess synchronisation mechanisms to manage that. Semaphores or file locks can serialize file access, at the cost of some performance. Memory-mapped files could be an easier option.
But to overcome this limitation, you'll have to make an architectural quantum leap. The topic is far too broad to be developed here, but for example:
Use a message queue: each process sends its data to a message queue (in another process) that will handle it as soon as it can, without delaying the request processing.
Use a service-oriented architecture, where each process routes requests to a service, such as a database or object persistence layer.
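Before going as far as a message queue or a service, the file-lock approach mentioned above can be sketched for users.txt roughly as follows. This is only a minimal sketch: the function name append_record is made up for illustration, it assumes a local Linux filesystem, and it assumes that every writer goes through the same function (flock locks are advisory, so a process that ignores them is not stopped).

```cpp
#include <cassert>
#include <fcntl.h>      // open, O_* flags
#include <sys/file.h>   // flock, LOCK_EX, LOCK_UN
#include <unistd.h>     // write, close
#include <string>

// Hypothetical helper: append one record to `path` while holding an
// exclusive advisory lock. Returns true only if the whole record was written.
bool append_record(const std::string& path, const std::string& record) {
    // O_APPEND makes every write land at the current end of the file.
    int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return false;

    // Blocks until no other process holds a lock on this file.
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return false;
    }

    const std::string line = record + "\n";
    const char* p = line.data();
    size_t left = line.size();
    bool ok = true;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) { ok = false; break; }   // EINTR is not handled in this sketch
        p += n;
        left -= static_cast<size_t>(n);
    }

    flock(fd, LOCK_UN);   // close() would also release the lock
    close(fd);
    return ok;
}
```

Each CGI process would call something like append_record("users.txt", line) once per signup. Whether Apache starts one process or many, concurrent writers then queue up on the lock instead of interleaving their output.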

Related

Multiprocess web server with ocaml

I want to build a web server in OCaml. It will have a REST interface and no dependencies (it just searches constant data loaded into RAM at process startup) and will serve read-only queries (which can be served from any node; the result will be the same).
I love OCaml, but I have one problem: it can only use one thread at a time.
I am thinking of scaling by putting nginx in front of it and load balancing across multiple process instances running on different ports on the same server.
I don't think I'm the only one running into this issue. What would be the best tool to keep a few OCaml processes running at a time, ensure that any that crash are restarted, and give each one a different port (to load balance between them)?
I was thinking about a standard Linux service, but I don't want to create something like 4 hardcoded records and call service start webserver1 on each of them.
Is there a strong requirement for multiple operating system processes? Otherwise, it seems like you could just use something like cohttp with either lwt or async to handle concurrent requests in the same OS process, using multiple lightweight (cooperative) threads and an event loop.
As you mentioned REST, you might be interested in ocaml-webmachine which is based on cohttp and comes with well-commented examples.

Isapi filter - state

I have an ISAPI filter and I want to add logic based on the incoming domain (my server farm hosts many domains).
The domain list is dynamic. I can export the list to a text file and read it from the ISAPI filter, but is there a way to keep this file in memory (as an array or linked list) to save the I/O call,
similar to global application state?
How are your worker processes distributed across your servers? Do you have one server with one worker process, or multiple servers?
If you have one server with one worker process, you can just read the file into a static array or string to manage it (just make sure you account for concurrent threads reading/modifying it simultaneously)
If you have multiple worker processes on just one server, you can use named shared memory. I've used this before in ISAPI filters to share information, and it works pretty well. It should even take care of concurrency for you. You can read more here: http://msdn.microsoft.com/en-us/library/aa366551%28v=vs.85%29.aspx
If you're spread across multiple servers, you could use a distributed cache like memcached. This is more complex to set up, but it'll give you good performance. There's a thread on setting this up here: C++ api for memcache
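For the single-worker-process case, the "static array, but watch out for concurrent threads" idea could look something like the sketch below. The class name DomainCache and its methods are invented for illustration, and it uses portable std::mutex rather than Win32 primitives; a real filter might prefer a CRITICAL_SECTION or a reader/writer lock. Domain names are compared case-sensitively here, which is a simplification.

```cpp
#include <cassert>
#include <istream>
#include <mutex>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical in-memory domain list shared by all threads of one worker process.
class DomainCache {
public:
    // Replace the cached list with one domain per line read from `in`.
    void reload(std::istream& in) {
        std::unordered_set<std::string> fresh;
        std::string line;
        while (std::getline(in, line)) {
            if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
            if (!line.empty()) fresh.insert(line);
        }
        // Parsing happened outside the lock; only the swap is done under it.
        std::lock_guard<std::mutex> guard(mutex_);
        domains_.swap(fresh);
    }

    // True if `domain` is in the cached list.
    bool contains(const std::string& domain) const {
        std::lock_guard<std::mutex> guard(mutex_);
        return domains_.count(domain) != 0;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_set<std::string> domains_;
};
```

A filter could keep one static DomainCache, call reload() with a std::ifstream over the exported text file whenever the list changes, and call contains() on each request without touching the disk.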

How do I handle blocking IO in mod_wsgi/django?

I am running Django under Apache+mod_wsgi in daemon mode with the following config:
WSGIDaemonProcess myserver processes=2 threads=15
My application does some IO on the backend, which could take several seconds.
from django.http import HttpResponse

def my_django_view(request):
    content = ...  # Do some processing on backend file
    return HttpResponse(content)
It appears that if I am processing more than 2 http requests that are handling this kind of IO, Django will simply block until one of the previous requests completes.
Is this expected behavior? Shouldn't threading help alleviate this i.e. shouldn't I be able to process up to 15 separate requests for a given WSGI process, before I see this kind of wait?
Or am I missing something here?
If the processing is in Python, then the Global Interpreter Lock is not being released: in a single Python process only one thread of Python code can be executing at a time. The GIL is usually released inside C code though, as in most I/O, for example.
If this kind of processing is going to happen a lot, you might consider running a second "worker" application as a daemon, reading tasks from the database, performing the operations and writing results back to the database. Apache might decide to kill processes that take too long to respond.
+1 to Radomir Dopieralski's answer.
If the task takes long, you should delegate it to a process outside the request-response cycle, either by using a standard cron job or a distributed task queue like Celery.
Databases for workload offloading were quite the thing in 2010, and a good idea then, but we've come a bit farther now.
We're using Apache Kafka as a queue to store our in-flight workload. So, Dataflow is now:
User -> Apache httpd -> Kafka -> python daemon processor
The user's POST operation puts data into the system to be processed via a WSGI app that just writes it very quickly to a Kafka queue. Minimal sanity checking is done in the POST handler to keep it fast while still catching some obvious problems. Kafka stores the data very fast, so the HTTP response is zippy.
A separate set of python daemons pull data from Kafka and do processing on it. We actually have multiple processes that need to process it differently, but Kafka makes that fast by only writing once and having multiple readers read the same data if needed; no penalty for duplicate storage is incurred.
This allows very, very fast turnaround; optimal resource usage since we have other boxes offline handle the pull-from-kafka and can tune that to reduce lag as needed. Kafka is HA with same data written to multiple boxes in the cluster so my manager doesn't complain about 'what happens if' scenarios.
We're quite happy with Kafka. http://kafka.apache.org

Forcing asmx web service to handle requests one at a time

I am debugging an ASMX web service that receives "bursts" of requests, i.e., it is likely that the web service will receive 100 asynchronous requests within about 1 or 2 seconds. Each request seems to take about a second to process (this is expected and I'm OK with this performance). What is important, however, is that each request is dealt with sequentially and no parallel processing takes place. I do not want any concurrent request processing because of the external components called by the web service. Is there any way I can force the web service to handle requests strictly one at a time?
I have seen the maxconnection attribute in the machine.config, but this seems to only work for outbound connections, whereas I wish to throttle the incoming connections.
Please note that refactoring into WCF is not an option at this point in time.
We are using IIS6 on Win2003.
What I've done in the past is to simply put a lock statement around any access to the external resource I was using. In my case, it was a piece of unmanaged code that claimed to be thread-safe, but which in fact would trash the C runtime library heap if accessed from more than one thread at a time.
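The same "one lock around the external resource" idea, sketched in C++ rather than with the C# lock statement, might look like this. The names g_external_mutex and with_external_lock are hypothetical, and the sketch only serializes callers inside a single process.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// One lock shared by every request thread in the process (hypothetical name).
std::mutex g_external_mutex;

// Run `fn` while holding the lock, so calls from different threads never overlap.
// The lock is released when `guard` goes out of scope, even if `fn` throws.
template <typename Fn>
auto with_external_lock(Fn fn) -> decltype(fn()) {
    std::lock_guard<std::mutex> guard(g_external_mutex);
    return fn();
}
```

Every call into the non-thread-safe component would be wrapped, for example with_external_lock([&] { return call_component(request); }), where call_component stands in for whatever the external code exposes. Requests still arrive concurrently, but they pass through the component one at a time.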
Perhaps you should be queuing the requests up internally and processing them one by one?
It may cause the clients to poll for results (if they even need them), but you'd get the sequential pipeline you wanted...
In IIS7 you can set up a limit of connections allowed to a web site. Can you use IIS7?

Target IIS Worker Processes on Request

Ok, strange setup, strange question. We've got a Client and an Admin web application for our SaaS app, running on asp.net-2.0/iis-6. The Admin application can change options displayed on the Client application. When those options are saved in the Admin we call a Webservice on the Client, from the Admin, to flush our cache of the options for that specific account.
Recently we started giving our Client application >1 Worker Processes, thus causing the cache of options to only be cleared on 1 of the currently running Worker Processes.
So, I obviously have other avenues of fixing this problem (however input is appreciated), but my question is: is there any way to target/iterate through each Worker Processes via a web request?
I'm making some assumptions here for this answer....
I'm assuming the client app is using one of the .NET caching classes to store your application's options?
When you say 'flush' do you mean flush them back to a configuration file or db table?
Because the cache objects and data won't be shared between processes, you need a mechanism to signal to the code running on the other worker process that it needs to re-read its options into its cache, or to force the process to restart (which is not exactly convenient and most likely undesirable).
If you don't have access to the client source to modify to either watch the options config file or DB table (say using a SqlCacheDependency) I think you're kinda stuck with this behaviour.
I have full access to admin and client. By cache, I mean .NET's Cache object. By flush, I mean removing the item from the Cache object.
I'm aware that both worker processes don't share the cache data. That's sort of my conundrum.
The system is the way it is to remove the need to hit sql every new-session that comes in. So I'm trying to find a solution that can just tell each worker process that the cache needs to be cleared w/o getting sql involved.