I'm designing a component to be run as a backend of a website. The component will take care of some AI logic, and I'm building it under C++. Would it be best if I let each session start a new EXE address space, or the EXE would be up and running and each session will start a new thread?
Or are there any better suggestions altogether?
It would be better to keep a process alive and create a new thread for each 'session'. If you are looking for good performance under heavy load, starting a new process (fork, initialization of your app, etc.) will be really slow and can become a bottleneck.
Compared to that, creation of a new thread (in user space) is much lighter.
Even better, you can keep the process running and create a pool of threads. A 'manager' thread then accepts new connections and hands each one to an idle thread from the pool. In that case, you don't even need to create a new thread for each new connection. And if needed, the manager thread can adapt the number of threads in the pool to the load of your application.
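A minimal sketch of the pool idea, assuming standard C++11 threads. The names `ThreadPool`, `submit` and `handle_sessions` are made up for illustration, and the pool size is fixed rather than adapted to load:

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers sleep on a condition variable until
// the "manager" (any caller of submit) hands them a task.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t workers) {
        for (std::size_t i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }

    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_all();
        for (auto& t : threads_) t.join();  // workers drain the queue first
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached when stopping
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can proceed
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> threads_;
    bool stopping_ = false;
};

// Usage: handle `sessions` fake sessions on 4 workers and count them.
int handle_sessions(int sessions) {
    std::atomic<int> handled{0};
    {
        ThreadPool pool(4);
        for (int i = 0; i < sessions; ++i)
            pool.submit([&handled] { ++handled; });
    }  // the destructor returns once every queued task has run
    return handled.load();
}
```

In a real server the task would be "serve this connection", and the thread calling `submit` would be the one sitting in `accept()`.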
Edit:
This can be useful: Apache MPM model
I am using Qt SQL, which is a blocking API, so I have to execute the SQL code in a separate thread (QtConcurrent::run) and return a QFuture.
Something like this:
QFuture<QString> future = QtConcurrent::run( []() { /* some SQL code */ } );
auto watcher = new QFutureWatcher<QString>();
// Connect before calling setFuture() so a fast-finishing future is not missed.
connect(watcher,&QFutureWatcher<QString>::finished,
[future](){ /* code to execute after future is finished */ });
watcher->setFuture(future);  // watcher is a pointer, so use -> rather than .
But I learned that threading is costly: every context switch is expensive. So it looks like a waste of CPU to create a new thread just to wait for a result from the MySQL server. My application is going to run on a single-core virtual machine on Google Cloud anyway. Is there any way I can execute Qt SQL code asynchronously without creating a new thread?
I was also wondering how other APIs like Qt Networking implement an asynchronous API without creating a new thread. Or am I wrong, and they do create a new thread under the hood?
Many threaded applications run on a single core. Flushing cache to run on a separate core is also expensive. Use the right tool for the job. There's nothing wrong with threads.
That said, if you really want to run on a single thread use a workqueue to keep track of async task progress. The libevent library does this for you, but there are others. You just run a polling loop adding work onto the queue and executing callbacks when a task needs attention or completes.
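To make the polling-loop idea concrete, here is a rough single-threaded sketch. It is not libevent's API; `AsyncTask`, `WorkQueue`, `run_once` and `run` are names invented for this example, and each task is assumed to offer a non-blocking "are you done yet?" step:

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <utility>

// Hypothetical task: `poll` does a small non-blocking step and reports
// whether the task has finished; `on_complete` is the callback to fire then.
struct AsyncTask {
    std::function<bool()> poll;
    std::function<void()> on_complete;
};

class WorkQueue {
public:
    void add(AsyncTask task) { tasks_.push_back(std::move(task)); }

    // One pass over the pending tasks; returns how many are still pending.
    std::size_t run_once() {
        for (auto it = tasks_.begin(); it != tasks_.end();) {
            if (it->poll()) {
                AsyncTask done = std::move(*it);
                it = tasks_.erase(it);   // list iterators stay valid if the
                done.on_complete();      // callback adds more tasks
            } else {
                ++it;
            }
        }
        return tasks_.size();
    }

    // Loop until no work is left. This version spins; see the note below.
    void run() {
        while (run_once() > 0) {}
    }

private:
    std::list<AsyncTask> tasks_;
};
```

Note that `run()` here busy-waits, which is exactly the CPU waste you want to avoid. An event library instead sleeps in the OS (select, epoll and friends) until one of the tasks has something to do, and only then runs the callbacks.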
By using QtConcurrent::run you have already solved one problem, the cost of creating a thread, because it uses a thread pool.
When it comes to context switches, first try to measure them with perf stat, then optimize depending on what you find. If it's just simple queries, the vast majority of context switches probably come from the system, not your app.
Doing something asynchronously means that you can start a task and move on with your current code without waiting for the result. But usually such a task (e.g. an SQL query) will spawn a thread or process, or make a request to the OS.
Qt Networking, for example, issues a read request and the OS signals (via epoll) when the data arrives. But on a single core the OS will interrupt your thread anyway.
If you have a great many small queries, you could try to reduce their number, for example by combining them or by caching results.
I was reading about C++ threading. I encountered an example where a DocumentEditor was created. In the document editor, whenever the user opens a new document, a new thread is created and that thread is immediately detached.
That detached thread would become a daemon thread when the document editing task completes.
So my question is: if the user keeps the application open for days and keeps creating new documents, say hundreds of them, will the daemon thread count keep increasing?
Or will the daemons be destroyed when the process runs low on resources?
I think you're talking about the book Practical Multithreading. The writer there was just giving an example of how threads can be useful and how detach can be used.
The writer didn't intend to cover every corner case. He was just giving an example of how detaching threads could be used. It's up to you how to deal with limited resources. It's like giving you an M6 screw and a screwdriver, and then you decide what to do with them. You can use the screw for a lamp, or a computer, or even misuse it and put it in an M5 hole and break stuff. The context of using the screw and the screwdriver is a different one, and me giving an example about a lamp doesn't mean I'm explaining how a lamp works and its electricity consumption, just like the context of having multiple threads is different from how you manage resources. It's up to you and up to your application's special case.
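As one possible way of "dealing with limited resources" yourself, here is a small sketch that counts the detached threads still running and refuses to start more than a chosen limit. Nothing here comes from the book; `g_live_editors`, `open_document` and `max_live` are hypothetical names. Keep in mind that a detached thread ends when its function returns, at which point its resources are released:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical counter of editor threads that are still running.
std::atomic<int> g_live_editors{0};

// Starts `edit` on a detached thread unless `max_live` editors are already
// running. Returns false when the limit has been reached.
bool open_document(std::function<void()> edit, int max_live) {
    int current = g_live_editors.load();
    do {
        if (current >= max_live) return false;
        // Reserve a slot atomically; on failure `current` is refreshed.
    } while (!g_live_editors.compare_exchange_weak(current, current + 1));

    std::thread([edit = std::move(edit)] {
        edit();
        --g_live_editors;  // the thread function returns right after this
    }).detach();
    return true;
}
```

The counter only goes up while editing tasks are actually in progress, so hundreds of documents opened and closed over several days do not pile up as hundreds of threads.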
How to restrict a process from creating new processes?
You could assign the process to a job object. Use SetInformationJobObject with the JOB_OBJECT_LIMIT_ACTIVE_PROCESS flag to limit the number of processes in that job object to one. Do NOT set the JOB_OBJECT_LIMIT_BREAKAWAY_OK flag (which would allow the process to create processes that were not part of the job object).
The process could still work around that, such as by starting a new process via the task scheduler or WMI. If you're trying to do something like create a sandbox to run code you really don't trust, this won't be adequate. If you have a program that you trust, but just want to place a few limits on what it does, this should be more than adequate.
To put that slightly differently, this is equivalent to locking your car. Somebody can break in (or out, in this case), but at least they have to do a bit more than just walk in unhindered.
On Windows, there isn't a way to stop a process from spawning other processes. Nor is there on any operating system I know of.
The CreateProcess() system call is available to all processes, thus any process can create a child process.
You could run the process in a sandbox which restricts process creation, but the overhead for this is probably more than you want.
Can I ask why you want to do such a thing?
Use NT Job objects
JOBOBJECT_BASIC_LIMIT_INFORMATION can limit the number of active processes, or use JOBOBJECT_ASSOCIATE_COMPLETION_PORT and kill the new process (if you only need to kill a subset of all new processes).
I have a simple C++ application that generates reports on the back end of my web app (simple LAMP setup). The problem is that the back end loads a data file that takes about 1.5GB in memory. This won't scale very well if multiple users are running it simultaneously, so my thought is to split it into several programs:
Program A is the main executable that is always running on the server, and always has the data loaded, and can actually run reports.
Program B is spawned from PHP, makes a simple request to program A to get the info it needs, and returns the data.
So my questions are these:
What is a good mechanism for B to ask A to do something?
How should it work when A has nothing to do? I don't really want to be polling for tasks or otherwise spinning my tires.
Use a named mutex/event. Basically, this allows one thread (process A in your case) to sit there waiting without using any CPU. Then process B comes along, needing something done, and signals the mutex/event. This wakes up process A, and you proceed.
If you are on Microsoft Windows:
Mutex, Event
IPC on Linux works differently, but has the same capability:
Linux Stuff
Alternatively, for the C++ portion you can use one of the Boost IPC libraries, which are multi-platform. I'm not sure what PHP has available, but it will no doubt have something equivalent.
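For the Linux side, a rough sketch of the same wake-up pattern using a POSIX named semaphore, assuming both programs are native code that can call `sem_open`. The semaphore name and the three helper functions are made up for illustration:

```cpp
#include <cerrno>
#include <fcntl.h>      // O_CREAT
#include <semaphore.h>  // sem_open, sem_wait, sem_post
#include <sys/stat.h>   // mode constants

// Hypothetical name that both programs agree on.
constexpr const char* kWorkSemaphore = "/report_work_ready";

// Opens the shared semaphore, creating it with a count of 0 if needed.
// Returns nullptr on failure.
sem_t* open_work_semaphore() {
    sem_t* sem = sem_open(kWorkSemaphore, O_CREAT, 0600, 0);
    return sem == SEM_FAILED ? nullptr : sem;
}

// Program B: tell A that there is something to do.
bool signal_work(sem_t* sem) {
    return sem_post(sem) == 0;
}

// Program A: sleep in the kernel until B posts. No polling involved.
bool wait_for_work(sem_t* sem) {
    while (sem_wait(sem) != 0) {
        if (errno != EINTR) return false;  // retry only if interrupted by a signal
    }
    return true;
}
```

The semaphore only carries the "wake up" signal. The request itself still has to travel some other way, for example through a file, shared memory, or a socket.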
Use TCP sockets running on localhost.
Make the C++ application a daemon.
The PHP front-end creates a persistent connection to the daemon. pfsockopen
When a request is made, the PHP sends a request to the daemon which then processes and sends it all back. PHP Sockets C++ Sockets
EDIT
Added some links for reference. I might have some really bad C code that uses sockets for interprocess communication somewhere, but nothing handy.
IPC is easy in C++, just call the POSIX C API.
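Since nothing was handy, here is a rough sketch of the daemon side using plain POSIX sockets. The function names and the one-request-per-connection protocol are assumptions made for this example, and a request is assumed to fit in a single `recv`:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Opens a listening socket on 127.0.0.1. Pass port 0 to let the OS choose;
// the port actually used is written back to `port`. Returns -1 on failure.
int listen_on_localhost(std::uint16_t& port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  // localhost only
    addr.sin_port = htons(port);
    socklen_t len = sizeof addr;
    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), len) != 0 ||
        listen(fd, 16) != 0 ||
        getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
        close(fd);
        return -1;
    }
    port = ntohs(addr.sin_port);
    return fd;
}

// Blocks in accept() until a client connects, reads one small request,
// and sends back whatever `make_report` produces for it.
bool serve_one(int listen_fd,
               const std::function<std::string(const std::string&)>& make_report) {
    int client = accept(listen_fd, nullptr, nullptr);
    if (client < 0) return false;

    char buf[512];
    ssize_t n = recv(client, buf, sizeof buf, 0);
    bool ok = false;
    if (n > 0) {
        std::string reply = make_report(std::string(buf, static_cast<std::size_t>(n)));
        ssize_t sent = send(client, reply.data(), reply.size(), 0);
        ok = sent == static_cast<ssize_t>(reply.size());
    }
    close(client);
    return ok;
}
```

The daemon would call `serve_one` in a loop. While there is nothing to do it simply sleeps inside `accept()`, so there is no polling. The PHP side connects to the same port, writes the request, and reads the reply.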
But what you're asking would be much better served by a queue manager. Make the background daemon wait for a message on the queue, and the frontend PHP just add there the specifications of the task it wants processed. Some queue managers allow the result of the task to be added to the same object, or you can define a new queue for the finish messages.
One of the best-known high-performance queue managers is RabbitMQ. Another one that is very easy to use is MemcacheQ.
Or, you could just add a table to MySQL for tasks, and have the background process query it periodically for unfinished ones. This works and can be very reliable (sometimes called Ghetto queues), but breaks down at high tasks/second.
I'm building my first web application after many years of desktop application development (I'm using Django/Python but maybe this is a completely generic question, I'm not sure). So please beware - this may be an ultra-newbie question...
One of my user processes involves heavy processing on the server (i.e. user inputs something, server needs ~10 minutes to process it). In a desktop application, what I would do is throw the user input into a queue protected by a mutex, and have a dedicated low-priority background thread blocking on the queue using that mutex.
However in the web application everything seems to be oriented towards synchronization with the HTTP requests.
Assuming I will use the database as my queue, what is best practice architecture for running a background process?
There are two schools of thought on this (at least).
Throw the work on a queue and have something else outside your web-stack handle it.
Throw the work on a queue and have something else in your web-stack handle it.
In either case, you create work units in a queue somewhere (e.g. a database table) and let some process take care of them.
I typically work with number 1 where I have a dedicated windows service that takes care of these things. You could also do this with SQL jobs or something similar.
The advantage to item 2 is that you can more easily keep all your code in one place--in the web tier. You'd still need something that triggers the execution (e.g. loading the web page that processes work units with a sufficiently high timeout), but that could be easily accomplished with various mechanisms.
Since:
1) This is a common problem,
2) You're new to your platform
-- I suggest that you look in the contributed libraries for your platform to find a solution to handle the task. In addition to queuing and processing the jobs, you'll also want to consider:
1) status communications between the worker and the web-stack. This will enable web pages that show the percentage complete number for the job, assure the human that the job is progressing, etc.
2) How to ensure that the worker process does not die.
3) If a job has an error, will the worker process automatically retry it periodically?
Will you or an operations person be notified if a job fails?
4) As the number of jobs increases, can additional workers be added to gain parallelism?
Or, even better, can workers be added on other servers?
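To make points 1 and 3 a bit more concrete, here is a language-neutral sketch, written in C++ only because that is what I have at hand. The `std::vector` stands in for the database table, and every name (`Job`, `JobStatus`, `run_next`, `max_attempts`) is invented for the example:

```cpp
#include <functional>
#include <string>
#include <vector>

enum class JobStatus { Pending, Running, Done, Failed };

// One row of the (hypothetical) jobs table.
struct Job {
    int id;
    std::string input;
    JobStatus status = JobStatus::Pending;
    int attempts = 0;
    int percent_complete = 0;  // what a status page would display
};

// Runs the first pending job, if any, and returns whether one was found.
// `work` returns true on success and may update percent_complete as it goes.
// A failed job goes back to Pending until it has used up `max_attempts`;
// after that it is marked Failed so a person can be notified.
bool run_next(std::vector<Job>& table,
              const std::function<bool(Job&)>& work,
              int max_attempts) {
    for (Job& job : table) {
        if (job.status != JobStatus::Pending) continue;
        job.status = JobStatus::Running;
        ++job.attempts;
        if (work(job)) {
            job.percent_complete = 100;
            job.status = JobStatus::Done;
        } else {
            job.status = job.attempts < max_attempts ? JobStatus::Pending
                                                     : JobStatus::Failed;
        }
        return true;
    }
    return false;
}
```

With a real database, the "find a pending row and mark it Running" step has to be done atomically (for example in a transaction), otherwise two workers can grab the same job once you add more of them.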
If you can't find a good solution in Django/Python, you can also consider porting a solution from another platform to yours. I use delayed_job for Ruby on Rails. The worker process is managed by runit.
Regards,
Larry
Speaking generally, I'd look at running background processes on a different server, especially if your web server has any kind of load.
Running long processes in Django: http://iraniweb.com/blog/?p=56