Actual working of libcurl library - libcurl

I need to know how libcurl works.
i.e how it handles connections, is it just a single process that handles all the connections or is there like multiple threads to handle multiple connection, what is the bottleneck in using libcurl to handle 100-200 connections at a time, what is the overhead in using this library etc.
Is there online documentation that gives all these details? The only documentation that I found was on how to install, set up and use curl.
I am using curl (libcurl) to send HTTP requests to a program that is listening on a particular port. I just need to know the bottleneck.

Part1
"how it handles connections, is it just a single process that handles all the connections or is there like multiple threads to handle multiple connection"
libcurl runs in the same thread/process as your application. The multi interface handles parallel connections in that single thread. (The one exception is name resolution, which may optionally be done in another thread.)
Part2
"what is the bottleneck in using libcurl to handle 100-200 connections at a time, what is the overhead in using this library"
That depends on a multitude of factors. It is much better if you evaluate and test that yourself under your own special conditions. There is no extensive test data or numbers.

Related

For a client server program, what is the best approach to receive multiple client connection requests in parallel?

The program is a client-server socket application being developed in C on Linux. There is a remote server to which each client connects and logs itself as being online. There will most likely be several clients online at any given point in time, all trying to connect to the server to log themselves as being online/busy/idle etc. So how can the server handle these concurrent requests? What's a good design approach (forking/multithreading for each connection request, maybe)?
Personally I would use the event-driven approach for servers. There you register a callback that is called as soon as a connection arrives, and event callbacks that are called whenever the socket is ready to read or write.
With a huge number of connections you will get a great performance and resource benefit compared to threads. But I would also prefer this for a smaller number of connections.
I would only use threads if you really need to use multiple cores, or if you have some requests that could take longer to process and where it is too complicated to handle them without threads.
I use libev as the base library to handle event-driven networking.
Generally speaking, you want a thread pool to service requests.
A typical structure will start with a single thread that does nothing but queue up incoming requests. Since it doesn't do very much, it's typically pretty easy for one thread to keep up with the maximum speed of the network.
That puts the items into some sort of concurrent queue. Then you have a pool of other threads reading items from the queue, doing what's needed, then depositing the result in another queue (and repeating, and repeating until the server shuts down).
Finally, you have another single thread that just takes items from the result queue, and sends replies out to the clients.
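A minimal sketch of that three-stage structure, assuming POSIX threads. All names here (queue_t, run_pool, the squaring "work") are made up for illustration; a real server would queue sockets or request structs rather than ints, and would check the pthread return codes and destroy the mutexes and condition variables on shutdown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Bounded FIFO guarded by one mutex and two condition variables. */
typedef struct {
    int items[QUEUE_CAP];
    size_t head, count;
    int closed;                       /* set once no more items will arrive */
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} queue_t;

static void queue_init(queue_t *q) {
    q->head = q->count = 0;
    q->closed = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

static void queue_push(queue_t *q, int v) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)     /* block while the queue is full */
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = v;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Returns 1 and stores an item, or 0 once the queue is closed and drained. */
static int queue_pop(queue_t *q, int *out) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {
        pthread_mutex_unlock(&q->lock);
        return 0;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return 1;
}

static void queue_close(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);   /* wake every waiting consumer */
    pthread_mutex_unlock(&q->lock);
}

typedef struct { queue_t *in, *out; } worker_arg_t;
typedef struct { queue_t *out; long sum; } reply_arg_t;

/* Pool thread: take a request, do the work, deposit the result. */
static void *worker(void *p) {
    worker_arg_t *a = p;
    int req;
    while (queue_pop(a->in, &req))
        queue_push(a->out, req * req);       /* placeholder for real work */
    return NULL;
}

/* Reply thread: the only consumer of the result queue. */
static void *replier(void *p) {
    reply_arg_t *a = p;
    int res;
    while (queue_pop(a->out, &res))
        a->sum += res;                       /* stands in for sending a reply */
    return NULL;
}

/* Feed requests 1..n through n_workers threads; return the sum of results. */
long run_pool(int n, int n_workers) {
    enum { MAX_WORKERS = 8 };
    queue_t in, out;
    pthread_t workers[MAX_WORKERS], reply_thread;
    worker_arg_t wa = { &in, &out };
    reply_arg_t ra = { &out, 0 };

    assert(n_workers >= 1 && n_workers <= MAX_WORKERS);
    queue_init(&in);
    queue_init(&out);
    pthread_create(&reply_thread, NULL, replier, &ra);
    for (int i = 0; i < n_workers; i++)
        pthread_create(&workers[i], NULL, worker, &wa);
    for (int i = 1; i <= n; i++)             /* the "acceptor" role */
        queue_push(&in, i);
    queue_close(&in);
    for (int i = 0; i < n_workers; i++)
        pthread_join(workers[i], NULL);
    queue_close(&out);                       /* safe: all producers are done */
    pthread_join(reply_thread, NULL);
    return ra.sum;
}
```

Calling run_pool(100, 4) pushes 1..100 through four workers and returns the sum of their squares. Both queues are bounded, so the reply thread has to be draining results while requests are still being queued; otherwise the workers would stall on a full result queue.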
Best approach is a combination of event driven model with multithreaded model.
You create a bunch of non-blocking sockets, but the thread count should be much lower, e.g. 10 sockets per thread.
Then you just listen for an event (incoming request) on every thread in non-blocking mode and process it as it happens.
This technique usually performs better than non-blocking sockets or the multithreaded model separately.
Take a look at Comer's "Internetworking with TCP/IP" volume 3 (BSD sockets version); it has detailed examples for different ways of writing servers and clients. The full code (sans explanations, unfortunately) is on the web. Or rummage around in http://tldp.org, where you'll find a collection of tutorials.
select or poll or epoll
These are facilities on *nix systems to aggregate multiple event sources (connections) into a single waiting point. The server adds the connections to a data structure, and then waits by calling select etc. It gets woken up when stuff happens on any of these connections, figures out which one, handles it, and then goes back to sleep. See manual for details.
There are several higher level libraries built on top of these mechanisms, that make programming them somewhat easier e.g. libevent, libev etc.
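As a rough illustration of the "single waiting point" idea, here is a small sketch using poll(). The function name wait_readable and the MAX_CONNS limit are assumptions made for the example, not part of any library.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_CONNS 64

/* Sleep until one of the n descriptors is readable (or has hung up),
 * or until timeout_ms elapses. Returns the index of the first ready
 * descriptor, or -1 on timeout or error. */
int wait_readable(const int *fds, int n, int timeout_ms) {
    struct pollfd pfds[MAX_CONNS];

    assert(n >= 0 && n <= MAX_CONNS);
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;     /* we only care about incoming data */
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) <= 0)
        return -1;                   /* 0 = timeout, <0 = error */
    for (int i = 0; i < n; i++)      /* figure out which one woke us up */
        if (pfds[i].revents & (POLLIN | POLLHUP))
            return i;
    return -1;
}
```

A server loop would call this, read from the connection at the returned index, handle the request, and then call it again. Note that the whole descriptor array is handed to the kernel and scanned on every call, which is part of why epoll and the libraries built on it tend to be preferred for large connection counts.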

Looking for best approach to sending the same data to multiple destinations using sockets

Looking for the best approach to sending the same message to multiple destinations using TCP/IP sockets. I'm working with an existing VS 2010 C++ application on Windows. Hoping to use a standard library/design pattern approach that has many of the complexities already worked out if possible.
Here's one approach I'm thinking about.. One main thread retrieves messages from a database and adds them to some sort of thread safe queue. The application also has one thread for each client socket connection to some destination server. Each one of these threads would read from the thread safe queue, and send the message over a tcp/ip socket.
There may be better/simpler/more robust approaches than this one though..
The issues I have to be concerned about mostly are latency. The destinations could be anywhere, and there may be significant latency between one socket connection and another.
The messages must go in an exact FIFO order to all the destinations.
Also one destination will be considered the primary destination.. all messages must get to this destination, no exceptions. For the other destinations, i.e. non-primary, the messages are just copies and it's not absolutely critical if the non-primary destinations do not receive a few messages. At any point, one of the non-primary destinations could become the primary destination. If one of the destinations falls too far behind, then that thread would need to catch up to the primary destination, but skipping some messages.
Looking for any suggestions. From my preliminary research so far, my situation appears to be something akin to a single-producer, multiple-consumer pattern, or possibly the master-worker pattern in Java.
I need to implement this in C++ on Windows, and the application must use tcp/ip sockets using an existing defined protocol.
Any help at all would be greatly appreciated.
You need exactly two threads, one that saturates the IO channel to the database and another that saturates the IO channel to the network leading to the 12 servers. Unless you have multiple network interfaces (which you should think about!) you don't send things faster by using multiple threads. Also, since you don't have multiple threads taking care of the network, you don't have to sync them.
What you definitely need to know about is select(). In the case of WinSock, also take a look at WSAEventSelect/WaitForMultipleObjects. Basically, you take a message from the queue and then send it to all clients when they're ready. select() tells you when one of a set of sockets is ready to accept data, so you don't waste time waiting or block trying to send data. What you need to come up with is a scheme for reconnecting after broken connections, for deciding when to drop messages to lagging clients, and so on. Also, in case the throughput to the different targets varies a lot, you need to think about handling multiple messages in parallel. If they are small (less than a network packet's payload), it makes sense to combine them anyway to avoid overhead.
I hope this short overview helps getting you started, otherwise I can elaborate on the details.
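One possible way to express the "drop messages to lagging clients" rule from the question is a per-destination cursor into a numbered message log. This is only a sketch: dest_t, catch_up and the max_lag threshold are hypothetical names, and the socket handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-destination send state. Messages are numbered in FIFO order. */
typedef struct {
    uint64_t next_seq;   /* sequence number of the next message to send */
    int is_primary;      /* the primary must receive every message */
} dest_t;

/* Apply the catch-up policy to one destination.
 * primary_next: the primary destination's next_seq.
 * max_lag:      how far behind the primary a copy may fall (assumed
 *               threshold, to be tuned for the application).
 * Returns how many messages were skipped. The cursor only ever moves
 * forward, so the messages that are sent stay in FIFO order. */
uint64_t catch_up(dest_t *d, uint64_t primary_next, uint64_t max_lag) {
    assert(d != NULL);
    if (d->is_primary)
        return 0;                        /* never skip for the primary */
    if (primary_next > d->next_seq &&
        primary_next - d->next_seq > max_lag) {
        uint64_t skipped = primary_next - d->next_seq;
        d->next_seq = primary_next;      /* jump ahead to the primary */
        return skipped;
    }
    return 0;
}
```

The network thread would call catch_up() for a destination each time its socket is reported writable, then send the message at next_seq. What should happen when a destination that has skipped messages is promoted to primary is a separate decision this sketch does not cover.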

Does ZeroMQ allow several server sockets?

The native C socket API returns on accept() a new socket descriptor, which is bound to a certain remote socket. That's good because I can create a thread, pass the socket and establish a point-to-point, or better a thread-to-thread, connection over the internet. And that's exactly what I want: one thread from the client should be connected to a designated thread on the server. Hence I don't need a worker pool or load balancing, not even async operation. The server threads save history. ZeroMQ seems great, but as far as I understand it does not split up sockets on accept.
Is there a way to establish such a synchronous thread-to-thread connection with ZeroMQ?
You're asking how to replicate a particular solution (handing off a socket to a thread) to a broader problem (how to write scalable servers).
The 'one thread per socket' design only works in one pattern which is request-reply, e.g. HTTP. Whereas the really high volume use cases are for data distribution (publish-subscribe), or task distribution (pipeline). Neither fit a 1-to-1 model.
It is a common error when you learn a new tool to ask, "how does this tool do what my old tools do" but you won't get good results like that. Instead, take the time to actually learn how the tool works, and then use that knowledge to re-think your problems and the best solutions for them.
I thought ZeroMQ handles these multiple connections for you. I prefer to create thread-to-thread communication by handling the connection within the thread callback function. This means my main ZeroMQ connection is created in a separate thread, which allows separate connection control within threads.

Best approach for writing a Linux Server in C (pthreads, select or fork?)

I have a very specific question about server programming in UNIX (Debian, kernel 2.6.32). My goal is to learn how to write a server which can handle a huge number of clients. My target is more than 30 000 concurrent clients (even though my colleague mentions that 500 000 are possible, which seems QUIIITEEE a huge amount :-)), but I really don't know what's possible, and that is why I ask here. So my first question: how many simultaneous clients are possible? Clients can connect whenever they want, get in contact with other clients and form a group (1 group contains a maximum of 12 clients). They can chat with each other, so the TCP/IP packet size varies depending on the message sent.
Clients can also send mathematical formulas to the server. The server will solve them and broadcast the answer back to the group. This is a quite heavy operation.
My current approach is to start up the server, then use fork to create a daemon process. The daemon process binds the socket fd_listen and starts listening. It is a while (1) loop. I use accept() to get incoming calls.
Once a client connects I create a pthread for that client which will run the communication. Clients get added to a group and share some memory together (needed to keep the group running), but still every client is running on a different thread. Getting the access to the memory right was quite a hassle but works fine now.
At the beginning of the program I read the /proc/sys/kernel/threads-max file and create my threads according to that. The number of possible threads according to that file is around 5000, far from the number of clients I want to be able to serve.
Another approach I am considering is to use select() and create sets. But the access time to find a socket within a set is O(N). This can be quite long if I have more than a couple of thousand clients connected. Please correct me if I am wrong.
Well, I guess I need some ideas :-)
Groetjes
Markus
P.S. I tagged it for C++ and C because it applies to both languages.
The best approach as of today is an event loop like libev or libevent.
In most cases you will find that one thread is more than enough, but even if it isn't, you can always have multiple threads with separate loops (at least with libev).
Libev[ent] uses the most efficient polling solution for each OS (and anything is more efficient than select or a thread per socket).
You'll run into a couple of limits:
fd_set size: This is changeable at compile time, but has quite a low limit by default; this affects select solutions.
Thread-per-socket will run out of steam far earlier - I suggest putting the long calculations in separate threads (with pooling if required), but otherwise a single-thread approach will probably scale.
To reach 500,000 you'll need a set of machines, and round-robin DNS I suspect.
TCP ports shouldn't be a problem, as long as the server doesn't connect back to the clients. I always seem to forget this, and have to be reminded.
File descriptors themselves shouldn't be too much of a problem, I think, but getting them into your polling solution may be more difficult - certainly you don't want to be passing them in each time.
I think you can use the event model (epoll + worker thread pool) to solve this problem.
First listen and accept in the main thread. When a client connects to the server, the main thread distributes the client_fd to one worker thread and adds it to the epoll list; this worker thread then handles the requests from the client.
The number of worker threads can be configured to suit the problem, and it must not exceed the 5000 thread limit.
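A small sketch of the epoll part of that design, assuming Linux. watch_fd and next_ready are made-up helper names; the epoll_create1/epoll_ctl/epoll_wait calls are the real kernel interface.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register fd with the epoll instance for read readiness. This is done
 * once per connection; the kernel keeps the interest list, so the set
 * of descriptors is not passed in again on every wait. */
int watch_fd(int epfd, int fd) {
    struct epoll_event ev;

    assert(epfd >= 0 && fd >= 0);
    ev.events = EPOLLIN;
    ev.data.fd = fd;                 /* handed back to us by epoll_wait */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Sleep until a watched descriptor is readable or timeout_ms elapses.
 * Returns the ready descriptor, or -1 on timeout or error. */
int next_ready(int epfd, int timeout_ms) {
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);

    return n == 1 ? ev.data.fd : -1;
}
```

epoll_wait() hands back only the descriptors that are ready, so the server does not scan all connections on each wakeup the way it would with an fd_set. One possible arrangement (an assumption, not the only way to do it) is for each worker thread to own its own epoll instance and for the main thread to call watch_fd() on the chosen worker's instance right after accept().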

Interprocess Communication in C++

I have a simple C++ application that generates reports on the back end of my web app (simple LAMP setup). The problem is the back end loads a data file that takes about 1.5GB in memory. This won't scale very well if multiple users are running it simultaneously, so my thought is to split it into several programs:
Program A is the main executable that is always running on the server, always has the data loaded, and can actually run reports.
Program B is spawned from PHP, makes a simple request to program A to get the info it needs, and returns the data.
So my questions are these:
What is a good mechanism for B to ask A to do something?
How should it work when A has nothing to do? I don't really want to be polling for tasks or otherwise spinning my tires.
Use a named mutex/event. Basically, this allows one thread (process A in your case) to sit there waiting. Then process B comes along, needing something done, and signals the mutex/event; this wakes up process A, and you proceed.
If you are on Microsoft :
Mutex, Event
IPC on Linux works differently, but has the same capability:
Linux Stuff
Alternatively, for the C++ portion you can use one of the Boost IPC libraries, which are multi-platform. I'm not sure what PHP has available, but it will no doubt have something equivalent.
Use TCP sockets running on localhost.
Make the C++ application a daemon.
The PHP front-end creates a persistent connection to the daemon. pfsockopen
When a request is made, the PHP sends a request to the daemon which then processes and sends it all back. PHP Sockets C++ Sockets
EDIT
Added some links for reference. I might have some really bad C code that uses sockets for interprocess communication somewhere, but nothing handy.
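For what it's worth, the daemon side of the localhost-socket approach could look roughly like the sketch below, written with the plain POSIX socket API (which is callable from C++ as well). The function names and the "done: " reply format are invented for the example; the real request format would be whatever A and B agree on.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill in a loopback (127.0.0.1) address for the given port. */
static void loopback_addr(struct sockaddr_in *addr, unsigned short port) {
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = htons(port);
}

/* Daemon (program A): listen on 127.0.0.1. Binding to port 0 lets the
 * kernel pick a free port, which is reported through *port_out.
 * A real daemon would use a fixed, configured port instead. */
int listen_local(unsigned short *port_out) {
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    assert(port_out != NULL);
    if (fd < 0)
        return -1;
    loopback_addr(&addr, 0);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, 16) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}

/* Client (program B, or the PHP front end): connect to the daemon. */
int connect_local(unsigned short port) {
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    loopback_addr(&addr, port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Daemon: sleep in accept() until a client shows up (no polling, no
 * spinning), read one request, and send back a reply.
 * Returns 0 on success, -1 on error. */
int serve_one(int listen_fd) {
    char req[256];
    int conn = accept(listen_fd, NULL, NULL);
    ssize_t n;

    if (conn < 0)
        return -1;
    n = recv(conn, req, sizeof req, 0);
    if (n > 0) {
        send(conn, "done: ", 6, 0);          /* placeholder reply */
        send(conn, req, (size_t)n, 0);
    }
    close(conn);
    return n > 0 ? 0 : -1;
}
```

The daemon would call serve_one() in a loop. While no request is pending, the process is blocked inside accept() and uses no CPU, which addresses the "what does A do when it has nothing to do" part of the question.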
IPC is easy in C++, just call the POSIX C API.
But what you're asking would be much better served by a queue manager. Make the background daemon wait for a message on the queue, and the frontend PHP just add there the specifications of the task it wants processed. Some queue managers allow the result of the task to be added to the same object, or you can define a new queue for the finish messages.
One of the best-known high-performance queue managers is RabbitMQ. Another one that is very easy to use is MemcacheQ.
Or, you could just add a table to MySQL for tasks, and the background process queries it periodically for unfinished ones. This works and can be very reliable (sometimes called Ghetto queues), but breaks down at high tasks/second.