Linux OpenSSL simple client non-blocking BIO over already existing socket, buffer, and epoll abstraction - C++

Is there a simple example that shows how to create a non-blocking network-backed BIO from scratch? I do not need to verify or renegotiate right now, just get data flowing back and forth first.
I'm trying to use openssl on top of an already existing abstraction of epoll, sockets, and buffers. I already have existing machinery for all of that and am trying to create a BIO over it, but I cannot for the life of me get it to work. I already have an established TCP connection, so I need to insert that into a source/sink bio and then do the handshake.
The current state is the dreaded scenario where SSL_connect returns -1, SSL_get_error returns 5 (SSL_ERROR_SYSCALL), and errno reads 0 (Success). I have seen others hit the same problem, but not a single answer.
The reason for doing this, instead of just using a memory BIO to shuttle bytes back and forth, is that the rest of the stack is fairly well optimized and I don't want to do the extra copying.
My first idea was just to implement a BIO over the in and out buffers I already maintain, but I cannot get that to work either. There are a lot of questions floating around this site and others; some have outdated answers, and most of the rest have no working answer at all.

Well, interesting. I would have made this a comment but it's too long.
I think this might boil down to the version of OpenSSL you are using. I took my existing blocking socket implementation and did this (sorry, it's a bit quick and dirty and doesn't bother to clear the error stack as it should, but that doesn't seem to be causing a problem. The busy loop is intentional, to hammer OpenSSL as hard as possible, but I also tried it with a short sleep and the result was the same):
u_long non_blocking = 1;
int ioctl_err = ioctlsocket (skt, FIONBIO, &non_blocking);
assert (ioctl_err == 0);

int connect_result = 0;
int connect_err = 0;
for ( ; ; )
{
    connect_result = SSL_connect (ssl);
    if (connect_result >= 0)
        break;
    connect_err = SSL_get_error (ssl, connect_result);
    if (connect_err != SSL_ERROR_WANT_READ && connect_err != SSL_ERROR_WANT_WRITE)
        break; // I put a breakpoint here; it was never hit
}

non_blocking = 0; // back to blocking, so that the rest of my code still works
ioctl_err = ioctlsocket (skt, FIONBIO, &non_blocking);
if (connect_result <= 0) // this never happened either
    do_something_appropriate ();
// Various (synchronous) calls to `SSL_read` and `SSL_write` follow; these all worked fine
Now I know this isn't exactly what you are doing, but from OpenSSL's point of view that shouldn't matter. So, for me, I can get it to work.
Testing environment:
Windows (sorry!)
OpenSSL 3.0.1 (which, if not the latest, is not far off)
tl;dr If you're using the OpenSSL libraries that came with your Linux installation, it might be time to move on. Plus, if you build it from source you can build it with debug info, which might come in handy some time (I've actually built two versions for exactly this reason - one optimised and one not).
So that's it. HTH. Looks like you might be OK after all.
PS: I'm obviously not doing anything with BIOs here, and maybe your problem lies there instead. If so, we need some self-contained sample code using those which exhibits the problems you are having. Then, perhaps, someone can suggest a solution.
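For completeness, since the question is about Linux and epoll rather than Winsock, a rough equivalent of the loop above (my own sketch, untested against the asker's stack) sets the fd non-blocking with fcntl and waits on whichever direction OpenSSL asks for:
// Sketch only: same idea as above, but on Linux with fcntl/poll instead of ioctlsocket.
// 'fd' and 'ssl' are assumed to already exist and be connected at the TCP level.
#include <fcntl.h>
#include <poll.h>
#include <openssl/ssl.h>

int handshake_nonblocking (int fd, SSL *ssl)
{
    fcntl (fd, F_SETFL, fcntl (fd, F_GETFL, 0) | O_NONBLOCK);
    for ( ; ; )
    {
        int r = SSL_connect (ssl);
        if (r == 1)
            return 1;                       // handshake complete
        int err = SSL_get_error (ssl, r);
        if (err != SSL_ERROR_WANT_READ && err != SSL_ERROR_WANT_WRITE)
            return -1;                      // genuine failure
        struct pollfd p;
        p.fd = fd;
        p.events = (err == SSL_ERROR_WANT_READ) ? POLLIN : POLLOUT;
        poll (&p, 1, -1);                   // in real code, fold this into your epoll loop
    }
}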

Define BIO method:
BIO_meth_new
BIO_meth_set_write_ex
BIO_meth_set_read_ex
BIO_meth_set_ctrl
BIO_meth_free
Create BIO:
BIO_new
BIO_set_data
BIO_free
Associate BIO with SSL:
SSL_set_bio
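Putting those calls together, here is a minimal, hedged sketch of a custom source/sink BIO over an already-connected non-blocking fd (the fd_bio_* names and the attach_ssl helper are mine, not OpenSSL's). Two details worth calling out: the BIO's init flag must be set with BIO_set_init, or OpenSSL will refuse to use the BIO at all; and a read/write callback that fails without setting a retry flag and without queuing an error tends to surface as SSL_ERROR_SYSCALL with errno 0, which may be exactly the symptom in the question.
// Sketch: custom source/sink BIO over an existing non-blocking fd (OpenSSL 1.1.1/3.x).
// Error checking is omitted for brevity; call BIO_meth_free once no BIO uses the method any more.
#include <errno.h>
#include <unistd.h>
#include <openssl/bio.h>
#include <openssl/ssl.h>

static int fd_bio_write_ex(BIO *b, const char *data, size_t len, size_t *written)
{
    int fd = *(int *)BIO_get_data(b);
    ssize_t n = write(fd, data, len);
    BIO_clear_retry_flags(b);
    if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            BIO_set_retry_write(b);     // tell OpenSSL to come back later
        return 0;
    }
    *written = (size_t)n;
    return 1;
}

static int fd_bio_read_ex(BIO *b, char *data, size_t len, size_t *readbytes)
{
    int fd = *(int *)BIO_get_data(b);
    ssize_t n = read(fd, data, len);
    BIO_clear_retry_flags(b);
    if (n < 0) {
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            BIO_set_retry_read(b);      // no data yet, retry when readable
        return 0;
    }
    if (n == 0)
        return 0;                       // peer closed; no retry flag, so it reads as EOF
    *readbytes = (size_t)n;
    return 1;
}

static long fd_bio_ctrl(BIO *b, int cmd, long num, void *ptr)
{
    (void)b; (void)num; (void)ptr;
    return (cmd == BIO_CTRL_FLUSH) ? 1 : 0;   // the TLS layer flushes during the handshake
}

SSL *attach_ssl(SSL_CTX *ctx, int *fd_holder)
{
    BIO_METHOD *meth = BIO_meth_new(BIO_get_new_index() | BIO_TYPE_SOURCE_SINK, "epoll fd");
    BIO_meth_set_write_ex(meth, fd_bio_write_ex);
    BIO_meth_set_read_ex(meth, fd_bio_read_ex);
    BIO_meth_set_ctrl(meth, fd_bio_ctrl);

    BIO *bio = BIO_new(meth);
    BIO_set_data(bio, fd_holder);       // stash your connection/buffer object here
    BIO_set_init(bio, 1);               // easy to forget; without it the BIO is never used

    SSL *ssl = SSL_new(ctx);
    SSL_set_bio(ssl, bio, bio);         // SSL takes ownership of the single reference
    SSL_set_connect_state(ssl);
    return ssl;
}
After this, drive SSL_connect / SSL_read / SSL_write from the epoll loop, retrying whenever SSL_get_error reports SSL_ERROR_WANT_READ or SSL_ERROR_WANT_WRITE.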

Related

I need help figuring out tcp sockets (clsocket)

I am having trouble figuring out sockets. I am just asking the server for data at a position (glm::i64vec4) and expecting a response, but the position is way off when I get the response, and the data for that position reflects that (i.e. my voxel game makes a kinda cool looking but useless mess).
It's probably just me not understanding sockets whatsoever, or maybe something weird with this library.
One thought I had was that it might be something to do with mismatched blocking and non-blocking sockets on the server and client,
but when I switched the server to blocking (and put each client in a separate thread from the others and from the accepting loop) it did nothing.
If I'm doing something really stupid please tell me; I know next to nothing about sockets.
Here is some code that probably looks horrible.
Server Code
std::deque<CActiveSocket*> clients;
CPassiveSocket socket;
socket.Initialize();
socket.SetNonblocking(); // I'm doing this so I don't need multiple threads for clients
socket.Listen("0.0.0.0", port);
while (1) {
    {
        CActiveSocket* c;
        if ((c = socket.Accept()) != NULL) {
            clients.emplace_back(c);
        }
    }
    for (CActiveSocket*& c : clients) {
        c->Receive(sizeof(glm::i64vec4));
        if (c->GetBytesReceived() == sizeof(glm::i64vec4)) {
            chkpkt chk;
            chk.pos = *(glm::i64vec4*)c->GetData();
            LOOP3D(chksize+2) {
                chk.data(i,j,k).val = chk.pos.y*chksize+j;
                chk.data(i,j,k).id = 0;
            }
            while (c->Send((uint8*)&chk, sizeof(chkpkt)) != sizeof(chkpkt)) {}
        }
    }
}
Client Code
//v is a glm::i64vec4
//fsock is set to Blocking
if (fsock.Send((uint8*)&v, sizeof(glm::i64vec4)))
    if (fsock.Receive(sizeof(chkpkt))) {
        tthread::lock_guard<tthread::fast_mutex> lock(wld->filemut);
        // I tried using the position I get back from the server to set this (instead of v),
        // but that made it so nothing loaded.
        wld->ichks[v] = (*(chkpkt*)fsock.GetData()).data;
        // I checked, and the chunk's position never lines up with what I sent.
    }
Without your complete application code it's unrealistic to suggest corrections to particular lines.
But it seems like you are using this library. It doesn't matter much if not, because in network programming the quirks of sockets make some problems fairly universal. So here are a few suggestions for the socket portion of your project:
It suffices to have BLOCKING sockets.
Most of the time a socket read behaves in a slightly surprising way: it might not return the requested number of bytes in one call. Because of this, you need to call read repeatedly until the whole message has been received. For a complete and robust solution you can refer to Stevens's readn routine ([Ref.1], page 122); a sketch along those lines follows after the references below.
If you are using exactly the library mentioned above, you can see that your fsock.Receive eventually calls recv, and recv is just a variant of read [Ref.2], so the solution is the same for both. This pattern might help:
while (fsock.Receive(sizeof(chkpkt)) > 0)
{
    // ...
}
Ref.1: https://mathcs.clarku.edu/~jbreecher/cs280/UNIX%20Network%20Programming(Volume1,3rd).pdf
Ref.2: https://man7.org/linux/man-pages/man2/recv.2.html#DESCRIPTION
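For reference, a readn-style helper along those lines (my own sketch following [Ref.1], written against plain recv rather than clsocket) might look like this:
// Sketch of Stevens-style readn: keep calling recv until 'len' bytes have arrived
// (or the peer closes / a real error occurs). Plain BSD sockets; adapt for clsocket.
#include <sys/types.h>
#include <sys/socket.h>
#include <errno.h>
#include <stddef.h>

ssize_t readn(int fd, void *buf, size_t len)
{
    char  *p    = (char *)buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = recv(fd, p, left, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          // interrupted, just retry
            return -1;             // real error
        }
        if (n == 0)
            break;                 // peer closed the connection
        p    += n;
        left -= n;
    }
    return (ssize_t)(len - left);  // bytes actually read
}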

C++ Disable Delayed Ack on Windows

I am trying to replicate a real-time application on a Windows computer to be able to debug and make changes more easily, but I ran into an issue with delayed ACK. I have already disabled Nagle and confirmed that it improves the speed a bit. When sending lots of small packets, Windows doesn't ACK right away and delays it by 200 ms. Doing more research about it, I came across this. The problem with changing the registry value is that it affects the whole system rather than just the application I am working with. Is there any way to disable delayed ACK on Windows, like TCP_QUICKACK from Linux using setsockopt? I tried hard-coding 12, but got WSAEINVAL from WSAGetLastError.
I saw a dev on GitHub mention using SIO_TCP_SET_ACK_FREQUENCY, but I didn't see any example of how to actually use it.
So I tried the following:
#define SIO_TCP_SET_ACK_FREQUENCY _WSAIOW(IOC_VENDOR,23)
result = WSAIoctl(sock, SIO_TCP_SET_ACK_FREQUENCY, 0, 0, info, sizeof(info), &bytes, 0, 0);
and I got WSAEFAULT as an error code. Please help!
I've seen several references online that TCP_QUICKACK may actually be supported by Winsock via setsockopt() (opt 12), even though it is NOT documented, or officially confirmed anywhere.
But, regarding your actual question, your use of SIO_TCP_SET_ACK_FREQUENCY is failing because you are not providing any input buffer to WSAIoctl() to set the actual frequency value. Try something like this:
int freq = 1; // can be 1..255, default is 2
result = WSAIoctl(sock, SIO_TCP_SET_ACK_FREQUENCY, &freq, sizeof(freq), NULL, 0, &bytes, NULL, NULL);
Note that SIO_TCP_SET_ACK_FREQUENCY is available in Windows 7 / Server 2008 R2 and later.
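Putting the pieces together, a hedged sketch of a helper that disables Nagle and sets the ACK frequency on an already-connected socket (the SIO_TCP_SET_ACK_FREQUENCY define mirrors the one in the question; error handling is deliberately minimal):
// Sketch: disable Nagle and reduce the delayed-ACK frequency on a connected socket.
// SIO_TCP_SET_ACK_FREQUENCY is defined manually as in the question; Windows 7 / Server 2008 R2 and later.
// Link with ws2_32.lib.
#include <winsock2.h>

#ifndef SIO_TCP_SET_ACK_FREQUENCY
#define SIO_TCP_SET_ACK_FREQUENCY _WSAIOW(IOC_VENDOR, 23)
#endif

int tune_socket(SOCKET sock)
{
    BOOL nodelay = TRUE;
    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                   (const char *)&nodelay, sizeof(nodelay)) != 0)
        return WSAGetLastError();

    int   freq  = 1;   // 1..255; 2 is the default, 1 means ACK every segment
    DWORD bytes = 0;
    if (WSAIoctl(sock, SIO_TCP_SET_ACK_FREQUENCY,
                 &freq, sizeof(freq), NULL, 0, &bytes, NULL, NULL) != 0)
        return WSAGetLastError();

    return 0;   // both options applied
}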

libircclient : Selective connection absolutely impossible to debug

I'm not usually the type to post a question; I'm more the type to search for why something doesn't work first. But this time I did everything I could, and I just can't figure out what is wrong.
So here's the thing:
I'm currently programming an IRC Bot, and I'm using libircclient, a small C library to handle IRC connections. It's working pretty great, it does the job and is kinda easy to use, but ...
I'm connecting to two different servers, and so I'm using the custom networking loop, which uses the select function. On my personal computer, there's no problem with this loop, and everything works great.
But (Here's the problem), on my remote server, where the bot will be hosted, I can connect to one server but not the other.
I tried to debug everything I could. I even went and examined the sources of libircclient to see how it worked, and put some printfs where I could, and I can see where it comes from, but I don't understand why it does this.
So here's the code for the server (the irc_session_t objects are encapsulated, but it's normally kinda easy to understand; feel free to ask for more information if you want):
// Connect the first session
first.connect();
// Connect the osu! session
second.connect();

// Initialize sockets sets
fd_set sockets, out_sockets;
// Initialize sockets count
int sockets_count;
// Initialize timeout struct
struct timeval timeout;

// Set running as true
running = true;

// While the server is running (Which means always)
while (running)
{
    // First session has disconnected
    if (!first.connected())
        // Reconnect it
        first.connect();

    // Second session has disconnected
    if (!second.connected())
        // Reconnect it
        second.connect();

    // Reset timeout values
    timeout.tv_sec = 1;
    timeout.tv_usec = 0;

    // Reset sockets count
    sockets_count = 0;

    // Reset sockets and out sockets
    FD_ZERO(&sockets);
    FD_ZERO(&out_sockets);

    // Add sessions descriptors
    irc_add_select_descriptors(first.session(), &sockets, &out_sockets, &sockets_count);
    irc_add_select_descriptors(second.session(), &sockets, &out_sockets, &sockets_count);

    // Select something. If it went wrong
    int available = select(sockets_count + 1, &sockets, &out_sockets, NULL, &timeout);

    // Error
    if (available < 0)
        Utils::throw_error("Server", "run", "Something went wrong when selecting a socket");

    // We have a socket
    if (available > 0)
    {
        // If there was something wrong when processing the first session
        if (irc_process_select_descriptors(first.session(), &sockets, &out_sockets))
            Utils::throw_error("Server", "run", Utils::string_format("Error with the first session: %s", first.get_error()));

        // If there was something wrong when processing the second session
        if (irc_process_select_descriptors(second.session(), &sockets, &out_sockets))
            Utils::throw_error("Server", "run", Utils::string_format("Error with the second session: %s", second.get_error()));
    }
The problem in this code is that this line:
irc_process_select_descriptors(second.session(), &sockets, &out_sockets)
Always return an error the first time it's called, and only for one server. The weird thing is that on my Windows computer, it works perfectly, while on the Ubuntu server, it just doesn't want to, and I just can't understand why.
I did some in-depth debug, and I saw that libircclient does this:
if (session->state == LIBIRC_STATE_CONNECTING && FD_ISSET(session->sock, out_set))
And this is where everything goes wrong. The session state is correctly set to LIBIRC_STATE_CONNECTING, but the second check, FD_ISSET(session->sock, out_set), always returns false. It returns true for the first session, but never for the second.
The two servers are irc.twitch.tv:6667 and irc.ppy.sh:6667. The servers are correctly set, and the server passwords are correct too, since everything works fine on my personal computer.
Sorry for the very long post.
Thanks in advance !
Alright, after some hours of debug, I finally got the problem.
So when a session is connected, it will enter in the LIBIRC_STATE_CONNECTING state, and then when calling irc_process_select_descriptors, it will check this:
if (session->state == LIBIRC_STATE_CONNECTING && FD_ISSET(session->sock, out_set))
The problem is that select() modifies the fd_sets in place, clearing every descriptor that is not ready.
So if that socket was not ready by the time select() returned, FD_ISSET will return 0 when irc_process_select_descriptors is called, because select() decided that this socket was not relevant.
I fixed it by just writing
if (session->state == LIBIRC_STATE_CONNECTING)
{
    if (!FD_ISSET(session->sock, out_set))
        return 0;
    ...
}
So this just makes the program wait until select() actually reports that socket as ready.
Sorry for not having checked everything !
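A side note from me rather than part of the fix above: the reason the connecting socket is expected in the out_set at all is that a non-blocking connect() signals completion through writability. Checked by hand, that looks roughly like this sketch:
// Sketch: waiting for a non-blocking connect() to finish using select() (POSIX).
// 'sock' is assumed to be a connecting, non-blocking socket.
#include <sys/select.h>
#include <sys/socket.h>

int connect_finished(int sock)
{
    fd_set wset;
    FD_ZERO(&wset);
    FD_SET(sock, &wset);
    struct timeval tv = { 1, 0 };              // 1 second timeout

    if (select(sock + 1, NULL, &wset, NULL, &tv) <= 0)
        return 0;                              // not ready yet (or error/timeout)
    if (!FD_ISSET(sock, &wset))
        return 0;                              // select() cleared it: still connecting

    int err = 0;
    socklen_t len = sizeof(err);
    getsockopt(sock, SOL_SOCKET, SO_ERROR, &err, &len);
    return err == 0;                           // 1 = connected, 0 = connect failed
}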

getaddrinfo() fails continuously with EAI_AGAIN

In my code I am using the following:
do
{
    r = getaddrinfo(host, service, &hints, ret);
}
while (r == EAI_AGAIN);
When testing, getaddrinfo() fails continuously, so the loop never terminates.
Do you see any way to improve the code? Can we use a counter to limit the number of times it loops?
Also, please let me know what the possible reasons are for getaddrinfo() returning EAI_AGAIN.
Here is, admittedly, a wild guess.
We're also seeing this on a slightly underpowered single core embedded system.
I assume the resolver (in our case dnsmasq) is running in a separate process, and for whatever reason (probably because we're running around in circles chasing our tails) it doesn't get enough resources (CPU/RAM/...) to do its job.
A wild guess at a solution might be to put a sleep into that tight loop and let the DNS caching magic get at the resources it needs to do its work.
I will let you know if it works.
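In the spirit of that suggestion, a bounded retry loop with a short sleep (my sketch; the retry count and delay are arbitrary) could look like this:
// Sketch: retry getaddrinfo() a limited number of times on EAI_AGAIN,
// sleeping between attempts instead of spinning.
#include <netdb.h>
#include <unistd.h>

int resolve_with_retry(const char *host, const char *service,
                       const struct addrinfo *hints, struct addrinfo **res)
{
    int r = EAI_AGAIN;
    for (int attempt = 0; attempt < 5 && r == EAI_AGAIN; ++attempt) {
        if (attempt > 0)
            sleep(1);                      // give the resolver a chance to recover
        r = getaddrinfo(host, service, hints, res);
    }
    return r;                              // 0 on success, otherwise the last error
}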

Win32 Overlapped Readfile on COM Port returning ERROR_OPERATION_ABORTED

Ok, one for the SO hive mind...
I have code which has - until today - run just fine on many systems and is deployed at many sites. It involves threads reading and writing data from a serial port.
Trying to check out a new device, my code was swamped with 995 ERROR_OPERATION_ABORTED errors when calling GetOverlappedResult after the ReadFile. Sometimes the read would work; other times I'd get this error. Just ignoring the error and retrying would - amazingly - work without dropping any data. No ClearCommError required.
Here's the snippet.
if (!ReadFile(handle, &c, 1, &read, &olap))
{
    if (GetLastError() != ERROR_IO_PENDING)
    {
        logger().log_api(LOG_ERROR, "ser_rx_char:ReadFile");
        throw Exception("ser_rx_char:ReadFile");
    }
}
WaitForSingleObjectEx(r_event, INFINITE, true); // alertable, so the thread can be closed correctly.
if (GetOverlappedResult(handle, &olap, &read, TRUE) != 0)
{
    if (read != 1)
        throw Exception("ser_rx_char: no data");
    logger().log(LOG_VERBOSE, "read char %d (read = %d)", c, read);
}
else
{
    DWORD err = GetLastError();
    if (err != 995) // Filters out ERROR_OPERATION_ABORTED
    {
        logger().log_api(LOG_ERROR, "ser_rx_char: GetOverlappedResult");
        throw Exception("ser_rx_char:GetOverlappedResult");
    }
}
My first guess is to blame the COM port driver, which I haven't used before (it's an RS422 port on a Blackmagic Decklink, FYI), but that feels like a cop-out.
Oh, and Vista SP1 Business 32-bit, for my sins.
Before I just put this down to "Someone else's problem", does anyone have any ideas of what might cause this?
How are you setting up the OVERLAPPED structure before the ReadFile? - I always zero them (other than the hEvent, obviously), which is perhaps part superstition, but I have a feeling that it's caused me a problem in the past.
I'm afraid blaming the driver (if it's non-MS and not just a tiny tweak from the reference) is not completely unrealistic. To write a COM driver is an incredibly complex thing, and the difficulty with testing it is that every application ever written uses the serial ports and their IOCTLs slightly differently.
Another common problem is not to set the whole port up - for example not calling SetCommTimeouts or SetupComm. I've no idea if you're making this sort of mistake, but I have met people who say they're not using timeouts when they actually mean that they didn't call SetCommTimeouts so they're using them but don't have a notion what they're set to...
This kind of stuff can be murder for 3rd-party COM drivers, because people have often got away with any old crap with the MS driver, and it doesn't always work the same with another device.
in addition to zeroing the OVERLAPPED, you might also check how you're setting olap.hEvent, that is, what are your arguments to CreateEvent? If you're creating an event that's pre-signalled (i.e. the third argument to CreateEvent is TRUE) I would expect an immediate return. Also, don't forget that if you specify manualReset (the second argument to CreateEvent) as FALSE, GetOverlappedResult() will helpfully clear the event for you - which might explain why it works the second time around.
Can't really tell from your snippet whether either of these affect you - hope this helps.
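For what it's worth, the setup being suggested (zeroed OVERLAPPED plus a manual-reset, initially non-signalled event) looks something like this sketch:
// Sketch: preparing an OVERLAPPED structure for an overlapped serial read,
// following the suggestions above (zeroed struct, manual-reset, non-signalled event).
#include <windows.h>

bool setup_overlapped(OVERLAPPED &olap)
{
    ZeroMemory(&olap, sizeof(olap));       // zero the offsets and internal fields
    olap.hEvent = CreateEvent(NULL,        // default security attributes
                              TRUE,        // manual reset, so waits don't silently clear it
                              FALSE,       // initially non-signalled, so the wait doesn't return at once
                              NULL);       // unnamed
    return olap.hEvent != NULL;
}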