How to get ZeroMQ Timestamp? - c++

I'm writing a C++/ZMQ script that has a subscriber getting data from a publisher run by a separate script. I can't edit the publisher code, and I need to get the time that the ZeroMQ subscriber receives a message.
Basically, I have:
void *zmq_subscriber_ = zmq_socket( context, ZMQ_SUB );
zmq_setsockopt( zmq_subscriber_, ZMQ_SUBSCRIBE, NULL, 0 );
while ( ( zmq_msg_recv( &msg, zmq_subscriber_, ZMQ_DONTWAIT ) ) < 0 )
{ usleep( 1000 ); }
I need to know when the subscriber receives the message. Is there a way to get this information from ZeroMQ? Thanks in advance to anyone that can help!

Is there a way to get this information from ZeroMQ ?
Unfortunately, not directly from the ZeroMQ API as-is ( as of 2018/Q2 ).
Any options?
If a coarse time-domain resolution is fine, just store a timestamp on every pass through the while(){...; <here> } loop body. This approach has a blind spot of about the usleep() duration: within that window, the precise moment of receipt is undecidable.
If that does not suffice, use a poller in non-blocking mode ( zmq_poll() in the C API, Poller.poll() in some bindings ) and reduce the latency to a level your intent can work with. A zero-wait poll returns almost immediately, so dropping the usleep() between polls minimises the blind spot.
In extreme need, refactor the code and introduce a new (private) API extension that reads this detail from the internal state of the Context() instance. That would get you closer, if not the closest, to the actual moment a message arrives in the SUB-side Context()'s internal processing.
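A minimal sketch of the first two options, assuming C++11. The names recv_with_timestamp and try_recv are hypothetical and not part of ZeroMQ; try_recv stands in for any non-blocking receive, such as the question's zmq_msg_recv( &msg, zmq_subscriber_, ZMQ_DONTWAIT ) >= 0.

```cpp
#include <chrono>

using Clock = std::chrono::system_clock;  // wall-clock time for the timestamp

// Hypothetical helper, not a ZeroMQ call.
// Calls try_recv() up to max_attempts times and stores the wall-clock time
// taken right after the first successful receive.
template <typename TryRecv>
bool recv_with_timestamp(TryRecv&& try_recv, int max_attempts,
                         Clock::time_point& received_at)
{
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (try_recv()) {
            received_at = Clock::now();  // as close to the receipt as user code gets
            return true;
        }
        // Any sleep placed here (the question uses usleep(1000)) widens the
        // blind spot by about its own duration.
    }
    return false;  // nothing arrived; received_at is left untouched
}
```

The timestamp is taken in user code, so it still lags the moment the message reached the socket's internal queue by up to one loop pass.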

Related

ZeroMQ: how to reduce multithread-communication latency with inproc?

I'm using inproc and PAIR to achieve inter-thread communication and trying to solve a latency problem due to polling. Correct me if I'm wrong: Polling is inevitable, because a plain recv() call will usually block and cannot take a specific timeout.
In my current case, among N threads, each of the N-1 worker threads has a main while-loop. The N-th thread is a controller thread which will notify all the worker threads to quit at any time. However, worker threads have to use polling with a timeout to get that quit message. This introduces a latency, the latency parameter is usually 1000ms.
Here is an example
while (true) {
    const std::chrono::milliseconds nTimeoutMs(1000);
    std::vector<zmq::poller_event<std::size_t>> events(n);
    size_t nEvents = m_poller.wait_all(events, nTimeoutMs);
    bool isToQuit = false;
    for (auto& evt : events) {
        zmq::message_t out_recved;
        try {
            evt.socket.recv(out_recved, zmq::recv_flags::dontwait);
        }
        catch (std::exception& e) {
            trace("{}: Caught exception while polling: {}. Skipped.", GetLogTitle(), e.what());
            continue;
        }
        if (!out_recved.empty()) {
            if (IsToQuit(out_recved))
                isToQuit = true;
            break;
        }
    }
    if (isToQuit)
        break;
    //
    // main business
    //
    ...
}
To make things worse, when the main loop has nested loops, the worker threads then need to include more polling code in each layer of the nested loops. Very ugly.
The reason why I chose ZMQ for multithread communication is because of its elegance and the potential of getting rid of thread-locking. But I never realized the polling overhead.
Am I able to achieve the typical latency when using a regular mutex or an std::atomic data operation? Should I understand that the inproc is in fact a network communication pattern in disguise so that some latency is inevitable?
An above posted statement ( a hypothesis ):
"...a plain recv() call will usually block and cannot take a specific timeout."
is not correct:
- a plain .recv( ZMQ_NOBLOCK )-call will never "block",
- a plain .recv( ZMQ_NOBLOCK )-call can be wrapped so as to mimic "a specific timeout"
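One way such a wrapper might look, as a sketch only. recv_until and try_recv are hypothetical names; try_recv stands in for any receive made with the non-blocking flag that returns true once a message has been read.

```cpp
#include <chrono>

// Hypothetical helper: emulates "recv with a timeout" using only
// non-blocking receive attempts and a deadline on a monotonic clock.
template <typename TryRecv>
bool recv_until(TryRecv&& try_recv, std::chrono::microseconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    do {
        if (try_recv()) {
            return true;   // a message arrived before the deadline
        }
    } while (std::chrono::steady_clock::now() < deadline);
    return false;          // deadline passed with no message
}
```

This version spins without sleeping, which trades CPU time for latency. libzmq also has a ZMQ_RCVTIMEO socket option that bounds how long a blocking receive waits, which may be simpler where it fits.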
An above posted statement ( a hypothesis ):
"...have to use polling with a timeout ... introduces a latency, the latency parameter is usually 1000ms."
is not correct:
- one need not use polling with a timeout
- still less need one set a 1000 ms latency in code, which is obviously spent only when no new message has arrived
Q : "Am I able to achieve the typical latency when using a regular mutex or an std::atomic data operation?"
Yes.
Q : "Should I understand that the inproc is in fact a network communication pattern in disguise so that some latency is inevitable?"
No. The inproc transport class is the fastest of all of them, as it is in principle protocol-less and stack-less; it has more to do with very fast pointer mechanics, as in the pointer management of a double-ended ring buffer.
The Best Next Step:
1 ) Re-factor your code so that it always uses only the zero-wait { .poll() | .recv() } methods, properly wrapped for both the event and the no-event branches of the loop.
2 ) If you then want to shave the last few [us] off the loop's turn-around time, you may tune the Context() instance to work with a larger number of I/O threads, nIOthreads > N, "under the hood".
3 ) ( optional ) For almost hard-real-time designs, one may finally pin the Context() threads and the socket-handling threads onto specific, non-overlapping CPU cores ( using a carefully crafted affinity map ).
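Step 1 could be shaped roughly as below. This is a skeleton under assumptions, not the asker's code: run_worker, quit_requested and do_work_slice are hypothetical names, and quit_requested stands in for a zero-wait check such as a non-blocking recv on the control socket.

```cpp
#include <cstddef>

// Hypothetical worker skeleton. The quit check never waits, so the time to
// notice a quit message is bounded by one work slice, not by a poll timeout.
template <typename QuitRequested, typename DoWorkSlice>
std::size_t run_worker(QuitRequested&& quit_requested, DoWorkSlice&& do_work_slice)
{
    std::size_t slices_done = 0;
    while (!quit_requested()) {  // event branch: leave as soon as quit is seen
        do_work_slice();         // no-event branch: carry on, no sleep, no timeout
        ++slices_done;
    }
    return slices_done;
}
```

Keeping each work slice short is what keeps the quit latency low; a nested loop can call the same quit check between its own iterations.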
Having set 1000 [ms] in code, one cannot fairly complain about spending those very 1000 [ms] waiting in a timeout one coded oneself. There is no excuse for doing this.
Do not blame ZeroMQ for behaviour that was coded on the application side of the API.
Never.

ZeroMQ (cppzmq) subscriber skips first message

I'm trying to use ZMQ with the CPPZMQ C++ wrapper, as it seems it is the one suggested in C++ Bindings.
The client/server (REQ/REP) seems to work fine.
When trying to implement a publish/subscribe pair of programs, it looks like the first message is lost in the subscriber. Why?
publisher.cpp:
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/thread/thread.hpp>
#include <boost/format.hpp>
#include <zmq.hpp>
#include <cstring>   // memcpy
#include <string>
#include <iostream>
int main()
{
    zmq::context_t context(1);
    zmq::socket_t publisher(context, ZMQ_PUB);
    publisher.bind("tcp://*:5555");
    for(int n = 0; n < 3; n++) {
        zmq::message_t env1(1);
        memcpy(env1.data(), "A", 1);
        std::string msg1_str = (boost::format("Hello-%i") % (n + 1)).str();
        zmq::message_t msg1(msg1_str.size());
        memcpy(msg1.data(), msg1_str.c_str(), msg1_str.size());
        std::cout << "Sending '" << msg1_str << "' on topic A" << std::endl;
        publisher.send(env1, ZMQ_SNDMORE);
        publisher.send(msg1);
        zmq::message_t env2(1);
        memcpy(env2.data(), "B", 1);
        std::string msg2_str = (boost::format("World-%i") % (n + 1)).str();
        zmq::message_t msg2(msg2_str.size());
        memcpy(msg2.data(), msg2_str.c_str(), msg2_str.size());
        std::cout << "Sending '" << msg2_str << "' on topic B" << std::endl;
        publisher.send(env2, ZMQ_SNDMORE);
        publisher.send(msg2);
        boost::this_thread::sleep(boost::posix_time::milliseconds(1000));
    }
    return 0;
}
subscriber.cpp:
#include <zmq.hpp>
#include <string>
#include <iostream>
int main()
{
    zmq::context_t context(1);
    zmq::socket_t subscriber(context, ZMQ_SUB);
    subscriber.connect("tcp://localhost:5555");
    subscriber.setsockopt(ZMQ_SUBSCRIBE, "B", 1);
    while(true)
    {
        zmq::message_t env;
        subscriber.recv(&env);
        std::string env_str = std::string(static_cast<char*>(env.data()), env.size());
        std::cout << "Received envelope '" << env_str << "'" << std::endl;
        zmq::message_t msg;
        subscriber.recv(&msg);
        std::string msg_str = std::string(static_cast<char*>(msg.data()), msg.size());
        std::cout << "Received '" << msg_str << "'" << std::endl;
    }
    return 0;
}
Program output:
$ ./publisher
Sending 'Hello-1' on topic A
Sending 'World-1' on topic B
Sending 'Hello-2' on topic A
Sending 'World-2' on topic B
Sending 'Hello-3' on topic A
Sending 'World-3' on topic B
$ ./subscriber
Received envelope 'B'
Received 'World-2'
Received envelope 'B'
Received 'World-3'
(note: subscriber is executed before executing publisher)
Bonus question: By the way, is it just my impression, or is this C++ wrapper quite low-level? I see no direct support for std::string, and the code to transmit a simple string looks quite verbose.
Found the answer in the ZeroMQ Guide:
There is one more important thing to know about PUB-SUB sockets: you
do not know precisely when a subscriber starts to get messages. Even
if you start a subscriber, wait a while, and then start the publisher,
the subscriber will always miss the first messages that the publisher
sends. This is because as the subscriber connects to the publisher
(something that takes a small but non-zero time), the publisher may
already be sending messages out.
This "slow joiner" symptom hits enough people often enough that we're
going to explain it in detail. Remember that ZeroMQ does asynchronous
I/O, i.e., in the background. Say you have two nodes doing this, in
this order:
Subscriber connects to an endpoint and receives and counts messages.
Publisher binds to an endpoint and immediately sends 1,000 messages.
Then the subscriber will most likely not receive anything. You'll
blink, check that you set a correct filter and try again, and the
subscriber will still not receive anything.
Making a TCP connection involves to and from handshaking that takes
several milliseconds depending on your network and the number of hops
between peers. In that time, ZeroMQ can send many messages. For sake
of argument assume it takes 5 msecs to establish a connection, and
that same link can handle 1M messages per second. During the 5 msecs
that the subscriber is connecting to the publisher, it takes the
publisher only 1 msec to send out those 1K messages.
In Chapter 2 - Sockets and Patterns we'll explain how to synchronize a
publisher and subscribers so that you don't start to publish data
until the subscribers really are connected and ready. There is a
simple and stupid way to delay the publisher, which is to sleep. Don't
do this in a real application, though, because it is extremely fragile
as well as inelegant and slow. Use sleeps to prove to yourself what's
happening, and then wait for Chapter 2 - Sockets and Patterns to see
how to do this right.
The alternative to synchronization is to simply assume that the
published data stream is infinite and has no start and no end. One
also assumes that the subscriber doesn't care what transpired before
it started up. This is how we built our weather client example.
So the client subscribes to its chosen zip code and collects 100
updates for that zip code. That means about ten million updates from
the server, if zip codes are randomly distributed. You can start the
client, and then the server, and the client will keep working. You can
stop and restart the server as often as you like, and the client will
keep working. When the client has collected its hundred updates, it
calculates the average, prints it, and exits.
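The arithmetic in the quoted Guide passage can be checked with a small model. This is an illustration only: messages_missed is a hypothetical name, and the model assumes a constant send rate and that nothing sent before the connection completes reaches the subscriber.

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative model, not a ZeroMQ API: how many messages the publisher has
// already sent by the time the subscriber's connection is established.
std::uint64_t messages_missed(std::uint64_t total_messages,
                              std::uint64_t messages_per_second,
                              std::uint64_t connect_time_us)
{
    const std::uint64_t sent_while_connecting =
        messages_per_second * connect_time_us / 1000000;
    return std::min(total_messages, sent_while_connecting);
}
```

With the Guide's figures ( 1,000 messages, 1M messages per second, 5 msecs to connect ) the publisher could have sent 5,000 messages while the subscriber was still connecting, so all 1,000 are missed. In the question above the publisher sleeps a second between rounds, which is why only the first round is lost.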
Bonus answer:
ZeroMQ was designed for high-performance messaging / signalling, and its core parts were developed around a few design maxims.
Zero-Copy and Zero-Sharing are the better-known ones; Zero-(almost)-Latency may be a ( slightly ) provocative one; and Zero-Warranty is perhaps the one you would least like to hear about.
Yes, ZeroMQ does not strive to provide any explicit warranty ( naturally, for many reasons common in the world of distributed systems ), yet it does give you one warranty of this kind: any message is either delivered atomically ( i.e. complete and error-free ) or not at all ( so you never have to pay the extra costs of detecting and discarding runts or broken message payloads ).
So you may as well stop worrying about undelivered packets and about what would have happened had they been delivered. You simply get as much as possible, and the rest is not under your influence. ( "Late-joiner" cases can be seen as a boundary case: even if you were able to give the "slow joiner"(s) more time, that would not change the code design, so rather design distributed systems to be robust against signals / messages that, in principle, may go undelivered. )
API? Wrapper...
If interested in this level of detail, I would recommend reading the native API documentation ( since some v2.x ), to better appreciate the thinking behind the drive for maximum performance ( the Zero-Copy-motivated message-preparation steps, advanced API calls for messages that will be re-sent, memory-leak prevention, advanced IO-thread-pool maps for increased IO throughput / reduced latency / relative prioritisation, et al ).
After that, one may review how well ( or how poorly ) any given non-native language binding ( wrapper ) carried these initial design efforts over into the target programming environment.
Most such efforts ran into trouble right at finding a reasonable balance between programming comfort for the user, the expressivity constraints of the target environment, and avoiding the sins of leaking memory or compromising the quality of the API binding / wrapper.
It is fair to note that designing a non-native language binding is among the most challenging tasks, so one ought to bear with the brave teams who stepped into this territory and sometimes failed to mirror all the native-API strengths without degrading performance or obscuring the original intent. Many native-API features may not even be accessible from environments that cannot integrate them seamlessly, so take care when evaluating an API binding / wrapper; the original native API will always help you get to the roots of ZeroMQ's original powers. In most corner cases, one may try to inline in critical sections.

How to limit an Akka Stream to execute and send down one message only once per second?

I have an Akka Stream and I want the stream to send messages down stream approximately every second.
I tried two ways to solve this problem. The first was to make the producer at the start of the stream send a message only once every second, when a Continue message comes into this actor.
// When receive a Continue message in a ActorPublisher
// do work then...
if (totalDemand > 0) {
import scala.concurrent.duration._
context.system.scheduler.scheduleOnce(1 second, self, Continue)
}
This works for a short while then a flood of Continue messages appear in the ActorPublisher actor, I assume (guess but not sure) from downstream via back-pressure requesting messages as the downstream can consume fast but the upstream is not producing at a fast rate. So this method failed.
The other way I tried was via backpressure control, I used a MaxInFlightRequestStrategy on the ActorSubscriber at the end of the stream to limit the number of messages to 1 per second. This works but messages coming in come in at approximately three or so at a time, not just one at a time. It seems the backpressure control doesn't immediately change the rate of messages coming in OR messages were already queued in the stream and waiting to be processed.
So the problem is, how can I have an Akka Stream which can process one message only per second?
I discovered that MaxInFlightRequestStrategy is a valid way to do it, but I should set the batch size to 1; its default batch size is 5, which was causing the problem I found. Also, it's an over-complicated way to solve the problem, now that I am looking at the submitted answer here.
You can either put your elements through the throttling flow, which will back-pressure a fast source, or you can use a combination of tick and zip.
The first solution would be like this:
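// Imports this snippet relies on; running it also needs an implicit
// ActorSystem / Materializer in scope.
import akka.stream.ThrottleMode
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.duration._
import scala.util.Random
val veryFastSource =
  Source.fromIterator(() => Iterator.continually(Random.nextLong() % 10000))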
val throttlingFlow = Flow[Long].throttle(
// how many elements do you allow
elements = 1,
// in what unit of time
per = 1.second,
maximumBurst = 0,
// you can also set this to Enforcing, but then your
// stream will collapse if exceeding the number of elements / s
mode = ThrottleMode.Shaping
)
veryFastSource.via(throttlingFlow).runWith(Sink.foreach(println))
The second solution would be like this:
val veryFastSource =
Source.fromIterator(() => Iterator.continually(Random.nextLong() % 10000))
val tickingSource = Source.tick(1.second, 1.second, 0)
veryFastSource.zip(tickingSource).map(_._1).runWith(Sink.foreach(println))

ZeroMQ - pub / sub latency

I'm looking into ZeroMQ to see if it's a fit for a soft-realtime application. I was very pleased to see that the latency for small payloads was in the range of 30 microseconds or so. However, in my simple tests, I'm getting about 300 microseconds.
I have a simple publisher and subscriber, basically copied from examples off the web and I'm sending one byte through localhost.
I've played around for about two days w/ different sockopts and I'm striking out.
Any help would be appreciated!
publisher:
#include <iostream>
#include <zmq.hpp>
#include <unistd.h>
#include <sys/time.h>
int main()
{
    zmq::context_t context (1);
    zmq::socket_t publisher (context, ZMQ_PUB);
    publisher.bind("tcp://*:5556");
    struct timeval timeofday;
    zmq::message_t msg(1);
    while(true)
    {
        gettimeofday(&timeofday,NULL);
        publisher.send(msg);
        std::cout << timeofday.tv_sec << ", " << timeofday.tv_usec << std::endl;
        usleep(1000000);
    }
}
subscriber:
#include <iostream>
#include <zmq.hpp>
#include <sys/time.h>
int main()
{
    zmq::context_t context (1);
    zmq::socket_t subscriber (context, ZMQ_SUB);
    subscriber.connect("tcp://localhost:5556");
    subscriber.setsockopt(ZMQ_SUBSCRIBE, "", 0);
    struct timeval timeofday;
    zmq::message_t update;
    while(true)
    {
        subscriber.recv(&update);
        gettimeofday(&timeofday,NULL);
        std::cout << timeofday.tv_sec << ", " << timeofday.tv_usec << std::endl;
    }
}
Is the Task Definition real?
When speaking about any kind of real-time design, validating the capability of the architecture is more important than the implementation that follows.
Taking your source code as-is, your readings ( which would ideally be posted together with the code snippets, so the MCVE can be re-tested and cross-validated ) will not tell much, as the numbers do not distinguish how much time was spent in the sending-side loop, in the sending-side zmq data acquisition / copy / scheduling / wire-level formatting / datagram dispatch, and in the receiving-side unloading from media / copy / decode / pattern-match / propagation to the receiver buffer(s).
If interested in ZeroMQ internals, there are good performance-related application notes available.
If striving for a minimum-latency design:
remove all overheads
take all tcp-header processing out of the proposed PUB/SUB channel
avoid all non-cardinal logic overheads in processing: there is no sense in spending time on the pattern-matching built into the selected archetype ( on the subscriber side; sure, newer versions of ZMQ have moved to publisher-side filtering, but the idea is clear ), and using ZMQ_PAIR avoids any such, independently of the transport class. If something is intended to block, rather change the signalling socket layout so as to avoid blocking in principle ( this ought to be a real-time system, as you have said above )
apply "latency-masking" where possible on the target multi-core / many-core hardware, so as to squeeze the last drops of spare time out of your hardware / tools ... benchmark experimental setups with more I/O threads helping, zmq::context_t context( N );, where N > 1
Missing target:
As Alice in Wonderland had it more than a century ago, when no goal is defined, any road leads to the target.
With a soft-real-time ambition, there shan't be an issue in stating a maximum allowed end-to-end latency and deriving from it a constraint for the transport-layer latency.
Without that, 30 us, 300 us or even 3 ms have no meaning per se, so no one can decide whether these figures are "enough" for some subsystem or not.
A reasonable next step:
define real-time stability horizon(s) ... if using for a real-time control
define real-time design constraints ... for signal / data acquisition(s), for processing task(s), for self-diagnostic & control services
avoid any blocking, design-wise, and validate / prove that no blocking will ever appear under any possible real-world operating circumstances [formal proof methods are ready for such a task] ( no one would like to see an AlertPanel [Waiting for data] during their next jet landing, or to have, as the last thing seen before an autonomous car crashes right into a wall, a lovely [hour-glass] animated icon moving its sand while the control system is busy, for whatever reason, in a devastatingly blocking manner )
Quantified targets make sense for testing.
If a given threshold permits a 500 ms stability horizon ( which may be a safe value for a slo-mo hydraulic actuator / control loop, but may fail for a guided-missile control system, still less for any [mass & momentum-of-inertia]-less system, like the DSP family of RT control systems ), you can test end-to-end whether your processing fits within it.
If you know your incoming data stream brings about 10 kB every 500 us, you can test whether your design can keep pace with the burst traffic or not.
If testing shows that your mock-up design misses the target ( does not meet the performance / time-constrained figures ), you know pretty well where the design or the architecture needs to be improved.
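Testing against a quantified target might look like the sketch below. Stamp, elapsed_us, within_budget and max_latency_us are hypothetical names; Stamp mirrors the ( tv_sec, tv_usec ) pairs that the publisher and subscriber in the question print, and the comparison only makes sense when both stamps come from the same clock, as they do on localhost.

```cpp
#include <cstdint>

// Mirrors one printed (tv_sec, tv_usec) pair from gettimeofday().
struct Stamp {
    std::int64_t sec;
    std::int64_t usec;
};

// Elapsed time between two stamps, in microseconds.
std::int64_t elapsed_us(const Stamp& sent, const Stamp& received)
{
    return (received.sec - sent.sec) * 1000000 + (received.usec - sent.usec);
}

// True when the measured latency fits the stated budget.
bool within_budget(const Stamp& sent, const Stamp& received,
                   std::int64_t max_latency_us)
{
    return elapsed_us(sent, received) <= max_latency_us;
}
```

For example, a message stamped at ( 100, 999900 ) and received at ( 101, 200 ) took 300 us, which passes a 500 us budget and fails a 30 us one. Note that the question's publisher takes its stamp before send(), so the figure includes the sender-side work as well.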
First make sure you run producer and consumer on different physical cores (not HT).
Second, it depends A LOT on the hardware and OS. Last time I measured kernel IO (4-5 years ago) the results were indeed 10 to 20us around send/recv system calls.
You have to optimize your kernel settings to low latency and set TCP_NODELAY.

C++ : Avoid lot of boolean variable for multiple verification conditions in trading app

I am a junior dev working on a trading app. We have an order-refresh verification unit that has to verify order confirmations from the exchange. We send a bunch of different requests in bulk ( NEW, MODIFY, CANCEL ) to the exchange. Verification has to happen at most N times, at intervals of T, for all orders. If verification succeeds for all orders before N retries, fine; otherwise we need to flag the verification as unsuccessful. I have done some basic coding in a hurry, like below:
for( N times )
{
    for_each ( sent_request_order ) // SENT
    {
        1) get all the refreshed order from DB or shared mem i.e REFRESHED
        2) find current sent order in REFRESHED
        if( not_found )
            not refreshed from exchange, continue to next order
        if( found )
            case NEW : //check for new status, mark verification done
            case MODIFY : //check for modified status..
                          //if not mark pending, go to next order,
                          //revisit the same after T time
            case CANCEL : //check for cancelled status..
                          //if not mark pending, go to next order,
                          //revisit the same after T time
    }
    if( all_verified )
        exit from verification.
    wait ( T sec )
}
order_verification_pending, order_verification_done, order_visited, order_not_visited, all_verified, all_not_verified ... a lot of boolean flags are used for indication.
Is there a better approach for doing this, such as splitting responsibilities across classes?
I know this is not a general question, but the flags are making this tedious to handle.
Your algorithm looks workable. Implement it.
Do not try to optimize your code before you have it working. Once you have a working version running, never mind how ugly, then you look at ways and means to optimize. Chances are good that you will then find a way to handle the flags that give you so much trouble.
You talk about "order_verification_pending, order_verification_done, order_visited, order_not_visited, all_verified, all_not_verified"... but that seems to double the number of booleans, for example: if you have order_visited then you don't need order_not_visited... it is just "!order_visited". When there are more than two states involved, use an enum instead of a lot of complicated, overlapping booleans. For example, if verification might be pending, done, failed etc. but these are all mutually exclusive, then store the single current state in that enum.
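A minimal sketch of the enum idea, assuming the states really are mutually exclusive. The names VerificationState, Order and all_verified are made up for illustration and are not from the asker's code base.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One mutually exclusive state per order replaces the pairs of booleans
// (visited / not_visited, pending / done, ...).
enum class VerificationState { NotVisited, Pending, Done, Failed };

struct Order {
    std::string id;
    VerificationState state = VerificationState::NotVisited;
};

// "all_verified" is derived from the per-order states, not stored as a flag.
bool all_verified(const std::vector<Order>& orders)
{
    return std::all_of(orders.begin(), orders.end(), [](const Order& o) {
        return o.state == VerificationState::Done;
    });
}
```

Derived facts such as all_not_verified then become expressions over the states, so they cannot drift out of sync with them.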
It's more elegant to have a set of pending operations, and remove elements from that set until the set is empty or the whole verification times out. That way, you're not checking operations you already found succeeded.
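The pending-set approach could be sketched as follows. Everything here is an assumption for illustration: verify_all is a hypothetical name, is_confirmed stands in for the lookup in the refreshed orders ( DB or shared memory ), and wait_interval stands in for "wait T seconds".

```cpp
#include <functional>
#include <set>
#include <string>

// Hypothetical sketch: orders leave the pending set once confirmed, so each
// retry only looks at what is still outstanding.
bool verify_all(std::set<std::string> pending,
                int max_retries,
                const std::function<bool(const std::string&)>& is_confirmed,
                const std::function<void()>& wait_interval)
{
    for (int attempt = 0; attempt < max_retries; ++attempt) {
        for (auto it = pending.begin(); it != pending.end();) {
            if (is_confirmed(*it)) {
                it = pending.erase(it);  // verified: never checked again
            } else {
                ++it;                    // still pending: revisit next round
            }
        }
        if (pending.empty()) {
            return true;                 // every order was confirmed
        }
        if (attempt + 1 < max_retries) {
            wait_interval();             // wait T before the next round
        }
    }
    return pending.empty();              // false: verification unsuccessful
}
```

The per-request checks for NEW, MODIFY and CANCEL would live inside is_confirmed, and whatever remains in the set at the end is exactly the list of unverified orders.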