plc_modbus read and write taking too much time - c++

I'm using modbus_t to establish communication between a PC and a Siemens PLC (I'm not very familiar with the modbus.h library). To read and write, I'm using
modbus_write_registers()
modbus_read_registers()
to implement vehicle control. However, each of these two functions takes around 200 ms, which is too long and fails to reach the minimum control frequency.
So I want to ask: is this latency typical for these two functions, or is it caused by the hardware?
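For reference, this is roughly how I call and time one of them (a stripped-down sketch assuming Modbus TCP; the IP address, port and register addresses are placeholders, not my real configuration):

#include <modbus.h>
#include <chrono>
#include <cstdint>
#include <cstdio>

int main() {
    // Placeholder address/port for the PLC.
    modbus_t *ctx = modbus_new_tcp("192.168.0.10", 502);
    if (!ctx || modbus_connect(ctx) == -1) return 1;

    uint16_t regs[4] = {0};
    auto t0 = std::chrono::steady_clock::now();
    int rc = modbus_read_registers(ctx, 0, 4, regs);   // read 4 holding registers
    auto t1 = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    std::printf("rc=%d, read took %lld ms\n", rc, (long long)ms);

    modbus_close(ctx);
    modbus_free(ctx);
    return 0;
}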

Related

Create huge text file - multi-threading a good idea?

I need to create huge (>10 GB) text files where every line is a very long number, basically a string, since even types like unsigned long long won't be enough. So I will be using a random generator, and my first thought was that it's probably a good idea to create several threads. From what I see, every thread would be writing one line at a time, which is considered a thread-safe operation in C++.
Is it a good idea or am I missing something and it's better just to write line by line from one thread?
A correct answer here will depend fairly heavily on the type of drive to which you're writing the file.
If it's an actual hard drive, then a single thread will probably be able to generate your random numbers and write the data to the disk as fast as the disk can accept it. With reasonable code on a modern CPU, the core running that code will probably spend more time idle than it does doing actual work.
If it's a SATA SSD, effectively the same thing will be true, but (especially if you're using an older CPU) the core running the code will probably spend a lot less time idle. A single thread will probably still be able to generate the data and write it to the drive as fast as the drive can accept it.
Then we get to things like NVMe and Optane drives. Here you honestly might stand a decent chance of improving performance by writing from multiple threads. But (at least in my experience) to do that, you just about have to skip past using iostreams, and instead talk directly to the OS. Under Windows, that would mean opening the file with CreateFile (specifying FILE_FLAG_OVERLAPPED when you do). Windows also has built-in support for I/O completion ports (which are really sort of thread pools) to minimize overhead and improve performance--but using it is somewhat nontrivial.
On Linux, asynchronous I/O is a bit more of a sore point. There's an official AIO interface for doing asynchronous I/O, and Linux has had an implementation of it for a long time, but it's never really worked very well. More recently, something called io_uring was added to the Linux kernel. I haven't used it a lot, but it looks like it's probably a better design--but it doesn't (as far as I know) support the standard AIO interface, so you pretty much have to use it via its own liburing library instead. Rather like Windows I/O completion ports, this works well, but using it is somewhat non-trivial.
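For instance, a stripped-down Windows sketch of what that overlapped approach looks like (file name and buffer contents are placeholders, and real code would check the WriteFile return values for ERROR_IO_PENDING):

#include <windows.h>

int main() {
    HANDLE h = CreateFileA("huge.txt", GENERIC_WRITE, 0, nullptr,
                           CREATE_ALWAYS, FILE_FLAG_OVERLAPPED, nullptr);
    if (h == INVALID_HANDLE_VALUE) return 1;

    const char a[] = "1111111111111111\n";
    const char b[] = "2222222222222222\n";

    OVERLAPPED o1 = {}, o2 = {};
    o1.Offset = 0;                          // first block starts at byte 0
    o2.Offset = sizeof(a) - 1;              // second block starts right after it
    o1.hEvent = CreateEventA(nullptr, TRUE, FALSE, nullptr);
    o2.hEvent = CreateEventA(nullptr, TRUE, FALSE, nullptr);

    // Both writes are issued without waiting for the first to complete.
    WriteFile(h, a, sizeof(a) - 1, nullptr, &o1);
    WriteFile(h, b, sizeof(b) - 1, nullptr, &o2);

    DWORD done = 0;
    GetOverlappedResult(h, &o1, &done, TRUE);   // TRUE = block until finished
    GetOverlappedResult(h, &o2, &done, TRUE);

    CloseHandle(o1.hEvent);
    CloseHandle(o2.hEvent);
    CloseHandle(h);
    return 0;
}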
If there is no explicit synchronization between the threads, you need to make sure that the library functions you use are thread-safe. For example, the C++ random number generators in <random> are not, so it would be best to have one RNG per thread. Additionally, you need to look at bottlenecks. Conversion of a number to text is one bottleneck, and multithreading would help with that. Output is another, and multithreading would not. Profiling would help resolve this.
Ostreams are not thread-safe, so you'll have to use synchronization to protect each thread's access.
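A minimal sketch of that arrangement, assuming one engine per thread and a mutex around the shared stream (thread count, line count and line length are arbitrary choices):

#include <fstream>
#include <mutex>
#include <random>
#include <string>
#include <thread>
#include <vector>

int main() {
    std::ofstream out("huge.txt");
    std::mutex out_mutex;                 // ostreams are not thread-safe
    const int threads = 4;
    const int lines_per_thread = 1000;

    auto worker = [&](unsigned seed) {
        std::mt19937_64 rng(seed);        // each thread owns its own engine
        std::uniform_int_distribution<int> digit(0, 9);
        for (int i = 0; i < lines_per_thread; ++i) {
            std::string line(100, '0');   // a 100-digit "number" as a string
            for (char &c : line) c = static_cast<char>('0' + digit(rng));
            std::lock_guard<std::mutex> lock(out_mutex);
            out << line << '\n';
        }
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) pool.emplace_back(worker, unsigned(t + 1));
    for (auto &t : pool) t.join();
}

Whether this beats a single thread depends on where the bottleneck is (number-to-text conversion vs. the drive), which is exactly what profiling should tell you.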

Using MPI on a single Raspberry-Pi

I am working on an application (in C++) which involves several independent operations (FFTW + signal processing) on data arrays.
Array sizes can be either 512 or 1024 (yet to decide), and the data type is double.
I am hoping to parallelize those independent operations to get the best out of the Pi.
Obvious thing I would have done in the past is using pthreads.
However, (unfortunately :) ) I learned about MPI recently and I wonder whether I should use it here instead of good old threads.
Obviously MPI would be the way to go if I have a device cluster (that's what I get when I search the internet).
But is MPI still a good choice in my situation, where there is just one device? (and especially when that device is a Raspberry Pi)
(If the answer to above is "no", does that mean MPI is a bad choice in general when there is only one computer?)
MPI can be an awesome choice depending on how much work can be done per unit communication. That's how I would consider MPI or not.
I am a coauthor of an MRI simulation framework. There we deal with individual "macro" spins, which can generally be treated as spatially non-interacting. This allows one to do poor man's parallelisation over every single spin and the local Bloch equations. So: a lot of physics for very little communication. Even on a single device it can perform as well as pthreads.
However, on the other side of the spectrum I'd place massively parallel matrix inversion as done with ScaLAPACK. There you'll find a lot of communication per unit of calculation. That's where there is not a chance in the world you could compete with pthreads.
Even if you were going to use a pi cluster, you'd use both MPI and pthreads in such cases and might not be able to break even, as the 100Mbit network has significant latency issues.
There are single-board machines with 1 Gbit/s networking and stronger floating-point performance than a Raspberry Pi, where the cost of communication might be worth it.
tldr: For MPI to make sense one wants computation/communication >> 1.
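To make the first case concrete, here is a minimal MPI sketch of the embarrassingly parallel pattern: each rank does heavy local work on its own 1024-double block, and only one scalar per rank is communicated at the end (the sine-based workload is just a stand-in for FFTW plus signal processing):

#include <mpi.h>
#include <cmath>
#include <cstdio>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<double> block(1024);
    for (std::size_t i = 0; i < block.size(); ++i)
        block[i] = std::sin(0.01 * double(i + rank));   // stand-in for the real processing

    double local_energy = 0.0;
    for (double v : block) local_energy += v * v;       // heavy local work, no communication

    double total_energy = 0.0;                          // one tiny reduction at the end
    MPI_Reduce(&local_energy, &total_energy, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) std::printf("total energy = %f over %d ranks\n", total_energy, size);
    MPI_Finalize();
    return 0;
}

On a quad-core Pi, mpirun -np 4 runs this as four processes on the four cores, so all the communication stays on-node and stays cheap.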

How to sync the clocks of two computers with each other using C++

I have two computers, say A and B. I need to make sure that they are synced to each other very accurately (in the range of milliseconds). One computer is Windows-based and the other is Linux. They are connected to each other by Ethernet directly (a cable from one computer is connected to the other). I can write C/C++ code for each of them.
How can I make them sync to each other, noting that neither Windows nor Linux is a real-time system? You don't know how long it will take for a packet sent over Ethernet to be received by the other side, so you cannot compensate for it. Since I need accuracy in the range of milliseconds, this delay is important.
Is there any algorithm that can do this?
Is there any function in Windows/Linux that can be used to make sure that when you send data via Ethernet, it is passed to the other side instantly?
Synchronizing a clock between two machines is not an easy or trivial task.
One known way to do it with decent accuracy is Marzullo's algorithm.
There are three basic ways to do this, some more complicated than others, but they all use a basic rule based on local clock offset timing. These three are RBS, TPSN, and FTSP. To account for latency across each layer, these synchronization schemes consider four basic packet delay components: send time, access time, propagation time, and receive time; see http://www.cs.wustl.edu/~jain/cse574-06/ftp/time_sync/index.html
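As a rough illustration of the round-trip idea behind those delay components, here is a minimal sketch of Cristian/NTP-style offset estimation on the Linux side. It assumes the peer runs a (hypothetical) UDP service on port 9000 that answers every probe with its current wall-clock time in microseconds; the address, port and message format are made up for the example:

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstdint>
#include <cstdio>

static int64_t now_us() {                              // local wall clock in microseconds
    return std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
}

int main() {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in peer{};
    peer.sin_family = AF_INET;
    peer.sin_port = htons(9000);                       // assumed port
    inet_pton(AF_INET, "192.168.1.2", &peer.sin_addr); // assumed peer address

    int64_t t0 = now_us();                             // local send time
    sendto(sock, "T?", 2, 0, (sockaddr *)&peer, sizeof(peer));

    int64_t t_remote = 0;                              // peer's timestamp
    recvfrom(sock, &t_remote, sizeof(t_remote), 0, nullptr, nullptr);
    int64_t t1 = now_us();                             // local receive time

    // Assume the one-way delay is half the round trip; the error of that
    // assumption is what limits the achievable accuracy.
    int64_t offset = t_remote - (t0 + (t1 - t0) / 2);
    std::printf("round trip %lld us, estimated offset %lld us\n",
                (long long)(t1 - t0), (long long)offset);

    close(sock);
    return 0;
}

Repeating the probe many times and keeping only the samples with the smallest round trip is the usual way to push the estimate into the millisecond range on a direct cable.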

What is the de facto standard for sharing variables between programs in different languages?

I've never had formal training in this area so I'm wondering what do they teach in school (if they do).
Say you have two programs written in two different languages: C++ and Python, or some other combination, and you want to share a constantly updated variable on the same machine. What would you use and why? The information need not be secured, but it must be isochronous and should be reliable.
Eg. Program A will get a value from a hardware device and update variable X every 0.1ms, I'd like to be able to access this X from Program B as often as possible and obtain the latest values. Program A and B are written and compiled in two different (robust) languages. How do I access X from program B? Assume I have the source code from A and B and I do not want to completely rewrite or port either of them.
The methods I've seen used thus far include:
File Buffer - Read and write to a single file (e.g. C:\temp.txt).
Create a wrapper - From A to B or B to A.
Memory Buffer - Designate a specific memory address (mutex?).
UDP packets via sockets - Haven't tried it yet but looks good.
Firewall?
Sorry for just throwing this out there, I don't know what the name of this technique is so I have trouble searching.
Well, you can write XML and use some basic message queuing (like RabbitMQ) to pass messages around.
Don't know if this will be helpful, but I'm also a student, and this is what I think you mean.
I've used marshalling to get a Java class and import it into a C# program.
With marshalling you use XML to transfer code in a way that it can be read by other coding environments.
When asking particular questions, you should aim at providing as much information as possible. You have added a use case, but the use case is incomplete.
Your particular use case seems like a very small amount of data that has to be available at a high frequency 10kHz. I would first try to determine whether I can actually make both pieces of code part of a single process, rather than two different processes. Depending on the languages (missing from the question) it might even be simple, or turn the impossible into possible --depending on the OS (missing from the question), the scheduler might not be fast enough switching from one process to another, and it might impact the availability of the latest read. Switching between threads is usually much faster.
If you cannot turn them into a single process, then you will have to use some sort of IPC (Inter-Process Communication). Due to the frequency, I would rule out most heavyweight protocols (avoid XML, CORBA) as the overhead will probably be too high. If the receiving end only needs access to the latest value, and that access may be less frequent than every 0.1 ms, then you don't want to use any protocol that involves queueing: you do not want to read the next element in the queue, you only care about the last one. If you did not read an element while it was fresh, avoid the cost of processing it when it is already stale--i.e. it does not make sense to loop extracting from the queue and discarding.
I would be inclined to use shared memory, or a memory-mapped shared file (they are probably quite similar; it depends on the platform, which is missing from the question). Depending on the size of the element and the exact hardware architecture (missing from the question) you may be able to avoid locking with a mutex. As an example, on current Intel processors, read/write access to a 32-bit integer in memory is guaranteed to be atomic if the variable is correctly aligned, so in that case you would not need to lock.
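As an illustration of that last option, here is a minimal POSIX shared-memory sketch of the writer side; the segment name /latest_x is an arbitrary choice, and a reader in another language (e.g. Python) could mmap the same segment and read the 4 bytes whenever it likes:

#include <atomic>
#include <cstdint>
#include <fcntl.h>
#include <new>
#include <sys/mman.h>
#include <unistd.h>

int main() {
    int fd = shm_open("/latest_x", O_CREAT | O_RDWR, 0600);
    if (fd == -1) return 1;
    ftruncate(fd, sizeof(std::atomic<int32_t>));

    void *mem = mmap(nullptr, sizeof(std::atomic<int32_t>),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    auto *x = new (mem) std::atomic<int32_t>(0);   // placement-new into the segment

    for (int32_t v = 0; v < 1000; ++v) {
        x->store(v, std::memory_order_release);    // publish the latest reading
        usleep(100);                               // ~0.1 ms update period
    }

    munmap(mem, sizeof(std::atomic<int32_t>));
    close(fd);
    return 0;
}

Because the reader only ever wants the most recent value, no queue and (on this kind of hardware) no mutex is needed.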
At my school they teach CORBA. They shouldn't; it's an ancient, hideous technology from the eon of mainframes, a classic case of design-by-committee: every feature possible that you don't want is included, and some that you probably do want (asynchronous calls?) aren't. If you think the C++ specification is big, think again.
Don't use it.
That said though, it does have a nice, easy-to-use interface for doing simple things.
But don't use it.
It almost always passes through a C binding.

MPI or Sockets?

I'm working on a loosely coupled cluster for some data processing. The network code and processing code is in place, but we are evaluating different methodologies in our approach. Right now, as we should be, we are I/O bound on performance issues, and we're trying to decrease that bottleneck. Obviously, faster switches like Infiniband would be awesome, but we can't afford the luxury of just throwing out what we have and getting new equipment.
My question is this: all traditional and serious HPC applications done on clusters are typically implemented with message passing rather than sending over sockets directly. What are the performance benefits of this? Should we see a speedup if we switched from sockets?
MPI MIGHT use sockets. But there are also MPI implementations to be used with a SAN (system area network) that use direct distributed shared memory. That is, of course, if you have the hardware for it. So MPI allows you to use such resources in the future. In that case you can gain massive performance improvements (in my experience with clusters back at university, you can reach gains of a few orders of magnitude). So if you are writing code that can be ported to higher-end clusters, using MPI is a very good idea.
Even discarding performance issues, using MPI can save you a lot of time, that you can use to improve performance of other parts of your system or simply save your sanity.
I would recommend using MPI instead of rolling your own, unless you are very good at that sort of thing. Having written some distributed-computing-esque applications using my own protocols, I always find myself reproducing (and poorly reproducing) features found within MPI.
Performance-wise I would not expect MPI to give you any tangible network speedups - it uses sockets just like you. MPI will, however, provide you with much of the functionality you would need for managing many nodes, e.g. synchronisation between nodes.
Performance is not the only consideration in this case, even on high performance clusters. MPI offers a standard API, and is "portable." It is relatively trivial to switch an application between the different versions of MPI.
Most MPI implementations use sockets for TCP based communication. Odds are good that any given MPI implementation will be better optimized and provide faster message passing, than a home grown application using sockets directly.
In addition, should you ever get a chance to run your code on a cluster that has InfiniBand, the MPI layer will abstract any of those code changes. This is not a trivial advantage - coding an application to directly use OFED (or another IB Verbs) implementation is very difficult.
Most MPI implementations include small test apps that can be used to verify the correctness of the networking setup independently of your application. This is a major advantage when it comes time to debug your application. The MPI standard includes the PMPI profiling interface for intercepting MPI calls. This interface also allows you to easily add checksums or other data verification to all the message-passing routines.
Message Passing is a paradigm not a technology. In the most general installation, MPI will use sockets to communicate. You could see a speed up by switching to MPI, but only in so far as you haven't optimized your socket communication.
How is your application I/O bound? Is it bound on transferring the data blocks to the work nodes, or is it bound because of communication during computation?
If the answer is "because of communication" then the problem is you are writing a tightly-coupled application and trying to run it on a cluster designed for loosely coupled tasks. The only way to gain performance will be to get better hardware (faster switches, infiniband, etc. )... maybe you could borrow time on someone else's HPC?
If the answer is "data block" transfers then consider assigning workers multiple data blocks (so they stay busy longer) & compress the data blocks before transfer. This is a strategy that can help in a loosely coupled application.
MPI has the benefit that you can do collective communications. Doing broadcasts/reductions in O(log p) /* p is your number of processors*/ instead of O(p) is a big advantage.
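As a tiny illustration, one MPI_Bcast call replaces a hand-written loop of p-1 point-to-point sends, and a good implementation schedules it as a tree or pipeline (the buffer size here is arbitrary):

#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> block(1 << 20);              // ~8 MB of work data
    if (rank == 0)                                   // root fills the data once
        for (std::size_t i = 0; i < block.size(); ++i) block[i] = double(i);

    // Every rank receives the block; the library chooses the O(log p) schedule.
    MPI_Bcast(block.data(), int(block.size()), MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}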
I'll have to agree with OldMan and freespace. Unless you know of a specific improvement to some useful metric (performance, maintainability, etc.) over MPI, why reinvent the wheel? MPI represents a large amount of shared knowledge regarding the problem you are trying to solve.
There are a huge number of issues you need to address which is beyond just sending data. Connection setup and maintenance will all become your responsibility. If MPI is the exact abstraction (it sounds like it is) you need, use it.
At the very least, using MPI and later replacing it with your own system is a good approach, costing only the installation of and dependency on MPI.
I especially like OldMan's point that MPI gives you much more beyond simple socket communication. You get a slew of parallel and distributed computing implementation with a transparent abstraction.
I have not used MPI, but I have used sockets quite a bit. There are a few things to consider on high performance sockets. Are you doing many small packets, or large packets? If you are doing many small packets consider turning off the Nagle algorithm for faster response:
int flag = 1;
setsockopt(m_socket, IPPROTO_TCP, TCP_NODELAY, (char *)&flag, sizeof(flag));
Also, using signals can actually be much slower when trying to get a high volume of data through. Long ago I made a test program where the reader would wait for a signal and then read a packet - it would get about 100 packets/sec. Then I just did blocking reads, and got 10000 reads/sec.
The point is look at all these options, and actually test them out. Different conditions will make different techniques faster/slower. It's important to not just get opinions, but to put them to the test. Steve Maguire talks about this in "Writing Solid Code". He uses many examples that are counter-intuitive, and tests them to find out what makes better/faster code.
MPI uses sockets underneath, so really the only difference should be the API that your code interfaces with. You could fine-tune the protocol if you are using sockets directly, but that's about it. What exactly are you doing with the data?
MPI uses sockets, and if you know what you are doing you can probably get more bandwidth out of sockets because you need not send as much metadata.
But you have to know what you are doing, and it's likely to be more error-prone. Essentially, you'd be replacing MPI with your own messaging protocol.
For high-volume, low-overhead business messaging you might want to check out AMQP, which has several product implementations. The open-source variant OpenAMQ supposedly runs the trading at JP Morgan, so it should be reliable, shouldn't it?