Schedule task on precise periods in Linux or Windows

Schedule task on precise periods in Linux or Windows - c++

I have this weird question.
I would like to know if it is possible to make a program in C/C++ that will run on Linux or Windows and will hook interrupt handler on a system timer set to specific period (2000 times per second, for example) and I want this interrupt to be with highest priority, meaning that it has to be executed every half millisecond and while executing it must not be interrupted.
This we have done with MS-DOS with Borland Turbo C 3.1. We have an interface card (our own) that runs on ISA slot. Every half millisecond, our program reads the state of electronics that is controlling an industrial process thru the interface. This has worked for us in the past 15 years, but we are running out of motherboards that have ISA slot, so we are looking for new solutions.
We also have solution based on PIC microcontrollers, but our horizons will be widened with general purpose processor.
My guess is that there are some customized Linux kernels for embedded applications, so I am looking for some sources with which we can start experimenting.

Yes, you can do that in MS-DOS because it is not a multi-user or multi-tasking operating system. However, the same thing will not work in Windows because it is a mult-user and multi-tasking operating system. It's also not real-time, which means there's no guarantee that your task will be executed exactly when you ask for it to be executed. Everything is pre-emptively scheduled, meaning that any number of other processes and tasks (either user-mode or system-level) could effectively "bump" your process down the priority list and force it to wait to be executed until those other tasks completed or were themselves interrupted to give your process a chance to run for a while.
I don't know about Linux, but I imagine most of the major distributions are written similarly to Windows.
You will need to find a real-time, single-user operating system to do this. A Unix-derivative is probably the best place to start looking, but I won't be the person able to suggest one.
Alternatively, you could continue using MS-DOS (or alternatives such as FreeDOS), but switch to a different interface technology that is available on newer boards. There's no reason to update something that works for you, especially if the updates are counter-productive to your goal.

A typical OS such as a standard Linux or Windows is not designed to, and will not be able to perform to that degree of real-time accuracy and availability.
It sounds to me like you need to be investigating Real-Time Linux, or similar.
RTLinux is a modified version of the Linux Kernel which is designed to perform in real-time, precicely for applications such as this.
Hope that helps.

Personal and affordable computing has increased in performance incredibly over the years, except in one area, low latency. Latency has actually increased in many use cases when you compare a 486 and a modern desktop CPU.
That said, have a look at this paper, where the authors come to the conclusion that sub-millisecond scheduling is possible in Linux on commodity hardware.

Related

Make a program run slowly

Is there any way to run a C++ program slower by changing any OS parameters in Linux? In this way I would like to simulate what will happen if that particular program happens to run on a real slower machine.
In other words, a faster machine should behave as a slower machine to that particular program.

Lower the priority using nice (and/or renice). You can also do it programmatically using nice() system call. This will not slow down the execution speed per se, but will make Linux scheduler allocate less (and possibly shorter) execution time frames, preempt more often, etc. See Process Scheduling (Chapter 10) of Understanding the Linux Kernel for more details on scheduling.
You may want to increase the timer interrupt frequency to put more load on the kernel, which will in turn slow everything down. This requires a kernel rebuild.
You can use CPU Frequency Scaling mechanism (requires kernel module) and control (slow down, speed up) the CPU using the cpufreq-set command.
Another possibility is to call sched_yield(), which will yield quantum to other processes, in performance critical parts of your program (requires code change).
You can hook common functions like malloc(), free(), clock_gettime() etc. using LD_PRELOAD, and do some silly stuff like burn a few million CPU cycles with rep; hop;, insert memory barriers etc. This will slow down the program for sure. (See this answer for an example of how to do some of this stuff).
As #Bill mentioned, you can always run Linux in a virtualization software which allows you to limit the amount of allocated CPU resources, memory, etc.
If you really want your program to be slow, run it under Valgrind (may also help you find some problems in your application like memory leaks, bad memory references, etc).
Some slowness can be achieved by recompiling your binary with disabled optimizations (i.e. -O0 and enable assertions (i.e. -DDEBUG).
You can always buy an old PC or a cheap netbook (like One Laptop Per Child, and don't forget to donate it to a child once you are done testing) with a slow CPU and run your program.
Hope it helps.

QEMU is a CPU emulator for Linux. Debian has packages for it (I imagine most distros will). You can run a program in an emulator and most of them should support slowing things down. For instance, Miroslav Novak has patches to slow down QEMU.
Alternatively, you could cross compile to another CPU-linux (arm-none-gnueabi-linux, etc) and then have QEMU translate that code to run.
The nice suggestion is simple and may work if you combine it with another process which will consume cpu.
nice -19 test &
while [ 1 ] ; do sha1sum /boot/vmlinuz*; done;
You did not say if you need graphics, file and/or network I/O? Do you know something about the class of error you are looking for? Is it a race condition, or does the code just perform poorly at a customer site?
Edit: You can also use signals like STOP and CONT to start and stop your program. A debugger can also do this. The issue is that the code runs a full speed and then gets stopped. Most solutions with the Linux scheduler will have this issue. There was some sort of thread analyzer from Intel afair. I see Vtune Release Notes. This is Vtune, but I was pretty sure there is another tool to analyze thread races. See: Intel Thread Checker, which can check for some thread race conditions. But we don't know if the app is multi-threaded?

Use cpulimit:
Cpulimit is a tool which limits the CPU usage of a process (expressed in percentage, not in CPU time). It is useful to control batch jobs, when you don't want them to eat too many CPU cycles. The goal is prevent a process from running for more than a specified time ratio. It does not change the nice value or other scheduling priority settings, but the real CPU usage. Also, it is able to adapt itself to the overall system load, dynamically and quickly.
The control of the used cpu amount is done sending SIGSTOP and SIGCONT POSIX signals to processes.
All the children processes and threads of the specified process will share the same percent of CPU.
It's in the Ubuntu repos. Just
apt-get install cpulimit
Here're some examples on how to use it on an already-running program:
Limit the process 'bigloop' by executable name to 40% CPU:
cpulimit --exe bigloop --limit 40
cpulimit --exe /usr/local/bin/bigloop --limit 40
Limit a process by PID to 55% CPU:
cpulimit --pid 2960 --limit 55

Get an old computer
VPS hosting packages tend to run slowly, have lots of interruptions, and wildly varying latencies. The cheaper you go the worse the hardware will be. Unlike truly old hardware, there is a good chance they will contain instruction sets (SSE4) that are not usually found on old hardware. Neverthless, if you want a system that walks slowly and shutters often, a cheap VPS host will be the quickest start.

If you just want to simulate your program to analyze its behavior on really slow machine, you can try making your whole program to run as a thread of some other main program.
In this manner you can prioritize same code with different priorities in few threads at once and collect data of your analysis. I have used this in game development for frame-processing analysis.

Use sleep or wait inside of your code. Its not the brightest way to do but acceptable in all kind of computer with different speeds.

The simplest possible way to do it would be to wrap your main runable code in a while loop with a sleep at the end of it.
For example:
void main()
{
while 1
{
// Logic
// ...
usleep(microseconds_to_sleep)
}
}
As people will mention, this isn't the most accurate way, since your logic code will still run at normal speed but with delays between runs. Also, it assumes that your logic code is something that runs in a loop.
But it is both simple and configurable.

You can increase load on CPUs via launching tools like stress or stress-ng

c++ Distributed computing of an executable program

I was wondering if it is possible to run an executable program without adding to its source code, like running any game across several computers. When i was programming in c# i noticed a process method, which lets you summon or close any application or process, i was wondering if there was something similar with c++ which would let me transfer the processes of any executable file or game to other computers or servers minimizing my computer's processor consumption.
thanks.

Everything is possible, but this would require a huge amount of work and would almost for sure make your program painfully slower (I'm talking about a factor of millions or billions here). Essentially you would need to make sure every layer that is used in the program allows this. So you'd have to rewrite the OS to be able to do this, but also quite a few of the libraries it uses.
Why? Let's assume you want to distribute actual threads over different machines. It would be slightly more easy if it were actual processes, but I'd be surprised many applications work like this.
To begin with, you need to synchronize the memory, more specifically all non-thread-local storage, which often means 'all memory' because not all language have a thread-aware memory model. Of course, this can be optimized, for example buffer everything until you encounter an 'atomic' read or write, if of course your system has such a concept. Now can you imagine every thread blocking for synchronization a few seconds whenever a thread has to be locked/unlocked or an atomic variable has to be read/written?
Next to that there are the issues related to managing devices. Assume you need a network connection: which device will start this, how will the ip be chosen, ...? To seamlessly solve this you probably need a virtual device shared amongst all platforms. This has to happen for network devices, filesystems, printers, monitors, ... . And as you kindly mention games: this should happen for a GPU as well, just imagine how this would impact performance in only sending data from/to the GPU (hint: even 16xpci-e is often already a bottleneck).
In conclusion: this is not feasible, if you want a clustered application, you have to build it into the application from scratch.

I believe the closest thing you can do is MapReduce: it's a paradigm which hopefully will be a part of the official boost library soon. However, I don't think that you would want to apply it to a real-time application like a game.
A related question may provide more answers: https://stackoverflow.com/questions/2168558/is-there-anything-like-hadoop-in-c
But as KillianDS pointed out, there is no automagical way to do this, nor does it seem like is there a feasible way to do it. So what is the exact problem that you're trying to solve?

The current state of research is into practical means to distribute the work of a process across multiple CPU cores on a single computer. In that case, these processors still share RAM. This is essential: RAM latencies are measured in nanoseconds.
In distributed computing, remote memory access can take tens if not hundreds of microseconds. Distributed algorithms explicitly take this into account. No amount of magic can make this disappear: light itself is slow.

The Plan 9 OS from AT&T Bell Labs supports distributed computing in the most seamless and transparent manner. Plan 9 was designed to take the Unix ideas of breaking jobs into interoperating small tasks, performed by highly specialised utilities, and "everything is a file", as well as the client/server model, to a whole new level. It has the idea of a CPU server which performs computations for less powerful networked clients. Unfortunately the idea was too ambitious and way beyond its time and Plan 9 remained largerly a research project. It is still being developed as open source software though.
MOSIX is another distributed OS project that provides a single process space over multiple machines and supports transparent process migration. It allows processes to become migratable without any changes to their source code as all context saving and restoration are done by the OS kernel. There are several implementations of the MOSIX model - MOSIX2, openMosix (discontinued since 2008) and LinuxPMI (continuation of the openMosix project).
ScaleMP is yet another commercial Single System Image (SSI) implementation, mainly targeted towards data processing and Hight Performance Computing. It not only provides transparent migration between the nodes of a cluster but also provides emulated shared memory (known as Distributed Shared Memory). Basically it transforms a bunch of computers, connected via very fast network, into a single big NUMA machine with many CPUs and huge amount of memory.
None of these would allow you to launch a game on your PC and have it transparently migrated and executed somewhere on the network. Besides most games are GPU intensive and not so much CPU intensive - most games are still not even utilising the full computing power of multicore CPUs. We have a ScaleMP cluster here and it doesn't run Quake very well...

Need help to choose real-time OS and Hardware

I heard and read that Windows/Linux OS machines are not real-time.
I have read this article. It listed WindowsCE is one of RTOS. That's kind of confusing to me since I always thought WindowsCE is for a mobile or embeded device.
I need a real-time application running 24/7 and processing signals various sensors from each quick moving object to db and monitor by running several machine learning algorithms.
What would be proper real-time hardware and OS for this kind of applications? Development environment would be MFC or Qt C++. I really need opinions from experienced developers. Thanks

QNX has served me well in the past. I should warn you that it was only for training purposes (real-time industrial process control), and that I have implemented real time control programs with this OS by I've never really deployed one.
The first rule with real-time systems is to specify your real-time constraints, such as:
the system must be able to process up to 600 signals per minute; or
the system must spend no more than 1/10 second per signal.
The difference is subtle, but these are different constraints.
Just keep in mind that there is absolutely no way to decide if any hardward/OS/library combination is good enough for you unless you specify these constraints
For that, you think QNX might be proper? What would be its advantages over Windows/Linux systems with high priority setting?
If you look at the QNX documentation for many POSIX systems calls, you will notice they specify extra constraints on performance, which are possibly required to guarantee your real-time constraints. The OS is specifically designed to match these constraints. You won't get this on a system that is not officially an RTOS. If you are going to write real-time software, I recommend that you buy a good book on the subject. There is considerable literature given that the subject is very sensitive.
Get yourself a good book on real-time system design to get a feel of what questions to ask, and then read the technical documentation of each product you will use to see if it can match your constraints. Example of things to look in software libraries like Qt is when they allocate memory. If this is not documented in each class interface, there is no way to guarantee meeting your constraints since there is hidden algorithmic complexity.
Development environment would be MFC or Qt C++.
I would think that Qt compiles on QNX, but I'm not sure if Qt provides the guarantees required to match your real-time constraints. Libraries that abstract away too much stuff are risk since it's difficult to determine if they satisfy your requirements. Hidden memory management is often problematic, but there are other questions you should ask about too.
It seems to me that people say Real-time systems == embedded systems. Am I wrong?
Real-time system definitely does not equal "embedded system", though many embedded systems have real-time constraints.

How real time do you need?
Remember real time is about responsiveness, not speed. In fact most RTOS will be slower on average than general OS.
Do you need to guarrantee a certain average number of transactions/second or do you need to always respond within n seconds of an event?
Do you have custom hardware or are you relying on inputs over ethernet, USB, etc?
Are drivers for the hardware available on the RTOS or will you have to write them yourself ?
Windows and linux are possible RTOS. Windows embedded allows you to turn off services to give much more reliable response rate and there are both realtime kernels and realtime add-ons to Linux which give pretty much the same real time performance as something like VxWorks.
It also depends on how many tasks you need to handle. A lot of the complexity of true RTOS (like VxWorks) is that they can control many tasks at the same time while allowing each a guaranteed latency and CPU share - important for a Mars rover but not for a single data collection PC

How to program in Windows 7.0 to make it more deterministic?

My understanding is that Windows is non-deterministic and can be trouble when using it for data acquisition. Using a 32bit bus, and dual core, is it possible to use inline asm to work with interrupts in Visual Studio 2005 or at least set some kind of flags to be consistent in time with little jitter?
Going the direction of an RTOS(real time operating system): Windows CE with programming in kernel mode may get too expensive for us.

Real time solutions for Windows such as LabVIEW Real-time or RTX are expensive; a stand-alone RTOS would often be less expensive (or even free), but if you need Windows functionality as well, you are perhaps no further forward.
If cost is critical, you might run a free or low-cost RTOS in a virtual machine. This can work, though there is no cooperation over hardware access between the RTOS and Windows, and no direct communication mechanism (you could use TCP/IP over a virtual (or real) network I suppose.
Another alternative is to perform the real-time data acquisition on stand-alone hardware (a microcontroller development board or SBC for example) and communicate with Windows via USB or TCP/IP for example. It is possible that way to get timing jitter down to the microsecond level or better.

There are third-party realtime extensions to Windows. See, e. g. http://msdn.microsoft.com/en-us/library/ms838340(v=winembedded.5).aspx

Windows is not an RTOS, so there is no magic answer. However, there are some things you can do to make the system more "real time friendly".
Disable background processes that can steal system resources from you.
Use a multi-core processor to reduce the impact of context switching
If your program does any disk I/O, move that to its own spindle.
Look into process priority. Make sure your process is running as High or Realtime.
Pay attention to how your program manages memory. Avoid doing thigs that will lead to excessive disk paging.
Consider a real-time extension to Windows (already mentioned).
Consider moving to a real RTOS.
Consider dividing your system into two pieces: (1) real time component running on a microcontroller/DSP/FPGA, and (2) The user interface portion that runs on the Windows PC.

Low latency trading systems using C++ in Windows?

It seems that all the major investment banks use C++ in Unix (Linux, Solaris) for their low latency/high frequency server applications. Why is Windows generally not used as a platform for this? Are there technical reasons why Windows can't compete?

The performance requirements on the extremely low-latency systems used for algorithmic trading are extreme. In this environment, microseconds count.
I'm not sure about Solaris, but the case of Linux, these guys are writing and using low-latency patches and customisations for the whole kernel, from the network card drivers on up. It's not that there's a technical reason why that couldn't be done on Windows, but there is a practical/legal one - access to the source code, and the ability to recompile it with changes.

Technically, no. However, there is a very simple business reason: the rest of the financial world runs on Unix. Banks run on AIX, the stock market itself runs on Unix, and therefore, it is simply easier to find programmers in the financial world that are used to a Unix environment, rather than a Windows one.

(I've worked in investment banking for 8 years)
In fact, quite a lot of what banks call low latency is done in Java. And not even Real Time Java - just normal Java with the GC turned off. The main trick here is to make sure you've exercised all of your code enough for the jit to have run before you switch a particular VM into prod ( so you have some startup looping that runs for a couple of minutes - and hot failover).
The reasons for using Linux are:
Familiarity
Remote administration is still better, and also low impact - it will have a minimal effect on the other processes on the machine. Remember, these systems are often co-located at the exchange, so the links to the machines (from you/your support team) will probably be worse than those to your normal datacentres.
Tunability - the ability to set swappiness to 0, get the JVM to preallocate large pages, and other low level tricks is quite useful.
I'm sure you could get Windows to work acceptably, but there is no huge advantage to doing so - as others have said, any employees you poached would have to rediscover all their latency busting tricks rather than just run down a checklist.

Linux/UNIX are much more usable for concurrent remote users, making it easier to script around the systems, use standard tools like grep/sed/awk/perl/ruby/less on logs... ssh/scp... all that stuff's just there.
There are also technical issues, for example: to measure elapsed time on Windows you can choose between a set of functions based on the Windows clock tick, and the hardware-based QueryPerformanceCounter(). The former is increments each 10 to 16 milliseconds (note: some documentation implies more precision - e.g. the values from GetSystemTimeAsFileTime() measure to 100ns, but they report the same 100ns edge of the clock tick until it ticks again). The latter - QueryPerformanceCounter() - has show-stopping issues where different cores/cpus can report clocks-since-startup that differ by several seconds due to being warmed up at different times during system boot. MSDN documents this as a possible BIOS bug, but it's common. So, who wants to develop low-latency trading systems on a platform that can't be instrumented properly? (There are solutions, but you won't find any software ones sitting conveniently in boost or ACE).
Many Linux/UNIX variants have lots of easily tweakable parameters to trade off latency for a single event against average latency under load, time slice sizes, scheduling policies etc.. On open source Operating Systems, there's also the assurance that comes with being able to refer to the code when you think something should be faster than it is, and the knowledge that a (potentially huge) community of people have been and are doing so critically - with Windows it's obviously mainly going to be the people who're assigned to look at it.
On the FUD/reputation side - somewhat intangible but an important part of the reasons for OS selection - I think most programmers in the industry would just trust Linux/UNIX more to provide reliable scheduling and behaviour. Further, Linux/UNIX has a reputation for crashing less, though Windows is pretty reliable these days, and Linux has a much more volatile code base than Solaris or FreeBSD.

Reason is simple, 10-20 years ago when such systems emerged, "hardcore" multi-CPU servers were ONLY on some sort of UNIX. Windows NT was in kinder-garden these days. So the reason is "historical".
Modern systems might be developed on Windows, it's just a matter of taste these days.
PS: I am currencly working on one of such systems :-)

I partially agree with most of the answers above. Though what I have realized is the biggest reason to use C++ is becuase it is relatively faster with a very vast STL library.
Apart from it, linux/unix system is also used to boost performance. I know many low latency team which go to a extent of tweaking the linux kernel. Obviously this level of freedom is not provided by windows.
Other reasons like legacy systems, license cost, resources count as well but are lesser driving factors. As "rjw" mentioned, I have seen teams use Java as well with a modified JVM.

There are a variety of reasons, but the reason is not only historical. In fact, it seems as if more and more server-side financial applications run on *nix these days than ever before (including big names like the London Stock Exchange, who switched from a .NET platform). For client-side or desktop apps, it would be silly to target anything other than Windows, as that is the established platform. However, for server-side apps, most places that I have worked at deploy to *nix.

I second the opinions of historical and access to kernel manipulation.
Apart from those reasons I also believe that just like how they turn off garbage collection of .NET and the similar mechanism in Java when using these technologies in some low latency. They might avoid Windows because of the API's at high level which interact with low level os and then the kernel.
So the core is of course the kernel which can be interacted with using the low level os. The high level APIs are provided just to make the common users life easier. But in case of Low latency this turns out to be a fatty layer and fraction seconds loss around each operation. So a lucrative option for gaining few fraction seconds per call.
Apart from this another thing to consider is integration. Most of the servers, data centers, exchanges use UNIX not windows so using the clients of same family makes the integration and communication easier.
Then you have security issues (many people out there might not agree with this point though) hacking UNIX is not easy compared to hacking WINDOWS. I don't agree licensing must be the issue for banks because they shower money on every single piece of hardware and software and the people who customize them, so buying licenses will not be as bigger the issue when considered what they gain by purchasing.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js