kernel vs user-space audio device driver on macOS

I need to develop an audio device driver for system audio capture (based on Soundflower). I soon ran into a problem: the IOAudioFamily stack appears to be deprecated in OS X 10.10 and later. Looking through the IOAudioDevice and IOAudioEngine header files, it seems that Apple now recommends using the <CoreAudio/AudioServerPlugIn.h> API, which runs in user space. But I can't find much information on user-space device drivers. The only resource seems to be the Apple-provided sample devices at https://developer.apple.com/library/prerelease/content/samplecode/AudioDriverExamples/Introduction/Intro.html
Looking through the examples, I find that it's a lot harder and more work to develop a user-space driver than an I/O Kit kernel-based one.
So the question arises: what should motivate developing a device driver in user space instead of kernel space?

The "SimpleAudioDriver" example is somewhat misnamed. It demonstrates pretty much every feature of the API. This is handy as a reference if you actually need to use those features. It's also structured in a way that's maybe a little more complicated than necessary.
For a virtual device, the NullAudioDriver is probably a much better base, and much, much easier to understand (single source file, if I remember correctly). SimpleAudioDriver is more useful for dealing with issues such as hotplugging, multiple instances of identical devices, etc.
IOAudioEngine is deprecated as you say, and has been since OS X 10.10. Expect it to go away eventually, so if you build your driver with it, you'll probably need to rewrite it sooner than if you create a Core Audio Server Plugin based one.
Testing and debugging audio drivers is awkward either way (due to being so time sensitive), but I'd say userspace ones are slightly less frustrating to deal with. You'll still want to test on a different machine than your development Mac, because if coreaudiod crashes or hangs, apps usually start locking up too, so being able to just ssh in, delete your plugin and kill coreaudiod is handy. Certainly quicker turnaround than having to reboot.
(FWIW, I've shipped both kernel and userspace OS X audio drivers, and I spend a lot of time working on kexts.)

There is a great book on this subject, available free online here:
http://free-electrons.com/doc/books/ldd3.pdf
See page 37 for a summary of why you might want a user-space driver, copied here for convenience:
The advantages of user-space drivers are:
The full C library can be linked in. The driver can perform many exotic tasks without resorting to external programs (the utility programs implementing usage policies that are usually distributed along with the driver itself).
The programmer can run a conventional debugger on the driver code without having to go through contortions to debug a running kernel.
If a user-space driver hangs, you can simply kill it. Problems with the driver are unlikely to hang the entire system, unless the hardware being controlled is really misbehaving.
User memory is swappable, unlike kernel memory. An infrequently used device with a huge driver won’t occupy RAM that other programs could be using, except when it is actually in use.
A well-designed driver program can still, like kernel-space drivers, allow concurrent access to a device.
If you must write a closed-source driver, the user-space option makes it easier for you to avoid ambiguous licensing situations and problems with changing kernel interfaces.

Related

How to access the hardware in QNX?

I installed QNX on the machine. The embedded system must also have access to the hardware, port management, and so on. How is this implemented in QNX? In what direction should I study? So far I've only found material on the organization of files, directories, users, groups, etc. Or do I misunderstand the operating principle of the system?

NOTE: I put a link to code samples at the bottom.
I'll try to explain it in terms of the difference between Linux and QNX.
QNX is an RTOS, and its kernel is called the Neutrino kernel. A kernel is just the bare bones that interacts with the hardware; it is the core of any operating system. But an OS consists of application software and a kernel working in unison to achieve the purpose of a computer system.
Linux on its own is just a kernel; GNU/Linux is a complete OS.
Linux is based on a monolithic architecture, whereas QNX is a microkernel.
Monolithic kernel: all the OS services run along with the main kernel thread, thus residing in the same memory. Monolithic kernels are easier to implement, but a bug in one part, such as a driver, can bring down the whole system. In a microkernel such as QNX, drivers and most services instead run as separate user-space processes.
MORE RANT:
QNX is a complete microkernel-based real-time OS, whereas Linux is a monolithic kernel. QNX can run on many embedded platforms, such as the small computers in cars that handle satnav or music controls (Jeep Cherokee), and SCADA systems. The application-building framework is much different from the X11 or Wayland you get on Linux. As shown in QNX GUI, it is much closer to the bone and metal.
Example: in Linux, if you want to draw a circle on the screen, the request goes through many layers of abstraction such as X11; in QNX things take a more direct route, which makes it faster on a small chip, at the cost of losing most of the network features that X11 makes possible on Linux.
QNX is a supported, more or less out-of-the-box framework for making embedded systems, whereas GNU/Linux is closer to the opposite.
The real-time side of things is about both the timeliness and the accuracy of the response.
Look here to understand QNX and different parts that you need for coding.
QNX Sample Code can be found here.

The documentation for QNX SDP 7 is at http://www.qnx.com/download/group.html?programid=29184 - you'll need to log in to access it (create an account if you don't have one already).
The QNX Neutrino System Architecture guide is a must read.
By and large, hardware access will be needed for system startup (see Building Embedded Systems) and processes providing system services (Writing a Resource Manager). Primarily you'll be looking at having sufficient privilege to access ports, attach interrupt handlers, and map hardware resources into the address space of your program, then creating initialization routines, interrupt handlers (QNX Neutrino RTOS Programmers Guide), and various forms of event responders that operate in threads within resource manager processes when unblocked by interrupt handlers. The QNX Neutrino Cookbook gives some examples. Look for functions like mmap* in* out* shm* in the library reference and when searching for examples.
But study and really learn the System Architecture first; without understanding the architecture and its terminology, it will be hard to find your way around the rest of the documentation and make sense of it.
Have fun!

Fast cross-platform cpu profiling of C++ functions

I have written software in Qt for a company; it works with a hardware device and shows real-time plots. I have problems with its speed and I need to know which parts are CPU intensive, but there are lots of threads and events, and when I use valgrind the application gets so slow that the serial handlers don't work as expected and timeouts happen, so I can't find out what's going on. The code base is huge and simplification is almost impossible, because things depend on each other. I'm developing on Mac OS X but the application runs on Linux. I wanted to know if there is a faster profiler than valgrind out there, preferably one that works on Mac OS X, as valgrind doesn't work there (it says it does, but there are so many problems that it's just not practical). Thanks in advance.
P.S. If you need any more info, please comment instead of down voting, I can't offer the code as it is proprietary, but I can say it is well written, and I'm a fairly experienced C++ programmer.

How to program in Windows 7.0 to make it more deterministic?

My understanding is that Windows is non-deterministic and can be trouble when using it for data acquisition. Using a 32bit bus, and dual core, is it possible to use inline asm to work with interrupts in Visual Studio 2005 or at least set some kind of flags to be consistent in time with little jitter?
Going the direction of an RTOS(real time operating system): Windows CE with programming in kernel mode may get too expensive for us.

Real time solutions for Windows such as LabVIEW Real-time or RTX are expensive; a stand-alone RTOS would often be less expensive (or even free), but if you need Windows functionality as well, you are perhaps no further forward.
If cost is critical, you might run a free or low-cost RTOS in a virtual machine. This can work, though there is no cooperation over hardware access between the RTOS and Windows, and no direct communication mechanism (you could use TCP/IP over a virtual (or real) network, I suppose).
Another alternative is to perform the real-time data acquisition on stand-alone hardware (a microcontroller development board or SBC for example) and communicate with Windows via USB or TCP/IP for example. It is possible that way to get timing jitter down to the microsecond level or better.

There are third-party real-time extensions to Windows. See, e.g., http://msdn.microsoft.com/en-us/library/ms838340(v=winembedded.5).aspx

Windows is not an RTOS, so there is no magic answer. However, there are some things you can do to make the system more "real time friendly".
Disable background processes that can steal system resources from you.
Use a multi-core processor to reduce the impact of context switching.
If your program does any disk I/O, move that to its own spindle.
Look into process priority. Make sure your process is running as High or Realtime.
Pay attention to how your program manages memory. Avoid doing things that will lead to excessive disk paging.
Consider a real-time extension to Windows (already mentioned).
Consider moving to a real RTOS.
Consider dividing your system into two pieces: (1) a real-time component running on a microcontroller/DSP/FPGA, and (2) the user interface portion that runs on the Windows PC.
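To judge whether any of the measures above actually help, it is useful to measure jitter before and after each change. The following is a minimal sketch, assuming you have collected timestamps (in nanoseconds, for example from std::chrono::steady_clock) of an event that is supposed to fire at a fixed period. The function name max_jitter_ns is hypothetical and not part of any Windows API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Worst-case deviation (in ns) of the interval between consecutive
// timestamps from the nominal period. Returns 0 for fewer than two samples.
std::int64_t max_jitter_ns(const std::vector<std::int64_t>& timestamps_ns,
                           std::int64_t period_ns) {
    std::int64_t worst = 0;
    for (std::size_t i = 1; i < timestamps_ns.size(); ++i) {
        const std::int64_t interval = timestamps_ns[i] - timestamps_ns[i - 1];
        // Absolute difference between the measured and the nominal interval.
        const std::int64_t deviation =
            interval > period_ns ? interval - period_ns : period_ns - interval;
        if (deviation > worst) {
            worst = deviation;
        }
    }
    return worst;
}
```

Feeding it timestamps recorded in the acquisition loop gives a single worst-case number to compare across priority settings or hardware configurations.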

How can I create an executable to run on a certain processor architecture (instead of certain OS)?

So I take my C++ program in Visual Studio, compile it, and it'll spit out a nice little EXE file. But EXEs will only run on Windows, and I hear a lot about how C/C++ compiles into assembly language, which runs directly on a processor. The EXE runs with the help of Windows, or I could have a program that makes an executable that runs on a Mac. But aren't I compiling C++ code into assembly language, which is processor specific?
My Insights:
I'm guessing I'm probably not. I know there's an Intel C++ compiler, so would it make processor-specific assembly code? EXEs run on Windows, so they take advantage of tons of things already set up, from graphics packages to the massive .NET framework. A processor-specific executable would literally be starting from scratch, with just the instruction set of the processor.
Would this executable be a file type? We could be running Windows and open it, but then would control switch to the processor only? I assume this executable would be something like an operating system, in that it would have to be run before anything else was booted up, and have only the processor instruction set to "use".

Let's think about what "run" means...
Something has to load the binary codes into memory. That's an OS feature. The .EXE or binary executable file or bundle or whatever, is formatted in a very OS-specific way so that the OS can load it into memory.
Something has to turn control over to those binary codes. There's the OS, again.
The I/O routines (in C++, but this is true in most places) are just a library that encapsulate OS API's. Drat that OS, it's everywhere.
Reminiscing.
In the olden days (yes, I'm this old) I worked on machines that didn't have OS's. We also didn't have C.
We wrote machine codes using tools like "assemblers" and "linkers" to create big binary images that we could load into the machine. We had to load these binary images through a painful bootstrap process.
We'd use front panel keys to load enough code into memory to read a handy device like a punched paper-tape reader. This would load a small piece of fairly standard boot linking loader software. (We used mylar tape so it wouldn't wear out.)
Then, when we had this linking loader in memory, we could feed the tape we'd prepared earlier with the assembler.
We wrote our own device drivers. Or we used library routines that were in source form, punched on paper tapes.
A "patch" was actually patched pieces of paper tape. Plus, since there were also little bugs, we'd have to adjust the memory image based on hand-written instructions -- patches that hadn't been put into the tape.
Later, we had simple OS's that had simple API's, simple device drivers, and a few utilities like a "file system", an "editor" and a "compiler". It was for a language called Jovial, but we also used Fortran sometimes.
We had to solder serial interface boards so we could plug in a device. We had to write device drivers.
Bottom Line.
You can easily write C++ programs that don't require an OS.
Learn about the hardware BIOS (or BIOS-like) facilities that are part of your processor's chipset. Most modern hardware has a simple OS wired into ROM that does power-on self-test (POST), loads a few simple drivers, and locates boot blocks.
Learn how to write your own boot block. That is the first proper "software" thing that's loaded after POST. This isn't all that hard. You can use various partitioning tools to force your boot block program onto a disk and you'll have complete control over the hardware. No OS.
Learn how GRUB, LILO or BootCamp launch an OS. It's not complicated. Once they're booted, they can load your program and you're off and running. This is slightly simpler because you create the kind of partition that a boot loader wants to load. Base yours on the Linux kernel and you'll be happier. Don't try to figure out how Windows boots -- it's too complicated.
Read up on ELF. http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
Learn how device drivers are written. If you don't use an OS, you'll need to write device drivers.

The problem is that the OS really does a lot to start your programs. The EXE file itself has header information on it that Windows recognizes, identifying itself as an EXE file. Your app does everything, from filesystem access to memory allocations, through the OS.
But yes, you CAN run apps compiled for Windows/Intel on other platforms without emulation. If you want to run your EXE on a Mac or UNIX, you will need to install a bit more software to do the work that Windows would do to run your program -- take a look at the "Wine" project.

What you're talking about is what's known in the embedded world as a "bare-metal" application. They're very common for things like an ARM Cortex-M3 that goes in (say) a debit-card validator box or an interactive toy, and doesn't have enough memory or capability to run a full operating system. So, instead of getting an "ARM/Linux" compiler that would compile an application to run on Linux on an ARM processor, you get an "ARM bare-metal" compiler that compiles things to run on an ARM processor without an operating system. (I'm using ARM rather than x86 as an example, because x86 bare-metal applications are really quite rare these days.)
As stated in your question and the other answers, your application will need to do some things that would otherwise be taken care of by the operating system.
First, it needs to initialize the memory system, the interrupt vectors, and various other bits of board goo. Typically this is something that a bare-metal compiler will do for you, though if you have a weird board, you may need to tell it how to do that. This gets things from the point where the board turns on to the point where your main() function starts.
Then, you need to interact with things outside the CPU and RAM. An operating system includes all sorts of functions for doing this -- disk I/O, screen output, keyboard and mouse input, networking, etc., so forth, and so on. Without an operating system, you have to get that from somewhere else. You may get some of that from libraries from your hardware manufacturer; for instance, a board I was recently playing with has a 40x200-pixel LED screen, and it came with a library with the code to turn that on and set individual pixel values on it. And there are several companies selling libraries to implement a TCP/IP stack and things like that, for doing networking or whatnot.
Consider, for example, that this makes it difficult to do even a basic printf. When you have an operating system, printf just sends a message to the operating system that says "put this string on the console", and the operating system finds the current cursor position on the console, and does all the stuff to figure out what pixels to change on the screen, and what CPU instructions to use to change those pixels, in order to do that.
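Without an operating system, the bottom of a printf-like routine usually comes down to writing bytes into a device register yourself. Here is a minimal sketch, assuming a hypothetical memory-mapped transmit register that accepts one byte per write. The name bare_puts is made up for illustration, the register address is board-specific and deliberately left to the caller, and real UARTs normally also require polling a status bit before each write.

```cpp
#include <cstdint>

// Writes each character of a NUL-terminated string to a transmit register.
// `tx_reg` is a hypothetical memory-mapped data register supplied by the
// caller; on real hardware you would usually wait for a "transmitter ready"
// flag before each store. Returns the number of bytes written.
unsigned bare_puts(volatile std::uint8_t* tx_reg, const char* text) {
    unsigned count = 0;
    while (*text != '\0') {
        // volatile keeps the compiler from eliding or merging these stores.
        *tx_reg = static_cast<std::uint8_t>(*text);
        ++text;
        ++count;
    }
    return count;
}
```

Taking the register as a parameter also means the routine can be exercised on a desktop machine by pointing it at an ordinary variable instead of real hardware.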
Oh, and did we mention that you first have to figure out how to get the program into the CPU? A typical computer has a bit of programmable ROM that it will load instructions from when it starts up. On an x86, this is the BIOS, and it usually already contains a handy program that gets the CPU started, sets up the display, looks for disks, and loads a program off the disk that it finds. On an embedded system, that's typically where your program goes -- which means you need some way to put your program there. Often, that means you have a device called a "debugger" that's physically attached to your embedded board that loads the program -- and can also do things that allow you to pause the processor and determine what its state is, so that you can step through your program just as if you were running it in a software debugger on your computer. But I digress.
Anyway, to answer your second question, this executable that you'd create is something that gets stored in that ROM on your embedded board -- or perhaps you'd just store a bit of it in ROM (which is, after all, pretty small) and store the rest on a flash drive, and the bit in ROM would include the instructions to get the rest of it off the flash drive. It would probably be stored as a file on your main computer (that is, the Linux or Windows computer where you're creating it), but that's just for storage, it wouldn't run there.
You'll notice that when you've got a lot of these libraries together, they're doing a fair bit of what an operating system does, and there's sort of this space between the pile of libraries and a real operating system. In that space goes what's called an RTOS -- "real-time operating system". The smaller ones of these are really just collections of libraries that work together to do all the operating-systemy things, and sometimes also include stuff so you can run multiple threads at once (and then you can have different threads act like different programs) -- though all of this is compiled into the same "program", and the RTOS is really nothing more than a library you've included. Larger ones start storing parts of the code in separate places, and I think some of them can even load pieces of code off of disks -- just like Windows and Linux do when running a program. It's sort of a continuum, rather than an either/or.
The FreeRTOS system is an open-source RTOS that's towards the smaller end of the RTOS space; they might be a good place to look at some of this if you're more interested. They do have some examples of x86 applications, which would give you an idea of what sort of x86 systems would run a bare-metal or RTOS-based program and how you'd compile something to run on one; link here: http://www.freertos.org/a00090.html#186.

The computer is not the CPU. To do anything useful, the CPU has to be connected to memory and IO controllers and other devices. An OS takes care of abstracting all of that from running programs. So, if you want to write a program that runs without an OS, your program will have to replicate at least some features of an OS: Taking over from the BIOS during the boot process, initializing devices, communicating with the disk controller to load code and data, communicating with the display controller to show information to the user, communicating with the keyboard controller and the mouse controller to read user input etc etc etc.
Unless you are building an embedded system with specialized hardware, there is no point in doing this. Besides, running your program would mean the user would have to give up running other programs. While this may be acceptable for an ATM today or WordStar in 1984, these days people frown on not being able to check email while listening to music.

Sure, they exist. They are called cross compilers. For example, that's how I can program for the iPhone platform using Xcode.
A related type of compiler is one that compiles for a virtual platform. That's how Java works.

Any given compiler/toolset produces code for a particular processor/OS combination. So your Visual Studio compile example produces code for x86/Windows. That .EXE will only run on x86/Windows and not on (for example) ARM/Windows (as used by some cellphones).
To produce code for a processor/OS combination other than what you're running the compiler on requires what is generally referred to as a cross-compiler. If you have a full professional Visual Studio subscription, you can get the ARM cross compiler, which will allow you to produce ARM/Windows .EXE files which won't run on your desktop machine, but WILL run on an ARM/Windows based cellphone or palmtop.
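The target architecture is fixed at compile time, and the compiler exposes it through predefined macros. A minimal sketch that reports which processor a translation unit was built for, using macros predefined by MSVC (_M_X64, _M_IX86, _M_ARM64, _M_ARM) and by GCC/Clang (__x86_64__, __i386__, __aarch64__, __arm__); the function name target_architecture is hypothetical:

```cpp
#include <string>

// Reports the processor architecture this code was compiled for, based on
// macros predefined by common compilers. The answer is baked in at compile
// time: a cross-compiled binary reports its target, not the build machine.
std::string target_architecture() {
#if defined(_M_X64) || defined(__x86_64__)
    return "x86-64";
#elif defined(_M_IX86) || defined(__i386__)
    return "x86";
#elif defined(_M_ARM64) || defined(__aarch64__)
    return "arm64";
#elif defined(_M_ARM) || defined(__arm__)
    return "arm";
#else
    return "unknown";
#endif
}
```

Building the same source with a cross-compiler changes only which branch of the preprocessor chain survives.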

Yes, you can make an executable that runs on the 'bare metal' of a processor. Obviously that's how operating system kernels work. The main thing you need to do is create an executable that uses no libraries whatsoever. However, the "no libraries" restriction includes the C standard library! So that means no malloc, no printf, etc. You have to basically be your own OS and manage memory and I/O yourself. This will inevitably require a fair bit of work directly in assembly at some stage.
You also lose several other luxuries, such as main(), which can't be the starting point of your program since main() is something that is invoked by the OS and the C runtime environment.
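As an illustration of "managing memory yourself", here is a minimal sketch of a bump allocator over a fixed arena, about the simplest stand-in for malloc in a program with no C library. The names bump_alloc and bump_reset, and the 4 KiB arena size, are assumptions for illustration. It uses only <cstddef>, which is available in freestanding implementations.

```cpp
#include <cstddef>

// A fixed arena standing in for RAM that a bare-metal program owns outright.
// The 4 KiB size is an arbitrary choice for illustration.
alignas(std::max_align_t) static unsigned char g_arena[4096];
static std::size_t g_used = 0;

// Hands out `size` bytes aligned to `align`, which must be a power of two
// no larger than alignof(std::max_align_t). Returns nullptr when the arena
// is exhausted. Individual blocks are never freed.
void* bump_alloc(std::size_t size,
                 std::size_t align = alignof(std::max_align_t)) {
    // Round the current offset up to the next multiple of `align`.
    const std::size_t start = (g_used + align - 1) & ~(align - 1);
    if (start > sizeof(g_arena) || size > sizeof(g_arena) - start) {
        return nullptr;
    }
    g_used = start + size;
    return g_arena + start;
}

// Releases everything at once by rewinding the arena.
void bump_reset() { g_used = 0; }
```

A real system would eventually want freeing and fragmentation handling, but many small embedded programs allocate once at startup and never release, which is exactly the case this covers.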

Absolutely! That is what embedded programming is. As many have probably said already, the operating system does quite a bit for you. Even in the embedded world, without an operating system, a number of development tools will provide the startup code that gets the processor running far enough to jump to your program. Some/many provide full-blown C/C++ libraries so that you can call functions like memcpy() and sometimes even malloc() and printf().
You are welcome to provide every line of code and every instruction yourself and not use a development tool package, while still using a compiler like gcc. Some binary formats, ELF for example, are shared with programs that run on operating systems. You can execute ELF files on Linux, but your embedded program can also be built as an ELF binary. The processor cannot execute ELF directly, but whatever programs the boot PROM (or RAM in some cases) will extract the binary program from the ELF file, not unlike an operating system extracting the program to run from an ELF file. EXE is not one of those file formats. Your favorite Windows application compiler is probably not an embedded compiler either, although you can sometimes use one to do the high-level language stuff and then use an alternative assembler and linker. That is usually more work than it is worth. For example: you write a function in C (that does NOT make any library or system calls) and compile it to an object. Write your own utility, or find one, to extract the compiled binary from that object and convert it to another object format or to assembler (disassemble). Add your startup code and other assembly to it. Assemble and link everything together as an embedded program. I did it once with Microsoft's embedded Visual C just to see how it measured up to other compilers; it wasn't horrible, but it certainly was not worth the effort of hacking to get at the output.
Every processor, from the one in your computer to the one in your cell phone or microwave, has to have some boot-up code. That code is not running on an operating system. That code uses the same or similar compilers as operating system applications use. For some devices, that code puts the processor, memory, and on- and off-chip peripherals in a state where the operating system can be started. From there the operating system takes over. On your computer this would be the BIOS followed by the bootloader, then eventually the operating system: DOS, Windows, Linux, etc.

The main problem is the file format. PE is very different from ELF (used in Unix-like systems). A valid PE program cannot be a valid ELF. So you either load the binary dynamically with different starters, or you have to give up.
Other than that, with knowledge of OS services, the value of registers at startup, etc., your code can probably detect easily and reliably which OS you are running under and act accordingly (some malware does just that). Another challenge is then reusing code instead of having two or more different programs in the same binary. Basically you would have to write an emulator, at least for the services that you need.
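The two formats are distinguishable from their first bytes, which is how a loader decides whether it can handle a file at all. A minimal sketch (the function name executable_format is hypothetical): PE files begin with the DOS stub signature "MZ", and ELF files begin with the byte 0x7F followed by "ELF".

```cpp
#include <cstddef>
#include <string>

// Classifies an executable image by its leading magic bytes.
// "MZ" opens the DOS stub at the start of a PE (.exe/.dll) file;
// 0x7F 'E' 'L' 'F' opens an ELF file. Anything else is reported as unknown.
std::string executable_format(const unsigned char* data, std::size_t size) {
    if (size >= 4 && data[0] == 0x7F && data[1] == 'E' &&
        data[2] == 'L' && data[3] == 'F') {
        return "ELF";
    }
    if (size >= 2 && data[0] == 'M' && data[1] == 'Z') {
        return "PE/MZ";
    }
    return "unknown";
}
```

Because the signatures occupy the same leading bytes, a single file cannot satisfy both checks, which is the point made above.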

Also, don't forget about the Windows libraries. Look into Qt and GTK+.

Interprocess communication between 32- and 64-bit apps on Windows x64

We'd like to support some hardware that has recently been discontinued. The driver for the hardware is a plain 32-bit C DLL. We don't have the source code, and (for legal reasons) are not interested in decompiling or reverse engineering the driver.
The hardware sends tons of data quickly, so the communication protocol needs to be pretty efficient.
Our software is a native 64-bit C++ app, but we'd like to access the hardware via a 32-bit process. What is an efficient, elegant way for 32-bit and 64-bit applications to communicate with each other (that, ideally, doesn't involve inventing a new protocol)?
The solution should be in C/C++.
Update: several respondents asked for clarification whether this was a user-mode or kernel-mode driver. Fortunately, it's a user-mode driver.

If this is a real driver (kernel mode), you're SOL. Vista x64 doesn't allow installing unsigned drivers. If this is just a user-mode DLL, you can get a fix by using any of the standard IPC mechanisms: pipes, sockets, out-of-proc COM, roughly in that order. It all operates at bus speeds, so as long as you can buffer enough data, the context switch overhead shouldn't hurt too much.

I would just use sockets. It would allow you to use it over IP if you need it in the future, and you won't be tied down to one messaging API. If in the future you wish to implement this on another OS or language, you can.

This article might be of interest. It discusses the problem and then suggests using COM as a solution. I'm not a big fan of COM but given its ubiquity in the Windows universe, it's possible that it might be efficient enough. You will probably want to architect your solution so that you can batch data (you don't want to do one COM call for each item of data).

Elegant? C++? DCOM/RPC calls to yourself might work, or you could create a named pipe and use that to talk between the two processes (maybe create a "CMessage class" or something), though watch out for different structure alignment between x86 and x64.
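One way to sidestep the structure-alignment issue is to never send a raw struct over the pipe, and instead write each field at a fixed width in a fixed byte order. A minimal sketch, assuming a hypothetical message header with a type and a payload size (MessageHeader, encode_header, and decode_header are made-up names, and little-endian is an arbitrary choice):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical header exchanged between the 32-bit and 64-bit processes.
// sizeof(MessageHeader) may differ between compilers and targets because of
// padding, so the struct itself is never copied onto the wire.
struct MessageHeader {
    std::uint32_t type;
    std::uint64_t payload_size;
};

constexpr std::size_t kHeaderWireSize = 12;  // 4 + 8 bytes, no padding

// Serializes the header field by field, least significant byte first.
std::array<std::uint8_t, kHeaderWireSize> encode_header(const MessageHeader& h) {
    std::array<std::uint8_t, kHeaderWireSize> out{};
    for (int i = 0; i < 4; ++i) {
        out[i] = static_cast<std::uint8_t>(h.type >> (8 * i));
    }
    for (int i = 0; i < 8; ++i) {
        out[4 + i] = static_cast<std::uint8_t>(h.payload_size >> (8 * i));
    }
    return out;
}

// Rebuilds the header from the 12 wire bytes produced by encode_header.
MessageHeader decode_header(const std::array<std::uint8_t, kHeaderWireSize>& in) {
    MessageHeader h{0, 0};
    for (int i = 0; i < 4; ++i) {
        h.type |= static_cast<std::uint32_t>(in[i]) << (8 * i);
    }
    for (int i = 0; i < 8; ++i) {
        h.payload_size |= static_cast<std::uint64_t>(in[4 + i]) << (8 * i);
    }
    return h;
}
```

Both processes compile the same two functions, so the wire layout stays identical regardless of how each compiler pads the struct in memory.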

If the driver does turn out to be a real driver, nobugz is almost right -- you're going to have to work a lot harder, you're not completely SOL. One solution is to install Win32 on some other machine (or virtual machine) and then use some form of RPC, such as sockets (as suggested by Pyrolistical) or UDP or MQ or even Tibco Rendezvous (which claims to support very high throughput in order to handle the volumes of data generated by the financial markets -- at least that's what I remember from back in the old days).

A memory-mapped file shared by both sides would have the same contents. The OS will have to do some interesting pointer stuff to make it happen, but quite likely will be able to set up the two views in such a way that you're not physically copying memory around. Zero copies is about as good as it gets.
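Once both processes see the same memory, they still need an agreed layout for it. Below is a rough sketch of a single-producer, single-consumer byte ring that could sit at the start of such a mapping. This is one possible design, not something prescribed above: SharedRing, ring_write, and ring_read are hypothetical names, the capacity is arbitrary, and it assumes std::atomic<std::uint32_t> is lock-free and laid out identically in both builds. Creating the mapping itself (and constructing the struct in it once) is platform-specific and not shown.

```cpp
#include <atomic>
#include <cstdint>

// Layout intended to live at the start of the shared mapping. Only
// fixed-width fields are used so the 32-bit and 64-bit views agree.
struct SharedRing {
    static constexpr std::uint32_t kCapacity = 4096;  // power of two
    std::atomic<std::uint32_t> head{0};  // total bytes written (producer only)
    std::atomic<std::uint32_t> tail{0};  // total bytes read (consumer only)
    std::uint8_t data[kCapacity];
};

// Producer side: copies in up to `len` bytes and returns how many fit.
std::uint32_t ring_write(SharedRing& r, const std::uint8_t* src, std::uint32_t len) {
    const std::uint32_t head = r.head.load(std::memory_order_relaxed);
    const std::uint32_t tail = r.tail.load(std::memory_order_acquire);
    const std::uint32_t free_space = SharedRing::kCapacity - (head - tail);
    if (len > free_space) len = free_space;
    for (std::uint32_t i = 0; i < len; ++i) {
        r.data[(head + i) % SharedRing::kCapacity] = src[i];
    }
    // Publish the new bytes only after they have been written.
    r.head.store(head + len, std::memory_order_release);
    return len;
}

// Consumer side: copies out up to `len` bytes and returns how many were read.
std::uint32_t ring_read(SharedRing& r, std::uint8_t* dst, std::uint32_t len) {
    const std::uint32_t tail = r.tail.load(std::memory_order_relaxed);
    const std::uint32_t head = r.head.load(std::memory_order_acquire);
    const std::uint32_t available = head - tail;
    if (len > available) len = available;
    for (std::uint32_t i = 0; i < len; ++i) {
        dst[i] = r.data[(tail + i) % SharedRing::kCapacity];
    }
    // Hand the consumed space back to the producer.
    r.tail.store(tail + len, std::memory_order_release);
    return len;
}
```

In this arrangement the 32-bit process that talks to the driver would be the producer and the 64-bit application the consumer; some separate signalling (an event or semaphore, for instance) is still needed if the consumer should sleep rather than poll.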
