I want to make a program for a few people, building a separate copy for each person, so that if someone gives their copy to somebody else, that other person can't use it.
How can I do that, without any internet connection?
Does every PC have some unique ID or something like that? I could give each person a small program that reads it and sends it to me, and then in my program I'd check whether it matches; if not, the program would stop.
Would something like a HWID work?
Is the hardware ID unique and impossible to change?
If so, how can I get it? I found a lot of similar questions, but none with a good answer.
Take a look at these:
Uniquely identify PC based on software/hardware
C++ API : license management to protect a software
Generating a Hardware-ID on Windows
Restrict functionality to a certain computer
How to get unique hardware/software signature from a windows pc in c/c++
If you want something a bit harder to spoof than whatever the machine itself can tell you, you'll probably need a USB dongle dedicated for this purpose.
There are several ways to identify the computer a program is running on:
WMI - Windows provides a set of classes for most hardware enumeration and identification tasks, known as WMI (Windows Management Instrumentation). These are extensions to the Windows Driver Model (WDM).
CPU ID - The solution that seems to be the best choice is to sample the CPU's unique identification number (CPU ID). However, several problems make it impossible to rely on reading the CPU ID.
To begin with, most CPUs, with the exception of the old Pentium III, don't have a unique CPU serial number. Intel removed this feature for privacy reasons.
It is still possible to generate a unique ID from the motherboard as a whole. That certainly works but the huge number of different types of motherboards and manufacturers makes it next to impossible to generate a unique ID that will cover all of them.
MAC address based hardware ID
The next choice for obtaining such a unique ID would be sampling the MAC address. To begin with, what is the "MAC address"? It stands for Media Access Control. The MAC address is 48 bits long (6 bytes). The GetMACAddress code sample explains how to obtain the MAC address.
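For reference, here is a minimal sketch of reading the MAC addresses with the iphlpapi GetAdaptersInfo() call (this is not the GetMACAddress sample mentioned above, which isn't reproduced here; link against iphlpapi.lib, error handling trimmed):
// Enumerate network adapters and print each MAC address.
#include <winsock2.h>
#include <iphlpapi.h>
#include <cstdio>
#include <vector>

int main() {
    ULONG size = 0;
    GetAdaptersInfo(nullptr, &size);              // first call only asks for the buffer size
    std::vector<char> buffer(size);
    auto* info = reinterpret_cast<IP_ADAPTER_INFO*>(buffer.data());
    if (GetAdaptersInfo(info, &size) != ERROR_SUCCESS)
        return 1;
    for (IP_ADAPTER_INFO* a = info; a != nullptr; a = a->Next) {
        printf("%s: ", a->Description);
        for (UINT i = 0; i < a->AddressLength; ++i)
            printf("%02X%s", a->Address[i], i + 1 < a->AddressLength ? "-" : "\n");
    }
    return 0;
}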
However, there is one problem with this approach: the MAC address can be easily changed into a new one...
Hard Drive serial number
It seems that the only reliable solution for obtaining a machine ID would be using the serial number of the main Hard Drive. The second example, GetHDSerialNumber, shows how to obtain this ID. From my experience, this approach is the best one and the most reliable for generating a unique machine based hardware ID.
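As a rough illustration (not the GetHDSerialNumber sample itself), the snippet below reads the volume serial number with GetVolumeInformationA(). Note that this is the serial assigned when the volume was formatted, not the manufacturer's drive serial, so it changes if the drive is reformatted:
// Read the serial number of the C: volume.
#include <windows.h>
#include <cstdio>

int main() {
    DWORD serial = 0;
    if (GetVolumeInformationA("C:\\", nullptr, 0, &serial, nullptr, nullptr, nullptr, 0))
        printf("Volume serial: %08X\n", serial);
    return 0;
}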
See also this article.
Like others have said, it is really hard to do this reliably.
You CAN use things like hardware dongles or licensing software to try to restrict use. For anyone sufficiently motivated this is a speed bump, not much more.
Another aspect of this is that the more secure you try to make it, the higher the risk that it'll be too restrictive. That is, it might end up accidentally blocking legitimate use, which is a really bad thing to do if you want to keep users happy.
This was tried many, many times when PCs became popular. Each time it was a dismal failure. It even interferes with rights the law grants the user (keeping backup copies). It also turned out that the hassle was enough for many users to simply not use the "copy protected" programs.
Today this is done successfully by the various gaming consoles, but there the provider of the console has very tight control over the machine and the software. By design, those can't be used as regular computing platforms by the user; they are single-purpose, with no wide range of software available.
The only ones to pull off this feat on regular machines have been expensive programs like Matlab or AutoCAD, mostly through some sort of "license server" under tight control of the network administrator, tied to the specific server on which it runs by some long-winded procedure. And even so, it isn't too hard to get pirated ("unlocked") copies.
I read an interesting paper, entitled "A High-Resolution Side-Channel Attack on Last-Level Cache", and wanted to find out the index hash function for my own machine—i.e., Intel Core i7-7500U (Kaby Lake architecture)—following the leads from this work.
To reverse-engineer the hash function, the paper mentions the first step as:
for (n=16; ; n++)
{
    // ignore any miss on first run
    for (fill=0; !fill; fill++)
    {
        // set pmc to count LLC miss
        reset_pmc();
        for (a=0; a<n; a++)
            // set_count*line_size=2^19
            load(a*2^19);
    }
    // get the LLC miss count
    if (read_pmc()>0)
    {
        min = n;
        break;
    }
}
How can I code the reset_pmc() and read_pmc() in C++? From all that I read online so far, I think it requires inline assembly code, but I have no clue what instructions to use to get the LLC miss count. I would be obliged if someone can specify the code for these two steps.
I am running Ubuntu 16.04.1 (64-bit) on VMware workstation.
P.S.: I found mention of these LONGEST_LAT_CACHE.REFERENCES and LONGEST_LAT_CACHE.MISSES in Chapter-18 Volume 3B of the Intel Architectures Software Developer's Manual, but I do not know how to use them.
You can use perf as Cody suggested to measure the events from outside the code, but I suspect from your code sample that you need fine-grained, programmatic access to the performance counters.
To do that, you need to enable user-mode reading of the counters, and also have a way to program them. Since those are restricted operations, you need at least some help from the OS kernel. Rolling your own solution is going to be pretty difficult, but luckily there are several existing solutions for Ubuntu 16.04:
Andi Kleen's jevents library, which among other things lets you read PMU events from user space. I haven't personally used this part of pmu-tools, but the stuff I have used has been high quality. It seems to use the existing perf_events syscalls for counter programming, so it doesn't need a kernel module.
The libpfc library is a from-scratch implementation of a kernel module and userland code that allows userland reading of the performance counters. I've used this and it works well. You install the kernel module which allows you to program the PMU, and then use the API exposed by libpfc to read the counters from userspace (the calls boil down to rdpmc instructions). It is the most accurate and precise way to read the counters, and it includes "overhead subtraction" functionality which can give you the true PMU counts for the measured region by subtracting out the events caused by the PMU read code itself. You need to pin to a single core for the counts to make sense, and you will get bogus results if your process is interrupted.
Intel's open-sourced Processor Counter Monitor library. I haven't tried this on Linux, but I used its predecessor library, the very similarly named[1] Performance Counter Monitor, on Windows, and it worked. On Windows it needs a kernel driver, but on Linux it seems you can either use a driver or have it go through perf_events.
Use the likwid library's Marker API functionality. Likwid has been around for a while and seems well supported. I have used likwid in the past, but only to measure whole processes in a manner similar to perf stat, not with the marker API. To use the marker API you still need to run your process as a child of the likwid measurement process, but you can programmatically read the counter values within your process, which is what you need (as I understand it). I'm not sure how likwid sets up and reads the counters when the marker API is used.
So you've got a lot of options! I think all of them could work, but I can personally vouch for libpfc since I've used it myself for the same purpose on Ubuntu 16.04. The project is actively developed and probably the most accurate (least overhead) of the above. So I'd probably start with that one.
All of the solutions above should be able to work for Kaby Lake, since the functionality of each successive "Performance Monitoring Architecture" seems to generally be a superset of the prior one, and the API is generally preserved. In the case of libpfc, however, the author has restricted it to only support Haswell's architecture (PMA v3), but you just need to change one line of code locally to fix that.
[1] Indeed, they are both commonly called by their acronym, PCM, and I suspect that the new project is simply the officially open-sourced continuation of the old PCM project (which was also available in source form, but without a mechanism for community contribution).
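Whichever option you pick, the underlying mechanism looks roughly like the sketch below, which uses the raw perf_event_open interface (none of the libraries above) to build reset_pmc()/read_pmc()-style helpers around the generic cache-miss event. How that event maps to the underlying LONGEST_LAT_CACHE counter varies per CPU, so treat this as an approximation of the paper's pseudocode, not an exact replacement:
// Count LLC misses for the calling thread via perf_event_open (Linux only).
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstring>
#include <cstdio>
#include <cstdint>

static int open_llc_miss_counter() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;   // generic "last-level cache miss" event
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    // pid = 0 (this process), cpu = -1 (any), no group, no flags
    return static_cast<int>(syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0));
}

static void reset_pmc(int fd) {
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

static uint64_t read_pmc(int fd) {
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    uint64_t count = 0;
    if (read(fd, &count, sizeof(count)) != sizeof(count))
        count = 0;
    return count;
}

int main() {
    int fd = open_llc_miss_counter();
    if (fd < 0) { perror("perf_event_open"); return 1; }
    reset_pmc(fd);
    // ... run the eviction-set / pointer-chasing loop here ...
    printf("LLC misses: %llu\n", static_cast<unsigned long long>(read_pmc(fd)));
    close(fd);
    return 0;
}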
I would use PAPI, see http://icl.cs.utk.edu/PAPI/
This is a cross-platform solution that has a lot of support, especially from the HPC community.
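A rough sketch of what PAPI's low-level API looks like, assuming the PAPI_L3_TCM preset (L3 total cache misses) is available on your machine (link with -lpapi):
// Count L3 cache misses around a region of code with PAPI.
#include <papi.h>
#include <cstdio>

int main() {
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;
    int event_set = PAPI_NULL;
    PAPI_create_eventset(&event_set);
    PAPI_add_event(event_set, PAPI_L3_TCM);     // preset: L3 total cache misses
    PAPI_start(event_set);
    // ... code to measure ...
    long long misses = 0;
    PAPI_stop(event_set, &misses);
    printf("L3 misses: %lld\n", misses);
    return 0;
}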
I have a dilemma. I do not know what the best approach to the following scenario is, or whether it makes sense to invest time in developing a kernel module.
I have hardware (an FPGA) that is exposed as many modules (around 30). Each module is defined by:
Base address of the module;
Field offsets (from the base address);
The maximum number of fields per module is around 10;
Each field has its own type, like uint32_t, float32_t, uint32_t[], etc.;
Some fields are read/write and others read-only;
Usually a module is usable as is; I mean that it is not necessary to implement any logic to check whether it is possible to write to a field (except in a few cases).
On the target device there is a custom Linux distribution (built from Yocto).
What do you think is better?
1. An application in user space that uses mmap (on /dev/mem, to map all modules) and then reads/writes directly from/to memory. I have a C++ implementation and it is working, but maybe it is not the best solution: I need to set all offsets manually, use many reinterpret_cast<> calls to read data properly, and if something is wrong the application crashes.
2. Implement a character device driver that exposes each module as /dev/module1, /dev/module2, etc., and use open/read/write/release/ioctl from user space. I have just started to read a huge manual about Linux kernel development, and I am not so sure a character device is a good idea here, especially regarding how to expose so many modules with so many fields to user space.
3. Other.
Thank you a lot for any ideas.
Using /dev/mem is quite straightforward, however it also causes some serious security issues. You either have to run your application as root or make the /dev/mem file accessible for other users, which are both unwelcome in designs that at some point will become products. If a malicious process can access the /dev/mem file it can possibly access any secret stored in RAM or corrupt any application - including the kernel itself. Even if your application is the only one able to access this file, any security concern of your code becomes the security concern of the whole system.
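For reference, here is a minimal sketch of what that /dev/mem approach typically looks like; the base address and offset below are placeholders, not your real module map, and running it requires root (or relaxed /dev/mem permissions), which is exactly the concern above:
// Map an FPGA module's register window through /dev/mem and access one field.
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

constexpr off_t  FPGA_BASE  = 0x43C00000;   // hypothetical module base address
constexpr size_t MAP_SIZE   = 0x1000;
constexpr size_t REG_OFFSET = 0x08;         // hypothetical field offset

int main() {
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }
    void* base = mmap(nullptr, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, FPGA_BASE);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    auto* reg = reinterpret_cast<volatile uint32_t*>(
        static_cast<volatile uint8_t*>(base) + REG_OFFSET);
    printf("field = 0x%08x\n", *reg);   // read a 32-bit field
    *reg = 0x1;                         // write it (if the field is read/write)

    munmap(base, MAP_SIZE);
    close(fd);
    return 0;
}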
Preparing the driver is obviously not an easy task, but it allows you to separate the (usually simple) privileged code from the applications in user space. In the simplest case you only have to provide some register read and write methods (through ioctl). These should check that the address is well aligned and constrained to the device address space. Additionally, the driver usually performs any additional address translation, so the client application does not need to know at which physical address the device was mapped (which is the case e.g. with PCI Express).
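A sketch of what the user-space side of such a driver could look like; the device node name, ioctl numbers and request struct below are purely hypothetical and would normally be defined in a header shared with the kernel module, which does the bounds and alignment checks:
// Hypothetical client for an ioctl-based FPGA register driver.
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

struct reg_request {            // shared with the (hypothetical) kernel module
    uint32_t offset;            // field offset within the module's register window
    uint32_t value;             // value read, or value to be written
};

#define FPGA_IOC_MAGIC 'f'
#define FPGA_REG_READ  _IOWR(FPGA_IOC_MAGIC, 1, reg_request)
#define FPGA_REG_WRITE _IOW(FPGA_IOC_MAGIC, 2, reg_request)

int main() {
    int fd = open("/dev/module1", O_RDWR);      // one node per module, as suggested
    if (fd < 0) { perror("open"); return 1; }
    reg_request req{0x08, 0};
    if (ioctl(fd, FPGA_REG_READ, &req) == 0)    // driver validates offset/alignment
        printf("field@0x%x = 0x%08x\n", req.offset, req.value);
    close(fd);
    return 0;
}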
I would not recommend writing the driver from scratch, but rather repurposing some existing code. In the mentioned case of PCI Express I have used two sources of inspiration: the Xilinx driver described here: https://www.xilinx.com/support/answers/65444.html (sources included) and the more complicated 'pcieuni' and 'gpcieuni' drivers from the ChimeraTK project (https://github.com/ChimeraTK).
I was wondering if it is possible to run an executable program across several computers without adding anything to its source code, for example running a game. When I was programming in C# I noticed the Process class, which lets you start or close any application or process. I was wondering whether there is something similar in C++ that would let me transfer the processes of any executable file or game to other computers or servers, minimizing my own computer's processor consumption.
Thanks.
Everything is possible, but this would require a huge amount of work and would almost certainly make your program painfully slower (I'm talking about a factor of millions or billions here). Essentially you would need to make sure every layer the program uses allows this, so you'd have to rewrite the OS to be able to do it, and also quite a few of the libraries it uses.
Why? Let's assume you want to distribute actual threads over different machines. It would be slightly easier if they were actual processes, but I'd be surprised if many applications worked like that.
To begin with, you need to synchronize the memory, more specifically all non-thread-local storage, which often means 'all memory' because not all languages have a thread-aware memory model. Of course, this can be optimized, for example by buffering everything until you encounter an 'atomic' read or write, if your system has such a concept. Now, can you imagine every thread blocking for a few seconds of synchronization whenever a thread has to be locked/unlocked or an atomic variable has to be read/written?
Next to that there are the issues related to managing devices. Assume you need a network connection: which device will start it, how will the IP be chosen, ...? To seamlessly solve this you probably need a virtual device shared amongst all platforms. This has to happen for network devices, filesystems, printers, monitors, ... . And since you mention games: this would have to happen for the GPU as well; just imagine the performance impact of only sending data to and from the GPU (hint: even 16x PCI-E is often already a bottleneck).
In conclusion: this is not feasible, if you want a clustered application, you have to build it into the application from scratch.
I believe the closest thing you can do is MapReduce: it's a paradigm that will hopefully be part of the official Boost library soon. However, I don't think that you would want to apply it to a real-time application like a game.
A related question may provide more answers: https://stackoverflow.com/questions/2168558/is-there-anything-like-hadoop-in-c
But as KillianDS pointed out, there is no automagical way to do this, nor does it seem like there is a feasible way to do it. So what is the exact problem that you're trying to solve?
The current state of research is into practical means of distributing the work of a process across multiple CPU cores on a single computer. In that case, the processors still share RAM. This is essential: RAM latencies are measured in nanoseconds.
In distributed computing, remote memory access can take tens if not hundreds of microseconds. Distributed algorithms explicitly take this into account. No amount of magic can make this disappear: light itself is slow.
The Plan 9 OS from AT&T Bell Labs supports distributed computing in the most seamless and transparent manner. Plan 9 was designed to take the Unix ideas of breaking jobs into interoperating small tasks performed by highly specialised utilities, "everything is a file", and the client/server model to a whole new level. It has the idea of a CPU server which performs computations for less powerful networked clients. Unfortunately the idea was too ambitious and way ahead of its time, and Plan 9 remained largely a research project. It is still being developed as open source software, though.
MOSIX is another distributed OS project that provides a single process space over multiple machines and supports transparent process migration. It allows processes to become migratable without any changes to their source code as all context saving and restoration are done by the OS kernel. There are several implementations of the MOSIX model - MOSIX2, openMosix (discontinued since 2008) and LinuxPMI (continuation of the openMosix project).
ScaleMP is yet another commercial Single System Image (SSI) implementation, mainly targeted at data processing and High Performance Computing. It not only provides transparent migration between the nodes of a cluster but also provides emulated shared memory (known as Distributed Shared Memory). Basically it transforms a bunch of computers, connected via a very fast network, into a single big NUMA machine with many CPUs and a huge amount of memory.
None of these would allow you to launch a game on your PC and have it transparently migrated and executed somewhere on the network. Besides, most games are GPU-intensive rather than CPU-intensive - most games still don't even utilise the full computing power of multicore CPUs. We have a ScaleMP cluster here and it doesn't run Quake very well...
It seems that all the major investment banks use C++ in Unix (Linux, Solaris) for their low latency/high frequency server applications. Why is Windows generally not used as a platform for this? Are there technical reasons why Windows can't compete?
The performance requirements on the extremely low-latency systems used for algorithmic trading are extreme. In this environment, microseconds count.
I'm not sure about Solaris, but in the case of Linux, these guys are writing and using low-latency patches and customisations for the whole kernel, from the network card drivers on up. It's not that there's a technical reason why that couldn't be done on Windows, but there is a practical/legal one - access to the source code, and the ability to recompile it with changes.
Technically, no. However, there is a very simple business reason: the rest of the financial world runs on Unix. Banks run on AIX, the stock market itself runs on Unix, and therefore, it is simply easier to find programmers in the financial world that are used to a Unix environment, rather than a Windows one.
(I've worked in investment banking for 8 years)
In fact, quite a lot of what banks call low latency is done in Java. And not even Real Time Java - just normal Java with the GC turned off. The main trick here is to make sure you've exercised all of your code enough for the JIT to have run before you switch a particular VM into prod (so you have some startup looping that runs for a couple of minutes - and hot failover).
The reasons for using Linux are:
Familiarity
Remote administration is still better, and also low impact - it will have a minimal effect on the other processes on the machine. Remember, these systems are often co-located at the exchange, so the links to the machines (from you/your support team) will probably be worse than those to your normal datacentres.
Tunability - the ability to set swappiness to 0, get the JVM to preallocate large pages, and other low level tricks is quite useful.
I'm sure you could get Windows to work acceptably, but there is no huge advantage to doing so - as others have said, any employees you poached would have to rediscover all their latency busting tricks rather than just run down a checklist.
Linux/UNIX are much more usable for concurrent remote users, making it easier to script around the systems, use standard tools like grep/sed/awk/perl/ruby/less on logs... ssh/scp... all that stuff's just there.
There are also technical issues. For example, to measure elapsed time on Windows you can choose between a set of functions based on the Windows clock tick and the hardware-based QueryPerformanceCounter(). The former increments every 10 to 16 milliseconds (note: some documentation implies more precision - e.g. the values from GetSystemTimeAsFileTime() are measured in 100ns units, but they report the same 100ns edge of the clock tick until it ticks again). The latter - QueryPerformanceCounter() - has show-stopping issues where different cores/CPUs can report clocks-since-startup that differ by several seconds, due to being warmed up at different times during system boot. MSDN documents this as a possible BIOS bug, but it's common. So, who wants to develop low-latency trading systems on a platform that can't be instrumented properly? (There are solutions, but you won't find any software ones sitting conveniently in Boost or ACE.)
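For illustration, here is a minimal usage sketch of both timing APIs mentioned above; it shows the calls, but it does not work around the cross-core drift problem described:
// Compare tick-based time with QueryPerformanceCounter-based elapsed time.
#include <windows.h>
#include <cstdio>

int main() {
    FILETIME ft;
    GetSystemTimeAsFileTime(&ft);              // 100ns units, but only advances once per clock tick

    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&t0);
    // ... code being timed ...
    QueryPerformanceCounter(&t1);
    double us = (t1.QuadPart - t0.QuadPart) * 1e6 / freq.QuadPart;
    printf("elapsed: %.3f us\n", us);
    return 0;
}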
Many Linux/UNIX variants have lots of easily tweakable parameters to trade off latency for a single event against average latency under load, time slice sizes, scheduling policies etc.. On open source Operating Systems, there's also the assurance that comes with being able to refer to the code when you think something should be faster than it is, and the knowledge that a (potentially huge) community of people have been and are doing so critically - with Windows it's obviously mainly going to be the people who're assigned to look at it.
On the FUD/reputation side - somewhat intangible but an important part of the reasons for OS selection - I think most programmers in the industry would just trust Linux/UNIX more to provide reliable scheduling and behaviour. Further, Linux/UNIX has a reputation for crashing less, though Windows is pretty reliable these days, and Linux has a much more volatile code base than Solaris or FreeBSD.
The reason is simple: 10-20 years ago, when such systems emerged, "hardcore" multi-CPU servers ran ONLY on some sort of UNIX. Windows NT was in kindergarten in those days. So the reason is "historical".
Modern systems might be developed on Windows; it's just a matter of taste these days.
PS: I am currently working on one such system :-)
I partially agree with most of the answers above. Though what I have realized is that the biggest reason to use C++ is that it is relatively fast and comes with a very vast STL library.
Apart from that, Linux/UNIX systems are also used to boost performance. I know many low-latency teams that go to the extent of tweaking the Linux kernel. Obviously this level of freedom is not provided by Windows.
Other reasons like legacy systems, license cost and resources count as well, but are lesser driving factors. As "rjw" mentioned, I have seen teams use Java as well, with a modified JVM.
There are a variety of reasons, but the reason is not only historical. In fact, it seems as if more and more server-side financial applications run on *nix these days than ever before (including big names like the London Stock Exchange, who switched from a .NET platform). For client-side or desktop apps, it would be silly to target anything other than Windows, as that is the established platform. However, for server-side apps, most places that I have worked at deploy to *nix.
I second the points about history and access to kernel manipulation.
Apart from those reasons, I also believe that, just as teams turn off garbage collection in .NET (and the similar mechanism in Java) when using those technologies for low latency, they may avoid Windows because of its high-level APIs, which sit between applications and the low-level OS and the kernel.
The core is of course the kernel, which is reached through the low-level OS interfaces. The high-level APIs are provided just to make the common user's life easier, but for low latency they turn out to be an extra layer costing fractions of a second around each operation, so cutting them out is an attractive way of gaining those fractions of a second back per call.
Another thing to consider is integration: most servers, data centres and exchanges use UNIX, not Windows, so using clients from the same family makes integration and communication easier.
Then you have security (many people might not agree with this point, though): hacking UNIX is not as easy as hacking Windows. I don't agree that licensing is the issue for banks, because they shower money on every single piece of hardware and software and on the people who customize them, so buying licenses is not that big an issue compared with what they gain by purchasing.
What is the best way to generate a unique hardware ID on Microsoft Windows with C++ that is not easily spoofable (with for example changing the MAC Address)?
Windows stores a unique Guid per machine in the registry at:
HKEY_LOCAL_MACHINE\Software\Microsoft\Cryptography\MachineGuid
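A minimal sketch of reading that value with the registry API (error handling trimmed; note that a 32-bit process may need the 64-bit registry view):
// Read the per-machine MachineGuid from the registry.
#include <windows.h>
#include <cstdio>

int main() {
    char guid[64] = {0};
    DWORD size = sizeof(guid);
    LONG rc = RegGetValueA(HKEY_LOCAL_MACHINE,
                           "SOFTWARE\\Microsoft\\Cryptography",
                           "MachineGuid",
                           RRF_RT_REG_SZ, nullptr, guid, &size);
    if (rc == ERROR_SUCCESS)
        printf("MachineGuid: %s\n", guid);
    return 0;
}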
This used to be the CPU serial number, but today there are many types of motherboards and this factor is no longer accurate. The MAC address can be easily forged. That leaves us with the internal hard drive serial number. See also: http://www.codeproject.com/Articles/319181/Haephrati-Searching-for-a-reliable-Hardware-ID
There are a variety of "tricks", but the only real "physical answer" is "no, there is no solution".
A "machine" is nothing more than a passive bus with some hardware around.
Although each piece of hardware can provide a somewhat usable identifier, every piece of hardware can be replaced by the user for whatever good or bad reason you can never be fully aware of (so if you base your functionality on this, you create problems for your users, and hence - as a consequence - for yourself, every time some hardware has to be replaced / reinitialized / reconfigured, etc.).
Now, if your problem is to identify a machine in a context where many machines have to interoperate, that is a role well played by MAC or IP addresses or hostnames. But be prepared for the idea that they are not necessarily constant over long periods (so avoid hard-coding them - instead, "discover" them at each start-up).
If your problem is instead to identify a software instance or a licence, you are probably better off concentrating on another kind of solution: you sell licences to "users" (it is the user who has the money, not their computer!), not to their "machines" (which users must be free to change whenever they need or like, without your permission, since you didn't license the hardware or the OS...). Hence your problem is not to identify a machine, but a USER (consider that the same machine can host many users and that the same user can work on a variety of machines; you cannot assume or impose a 1:1 relation without running into some kind of problem sooner or later, when that assumption is found to no longer fit).
The idea should be to register the users on some reachable site, give them keys you generate, and check that the same user/key pair is not used concurrently more than an agreed number of times within a given time period. When violations exceed the limit, or keys become old, just block and wait for the user to renew.
As you can see, the answer mostly depends on the reason behind your question, more than from the question itself.
There are various IDs assigned to hardware that can be read and combined to form a machine key. For example, you could get the ID of the hard drive where the software is stored, the processor ID, etc. Some of these can be changed more easily than others, but part of the strength is in combining multiple pieces together that are not necessarily strong enough by themselves.
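Purely as an illustration of the combining idea, here is a sketch that folds several such strings into one key with FNV-1a; the input values are placeholders for whatever IDs you actually read:
// Combine several hardware strings into a single 64-bit machine key.
#include <cstdint>
#include <cstdio>
#include <string>

uint64_t fnv1a(const std::string& s, uint64_t h = 14695981039346656037ULL) {
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;      // FNV-1a 64-bit prime
    }
    return h;
}

int main() {
    // placeholder values standing in for IDs read from the machine
    std::string drive_serial = "WD-1234567890";
    std::string cpu_info     = "GenuineIntel Family 6 Model 142";
    std::string mac          = "00-1A-2B-3C-4D-5E";

    uint64_t key = fnv1a(mac, fnv1a(cpu_info, fnv1a(drive_serial)));
    printf("machine key: %016llx\n", static_cast<unsigned long long>(key));
    return 0;
}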
Here is a program (also available as DLL) that can read and show your computer/hardware ID: http://www.soft.tahionic.com/download-hdd_id/index.html
Use Win32 System HDS APIs.
Don't read the registry; that makes no sense at all.