Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
This is a bad title, but hopefully my description is clearer. I am managing a modeling and simulation application that is decades old. For the longest time we have been interested in writing some of the code to run on GPUs because we believe it will speed up the simulations (yes, we are very behind the times). We finally have the opportunity to do this (i.e. money), and so now we want to make sure we understand the consequences of doing this, specifically for sustaining the code. The problem is that since many of our users do not have high-end GPUs (at the moment), we would still need our code to support both normal processing and GPU processing (i.e. I believe we will now have two sets of code performing very similar operations). Has anyone had to go through this and have any lessons learned and/or advice that they would like to share? If it helps, our current application is developed in C++ and we are looking at going with NVIDIA and writing in CUDA for the GPU.
This is similar to writing a hand-crafted assembly version with vectorization or other assembly instructions, while maintaining a C/C++ version as well. There is a lot of long-term experience with doing this out there, and this advice is based on that. (My experience of doing this with GPUs is both shorter term, a few years, and smaller, a few cases.)
You will want to write unit tests.
The unit tests use the CPU implementations (because I have yet to find a situation where they are not simpler) to test the GPU implementations.
The test runs a few simulations/models, and asserts that the results are identical if possible. These run nightly, and/or with every change to the code base as part of the acceptance suite.
This ensures that neither code base goes "stale", as both are constantly exercised, and the two independent implementations actually help with maintenance of each other.
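The testing pattern described above can be sketched roughly as follows. This is only an illustration in Python: `simulate_reference` plays the role of the simple CPU code, `simulate_fast` is a stand-in for the GPU path (no GPU is involved here), and all names, the toy recurrence, and the tolerances are assumptions made for the example. Since CPU and GPU floating-point results are often not bit-identical, the comparison uses a tolerance rather than strict equality.

```python
import math
import random


def simulate_reference(values, steps):
    """Simple 'CPU' implementation: explicit loops, easy to read and trust."""
    state = list(values)
    for _ in range(steps):
        for i in range(len(state)):
            state[i] = 0.5 * state[i] + 1.0
    return state


def simulate_fast(values, steps):
    """Stand-in for the GPU path: same maths, computed in closed form."""
    decay = 0.5 ** steps
    # For x -> 0.5*x + 1 the closed form is x_n = 0.5^n * x_0 + 2 * (1 - 0.5^n).
    return [decay * v + 2.0 * (1.0 - decay) for v in values]


def results_match(expected, actual, rel_tol=1e-9, abs_tol=1e-12):
    """Element-wise comparison with a tolerance."""
    return len(expected) == len(actual) and all(
        math.isclose(e, a, rel_tol=rel_tol, abs_tol=abs_tol)
        for e, a in zip(expected, actual)
    )


def run_acceptance_check(seed=1234, size=100, steps=10):
    """Run both implementations on the same seeded input and compare them."""
    rng = random.Random(seed)  # fixed seed keeps the nightly run repeatable
    inputs = [rng.uniform(-10.0, 10.0) for _ in range(size)]
    return results_match(simulate_reference(inputs, steps),
                         simulate_fast(inputs, steps))
```

In a real code base the same check would call the C++ and CUDA entry points instead, with the tolerance chosen per model.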
Another approach is to run blended solutions. Sometimes running a mix of CPU and GPU is faster than one or the other, even if they are both solving the same problem.
When you have to switch technology (say, to a new GPU language, or to a distributed network of devices, or whatever new whiz-bang that shows up in the next 20 years), the "simpler" CPU implementation will be a life saver.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
My background is like this: embedded/C, then C++, then higher level OO languages (Java, Scala, Ruby, Groovy, etc.), and now I am doing a small project involving MSP430 microcontroller. Meanwhile, inspired by that, I am contemplating a number of potential pet embedded system projects (meshes and/or RTLS look appealing). So my question is focused primarily on MSP430 for now, though, as an aside, I'd love to have a broader picture, too, involving other microcontrollers.
I was a bit surprised finding out that, after so many years, I might need to go back to C, with its macros, naming conventions, and all. My brain used to be wired for C, but that was many, many years ago.
So what alternatives are available?
C++ feels much more agreeable to me, and, fortunately, seems doable: http://stonepile.fi/object-oriented-approach-to-embedded-programming-with-c/
So if I am to program C++, I just need to inline a lot, avoid virtual functions when possible, and I should be good, right? (at least, memory-wise; they did not benchmark for performance at the above link).
However, if it's so easy, why do people program C? I must be missing something.
The above link also seems to provide a wrapper library for pico]OS. Has anyone used picoOS on MSP430, how reliable is it, and how much resources does it take?
What are the pros and cons of Energia for a simple MSP430 project? I tried it, it seems very intuitive and self-documenting, but does it result in as neat a code under the hood? For instance, does Energia initialize unused GPIO to the off state to save energy? Does it initialize unused interrupts? What is the overhead in terms of memory and speed? Etc.
Edit: As a long-time Eclipse person, I'd love to use CCS. I saw that Energia sketches can be imported to CCS. Does it mean that CCS has full support for Energia and can be used as an Energia IDE?
Has anyone used Java Grinder http://hackaday.com/2014/02/10/java-grinder-spits-out-dspic-and-msp430-assembly-code/ ? It seems appealing, but because it spits out an Assembly and not C/C++ code, it's a bit scary to commit to it: what if I am locked into it and it's not ready for the prime time? If it generated C code, I could have easily dropped it if it did not work.
I mentioned Java and my question was deleted, as it's self-evident that other than grinder-like syntactic sugar (not that I mind syntactic sugar!), Java can't run on MSP430. I guess I'll ask another question re WHERE Java can run. This has already grown too long.
What other languages/environments are out there that fill the niche between low- and high-level languages?
You seem to have several questions here, so I shall go through them in the order you numbered them.
Most micros will indeed run C++ (assuming the manufacturer or an open source project provides a compiler back-end); however, you have to be wary of a number of drawbacks. C++ is less deterministic, in that it provides a significantly higher level of abstraction, which one likely does not want on a resource-constrained embedded system. By and large it is not needed either, as embedded systems are rarely powerful enough to usefully run the enormously complex algorithms that warrant a high-level language like C++. It is also likely to cause a wide range of hard-to-track bugs; given the difficulty of debugging code on an embedded system, bugs that are simple and easy to trace are very much nicer. Most importantly, however, the C++ standard libraries are enormous; they will use excessive RAM and very likely waste much of your limited memory space. Thus, even if you do use C++, you won't be able to use any of the techniques that make it powerful.
Simply put, I have not used it. Like any RTOS, it is useful if you want a slightly higher-level interface, but for a micro as tiny as the MSP430 it seems overkill. I cannot imagine you doing anything on there that warrants an RTOS; if you need multitasking, it would be better to provide simple cooperative tasking yourself.
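To make "simple cooperative tasking" concrete, here is a minimal sketch of the idea. It is written in Python purely for illustration (on an MSP430 the same structure would be a C loop calling task functions that return quickly); the task names and the `log` list are made up for the example.

```python
from collections import deque


def blink_task(log, name, times):
    """A task is a generator; each `yield` hands control back to the scheduler."""
    for i in range(times):
        log.append(f"{name}:{i}")
        yield


def run_round_robin(tasks):
    """Run each task up to its next yield, in turn, until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue            # task finished; drop it
        ready.append(task)      # still alive; goes to the back of the queue


log = []
run_round_robin([blink_task(log, "led", 2), blink_task(log, "adc", 3)])
```

There is no preemption here: a task that never yields starves the others, which is the trade-off accepted in exchange for not needing an RTOS.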
Unfortunately I have not used that platform either. However, given that it is based on Wiring, my guess is that it does not provide high levels of hardware-specific optimization; if you want that, I recommend using it for the bulk of your code but making calls into lower-level libraries when needed. Beyond that, however, it does provide a lovely, self-documenting interface, and I strongly encourage you to try it. It will also make your code many times easier to port if you switch to another micro later (many systems from many companies provide Wiring bindings).
You really answer this yourself here: it could be very powerful but is still very immature. I would avoid it purely because of this lock-in until it becomes more mature; then it is worth re-assessing.
Java works nicely on more powerful ARM chips; that is the only place I have seen it in wide use and implemented fairly efficiently on a micro (ARM provides hardware assistance specifically for Java). Other than this, Java is a poor fit for the micro world. At one point it appeared it might go somewhere, but this was largely unrealized; for now, C-like languages are the way to go for smaller micros.
Unfortunately there is not really a huge amount of choice other than C. My best recommendation is to use higher-level libraries like Wiring. That gives you slightly nicer interfaces without killing efficiency; otherwise there is little point in using a tiny micro if you need high levels of abstraction.
In summary, C does a fairly good job here, and I do not think there has been any motivation or effort to make a good replacement. Frankly, I largely feel this way too: C never became a bad language, and it is still well suited to small systems for the same reasons as before. It provides power, efficiency and predictability.
I hope this helps somewhat. If you have any queries, please leave a comment and I will see what I can do to help.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Hi, I want to program microcontrollers, but I don't really know how to do it or what I need in order to do it. I have no idea where to look for the information I need. I've been coding in Python for around 11 months now and I know the language well. I've used C++ in the past and I know quite a lot of that language too.
When programming microcontrollers, can a microcontroller be programmed with any coding language, or do microcontrollers only allow certain languages to be used with them?
I have an endless number of questions but I'm not going to ask them all. If someone could please point me in the right direction I would be very grateful. Thanks.
The problem is with the word "programmed". Microprocessors (you mean CPUs, right?) typically execute machine code that is specific to their hardware platform. Machine code is just bytes read from memory and interpreted in a special way. This is the lowest possible level at which processors can be programmed (and at one time they were programmed that way).
Now since programming processors this way is very inconvenient, the so-called "assembly languages" have been invented. Basically, they just define symbolic representations for machine codes and sets of rules for their interpretation. Then a special program, called an assembler (a kind of translator), takes a set of text files containing the definition of a program written in an assembly language and produces something which contains machine code and can be executed by the target processor. (The definition of this "something" is hard, and let's not digress.)
Now there's another level higher up: languages like C (and, to a lesser extent, C++), which try to abstract away the details of a particular hardware platform and allow you to concentrate on algorithms and data formats rather than on a particular processor. Obviously, this moves the knowledge of a particular H/W platform into the compiler, a program which takes the text of your program written in a high-level language and produces something runnable by a target processor.
Now there's another level higher up which includes languages which almost completely abstract you away from any particularities of a H/W platform. JavaScript which runs in your browser when you're reading Stack Overflow is a good example — the programs in it are still executed by the processor of the device running your browser but there are many complicated layers of code between those JS scripts and the processor.
By now you should see that there's no definitive answer to your question. If you would like to dabble with low-level code for the CPU on your bedroom PC then google for "x86 assembler", "intel assembler" etc. This is a good start. If you want to program some other processor, the search query to use would be similar. If, instead, you want to program some specialized processor like AVR, then start with that product's manuals, as they usually come with specialized tools.
If you are interested in getting hands-on with something basic, as you said ("like to do something basic to start like making an LED flash etc."):
Choose a basic microcontroller, say from the 8051 family; we will take the 89C51 (NXP or Atmel, depending on availability). Go through the user manual first; it will give you a brief idea of the overall architecture.
As for programming, you will likely find basic code for flashing an LED in the manual itself.
If you are using an NXP microcontroller, the Flash Magic software is freely available on the internet and you can download it.
In your IDE (like Keil), do not forget to create the ".hex" file after you are done with your coding.
Now open Flash Magic, load your .hex file into it, and burn the code you wrote onto your microcontroller.
Good Luck!!
Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
How do emulators work? When I see NES/SNES or C64 emulators, it astounds me.
Do you have to emulate the processor of those machines by interpreting its particular assembly instructions? What else goes into it? How are they typically designed?
Can you give any advice for someone interested in writing an emulator (particularly a game system)?
Emulation is a multi-faceted area. Here are the basic ideas and functional components. I'm going to break it into pieces and then fill in the details via edits. Many of the things I'm going to describe will require knowledge of the inner workings of processors -- assembly knowledge is necessary. If I'm a bit too vague on certain things, please ask questions so I can continue to improve this answer.
Basic idea:
Emulation works by handling the behavior of the processor and the individual components. You build each individual piece of the system and then connect the pieces much like wires do in hardware.
Processor emulation:
There are three ways of handling processor emulation:
Interpretation
Dynamic recompilation
Static recompilation
With all of these paths, you have the same overall goal: execute a piece of code to modify processor state and interact with 'hardware'. Processor state is a conglomeration of the processor registers, interrupt handlers, etc for a given processor target. For the 6502, you'd have a number of 8-bit integers representing registers: A, X, Y, P, and S; you'd also have a 16-bit PC register.
With interpretation, you start at the IP (instruction pointer -- also called PC, program counter) and read the instruction from memory. Your code parses this instruction and uses this information to alter processor state as specified by your processor. The core problem with interpretation is that it's very slow; each time you handle a given instruction, you have to decode it and perform the requisite operation.
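A fetch/decode/execute loop of the kind described above might look roughly like this. It is a deliberately tiny sketch: only four real 6502 opcodes are handled (LDA immediate, TAX, INX, BRK), BRK is treated as "halt" rather than as a software interrupt, and only the Z and N flags are modelled. The class and function names are invented for the example.

```python
from dataclasses import dataclass, field

FLAG_Z = 0x02   # zero flag bit in P
FLAG_N = 0x80   # negative flag bit in P


@dataclass
class Cpu6502:
    a: int = 0      # accumulator
    x: int = 0      # X index register
    p: int = 0      # status register (only Z and N are modelled here)
    pc: int = 0     # 16-bit program counter
    memory: bytearray = field(default_factory=lambda: bytearray(0x10000))


def set_zn(cpu, value):
    """Update the zero and negative flags from an 8-bit result."""
    cpu.p &= ~(FLAG_Z | FLAG_N) & 0xFF
    if value == 0:
        cpu.p |= FLAG_Z
    if value & 0x80:
        cpu.p |= FLAG_N


def step(cpu):
    """Fetch, decode and execute one instruction. Returns False on BRK."""
    opcode = cpu.memory[cpu.pc]
    cpu.pc = (cpu.pc + 1) & 0xFFFF
    if opcode == 0xA9:                  # LDA #imm
        cpu.a = cpu.memory[cpu.pc]
        cpu.pc = (cpu.pc + 1) & 0xFFFF
        set_zn(cpu, cpu.a)
    elif opcode == 0xAA:                # TAX
        cpu.x = cpu.a
        set_zn(cpu, cpu.x)
    elif opcode == 0xE8:                # INX
        cpu.x = (cpu.x + 1) & 0xFF
        set_zn(cpu, cpu.x)
    elif opcode == 0x00:                # BRK, simplified to "stop"
        return False
    else:
        raise NotImplementedError(f"opcode {opcode:#04x}")
    return True


def run(cpu, program, origin=0x0600):
    """Load `program` at `origin` and interpret until BRK."""
    cpu.memory[origin:origin + len(program)] = program
    cpu.pc = origin
    while step(cpu):
        pass
    return cpu
```

Note that every pass through `step` repeats the decode work, which is exactly the cost the paragraph above is pointing at.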
With dynamic recompilation, you iterate over the code much like interpretation, but instead of just executing opcodes, you build up a list of operations. Once you reach a branch instruction, you compile this list of operations to machine code for your host platform, then you cache this compiled code and execute it. Then when you hit a given instruction group again, you only have to execute the code from the cache. (BTW, most people don't actually make a list of instructions but compile them to machine code on the fly -- this makes it more difficult to optimize, but that's out of the scope of this answer, unless enough people are interested)
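As a toy illustration of the translate-once, cache, and re-execute idea, the sketch below "recompiles" guest code into Python source rather than host machine code, which is an assumption made to keep the example runnable anywhere. It handles a handful of real 6502 opcodes (LDA immediate, TAX, INX, JMP absolute, BRK), ignores flags, and ends a block at JMP or BRK. All names are hypothetical.

```python
class State:
    def __init__(self):
        self.a = 0
        self.x = 0
        self.pc = 0
        self.halted = False


def translate_block(memory, pc):
    """Translate guest code from `pc` to the end of its basic block."""
    start = pc
    lines = ["def block(s):"]
    while True:
        opcode = memory[pc]
        if opcode == 0xA9:                      # LDA #imm
            lines.append(f"    s.a = {memory[pc + 1]}")
            pc += 2
        elif opcode == 0xAA:                    # TAX
            lines.append("    s.x = s.a")
            pc += 1
        elif opcode == 0xE8:                    # INX
            lines.append("    s.x = (s.x + 1) & 0xFF")
            pc += 1
        elif opcode == 0x4C:                    # JMP abs ends the block
            target = memory[pc + 1] | (memory[pc + 2] << 8)
            lines.append(f"    s.pc = {target}")
            break
        elif opcode == 0x00:                    # BRK, simplified to "halt"
            lines.append("    s.halted = True")
            lines.append(f"    s.pc = {pc + 1}")
            break
        else:
            raise NotImplementedError(hex(opcode))
    namespace = {}
    code = compile("\n".join(lines), f"<block {start:#06x}>", "exec")
    exec(code, namespace)
    return namespace["block"]


def run(memory, start, max_blocks=1000):
    """Execute blocks from `start`, translating each block only once."""
    state, cache, translations = State(), {}, 0
    state.pc = start
    for _ in range(max_blocks):
        if state.halted:
            break
        block = cache.get(state.pc)
        if block is None:
            block = cache[state.pc] = translate_block(memory, state.pc)
            translations += 1
        block(state)
    return state, translations
```

For a loop such as `INX; JMP start`, the block is translated on the first pass and every later pass only calls the cached function.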
With static recompilation, you do the same as in dynamic recompilation, but you follow branches. You end up building a chunk of code that represents all of the code in the program, which can then be executed with no further interference. This would be a great mechanism if it weren't for the following problems:
Code that isn't in the program to begin with (e.g. compressed, encrypted, generated/modified at runtime, etc) won't be recompiled, so it won't run
It's been proven that finding all the code in a given binary is equivalent to the Halting problem
These combine to make static recompilation completely infeasible in 99% of cases. For more information, Michael Steil has done some great research into static recompilation -- the best I've seen.
The other side to processor emulation is the way in which you interact with hardware. This really has two sides:
Processor timing
Interrupt handling
Processor timing:
Certain platforms -- especially older consoles like the NES, SNES, etc -- require your emulator to have strict timing to be completely compatible. With the NES, you have the PPU (picture processing unit), which requires that the CPU put pixels into its memory at precise moments. If you use interpretation, you can easily count cycles and emulate proper timing; with dynamic/static recompilation, things are a /lot/ more complex.
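Cycle counting in an interpreter can be sketched as below. This is a timing-only illustration: instructions are not actually executed, only their lengths and cycle costs are consumed, and the PPU is advanced by three dots per CPU cycle (the NTSC NES ratio) with 341 dots per scanline and 262 scanlines per frame. The `Ppu` class, `TIMING` table and `run_timed` function are names made up for the example.

```python
PPU_DOTS_PER_CPU_CYCLE = 3   # NTSC NES: the PPU runs three dots per CPU cycle
DOTS_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262

# opcode -> (length in bytes, CPU cycles) for a few real 6502 instructions
TIMING = {
    0xEA: (1, 2),   # NOP
    0xA9: (2, 2),   # LDA #imm
    0xAD: (3, 4),   # LDA abs
}


class Ppu:
    def __init__(self):
        self.dot = 0         # position within the current scanline
        self.scanline = 0

    def tick(self, dots):
        """Advance the PPU by `dots` dots, wrapping scanlines and frames."""
        for _ in range(dots):
            self.dot += 1
            if self.dot == DOTS_PER_SCANLINE:
                self.dot = 0
                self.scanline = (self.scanline + 1) % SCANLINES_PER_FRAME


def run_timed(program, ppu):
    """Walk a straight-line program, keeping the PPU in step with the CPU."""
    pc = 0
    cycles = 0
    while pc < len(program):
        length, cost = TIMING[program[pc]]
        pc += length            # the instruction's own effect is omitted here
        cycles += cost
        ppu.tick(cost * PPU_DOTS_PER_CPU_CYCLE)
    return cycles
```

Because the interpreter knows the cost of each instruction as it executes it, the PPU can be caught up after every instruction; a recompiler has to preserve that bookkeeping inside generated code, which is where the extra complexity comes from.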
Interrupt handling:
Interrupts are the primary mechanism by which the CPU communicates with hardware. Generally, your hardware components will tell the CPU which interrupts they care about. This is pretty straightforward -- when your code raises a given interrupt, you look at the interrupt handler table and call the proper callback.
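The handler-table lookup can be sketched like this. It is a simplified model (no priorities, masking, or stack pushes), and the `InterruptController` class and its method names are invented for illustration.

```python
class InterruptController:
    def __init__(self):
        self.handlers = {}   # interrupt number -> callback
        self.pending = []    # interrupts raised but not yet serviced

    def register(self, irq, handler):
        """A device (or the emulated OS) declares interest in an interrupt."""
        self.handlers[irq] = handler

    def raise_irq(self, irq):
        """Called by emulated hardware when something happens."""
        self.pending.append(irq)

    def service(self, cpu_state):
        """Called by the CPU loop between instructions."""
        while self.pending:
            irq = self.pending.pop(0)
            handler = self.handlers.get(irq)
            if handler is not None:
                handler(cpu_state)   # unhandled interrupts are simply dropped
```

A CPU loop would call `service` once per instruction (or once per block), so that devices never run handler code in the middle of an instruction.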
Hardware emulation:
There are two sides to emulating a given hardware device:
Emulating the functionality of the device
Emulating the actual device interfaces
Take the case of a hard-drive. The functionality is emulated by creating the backing storage, read/write/format routines, etc. This part is generally very straightforward.
The actual interface of the device is a bit more complex. This is generally some combination of memory mapped registers (e.g. parts of memory that the device watches for changes to do signaling) and interrupts. For a hard-drive, you may have a memory mapped area where you place read commands, writes, etc, then read this data back.
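A memory-mapped interface can be sketched as a bus that routes each address to whichever device owns it. The disk controller below is entirely hypothetical (three one-byte registers: sector select, command, data), as are the addresses and class names; real devices define their own register layouts.

```python
class Ram:
    def __init__(self, size):
        self.data = bytearray(size)

    def read(self, offset):
        return self.data[offset]

    def write(self, offset, value):
        self.data[offset] = value & 0xFF


class DiskController:
    """Hypothetical device: register 0 = sector, 1 = command, 2 = data out."""
    SECTOR, COMMAND, DATA = 0, 1, 2
    CMD_READ = 1

    def __init__(self, sectors):
        self.sectors = sectors   # backing storage, one byte per sector here
        self.sector = 0
        self.latched = 0

    def read(self, offset):
        return self.latched if offset == self.DATA else 0

    def write(self, offset, value):
        if offset == self.SECTOR:
            self.sector = value
        elif offset == self.COMMAND and value == self.CMD_READ:
            self.latched = self.sectors[self.sector]


class Bus:
    def __init__(self):
        self.regions = []   # (start address, size, device)

    def map(self, start, size, device):
        self.regions.append((start, size, device))

    def _find(self, address):
        for start, size, device in self.regions:
            if start <= address < start + size:
                return device, address - start
        raise ValueError(f"unmapped address {address:#06x}")

    def read(self, address):
        device, offset = self._find(address)
        return device.read(offset)

    def write(self, address, value):
        device, offset = self._find(address)
        device.write(offset, value)
```

The CPU core only ever talks to the `Bus`, so adding a new device means mapping another region rather than changing the CPU code.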
I'd go into more detail, but there are a million ways you can go with it. If you have any specific questions here, feel free to ask and I'll add the info.
Resources:
I think I've given a pretty good intro here, but there are a ton of additional areas. I'm more than happy to help with any questions; I've been very vague in most of this simply due to the immense complexity.
Obligatory Wikipedia links:
Emulator
Dynamic recompilation
General emulation resources:
Zophar -- This is where I got my start with emulation, first downloading emulators and eventually plundering their immense archives of documentation. This is the absolute best resource you can possibly have.
NGEmu -- Not many direct resources, but their forums are unbeatable.
RomHacking.net -- The documents section contains resources regarding machine architecture for popular consoles
Emulator projects to reference:
IronBabel -- This is an emulation platform for .NET, written in Nemerle and recompiles code to C# on the fly. Disclaimer: This is my project, so pardon the shameless plug.
BSnes -- An awesome SNES emulator with the goal of cycle-perfect accuracy.
MAME -- The arcade emulator. Great reference.
6502asm.com -- This is a JavaScript 6502 emulator with a cool little forum.
dynarec'd 6502asm -- This is a little hack I did over a day or two. I took the existing emulator from 6502asm.com and changed it to dynamically recompile the code to JavaScript for massive speed increases.
Processor recompilation references:
The research into static recompilation done by Michael Steil (referenced above) culminated in this paper and you can find source and such here.
Addendum:
It's been well over a year since this answer was submitted and with all the attention it's been getting, I figured it's time to update some things.
Perhaps the most exciting thing in emulation right now is libcpu, started by the aforementioned Michael Steil. It's a library intended to support a large number of CPU cores, which use LLVM for recompilation (static and dynamic!). It's got huge potential, and I think it'll do great things for emulation.
emu-docs has also been brought to my attention, which houses a great repository of system documentation, which is very useful for emulation purposes. I haven't spent much time there, but it looks like they have a lot of great resources.
I'm glad this post has been helpful, and I'm hoping I can get off my arse and finish up my book on the subject by the end of the year/early next year.
A guy named Victor Moya del Barrio wrote his thesis on this topic. A lot of good information on 152 pages. You can download the PDF here.
If you don't want to register with scribd, you can google for the PDF title, "Study of the techniques for emulation programming". There are a couple of different sources for the PDF.
Emulation may seem daunting but is actually quite a bit easier than simulating.
Any processor typically has a well-written specification that describes states, interactions, etc.
If you did not care about performance at all, then you could easily emulate most older processors using very elegant object oriented programs. For example, an X86 processor would need something to maintain the state of registers (easy), something to maintain the state of memory (easy), and something that would take each incoming command and apply it to the current state of the machine. If you really wanted accuracy, you would also emulate memory translations, caching, etc., but that is doable.
In fact, many microchip and CPU manufacturers test programs against an emulator of the chip and then against the chip itself, which helps them find out if there are issues in the specifications of the chip, or in the actual implementation of the chip in hardware. For example, it is possible to write a chip specification that would result in deadlocks, and when a deadlock occurs in the hardware it's important to see if it could be reproduced in the specification, since that indicates a greater problem than something in the chip implementation.
Of course, emulators for video games usually care about performance so they don't use naive implementations, and they also include code that interfaces with the host system's OS, for example to use drawing and sound.
Considering the very slow performance of old video games (NES/SNES, etc.), emulation is quite easy on modern systems. In fact, it's even more amazing that you could just download a set of every SNES game ever or any Atari 2600 game ever, considering that when these systems were popular having free access to every cartridge would have been a dream come true.
I know that this question is a bit old, but I would like to add something to the discussion. Most of the answers here center around emulators interpreting the machine instructions of the systems they emulate.
However, there is a very well-known exception to this called "UltraHLE" (Wikipedia article). UltraHLE, one of the most famous emulators ever created, emulated commercial Nintendo 64 games (with decent performance on home computers) at a time when it was widely considered impossible to do so. As a matter of fact, Nintendo was still producing new titles for the Nintendo 64 when UltraHLE was created!
For the first time, I saw articles about emulators in print magazines where before, I had only seen them discussed on the web.
The concept of UltraHLE was to make possible the impossible by emulating C library calls instead of machine level calls.
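The general idea of high-level emulation can be sketched as a table of hooks: when the guest is about to enter a known library routine, the emulator runs a native replacement instead of interpreting the routine's instructions. The sketch below only illustrates that idea and is not how UltraHLE itself was implemented; the address, the state layout, and all names are hypothetical.

```python
HLE_HOOKS = {}   # guest address -> host function


def hle(address):
    """Register a host-side replacement for the guest routine at `address`."""
    def decorator(fn):
        HLE_HOOKS[address] = fn
        return fn
    return decorator


MEMCPY_ADDRESS = 0x1000   # hypothetical location of the guest's memcpy


@hle(MEMCPY_ADDRESS)
def host_memcpy(state):
    dst, src, count = state["args"]
    memory = state["memory"]
    memory[dst:dst + count] = memory[src:src + count]   # one native copy
    state["pc"] = state["return_address"]               # behave like a return


def step(state, interpret_one):
    """Run a hook if the guest is entering a known routine, else interpret."""
    hook = HLE_HOOKS.get(state["pc"])
    if hook is not None:
        hook(state)
        return "hle"
    interpret_one(state)
    return "interpreted"
```

The speedup comes from replacing what might be hundreds of interpreted guest instructions with one host-level operation, at the price of needing to recognise the library routines in each game.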
Something worth taking a look at is Imran Nazar's attempt at writing a Gameboy emulator in JavaScript.
Having created my own emulator of the BBC Microcomputer of the 80s (type VBeeb into Google), there are a number of things to know.
You're not emulating the real thing as such, that would be a replica. Instead, you're emulating State. A good example is a calculator, the real thing has buttons, screen, case etc. But to emulate a calculator you only need to emulate whether buttons are up or down, which segments of LCD are on, etc. Basically, a set of numbers representing all the possible combinations of things that can change in a calculator.
You only need the interface of the emulator to appear and behave like the real thing. The more convincing this is the closer the emulation is. What goes on behind the scenes can be anything you like. But, for ease of writing an emulator, there is a mental mapping that happens between the real system, i.e. chips, displays, keyboards, circuit boards, and the abstract computer code.
To emulate a computer system, it's easiest to break it up into smaller chunks and emulate those chunks individually. Then string the whole lot together for the finished product. Much like a set of black boxes with inputs and outputs, which lends itself beautifully to object oriented programming. You can further subdivide these chunks to make life easier.
Practically speaking, you're generally looking to write for speed and fidelity of emulation. This is because software on the target system will (may) run more slowly than the original hardware on the source system. That may constrain the choice of programming language, compilers, target system etc.
Further to that, you have to circumscribe what you're prepared to emulate. For example, it's not necessary to emulate the voltage state of transistors in a microprocessor, but it's probably necessary to emulate the state of the register set of the microprocessor.
Generally speaking, the finer the level of detail of emulation, the more fidelity you'll get to the original system.
Finally, information for older systems may be incomplete or non-existent. So getting hold of original equipment is essential, or at least prising apart another good emulator that someone else has written!
Yes, you have to interpret the whole binary machine code mess "by hand". Not only that, most of the time you also have to simulate some exotic hardware that doesn't have an equivalent on the target machine.
The simple approach is to interpret the instructions one-by-one. That works well, but it's slow. A faster approach is recompilation - translating the source machine code to target machine code. This is more complicated, as most instructions will not map one-to-one. Instead you will have to make elaborate work-arounds that involve additional code. But in the end it's much faster. Most modern emulators do this.
When you develop an emulator you are interpreting the processor assembly that the system is working on (Z80, 8080, PS CPU, etc.).
You also need to emulate all peripherals that the system has (video output, controller).
You should start by writing emulators for simple systems like the good old Game Boy (which uses a Z80-like processor, if I am not mistaken) or the C64.
Emulators are very hard to create since there are many hacks (as in unusual effects), timing issues, etc. that you need to simulate.
For an example of this, see http://queue.acm.org/detail.cfm?id=1755886.
That will also show you why you ‘need’ a multi-GHz CPU for emulating a 1MHz one.
Also check out Darek Mihocka's Emulators.com for great advice on instruction-level optimization for JITs, and many other goodies on building efficient emulators.
I've never done anything so fancy as to emulate a game console, but I did take a course once where the assignment was to write an emulator for the machine described in Andrew Tanenbaum's Structured Computer Organization. That was fun and gave me a lot of aha moments. You might want to pick that book up before diving into writing a real emulator.
Advice on emulating a real system or your own thing?
I can say that emulators work by emulating the ENTIRE hardware. Maybe not down to the circuit (such as moving bits around like the HW would do; moving the byte is the end result, so copying the byte is fine). Emulators are very hard to create since there are many hacks (as in unusual effects), timing issues, etc. that you need to simulate. If one (input) piece is wrong, the entire system can go down or at best have a bug/glitch.
The Shared Source Device Emulator contains buildable source code to a PocketPC/Smartphone emulator (Requires Visual Studio, runs on Windows). I worked on V1 and V2 of the binary release.
It tackles many emulation issues:
- efficient address translation from guest virtual to guest physical to host virtual
- JIT compilation of guest code
- simulation of peripheral devices such as network adapters, touchscreen and audio
- UI integration, for host keyboard and mouse
- save/restore of state, for simulation of resume from low-power mode
To add to the answer provided by @Cody Brocious:
In the context of virtualization, where you are emulating a new system (CPU, I/O, etc.) for a virtual machine, we can see the following categories of emulators.
Interpretation: Bochs is an example of an interpreter. It is an x86 PC emulator; it takes each instruction from the guest system and translates it into another set of instructions (of the host ISA) to produce the intended effect. Yes, it is very slow; it doesn't cache anything, so every instruction goes through the same cycle.
Dynamic emulator: QEMU is a dynamic emulator. It translates guest instructions on the fly and also caches the results. The best part is that it executes as many instructions as possible directly on the host system, so that emulation is faster. Also, as mentioned by Cody, it divides the code into blocks (each a single flow of execution).
Static emulator: As far as I know, there are no static emulators that are helpful in virtualization.
How I would start emulation.
1. Get books on low-level programming; you'll need them for the "pretend" operating system of the Nintendo, Game Boy, etc.
2. Get books on emulation specifically, and maybe OS development (you won't be making an OS, but it's the closest thing to it).
3. Look at some open source emulators, especially ones for the system you want to emulate.
4. Copy snippets of the more complex code into your IDE/compiler. This will save you writing out long code. This is what I do for OS development, using a distro of Linux.
I wrote an article about emulating the Chip-8 system in JavaScript.
It's a great place to start as the system isn't very complicated, but you still learn how opcodes, the stack, registers, etc work.
I will be writing a longer guide soon for the NES.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I'm trying to pick a performance analyzer to use. I'm a beginner developer and not sure what to look for in a performance analyzer. What are the most important features?
If you use valgrind, I can highly recommend KCacheGrind to visualize performance bottlenecks.
I would like to have the following features/output information shown in a profiler.
1.) It should be able to show the total clock cycles consumed, overall and for each function.
2.) If not 1, it should tell the total time consumed and the time spent in each function.
3.) Also, it should be able to tell how many times a function is called.
4.) It would be nice to know memory reads, memory writes, cache misses, cache hits.
5.) Code memory for each function
6.) Data memory used: Global constants, Stack, Heap usage.
The two classical answers (assuming you are in *nix world) are valgrind and gprof. You want something that will let you (at least) check how much time you are spending inside each procedure or function.
Stability - being able to profile your process for long durations without crashing or running out of memory. It's surprising how many commercial profilers fail at that.
goldenmean has it right, I would add that line execution counts are sometimes handy as well.
My preference is for sampling profilers rather than instrumented profilers. The profiler should be able to map sample data back to the source code, ideally in a GUI. The two best examples of this that I am aware of are:
Mac OS X: Shark developer.apple.com
Linux: Zoom www.rotateright.com
All you need is a debugger or IDE that has a "pause" button. It is not only the simplest and cheapest tool, but in my experience, the best. This is a complete explanation why. Note the 2nd-to-last comment.
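Pressing "pause" a few times amounts to taking a handful of stack samples by hand. A rough sketch of doing the same thing programmatically is shown below. It relies on `sys._current_frames()`, which is specific to CPython, and everything else (the `wasteful`/`work` workload, the sample count, the interval) is made up for the example. For each sample it records which functions were on the worker thread's stack.

```python
import collections
import random
import sys
import threading
import time


def wasteful():
    return sum(i * i for i in range(2000))


def work(stop):
    while not stop.is_set():
        wasteful()


def sample_stacks(thread, samples=20, interval=0.005):
    """Count, per function name, the samples in which it was on the stack."""
    counts = collections.Counter()
    for _ in range(samples):
        time.sleep(random.uniform(0, 2 * interval))   # random sampling times
        frame = sys._current_frames().get(thread.ident)
        names = set()
        while frame is not None:                      # walk up the call stack
            names.add(frame.f_code.co_name)
            frame = frame.f_back
        counts.update(names)                          # one count per sample
    return counts


def profile_demo():
    """Start the workload, sample it, then shut it down cleanly."""
    stop = threading.Event()
    worker = threading.Thread(target=work, args=(stop,))
    worker.start()
    try:
        return sample_stacks(worker)
    finally:
        stop.set()
        worker.join()
```

A function that appears in a large fraction of the samples is responsible for roughly that fraction of the time; the point of looking at whole stacks is that you also see why it was being called.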
EDIT because I thought of a better answer:
As an aside, I studied A.I. in the 70s, and an idea very much in the air was automatic programming, and a number of people tried to accomplish it.
(I took my crack at it.)
The idea is to try to automate the process of having a knowledge structure of a domain, plus desired functional requirements, to generate (and debug) a program that would accomplish those requirements.
It would be a tour-de-force in automated reasoning about the domain of programming.
There were some tantalizing demonstrations, but in a practical sense the field didn't go very far.
Nevertheless, it did contribute a lot of ideas to programming languages, like contracts and logical verification techniques.
To build an ideal profiler, for the purpose of optimizing programs, it would get a sample of the program's state every nanosecond.
Either on-the-fly or later (ideal, remember?) it would carefully examine each sample, to see if, knowing the reasons for which the program is executing, that particular nanosecond of work was actually necessary or could be somehow eliminated.
That would be billions of samples and a lot of reasoning, but of course there would be tremendous duplication, because any wastage costing, say, 10% of time, would be evident in 10% of samples.
That wastage could be recognized on a lot fewer than a billion samples.
In fact, 100 samples or even fewer could spot it, provided they were randomly chosen in time, or at least in the time interval the user cares about.
This is assuming the purpose is to find the wastage so we can get rid of it, as opposed to measuring it with much precision.
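The claim that a small number of samples is enough can be checked with a simple binomial model, assuming the samples are independent and taken at random times. The function below (a hypothetical helper, not part of any profiler) gives the probability that wastage present in a given fraction of the run time shows up in at least a chosen number of samples.

```python
from math import comb


def chance_of_seeing(fraction, samples, at_least=1):
    """Probability that a problem active `fraction` of the time appears in
    at least `at_least` of `samples` independent random samples."""
    miss = sum(
        comb(samples, k) * fraction ** k * (1 - fraction) ** (samples - k)
        for k in range(at_least)
    )
    return 1.0 - miss


# Example: wastage costing 10% of the time, looked for in 20 and 100 samples.
p20 = chance_of_seeing(0.10, 20)
p100 = chance_of_seeing(0.10, 100)
```

The larger the wastage, the fewer samples are needed to run into it, which is why the big problems tend to be found first.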
Why would it be helpful to apply all that reasoning power to each sample?
Well, if the programs were little, and it were only looking for things like O(n^2) code, it shouldn't be too hard.
But suppose the state of the program consisted of a procedure stack 20-30 levels deep, possibly with some recursive function calls appearing more than once, possibly with some of the functions being calls to external processors to do IO, possibly with the program's action being driven by some data in a table.
Then, to decide if the particular sample is wasteful requires potentially examining all or at least some of that state information, and using reasoning power to see if it is truly necessary in accomplishing the functional requirements.
What the profiler is looking for is nanoseconds being spent for dubious reasons.
To see the reason it is being spent requires examining every function call site on the stack, and the code surrounding it, or at least some of those sites.
The necessity of the nanosecond being spent requires the logical AND of the necessity of every statement being executed on the stack.
It only takes one such function call site to have a dubious justification for the entire sample to have a dubious justification.
So, if the entire purpose is to find nanoseconds being spent for dubious reasons, the more complicated the samples are, the better, and the more reasoning power brought to bear on each sample, the better.
(That's why bigger programs have more room for speedup - they have deeper stacks, hence more calls, hence more likelihood of poorly justified calls.)
OK, that's in the future.
However, since we don't need a huge number of samples (10 or 20 is very useful), and since we already have highly intelligent automatic programmers (powered by pizza and soda), we can do this now.
Compare that to the tools we call profilers today.
The very best of them take stack samples, but what's their output?
Measurements. "Hot paths". Rat's nest graphs. Eye-candy.
From those, even an artificially intelligent programmer would easily miss large inefficiencies, except for the ones that are exposed by those outputs.
After you fix the ones you do find, the ones you don't find are the ones that make all the difference.
One of the things one learns studying A.I. is, don't expect to be able to program a computer to do something if a human, in principle, can't also do it.