I have some code written in C++ that simulates a prefetcher for a CPU. In the code I have some definitions that look like this:
#define x 5
...
for(int i = 0; i < x; i++)
...
At the end of the simulation, the simulator outputs the average access time, which is a measure of how well the prefetcher did. The performance of the prefetcher depends on x and some other similar definitions.
I would like to have a program that changes x, recompiles the new code, runs it, looks at the value, and based on the change in simulated access time repeats the process.
Does anyone know of an easy way to do this that isn't manually changing values?
EDIT: I think I need to clarify that I do not want to have to program a learning algorithm since I have never done it and probably couldn't do it nearly as well as others.
I guess your current program looks something like this:
int main() {
#define x 5
    // <do the simulation>
    cout << "x=" << x << " time=" << aat << endl;
}
Instead, you might create a simulate function that takes x as an explicit parameter and returns the average access time:
double simulate(int x) {
    // <do the simulation, computing the average access time aat>
    return aat;
}
And call it from main:
int main() {
    int x = /* initial x value */;
    while (/* necessary */) {
        double aat = simulate(x);
        cout << "x=" << x << " time=" << aat << endl;
        x = /* updated x according to some strategy */;
    }
}
This way, whatever strategy you use to learn x happens in main.
But ... If you're writing a program to simulate CPU prefetching I can't help thinking that you know all this perfectly well already. I don't really understand why you were using the compiler to change a simulation parameter in the first place.
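That said, if you really want to keep the #define, most compilers let you define macros on the command line (for example g++ -Dx=7), so a small driver can recompile and rerun the simulator for each candidate value without ever editing the source (guard the definition with #ifndef x so the command-line value wins). A minimal sketch, assuming the simulator lives in sim.cpp, that g++ is on the PATH, and that it prints just the average access time on stdout; since you don't want a learning algorithm, it simply sweeps a range of x:
#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <string>

// Recompile sim.cpp with the given x, run it, and read back the access time.
double runSimulation(int x) {
    std::string compile = "g++ -O2 -Dx=" + std::to_string(x) + " sim.cpp -o sim";
    if (std::system(compile.c_str()) != 0) return -1.0;   // compile failed
    if (std::system("./sim > out.txt") != 0) return -1.0; // run failed
    double aat = -1.0;
    if (FILE* f = std::fopen("out.txt", "r")) {
        if (std::fscanf(f, "%lf", &aat) != 1) aat = -1.0;
        std::fclose(f);
    }
    return aat;
}

int main() {
    double best = 1e308;
    int bestX = -1;
    for (int x = 1; x <= 32; ++x) {  // exhaustive sweep: no learning algorithm needed
        double aat = runSimulation(x);
        std::cout << "x=" << x << " time=" << aat << std::endl;
        if (aat >= 0.0 && aat < best) { best = aat; bestX = x; }
    }
    std::cout << "best: x=" << bestX << " time=" << best << std::endl;
}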
I want to use this question to improve my general understanding of how computers work, since I'll probably never have the chance to study this in a deep and thorough way. Sorry in advance if the question is silly and not useful in general, but I prefer to learn this way.
I am learning C++ and found online some code that implements the Newton-Raphson method for finding the root of a function. The code is pretty simple: as you can see below, it first asks for the required tolerance, and if I give it a "decent" number it works fine. If instead I enter something like 1e-600 for the tolerance, the program breaks down immediately and the output is: Enter starting value x: Failed to converge after 100 iterations.
The failed-convergence message should be a consequence of running the loop for 100 iterations, but that doesn't seem to be the case here, since the loop doesn't even appear to start. It looks like the program already knows it won't reach that level of tolerance.
Why does this happen? How can the program print that output if it didn't even try the loop 100 times?
Edit: It seems that anything meaningless I write when it asks for the tolerance (too-small numbers, words) produces pnew=0.25, and then the code runs 100 times and fails.
The code is the following:
#include <iostream>
#include <cmath>
using namespace std;
#define N 100 // Maximum number of iterations
int main() {
    double p, pnew;
    double f, dfdx;
    double tol;
    int i;
    cout << "Enter tolerance: ";
    cin >> tol;
    cout << "Enter starting value x: ";
    cin >> pnew;
    // Main Loop
    for (i = 0; i < N; i++) {
        p = pnew;
        // Evaluate the function and its derivative
        f = 4*p - cos(p);
        dfdx = 4 + sin(p);
        // The Newton-Raphson step
        pnew = p - f/dfdx;
        // Check for convergence and quit if done
        if (abs(p-pnew) < tol) {
            cout << "Root is " << pnew << " to within " << tol << "\n";
            return 0;
        }
    }
    // We reach this point only if the iteration failed to converge
    cerr << "Failed to converge after " << N << " iterations.\n";
    return 1;
}
1e-600 is not representable by most implementations of double. std::cin will fail to convert your input to double and go into a failed state. This means that, unless you clear the error state, every future read through std::cin also fails automatically, without waiting for user input. That is also consistent with the pnew=0.25 from your edit: on an implementation that writes 0 on failure, both tol and pnew end up as 0, the first Newton step from p = 0 gives pnew = 0 - (4*0 - cos(0))/(4 + sin(0)) = 0.25, and since abs(p-pnew) < 0 can never be true, all 100 iterations run and the loop reports failure.
From cppreference (since C++17):
If extraction fails, zero is written to value and failbit is set. If extraction results in the value too large or too small to fit in value, std::numeric_limits<T>::max() or std::numeric_limits<T>::min() is written and failbit flag is set.
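If you want the program to survive bad input, you can test the extraction yourself and reset the stream. A minimal sketch of a defensive read (the retry policy is just one possibility):
#include <iostream>
#include <limits>

double readTolerance() {
    double tol;
    while (!(std::cin >> tol)) {  // extraction failed, failbit is now set
        std::cin.clear();         // clear the error flags
        std::cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); // discard the bad line
        std::cout << "Invalid tolerance, try again: ";
    }
    return tol;
}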
As mentioned, 1e-600 is not a valid double value. However, there's more to it than being outside of the range. What's likely happening is that 1 is scanned into tol, and then some portion of e-600 is being scanned into pnew, and that's why it ends immediately, instead of asking for input for pnew.
Like François said, you cannot exceed 2^64 when you work on a 64-bit machine (with a corresponding OS), or 2^32 on a 32-bit machine; you can use SSE, whose registers hold four 32-bit values used for floating-point representation. In your program the convergence test fails at every iteration, so the "if" is skipped and the function never returns before the loop ends.
P.S. I'm fairly new to programming.
Having used JavaScript, I've been wanting to learn C++, since their syntax looks similar and I'm quite intrigued by what code actually does in the hardware.
However, we can't actually look at hardware activity easily, right?
I can only take the compiler's word for it that my array has been properly allocated with only 5 elements, but I can't easily see that in my RAM or somewhere else, right?
How do I verify stuff like this at least a little better?
With JavaScript it didn't really bother me, because I was mainly writing much more abstract (or at least less countable) descriptions of what I wanted to happen. So how can I feel more confident about C++'s claim that it actually gives me control over these tiny things?
I think what you're looking for is a debugger. Many IDEs provide one, and the one that comes with Visual Studio allows you to view memory, registers and CPU activity. If you want to do it manually, you can always rely on inline assembly, or on comparing the addresses of objects allocated on the stack or heap.
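For example, a toy program like this (just a sketch) lets you compare where locals and heap allocations end up:
#include <iostream>

int main() {
    int a = 1, b = 2;     // locals: typically adjacent stack addresses
    int* h = new int(3);  // heap allocation: usually a very different address
    std::cout << "&a=" << &a << " &b=" << &b << " h=" << h << std::endl;
    delete h;
}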
However, we can't actually look at hardware activity easily, right?
The debugger is worth learning, and simple programs are easy to debug. Remember that on most desktops your debug code is running in virtual memory ... meaning that the addresses are probably not hardware addresses; they are virtual, and might be mapped to any physical RAM.
I can only take the compiler's word for it that my array has been properly allocated with only 5 elements, but I can't easily see that in my RAM or somewhere else, right?
I'm not sure what you think 'properly' means. How could it be improper, as long as it behaves as you told it to? I mean no offense, but how would you or I recognize that something is improper? Alignment? Padding? In my experience, the padding 'inserted' by the compiler (in any struct) is visible in the debugger displays.
Any memory you 'inspect' with the debugger is from virtual memory. The addresses shown are the virtual RAM addresses that your code is using.
With JavaScript it didn't really bother me, because I was mainly writing much more abstract (or at least less countable) descriptions of what I wanted to happen. So how can I feel more confident about C++'s claim that it actually gives me control over these tiny things?
With practice.
I have many years' experience with C++ on embedded systems, mostly vxWorks. Embedded systems often do not have virtual memory. Memory-mapped I/O is typically accessed through special hardware; the OS is 'informed' of its special nature, and that hardware usually has different timing than 'regular' RAM.
How do I verify stuff like this at least a little better?
Stuff like what? I am not sure what you think you could see in JavaScript.
What 'stuff' do you think you will not be able to see in C++?
The debugger is worth learning, and simple programs are easy to debug.
AND you are always allowed to 'throw in' some diagnostic couts. In this example I have implemented overloaded show() functions (for development or diagnostics only ... you probably do not want to ship your code with the show() calls enabled).
Example:
DTB::SOps_t sops; // string operations (the author's own utility class)
// digiComma() inserts commas for readability

void show(uint64_t ui64)
{
    cout << "\n " << setw(24) << sops.digiComma(ui64) << flush;
}

void show(uint64_t sum1, uint64_t sum2, uint64_t digit, uint64_t digit2)
{
    cout << " "
         << setw(8) << sops.digiComma(sum1)
         << setw(8) << sops.digiComma(sum2)
         << setw(8) << sops.digiComma(digit)
         << setw(8) << sops.digiComma(digit2)
         << flush;
}
Usage examples of show() on a sample:
typedef unsigned long long llui; // assumed: the alias referenced below

int exec()
{
    cout << "\n Note that llui and uint64_t "
         << "\n are the same size (on my Linux/g++ system) - 8 bytes"
         << "\n sizeof( llui ): " << sizeof(llui)
         << "\n sizeof(uint64_t): " << sizeof(uint64_t) << endl;
    uint64_t checksum = 4024007185756128;
    show(checksum);
    uint64_t sum1 = 0;
    uint64_t digit1 = checksum % 10ULL;
    uint64_t sum2 = 0;
    while (checksum > 0)
    {
        sum2 = sum2 + digit1;
        uint64_t digit2 = ((checksum - digit1) / 10ULL) % 10ULL;
        show(sum1, sum2, digit1, digit2);
        checksum = (checksum - digit1 - (digit2 * 10ULL)) / 100ULL;
        digit1 = checksum % 10ULL;
        digit2 = digit2 * 2ULL;
        sum1 = sum1 + digit2;
        show(checksum);
        if (checksum < 10) { checksum = 0; }
    }
    if ((sum1 + sum2) % 10ULL == 0ULL) cout << "\n INVALID 1";
    else cout << "\n INVALID 2";
    return 0;
}
This output might be called a 'trace' of the executing code, about 15 lines.
When the above helps a little, you might next try running the debugger on this code. Just add a breakpoint at the show() routines. Then run and inspect the results, continue to the next breakpoint, inspect the results again, and repeat.
I've written some simple code to practice, but it's not running as it's supposed to.
I'm programming on my iPhone just for fun, and I'm using an app called C/C++ offline compiler which seems to work really well.
Anyway, I wrote a program to display numbers and, if a number is less than 5 digits, to display a star in each empty space. Next to that, it displays the memory address.
I have two questions. First, why are the stars not displaying when I run this? Second, each time I run it, the memory addresses are different. Is this because of the compiler, or because of how the iPhone's memory works?
Source Code:
//c plus plus program 1
#include <iostream>
using namespace std;
int main(void) {
    for (int i = 0; i < 150; i += 30) {
        cout.width(5);
        cout.fill('*');
        cout << i << "=" << &i << endl;
    }
    return 0;
}
It is normal that the address of i changes each time you run your program. As far as I know, it is because of how the system works with memory. Why did you think it would place your program in the same spot every time? :-)
I only tried your code in http://cpp.sh, and the stars were shown:
****0=0xffcde9dc
***30=0xffcde9dc
***60=0xffcde9dc
***90=0xffcde9dc
**120=0xffcde9dc
Note that everything after the second << was not taken into consideration when the output width was determined (width() applies only to the next insertion). So first of all, to investigate the problem, I would try something like:
int main(void) {
    cout.width(5);  // affects only the next formatted insertion
    cout.fill('*'); // the fill character persists
    cout << 1 << endl;
    cout << 2 << endl;
}
to understand whether it is a compiler problem or not.
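For what it's worth, since width() applies only to the very next insertion, std::setw from <iomanip> makes the per-field intent explicit. A possible variation on your loop (a sketch):
#include <iomanip>
#include <iostream>
using namespace std;

int main(void) {
    cout.fill('*'); // the fill character persists across insertions
    for (int i = 0; i < 150; i += 30)
        cout << setw(5) << i << "=" << &i << endl; // setw(5) pads only the number
    return 0;
}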
So this is really a mystery to me. I am measuring the time of my own sine function and comparing it to the standard sin(). There is some strange behavior, though. When I use the functions just standalone, like:
sin(something);
I get an average time (measuring 1000000 calls in 10 rounds) of 3.1276 ms for the standard sine function and 51.5589 ms for my implementation.
But when I use something like this:
float result = sin(something);
I suddenly get 76.5621 ms for the standard sin() and 49.3675 ms for mine. I understand that it takes some time to assign the value to a variable, but why doesn't it add time to my sine too? Mine stays more or less the same, while the standard one increases rapidly.
EDIT:
My code for measuring:
// (made self-contained: the includes and the repeat/callNum/sum definitions
// were elsewhere in the original program)
#include <chrono>
#include <cmath>
#include <fstream>
#include <iostream>
using namespace std;

int main() {
    const int repeat = 10, callNum = 1000000;
    double sum = 0;
    ofstream file("result.txt", ios::trunc);
    file << "Measured " << repeat << " rounds with " << callNum << " calls in each \n";
    for (int i = 0; i < repeat; i++)
    {
        auto start = chrono::steady_clock::now();
        // call the function here dattebayo!
        for (int o = 0; o < callNum; o++)
        {
            double g = sin((double)o);
        }
        auto end = chrono::steady_clock::now();
        auto difTime = end - start;
        double timeD = chrono::duration<double, milli>(difTime).count();
        file << i << ": " << timeD << " ms\n";
        sum += timeD;
    }
}
Any modern compiler knows functions such as sin, cos, printf("%s\n", str) and many more, and will either translate them to a simpler form [a constant if the input is constant; printf("%s\n", str); becomes puts(str);] or remove them completely [when it knows the function has no "side effects", in other words, that it JUST calculates the returned value and has no other effect on the system].
This often happens for standard functions even when the compiler is in a low or even no optimisation mode.
You need to make sure that the result of your function is REALLY used for the call to survive in optimised mode. Add the returned values together in the loop...
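Applied to the posted measuring loop, that could look like this (a sketch; the same change applies when timing your own sine function):
double acc = 0.0;
for (int o = 0; o < callNum; o++)
{
    acc += sin((double)o); // the result is consumed, so the call cannot be optimised away
}
file << "checksum: " << acc << "\n"; // writing acc out makes the loop's work observable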
I am working on some grid-generation code, and while it runs I really want to see where I am, so I downloaded a piece of progress-bar code from the internet and inserted it into my code, something like:
std::string bar;
for (int i = 0; i < 50; i++)
{
    if (i < (percent / 2))
    {
        bar.replace(i, 1, "=");
    }
    else if (i == (percent / 2))
    {
        bar.replace(i, 1, ">");
    }
    else
    {
        bar.replace(i, 1, " ");
    }
}
std::cout << "\r" "[" << bar << "] ";
std::cout.width(3);
std::cout << percent << "% "
          << " iteration: " << iterationCycle << std::flush;
This is very straightforward. However, it GREATLY slows down the whole process (note that percent = iterI/nIter).
I am really getting annoyed with this, and I am wondering if there is a smarter, more efficient way to print a progress bar to the screen.
Thanks a million.
Firstly, you could consider only updating it every 100 or 1000 iterations. Secondly, I don't think the division is the bottleneck; it is much rather the string operations and the outputting itself.
I guess the only significant improvement would be to just output less often.
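Something as simple as this guard already removes most of the cost (a sketch; drawProgressBar() is a hypothetical wrapper around the bar-drawing code from the question):
if (iterationCycle % 1000 == 0) { // redraw only every 1000th iteration
    drawProgressBar(percent, iterationCycle); // hypothetical wrapper around the posted code
}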
Oh, and just for good measure: an efficient way to only execute the code every, say, 1024 iterations is not to test divisibility by 1024 with the modulo operator, but with a bitwise AND. Something along the lines of
if ((iterationCycle & 1023) == 0) {
would work. You'd be computing the bitwise AND of iterationCycle and 1023 (that is, 1024 - 1, the low ten bits all set); the result is zero exactly once every 1024 iterations. These kinds of operations are done extremely fast, as your CPU has specific hardware for them.
You might be overthinking this. I would just output a single character every however-many cycles of your main application code. Run some tests to see how many (hundreds? millions?), but you shouldn't print more than say once a second. Then just do:
std::fputc('*', stdout);
std::fflush(stdout);
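Put together, the pattern might look like this (a sketch; pick the interval based on your own timing tests):
#include <cstdio>

int main() {
    const long total = 100000000L;
    const long interval = total / 50; // aim for roughly 50 stars overall
    for (long i = 0; i < total; ++i) {
        // ... one iteration of the main application work ...
        if (i % interval == 0) {
            std::fputc('*', stdout);
            std::fflush(stdout); // make the star appear immediately
        }
    }
    std::fputc('\n', stdout);
    return 0;
}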
You should really check the "efficiency" yourself, but what would work almost the same is boost.progress:
#include <boost/progress.hpp>
...
boost::progress_display pd(50);
for (int i = 0; i < 50; i++) {
    ++pd;
}
And, as Joost already answered: output less often.