I ran this code as a preliminary benchmark. It compares the time taken to generate a given number of random two-state values, once by scaling a random double and once with a Bernoulli distribution. The code is below:
int main()
{
std::random_device s;
std::mt19937 engine(s());
std::bernoulli_distribution bernp50(0.5000000000000000);
std::uniform_real_distribution<double> d;
long int limit = 10000000000; //10^10
int counter[2] = {0};
{
Timer bernstate("Bern Two States");
for(int i = limit; i>0; i--)
{
int tmp = bernp50(engine);
//Implicit bool to int conversion
counter[tmp]++;
}
}
cout << " Bern Two States - 0,1 \n\nCounter:\n" << "0: " <<
counter[0] <<"\n1: " << counter[1]<<"\n"
<< "Counter additions: " << counter[0] + counter[1] << "\n\n"
<< "\n0: " << (double)((double)counter[0]*100/(double)limit) << "%"
<< "\n1: " << (double)((double)counter[1]*100/(double)limit) << "%"
<< "\n\n" << endl;
counter[0]=0;
counter[1]=0;
{
Timer double_comp("Two State - Double");
for(int i = limit; i>0; i--)
{
double temp = d(engine)*2;
if(temp < 1)
{
counter[0]++;
}
else
{
counter[1]++;
}
}
}
cout << " Double Two States - 0,1 \n\nCounter:\n" << "0: " <<
counter[0] <<"\n1: " << counter[1]<<"\n"
<< "Counter additions: " << counter[0] + counter[1] << "\n\n"
<< "\n0: " << (double)((double)counter[0]*100/(double)limit) << "%"
<< "\n1: " << (double)((double)counter[1]*100/(double)limit) << "%"
<< "\n\n" << endl;
} //End of Main()
For limit = 10^10 I get the result below, where the sum of the counters does not match the limit variable. The same happens for 10^11:
Timer Object: Bern Two States
Timer Object Destroyed: Bern Two States
Duration Elapsed: 85.9409 s
Bern Two States - 0,1
Counter:
0: 705044031
1: 705021377
Counter additions: 1410065408
0: 7.05044%
1: 7.05021%
Timer Object: Two State - Double
Timer Object Destroyed: Two State - Double
Duration Elapsed: 87.6082 s
Double Two States - 0,1
Counter:
0: 705029886
1: 705035522
Counter additions: 1410065408
0: 7.0503%
1: 7.05036%
However, for limit = 10^9, the results are fine:
Timer Object: Bern Two States
Timer Object Destroyed: Bern Two States
Duration Elapsed: 62.5088 s
Bern Two States - 0,1
Counter:
0: 500005067
1: 499994933
Counter additions: 1000000000
0: 50.0005%
1: 49.9995%
Timer Object: Two State - Double
Timer Object Destroyed: Two State - Double
Duration Elapsed: 62.6709 s
Double Two States - 0,1
Counter:
0: 500015398
1: 499984602
Counter additions: 1000000000
0: 50.0015%
1: 49.9985%
Resolved: I actually used long int for the counters as well, but the problem was the loop variable, which was a 4-byte int. Initializing it from limit truncated the value, so the loop ran the wrong number of iterations.
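The counter sum of 1410065408 in the output above is exactly 10^10 modulo 2^32, which is what a 32-bit loop variable holds after being initialized from the wider limit. A minimal sketch that reproduces the number, assuming a 32-bit int as in the question; the helper names truncatedLoopCount and countIterations are made up for illustration, and the conversion goes through an unsigned type because that wraps modulo 2^32 by definition:

```cpp
#include <cstdint>

// Hypothetical helper: the value a 32-bit variable ends up holding when it
// is initialized from a wider limit (unsigned conversion wraps modulo 2^32).
constexpr std::uint32_t truncatedLoopCount(std::uint64_t limit)
{
    return static_cast<std::uint32_t>(limit);
}

// Counting with a 64-bit loop variable visits every iteration.
inline std::uint64_t countIterations(std::uint64_t limit)
{
    std::uint64_t iterations = 0;
    for (std::uint64_t i = limit; i > 0; --i)
    {
        ++iterations;
    }
    return iterations;
}

// 10^10 wraps to the counter sum seen in the output; 10^9 still fits.
static_assert(truncatedLoopCount(10000000000ULL) == 1410065408U, "10^10 mod 2^32");
static_assert(truncatedLoopCount(1000000000ULL) == 1000000000U, "10^9 fits in 32 bits");
```

Declaring the loop variable with the same type as limit (or a fixed-width 64-bit type) avoids the truncation.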
Related
I have implemented a C++ method that calculates the maximum ulp error between an approximation and a reference function on a given interval. Both the approximation and the reference are calculated as single-precision floating-point values. The method starts at the lower bound of the interval and iterates over every representable single-precision value within the range.
Since there are a lot of existing values depending on the range that is chosen, I would like to estimate the total runtime of this method, and print it to the user.
I tried to execute the comparison several times to calculate the runtime of one iteration. My approach was to multiply the duration of one iteration with the total number of floats existing in the range. But obviously the execution time for one iteration is not constant but depends on the number of iterations, therefore my estimated duration is not accurate at all... Maybe one could adapt the total runtime calculation in the main loop?
My question is: Is there any other way to estimate the total runtime for this particular case?
Here is my code:
void FloatEvaluateMaxUlp(float(*testFunction)(float), float(*referenceFunction)(float), float lowBound, float highBound)
{
/*initialization*/
float x = lowBound, output, output_ref;
int ulp = 0;
long long duration = 0, numberOfFloats=0;
/*calculate number of floats between lowBound and highBound*/
numberOfFloats = *(int*)&highBound - *(int*)&lowBound;
/*measure execution time of the first 1000 iterations*/
int iterationsToEstimateTime = 1000;
auto t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < iterationsToEstimateTime; i++)
{
printProgressInteger(i+1, iterationsToEstimateTime);
output = testFunction(x);
output_ref = referenceFunction(x);
int ulp_local = FloatCompareULP(output, output_ref);
if (abs(ulp_local) > abs(ulp))
ulp = ulp_local;
x= std::nextafter(x, highBound + 0.001f);
}
auto t2 = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
duration /= iterationsToEstimateTime;
x = lowBound;
/*output of estimated time*/
std::cout <<std::endl<<std::endl<< " Number of floats: " << numberOfFloats << " Time per iteration: " << duration << " Estimated total time: " << numberOfFloats * duration << std::endl;
std::cout << " Starting test in range [" << lowBound << "," << highBound << "]." << std::endl;
long long count = 0;
/*record start time*/
t1 = std::chrono::high_resolution_clock::now();
for (count; x < highBound; count++)
{
printProgressInteger(count, numberOfFloats);
output = testFunction(x);
output_ref = referenceFunction(x);
int ulp_local = FloatCompareULP(output, output_ref);
if (abs(ulp_local) > abs(ulp))
ulp = ulp_local;
x = std::nextafter(x, highBound + 0.001f);
}
/*record stop time and compute duration*/
t2 = std::chrono::high_resolution_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
/*result output*/
std::cout <<std::endl<< std::endl << std::endl << std::endl << "*********************************************************" << std::endl;
std::cout << " RESULT " << std::endl;
std::cout << "*********************************************************" << std::endl;
std::cout << " Iterations: " << count << " Total execution time: " << duration << std::endl;
std::cout << " Max ulp: " << ulp <<std::endl;
std::cout << "*********************************************************" << std::endl;
}
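One way to adapt the estimate inside the main loop, as the question suggests, is to extrapolate from the time elapsed so far rather than from a short warm-up run. A minimal sketch under stated assumptions: orderedKey, floatsBetween, and estimateRemainingSeconds are hypothetical helpers that do not appear in the code above, float is assumed to be a 32-bit IEEE 754 type, and NaN inputs are not handled. It also reads the bit pattern with std::memcpy instead of a pointer cast, and it handles ranges that cross zero.

```cpp
#include <cstdint>
#include <cstring>

// Map a float to an integer key that grows in step with the float's value
// (assumes 32-bit IEEE 754; NaN is not handled).
inline std::int64_t orderedKey(float value)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::int64_t magnitude = bits & 0x7FFFFFFFu;
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}

// Difference of the keys: the number of representable steps from low to high.
inline std::int64_t floatsBetween(float low, float high)
{
    return orderedKey(high) - orderedKey(low);
}

// Extrapolate the remaining time from the average pace so far.
inline double estimateRemainingSeconds(double elapsedSeconds, std::int64_t done, std::int64_t total)
{
    if (done <= 0)
        return 0.0; // nothing measured yet
    return elapsedSeconds * static_cast<double>(total - done) / static_cast<double>(done);
}
```

In the main loop, the estimate could be refreshed every few thousand iterations by passing the elapsed time, count, and numberOfFloats, so that it tracks the real pace, including the cost of the progress printing.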
I wrote a queue program based on a checkout simulation taking data from a text file (sim.txt). I got everything to work in my code except for one thing. I want to organize each sim result by numbers.
sim.txt
2000
0.10
5
15
5
20
10000
0.05
5
15
5
18
2000
0.20
5
20
10
10
2000
0.20
1
5
10
25
20000
0.50
1
2
50
10
Here is an example output
Simulation Results #1
// result #1 output is here...
Simulation Results #2
// result #2 output is here...
Simulation Results #3
// result #3 output is here...
Simulation Results #4
// result #4 output is here...
Simulation Results #5
// result #5 output is here...
I tried implementing a for loop to achieve this, but it didn't work as expected. This problem occurs for each simulation result.
Simulation Results #1
Simulation Results #2
Simulation Results #3
Simulation Results #4
Simulation Results #5
// result #1 outputs here...
Here is my code, to give a better look at what I'm doing:
#include <iostream>
#include <cstdlib>
#include <queue>
#include <fstream>
#include <ctime>
using namespace std;
// Input parameters
#define SIMULATION_TIME 0
#define ARRIVAL_RATE 1
#define MIN_SERVICE_TIME 2
#define MAX_SERVICE_TIME 3
#define MAX_LINE_SIZE 4
#define ANGRY_THRESHOLD 5
#define PARAMS_MAX 6
#define FILENAME "sim.txt"
// Or like this:
// const int SIMULATION_TIME = 0;
// Or make 6 variables:
// double SIMULATION_TIME = 0;
// Counters -- indexes into variable 'counters' below
#define CUSTOMERS_SERVICED 0
#define CUSTOMERS_LEAVING 1
#define AVERAGE_WAIT 2
#define AVERAGE_LINE 3
#define ANGRY_CUSTOMERS 4
#define COUNTERS_MAX 5
// Holds the current simulation parameters
double parameters[PARAMS_MAX];
double counters[COUNTERS_MAX];
// This is an example debug macro you can use to print
// things each time through the loop
#define DEBUGGING_ENABLED 1
#ifdef DEBUGGING_ENABLED
#define DEBUG(x) do { \
cerr << __FILE__ << ": " << __LINE__ << ": " << x << endl; \
} while (0)
#else
#define DEBUG(x)
#endif // DEBUGGING_ENABLED
// Return the service interval, between 'min'
// and 'max'.
int randomInt(int min, int max)
{
return (rand() % (max - min) + min);
}
// Returns TRUE if a customer arrived during this minute
bool randomChance(double prob)
{
double rv = rand() / (double(RAND_MAX) + 1);
return (rv < prob);
}
// Read the next simulation from the file. Return
// TRUE if one could be read, FALSE otherwise (eof).
bool readNextSim(fstream &f, double parameters[])
{
for (int i = 0; i < PARAMS_MAX; i++)
{
string tmp;
getline(f, tmp);
if (f.eof())
return false;
// Read in the next parameter
parameters[i] = atof(tmp.c_str());
}
for (int i = 0; i < 5; i++)
cout <<"Simulation Results #" << i + 1 << endl;
cout << "---------------------" << endl;
cout << "\t Overall simulation time: " << "\t" << parameters[SIMULATION_TIME] << endl;
cout << "\t Arrival rate: " << "\t\t\t" << parameters[ARRIVAL_RATE] << endl;
cout << "\t Minimum service time: " << "\t\t" << parameters[MIN_SERVICE_TIME] << endl;
cout << "\t Maximum service time: " << "\t\t" << parameters[MAX_SERVICE_TIME] << endl;
cout << "\t Maximum line size: " << "\t\t" << parameters[MAX_LINE_SIZE] << endl;
cout << "\t Angry threshold: " << "\t\t" << parameters[ANGRY_THRESHOLD] << endl;
return true;
}
int main()
{
fstream f(FILENAME);
// Seed the random number generator here
srand(time(0));
if (!f.good())
{
cout << "Invalid file." << endl;
return -1;
}
while (readNextSim(f, parameters))
{
// Run the next simulation
queue<int> line;
for (int i = 0; i < COUNTERS_MAX; i++)
counters[i] = 0;
// or:
// memset(counters, 0, COUNTERS_MAX * sizeof(double));
//int customersLeavingLineFull = 0;
int simTime = 0;
int currentCustomer = -1;
// Each time through this loop represents 1 minute passing.
// There needs to be code to handle everything that can happen
// in 1 minute:
// - Customer arriving (yes/no?)
// - Is the current customer finished
// - Possibly process the next person in line
// - Calculate simulation statistics along the way
while (simTime++ < parameters[SIMULATION_TIME])
{
// One iteration of the loop represents one
// minute of simulation time.
// Check to see if a customer arrived
// (if so, process, also see if the line is full)
bool arrived = randomChance(parameters[ARRIVAL_RATE]);
if (arrived)
{
// A customer arrived in this minute
if (currentCustomer == -1)
{
// No customer is currently at the cashier
int serviceTime = randomInt(parameters[MIN_SERVICE_TIME],
parameters[MAX_SERVICE_TIME]);
currentCustomer = simTime + serviceTime;
}
else
{
if (line.size() == parameters[MAX_LINE_SIZE])
{
// Count this customer as leaving because the line is too
// full
counters[CUSTOMERS_LEAVING]++;
}
else
{
line.push(simTime);
}
}
}
counters[AVERAGE_LINE] += line.size();
// Check to see if the current customer is done
// at the cashier. Also check if there is no customer
// at the cashier (in which the next customer goes to the
// cashier).
if (simTime == currentCustomer)
{
if (!line.empty())
{
int nextCustomerTimestamp = line.front();
int waitingTime = simTime - nextCustomerTimestamp;
// We need to include this in the average waiting times
if (waitingTime >= parameters[ANGRY_THRESHOLD])
counters[ANGRY_CUSTOMERS]++;
counters[AVERAGE_WAIT] += waitingTime;
// Set currentCustomer to the time when that customer
// will be done. Need to call randomInt().
int serviceTime = randomInt(parameters[MIN_SERVICE_TIME],
parameters[MAX_SERVICE_TIME]);
// This will give us a timestamp of when the current customer will
// be done.
currentCustomer = simTime + serviceTime;
line.pop();
counters[CUSTOMERS_SERVICED]++;
}
else
{
// The line is empty
counters[CUSTOMERS_SERVICED]++;
currentCustomer = -1;
}
}
}
// Print a summary of the simulation:
// counters
counters[AVERAGE_WAIT] /= counters[CUSTOMERS_SERVICED];
counters[AVERAGE_LINE] /= parameters[SIMULATION_TIME];
cout << endl;
cout << "\t Customers serviced: " << "\t\t" << counters[CUSTOMERS_SERVICED] << endl;
cout << "\t Customers leaving: " << "\t\t" << counters[CUSTOMERS_LEAVING] << endl;
cout << "\t Average time spent in line: " << "\t" << counters[AVERAGE_WAIT] << endl;
cout << "\t Average line length: " << "\t\t" << counters[AVERAGE_LINE] << endl;
cout << "\t Angry customers: " << "\t\t" << counters[ANGRY_CUSTOMERS] << endl;
cout << endl;
}
return 0;
}
Any advice on how to achieve the example output above would be appreciated!
I was learning to use pthread in the hope that it would help some of the slowest pieces of my code
go a bit faster. As a warm-up example, I tried to write a Monte Carlo integrator using
threads. I wrote code that compares three approaches:
Single-thread pthread evaluation of the integral with NEVALS integrand evaluations.
Multiple-thread evaluation of the integral NTHREADS times, each with NEVALS integrand evaluations.
Multiple threads committed to different cores in my CPU, again totalling NEVALS*NTHREADS integrand evaluations.
When I run it, the fastest per integrand evaluation is the single thread, between 2 and 3 times faster than the others. The other two seem roughly equivalent, except that
the CPU usage is very different: the second spreads the threads across all 8 cores
in my CPU, while the third (unsurprisingly) concentrates the job on NTHREADS cores and leaves the rest
unoccupied.
Here is the source:
#include <iostream>
#define __USE_GNU
#include <sched.h>
#include <pthread.h>
#include <thread>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <unistd.h>
using namespace std;
double aleatorio(double a, double b){
double r = double(rand())/RAND_MAX;
return a + r * (b - a);
}
double funct(double* a){
return pow(a[0],6);
}
void EstimateBounds(int ndim, double (*f)(double*), double* bounds){
double x[ndim];
for(int i=1;i<=1000;i++){
for(int j=0;j<ndim;j++) x[j] = aleatorio(0,1);
if ( f(x) > bounds[1]) bounds[1] = f(x);
if ( f(x) < bounds[0]) bounds[0] = f(x);
}
}
void Integrate(double (*f)(double*), int ndim, double* integral, int verbose, int seed){
int nbatch = 5000000;
const int maxeval = 25*nbatch;
double x[ndim];
srand(seed);
/// Algorithm to estimate the maxima and minima ///
for(int j=0;j<ndim;j++) x[j] = 0.5;
double bounds[2] = {f(x),f(x)};
EstimateBounds(ndim,f,bounds);
/// Integral initialization ///
int niter = int(maxeval/nbatch);
for(int k=1;k<=niter;k++)
{
double loc_min = bounds[0];
double loc_max = bounds[1];
int count = 0;
for (int i=1; i<=nbatch; i++)
{
for(int j=0;j<ndim;j++) x[j] = aleatorio(0,1);
double y = aleatorio(bounds[0],bounds[1]);
if ( f(x) > loc_max ) loc_max = f(x);
if ( f(x) < loc_min ) loc_min = f(x);
if ( f(x) > y && y > 0 ) count++;
if ( f(x) < y && y < 0 ) count--;
}
double delta = (bounds[1]-bounds[0])*double(count)/nbatch;
integral[0] += delta;
integral[1] += pow(delta,2);
bounds[0] = loc_min;
bounds[1] = loc_max;
if(verbose>0){
cout << "Iteration["<<k<<"]: " << k*nbatch;
cout << " integrand evaluations so far" <<endl;
if(verbose>1){
cout << "The bounds for this iteration were = ["<<bounds[0]<<","<<bounds[1]<<"]"<<endl;}
cout << "Integral = ";
cout << integral[0]/k << " +- ";
cout << sqrt((integral[1]/k - pow(integral[0]/k,2)))/(k) << endl;
cout << endl;
}
}
integral[0] /= niter;
integral[1] = sqrt((integral[1]/niter - pow(integral[0],2)))/niter;
}
struct IntegratorArguments{
double (*Integrand)(double*);
int NumberOfVariables;
double* Integral;
int VerboseLevel;
int Seed;
};
void LayeredIntegrate(IntegratorArguments IA){
Integrate(IA.Integrand,IA.NumberOfVariables,IA.Integral,IA.VerboseLevel,IA.Seed);
}
void* ThreadIntegrate(void * IntArgs){ // pthread_create expects a function returning void*
IntegratorArguments *IA = (IntegratorArguments*)IntArgs;
LayeredIntegrate(*IA);
pthread_exit(NULL);
}
#define NTHREADS 5
int main(void)
{
cout.precision(16);
bool execute_single_core = true;
bool execute_multi_core = true;
bool execute_multi_core_2 = true;
///////////////////////////////////////////////////////////////////////////
///
/// Single Thread Execution
///
///////////////////////////////////////////////////////////////////////////
if(execute_single_core){
pthread_t thr0;
double integral_value0[2] = {0,0};
IntegratorArguments IntArg0;
IntArg0.Integrand = funct;
IntArg0.NumberOfVariables = 2;
IntArg0.VerboseLevel = 0;
IntArg0.Seed = 1;
IntArg0.Integral = integral_value0;
int t = time(NULL);
cout << "Now Attempting to create thread "<<0<<endl;
int rc0 = 0;
rc0 = pthread_create(&thr0, NULL, ThreadIntegrate,&IntArg0);
if (rc0) {
cout << "Error:unable to create thread," << rc0 << endl;
exit(-1);
}
else cout << "Thread "<<0<<" has been succesfuly created" << endl;
pthread_join(thr0,NULL);
cout << "Thread 0 has finished, it took " << time(NULL)-t <<" secs to finish" << endl;
cout << "Integral Value = "<< integral_value0[0] << "+/-" << integral_value0[1] <<endl;
}
////////////////////////////////////////////////////////////////////////////////
///
/// Multiple Threads Creation
///
///////////////////////////////////////////////////////////////////////////////
if(execute_multi_core){
pthread_t threads[NTHREADS];
double integral_value[NTHREADS][2];
IntegratorArguments IntArgs[NTHREADS];
int rc[NTHREADS];
for(int i=0;i<NTHREADS;i++){
integral_value[i][0]=0;
integral_value[i][1]=0;
IntArgs[i].Integrand = funct;
IntArgs[i].NumberOfVariables = 2;
IntArgs[i].VerboseLevel = 0;
IntArgs[i].Seed = i;
IntArgs[i].Integral = integral_value[i];
}
int t = time(NULL);
for(int i=0;i<NTHREADS;i++){
cout << "Now Attempting to create thread "<<i<<endl;
rc[i] = pthread_create(&threads[i], NULL, ThreadIntegrate,&IntArgs[i]);
if (rc[i]) {
cout << "Error:unable to create thread," << rc[i] << endl;
exit(-1);
}
else cout << "Thread "<<i<<" has been succesfuly created" << endl;
}
/// Thread Waiting Phase ///
for(int i=0;i<NTHREADS;i++) pthread_join(threads[i],NULL);
cout << "All threads have now finished" <<endl;
cout << "This took " << time(NULL)-t << " secs to finish" <<endl;
cout << "Or " << (time(NULL)-t)/NTHREADS << " secs per core" <<endl;
for(int i = 0; i < NTHREADS; i++ ) {
cout << "Thread " << i << " has as the value for the integral" << endl;
cout << "Integral = ";
cout << integral_value[i][0] << " +- ";
cout << integral_value[i][1] << endl;
}
}
////////////////////////////////////////////////////////////////////////
///
/// Multiple Cores Execution
///
///////////////////////////////////////////////////////////////////////
if(execute_multi_core_2){
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
pthread_t threads[NTHREADS];
double integral_value[NTHREADS][2];
IntegratorArguments IntArgs[NTHREADS];
int rc[NTHREADS];
for(int i=0;i<NTHREADS;i++){
integral_value[i][0]=0;
integral_value[i][1]=0;
IntArgs[i].Integrand = funct;
IntArgs[i].NumberOfVariables = 2;
IntArgs[i].VerboseLevel = 0;
IntArgs[i].Seed = i;
IntArgs[i].Integral = integral_value[i];
}
int t = time(NULL);
for(int i=0;i<NTHREADS;i++){
cout << "Now Attempting to create thread "<<i<<endl;
rc[i] = pthread_create(&threads[i], NULL, ThreadIntegrate,&IntArgs[i]);
if (rc[i]) {
cout << "Error:unable to create thread," << rc[i] << endl;
exit(-1);
}
else cout << "Thread "<<i<<" has been succesfuly created" << endl;
CPU_SET(i, &cpuset);
}
cout << "Now attempting to commit different threads to different cores" << endl;
for(int i=0;i<NTHREADS;i++){
const int set_result = pthread_setaffinity_np(threads[i], sizeof(cpu_set_t), &cpuset);
if(set_result) cout << "Error: Thread "<<i<<" could not be commited to a new core"<<endl;
else cout << "Thread reassignment succesful" << endl;
}
/// Thread Waiting Phase ///
for(int i=0;i<NTHREADS;i++) pthread_join(threads[i],NULL);
cout << "All threads have now finished" <<endl;
cout << "This took " << time(NULL)-t << " secs to finish" <<endl;
cout << "Or " << (time(NULL)-t)/NTHREADS << " secs per core" <<endl;
for(int i = 0; i < NTHREADS; i++ ) {
cout << "Thread " << i << " has as the value for the integral" << endl;
cout << "Integral = ";
cout << integral_value[i][0] << " +- ";
cout << integral_value[i][1] << endl;
}
}
pthread_exit(NULL);
}
I compile with
g++ -std=c++11 -w -fpermissive -O3 SOURCE.cpp -lpthread
It seems to me that my threads are actually being executed sequentially, because
the time seems to grow with NTHREADS, and it actually takes roughly NTHREADS times longer
than a single thread.
Does anyone have an idea of where the bottleneck is?
You are using rand(), which is a global random number generator. First of all it is not thread-safe, so using it in multiple threads, potentially in parallel, causes undefined behavior.
Even if we set that aside, rand() is using one global instance, shared by all threads. If one thread wants to call it, the processor core needs to check whether the other cores modified its state and needs to refetch that state from the main memory or other caches each time it is used. This is why you observe the drop in performance.
Use the <random> facilities for pseudo-random number generators instead. They offer much better quality random number generators, random number distributions, and the ability to create multiple independent random number generator instances. Make these thread_local, so the threads do not interfere with one another:
double aleatorio(double a, double b){
thread_local std::mt19937 rng{/*seed*/};
return std::uniform_real_distribution<double>{a, b}(rng);
}
Please note, though, that this is not using proper seeding for std::mt19937 (see this question for details), and that uniform_real_distribution<double>{a, b} will return a uniformly distributed number between a inclusive and b exclusive. Your original code gave a number between a and b inclusive (potential rounding errors aside). I assume that neither is particularly relevant to you.
Also note my unrelated comments under your question for other things you should improve.
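For illustration, here is a sketch of how independent generators per thread could look when each thread gets its own seed, similar in spirit to the Seed field in the question. The names meanOfUniform and parallelMeans are hypothetical, std::thread is swapped in for pthread to keep the example short, and seeding std::mt19937 from a single integer is still the simplified seeding mentioned above.

```cpp
#include <random>
#include <thread>
#include <vector>

// Each call owns its engine, so no generator state is shared between threads.
double meanOfUniform(unsigned seed, int samples)
{
    std::mt19937 rng{seed};
    std::uniform_real_distribution<double> dist{0.0, 1.0};
    double sum = 0.0;
    for (int i = 0; i < samples; ++i)
        sum += dist(rng);
    return sum / samples;
}

// Run one worker per thread; every worker writes only to its own slot.
std::vector<double> parallelMeans(int threadCount, int samples)
{
    std::vector<double> results(threadCount, 0.0);
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t)
    {
        workers.emplace_back([&results, t, samples] {
            results[t] = meanOfUniform(static_cast<unsigned>(t) + 1u, samples);
        });
    }
    for (auto &w : workers)
        w.join();
    return results;
}
```

Because no generator is shared, the threads have no common state to contend for, which is the property the thread_local version relies on as well.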
I'm trying to simulate in SystemC a module with a CABA (Cycle Accurate / Bit Accurate) model which adds two numbers. It has the following signals:
Module addition_CABA
a: Input number for the addition.
b: Input number for the addition.
clk: Clock input.
valid: Input signal which changes to 1 when the inputs a and b are available.
result: Output signal containing the result of a + b.
ready: Output signal which changes to 1 when result is ready.
In order to test if the result from this module is right I have created a testbench module which has the following signals:
Module testbench
result_tb: Input signal which receives the result signal from addition_CABA module.
ready_tb: Input signal which receives the ready signal from addition_CABA module.
clk_tb: Clock input signal. It's the same for both modules.
rst_tb: Reset input signal. It's the same for both modules.
a_tb: Output signal which sends the number a to addition_CABA module.
b_tb: Output signal which sends the number b to addition_CABA module.
valid_tb: Output signal which sends the valid signal to addition_CABA module.
The test I'm doing is as follows:
Within the module testbench a pair of random numbers is generated to give values to a and b.
The module calculates the result after some clock cycles.
The testbench directly does the addition operation and compares it with the one resulting from the module.
These actions are within a loop which repeats five times.
The problem I'm having is that when I run the simulation, testbench gives the right result, but addition_CABA shows its result some clock cycles later, so the comparison is made between two different numbers. In addition, when the program ends I get a Segmentation fault (core dumped) message. What I'm trying to figure out is how to make testbench wait until the result is ready (using the ready_tb signal) so that it compares against the right number. I have tried a while(!ready_tb.read()) wait(); condition just before starting the test, but with it the program ends and the simulation never starts.
In the main.cpp file I'm just doing the connections between modules, generating the clock and setting the rst signal to 0. Below is my code:
addition_CABA.h
#include <systemc.h>
//Module which adds two numbers
SC_MODULE(addition_CABA){
sc_in< sc_uint<8> > a;
sc_in< sc_uint<8> > b;
sc_in<bool> clk;
sc_in<bool> valid;
sc_in<bool> rst;
sc_out<bool> ready;
sc_out< sc_uint<8> > result;
int addcaba(int a, int b){
int c;
c = a+b;
wait(3);
return c;
}
void loop();
SC_CTOR(addition_CABA){
SC_CTHREAD(loop, clk.pos());
async_reset_signal_is(rst, true);
}
};
addition_CABA.cpp
void addition_CABA::loop(){
ready.write(0);
result.write(0);
if(rst){
ready.write(0);
result.write(0);
}
else{
while(1){
while (!valid.read()) wait();
result.write(addcaba(a.read(),b.read()));
ready.write(1);
wait();
ready.write(0);
}
}
}
testbench.h
#include <systemc.h>
SC_MODULE(testbench){
sc_in< sc_uint<8> > result_tb;
sc_in<bool> ready_tb;
sc_in<bool> clk_tb;
sc_in<bool> rst_tb;
sc_out< sc_uint<8> > a_tb;
sc_out< sc_uint<8> > b_tb;
sc_out<bool> valid_tb;
void test();
SC_CTOR(testbench){
SC_CTHREAD(test, clk_tb.pos());
async_reset_signal_is(rst_tb, true);
}
};
testbench.cpp
void testbench::test(){
uint8_t c = 0;
int k = 0;
if (rst_tb){
c = 0;
k = 0;
cout << "\nReset on!\n" << endl;
}
else{
//while(!ready_tb.read()) wait(); //when using this condition the simulation never starts
while(k < 5){
a_tb.write( (1 + rand() % (128-1)) );
b_tb.write( (1 + rand() % (128-1)) );
valid_tb.write(1);
sc_start(10, SC_NS);
valid_tb.write(0);
sc_start(10, SC_NS);
cout << "\nTest number " << k+1 << endl;
cout << "\ta = " << a_tb.read() << " and b = " << b_tb.read() << endl;
cout << "\tAddition of " << a_tb.read() << " and " << b_tb.read();
cout << " = " << result_tb.read() << endl;
c = a_tb.read() + b_tb.read();
if ( result_tb.read() != c ){
cout << "Real result = " << a_tb.read() + b_tb.read();
cout << " and result with module = " << result_tb.read() << endl;
cout << "Wrong result\n" << endl;
// exit(1);
}
else cout << "Result OK\n" << endl;
k++;
}
}
}
The root causes of the problem were the following ones:
The simulation time in main.cpp function was too short (10 ns, I modified to 500 ns).
As the test function in the testbench module is an SC_CTHREAD, it must be within an infinite loop. The previous implementation was completely wrong and I think this was also the root cause of the Segmentation fault (core dumped) message.
In addition, the loop which repeats the test five times is not necessary as the number of iterations is related to the simulation time (set in the main.cpp function). Below is the code for testbench.cpp corrected:
void testbench::test(){
uint8_t c = 0;
int k = 0;
if (rst_tb){
c = 0;
k = 0;
cout << "\nReset on!\n" << endl;
}
else{
while(1){
a_tb.write( (1 + rand() % (128-1)) );
b_tb.write( (1 + rand() % (128-1)) );
valid_tb.write(1);
wait();
valid_tb.write(0);
wait();
while(!ready_tb.read()) wait();//This condition waits until ready_tb = true to continue the simulation
cout << "\nTest number " << k+1 << endl;
cout << "\ta = " << a_tb.read() << " and b = " << b_tb.read() << endl;
cout << "\tAddition of " << a_tb.read() << " and " << b_tb.read();
cout << " = " << result_tb.read() << endl;
c = a_tb.read() + b_tb.read();
if ( result_tb.read() != c ){
cout << "Real result = " << a_tb.read() + b_tb.read();
cout << " and result with module = " << result_tb.read() << endl;
cout << "Wrong result\n" << endl;
exit(1);
}
else cout << "Result OK\n" << endl;
k++;
}
}
}
Hey, I'm trying to measure how long a function takes to execute.
I am doing it like this:
Timer.cpp
long long int Timer :: clock1()
{
QueryPerformanceCounter((LARGE_INTEGER*)&time1);
return time1;
}
long long int Timer :: clock2()
{
QueryPerformanceCounter((LARGE_INTEGER*)&time2);
return time2;
}
main.cpp
#include "Timer.h" //To allow the use of the timer class.
Timer query;
void print()
{
query.clock1();
//Loop through the elements in the array.
for(int index = 0; index < num_elements; index++)
{
//Print out the array index and the arrays elements.
cout <<"Index: " << index << "\tElement: " << m_array[index]<<endl;
}
//Prints out the number of elements and the size of the array.
cout<< "\nNumber of elements: " << num_elements;
cout<< "\nSize of the array: " << size << "\n";
query.clock2();
cout << "\nTime Taken : " << query.time1 - query.time2;
}
Can anyone tell me if I am doing this correctly?
You are subtracting the ending time from the starting time.
cout << "\nTime Taken : " << query.time1 - query.time2;
should be
cout << "\nTime Taken : " << query.time2 - query.time1;
Let's say I start something at 10 seconds and it finishes at 30 seconds. How long did it take? 20 seconds. To get that, we would do 30 - 10; that is, the second time subtract the first time.
So perhaps you want:
cout << "\nTime Taken : " << (query.time2 - query.time1);
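Note also that the difference between two QueryPerformanceCounter readings is a number of ticks, not seconds; on Windows the ticks-per-second value is reported by QueryPerformanceFrequency. A minimal sketch of the conversion, kept free of Windows headers so it compiles anywhere; ticksToSeconds is a hypothetical helper and not part of the Timer class in the question.

```cpp
#include <cstdint>

// Convert two counter readings to seconds.
// ticksPerSecond is assumed to be the value reported by QueryPerformanceFrequency.
inline double ticksToSeconds(std::int64_t startTicks, std::int64_t endTicks, std::int64_t ticksPerSecond)
{
    return static_cast<double>(endTicks - startTicks) / static_cast<double>(ticksPerSecond);
}
```

With the question's class this would be called as ticksToSeconds(query.time1, query.time2, frequency), where frequency is a variable the caller fills in once at startup.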