SystemC Transaction Level Modeling Sockets: Two-Way Communication - c++

In Doulos's SystemC Transaction Level Modeling (TLM-2.0) documentation, it is written:
The top-level module of the hierarchy instantiates one initiator and
one memory, and binds the initiator socket on the initiator to the
target socket on the target memory. The sockets encapsulate everything
you need for two-way communication between modules, including ports
and exports for both directions of communication. One initiator socket
is always bound to one target socket.
My understanding of this was that when you create an initiator and a target, the initiator starts the communication by calling b_transport, thereby triggering the target, which can reply back. However, I have been writing some code, and this does not seem to be the case. Let us look at an example.
I have a very basic implementation of an adder that can be talked to using transaction level modeling. This module serves as the target.
adder.cc
#define SC_INCLUDE_DYNAMIC_PROCESS
#include "tlm_utils/simple_initiator_socket.h"
#include "tlm_utils/simple_target_socket.h"
#include <cstring> // memcpy
#include <iostream>
using namespace sc_core;
using namespace std;
#include "adder.h"
adder::adder(sc_module_name name)
: sc_module(name), socket("socket2")
{
socket.register_b_transport(this, &adder::b_transport);
socket.register_transport_dbg(this, &adder::transport_dbg);
}
void adder::b_transport(tlm::tlm_generic_payload& trans, sc_time& delay)
{
tlm::tlm_command cmd = trans.get_command();
sc_dt::uint64 addr = trans.get_address();
uint32_t *ptr = (uint32_t*)trans.get_data_ptr();
unsigned int len = trans.get_data_length();
unsigned char *byt = trans.get_byte_enable_ptr();
unsigned int wid = trans.get_streaming_width();
addend1 = *ptr;
addend2 = *(++ptr);
add();
cout << "addend1: " << addend1 << endl;
cout << "addend2: " << addend2 << endl;
cout << "sum: " << sum << endl;
uint32_t *return_sum_loc = ptr;
for(int i = 0; i< 2; i++) {
return_sum_loc++;
}
memcpy(return_sum_loc, (char*) &sum, sizeof(uint32_t));
cout << "New sum for return: " << *(return_sum_loc) << endl;
}
unsigned int adder::transport_dbg(tlm::tlm_generic_payload& trans)
{
return 0;
}
void adder::add()
{
sum = addend1 + addend2;
}
Then I have a test_bench module that serves as the initiator:
test_bench.cc
#define SC_INCLUDE_DYNAMIC_PROCESS
#include "tlm_utils/simple_initiator_socket.h"
#include "tlm_utils/simple_target_socket.h"
using namespace sc_core;
using namespace std;
#include "test_bench.h"
#include <cstdio> // printf
#include <cstdlib> // system
#include <fstream>
#include <iostream>
test_bench::test_bench(sc_module_name name):
sc_module(name), socket("socket")
{
SC_THREAD(run_tests);
}
void test_bench::run_tests()
{
ifstream infile("./adder.golden.dat");
ofstream ofs;
ofs.open("./adder.dat");
uint32_t theoretical_sum = 0;
while(infile >> data[0] >> data[1] >> theoretical_sum)
{
tlm::tlm_generic_payload *trans = new tlm::tlm_generic_payload;
sc_time delay = sc_time(10, SC_NS);
cout << "Sending" << endl;
cout << "Data[0]: " << data[0] << endl;
cout << "Data[1]: " << data[1] << endl;
trans->set_data_ptr((unsigned char*)data);
socket->b_transport(*trans, delay);
cout << "data[2]: " << data[2] << endl;
ofs << data[0] << "\t" << data[1] << "\t" << data[2] << "\n";
delete trans;
}
infile.close();
ofs.close();
printf ("Comparing against output data \n");
if (system("diff -w adder.dat adder.golden.dat"))
{
cout << "*******************************************" << endl;
cout << "FAIL: Output DOES NOT match the golden output" << endl;
cout << "*******************************************" << endl;
}
else
{
cout << "*******************************************" << endl;
cout << "PASS: The output matches the golden output!" << endl;
cout << "*******************************************" << endl;
}
}
Here is the parent module that instantiates and connects them.
main.cc
#include "systemc.h"
#include "tlm_utils/simple_initiator_socket.h"
#include "tlm_utils/simple_target_socket.h"
#include "tlm_utils/tlm_quantumkeeper.h"
using namespace sc_core;
using namespace sc_dt;
using namespace std;
#include "test_bench.h"
#include "adder.h"
SC_MODULE(Top)
{
test_bench *tb;
adder *ad;
sc_signal<bool> rst;
sc_signal<bool> tb_irq;
sc_signal<bool> ad_irq;
Top(sc_module_name name) :
rst("rst")
{
tb = new test_bench("test_bench");
ad = new adder("adder");
tb->socket.bind(ad->socket);
tb->irq(tb_irq);
ad->irq(ad_irq);
}
};
int sc_main(int argc, char *argv[])
{
Top *top = new Top("Top");
sc_start();
return 0;
}
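When I run the executable, this is the output I get (it is the diff output: the lines starting with < come from adder.dat, and the lines below them are the expected values from adder.golden.dat):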
< 1 0 0
< 1 1 0
< 2 1 0
< 2 2 0
< 2 3 0
< 3 3 0
< 4 3 0
< 4 4 0
< 5 4 0
< 5 5 0
1 0 1
1 1 2
2 1 3
2 2 4
2 3 5
3 3 6
4 3 7
4 4 8
5 4 9
5 5 10
FAIL: Output DOES NOT match the golden output
So my original thought was that I was passing this payload by value into the b_transport function of an initiator socket that is bound to a target. The target receives and decodes this payload, and that part works: I am able to parse the uint32_t values passed in through data[]. Based on the 0 return values that were written into the memory I passed, I eventually concluded that the payload is not actually passed by value. For some reason it is created as a pointer, then dereferenced when passed. In essence, this destroys the target's ability to manipulate the memory that was passed in order to hand a response back to the initiator.
So this whole two-way communication thing Aynsley mentioned has me a little confused. By "two-way", does he mean that both modules need target and initiator sockets to enable two-way communication?

This is the signature of the b_transport call:
void b_transport( tlm::tlm_generic_payload& trans, sc_time& delay )
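The payload is passed by reference, so the target can modify it, and the initiator can read the returned value from the same payload object. The same holds for the data buffer: the payload only carries a pointer to the initiator's memory, so anything the target writes through that pointer is visible to the initiator after b_transport returns.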
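Applied to the code in the question: the payload arrives by reference and its data pointer still refers to the initiator's data array, so the zeros most likely come from the pointer arithmetic in adder::b_transport rather than from TLM. `addend2 = *(++ptr)` leaves ptr at data[1], and the two further increments of return_sum_loc land on data[3], one element past data[2] (out of bounds if data has three elements; test_bench.h is not shown). The sketch below is plain C++ so it runs without SystemC: `payload` is a hypothetical stand-in for tlm::tlm_generic_payload that keeps only the data pointer, and adder_transport and original_result_index are illustrative names. It indexes from the start of the buffer instead of moving the pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for tlm::tlm_generic_payload: only the data pointer
// is modeled. This is not the SystemC API.
struct payload {
    unsigned char* data_ptr = nullptr;
    unsigned char* get_data_ptr() const { return data_ptr; }
    void set_data_ptr(unsigned char* p) { data_ptr = p; }
};

// Target side. Like b_transport, it takes the payload by reference, and the
// data pointer refers to the initiator's own buffer, so writing through it
// is how the target hands the result back.
void adder_transport(payload& trans) {
    std::uint32_t* data = reinterpret_cast<std::uint32_t*>(trans.get_data_ptr());
    std::uint32_t sum = data[0] + data[1];      // read both addends by index
    std::memcpy(&data[2], &sum, sizeof sum);    // result goes into data[2]
}

// Replays the pointer moves from the question's adder::b_transport and
// returns the index that the sum was actually written to.
std::ptrdiff_t original_result_index() {
    std::uint32_t data[4] = {};
    std::uint32_t* ptr = data;                  // points at data[0]
    ++ptr;                                      // addend2 = *(++ptr): now data[1]
    std::uint32_t* return_sum_loc = ptr;
    for (int i = 0; i < 2; i++) {
        return_sum_loc++;                       // data[2], then data[3]
    }
    return return_sum_loc - data;
}
```

In the real adder, the same indexing applied to the pointer returned by trans.get_data_ptr() should put the sum where test_bench reads data[2].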
So this whole two-way communication thing Aynsley mentioned has me a little confused. By "two-way", does he mean that both modules need target and initiator sockets to enable two-way communication?
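The blocking transport protocol implemented by the b_transport call is unidirectional: the initiator module is active and the target module is passive. The transaction finishes in a single call, and the target's response travels back in the same payload object, so no extra socket pair is needed for it. The target is allowed to call wait() inside its b_transport implementation.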
TLM-2.0 also supports a non-blocking protocol that consists of two calls:
nb_transport_fw from initiator to target
nb_transport_bw from target to initiator
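This bidirectional protocol allows more fine-grained modeling of bus timing; for example, you can model out-of-order transaction processing on an AMBA AXI bus. The backward path already runs through the same socket pair (each socket contains a port and an export, one for each direction), which is what the Doulos text means by two-way communication.
In practice, however, almost everyone uses b_transport. Most models I've seen don't even support the non-blocking interface.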

Related

When multithreading in C++, it randomly repeats the input line stored in my struct

I'm writing a program that reads a file and, for each line, creates a thread. In the log, I can see that before calling pthread_create, it takes the lines correctly (one at a time). The problem is that when the thread is created, it randomly repeats lines. I originally thought that a usleep(250) could solve it (in fact, it mitigates the problem a bit), but it still keeps repeating some lines and ignoring others.
Please excuse the terrible mix of C with C++! The code:
#include <iostream>
#include <cstdlib>
#include <pthread.h>
#include <fstream>
#include <stdlib.h>
using namespace std;
#define NUM_THREADS 201
struct thread_data{
int thread_id;
char *serial;
char *location;
};
void *RunCommand(void *threadarg)
{
struct thread_data *my_data;
my_data = (struct thread_data *) threadarg;
char *writecmd1="nohup date --date='10 hour' '+%Y-%m-%d_%H:%M:%S' >> /results_path/";
char *writecmda="cd /work_place/bin; ./process1 -d";
char *serial_num=my_data->serial;
char *d_location=my_data->location;
char *writecmd2=" >> /results_path/";
char *writecmd3=".out ";
char *writecmd3a="; ";
char *writecmd4="& ";
char * final_cmd = (char *) malloc(1 + strlen(writecmd1) + strlen(serial_num) + strlen(writecmd3) + strlen(writecmd3a) + strlen(writecmda) + strlen(d_location) + strlen(writecmd2) + strlen(serial_num) + strlen(writecmd3) + strlen(writecmd4));
strcpy(final_cmd,writecmd1);
strcat(final_cmd,serial_num);
strcat(final_cmd,writecmd3);
strcat(final_cmd,writecmd3a);
strcat(final_cmd,writecmda);
strcat(final_cmd,d_location);
strcat(final_cmd,writecmd2);
strcat(final_cmd,serial_num);
strcat(final_cmd,writecmd3);
strcat(final_cmd,writecmd4);
cout << final_cmd << endl;
cout << my_data->serial << "-" << my_data->location << endl;
system(final_cmd);
free(final_cmd);
pthread_exit(NULL);
}
int main ()
{
pthread_t threads[NUM_THREADS];
struct thread_data td[NUM_THREADS];
int rc;
int i = 0;
const char s[2] = ",";
string input_line;
std::ifstream infile("elements_list.in");
while (infile >> input_line)
{
char *main_line = &input_line[0u];
char *token;
// Get the first token. In the file it is the serial number
token = strtok(main_line, s);
td[i].serial = token;
// Get the second token. In the file it is the location
token = strtok(NULL, s);
td[i].location = token;
td[i].thread_id = i;
cout << td[i].serial << "#" << td[i].location << endl;
usleep(250); //Sleep for miliseconds
rc = pthread_create(&threads[i], NULL, RunCommand, (void *)&td[i]);
if (rc)
{
cout << "Error:unable to create thread," << rc << endl;
exit(-1);
}
i++;
}
pthread_exit(NULL);
}
An example of my output (lines with # are printed before creating the thread; lines with a hyphen are printed by the thread itself):
10000230#location1
10000294#location2
10000294-location2
10000294-location2
10000301#location3
10000301-location3
10000257#location4
10000257-location4
10000344#location5
10000071#location6
10000354#location7
10000354-location7
10000041#location8
10000041-location8
10000058#location9
10000058-location9
1000036310000363-location10
10000363#location10
-location10
10000363-location10
10000201#location11
10000201-location11
10000095#location12
20000037-location13
20000037#location13
On top of that, in some cases my structure is apparently losing its shape (see the line for "location10"). Locations 1, 5, and 6 are ignored, location 2 is repeated, etc.
The only post I found close to my issue is Multithreading in C, but honestly I'm not entirely clear about the solution.
Hope you can help. Thanks in advance.
First of all, your program does not compile with g++. Your code example is missing
#include <string.h> // or rather <cstring>
#include <unistd.h>
I have to admit that it was kind of painful to read your code. I did not find the exact reason why lines are repeated, but your malformed output is due to a race condition.
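Concurrent calls to std::cout's operator<< are thread safe in the sense that they cause no data race (C++11 guarantees this for the standard stream objects while they remain synchronized with stdio). In practice, two concurrent single calls are not interleaved in your output, although the standard itself does not promise this. So
Thread 1:
std::cout << "foo\n"
Thread 2:
std::cout << "bar\n"
typically yields either foo\nbar\n or bar\nfoo\n rather than fboo\nar\n or something similar. However, you are using multiple calls to operator<<: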
Thread 1:
std::cout << "f" << "o" << "o" << std::endl;
Thread 2:
std::cout << "b" << "a" << "r" << std::endl;
Which is the same as
Thread 1:
std::cout << "f";
std::cout << "o";
/* ... */
may yield fboaor\n\n.
When you are multithreading you have to create your full output lines before feeding them to std::cout:
std::stringstream ss;
ss << "My" << "safe" << "output" << "scheme" << std::endl;
std::cout << ss.str();
I also strongly urge you to read a decent book on C++. Your program has a lot of other problems.
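The repeated lines most likely have a different cause than the interleaving: strtok returns pointers into input_line's own buffer, so td[i].serial and td[i].location point into that one std::string, which the next `infile >> input_line` overwrites (or reallocates). A thread that starts late therefore reads the following line's text, or a dangling pointer, which would also explain why usleep(250) reduces the problem. Below is a minimal sketch of a fix, assuming C++11: each thread_data owns std::string copies, std::thread replaces pthread_create, and find/substr replaces strtok. run_command and process_lines are hypothetical names, and each thread stores its formatted line in its own result slot instead of calling system(), so the example can be checked.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Each thread gets its own copies of the fields, not char* pointers into a
// buffer that the reading loop reuses for the next line.
struct thread_data {
    int thread_id;
    std::string serial;
    std::string location;
};

// Hypothetical stand-in for RunCommand: builds the complete text in one
// string and stores it in this thread's own slot.
void run_command(const thread_data& d, std::string& result) {
    result = d.serial + "-" + d.location;
}

// Reads "serial,location" lines, starts one thread per line, joins them all,
// and returns the per-thread results in input order.
std::vector<std::string> process_lines(std::istream& in) {
    std::vector<thread_data> td;
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type comma = line.find(',');
        if (comma == std::string::npos) continue;   // skip malformed lines
        td.push_back({static_cast<int>(td.size()), line.substr(0, comma), line.substr(comma + 1)});
    }
    // All thread_data objects exist before any thread starts, so the vector
    // never reallocates underneath a running thread.
    std::vector<std::string> results(td.size());
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < td.size(); ++i)
        threads.emplace_back(run_command, std::cref(td[i]), std::ref(results[i]));
    for (auto& t : threads) t.join();
    return results;
}
```

If the threads still print to std::cout, build each line in one string first, as described above, so that the output is also written in a single call.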

Distributing array components over processors using Open MPI with C++

I am trying to write a simple C++ program that takes an array, sends equal portions of it to different processors, and those processors do computations on the components and then send the portions of the array back to the master processor to be combined in the final array.
I have started with a simple case where I have an array of size 2: process 1 adds 1 to the first component, and process 2 adds 2 to the second component.
Here is what I have:
# include <cstdlib>
# include <iostream>
# include <iomanip>
# include <ctime>
#include <fstream>
# include "mpi.h"
using namespace std;
ofstream debug("DEBUG");
ofstream debug1("DEBUG1");
ofstream debug2("DEBUG2");
// Declare the array
double arr[2];
int main(int argc, char *argv[])
{
MPI::Init(argc, argv);
// Make the array
arr[0] = 1;
arr[1] = 2;
int rank = MPI::COMM_WORLD.Get_rank();
int npes = MPI::COMM_WORLD.Get_size();
if ( rank == 0 ) {
cout << "Running on "<< npes << " Processes "<< endl;
double arr1;
double arr2;
MPI::COMM_WORLD.Recv(&arr1, 1, MPI::DOUBLE, 0, 0);
debug << "arr1: " << arr1 << endl;
/*... Program freezes here. I'd like to combine arr1 and arr2 into
arr*/
}
if ( rank == 1){
debug1 << "This is process " << rank << endl;
double arr1 = arr[0];
debug1 << "arr1: " << arr1 << endl;
arr1 = arr1 + 1;
debug1 << "arr1+1: " << arr1 << endl;
MPI::COMM_WORLD.Send(&arr1, 1, MPI::DOUBLE, 0, 0);
}
if ( rank == 2){
debug2 << "This is process " << rank << endl;
double arr2 = arr[1];
debug2 << "arr2: " << arr2 << endl;
arr2 = arr2 + 2;
debug2 << "arr2+2: " << arr2 << endl;
}
cout << "Greetings from process " << rank << endl;
MPI::Finalize();
}
I am compiling with
mpiCC test.cpp -o test
and running with
mpirun -np 3 test
since I wish to use 2 processors to operate on arr and 1 processor (process 0) to gather the components.
My issue is that the program freezes when using
MPI::COMM_WORLD.Recv(&arr1, 1, MPI::DOUBLE, 0, 0);
on process 0.
Does anyone know why this would happen? I'd simply like to distribute computations on an array over processors and thought this would be a good example to start with.
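When you use MPI, there are functions designed for this kind of task: MPI_Scatter and MPI_Reduce (or MPI_Gather, if you want the individual results back as an array instead of combined into one value). They let you divide your array among n processes, do your computations, and get the results back to the coordinator. As for the freeze itself: the fourth argument of Recv is the source rank, and you pass 0, so process 0 waits for a message from itself that is never sent. Process 1 sends to rank 0, so the receive should name source 1; process 2 never calls Send at all, so a matching receive for arr2 would also block until you add one.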

C++ multithreading interruptions

As you can see in the main function, I've created a group of threads that execute the exact same function, but with different parameters. The function simply prints out a vector's values. The problem is that these threads interfere with one another: one thread does not finish printing (cout) before another starts, and the output comes out like sdkljasjdkljsad. I want some sort of chaotic order of complete lines, for example:
Thread 1 Vector[0]
Thread 2 Vector[0]
Thread 1 Vector[1]
Thread 3 Vector[0]
Thread 4 Vector[0]
Thread 2 Vector[1]
Rather than:
Thread 1 Thread 2 Vector[0] Vector[0]
Thread 2 Vector[1]
Thread 1 Thread 4 Vector[1] Thread 3 Vector[0] Vector[1]
How can I solve this problem? P.S. The data file is simply a list of player names, weights, and bench-press results, one per line. I transform these into strings and place them in a vector (yeah, sounds dumb, but I'm just fulfilling a task).
#include "stdafx.h"
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <thread>
#include <sstream>
#include <iomanip>
#include <boost/thread.hpp>
#include <boost/bind.hpp>
using namespace std;
vector<string> Kategorijos;
vector< vector<string> > Zaidejai;
ifstream duom("duom.txt");
string precision(double a) {
ostringstream out;
out << setprecision(6) << a;
return out.str();
}
void read() {
string tempKat;
int tempZaidSk;
vector<string> tempZaid;
string vardas;
int svoris;
double pakeltasSvoris;
while (duom >> tempKat >> tempZaidSk) {
Kategorijos.push_back(tempKat);
for (int i = 0; i < tempZaidSk; i++) {
duom >> vardas >> svoris >> pakeltasSvoris;
tempZaid.push_back(vardas + " " + to_string(svoris) + " " + precision(pakeltasSvoris));
}
Zaidejai.push_back(tempZaid);
tempZaid.clear();
}
duom.close();
}
void writethreads(int a) {
int pNr = a+1;
for (int i = 0; i < (int)Zaidejai[a].size(); i++) {
cout << endl << "Proceso nr: " << pNr << " " << i << ": " << Zaidejai[a][i] ;
}
}
void print() {
for (int i = 0; i < (int)Kategorijos.size(); i++) {
cout << "*** " << Kategorijos[i] << " ***" << endl;
for (int j = 0; j < (int)Zaidejai[i].size(); j++) {
cout << j+1<<") "<< Zaidejai[i][j] << endl;
}
cout << endl;
}
cout << "-------------------------------------------------------------------" << endl;
}
int main()
{
read();
print();
boost::thread_group threads;
for (int i = 0; i < (int)Kategorijos.size(); i++) {
threads.create_thread(boost::bind(writethreads, i));
}
threads.join_all();
system("pause");
return 0;
}
Welcome to the problem of thread synchronization! When only one thread can use a resource at a time, the lock you use to control that resource is a mutex. You can also store the data for one thread to output at the end, or you can have the threads sync up at a barrier.
You can synchronise the console writes with an appropriate mutex. But in this case, where the threads only write to the console, maybe don't use threads at all. Otherwise, send the printing to a dedicated thread that deals with it.
The alternative to using the usual overloaded cout operator << is to write the content to a local buffer or stringstream (including the newline) and then write that to the console with a single function call. The single call helps the console write only one buffer's contents at a time.
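A minimal sketch of the mutex approach, assuming C++11's std::thread and std::mutex in place of boost::thread (the Boost equivalents work the same way). The names write_items, write_all and out_mutex are hypothetical, and the functions take a std::ostream& so they can write to std::cout or to a string stream. The lock is held for one complete line at a time, so lines from different threads may alternate, which matches the "chaotic order" the question asks for, but characters from two lines never mix.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// One mutex guards every write to the shared stream.
std::mutex out_mutex;

// Prints each element of `items` on its own line. The lock covers exactly
// one complete line, so other threads can only get in between lines.
void write_items(std::ostream& out, int thread_no, const std::vector<std::string>& items) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        std::lock_guard<std::mutex> lock(out_mutex);
        out << "Thread " << thread_no << " Vector[" << i << "]: " << items[i] << '\n';
    }
}

// Starts one thread per category and waits for all of them to finish.
void write_all(std::ostream& out, const std::vector<std::vector<std::string>>& categories) {
    std::vector<std::thread> threads;
    for (std::size_t c = 0; c < categories.size(); ++c)
        threads.emplace_back(write_items, std::ref(out), static_cast<int>(c + 1), std::cref(categories[c]));
    for (auto& t : threads) t.join();
}
```

With the question's globals, write_all(std::cout, Zaidejai) would take the place of the thread_group loop.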
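For the dedicated-thread option, a minimal sketch assuming C++11: LinePrinter is a hypothetical class (not a Boost or standard facility) whose worker thread is the only one that touches the stream. Other threads call print() with a complete line, which is queued under a mutex; the destructor signals the worker, which drains the remaining lines before it is joined.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <ostream>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class LinePrinter {
public:
    explicit LinePrinter(std::ostream& out) : out_(out), worker_([this] { run(); }) {}

    ~LinePrinter() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();   // the worker prints everything still queued first
    }

    // Called from any thread with one complete line (no trailing newline).
    void print(std::string line) {
        {
            std::lock_guard<std::mutex> lock(m_);
            lines_.push(std::move(line));
        }
        cv_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return done_ || !lines_.empty(); });
            while (!lines_.empty()) {
                std::string line = std::move(lines_.front());
                lines_.pop();
                lock.unlock();
                out_ << line << '\n';   // only this thread writes to out_
                lock.lock();
            }
            if (done_) return;
        }
    }

    std::ostream& out_;
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> lines_;
    bool done_ = false;
    std::thread worker_;   // declared last so it starts after the other members exist
};
```

The threads doing the work no longer touch std::cout at all, so the printing order is simply the order in which complete lines were queued.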

Job scheduling from input file

I'm working on a simulated process scheduler for my operating systems class and am having trouble figuring out the best way to extract the data from the file to process. The input file will look like this:
5
1 3 10
2 4 15
3 6 8
4 7 3
5 9 12
Where the first number is the number of processes, and each line contains (1) the job number, (2) the arrival time, and (3) the CPU cycle time.
I have to process the jobs using FCFS, SJF, SRT, and Round Robin.
This is part of what I have so far, but I'm having trouble figuring out how to get the data from the file in a way that I could process it more easily (or at all; I'm a little stuck).
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
struct job
{
int id, arrTime, cpuTime;
};
int main()
{
fstream input;
job job1, job2, job3, job4, job5, job6, job7,job8, job9, job10;
char n, *id, *at, *cpu;
int proc;
input.open("input.txt");
input.get(n);
proc = n;
return 0;
}
I was considering taking each piece of information from the 10 jobs given and putting it into the 10 job objects. Also, is there a good way to write the code so that it allows for any number of jobs?
Use a container like std::vector to store your jobs. Good I/O is hard and tedious to do:
For example:
// container to hold jobs
// remember to #include <vector>
std::vector<job> myJobs;
And then read them in:
// read in number of jobs:
// going to need #include <fstream>, <sstream>, <string> and <cstdlib> (for exit)
ifstream input("input.txt");
if (!input)
{
std::cerr << "Failed to open input file! Cannot continue\n";
exit(1);
}
int numJobs = 0;
if (input >> numJobs)
{
myJobs.reserve(numJobs);
}
else
{
std::cerr << "Failed to read number of jobs. Cannot continue\n";
exit(1);
}
std::string nextLine;
for (int i=0;i<numJobs && std::getline(input >> std::ws, nextLine); ++i)
{
std::stringstream inStream(nextLine);
job nextJob;
inStream >> nextJob.id >> nextJob.arrTime >> nextJob.cpuTime;
if (!inStream)
{
std::cerr << "Error reading next job\n";
break;
}
myJobs.push_back(nextJob);
}
input.close();
std::cout << "Was supposed to read in " << numJobs << " jobs. Successfully read in " << myJobs.size() << " jobs.";
And then do what you will with the container:
// Now you can do what you want with them
// Let's sort them from smallest cpuTime to largest!
// remember to #include<algorithm>
std::sort(std::begin(myJobs), std::end(myJobs), [](const job& lhs, const job& rhs){return lhs.cpuTime < rhs.cpuTime;});
cout << "Jobs sorted by cputime: \n";
for (auto&& j : myJobs)
{
cout << j.id << " " << j.arrTime << " " << j.cpuTime << std::endl;
}
Edit: I used a bit of C++11 in this post (the lambda, std::begin/std::end, and the range-based for loop with auto&&), so it will not compile if you're only using C++98/03.
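Once the jobs are in a std::vector, each scheduling policy can be written as a function over that container. As a minimal sketch, assuming non-preemptive First-Come-First-Served with ties broken by job number, here is FCFS; fcfs and fcfs_result are hypothetical names, and SJF, SRT and Round Robin would need their own functions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct job { int id, arrTime, cpuTime; };   // same fields as in the question

struct fcfs_result { int id, start, finish, waiting; };

// First-Come-First-Served: run jobs in order of arrival, each to completion.
// The CPU sits idle if the next job has not arrived yet.
std::vector<fcfs_result> fcfs(std::vector<job> jobs) {
    std::stable_sort(jobs.begin(), jobs.end(), [](const job& a, const job& b) {
        return a.arrTime != b.arrTime ? a.arrTime < b.arrTime : a.id < b.id;
    });
    std::vector<fcfs_result> out;
    int clock = 0;
    for (const job& j : jobs) {
        int start = std::max(clock, j.arrTime);   // wait for arrival if idle
        int finish = start + j.cpuTime;
        out.push_back({j.id, start, finish, start - j.arrTime});
        clock = finish;
    }
    return out;
}
```

With the sample input from the question, the five jobs finish at times 13, 28, 36, 39 and 51, with waiting times 0, 9, 22, 29 and 30.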

How to make thread safe Log class that supports `<<` operations?

So I have this log class:
#include <iostream>
#include <sstream>
#include <boost/circular_buffer.hpp>
#include <boost/foreach.hpp>
class FlushInternal;
class Log
{
public:
static FlushInternal* endl;
Log(int log_length)
{
i = 0;
messages_buffer = new boost::circular_buffer<std::string>(log_length);
}
template <class T>
Log &operator<<(const T &v)
{
current_message << v;
return *this;
}
Log &operator<<(std::ostream&(*f)(std::ostream&))
{
current_message << *f;
return *this;
}
Log &operator<<(FlushInternal*)
{
++i;
messages_buffer->push_back(current_message.str());
clean_stringstream(current_message);
is_filled();
return *this;
}
boost::circular_buffer<std::string> *messages_buffer;
private:
int i;
std::stringstream current_message;
void is_filled()
{
if (i >= messages_buffer->capacity())
{
i = 0;
BOOST_FOREACH(std::string s, *messages_buffer)
{
std::cout << ++i << ": " << s << " ;" << std::endl;
}
i = 0;
}
}
void clean_stringstream(std::stringstream &message)
{
message.flush();
message.clear();
message.seekp(0);
message.str("");
}
};
FlushInternal* Log::endl = 0;
And I can use it like this:
#include <log.h>
int main()
{
Log l(2);
l << "message one: " << 1 << Log::endl;
l << "message two:" << " " << 2 << Log::endl;
l << "message " << "three: " << 3 << Log::endl;
l << "message" << " " << "four: " << 4 << Log::endl;
std::cin.get();
}
This would output:
1: message one: 1 ;
2: message two: 2 ;
1: message three: 3 ;
2: message four: 4 ;
As you can see, I can have as many << as I want inside each log message. I want to be able to use one instance of the Log class from many threads at the same time. So I would have something like this (pseudocode that compiles and runs, but traces nothing):
#include <boost/thread.hpp>
#include <log.h>
Log *l;
void fun_one()
{
*l << "message one: " << 1 << Log::endl;
*l << "message two:" << " " << 2 << Log::endl;
}
void fun_two()
{
*l << "message " << "three: " << 3 << Log::endl;
*l << "message" << " " << "four: " << 4 << Log::endl;
}
int main()
{
l = new Log(2);
boost::thread(fun_one);
boost::thread(fun_two);
std::cin.get();
}
So, as you can see, I want messages to be inserted into the log from multithreaded functions. So I wonder: how can I make my log class support this?
The approach linked by trojanfoe is pretty much the canonical one. Basically create some temporary thing for the leftmost << operator, accumulate everything, and output the message in the destructor for the temporary thing.
The only question is the exact mechanics of this accumulator. The example used ostringstream, but I've seen the ofstream for the log file used directly as well (requires locking to ensure the output ends up on one line).
Creating ostringstreams is relatively expensive on some platforms, because they may need to lock and copy some internal locale-related things. You could also re-implement the << operator for the types you care about, but I'd test the ostringstream approach first.
A useful optimization is to determine, at the point where the temporary is constructed, whether the trace will be emitted (e.g., whether tracing is enabled at that particular level), and not create the guts of the temporary at all in that case; all the insertion operations will then be no-ops.
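A minimal sketch of the temporary-accumulator approach, assuming C++11 and std::mutex instead of Boost; std::deque stands in for boost::circular_buffer, and Log, Line, line() and snapshot() are hypothetical names rather than the Dr. Dobbs code. line() returns a temporary that formats into its own ostringstream (created only when enabled, per the optimization above), and its destructor appends the finished message under a lock at the end of the full expression. Manipulators such as std::endl are not handled by this template operator<<.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Thread-safe log that keeps the last `capacity` complete messages.
class Log {
public:
    explicit Log(std::size_t capacity) : capacity_(capacity) {}

    // Temporary created for the leftmost <<; collects one message privately.
    class Line {
    public:
        explicit Line(Log* log)
            : log_(log), buf_(log ? new std::ostringstream : nullptr) {}
        Line(Line&&) = default;          // moved-from Line has no buffer
        ~Line() {
            if (buf_) log_->append(buf_->str());   // hand over the whole message
        }
        template <class T>
        Line& operator<<(const T& v) {
            if (buf_) *buf_ << v;        // disabled lines skip all formatting
            return *this;
        }
    private:
        Log* log_;
        std::unique_ptr<std::ostringstream> buf_;
    };

    // `enabled` models a trace-level check done once, when the temporary is made.
    Line line(bool enabled = true) { return Line(enabled ? this : nullptr); }

    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> lock(m_);
        return std::vector<std::string>(messages_.begin(), messages_.end());
    }

private:
    void append(std::string msg) {
        std::lock_guard<std::mutex> lock(m_);   // the only shared step is locked
        messages_.push_back(std::move(msg));
        if (messages_.size() > capacity_) messages_.pop_front();
    }

    std::size_t capacity_;
    mutable std::mutex m_;
    std::deque<std::string> messages_;
};
```

Usage looks like `logger.line() << "message one: " << 1;` from any thread. Separately, in the question's test program, `boost::thread(fun_one);` is parsed as a declaration of a default-constructed thread variable named fun_one, so no thread is started, which would explain why it traces nothing; `boost::thread t1(fun_one);` followed later by `t1.join();` actually starts the thread and waits for it.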
Here's one approach:
http://drdobbs.com/cpp/201804215
It basically creates a new ostringstream object each time you perform logging, which makes it thread safe. I can't say I'm that keen on that, as it seems a little clumsy to me.
You might have a look at the Qt logging classes, as they support the << operator; however, I'm not sure about their thread safety.