Brain Computer Interface P300 Machine Learning - C++

I am currently working on a P300 detection system in C++ using the Emotiv EPOC (the P300 is a detectable increase in a brain wave that occurs when a user sees something they are interested in). The system works, but to improve accuracy I'm attempting to use Wekinator for machine learning, with a support vector machine (SVM).
My P300 system has three stimuli (left, right and forward arrows). My program keeps track of the stimulus index, performs some filtering on the incoming "brain wave", and then calculates which index has the highest average area under the curve to determine which stimulus the user is looking at.
For my integration with Wekinator: I have set up Wekinator to receive a custom OSC message with 64 features (the length of the brain wave related to the P300) and set up three parameters with discrete values of 1 or 0. For training, I have been sending the "brain wave" for each stimulus index in a trial and setting the relevant parameters to 0 or 1, then training the model and running it. The issue is that when the OSC message is received by the program from Wekinator, it returns 4 messages rather than just the one most likely.
Here is the code for the training (and input to Wekinator during run time):
for (int s = 0; s < stimCount; s++) {
    for (int i = 0; i < stimIndexes[s].size(); i++) {
        int eegIdx = stimIndexes[s][i];
        ofxOscMessage wek;
        wek.setAddress("/oscCustomFeatures");
        if (eegIdx + winStart + winLen < sig.size()) {
            int winIdx = 0;
            for (int e = eegIdx + winStart; e < eegIdx + winStart + winLen; e++) {
                wek.addFloatArg(sig[e]);
                //stimAvgWins[s][winIdx++] += sig[e];
            }
            validWindowCount[s]++;
        }
        std::cout << "Num args: " << wek.getNumArgs() << std::endl;
        wekinator.sendMessage(wek);
    }
}
Here is the receipt of messages from Wekinator:
if (receiver.hasWaitingMessages()) {
    ofxOscMessage msg;
    while (receiver.getNextMessage(&msg)) {
        std::cout << "Wek Args: " << msg.getNumArgs() << std::endl;
        if (msg.getAddress() == "/OSCSynth/params") {
            resultReceived = true;
            if (msg.getArgAsFloat(0) == 1) {
                result = 0;
            } else if (msg.getArgAsFloat(1) == 1) {
                result = 1;
            } else if (msg.getArgAsFloat(2) == 1) {
                result = 2;
            }
            std::cout << "Wek Result: " << result << std::endl;
        }
    }
}
Full code for both is at the following Gist:
https://gist.github.com/cilliand/f716c92933a28b0bcfa4
My main query is basically whether something is wrong with the code: Should I send the full "brain wave" for a trial to Wekinator, or should I train Wekinator on different features? Does the code look right, or should it be amended? Is there a way to receive only one OSC message back from Wekinator based on smaller feature sizes, i.e. 64 rather than 4 x 64 per stimulus or 9 x 64 per stimulus index?

Related

How to change parameters in ChampSim simulator?

I am new to C++ programming and computer architecture.
I am trying to learn branch prediction using the ChampSim simulator (https://github.com/ChampSim/ChampSim).
However, I have no idea how to change the parameters in the program to do some simple simulations.
For example, for the bimodal predictor in ChampSim, how can I change the size of the prediction tables? How can I change the branch history to 1 bit and run the simulation?
I also don't know how to change parameters like fetch width (how many instructions are fetched per cycle), decode width, execute width, commit width, and ROB size.
If there is anybody who is familiar with the ChampSim simulator and C++, please help me.
For example, here is the code for the bimodal predictor:
#include "ooo_cpu.h"

#define BIMODAL_TABLE_SIZE 16384
#define BIMODAL_PRIME 16381
#define MAX_COUNTER 3

int bimodal_table[NUM_CPUS][BIMODAL_TABLE_SIZE];

void O3_CPU::initialize_branch_predictor()
{
    cout << "CPU " << cpu << " Bimodal branch predictor" << endl;
    for (int i = 0; i < BIMODAL_TABLE_SIZE; i++)
        bimodal_table[cpu][i] = 0;
}

uint8_t O3_CPU::predict_branch(uint64_t ip)
{
    uint32_t hash = ip % BIMODAL_PRIME;
    uint8_t prediction = (bimodal_table[cpu][hash] >= ((MAX_COUNTER + 1) / 2)) ? 1 : 0;
    return prediction;
}

void O3_CPU::last_branch_result(uint64_t ip, uint8_t taken)
{
    uint32_t hash = ip % BIMODAL_PRIME;
    if (taken && (bimodal_table[cpu][hash] < MAX_COUNTER))
        bimodal_table[cpu][hash]++;
    else if ((taken == 0) && (bimodal_table[cpu][hash] > 0))
        bimodal_table[cpu][hash]--;
}
What should I do if I want to change the branch history to 1 bit and have the bimodal predictor use a prediction table of 128 entries?

How does a process know when to stop listening for receives?

Consider a mesh whose bins are decomposed among processes. The numbers in the image are the ranks of the processes.
At each time step, some of the points displace, so they need to be sent to their new destinations. This point-sending is done by every process that has displaced points. In the image, only the points of the lower-left corner bin are shown as an example.
I don't know how long a process should keep listening for receive messages. The problem is that a receiver does not even know whether a message will arrive at all, because no point might pass into its region.
Also note that the source and destination of a point might be the same, as for the blue point.
Edit: Below is an incomplete code to express the problem.
void transfer_points()
{
    world.isend(dest, ...);
    while (true)
    {
        mpi::status msg = world.iprobe(any_source, any_tag);
        if (msg.count() != 0)
        {
            world.irecv(any_source, ...);
        }
        // but how long keep probing?
        if (???) { break; }
    }
}
Are you familiar with one-sided MPI, i.e. RMA (Remote Memory Access) via the MPI_Win_* operations? The way I understand your problem, it should be solvable neatly with them:
Ranks that send some points just put them into the other rank's memory (window).
Barrier.
Receivers go directly to the barrier, and after it they are in possession of the data.
Here is an example of a ring send with RMA (in C++ syntax!). In your situation it should only need some minor modifications, i.e. only call MPI_Put when necessary, plus some math for the offsets to write into the buffer.
#include <iostream>
#include "mpi.h"

int main(int argc, char* argv[]) {
    MPI::Init(argc, argv);
    int rank = MPI::COMM_WORLD.Get_rank();
    int comm_size = MPI::COMM_WORLD.Get_size();

    int neighbor_left = rank - 1;
    int neighbor_right = rank + 1;
    // Left-most and right-most ranks are neighbors.
    if (neighbor_right >= comm_size) { neighbor_right = 0; }
    if (neighbor_left < 0) { neighbor_left = comm_size - 1; }

    int postbox[2];
    MPI::Win window = MPI::Win::Create(postbox, 2, sizeof(int), MPI_INFO_NULL, MPI::COMM_WORLD);

    window.Fence(0);
    // Put my rank in the second entry of my left neighbor (I'm his right neighbor)
    window.Put(&rank, 1, MPI_INT, neighbor_left, 1, 1, MPI_INT);
    window.Fence(0);
    // Put my rank in the first entry of my right neighbor (I'm his left neighbor)
    window.Put(&rank, 1, MPI_INT, neighbor_right, 0, 1, MPI_INT);
    window.Fence(0);

    std::cout << "I'm rank = " << rank << " my Neighbors (l-r) are " << postbox[0] << " " << postbox[1] << std::endl;

    MPI::Finalize();
    return 0;
}

libwebsocket: unable to write frame bigger than 7160 bytes

I'm addressing an issue with WebSocket that I'm not able to understand.
Please, use the code below as reference:
int write_buffer_size = 8000 +
                        LWS_SEND_BUFFER_PRE_PADDING +
                        LWS_SEND_BUFFER_POST_PADDING;
unsigned char *write_buffer = new unsigned char[write_buffer_size];
/* ... other code:
   write_buffer is filled in some way that is not important for the question
*/
n = libwebsocket_write(wsi, &write_buffer[LWS_SEND_BUFFER_PRE_PADDING], write_len,
                       (libwebsocket_write_protocol)write_mode);
if (n < 0) {
    cerr << "ERROR " << n << " writing to socket, hanging up" << endl;
    if (utils) {
        log = "wsmanager::error: hanging up writing to websocket";
        utils->writeLog(log);
    }
    return -1;
}
if (n < write_len) {
    cerr << "Partial write: " << n << " < " << write_len << endl;
    if (utils) {
        log = "wsmanager-error: websocket partial write";
        utils->writeLog(log);
    }
    return -1;
}
When I try to send data bigger than 7160 bytes I always receive the same error, e.g. Partial write: 7160 < 8000.
Do you have any kind of explanation for this behavior?
I allocated a buffer with 8000 bytes reserved for the payload, so I was expecting to be able to send a maximum of 8000 bytes, but 7160 bytes seems to be the maximum amount of data I can send.
Any help is appreciated, thanks!
I have encountered a similar problem with an older version of libwebsockets. Although I didn't measure the limit, it was pretty much the same thing: n < write_len. I think my limit was much lower, below 2048 bytes, and I knew that the same code worked fine with a newer version of libwebsockets (on a different machine).
Since Debian Jessie doesn't have lws v1.6 in its repositories, I built it from the GitHub sources. Consider upgrading; it may solve your problem. Beware, they have changed the API. It was mostly a renaming of functions from libwebsocket_* to lws_*, but some arguments changed too. Check this pull request, which migrates a boilerplate libwebsockets server to version 1.6. Most of these changes will affect your code.
We solved the issue by updating libwebsockets to version 1.7.3.
We also optimized the code using a custom callback invoked when the channel is writable:
void
WSManager::onWritable() {
    int ret, n;
    struct fragment *frg;

    pthread_mutex_lock(&send_queue_mutex);
    if (!send_queue.empty() && !lws_partial_buffered(wsi)) {
        frg = send_queue.front();
        n = lws_write(wsi, frg->content + LWS_PRE, frg->len, (lws_write_protocol)frg->mode);
        ret = checkWsWrite(n, frg->len);
        if (ret >= 0 && !lws_partial_buffered(wsi)) {
            if (frg->mode == WS_SINGLE_FRAGMENT || frg->mode == WS_LAST_FRAGMENT)
                signalResponseSent();
            // pop fragment and free memory only if lws_write was successful
            send_queue.pop();
            delete frg;
        }
    }
    pthread_mutex_unlock(&send_queue_mutex);
}

How to limit the number of threads which perform an action in C++ AMP

I am performing a series of calculations on a large number of threads using C++ AMP. The last step of the calculation, though, is to prune the results, but only for a limited number of threads. For example, if the result of the calculation is below a threshold, then set the result to 0, BUT only do this for a maximum of X threads. Essentially this is a shared counter combined with a shared conditional check.
Any help is appreciated!
My understanding of your question is the following pseudo-code performed by each thread:
auto result = ...
if(result < global_threshold)         // if the result of the calculation is below a threshold
    if(global_counter++ < global_max) // for a maximum of X threads
        result = 0;                   // then set the result to 0
store(result);
I then further assume that both global_threshold and global_max do not change during the computation (i.e. between parallel_for_each start and finish), so the most elegant way to pass them is through lambda capture.
On the other hand, global_counter clearly changes value, so it must be located in modifiable memory shared across all threads, effectively being an array<T,N> or array_view<T,N>. Since the threads incrementing this object are not synchronized, the increment needs to be performed with an atomic operation.
The above translates to the following C++ AMP code (I'm using Visual Studio 2013 syntax, but it is easily back-portable to Visual Studio 2012):
std::vector<int> result_storage(1024);
array_view<int> av_result{ result_storage };

int global_counter_storage[1] = { 0 };
array_view<int> global_counter{ global_counter_storage };

int global_threshold = 42;
int global_max = 3;

parallel_for_each(av_result.extent, [=](index<1> idx) restrict(amp)
{
    int result = (idx[0] % 50) + 1; // 1 .. 50
    if (result < global_threshold)
    {
        // assuming less than INT_MAX threads will enter here
        if (atomic_fetch_inc(&global_counter[0]) < global_max)
        {
            result = 0;
        }
    }
    av_result[idx] = result;
});
av_result.synchronize();

auto zeros = count(begin(result_storage), end(result_storage), 0);
std::cout << "Total number of zeros in results: " << zeros << std::endl
          << "Total number of threads lower than threshold: " << global_counter[0]
          << std::endl;

UDP datagram counting

Hi, I have a server communicating with clients over UDP. Basically, clients stream UDP packets to the server. Each packet consists of a header and a payload. In the header there is only one short int, which I call seqnum, running from 0 to SHORT_MAX. When the client reaches SHORT_MAX while sending, it starts again from 0.
On the server I need to reconstruct the stream in this way:
a) If a packet arrives with the expected seqnum, append it to the stream.
b) If a packet arrives with a lower than expected seqnum, drop it - it is a packet which arrived too late.
c) If a packet arrives with a higher seqnum than expected, consider the packets between the expected and actual seqnum as lost, try to reconstruct them, and after that append the actual packet.
I am now dealing with two problems connected to overflow of the counter:
1) how to detect situation c) on the SHORT_MAX boundary (e.g. expected is SHORT_MAX-2, the actual seqnum in the packet is 2) - in my scenario it would be wrongly detected as situation b)
2) the same problem with situation b) wrongly detected as c)
Thanks a lot
Assuming that SHORT_MAX actually means SHRT_MAX, then you have some 30000 or so packets missing if you hit scenario 2. That probably means you can't reconstruct anyway, and the link has been going wrong for quite some time. You may solve that by having a "timeout" (e.g. if no correct packet has been received in X seconds, give up and start over at some suitable point - or whatever you can do when LOTS of packets have gone missing; you can of course test this by unplugging the cable or something like that).
And you can detect "wraparound" by doing some modulo math.
#include <iostream>
#include <algorithm>
#include <cstdlib>   // for rand()

using namespace std;

#define MAX_SEQ_NO 16
#define THRESHOLD 6 // Max number of "missing packets" that is acceptable

void check_seq_no(int seq_no)
{
    static int expected = 0;
    cout << "Got seq_no=" << seq_no << " expected=" << expected << endl;
    if (seq_no == expected)
    {
        expected = (expected + 1) % MAX_SEQ_NO;
    }
    else
    {
        if ((seq_no + THRESHOLD) % MAX_SEQ_NO > (expected + THRESHOLD) % MAX_SEQ_NO)
        {
            int missing;
            if (seq_no > expected)
            {
                missing = seq_no - expected;
            }
            else
            {
                missing = MAX_SEQ_NO + seq_no - expected;
            }
            cout << "Packets missing ..." << missing << endl;
            expected = (seq_no + 1) % MAX_SEQ_NO;
        }
        else
        {
            cout << "Old packet received ... " << endl;
        }
    }
}
int main()
{
    int seq_no = 0;
    bool in_sim = false;
    int old_seq_no = 0;
    for (;;)
    {
        int r = rand() % 50;
        if (!in_sim)
        {
            old_seq_no = seq_no;
            switch (r)
            {
            // Low number: resend an older packet (cases fall through on purpose)
            case 4:
                seq_no--;
            case 3:
                seq_no--;
            case 2:
                seq_no--;
            case 1:
                seq_no--;
                in_sim = true;
                break;
            // High number: "lose" a packet or four (cases fall through on purpose)
            case 46:
                seq_no++;
            case 47:
                seq_no++;
            case 48:
                seq_no++;
            case 49:
                seq_no++;
                in_sim = true;
                break;
            default:
                break;
            }
            if (old_seq_no > seq_no)
            {
                cout << "Simulating resend of old packets: " << old_seq_no - seq_no << endl;
            }
            else if (old_seq_no < seq_no)
            {
                cout << "Simulating missing packets: " << seq_no - old_seq_no << endl;
                old_seq_no = seq_no;
            }
        }
        if (old_seq_no == seq_no)
        {
            in_sim = false;
        }
        check_seq_no(seq_no % MAX_SEQ_NO);
        seq_no++;
    }
}
I would suggest that any time you get a packet within some threshold of SHORT_MAX while the expected number is within a similar threshold of 0 (or vice versa), you consider it a candidate for reconstruction. You can also compensate somewhat for the loss of packets by building a sane acknowledgment system between the client and server. Think of this less as a "reconstruction" problem than as a prioritization problem, wherein you must discard "old" or possibly re-transmitted data.
Another strategy I've used in the past is to define channels with (potentially) different configurations in terms of thresholds, ACKing, and overall reliability. In the ideal world you'll have packets that are 100% reliable (TCP-style, guaranteed in-order delivery) and packets that are 100% unreliable, and then possibly streams that are somewhere in between - a good UDP-based protocol will support these. This gives your application code more direct control over the protocol's algorithms, which is the real point of UDP and where it shines for applications like games and video.
You will likely find that you are often re-implementing bits of TCP, and you may even consider using TCP for your 100% reliable channel - it's worth noting that TCP is often given preference by backbones, because they know they will eventually have to re-transmit those packets if they don't make it through on this trip.
Why not use an unsigned short? You get twice the range. For the situations you describe, you need to specify a threshold. It's a dilemma you have to face: if you're expecting packet 30 and receive packet 60, is it a correct packet or an old lost one? That's why you need a threshold.
For example:
if (NumberReceived < NumberExpected)
{
    threshold = (USHRT_MAX - NumberExpected) + NumberReceived;
    // here you have to decide how big the threshold is: 10, 20, 50 ...
    if (threshold < 10) // it is a correct packet and the counter has wrapped around
    else                // it is an old, late packet
}
else if (NumberReceived > NumberExpected)
{
    threshold = NumberReceived - NumberExpected;
    // here you have to decide how big the threshold is: 10, 20, 50 ...
    if (threshold < 10) // it is a correct packet and some packets were lost
    else                // it is an old, late packet
}
else // it is a correct packet