MIDI Note Counting Program producing incorrect results - c++

I wrote a program in c++ that counts the number of notes in a .mid file, outputs the number of each note (A to G#), then outputs that to a file. It doesn't find enough notes, but I can't figure out why. I've built it based on the MIDI file documentation from midi.org.
While reading the file, the program just looks for the note-on status byte, 1001nnnn, and reads the next byte as the note. I used Anvil Studio to make a MIDI file containing a single note, and the program correctly reported 1 note. However, on much larger files (2000+ notes) it finds far fewer notes than the file contains, and sometimes it reports that 90%+ of the notes are one or two pitches.
This is the segment of the program that searches for notes. The file is open in byte mode with ios::binary
//Loop through every byte of the file
for (int i = 0; i <= size; i++) {
//Read next byte of file to tempblock
myfile.read(tempblock, 1);
//Put int value of byte in b
int b = tempblock[0];
//x = first 4 binary digits of b, appended with 0000
unsigned int x = b & 0xF0;
//If note is next, set the next empty space of notearray to the notes value, and then set notenext to 0
if (notenext) {
myfile.read(tempblock, 1);
int c = tempblock[0];
i++;
//Add the note to notearray if the velocity data byte is not set to 0
if (c != 0) {
notearray[notecount] = b;
notenext = 0;
notecount++;
}
}
//If note is not next, and x is 144 (int of 10010000, status byte for MIDI "Note on" message), set note next to true
else if (x == 144) {
notenext = 1;
}
}
Does anyone know what's going on? Am I just missing a component of the file type, or could it be a problem with the files I'm using? I am primarily looking at classical piano pieces downloaded from MIDI repositories.

Channel message status bytes can be omitted when they are identical with the last one; this is called running status.
Furthermore, 1001nnnn bytes can occur inside delta time values.
You have to correctly parse all messages to be able to detect notes.
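A minimal sketch of what "correctly parse all messages" could look like for the body of a single MTrk chunk. It assumes the caller has already located the chunk and stripped its 8-byte header; the names readVarLen and countNoteOns are made up for this illustration. It skips delta times as variable-length quantities (so a 0x90 byte inside one is never mistaken for a status byte), honours running status, and counts note-on events with non-zero velocity per pitch class (index 0 = C, 11 = B).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a MIDI variable-length quantity starting at pos and advances pos.
static std::uint32_t readVarLen(const std::vector<std::uint8_t>& d, std::size_t& pos) {
    std::uint32_t value = 0;
    while (pos < d.size()) {
        std::uint8_t byte = d[pos++];
        value = (value << 7) | (byte & 0x7F);
        if ((byte & 0x80) == 0) break;  // high bit clear: last byte of the quantity
    }
    return value;
}

// Counts note-on events (velocity > 0) per pitch class in one track body.
std::array<int, 12> countNoteOns(const std::vector<std::uint8_t>& track) {
    std::array<int, 12> counts{};
    std::size_t pos = 0;
    std::uint8_t running = 0;  // last channel status byte seen (running status)
    while (pos < track.size()) {
        readVarLen(track, pos);  // delta time: skipped, never inspected as a status
        if (pos >= track.size()) break;
        std::uint8_t status = track[pos];
        if (status & 0x80) ++pos;       // explicit status byte
        else status = running;          // data byte: reuse the previous status
        if (status == 0xFF) {           // meta event: type, length, data
            ++pos;
            pos += readVarLen(track, pos);
            running = 0;
        } else if (status == 0xF0 || status == 0xF7) {  // sysex: length, data
            pos += readVarLen(track, pos);
            running = 0;
        } else if (status >= 0x80 && status < 0xF0) {   // channel message
            running = status;
            std::uint8_t kind = status & 0xF0;
            // Program change and channel pressure carry one data byte, the rest two.
            std::size_t dataBytes = (kind == 0xC0 || kind == 0xD0) ? 1 : 2;
            if (pos + dataBytes > track.size()) break;
            if (kind == 0x90 && track[pos + 1] != 0) ++counts[track[pos] % 12];
            pos += dataBytes;
        } else {
            break;  // malformed input: data byte with no running status
        }
    }
    return counts;
}
```

A complete program would also read the MThd header and loop over every MTrk chunk in the file, calling countNoteOns on each track body and summing the results.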

The problem is very likely to be how your MIDI editor is creating the files. Many MIDI editors do not actually switch notes off - they just set their velocities to 0. This can make them a royal pain to parse.
Have a look at the raw MIDI messages contained in the file and you should see a lot of note-on messages with a velocity of 0.

Related

Index large txt file

I have a large file (500 million records).
The file is two columns(tab delimited) as follows:
1 4590
3 1390
4 4590
5 4285
7 8902
8 9000
...
All values in the first column are ordered numerically, but with gaps (e.g. 1, then 3, then 4...).
I would like to index that file so that I can look up the value in column 2 from the value in column 1 (which I will call the key).
For example, if I submit 8 it should return 9000.
I started by creating an index as follows:
// Record each entry into a structure
struct Record{
int gi; //first column
int taxa; //second column
};
Record buffer;
ofstream BinaryFile("large_file_indexed.bin", ios::binary);
ifstream inputFile("infile.dat");
//Write to binary file
while( inputFile.good() ){
inputFile >> buffer.gi >> buffer.taxa;
BinaryFile.write( (char *) &buffer, sizeof(Record) );
}
BinaryFile.close();
What I'm doing above is converting the entries to fixed-size records and saving them to a binary index file. This is working as expected.
The problem comes now, and since I'm not an expert I would appreciate your advice.
The idea is to read the binary file and get a specific record:
//Read binary file
ifstream ReadBinary("large_file_indexed.bin", ios::binary );
int idx = 8 ; // Which key do we search for?
while(!ReadBinary.eof())
{
ReadBinary.read( (char *) &buffer, sizeof(Record));
if(idx == buffer.gi) // If we find key return corresponding value
{
cout << "Found key " << buffer.gi << " Taxa:" << buffer.taxa << endl;
break;
}
}
This returns the expected value: since we are asking for the value corresponding to key 8, it returns 9000.
The problem is that it still takes too long to get the value, and I was wondering how I can make it faster. With seekg I can jump to a specific position, but I don't know which position corresponds to the key I want. In other words, can I jump directly to the position where the key is and read the corresponding value? I'm confused about how to get the position for a particular key and jump to it in the binary file. Maybe I should index my input file differently, or am I missing something?
Thanks for your comments.
If you can't use a database or a b-tree library, and don't want to invest in developing yet another b-tree library, you could consider one of the two following approaches.
Both assume that the binary index file is sorted, and take advantage of the fixed size record.
1. Simple heuristic approach
If there were no gaps, you could find the n-th record (numbering starting at one) like this:
if (ReadBinary.seekg(sizeof(Record)*(n-1))
&& ReadBinary.read( (char*)&buffer, sizeof(Record))) {
// process record
}
else {
// record not found (certainly beyond eof)
}
But you can have gaps. This means that, if there are no duplicates, the record with key n is at this position or before it. So just read and rewind as long as necessary:
if (! ReadBinary.seekg(sizeof(Record)*(n-1))) { // try to position
ReadBinary.clear(); // if couldn't position
// cast to a signed offset: -sizeof(Record) on its own wraps around to a huge unsigned value
ReadBinary.seekg(-static_cast<std::streamoff>(sizeof(Record)), ios_base::end); // go to last record
}
while (ReadBinary.read( (char*)&buffer, sizeof(Record)) && buffer.gi>n ) {
// step back over the record just read and the one before it
ReadBinary.seekg (-2*static_cast<std::streamoff>(sizeof(Record)), ios_base::cur);
}
if (ReadBinary && buffer.gi==n) {
// record found
}
else {
// record not found
}
2. Dichotomic approach
Of course, if you have many gaps, this heuristic approach quickly becomes too slow as the key you search for increases.
You could therefore opt for a dichotomic search (aka binary search): use seekg() to go to the end of the file and tellg() to get the file size, which you can translate into a number of records.
Cut that number in two, position on the record in the middle, read it, check whether the key you are searching for is smaller or bigger than the one you read, and repeat with the new bounds until you find the right position. It is the same principle you would use to search a sorted array.
This is very efficient, as you need at most about log(n)/log(2) reads to find any key. So for any of the 500 000 000 records, you'd need at most 29 reads!
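The dichotomic search described above can be sketched as follows. This is an illustration under the answer's own assumptions (the binary file is sorted by key and holds fixed-size Record entries written on the same machine); the function name findRecord is made up for the example.

```cpp
#include <fstream>

struct Record {
    int gi;    // key (first column)
    int taxa;  // value (second column)
};

// Binary search over a file of fixed-size Records sorted by gi.
// Returns true and fills `out` when the key is present.
bool findRecord(std::ifstream& file, int key, Record& out) {
    const std::streamoff recSize = static_cast<std::streamoff>(sizeof(Record));
    file.clear();                          // drop any eof/fail state from earlier reads
    file.seekg(0, std::ios::end);
    std::streamoff bytes = file.tellg();   // file size in bytes (-1 on failure)
    std::streamoff lo = 0;
    std::streamoff hi = bytes / recSize;   // search the half-open range [lo, hi)
    while (lo < hi) {
        std::streamoff mid = lo + (hi - lo) / 2;
        file.seekg(mid * recSize, std::ios::beg);
        Record r;
        if (!file.read(reinterpret_cast<char*>(&r), sizeof(Record))) return false;
        if (r.gi == key) { out = r; return true; }
        if (r.gi < key) lo = mid + 1;      // key is in the upper half
        else hi = mid;                     // key is in the lower half
    }
    return false;                          // key falls in a gap or outside the file
}
```

With the question's file, this would be called as findRecord(ReadBinary, 8, buffer) after opening ReadBinary with ios::binary; each call performs one seek and one read per halving step.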
3. Conclusions
Of course there are other feasible approaches as well. But in the end, this is already pretty good, even if it would be outperformed by a database or a well-crafted b-tree library: b-trees reduce disk head movement by regrouping nodes into blocks that are optimized to be read at once with minimal disk overhead. This reduces the number of disk accesses to log(n)/log(b), where b is the number of nodes in a block. For example, if b=10, searching the 500 000 000 elements would require at most 9 reads from disk.

how to parse stream data(string) to different data files

Hi everyone, I have recently had some problems reading data from an IMU.
Below is the data I get from my device. It is ASCII, all chars, and my data size is [122], which is really big. I need to convert the values to short and then to float, but I don't know why or how.
unsigned char data[33];
short x,y,z;
float x_fl,y_fl,z_fl,t_fl;
float bias[3]={0,0,0};//array initialization
unsigned char sum_data=0;
int batch=0;
if ( !PurgeComm(file,PURGE_RXCLEAR ))
cout << "Clearing RX Buffer Error" << endl;//this if two sentence aim to clean the buffer
//---------------- read data from IMU ----------------------
do { ReadFile(file,&data_check,1,&read,NULL);
//if ((data_check==0x026))
{ ReadFile(file,&data,33,&read,NULL); }
/// Wx Values
{
x=(data[8]<<8)+data[9];
x_fl=(float)6.8664e-3*x;
bias[0]+=(float)x_fl;
}
/// Wy Values
{
y=(data[10]<<8)+data[11];
y_fl=(float)6.8664e-3*y;
bias[1]+=(float)y_fl;
}
/// Wz Values
{
z=(data[12]<<8)+data[13];
z_fl=(float)6.8664e-3*z;
bias[2]+=(float)z_fl;
}
batch++;
}while(batch<NUM_BATCH_BIAS);
$VNYMR,+049.320,-017.922,-024.946,+00.2829,-00.2734,+00.2735,-02.961,+03.858,-08.325,-00.001267,+00.000213,-00.001214*64
$VNYMR,+049.322,-017.922,-024.948,+00.2829,-00.2714,+00.2735,-02.958,+03.870,-08.323,+00.004923,-00.000783,+00.000290*65
$VNYMR,+049.321,-017.922,-024.949,+00.2821,-00.2655,+00.2724,-02.984,+03.883,-08.321,+00.000648,-00.000391,-00.000485*61
$VNYMR,+049.320,-017.922,-024.947,+00.2830,-00.2665,+00.2756,-02.983,+03.874,-08.347,-00.003416,+00.000437,+00.000252*6C
$VNYMR,+049.323,-017.921,-024.947,+00.2837,-00.2773,+00.2714,-02.955,+03.880,-08.326,+00.002570,-00.001066,+00.000690*67
$VNYMR,+049.325,-017.922,-024.948,+00.2847,-00.2715,+00.2692,-02.944,+03.875,-08.344,-00.002550,+00.000638,+00.000022*6A
$VNYMR,+049.326,-017.921,-024.945,+00.2848,-00.2666,+00.2713,-02.959,+03.876,-08.309,+00.002084,+00.000449,+00.000667*6A
all I want to do is:
extract last 6 numbers separated by commas, btw, I don't need the last 3 chars(like *66).
Save the extracted data to 6 .dat files.
What is the best way to do this?
Since I got this raw data from IMU, and I need the last 6 data, which are accelerations(x,y,z) and gyros(x,y,z).
If someone could tell me how to set a counter to the end of each data stream, that will be perfect, because I need the time stamp of IMU also.
Last word is I am doing data acquisition under windows, c++.
Hope someone could help me, I am freaking out because of so much things to do and that's really annoying!!
There's a whole family of scanf functions (fscanf, sscanf and some "secure" ones).
Assuming you have read a line into a string:-
sscanf( s, "$VNYMR,%*f,%*f,%*f,%*f,%*f,%*f,%f,%f,%f,%f,%f,%f", &accX, &accY, &accZ, &gyroX, &gyroY, &gyroZ );
And assuming I have counted correctly! This will verify that the literal $VNYMR is there, followed by six floats that you don't assign and finally the six that you care about. &accX, etc. are the addresses of your floats. Test the result, which is the number of assignments made (it should be 6).
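As a sketch of the sscanf approach wrapped in a function that checks the result: the struct ImuSample and the function parseVnymr are names invented for this example, and the field order (accelerations, then gyros) follows the question's description of the last six values.

```cpp
#include <cstdio>

struct ImuSample {
    float accX, accY, accZ;     // accelerations
    float gyroX, gyroY, gyroZ;  // angular rates
};

// Parses one "$VNYMR,..." line. The first six numeric fields are matched but
// discarded (%*f); the last six are stored. The trailing "*64"-style checksum
// is simply left unread. Returns true only if all six values were assigned.
bool parseVnymr(const char* line, ImuSample& s) {
    int assigned = std::sscanf(line,
        "$VNYMR,%*f,%*f,%*f,%*f,%*f,%*f,%f,%f,%f,%f,%f,%f",
        &s.accX, &s.accY, &s.accZ, &s.gyroX, &s.gyroY, &s.gyroZ);
    return assigned == 6;
}
```

To save the values, one option is to open six std::ofstream objects once, before the acquisition loop, and append one value to each file for every line that parses successfully.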

String parsing to extract int in C++ for Arduino

I'm trying to write a sketch that allows a user to access data in EEPROM using the serial monitor. In the serial monitor the user should be able to type one of two commands: "read" and "write". "Read" should take one argument, an EEPROM address. "Write" should take two arguments, an EEPROM address and a value. For example, if the user types "read 7" then the contents of EEPROM address 7 should be printed to the serial monitor. If the user types "write 7 12" then the value 12 should be written into address 7 of the EEPROM. Any help is much appreciated. I'm not an expert in Arduino, still learning ;). In the code below I defined inByte to hold the result of Serial.read(). Now how do I extract numbers from the string "inByte" to assign to "val" and "addr"?
void loop() {
String inByte;
if (Serial.available() > 0) {
// get incoming byte:
inByte = Serial.read();
}
if (inByte.startsWith("Write")) {
EEPROM.write(addr, val);
}
if (inByte.startsWith("Read")) {
val= EEPROM.read(addr);
}
delay(500);
}
Serial.read() only reads a single character. You should loop until no more input while filling your buffer or use a blocking function like Serial.readStringUntil() or Serial.readBytes() to fill a buffer for you.
https://www.arduino.cc/en/Serial/ReadStringUntil
https://www.arduino.cc/en/Serial/ReadBytes
Or you can use Serial.parseInt() twice to grab the two values directly into a pair of integers. This function will skip the non numerical text and grab the values. This method is also blocking.
https://www.arduino.cc/en/Reference/StreamParseInt
A patch I wrote to improve this function is available in the latest hourly build, but the old versions still work fine for simple numbers with the previous IDEs.
The blocking methods can be tweaked using Serial.setTimeout() to change how long they wait for input (1000 ms by default).
https://www.arduino.cc/en/Serial/SetTimeout
[missed the other answer, there's half my answer gone]
I was going to say use Serial.readStringUntil('\n') in order to read a line at a time.
To address the part:
how do I extract numbers from the string "inByte" to assign to "val" and "addr"
This is less trivial than it might seem and a lot of things can go wrong. For simplicity, let's assume the input string is always in the format /^(Read|Write) (\d+)( \d+)?$/.
A simple way to parse it would be to find the spaces, isolate the number strings and call .toInt().
...
int val, addr;
int addrStart = 0;
while(inByte[addrStart] != ' ' && addrStart < inByte.length())
addrStart++;
addrStart++; //skip the space
int addrEnd = addrStart + 1;
while(inByte[addrEnd] != ' ' && addrEnd < inByte.length())
addrEnd++;
String addrStr = inByte.substring(addrStart, addrEnd); //excludes addrEnd
addr = addrStr.toInt();
if (inByte.startsWith("Write")) {
int valEnd = addrEnd+1;
while(inByte[valEnd] != ' ' && valEnd < inByte.length())
valEnd++;
String valStr = inByte.substring(addrEnd+1, valEnd);
val = valStr.toInt();
EEPROM.write(addr, val);
}
else if (inByte.startsWith("Read")) {
val = EEPROM.read(addr);
}
This can fail in all sorts of horrible ways if the input string has a double space, the numbers are malformed, or there is any other subtle error.
If you're concerned with correctness, I suggest you look into a regex library, or even a standard format such as JSON (see ArduinoJson).
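A more forgiving alternative to finding the spaces by hand is to let sscanf split the line, since it skips repeated whitespace on its own. The sketch below is plain C++ so it can be tried on a desktop; it assumes lowercase commands as in the question's examples ("read 7", "write 7 12"), and parseCommand is a name invented here. On a board whose core provides sscanf, the same function could presumably be fed inByte.c_str(), but that is an assumption to verify for your platform.

```cpp
#include <cstdio>
#include <cstring>

// Returns 'r' for "read <addr>", 'w' for "write <addr> <val>", 0 otherwise.
// addr and val are only meaningful when the matching command is returned.
char parseCommand(const char* line, int& addr, int& val) {
    char word[8] = {0};
    // %7s reads at most 7 characters of the command word; each %d skips
    // leading whitespace, so double spaces are tolerated.
    int fields = std::sscanf(line, "%7s %d %d", word, &addr, &val);
    if (fields >= 2 && std::strcmp(word, "read") == 0) return 'r';
    if (fields == 3 && std::strcmp(word, "write") == 0) return 'w';
    return 0;  // unknown command or missing/malformed numbers
}
```

The caller then switches on the result: for 'w' call EEPROM.write(addr, val), for 'r' print EEPROM.read(addr), and for 0 report a usage error instead of touching the EEPROM.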

Reading using fstream

I am using fstream to read a binary file, but strangely I get different values for the same input file each time I execute the code.
if(fs->is_open())
{
while (!fs->eof())
{
fs->seekg( pos );
fs->read( (char *)&mdfHeader, sizeof(mdfHeader_t) );
pos += mdfHeader.length;
fs->read( (char *)&eventHeader, sizeof(eventHeader_t) );
fs->read( (char *)&rawHeader, sizeof(rawHeader_t) );
fs->read( (char *)&ingressHeader, sizeof(ingressHeader_t) );
fs->read( (char *)&l1Header_xc0, sizeof(l1Header_xc0_t) );
fs->read(data, dataLength);
printf("Data=%#x\n",data);
std::cout << "counter: " << c << "\n";
c++;
}
fs->close();
}
As you can see, I print out data, which should be the same each time, but yields a different value. mdfHeader.length is the length of one block of data.
The first things to change are:
The condition eof() is only really useful for determining why reading data failed; it isn't a useful loop condition.
You need to check, after reading, that you successfully read the data you are interested in.
That is, the loop would look something like this:
while (*fs) {
// read data from fs
if (*fs) {
// do something with the data
}
else if (!fs->eof()) {
std::cout << "ERROR: failed to read record\n";
}
}
I'd also guess that you don't need the seeks, and it is a good idea to get rid of them: seeking is relatively expensive because it loses any buffer. You didn't show the entire code, but the initial value of pos has a fair chance of providing some level of randomness. Also, you assume that the sequence of bytes you are reading matches how the data is laid out in your computer's memory. Typically that isn't the case, and you generally need to adjust for the binary format, e.g. to accommodate different word sizes, different endianness, padding, etc.
Computers are like mathematics: everything is deterministic (even functions like rand produce the same output if their input is the same). So if you run a program a hundred times with the same input and the same state, you will certainly get the same output, unless the input or the running state changed.
You say that the input is the same each time you execute the code, so the only thing that can change is the running state (for example, malloc may return a different value each time you run the program, because its state is determined by the OS).
In your code you use printf("Data=%#x\n",data); to output your data, but this actually just prints the address of data as a hex value, so it is quite natural for this address to change between runs, because the OS may map your executable to different positions. You should output the contents of data instead, and you will see that they are the same as in the previous run.
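As an illustration of printing the contents rather than the address, here is a small helper that renders a buffer as hex. The name toHex is invented for this sketch; it only assumes that data points to at least n readable bytes.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Formats the first n bytes of a buffer as space-separated two-digit hex values.
std::string toHex(const char* data, std::size_t n) {
    std::string out;
    char pair[4];
    for (std::size_t i = 0; i < n; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned value = static_cast<unsigned char>(data[i]);
        std::snprintf(pair, sizeof(pair), "%02x", value);
        if (i != 0) out += ' ';
        out += pair;
    }
    return out;
}
```

In the question's loop this could replace the printf call with something like printf("Data=%s\n", toHex(data, dataLength).c_str()), ideally only after checking that the read succeeded.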

Binary file only overwrites first line C++

So I have a binary file that I create and initialize. If I set my pointer to seekg = 0 or seekp = 0, then I can overwrite the line of text fine. However if I jump ahead 26 bytes (the size of one line of my file and something I have certainly confirmed), it refuses to overwrite. Instead it just adds it before the binary data and pushes the old data further onto the line. I want the data completely overwritten.
char space1[2] = { ',' , ' '};
int main()
{
CarHashFile lead;
lead.createFile(8, cout);
fstream in;
char* tempS;
tempS = new char[25];
in.open("CarHash.dat", ios::binary | ios::in | ios::out);
int x = 2000;
for(int i = 0; i < 6; i++)
tempS[i] = 'a';
int T = 30;
in.seekp(26); //Start of second line
in.write(tempS, 6); //Will not delete anything, will push
in.write(space1, sizeof(space1)); //contents back
in.write((char *)(&T), sizeof(T));
in.write(space1, sizeof(space1));
in.write(tempS,6);
in.write(space1, sizeof(space1));
in.write((char *)&x, sizeof(x));
//Now we will use seekp(0) and write to the first line
//it WILL overwrite the first line perfectly fine
in.seekp(0);
in.write(tempS, 6);
in.write((char*) &x, sizeof(x));
in.write(tempS, 6);
in.write((char *) &T, sizeof(T));
return 0;
}
The CarHashFile is an outside class that creates a binary file full of the following contents when create file is invoked: "Free, " 1900 ", Black, $" 0.00f.
Everything enclosed in quotes was added as a string, 1900 as an int, and 0.00f as a float obviously. I added all of these through write, so I'm pretty sure it's an actual binary file, I just don't know why it only chooses to write over the first line. I know the file size is correct because if I set seekp = 26 it will print at the beginning of the second line and push it down. space was created to easily add the ", " combo to the file, there is also a char dol[1] = '$' array for simplicity and a char nl[1] = '\n' that lets me add a new line to the binary file (just tried removing that binary add and it forced everything onto one row, so afaik, its needed).
EDIT: Ok so, it was erasing the line all along, it just wasn't putting in a new line (kind of embarrassing). But now I can't figure out how to insert a newline into the file. I tried writing it the way I originally did with char nl[1] = { '\n' }. That worked when I first created the file, but won't afterwards. Are there any other ways to add lines? I also tried in << endl and got nothing.
I suggest taking this one step at a time. The code looks OK to me, but the lack of error checking means any behavior could be happening.
Add error checks and reporting to all operations on in.
If that shows no issues, do a simple seek and then a write:
in.seekp(26);
// print in.good() / in.fail()
in.write("Hello World", 11);
// print in.good() / in.fail()
in.close();
Let us know what happens.
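A sketch of the "seek, then write, with error checks" step as a reusable function. The name overwriteAt is invented here; it assumes the stream was opened with ios::binary | ios::in | ios::out, as in the question, so that existing bytes are overwritten in place rather than the file being truncated.

```cpp
#include <cstddef>
#include <fstream>

// Overwrites `len` bytes starting at byte `offset`; returns false if any step fails.
bool overwriteAt(std::fstream& file, std::streamoff offset,
                 const char* bytes, std::size_t len) {
    file.clear();                                        // reset earlier error flags
    if (!file.seekp(offset, std::ios::beg)) return false;
    if (!file.write(bytes, static_cast<std::streamsize>(len))) return false;
    return static_cast<bool>(file.flush());              // report buffered-write errors
}
```

A write through a stream never inserts bytes into the middle of a file: it replaces the bytes at the current position and extends the file only when it runs past the end. If data appears to be "pushed back", inspecting the file in a hex editor usually shows that the bytes were replaced and only the text rendering changed.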
In the end, the problem wasn't my understanding of file streams. It was my lack of understanding of binary files. The newline screwed everything up royally, and while it could be added fine at one point in time, dealing with it later was a huge hassle. Once I removed it, everything else fell into place just fine. The reason there is so little error checking and no closing of files is that this is just driver code. It's as bare-bones as possible; I really didn't care what happened to the file at that point in time, and I knew it was being opened. Why waste my time? The final version has error checks, added when the main program was rewritten. And like I said, what I didn't get was binary files, not file streams. So AJ's response wasn't very useful, at all. I had to have 25 characters as part of the assignment, and no name is 25 characters long, so the field gets filled up with junk. It's a byproduct of the project and there is nothing I can do about it, other than try to fill it with spaces, which just takes more time than skipping ahead and writing from there. So I chose to write what would probably be the average name (8 chars) and then just jump ahead 25 afterwards. The only real solution given here was from Emile, who told me to get a hex editor. THAT really helped. Thanks for your time.