Why is Windows C++ multi-threading IOPS much faster than IOMeter? - c++

I have an SSD and I am trying to use it to measure my program's I/O performance; however, the IOPS calculated by my program is much, much higher than IOMeter's.
My SSD is a PLEXTOR PX-128M3S. According to IOMeter, its max 512B random read IOPS is around 94k (queue depth is 32).
However my program (32 Windows threads) can reach around 500k 512B IOPS, around 5 times IOMeter's result! I did data validation but didn't find any error in the fetched data. Is it because my data fetching is in order?
I paste my code below (it mainly fetches 512B from a file and releases it; I did use 4 bytes (an int) to validate the program logic and didn't find a problem). Can anybody help me figure out where I am wrong?
Thanks so much in advance!!
#include <stdio.h>
#include <Windows.h>
//Global variables
long completeIOs = 0;
long completeBytes = 0;
int threadCount = 32;
unsigned long long length = 1073741824; //4G test file
int interval = 1024;
int resultArrayLen = 320000;
int *result = new int[resultArrayLen];
//Method declarations
double GetSecs(void); //Calculate out duration
int InitPool(long long,char*,int); //Initialize test data for testing, if successful, return 1; otherwise, return a non 1 value.
int * FileRead(char * path);
unsigned int DataVerification(int*, int sampleItem); //Verify data fetched from pool
int main()
{
int sampleItem = 0x1;
char * fPath = "G:\\workspace\\4G.bin";
unsigned int invalidIO = 0;
if (InitPool(length,fPath,sampleItem)!= 1)
printf("File write err... \n");
//start do random I/Os from initialized file
double start = GetSecs();
int * fetchResult = FileRead(fPath);
double end = GetSecs();
printf("File read IOPS is %.4f per second.. \n",completeIOs/(end - start));
//start data validation, for 4 bytes fetch only
// invalidIO = DataVerification(fetchResult,sampleItem);
// if (invalidIO !=0)
// {
// printf("Total invalid data fetch IOs are %d", invalidIO);
// }
return 0;
}
int InitPool(long long length, char* path, int sample)
{
printf("Start initializing test data ... \n");
FILE * fp = fopen(path,"wb");
if (fp == NULL)
{
printf("file open err... \n");
exit (-1);
}
else //initialize file for testing
{
fseek(fp,0L,SEEK_SET);
for (int i=0; i<length; i++)
{
fwrite(&sample,sizeof(int),1,fp);
}
fclose(fp);
fp = NULL;
printf("Data initialization is complete...\n");
return 1;
}
}
double GetSecs(void)
{
LARGE_INTEGER frequency;
LARGE_INTEGER start;
if(! QueryPerformanceFrequency(&frequency))
printf("QueryPerformanceFrequency Failed\n");
if(! QueryPerformanceCounter(&start))
printf("QueryPerformanceCounter Failed\n");
return ((double)start.QuadPart/(double)frequency.QuadPart);
}
class input
{
public:
char *path;
int starting;
input (int st, char * filePath):starting(st),path(filePath){}
};
//Workers
DWORD WINAPI FileReadThreadEntry(LPVOID lpThreadParameter)
{
input * in = (input*) lpThreadParameter;
char* path = in->path;
FILE * fp = fopen(path,"rb");
int sPos = in->starting;
// int * result = in->r;
if(fp != NULL)
{
fpos_t pos;
for (int i=0; i<resultArrayLen/threadCount;i++)
{
pos = i * interval;
fsetpos(fp,&pos);
//For 512 bytes fetch each time
unsigned char *c =new unsigned char [512];
if (fread(c,512,1,fp) ==1)
{
InterlockedIncrement(&completeIOs);
delete[] c;
}
//For 4 bytes fetch each time
/*if (fread(&result[sPos + i],sizeof(int),1,fp) ==1)
{
InterlockedIncrement(&completeIOs);
}*/
else
{
printf("file read err...\n");
exit(-1);
}
}
fclose(fp);
fp = NULL;
}
else
{
printf("File open err... \n");
exit(-1);
}
return 0;
}
int * FileRead(char * p)
{
printf("Starting reading file ... \n");
HANDLE mWorkThread[256]; //max 256 threads
completeIOs = 0;
int slice = int (resultArrayLen/threadCount);
for(int i = 0; i < threadCount; i++)
{
mWorkThread[i] = CreateThread(
NULL,
0,
FileReadThreadEntry,
(LPVOID)(new input(i*slice,p)),
0,
NULL);
}
WaitForMultipleObjects(threadCount, mWorkThread, TRUE, INFINITE);
printf("File read complete... \n");
return result;
}
unsigned int DataVerification(int* result, int sampleItem)
{
unsigned int invalid = 0;
for (int i=0; i< resultArrayLen/interval;i++)
{
if (result[i]!=sampleItem)
{
invalid ++;
continue;
}
}
return invalid;
}

I didn't look in enough detail to be certain, but I didn't see any code there to flush the data to the disk and/or ensure your reads actually came from the disk. That being the case, it appears that what you're measuring is primarily the performance of the operating system's disk caching. While the disk might contribute a little to the performance you're measuring, it's probably only a small contributor, with other factors dominating.
Since the code is apparently written for Windows, you might consider (for one example) opening the file with CreateFile, and passing the FILE_FLAG_NO_BUFFERING flag when you do so. This will (at least mostly) remove the operating system cache from the equation, and force each read or write to deal directly with the disk itself.
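A minimal sketch of that suggestion follows, reusing the question's file path as a placeholder. Note that FILE_FLAG_NO_BUFFERING requires the buffer address, the file offset, and the transfer size to all be multiples of the sector size (512 bytes is assumed here; the real value can be queried with GetDiskFreeSpace):

```cpp
#include <Windows.h>
#include <stdio.h>

int main()
{
    // Open unbuffered so reads bypass the OS file cache and hit the disk.
    HANDLE h = CreateFileA("G:\\workspace\\4G.bin",
                           GENERIC_READ,
                           FILE_SHARE_READ,
                           NULL,
                           OPEN_EXISTING,
                           FILE_FLAG_NO_BUFFERING,   // no OS caching
                           NULL);
    if (h == INVALID_HANDLE_VALUE)
    {
        printf("CreateFile failed: %lu\n", GetLastError());
        return 1;
    }

    // Sector-aligned buffer; alignment must match the sector size.
    unsigned char *buf = (unsigned char *)_aligned_malloc(512, 512);

    LARGE_INTEGER offset;
    offset.QuadPart = 512 * 1024;                    // must be a multiple of 512
    SetFilePointerEx(h, offset, NULL, FILE_BEGIN);

    DWORD got = 0;
    if (!ReadFile(h, buf, 512, &got, NULL))          // this read goes to the device
        printf("ReadFile failed: %lu\n", GetLastError());

    _aligned_free(buf);
    CloseHandle(h);
    return 0;
}
```

With unbuffered reads like this, the measured IOPS should drop back toward the device's real random-read rate rather than the page-cache rate.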

Related

Why is msgrcv() feeding garbage characters into the buffer?

Right now, I am trying to output the contents of buf.mtext so I can make sure I take the correct input before moving on with my program. Everything seems to work fine, except one thing: msgrcv() puts garbage characters into the buffer, and the receiver process outputs garbage characters.
Here is my sender process:
int main (void)
{
int i; // loop counter
int status_01; // result status
int msqid_01; // message queue ID (#1)
key_t msgkey_01; // message-queue key (#1)
unsigned int rand_num;
float temp_rand;
unsigned char eight_bit_num;
unsigned char counter = 0;
unsigned char even_counter = 0;
unsigned char odd_counter = 0;
srand(time(0));
struct message {
long mtype;
char mtext[BUFFER_SIZE];
} buf_01;
msgkey_01 = MSG_key_01; // defined at top of file
msqid_01 = msgget(msgkey_01, 0666 | IPC_CREAT);
if (msqid_01 <= -1) { exit(1); }
/* wait for a key stroke at the keyboard ---- */
eight_bit_num = getchar();
buf_01.mtype = 1;
/* send one eight-bit number, one at a time ------------ */
for (i = 0; i < NUM_REPEATS; i++)
{
temp_rand = ((float)rand()/(float)RAND_MAX)*255.0;
rand_num = (int)temp_rand;
eight_bit_num = (unsigned char)rand_num;
if ((eight_bit_num % 2) == 0)
{
printf("Even number: %d\n", eight_bit_num);
even_counter = even_counter + eight_bit_num;
}
else
{
printf("Odd number: %d\n", eight_bit_num);
odd_counter = odd_counter + eight_bit_num;
}
/* update the counters ------------------------------ */
counter = counter + eight_bit_num;
if((eight_bit_num % 2) == 0) { even_counter = even_counter + eight_bit_num; }
else { odd_counter = odd_counter + eight_bit_num; }
buf_01.mtext[0] = eight_bit_num; // copy the 8-bit number
buf_01.mtext[1] = '\0'; // null-terminate it
status_01 = msgsnd(msqid_01, (struct msgbuf *)&buf_01, sizeof(buf_01.mtext), 0);
status_01 = msgctl(msqid_01, IPC_RMID, NULL);
}
Here is my receiver process:
int main() {
struct message {
long mtype;
char mtext[BUFFER_SIZE];
} buf;
int msqid;
key_t msgkey;
msgkey = MSG_key_01;
msqid = msgget(msgkey, 0666); // connect to message queue
if (msqid < 0) {
printf("Failed\n");
exit(1);
}
else {
printf("Connected\n");
}
if (msgrcv(msqid, &buf, BUFFER_SIZE, 0, 0) < 0) { // read message into buf
perror("msgrcv");
exit(1);
}
printf("Data received is: %s \n", buf.mtext);
printf("Done receiving messages.\n");
return 0;
}
The output is usually something like as follows:
Data received is: ▒
Done receiving messages.
I have made sure to clear my message queues each time after running the sender and receiver processes, as well, since I have come to find out this can cause issues. Thanks in advance for your help.
Turns out neither of the suggested solutions was the issue, as I suspected; the sender process actually works just fine. The problem was that I was trying to print buf.mtext as a string instead of printing buf.mtext[0], which holds the actual integer value. I fixed the issue by just doing this:
int temp_num = buf.mtext[0];
printf("Data received is %d \n", temp_num);

Write/Read a stream of data (double) using named pipes in C++

I am trying to develop a little application in C++, within a Linux environment, which does the following:
1) gets a data stream (a series of arrays of doubles) from the output of a 'black-box' and writes it to a pipe. The black-box can be thought of as an ADC;
2) reads the data stream from the pipe and feeds it to another application which requires these data as stdin;
Unfortunately, I was not able to find tutorials or examples. The best way I found to realize this is summarized in the following test-bench example:
#include <iostream>
#include <fcntl.h>
#include <sys/stat.h>
#include <stdio.h>
#define FIFO "/tmp/data"
using namespace std;
int main() {
int fd;
int res = mkfifo(FIFO,0777);
float *writer = new float[10];
float *buffer = new float[10];
if( res == 0 ) {
cout<<"FIFO created"<<endl;
int fres = fork();
if( fres == -1 ) {
// throw an error
}
if( fres == 0 )
{
fd = open(FIFO, O_WRONLY);
int idx = 1;
while( idx <= 10) {
for(int i=0; i<10; i++) writer[i]=1*idx;
write(fd, writer, sizeof(writer)*10);
}
close(fd);
}
else
{
fd = open(FIFO, O_RDONLY);
while(1) {
read(fd, buffer, sizeof(buffer)*10);
for(int i=0; i<10; i++) printf("buf: %f",buffer[i]);
cout<<"\n"<<endl;
}
close(fd);
}
}
delete[] writer;
delete[] buffer;
}
The problem is that, when running this example, I do not get a printout of all 10 arrays I am feeding into the pipe; instead I always get the first array (filled with 1s).
Any suggestion/correction/reference is very welcome to make it work and learn more about the behavior of pipes.
EDIT:
Sorry guys! I found a very trivial error in my code: in the while loop within the writer part, I was not incrementing the index idx. Once I corrected it, I got the printout of all the arrays.
But now I am facing another problem: when using many large arrays, only some of them are printed out (the whole sequence never appears), as if the reader cannot keep up with the speed of the writer. Here is the new sample code:
#include <iostream>
#include <fcntl.h>
#include <sys/stat.h>
#include <stdio.h>
#define FIFO "/tmp/data"
using namespace std;
int main(int argc, char** argv) {
int fd;
int res = mkfifo(FIFO,0777);
int N(1000);
float writer[N];
float buffer[N];
if( res == 0 ) {
cout<<"FIFO created"<<endl;
int fres = fork();
if( fres == -1 ) {
// throw an error
}
if( fres == 0 )
{
fd = open(FIFO, O_WRONLY | O_NONBLOCK);
int idx = 1;
while( idx <= 1000 ) {
for(int i=0; i<N; i++) writer[i]=1*idx;
write(fd, &writer, sizeof(float)*N);
idx++;
}
close(fd);
unlink(FIFO);
}
else
{
fd = open(FIFO, O_RDONLY);
while(1) {
int res = read(fd, &buffer, sizeof(float)*N);
if( res == 0 ) break;
for(int i=0; i<N; i++) printf(" buf: %f",buffer[i]);
cout<<"\n"<<endl;
}
close(fd);
}
}
}
Is there some mechanism I should implement to make write() wait while read() is still consuming data from the FIFO, or am I missing something trivial in this case as well?
Thank you for those who have already given answers to the previous version of my question, I have implemented the suggestions.
The arguments to read and write are incorrect. Correct ones:
write(fd, writer, 10 * sizeof *writer);
read(fd, buffer, 10 * sizeof *buffer);
Also, these functions may do partial reads/writes, so that the code needs to check the return values to determine whether the operation must be continued.
Also, the while( idx <= 10) loop in the first version of the writer never ends, because idx is never incremented; even on a 5 GHz CPU it will spin forever. The same comment applies to the reader's while(1) loop.

Sub-functions to send and receive string over socket

I assume that for messages that are of only 1 byte (a char), I will use read() and write() directly.
For those messages having size > 1 bytes, I use two subfunctions to read and write them over sockets.
For example, I have the server construct a string called strcities (a list of cities) and print it out; nothing strange there. It then sends the number of bytes of this string to the client, followed by the actual string.
The client will first read the number of bytes, then the actual city list.
For some reason my code sometimes works and sometimes doesn't. When it works, it also prints out some extra characters that I have no idea where they come from. When it doesn't, it hangs and waits forever in the client, while the server goes back to the top of the loop and waits for the next command from the client. Could you please take a look at my code below and let me know where I went wrong?
Attempt_read
string attempt_read(int rbytes) { // rbytes = number of bytes of message to be read
int count1, bytes_read;
char buffer[rbytes+1];
bool notdone = true;
count1 = read(sd, buffer, rbytes);
while (notdone) {
if (count1 == -1){
perror("Error on write call");
exit(1);
}
else if (count1 < rbytes) {
rbytes = rbytes - count1; // update remaining bytes to be read
count1 = read(sd, buffer, rbytes);
}
else {notdone = false;}
} // end while
string returnme;
returnme = string(buffer);
return returnme;
}
Attempt_write
void attempt_write(string input1, int wbytes) { // wbytes = number of bytes of message
int count1;
bool notdone = true;
count1 = write(sd, input1.c_str(), wbytes);
while (notdone) {
if (count1 == -1){
perror("Error on write call");
exit(1);
}
else if (count1 < wbytes) {
wbytes = wbytes - count1;
count1 = write(sd, input1.c_str(), wbytes);
}
else {notdone = false;}
} // end while
return;
}
1) The string class has a method size() that returns the length of the string, so you do not actually need the second attempt_write parameter.
2) You can transfer the length of the message before the message itself, or you can transfer a terminating 0 after it if you will only ever send ASCII strings. Because your connection could terminate at any time, it is better to send the exact length before sending the string, so your client knows what to expect.
3) What compiler are you using that allows char buffer[rbytes+1];? Standard C++ would require char *buffer = new char[rbytes+1]; and a corresponding delete[] to avoid a memory leak.
4) In your code, the second read call uses the same buffer with no adjustment to the offset, so you practically overwrite the already received data, and the function only works if all data arrives in the first call. The same goes for the write function.
I would suggest something like this:
void data_read(unsigned char * buffer, int size) {
int bytes_read, total = 0;
do {
bytes_read = read(sd, buffer + total, size - total);
if (-1 == bytes_read) {
perror("Error on read call");
exit(1);
}
total += bytes_read;
} while (total < size);
}
string attempt_read() {
int size = 0;
data_read((unsigned char *) &size, sizeof(int));
string output(size, (char) 0x0);
data_read((unsigned char *) output.c_str(), size);
return output;
}
void data_write(unsigned char * buffer, int size) {
int bytes_written, total = 0;
do {
bytes_written = write(sd, buffer + total, size - total);
if (-1 == bytes_written) {
perror("Error on write call");
exit(1);
}
total += bytes_written;
} while (total < size);
}
void attempt_write(string input) {
int size = input.size();
data_write((unsigned char *) &size, sizeof(int));
data_write((unsigned char *) input.c_str(), size);
}

How do I divide binary data into frames in C++?

I need to read a binary file containing several bytes and divide the contents into frames of 535 bytes each. The number of frames in the file is not known until runtime, so I need to dynamically allocate memory for them. The code below is a snippet; as you can see, I'm trying to create a pointer to an array of bytes (uint8_t) and then increment into the next frame and so on, in the loop that reads the buffered data into the frames. How do I allocate memory at runtime, and is this the best way to do the task? Please let me know if there is a more elegant solution. Also, how do I manage the memory?
#include <cstdio>
using namespace std;
long getFileSize(FILE *file)
{
long currentPosition, endPosition;
currentPosition = ftell(file);
fseek(file, 0, 2);
endPosition = ftell(file);
fseek(file, currentPosition, 0);
return endPosition;
}
int main()
{
const char *filePath = "C:\\Payload\\Untitled.bin";
uint8_t *fileBuffer;
FILE *file = NULL;
if((file = fopen(filePath, "rb")) == NULL)
cout << "Failure. Either the file does not exist or this application lacks sufficient permissions to access it." << endl;
else
cout << "Success. File has been loaded." << endl;
long fileSize = getFileSize(file);
fileBuffer = new uint8_t[fileSize];
fread(fileBuffer, fileSize, 1, file);
uint8_t (*frameBuffer)[535];
for(int i = 0, j = 0; i < fileSize; i++)
{
frameBuffer[j][i] = fileBuffer[i];
if((i % 534) == 0)
{
j++;
}
}
struct frame {
unsigned char bytes[535];
};
std::vector<frame> frames;
Now your loop can simply read a frame and push it into frames. No explicit memory management needed: std::vector does that for you.

Detect disc removal on fwrite in C

I am writing an application to continuously write and read files on a drive (whether it's a hard drive, an SD card, or whatever). I'm writing a certain pattern and then reading it back as verification. I want to immediately output some kind of blaring error as soon as the app fails. Basically we're hitting the hardware with radiation and need to detect when it fails. I have the app reading and writing the files just fine so far, but I can yank the SD card mid-execution and it keeps on running as if the card were still there. I really need to detect the moment the SD card is removed. I've seen some suggestions to use libudev, but I cannot use that, as this is an embedded Linux system which doesn't have it. Here's the code I have so far:
#include <stdio.h>
#include <time.h>
const unsigned long long size = 16ULL*1024;
#define NANOS 1000000000LL
#define KB 1024
long long CreateFile(char* filename)
{
struct timespec time_start;
struct timespec time_stop;
long long start, elapsed, microseconds;
int timefail = 0;
size_t stat;
if(clock_gettime(CLOCK_REALTIME, &time_start) < 0)
timefail = 1;
start = time_start.tv_sec*NANOS + time_start.tv_nsec;
int a[size];
int i, j;
for(i=0;i<size;i++)
a[i] = i;
FILE* pFile;
pFile = fopen(filename, "wb");
if(pFile == NULL)
{
perror("fopen");
return -1;
}
for(j=0; j < KB; j++)
{
stat = fwrite(a, sizeof(int), size, pFile);
if(stat < 0)
perror("fwrite");
stat = fsync(pFile);
//if(stat)
// perror("fysnc");
}
fclose(pFile);
if(clock_gettime(CLOCK_REALTIME, &time_stop) < 0)
timefail = 1;
elapsed = time_stop.tv_sec*NANOS + time_stop.tv_nsec - start;
microseconds = elapsed / 1000 + (elapsed % 1000 >= 500);
if(timefail)
return -1;
return microseconds / 1000;
}
long long ReadFile(char* filename)
{
struct timespec time_start;
struct timespec time_stop;
long long start, elapsed, microseconds;
int timefail = 0;
if(clock_gettime(CLOCK_REALTIME, &time_start) < 0)
timefail = 1;
start = time_start.tv_sec*NANOS + time_start.tv_nsec;
FILE* pFile;
pFile = fopen(filename, "rb");
int a[KB];
int i=0, j=0;
for(i=0; i<size; i++)
{
if(ferror(pFile) != 0)
{
fprintf(stderr, "**********************************************");
fprintf(stderr, "READ FAILURE\n");
fclose(pFile);
return -1;
}
fread(a, sizeof(a), 1, pFile);
for(j=0; j<KB;j++)
{
if(a[0] != a[1]-1)
{
fprintf(stderr, "**********************************************");
fprintf(stderr, "DATA FAILURE, %d != %d\n", a[j], a[j+1]-1);
fclose(pFile);
return -1;
}
}
}
fclose(pFile);
if(clock_gettime(CLOCK_REALTIME, &time_stop) < 0)
timefail = 1;
if(timefail)
return -1;
elapsed = time_stop.tv_sec*NANOS + time_stop.tv_nsec - start;
microseconds = elapsed / 1000 + (elapsed % 1000 >= 500);
return microseconds/1000;
}
int main(int argc, char* argv[])
{
char* filenamebase = "/tmp/media/mmcblk0p1/test.file";
char filename[100] = "";
int i=0;
long long tmpsec = 0;
long long totalwritetime = 0;
int totalreadtime = 0;
int numfiles = 10;
int totalwritten = 0;
int totalread = 0;
for(i=0;i<numfiles;i++)
{
sprintf(filename, "%s%d", filenamebase, i);
fprintf(stderr, "Writing File: %s ...", filename);
tmpsec = CreateFile(filename);
if(tmpsec < 0)
return 0;
totalwritetime += tmpsec;
totalwritten++;
fprintf(stderr, "completed in %lld seconds\n", tmpsec);
fprintf(stderr, "Reading File: %s ...", filename);
tmpsec = ReadFile(filename);
if(tmpsec < 0)
return 0;
totalreadtime += tmpsec;
totalread++;
fprintf(stderr, "completed in %lld seconds\n", tmpsec);
}
fprintf(stderr, "Test Complete\nTotal Files: %d written, %d read\n", totalwritten, totalread);
fprintf(stderr, "File Size: %lld KB\n", size);
fprintf(stderr, "Total KBytes Written: %lld\n", size*totalwritten);
fprintf(stderr, "Average Write Speed: %0.2f KBps\n", (double)size*totalwritten/(totalwritetime/1000));
fprintf(stderr, "Total KBytes Read: %lld\n", size*totalread);
fprintf(stderr, "Average Read Speed: %0.2f KBps\n", (double)size*totalread/(totalreadtime/1000));
return 0;
}
You'll need to change your approach.
If you yank out media that has been mounted, you're likely to panic your kernel (as it keeps complex data structures that represent the mounted filesystem in memory), and break the media itself.
I've destroyed quite a few USB memory sticks that way -- their internal small logic that handle allocation and wear leveling do not like to lose power mid-run, and the cheapest ones do not seem to have capacitors capable of providing enough power to keep them running long enough to ensure a consistent state -- but SD cards and more expensive USB sticks might survive better.
Depending on the drivers used, the kernel may allow you to read and write to the media, but simply keep the changes in page cache. (Furthermore, your stdio.h I/O is likely to only reach into the page cache, and not the actual media, depending on the mount options (whether mounted direct/sync or not). Your approach simply does not provide the behaviour you assume it does.)
Instead, you should use low-level I/O (unistd.h, see man 2 open and related calls, none of stdio.h), using O_RDWR|O_DIRECT|O_SYNC flags to make sure your reads and writes hit the hardware, and access the raw media directly via the block device node, instead of mounting it at all. You can also read/write to random locations on the device, in the hopes that wear leveling does not affect your radiation resistance checks too much.
(Edited to add: If you write in blocks exactly the size of the native allocation block for the tested media device, you'll avoid the slow read-modify-write cycles on the device. The device will still do wear leveling, but that just means that the block you wrote is in a random physical location(s) in the flash chip. The native block size depends on the media device. It is possible to measure the native block size by observing how long it takes to read and write a block of different size, but I think for damage testing, a large enough power of two should work best -- say 256k or 262144 bytes. It's probably best to let the user set it for each device separately, and use either manufacturer-provided information, or a separate test program to find out the proper value.)
You do not want to use mmap() for this, as the SIGBUS signal caused by media errors and media becoming unavailable, is very tricky to handle correctly. Low-level unistd.h I/O is best for this, in my opinion.
I believe, but have not verified, that yanking out the media in mid-read/write to the unmounted low-level device, should simply yield a read/write error. (I don't have any media I'm willing to risk right now to check it, though :)
Answer from my comment:
In your write function you should have:
for(j=0; j < KB; j++)
{
size_t items_written = fwrite(a, sizeof(int), size, pFile);
if(items_written < size)
{
perror("fwrite");
break;
}
fflush(pFile); /* push the stdio buffer to the kernel first */
if(fsync(fileno(pFile)) < 0) /* fsync takes a file descriptor, not a FILE* */
{
perror("fsync");
break;
}
}
and in your read function:
size_t items_read = fread(a, sizeof(a), 1, pFile);
if(items_read != 1) /* fread returns the number of complete items read */
break;
Also, if you have a C99 compiler, please use the fixed-size types available in stdint.h, e.g. uint32_t.