How to implement a stream that can be split by newline

The following code works, but it is about twice as slow as when I use a (Linux) pipe that feeds unzipped data to the (modified) program. I need a steady stream within the program that I can keep splitting on \n. Is there a way to do this using a (string?) stream or any other trick?
int main(int argc, char *argv[]) {
static const int unzipBufferSize = 8192;
long long int counter = 0;
int i = 0, p = 0, n = 0;
int offset = 0;
char *end = NULL;
char *begin = NULL;
unsigned char unzipBuffer[unzipBufferSize + 1]; // +1 so the '\0' written after a full read stays in bounds
unsigned int unzippedBytes;
char * inFileName = argv[1];
char buffer[200];
buffer[0] = '\0';
bool breaker = false;
char pch[4][200];
Read *aRead = new Read;
gzFile inFileZ;
inFileZ = gzopen(inFileName, "rb");
while (true) {
unzippedBytes = gzread(inFileZ, unzipBuffer, unzipBufferSize);
if (unzippedBytes > 0) {
unzipBuffer[unzippedBytes] = '\0'; //put a 0-char after the total buffer
begin = (char*) &unzipBuffer[0]; // point to the address of the first char
do {
end = strchr(begin,(int)'\n'); //find the end of line
if (end != NULL) *(end) = '\0'; // put 0-char to use it as a c-string
pch[p][0] = '\0'; // put a 0-char to be able to strcat
if (strlen(buffer) > 0) { // if buffer from previous iteration contains something
strcat(pch[p], buffer); // cat it to the p-th pch
buffer[0] = '\0'; // set buffer to null-string or ""
}
strcat(pch[p], begin); // put begin (or rest of line in case there was a buffer) into p-th pch
if (end != NULL) { // see if it already points to something
begin = end+1; // if so, advance begin to old end+1
p++;
}
if(p>3) { // a 'read' contains 4 lines, so if p>3
strcat(aRead->bases,pch[1]); // we use line 2 and 4 as
strcat(aRead->scores,pch[3]); // bases and scores
//do things with the reads
aRead->bases[0] = '\0'; //put them back to 0-char
aRead->scores[0] = '\0';
p = 0; // start counting next 4 lines
}
}
while (end != NULL );
strcat(buffer,pch[p]); //move the left-over of unzipBuffer to buffer
}
else {
break; // when no unzippedBytes, exit the loop
}
}

Your main problem is probably the standard C string library.
With the strxxx() functions, you iterate over the buffer several times per line: once for strchr(), again for strlen(), and again for each of the strcat() calls.
Using the standard library is usually a good thing, but here it is simply inefficient.
Try to come up with something simpler that touches each character only once, like this (the code only shows the principle; do not expect it to work as-is):
do
{
do
{
*tp++ = *sp++;
} while (sp < buffer_end && *sp != '\n');
/* new line, do whatever it requires */
...
/* reset tp to beginning of buffer */
} while (sp < buffer_end);
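A minimal sketch of the touch-each-character-once idea, assuming the data arrives in arbitrary chunks, as it does from gzread. The LineSplitter class and its feed/finish methods are hypothetical names invented for this example, not part of zlib. A partial line at the end of one chunk is kept and completed by the next chunk.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: collects bytes from successive chunks and calls
// onLine once for every complete line (without the trailing '\n').
class LineSplitter {
public:
    explicit LineSplitter(std::function<void(const std::string&)> onLine)
        : onLine_(std::move(onLine)) {}

    // Feed one chunk, e.g. the bytes returned by a single gzread() call.
    void feed(const char* data, std::size_t size) {
        const char* lineStart = data;
        const char* end = data + size;
        for (const char* p = data; p != end; ++p) {
            if (*p == '\n') {
                pending_.append(lineStart, p);  // finish the current line
                onLine_(pending_);
                pending_.clear();
                lineStart = p + 1;
            }
        }
        // Keep the unfinished tail for the next chunk.
        pending_.append(lineStart, end);
    }

    // Call after the last chunk to flush a final line that has no '\n'.
    void finish() {
        if (!pending_.empty()) {
            onLine_(pending_);
            pending_.clear();
        }
    }

private:
    std::function<void(const std::string&)> onLine_;
    std::string pending_;
};
```

In the read loop from the question, this would presumably be called as splitter.feed(reinterpret_cast<const char*>(unzipBuffer), unzippedBytes) after each successful gzread, with the callback grouping lines in fours. No terminating '\0' is needed because the chunk length is passed explicitly.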

I am trying to get this to work, but all it does is give a segmentation fault at runtime:
do {
unzippedBytes = gzread(inFileZ, unzipBuffer, unzipBufferSize);
if (unzippedBytes > 0) {
while (*unzipBuffer < unzippedBytes) {
*pch = *unzipBuffer++;
cout << pch;
i++;
}
i=0;
}
else break;
} while (true);
What am I doing wrong here?

Related

How to get the size of a char[] when I read from a text file

I'm reading a .txt file. I need to read at most 254 characters into my char* buffer, so I did this:
char *buffer = new char[255];
***Some Code***
if (!feof(fichero))
{
if (fgets(buffer, 254, fichero) != NULL)
{
How do I get the size of the data read into the buffer? Right now I manually search for '\n' and use its position as the size. Is there a better way?
My code for now:
char *buffer = new char[255];
int tamanio;
***more code***
if (!feof(fichero))
{
if (fgets(buffer, 254, fichero) != NULL)
{
//printarCadena(buffer);
tamanio = limpiarBuffer(buffer);
printf("Tamanio buffer: %i \n", tamanio);
int Gestor::limpiarBuffer(char* buffer)
{
int i;
for(i = 0; i < int(strlen(buffer));i++)
{
if(buffer[i] == '\n')
return i;
}
return int(strlen(buffer) - 1);
}
Edit: I have to use char* (it's a university requirement).
Edit 2: I have to read line by line; if a line has more than 255 characters, I read it as two or more lines.

int getSize(char* buffer)
{
int len = int(strlen(buffer)); // measure once instead of on every iteration
for(int i = 0; i < len; i++)
{
if(buffer[i] == '\n')
return i;
}
return len;
}
For now I see no other way to do it with char*.
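One standard-library alternative, sketched under the assumption that the goal is the length of the line without its trailing newline: strcspn from <cstring> returns the number of characters before the first '\n', or the whole length if there is none. The function name lineLength is made up for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: length of the text fgets() stored in buffer,
// not counting a trailing '\n'. It also strips that newline in place.
std::size_t lineLength(char* buffer) {
    std::size_t len = std::strcspn(buffer, "\n");  // index of '\n' or of '\0'
    buffer[len] = '\0';                             // no-op if there was no newline
    return len;
}
```

Note that fgets(buffer, 254, fichero) stores at most 253 characters plus the terminator. To read up to 254 characters into a 255-byte buffer, pass 255 as the size.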

Socket Communication Data Corruption on Write/Read

I've got a C++ server that communicates with multiple clients. It uses a vector to store the handles to the sockets for those clients (playerSockets in the code below). At the end of the "game" I want the server to loop through that vector and write the same string to each client. However, sometimes the data that the client reads (and then displays) is "corrupted" as you can see in the screenshot, but this doesn't happen for the first client, only the second. I can't figure out why this is happening! I use this same technique (looping and writing) earlier in the program and it always works fine in that instance.
Here is what is supposed to be shown, and here is what I get (the screenshots are not reproduced here).
Here is the server code that writes:
std::string announcement = "";
if (playerWon) {
...
}
} else {
announcement = "?No one won the game!\nGAME BOARD: " + cn.getGameBoard();
for (int player : gameData->playerSockets) {
write(player, announcement.c_str(), announcement.size() + 1);
}
}
And here's the client code that reads. Keep in mind that more than one client is running and connected to the server, and this issue only happens with a client OTHER THAN the first client in the server's loop:
static bool readMyTurn(int clientSd) {
...
char buf[BUFSIZE];
read(clientSd, buf, BUFSIZE);
string myTurn(buf);
cout << "MYMYMYMY: " << myTurn << endl;
myTurn.erase(0, 1);
cout << myTurn << endl;
...
}
UPDATE
Here is my current code to read until encountering the null-terminator character.
string readOneStringFromServer(int clientSd, string &leftovers) {
ssize_t nullTerminatorPosition = 0;
std::string stringToReturn = "";
do {
char buf[BUFSIZE];
ssize_t bytesRead = read(clientSd, buf, BUFSIZE);
nullTerminatorPosition = findPositionOfNullTerminator(buf, bytesRead);
// found a null terminator
if (nullTerminatorPosition != -1) {
// create a buffer to hold all of the chars from buf1 up to and including the null terminator
char upToNullTerminator[nullTerminatorPosition + 1];
// get those chars from buf1 and put them into buf2 (including the null terminator)
for (int i = 0; i < nullTerminatorPosition + 1; ++i) {
upToNullTerminator[i] = buf[i];
}
// use buf2 to create a string
stringToReturn += upToNullTerminator;
// check if there are leftover bytes after the null terminator
int leftoverBytes = bytesRead - nullTerminatorPosition - 1;
if (leftoverBytes != 0) {
// if there are, create a char array of that size
char leftoverChars[leftoverBytes];
// loop through buf1 and add the leftover chars to buf3
for (int i = nullTerminatorPosition + 1; i < bytesRead; ++i) {
leftoverChars[i - (nullTerminatorPosition + 1)] = buf[i];
}
// make a string out of those leftover chars
leftovers = leftoverChars;
} else {
// if there are no leftover bytes, then we want to "erase" what is currently held in leftovers so that
// it doesn't get passed to the next function call
leftovers = "";
}
// didn't find one
} else {
stringToReturn += buf;
}
} while (nullTerminatorPosition == -1);
return stringToReturn;
}
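A sketch of the same read-until-terminator idea using std::string for the bookkeeping, assuming POSIX read() on a stream socket. A stream socket delivers bytes, not messages, so one read() may return part of a message or several messages at once. The names takeMessage and readOneMessage are hypothetical. Appending with an explicit byte count avoids treating buf as a C string, which it is not when the terminator has not arrived yet.

```cpp
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: if pending holds a complete '\0'-terminated message,
// move it into out (without the terminator) and return true.
bool takeMessage(std::string& pending, std::string& out) {
    std::string::size_type nul = pending.find('\0');
    if (nul == std::string::npos) return false;
    out.assign(pending, 0, nul);
    pending.erase(0, nul + 1);  // keep bytes that belong to later messages
    return true;
}

// Reads from fd until one full message is available. pending carries
// surplus bytes between calls. Returns false on error or end of stream.
bool readOneMessage(int fd, std::string& pending, std::string& out) {
    while (!takeMessage(pending, out)) {
        char buf[1024];
        ssize_t n = read(fd, buf, sizeof buf);
        if (n <= 0) return false;                          // error or peer closed
        pending.append(buf, static_cast<std::size_t>(n));  // exactly n bytes
    }
    return true;
}
```

The caller keeps one pending string per socket and passes it to every call, which plays the role of the leftovers parameter above.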

How does one locate a pointer error?

I am attempting to create a program that builds a Markov chain, but I am having pointer problems. When I run the program I get a segmentation fault.
#include <stdio.h>
#include <cstring>
#include <cstdlib>
struct word;
struct nextword
{
word* sourceword;
word* next = 0;
};
int wordcount;
struct word
{
char* wordstr;
struct word* next = 0;
nextword* followingword = 0;
int nextwordcount = 0;
};
int main()
{
word* firstword = 0;
char * buffer = 0;
long length;
FILE * f = fopen ("alice.txt", "rb");
if (f)
{
fseek (f, 0, SEEK_END);
length = ftell (f);
fseek (f, 0, SEEK_SET);
buffer = (char *)malloc (length);
if (buffer)
{
fread (buffer, 1, length, f);
}
fclose (f);
}
if (buffer)
{
char wordbuffer[500];
int fileindex = 0;
while(fileindex < length-1)
{
int wordindex = 0;
while(buffer[fileindex] != ' ')
{
wordbuffer[wordindex] = buffer[fileindex];
wordindex++;
fileindex++;
}
if(wordindex != 0)
{
wordbuffer[wordindex] = '\0';
word* newword = (word*)malloc(sizeof(word));
char* newwordstr = (char*)malloc((strlen(wordbuffer)+1)*sizeof(char));
strcpy(newword->wordstr, newwordstr);
if(!firstword)
{
firstword = newword;
}
else
{
word* testword = firstword;
while(!testword->next)
{
testword = (testword->next);
}
testword->next = newword;
printf(newword->wordstr);
}
}
return 0;
}
}
else
{
return 1;
}
}
I attempted to remove the file reading part and replace it with a hard coded string, but the problem remained.

You might want to read about the STL and use a list. Or use a C linked list; see a couple of examples:
Adding node in front of linklist
How to pop element from tail in linked list?
Trying to make linkedlist in C
There were several problems. I fixed some of them, and the code now compiles.
I have annotated the code where you need to fix bounds checking. The big problem was most likely the strcpy into word->wordstr, which is an uninitialized char*:
#include <stdio.h>
#include <cstring>
#include <cstdlib>
struct word;
struct nextword
{
word* sourceword;
word* next = 0;
};
int wordcount;
struct word
{
char* wordstr; //what do you think this pointer points to?
struct word* next = 0;
nextword* followingword = 0;
int nextwordcount = 0;
};
int main()
{
FILE* fh = NULL;
word* firstword = 0;
char* buffer = 0;
const char* fname = "alice.txt"; //string literals are const
long length = 0; //you did not initialize length
if ( (fh = fopen ("alice.txt", "rb")) )
{
//why not use fstat to get file size?
//why not use mmap to read file?
fseek (fh, 0, SEEK_END);
length = ftell (fh); //ok, length set here
fseek (fh, 0, SEEK_SET);
if( (buffer = (char *)malloc (length)) )
{
fread (buffer, 1, length, fh);
}
fclose (fh);
}
else
{
printf("error: cannot open %s",fname);
exit(1);
}
printf("read %s, %ld\n",fname,length);
if (!buffer)
{
printf("error: cannot allocate buffer for %s",fname);
exit(1);
//use exit, to return from main() //return 1;
}
//already checked buffer
{
int fileindex = 0;
char wordbuffer[500]; //500 bytes on stack
memset(wordbuffer,0,sizeof(wordbuffer));
//buffer is not null terminated, so bound every access by length
while(fileindex < length)
{
int wordindex = 0;
//stop at end of data, at any whitespace (space, tab, newline),
//or when wordbuffer is full (leave room for the terminator)
while( fileindex < length
&& strchr(" \t\r\n", buffer[fileindex]) == NULL
&& wordindex < (int)sizeof(wordbuffer)-1 )
{
wordbuffer[wordindex] = buffer[fileindex];
wordindex++;
fileindex++;
}
wordbuffer[wordindex] = '\0'; //terminate wordbuffer
//skip the separator that ended the word, otherwise the loop never advances
if( fileindex < length && strchr(" \t\r\n", buffer[fileindex]) != NULL )
{
fileindex++;
}
//since you chose wordindex signed, you want it > 0
if(wordindex > 0)
{
//new runs the default member initializers (next = 0); malloc does not
word* newword = new word;
//strdup allocates and copies the cstring in one step,
//so wordstr points at valid memory
newword->wordstr = strdup(wordbuffer);
if(!firstword)
{
firstword = newword;
}
else
{
word* testword = firstword;
//walk while there IS a next node, to reach the tail
while(testword->next)
{
testword = (testword->next);
}
testword->next = newword;
//never pass data as the format string
printf("%s\n", newword->wordstr);
}
}
//no return here: returning inside the loop stops after the first word
}
}
exit(0); //done
}
This compiles and runs without error. You still need to read up on linked-list handling: implement a linked list first, and then add word elements to it.
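For comparison, a minimal sketch of the STL route suggested at the top of this answer, assuming the words only need to be stored in order. The function name splitWords is invented for the example. std::list owns its nodes and std::string owns its characters, so there are no raw pointers to leave uninitialized.

```cpp
#include <list>
#include <sstream>
#include <string>

// Hypothetical helper: split text on any whitespace and return the words
// in their original order.
std::list<std::string> splitWords(const std::string& text) {
    std::list<std::string> words;
    std::istringstream in(text);
    std::string w;
    while (in >> w) {       // operator>> skips spaces, tabs and newlines
        words.push_back(w);
    }
    return words;
}
```

The file contents read into buffer could be passed as std::string(buffer, length), which copies exactly length bytes and does not need a null terminator.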

Is there a better way to search a file for a string?

I need to search a (non-text) file for the byte sequence "9µ}Æ" (or "\x39\xb5\x7d\xc6").
After 5 hours of searching online this is the best I could do. It works but I wanted to know if there is a better way:
char buffer;
int pos=in.tellg();
// search file for string
while(!in.eof()){
in.read(&buffer, 1);
pos=in.tellg();
if(buffer=='9'){
in.read(&buffer, 1);
pos=in.tellg();
if(buffer=='µ'){
in.read(&buffer, 1);
pos=in.tellg();
if(buffer=='}'){
in.read(&buffer, 1);
pos=in.tellg();
if(buffer=='Æ'){
cout << "found";
}
}
}
}
in.seekg((streampos) pos);
Note:
I can't use getline(). It's not a text file so there are probably not many line breaks.
Before I tried using a multi-character buffer and then copying the buffer to a C++ string, and then using string::find(). This didn't work because there are many '\0' characters throughout the file, so the sequence in the buffer would be cut very short when it was copied to the string.

Similar to what bames53 posted; I used a vector as a buffer:
std::ifstream ifs("file.bin", std::ios::binary);
ifs.seekg(0, std::ios::end);
std::streamsize f_size = ifs.tellg();
ifs.seekg(0, std::ios::beg);
std::vector<unsigned char> buffer(f_size);
// read() expects a char*, so cast the unsigned char storage
ifs.read(reinterpret_cast<char*>(buffer.data()), f_size);
std::vector<unsigned char> seq = {0x39, 0xb5, 0x7d, 0xc6};
bool found = std::search(buffer.begin(), buffer.end(), seq.begin(), seq.end()) != buffer.end();

If you don't mind loading the entire file into an in-memory array (or using mmap() to make it look like the file is in memory), you could then search for your character sequence in-memory, which is a bit easier to do:
// Works much like strstr(), except it looks for a binary sub-sequence rather than a string sub-sequence
const char * MemMem(const char * lookIn, int numLookInBytes, const char * lookFor, int numLookForBytes)
{
if (numLookForBytes == 0) return lookIn; // hmm, existential questions here
else if (numLookForBytes == numLookInBytes) return (memcmp(lookIn, lookFor, numLookInBytes) == 0) ? lookIn : NULL;
else if (numLookForBytes < numLookInBytes)
{
const char * startedAt = lookIn;
int matchCount = 0;
for (int i=0; i<numLookInBytes; i++)
{
if (lookIn[i] == lookFor[matchCount])
{
if (matchCount == 0) startedAt = &lookIn[i];
if (++matchCount == numLookForBytes) return startedAt;
}
else if (matchCount > 0)
{
// partial match failed: rewind so the next iteration retries at startedAt+1
// (without this, "aab" would not be found in "aaab")
i -= matchCount;
matchCount = 0;
}
}
}
return NULL;
}
.... then you can just call the above function on the in-memory data array:
const char * ret = MemMem(theInMemoryArrayContainingFilesBytes, numBytesInFile, myShortSequence, 4);
if (ret != NULL) printf("Found it at offset %i\n", (int)(ret-theInMemoryArrayContainingFilesBytes));
else printf("It's not there.\n");

This program loads the entire file into memory and then uses std::search on it.
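#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
int main() {
std::string filedata;
{
std::ifstream fin("file.dat", std::ios::binary); // binary mode: no newline translation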
std::stringstream ss;
ss << fin.rdbuf();
filedata = ss.str();
}
std::string key = "\x39\xb5\x7d\xc6";
auto result = std::search(std::begin(filedata), std::end(filedata),
std::begin(key), std::end(key));
if (std::end(filedata) != result) {
std::cout << "found\n";
// result is an iterator pointing at '\x39'
}
}

const char delims[] = { '\x39', '\xb5', '\x7d', '\xc6' }; // char literals avoid narrowing errors
char buffer[4];
const size_t delim_size = 4;
const size_t last_index = delim_size - 1;
// pre-fill the first three bytes of the sliding window
for ( size_t i = 0; i < last_index; ++i )
{
if ( ! ( is.get( buffer[i] ) ) )
return false; // stream too short
}
while ( is.get(buffer[last_index]) )
{
if ( memcmp( buffer, delims, delim_size ) == 0 )
break; // found the sequence
memmove( buffer, buffer + 1, last_index ); // slide the window by one byte
}
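Since you are looking for exactly 4 bytes, you can also compare them as a single 32-bit integer (this relies on a little-endian machine and a 4-byte unsigned int):
unsigned int delim = 0xc67db539; // bytes 39 b5 7d c6 in little-endian order
unsigned int uibuffer;
char * buffer = reinterpret_cast<char *>(&uibuffer);
for ( size_t i = 0; i < 3; ++i )
{
if ( ! ( is.get( buffer[i] ) ) )
return false; // stream too short
}
while ( is.get(buffer[3]) )
{
if ( uibuffer == delim )
break; // found the sequence
uibuffer >>= 8; // drop the oldest byte and make room for the next one
}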

Since you said you cannot search the file as a C string because of the null characters in it, here is an alternative: it reads the entire file into a std::string and uses recursion to find the first occurrence of a byte sequence in it.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
string readFile (const char *fileName) {
ifstream fi (fileName, ios::binary);
if (!fi) {
cerr << "ERROR: Cannot open file" << endl;
return string(); // empty string; constructing a string from NULL is undefined
}
return string ((istreambuf_iterator<char>(fi)), istreambuf_iterator<char>());
}
// haystack is passed by reference so the recursion does not copy the file
bool findFirstOccurrenceOf_r (const string &haystack, const char *needle, size_t haystack_pos, size_t needle_pos, size_t needle_len) {
if (needle_pos == needle_len)
return true;
if (haystack[haystack_pos] == needle[needle_pos])
return findFirstOccurrenceOf_r (haystack, needle, haystack_pos+1, needle_pos+1, needle_len);
return false;
}
int findFirstOccurrenceOf (const string &haystack, const char *needle, size_t length) {
// i + length <= size avoids unsigned underflow and also checks the last position
for (size_t i = 0; i + length <= haystack.length(); i++) {
if (findFirstOccurrenceOf_r (haystack, needle, i, 0, length))
return int(i);
}
return -1;
}
int main () {
const char str_to_find[4] = {'\x39', '\xB5', '\x7D', '\xC6'};
string contents = readFile ("input");
int pos = findFirstOccurrenceOf (contents, str_to_find, 4);
cout << pos << endl;
}
If the file is not too large, your best option is to load the whole file into memory so you don't have to keep reading from the drive. If the file is too large to load at once, load it one chunk at a time. But if you do load it in chunks, make sure you check the edges of the chunks: a chunk boundary may fall right in the middle of the sequence you're searching for.

Trouble decoding serial GPS data using the ReadFile function

I'm trying to use the ReadFile function to read the serial port in C++. I managed to open the port and read from it. The problem I'm facing now is decoding the data after I read it. My code is below. When I run it, the check (*&szChar == '$') never succeeds, so it falls through and prints "error". How can I decode the GPS data that I read from the serial port? Thanks.
char szChar[100];
int nRet;
DWORD dwBytesRead = 10;
char ReadBuffer[BUFFERSIZE] = {0};
nRet = ReadFile(hCom,&szChar,BUFFERSIZE-1,&dwBytesRead,NULL);
if((*&szChar == '$'))
{
printf("%s\n", &szChar);
}
else
{
printf("error\n");

I have to say, I find your code quite confused and confusing. Just for example, you're creating szChar as an array of 100 char, and ReadBuffer as an array of BUFFERSIZE chars. When you call ReadFile, however, you're passing the base address of szChar with the size given as BUFFERSIZE. Unless, by some coincidence, BUFFERSIZE happens to equal 100, that looks a lot like a potential buffer overrun.
Then we get to *&szChar. This doesn't really make much sense either. From the looks of things, you probably want szChar[0] -- but even that's not really a good idea, because you might not receive the data in exactly line-sized pieces. As such, you probably want to scan through the data to find the '$'.
int Ret;
DWORD BytesRead;
char ReadBuffer[BUFFERSIZE] = {0};
Ret = ReadFile(hCom,ReadBuffer,sizeof(ReadBuffer)-1,&BytesRead,NULL);
ReadBuffer[BytesRead] = '\0';
if (ReadBuffer[0] == '$')
printf("%s\n", ReadBuffer);
else
printf("error\n");
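A sketch of the scan-for-'$' idea mentioned above, assuming the receiver sends NMEA sentences that start with '$' and end with "\r\n", and that a single ReadFile call may return a fragment of a sentence. The function name nextSentence is hypothetical. The caller would append the bytes of every read to pending, for example with pending.append(ReadBuffer, BytesRead), and then call the function until it returns false.

```cpp
#include <string>

// Hypothetical helper: if pending contains a complete sentence
// ('$' up to '\n'), move it into sentence (without "\r\n") and return true.
// Bytes in front of the first '$' are discarded.
bool nextSentence(std::string& pending, std::string& sentence) {
    std::string::size_type start = pending.find('$');
    if (start == std::string::npos) {
        pending.clear();              // no sentence start: discard noise
        return false;
    }
    pending.erase(0, start);          // drop bytes before '$'
    std::string::size_type end = pending.find('\n');
    if (end == std::string::npos) return false;   // wait for more data
    sentence = pending.substr(0, end);
    if (!sentence.empty() && sentence.back() == '\r') sentence.pop_back();
    pending.erase(0, end + 1);
    return true;
}
```

This way it does not matter whether a read starts in the middle of a line or delivers several lines at once.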

@Jerry: Thanks. I edited my code as shown below to decode the data. Is copying ReadBuffer into another array for checking the correct approach?
char lastCommaPosition;
char latitudeString[11];
char stringRead[MAXSIZE];
char tempString[MAXSIZE];
char *pChar;
char dummyChar;
float latitude;
int latDegrees;
float latMinutes;
int numLinesRead;
int Ret,i,j,k;
if (ReadBuffer[0] == '$')
{
i = 0;
numLinesRead++;
stringRead[i] = ReadBuffer;
}
stringRead[i+1] = '\0';
j = 0;
pChar = stringRead;
while(*(pChar+j) != ',')
{
tempString[j] = *(pChar+j);
j++;
}
tempString[j] = '\0';
if(tempString[3] == 'G' && tempString[4] == 'G' && tempString[5] == 'A')
{
pChar = stringRead;
j = lastCommaPosition + 1;
k = 0;
while(*(pChar+j) != ',')
{
latitudeString[k] = *(pChar+j);
j++;
k++;
}
lastCommaPosition = j;
latitudeString[k] = '\0';
sscanf(latitudeString, "%f", &latitude);
latDegrees = (int)(latitude/100);
latMinutes = (float)(latitude - latDegrees*100);
printf("\t%02d DEG\t%2.4f MIN", latDegrees, latMinutes);
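A sketch of the same latitude extraction using comma splitting instead of manual index tracking, assuming one complete GGA sentence is already available as a string. In a GGA sentence the first field after the sentence name is the UTC time and the second is the latitude in ddmm.mmmm form. The names splitFields and ggaLatitude are invented for this example.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split one NMEA sentence into comma-separated fields.
std::vector<std::string> splitFields(const std::string& sentence) {
    std::vector<std::string> fields;
    std::istringstream in(sentence);
    std::string field;
    while (std::getline(in, field, ',')) fields.push_back(field);
    return fields;
}

// Extracts the latitude from a GGA sentence as whole degrees plus minutes.
// Returns false if the sentence is not GGA or the latitude field is empty.
bool ggaLatitude(const std::string& sentence, int& degrees, double& minutes) {
    std::vector<std::string> f = splitFields(sentence);
    // f[0] is e.g. "$GPGGA": characters 3..5 name the sentence type
    if (f.size() < 4 || f[0].size() < 6 || f[0].compare(3, 3, "GGA") != 0) {
        return false;
    }
    if (f[2].empty()) return false;            // receiver has no fix yet
    double raw = std::atof(f[2].c_str());      // ddmm.mmmm as one number
    degrees = static_cast<int>(raw / 100);     // the dd part
    minutes = raw - degrees * 100;             // the mm.mmmm part
    return true;
}
```

Because every field is looked up by position, no lastCommaPosition variable is needed, and an empty field cannot shift the fields that follow it.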
