I am processing Common Crawl WARC files. These are 5 GB uncompressed; inside there is text, XML, and WARC headers. This is the code I am particularly having trouble with:
wstring sub = buffer->substr(windowStart, windowSize);
Which gives me the error, "expression must have a pointer to class type". I take it this is because the label is a pointer to a heap memory location of that size, so I cannot run any string operations on it. But shouldn't the -> operator get the contents it points to, so that I can run something like substr?
I am using a simple buffer like this because I understand that mapping the file to memory (MapViewOfFile, etc.) is more for random access; it is actually slower if all I need is a sequential read?
I would like to read the file sequentially. To improve speed, I read the file in chunks into RAM and process each chunk before fetching the next one from disk (say, 1 MB per chunk).
I am not processing all the XML; some of it will be skipped. I am grabbing the text and some of the WARC headers and skipping the rest.
The idea is to use a sliding window over the file chunk in RAM. The window starts where it last left off in the chunk and grows in size in a loop. Once it reaches a sufficient size, a regex checks whether it contains any matching tags, headers, or text. If so, the code either skips just that tag, skips ahead a fixed number of characters (500 in some cases, when it comes across a particular type of WARC header), writes that tag out (if it is one I want to keep), and so on.
When the window matches, windowStart is set equal to windowEnd and the window starts expanding again to find the next pattern. Once the buffer ends, the code keeps track of any partial tags and refills the buffer from disk.
The main problem I am running into is how to do the sliding window. The buffer is a pointer to a location in heap memory, and I can't use the . or -> operators on it, so I can't use substr, regex, etc. I could make a copy, but do I really need to do that?
Here's my code so far:
BOOL pageActive = FALSE;
BOOL xml = FALSE;
#define MAXBUFFERSIZE 1024
#define MAXTAGSIZE 64
DWORD windowStart = 0; DWORD windowEnd = 15; DWORD windowSize = 15; // buffer window containing tag candidate
wstring windowCopy;
DWORD bufferSize = MAXBUFFERSIZE;
_int64 fileRemaining;
HANDLE hFile;
DWORD dwBytesRead = 0;
OVERLAPPED ol = { 0 };
LARGE_INTEGER dwPosition;
TCHAR* buffer;
hFile = CreateFile(
inputFilePath, // file to open
GENERIC_READ, // open for reading
FILE_SHARE_READ | FILE_SHARE_WRITE, // share for reading and writing
NULL, // default security
OPEN_EXISTING, // existing file only
FILE_ATTRIBUTE_NORMAL, // normal file | FILE_FLAG_OVERLAPPED
NULL); // no attr. template
if (hFile == INVALID_HANDLE_VALUE)
{
DisplayErrorBox((LPWSTR)L"CreateFile");
return 0;
}
LARGE_INTEGER size;
GetFileSizeEx(hFile, &size);
_int64 fileSize = (__int64)size.QuadPart;
double gigabytes = fileSize * 9.3132e-10;
sendToReportWindow(L"file size: %lld bytes (%.1f gigabytes)\n", fileSize, gigabytes);
if(fileSize > MAXBUFFERSIZE)
{
TCHAR* buffer = new TCHAR[MAXBUFFERSIZE]; buffer[0] = 0;
//sendToReportWindow(L"buffer is MAXBUFFERSIZE\n");
}
else
{
TCHAR* buffer = new TCHAR[fileSize]; buffer[0] = 0;
//sendToReportWindow(L"buffer is fileSize + 1\n");
}
fileRemaining = fileSize;
sendToReportWindow(L"file remaining: %lld bytes\n", fileRemaining);
//TCHAR readBuffer[MAXBUFFERSIZE] = { 0 };
while (fileRemaining) // outer loop. while file remaining, read file chunk to buffer
{
if (bufferSize > fileRemaining) // as fileremaining gets smaller as file is processed, it eventually is smaller than the buffer
bufferSize = fileRemaining;
if (FALSE == ReadFile(hFile, buffer, bufferSize -1, &dwBytesRead, NULL))
//if (FALSE == ReadFile(hFile, readBuffer, bufferSize -1, &dwBytesRead, NULL))
{
sendToReportWindow(L"file read failed\n");
CloseHandle(hFile);
return 0;
}
fileRemaining -= bufferSize; //fileRemaining is size of the file left after this buffer is processed
sendToReportWindow(L"outer loop\n");
// declare and clear span char array[maxTagSize] // size of array is maximum tag size (64). This is for unused windows. Raw text is not considered a tag
while (windowEnd < bufferSize) //inner loop. while unused data remains in buffer
{
windowSize = windowEnd - windowStart;
// windowsize += span.size
// The window start position remains fixed as the window size is slowly increased. Once it is large enough, some conditional below begin to look at it.If any triggers, they eat that window. Setting the new start position at the previous end position.
// If the buffer ends mid - tag, the contents of the window are copy to the span array variable
// Page state. Tags in header
// If !pageActive
// if windowSize > 7 (warc / 1.0)
// Convert chunk to string for regex ? (prepend span array from previous loop)
// If Regex chunk WARC - Type : response pageActive = true; wstart = wend, clear span
// Elseif regex chunk other warc - type clear span; skip ahead 550 for start, 565 for end
// Continue
// // page is active
//
// if windowSize > 6
// If regex chunk WARC / \d pageActive = false; xml = false; wstart = wend, clear span; Continue
// If !xml
// If windowSize > 15 (warc date)
// Convert chunk to string for regex ? (prepend span array from previous loop)
// If regex chunk warc date output warc date; wstart = wend, clear span
// elseIf regex chunk warc uri output warc uri; wstart = wend, clear span; skip ahead 300
// ElseIf end of window has "\n<" Xml = true // any window size where xml is not started
// continue // whatever triggers in this !xml block, always continue
// // page and xml are active
// // only send to output bare text when a [^\n]< or newline is reached
// test where just outputs all the tags or text it finds
// pull out any <.+> sequences or any >.+< sequences
// multibyte conversion, build string of window
//LPCCH readBuffer = { "ab" }; // = buffer[2];
// std::string str2 = str.substr (3,5);
//wstring sub = (wstring)readBuffer.substr(0,5); // substring of buffer
wstring sub = buffer->substr(windowStart, windowSize);
TCHAR converted[64] = { 0 };
MultiByteToWideChar(CP_ACP, MB_COMPOSITE, (LPCCH)&sub, -1, converted, MAXBUFFERSIZE);
//MultiByteToWideChar(CP_ACP, MB_COMPOSITE, (LPCCH)buffer, MAXBUFFERSIZE, converted, 1); // convert between the utf encoding of the file to the utf encoding of windows?
sendToReportWindow(L"windowStart:%d windowEnd:%d char:%s\n", windowStart, windowEnd, converted);
//sendToReportWindow((LPWSTR)buffer[windowStart]);
windowStart = windowEnd;
// //Tags in body. Any chunk size
// Convert chunk to string for regex ? (prepend span array from previous loop)
// if regex chunk tag pattern output pattern, wstart = wend, clear span
// nested tags? no
// windowEnd++; // tests above did not bite. so increment end of window, increasing window size
} // inner loop: while windowEnd <buffersize
// end of buffer: load any unused window into span
//If windowEnd != windowStart // window start did not get set to end by regex above
//Span = buffer(start to end)
//file progress indicator
//fileSize / fileRemaining x 0.01 // calculate percentage of file remaining with each buffer load
//print progress
//windowStart = 0; windowEnd = 1; windowSize = 1 // look at smaller pieces after first iteration (not in w header)
} // outer loop. while fileRemaining
delete[] buffer; // array delete must match new[]
Which gives me the error, "expression must have a pointer to class type".
buffer is a TCHAR*, a raw pointer with no substr method; only class types have members. Construct a wstring from the buffer first:
wstring str(buffer);
wstring sub = str.substr(windowStart, windowSize);
Other code that needs to be modified:
MultiByteToWideChar(CP_ACP, MB_COMPOSITE, (LPCCH)&sub, -1, converted, MAXBUFFERSIZE);
sendToReportWindow(L"windowStart:%d windowEnd:%d char:%s\n", windowStart, windowEnd, converted);
=> sendToReportWindow(L"windowStart:%d windowEnd:%d char:%s\n", windowStart, windowEnd, sub.c_str()); //use string::c_str method
buffer = new TCHAR[MAXBUFFERSIZE]; buffer[0] = 0; //remove TCHAR*
buffer = new TCHAR[fileSize]; buffer[0] = 0; //remove TCHAR*
I am not processing all the xml, some will be skipped. grabbing the
text and some of the warc headers, skipping the rest.
You can use string::find to grab the WARC header (make sure the header text is unique). See, e.g.: Check if a string contains a string in C++
By the way, whether you use Unicode characters or multi-byte characters, you need to stick to a single encoding format throughout.
Related
I want to read and remove the first line from a txt file (without copying; it's a huge file).
I've searched the net, but everybody just copies the desired content to a new file. I can't do that.
Below is a first attempt. This code gets stuck in a loop because no lines are ever removed. If the code removed the first line of the file at each opening, it would eventually reach the end.
#include <iostream>
#include <string>
#include <fstream>
#include <boost/interprocess/sync/file_lock.hpp>
int main() {
std::string line;
std::fstream file;
boost::interprocess::file_lock lock("test.lock");
while (true) {
std::cout << "locking\n";
lock.lock();
file.open("test.txt", std::fstream::in|std::fstream::out);
if (!file.is_open()) {
std::cout << "can't open file\n";
file.close();
lock.unlock();
break;
}
else if (!std::getline(file,line)) {
std::cout << "empty file\n"; //
file.close(); // never
lock.unlock(); // reached
break; //
}
else {
// remove first line
file.close();
lock.unlock();
// do something with line
}
}
}
Here's a solution written in C for Windows.
It will execute and finish on a 700,000 line, 245MB file in no time. (0.14 seconds)
Basically, I memory map the file, so that I can access the contents using the functions used for raw memory access. Once the file has been mapped, I just use the strchr function to find the location of one of the pair of symbols used to denote an EOL in windows (\n and \r) - this tells us how long in bytes the first line is.
From here, I just copy (with memmove, since the source and destination overlap) from the first byte of the second line back to the start of the memory-mapped area (basically, the first byte in the file).
Once this is done, the file is unmapped, the handle to the mem-mapped file is closed and we then use the SetEndOfFile function to reduce the length of the file by the length of the first line. When we close the file, it has shrunk by this length and the first line is gone.
Having the file already in memory since I've just created and written it is obviously altering the execution time somewhat, but the windows caching mechanism is the 'culprit' here - the very same mechanism we're leveraging to make the operation complete very quickly.
The test data is the source of the program duplicated 100,000 times and saved as testInput2.txt (paste it 10 times, select all, copy, paste 10 times - replacing the original 10, for a total of 100 times - repeat until output big enough. I stopped here because more seemed to make Notepad++ a 'bit' unhappy)
Error-checking in this program is virtually non-existent and the input is expected not to be UNICODE, i.e - the input is 1 byte per character.
The EOL sequence is 0x0D, 0x0A (\r, \n)
Code:
#include <stdio.h>
#include <windows.h>
void testFunc(const char inputFilename[] )
{
int lineLength;
HANDLE fileHandle = CreateFile(
inputFilename,
GENERIC_READ | GENERIC_WRITE,
0,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH,
NULL
);
if (fileHandle != INVALID_HANDLE_VALUE)
{
printf("File opened okay\n");
DWORD fileSizeHi, fileSizeLo = GetFileSize(fileHandle, &fileSizeHi);
HANDLE memMappedHandle = CreateFileMapping(
fileHandle,
NULL,
PAGE_READWRITE | SEC_COMMIT,
0,
0,
NULL
);
if (memMappedHandle)
{
printf("File mapping success\n");
LPVOID memPtr = MapViewOfFile(
memMappedHandle,
FILE_MAP_ALL_ACCESS,
0,
0,
0
);
if (memPtr != NULL)
{
printf("view of file successfully created\n");
printf("File size is: 0x%04X%04X\n", fileSizeHi, fileSizeLo);
LPVOID eolPos = strchr((char*)memPtr, '\r'); // windows EOL sequence is \r\n
lineLength = (char*)eolPos-(char*)memPtr;
printf("Length of first line is: %d\n", lineLength);
// the source and destination ranges overlap, so memmove, not memcpy
memmove(memPtr, (char*)eolPos + 2, fileSizeLo - (lineLength + 2));
UnmapViewOfFile(memPtr);
}
CloseHandle(memMappedHandle);
}
SetFilePointer(fileHandle, -(lineLength+2), 0, FILE_END);
SetEndOfFile(fileHandle);
CloseHandle(fileHandle);
}
}
int main()
{
const char inputFilename[] = "testInput2.txt";
testFunc(inputFilename);
return 0;
}
What you want to do is, indeed, not easy.
If you open the same file for reading and writing without being careful, you will end up reading what you just wrote, and the result will not be what you want.
Modifying the file in place is doable: just open it, seek in it, modify, and close. However, you want to copy all the content of the file except the K bytes at its beginning, which means you will have to iteratively read and write the whole file in chunks of N bytes.
Once that is done, K bytes will remain at the end that need to be removed. I don't think there's a way to do that with streams. You can use the ftruncate or truncate functions from unistd.h, or Boost.Interprocess's truncate, for this.
Here is an example (without any error checking, I let you add it):
#include <iostream>
#include <fstream>
#include <string>
#include <unistd.h>
int main()
{
std::fstream file;
file.open("test.txt", std::fstream::in | std::fstream::out);
// First retrieve size of the file
file.seekg(0, file.end);
std::streampos endPos = file.tellg();
file.seekg(0, file.beg);
// Then retrieve size of the first line (a.k.a bufferSize)
std::string firstLine;
std::getline(file, firstLine);
// We need two streampos: the read one and the write one
std::streampos readPos = firstLine.size() + 1;
std::streampos writePos = 0;
// Read the whole file starting at readPos by chunks of size bufferSize
std::size_t bufferSize = 256;
char buffer[256]; // fixed size: variable-length arrays are not standard C++
bool finished = false;
while(!finished)
{
file.seekg(readPos);
if(readPos + static_cast<std::streampos>(bufferSize) >= endPos)
{
bufferSize = endPos - readPos;
finished = true;
}
file.read(buffer, bufferSize);
file.seekp(writePos); // seekp: we are positioning for a write
file.write(buffer, bufferSize);
readPos += bufferSize;
writePos += bufferSize;
}
file.close();
// No clean way to truncate streams, use function from unistd.h
truncate("test.txt", writePos);
return 0;
}
I'd really like to be able to provide a cleaner solution for in-place modification of the file, but I'm not sure there's one.
I am trying to implement a simple file transfer. Below are two methods that I have been testing:
Method one: sending and receiving without splitting the file.
I hard-coded the file size for easier testing.
sender:
send(sock,buffer,107,NULL); //sends a file with 107 size
receiver:
char * buffer = new char[107];
recv(sock_CONNECTION,buffer,107,0);
std::ofstream outfile (collector,std::ofstream::binary);
outfile.write (buffer,107);
The output is as expected; the file isn't corrupted, because the .txt file that I sent contains the same content as the original.
Method two: sending and receiving by splitting the contents on the receiver's side, 5 bytes per loop.
sender:
send(sock,buffer,107,NULL);
Receiver:
char * buffer = new char[107]; //total file buffer
char * ptr = new char[5]; //buffer
int var = 5;
int sizecpy = size; //orig size
while(size > var ){ //collect bytes
recv(sock_CONNECTION,ptr,5,0);
strcat(buffer,ptr); //concatenate
size= size-var; //decrease
std::cout<<"Transferring.."<<std::endl;
}
std::cout<<"did it reach here?"<<std::endl;
char*last = new char[size];
recv(sock_CONNECTION,last,2,0); //last two bytes
strcat(buffer,last);
std::ofstream outfile (collector,std::ofstream::binary);
outfile.write (buffer,107);
Output: the text file contains invalid characters, especially at the beginning and the end.
Questions: How can I make method two work? The sizes are the same, but they yield different results: the similarity between the original file and the new file is about 98-99% with method two, while it is 100% with method one. What's the best method for transferring files?
What's the best method for transferring files?
Usually I don't answer questions like "what's the best method", but in this case it's obvious:
Send the file size and a checksum in network byte order when starting a transfer
Optionally send more header data (e.g. the filename)
The client reads the file size and the checksum and decodes them to host byte order
Send the file's data in reasonably sized chunks (5 bytes is not a reasonable size); chunks should match the maximum available payload size of TCP/IP frames
Receive chunk by chunk on the client side until the previously sent file size is reached
Calculate the checksum for the received data on the client side and check that it matches the one received beforehand
Note: You don't need to combine all the chunks in memory on the client side; just append them to a file on a storage medium. Also, the checksum (CRC) can usually be calculated while running through the data chunks.
Disagree with Galik: better not to use strcat, strncat, or anything but the intended output buffer.
TCP is kinda fun: you never really know how much data you are going to get, but you will get it, or an error.
This will read up to MAX bytes at a time; #define MAX to whatever you want.
std::unique_ptr<char[]> buffer (new char[size]);
const int total = size; // remember the full size; 'size' is counted down below
int loc = 0; // where in buffer to write the next batch of data
int bytesread; //how much data was read? recv will return -1 on error
while(size > MAX)
{ //collect bytes
bytesread = recv(sock_CONNECTION,&buffer[loc],MAX,0);
if (bytesread < 0)
{
//handle error.
}
loc += bytesread;
size= size-bytesread; //decrease
std::cout<<"Transferring.."<<std::endl;
}
bytesread = recv(sock_CONNECTION,&buffer[loc],size,0);
if (bytesread < 0)
{
//handle error
}
std::ofstream outfile (collector,std::ofstream::binary);
outfile.write (buffer.get(),total); // write the whole file, not the counted-down 'size'
Even more fun, write into the output buffer so you don't have to store the whole file. In this case MAX should be a bigger number.
std::ofstream outfile (collector,std::ofstream::binary);
char buffer[MAX];
int bytesread; //how much data was read? recv will return -1 on error
while(size)
{ //collect bytes
bytesread = recv(sock_CONNECTION,buffer,MAX>size?size:MAX,0);
// MAX>size?size:MAX is like a compact if-else: if (MAX>size){size}else{MAX}
if (bytesread < 0)
{
//handle error.
}
outfile.write (buffer,bytesread);
size -= bytesread; //decrease
std::cout<<"Transferring.."<<std::endl;
}
The initial problems I see are with std::strcat. You can't use it on an uninitialized buffer, and you are not copying a null-terminated C string, you are copying a sized buffer. Better to use std::strncat for that:
char * buffer = new char[107]; //total file buffer
char * ptr = new char[5]; //buffer
int var = 5;
int sizecpy = size; //orig size
// initialize buffer
*buffer = '\0'; // add null terminator
while(size > var ){ //collect bytes
recv(sock_CONNECTION,ptr,5,0);
strncat(buffer, ptr, 5); // strncat only 5 chars
size= size-var; //decrease
std::cout<<"Transferring.."<<std::endl;
}
Beyond that, you should really add error checking so the sockets library can tell you if anything went wrong with the communication.
I have a custom file with mixed data. At the end of the file there's an entire image which I want to retrieve.
The problem is that, when I 'extract' it and paste it into an image file, rdbuf() leaves me some annoying CR LF characters instead of just the LF ones in the original.
I have already opened both streams in binary mode.
using namespace std;
ifstream i(f, ios::in | ios::binary);
bool found = false; // Found image name
string s; // String to read to
string n = ""; // Image name to retrieve
while (!found) {
getline(i, s);
// Check if it's the name line
if (s[0]=='-' && s[1]=='|' && s[2]=='-') {
found = true;
// Loop through name X: -|-XXXX-|-
// 0123456789
// Length: 10 3 6
for (unsigned int j=3; j<s.length()-4; j++)
n = n + s[j];
}
}
ofstream o(n.c_str(), ios::out | ios::binary);
o << i.rdbuf();
I did some research and found out that the << operator treats input as text and so adjusts \n to \r\n on Windows.
A way to prevent this is to use the write method instead of <<.
You can do it like this (replacing your last line of code):
// get pointer to associated buffer object
std::filebuf* pbuf = i.rdbuf();
// next operations will calculate file size
// get current position
const std::size_t current = i.tellg();
// move to the end of file
i.seekg(0, i.end);
// get size of file (current position of the end)
std::size_t size = i.tellg();
// get size of remaining data (removing the current position from the size)
size -= current;
// move back to where we were
i.seekg(current, i.beg);
// allocate memory to contain image data
char* buffer=new char[size];
// get image data
pbuf->sgetn (buffer,size);
// close input stream
i.close();
// write buffer to output
o.write(buffer,size);
// free memory
delete[] buffer;
Solved. The problem was introduced during the ofstream operation that saved the file before it was opened: because the file had been saved as text (with CR LF), it was being opened as text as well.
I am trying to watch a folder for changes and report the name of any added file, so here is my code:
bool FileWatcher::NotifyChange()
{
// Read the asynchronous result of the previous call to ReadDirectory
DWORD dwNumberbytes;
GetOverlappedResult(hDir, &overl, &dwNumberbytes, FALSE);
// Browse the list of FILE_NOTIFY_INFORMATION entries
FILE_NOTIFY_INFORMATION *pFileNotify = (FILE_NOTIFY_INFORMATION *)buffer[curBuffer];
// Switch the 2 buffers
curBuffer = (curBuffer + 1) % (sizeof(buffer)/(sizeof(buffer[0])));
SecureZeroMemory(buffer[curBuffer], sizeof(buffer[curBuffer]));
// start a new asynchronous call to ReadDirectory in the alternate buffer
ReadDirectoryChangesW(
hDir, /* handle to directory */
&buffer[curBuffer], /* read results buffer */
sizeof(buffer[curBuffer]), /* length of buffer */
FALSE, /* monitoring option */
FILE_NOTIFY_CHANGE_FILE_NAME ,
//FILE_NOTIFY_CHANGE_LAST_WRITE, /* filter conditions */
NULL, /* bytes returned */
&overl, /* overlapped buffer */
NULL); /* completion routine */
for (;;) {
if (pFileNotify->Action == FILE_ACTION_ADDED)
{
qDebug()<<"in NotifyChange if ";
char szAction[42];
char szFilename[MAX_PATH] ;
memset(szFilename,'\0',sizeof( szFilename));
strcpy(szAction,"added");
wcstombs( szFilename, pFileNotify->FileName, MAX_PATH);
qDebug()<<"pFileNotify->FileName : "<<QString::fromWCharArray(pFileNotify->FileName)<<"\nszFilename : "<<QString(szFilename);
}
// step to the next entry if there is one
if (!pFileNotify->NextEntryOffset)
return false;
pFileNotify = (FILE_NOTIFY_INFORMATION *)((PBYTE)pFileNotify + pFileNotify->NextEntryOffset);
}
pFileNotify=NULL;
return true;
}
It works fine unless a file with an Arabic name is added, in which case I get
pFileNotify->FileName : "??? ???????.txt"
szFilename : ""
How can I support UTF-8 file names? Any ideas, please.
Apart from FILE_NOTIFY_INFORMATION::FileName not being null-terminated, there's nothing wrong with it.
FileName:
A variable-length field that contains the file name relative to the directory handle. The file name is in the Unicode character format and is not null-terminated.
If there is both a short and long name for the file, the function will return one of these names, but it is unspecified which one.
FileNameLength: The size of the file name portion of the record, in bytes. Note that this value does not include the terminating null character.
You'll have to use FILE_NOTIFY_INFORMATION::FileNameLength / sizeof(WCHAR) to get the length of the string in wchars pointed to by FileName. So in your case, the proper way would be:
size_t cchFileNameLength = pFileNotify->FileNameLength / sizeof(WCHAR);
QString::fromWCharArray( pFileNotify->FileName, cchFileNameLength );
If you need to use a function that expects the string to be null-terminated (like wcstombs) you'd have to allocate a temporary buffer with the size of FILE_NOTIFY_INFORMATION::FileNameLength + sizeof(WCHAR) and null-terminate it yourself.
As for the empty szFilename and the question marks, that's just the result of converting a UTF-16 (NTFS) filename that contains unconvertible characters to ANSI. If no conversion is possible, wcstombs returns an error, and QDebug renders any unconvertible character as ?.
If wcstombs encounters a wide character it cannot convert to a multibyte character, it returns -1 cast to type size_t and sets errno to EILSEQ.
So if you need to support Unicode filenames, do not convert them to ANSI; handle them exclusively with functions that support Unicode.
I am trying to read a serial response from a hardware device. The string I read is long and I only need a portion of it, so I use std::string::substr(x, y) to extract it. The problem I run into, however, is that I sometimes get an exception because the buffer I am reading from doesn't have y characters. Here is the code I currently use to read values:
while(1)
{
char szBuff[50+1] = {0};
char wzBuff[14] = {"AT+CSQ\r"};
DWORD dZBytesRead = 0;
DWORD dwBytesRead = 0;
if(!WriteFile(hSerial, wzBuff, 7, &dZBytesRead, NULL))
std::cout << "Write error";
if(!ReadFile(hSerial, szBuff, 50, &dwBytesRead, NULL))
std::cout << "Read Error";
std:: cout << szBuff;
std::string test = std::string(szBuff).substr(8,10);
std::cout << test;
Sleep(500);
I am issuing the command "AT+CSQ". This returns:
N, N
OK
It returns two integer values separated by a comma, followed by a new line, followed by "OK".
My question is, how can I make sure I read all values from the serial port before grabbing a substring? From what I understand, the last character received should be a new line.
The interface of your ReadFile function provides you with the number of bytes read. If you know the expected length, you should loop, reading from the file (probably a port descriptor) until the expected number of bytes has been read.
If the length of the response is not known, you may have to read and then check the read buffer for whether the separator token has arrived (in this case your protocol seems to indicate that a newline can be used to determine EOM, end of message).
If you can use other libraries, I would consider boost::asio and its read_until functionality (or the equivalent in whatever library you are using). While the code to manage this is not rocket science, in most cases there is no point in reinventing the wheel.
As you said yourself in the last line, you know that the terminator for the response is a newline character. You need to read from the serial port until you receive a newline somewhere in the input. Everything from the previous newline to the current newline is the response; everything after the current newline is part of the next response. This is achieved by reading in a loop and handling each response as it is discovered:
char myBigBuff[4096] = {0}; // allocate and zero the buffer; an uninitialized pointer would crash
int indexToBuff = 0;
int startNewLine = 0;
while (ReadFile(hSerial, myBigBuff + indexToBuff, 100, &dwBytesRead, NULL))
{
if (strchr(myBigBuff, '\n') != NULL)
{
handleResponse(myBigBuff + startNewLine, indexToBuff + dwBytesRead);
startNewLine = indexToBuff + dwBytesRead;
}
// Move forward in the buffer. This should be done cyclically
indexToBuff += dwBytesRead;
}
This is the basic idea. You should handle the left overs characters via any way you choose (cyclic buffer, simple copy to a temp array, etc.)
You should use ReadFile to read a certain number of bytes per cycle into your buffer. Keep filling this buffer until ReadFile reads 0 bytes, you reach your \n or \r\n characters, or the buffer is full.
Once you have done this, there is no need to substr your string; you can simply iterate through the character buffer.
For example,
while (awaitResponse) {
ReadFile(hSerial, szBuff, 50, &dwBytesRead, NULL);
if (dwBytesRead != 0) {
// move memory from szBuff to your class member (e.g. mySerialBuff)
} else {
// nothing to read
if (buffCounter > 0) {
// process buffer
}
else {
// zero out all buffers
}
}
}
Old question, but I modified @Eli Iser's code to:
while (ReadFile(hSerial, myBigBuff + indexToBuff, 1, &dwBytesRead, NULL)) {
if (strchr(myBigBuff, '-') != NULL || dwBytesRead < 1)
break;
// Move forward in the buffer. This should be done cyclically
indexToBuff += dwBytesRead;
}
if (indexToBuff != 0) {
//Do whatever with the code, it received successfully.
}