C++ read file until char found - c++

I want to create a function that reads a file character by character until a specific character is encountered.
This is my method in a class FileHandler.
char* tysort::FileHandler::readUntilCharFound(size_t start, char seek)
{
char* text = new char;
if(this->inputFileStream != nullptr)
{
bool goOn = true;
size_t seekPos = start;
while (goOn)
{
this->inputFileStream->seekg(seekPos);
char* buffer = new char;
this->inputFileStream->read(buffer, 1);
if(strcmp(buffer, &seek) != 0)
{
strcat(text, buffer); // Execution stops here
seekPos++;
}
else
{
goOn = false;
}
}
}
//printf("%s\n", text);
return text;
}
I tested this function and it actually works. Here is an example that reads the file content until the newline character '\n' is found.
size_t startPosition = 0;
char* text = this->fileHandler->readUntilCharFound(startPosition, '\n');
However, I am sure something is wrong somewhere in the code, because if I call this method in a loop, the app just hangs. I guess the problem is related to pointers, but I don't know exactly where. Could you please point it out for me?

C++ provides some easy-to-use solutions. For instance:
istream& getline (istream& is, string& str, char delim);
In your case, the str parameter would be the equivalent of your text variable and delim would be the equivalent of your seek parameter. Also, the return value of getline would in some way be the equivalent of your goOn flag (there are good FAQs regarding the right patterns to check for EOF and I/O errors using the return value of getline).

The lines
if(strcmp(buffer, &seek) != 0)
and
strcat(text, buffer); // Execution stops here
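both cause undefined behavior. strcmp and strcat expect null terminated strings, but buffer and &seek each point to a single char with no terminating '\0', and text has room for only one char.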
Here's an updated version, with appropriate comments.
char* tysort::FileHandler::readUntilCharFound(size_t start, char seek)
{
    // strcat needs a null terminated destination with room for every
    // appended character, so collect the result in a std::string
    // (needs <string>), which grows as needed.
    std::string collected;
    if(this->inputFileStream != nullptr)
    {
        bool goOn = true;
        size_t seekPos = start;
        while (goOn)
        {
            this->inputFileStream->seekg(seekPos);
            // No need to allocate memory from the heap.
            char buffer;
            // Stop on a failed read (for example at EOF) or when the
            // character we are looking for is found.
            if( this->inputFileStream->read(&buffer, 1) && buffer != seek )
            {
                collected += buffer;
                seekPos++;
            }
            else
            {
                goOn = false;
            }
        }
    }
    // Return a null terminated copy. The caller owns it and must delete[] it.
    char* text = new char[collected.size() + 1];
    strcpy(text, collected.c_str());
    return text;
}
You can further simplify the function to:
char* tysort::FileHandler::readUntilCharFound(size_t start, char seek)
{
    // Collect the characters in a std::string (needs <string>), which
    // grows as needed, instead of a fixed-size array.
    std::string collected;
    if(this->inputFileStream != nullptr)
    {
        this->inputFileStream->seekg(start);
        // Keep reading from the stream until we find the character
        // we are looking for or EOF is reached.
        int c;
        while ( (c = this->inputFileStream->get()) != EOF && static_cast<char>(c) != seek )
        {
            collected += static_cast<char>(c);
        }
    }
    // Return a null terminated copy. The caller owns it and must delete[] it.
    char* text = new char[collected.size() + 1];
    strcpy(text, collected.c_str());
    return text;
}

this->inputFileStream->read(buffer, 1);
No error checking.
if(strcmp(buffer, &seek) != 0)
The strcmp function is used to compare strings. Here you just want to compare two characters.

Related

How do I tokenize a string only using <iostream>?

I want to learn how to tokenize a string, like the strtok function only using <iostream>.
I made a program that deletes the spaces, but I don't think it's the same as strtok.
#include <iostream>
int main(){
int i = 0;
char s[100]="fix the car";
while(s[i] != '\0'){
if(s[i] == ' ')
s[i] = s[i-1];
else std::cout << s[i];
i++;
}
return 0;
}
prints: fixthecar
I want the whole strtok functionality, not just deleting delimiters. I heard I have to use pointers, but I don't know how to code it.
The internal implementation of strtok has already been discussed here, you should check that before opening new questions.
The key to the operation of strtok() is preserving the location of the last separator between successive calls (that's why strtok() continues to parse the original string that was passed to it when it is invoked with a null pointer in successive calls).
Have a look at this strtok() implementation which has a slightly different functionality than the one provided by strtok()
char *zStrtok(char *str, const char *delim) {
static char *static_str=0; /* var to store last address */
int index=0, strlength=0; /* integers for indexes */
int found = 0; /* check if delim is found */
/* delimiter cannot be NULL
* if no more char left, return NULL as well
*/
if (delim==0 || (str == 0 && static_str == 0))
return 0;
if (str == 0)
str = static_str;
/* get length of string */
while(str[strlength])
strlength++;
/* find the first occurrence of delim */
for (index=0;index<strlength;index++)
if (str[index]==delim[0]) {
found=1;
break;
}
/* if delim is not contained in str, return str */
if (!found) {
static_str = 0;
return str;
}
/* check for consecutive delimiters
*if first char is delim, return delim
*/
if (str[0]==delim[0]) {
static_str = (str + 1);
return (char *)delim;
}
/* terminate the string
* this assignment requires char[], so str has to
* be char[] rather than char*
*/
str[index] = '\0';
/* save the rest of the string */
if ((str + index + 1)!=0)
static_str = (str + index + 1);
else
static_str = 0;
return str;
}
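For comparison, here is a sketch that follows the usual strtok behaviour more closely: it accepts several delimiter characters and skips runs of delimiters. It uses only <iostream> and pointers. The names isDelim, myStrtok and printTokens are made up for this example.

```cpp
#include <iostream>

// Hypothetical helper: true if c appears in the delimiter list.
static bool isDelim(char c, const char* delims)
{
    for (; *delims != '\0'; ++delims)
        if (c == *delims)
            return true;
    return false;
}

// Sketch of a strtok-like function. As with strtok, it modifies the
// input buffer and keeps its position in a static pointer, so it is
// neither reentrant nor thread-safe.
char* myStrtok(char* str, const char* delims)
{
    static char* next = nullptr;     // where the following call resumes
    if (str == nullptr)
        str = next;
    if (str == nullptr)
        return nullptr;
    while (*str != '\0' && isDelim(*str, delims))   // skip leading delimiters
        ++str;
    if (*str == '\0')
    {
        next = nullptr;              // nothing left to tokenize
        return nullptr;
    }
    char* token = str;
    while (*str != '\0' && !isDelim(*str, delims))  // find the token's end
        ++str;
    if (*str != '\0')
    {
        *str = '\0';                 // terminate the token in place
        next = str + 1;
    }
    else
    {
        next = nullptr;              // the token ran to the end of the string
    }
    return token;
}

// Prints one token per line.
void printTokens(char* text, const char* delims)
{
    for (char* tok = myStrtok(text, delims); tok != nullptr; tok = myStrtok(nullptr, delims))
        std::cout << tok << '\n';
}
```

With char s[100] = "fix the car"; a call to printTokens(s, " ") walks through the three words. The array must be writable, because the delimiters are overwritten with '\0'.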

How to declare an empty char* and increase the size dynamically?

Let's say I am trying to do the following (this is a sub problem of what I am trying to achieve):
int compareFirstWord(char* sentence, char* compareWord){
char* temp; int i=-1;
while(*(sentence+(++i))!=' ') { *(temp+i) = *(sentence+i); }
return strcmp(temp, compareWord); }
When I ran compareFirstWord("Hi There", "Hi");, I got an error at the copy line. It said I was using temp uninitialized. Then I used char* temp = new char[]; in that case the function returned 1 and not 0. When I debugged, I saw temp starting with some random characters of length 16, and strcmp fails because of this.
Is there a way to declare an empty char* and increase the size dynamically only to length and contents of what I need ? Any way to make the function work ? I don't want to use std::string.
In C, you may do:
int compareFirstWord(const char* sentence, const char* compareWord)
{
while (*compareWord != '\0' && *sentence == *compareWord) {
++sentence;
++compareWord;
}
if (*compareWord == '\0' && (*sentence == '\0' || *sentence == ' ')) {
return 0;
}
return *sentence < *compareWord ? -1 : 1;
}
With std::string, you just have:
int compareFirstWord(const std::string& sentence, const std::string& compareWord)
{
return sentence.compare(0, sentence.find(" "), compareWord);
}
temp is an uninitialized variable.
It looks like you are attempting to extract the first word out of the sentence in your loop.
In order to do it this way, you would first have to initialize temp to be at least as long as your sentence.
Also, your sentence may not have a space in it. (What about period, \t, \r, \n? Do these matter?)
In addition, you must terminate temp with a null character.
You could try:
int len = strlen(sentence);
char* temp = new char[len + 1];
int i = 0;
while(i < len && *(sentence+(i))!=' ') {
*(temp+i) = *(sentence+i);
i++;
}
*(temp+i) = '\0';
int comparable = strcmp(temp, compareWord);
delete[] temp; // memory from new[] must be released with delete[]
return comparable;
Also consider using isspace(*(sentence+(i))), which will at least catch all whitespace.
In general, however, I'd use a library, or STL... Why reinvent the wheel...
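A possible sketch of the isspace suggestion, written so that no temporary buffer is needed at all. The signature mirrors the question's compareFirstWord, but the ordering rule for unequal words is an assumption chosen to resemble strcmp.

```cpp
#include <cctype>
#include <cstring>

// Sketch: compares the first whitespace-delimited word of `sentence`
// with `compareWord` without allocating a temporary buffer.
// Returns 0 on a match, like strcmp.
int compareFirstWord(const char* sentence, const char* compareWord)
{
    std::size_t len = 0;
    // The cast avoids undefined behavior when char is signed and negative.
    while (sentence[len] != '\0' && !std::isspace(static_cast<unsigned char>(sentence[len])))
        ++len;
    const std::size_t wordLen = std::strlen(compareWord);
    const int cmp = std::strncmp(sentence, compareWord, len < wordLen ? len : wordLen);
    if (cmp != 0)
        return cmp;
    // Equal prefixes: the shorter word sorts first.
    return (len > wordLen) - (len < wordLen);
}
```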

Seeking file using c++

void Test(Packets& packet)
{
char buf[BUFLEN];
char* offset = buf;
unsigned int foo;
for(;!gzeof(file_handle);){
int len = gzread(file_handle, offset, sizeof(buf)-(offset-buf));
char* cur = buf;
char* end = offset+len;
for (char* eol; (cur<end) && (eol = std::find(cur, end, '\n')) < end; cur = eol + 1)
{
string string_array = string(cur, eol);
if(string_array[0] == 'L'){
packet.foo = blah;
}else{
packet.foo = bar;
}
//After reading a line, how do I make sure it quits this function (to do another job in some other function) and then continues from the next line?
}
#ifdef ROTATION
offset = std::rotate(buf, cur, end);
#else
offset = buf + (end-cur);
std::rotate(buf, cur, end);
#endif
}
std::cout<<std::string(buf, offset) ;
}
After reading a line, how do I make sure it quits this function (to do another job in some other function) and then continues from the next line?
I tried making buf and offset global variables and reading the file only when the buffer is empty, but it still doesn't work.
Why do you want to quit the function and continue from the next line?
Change your functions so that one reads a line and the other does whatever is needed with it.
Somewhere you probably want to do something like this, or am I wrong?
{
Packets packet;
Test(packet);
OtherFunction(packet);
}

C File I/O bug in my code

I attempted writing a thesaurus program which reads a thesaurus file, for example:
drink:beverage
clever:smart,witty
and a .txt document, changing up the words it finds from the thesaurus and creating a new document with the modified text. However there appears to be a bug, I have narrowed it down to the while loop in getReplacement(), by checking a print operation before and after. I would really appreciate someone finding why it won't work.
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <iostream>
char* getReplacement(char* original, FILE* file);
int main(int argc, char* argv[])
{
using namespace std;
FILE* thes = fopen(argv[1], "r");
FILE* text = fopen(argv[2], "r+");
FILE* nText = fopen("temp.txt", "w");
if(thes == NULL || text == NULL || nText == NULL)
return 1;
char word[20] = {};
char c;
int bytesW=0;
while((c = fgetc(text)) != EOF)
{
fputc(c, nText);
bytesW++;
if(isalpha(c))
{
int len = strlen(word);
word[len] = c;
word[len + 1] = '\0';
}
else
{
if(word == "")
continue;
cout << 7<<endl;
char* replacement = getReplacement(word, thes);
if(replacement == NULL)
continue;
fseek(nText,bytesW-1-strlen(word),SEEK_SET);
for(int i=0;i<strlen(replacement);i++)
fputc(replacement[i],nText);
int diff = strlen(word) - strlen(replacement);
while(diff-- >0)
fputc(' ', nText);
bytesW = bytesW-1-strlen(word)+strlen(replacement);
fseek(nText, bytesW, SEEK_SET);
}
}
fclose(thes);
fclose(text);
fclose(nText);
return 0;
}
char* getReplacement(char* const original, FILE* file)
{
using namespace std;
char* line="";
const short len = strlen(original);
int numOfOptions=1;
int toSkip=0; // number of commas to skip over
outer: while(fgets(line,1000,file) != NULL)
{
for(int i=0;i<len;i++)
if(line[i] != original[i])
{
goto outer;
}
if(line[len] != ':')
goto outer;
for(int i=0;i<len;i++)
line++;
for(int i=0;i<strlen(line);i++)
if(line[i] == ',')
numOfOptions++;
toSkip = rand()%numOfOptions;
while(toSkip >0)
{
if(line[0] == ',')
toSkip--;
line++;
}
return line;
}
return NULL;
}
char* line="";
// ... snip ...
outer: while(fgets(line,1000,file) != NULL)
Here's your problem. You are trying to read into a literal string; you instead need to allocate an array, on the stack or via malloc() to read into.
A string that you write in quotes in C is known as a string literal. It gets embedded in your program's binary and is loaded into memory when your program is loaded. Usually it is placed in memory that is marked read-only, but that is platform dependent. The literal you wrote has room only for the null terminator, yet you are trying to read up to 1000 characters into it. This will either cause a segmentation fault, because you are writing to read-only memory, or overwrite some other memory, producing unpredictable behavior.
What you want to do instead is allocate a buffer that you can read into:
char line[1000];
or, if you have limited stack space:
char *line = malloc(1000 * sizeof(char));
Furthermore, in your main() function, you do:
char c;
while((c = fgetc(text)) != EOF)
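fgetc() returns an int, not a char. This way, it can return a value corresponding to a valid character if one is read, or a value outside that range (EOF) if you hit the end of the file. If you store the result in a char, EOF can become indistinguishable from a valid character, or the comparison with EOF may never succeed, depending on whether char is signed. Declare c as an int instead.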
You can't compare strings in C using ==, as in word == ""; that compares whether they are the same pointer, not whether they have the same contents.
It also doesn't make sense to recalculate the length of the current word each time. Why not keep track of len yourself, incrementing it every time you add a character, and then check len == 0 when you want to know whether the word is empty? Remember to reset len to 0 after the end of each word so that you start over on the next word. Also guard against len going past what word can hold (including the terminator); you don't want to write more than word can hold, or you will start overwriting other data on your stack and lots of things will break.
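A minimal sketch of those two points together, reading into an int and tracking the length by hand. The function collectWords and the choice to drop characters beyond the buffer size are assumptions for illustration; the thesaurus lookup itself is left out.

```cpp
#include <cctype>
#include <cstdio>
#include <string>
#include <vector>

// Sketch: splits a stream into alphabetic words using a fixed buffer,
// tracking the length by hand instead of calling strlen each time.
std::vector<std::string> collectWords(std::FILE* text)
{
    std::vector<std::string> words;
    char word[20];
    std::size_t len = 0;
    int c;                                   // int, so EOF stays distinguishable
    while ((c = std::fgetc(text)) != EOF)
    {
        if (std::isalpha(c))
        {
            if (len < sizeof(word) - 1)      // leave room for the terminator
                word[len++] = static_cast<char>(c);
            // Assumption: characters beyond the buffer size are dropped.
        }
        else if (len != 0)                   // a non-empty word just ended
        {
            word[len] = '\0';
            words.push_back(word);
            len = 0;                         // start over for the next word
        }
    }
    if (len != 0)                            // a word that runs up to EOF
    {
        word[len] = '\0';
        words.push_back(word);
    }
    return words;
}
```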

How can I tokenize a c-style string from a file without making a copy?

Let's say I have a constant c-style string say
const char* msg = "fred,jim,345,7665";
I'd like to tokenize this and read out the individual fields but for performance reasons I don't want to make a copy. How can I do this?
Obviously strtok takes a non-constant pointer, and boost::tokenizer is an option, but I am unsure what it is doing behind the scenes.
Inevitably you will require some copy of the string, even if it is a substring being copied.
If you have a strtok_r function, you can use that, but it will still require a mutable string to do its work. Beware, however, as not all systems provide the function (e.g. Windows), which is why I've provided an implementation here. It works by requiring an additional parameter: a pointer to a C string to save the address of the next match. This allows for it to be more reentrant (thread-safe) in theory. However, you'll still be mutating the value. You could modify it to suit your needs if you like, perhaps copying N bytes into a destination buffer and null-terminating that buffer to avoid the need to modify the source string.
/*
Usage:
char *tok;
char *savep;
tok = mystrtok_r (somestr, ",", &savep);
while (NULL != tok)
{
// Do something with `tok'. (A nested /* */ comment here would end the outer comment early.)
tok = mystrtok_r (NULL, ",", &savep);
}
*/
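char *
mystrtok_r (char *str, const char *delims, char **nextp)
{
    if (str == NULL)
        str = *nextp;
    /* Skip leading delimiters. */
    str += strspn (str, delims);
    if (*str == 0)
    {
        /* No more tokens; keep *nextp on the terminator. */
        *nextp = str;
        return NULL;
    }
    /* Find the end of the token. */
    *nextp = str + strcspn (str, delims);
    if (**nextp != 0)
    {
        /* Terminate the token and step past the delimiter.
           Stepping past the final terminator would make the
           next call read beyond the end of the string. */
        **nextp = 0;
        ++*nextp;
    }
    return str;
}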
It depends on how you're going to use it.
If you want to get the next token, and then the next (like an iteration over the string), then you only really need to copy the current token into memory.
long strtok2( char *strDest, const char *strSrc, const char cTok, long lOffset, long lMax)
{
    if(lMax > 0)
    {
        strSrc += lOffset;
        char * start = strDest;
        // Copy until the delimiter, the terminating null, or a full buffer.
        while(--lMax && *strSrc != cTok && *strSrc != '\0')
            *strDest++ = *strSrc++;
        *strDest = 0; // always null-terminate the copied token
        return strDest - start; // the length of the token
    }
    return 0;
}
I snagged a simple strcpy from http://vijayinterviewquestions.blogspot.com.au/2007/07/implement-strcpy-function.html
const char* msg = "fred,jim,345,7665";
char buffer[20];
long offset = 0;
long length;
while((length = strtok2(buffer, msg, ',', offset, 20)) > 0)
{
    cout << buffer << '\n';
    offset += length;
    if(msg[offset] == ',')
        ++offset; // skip the delimiter, but never step past the terminating null
}
Well, without a little more detail it's hard to know exactly what you want. I'll guess you are parsing delimited items where consecutive delimiters should be treated as zero length tokens (which is usually correct for comma separated elements). I'm also assuming a blank line counts as a single zero length token. This is how I'd approach it:
const char *token_begin = msg;
int length;
for(;;)
{
length = 0;
while(!isDelimiter(token_begin[length])) //< must include \0 as delimiter
++length;
//..do something here with token. token is at: token_begin[0..length)
if ( token_begin[length] != 0 )
token_begin = &token_begin[length+1]; //skip beyond non-null delimiter
else
break; //token null terminated. exit
}
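The same idea can be sketched with std::string_view, which requires C++17 and is not mentioned in the original answers. Each token is only a pointer and a length into the original buffer, so no characters are copied; the vector that holds the views still allocates. The function name splitViews is made up for this example.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Sketch (requires C++17): splits `msg` on `delim` without copying characters.
// Each std::string_view refers into the original buffer, so that buffer
// must outlive the returned views.
std::vector<std::string_view> splitViews(std::string_view msg, char delim)
{
    std::vector<std::string_view> tokens;
    std::size_t begin = 0;
    for (;;)
    {
        const std::size_t end = msg.find(delim, begin);
        if (end == std::string_view::npos)
        {
            tokens.push_back(msg.substr(begin));   // last token, may be empty
            break;
        }
        tokens.push_back(msg.substr(begin, end - begin));
        begin = end + 1;                           // skip past the delimiter
    }
    return tokens;
}
```

As in the loop above, consecutive delimiters produce zero-length tokens and an empty input produces a single zero-length token.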
If you are going to store the tokens somewhere, then a copy will be necessary in any case, and strtok does this nicely by using the string itself and placing a null terminating character inside it.
The only other option I see to avoid copying is a lexer that reads the string and, through a state machine, produces tokens by scanning the string and storing the partial results in a buffer. But every token should in any case be stored at least as a null terminated string, so you are not really saving anything.
Here is my proposal. My code is procedural and uses a global variable, position (I know global variables are bad practice, but this is only to give you the idea); you can replace it with a data member if you need OOP.
#include <cstdlib>
#include <cstring>
#include <iostream>
using namespace std;
const int MAX = 1000; // greater than the maximum length of the tokens
int position, messageLength;
char token[MAX];
bool hasNext()
{
return position < messageLength;
}
char* next(const char* message)
{
int i = 0;
while (position < messageLength && message[position] != ',') {
token[i++] = message[position];
position++;
}
position++; // ',' found
token[i] = '\0';
return token;
}
int main(int argc, char **argv)
{
const char* msg = "fred,jim,345,7665";
position = 0;
messageLength = strlen(msg);
while (hasNext())
cout << next(msg) << endl;
return EXIT_SUCCESS;
}