C File I/O bug in my code - c++

I attempted writing a thesaurus program which reads a thesaurus file, for example:
drink:beverage
clever:smart,witty
and a .txt document, swapping the words it finds for ones from the thesaurus and creating a new document with the modified text. However, there appears to be a bug. I have narrowed it down to the while loop in getReplacement() by printing before and after it. I would really appreciate someone finding why it won't work.
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <iostream>
char* getReplacement(char* original, FILE* file);
int main(int argc, char* argv[])
{
using namespace std;
FILE* thes = fopen(argv[1], "r");
FILE* text = fopen(argv[2], "r+");
FILE* nText = fopen("temp.txt", "w");
if(thes == NULL || text == NULL || nText == NULL)
return 1;
char word[20] = {};
char c;
int bytesW=0;
while((c = fgetc(text)) != EOF)
{
fputc(c, nText);
bytesW++;
if(isalpha(c))
{
int len = strlen(word);
word[len] = c;
word[len + 1] = '\0';
}
else
{
if(word == "")
continue;
cout << 7<<endl;
char* replacement = getReplacement(word, thes);
if(replacement == NULL)
continue;
fseek(nText,bytesW-1-strlen(word),SEEK_SET);
for(int i=0;i<strlen(replacement);i++)
fputc(replacement[i],nText);
int diff = strlen(word) - strlen(replacement);
while(diff-- >0)
fputc(' ', nText);
bytesW = bytesW-1-strlen(word)+strlen(replacement);
fseek(nText, bytesW, SEEK_SET);
}
}
fclose(thes);
fclose(text);
fclose(nText);
return 0;
}
char* getReplacement(char* const original, FILE* file)
{
using namespace std;
char* line="";
const short len = strlen(original);
int numOfOptions=1;
int toSkip=0; // number of commas to skip over
outer: while(fgets(line,1000,file) != NULL)
{
for(int i=0;i<len;i++)
if(line[i] != original[i])
{
goto outer;
}
if(line[len] != ':')
goto outer;
for(int i=0;i<len;i++)
line++;
for(int i=0;i<strlen(line);i++)
if(line[i] == ',')
numOfOptions++;
toSkip = rand()%numOfOptions;
while(toSkip >0)
{
if(line[0] == ',')
toSkip--;
line++;
}
return line;
}
return NULL;
}

char* line="";
// ... snip ...
outer: while(fgets(line,1000,file) != NULL)
Here's your problem. You are trying to read into a literal string; you instead need to allocate an array, on the stack or via malloc() to read into.
A string that you write in quotes in C is known as a literal. This means that this string gets embedded in the code of your program, and later loaded into memory when your program is loaded. Usually it gets loaded into memory that's marked read-only, but that's platform dependent. That string that you wrote has room only for the null terminator. But you are trying to read up to 1000 characters into it. This will either lead to a segmentation fault because you were writing to read-only memory, or will lead to you writing all over some other memory, producing who knows what behavior.
What you want to do instead is allocate a buffer that you can read into:
char line[1000];
or, if you have limited stack space:
char *line = malloc(1000 * sizeof(char));
Furthermore, in your main() function, you do:
char c;
while((c = fgetc(text)) != EOF)
fgetc() returns an int, not a char. This way, it can return a value corresponding to a valid character if a value is read, or a value that is outside that range if you hit the end of file.
You can't compare strings in C using ==; what that does is compare whether they are the same pointer, not whether they have the same contents. It doesn't really make sense to recalculate the length of the current word each time; why not just keep track of len yourself, incrementing it every time you add a character, and then when you want to check if the word is empty, check if len == 0? Remember to reset len to 0 after the end of the word so you'll start over on the next word. Also remember to reset if len goes over sizeof(word); you don't want to write more than word can hold, or you will start scribbling all over random stuff on your stack and lots of things will break.

Related

Input string in char array C++

This is my code:
char A[10];
char B[5];
cin >> setw(10) >> A;
cin >> setw(5) >> B;
cout << A;
cout << B;
If the input exceeds the array size (ex: 10 for A variable), then the program does not prompt me to enter the data for the second one. It goes right to the end and executes the two "cout" lines.
Input: abcabcabcabcabcabc (for A)
Output: abcabcabcabca (13 space for char + 2 '\n')
Output expected:
abcabcabc (for A)
dddd (for B)
I want to enter data for both variables even if I entered too many characters for one of them
In C++ you would do this more like as follows
std::string A,B;
std::getline(std::cin,A);
std::getline(std::cin,B);
This avoids any pitfalls with fixed-size arrays, such as char[10] and reads the full line. Alternatively, you may add a delimiter
const auto delim = '.'; // say
std::getline(std::cin,A,delim);
std::getline(std::cin,B,delim);
I don't think there is a simple way (i.e. not coding it yourself) for allowing multiple delimiters.
If you would like to read C strings with a fixed limit, the best approach is to use fgets, which is part of the standard C++ library.
You can also use iomanip to setw, like this:
char A[10];
char B[15];
cin >> setw(10) >> A;
cin >> setw(15) >> B;
Note that the length of the string that you get back will be less by one than the width that you set, because C strings require null termination.
Note: Although this mixture of C and C++ would work, you would be better off using std::string for an approach that is more idiomatic to C++. I recognize that this could be a learning exercise in which you are not allowed to use std::string, though.
As you are using C++, you can use string
string A,B;
cin>>A>>B;
Here you can scan as many characters as you want.
If you want to stick with C functions, you've got a couple of options.
The first option is to leverage the fact that fgets includes the newline in the string it reads, but only if the reason it stopped reading is because it hit the end of a line. You can check whether the last character is a newline, and if not, throw out anything left in the input up to and including the next newline:
int count;
fgets(A, 10, stdin);
count = strlen(A);
if (count == 9 && A[8] != '\n') {
int ch;
/* stop at end of file too, or this loop never ends */
do { ch = getc(stdin); } while (ch != '\n' && ch != EOF);
}
fgets(B, 15, stdin);
printf("A: %s; B: %s\n", A, B);
If you don't want the newline in your string, be sure to remove it. And you may want to treat too much input as an error rather than just skipping extra characters.
A slightly simpler option is to use scanf instead, but only if you don't want to allow spaces in each variable's input:
int count;
scanf("%9s%n", A, &count);
if (count == 9) {
int ch;
/* stop at end of file too, or this loop never ends */
do { ch = getc(stdin); } while (ch != EOF && !isspace(ch));
}
scanf("%14s", B);
printf("A: %s; B: %s\n", A, B);
This C function reads a line of any length and returns a pointer to it in a newly allocated memory block (remember to free() it). If keepNL is true and a newline character (i.e. not EOF) stopped the reading, it's included at the end of the string. If len isn't NULL, *len is set to the length of the line, including any newline character. It makes it possible to read lines with '\0' in, which strlen() can't handle.
On failure, NULL is returned and *len is unchanged. If feof() is true, EOF was reached before any characters were read (no more lines in the file). If ferror() is true, an I/O error occurred. If neither feof() nor ferror() is true, memory was exhausted.
Note that the memory block may be larger than the length of the string. If you need to conserve memory, realloc() it yourself to *len + 1U.
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#define MIN_LINE_BUF ((size_t) 128U) /* >= 1 */
char *fgetline(size_t *len, FILE *stream, int keepNL) {
char *buf;
int c;
size_t i, size;
if (!(buf = malloc(size = MIN_LINE_BUF))) {
return NULL;
}
i = 0U;
while ((c = getc(stream)) != EOF) {
if (c != '\n' || keepNL) {
buf[i++] = (char) c;
if (i == size) {
char *newPtr;
if (size > (size_t) -1 - size
|| !(newPtr = realloc(buf, size <<= 1))) {
free(buf);
return NULL;
}
buf = newPtr;
}
}
if (c == '\n') {
break;
}
}
if ((c == EOF && i == 0U) || ferror(stream)) {
free(buf);
return NULL;
}
buf[i] = '\0'; /* the terminator is not counted in *len */
if (len) {
*len = i;
}
return buf;
}

C++ read file until char found

I want to create a function that can read a file char by char continuously until some specific char encountered.
This is my method in a class FileHandler.
char* tysort::FileHandler::readUntilCharFound(size_t start, char seek)
{
char* text = new char;
if(this->inputFileStream != nullptr)
{
bool goOn = true;
size_t seekPos = start;
while (goOn)
{
this->inputFileStream->seekg(seekPos);
char* buffer = new char;
this->inputFileStream->read(buffer, 1);
if(strcmp(buffer, &seek) != 0)
{
strcat(text, buffer); // Execution stops here
seekPos++;
}
else
{
goOn = false;
}
}
}
//printf("%s\n", text);
return text;
}
I test this function and it actually works. This is an example to read a file content until new line character '\n' found.
size_t startPosition = 0;
char* text = this->fileHandler->readUntilCharFound(startPosition, '\n');
However, I am sure that something is not right somewhere in the code, because if I use this method in a loop, the app just hangs. I guess the problem has to do with pointers, but I don't know exactly where. Could you please point it out for me?
C++ provides some easy-to-use solutions. For instance:
istream& getline (istream& is, string& str, char delim);
In your case, the parameter would be the equivalent of your text variable and delim would be the equivalent of your seek parameter. Also, the return value of getline would in some way be the equivalent of your goOn flag (there are good FAQs regarding the right patterns to check for EOF and IO errors using the return value of getline)
The lines
if(strcmp(buffer, &seek) != 0)
and
strcat(text, buffer); // Execution stops here
are causes for undefined behavior. strcmp and strcat expect null terminated strings.
Here's an updated version, with appropriate comments.
char* tysort::FileHandler::readUntilCharFound(size_t start, char seek)
{
// If you want to return a string containing
// one character, you have to allocate at least two characters.
// The first one contains the character you want to return.
// The second one contains the null character - '\0'
char* text = new char[2];
// Make it an empty, null terminated string so strcat has
// something valid to append to.
text[0] = '\0';
text[1] = '\0';
if(this->inputFileStream != nullptr)
{
bool goOn = true;
size_t seekPos = start;
while (goOn)
{
this->inputFileStream->seekg(seekPos);
// No need to allocate memory from the heap.
char buffer[2];
// Stop if the read fails (e.g. at EOF) or the character is the one we seek.
if( this->inputFileStream->read(buffer, 1) && buffer[0] != seek )
{
buffer[1] = '\0';
strcat(text, buffer);
seekPos++;
}
else
{
goOn = false;
}
}
}
return text;
}
You can further simplify the function to:
char* tysort::FileHandler::readUntilCharFound(size_t start, char seek)
{
// If you want to return a string containing
// one character, you have to allocate at least two characters.
// The first one contains the character you want to return.
// The second one contains the null character - '\0'
char* text = new char[2];
text[1] = '\0';
if(this->inputFileStream != nullptr)
{
this->inputFileStream->seekg(start);
// Keep reading from the stream until we find the character
// we are looking for or EOF is reached.
int c;
while ( (c = this->inputFileStream->get()) != EOF && c != seek )
{
}
if ( c != EOF )
{
text[0] = c;
}
}
return text;
}
this->inputFileStream->read(buffer, 1);
No error checking.
if(strcmp(buffer, &seek) != 0)
The strcmp function is used to compare strings. Here you just want to compare two characters.

C/C++ reading line at a time

I was trying to solve a programming problem of some site and this one had the following statement:
Read a string and parse it as a number, char 'l' can be considered as number 1 and chars 'o' and 'O' can be considered as number 0, commas and spaces will be accepted in the input but ignored, if any other character is found then output error...
So... since there can be spaces in the lines, I used gets (the documentation says it removes the new line and puts a terminator)...
My sequence of IF test if it is a number, then if its an acceptable letter, then checks if it is not a comma or a space... And I found out that it was almost always entering in the last IF even though there wasn't any character that should lead it there so I changed the printf inside it to print the
printf("%d error", st[k]);
And it outputs 13: carriage return... I tried this compiler here
#include <cstdio>
#include <cstring>
#include <cstdlib>
int main()
{
char st[100];
char buf[100];
int k, i;
long long int num;
ops: while(gets(st))
{
for(k = i = 0; st[k]; k++)
if(st[k] >= '0' && st[k] <= '9')
buf[i++] = st[k];
else if(st[k] == 'o' || st[k] == 'O')
buf[i++] = '0';
else if(st[k] == 'l')
buf[i++] = '1';
else if(st[k] != ',' && st[k] != ' ')
{
printf("error\n");
goto ops;
}
// remaining code comes here...
}
The input sample had the following lines:
lo6
234,657
hi
,,,,,5,,5, 4
2200000000
00
Should I use other function to read instead?
Any suggestions on how to avoid this damn Carriage Return?
The statement for the problem can be seen here if you want more detail
Thanks
EDIT:
I'm asking that because there seem to be a difference between the compiler I'm using and the compiler the website was using, once I submitted a code that wasn't generating the correct output on mine but I thought the code was correct... and it passed. Then after it, I tried the code on a linux virtual machine and also correct but my gcc on windows failed... some characters were completely away from where they should be
The thing is:
The preferred method of line input is getline (a POSIX function, so not every C library provides it). If you do not know the width of the input beforehand, getline will allocate space for you if you set the buffer pointer to NULL.
Since you indicate you have some experience in C, the following will show you how to step through an input string and parse it in the manner you want. While there are many ways to handle parsing strings, it is hard to beat assigning a pointer to the beginning of the string and then advancing down the string until you reach the null-terminating character (unsigned 0, or char '\0'). If you have any questions after looking it over, just ask:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char *argv[]) {
char *buffer = NULL;
size_t n = 0;
ssize_t len;
char *ptr = NULL;
char *output = NULL;
int idx = 0;
printf ("\nEnter your string below: ([ctrl + d] on blank line to end)\n");
while ((len = getline (&buffer, &n, stdin)) != -1) {
ptr = buffer;
idx = 0;
output = malloc (sizeof (char) * (len + 1)); /* +1 for the terminator */
while (*ptr != 0) {
if (*ptr >= 48 && *ptr <= 57) output [idx++] = *ptr;
if (*ptr == 'I') output [idx++] = '1';
if (*ptr == 'o' || *ptr == 'O') output [idx++] = '0';
ptr++;
}
output [idx] = 0;
printf ("\n valid output: %s\n\n", output);
free (output);
}
return 0;
}
output:
Enter your string below: ([ctrl + d] on blank line to end)
This sting has 12345 aeiou AEIOU,,,,,commas, and 99, to end.
valid output: 123450100990
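The value 13 that the question prints is '\r'. It typically shows up when the input has Windows (CRLF) line endings and the reading side only removes the '\n'. A small sketch of one way to cope is to drop a trailing '\r' before applying the rules from the problem statement. The function name parseDigits is made up, and it takes the line as a std::string that has already been read, for instance with std::getline.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: applies the puzzle's rules to one line. Returns false
// if the line contains a character the rules do not allow.
bool parseDigits(std::string line, std::string& digits)
{
    // A line read from a CRLF file keeps the '\r' once '\n' has been removed.
    if (!line.empty() && line.back() == '\r')
        line.pop_back();
    digits.clear();
    for (char ch : line) {
        if (std::isdigit(static_cast<unsigned char>(ch)))
            digits += ch;
        else if (ch == 'o' || ch == 'O')
            digits += '0';
        else if (ch == 'l')
            digits += '1';
        else if (ch != ',' && ch != ' ')
            return false;
    }
    return true;
}
```

This follows the problem statement in mapping the lowercase letter 'l' to 1, and it needs C++11 for back() and pop_back().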

Trying to create a program to read a users input then break the array into seperate words are my pointers all valid? [closed]

Closed 11 years ago.
char **findwords(char *str);
int main()
{
int test;
char words[100]; //an array of chars to hold the string given by the user
char **word; //pointer to a list of words
int index = 0; //index of the current word we are printing
char c;
cout << "die monster !";
//a loop to place the charecters that the user put in into the array
do {
c = getchar();
words[index] = c;
} while (words[index] != '\n');
word = findwords(words);
while (word[index] != 0) //loop through the list of words until the end of the list
{
printf("%s\n", word[index]); // while the words are going through the list print them out
index ++; //move on to the next word
}
//free it from the list since it was dynamically allocated
free(word);
cin >> test;
return 0;
}
char **findwords(char *str)
{
int size = 20; //original size of the list
char *newword; //pointer to the new word from strok
int index = 0; //our current location in words
char **words = (char **)malloc(sizeof(char *) * (size +1)); //this is the actual list of words
/* Get the initial word, and pass in the original string we want strtok() *
* to work on. Here, we are seperating words based on spaces, commas, *
* periods, and dashes. IE, if they are found, a new word is created. */
newword = strtok(str, " ,.-");
while (newword != 0) //create a loop that goes through the string until it gets to the end
{
if (index == size)
{
//if the string is larger than the array increase the maximum size of the array
size += 10;
//resize the array
char **words = (char **)malloc(sizeof(char *) * (size +1));
}
//asign words to its proper value
words[index] = newword;
//get the next word in the string
newword = strtok(0, " ,.-");
//increment the index to get to the next word
++index;
}
words[index] = 0;
return words;
}
break the array into the individual words then print them out th
do {
c = getchar();
words[index] = c;
} while (words[index] != '\n');
You should also add '\0' at the end of the string in the "words" array (after the loop).
You are not incrementing index, so every character is stored in words[0] and only the last one survives.
Comparing word[index] against 0 is a fine test for a list that ends in a null pointer, but once the input loop increments index, it has to be set back to 0 before this loop:
while (word[index] != 0) //loop through the list of words until the end of the list
{
printf("%s\n", word[index]); // while the words are going through the list print them out
index ++; //move on to the next word
}
I think there is a memory leak, because you first allocate
char **words = (char **)malloc(sizeof(char *) * (size +1)); //when declaring
when declaring the variable, and after that you allocate words again in the loop body:
char **words = (char **)malloc(sizeof(char *) * (size +1)); // in the while loop
The line in the while loop that allocates the space to store the string should be (1)
//in the while loop should be
words[index] = (char *)malloc(sizeof(char ) * (size +1));
strcpy (words[index], str);
Or simply (2)
words[index] = str;
Because the str already points to a valid memory location which you assign to the array of pointers.
In method (1) above you are allocating a block of size+1 chars and copying the string in str into words[index] with strcpy. For this you need to reserve memory for words[index] first and then perform the copy. If you do this, freeing the memory is not as simple as free (word); each allocated block has to be freed individually:
for (index = 0; words[index] != 0; index++)
{
free (words[index]);
}
free (words);
Solution (2) is in my opinion not a good one, because you have passed in a pointer to a string and store that pointer value instead of a copy of the string. So both str and words[index] point to the same location. If anybody frees str after the function returns (if it was dynamically allocated), the words[index] reference becomes invalid.
EDIT:
Also, in the main function you need to use
fgets (words, sizeof words, stdin); or, in C++, cin.getline(words, sizeof words); or std::getline with a std::string, or simply increment the index counter in your code and assign a null at the end to terminate the string.
As it stands you do not increment the index counter, so all the characters are assigned to the same location.
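If C++ containers are allowed, both the resizing and the ownership worries above go away when every token is copied into its own std::string. The following is only a sketch of that idea, not a drop-in replacement for the strtok version. The name findWords and the extra '\n' in the delimiter set are my own choices.

```cpp
#include <string>
#include <vector>

// Hypothetical alternative to findwords(): returns copies of the words, so
// there is nothing to resize by hand and nothing to free afterwards.
std::vector<std::string> findWords(const std::string& text)
{
    const char* const delims = " ,.-\n";   // '\n' added on top of the original set
    std::vector<std::string> words;
    std::string::size_type pos = 0;
    // Skip delimiters, then take everything up to the next delimiter.
    while ((pos = text.find_first_not_of(delims, pos)) != std::string::npos) {
        const std::string::size_type end = text.find_first_of(delims, pos);
        words.push_back(text.substr(pos, end - pos));  // substr clamps when end is npos
        pos = end;
    }
    return words;
}
```

The caller can then loop over the vector by index or with iterators, and no null pointer is needed to mark the end of the list.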
I think everybody is trying to do it the hard way.
The std streams already break the input into words using the >> operator. We just need to be more careful on how we define a word. To do this you just need to define a ctype facet that defines space correctly (for the context) and then imbue the stream with it.
#include <locale>
#include <string>
#include <sstream>
#include <iostream>
// This is my facet that will treat the ,.- as space characters and thus ignore them.
class WordSplitterFacet: public std::ctype<char>
{
public:
typedef std::ctype<char> base;
typedef base::char_type char_type;
WordSplitterFacet(std::locale const& l)
: base(table)
{
std::ctype<char> const& defaultCType = std::use_facet<std::ctype<char> >(l);
// Copy the default value from the provided locale
static char data[256];
for(int loop = 0;loop < 256;++loop) { data[loop] = loop;}
defaultCType.is(data, data+256, table);
// Modifications to default to include extra space types.
table[','] |= base::space;
table['.'] |= base::space;
table['-'] |= base::space;
}
private:
base::mask table[256];
};
Now the code looks very simple:
int main()
{
// Create the facet.
std::ctype<char>* wordSplitter(new WordSplitterFacet(std::locale()));
// Here I am using a string stream.
// But any stream can be used. Note you must imbue a stream before it is used.
// Otherwise the imbue() will silently fail.
std::stringstream teststr;
teststr.imbue(std::locale(std::locale(), wordSplitter));
// Now that it is imbued we can use it.
// If this was a file stream then you could open it here.
teststr << "This, stri,plop";
// Now use the stream normally
std::string word;
while(teststr >> word)
{
std::cout << "W(" << word << ")\n";
}
}
Testing:
> ./a.out
W(This)
W(stri)
W(plop)
With a correctly imbued stream we can use the old trick of copying from a stream into a vector:
std::copy(std::istream_iterator<std::string>(teststr),
std::istream_iterator<std::string>(),
std::back_inserter(data)
);
Lots of issues:
In your first loop you are forgetting to increment index after each read character.
Also, if you have more than 100 characters, your program will likely crash.
getchar returns an "int". Not a char. Very important - especially if your input is redirected or piped in.
Try this instead:
int tmp;
tmp = getchar();
while ((index < 99) && (tmp >= 0) && (tmp != '\n'))
{
words[index] = (char)tmp;
tmp = getchar();
index++;
}
words[index] = 0; /* make life easier - null terminate your string */
Your "findwords" function scares the hell out of me. You don't have enough points on S.O. for me to elaborate on the issues here. In any case
I'm tempted to open with some lame crack about the '80s calling and wanting their obsolete "C++ as a better C" code back, but I'll try to restrain myself and just give at least some idea of how you might consider doing something like this:
std::string line;
// read a line of input from the user:
std::getline(std::cin, line);
// break it up into words:
std::istringstream buffer(line);
std::vector<std::string> words((std::istream_iterator<std::string>(buffer)),
std::istream_iterator<std::string>());
// print out the words, one per line:
std::copy(words.begin(), words.end(),
std::ostream_iterator<std::string>(std::cout, "\n"));

strcmp segmentation fault

Here is a problem from spoj. nothing related to algorithms, but just c
Sample Input
2
a aa bb cc def ghi
a a a a a bb bb bb bb c c
Sample Output
3
5
it counts the longest sequence of same words
http://www.spoj.pl/problems/WORDCNT/
The word is less than 20 characters
But when i run it, it's giving segmentation fault. I debugged it using eclipse. Here's where it crashes
if (strcmp(previous, current) == 0)
currentLength++;
with the following message
No source available for "strcmp() at 0x2d0100"
What's the problem?
#include <iostream>
#include <cstring>
#include <string>
#include <cstdio>
using namespace std;
int main(int argc, const char *argv[])
{
int t;
cin >> t;
while (t--) {
char line[20000], previous[21], current[21], *p;
int currentLength = 1, maxLength = 1;
if (cin.peek() == '\n') cin.get();
cin.getline(line, 20000);
p = strtok(line, " '\t''\r'");
strcpy(previous, p);
while (p != NULL) {
p = strtok(NULL, " '\t''\r'");
strcpy(current, p);
if (strcmp(previous, current) == 0)
currentLength++;
else
currentLength = 1;
if (currentLength > maxLength)
maxLength = currentLength;
}
cout << maxLength << endl;
}
return 0;
}
The problem is probably here:
while (p != NULL) {
p = strtok(NULL, " '\t''\r'");
strcpy(current, p);
While p may not be NULL when the loop is entered, it may be NULL by the time strcpy is used on it.
A more correct form of the loop would be:
while ((p != NULL) && ((p = strtok(NULL, " \t\r")) != NULL))
{
strcpy(current, p);
Note. Tokenizing a stream in C++ is a lot easier.
std::string token;
std::cin >> token; // Reads 1 white space separated word
If you want to tokenize a line
// Step 1: read a single line in a safe way.
std::string line;
std::getline(std::cin, line);
// Turn that line into a stream.
std::stringstream linestream(line);
// Get 1 word at a time from the stream.
std::string token;
while(linestream >> token)
{
// Do STUFF
}
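Putting those pieces together, a sketch of the counting itself might look like the following. It compares whole words, as the code in the question does. The last answer in this thread compares word lengths instead, so swap the comparison if that is what the problem really asks for. The name longestRun is only for illustration.

```cpp
#include <algorithm>
#include <sstream>
#include <string>

// Hypothetical helper: length of the longest run of identical consecutive
// words in one line (0 for a line without words).
int longestRun(const std::string& line)
{
    std::istringstream words(line);
    std::string previous, current;
    int run = 0, best = 0;
    while (words >> current) {
        // Extend the run if this word repeats the previous one, else restart.
        run = (run > 0 && current == previous) ? run + 1 : 1;
        best = std::max(best, run);
        previous = current;
    }
    return best;
}
```

There are no fixed-size buffers to overflow here and no NULL to check, because the extraction simply fails when the line runs out of words.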
Forgot to check for NULL on strtok, it will return NULL when done and you cannot use that NULL on strcpy, strcmp, etc.
Note that you do a strcpy right after the strtok, you should check for null before doing that using p as a source.
The strtok man page says:
Each call to strtok() returns a pointer to a null-terminated string containing the next
token. This string does not include the delimiting character. If no more tokens are found,
strtok() returns NULL.
And in your code,
while (p != NULL) {
p = strtok(NULL, " '\t''\r'");
strcpy(current, p);
you are not checking for NULL (for p) once the whole string has been parsed. After that you are trying to copy p (which is NULL now) in current and so getting the crash.
You will find that one of previous or current does not point to a null-terminated string at that point, so strcmp doesn't know when to stop.
Use proper C++ strings and string functions instead, rather than mixing C and C++. The Boost libraries can provide a much safer tokeniser than strtok.
You probably undersized current and previous. You should really use std::string for this kind of thing- that's what it's for.
You are doing nothing to check your string lengths before copying them into buffers of size 21. I bet that you are somehow overwriting the end of the buffer.
If you insist on using C strings, I'd suggest using strncmp instead of strcmp. That way, in case you are ending up with a non-null terminated string (which is what I suspect is the case), you can restrict your compare to the max length of the string (in this case 21).
Try this one...
#include <cstdio>
#include <cstring>
#define T(x) strtok(x, " \n\r\t")
char line[44444];
int main( )
{
int t; scanf("%d\n", &t);
while(t--)
{
fgets(line, 44444, stdin);
int cnt = 1, len, maxcnt = 0, plen = -1;
for(char *p = T(line); p != NULL; p = T(NULL))
{
len = strlen(p);
if(len == plen) ++cnt;
else cnt = 1;
if(cnt > maxcnt)
maxcnt = cnt;
plen = len;
}
printf("%d\n", maxcnt);
}
return 0;
}