Memory leak with char*? - c++

I read a text file with a content like this "sasdfsdf" with the following code:
char* o = new char[size];
c = 0;
for (i = 0; i < n; i++)
{
fseek(pFile, i, SEEK_SET);
b = fgetc(pFile);
if (b == '\r') {
o[c] = b;
c++;
o[c] = '\n';
}
else {
o[c] = b;
}
c++;
}
fclose(pFile);
SetWindowTextA(TextBox1.hWnd, (n > 0) ? o : NULL);
delete[] o;
First I would like to know if this code is clean. I assume it is not, because I am new to C/C++ and sometimes have trouble understanding memory allocation.
I would like to use the C-style functions (FILE*, fopen, fseek, fgetc) to get the content of the file. The problem is that something is always appended to the char* o. For example, instead of "sasdfsdf" (the text file's content) it writes "sasdfsdf¨‰»3" into the edit control. I found out that the "¨‰»3" appears once the for loop's scope is left. I assume it is something like a memory leak; I have no other idea where these characters could come from.

As the comments indicate, you are writing C but have tagged your question C++. C++ provides a much shorter solution that does not require any dynamic allocation on your part (and even in your C-style code none is needed: provided size is a compile-time constant, simply using constexpr size_t nelem = size; char o[nelem] = ""; would create a simple array of the desired length).
The part you are missing above is that fgetc() advances the file position each time it reads a character from the file stream, so there is no need for fseek(pFile, i, SEEK_SET); at all. It is simply superfluous.
In C++, you are much better off reading from a std::fstream rather than using a C FILE*. You are also much better off using std::string than char*, as std::string eliminates the possibility of writing beyond the end of your array (which you fail to check for in your loop). After you read a line from your file with getline(), you can simply use std::replace() to replace all '\r' with '\n'. When you are done, all memory is freed automatically.
You would write your routine similar to:
#include <fstream>
#include <string>
#include <algorithm>
void somefunc (std::fstream& stream)
{
std::string line{}, windowtext{};
while (getline (stream, line)) {
std::replace (line.begin(), line.end(), '\r', '\n');
windowtext += line;
}
SetWindowTextA (TextBox1.hWnd, windowtext.empty() ? nullptr : windowtext.c_str()); /* the API takes a const char*, not a std::string */
/* close file back in caller */
}
(note: in C++ the open file stream will be closed when the file stream object goes out of scope, so you won't need to manually close the stream)
Also note, as @RemyLebeau points out in the comments, that on Windows getline() will remove both the CR and LF that make up the DOS line endings. If you need a manual '\n' to create a line break, you will need windowtext += line + '\n'; instead of std::replace to inject the '\n'.
Look things over and let me know if you have questions.

You must terminate the string by adding terminating null-character '\0'.
char* o = new char[size];
c = 0;
for (i = 0; i < n; i++)
{
fseek(pFile, i, SEEK_SET);
b = fgetc(pFile);
if (b == '\r') {
o[c] = b;
c++;
o[c] = '\n';
}
else {
o[c] = b;
}
c++;
}
o[c] = '\0'; // add this to terminate the string
fclose(pFile);
SetWindowTextA(TextBox1.hWnd, (n > 0) ? o : NULL);
delete[] o;

Related

sprintf buffer issue, wrong assignment to char array

I got an issue with sprintf buffer.
As you can see in the code below, I'm using sprintf to write a file name into the buffer, so that pFile can check whether a file with that name exists in the folder. If it is found, the buffer value is assigned to timecycles[numCycles] and numCycles is increased. Example: timecycles[0] = "timecyc1.dat". It works well, and as you can see in the console output it recognizes that there are only timecyc1.dat and timecyc5.dat in the folder. But as soon as I read timecycles with a for loop, both entries have the value "timecyc9.dat", even though it should be "timecyc1.dat" for timecycles[0] and "timecyc5.dat" for timecycles[1]. Second, how can I write the code so that readTimecycles() returns the timecycles array, and I could just initialize it in the main function with char* timecycles[9] = readTimecycles() or anything like that?
Console output
#include <iostream>
#include <cstdio>
char* timecycles[9];
void readTimecycles()
{
char buffer[256];
int numCycles = 0;
FILE* pFile = NULL;
for (int i = 1; i < 10; i++)
{
sprintf(buffer, "timecyc%d.dat", i);
pFile = fopen(buffer, "r");
if (pFile != NULL)
{
timecycles[numCycles] = buffer;
numCycles++;
std::cout << buffer << std::endl; //to see if the buffer is correct
}
}
for (int i = 0; i < numCycles; i++)
{
std::cout << timecycles[i] << std::endl; //here's the issue with timecyc9.dat
}
}
int main()
{
readTimecycles();
return 0;
}
With the assignment
timecycles[numCycles] = buffer;
you make all pointers point to the same buffer, since you only have a single buffer.
Since you're programming in C++ you could easily solve your problem by using std::string instead.
If I were to rewrite your code into something a little more C++-ish and less C-ish, it could look something like
#include <array>
#include <fstream>
#include <string>
std::array<std::string, 9> readTimeCycles()
{
std::array<std::string, 9> timecycles;
for (size_t i = 0; i < timecycles.size(); ++i)
{
// Format the file-name
std::string filename = "timecyc" + std::to_string(i + 1) + ".dat";
std::ifstream file(filename);
if (file)
{
// File was opened okay
timecycles[i] = filename;
}
}
return timecycles;
}
References:
std::array
std::string
std::to_string
std::ifstream
The fundamental problem is that your notion of a string doesn't match what a 'char array' is in C++. In particular, you think that because you assign timecycles[numCycles] = buffer; the chars of the char array are somehow copied. But in C++ all that is copied is a pointer, so timecycles ends up with multiple pointers to the same buffer. On top of that, there is a second problem once you exit the readTimecycles function: at that point you have multiple pointers to a buffer which no longer exists, because it is destroyed when the function returns.
The way to fix this is to use C++ code that does match your expectations. In particular a std::string will copy in the way you expect it to. Here's how you can change your code to use std::string
#include <string>
std::string timecycles[9];
timecycles[numCycles] = buffer; // now this really does copy a string
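To make the difference concrete, here is a small sketch in which one reused char buffer is copied into separate std::string objects on each iteration. The function name formatNames and the idea of passing the file numbers in as a vector are hypothetical, chosen only so the example runs without any files on disk.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: formats one name per id into a single reused
// buffer and stores a *copy* of the characters each time.
std::vector<std::string> formatNames(const std::vector<int>& ids)
{
    std::vector<std::string> names;
    char buffer[256];
    for (int id : ids) {
        std::snprintf(buffer, sizeof buffer, "timecyc%d.dat", id);
        // push_back constructs a std::string from buffer, copying the text,
        // so overwriting buffer later does not affect earlier entries.
        names.push_back(buffer);
    }
    return names;  // safe to return: the strings own their characters
}
```

Because each element owns its characters, the result stays valid after the function returns, which also covers the second part of the question about returning the list to main().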

Input string in char array C++

This is my code:
char A[10];
char B[5];
cin >> setw(10) >> A;
cin >> setw(5) >> B;
cout << A;
cout << B;
If the input exceeds the array size (e.g. 10 for the A variable), the program does not prompt me to enter the data for the second one. It goes right to the end and executes the two "cout" lines.
Input: abcabcabcabcabcabc (for A)
Output: abcabcabcabca (13 space for char + 2 '\n')
Output expected:
abcabcabc (for A)
dddd (for B)
I want to enter data for both variables even if I entered too many characters for one of them
In C++ you would typically do this as follows:
std::string A,B;
std::getline(std::cin,A);
std::getline(std::cin,B);
This avoids any pitfalls with fixed-size arrays, such as char[10] and reads the full line. Alternatively, you may add a delimiter
const auto delim = '.'; // say
std::getline(std::cin,A,delim);
std::getline(std::cin,B,delim);
I don't think there is a simple way (i.e. not coding it yourself) for allowing multiple delimiters.
If you would like to read C strings with a fixed limit, the best approach is to use fgets, which is part of the standard C++ library.
You can also use iomanip to setw, like this:
char A[10];
char B[15];
cin >> setw(10) >> A;
cin >> setw(15) >> B;
Note that the length of the string that you get back will be less by one than the width that you set, because C strings require null termination.
Note: Although this mixture of C and C++ would work, you would be better off using std::string for an approach that is more idiomatic to C++. I recognize that this could be a learning exercise in which you are not allowed to use std::string, though.
As you are using C++, you can use string
string A,B;
cin>>A>>B;
Here you can scan as many characters as you want.
If you want to stick with C functions, you've got a couple of options.
The first option is to leverage the fact that fgets includes the newline in the string it reads, but only if the reason it stopped reading is because it hit the end of a line. You can check whether the last character is a newline, and if not, throw out anything left in the input up to and including the next newline:
int count;
fgets(A, 10, stdin);
count = strlen(A);
if (count == 9 && A[8] != '\n') {
int ch;
do { ch = getc(stdin); } while (ch != '\n' && ch != EOF); /* also stop at end of input */
}
fgets(B, 15, stdin);
printf("A: %s; B: %s\n", A, B);
If you don't want the newline in your string, be sure to remove it. And you may want to treat too much input as an error rather than just skipping extra characters.
A slightly simpler option is to use scanf instead, but only if you don't want to allow spaces in each variable's input:
int count;
scanf("%9s%n", A, &count);
if (count == 9) {
int ch;
do { ch = getc(stdin); } while (ch != EOF && !isspace(ch)); /* also stop at end of input */
}
scanf("%14s", B);
printf("A: %s; B: %s\n", A, B);
This C function reads a line of any length and returns a pointer to it in a newly allocated memory block (remember to free() it). If keepNL is true and a newline character (i.e. not EOF) stopped the reading, it's included at the end of the string. If len isn't NULL, *len is set to the length of the line, including any newline character. It makes it possible to read lines with '\0' in, which strlen() can't handle.
On failure, NULL is returned and *len is unchanged. If feof() is true, EOF was reached before any characters were read (no more lines in the file). If ferror() is true, an I/O error occurred. If neither feof() nor ferror() is true, memory was exhausted.
Note that the memory block may be larger than the length of the string. If you need to conserve memory, realloc() it yourself to *len + 1U.
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#define MIN_LINE_BUF ((size_t) 128U) /* >= 1 */
char *fgetline(size_t *len, FILE *stream, int keepNL) {
char *buf;
int c;
size_t i, size;
if (!(buf = malloc(size = MIN_LINE_BUF))) {
return NULL;
}
i = 0U;
while ((c = getc(stream)) != EOF) {
if (c != '\n' || keepNL) {
buf[i++] = (char) c;
if (i == size) {
char *newPtr;
if (size > (size_t) -1 - size
|| !(newPtr = realloc(buf, size <<= 1))) {
free(buf);
return NULL;
}
buf = newPtr;
}
}
if (c == '\n') {
break;
}
}
if ((c == EOF && i == 0U) || ferror(stream)) {
free(buf);
return NULL;
}
buf[i] = '\0'; /* terminator is not counted in *len */
if (len) {
*len = i;
}
return buf;
}

Traversing a FASTA file in C/C++

I'm looking to write a program in C/C++ to traverse a Fasta file formatted like:
>ID and header information
SEQUENCE1
>ID and header information
SEQUENCE2
and so on
in order to find all unique sequences (check if subset of any other sequence)
and write unique sequences (and all headers) to an output file.
My approach was:
Copy all sequences to an array/list at the beginning (more efficient way to do this?)
Grab header, append it to output file, compare sequence for that header to everything in the list/array. If unique, write it under the header, if not delete it.
However, I'm a little unsure as to how to approach reading the lines in properly. I need to read the top line for the header, and then "return?" to the next line to read the sequence. Sometimes the sequence spans more than two lines, so would I use > (from the example above) as a delimiter? If I use C++, I imagine I'd use iostreams to accomplish this?
If anybody could give me a nudge in the right direction as to how I would want to read the information I need to manipulate/how to carry out the comparison, it'd be greatly appreciated.
First, rather than write your own FASTA reading routine you probably want to use something that already exists; for example, see: http://lh3lh3.users.sourceforge.net/parsefastq.shtml
Internally you'll have the sequences without newlines and that is probably helpful. I think the simplest approach from a high level is
loop over fasta and write out sequences to a file
sort that file
with the sorted file it becomes easier to pick out subsequences so write a program to find the "unique ids"
Using the unique id's go back to the original fasta and get whatever additional information you need.
Your approach is usable. Below is an implementation of it.
However, I'm a little unsure as to how to approach reading the lines
in properly. ... Sometimes the sequence spans more than two lines, so would I use > (from the example above) as a delimiter?
That's right; in addition, there's just the EOF which has to be checked.
I wrote the function getd() for that, which reads a single-line description or concatenated lines of sequence data and returns a pointer to the string it allocated.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
char *getd()
{
char *s = NULL, *eol;
size_t i = 0, size = 1; // 1 for null character
int c;
#define MAXLINE 80 // recommended max. line length; longer lines are okay
do // read single-line description or concatenated lines of sequence data
{
do // read a line (until '\n')
{
s = realloc(s, size += MAXLINE+1); // +1 for newline character
if (!s) puts("out of memory"), exit(1);
if (!fgets(s+i, MAXLINE+2, stdin)) break; // read at most MAXLINE+1 chars: exactly the room just added
eol = strchr(s+i, '\n');
i += MAXLINE+1;
} while (!eol);
if (!i) { free(s); return NULL; } // nothing read
if (*s == '>' || !eol) return s; // single-line description, or input ended without a newline
i = eol-s;
ungetc(c = getchar(), stdin); // peek at next character
} while (c != '>' && c != EOF);
return s;
}
int main()
{
char *s;
struct seq { char *head, *data; } *seq = NULL;
int n = 0, i, j;
while (s = getd())
if (*s == '>')
{ // new sequence: make room, store header
seq = realloc(seq, ++n * sizeof *seq);
if (!seq) puts("out of memory"), exit(1);
seq[n-1] = (struct seq){ s, "" };
}
else
if (n) // store sequence data if at least one header present
seq[n-1].data = s;
for (i = 0; i < n; ++i)
{
const int max = 70; // reformat output data to that line length max.
printf("%s", seq[i].head);
for (s = seq[i].data, j = 0; j < n; ++j)
if (j != i) // compare sequence to the others, delete if not unique
if (strstr(seq[j].data, s)) { s = seq[i].data = ""; break; }
for (; strlen(s) > max && s[max] != '\n'; s += max)
printf("%.*s\n", max, s);
printf("%s", s);
}
}

Concatenate multiple chars

I'm reading a file and I would like to extract all of its contents and store them in a single char variable in C++. I know it can be done with strings; however, I cannot use strings and need to resort to char instead. How can I concatenate multiple chars into one char variable?
Here is what I've tried so far:
string str = "";
ifstream file("c:/path.....");
while (file.good())
{
str += file.get();
}
const char* content = str.c_str();
printf("%c", *content);
but this just gave me the first letter of the file and that's it.
I also tried:
ifstream file("c:/path.....");
char c = ' ';
char result[100];
while (file.good())
{
c= file.get();
strcat(result,c);
}
but this gave me runtime errors all the time.
For the second try you gave in your question (which I guess from your other hints, is what you finally want), you can try the following as a quick fix:
ifstream file("c:/path.....");
char c[2] = { 0, 0 };
char result[100] = { 0 };
for (int i = 0; i < 99; ++i)
{
c[0] = file.get();
if (!file) break; // get() failed (end of file): don't append the EOF marker
strcat(result,c);
}
Since using strcat() might not be very efficient for this use case, I think a better implementation would directly write to the result buffer:
ifstream file("c:/path.....");
char result[100] = { 0 };
for (int i = 0; i < 99; ++i)
{
int ch = file.get();
if (!file) break; // end of file reached: nothing to store
result[i] = static_cast<char>(ch);
}
In your first code block:
const char* content = str.c_str();
printf("%c", *content);
prints only the first character of the string because *content dereferences the
pointer to the (first character of) the string, and %c is the printf-format for
a single character. You should replace that by
printf("%s", content);
to print the entire string. Or just use
std::cout << str;
Maybe something like that :
char result[100];
int i = 0;
while (file.good())
{
c= file.get();
if(i<99) // leave room for the terminating '\0'
result[i++]=c;
}
result[i]='\0';
There are a lot of things to improve in this solution (what would you do if there are more than 99 chars in your file, file.good() is not the best option for a loop condition, and so on). It is also much better to use strings. I don't know exactly why you can't use them, but just in case you change your mind, you can read your file like this:
std::string line;
while ( getline(stream, line)) {
process(line);
}
You can create std::strings of your characters and concatenate them:
char c1 = 'a';
char c2 = 'b';
std::string concatted = std::string(1, c1) + std::string(1, c2);
This works because of the fill std::string constructor, see the reference.
You seem to be new to C++, or at least you're using some unusual terminology.
First of all, I advise you to read some C++ literature for beginners (you can find a list of it on Stack Overflow). Then you will understand the concepts behind strings in C++.
Second, use this code to read file content and store it in a char*.
FILE *f = fopen("path to file", "r");
fseek(f, 0, SEEK_END);
long fsize = ftell(f);
fseek(f, 0, SEEK_SET);
char *content = new char[fsize + 1];
fread(content, fsize, 1, f);
fclose(f);
content[fsize] = 0;
cout << "{" << content <<"}";
delete[] content; // memory from new[] must be released with delete[]

C File I/O bug in my code

I attempted writing a thesaurus program which reads a thesaurus file, for example:
drink:beverage
clever:smart,witty
and a .txt document, changing up the words it finds from the thesaurus and creating a new document with the modified text. However there appears to be a bug, I have narrowed it down to the while loop in getReplacement(), by checking a print operation before and after. I would really appreciate someone finding why it won't work.
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <iostream>
char* getReplacement(char* original, FILE* file);
int main(int argc, char* argv[])
{
using namespace std;
FILE* thes = fopen(argv[1], "r");
FILE* text = fopen(argv[2], "r+");
FILE* nText = fopen("temp.txt", "w");
if(thes == NULL || text == NULL || nText == NULL)
return 1;
char word[20] = {};
char c;
int bytesW=0;
while((c = fgetc(text)) != EOF)
{
fputc(c, nText);
bytesW++;
if(isalpha(c))
{
int len = strlen(word);
word[len] = c;
word[len + 1] = '\0';
}
else
{
if(word == "")
continue;
cout << 7<<endl;
char* replacement = getReplacement(word, thes);
if(replacement == NULL)
continue;
fseek(nText,bytesW-1-strlen(word),SEEK_SET);
for(int i=0;i<strlen(replacement);i++)
fputc(replacement[i],nText);
int diff = strlen(word) - strlen(replacement);
while(diff-- >0)
fputc(' ', nText);
bytesW = bytesW-1-strlen(word)+strlen(replacement);
fseek(nText, bytesW, SEEK_SET);
}
}
fclose(thes);
fclose(text);
fclose(nText);
return 0;
}
char* getReplacement(char* const original, FILE* file)
{
using namespace std;
char* line="";
const short len = strlen(original);
int numOfOptions=1;
int toSkip=0; // number of commas to skip over
outer: while(fgets(line,1000,file) != NULL)
{
for(int i=0;i<len;i++)
if(line[i] != original[i])
{
goto outer;
}
if(line[len] != ':')
goto outer;
for(int i=0;i<len;i++)
line++;
for(int i=0;i<strlen(line);i++)
if(line[i] == ',')
numOfOptions++;
toSkip = rand()%numOfOptions;
while(toSkip >0)
{
if(line[0] == ',')
toSkip--;
line++;
}
return line;
}
return NULL;
}
char* line="";
// ... snip ...
outer: while(fgets(line,1000,file) != NULL)
Here's your problem. You are trying to read into a literal string; you instead need to allocate an array, on the stack or via malloc() to read into.
A string that you write in quotes in C is known as a literal. This means that the string gets embedded in the code of your program, and is later loaded into memory when your program is loaded. Usually it gets loaded into memory that's marked read-only, but that's platform dependent. The string that you wrote has room only for the null terminator, but you are trying to read up to 1000 characters into it. This will either lead to a segmentation fault because you are writing to read-only memory, or will lead to you writing all over some other memory, producing who knows what behavior.
What you want to do instead is allocate a buffer that you can read into:
char line[1000];
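or, if you have limited stack space (or need the buffer to outlive the function, as getReplacement() does because it returns a pointer into line):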
char *line = malloc(1000 * sizeof(char));
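As an illustration of reading into real storage, here is a hedged sketch of the lookup that uses a local array for fgets() and copies the result out before returning. firstSynonym is a hypothetical name, and it deliberately simplifies the original: it always returns the first synonym instead of picking one at random, and it rewinds the thesaurus on every call.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical simplified lookup: returns the first synonym listed for
// `word` in a "word:syn1,syn2" file, or an empty string if none is found.
std::string firstSynonym(std::FILE* thesaurus, const char* word)
{
    char line[1000];  // real, writable storage for fgets()
    const std::size_t len = std::strlen(word);
    std::rewind(thesaurus);  // search from the top on every call
    while (std::fgets(line, sizeof line, thesaurus) != nullptr) {
        // The line must start with the word, immediately followed by ':'.
        if (std::strncmp(line, word, len) != 0 || line[len] != ':')
            continue;
        const char* start = line + len + 1;
        const std::size_t n = std::strcspn(start, ",\r\n");
        return std::string(start, n);  // copy out before line goes away
    }
    return std::string();
}
```

Returning a std::string sidesteps the lifetime question entirely, because the caller receives its own copy of the characters.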
Furthermore, in your main() function, you do:
char c;
while((c = fgetc(text)) != EOF)
fgetc() returns an int, not a char. This way, it can return a value corresponding to a valid character if a value is read, or a value that is outside that range if you hit the end of file.
You also can't compare strings in C using ==, as you do in if(word == ""); what that does is compare whether they are the same pointer, not whether they have the same contents. It doesn't really make sense to recalculate the length of the current word each time, either; why not just keep track of len yourself, incrementing it every time you add a character, and then when you want to check if the word is empty, check if len == 0? Remember to reset len to 0 after the end of the word so you'll start over on the next word. Also remember to reset if len goes over sizeof(word); you don't want to write more than word can hold, or you will start scribbling all over random stuff on your stack and lots of things will break.
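Put together, the advice above (an int for fgetc(), a tracked len, a reset after each word, and a bounds check) might look like the following sketch. splitWords is a hypothetical name, and truncating over-long words to fit the buffer is my own choice for handling the overflow case.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: splits a stream into alphabetic words while
// tracking the current word length by hand.
std::vector<std::string> splitWords(std::FILE* in)
{
    std::vector<std::string> words;
    char word[20];
    std::size_t len = 0;  // current word length, maintained manually
    for (;;) {
        int c = std::fgetc(in);  // int, so EOF is distinguishable
        if (c != EOF && std::isalpha(c)) {
            // Assumption: characters beyond the buffer are dropped.
            if (len < sizeof word - 1)
                word[len++] = static_cast<char>(c);
            continue;
        }
        if (len > 0) {  // a non-letter ends the current word
            word[len] = '\0';
            words.push_back(word);
            len = 0;    // start over for the next word
        }
        if (c == EOF)
            break;
    }
    return words;
}
```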