Can't copy the whole text file to char array - C++

I am trying to copy a whole text file into a char array using fstream, but even after increasing the size of the array it only reads the file up to the same limit. I am bound to save it in a char array, and it would be best if it is not a dynamic one. Any solution, please?
// smallGrams.cpp : Defines the entry point for the console application.
//
//#include "stdafx.h"
#include<iostream>
#include<string>
#include<fstream>
using namespace std;

void readInput(const char* Path);
void removePunctucationMarks();
void removeSpacing();
void insertDots();
char * getText();
void generateUnigrams();
void generateBigrams();
void generateTrigrams();
double validateSentance(string str);
string sentenceCreation(int position);

int main()
{
    char *path="alice.txt";
    readInput(path);
    return 0;
}

void readInput(const char* Path)
{
    ifstream infile;
    infile.open(Path);
    if(!infile.fail())
        cout<<"File opened successfully"<<endl;
    else
        cout<<"File failed to open"<<endl;

    int arrSize=100000000;
    char *arr=new char[arrSize];
    int i=0;
    while(!infile.eof()&&i<arrSize)
    {
        infile.get(arr[i]);
        i++;
    }
    arr[i-1]='\0';

    for(short i=0;i<arrSize&&arr[i]!='\0';i++)
    {
        cout<<arr[i];
    }
}

This is a C-style solution that works. It checks the file size, allocates the necessary memory for the array, and reads the whole content of the file in one call. fread() returns the number of elements actually read; if it is less than requested, an error occurred or the end of the file was reached (check the fread() reference).
#include <cstring>
#include <cstdlib>
#include <cstdio>

int main(int argc, char *argv[]) {
    char *data;
    long data_len;
    FILE *fd;

    fd = fopen("file.txt", "r");
    if (fd == NULL) {
        // error
        return -1;
    }

    fseek(fd, 0, SEEK_END);
    data_len = ftell(fd);
    rewind(fd);

    data = (char *) malloc((data_len + 1) * sizeof(char));
    memset(data, 0, data_len + 1);  /* zero-fill so the buffer is NUL-terminated */

    if (fread(data, sizeof(char), data_len, fd) != (size_t) data_len) {
        // error
        return -1;
    }

    printf("%s\n", data);

    fclose(fd);
    free(data);
    return 0;
}

Here is a version with a simple doubling method:
#include<iostream>
#include<string>
#include<fstream>
#include <cstdint>
#include <cstring>
using namespace std;

void readInput(const char* Path)
{
    ifstream infile;
    infile.open(Path);
    if(!infile.fail())
        cout<<"File opened successfully"<<endl;
    else{
        cout<<"File failed to open"<<endl;
        return;
    }

    int capacity=1000;
    char *arr=new char[capacity];
    char *temp;
    int i=0;
    while(infile >> arr[i])
    {
        i++;
        if ( i >= capacity ) {
            temp = new char[capacity*2];
            std::memcpy(temp, arr, capacity);
            delete [] arr;
            arr = temp;
            capacity *= 2;
        }
    }
    delete [] arr;  // free the buffer when done with it
}

int main()
{
    const char *path="alice.txt";
    readInput(path);
    return 0;
}

The error is likely in the for loop where you display the array contents, not in reading the data from the file.
Use int instead of short in the for loop: a short can only count up to 32767, so the counter overflows long before reaching arrSize.


Printing array of char pointers

I am trying to read two lines from a file using an array of pointers. However, I am not getting anything on screen. I have tried searching online but could not solve the problem. Here is my code, written with NetBeans on a Mac.
int main(int argc, char** argv) {
    FILE *fp;
    char *points[50];
    char c;
    int i=0;

    fp=fopen("/Users/shubhamsharma/Desktop/data.txt","r");
    if(fp==NULL)
    {
        printf("Reached here");
        fprintf(stderr," Could not open the File!");
        exit(1);
    }
    c=getc(fp);
    while(c!=EOF)
    {
        *points[i]=c;
        c=getc(fp);
        i++;
    }
    for(int i=0;*points[i]!='\0';i++)
    {
        char d=*points[i];
        printf("%c",d);
        if(*(points[i+1])==',')
        {
            i=i+1;
        }
    }
    return (EXIT_SUCCESS);
}
char *points[50];

is not what you want: this is an array of 50 pointers to char, none of which point to allocated storage. If you want an array of two char[50] buffers, you need a pointer to arrays of 50 char:

char (*points)[50];
points = malloc(sizeof(*points) * 2);

Also note that fgets is preferred for reading a line from a file:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *fp;
    char (*points)[50];

    points = malloc(sizeof(*points) * 2);
    if (points == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    fp = fopen("/Users/shubhamsharma/Desktop/data.txt", "r");
    if (fp == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }
    fgets(points[0], sizeof(*points), fp);
    fgets(points[1], sizeof(*points), fp);
    fclose(fp);
    printf("%s", points[0]);
    printf("%s", points[1]);
    free(points);
    return 0;
}

Process terminates when function is called

I'm trying to write a function that reads and prints the contents of a file. I pass the filename as a parameter. I use FILE *testfile as a file handle and fread to read the file. block_t is a struct, and nreserved is the number of reserved segments of the block. Each block has records; I don't think the full details of how block_t is created are necessary here.
My problem is that even though the function runs and I can see the results I want in the console, the process terminates abnormally. This happens even if I comment out the if/else parts. I get the message Process terminated with status -1073741510.
Here is my code:
#include "dbtproj.h"
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <iostream>
using namespace std;

void showEntriesOfBlock(char *filename){
    FILE *testfile;
    block_t block;
    int nreserved;

    //open file and print contents
    testfile = fopen(filename,"r");
    if(testfile==NULL)
        cout << "Error";
    else{
        while(!feof(testfile)){
            fread(&block, 1, sizeof(block_t), testfile);
            nreserved = block.nreserved;
            //print block contents
            for (int i=0; i<nreserved; ++i) {
                printf("this is block id: %d, record id: %d, num: %d, str: %s\n",
                       block.blockid, block.entries[i].recid, block.entries[i].num,
                       block.entries[i].str);
            }
        }
    }
    fclose(testfile);
};
In my main file I create a file by using outfile = fopen("file.bin", "w"); then I write random data to the file. Then I close the file with fclose(outfile); and in the next line I call my function like this showEntriesOfBlock("file.bin");
Can anybody help? I think I might have messed up my pointers or done something wrong with the file handles.
This is how I give data to my blocks and records.
for (int b=0; b<nblocks; ++b) { // for each block
    block.blockid = b;
    for (int r=0; r<MAX_RECORDS_PER_BLOCK; ++r) { // for each record
        // prepare a record
        record.recid = recid++;
        record.num = rand() % 1000;
        strcpy(record.str,"hello"); // put the same string to all records
        record.valid = true;
        memcpy(&block.entries[r], &record, sizeof(record_t)); // copy record to block
    }
    block.nreserved = MAX_RECORDS_PER_BLOCK;
    block.valid = true;
    fwrite(&block, 1, sizeof(block_t), outfile); // write the block to the file
}
fclose(outfile);
And here are the definitions of my structs:
// This is the definition of a record of the input file. Contains three fields, recid, num and str
typedef struct {
    unsigned int recid;
    unsigned int num;
    char str[STR_LENGTH];
    bool valid; // if set, then this record is valid
} record_t;

// This is the definition of a block, which contains a number of fixed-sized records
typedef struct {
    unsigned int blockid;
    unsigned int nreserved; // how many reserved entries
    record_t entries[MAX_RECORDS_PER_BLOCK]; // array of records
    bool valid; // if set, then this block is valid
    unsigned char misc;
    unsigned int next_blockid;
    unsigned int dummy;
} block_t;
Here's a working version using FILE* (which I wouldn't recommend if you're learning...)
NOTE: open your files in binary mode : fopen(filename, "wb") or fopen(filename, "rb")
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <iostream>
#include <cassert>
#include <fstream>
#include <string>

const int STR_LENGTH = 10;
const int MAX_RECORDS_PER_BLOCK = 5;

//! For my test I assumed the following definitions.
//! (i.e. that block_t is a POD.)

// This is the definition of a record of the input file. Contains three fields, recid, num and str
typedef struct
{
    unsigned int recid;
    unsigned int num;
    char str[STR_LENGTH];
    bool valid; // if set, then this record is valid
} record_t;

// This is the definition of a block, which contains a number of fixed-sized records
typedef struct
{
    unsigned int blockid;
    unsigned int nreserved; // how many reserved entries
    record_t entries[MAX_RECORDS_PER_BLOCK]; // array of records
    bool valid; // if set, then this block is valid
    unsigned char misc;
    unsigned int next_blockid;
    unsigned int dummy;
} block_t;

void showEntriesOfBlock(const char *filename)
{
    FILE* testfile = fopen(filename, "rb");
    assert(testfile);
    if (!testfile)
    {
        perror("Error");
        return;
    }

    block_t block;
    while (fread(reinterpret_cast<char*>(&block), sizeof(block_t), 1, testfile))
    {
        if (ferror(testfile))
        {
            perror("Error while reading");
            return;
        }
        //print block contents
        for (unsigned int i = 0; i < block.nreserved; ++i)
        {
            printf("this is block id: %d, record id: %d, num: %d, str: %s\n",
                   block.blockid, block.entries[i].recid, block.entries[i].num,
                   block.entries[i].str);
        }
    }
    fclose(testfile);
}

int main(int argc, const char *argv[])
{
    std::string filename = "g:/test.dat";
    FILE* outfile = fopen(filename.c_str(), "wb");

    int nblocks = 10;
    int recid = 0;
    for (int b = 0; b < nblocks; ++b)
    {
        block_t block;
        block.blockid = b;
        for (int r = 0; r < MAX_RECORDS_PER_BLOCK; ++r)
        {
            // prepare a record
            record_t record;
            record.recid = recid++;
            record.num = rand() % 1000;
            strcpy(record.str, "hello"); // put the same string to all records
            record.valid = true;
            memcpy(&block.entries[r], &record, sizeof(record_t)); // copy record to block
        }
        block.nreserved = MAX_RECORDS_PER_BLOCK;
        block.valid = true;
        fwrite(&block, sizeof(block_t), 1, outfile); // write the block to the file
    }
    fclose(outfile);

    showEntriesOfBlock(filename.c_str());
    return 0;
}
Try this:
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <iostream>
#include <cassert>
#include <fstream>
#include <type_traits>

void showEntriesOfBlock(char *filename)
{
    std::ifstream testfile(filename, std::ios_base::binary);
    assert(testfile);
    if (!testfile)
    {
        std::cout << "Error";
        return;
    }

    //! This assumes block_t is a POD.
    static_assert(std::is_pod<block_t>::value, "block_t is not a POD.");

    block_t block;
    // Test the stream state returned by read() itself, so a failed final
    // read does not leave stale data to be printed.
    while (testfile.read(reinterpret_cast<char*>(&block), sizeof(block_t)))
    {
        //print block contents
        for (unsigned int i = 0; i < block.nreserved; ++i)
        {
            printf("this is block id: %d, record id: %d, num: %d, str: %s\n",
                   block.blockid, block.entries[i].recid, block.entries[i].num,
                   block.entries[i].str);
        }
    }
    testfile.close();
}

Stack check fail in SHA-1 C++

I'm getting a __stack_chk_fail in the main thread.
I have no idea why this is happening.
I got the code from this website:
http://www.packetizer.com/security/sha1/
I'm trying to add a function that computes the digest of a file, based on the example.
.h file
#include <stdio.h>
#include <string>
std::string digestFile( char *filename );
.cpp file
std::string SHA1::digestFile( char *filename )
{
    Reset();

    FILE *fp = NULL;
    if (!(fp = fopen(filename, "rb")))
    {
        printf("sha: unable to open file %s\n", filename);
        return NULL;
    }

    char c = fgetc(fp);
    while(!feof(fp))
    {
        Input(c);
        c = fgetc(fp);
    }
    fclose(fp);

    unsigned message_digest[5];
    if (!Result(message_digest))
    { printf("sha: could not compute message digest for %s\n", filename); }

    std::string hash;
    for (int i = 0; i < 5; i++)
    {
        char buffer[8];
        int count = sprintf(buffer, "%08x", message_digest[i]);
        if (count != 8)
        { printf("converting unsiged to char ERROR"); }
        hash.append(buffer);
    }
    return hash;
}
__stack_chk_fail fires when the stack protector detects a write past the end of a stack buffer.
It turns out you do:
char buffer[8];
int count = sprintf(buffer, "%08x", message_digest[i]);
C strings are NUL-terminated. That means that when sprintf writes 8 digits, it adds a 9th char, '\0'. But buffer only has space for 8 chars, so the 9th goes past the end of the buffer.
You need char buffer[9]. Or do it the C++ way with std::stringstream, which does not involve any fixed sizes and thus no risk of buffer overrun.

Possible memory leak in string memory allocation

This is my code:
#include <string>
#include <iostream>
#include <cstdio>
#include <cstdlib>

std::string & fileread(const char * name)
{
    FILE *fp = fopen(name,"rb");
    size_t sz;
    int i;
    char *buff;

    fseek(fp, 0, SEEK_END);
    sz = ftell(fp);
    fseek(fp, 0, SEEK_SET);

    buff = (char *)malloc(sizeof(char)*(sz+1));
    buff[sz] = '\0';
    fread(buff,sz,1,fp);

    std::string * rtstr = new std::string(buff);
    free(buff);
    fclose(fp);
    return * rtstr;
}

int main(int argc,char * argv[])
{
    std::string file_info(fileread(argv[1]));
    std::cout<<file_info << std::endl;
    return 0;
}
It simply reads one file and prints its content to the screen.
In the function fileread, I use new std::string(buff); to get a std::string *, and return a reference to it. Will this cause a memory leak? And if the answer is 'yes', how do I avoid it?
About using C in C++: fread is much faster than ifstream (tested with 1 billion random numbers).
My problem is about the memory leak.
Return std::string by value. Don't worry, C++ will take care of not copying the object redundantly (unless you have a very old compiler).
Here is the code, fixed:
#include <string>
#include <iostream>
#include <cstdio>
#include <cstdlib>

std::string fileread(const char * name)
{
    FILE *fp = fopen(name,"rb");
    size_t sz;
    char *buff;

    fseek(fp, 0, SEEK_END);
    sz = ftell(fp);
    fseek(fp, 0, SEEK_SET);

    buff = (char *)malloc(sizeof(char)*(sz+1));
    buff[sz] = '\0';
    fread(buff,sz,1,fp);

    std::string rtstr(buff);
    free(buff);
    fclose(fp);
    return rtstr;
}

int main(int argc,char * argv[])
{
    std::string file_info(fileread(argv[1]));
    std::cout<<file_info << std::endl;
    return 0;
}
I made only the small change necessary and ignored any other problems you might have in your code. Take care.
As Nawaz rightly commented: "DON'T code C in C++. Use std::ifstream and std::string (not std::string*)". Here is the C++ way, which avoids all these issues:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main () {
    string line;
    ifstream myfile ("myfile.txt");
    if (myfile.is_open())
    {
        while ( getline (myfile,line) )
        {
            cout << line << '\n';
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    return 0;
}

C++ Read large data, parse, then write data

I'm trying to read a large dataset, format it the way I need, and then write it to another file. I'm trying to use C++ over SAS or Stata for the speed advantage. The data files are usually around 10 gigabytes, and my current code takes over an hour to run (and then I kill it, because I'm sure something in my code is very inefficient).
Is there a more efficient way to do this? Maybe read the file into memory and then analyze it using the switch statements? (I have 32 GB of RAM on 64-bit Linux.) Is it possible that reading and then writing within the loop slows it down, since it is constantly reading, then writing? I tried reading from one drive and writing to another in an attempt to speed this up.
Are the switch cases slowing it down?
The process I have now reads the data using getline, uses a switch statement to parse it correctly, and then writes it to my outfile, repeating for 300 million lines. There are about 10 more cases in the switch statement, but I didn't copy them for brevity's sake.
The code is probably very ugly, all being in the main function, but I wanted to get it working before I worked on attractiveness.
I've tried using read() but without any success. Please let me know if I need to clarify anything.
Thank you for the help!
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <stdio.h>
//#include <cstring>
//#include <boost/algorithm/string.hpp>
#include <vector>
using namespace std;
//using namespace boost;

struct dataline
{
    char type[0];
    double second;
    short mill;
    char event[1];
    char ticker[6];
    char marketCategory[1];
    char financialStatus[1];
    int roundLotSize;
    short roundLotOnly;
    char tradingState[1];
    char reserved[1];
    char reason[4];
    char mpid[4];
    char primaryMarketMaker[1];
    char primaryMarketMode[1];
    char marketParticipantState[1];
    unsigned long orderNumber;
    char buySell[0];
    double shares;
    float price;
    int executedShares;
    double matchNumber;
    char printable[1];
    double executionPrice;
    int canceledShares;
    double sharesBig;
    double crossPrice;
    char crossType[0];
    double pairedShares;
    double imbalanceShares;
    char imbalanceDirection[1];
    double fairPrice;
    double nearPrice;
    double currentReferencePrice;
    char priceVariationIndicator[1];
};
int main ()
{
    string a;
    string b;
    string c;
    string d;
    string e;
    string f;
    string g;
    string h;
    string k;
    string l;
    string times;
    string smalltimes;
    short time;       //counter to keep second filled
    short smalltime;  //counter to keep millisecond filled
    double N;
    double NN;
    double NNN;
    int length;
    char M;
    //vector<> fout;
    string line;

    ofstream fout ("/media/3tb/test.txt");
    ifstream myfile;
    myfile.open("S050508-v3.txt");
    dataline oneline;

    if (myfile.is_open())
    {
        while ( myfile.good() )
        {
            getline (myfile,line);
            // cout << line<<endl;
            a=line.substr(0,1);
            stringstream ss(a);
            char type;
            ss>>type;
            switch (type)
            {
                case 'T':
                {
                    if (type == 'T')
                    {
                        times=line.substr(1,5);
                        stringstream s(times);
                        s>>time;
                        //oneline.second=time;
                        //cout<<time<<endl;
                    }
                    else
                    {
                        time=time;
                    }
                    break;
                }
                case 'M':
                {
                    if (type == 'M')
                    {
                        smalltimes=line.substr(1,3);
                        stringstream ss(smalltimes);
                        ss>>smalltime; //oneline.mill;
                        // cout<<smalltime<<endl; //smalltime=oneline.mill;
                    }
                    else
                    {
                        smalltime=smalltime;
                    }
                    break;
                }
                case 'R':
                {
                    oneline.second=time;
                    oneline.mill=smalltime;
                    a=line.substr(0,1);
                    stringstream ss(a);
                    ss>>oneline.type;
                    b=line.substr(1,6);
                    stringstream sss(b);
                    sss>>oneline.ticker;
                    c=line.substr(7,1);
                    stringstream ssss(c);
                    ssss>>oneline.marketCategory;
                    d=line.substr(8,1);
                    stringstream sssss(d);
                    sssss>>oneline.financialStatus;
                    e=line.substr(9,6);
                    stringstream ssssss(e);
                    ssssss>>oneline.roundLotSize;
                    f=line.substr(15,1);
                    stringstream sssssss(f);
                    sssssss>>oneline.roundLotOnly;
                    *oneline.tradingState=0;
                    *oneline.reserved=0;
                    *oneline.reason=0;
                    *oneline.mpid=0;
                    *oneline.primaryMarketMaker=0;
                    *oneline.primaryMarketMode=0;
                    *oneline.marketParticipantState=0;
                    oneline.orderNumber=0;
                    *oneline.buySell=0;
                    oneline.shares=0;
                    oneline.price=0;
                    oneline.executedShares=0;
                    oneline.matchNumber=0;
                    *oneline.printable=0;
                    oneline.executionPrice=0;
                    oneline.canceledShares=0;
                    oneline.sharesBig=0;
                    oneline.crossPrice=0;
                    *oneline.crossType=0;
                    oneline.pairedShares=0;
                    oneline.imbalanceShares=0;
                    *oneline.imbalanceDirection=0;
                    oneline.fairPrice=0;
                    oneline.nearPrice=0;
                    oneline.currentReferencePrice=0;
                    *oneline.priceVariationIndicator=0;
                    break;
                } //End Case
            } //End Switch
        } //end While
        myfile.close();
    } //End If
    else cout << "Unable to open file";
    cout<<"Junk"<<endl;
    return 0;
}
UPDATE: So I've been trying to use a memory map, but now I'm getting a segmentation fault.
I've been following different examples to piece together something that works for my case. Why would I be getting a segmentation fault? I've taken the first part of my code, which looks like this:
int main (int argc, char** path)
{
    long i;
    int fd;
    char *map;
    char *FILEPATH = path;
    unsigned long FILESIZE;

    FILE* fp = fopen(FILEPATH, "/home/brian/Desktop/S050508-v3.txt");
    fseek(fp, 0, SEEK_END);
    FILESIZE = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    fclose(fp);

    fd = open(FILEPATH, O_RDONLY);
    map = (char *) mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);

    char z;
    stringstream ss;
    for (long i = 0; i <= FILESIZE; ++i)
    {
        z = map[i];
        if (z != '\n')
        {
            ss << z;
        }
        else
        {
            // c style tokenizing
            ss.str("");
        }
    }
    if (munmap(map, FILESIZE) == -1) perror("Error un-mmapping the file");
    close(fd);
The data file are usually around 10gigabytes.
...
Are the switch cases slowing it down?
Almost certainly not; it smells like you're I/O bound. But you should consider measuring it. Modern CPUs have performance counters which are pretty easy to leverage with the right tools. Let's start by partitioning the problem into major domains: I/O to devices, load/store to memory, and CPU. You can place markers in your code where you read a clock in order to understand how long each operation takes. On Linux you can use clock_gettime() or the rdtsc instruction to access a clock with higher precision than the OS tick.
Consider mmap/CreateFileMapping, either of which might provide better efficiency/throughput to the pages you're accessing.
Consider large/huge pages if streaming through large amounts of data which has already been paged in.
From the manual for mmap():
Description
mmap() creates a new mapping in the virtual address space of the
calling process. The starting address for the new mapping is specified
in addr. The length argument specifies the length of the mapping.
Here's an mmap() example:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

#define FILEPATH "/tmp/mmapped.bin"
#define NUMINTS  (1000)
#define FILESIZE (NUMINTS * sizeof(int))

int main(int argc, char *argv[])
{
    int i;
    int fd;
    int *map; /* mmapped array of int's */

    fd = open(FILEPATH, O_RDONLY);
    if (fd == -1) {
        perror("Error opening file for reading");
        exit(EXIT_FAILURE);
    }

    map = mmap(0, FILESIZE, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        perror("Error mmapping the file");
        exit(EXIT_FAILURE);
    }

    /* Read the file int-by-int from the mmap; the index stays within
       [0, NUMINTS) so we never read past the end of the mapping. */
    for (i = 0; i < NUMINTS; ++i) {
        printf("%d: %d\n", i, map[i]);
    }

    if (munmap(map, FILESIZE) == -1) {
        perror("Error un-mmapping the file");
    }
    close(fd);
    return 0;
}