C++ Text File Input [closed] - c++

This is a relatively simple question but I can't seem to find an answer. I need to read every character from a text file excluding spaces.
I currently have:
fstream inFile(fileName, ios::in);
char ch;
while (!inFile.eof()){
ch = inFile.get();
This works for all letters and numbers but not for special characters. What alternative can I use to read everything except spaces?

Assuming the file is ASCII and contains no NUL characters, the following method could be used.
size_t ReadAllChars(char const* fileName, char **ppDestination)
{
//Check inputs
if(!fileName || !ppDestination)
{
//Handle errors;
return 0;
}
//open file for reading
FILE *pFile = fopen(fileName, "rb");
//check file successfully opened
if(!pFile)
{
//Handle error
return 0;
}
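//Seek to end of file (to get file length)
if(_fseeki64(pFile, 0, SEEK_END))
{
//Handle error
fclose(pFile);
return 0;
}
//Get file length (_ftelli64 returns -1 on failure, so check before converting to size_t)
__int64 rawLength = _ftelli64(pFile);
if(rawLength < 0)
{
//Handle error
fclose(pFile);
return 0;
}
size_t fileLength = (size_t)rawLength;
//Seek back to start of file
if(_fseeki64(pFile, 0, SEEK_SET))
{
//Handle error
fclose(pFile);
return 0;
}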
//Allocate memory to store entire contents of file
char *pRawSource = (char*)malloc(fileLength);
//Check that allocation succeeded
if(!pRawSource)
{
//Handle error
fclose(pFile);
return 0;
}
//Read entire file
if(fread(pRawSource, 1, fileLength, pFile) != fileLength)
{
//Handle error
fclose(pFile);
free(pRawSource);
return 0;
}
//Close file
fclose(pFile);
//count spaces
size_t spaceCount = 0;
for(size_t i = 0; i < fileLength; i++)
{
if(pRawSource[i] == ' ')
++spaceCount;
}
//allocate space for file contents not including spaces (plus a null terminator)
size_t resultLength = fileLength - spaceCount;
char *pResult = (char*)malloc(resultLength + 1);
//Check allocation succeeded
if(!pResult)
{
//Handle error
free(pRawSource);
return 0;
}
//Null terminate result
pResult[resultLength] = '\0';
//copy all characters except space into pResult
char *pNextTarget = pResult;
for(size_t i = 0; i < fileLength; i++)
{
if(pRawSource[i] != ' ')
{
*pNextTarget = pRawSource[i];
++pNextTarget;
}
}
//Free temporary buffer
free(pRawSource);
*ppDestination = pResult;
return resultLength;
}

You should open the file in binary mode

One of the simpler approaches is to check the ASCII value of each character as you iterate over the file.
If the ASCII value of the character is 0x20 (decimal 32, the code for SPACE), skip it with "continue"; otherwise just print it.

Assuming you are using the default C++ locale, you could read the data into a std::string and let std::istream& operator>>(std::istream&, std::string&) together with std::skipws do the work of skipping all whitespace for you.
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <utility>
int main(int, char* argv[])
{
const char *filename = /* filename */;
std::ifstream in{filename};
if (in.fail()) {
std::cerr << "Failed to open " << filename << std::endl;
return 1;
}
/*
* Actually, you can skip this line, because by default
* std::ifstream and other input streams skip leading white space before formatted input.
*/
in >> std::skipws;
std::vector<std::string> stringv;
// reserve to speed up, you can replace the new_cap with your guess
stringv.reserve(10);
std::string str;
/*
* While std::skipws tells the stream to skip leading white space before input,
* std::istream& operator>>(std::istream&, std::string&) stops as soon as it reads white space.
*/
while(in >> str)
stringv.push_back(std::move(str));
}
Edit:
I haven't tested this program yet, so there might be some compilation errors, but I am quite sure that this method should work.
Using !in.eof() tests whether end of file has been reached, but it doesn't test whether the extraction succeeded, which means you can end up with invalid data. in >> str fixes this because, after the extraction, the stream state (!in.fail()) indicates whether the extraction succeeded.

Related

Best way to read binary file c++ through input redirection

I am trying to read a large binary file through input redirection (stdin) at runtime, and stdin is mandatory.
./a.out < input.bin
So far I have used fgets. But fgets skips blanks and newline. I want to include both. My currentBuffersize could dynamically vary.
FILE * inputFileStream = stdin;
int currentPos = INIT_BUFFER_SIZE;
int currentBufferSize = 24; // opt
unsigned short int count = 0; // As Max number of packets 30,000/65,536
while (!feof(inputFileStream)) {
char buf[INIT_BUFFER_SIZE]; // size of byte
fgets(buf, sizeof(buf), inputFileStream);
cout<<buf;
cout<<endl;
}
Thanks in advance.
If it were me I would probably do something similar to this:
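#include <array>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <stdexcept>
#include <vector>
const std::size_t INIT_BUFFER_SIZE = 1024;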
int main()
{
try
{
// on some systems you may need to reopen stdin in binary mode
// this is supposed to be reasonably portable
// freopen reports failure by returning a null pointer
if(!std::freopen(nullptr, "rb", stdin))
throw std::runtime_error(std::strerror(errno));
std::size_t len;
std::array<char, INIT_BUFFER_SIZE> buf;
// somewhere to store the data
std::vector<char> input;
// use std::fread and remember to only use as many bytes as are returned
// according to len
while((len = std::fread(buf.data(), sizeof(buf[0]), buf.size(), stdin)) > 0)
{
// whoopsie
if(std::ferror(stdin) && !std::feof(stdin))
throw std::runtime_error(std::strerror(errno));
// use {buf.data(), buf.data() + len} here
input.insert(input.end(), buf.data(), buf.data() + len); // append to vector
}
// use input vector here
}
catch(std::exception const& e)
{
std::cerr << e.what() << '\n';
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Note that you may need to reopen stdin in binary mode. I am not sure how portable that is, but various documentation suggests it is reasonably well supported across systems.

OpenSSL SHA256 Wrong result

I have the following piece of code that is supposed to calculate the SHA256 of a file. I am reading the file chunk by chunk and calling EVP_DigestUpdate for each chunk. When I test the code with the file that has content
Test Message
Hello World
in Windows, it gives me SHA256 value of 97b2bc0cd1c3849436c6532d9c8de85456e1ce926d1e872a1e9b76a33183655f but the value is supposed to be 318b20b83a6730b928c46163a2a1cefee4466132731c95c39613acb547ccb715, which can be verified here too.
Here is the code:
#include <openssl\evp.h>
#include <iostream>
#include <string>
#include <fstream>
#include <cstdio>
const int MAX_BUFFER_SIZE = 1024;
std::string FileChecksum(std::string, std::string);
int main()
{
std::string checksum = FileChecksum("C:\\Users\\Dell\\Downloads\\somefile.txt","sha256");
std::cout << checksum << std::endl;
return 0;
}
std::string FileChecksum(std::string file_path, std::string algorithm)
{
EVP_MD_CTX *mdctx;
const EVP_MD *md;
unsigned char md_value[EVP_MAX_MD_SIZE];
int i;
unsigned int md_len;
OpenSSL_add_all_digests();
md = EVP_get_digestbyname(algorithm.c_str());
if(!md) {
printf("Unknown message digest %s\n",algorithm.c_str());
exit(1);
}
mdctx = EVP_MD_CTX_create();
std::ifstream readfile(file_path,std::ifstream::in|std::ifstream::binary);
if(!readfile.is_open())
{
std::cout << "Could not open file\n";
EVP_MD_CTX_destroy(mdctx);
return "";
}
readfile.seekg(0, std::ios::end);
long filelen = readfile.tellg();
std::cout << "LEN IS " << filelen << std::endl;
readfile.seekg(0, std::ios::beg);
if(filelen == -1)
{
std::cout << "Return Null \n";
EVP_MD_CTX_destroy(mdctx);
return "";
}
EVP_DigestInit_ex(mdctx, md, NULL);
long temp_fil = filelen;
while(!readfile.eof() && readfile.is_open() && temp_fil>0)
{
int bufferS = (temp_fil < MAX_BUFFER_SIZE) ? temp_fil : MAX_BUFFER_SIZE;
char *buffer = new char[bufferS+1];
buffer[bufferS] = 0;
readfile.read(buffer, bufferS);
std::cout << strlen(buffer) << std::endl;
EVP_DigestUpdate(mdctx, buffer, strlen(buffer));
temp_fil -= bufferS;
delete[] buffer;
}
EVP_DigestFinal_ex(mdctx, md_value, &md_len);
EVP_MD_CTX_destroy(mdctx);
printf("Digest is: ");
//char *checksum_msg = new char[md_len];
//int cx(0);
for(i = 0; i < md_len; i++)
{
//_snprintf(checksum_msg+cx,md_len-cx,"%02x",md_value[i]);
printf("%02x", md_value[i]);
}
//std::string res(checksum_msg);
//delete[] checksum_msg;
printf("\n");
/* Call this once before exit. */
EVP_cleanup();
return "";
}
I tried to write the hash generated by the program as a string using _snprintf, but it didn't work. How can I generate the correct hash and return the value as a string from the FileChecksum function? The platform is Windows.
EDIT: It seems the problem was caused by a CRLF issue. Because Windows saves the file using \r\n, the checksum calculated was different. How can I handle this?
MS-DOS used the CR-LF convention, so when you save the file on Windows, each line break is written as \r\n (carriage return plus newline). When you test with the online tool you mentioned, only the \n character is used.
So to compare, you have to compute the checksum of the string Test Message\r\nHello World\r\n, which is equivalent to creating and reading the file on Windows (as given above), which is the case here.
However, the checksum of a given file will be the same wherever it was created.
Note: your code works fine :)
It seems the problem was with the length value I passed to EVP_DigestUpdate. I had passed the value from strlen, but replacing it with bufferS fixed the issue.
The code was modified as:
while(!readfile.eof() && readfile.is_open() && temp_fil>0)
{
int bufferS = (temp_fil < MAX_BUFFER_SIZE) ? temp_fil : MAX_BUFFER_SIZE;
char *buffer = new char[bufferS+1];
buffer[bufferS] = 0;
readfile.read(buffer, bufferS);
EVP_DigestUpdate(mdctx, buffer, bufferS);
temp_fil -= bufferS;
delete[] buffer;
}
and to send the checksum string, I modified the code as:
EVP_DigestFinal_ex(mdctx, md_value, &md_len);
EVP_MD_CTX_destroy(mdctx);
char str[2 * EVP_MAX_MD_SIZE + 1] = { 0 }; // two hex digits per digest byte, plus the terminator
char *ptr = str;
std::string ret;
for(i = 0; i < md_len; i++)
{
//_snprintf(checksum_msg+cx,md_len-cx,"%02x",md_value[i]);
sprintf(ptr,"%02x", md_value[i]);
ptr += 2;
}
ret = str;
/* Call this once before exit. */
EVP_cleanup();
return ret;
As for the wrong checksum earlier, the problem was in how Windows stores line endings. As suggested by Zangetsu, Windows saved the text file with CRLF, but Linux and the site I mentioned earlier use LF, so the checksum values differed. For files other than text, e.g. a DLL, the code now computes the correct checksum as a string.

Read last X bytes of a file [closed]

Could anyone tell me a simple way, how to read the last X bytes of a specific file?
If I'm right, I should use ifstream, but I'm not sure how to use it. Currently I'm learning C++ ( at least I'm trying to learn :) ).
Input file streams have the seekg() method, which moves the read position to either an absolute or a relative position. One overload takes a position type that represents an absolute value. The other takes an offset type and a direction constant that determines what the offset is relative to. Negating the offset allows you to move backward. Specifying the end constant moves the position relative to the end of the file.
file.seekg(-x, std::ios_base::end);
This is a C solution, but works and handles errors. The trick is to use a negative index in fseek to "seek from EOF" (ie: seek from the "right").
#include <stdio.h>
#define BUF_SIZE (4096)
int main(void) {
int i;
const char* fileName = "test.raw";
char buf[BUF_SIZE] = { 0 };
int bytesRead = 0;
FILE* fp; /* handle for the input file */
size_t fileSize; /* size of the input file */
int lastXBytes = 100; /* number of bytes at the end-of-file to read */
/* open file as a binary file in read-only mode */
if ((fp = fopen(fileName, "rb")) == NULL) {
printf("Could not open input file; Aborting\n");
return 1;
}
/* find out the size of the file; reset pointer to beginning of file */
fseek(fp, 0L, SEEK_END);
fileSize = ftell(fp);
fseek(fp, 0L, SEEK_SET);
/* make sure the file is big enough to read lastXBytes of data */
if (fileSize < lastXBytes) {
printf("File too small; Aborting\n");
fclose(fp);
return 1;
} else {
/* read lastXBytes of file */
fseek(fp, -lastXBytes, SEEK_END);
bytesRead = fread(buf, sizeof(char), lastXBytes, fp);
printf("Read %d bytes from %s, expected %d\n", bytesRead, fileName, lastXBytes);
if (bytesRead > 0) {
for (i=0; i<bytesRead; i++) {
printf("%c", buf[i]);
}
}
}
fclose(fp);
return 0;
}
You need to use the seekg function and pass a negative offset from the end of the stream.
std::ifstream is("file.txt");
if (is)
{
is.seekg(-x, is.end); // x is the number of bytes to read before the end
}
#include <iostream>
#include <fstream>
using namespace std;
int main(int argc, char* argv[])
{
ifstream ifs("F:\\test.data", ifstream::binary);
if(ifs.fail())
{
cout << "Error:fail to open file" << endl;
return -1;
}
//read the last 10 bytes of file
const int X = 10;
char* buf = new char[X];
ifs.seekg(-X, ios::end);
ifs.read(buf, X);
ifs.close();
delete[] buf;
return 0;
}
Use seekg() for relative positioning from the end of the file, then use read():
ifstream ifs("test.txt");
int x=10;
char buffer[11]={};
ifs.seekg(-x, ios_base::end);
if (!ifs.read(buffer, x))
cerr << "There's a problem !\n";
else cout <<buffer<<endl;
Note that read() just takes the x bytes from the file and puts them in the buffer, without adding a '\0' at the end. So if you expect a C string, you have to make sure that your buffer ends with a 0.

Program crashes when reading a long text file - "*.exe has stopped working"

The title describes it all. I am reading various files in my program, and once it reaches a relatively large file, the program crashes.
I wrote a shortened version of my program that replicates the issue.
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <iostream>
#include <fstream>
#include <string>
#include <string.h>
char** load_File(char** preComputed, const int lines, const int sLength,
std::string fileName){
//Declarations
FILE *file;
int C = lines+1;
int R = sLength+2;
int i; //Dummy index
int len;
//Create 2-D array on the heap
preComputed = (char**) malloc(C*sizeof(char*));
for(i = 0; i<C; i++) preComputed[i] = (char *) malloc(R*sizeof(char));
//Need to free each element individually later on
//Create temprary char array
char* line = (char *) malloc(R*sizeof(char));
assert(preComputed);
//Open file to read and store values
file = fopen(fileName.c_str(), "r");
if(file == NULL){ perror("\nError opening file"); return NULL;}
else{
i = 0;
while(fgets(line, R, file) != NULL){
//Remove next line
len = R;
if((line[len-1]) == '\n') (line[len-1]) = '\0';
len--; // Decrement length by one because of replacing EOL
// with null terminator
//Copy character set
strcpy(preComputed[i], line);
i++;
}
preComputed[C-1] = NULL; //Append null terminator
free(line);
}
return preComputed;
}
int main(void){
char** preComputed = NULL;
std::string name = "alphaLow3.txt";
system("pause");
preComputed = load_File(preComputed, 17576, 3, name);
if(preComputed == NULL){
std::cout<<"\nAn error has been encountered...";
system("PAUSE");
exit(1);
}
//Free preComputed
for(int y = 0; y < 17576; y++){
free(preComputed[y]);
}
free(preComputed);
}
This program will crash when it is executed. Here are two links to the text files.
alphaLow3.txt
alphaLow2.txt
To run alphaLow2.txt, change the numbers in the load_file call to 676 and 2 respectively.
When this program reads alphaLow2.txt, it executes successfully. However, when it reads alphaLow3.txt, it crashes. This file is only 172KB. I have files that are a MB or larger. I thought I allocated enough memory, but I may be missing something.
The program is supposed to be in C, but I've included some C++ functions for ease.
Any constructive input is appreciated.
You must check your file length. The alphaLow3.txt file has a total of 35152 lines, but your program sets the line count to 17576, so the loop writes past the end of the array. This is the main reason for the crash.
In addition, consider this statement:
if((line[len-1]) == '\n') (line[len-1]) = '\0';
fgets always terminates the string with a null character. For example, the first line is read as 'a' 'a' 'a' '\n' '\0', so the newline is at index len-2, not len-1. You should write it like this:
if((line[len-2]) == '\n') (line[len-2]) = '\0';
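The fixed index above relies on every line filling the buffer exactly. A variant that does not depend on the line length could look like the sketch below. It is not from the answer: stripNewline is a hypothetical name, and treating '\r' as a line ending as well is an added assumption to cover files saved with CRLF.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: cuts `line` at the first '\r' or '\n' (if any) and
// returns the new length. Works for short lines, full buffers, and a final
// line that has no newline at all.
std::size_t stripNewline(char* line)
{
    // strcspn returns the number of leading characters that are not in the
    // reject set, which is the index of the first line-ending character or
    // the index of the terminator when there is none.
    const std::size_t len = std::strcspn(line, "\r\n");
    line[len] = '\0';
    return len;
}
```

Called right after a successful fgets(line, R, file), it replaces the manual len-2 arithmetic and leaves the string unchanged when no line ending is present.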

Help Editing Code to Fix "Argument list too long" Error

I am currently doing some testing with a new addition to the ICU dictionary-based break iterator.
I have code that allows me to test the word-breaking on a text document but when the text document is too large it gives the error: bash: ./a.out: Argument list too long
I am not sure how to edit the code to break-up the argument list when it gets too long so that a file of any size can be run through the code. The original code author is quite busy, would someone be willing to help out?
I tried removing the printing of what is being examined to see if that would help, but I still get the error on large files (printing what is being examined isn't necessary - I just need the result).
If the code could be modified to read the source text file line by line and export the results line by line to another text file (ending up with all the lines when it is done), that would be perfect.
The code is as follows:
/*
Written by George Rhoten to test how word segmentation works.
Code inspired by the break ICU sample.
Here is an example to run this code under Cygwin.
PATH=$PATH:icu-test/source/lib ./a.exe "`cat input.txt`" > output.txt
Encode input.txt as UTF-8.
The output text is UTF-8.
*/
#include <stdio.h>
#include <unicode/brkiter.h>
#include <unicode/ucnv.h>
#define ZW_SPACE "\xE2\x80\x8B"
void printUnicodeString(const UnicodeString &s) {
int32_t len = s.length() * U8_MAX_LENGTH + 1;
char *charBuf = new char[len];
len = s.extract(0, s.length(), charBuf, len, NULL);
charBuf[len] = 0;
printf("%s", charBuf);
delete[] charBuf;
}
/* Creating and using text boundaries */
int main(int argc, char **argv)
{
ucnv_setDefaultName("UTF-8");
UnicodeString stringToExamine("Aaa bbb ccc. Ddd eee fff.");
printf("Examining: ");
if (argc > 1) {
// Override the default charset.
stringToExamine = UnicodeString(argv[1]);
if (stringToExamine.charAt(0) == 0xFEFF) {
// Remove the BOM
stringToExamine = UnicodeString(stringToExamine, 1);
}
}
printUnicodeString(stringToExamine);
puts("");
//print each sentence in forward and reverse order
UErrorCode status = U_ZERO_ERROR;
BreakIterator* boundary = BreakIterator::createWordInstance(NULL, status);
if (U_FAILURE(status)) {
printf("Failed to create sentence break iterator. status = %s",
u_errorName(status));
exit(1);
}
printf("Result: ");
//print each word in order
boundary->setText(stringToExamine);
int32_t start = boundary->first();
int32_t end = boundary->next();
while (end != BreakIterator::DONE) {
if (start != 0) {
printf(ZW_SPACE);
}
printUnicodeString(UnicodeString(stringToExamine, start, end-start));
start = end;
end = boundary->next();
}
delete boundary;
return 0;
}
Thanks so much!
-Nathan
The Argument list too long error message comes from the bash shell and occurs before your code even starts executing.
The only code you can fix to eliminate this problem is the bash source code (or maybe it is in the kernel) and then, you're always going to run into a limit. If you increase from 2048 files on command line to 10,000, then some day you'll need to process 10,001 files ;-)
There are numerous solutions to managing 'too big' argument lists.
The standardized solution is the xargs utility.
find / -print | xargs echo
is an unhelpful, but working, example.
See How to use "xargs" properly when argument list is too long for more info.
Even xargs has problems, because file names can contain spaces, new-line chars, and other unfriendly stuff.
I hope this helps.
The code below reads the content of a file whose name is given as the first parameter on the command line and places it in a std::string named buffer. Then, instead of calling the UnicodeString constructor with argv[1], use that buffer.
#include<iostream>
#include<fstream>
using namespace std;
int main(int argc, char **argv)
{
std::string buffer;
if(argc > 1) {
std::ifstream t;
t.open(argv[1]);
std::string line;
while(std::getline(t, line)){
buffer += line + '\n';
}
}
cout << buffer;
return 0;
}
Update:
Input to UnicodeString should be char*. The function GetFileIntoCharPointer does that.
Note that only the most rudimentary error checking is implemented below!
#include<iostream>
#include<fstream>
#include<cstdio>
using namespace std;
char * GetFileIntoCharPointer(char *pFile, long &lRet)
{
FILE * fp = fopen(pFile,"rb");
if (fp == NULL) return 0;
fseek(fp, 0, SEEK_END);
long size = ftell(fp);
fseek(fp, 0, SEEK_SET);
char *pData = new char[size + 1];
lRet = fread(pData, sizeof(char), size, fp);
pData[lRet] = '\0'; // terminate so the buffer can be used as a C string
fclose(fp);
return pData;
}
int main(int argc, char **argv)
{
long Len;
if (argc < 2) return 1;
char * Data = GetFileIntoCharPointer(argv[1], Len);
if (Data == NULL) return 1;
std::cout << Data << std::endl;
delete [] Data;
return 0;
}