In Python, the built-in open function has an errors='ignore' option:
open( '/filepath.txt', 'r', encoding='UTF-8', errors='ignore' )
With this option, invalid UTF-8 byte sequences are silently dropped when the file is read, i.e., they are ignored. For example, a file with the characters Føö»BÃ¥r is read as FøöBår.
If a line such as Føö»BÃ¥r is read with getline() from stdio.h, it is read as Føö�Bår:
FILE* cfilestream = fopen( "/filepath.txt", "r" );
size_t linebuffersize = 131072;  // getline() takes a size_t*, not an int*
char* readline = (char*) malloc( linebuffersize );
while( true )
{
    if( getline( &readline, &linebuffersize, cfilestream ) != -1 ) {
        std::cerr << "readline=" << readline << std::endl;
    }
    else {
        break;
    }
}
free( readline );
fclose( cfilestream );
How can I make stdio.h getline() read it as FøöBår instead of Føö�Bår, i.e., ignoring invalid UTF-8 characters?
One heavy-handed solution I can think of is to iterate over all characters of each line read and build a new line without any of these characters. For example:
FILE* cfilestream = fopen( "/filepath.txt", "r" );
size_t linebuffersize = 131072;
char* readline = (char*) malloc( linebuffersize );
char* fixedreadline = (char*) malloc( linebuffersize );
ssize_t index;
ssize_t charsread;
ssize_t invalidcharsoffset;
while( true )
{
    if( ( charsread = getline( &readline, &linebuffersize, cfilestream ) ) != -1 )
    {
        invalidcharsoffset = 0;
        for( index = 0; index < charsread; ++index )
        {
            // Note: '�' (U+FFFD) is three bytes in UTF-8, so this multicharacter
            // literal can never equal a single char and nothing is ever skipped
            if( readline[index] != '�' ) {
                fixedreadline[index-invalidcharsoffset] = readline[index];
            }
            else {
                ++invalidcharsoffset;
            }
        }
        fixedreadline[index-invalidcharsoffset] = '\0';
        std::cerr << "fixedreadline=" << fixedreadline << std::endl;
    }
    else {
        break;
    }
}
Related questions:
Fixing invalid UTF8 characters
Replacing non UTF8 characters
python replace unicode characters
Python unicode: how to replace character that cannot be decoded using utf8 with whitespace?
You are confusing what you see with what is really going on. The getline function does not do any replacement of characters. [Note 1]
You are seeing a replacement character (U+FFFD) because your console outputs that character when it is asked to render an invalid UTF-8 sequence. Most consoles will do that if they are in UTF-8 mode, that is, if the current locale is UTF-8.
Also, saying that a file contains the "characters Føö»BÃ¥r" is at best imprecise. A file does not really contain characters. It contains byte sequences which may be interpreted as characters -- for example, by a console or other user presentation software which renders them into glyphs -- according to some encoding. Different encodings produce different results; in this particular case, you have a file which was created by software using the Windows-1252 encoding (or, roughly equivalently, ISO 8859-15), and you are rendering it on a console using UTF-8.
What that means is that the data read by getline contains an invalid UTF-8 sequence, but it (probably) does not contain the bytes of the replacement character. Based on the character string you present, it contains the byte \xBB, which is a right-pointing guillemet (») in Windows code page 1252.
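To confirm that getline does not substitute anything, it can help to print the raw bytes of a line in hex instead of letting the console render them. A minimal sketch; the helper name hex_bytes is hypothetical and not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats buf[0..len) as space-separated two-digit
// hex values, so a line returned by getline() can be inspected byte by
// byte instead of being rendered (and possibly replaced) by the console.
std::string hex_bytes(const char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    std::string out;
    char tmp[4];  // " XX" plus the terminating NUL
    for (std::size_t i = 0; i < len; ++i) {
        unsigned byte = static_cast<unsigned char>(buf[i]);
        std::snprintf(tmp, sizeof tmp, i == 0 ? "%02X" : " %02X", byte);
        out += tmp;
    }
    return out;
}
```

For a line containing the byte \xBB described above, hex_bytes( readline, charsread ) shows BB where the console displayed �, rather than EF BF BD (the UTF-8 encoding of U+FFFD).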
Finding all the invalid UTF-8 sequences in a string read by getline (or any other C library function which reads files) requires scanning the string, but not for a particular code sequence. Rather, you need to decode UTF-8 sequences one at a time, looking for the ones which are not valid. That's not a simple task, but the mbtowc function can help (if you have enabled a UTF-8 locale, for example with setlocale(LC_ALL, "") in a UTF-8 environment). As the mbtowc manpage explains, mbtowc returns the number of bytes contained in a valid "multibyte sequence" (which is UTF-8 in a UTF-8 locale), or -1 to indicate an invalid or incomplete sequence. In the scan, you should pass through the bytes of a valid sequence, or remove/ignore the single byte starting an invalid sequence, and then continue the scan until you reach the end of the string.
Here's some lightly-tested example code (in C):
#include <stdlib.h>
#include <string.h>

/* Removes in place any invalid UTF-8 sequences from at most 'len' bytes of the
 * string pointed to by 's'. (If a NUL byte is encountered, conversion stops.)
 * If the length of the converted string is less than 'len', a NUL byte is
 * inserted.
 * Returns the length of the possibly modified string (with a maximum of 'len'),
 * not including the NUL terminator (if any).
 * Requires that a UTF-8 locale be active; since there is no way to test for
 * this condition, no attempt is made to do so. If the current locale is not UTF-8,
 * behaviour is undefined.
 */
size_t remove_bad_utf8(char* s, size_t len) {
  char* in = s;
  /* Skip over the initial correct sequence. Avoid relying on mbtowc returning
   * zero if n is 0, since Posix is not clear whether mbtowc returns 0 or -1.
   */
  int seqlen;
  while (len && (seqlen = mbtowc(NULL, in, len)) > 0) { len -= seqlen; in += seqlen; }
  char* out = in;
  if (len && seqlen < 0) {
    ++in;
    --len;
    /* If we find an invalid sequence, we need to start shifting correct sequences. */
    for (; len; in += seqlen, len -= seqlen) {
      seqlen = mbtowc(NULL, in, len);
      if (seqlen > 0) {
        /* Shift the valid sequence (if one was found) */
        memmove(out, in, seqlen);
        out += seqlen;
      }
      else if (seqlen < 0) seqlen = 1;
      else /* (seqlen == 0) */ break;
    }
    /* Terminate without advancing 'out', so the returned length excludes the NUL. */
    *out = 0;
  }
  return out - s;
}
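The mbtowc approach depends on an active UTF-8 locale. If you would rather not depend on the locale, the same "decode one sequence at a time" scan can be done with a hand-written UTF-8 validator. This is a swapped-in technique, not part of the answer above: a minimal sketch that follows the well-formed UTF-8 byte ranges of RFC 3629, with hypothetical function names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns the length (1-4) of the well-formed UTF-8
// sequence at the start of s[0..n), or 0 if those bytes do not begin one.
// Overlong forms, UTF-16 surrogates and code points above U+10FFFF are
// rejected, following the byte ranges of RFC 3629.
std::size_t utf8_seq_len(const unsigned char* s, std::size_t n) {
    if (n == 0) return 0;
    const unsigned char b0 = s[0];
    if (b0 < 0x80) return 1;                          // ASCII (including NUL)
    std::size_t len;
    unsigned char lo = 0x80, hi = 0xBF;               // allowed range of s[1]
    if (b0 >= 0xC2 && b0 <= 0xDF)      len = 2;
    else if (b0 == 0xE0)             { len = 3; lo = 0xA0; }  // no overlongs
    else if (b0 >= 0xE1 && b0 <= 0xEC) len = 3;
    else if (b0 == 0xED)             { len = 3; hi = 0x9F; }  // no surrogates
    else if (b0 >= 0xEE && b0 <= 0xEF) len = 3;
    else if (b0 == 0xF0)             { len = 4; lo = 0x90; }  // no overlongs
    else if (b0 >= 0xF1 && b0 <= 0xF3) len = 4;
    else if (b0 == 0xF4)             { len = 4; hi = 0x8F; }  // <= U+10FFFF
    else return 0;                    // stray continuation, C0, C1, F5..FF
    if (n < len) return 0;                            // truncated sequence
    if (s[1] < lo || s[1] > hi) return 0;
    for (std::size_t i = 2; i < len; ++i)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;     // not a continuation byte
    return len;
}

// Hypothetical helper: removes invalid UTF-8 from buf[0..len) in place and
// returns the new length. Each invalid position costs one byte, then the
// scan resynchronizes on the next byte, as in remove_bad_utf8() above.
// Unlike remove_bad_utf8(), it does not stop at a NUL byte and does not
// write a terminator.
std::size_t strip_invalid_utf8(char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    const unsigned char* in = reinterpret_cast<const unsigned char*>(buf);
    std::size_t r = 0, w = 0;                         // read / write offsets
    while (r < len) {
        const std::size_t k = utf8_seq_len(in + r, len - r);
        if (k == 0) { ++r; continue; }                // drop one bad byte
        std::memmove(buf + w, buf + r, k);            // w <= r: may overlap
        w += k;
        r += k;
    }
    return w;
}
```

In the question's loop, something like charsread = strip_invalid_utf8( readline, charsread ); followed by readline[charsread] = '\0'; would drop invalid bytes such as the stray \xBB while keeping every well-formed sequence. It has not been benchmarked against the versions below.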
Notes
1. Aside from the possible line-end translation done by the underlying I/O library, which replaces CR-LF with a single \n on systems such as Windows, where the two-character CR-LF sequence is used as the line-end indication.
As @rici explains well in his answer, there can be several invalid UTF-8 sequences in a byte sequence.
Possibly iconv(3) could be worth a look, e.g. see https://linux.die.net/man/3/iconv_open.
When the string "//IGNORE" is appended to tocode, characters that cannot be represented in the target character set will be silently discarded.
Example
This byte sequence, if interpreted as UTF-8, contains some invalid UTF-8:
"some invalid\xFE\xFE\xFF\xFF stuff"
If you display this you would see something like
some invalid���� stuff
When this string passes through the remove_invalid_utf8 function in the following C program, the invalid UTF-8 bytes are removed using the iconv function mentioned above.
So the result is then:
some invalid stuff
C Program
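#include <stdio.h>
#include <iconv.h>
#include <string.h>
#include <stdlib.h>

/* Returns a newly allocated copy of the first 'len' bytes of 'utf8' with
 * invalid UTF-8 removed, or NULL on failure. The caller must free() it. */
char *remove_invalid_utf8(char *utf8, size_t len) {
  size_t inbytes_len = len;
  char *inbuf = utf8;
  /* Dropping bytes never makes the output longer than the input. */
  size_t outbytes_len = len;
  char *result = calloc(outbytes_len + 1, sizeof(char));
  if (result == NULL) {
    perror("calloc");
    return NULL;
  }
  char *outbuf = result;
  iconv_t cd = iconv_open("UTF-8//IGNORE", "UTF-8");
  if (cd == (iconv_t)-1) {
    perror("iconv_open");
    free(result);
    return NULL;
  }
  /* iconv() signals errors with (size_t)-1. With //IGNORE, some
   * implementations (e.g. glibc) may still report EILSEQ after skipping
   * invalid input, so this message may appear even though the output is complete. */
  if (iconv(cd, &inbuf, &inbytes_len, &outbuf, &outbytes_len) == (size_t)-1) {
    perror("iconv");
  }
  iconv_close(cd);
  return result;
}

int main() {
  char *utf8 = "some invalid\xFE\xFE\xFF\xFF stuff";
  char *converted = remove_invalid_utf8(utf8, strlen(utf8));
  if (converted == NULL) {
    return 1;
  }
  printf("converted: %s to %s\n", utf8, converted);
  free(converted);
  return 0;
}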
I also managed to fix it by stripping all non-ASCII bytes. Since only bytes from 32 to 127 are kept, this also drops control characters such as tab, CR and LF.
This one takes about 2.6 seconds to parse 319MB:
#include <stdio.h>   // getline(), fopen(), perror()
#include <stdlib.h>
#include <clocale>   // std::setlocale()
#include <iostream>
int main(int argc, char const *argv[])
{
FILE* cfilestream = fopen( "./test.txt", "r" );
size_t linebuffersize = 131072;
if( cfilestream == NULL ) {
perror( "fopen cfilestream" );
return -1;
}
char* readline = (char*) malloc( linebuffersize );
char* fixedreadline = (char*) malloc( linebuffersize );
if( readline == NULL ) {
perror( "malloc readline" );
return -1;
}
if( fixedreadline == NULL ) {
perror( "malloc fixedreadline" );
return -1;
}
char* source;
if( ( source = std::setlocale( LC_ALL, "en_US.utf8" ) ) == NULL ) {
perror( "setlocale" );
}
else {
std::cerr << "locale='" << source << "'" << std::endl;
}
int index;
int charsread;
int invalidcharsoffset;
unsigned int fixedchar;
while( true )
{
if( ( charsread = getline( &readline, &linebuffersize, cfilestream ) ) != -1 )
{
invalidcharsoffset = 0;
for( index = 0; index < charsread; ++index )
{
fixedchar = static_cast<unsigned int>( readline[index] );
// std::cerr << "index " << std::setw(3) << index
// << " readline " << std::setw(10) << fixedchar
// << " -> '" << readline[index] << "'" << std::endl;
if( 31 < fixedchar && fixedchar < 128 ) {
fixedreadline[index-invalidcharsoffset] = readline[index];
}
else {
++invalidcharsoffset;
}
}
fixedreadline[index-invalidcharsoffset] = '\0';
// std::cerr << "fixedreadline=" << fixedreadline << std::endl;
}
else {
break;
}
}
std::cerr << "fixedreadline=" << fixedreadline << std::endl;
free( readline );
free( fixedreadline );
fclose( cfilestream );
return 0;
}
Alternative and slower version using memcpy
Using memmove instead does not improve the speed much, so you could use either one.
This one takes about 3.1 seconds to parse 319MB:
#include <stdio.h>   // getline(), fopen(), perror()
#include <stdlib.h>
#include <clocale>   // std::setlocale()
#include <iostream>
#include <cstring>
#include <iomanip>
int main(int argc, char const *argv[])
{
FILE* cfilestream = fopen( "./test.txt", "r" );
size_t linebuffersize = 131072;
if( cfilestream == NULL ) {
perror( "fopen cfilestream" );
return -1;
}
char* readline = (char*) malloc( linebuffersize );
char* fixedreadline = (char*) malloc( linebuffersize );
if( readline == NULL ) {
perror( "malloc readline" );
return -1;
}
if( fixedreadline == NULL ) {
perror( "malloc fixedreadline" );
return -1;
}
char* source;
char* destination;
char* finalresult;
int index;
int lastcopy;
int charsread;
int charstocopy;
int invalidcharsoffset;
bool hasignoredbytes;
unsigned int fixedchar;
if( ( source = std::setlocale( LC_ALL, "en_US.utf8" ) ) == NULL ) {
perror( "setlocale" );
}
else {
std::cerr << "locale='" << source << "'" << std::endl;
}
while( true )
{
if( ( charsread = getline( &readline, &linebuffersize, cfilestream ) ) != -1 )
{
hasignoredbytes = false;
source = readline;
destination = fixedreadline;
lastcopy = 0;
invalidcharsoffset = 0;
for( index = 0; index < charsread; ++index )
{
fixedchar = static_cast<unsigned int>( readline[index] );
// std::cerr << "fixedchar " << std::setw(10)
// << fixedchar << " -> '"
// << readline[index] << "'" << std::endl;
if( 31 < fixedchar && fixedchar < 128 ) {
if( hasignoredbytes ) {
charstocopy = index - lastcopy - invalidcharsoffset;
memcpy( destination, source, charstocopy );
source += index - lastcopy;
lastcopy = index;
destination += charstocopy;
invalidcharsoffset = 0;
hasignoredbytes = false;
}
}
else {
++invalidcharsoffset;
hasignoredbytes = true;
}
}
if( destination != fixedreadline ) {
charstocopy = charsread - static_cast<int>( source - readline )
- invalidcharsoffset;
memcpy( destination, source, charstocopy );
destination += charstocopy - 1;
if( *destination == '\n' ) {
*destination = '\0';
}
else {
*++destination = '\0';
}
finalresult = fixedreadline;
}
else {
finalresult = readline;
}
// std::cerr << "finalresult=" << finalresult << std::endl;
}
else {
break;
}
}
std::cerr << "finalresult=" << finalresult << std::endl;
free( readline );
free( fixedreadline );
fclose( cfilestream );
return 0;
}
Optimized solution using iconv
This takes about 4.6 seconds to parse 319MB of text.
#include <iconv.h>
#include <stdio.h>   // getline(), fopen(), perror()
#include <string.h>
#include <stdlib.h>
#include <clocale>   // std::setlocale()
#include <iostream>
// Compile it with:
// g++ -o main test.cpp -O3 -liconv
int main(int argc, char const *argv[])
{
FILE* cfilestream = fopen( "./test.txt", "r" );
size_t linebuffersize = 131072;
if( cfilestream == NULL ) {
perror( "fopen cfilestream" );
return -1;
}
char* readline = (char*) malloc( linebuffersize );
char* fixedreadline = (char*) malloc( linebuffersize );
if( readline == NULL ) {
perror( "malloc readline" );
return -1;
}
if( fixedreadline == NULL ) {
perror( "malloc fixedreadline" );
return -1;
}
char* source;
char* destination;
int charsread;
size_t inchars;
size_t outchars;
if( ( source = std::setlocale( LC_ALL, "en_US.utf8" ) ) == NULL ) {
perror( "setlocale" );
}
else {
std::cerr << "locale='" << source << "'" << std::endl;
}
iconv_t conversiondescriptor = iconv_open("UTF-8//IGNORE", "UTF-8");
if( conversiondescriptor == (iconv_t)-1 ) {
perror( "iconv_open conversiondescriptor" );
}
while( true )
{
if( ( charsread = getline( &readline, &linebuffersize, cfilestream ) ) != -1 )
{
source = readline;
inchars = charsread;
destination = fixedreadline;
outchars = charsread;
if( iconv( conversiondescriptor, &source, &inchars, &destination, &outchars ) == (size_t)-1 )
{
perror( "iconv" );
}
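// Trim the trailing '\n' (and a '\r' before it, if any); the checks
// against fixedreadline keep an empty result from underflowing the buffer
if( destination != fixedreadline && destination[-1] == '\n' ) {
--destination;
if( destination != fixedreadline && destination[-1] == '\r' ) {
--destination;
}
}
*destination = '\0';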
// std::cerr << "fixedreadline='" << fixedreadline << "'" << std::endl;
}
else {
break;
}
}
std::cerr << "fixedreadline='" << fixedreadline << "'" << std::endl;
free( readline );
free( fixedreadline );
if( fclose( cfilestream ) ) {
perror( "fclose cfilestream" );
}
if( iconv_close( conversiondescriptor ) ) {
perror( "iconv_close conversiondescriptor" );
}
return 0;
}
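Several of the versions in this answer trim the line end with hand-written pointer arithmetic, which is easy to get wrong for an empty line or for a last line without a newline. A small helper can keep that logic in one place. A minimal sketch; the name chomp is hypothetical (borrowed from Perl), and it assumes the buffer has room for a NUL at buf[len], which holds for buffers filled by getline().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: drops a trailing '\n' and then a trailing '\r', so
// both "\n" and "\r\n" line ends are removed (a lone trailing '\r' is
// removed too). Writes a NUL at the new end and returns the new length.
// buf must have room for len + 1 bytes.
std::size_t chomp(char* buf, std::size_t len) {
    assert(buf != nullptr);
    if (len > 0 && buf[len - 1] == '\n') --len;
    if (len > 0 && buf[len - 1] == '\r') --len;
    buf[len] = '\0';
    return len;
}
```

After the iconv() call above, for example, chomp( fixedreadline, destination - fixedreadline ) would terminate the converted line.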
Slowest solution ever using mbtowc
This takes about 24.2 seconds to parse 319MB of text.
If you comment out the line that calls mbtowc and uncomment the line // fixedchar = 1; (which breaks the invalid-character removal), this takes 1.9 seconds instead of 24.2 seconds (also compiled with the -O3 optimization level).
#include <stdio.h>   // getline(), fopen(), perror()
#include <stdlib.h>
#include <string.h>
#include <clocale>   // std::setlocale()
#include <iostream>
#include <cstring>
#include <iomanip>
int main(int argc, char const *argv[])
{
FILE* cfilestream = fopen( "./test.txt", "r" );
size_t linebuffersize = 131072;
if( cfilestream == NULL ) {
perror( "fopen cfilestream" );
return -1;
}
char* readline = (char*) malloc( linebuffersize );
if( readline == NULL ) {
perror( "malloc readline" );
return -1;
}
char* source;
char* lineend;
char* destination;
int charsread;
int fixedchar;
if( ( source = std::setlocale( LC_ALL, "en_US.utf8" ) ) == NULL ) {
perror( "setlocale" );
}
else {
std::cerr << "locale='" << source << "'" << std::endl;
}
while( true )
{
if( ( charsread = getline( &readline, &linebuffersize, cfilestream ) ) != -1 )
{
lineend = readline + charsread;
destination = readline;
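for( source = readline; source != lineend; )
{
// fixedchar = 1;
// Pass only the bytes remaining in this line, so a failed sequence can
// never make mbtowc() look past the end of the line
fixedchar = mbtowc( NULL, source, lineend - source );
// std::ostringstream contents;
// for( int index = 0; index < fixedchar; ++index )
// contents << source[index];
// std::cerr << "fixedchar=" << std::setw(10)
// << fixedchar << " -> '"
// << contents.str().c_str() << "'" << std::endl;
if( fixedchar > 0 ) {
memmove( destination, source, fixedchar );
source += fixedchar;
destination += fixedchar;
}
else if( fixedchar < 0 ) {
// Invalid or incomplete sequence: drop one byte, reset the
// conversion state and resynchronize on the next byte
mbtowc( NULL, NULL, 0 );
source += 1;
}
else {
break;
}
}
// Trim the trailing '\n' (and a '\r' before it, if any); the checks
// against readline keep an empty result from underflowing the buffer
if( destination != readline && destination[-1] == '\n' ) {
--destination;
if( destination != readline && destination[-1] == '\r' ) {
--destination;
}
}
*destination = '\0';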
// std::cerr << "readline='" << readline << "'" << std::endl;
}
else {
break;
}
}
std::cerr << "readline='" << readline << "'" << std::endl;
if( fclose( cfilestream ) ) {
perror( "fclose cfilestream" );
}
free( readline );
return 0;
}
Fastest of all my versions above, using direct assignment (or memmove)
Because source and destination point into the same buffer, you cannot use memcpy here: the memory regions overlap!
This takes about 2.4 seconds to parse 319MB.
If you comment out the copy, either *destination = *source; or memmove( destination, source, 1 ) (which breaks the invalid-character removal), the performance is still almost the same as when memmove is called. Calling memmove( destination, source, 1 ) is a little slower than assigning directly with *destination = *source;.
#include <stdio.h>   // getline(), fopen(), perror()
#include <stdlib.h>
#include <clocale>   // std::setlocale()
#include <iostream>
#include <cstring>
#include <iomanip>
int main(int argc, char const *argv[])
{
FILE* cfilestream = fopen( "./test.txt", "r" );
size_t linebuffersize = 131072;
if( cfilestream == NULL ) {
perror( "fopen cfilestream" );
return -1;
}
char* readline = (char*) malloc( linebuffersize );
if( readline == NULL ) {
perror( "malloc readline" );
return -1;
}
char* source;
char* lineend;
char* destination;
int charsread;
unsigned int fixedchar;
if( ( source = std::setlocale( LC_ALL, "en_US.utf8" ) ) == NULL ) {
perror( "setlocale" );
}
else {
std::cerr << "locale='" << source << "'" << std::endl;
}
while( true )
{
if( ( charsread = getline( &readline, &linebuffersize, cfilestream ) ) != -1 )
{
lineend = readline + charsread;
destination = readline;
for( source = readline; source != lineend; ++source )
{
fixedchar = static_cast<unsigned int>( *source );
// std::cerr << "fixedchar=" << std::setw(10)
// << fixedchar << " -> '" << *source << "'" << std::endl;
if( 31 < fixedchar && fixedchar < 128 ) {
*destination = *source;
++destination;
}
}
// '\n' and '\r' are below 32, so the loop above already dropped them;
// source now points at getline()'s NUL terminator, so just terminate
*destination = '\0';
// std::cerr << "readline='" << readline << "'" << std::endl;
}
else {
break;
}
}
std::cerr << "readline='" << readline << "'" << std::endl;
if( fclose( cfilestream ) ) {
perror( "fclose cfilestream" );
}
free( readline );
return 0;
}
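The same single-buffer compaction can also be expressed with std::remove_if from <algorithm>, which shifts the kept bytes forward within the same buffer without any memcpy over overlapping regions. A minimal sketch with a hypothetical function name; it has not been benchmarked against the loop above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: keeps only the bytes that the loop above keeps
// (31 < byte < 128) and compacts them to the front of buf in place.
// Returns the new length; no terminator is written.
std::size_t keep_printable_ascii(char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    char* end = std::remove_if(buf, buf + len, [](char c) {
        const unsigned char u = static_cast<unsigned char>(c);
        return !(31 < u && u < 128);   // true means "remove this byte"
    });
    return static_cast<std::size_t>(end - buf);
}
```

In the loop above, destination = readline + keep_printable_ascii( readline, charsread ); followed by *destination = '\0'; would replace the inner for loop.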
Bonus
You can also use the Python C extension API.
It takes about 2.3 seconds to parse 319MB without converting the lines to a cached UTF-8 char*,
about 3.2 seconds when converting them to a UTF-8 char*,
and also about 3.2 seconds when converting them to a cached ASCII char*.
#define PY_SSIZE_T_CLEAN
#include <Python.h>
#include <iostream>
typedef struct
{
PyObject_HEAD
}
PyFastFile;
static PyModuleDef fastfilepackagemodule =
{
// https://docs.python.org/3/c-api/module.html#c.PyModuleDef
PyModuleDef_HEAD_INIT,
"fastfilepackage", /* name of module */
"Example module that wrapped a C++ object", /* module documentation, may be NULL */
-1, /* size of per-interpreter state of the module, or
-1 if the module keeps state in global variables. */
NULL, /* PyMethodDef* m_methods */
NULL, /* inquiry m_reload */
NULL, /* traverseproc m_traverse */
NULL, /* inquiry m_clear */
NULL, /* freefunc m_free */
};
// initialize PyFastFile Object
static int PyFastFile_init(PyFastFile* self, PyObject* args, PyObject* kwargs) {
char* filepath;
if( !PyArg_ParseTuple( args, "s", &filepath ) ) {
return -1;
}
int linecount = 0;
PyObject* iomodule;
PyObject* openfile;
PyObject* fileiterator;
iomodule = PyImport_ImportModule( "builtins" );
if( iomodule == NULL ) {
std::cerr << "ERROR: FastFile failed to import the io module '"
"(and open the file " << filepath << "')!" << std::endl;
PyErr_PrintEx(100);
return -1;
}
PyObject* openfunction = PyObject_GetAttrString( iomodule, "open" );
if( openfunction == NULL ) {
std::cerr << "ERROR: FastFile failed get the io module open "
<< "function (and open the file '" << filepath << "')!" << std::endl;
PyErr_PrintEx(100);
return -1;
}
openfile = PyObject_CallFunction(
openfunction, "ssiss", filepath, "r", -1, "ASCII", "ignore" );
if( openfile == NULL ) {
std::cerr << "ERROR: FastFile failed to open the file'"
<< filepath << "'!" << std::endl;
PyErr_PrintEx(100);
return -1;
}
PyObject* iterfunction = PyObject_GetAttrString( openfile, "__iter__" );
Py_DECREF( openfunction );
if( iterfunction == NULL ) {
std::cerr << "ERROR: FastFile failed get the io module iterator"
<< "function (and open the file '" << filepath << "')!" << std::endl;
PyErr_PrintEx(100);
return -1;
}
PyObject* openiteratorobject = PyObject_CallObject( iterfunction, NULL );
Py_DECREF( iterfunction );
if( openiteratorobject == NULL ) {
std::cerr << "ERROR: FastFile failed get the io module iterator object"
<< " (and open the file '" << filepath << "')!" << std::endl;
PyErr_PrintEx(100);
return -1;
}
fileiterator = PyObject_GetAttrString( openfile, "__next__" );
Py_DECREF( openiteratorobject );
if( fileiterator == NULL ) {
std::cerr << "ERROR: FastFile failed get the io module iterator "
<< "object (and open the file '" << filepath << "')!" << std::endl;
PyErr_PrintEx(100);
return -1;
}
PyObject* readline;
while( ( readline = PyObject_CallObject( fileiterator, NULL ) ) != NULL ) {
linecount += 1;
PyUnicode_AsUTF8( readline );
Py_DECREF( readline );
// std::cerr << "linecount " << linecount << " readline '" << readline
// << "' '" << PyUnicode_AsUTF8( readline ) << "'" << std::endl;
}
std::cerr << "linecount " << linecount << std::endl;
// PyErr_PrintEx(100);
PyErr_Clear();
PyObject* closefunction = PyObject_GetAttrString( openfile, "close" );
if( closefunction == NULL ) {
std::cerr << "ERROR: FastFile failed get the close file function for '"
<< filepath << "')!" << std::endl;
PyErr_PrintEx(100);
return -1;
}
PyObject* closefileresult = PyObject_CallObject( closefunction, NULL );
Py_DECREF( closefunction );
if( closefileresult == NULL ) {
std::cerr << "ERROR: FastFile failed close open file '"
<< filepath << "')!" << std::endl;
PyErr_PrintEx(100);
return -1;
}
Py_DECREF( closefileresult );
Py_XDECREF( iomodule );
Py_XDECREF( openfile );
Py_XDECREF( fileiterator );
return 0;
}
// destruct the object
static void PyFastFile_dealloc(PyFastFile* self) {
Py_TYPE(self)->tp_free( (PyObject*) self );
}
static PyTypeObject PyFastFileType =
{
PyVarObject_HEAD_INIT( NULL, 0 )
"fastfilepackage.FastFile" /* tp_name */
};
// create the module
PyMODINIT_FUNC PyInit_fastfilepackage(void)
{
PyObject* thismodule;
// https://docs.python.org/3/c-api/typeobj.html
PyFastFileType.tp_new = PyType_GenericNew;
PyFastFileType.tp_basicsize = sizeof(PyFastFile);
PyFastFileType.tp_dealloc = (destructor) PyFastFile_dealloc;
PyFastFileType.tp_flags = Py_TPFLAGS_DEFAULT;
PyFastFileType.tp_doc = "FastFile objects";
PyFastFileType.tp_init = (initproc) PyFastFile_init;
if( PyType_Ready( &PyFastFileType) < 0 ) {
return NULL;
}
thismodule = PyModule_Create(&fastfilepackagemodule);
if( thismodule == NULL ) {
return NULL;
}
// Add FastFile class to thismodule allowing the use to create objects
Py_INCREF( &PyFastFileType );
PyModule_AddObject( thismodule, "FastFile", (PyObject*) &PyFastFileType );
return thismodule;
}
To build it, create the file source/fastfilewrapper.cpp with the contents of the above file, and a setup.py with the following contents:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from setuptools import setup, Extension
myextension = Extension(
language = "c++",
extra_link_args = ["-std=c++11"],
extra_compile_args = ["-std=c++11"],
name = 'fastfilepackage',
sources = [
'source/fastfilewrapper.cpp'
],
include_dirs = [ 'source' ],
)
setup(
name = 'fastfilepackage',
ext_modules= [ myextension ],
)
To run the example, save the following Python script as fastfileperformance.py:
import time
import datetime
import fastfilepackage
testfile = './test.txt'
timenow = time.time()
iterable = fastfilepackage.FastFile( testfile )
fastfile_time = time.time() - timenow
timedifference = datetime.timedelta( seconds=fastfile_time )
print( 'FastFile timedifference', timedifference, flush=True )
Example:
user@user-pc$ /usr/bin/pip3.6 install .
Processing /fastfilepackage
Building wheels for collected packages: fastfilepackage
Building wheel for fastfilepackage (setup.py) ... done
Stored in directory: /pip-ephem-wheel-cache-j313cpzc/wheels/e5/5f/bc/52c820
Successfully built fastfilepackage
Installing collected packages: fastfilepackage
Found existing installation: fastfilepackage 0.0.0
Uninstalling fastfilepackage-0.0.0:
Successfully uninstalled fastfilepackage-0.0.0
Successfully installed fastfilepackage-0.0.0
user@user-pc$ /usr/bin/python3.6 fastfileperformance.py
linecount 820800
FastFile timedifference 0:00:03.204614
Using std::ifstream::getline()
This takes about 4.7 seconds to parse 319MB.
If you remove the non-ASCII removal loop borrowed from the fastest benchmark above (the one using getline() from stdio.h), it takes 1.7 seconds to run.
#include <stdio.h>   // perror()
#include <stdlib.h>
#include <locale.h>  // setlocale()
#include <iostream>
#include <locale>
#include <fstream>
#include <iomanip>
int main(int argc, char const *argv[])
{
unsigned int fixedchar;
int linecount = -1;
char* source;
char* lineend;
char* destination;
if( ( source = setlocale( LC_ALL, "en_US.ascii" ) ) == NULL ) {
perror( "setlocale" );
return -1;
}
else {
std::cerr << "locale='" << source << "'" << std::endl;
}
std::ifstream fileifstream{ "./test.txt" };
if( fileifstream.fail() ) {
std::cerr << "ERROR: FastFile failed to open the file!" << std::endl;
return -1;
}
size_t linebuffersize = 131072;
char* readline = (char*) malloc( linebuffersize );
if( readline == NULL ) {
perror( "malloc readline" );
return -1;
}
while( true )
{
if( !fileifstream.eof() )
{
linecount += 1;
fileifstream.getline( readline, linebuffersize );
lineend = readline + fileifstream.gcount();
destination = readline;
for( source = readline; source != lineend; ++source )
{
fixedchar = static_cast<unsigned int>( *source );
// std::cerr << "fixedchar=" << std::setw(10)
// << fixedchar << " -> '" << *source << "'" << std::endl;
if( 31 < fixedchar && fixedchar < 128 ) {
*destination = *source;
++destination;
}
}
// getline() never stores the '\n', and '\r' is below 32, so neither
// survives the loop above; just terminate the filtered string
*destination = '\0';
// std::cerr << "readline='" << readline << "'" << std::endl;
}
else {
break;
}
}
std::cerr << "linecount='" << linecount << "'" << std::endl;
if( fileifstream.is_open() ) {
fileifstream.close();
}
free( readline );
return 0;
}
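The loop above tests eof() before reading, which is presumably why linecount has to start at -1, and a line longer than linebuffersize sets failbit without ever reaching eof(), so the loop would never end. The conventional C++ idiom is to test the stream returned by getline itself. A minimal sketch using the free function std::getline with a std::string; the name strip_lines is hypothetical and its timing has not been measured.

```cpp
#include <cassert>   // used by the usage example below
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, used by the usage example below
#include <string>

// Hypothetical helper: reads every line of `in` with the free function
// std::getline (which grows the std::string as needed, so long lines are
// not truncated), keeps only bytes 32..127 as the loops above do, and
// stores the filtered last line in `lastline`. Returns the number of lines.
// Testing the stream returned by getline() ends the loop right after the
// last line, unlike a while( !eof() ) loop.
std::size_t strip_lines(std::istream& in, std::string& lastline) {
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        ++count;
        std::size_t kept = 0;
        for (unsigned char c : line)
            if (31 < c && c < 128) line[kept++] = static_cast<char>(c);
        line.resize(kept);
        lastline = line;
    }
    return count;
}
```

For example, feeding it std::istringstream( "a\xC3\xA9" "b\r\nxy\r\n" ) returns 2 and leaves "xy" in lastline.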
Summary
2.6 seconds stripping non-ASCII bytes using two buffers with indexing
3.1 seconds stripping non-ASCII bytes using two buffers with memcpy
4.6 seconds removing invalid UTF-8 with iconv
24.2 seconds removing invalid UTF-8 with mbtowc
2.4 seconds stripping non-ASCII bytes using one buffer with direct pointer assignment
Bonus
2.3 seconds removing invalid UTF-8 with the Python C API, without converting the lines to a cached UTF-8 char*
3.2 seconds removing invalid UTF-8 with the Python C API, converting the lines to a cached UTF-8 char*
3.2 seconds stripping non-ASCII with the Python C API and caching as an ASCII char*
4.7 seconds stripping non-ASCII bytes with std::ifstream::getline() using one buffer with direct pointer assignment
The test file ./test.txt had 820,800 lines, each identical to:
id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char&id-é-char\r\n
All versions were compiled with:
g++ (GCC) 7.4.0
iconv (GNU libiconv 1.14)
g++ -o main test.cpp -O3 -liconv && time ./main
I've been using custom exceptions to jump out of deeply nested function calls back to a specific function in the call chain. For example, consider the following code:
#include <iostream>
struct label {};
void B();
void C();
void D();
void A() {
return B();
}
void B() { // I want to jump to the level of the B function in the call-chain.
try {
return C();
}
catch(const label& e) {
std::cout << "jumped to b function" << std::endl;
}
}
void C() {
return D();
}
void D() {
throw label();
}
int main() {
A();
return 0;
}
Note, however, that the above example is extremely contrived and is simply for illustration purposes. In my actual code, I'm using this technique in a recursive-descent parser to recover from syntax errors. Also note that I'm not using exceptions to jump around to different functions like a glorified goto. I'm using the custom exception to always jump to one specific function near the top of the call chain.
The above code works fine, but some of the top answers to the question Are exceptions as control flow considered a serious antipattern? If so, Why? (on the Software Engineering site) suggest that using exceptions the way the above scenario does is considered an anti-pattern, and that there are usually better ways to accomplish one's goal.
Is my usage of a custom exception above appropriate? If not, what is a more reasonable way to accomplish my goal while avoiding exceptions as a form of control flow? (Also, although I tagged this question as c++ since that's what I'm writing my parser in, I suppose this is a more language-agnostic question.)
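To make the technique described in the question concrete, here is a minimal sketch of exception-based error recovery in a recursive-descent parser, often called panic-mode recovery: the deepest parsing function throws, and a single catch near the top records the error and skips ahead to a synchronization token (';' here). The toy grammar and the names Parser and ParseError are hypothetical, not the asker's code.

```cpp
#include <cassert>   // used by the usage example below
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical error type thrown from deep inside the parser.
struct ParseError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Toy grammar (hypothetical):  program := { stmt ';' }   stmt := digit { '+' digit }
class Parser {
public:
    explicit Parser(std::string src) : src_(std::move(src)) {}

    // The one function that catches: on a syntax error it records the
    // message and resynchronizes just past the next ';' (panic mode).
    std::vector<std::string> parseProgram() {
        std::vector<std::string> errors;
        while (pos_ < src_.size()) {
            try {
                parseStatement();
            } catch (const ParseError& e) {
                errors.push_back(e.what());
                while (pos_ < src_.size() && src_[pos_] != ';') ++pos_;
                if (pos_ < src_.size()) ++pos_;      // consume the ';'
            }
        }
        return errors;
    }

private:
    void parseStatement() {
        parseDigit();
        while (peek() == '+') { ++pos_; parseDigit(); }
        expect(';');
    }
    void parseDigit() {
        if (peek() < '0' || peek() > '9')
            throw ParseError("expected digit at " + std::to_string(pos_));
        ++pos_;
    }
    void expect(char c) {
        if (peek() != c)
            throw ParseError(std::string("expected '") + c + "' at " + std::to_string(pos_));
        ++pos_;
    }
    char peek() const { return pos_ < src_.size() ? src_[pos_] : '\0'; }

    std::string src_;
    std::size_t pos_ = 0;
};
```

Parsing "1+2;3+;4;" reports one error, "expected digit at 6", and still parses the statement 4; that follows it.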
Part 1: Nested Function Stack Calls With Exceptions
This may not fit your exact needs, but I'm sharing this example because I think it may provide some insight and it is related to your current situation.
I have a set of classes that work together to handle several common tasks: BlockProcess, BlockThread, FileHandlers, ExceptionHandler, Logger and a Utility class. There are several files here; keep in mind that this lightweight project targets Windows and that I am using Visual Studio 2017 with precompiled headers.
Any Windows-dependent code can easily be stripped out and replaced with the equivalent includes and functionality for your own system, architecture and environment.
I am also using a namespace called demo that wraps all the classes and functions in this small project; replace it with your own namespace name.
The main purpose here is to show the design I typically use to handle exceptions when calls are nested quite deep in the stack.
These classes not only control how info messages, warnings and errors are logged to the console, with different settings for each message type, but can also write the log to a file.
This type of construct is very handy and versatile when developing 3D graphics applications, whose code bases can become very large.
I cannot take full credit for this code, as most of it was inspired and designed by Marek A. Krzeminski, MASc; I believe that what matters is the concepts and how the code is used.
Main Entry Point:
main.cpp
#include "stdafx.h"
#include "BlockProcess.h"
#include "Logger.h"
#include "Utility.h"
//struct label {}; // Instead of throwing this struct in D() I'm throwing the ExceptionHandler
void B();
void C();
void D();
void A() {
return B();
}
void B() {
using namespace demo;
try {
return C();
} catch ( ... ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " failed for some reason.";
Logger::log( strStream, Logger::TYPE_INFO );
Logger::log( strStream, Logger::TYPE_WARNING );
Logger::log( strStream, Logger::TYPE_ERROR );
Logger::log( strStream, Logger::TYPE_CONSOLE );
}
}
void C() {
return D();
}
void D() {
using namespace demo;
std::ostringstream strStream;
strStream << __FUNCTION__ << " failed for some reason.";
throw ExceptionHandler( strStream ); // By Default will log to file; otherwise pass false for second param.
}
int _tmain( int iNumArgs, _TCHAR* pArugmentText[] ) {
using namespace demo;
try {
Logger log( "logger.txt" );
A();
// Prevent Multiple Start Ups Of This Application
BlockProcess processBlock( "ExceptionManager.exe" );
if ( processBlock.isBlocked() ) {
std::ostringstream strStream;
strStream << "ExceptionManager is already running in another window." << std::endl;
throw ExceptionHandler( strStream, false );
}
Utility::pressAnyKeyToQuit();
} catch ( ExceptionHandler& e ) {
std::cout << "Exception Thrown: " << e.getMessage() << std::endl;
Utility::pressAnyKeyToQuit();
return RETURN_ERROR;
} catch ( ... ) {
std::cout << __FUNCTION__ << " Caught Unknown Exception" << std::endl;
Utility::pressAnyKeyToQuit();
return RETURN_ERROR;
}
return RETURN_OK;
}
When I ran this, it generated a log file with the info, warnings, errors, etc. I also ran two consoles simultaneously: the second one throws the exception, because I used the BlockProcess class to allow only a single instance of this application to run. This is a very versatile design: both the logged messages and the thrown errors are generated as expected.
If you do not want execution to stop because of a specific variable value, a function's return value, an if statement, etc., then instead of throwing an ExceptionHandler you can simply create an ostringstream object, populate it with the needed information and pass it to Logger, either with the default option of saving to the log file turned on or by passing false as the last parameter. You can even choose the type of message through the logger's types.
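A minimal sketch of that log-and-continue path. The answer's Logger.cpp is not shown in full here, so MiniLogger below is a hypothetical stand-in that mirrors only the shape of Logger::log( const std::ostringstream&, LoggerType ) and writes to a configurable stream instead of the console and a log file; clampPercent is likewise a made-up example caller.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>

// Hypothetical stand-in for the answer's Logger: only the shape of
// Logger::log( const std::ostringstream&, LoggerType ) is mirrored, and
// messages go to a configurable stream instead of a console and a file.
struct MiniLogger {
    enum LoggerType { TYPE_INFO = 0, TYPE_WARNING, TYPE_ERROR, TYPE_CONSOLE };

    static std::ostream*& sink() {
        static std::ostream* stream = &std::clog;
        return stream;
    }
    static void log(const std::ostringstream& strStream, LoggerType eLogType = TYPE_INFO) {
        static const char* const names[] = { "Info", "Warning", "Error", "Console" };
        assert(eLogType <= TYPE_CONSOLE);
        *sink() << "[" << names[eLogType] << "] " << strStream.str() << '\n';
    }
};

// Hypothetical caller: reports an out-of-range value and keeps running,
// where `throw ExceptionHandler( strStream );` would unwind the stack.
int clampPercent(int value) {
    if (value < 0 || value > 100) {
        std::ostringstream strStream;
        strStream << __func__ << ": value " << value << " out of range, clamping";
        MiniLogger::log(strStream, MiniLogger::TYPE_WARNING);
        return value < 0 ? 0 : 100;
    }
    return value;
}
```

With the sink pointed at a std::ostringstream, clampPercent( 150 ) returns 100 and logs one warning line, while clampPercent( 42 ) logs nothing.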
So, to answer your question of whether this is an anti-pattern: I honestly do not think it is, if you carefully design your project and know where and when to throw.
ExceptionHandler:
ExceptionHandler.h
#ifndef EXCEPTION_HANDLER_H
#define EXCEPTION_HANDLER_H
namespace demo {
class ExceptionHandler final {
private:
std::string strMessage_;
public:
explicit ExceptionHandler( const std::string& strMessage, bool bSaveInLog = true );
explicit ExceptionHandler( const std::ostringstream& strStreamMessage, bool bSaveInLog = true );
~ExceptionHandler() = default;
ExceptionHandler( const ExceptionHandler& c ) = default;
const std::string& getMessage() const;
ExceptionHandler& operator=( const ExceptionHandler& c ) = delete;
};
} // namespace demo
#endif // !EXCEPTION_HANDLER_H
ExceptionHandler.cpp
#include "stdafx.h"
#include "ExceptionHandler.h"
#include "Logger.h"
namespace demo {
ExceptionHandler::ExceptionHandler( const std::string& strMessage, bool bSaveInLog ) :
strMessage_( strMessage ) {
if ( bSaveInLog ) {
Logger::log( strMessage_, Logger::TYPE_ERROR );
}
}
ExceptionHandler::ExceptionHandler( const std::ostringstream& strStreamMessage, bool bSaveInLog ) :
strMessage_( strStreamMessage.str() ) {
if ( bSaveInLog ) {
Logger::log( strMessage_, Logger::TYPE_ERROR );
}
}
const std::string& ExceptionHandler::getMessage() const {
return strMessage_;
}
} // namespace demo
Logger:
Logger.h
#ifndef LOGGER_H
#define LOGGER_H
#include "Singleton.h"
namespace demo {
class Logger final : public Singleton {
public:
enum LoggerType {
TYPE_INFO = 0,
TYPE_WARNING,
TYPE_ERROR,
TYPE_CONSOLE,
}; // LoggerType
private:
std::string strLogFilename_;
unsigned uMaxCharacterLength_;
std::array<std::string, 4> aLogTypes_;
const std::string strUnknownLogType_;
HANDLE hConsoleOutput_;
WORD consoleDefaultColor_;
public:
explicit Logger( const std::string& strLogFilename );
virtual ~Logger();
static void log( const std::string& strText, LoggerType eLogType = TYPE_INFO );
static void log( const std::ostringstream& strStreamText, LoggerType eLogType = TYPE_INFO );
static void log( const char* szText, LoggerType eLogType = TYPE_INFO );
Logger( const Logger& c ) = delete;
Logger& operator=( const Logger& c ) = delete;
};
} // namespace demo
#endif // !LOGGER_H
Logger.cpp
#include "stdafx.h"
#include "Logger.h"
#include "BlockThread.h"
#include "TextFileWriter.h"
namespace demo {
static Logger* s_pLogger = nullptr;
static CRITICAL_SECTION s_criticalSection;
static const WORD WHITE_ON_RED = FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE | FOREGROUND_INTENSITY | BACKGROUND_RED; // White Text On Red Background
Logger::Logger( const std::string& strLogFilename ) :
Singleton( TYPE_LOGGER ),
strLogFilename_( strLogFilename ),
uMaxCharacterLength_( 0 ),
strUnknownLogType_( "UNKNOWN" ) {
// Order must match types defined in the Logger::LoggerType enum
aLogTypes_[0] = "Info";
aLogTypes_[1] = "Warning";
aLogTypes_[2] = "Error";
aLogTypes_[3] = ""; // Console
// Find widest log type string
uMaxCharacterLength_ = strUnknownLogType_.size();
for ( const std::string& strLogType : aLogTypes_ ) { // standard range-based for instead of MSVC's non-standard "for each"
if ( uMaxCharacterLength_ < strLogType.size() ) {
uMaxCharacterLength_ = strLogType.size();
}
}
InitializeCriticalSection( &s_criticalSection );
BlockThread blockThread( s_criticalSection ); // Enter critical section
// Start log file
TextFileWriter file( strLogFilename_, false, false );
// Prepare console
hConsoleOutput_ = GetStdHandle( STD_OUTPUT_HANDLE );
CONSOLE_SCREEN_BUFFER_INFO consoleInfo;
GetConsoleScreenBufferInfo( hConsoleOutput_, &consoleInfo );
consoleDefaultColor_ = consoleInfo.wAttributes;
s_pLogger = this;
logMemoryAllocation( true );
} // Logger()
Logger::~Logger() {
logMemoryAllocation( false );
s_pLogger = nullptr;
DeleteCriticalSection( &s_criticalSection );
} // ~Logger
void Logger::log( const std::string& strText, LoggerType eLogType ) {
log( strText.c_str(), eLogType );
}
void Logger::log( const std::ostringstream& strStreamText, LoggerType eLogType ) {
log( strStreamText.str().c_str(), eLogType );
}
void Logger::log( const char* szText, LoggerType eLogType ) {
if ( nullptr == s_pLogger ) {
std::cout << "Logger has not been initialized, can not log " << szText << std::endl;
return;
}
BlockThread blockThread( s_criticalSection ); // Enter critical section
std::ostringstream strStream;
// Default White Text On Red Background
WORD textColor = WHITE_ON_RED;
// Choose log type text string, display "UNKNOWN" if eLogType is out of range
strStream << std::setfill( ' ' ) << std::setw( s_pLogger->uMaxCharacterLength_ );
try {
if ( TYPE_CONSOLE != eLogType ) {
strStream << s_pLogger->aLogTypes_.at( eLogType );
}
if ( TYPE_WARNING == eLogType ) {
// Yellow
textColor = FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_INTENSITY | BACKGROUND_RED | BACKGROUND_GREEN;
} else if ( TYPE_INFO == eLogType ) {
// Green
textColor = FOREGROUND_GREEN;
} else if ( TYPE_CONSOLE == eLogType ) {
// Cyan
textColor = FOREGROUND_GREEN | FOREGROUND_BLUE;
}
} catch ( ... ) {
strStream << s_pLogger->strUnknownLogType_;
}
// Date & Time
if ( TYPE_CONSOLE != eLogType ) {
SYSTEMTIME time;
GetLocalTime( &time );
strStream << " [" << time.wYear << "."
<< std::setfill( '0' ) << std::setw( 2 ) << time.wMonth << "."
<< std::setfill( '0' ) << std::setw( 2 ) << time.wDay << " "
<< std::setfill( ' ' ) << std::setw( 2 ) << time.wHour << ":"
<< std::setfill( '0' ) << std::setw( 2 ) << time.wMinute << ":"
<< std::setfill( '0' ) << std::setw( 2 ) << time.wSecond << "."
<< std::setfill( '0' ) << std::setw( 3 ) << time.wMilliseconds << "] ";
}
strStream << szText << std::endl;
// Log message
SetConsoleTextAttribute( s_pLogger->hConsoleOutput_, textColor );
std::cout << strStream.str();
// Save same message to file
try {
TextFileWriter file( s_pLogger->strLogFilename_, true, false );
file.write( strStream.str() );
} catch ( ... ) {
// Ignore, not saved in log file
std::cout << __FUNCTION__ << " failed to write to file: " << strStream.str() << std::endl;
}
// Reset to default color
SetConsoleTextAttribute( s_pLogger->hConsoleOutput_, s_pLogger->consoleDefaultColor_ );
}
} // namespace demo
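The prefix that `Logger::log` writes can be checked in isolation. The following sketch reproduces the same `std::setw`/`std::setfill` sequence, assuming a hypothetical `LogTime` struct in place of the Windows `SYSTEMTIME` so it compiles on any platform. Note that the hour is space-padded while the other date and time fields are zero-padded, exactly as in the code above.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical portable stand-in for the SYSTEMTIME fields used by Logger::log.
struct LogTime {
    unsigned year, month, day, hour, minute, second, milliseconds;
};

// Builds "<type right-aligned to uWidth> [YYYY.MM.DD HH:MM:SS.mmm] " with the
// same manipulator sequence as Logger::log.
std::string formatLogPrefix( const std::string& strType, unsigned uWidth, const LogTime& t ) {
    std::ostringstream strStream;
    strStream << std::setfill( ' ' ) << std::setw( static_cast<int>( uWidth ) ) << strType;
    strStream << " [" << t.year << "."
              << std::setfill( '0' ) << std::setw( 2 ) << t.month << "."
              << std::setfill( '0' ) << std::setw( 2 ) << t.day << " "
              << std::setfill( ' ' ) << std::setw( 2 ) << t.hour << ":"
              << std::setfill( '0' ) << std::setw( 2 ) << t.minute << ":"
              << std::setfill( '0' ) << std::setw( 2 ) << t.second << "."
              << std::setfill( '0' ) << std::setw( 3 ) << t.milliseconds << "] ";
    return strStream.str();
}
```

With a width of 7 (the length of "Warning" and "UNKNOWN"), an Info message at 09:07:02.045 on 2024-03-05 gets the prefix `   Info [2024.03.05  9:07:02.045] `.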
Singleton.h
#ifndef SINGLETON_H
#define SINGLETON_H
namespace demo {
class Singleton {
public:
// Number of items in enum type must match the number of items and order of items stored in s_aSingletons
enum SingletonType {
TYPE_LOGGER = 0, // MUST BE FIRST!
}; // enum SingletonType
private:
SingletonType eType_;
public:
Singleton( const Singleton& c ) = delete;
Singleton& operator=( const Singleton& c ) = delete;
virtual ~Singleton();
protected:
explicit Singleton( SingletonType eType );
void logMemoryAllocation( bool isAllocated ) const;
};
} // namespace demo
#endif // !SINGLETON_H
Singleton.cpp
#include "stdafx.h"
#include "Singleton.h"
#include "Logger.h"
namespace demo {
struct SingletonInfo {
const std::string strSingletonName;
bool isConstructed;
SingletonInfo( const std::string& strSingletonNameIn ) :
strSingletonName( strSingletonNameIn ),
isConstructed( false )
{}
};
// Order must match types defined in Singleton::SingletonType enum
static std::array<SingletonInfo, 1> s_aSingletons = { SingletonInfo( "Logger" ) };
Singleton::Singleton( SingletonType eType ) :
eType_( eType ) {
bool bSaveInLog = s_aSingletons.at( TYPE_LOGGER ).isConstructed;
try {
if ( !s_aSingletons.at( eType ).isConstructed ) {
// Test Initialize Order
for ( int i = 0; i < eType; ++i ) {
if ( !s_aSingletons.at( i ).isConstructed ) {
throw ExceptionHandler( s_aSingletons.at( i ).strSingletonName +
" must be constructed before constructing " +
s_aSingletons.at( eType ).strSingletonName,
bSaveInLog );
}
}
s_aSingletons.at( eType ).isConstructed = true;
} else {
throw ExceptionHandler( s_aSingletons.at( eType ).strSingletonName +
" can only be constructed once.",
bSaveInLog );
}
} catch ( std::exception& ) {
// eType is out of range
std::ostringstream strStream;
strStream << __FUNCTION__ << " Invalid Singleton Type specified: " << eType;
throw ExceptionHandler( strStream, bSaveInLog );
}
}
Singleton::~Singleton() {
s_aSingletons.at( eType_ ).isConstructed = false;
}
void Singleton::logMemoryAllocation( bool isAllocated ) const {
if ( isAllocated ) {
Logger::log( "Created " + s_aSingletons.at( eType_ ).strSingletonName );
} else {
Logger::log( "Destroyed " + s_aSingletons.at( eType_ ).strSingletonName );
}
}
} // namespace demo
For the rest of the project code, see the second answer I provided. If you want to upvote or accept, please use this answer as the primary one, and please do not vote on the second answer, as it is only a reference for this one.
Part 2: Nested Function Stack Calls With Exceptions.
Note: Please do not vote on this answer; refer to the first answer instead, as this is just a continuation containing the related classes for reference.
You can find Part 1 here. I had to split this into two separate answers because I was about 2,000 characters over the 30,000-character limit. I apologize for any inconvenience; however, the ExceptionHandler cannot be used without the classes provided here.
FileHandlers:
FileHandler.h
#ifndef FILE_HANDLER_H
#define FILE_HANDLER_H
namespace demo {
class FileHandler {
protected:
std::fstream fileStream_;
std::string strFilePath_;
std::string strFilenameWithPath_;
private:
bool bSaveExceptionInLog_;
public:
virtual ~FileHandler();
FileHandler( const FileHandler& c ) = delete;
FileHandler& operator=( const FileHandler& c ) = delete;
protected:
FileHandler( const std::string& strFilename, bool bSaveExceptionInLog );
void throwError( const std::string& strMessage ) const;
void throwError( const std::ostringstream& strStreamMessage ) const;
bool getString( std::string& str, bool appendPath );
};
} // namespace demo
#endif // !FILE_HANDLER_H
FileHandler.cpp
#include "stdafx.h"
#include "FileHandler.h"
namespace demo {
FileHandler::FileHandler( const std::string& strFilename, bool bSaveExceptionInLog ) :
strFilenameWithPath_( strFilename ),
bSaveExceptionInLog_( bSaveExceptionInLog ) { // initializers listed in declaration order
if ( strFilename.empty() ) {
// The bool belongs to ExceptionHandler, not to the std::string constructor
throw ExceptionHandler( __FUNCTION__ + std::string( " missing filename" ), bSaveExceptionInLog_ );
}
// Extract path info if it exists
std::string::size_type lastIndex = strFilename.find_last_of( "/\\" );
if ( lastIndex != std::string::npos ) {
strFilePath_ = strFilename.substr( 0, lastIndex );
}
}
FileHandler::~FileHandler() {
if ( fileStream_.is_open() ) {
fileStream_.close();
}
}
void FileHandler::throwError( const std::string& strMessage ) const {
throw ExceptionHandler( "File [" + strFilenameWithPath_ + "] " + strMessage, bSaveExceptionInLog_ );
}
void FileHandler::throwError( const std::ostringstream& strStreamMessage ) const {
throwError( strStreamMessage.str() );
}
bool FileHandler::getString( std::string& str, bool appendPath ) {
fileStream_.read( &str[0], str.size() );
if ( fileStream_.fail() ) {
return false;
}
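// Trim Right: drop everything from the first NUL character, if there is one
// (erase( npos ) would throw std::out_of_range)
std::string::size_type nullIndex = str.find( '\0' );
if ( nullIndex != std::string::npos ) {
str.erase( nullIndex );
}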
if ( appendPath && !strFilePath_.empty() ) {
// Add path if one exists
str = strFilePath_ + "/" + str;
}
return true;
}
} // namespace demo
TextFileReader.h
#ifndef TEXT_FILE_READER_H
#define TEXT_FILE_READER_H
#include "FileHandler.h"
namespace demo {
class TextFileReader : public FileHandler {
public:
explicit TextFileReader( const std::string& strFilename );
virtual ~TextFileReader() = default;
std::string readAll() const;
bool readLine( std::string& strLine );
TextFileReader( const TextFileReader& c ) = delete;
TextFileReader& operator=( const TextFileReader& c ) = delete;
};
} // namespace demo
#endif // !TEXT_FILE_READER_H
TextFileReader.cpp
#include "stdafx.h"
#include "TextFileReader.h"
namespace demo {
TextFileReader::TextFileReader( const std::string& strFilename ) :
FileHandler( strFilename, true ) {
fileStream_.open( strFilenameWithPath_.c_str(), std::ios_base::in );
if ( !fileStream_.is_open() ) {
throwError( __FUNCTION__ + std::string( " can not open file for reading" ) );
}
}
std::string TextFileReader::readAll() const {
std::ostringstream strStream;
strStream << fileStream_.rdbuf();
return strStream.str();
}
bool TextFileReader::readLine( std::string& strLine ) {
// std::getline returns the stream, which converts to false once no more lines can be read
return static_cast<bool>( std::getline( fileStream_, strLine ) );
}
} // namespace demo
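The usual portable idiom for line-by-line reading tests the result of `std::getline` itself. Checking `eof()` before calling `std::getline` records one extra, empty line when the input ends with a newline, because `eof()` only becomes true after a read has already failed. A small sketch, using `std::istringstream` in place of the file stream and hypothetical function names:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line; the loop condition is the stream state after std::getline,
// so the loop stops as soon as a read fails.
std::vector<std::string> readAllLines( std::istream& in ) {
    std::vector<std::string> vLines;
    std::string strLine;
    while ( std::getline( in, strLine ) ) {
        vLines.push_back( strLine );
    }
    return vLines;
}

// The eof()-first pattern, kept only for comparison: when the input ends with
// '\n', the final getline fails, yet its empty result is still recorded.
std::vector<std::string> readAllLinesEofFirst( std::istream& in ) {
    std::vector<std::string> vLines;
    std::string strLine;
    while ( !in.eof() ) {
        std::getline( in, strLine );
        vLines.push_back( strLine );
    }
    return vLines;
}
```

For the input `"a\nb\n"`, the first function returns two lines and the second returns three, the last one empty.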
TextFileWriter.h
#ifndef TEXT_FILE_WRITER_H
#define TEXT_FILE_WRITER_H
#include "FileHandler.h"
namespace demo {
class TextFileWriter : public FileHandler {
public:
explicit TextFileWriter( const std::string& strFilename, bool bAppendToFile, bool bSaveExceptionInLog = true );
virtual ~TextFileWriter() = default;
void write( const std::string& str );
TextFileWriter( const TextFileWriter& c ) = delete;
TextFileWriter& operator=( const TextFileWriter& c ) = delete;
};
} // namespace demo
#endif // !TEXT_FILE_WRITER_H
TextFileWriter.cpp
#include "stdafx.h"
#include "TextFileWriter.h"
namespace demo {
TextFileWriter::TextFileWriter( const std::string& strFilename, bool bAppendToFile, bool bSaveExceptionInLog ) :
FileHandler( strFilename, bSaveExceptionInLog ) {
fileStream_.open( strFilenameWithPath_.c_str(),
std::ios_base::out | (bAppendToFile ? std::ios_base::app : std::ios_base::trunc) );
if ( !fileStream_.is_open() ) {
throwError( __FUNCTION__ + std::string( " can not open file for writing" ) );
}
}
void TextFileWriter::write( const std::string& str ) {
fileStream_ << str;
}
} // namespace demo
Processes & Threads
BlockProcess.h
#ifndef BLOCK_PROCESS_H
#define BLOCK_PROCESS_H
namespace demo {
class BlockProcess final {
private:
HANDLE hMutex_;
public:
explicit BlockProcess( const std::string& strName );
~BlockProcess();
bool isBlocked() const;
BlockProcess( const BlockProcess& c ) = delete;
BlockProcess& operator=( const BlockProcess& c ) = delete;
};
} // namespace demo
#endif // !BLOCK_PROCESS_H
BlockProcess.cpp
#include "stdafx.h"
#include "BlockProcess.h"
namespace demo {
BlockProcess::BlockProcess( const std::string& strName ) {
hMutex_ = CreateMutex( nullptr, FALSE, strName.c_str() );
}
BlockProcess::~BlockProcess() {
CloseHandle( hMutex_ );
}
bool BlockProcess::isBlocked() const {
return (hMutex_ == nullptr || GetLastError() == ERROR_ALREADY_EXISTS);
}
} // namespace demo
BlockThread.h
#ifndef BLOCK_THREAD_H
#define BLOCK_THREAD_H
namespace demo {
class BlockThread final {
private:
CRITICAL_SECTION* pCriticalSection_;
public:
explicit BlockThread( CRITICAL_SECTION& criticalSection );
~BlockThread();
BlockThread( const BlockThread& c ) = delete;
BlockThread& operator=( const BlockThread& c ) = delete;
};
} // namespace demo
#endif // !BLOCK_THREAD_H
BlockThread.cpp
#include "stdafx.h"
#include "BlockThread.h"
namespace demo {
BlockThread::BlockThread( CRITICAL_SECTION& criticalSection ) {
pCriticalSection_ = &criticalSection;
EnterCriticalSection( pCriticalSection_ );
}
BlockThread::~BlockThread() {
LeaveCriticalSection( pCriticalSection_ );
}
} // namespace demo
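`BlockThread` is a scope guard around a Win32 `CRITICAL_SECTION`. The constructor enters it and the destructor leaves it, so the lock is released even if an exception propagates out of `Logger::log`. If portability matters, C++11 offers the same RAII idea as `std::lock_guard` over a `std::mutex`. Here is a minimal sketch, with a hypothetical `SharedCounter` standing in for the logger's shared state:

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared resource, protected the way Logger protects its output.
class SharedCounter {
public:
    void increment() {
        // Locks here and unlocks when guard goes out of scope,
        // including when an exception leaves this block.
        std::lock_guard<std::mutex> guard( mutex_ );
        ++value_;
    }
    long value() const {
        std::lock_guard<std::mutex> guard( mutex_ );
        return value_;
    }
private:
    mutable std::mutex mutex_;
    long value_ = 0;
};

// Starts several threads that all increment the same counter, then joins them.
long runConcurrentIncrements( int numThreads, int incrementsPerThread ) {
    SharedCounter counter;
    std::vector<std::thread> vThreads;
    for ( int t = 0; t < numThreads; ++t ) {
        vThreads.emplace_back( [&counter, incrementsPerThread]() {
            for ( int i = 0; i < incrementsPerThread; ++i ) {
                counter.increment();
            }
        } );
    }
    for ( std::thread& th : vThreads ) {
        th.join();
    }
    return counter.value();
}
```

Because every increment happens under the guard, four threads doing 10,000 increments each end with exactly 40,000.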
Utilities:
Utility.h
#ifndef UTILITY_H
#define UTILITY_H
namespace demo {
class Utility {
public:
static void pressAnyKeyToQuit();
static std::string toUpper( const std::string& str );
static std::string toLower( const std::string& str );
static std::string trim( const std::string& str, const std::string elementsToTrim = " \t\n\r" );
static unsigned convertToUnsigned( const std::string& str );
static int convertToInt( const std::string& str );
static float convertToFloat( const std::string& str );
static std::vector<std::string> splitString( const std::string& strStringToSplit, const std::string& strDelimiter, const bool keepEmpty = true );
Utility( const Utility& c ) = delete;
Utility& operator=( const Utility& c ) = delete;
private:
Utility(); // Private - Not A Class Object
template<typename T>
static bool stringToValue( const std::string& str, T* pValue, unsigned uNumValues );
template<typename T>
static T getValue( const std::string& str, std::size_t& remainder );
};
#include "Utility.inl"
} // namespace demo
#endif // !UTILITY_H
Utility.inl
template<typename T>
bool Utility::stringToValue( const std::string& str, T* pValue, unsigned uNumValues ) { // 'static' is not allowed on an out-of-class member definition
std::size_t numCommas = static_cast<std::size_t>( std::count( str.begin(), str.end(), ',' ) );
if ( numCommas != uNumValues - 1 ) {
return false;
}
std::size_t remainder;
pValue[0] = getValue<T>( str, remainder );
if ( uNumValues == 1 ) {
if ( str.size() != remainder ) {
return false;
}
} else {
std::size_t offset = remainder;
if ( str.at( offset ) != ',' ) {
return false;
}
unsigned uLastIdx = uNumValues - 1;
for ( unsigned u = 1; u < uNumValues; ++u ) {
pValue[u] = getValue<T>( str.substr( ++offset ), remainder );
offset += remainder;
if ( (u < uLastIdx && str.at( offset ) != ',') ||
(u == uLastIdx && offset != str.size()) ) {
return false;
}
}
}
return true;
}
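`stringToValue` accepts a comma-separated list only when it contains exactly `uNumValues - 1` commas and each field is consumed completely by the `std::sto*` call (checked through the `remainder` out-parameter). The standalone sketch below applies the same acceptance rule to `int` values. The name `parseIntList` is hypothetical. It locates commas with `std::string::find` instead of tracking offsets. Also, unlike the original, which lets `std::invalid_argument` from `std::stoi` propagate, it returns false for a non-numeric field.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Parses exactly uNumValues comma-separated ints into vValues. Returns false if
// the comma count is wrong, a field is not a number, or a field has trailing text.
bool parseIntList( const std::string& str, unsigned uNumValues, std::vector<int>& vValues ) {
    vValues.clear();
    if ( uNumValues == 0 ) {
        return false;
    }
    std::string::size_type start = 0;
    for ( unsigned u = 0; u < uNumValues; ++u ) {
        const bool isLast = ( u + 1 == uNumValues );
        const std::string::size_type comma = str.find( ',', start );
        // Every field except the last must be followed by a comma; the last must not be.
        if ( isLast != ( comma == std::string::npos ) ) {
            return false;
        }
        const std::string strField = isLast ? str.substr( start ) : str.substr( start, comma - start );
        std::size_t consumed = 0;
        try {
            vValues.push_back( std::stoi( strField, &consumed ) );
        } catch ( const std::exception& ) { // std::invalid_argument or std::out_of_range
            return false;
        }
        // Reject trailing characters such as "12abc"
        if ( consumed != strField.size() ) {
            return false;
        }
        start = comma + 1;
    }
    return true;
}
```

So `"1,2,3"` with three values succeeds, while `"1,2"` with three values, `"12abc"`, or `"1,,2"` is rejected.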
Utility.cpp
#include "stdafx.h"
#include "Utility.h"
namespace demo {
void Utility::pressAnyKeyToQuit() {
std::cout << "\nPress any key to quit." << std::endl;
_getch();
}
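std::string Utility::toUpper( const std::string& str ) {
std::string result = str;
// Cast through unsigned char: passing a negative char to toupper is undefined behavior
std::transform( str.begin(), str.end(), result.begin(),
[]( unsigned char c ) { return static_cast<char>( std::toupper( c ) ); } );
return result;
}
std::string Utility::toLower( const std::string& str ) {
std::string result = str;
// Cast through unsigned char: passing a negative char to tolower is undefined behavior
std::transform( str.begin(), str.end(), result.begin(),
[]( unsigned char c ) { return static_cast<char>( std::tolower( c ) ); } );
return result;
}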
std::string Utility::trim( const std::string& str, const std::string elementsToTrim ) {
std::basic_string<char>::size_type firstIndex = str.find_first_not_of( elementsToTrim );
if ( firstIndex == std::string::npos ) {
return std::string(); // Nothing Left
}
std::basic_string<char>::size_type lastIndex = str.find_last_not_of( elementsToTrim );
return str.substr( firstIndex, lastIndex - firstIndex + 1 );
}
template<>
float Utility::getValue( const std::string& str, std::size_t& remainder ) {
return std::stof( str, &remainder );
}
template<>
int Utility::getValue( const std::string& str, std::size_t& remainder ) {
return std::stoi( str, &remainder );
}
template<>
unsigned Utility::getValue( const std::string& str, std::size_t& remainder ) {
return std::stoul( str, &remainder );
}
unsigned Utility::convertToUnsigned( const std::string& str ) {
unsigned u = 0;
if ( !stringToValue( str, &u, 1 ) ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " Bad conversion of [" << str << "] to unsigned";
throw strStream.str();
}
return u;
}
int Utility::convertToInt( const std::string& str ) {
int i = 0;
if ( !stringToValue( str, &i, 1 ) ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " Bad conversion of [" << str << "] to int";
throw strStream.str();
}
return i;
}
float Utility::convertToFloat( const std::string& str ) {
float f = 0;
if ( !stringToValue( str, &f, 1 ) ) {
std::ostringstream strStream;
strStream << __FUNCTION__ << " Bad conversion of [" << str << "] to float";
throw strStream.str();
}
return f;
}
std::vector<std::string> Utility::splitString( const std::string& strStringToSplit, const std::string& strDelimiter, const bool keepEmpty ) {
std::vector<std::string> vResult;
if ( strDelimiter.empty() ) {
vResult.push_back( strStringToSplit );
return vResult;
}
std::string::const_iterator itSubStrStart = strStringToSplit.begin(), itSubStrEnd;
while ( true ) {
itSubStrEnd = std::search( itSubStrStart, strStringToSplit.end(), strDelimiter.begin(), strDelimiter.end() );
std::string strTemp( itSubStrStart, itSubStrEnd );
if ( keepEmpty || !strTemp.empty() ) {
vResult.push_back( strTemp );
}
if ( itSubStrEnd == strStringToSplit.end() ) {
break;
}
itSubStrStart = itSubStrEnd + strDelimiter.size();
}
return vResult;
}
} // namespace demo
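Here is a standalone sketch of the same splitting rule that shows what `keepEmpty` changes. It uses `std::string::find` rather than `std::search`. The free function name `split` is hypothetical, and the sketch is meant for experimenting outside the project, not as a replacement for `Utility::splitString`.

```cpp
#include <string>
#include <vector>

// Splits on every occurrence of strDelimiter. Empty pieces (from leading,
// trailing, or adjacent delimiters) are kept only when keepEmpty is true.
std::vector<std::string> split( const std::string& str, const std::string& strDelimiter, bool keepEmpty = true ) {
    std::vector<std::string> vResult;
    if ( strDelimiter.empty() ) {
        vResult.push_back( str );
        return vResult;
    }
    std::string::size_type start = 0;
    while ( true ) {
        const std::string::size_type end = str.find( strDelimiter, start );
        const std::string strPiece = ( end == std::string::npos ) ? str.substr( start ) : str.substr( start, end - start );
        if ( keepEmpty || !strPiece.empty() ) {
            vResult.push_back( strPiece );
        }
        if ( end == std::string::npos ) {
            break;
        }
        start = end + strDelimiter.size();
    }
    return vResult;
}
```

For example, `split( "a,,b", "," )` yields three pieces with an empty middle one, while passing `false` for `keepEmpty` yields only `"a"` and `"b"`. Multi-character delimiters such as `"::"` work the same way.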
Precompiled Headers:
stdafx.h
#ifndef STDAFX_H
#define STDAFX_H
// Included files that typically will not change
// during the development process of this application.
// System - Architect Includes
#include <Windows.h>
#include <process.h>
//#include <mmsystem.h>
// Character & Basic IO
#include <conio.h> // for _getch()
#include <tchar.h>
//---------------------------------------------//
// Standard Library Includes
// Atomics, Regular Expressions, Localizations
#include <atomic> // C++11
#include <clocale>
//#include <codecvt> // C++11 // Deprecated in C++17
#include <locale>
#include <regex>
// Numerics & Numeric Limits
#include <climits>
#include <cfloat>
#include <cstdint> // C++11
#include <cinttypes> // C++11
#include <limits>
#include <cmath>
#include <complex>
#include <valarray>
#include <random> // C++11
#include <numeric>
#include <ratio> // C++11
#include <cfenv> // C++11
// Strings, Streams & IO
#include <string>
#include <sstream>
#include <iostream>
#include <iomanip>
#include <fstream>
// Thread Support
#include <thread> // C++11
#include <mutex> // C++11
#include <shared_mutex> // C++14
#include <future> // C++11
#include <condition_variable> // C++11
// Containers
#include <array> // C++11
#include <stack>
#include <list>
#include <forward_list> // C++11
#include <map>
#include <unordered_map> // C++11
#include <queue>
#include <deque>
#include <set>
#include <unordered_set> // C++11
#include <vector>
// Algorithms, Iterators
#include <algorithm> // Note* C++ 17 also has <execution>
#include <iterator>
// Dynamic Memory
#include <new>
#include <memory>
#include <scoped_allocator> // C++11
// Utilities
#include <bitset>
#include <ctime> // Compatibility with C-style time formats
#include <chrono> // C++ 11 - C++ Time Utilities
#include <functional>
#include <initializer_list> // C++11
#include <typeinfo>
#include <typeindex> // C++11
#include <type_traits> // C++11
#include <tuple> // C++11
#include <utility>
// C++ 17
#include <any>
#include <filesystem>
#include <optional>
#include <string_view>
#include <variant>
// C++ 20
// #include <compare>
// #include <charconv>
// #include <syncstream>
// 3rd Party Library Includes Here.
// User-Application Specific commonly used non changing headers.
#include "ExceptionHandler.h"
namespace demo {
enum ReturnCode {
RETURN_OK = 0,
RETURN_ERROR = 1,
}; // ReturnCode
extern const unsigned INVALID_UNSIGNED;
extern const unsigned INVALID_UNSIGNED_SHORT;
} // namespace demo
#endif // !STDAFX_H
stdafx.cpp
#include "stdafx.h"
namespace demo {
const unsigned INVALID_UNSIGNED = static_cast<const unsigned>(-1);
const unsigned INVALID_UNSIGNED_SHORT = static_cast<const unsigned short>(-1);
} // namespace demo
I'm trying to connect to AWS S3 through a Boost SSL socket.
The connection works, but when I read I run into several problems:
Some files come out corrupted, always the same ones, while others are fine.
The file itself is not corrupted (MD5Filter receives all the data), but the data delivered to the buffer is wrong. So there is a problem somewhere between the different read layers, but I can't figure out where.
Sometimes the program gets stuck in the S3_client::read function and loops thousands of times in the do-while loop that calls read, but it never reaches MD5Filter::read.
It gets stuck between filterStream.read() and md5filter.read(), which is never called. I don't know whether the cause is gzip or filterStream. It only happens when there has been no call to the lower read layers for a while.
Can you help me spot the problem in my code?
#ifndef BTLOOP_AWSCLIENT_H
#define BTLOOP_AWSCLIENT_H
#include "boost/iostreams/filter/gzip.hpp"
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/filtering_streambuf.hpp>
#include <boost/iostreams/categories.hpp>
#include <boost/iostreams/stream.hpp>
#include <string>
#include <set>
#include <map>
#include <openssl/md5.h>
#include <sstream>
#include <fstream>
#include <iostream>
#include <boost/asio.hpp>
#include <boost/asio/ssl.hpp>
#include "Logger.h"
namespace io = boost::iostreams;
namespace asio = boost::asio;
namespace ssl = boost::asio::ssl;
typedef ssl::stream<asio::ip::tcp::socket> ssl_socket;
namespace S3Reader
{
class MD5Filter
{
public:
typedef char char_type;
struct category :
io::multichar_input_filter_tag{};
MD5Filter( std::streamsize n );
~MD5Filter();
template<typename Source>
std::streamsize read( Source& src, char* s, std::streamsize n );
void setBigFileMode() { _bigFileMode = true; }
std::string close();
void setFileName( std::string fileName ) { _fileName = fileName; };
inline std::streamsize writtenBytes() {std::streamsize res = _writtenBytes; _writtenBytes= 0; return res;};
inline bool eof(){return _eof;};
private:
void computeMd5( char* buffer, size_t size, bool force = false );
private:
bool _bigFileMode;
int _blockCount;
std::vector<unsigned char> _bufferMD5;
MD5_CTX _mdContext;
unsigned char _hashMd5[MD5_DIGEST_LENGTH];
std::string _fileName;
std::streamsize _writtenBytes;
int _totalSize;
bool _eof;
};
class Ssl_wrapper : public io::device<io::bidirectional>
{
public:
Ssl_wrapper( ssl_socket* sock, std::streamsize n ) :
_sock( sock ),_totalSize(0) { };
std::streamsize read( char_type* s, std::streamsize n )
{
boost::system::error_code ec;
size_t rval = _sock->read_some( asio::buffer( s, n ), ec );
_totalSize +=rval;
LOG_AUDIT( " wrapperR: " << rval << " " << _totalSize << " "<<ec.message());
if ( !ec )
{
return rval;
}
else if ( ec == asio::error::eof )
return -1;
else
throw boost::system::system_error( ec, "Wrapper read_some" );
}
std::streamsize write( const char* s, std::streamsize n )
{
boost::system::error_code ec;
size_t rval = _sock->write_some( asio::buffer( s, n ), ec );
if ( !ec )
{
return rval;
}
else if ( ec == asio::error::eof )
return -1;
else
throw boost::system::system_error( ec, " Wrapper read_some" );
}
private:
ssl_socket* _sock;
int _totalSize;
};
class S3_client
{
public:
S3_client( const std::string& key_id, const std::string& key_secret, const std::string& bucket );
virtual ~S3_client();
bool open( const std::string& fileName );
int read( char* buffer, size_t size );
int readLine( char* buffer, size_t size );
void close();
bool eof() { return _filterStream.eof(); }
std::string authorize( const std::string request );
bool connectSocket( std::string url, std::string port, std::string auth );
private :
std::string _key_id;
std::string _key_secret;
std::string _bucket;
std::string _fileName;
io::gzip_decompressor _gzip;
MD5Filter _md5Filter;
boost::posix_time::seconds _timeout;
ssl_socket* _sock;
Ssl_wrapper* _wrapper;
io::stream<Ssl_wrapper>* _sockstream;
std::map<std::string, std::string> _headerMap;
io::filtering_istream _filterStream;
int _totalSize;
};
}
#endif //BTLOOP_AWSCLIENT_H
S3Client.cpp
#include "S3_client.h"
#include <boost/algorithm/string.hpp>
#include <boost/lexical_cast.hpp>
#include <boost/iostreams/copy.hpp>
#include <boost/iostreams/filter/counter.hpp>
#include <boost/exception/diagnostic_information.hpp>
#include <system/ArmError.h>
namespace io = boost::iostreams;
namespace asio = boost::asio;
namespace ssl = boost::asio::ssl;
namespace S3Reader
{
static const size_t s3_block_size = 8 * 1024 * 1024;
static const std::string base64_chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"abcdefghijklmnopqrstuvwxyz"
"0123456789+/";
static inline bool is_base64( unsigned char c )
{
return (isalnum( c ) || (c == '+') || (c == '/'));
}
std::string url_encode( const std::string& value )
{
std::ostringstream escaped;
escaped.fill( '0' );
escaped << std::hex;
for ( auto i = value.begin(), n = value.end(); i != n; ++i )
{
auto c = *i;
if ( isalnum( c ) || c == '-' || c == '_' || c == '.' || c == '~' )
{
escaped << c;
continue;
}
escaped << std::uppercase;
escaped << '%' << std::setw( 2 ) << int((unsigned char) c );
escaped << std::nouppercase;
}
return escaped.str();
}
std::string base64_encode( unsigned char const* bytes_to_encode, unsigned int in_len )
{
std::string ret;
int i = 0;
int j = 0;
unsigned char char_array_3[3];
unsigned char char_array_4[4];
while ( in_len-- )
{
char_array_3[i++] = *(bytes_to_encode++);
if ( i == 3 )
{
char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
char_array_4[3] = char_array_3[2] & 0x3f;
for ( i = 0; (i < 4); i++ )
ret += base64_chars[char_array_4[i]];
i = 0;
}
}
if ( i )
{
for ( j = i; j < 3; j++ )
char_array_3[j] = '\0';
char_array_4[0] = (char_array_3[0] & 0xfc) >> 2;
char_array_4[1] = ((char_array_3[0] & 0x03) << 4) + ((char_array_3[1] & 0xf0) >> 4);
char_array_4[2] = ((char_array_3[1] & 0x0f) << 2) + ((char_array_3[2] & 0xc0) >> 6);
char_array_4[3] = char_array_3[2] & 0x3f;
for ( j = 0; (j < i + 1); j++ )
ret += base64_chars[char_array_4[j]];
while ((i++ < 3))
ret += '=';
}
return ret;
}
std::string to_hex( const uint8_t* buffer, size_t buffer_size )
{
std::stringstream sst;
for ( uint i = 0; i < buffer_size; i++ )
{
sst << std::setw( 2 ) << std::setfill( '0' ) << std::hex << int( buffer[i] );
}
return sst.str();
}
std::string getDateForHeader( bool amzFormat )
{
time_t lt;
time( &lt );
struct tm* tmTmp;
tmTmp = gmtime( &lt );
char buf[50];
if ( amzFormat )
{
strftime( buf, 50, "Date: %a, %d %b %Y %X +0000", tmTmp );
return std::string( buf );
}
else
{
tmTmp->tm_hour++;
//strftime( buf, 50, "%a, %d %b %Y %X +0000", tmTmp );
std::stringstream ss;
ss << mktime( tmTmp );
return ss.str();
}
}
MD5Filter::MD5Filter( std::streamsize n ) :
_bigFileMode( false ), _blockCount( 0 ), _writtenBytes(0), _totalSize( 0 )
{
MD5_Init( &_mdContext );
memset( _hashMd5, 0, MD5_DIGEST_LENGTH );
}
MD5Filter::~MD5Filter()
{
close();
}
template<typename Source>
std::streamsize MD5Filter::read( Source& src, char* s, std::streamsize n )
{
int result =0;
try
{
if ((result = io::read( src, s, n )) == -1 )
{
_eof=true;
LOG_AUDIT( _fileName << " md5R: " << result << " " << _totalSize );
return -1;
}
}
catch ( boost::exception& ex)
{
LOG_ERROR( _fileName <<" "<< boost::diagnostic_information(ex)<< " " << result );
}
computeMd5( s, (size_t) result );
_totalSize += result;
_writtenBytes = result;
LOG_AUDIT( _fileName << " md5R: " << result << " " << _totalSize );
return result;
}
void MD5Filter::computeMd5( char* buffer, size_t size, bool force )
{
size_t realSize = s3_block_size;
uint8_t blockMd5[MD5_DIGEST_LENGTH];
if ( !_bigFileMode )
{
MD5_Update( &_mdContext, buffer, size );
return;
}
if ( size > 0 )
{
_bufferMD5.insert( _bufferMD5.end(), &buffer[0], &buffer[size] );
}
if ((_bufferMD5.size() < s3_block_size) && !force )
return;
if ( force )
realSize = _bufferMD5.size();
MD5( &_bufferMD5[0], realSize, blockMd5 );
MD5_Update( &_mdContext, blockMd5, MD5_DIGEST_LENGTH );
_blockCount++;
if ( _bufferMD5.size() == s3_block_size )
{
_bufferMD5.clear();
return;
}
if ( force )
return;
memcpy( &_bufferMD5[0], &_bufferMD5[s3_block_size], _bufferMD5.size() - s3_block_size );
_bufferMD5.erase( _bufferMD5.begin() + s3_block_size, _bufferMD5.end());
}
std::string MD5Filter::close()
{
std::string mdOutput;
computeMd5( NULL, 0, true );
MD5_Final( _hashMd5, &_mdContext );
mdOutput = to_hex( _hashMd5, MD5_DIGEST_LENGTH );
if ( _bigFileMode )
{
mdOutput += "-" + boost::lexical_cast<std::string>( _blockCount );
}
return mdOutput;
}
std::string S3_client::authorize( const std::string request )
{
unsigned char* digest;
digest = HMAC( EVP_sha1(), _key_secret.c_str(), (int) _key_secret.size(), (unsigned char*) request.c_str(), (int) request.size(), NULL, NULL );
std::string signature( url_encode( base64_encode( digest, 20 )));
return "?AWSAccessKeyId=" + _key_id + "&Expires=" + getDateForHeader( false ) + "&Signature=" + signature;
}
S3_client::S3_client( const std::string& key_id, const std::string& key_secret, const std::string& bucket ) :
_key_id( key_id ), _key_secret( key_secret ), _bucket( bucket ), _gzip( io::gzip::default_window_bits, 1024 * 1024 )
, _md5Filter( s3_block_size ), _timeout( boost::posix_time::seconds( 1 )), _totalSize( 0 ) { }
S3_client::~S3_client()
{
close();
}
bool S3_client::connectSocket( std::string url, std::string port, std::string auth )
{
std::string amzDate = getDateForHeader( true );
std::string host = "url";
boost::asio::io_service io_service;
boost::asio::ip::tcp::resolver resolver( io_service );
boost::asio::ip::tcp::resolver::query query( url, "https" );
auto endpoint = resolver.resolve( query );
// Context with default path
ssl::context ctx( ssl::context::sslv23 );
ctx.set_default_verify_paths();
_sock = new ssl_socket( io_service, ctx );
boost::asio::socket_base::keep_alive option( true );
_wrapper = new Ssl_wrapper( _sock, s3_block_size );
_sockstream = new io::stream<Ssl_wrapper>( boost::ref( *_wrapper ));
asio::connect( _sock->lowest_layer(), endpoint );
_sock->set_verify_mode( ssl::verify_peer );
_sock->set_verify_callback( ssl::rfc2818_verification( url ));
_sock->handshake( ssl_socket::client );
_sock->lowest_layer().set_option( option );
std::stringstream ss;
ss << "GET " << _fileName << auth << " HTTP/1.1\r\n" << "Host: " << host << "\r\nAccept: */*\r\n\r\n";
_sockstream->write( ss.str().c_str(), ss.str().size());
_sockstream->flush();
std::string http_version;
int status_code = 0;
(*_sockstream) >> http_version;
(*_sockstream) >> status_code;
if ( !(*_sockstream) || http_version.substr( 0, 5 ) != "HTTP/" ) // test the stream state, not the pointer
{
std::cout << "Invalid response: " << http_version << " " << status_code << std::endl;
return false;
}
if ( status_code != 200 )
{
std::cout << "Response returned with status code " << http_version << " " << status_code << std::endl;
return false;
}
return true;
}
bool S3_client::open( const std::string& fileName )
{
std::string port = "443";
std::string url = "bucket";
std::stringstream authRequest;
std::string date = getDateForHeader( false );
_fileName = fileName;
authRequest << "GET\n\n\n" << date << "\n/" << _bucket << "" << fileName;
std::string auth = authorize( authRequest.str());
if ( !connectSocket( url, port, auth ))
THROW( "Failed to open socket" );
std::string header;
while ( std::getline( *_sockstream, header ) && header != "\r" )
{
std::vector<std::string> vectLine;
boost::split( vectLine, header, boost::is_any_of( ":" ));
if ( vectLine.size() < 2 )
continue;
boost::erase_all( vectLine[1], "\"" );
boost::erase_all( vectLine[1], "\r" );
boost::erase_all( vectLine[1], " " );
_headerMap[vectLine[0]] = vectLine[1];
}
if ( _headerMap.find( "Content-Length" ) == _headerMap.end())
return false;
if ( _headerMap.find( "Content-Type" ) == _headerMap.end())
return false;
if ((uint) std::atoi( _headerMap.at( "Content-Length" ).c_str()) > s3_block_size )
_md5Filter.setBigFileMode();
_md5Filter.setFileName( _fileName );
if ( _headerMap["Content-Type"] == "binary/octet-stream" )
_filterStream.push( _gzip, s3_block_size );
_filterStream.push( boost::ref( _md5Filter ), s3_block_size );
_filterStream.push( boost::ref( *_sockstream ), s3_block_size );
return true;
}
void S3_client::close()
{
std::string localMD5 = _md5Filter.close();
std::string headerMD5 = _headerMap["ETag"];
if ( localMD5 != headerMD5 )
THROW ( "Corrupted file " << _fileName << " " << localMD5 << " " << headerMD5 << "." );
else
LOG_AUDIT( "Close S3: " << _fileName << " " << localMD5 << " " << headerMD5 << "." );
}
int S3_client::readLine( char* buffer, size_t size )
{
_filterStream.getline( buffer, size );
return _filterStream.gcount();
}
int S3_client::read( char* buffer, size_t size )
{
std::streamsize sizeRead = 0;
do
{
_filterStream.read( buffer, size );
sizeRead = _md5Filter.writtenBytes();
_totalSize += sizeRead;
LOG_AUDIT( _fileName << " s3R: " << sizeRead << " " << _totalSize );
}
while( sizeRead ==0 && !_md5Filter.eof() && !_sockstream->eof() && _filterStream.good() && _sock->next_layer().is_open());
return sizeRead;
}
}
int main( int argc, char** argv )
{
S3Reader::S3_client client( key_id, key_secret, s3_bucket );
client.open("MyFile");
while (client.read(buffer, bufferSize) >0 ) {}
}
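The header loop in open() splits each line on every ':' and then strips every space, which mangles values that legitimately contain colons or spaces: a Date header such as "Tue, 01 Jan 2019 12:00:00 GMT" ends up as "Tue,01Jan201912". A minimal sketch of an alternative, assuming only the standard library (the helper name parseHeaderLine is hypothetical), splits at the first colon and trims only the ends of the value:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: split "Name: value\r" at the FIRST colon only, then trim
// spaces, tabs, '\r', '\n' and surrounding double quotes from both ends of the
// value. Interior spaces are preserved. Returns false for lines with no colon
// (for example the blank "\r" line that ends the headers).
bool parseHeaderLine( const std::string& line, std::pair<std::string, std::string>& out )
{
    const std::string::size_type colon = line.find( ':' );
    if ( colon == std::string::npos )
        return false;

    const char* trimChars = " \t\r\n\"";
    std::string name = line.substr( 0, colon );
    std::string value = line.substr( colon + 1 );

    const std::string::size_type first = value.find_first_not_of( trimChars );
    if ( first == std::string::npos ) {
        value.clear();  // value was only whitespace/quotes
    } else {
        const std::string::size_type last = value.find_last_not_of( trimChars );
        value = value.substr( first, last - first + 1 );
    }

    out = std::make_pair( name, value );
    return true;
}
```

Inside the header loop, `_headerMap[h.first] = h.second;` would then replace the boost::split block. Keep in mind that HTTP header names are case-insensitive, so an exact-match lookup of "Content-Length" may still miss a server that sends "content-length".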
How can I write to a file at the nth line (for example, the 5th line) in C++?
Here's my attempt:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream stream1("1.txt");
string line ;
ofstream stream2("2.txt");
int lineNumber = 0;
while(getline( stream1, line ) )
{
if (lineNumber == 5)
{
stream2 << "Input" << endl;
lineNumber = 0;
}
lineNumber++;
}
stream1.close();
stream2.close();
return 0;
}
In "1.txt" I have the word "Student" on the 4th line. I want to skip those first 4 lines and write the word "Input" on the 5th line (below the word "Student"). When I run the code above, the output file is blank. Any suggestions on how to fix this? Thanks.
If I understand it right, all you want is a replica of 1.txt in 2.txt, with just one specific line replaced by your own content.
In your case, that content is the word "Input".
Here is your original code, modified:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream stream1("1.txt");
string line ;
ofstream stream2("2.txt");
int lineNumber = 0;
int line_to_replace = 4; //This will replace 5th line
while(getline( stream1, line ) )
{
if (lineNumber == line_to_replace)
{
stream2 << "Input" << endl;
}
else
stream2 << line << endl;
lineNumber++;
}
stream1.close();
stream2.close();
return 0;
}
Input File (1.txt) -
sdlfknas
sdfas
sdf
g
thtr
34t4
bfgndty
45y564
grtg
Output File (2.txt) -
sdlfknas
sdfas
sdf
g
Input
34t4
bfgndty
45y564
grtg
P.S. To learn and understand programming better, I would recommend not using:
using namespace std;
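The same line-replacement loop can be wrapped in a reusable function that works on any pair of streams, which also makes it easy to try out with std::istringstream instead of real files. This is a minimal sketch; the name replaceLine and its 1-based line numbering are hypothetical choices, not part of the answer above:

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copy every line from `in` to `out`, replacing the line
// whose 1-based number is `lineToReplace` with `replacement`.
// Returns true if that line existed in the input.
bool replaceLine( std::istream& in, std::ostream& out,
                  int lineToReplace, const std::string& replacement )
{
    std::string line;
    int lineNumber = 0;     // 1-based number of the line just read
    bool replaced = false;
    while ( std::getline( in, line ) ) {
        ++lineNumber;
        if ( lineNumber == lineToReplace ) {
            out << replacement << '\n';
            replaced = true;
        } else {
            out << line << '\n';
        }
    }
    return replaced;
}
```

With real files, pass an std::ifstream opened on 1.txt and an std::ofstream opened on 2.txt; `replaceLine( stream1, stream2, 5, "Input" )` then reproduces the output shown above.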
When you're reading the 5th line, lineNumber equals 4, because you start counting at 0.
Change if(lineNumber == 5)
to if(lineNumber == 4). You also have an issue where you set lineNumber = 0 and then immediately increment it to 1, so you only count 4 lines before outputting again.
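I would create a function like this:
#include <cctype>
#include <string>
// Returns true if the line is empty or contains no alphanumeric characters.
bool isBlank(const std::string& line) {
// Each char is converted to unsigned char: passing a negative char to isalnum is undefined behavior
for (unsigned char x : line) {
if (std::isalnum(x)) {
return false;
}
}
return true;
}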
It returns true if a string is empty or contains no alphanumeric characters.
You can call this function right after the getline statement.
The isalnum function is declared in <cctype>.
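As an illustration of calling it right after getline, the sketch below counts only non-blank lines, so blank lines in 1.txt do not shift the position where "Input" should go. The helper findNthNonBlankLine is hypothetical, and isBlank is repeated here so the block compiles on its own:

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Same test as above: true if the line has no alphanumeric characters.
bool isBlank( const std::string& line )
{
    for ( unsigned char c : line ) {
        if ( std::isalnum( c ) ) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: return the 1-based physical line number of the n-th
// non-blank line, or 0 if the input has fewer than n non-blank lines.
int findNthNonBlankLine( std::istream& in, int n )
{
    std::string line;
    int physical = 0;   // every line read, blank or not
    int nonBlank = 0;   // only lines that contain text
    while ( std::getline( in, line ) ) {
        ++physical;
        if ( isBlank( line ) )
            continue;   // skip blank lines without counting them
        if ( ++nonBlank == n )
            return physical;
    }
    return 0;
}
```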
After working with your code, I managed to get the output you wanted. Here is the updated version:
#include <iostream>
#include <fstream>
#include <string>
int main() {
std::ifstream stream1( "1.txt" );
std::string line;
std::ofstream stream2( "2.txt" );
int lineNumber = 1;
while ( std::getline( stream1, line ) ) { // std::getline, not getLine
if ( lineNumber == 5 ) {
stream2 << "Input" << std::endl;
} else {
stream2 << std::endl; // Blank line in place of lines 1-4
}
lineNumber++; // Count every line, so "Input" is written only once
}
stream1.close();
stream2.close();
return 0;
}
The one thing you have to make sure of is that 1.txt, which has the word Student on the 4th line, contains at least one more line after it. Pressing Enter after Student and then once more, so that an empty 5th line exists, is enough. If you do not, the while( getline() ) loop ends before lineNumber reaches 5, the code never enters the if() branch for lineNumber == 5, and it never writes the text "Input" to the stream2 file stream.
If the last line of text in 1.txt is the line containing Student, here is what happens: that line is read into the line variable, and lineNumber is incremented to 5. The next time the while loop calls getline(), it fails because you are at EOF and there are no more lines to read. That ends the while loop, so the if( lineNumber == 5 ) check, which lives inside the loop body, is never reached.
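Because the loop body never runs once getline() fails, one way to stop depending on extra empty lines in 1.txt is to finish the job after the loop, padding with blank lines when the input is too short. This is a minimal sketch under that assumption; unlike the answer above, it copies the original lines instead of writing blank ones, and the name insertAtLine is hypothetical:

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copy the first (targetLine - 1) lines of `in`, padding
// with empty lines if the input is shorter, write `text` as line `targetLine`,
// then copy whatever input remains.
void insertAtLine( std::istream& in, std::ostream& out,
                   int targetLine, const std::string& text )
{
    std::string line;
    int written = 0;   // lines written to `out` before `text`
    while ( written < targetLine - 1 && std::getline( in, line ) ) {
        out << line << '\n';
        ++written;
    }
    // Runs only if the input ran out early: this work happens outside the
    // read loop, so a short file still gets `text` on the right line.
    while ( written < targetLine - 1 ) {
        out << '\n';
        ++written;
    }
    out << text << '\n';
    while ( std::getline( in, line ) ) {   // copy the rest, if any
        out << line << '\n';
    }
}
```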
My first answer addressed your problem and got the output into your text file. However, as I mentioned, a while loop that reads a line of text and shares one counter between both file streams is not very elegant. A more robust approach, which also simplifies debugging, is to read the full input file one line at a time, saving each line into a string and storing the strings in a vector. This way you can examine each line of text individually and easily traverse the vector to find the line you need. You should also check that your file exists and that it opens correctly.
#include <iostream>
#include <fstream>
#include <vector>
#include <string>
int main() {
std::string strTextFileIn( "1.txt" );
std::ifstream in;
std::string strLine;
std::vector<std::string> vFileContents;
// Open File Stream
in.open( strTextFileIn.c_str(), std::ios_base::in );
// Test To Make Sure It Opened Properly And That It Exists
if ( !in.is_open() ) {
std::cout << "Failed to open file, check to make sure file exists." << std::endl;
return -1;
} else {
// Get Each Line Of Text; The Loop Stops Cleanly At The End Of The File
// (Testing eof() First Would Push An Extra Empty String)
while ( std::getline( in, strLine ) ) {
// Push String Into Vector
vFileContents.push_back( strLine );
}
}
// Done With File Close File Stream
if ( in.is_open() ) {
in.close();
}
// Now That We Have Read In The File Contents And Have Saved Each Line Of Text To A String
// And Stored It In Our Container (Vector) We Can Traverse Our Vector To Find Which String
// Matches The Text We Are Looking For, Retrieve Its Indexed Value And Then Write To Our
// Output File According To Where You Want The Text To Be Written
unsigned index = 0;
const std::string strLookup( "Student" );
for ( unsigned int i = 0; i < vFileContents.size(); i++ ) {
if ( vFileContents[i] == strLookup ) {
// Strings Match We Have Our Indexed Value
index = i;
}
}
// We Want To Write Our Line Of Text 1 Line Past Our Index Value As You Have Stated.
std::string strTextFileOut( "2.txt" );
std::ofstream out;
// Open Our File For Writing
out.open( strTextFileOut.c_str(), std::ios_base::out );
if ( !out.is_open() ) {
std::cout << "Failed To open file.";
vFileContents.clear();
return -1;
} else {
// index Is 0-Based, So index + 1 Blank Lines Put The Text On The Line Below The Match
for ( unsigned int i = 0; i <= index; i++ ) {
out << std::endl; // Write Blank Lines
}
// The Text Or String You Want To Write To File
out << "Input" << std::endl;
}
// Done With File Stream
if ( out.is_open() ) {
out.close();
}
// Clear Out Vector
vFileContents.clear();
return 0;
} // main
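The same read-into-a-vector idea can also keep the original text and insert the new line directly below the match, using std::find and vector::insert. This is a minimal sketch; the name insertAfterMatch is hypothetical, and note that std::find stops at the first matching line, whereas the loop above keeps the last one:

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read all lines, insert `text` right after the first
// line equal to `lookup` (or append it if nothing matches), then write every
// line back out. Returns true if `lookup` was found.
bool insertAfterMatch( std::istream& in, std::ostream& out,
                       const std::string& lookup, const std::string& text )
{
    std::vector<std::string> lines;
    std::string line;
    while ( std::getline( in, line ) ) {    // stops cleanly at end of input
        lines.push_back( line );
    }
    auto it = std::find( lines.begin(), lines.end(), lookup );
    const bool found = ( it != lines.end() );
    if ( found ) {
        lines.insert( it + 1, text );        // position just below the match
    } else {
        lines.push_back( text );
    }
    for ( const std::string& l : lines ) {
        out << l << '\n';
    }
    return found;
}
```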
Now this can be simplified a bit more by creating a class hierarchy for working with various file stream object types so that you don't have to write this code to open, close, check validity, read in full file or by line over and over again everywhere you need it. This makes it modular. However this structure relies on a few other classes such as an ExceptionHandler class and a Logger class. Below is a small multi file application.
stdafx.h
NOTE: Not all of these includes and defines are used here. This header comes from a larger project of mine; I stripped out only the classes needed here but left my standard header as is. The only things I removed from this stdafx.h are those related to the OpenGL, OpenAL, Ogg Vorbis, and GLM libraries and APIs.
#ifndef STDAFX_H
#define STDAFX_H
#define VC_EXTRALEAN // Exclude Rarely Used Stuff Windows Headers - Windows Only
// Instead of Creating Another File That VS Makes For You "targetver.h"
// I Will Just Append Its Contents Here
#include <SDKDDKVer.h> // Windows Only
#include <Windows.h> // Windows Only
#include <process.h>
#include <tchar.h>
#include <conio.h>
#include <memory>
#include <string>
#include <numeric>
#include <vector>
#include <array>
#include <unordered_map>
#include <queue>
#include <iostream>
#include <sstream>
#include <iomanip>
#include <fstream>
#include "ExceptionHandler.h"
namespace pro {
enum ReturnCode {
RETURN_OK = 0,
RETURN_ERROR = 1,
};
extern const unsigned INVALID_UNSIGNED;
extern const unsigned INVALID_UNSIGNED_SHORT;
} // namespace pro
#endif // STDAFX_H
stdafx.cpp
#include "stdafx.h"
namespace pro {
const unsigned INVALID_UNSIGNED = static_cast<const unsigned>( -1 );
const unsigned INVALID_UNSIGNED_SHORT = static_cast<const unsigned short>( -1 );
} // namespace pro
ExceptionHandler.h
#ifndef EXCEPTION_HANDLER_H
#define EXCEPTION_HANDLER_H
namespace pro {
class ExceptionHandler sealed {
private:
std::string m_strMessage;
public:
explicit ExceptionHandler( const std::string& strMessage, bool bSaveInLog = true );
explicit ExceptionHandler( const std::ostringstream& strStreamMessage, bool bSavedInLog = true );
// ~ExceptionHandler(); // Default Okay
// ExceptionHandler( const ExceptionHandler& c ); // Default Copy Constructor Okay & Is Needed
const std::string& getMessage() const;
private:
ExceptionHandler& operator=( const ExceptionHandler& c ); // Not Implemented
}; // ExceptionHandler
} // namespace pro
#endif // EXCEPTION_HANDLER_H
ExceptionHandler.cpp
#include "stdafx.h"
#include "ExceptionHandler.h"
#include "Logger.h"
namespace pro {
ExceptionHandler::ExceptionHandler( const std::string& strMessage, bool bSaveInLog ) :
m_strMessage( strMessage ) {
if ( bSaveInLog ) {
Logger::log( m_strMessage, Logger::TYPE_ERROR );
}
}
ExceptionHandler::ExceptionHandler( const std::ostringstream& strStreamMessage, bool bSaveInLog ) :
m_strMessage( strStreamMessage.str() ) {
if ( bSaveInLog ) {
Logger::log( m_strMessage, Logger::TYPE_ERROR );
}
}
const std::string& ExceptionHandler::getMessage() const {
return m_strMessage;
}
} // namespace pro
BlockThread.h -- Needed For Logger
#ifndef BLOCK_THREAD_H
#define BLOCK_THREAD_H
namespace pro {
class BlockThread sealed {
private:
CRITICAL_SECTION* m_pCriticalSection;
public:
explicit BlockThread( CRITICAL_SECTION& criticalSection );
~BlockThread();
private:
BlockThread( const BlockThread& c ); // Not Implemented
BlockThread& operator=( const BlockThread& c ); // Not Implemented
}; // BlockThread
} // namespace pro
#endif // BLOCK_THREAD_H
BlockThread.cpp
#include "stdafx.h"
#include "BlockThread.h"
namespace pro {
BlockThread::BlockThread( CRITICAL_SECTION& criticalSection ) {
m_pCriticalSection = &criticalSection;
EnterCriticalSection( m_pCriticalSection );
}
BlockThread::~BlockThread() {
LeaveCriticalSection( m_pCriticalSection );
}
} // namespace pro
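For comparison, C++11 offers the same scoped-locking idiom portably. In this minimal sketch, assuming C++11 or later, std::lock_guard takes the place of BlockThread. One detail matters: the Logger constructor shown further down holds a BlockThread while it calls logMemoryAllocation, which calls Logger::log and locks again. A CRITICAL_SECTION allows its owning thread to re-enter, so the matching standard type is std::recursive_mutex; a plain std::mutex would deadlock there. The names s_logMutex and logNested are hypothetical:

```cpp
#include <cassert>
#include <mutex>

// Hypothetical portable stand-in for Logger's CRITICAL_SECTION. It must be
// recursive because the owning thread locks it again while already holding it.
static std::recursive_mutex s_logMutex;
static int s_depth = 0;

// Each std::lock_guard plays the role of one BlockThread object: it locks in
// its constructor and unlocks in its destructor, even if an exception is thrown.
int logNested( int levels )
{
    std::lock_guard<std::recursive_mutex> lock( s_logMutex );
    ++s_depth;
    int deepest = s_depth;
    if ( levels > 1 ) {
        deepest = logNested( levels - 1 );   // re-enters while already locked
    }
    --s_depth;
    return deepest;
}
```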
Logger is a Singleton since you will only want one instance of it while your application is running.
Singleton.h
#ifndef SINGLETON_H
#define SINGLETON_H
namespace pro {
class Singleton {
public:
enum SingletonType {
TYPE_LOGGER = 0, // Must Be First!
// TYPE_SETTINGS,
// TYPE_ENGINE,
};
private:
SingletonType m_eType;
public:
virtual ~Singleton();
protected:
explicit Singleton( SingletonType eType );
void logMemoryAllocation( bool isAllocated ) const;
private:
Singleton( const Singleton& c ); // Not Implemented
Singleton& operator=( const Singleton& c ); // Not Implemented
}; // Singleton
} // namespace pro
#endif // SINGLETON_H
Singleton.cpp
#include "stdafx.h"
#include "Logger.h"
#include "Singleton.h"
//#include "Settings.h"
namespace pro {
struct SingletonInfo {
const std::string strSingletonName;
bool isConstructed;
SingletonInfo( const std::string& strSingletonNameIn ) :
strSingletonName( strSingletonNameIn ),
isConstructed( false ) {}
};
// Order Must Match Types Defined In Singleton::SingletonType enum
static std::array<SingletonInfo, 1> s_aSingletons = { SingletonInfo( "Logger" ) }; /*,
SingletonInfo( "Settings" ) };*/ // Make Sure The Number Of Types Matches The Number In This Array
Singleton::Singleton( SingletonType eType ) :
m_eType( eType ) {
bool bSaveInLog = s_aSingletons.at( TYPE_LOGGER ).isConstructed;
try {
if ( !s_aSingletons.at( eType ).isConstructed ) {
// Test Initialization Order
for ( int i = 0; i < eType; ++i ) {
if ( !s_aSingletons.at( i ).isConstructed ) {
throw ExceptionHandler( s_aSingletons.at( i ).strSingletonName + " must be constructed before constructing " + s_aSingletons.at( eType ).strSingletonName, bSaveInLog );
}
}
s_aSingletons.at( eType ).isConstructed = true;
/*if ( s_aSingletons.at( TYPE_ENGINE ).isConstructed &&
Settings::get()->isDebugLoggingEnabled( Settings::DEBUG_MEMORY ) ) {
logMemoryAllocation( true );
}*/
} else {
throw ExceptionHandler( s_aSingletons.at( eType ).strSingletonName + " can only be constructed once.", bSaveInLog );
}
} catch ( std::exception& ) {
// eType Is Out Of Range
std::ostringstream strStream;
strStream << __FUNCTION__ << " Invalid Singleton Type Specified: " << eType;
throw ExceptionHandler( strStream, bSaveInLog );
}
}
Singleton::~Singleton() {
/*if ( s_aSingletons.at( TYPE_ENGINE ).isConstructed &&
Settings::get()->isDebugLoggingEnabled( Settings::DEBUG_MEMORY ) ) {
logMemoryAllocation( false );
}*/
s_aSingletons.at( m_eType ).isConstructed = false;
}
void Singleton::logMemoryAllocation( bool isAllocated ) const {
if ( isAllocated ) {
Logger::log( "Created " + s_aSingletons.at( m_eType ).strSingletonName );
} else {
Logger::log( "Destroyed " + s_aSingletons.at( m_eType ).strSingletonName );
}
}
} // namespace pro
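For comparison, a common portable way to guarantee a single instance in C++11 is a function-local static, which is constructed on first use and whose initialization the language guarantees to be thread-safe. The sketch below is not this library's design; the class name MessageLog is hypothetical. It gives up what the Singleton base class above provides: explicit construction with arguments (such as the log file name) and checking the construction order between several singletons.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical example class: get() always returns the same object, which is
// created the first time get() is called.
class MessageLog {
public:
    static MessageLog& get() {
        static MessageLog instance;   // constructed once, on first call
        return instance;
    }
    void add( const std::string& msg ) { m_messages.push_back( msg ); }
    std::size_t count() const { return m_messages.size(); }

    MessageLog( const MessageLog& ) = delete;             // no copies
    MessageLog& operator=( const MessageLog& ) = delete;

private:
    MessageLog() = default;                               // only get() can create it
    std::vector<std::string> m_messages;
};
```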
Logger.h
#ifndef LOGGER_H
#define LOGGER_H
#include "Singleton.h"
namespace pro {
class Logger sealed : public Singleton {
public:
// Number Of Items In Enum Type Must Match The Number
// Of Items And Order Of Items Stored In m_aLogTypes
enum LoggerType {
TYPE_INFO = 0,
TYPE_WARNING,
TYPE_ERROR,
TYPE_CONSOLE,
}; // LoggerType
private:
std::string m_strLogFilename;
unsigned m_uMaxCharacterLength;
std::array<std::string, 4> m_aLogTypes;
const std::string m_strUnknownLogType;
HANDLE m_hConsoleOutput;
WORD m_consoleDefaultColor;
public:
explicit Logger( const std::string& strLogFilename );
virtual ~Logger();
static void log( const std::string& strText, LoggerType eLogType = TYPE_INFO );
static void log( const std::ostringstream& strStreamText, LoggerType eLogType = TYPE_INFO );
static void log( const char* szText, LoggerType eLogType = TYPE_INFO );
private:
Logger( const Logger& c ); // Not Implemented
Logger& operator=( const Logger& c ); // Not Implemented
}; // Logger
} // namespace pro
#endif // LOGGER_H
Logger.cpp
#include "stdafx.h"
#include "Logger.h"
#include "BlockThread.h"
#include "TextFileWriter.h"
namespace pro {
static Logger* s_pLogger = nullptr;
static CRITICAL_SECTION s_criticalSection;
// White Text On Red Background
static const WORD WHITE_ON_RED = FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_BLUE | FOREGROUND_INTENSITY | BACKGROUND_RED;
Logger::Logger( const std::string& strLogFilename ) :
Singleton( TYPE_LOGGER ),
m_strLogFilename( strLogFilename ),
m_uMaxCharacterLength( 0 ),
m_strUnknownLogType( "UNKNOWN" ) {
// Order Must Match Types Defined In Logger::LoggerType enum
m_aLogTypes[0] = "Info";
m_aLogTypes[1] = "Warning";
m_aLogTypes[2] = "Error";
m_aLogTypes[3] = ""; // Console
// Find Widest Log Type String
m_uMaxCharacterLength = m_strUnknownLogType.size();
for ( const std::string& strLogType : m_aLogTypes ) { // Standard Range-Based for (Instead Of MSVC "for each")
if ( m_uMaxCharacterLength < strLogType.size() ) {
m_uMaxCharacterLength = strLogType.size();
}
}
InitializeCriticalSection( &s_criticalSection );
BlockThread blockThread( s_criticalSection ); // Enter Critical Section
// Start Log File
TextFileWriter file( m_strLogFilename, false, false );
// Prepare Console
m_hConsoleOutput = GetStdHandle( STD_OUTPUT_HANDLE );
CONSOLE_SCREEN_BUFFER_INFO consoleInfo;
GetConsoleScreenBufferInfo( m_hConsoleOutput, &consoleInfo );
m_consoleDefaultColor = consoleInfo.wAttributes;
s_pLogger = this;
logMemoryAllocation( true );
}
Logger::~Logger() {
logMemoryAllocation( false );
s_pLogger = nullptr;
DeleteCriticalSection( &s_criticalSection );
}
void Logger::log( const std::string& strText, LoggerType eLogType ) {
log( strText.c_str(), eLogType );
}
void Logger::log( const std::ostringstream& strStreamText, LoggerType eLogType ) {
log( strStreamText.str().c_str(), eLogType );
}
void Logger::log( const char* szText, LoggerType eLogType ) {
if ( nullptr == s_pLogger ) {
std::cout << "Logger has not been initialized, can not log " << szText << std::endl;
return;
}
BlockThread blockThread( s_criticalSection ); // Enter Critical Section
std::ostringstream strStream;
// Default White Text On Red Background
WORD textColor = WHITE_ON_RED;
// Choose Log Type Text String, Display "UNKNOWN" If eLogType Is Out Of Range
strStream << std::setfill(' ') << std::setw( s_pLogger->m_uMaxCharacterLength );
try {
if ( TYPE_CONSOLE != eLogType ) {
strStream << s_pLogger->m_aLogTypes.at( eLogType );
}
if ( TYPE_WARNING == eLogType ) {
// Yellow
textColor = FOREGROUND_RED | FOREGROUND_GREEN | FOREGROUND_INTENSITY | BACKGROUND_RED | BACKGROUND_GREEN;
} else if ( TYPE_INFO == eLogType ) {
// Green
textColor = FOREGROUND_GREEN;
} else if ( TYPE_CONSOLE == eLogType ) {
// Cyan
textColor = FOREGROUND_GREEN | FOREGROUND_BLUE;
}
} catch( ... ) {
strStream << s_pLogger->m_strUnknownLogType;
}
// Date And Time
if ( TYPE_CONSOLE != eLogType ) {
SYSTEMTIME time;
GetLocalTime( &time );
strStream << " [" << time.wYear << "."
<< std::setfill('0') << std::setw( 2 ) << time.wMonth << "."
<< std::setfill('0') << std::setw( 2 ) << time.wDay << " "
<< std::setfill(' ') << std::setw( 2 ) << time.wHour << ":"
<< std::setfill('0') << std::setw( 2 ) << time.wMinute << ":"
<< std::setfill('0') << std::setw( 2 ) << time.wSecond << "."
<< std::setfill('0') << std::setw( 3 ) << time.wMilliseconds << "] ";
}
strStream << szText << std::endl;
// Log Message
SetConsoleTextAttribute( s_pLogger->m_hConsoleOutput, textColor );
std::cout << strStream.str();
// Save Message To Log File
try {
TextFileWriter file( s_pLogger->m_strLogFilename, true, false );
file.write( strStream.str() );
} catch( ... ) {
// Not Saved In Log File, Write Message To Console
std::cout << __FUNCTION__ << " failed to write to file: " << strStream.str() << std::endl;
}
// Reset To Default Color
SetConsoleTextAttribute( s_pLogger->m_hConsoleOutput, s_pLogger->m_consoleDefaultColor );
}
} // namespace pro
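Logger::log builds its timestamp with std::setfill and std::setw. Note that std::setw applies only to the next insertion, while std::setfill stays in effect until it is changed, which is why every field gets its own setw. A minimal portable sketch of the same formatting follows, assuming a hypothetical Timestamp struct in place of the Win32 SYSTEMTIME; unlike Logger::log, it zero-pads the hour:

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical portable stand-in for the SYSTEMTIME fields used by Logger.
struct Timestamp {
    int year, month, day, hour, minute, second, millisecond;
};

// Builds "[YYYY.MM.DD HH:MM:SS.mmm] ". setfill('0') is set once and persists;
// setw must be repeated before each field because it resets after one use.
std::string formatTimestamp( const Timestamp& t )
{
    std::ostringstream s;
    s << "[" << t.year << "."
      << std::setfill( '0' ) << std::setw( 2 ) << t.month << "."
      << std::setw( 2 ) << t.day << " "
      << std::setw( 2 ) << t.hour << ":"
      << std::setw( 2 ) << t.minute << ":"
      << std::setw( 2 ) << t.second << "."
      << std::setw( 3 ) << t.millisecond << "] ";
    return s.str();
}
```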
FileHandler.h - Base Class
#ifndef FILE_HANDLER_H
#define FILE_HANDLER_H
namespace pro {
// class AssetStorage; // Not Used Here
class FileHandler {
protected:
// static AssetStorage* m_pAssetStorage; // Not Used Here
std::fstream m_fileStream;
std::string m_strFilePath;
std::string m_strFilenameWithPath;
private:
bool m_bSaveExceptionInLog;
public:
virtual ~FileHandler();
protected:
FileHandler( const std::string& strFilename, bool bSaveExceptionInLog );
void throwError( const std::string& strMessage ) const;
void throwError( const std::ostringstream& strStreamMessage ) const;
bool getString( std::string& str, bool appendPath );
private:
FileHandler( const FileHandler& c ); // Not Implemented
FileHandler& operator=( const FileHandler& c ); // Not Implemented
}; // FileHandler
} // namespace pro
#endif // FILE_HANDLER_H
FileHandler.cpp
#include "stdafx.h"
#include "FileHandler.h"
// #include "AssetStorage.h" // Not Used Here
namespace pro {
// AssetStorage* FileHandler::m_pAssetStorage = nullptr; // Not Used Here
FileHandler::FileHandler( const std::string& strFilename, bool bSaveExceptionInLog ) :
m_bSaveExceptionInLog( bSaveExceptionInLog ),
m_strFilenameWithPath( strFilename ) {
/*if ( bSaveExceptionInLog && nullptr == m_pAssetStorage ) {
m_pAssetStorage = AssetStorage::get();
}*/ // Not Used Here
// Extract Path Info If It Exists
std::string::size_type lastIndex = strFilename.find_last_of( "/\\" );
if ( lastIndex != std::string::npos ) {
m_strFilePath = strFilename.substr( 0, lastIndex );
}
if ( strFilename.empty() ) {
throw ExceptionHandler( __FUNCTION__ + std::string( " missing filename" ), m_bSaveExceptionInLog );
}
}
FileHandler::~FileHandler() {
if ( m_fileStream.is_open() ) {
m_fileStream.close();
}
}
void FileHandler::throwError( const std::string& strMessage ) const {
throw ExceptionHandler( "File [" + m_strFilenameWithPath + "] " + strMessage, m_bSaveExceptionInLog );
}
void FileHandler::throwError( const std::ostringstream& strStreamMessage ) const {
throwError( strStreamMessage.str() );
}
bool FileHandler::getString( std::string& str, bool appendPath ) {
m_fileStream.read( &str[0], str.size() );
if ( m_fileStream.fail() ) {
return false;
}
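// Trim Right At The First Null Character, If There Is One
// (erase( npos ) Would Throw std::out_of_range)
std::string::size_type nullPos = str.find( char( 0 ) );
if ( nullPos != std::string::npos ) {
str.erase( nullPos );
}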
if ( appendPath && !m_strFilePath.empty() ) {
// Add Path If One Exists
str = m_strFilePath + "/" + str;
}
return true;
}
} // namespace pro
Now for the two inherited classes that handle file streams, which you have been waiting for. These two are strictly for text; others within my project include TextureFiles, ModelObjectFiles, etc. I will show only TextFileReader and TextFileWriter.
TextFileReader.h
#ifndef TEXT_FILE_READER_H
#define TEXT_FILE_READER_H
#include "FileHandler.h"
namespace pro {
class TextFileReader : public FileHandler {
private:
public:
explicit TextFileReader( const std::string& strFilename );
// virtual ~ TextFileReader(); // Default Okay
std::string readAll() const;
bool readLine( std::string& strLine );
private:
TextFileReader( const TextFileReader& c ); // Not Implemented
TextFileReader& operator=( const TextFileReader& c ); // Not Implemented
}; // TextFileReader
} // namespace pro
#endif // TEXT_FILE_READER_H
TextFileReader.cpp
#include "stdafx.h"
#include "TextFileReader.h"
namespace pro {
TextFileReader::TextFileReader( const std::string& strFilename ) :
FileHandler( strFilename, true ) {
m_fileStream.open( m_strFilenameWithPath.c_str(), std::ios_base::in );
if ( !m_fileStream.is_open() ) {
throwError( __FUNCTION__ + std::string( " can not open file for reading" ) );
}
} // TextFileReader
std::string TextFileReader::readAll() const {
std::ostringstream strStream;
strStream << m_fileStream.rdbuf();
return strStream.str();
}
bool TextFileReader::readLine( std::string& strLine ) {
// getline Fails (And We Return false) Once There Is No More Text To Read;
// Checking eof() First Would Report One Extra, Empty Line
return !std::getline( m_fileStream, strLine ).fail();
}
} // namespace pro
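A common pitfall when reading a file line by line is testing eof() before calling getline. The eofbit is only set after a read has tried to go past the end, so for input that ends with a newline, the last getline fails and one extra, empty "line" gets stored. The sketch below contrasts the two loop styles on an in-memory stream; the function names are hypothetical:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Pattern 1: check eof() BEFORE reading. The final getline extracts nothing,
// fails, and leaves `line` empty, yet it is still pushed into the vector.
std::vector<std::string> readCheckingEofFirst( std::istream& in )
{
    std::vector<std::string> lines;
    std::string line;
    while ( !in.eof() ) {
        std::getline( in, line );
        lines.push_back( line );
    }
    return lines;
}

// Pattern 2: use the result of getline itself as the loop condition.
std::vector<std::string> readWithGetlineCondition( std::istream& in )
{
    std::vector<std::string> lines;
    std::string line;
    while ( std::getline( in, line ) ) {
        lines.push_back( line );
    }
    return lines;
}
```

For the input "first\nsecond\n", the first function returns three entries (the last one empty) and the second returns exactly two.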
TextFileWriter.h
#ifndef TEXT_FILE_WRITER_H
#define TEXT_FILE_WRITER_H
#include "FileHandler.h"
namespace pro {
class TextFileWriter : public FileHandler {
private:
public:
TextFileWriter( const std::string& strFilename, bool bAppendToFile, bool bSaveExceptionInLog = true );
void write( const std::string& str );
private:
TextFileWriter( const TextFileWriter& c ); // Not Implemented
TextFileWriter& operator=( const TextFileWriter& c ); // Not Implemented
}; // TextFileWriter
} // namespace pro
#endif // TEXT_FILE_WRITER_H
TextFileWriter.cpp
#include "stdafx.h"
#include "TextFileWriter.h"
namespace pro {
TextFileWriter::TextFileWriter( const std::string& strFilename, bool bAppendToFile, bool bSaveExceptionInLog ) :
FileHandler( strFilename, bSaveExceptionInLog ) {
m_fileStream.open( m_strFilenameWithPath.c_str(),
std::ios_base::out | ( bAppendToFile ? std::ios_base::app : std::ios_base::trunc ) );
if ( !m_fileStream.is_open() ) {
throwError( __FUNCTION__ + std::string( " can not open file for writing" ) );
}
}
void TextFileWriter::write( const std::string& str ) {
m_fileStream << str;
}
} // namespace pro
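The only difference between the two ways TextFileWriter can open a file is the open mode: std::ios_base::app keeps the existing content and writes at the end, while std::ios_base::trunc discards it first. Logger relies on this, truncating the log once in its constructor and appending on every log call. Here is a portable sketch with plain std::ofstream; the function names and the file name used below are illustrative only:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// Open `path` the way TextFileWriter does: append mode keeps existing content,
// truncate mode empties the file before writing.
void writeText( const std::string& path, const std::string& text, bool append )
{
    std::ofstream out( path, std::ios_base::out |
                             ( append ? std::ios_base::app : std::ios_base::trunc ) );
    out << text;
}   // the ofstream destructor flushes and closes the file

// Read the whole file into a string.
std::string readText( const std::string& path )
{
    std::ifstream in( path );
    return std::string( std::istreambuf_iterator<char>( in ),
                        std::istreambuf_iterator<char>() );
}
```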
Now for a sample of this in action. The Logger class above already uses TextFileWriter.
main.cpp
#include "stdafx.h"
#include "Logger.h"
#include "TextFileReader.h"
#include "TextFileWriter.h"
int _tmain( int iNumArguments, _TCHAR* pArgumentText[] ) {
using namespace pro;
try {
// This Is Using The TextFileWriter & Console Output
// Logger::TYPE_INFO is by default!
Logger logger( "logger.txt" );
logger.log( "Some Info" );
logger.log( "Error!", Logger::TYPE_ERROR );
logger.log( "Warning!", Logger::TYPE_WARNING );
TextFileReader textReaderSingle( "logger.txt" );
TextFileReader textReaderAll( "logger.txt" );
std::string strTextSingle;
std::string strTextAll;
textReaderSingle.readLine( strTextSingle );
std::cout << "Read Single Line: " << std::endl << strTextSingle << std::endl << std::endl;
strTextAll = textReaderAll.readAll();
std::cout << "Read All: " << std::endl << strTextAll << std::endl;
//Check The logger.txt that was generated in your projects folder!
std::cout << "Press any key to quit" << std::endl;
_getch();
} catch ( ExceptionHandler& e ) {
std::cout << "Exception Thrown: " << e.getMessage() << std::endl;
std::cout << "Press any key to quit" << std::endl;
_getch();
return RETURN_ERROR;
} catch( ... ) {
std::cout << __FUNCTION__ << " Caught Unknown Exception" << std::endl;
std::cout << "Press any key to quit" << std::endl;
_getch();
return RETURN_ERROR;
}
return RETURN_OK;
}
A majority of this work is credited to Marek A. Krzeminski, MASc, at www.MarekKnows.com. In essence, all of these class objects are his; the only major difference is that I used my own namespace, pro, instead of his. Both main functions are my own work: the first stands alone, and the second uses his library code.
This is a project that has been in the works for a few years now, and most of my advanced knowledge of the C++ language comes from following his video tutorials. The current project is a fairly large-scale, professional-quality game engine using shaders in OpenGL. All of it has been typed and debugged by hand while following along with his tutorials.
As a major note: I also hand-typed most of the code here, so if it does not compile, that may be due to typographical errors; the source itself is a working application. What you see here is a very small percentage of his work! I am happy to take credit for answering this question with the knowledge I have accumulated, but I cannot claim this code as my own work, in order to protect Marek and his copyrighted materials.
With this kind of setup, it is quite easy to create your own file parsers for different types of files. As I stated above, there are two other classes inherited from FileHandler that I did not show. If you would like to see more of this project, please visit www.MarekKnows.com and join the community.
I am going through and comparing a bunch of DNA sequences to find whether any of them is a subset (a substring) of another. I remove those that are contained in another sequence.
I'm using a linked list, and I keep getting a segmentation fault somewhere around the output of the data back to the output file.
I'd also greatly appreciate feedback on the overall code structure. I know it's rather messy, so I figured someone could point out some things that should be improved.
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
using namespace std;
/*
* Step 1. Load all sequences and their metadata into structures.
*
* Step 2. Start n^2 operation to compare sequences.
*
* Step 3. Output file back to a different fasta file.
*/
typedef struct sequence_structure sequence_structure;
struct sequence_structure
{
char *sequence;
char *id;
char *header;
sequence_structure *next_sequence_structure;
sequence_structure *previous_sequence_structure;
int length;
};
int main(int argc, char *argv[])
{
FILE *input_file;
ofstream output_file;
/* this is the TAIL of the linked list. This is a reversed linked list. */
sequence_structure *sequences;
int first_sequence = 0;
char *line = (char*) malloc( sizeof( char ) * 1024 );
if( argc != 3 )
{
printf("This program requires a input file and output file as its argument!\n");
return 0;
}
else
{
/* let's read the input file. */
input_file = fopen( argv[1], "r" );
}
while( !feof(input_file) )
{
string string_line;
fgets( line, 2048, input_file );
string_line = line;
if( string_line.length() <= 2 )
break;
if( string_line.at( 0 ) == '>' )
{
sequence_structure *new_sequence = (sequence_structure *) malloc( sizeof( sequence_structure ) );
new_sequence->id = (char *) malloc( sizeof( char ) * ( 14 + 1 ) );
string_line.copy( new_sequence->id, 14, 1 );
(new_sequence->id)[14] = '\0';
stringstream ss ( string_line.substr( 23, 4 ) );
ss >> new_sequence->length;
new_sequence->header = (char *) malloc( sizeof(char) * ( string_line.length() + 1 ) );
string_line.copy( new_sequence->header, string_line.length(), 0 );
(new_sequence->header)[string_line.length()] = '\0';
fgets( line, 2048, input_file );
string_line = line;
new_sequence->sequence = (char *) malloc( sizeof(char) * ( string_line.length() + 1 ) );
string_line.copy( new_sequence->sequence, string_line.length(), 0 );
(new_sequence->sequence)[string_line.length()] = '\0';
if( first_sequence == 0 )
{
sequences = new_sequence;
sequences->previous_sequence_structure = NULL;
first_sequence = 1;
}
else
{
sequences->next_sequence_structure = new_sequence;
new_sequence->previous_sequence_structure = sequences;
sequences = new_sequence;
}
}
else
{
cout << "Error: input file reading error." << endl;
}
}
fclose( input_file );
free( line );
sequence_structure *outer_sequence_node = sequences;
while( outer_sequence_node != NULL )
{
sequence_structure *inner_sequence_node = sequences;
string outer_sequence ( outer_sequence_node->sequence );
while( inner_sequence_node != NULL )
{
string inner_sequence ( inner_sequence_node->sequence );
if( outer_sequence_node->length > inner_sequence_node->length )
{
if( outer_sequence.find( inner_sequence ) != std::string::npos )
{
cout << "Deleting the sequence with id: " << inner_sequence_node->id << endl;
cout << inner_sequence_node->sequence << endl;
cout << "Found within the sequence with id: " << outer_sequence_node->id << endl;
cout << outer_sequence_node->sequence << endl;
sequence_structure *previous_sequence = inner_sequence_node->previous_sequence_structure;
sequence_structure *next_sequence = inner_sequence_node->next_sequence_structure;
free( inner_sequence_node->id );
free( inner_sequence_node->sequence );
free( inner_sequence_node->header );
if( next_sequence != NULL )
next_sequence->previous_sequence_structure = previous_sequence;
if( previous_sequence != NULL )
{
inner_sequence_node = previous_sequence;
free( previous_sequence->next_sequence_structure );
previous_sequence->next_sequence_structure = next_sequence;
}
}
}
inner_sequence_node = inner_sequence_node->previous_sequence_structure;
}
outer_sequence_node = outer_sequence_node->previous_sequence_structure;
}
output_file.open( argv[2], ios::out );
while( sequences->previous_sequence_structure != NULL )
{
sequences = sequences->previous_sequence_structure;
}
sequence_structure *current_sequence = sequences;
while( current_sequence->next_sequence_structure != NULL )
{
output_file << current_sequence->header;
output_file << current_sequence->sequence;
current_sequence = current_sequence->next_sequence_structure;
}
output_file << current_sequence->header;
output_file << current_sequence->sequence;
output_file.close();
while( sequences != NULL )
{
cout << "Freeing sequence with this id: " << sequences->id << endl;
free( sequences->id );
free( sequences->header );
free( sequences->sequence );
if( sequences->next_sequence_structure != NULL )
{
sequences = sequences->next_sequence_structure;
free( sequences->previous_sequence_structure );
}
else
{
sequences = NULL;
}
}
return 0;
}
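One way to avoid the manual malloc/free and pointer bookkeeping is to let std::vector and std::string own the data. With the linked list, details such as remembering to set next_sequence_structure on the last node are easy to miss, since malloc does not zero memory. This is a minimal sketch, assuming each record is exactly one header line starting with '>' followed by one sequence line, as the reading loop above expects. The names SequenceRecord, readRecords and removeContained are hypothetical, and lengths come from the sequence strings rather than from the header:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type: one '>' header line plus one sequence line.
struct SequenceRecord {
    std::string header;
    std::string sequence;
};

// Read header/sequence pairs with std::getline, so no fixed-size buffer is needed.
std::vector<SequenceRecord> readRecords( std::istream& in )
{
    std::vector<SequenceRecord> records;
    std::string header, sequence;
    while ( std::getline( in, header ) ) {
        if ( header.empty() || header[0] != '>' )
            continue;                  // skip blank or unexpected lines
        if ( !std::getline( in, sequence ) )
            break;                     // header with no sequence line
        records.push_back( SequenceRecord{ header, sequence } );
    }
    return records;
}

// Keep only records whose sequence is not found inside a strictly longer one
// (equal-length duplicates are both kept, as in the original length check).
// Still O(n^2) comparisons, but with no pointer unlinking or free() calls.
std::vector<SequenceRecord> removeContained( const std::vector<SequenceRecord>& records )
{
    std::vector<SequenceRecord> kept;
    for ( std::size_t i = 0; i < records.size(); ++i ) {
        bool contained = false;
        for ( std::size_t j = 0; j < records.size() && !contained; ++j ) {
            contained = i != j
                && records[j].sequence.size() > records[i].sequence.size()
                && records[j].sequence.find( records[i].sequence ) != std::string::npos;
        }
        if ( !contained )
            kept.push_back( records[i] );
    }
    return kept;
}
```

Writing the result back out is then a plain loop: `out << r.header << '\n' << r.sequence << '\n';` for each kept record.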