With this snippet I am trying to remove a certain file from a directory. Here is the code for that.
/* char* cpathToDeleteGND;
char* cpathToDeleteFST;
char* cpathToDeleteSND;
*/
cout << "Enter the name to be removed : ";
cin.ignore();
getline( cin , fullName );
string pathToDeleteGND = "d:/HostelManager/studentDetails/groundFloor/" + fullName + ".txt";
string pathToDeleteFST = "d:/HostelManager/studentDetails/firstFloor/" + fullName + ".txt";
string pathToDeleteSND = "d:/HostelManager/studentDetails/secondFloor/" + fullName + ".txt";
ifstream checkToDeleteGND( pathToDeleteGND );
ifstream checkToDeleteFST( pathToDeleteFST );
ifstream checkToDeleteSND( pathToDeleteSND );
cpathToDeleteGND = new char[ pathToDeleteGND.size() + 1 ];
cpathToDeleteFST = new char[ pathToDeleteFST.size() + 1 ];
cpathToDeleteSND = new char[ pathToDeleteSND.size() + 1 ];
strcpy( cpathToDeleteGND , pathToDeleteGND.c_str() );
strcpy( cpathToDeleteFST , pathToDeleteFST.c_str() );
strcpy( cpathToDeleteSND , pathToDeleteSND.c_str() );
if( checkToDeleteGND ) {
    if( remove( cpathToDeleteGND ) == 0 ) {
        cout << "\nStudent details cleared successfully !";
    }
} else if( checkToDeleteFST ) {
    if( remove( cpathToDeleteFST ) == 0 ) {
        cout << "\nStudent details cleared successfully ! ";
    }
} else if( checkToDeleteSND ) {
    if( remove( cpathToDeleteSND ) == 0 ) {
        cout << "\nStudent details cleared successfully !";
    }
} else {
    cout << "\nIt seems that either the student has already been removed or does not exist.";
}
I enter the name of the file that should be removed from the directory. The if/else blocks work, but the remove function does not, and I can't understand the reason.
For example, the output goes like this:
Enter the name to be removed : diana
Press any key to continue . . .
The file diana.txt existed, which is why the last else block didn't execute. But the remove function does not work. Why is that?
You don't know because you only print a message if remove succeeds. Try:
if (remove(pathToDeleteFST.c_str()) == 0) {
    // success, print something
} else {
    // failure, much more interesting
    cout << "Can't remove " << pathToDeleteFST << ": "
         << strerror(errno) << endl;
}
errno is declared in <cerrno> (errno.h), strerror in <cstring> (string.h).
(Instead of opening the file to check whether it exists, you could also charge ahead and try to remove it. As @n.m. notes, that may even be necessary on Windows.)
You open the file before deleting it. Windows won't delete files which are open by someone. Do not check for existence of a file by opening it, use stat or just call remove without checking.
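To illustrate that last suggestion, here is a minimal sketch (not from the original post) of calling remove directly and using errno to tell "file not there" apart from "file exists but could not be deleted", for example because it is still open. The helper name tryRemove is purely illustrative.

#include <cerrno>
#include <cstdio>
#include <cstring>
#include <iostream>
#include <string>

bool tryRemove( const std::string& path )
{
    if( std::remove( path.c_str() ) == 0 )
        return true;                      // deleted
    if( errno != ENOENT )                 // exists, but could not be deleted
        std::cerr << "Can't remove " << path << ": " << std::strerror( errno ) << '\n';
    return false;
}

Called on each of the three candidate paths in turn, this makes the ifstream existence checks, and the manual char* copies, unnecessary.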
This is a very old question but may still be relevant for people who come across this looking for a solution.
The reason this particular example can't delete the files is that they are still held open by the ifstream variables used to check for them.
Before you can remove any files, you must first .close() them.
If you are having trouble with deleting files that you've previously loaded via fstream, ensure that you have properly closed them before trying to use the remove function.
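Applied to the snippet from the question, the fix is roughly this (a sketch only, reusing the question's variable names; the same change applies to the other two branches, and remove() can take the string's c_str() directly, so the manual char* copies are not needed):

if( checkToDeleteGND ) {
    checkToDeleteGND.close();                        // release the handle first
    if( remove( pathToDeleteGND.c_str() ) == 0 ) {   // now Windows will allow the delete
        cout << "\nStudent details cleared successfully !";
    }
}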
Hope this helps.
There are a couple of issues with your code, but the probable reason
remove fails is that the file is open; Windows will not delete an open
file. You should probably refactor a lot of this into separate
functions. If you used a function like the following, for example:
bool
fileExists( std::string const& filename )
{
    return std::ifstream( filename.c_str() ).good();
}
, the problem wouldn't occur. There are other reasons why the open
might fail, but this is a good rough first approximation. (There are
better ways of testing the existence of a file, but they are system
dependent.)
So far, I've tried (without success):
QJsonDocument – "document too large" (looks like the max size is artificially capped at 1 << 27 bytes)
Boost.PropertyTree – takes up 30 GB RAM and then segfaults
libjson – takes up a few gigs of RAM and then segfaults
I'm gonna try yajl next, but Json.NET handles this without any issues so I'm not sure why it should be such a big problem in C++.
Check out https://github.com/YasserAsmi/jvar. I have tested it with a large database (SF street data or something, which was around 2GB). It was quite fast.
Well, I'm not proud of my solution, but I ended up using some regex to split my data up into top-level key-value pairs (each one being only a few MB), then just parsed each one of those pairs with Qt's JSON parser and passed them into my original code.
Yajl would have been exactly what I needed for something like this, but I went with the ugly regex hack because:
1. Fitting my logic into Yajl's callback structure would have involved rewriting enough of my code to be a pain, and this is just for a one-off MapReduce job so the code itself doesn't matter long-term anyway.
2. The data set is controlled by me and guaranteed to always work with my regex.
3. For various reasons, adding dependencies to Elastic MapReduce deployments is a bigger hassle than it should be (and static Qt compilation is buggy), so for the sake of not doing more work than necessary I'm inclined to keep dependencies to a minimum.
4. This still works and performs well (both time-wise and memory-wise).
Note that the regex I used happens to work for my data specifically because the top-level keys (and only the top level keys) are integers; my code below is not a general solution, and I wouldn't ever advise a similar approach over a SAX-style parser where reasons #1 and #2 above don't apply.
Also note that this solution is extra gross (splitting and manipulating JSON strings before parsing + special cases for the start and end of the data) because my original expression that captured the entire key-value pairs broke down when one of the pairs happened to exceed PCRE's backtracking limit (it's incredibly annoying in this case that that's even a thing, especially since it's not configurable through either QRegularExpression or grep).
Anyway, here's the code; I am deeply ashamed:
QFile file( argv[1] );
file.open( QIODevice::ReadOnly );
QTextStream textStream( &file );
QString jsonKey;
QString jsonString;
QRegularExpression jsonRegex( "\"-?\\d+\":" );
bool atEnd = false;
while( atEnd == false )
{
    QString regexMatch = jsonRegex.match
    (
        jsonString.append( textStream.read(1000000) )
    ).captured();
    bool isRegexMatched = regexMatch.isEmpty() == false;
    if( isRegexMatched == false )
    {
        atEnd = textStream.atEnd();
    }
    if( atEnd || (jsonKey.isEmpty() == false && isRegexMatched) )
    {
        QString jsonObjectString;
        if( atEnd == false )
        {
            QStringList regexMatchSplit = jsonString.split( regexMatch );
            jsonObjectString = regexMatchSplit[0]
                .prepend( jsonKey )
                .prepend( LEFT_BRACE )
                ;
            jsonObjectString = jsonObjectString
                .left( jsonObjectString.size() - 1 )
                .append( RIGHT_BRACE )
                ;
            jsonKey = regexMatch;
            jsonString = regexMatchSplit[1];
        }
        else
        {
            jsonObjectString = jsonString
                .prepend( jsonKey )
                .prepend( LEFT_BRACE )
                ;
        }
        QJsonObject jsonObject = QJsonDocument::fromJson
        (
            jsonObjectString.toUtf8()
        ).object();
        QString key = jsonObject.keys()[0];
        ... process data and store in boost::interprocess::map ...
    }
    else if( isRegexMatched )
    {
        jsonKey = regexMatch;
        jsonString = jsonString.split( regexMatch )[1];
    }
}
I've recently finished (probably still a bit beta) such a library:
https://github.com/matiu2/json--11
If you use the json_class .. it'll load it all into memory, which is probably not what you want.
But you can parse it sequentially by writing your own 'mapper'.
The included mapper iterates through the JSON, mapping the input to JSON classes:
https://github.com/matiu2/json--11/blob/master/src/mapper.hpp
You could write your own that does whatever you want with the data, and feed a file stream into it, so as not to load the whole lot into memory.
So as an example to get you started, this just outputs the JSON data in some arbitrary format, but doesn't fill up memory (completely untested and not compiled):
#include "parser.hpp"
#include <fstream>
#include <iterator>
#include <string>
int main(int argc, char **) {
std::ifstream file("hugeJSONFile.hpp");
std::istream_iterator<char> input(file);
auto parser = json::Parser(input);
using Parser = decltype(parser);
using std::cout;
using std::endl;
switch (parser.getNextType()) {
case Parser::null:
parser.readNull();
cout << "NULL" << endl;
return;
case Parser::boolean:
bool val = parser.readBoolean();
cout << "Bool: " << val << endl;
case Parser::array:
parser.consumeOneValue();
cout << "Array: ..." << endl;
case Parser::object:
parser.consumeOneValue();
cout << "Map: ..." << endl;
case Parser::number: {
double val = parser.readNumber<double>();
cout << "number: " << val << endl;
}
case Parser::string: {
std::string val = parser.readString();
cout << "string: " << val << endl;
}
case Parser::HIT_END:
case Parser::ERROR:
default:
// Should never get here
throw std::logic_error("Unexpected error while parsing JSON");
}
return 0;
}
Addendum
Originally I had planned for this library never to copy any data, e.g. reading a string would just give you start and end iterators into the string data in the input, but because we actually need to decode the strings, I found that approach too impractical.
This library automatically converts \u0000 codes in JSON to utf8 encoding in standard strings.
When dealing with records you can, for example, format your JSON so that a newline separates the objects, then parse each line separately (a sketch follows the layout examples below), e.g.:
"records": [
{ "someprop": "value", "someobj": { ..... } ... },
.
.
.
or:
"myobj": {
"someprop": { "someobj": {}, ... },
.
.
.
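For instance, if each record is written as one complete JSON object per line, a rough sketch of the reading side could look like this (using Qt, as elsewhere in this thread; "records.ndjson" is a made-up file name and the snippet is untested):

#include <QFile>
#include <QJsonDocument>
#include <QJsonObject>

QFile file( "records.ndjson" );
if( file.open( QIODevice::ReadOnly | QIODevice::Text ) )
{
    while( !file.atEnd() )
    {
        QByteArray line = file.readLine();                          // one record per line
        QJsonObject record = QJsonDocument::fromJson( line ).object();
        // ... process one record; memory use stays bounded by the longest line ...
    }
}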
I just faced the same problem with Qt's 5.12 JSON support. Fortunately starting with Qt 5.15 (64 Bit) reading of large JSON files (I tested 1GB files) works flawlessly.
I have a file called "sequence_30.dat" that contains a sequence of 1 and -1 in a vertical representation (i.e., each 1 or -1 is on a separate line). I am trying to read the file for another operation using the following code:
int length = 31;
QFile file("sequence_" + (static_cast<QString>(length)) + ".dat");
if(file.exists()){
    file.open(QIODevice::ReadOnly);
    if(file.isOpen()){
        ....
        ....
    }
    file.close();
}
but when debugging, the program skips the "if(file.exists())" block, and when that check is removed it skips the "if(file.isOpen())" block as well.
I am fairly sure the path is correct, but if it is not, how can I check which path I am actually reading from? And if the path is correct, why is my file not opening?
static_cast<QString>(length)
Should be:
QString::number( length )
You can check it by just printing it out to the console:
cout << qPrintable( QString( "sequence_" ) +
QString::number( length ) + ".dat" ) << endl;
static_cast doesn't work that way, so instead of a static_cast, you should use QString::number to convert an int into a QString.
I've looked around a bit and have found no definitive answer on how to read a specific line of text from a file in C++. I have a text file with over 100,000 English words, each on its own line. I can't use arrays because they obviously won't hold that much data, and vectors take too long to store every word. How can I achieve this?
P.S. I found no duplicates of this question regarding C++
while (getline(words_file, word))
{
    my_vect.push_back(word);
}
EDIT:
A commenter below has helped me realize that the only reason loading the file into a vector was taking so long was because I was debugging. Simply running the .exe loads the file nearly instantaneously. Thanks for everyone's help.
If your words have no white-space (I assume they don't), you can use a more tricky non-getline solution using a deque!
#include <algorithm>
#include <deque>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

using namespace std;

int main() {
    deque<string> dictionary;

    cout << "Loading file..." << endl;
    ifstream myfile ("dict.txt");
    if ( myfile.is_open() ) {
        copy(istream_iterator<string>(myfile),
             istream_iterator<string>(),
             back_inserter(dictionary));
        myfile.close();
    } else {
        cout << "Unable to open file." << endl;
    }

    return 0;
}
The above reads the file token by token with istream_iterator<string>, splitting on whitespace (the std::istream default - a big assumption on my part), which makes it slightly faster. This gets done in about 2-3 seconds with 100,000 words. I'm also using a deque, which is the best data structure (imo) for this particular scenario. When I use vectors, it takes around 20 seconds (not even close to your minute mark -- you must be doing something else that increases complexity).
To access the word at line 1:
cout << dictionary[0] << endl;
Hope this has been useful.
You have a few options, but none will automatically let you go to a specific line. File systems don't track line numbers within files.
One way is to have fixed-width lines in the file. Then read the appropriate amount of data based upon the line number you want and the number of bytes per line.
Another way is to loop, reading lines one at a time until you get to the line that you want.
A third way would be to have a sort of index that you create at the beginning of the file to reference the location of each line. This, of course, would require that you control the file format.
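As an illustration of the first (fixed-width) option, here is a minimal sketch, assuming every line has been padded to exactly recordLen bytes including its newline; the function name and parameters are illustrative only:

#include <cstddef>
#include <fstream>
#include <string>

std::string readLineFixedWidth( const char* filename, std::size_t lineNumber,
                                std::size_t recordLen )
{
    std::ifstream in( filename, std::ios::binary );
    in.seekg( static_cast<std::streamoff>( lineNumber ) * recordLen );  // jump straight to line n
    std::string line( recordLen, '\0' );
    in.read( &line[0], recordLen );
    return line;                      // caller trims trailing padding/newline
}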
I already mentioned this in a comment, but I wanted to give it a bit more visibility for anyone else who runs into this issue...
I think that the following code will take a long time to read from the file because std::vector probably has to re-allocate its internal memory several times to account for all of the elements that you are adding. This is an implementation detail, but if I understand correctly std::vector usually starts out small and grows its memory as necessary to accommodate new elements. This works fine when you're adding a handful of elements at a time, but is really inefficient when you're adding a hundred thousand of them one at a time.
while (getline(words_file, word)) {
    my_vect.push_back(word);
}
So, before running the loop above, call my_vect.reserve(100000) to set aside enough memory in advance so that std::vector doesn't need to shuffle things around later.
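A minimal sketch of that, assuming words_file is the already-open std::ifstream from the question and 100000 is roughly the expected word count:

std::vector<std::string> my_vect;
my_vect.reserve( 100000 );          // allocate once, up front
std::string word;
while ( getline( words_file, word ) )
    my_vect.push_back( word );      // no reallocations until capacity is exceeded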
The question is exceedingly unclear. How do you determine the specific
line? If it is the nth line, simplest solution is just to call
getline n times, throwing out all but the last results; calling
ignore n-1 times might be slightly faster, but I suspect that if
you're always reading into the same string (rather than constructing a
new one each time), the difference in time won't be enormous. If you
have some other criteria, and the file is really big (which from your
description it isn't) and sorted, you might try using a binary search,
seeking to the middle of the file, reading enough ahead to find the
start of the next line, then deciding the next step according to its
value. (I've used this to find relevant entries in log files. But
we're talking about files which are several Gigabytes in size.)
If you're willing to use system dependent code, it might be advantageous
to memory map the file, then search for the nth occurrence of a '\n'
(std::find n times).
ADDED: Just some quick benchmarks. On my Linux box, getting the
100000th word from /usr/share/dict/words (479623 words, one per line,
on my machine), takes about
- 272 milliseconds, reading all words into an std::vector, then indexing,
- 256 milliseconds doing the same, but with std::deque,
- 30 milliseconds using getline, but just ignoring the results until the one I'm interested in,
- 20 milliseconds using istream::ignore, and
- 6 milliseconds using mmap and looping on std::find.
FWIW, the code in each case is:
For the std:: containers:
template<typename Container>
void Using<Container>::operator()()
{
    std::ifstream input( m_filename.c_str() );
    if ( !input )
        Gabi::ProgramManagement::fatal() << "Could not open " << m_filename;
    Container().swap( m_words );
    std::copy( std::istream_iterator<Line>( input ),
               std::istream_iterator<Line>(),
               std::back_inserter( m_words ) );
    if ( static_cast<int>( m_words.size() ) < m_target )
        Gabi::ProgramManagement::fatal()
            << "Not enough words, had " << m_words.size()
            << ", wanted at least " << m_target;
    m_result = m_words[ m_target ];
}
For getline without saving:
void UsingReadAndIgnore::operator()()
{
    std::ifstream input( m_filename.c_str() );
    if ( !input )
        Gabi::ProgramManagement::fatal() << "Could not open " << m_filename;
    std::string dummy;
    for ( int count = m_target; count > 0; -- count )
        std::getline( input, dummy );
    std::getline( input, m_result );
}
For ignore:
void UsingIgnore::operator()()
{
    std::ifstream input( m_filename.c_str() );
    if ( !input )
        Gabi::ProgramManagement::fatal() << "Could not open " << m_filename;
    for ( int count = m_target; count > 0; -- count )
        input.ignore( INT_MAX, '\n' );
    std::getline( input, m_result );
}
And for mmap:
void UsingMMap::operator()()
{
    int input = ::open( m_filename.c_str(), O_RDONLY );
    if ( input < 0 )
        Gabi::ProgramManagement::fatal() << "Could not open " << m_filename;
    struct ::stat infos;
    if ( ::fstat( input, &infos ) != 0 )
        Gabi::ProgramManagement::fatal() << "Could not stat " << m_filename;
    char* base = (char*)::mmap( NULL, infos.st_size, PROT_READ, MAP_PRIVATE, input, 0 );
    if ( base == MAP_FAILED )
        Gabi::ProgramManagement::fatal() << "Could not mmap " << m_filename;
    char const* end = base + infos.st_size;
    char const* curr = base;
    char const* next = std::find( curr, end, '\n' );
    for ( int count = m_target; count > 0 && curr != end; -- count ) {
        curr = next + 1;
        next = std::find( curr, end, '\n' );
    }
    m_result = std::string( curr, next );
    ::munmap( base, infos.st_size );
}
In each case, the code is run
You could seek to a specific position, but that requires that you know where the line starts. "A little less than a minute" for 100,000 words does sound slow to me.
Read some data, count the newlines, throw away that data and read some more, count the newlines again... and repeat until you've read enough newlines to hit your target.
Also, as others have suggested, this is not a particularly efficient way of accessing data. You'd be well-served by making an index.
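A rough sketch of such an index (illustrative names, untested): one pass records the byte offset at which each line starts, after which any line can be fetched with a single seek.

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

std::vector<std::streampos> buildLineIndex( std::istream& in )
{
    std::vector<std::streampos> offsets;
    offsets.push_back( in.tellg() );          // line 0 starts here
    std::string line;
    while ( std::getline( in, line ) )
        offsets.push_back( in.tellg() );      // position right after each newline
    return offsets;                           // offsets[n] is where line n starts
}

std::string lineAt( std::istream& in, const std::vector<std::streampos>& offsets, std::size_t n )
{
    in.clear();                               // clear eofbit left by the indexing pass
    in.seekg( offsets[n] );
    std::string line;
    std::getline( in, line );
    return line;
}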
Yesterday I discovered an odd bug in rather simple code that basically gets text from an ifstream and tokenizes it. The code that actually fails does a number of get()/peek() calls looking for the token "/*". If the token is found in the stream, unget() is called so the next method sees the stream starting with the token.
Sometimes, seemingly depending only on the length of the file, the unget() call fails. Internally it calls pbackfail() which then returns EOF. However after clearing the stream state, I can happily read more characters so it's not exactly EOF..
After digging in, here's the full code that easily reproduces the problem:
#include <iostream>
#include <fstream>
#include <string>
//generate simplest string possible that triggers problem
void GenerateTestString( std::string& s, const size_t nSpacesToInsert )
{
    s.clear();
    for( size_t i = 0 ; i < nSpacesToInsert ; ++i )
        s += " ";
    s += "/*";
}

//write string to file, then open same file again in ifs
bool WriteTestFileThenOpenIt( const char* sFile, const std::string& s, std::ifstream& ifs )
{
    {
        std::ofstream ofs( sFile );
        if( ( ofs << s ).fail() )
            return false;
    }
    ifs.open( sFile );
    return ifs.good();
}

//find token, unget if found, report error, show extra data can be read even after error
bool Run( std::istream& ifs )
{
    bool bSuccess = true;
    for( ; ; )
    {
        int x = ifs.get();
        if( ifs.fail() )
            break;
        if( x == '/' )
        {
            x = ifs.peek();
            if( x == '*' )
            {
                ifs.unget();
                if( ifs.fail() )
                {
                    std::cout << "oops.. unget() failed" << std::endl;
                    bSuccess = false;
                }
                else
                {
                    x = ifs.get();
                }
            }
        }
    }
    if( !bSuccess )
    {
        ifs.clear();
        std::string sNext;
        ifs >> sNext;
        if( !sNext.empty() )
            std::cout << "remaining data after unget: '" << sNext << "'" << std::endl;
    }
    return bSuccess;
}

int main()
{
    std::string s;
    const char* testFile = "tmp.txt";
    for( size_t i = 0 ; i < 12290 ; ++i )
    {
        GenerateTestString( s, i );
        std::ifstream ifs;
        if( !WriteTestFileThenOpenIt( testFile, s, ifs ) )
        {
            std::cout << "file I/O error, aborting..";
            break;
        }
        if( !Run( ifs ) )
            std::cout << "** failed for string length = " << s.length() << std::endl;
    }
    return 0;
}
The program fails when the string length gets near the typical power-of-two buffer sizes 4096, 8192, 12288; here's the output:
oops.. unget() failed
remaining data after unget: '*'
** failed for string length = 4097
oops.. unget() failed
remaining data after unget: '*'
** failed for string length = 8193
oops.. unget() failed
remaining data after unget: '*'
** failed for string length = 12289
This happens when tested on Windows XP and 7, both compiled in debug/release mode, both dynamic/static runtime, both 32bit and 64bit systems/compiles, all with VS2008, default compiler/linker options.
No problem found when testing with gcc4.4.5 on a 64bit Debian system.
Questions:
can other people please test this? I would really appreciate some active collaboration form SO.
is there anything that is not correct in the code that could cause the problem (not talking about whether it makes sense)
or any compiler flags that might trigger this behaviour?
all parser code is rather critical for the application and is tested heavily, but of course this problem was not found in the test code. Should I come up with extreme test cases, and if so, how do I do that? How could I ever predict this could cause a problem?
if this really is a bug, where do I best report it?
is there anything that is not correct in the code that could cause the problem (not talking about whether it makes sense)
Yes. Standard streams are required to have at least 1 unget() position. So you can safely do only one unget() after a call to get(). When you call peek() and the input buffer is empty, underflow() occurs and the implementation clears the buffer and loads a new portion of data. Note that peek() doesn't increase current input location, so it points to the beginning of the buffer. When you try to unget() the implementation tries to decrease current input position, but it's already at the beginning of the buffer so it fails.
Of course this depends on the implementation. If the stream buffer holds more than one character then it may sometimes fail and sometimes not. As far as I know Microsoft's implementation stores only one character in basic_filebuf (unless you specify a greater buffer explicitly) and relies on <cstdio> internal buffering (btw, that's one reason why MSVC iostreams are slow). A quality implementation may reload the buffer from the file when unget() fails, but it isn't required to do so.
Try to fix your code so you don't need more than one unget() position. If you really need it then wrap the stream with a stream that guarantees that unget() won't fail (look at Boost.Iostreams). Also the code you posted is nonsense. It tries to unget() and then get() again. Why?
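If restructuring the parser is an option, one way to avoid relying on unget() at all, sketched here against the question's loop (untested), is to remember the position before reading the candidate character and seek back to it when the token is found:

std::streampos before = ifs.tellg();   // position of the candidate '/'
int x = ifs.get();
if( x == '/' && ifs.peek() == '*' )
{
    ifs.seekg( before );               // stream is positioned at "/*" again, no putback needed
    // ... hand the stream to the code that consumes the comment token ...
}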
I was wondering why I'm losing the information.
I have three functions thus far.
An option-processing function, and two vector functions: one for reading a text file and the other for adding a user specification to the same vector.
What's happening is that I read the file and save its contents into a vector, then choose the next option to add a specification. To do that I use push_back. I output (or debug) to see if I was successful: yes. Then I choose the option that reads the file again, and I'm back to where I started. The user's spec was lost. I think it's because I'm allocating the vector every time I enter that option.
I'm a beginner, so my code is not up to most programming standards.
Here's my first function, which isolates fields delimited by commas, saves them into structure member variables, then saves them into a vector as one element for each line in the file.
vector<sStruct> * loadFile(char *myTextFile)
{
    myStruct sStruct;
    vector<myStruct> vectorAddress,
                     *vectorData = new vector<myStruct>;
    string feild1, feild2, feild3, feild4;
    ifstream *inFile = new ifstream;

    inFile->open( myTextFile, ios::in );
    if ( !inFile->good() )
    {
        cout << "? File Doesnt Exist! " << endl;
    }
    while ( !inFile->eof() )
    {
        getline( *inFile, feild1, ',' );
        sStruct.m_1 = feild1;
        getline( *inFile, feild2, ',' );
        sStruct.m_2 = feild2;
        getline( *inFile, feild3, ',' );
        sStruct.m_3 = feild3;
        getline( *inFile, feild4 );
        sStruct.m_4 = feild4;
        vectorData->push_back( sStruct );
    }
    inFile->clear();
    inFile->close();
    cout << vectorData->size();
    delete inFile; // allocated obj delete to fast why bother?
    return vectorData;
}
This function is successful in adding another element into the vector.
vector<sStruct> * addElement(vector<sStruct> *vAddElement)
{
    sStruct addElement; // referring to the same struct.
    cout << "Enter a String: ";
    cin >> addElement.feild1;
    vAddElement->push_back( addElement );
    cout << vAddElement->size() << endl;
    return vAddElement;
}
When I'm in the first function, I debug my vector object and the data from the file is saved, OK. So I go to the next function and add a string to the struct member that holds the first field, hopefully not overwriting anything. I debug to make sure and, nope, it's all good: push_back works nicely. But when I go back to my first function, everything is back to how it was when I started.
I know it's because I'm reading the file there and allocating a new vector each time I enter that function. Is there a way to prevent this?
Your function addElement() gets a parameter vAddElement, but you are pushing into vectorData...?!?
This is because you are creating a new instance of the vector each time you enter loadFile() method. If you want to preserve the contents of the vector, then don't create the vector object inside loadFile. Create it outside this function ( probably from the one which calls loadFile()) and pass it to this function.
Change the loadFile() function to take the vector object by reference and update it inside loadFile().
Something like this:
bool loadFile(char *myTextFile, vector<sStruct>& vectorData_out)
{
    //Read the file
    //push_back the element into vectorData_out
    //vectorData_out.push_back() ...code to push
}
Then change addElement() to accept the vector by reference as well:
bool addElement(vector<sStruct>& vAddElement_out)
{
    //initialization code for addElement
    vAddElement_out.push_back( addElement );
    cout << vAddElement_out.size() << endl;
    return true;
}
Now your calling code looks like this:
std::vector<sStruct> aVectorData;
loadFile("filename.txt", aVectorData);
addElement(aVectorData);
EDIT: Also, try to avoid allocating on the heap unless it is absolutely necessary.
Are the user specified fields in a single line or spread across multiple lines?
getline( *inFile, feild1, ',' );
sStruct.m_1 = feild1;
getline( *inFile, feild2, ',' );
sStruct.m_2 = feild2;
getline( *inFile, feild3, ',' );
sStruct.m_3 = feild3;
getline( *inFile, feild4 );
sStruct.m_4 = feild4;
The code snippet above reads in four lines. Can you paste the first few lines of your user input file?