I'm working on a project that requires me to load some of the data from an XML file into a GUI. The GUI allows the user to make some changes to the data, and I want to save these changes back to the XML file.
I know it is possible to rewrite the whole file but the file is pretty huge, and not all the data in the file is being changed or even being used in my program.
This is my first project working with TinyXML and C++ Builder. I am just looking for some suggestions as to how I should approach this.
Unless you are certain that the new text will be exactly the same size as the old, rewriting only part of a text file is not a good idea in general. There are file formats where piecemeal replacement is possible. XML is not one of them. Not in the general case, at least.
Inserting data in the middle of a file, thus moving the rest down, is basically equivalent to loading the rest of the file, making the file bigger, and writing it back. So you may as well just load the entire file, make your modifications, and save it again. Your code will be simpler and likely not much slower.
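To make the "load, modify, save" approach concrete, here is a minimal sketch in standard C++. It works on raw bytes rather than parsed XML, and the helper names (read_all, write_all, insert_at) are made up for illustration; they are not part of TinyXML.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Read the whole file into memory (binary mode, so bytes are kept as-is).
std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Replace the file with the given contents; returns false on I/O failure.
bool write_all(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
    out.flush();
    return static_cast<bool>(out);
}

// Hypothetical helper: insert `text` at byte `offset` by rewriting the file.
// Everything after the insertion point has to be written again anyway.
bool insert_at(const std::string& path, std::size_t offset,
               const std::string& text) {
    std::string data = read_all(path);
    if (offset > data.size()) return false;  // offset past end of file
    data.insert(offset, text);
    return write_all(path, data);
}
```

Calling insert_at("demo.xml", 3, "<c/>") on a file containing `<a><b/></a>` would leave `<a><c/><b/></a>` on disk. With a real XML library you would edit the parsed tree instead of a byte offset, but the I/O pattern is the same.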
And no, a SAX parser isn't going to help you here. It allows you to stream reading (though I would suggest a pull parser rather than a push one), but that's not going to allow you to insert data into the file. That's generally not supported by most XML parsers I know of. They can write data, but writing and non-destructively inserting are two different things.
TinyXML will let you do what you want without damaging the file contents (as long as it's valid XML). I just checked this, so I am quite certain. Obviously you have to know precisely which attributes and tags you want to edit, but you can add or edit tags without affecting existing attributes, tags, or comments, even within the tags you edit. It will take a while to get used to the structure, but it is definitely possible.
You have to know the structure of the XML!
TiXmlDocument doc("filepath"); // names the file; nothing is read yet
if (!doc.LoadFile()) // this loads and parses the whole file
{
cout<<"No XML structure found"<<endl;
return; // exit function don't load anything
}
TiXmlElement *root = doc.RootElement(); //pointer to root element
Now you can use this pointer and commands like:
TiXmlElement *tageone = root->FirstChildElement("tageone"); // NULL if there is no such child
if (tageone)
tageone->SetDoubleAttribute("attribute", value);
to change stuff.
Sorry for the rushed explanation, but you'll need to read through the documentation a bit to get the hang of it.
cheers
Update
As I said in the comment, I don't think that you are better off if you insert into the middle of a file. However, if you need/want additional security I suggest two additional steps:
Perform a sanity check of the XML file at all the important steps. This can be anything that makes sure the file you are reading is really what you need.
Calculate a checksum over the content of the whole file before saving and check it afterwards. This does not necessarily need to be a CRC; I just named the function calculate_crc(). Anything that lets you verify the integrity of the data is good.
I would do this approximately as follows (pseudocode):
TiXmlDocument doc( "demo.xml" );
doc.LoadFile();
perform_sanitycheck(doc);
// do whatever you need to change
perform_sanitycheck(doc);
unsigned int crc = calculate_crc(doc);
doc.SaveFile("temp_name.xml"); // save the file under another name
TiXmlDocument doc2( "temp_name.xml" );
doc2.LoadFile(); // read back what was actually written
perform_sanitycheck(doc2);
if(verify_crc(doc2, crc))
{
delete_file("demo.xml");
rename_file("temp_name.xml", "demo.xml");
}
The sanity check would take the appropriate action if necessary. You need to substitute the two functions delete_file() and rename_file() with an API or library function for your environment.
The functions calculate_crc() and verify_crc() could be specifically crafted to check only the parts that you need to have unchanged.
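For what it's worth, the save-to-a-temporary-file-then-swap idea can be sketched with nothing but the standard library. This version works on a plain string instead of a TiXmlDocument, uses an FNV-1a hash as a stand-in for calculate_crc(), and uses std::remove/std::rename for the delete and rename steps. The names checksum, read_all and safe_save are my own, not from any library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>

// FNV-1a hash, standing in for calculate_crc(); any integrity check would do.
std::uint32_t checksum(const std::string& data) {
    std::uint32_t h = 2166136261u;
    for (unsigned char c : data) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Write `data` to a temporary file, read it back, and only replace `path`
// if the bytes on disk match what we meant to write.
bool safe_save(const std::string& path, const std::string& data) {
    const std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out.write(data.data(), static_cast<std::streamsize>(data.size()));
        if (!out.flush()) return false;
    }  // the stream is closed here, before the file is read back
    if (checksum(read_all(tmp)) != checksum(data)) {
        std::remove(tmp.c_str());
        return false;
    }
    std::remove(path.c_str());  // failure is fine: the file may not exist yet
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

The original is only deleted after the temporary copy has been verified, so a failed write leaves the old file untouched.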
Related
So I noticed that when I want to write external data for my program I have to use fstream, and that writes my data into a file that opens in Notepad. What I'm wondering is, say I wanted to write to a file that my users couldn't edit, like a file that would hold scores and such for a game, which I don't want players to be able to edit manually. Would such a thing be possible through the C++ standard library alone, or would I need some other library? I understand some programs may be able to read it, but I'm more concerned with whether people can read it by simply opening it in Notepad.
You say you just want to prevent people from easily using notepad.exe to see and edit the file content. That can be done by writing the data as binary rather than text:
std::ofstream out("score.dat", std::ios::binary);
std::uint32_t score = 12000;
out.write(reinterpret_cast<char*>(&score), sizeof score);
However it's still trivial for users to see and modify the data using a hex editor.
You could make it require a bit more work by encrypting the data first, but given that the program must have all the information necessary to read and write the file it's still pretty easy to get around.
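As a rough illustration of "a bit more work", here is a sketch that XOR-masks the score before writing it and reverses the mask on load. This is obfuscation, not real encryption: the mask value kMask and the function names are invented for the example, and anyone who inspects the executable can recover the mask.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical mask; it lives in the program, so it is not a secret.
const std::uint32_t kMask = 0xA5C3F01Eu;

// Write the masked score as 4 raw bytes.
bool save_score(const std::string& path, std::uint32_t score) {
    const std::uint32_t stored = score ^ kMask;
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(&stored), sizeof stored);
    return static_cast<bool>(out.flush());
}

// Read 4 raw bytes back and undo the mask; false if the file is missing
// or too short.
bool load_score(const std::string& path, std::uint32_t& score) {
    std::uint32_t stored = 0;
    std::ifstream in(path, std::ios::binary);
    if (!in.read(reinterpret_cast<char*>(&stored), sizeof stored)) return false;
    score = stored ^ kMask;
    return true;
}
```

The file is written in the machine's native byte order, so it is only portable between machines with the same endianness.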
I wanted to know what my options are for storing strings in a Qt application.
One of my major requirements is not having to rebuild the entire project, or any file, in case one string changes. I would also like to have all the strings in one place. In short, I would like to keep the strings in one place and extract them during application startup.
I've used all of the elements talked about in above answers.
XML, JSON, QSettings w/ Ini files, tr()
All of them can do it just fine. I've put together some notes on the different options:
QTranslator
Qt Linguist and the tr() tags are designed to take your stock language and translate it into another language. Keeping track of multiple versions of the English translation and modifying/releasing without Qt Linguist is almost impossible. Qt Linguist is required to "release" your translations and convert them from a TS file (translation source) to an optimized QM file.
The QM file format is a compact binary format that is used by the localized application. It provides extremely fast lookups for translations.
Here is what using a translation file looks like:
QTranslator translator;
translator.load("hellotr_la");
app.installTranslator(&translator);
http://qt-project.org/doc/qt-4.8/qtranslator.html#details
I think using QTranslator for a few string changes may be a weird use case, unless you are using it to localize your program. But like the docs say, it is optimized for very fast lookups of string replacements.
QXMLStreamReader
The stream reader is the "recommended" way to access XML files, or at least the one with better support. You either write the XML files yourself to organize your strings, or you write code to generate the XML.
<STRING_1>Some string</STRING_1>
Here is what it looks like to navigate into xml.
QXmlStreamReader xml;
...
while (!xml.atEnd()) {
xml.readNext();
... // do processing
}
if (xml.hasError()) {
... // do error handling
}
XML is very similar to Json, but with larger files and the start and end tags are longer. There are a lot more stock readers out there for XML. It is also a lot more human readable in many cases because so many people know html and they are very similar.
QJsonDocument
The JSON support in Qt 5 looks really good. When I first wrote this I hadn't built a project with it yet, but it is as easy as it looks, and as far as accessing and setting values goes, it is just like using a dictionary, a map, or a vector.
UPDATE: You just pass around a pointer into your QJsonDocument or your QJsonObject or your QJsonArray as you are navigating deeper or appending more onto your Json file. And when you are done you can save it as a binary file, or as a clear text, human readable file, with proper indentation and everything!
How to create/read/write JSon files in Qt5
Json seems to be turning into the replacement for XML for many people. I like the example of using Json to save and load the state of a role playing game.
http://qt-project.org/doc/qt-5/qtcore-savegame-example.html
QSettings
QSettings is one of my favorites, just because it has been supported for so long, and it is how most persistent settings should be saved and accessed.
When I use it, to take advantage of the defaults and fall back mechanisms, I put this in my main.cpp:
QCoreApplication::setOrganizationName("MySoft");
QCoreApplication::setOrganizationDomain("mysoft.com");
QCoreApplication::setApplicationName("Star Runner");
And because I sometimes find a need to edit these settings by hand in Windows, I use the Ini format.
QSettings::setDefaultFormat(QSettings::IniFormat); // also in main.cpp
Then when I deploy my exe and I want to have a particular value loaded instead of the hardcoded defaults, the installer drops the main fallback into
C:/ProgramData/MySoft/Star Runner.ini
And when the program saves a change at runtime, it gets saved to:
C:/Users/<username>/AppData/Roaming/MySoft/Star Runner.ini
And then throughout my program if I need to get a setting or set a setting, it takes 3 lines of code or less.
// setting the value
QSettings s;
s.setValue("Strings/string_1", "new string");
// getting the value
QString str;
QSettings s;
str = s.value("Strings/string_1", "default string").toString();
And here is what your ini file would look like:
[Strings]
string_1=default string
QSettings is the way to go if you are storing a few strings you want to change on deployment or at runtime. (or if a checkbox is now checked, or your window size and position, or the recent files list or whatever).
QSettings has been optimized quite a bit and is well thought out. The ini support is awesome, with the exception that it sometimes reorders groups and keys (usually alphabetically), and it may drop any comments you put in it. I think ini comments are either started with a ; or a #.
Hope that helps.
One way to do this would be to put the strings in a shared library. This way you only need to recompile the shared library, not the whole project. Another approach would be to put them in a file or a database and load them at runtime.
And of course you have to check your include dependencies. If you are including the headers everywhere, the compiler will rebuild everything that depends on it, even if the header is not really needed.
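To illustrate the "put it in a file and load it at runtime" option without tying it to any Qt class, here is a small standard C++ sketch that reads key=value lines into a map at startup. The file layout and the names load_strings and lookup are assumptions made for the example; in a real Qt application QSettings or QJsonDocument would do this job for you.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parse "key=value" lines. Blank lines, lines starting with ';' or '#',
// and lines without '=' (such as "[Strings]" headers) are skipped.
std::map<std::string, std::string> load_strings(std::istream& in) {
    std::map<std::string, std::string> table;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == ';' || line[0] == '#') continue;
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;  // not a key=value line
        table[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return table;
}

// Look up a string, falling back to a built-in default if the key is missing.
std::string lookup(const std::map<std::string, std::string>& table,
                   const std::string& key, const std::string& fallback) {
    const auto it = table.find(key);
    return it == table.end() ? fallback : it->second;
}
```

At startup you would open the strings file with std::ifstream, pass it to load_strings, and then call lookup(table, "string_1", "default string") wherever a string is needed. Changing a string then means editing the file, with no rebuild.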
Another possible solution is to replace all strings with default ones inside tr() calls and use Qt Linguist to manage all the strings.
You'll also be able to load all the "translations" from external .qm file on startup.
It is simple: store your volatile strings in QSettings http://qt-project.org/doc/qt-4.8/qsettings.html (you can use ini files or the registry) or in an XML file, which will make your application configurable.
Edit: Well, after thinking about it for a few more minutes, Guilherme's comment is right. QSettings will need to be initialized somehow (either manually or from some default values in your code)... and to be honest, manually editing the registry to change a few strings is not the brightest idea. So I conclude that XML or JSON is definitely better. It has advantages: for example, you can keep several config files, which allows you to switch languages at runtime.
I'm currently brainstorming a financial program that will deal with (over time) fairly large amounts of data. It will be a C++/Qt GUI app.
I figure reading all the data into memory at runtime is out of the question because given enough data, it might hog too much memory.
I'm trying to come up with a way to read into memory only what I need, for example, if I have an account displayed, only the data that is actually being displayed (and anything else that is absolutely necessary). That way the memory footprint could remain small even if the data file is 4gb or so.
I thought about some sort of searching function that would slowly read the file line by line and find a 'tag' or something identifying the specific data I want, and then load that, but considering this could theoretically happen every time there's a gui update that seems like a terrible way to go.
Essentially I want to be able to efficiently locate specific data in a file, read only that into memory, and possibly change it and write it back without reading and writing the whole file every time. I'm not an experienced programmer and my googling for ideas hasn't been very successful.
Edit: I should probably mention I intend to use Qt's fancy QDataStream related classes to store the data. In other words the file will likely be binary and not easily searchable line by line like a text file.
Okay based on your comments.
Start simple. Forget about your fiscal application for now, except as background. A suitable starting example for your file system would be:
One data type, e.g. accounts.
Start with fixed width columns giving you a fixed width record.
One file for data
Have another file for the index of account number
Do Insert, Update and Delete, you'll learn a lot.
For instance.
For delete, you could find the index entry and the data record, remove them, and rebuild both files.
Or you could have an internal field on the account record that indicates it has been deleted, set that in the data file, and just remove the index entry. The latter still means rewriting the entire index file, though. You could put the delete flag in the index file instead...
When inserting, do you want to append, or do you want to find a deleted record and reuse that slot?
Is your index just going to be a straight list of accounts and positions, or do you want to hash it, or use a tree? You could spend weeks, if not months, just looking at indexing strategies alone.
Happy learning anyway. It will be interesting to help with your future questions.
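Here is a minimal sketch of that starting point in standard C++: one fixed-width record type in one data file, with random-access read and update by slot number. The Account layout and the function names are made up for the example. The index (account number to slot) would be kept in a std::map in memory here; persisting it to a second file is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <map>
#include <string>

// Hypothetical fixed-width record: every account occupies sizeof(Account) bytes.
struct Account {
    std::uint32_t number;
    char name[28];
    std::int64_t balance_cents;
};

Account make_account(std::uint32_t number, const std::string& name,
                     std::int64_t cents) {
    Account a{};  // zero-filled, so the name is always NUL-terminated
    a.number = number;
    std::strncpy(a.name, name.c_str(), sizeof a.name - 1);
    a.balance_cents = cents;
    return a;
}

// Byte offset of a slot in the data file.
std::streamoff slot_offset(std::uint32_t slot) {
    return static_cast<std::streamoff>(slot) *
           static_cast<std::streamoff>(sizeof(Account));
}

// Append a record at the end of the file and return its slot number.
std::uint32_t append_record(std::fstream& f, const Account& a) {
    f.clear();
    f.seekp(0, std::ios::end);
    const std::streamoff end = static_cast<std::streamoff>(f.tellp());
    f.write(reinterpret_cast<const char*>(&a), sizeof a);
    return static_cast<std::uint32_t>(
        end / static_cast<std::streamoff>(sizeof(Account)));
}

// Read only the record in `slot`; nothing else is loaded.
bool read_record(std::fstream& f, std::uint32_t slot, Account& out) {
    f.clear();
    f.seekg(slot_offset(slot), std::ios::beg);
    return static_cast<bool>(f.read(reinterpret_cast<char*>(&out), sizeof out));
}

// Overwrite the record in `slot` in place; the file size does not change.
bool write_record(std::fstream& f, std::uint32_t slot, const Account& a) {
    f.clear();
    f.seekp(slot_offset(slot), std::ios::beg);
    return static_cast<bool>(
        f.write(reinterpret_cast<const char*>(&a), sizeof a));
}
```

Typical use is to open the data file once with std::ios::in | std::ios::out | std::ios::binary, store index[a.number] = append_record(f, a) for each new account, and then call read_record(f, index[number], a) to fetch a single account. Because all records are the same width, an update never shifts anything else in the file.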
Well a lot of questions have been made about parsing XML in C++ and so on...
But, instead of a generic problem, mine is very specific.
I am asking for a very efficient XML parser for C++. In particular I have a VERY VERY BIG XML file to parse.
My application must open this file and retrieve data. It must also insert new nodes and save the final result in the file again.
To do this I used, at the beginning, rapidxml, but it requires me to open the file, parse it all (all the content because this lib has no functions to access the file directly without loading the entire tree first), then edit the tree, modify it and store the final tree on the file by overwriting it... It consumes too much resources.
Is there an XML parser that does not require me to load the entire file, but that I can use to insert, quickly, new nodes and retrieve data? Can you please indicate solutions for this problem of mine?
You want a streaming XML parser rather than what is called a DOM parser.
There are two types of streaming parsers: pull and push. A pull parser is good for quickly writing XML parsers that load data into program memory. A push parser is good for writing a program to translate one document to another (which is what you are trying to accomplish). I think, therefore, that a push parser would be best for your problem.
In order to use a push parser, you need to write what is essentially an event handler for parsing events. By "parsing event", I mean events like "start tag reached", "end tag reached", "text found", "attribute parsed", etc.
I suggest that as you read in the document, you write out the transformed document to a separate, temporary file. Thus, your XML parsing event handlers will need to be written so that they are stateful and write out the XML of the translated document incrementally.
Three excellent push parser libraries for C++ include Expat, Xerces-C++, and libxml2.
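To show the shape of "read the document as a stream and write the transformed document out incrementally", here is a deliberately tiny hand-rolled sketch. It is not the API of Expat, Xerces-C++, or libxml2, and it is not a real XML parser: it does not understand comments, CDATA sections, or a '>' inside an attribute value. The function name stream_insert is made up. It copies the input tag by tag and emits a new node just before a given closing tag.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copy XML from `in` to `out`, writing `new_node` just before the first
// occurrence of `closing_tag` (for example "</items>").
// Returns true if the closing tag was seen and the node was inserted.
// Only one tag is held in memory at any time.
bool stream_insert(std::istream& in, std::ostream& out,
                   const std::string& closing_tag,
                   const std::string& new_node) {
    bool inserted = false;
    char c;
    while (in.get(c)) {
        if (c != '<') {  // character data: pass it straight through
            out.put(c);
            continue;
        }
        std::string tag(1, c);  // collect one tag, from '<' through '>'
        while (in.get(c)) {
            tag.push_back(c);
            if (c == '>') break;
        }
        if (!inserted && tag == closing_tag) {  // the "end tag reached" event
            out << new_node;
            inserted = true;
        }
        out << tag;
    }
    return inserted;
}
```

In practice `in` would be a std::ifstream on the big file and `out` a std::ofstream on a temporary file that replaces the original once the copy succeeds. With a real push parser, the body of the loop becomes your start-tag, end-tag, and text handlers.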
Search for "SAX parser". They are mostly tokenizers, i.e. they emit tag by tag without building a tree.
SAX parsers are generally faster and lighter than DOM parsers because a DOM parser builds an in-memory representation of the entire XML document, whereas a SAX parser behaves like an event listener and reports each element as it reads the file, without building a tree. Go here for an explanation.
As you mentioned, Xerces is a good C++ SAX parser.
I would recommend looking into ways of breaking the XML document into smaller XML documents as that seems to be part of your problem.
Okay, here is one off the beaten track, I looked at this, but haven't really used it myself, it's called asmxml. These boys claim performance bar none, downside, you need x86 assembler.
If you are really after a high-performance XML stream parser, then libhpxml is likely the right thing for you.
I’m convinced that no XML library exists that allows you to modify a file without loading it first. This simply isn’t possible because files don’t work that way: you cannot insert (or remove) in the middle of a file. You can only overwrite a block of identical size, or append at the end. But your request would require inserting or removing data in the middle of the file.
Reading only parts of an XML file may be possible. But writing … no way.
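The one case that does work, overwriting a block of identical size, looks roughly like this in standard C++. The helper name overwrite_in_place is made up, and it operates on raw bytes, so it is up to the caller to make sure the replacement keeps the XML valid.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Overwrite bytes in place starting at `offset`. The file never changes
// size, so the replacement must fit inside the existing contents.
bool overwrite_in_place(const std::string& path, std::size_t offset,
                        const std::string& replacement) {
    std::fstream f(path, std::ios::in | std::ios::out | std::ios::binary);
    if (!f) return false;  // file does not exist or cannot be opened
    f.seekg(0, std::ios::end);
    const std::size_t size =
        static_cast<std::size_t>(static_cast<std::streamoff>(f.tellg()));
    if (offset > size || replacement.size() > size - offset) return false;
    f.seekp(static_cast<std::streamoff>(offset), std::ios::beg);
    f.write(replacement.data(),
            static_cast<std::streamsize>(replacement.size()));
    return static_cast<bool>(f.flush());
}
```

Replacing "1234" with "9876" inside `<v>1234</v>` works this way; replacing it with "12345" does not, because every byte after it would have to move.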
Go for template libraries as much as possible, such as Boost.PropertyTree (which includes an XML parser) or the POCO XML library.
Avoid old C libraries; they all use old code designs.
Some people say the QtXml module performs well on huge XML files.
In Windows, is it possible through an API to write to the middle of a file without overwriting any data and without having to rewrite everything after that?
If it's possible then I believe it will obviously fragment the file; how many times can I do it before it becomes a serious problem?
If it's not possible what approach/workaround is usually taken? Re-writing everything after the insertion point becomes prohibitive really quickly with big (ie, gigabytes) files.
Note: I can't avoid having to write to the middle. Think of the application as a text editor for huge files where the user types stuff and then saves. I also can't split the files in several smaller ones.
I'm unaware of any way to do this if the interim result you need is a flat file that can be used by other applications other than the editor. If you want a flat file to be produced, you will have to update it from the change point to the end of file, since it's really just a sequential file.
But that condition is there for good reason. If you can control the file format, you have some options. Some versions of MS Word had a quick-save feature where they didn't rewrite the entire document; rather, they appended a delta record to the end of the file. Then, when re-reading the file, it applied all the deltas in order so that what you ended up with was the right file. This obviously won't work if the saved file has to be usable immediately by another application that doesn't understand the file format.
What I'm proposing there is to not store the file as text. Use an intermediate form that you can efficiently edit and save, then have a step which converts that to a usable text file infrequently (e.g., on editor exit). That way, the user can save as much as they want but the time-expensive operation won't have as much of an impact.
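A minimal sketch of the delta idea, in standard C++ and entirely in memory: the Delta layout and apply_deltas are my own invention, not the format Word used. A quick save would append one serialized Delta to the end of the file (serialization is omitted here), and loading would replay the log over the base text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One edit: remove `erase_len` bytes at `pos`, then insert `text` there.
struct Delta {
    std::size_t pos;
    std::size_t erase_len;
    std::string text;
};

// Rebuild the current document by replaying every delta, oldest first.
// Positions in each delta refer to the document as it was just before
// that delta was recorded.
std::string apply_deltas(std::string base, const std::vector<Delta>& log) {
    for (const Delta& d : log) {
        assert(d.pos <= base.size());
        base.replace(d.pos, d.erase_len, d.text);
    }
    return base;
}
```

Each save costs only the size of the edit, while the full flat text is produced just once, for example on editor exit, by calling apply_deltas and writing the result.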
Beyond that, there are some other possibilities.
Memory-mapping (rather than loading) the file may provide efficiencies which would speed things up. You'd probably still have to rewrite to the end of the file, but it would be happening at a lower level in the OS.
If the primary reason you want fast save is to start letting the user keep working (rather than having the file available to another application), you could farm the save operation out to a separate thread and return control to the user immediately. Then you would need synchronisation between the two threads to prevent the user modifying data yet to be saved to disk.
The realistic answer is no. Your only real choices are to rewrite from the point of the modification, or build a more complex format that uses something like an index to tell how to arrange records into their intended order.
From a purely theoretical viewpoint, you could sort of do it under just the right circumstances. Using FAT (for example, but most other file systems have at least some degree of similarity) you could go in and directly manipulate the FAT. The FAT is basically a linked list of clusters that make up a file. You could modify that linked list to add a new cluster in the middle of a file, and then write your new data to that cluster you added.
Please note that I said purely theoretical. Doing this kind of manipulation under a completely unprotected system like MS-DOS would have been difficult but bordering on reasonable. With most newer systems, doing the modification at all would generally be pretty difficult. Most modern file systems are also (considerably) more complex than FAT, which would add further difficulty to the implementation. In theory it's still possible -- in fact, it's now thoroughly insane to even contemplate, where it was once almost reasonable.
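Purely to illustrate the linked-list point, here is a toy model in C++. It is an in-memory vector, not real FAT on-disk structures, and the names are made up: next[c] holds the cluster that follows cluster c in its file, and inserting a cluster in the middle of a file is just a splice of two links.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

const int kEndOfChain = -1;

// Splice the free cluster `fresh` into a chain right after cluster `after`.
// No data in any other cluster moves; only two links change.
void splice_cluster(std::vector<int>& next, int after, int fresh) {
    next[fresh] = next[after];
    next[after] = fresh;
}

// Walk a chain from its first cluster and return the clusters in file order.
std::vector<int> chain(const std::vector<int>& next, int first) {
    std::vector<int> result;
    for (int c = first; c != kEndOfChain; c = next[c]) result.push_back(c);
    return result;
}
```

Starting from a file made of clusters 0, 1, 2 with cluster 3 free, splice_cluster(next, 1, 3) makes the file read 0, 1, 3, 2. The catch is the granularity: you can only insert whole clusters this way, and the file's recorded size would have to be updated as well.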
I'm not sure about the format of your file but you could make it 'record' based.
Write your data in chunks and give each chunk an id.
Id could be data offset in file.
At the start of the file you could have a header with a list of ids so that you can read records in order.
At the end of the 'list of ids' you could point to another location in the file (an id/offset) that stores another list of ids.
Something similar to filesystem.
To add new data you append them at the end and update index (add id to the list).
You have to figure out how to handle delete record and update.
If records are of the same size then to delete you can just mark it empty and next time reuse it with appropriate updates to index table.
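A rough in-memory sketch of this layout in C++, with a std::string standing in for the file's data area and a vector standing in for the list of ids. ChunkRef and ChunkStore are names invented for the example. New chunks are always appended physically, and only the index decides where they appear logically.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Where a chunk lives in the data area; the offset doubles as its id.
struct ChunkRef {
    std::size_t offset;
    std::size_t length;
};

struct ChunkStore {
    std::string data;             // stands in for the file's data area
    std::vector<ChunkRef> order;  // logical order of chunks (the list of ids)

    // Append the bytes physically, but place the chunk at logical position `at`.
    void insert(std::size_t at, const std::string& bytes) {
        assert(at <= order.size());
        const ChunkRef ref = {data.size(), bytes.size()};
        data += bytes;
        order.insert(order.begin() + static_cast<std::ptrdiff_t>(at), ref);
    }

    // Drop a chunk from the logical order; its bytes stay behind as free space.
    void remove(std::size_t at) {
        assert(at < order.size());
        order.erase(order.begin() + static_cast<std::ptrdiff_t>(at));
    }

    // Read the chunks back in logical order.
    std::string contents() const {
        std::string result;
        for (const ChunkRef& r : order) result.append(data, r.offset, r.length);
        return result;
    }
};
```

Inserting "big " between "Hello " and "world" only appends 4 bytes and touches the index; nothing already written is moved. The cost is that the file is no longer a flat sequence any other program can read directly.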
Probably the most efficient way to do this (if you really want to do it) is to call ReadFileScatter() to read the chunks before and after the insertion point, insert the new data in the middle of the FILE_SEGMENT_ELEMENT[3] list, and call WriteFileGather(). Yes, this involves moving bytes on disk. But you leave the hard parts to the OS.
If you are using .NET 4 and have an editor-like application, try a memory-mapped file; it might just be the ticket. Something like this (I didn't type it into VS, so I'm not sure I got the syntax right):
MemoryMappedFile bigFile = MemoryMappedFile.CreateFromFile(
@"C:\bigfile.dat", FileMode.Create,
"BigFileMemMapped",
1024 * 1024,
MemoryMappedFileAccess.ReadWrite);
MemoryMappedViewAccessor view = bigFile.CreateViewAccessor();
long offset = 1000; // must lie within the 1 MB mapped above
view.Write<ObjectType>(offset, ref MyObject);
I noted both paxdiablo's answer on dealing with other applications, and Matteo Italia's comment on Installable File Systems. That made me realize there's another non-trivial solution.
Using reparse points, you can create a "virtual" file from a base file plus deltas. Any application unaware of this method will see a continuous range of bytes, as the deltas are applied on the fly by a file system filter. For small deltas (total <16 KB), the delta information can be stored in the reparse point itself; larger deltas can be placed in an alternate data stream. Non-trivial of course.
I know that this question is marked "Windows", but I'll still add my $0.05 and say that on Linux it is possible to either insert or remove a lump of data in the middle of a file without leaving a hole or copying the second half forward/backward:
fallocate(fd, FALLOC_FL_COLLAPSE_RANGE, offset, len)
fallocate(fd, FALLOC_FL_INSERT_RANGE, offset, len)
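Note that both modes require offset and len to be multiples of the filesystem block size, and only some filesystems (such as ext4 and XFS) support them. Again, I know that this probably won't help the OP, but I personally landed here searching for a Linux-specific answer. (There is no "Windows" word in the question, so the web search engine saw no problem with sending me here.)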