I need to profile an application that uses a Caché database, and I'm trying to use CacheMonitor for that.
I have enabled query statistics (I suppose CacheMonitor executes DO SetSQLStats^%apiSQL(3) internally) and two days later my server ran out of disk space. I'm afraid there is too much data in %SYS.PTools.SQLQuery and %SYS.PTools.SQLStats, and I would like to free some space.
Is there any administration tool to manage this data? How can I delete the SQL statistics data?
NOTE: My knowledge about Caché is almost none.
It sounds like this is a pretty general problem of how to delete a global and then reclaim the disk space.
To delete the data, you should be able to use a SQL delete statement to clear out %SYS.PTools.SQLStats (which should be larger), and/or %SYS.PTools.SQLQuery.
Since this is Caché, you might also kill the globals from the command line. I haven't used these classes, but looking at the class definition in ^oddDEF it appears to store the data in ^%SYS.PTools.SQLQueryD, ^%SYS.PTools.SQLQueryI, and ^%SYS.PTools.SQLQueryS (which is the standard default storage, so this would be likely anyway).
If you only want to delete some of it you will need to craft your own SQL for it.
Once the data is deleted, you need to actually shrink the database (like most databases, it can grow dynamically but doesn't automatically give up any space). See this reference for an example of one way to do that. The basic idea is on page 3: you can make a new database, copy all the data into it, then delete the old one once you are sure you don't need it. Don't forget to do a backup first.
To make this easier in the future, you can use the global mapping feature to store the %SYS.PTools globals in their own new database. Then when you want to shrink that database you can just replace it with a new one without copying all the data around (as is suggested in the class documentation for %SYS.PTools.SQLStats).
I'm trying to make an application to manage information about several providers.
The target system is Windows and I'll be coding in C++.
The users are not expected to be handy with anything related to computers, so I want to make it as fool-proof as possible. Right now my objective is to distribute only an executable, which should store all the information they enter inside itself.
Each user stores information about their own providers, so I don't need the application to share the data with other instances. They do upload the information into a preexisting system via CSV, but I can handle that easily.
I expect them to enter new information at least once a month, so I need to update the embedded information. Is that even possible? Making a portable exe and updating its data? So far the only portable apps I've seen that allow saving some personalization do so by making you drag files along with the exe.
I try to avoid SQL to avoid compatibility problems (for my own applications I use external TXT files and parse the data), but if you tell me it's the only way, I'll use SQL.
I've seen several other questions about embedding files, but it seems all of them deal with constant data. My files need to be updatable.
Thanks in advance!
Edit: Thanks everyone for your comments. I've understood that what I want is not worth the problems it'd create. I'll store the data separately and make an effort so my coworkers understand the difference between an executable and its data (just like explaining the internet to your grandma's grandma...)
While I wouldn't go as far as to say that it's impossible, it will definitely be neither simple nor pretty nor something anyone should ever recommend doing.
The basic problem is: While your .exe is running, the .exe file is mapped into memory and cannot be modified. Now, one thing you could do is have your .exe, when it's started, create a temporary copy of itself somewhere, start that one, tell the new process where the original image is located (e.g., via commandline arguments), and then have the original exit. That temporary copy could then modify the original image. To put data into your .exe, you can either use Resources, or manually modify the PE image, e.g., using a special section created inside the image to hold your data. You can also simply append arbitrary data at the end of an .exe file without corrupting it.
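For what it's worth, here is a hedged sketch of just the "append arbitrary data at the end of the file" part mentioned above; the footer layout and the APP1 marker are invented for illustration. Reading the appended data back works even while the program is running, but writing to your own image still requires the copy-and-relaunch dance described above (or writing to a copy).

// Minimal sketch: append a payload plus a small footer to an executable image,
// and read it back by walking in from the end of the file.
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

void append_payload(const std::string& exe_copy, const std::string& data) {
    std::ofstream out(exe_copy, std::ios::binary | std::ios::app);
    out.write(data.data(), data.size());
    std::uint64_t size = data.size();
    out.write(reinterpret_cast<const char*>(&size), sizeof(size));
    out.write("APP1", 4);  // magic marker so the reader can find the footer
}

std::string read_payload(const std::string& exe_path) {
    std::ifstream in(exe_path, std::ios::binary);
    in.seekg(-12, std::ios::end);            // footer = 8-byte size + 4-byte magic
    std::uint64_t size = 0;
    char magic[4];
    in.read(reinterpret_cast<char*>(&size), sizeof(size));
    in.read(magic, 4);
    if (std::string(magic, 4) != "APP1") return {};   // no payload appended
    in.seekg(-static_cast<std::streamoff>(12 + size), std::ios::end);
    std::vector<char> buf(size);
    in.read(buf.data(), buf.size());
    return std::string(buf.begin(), buf.end());
}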
However, I would like to stress again that I do not recommend actually doing stuff like that. I would simply store data in separate files. If your users are familiar with Excel, then they should be familiar with the idea that data is stored in files…
I am writing a C++ program, and I have a class that provides services for the rest of the classes in the program.
I am now writing the classes and the UML.
1) The class in question has a task list that changes over time, and conditions are checked against this list. I am thinking of keeping it in a database table where every row represents a task; that way, if the program crashes or stops working, I can restore the last state. The other option is to keep the task list in memory and keep a copy in the database.
The task list needs to be searched every second.
Which approach is more recommended?
2) In order to read from and write to the database, I can either call the database directly from the class or build a database communication class. If I write such a class, I need to provide specific operations and build a mini server for it,
e.g. write a row to the database, read a row from the database, update only the first column, etc.
What is the recommended approach for this?
Thanks.
First, if the database approach is obvious and easy, and there are no performance problems, just do that. You're talking about running a query once per second, and maybe marking a task done or adding a new one every so often; even SQLite on a slow SMB share should be able to handle that just fine.
If you do need to optimize it, then there are two approaches: either stick with the database and cache it in memory, or use memory as your primary storage and come up with a persistence mechanism that writes through to the database. But until you need to optimize it, don't.
Next, how should you do it? Your question makes it sound like you're thinking in terms of a whole three-tier system, with a "mini-server" sitting between the database server and your task list. There's really no need for that. What you want is a bespoke ORM, but that makes it sound more complicated than it is. All you're doing is writing a class that wraps a database connection and provides a handful of methods—get_due, mark_done, add, get_next_id—each of which maps SQL parameters to Task members. For example (with no error handling):
// 'db' is the wrapped connection object; the placeholder syntax depends on your database library.
void mark_done(const Task& task) {
    db.execute("UPDATE Task SET done = 1 WHERE id = ?", task.id);
}
Three more methods like that, plus a constructor to connect to the database (including creating the Task table if it didn't already exist), and your class is done.
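If it helps to see that concretely, here is a hedged sketch of such a wrapper class, assuming SQLite through its C API purely for illustration (the question doesn't name a database, and the Task fields are invented):

// Hedged sketch of the wrapper class described above, using SQLite's C API.
#include <sqlite3.h>
#include <stdexcept>
#include <string>
#include <vector>

struct Task {
    int id = 0;
    std::string description;
    bool done = false;
};

class TaskStore {
public:
    // Connect, and create the Task table if it didn't already exist.
    explicit TaskStore(const std::string& path) {
        if (sqlite3_open(path.c_str(), &db_) != SQLITE_OK)
            throw std::runtime_error("cannot open database");
        const char* ddl =
            "CREATE TABLE IF NOT EXISTS Task ("
            " id INTEGER PRIMARY KEY, description TEXT, done INTEGER)";
        sqlite3_exec(db_, ddl, nullptr, nullptr, nullptr);
    }
    ~TaskStore() { sqlite3_close(db_); }

    void mark_done(const Task& task) {
        exec_with_id("UPDATE Task SET done = 1 WHERE id = ?", task.id);
    }

    // Return every task not yet done.
    std::vector<Task> get_due() {
        std::vector<Task> out;
        sqlite3_stmt* stmt = nullptr;
        sqlite3_prepare_v2(db_, "SELECT id, description FROM Task WHERE done = 0",
                           -1, &stmt, nullptr);
        while (sqlite3_step(stmt) == SQLITE_ROW) {
            Task t;
            t.id = sqlite3_column_int(stmt, 0);
            const unsigned char* d = sqlite3_column_text(stmt, 1);
            t.description = d ? reinterpret_cast<const char*>(d) : "";
            out.push_back(t);
        }
        sqlite3_finalize(stmt);
        return out;
    }

private:
    void exec_with_id(const char* sql, int id) {
        sqlite3_stmt* stmt = nullptr;
        sqlite3_prepare_v2(db_, sql, -1, &stmt, nullptr);
        sqlite3_bind_int(stmt, 1, id);
        sqlite3_step(stmt);
        sqlite3_finalize(stmt);
    }
    sqlite3* db_ = nullptr;
};

Swap the SQL and the bind calls for whatever database and driver you actually use; the shape of the class stays the same.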
The reason you don't want to write the database stuff directly into Task is that you don't really have anywhere to store shared information like the database connection object; either you need globals (or static members, which are effectively globals), or you need copies in every single Task instance (or, really, weak references, which you're going to fake with either a reference or a raw pointer, either way leading to shutdown problems somewhere down the line).
Finally, your whole reason for doing this is error recovery, and databases do a great job of journaling so nothing ever gets inconsistent, but you do have to make sure to structure your app to take advantage of that. For example, you may want to mark all the now-due tasks "in process", then process them, then mark them all "done"; that way, at recovery time, you know exactly which tasks may or may not have been done, and can act appropriately. The more steps you can commit to the database, the less data loss you have to deal with—but of course the more code you have to write, and the slower it gets. So, do as much as necessary, but no more.
Saving information in a database just to recover from a crash may be a bit of an overkill.
Ideally you want to serialize the list and save it, as binary, XML or CSV. This can be done on a timer or on certain events in your application.
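For example, here is a minimal sketch of the CSV option; the Task fields and file name are invented for illustration, and it assumes descriptions contain no commas.

// Dump the in-memory task list to a CSV file and load it back on startup.
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Task {
    int id;
    std::string description;
    bool done;
};

// Write the whole list out; call this from a timer or after significant events.
void save_tasks(const std::vector<Task>& tasks, const std::string& path = "tasks.csv") {
    std::ofstream out(path, std::ios::trunc);
    for (const auto& t : tasks)
        out << t.id << ',' << t.description << ',' << (t.done ? 1 : 0) << '\n';
}

// Load the list back when the program starts.
std::vector<Task> load_tasks(const std::string& path = "tasks.csv") {
    std::vector<Task> tasks;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::stringstream ss(line);
        std::string id, desc, done;
        std::getline(ss, id, ',');
        std::getline(ss, desc, ',');
        std::getline(ss, done, ',');
        tasks.push_back({std::stoi(id), desc, done == "1"});
    }
    return tasks;
}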
Databases may also be used if you can come up with a structure that maps cleanly onto tables, so that you can do a one-to-one mapping between objects and rows and write SQL queries easily. But keep that on a separate layer, for abstraction.
I'm new to ElasticSearch, so this is probably something quite trivial, but I haven't figured out anything better than fetching everything, processing it with a script, and updating the records one by one.
I want to make something like a simple SQL update:
UPDATE RECORD SET SOMEFIELD = SOMEXPRESSION
My intent is to replace the current bogus data with some data that makes more sense (so the expression is basically choosing randomly from a pool of valid values).
There are a couple of open issues about making it possible to update documents by query.
The technical challenge is that Lucene (the text search engine library that Elasticsearch uses under the hood) segments are read-only. You can never modify an existing document. What you need to do is delete the old version of the document (which, by the way, will only be marked as deleted until a segment merge happens) and index the new one. That's what the existing update API does. Therefore, an update by query might take a long time and lead to issues, which is why it's not released yet. A mechanism that allows interrupting running queries would be a nice-to-have for this case too.
But there's the update by query plugin that exposes exactly that feature. Just beware of the potential risks before using it.
I'm currently brainstorming a financial program that will deal with (over time) fairly large amounts of data. It will be a C++/Qt GUI app.
I figure reading all the data into memory at runtime is out of the question because given enough data, it might hog too much memory.
I'm trying to come up with a way to read into memory only what I need, for example, if I have an account displayed, only the data that is actually being displayed (and anything else that is absolutely necessary). That way the memory footprint could remain small even if the data file is 4gb or so.
I thought about some sort of searching function that would slowly read the file line by line and find a 'tag' or something identifying the specific data I want, and then load that, but considering this could theoretically happen every time there's a GUI update, that seems like a terrible way to go.
Essentially I want to be able to efficiently locate specific data in a file, read only that into memory, and possibly change it and write it back without reading and writing the whole file every time. I'm not an experienced programmer and my googling for ideas hasn't been very successful.
Edit: I should probably mention I intend to use Qt's fancy QDataStream related classes to store the data. In other words the file will likely be binary and not easily searchable line by line like a text file.
Okay, based on your comments.
Start simple. Forget about your financial application for now, except as background. So, a suitable example for your own file-based storage:
One data type, e.g. accounts.
Start with fixed-width columns, giving you a fixed-width record (see the sketch further down).
One file for the data.
Have another file for the index on account number.
Implement Insert, Update and Delete; you'll learn a lot.
For instance.
For Delete, you could find the index entry and the data, move them out and rebuild both files.
You could instead have an internal field on the account record that indicates it has been deleted, set that in the data, and just remove the index entry. The latter still means rewriting the entire index file, though. You could put the delete flag in the index file instead...
When inserting, do you want to append, or do you want to find a deleted record and reuse its slot?
Is your index just going to be a straight list of account numbers and positions, or do you want to hash it or use a tree? You could spend weeks if not months just looking at indexing strategies alone.
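To make the starting point concrete, here is a minimal sketch of the fixed-width record idea; the field names and sizes are invented for illustration, and a separate index file would simply map account number to slot number.

// One fixed-width record: every account occupies exactly sizeof(AccountRecord)
// bytes, so record N starts at byte N * sizeof(AccountRecord).
#include <cstddef>
#include <fstream>

struct AccountRecord {
    char number[16];   // account number, '\0'-padded
    char name[48];     // holder name, '\0'-padded
    double balance;
    char deleted;      // soft-delete flag instead of rewriting the file
};

void write_record(std::fstream& file, std::size_t slot, const AccountRecord& rec) {
    file.seekp(slot * sizeof(AccountRecord), std::ios::beg);
    file.write(reinterpret_cast<const char*>(&rec), sizeof(rec));
    file.flush();
}

bool read_record(std::fstream& file, std::size_t slot, AccountRecord& rec) {
    file.seekg(slot * sizeof(AccountRecord), std::ios::beg);
    return static_cast<bool>(file.read(reinterpret_cast<char*>(&rec), sizeof(rec)));
}

Writing the raw struct bytes like this ties the file to one compiler and platform, which is fine for a learning exercise; a real format would serialize each field explicitly.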
Happy learning anyway. It will be interesting to help with your future questions.
I am trying to delete 10000+ files at once, atomically, i.e. either all of them need to be deleted, or all of them need to stay in place.
Of course, the obvious answer is to move all the files into a temporary directory, and delete it recursively on success, but that doubles the amount of I/O required.
Compression doesn't work, because 1) I don't know which files will need to be deleted, and 2) the files need to be edited frequently.
Is there anything out there that can help reduce the I/O cost? Any platform will do.
EDIT: let's assume a power outage can happen anytime.
Kibbee is correct: you're looking for a transaction. However, you needn't depend on either databases or special file system features if you don't want to. The essence of a transaction is this:
1) Write out a record to a special file (often called the "log") that lists the files you are going to remove.
2) Once this record is safely written, make sure your application acts just as if the files have actually been removed.
3) Later on, start removing the files named in the transaction record.
4) After all the files are removed, delete the transaction record.
Note that, any time after step (1), you can restart your application and it will continue removing the logically deleted files until they're finally all gone.
Please note that you shouldn't pursue this path very far: otherwise you're starting to reimplement a real transaction system. However, if you only need a few simple transactions, the roll-your-own approach might be acceptable; a sketch of it follows.
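Here is a hedged sketch of that roll-your-own approach using std::filesystem; the log file name and layout are invented for illustration, and a production version would also fsync the log (and its directory) before starting the deletes.

#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Steps 1-4 from above: record the batch, remove the files, drop the record.
void delete_all_or_resume(const std::vector<fs::path>& files,
                          const fs::path& log = "to_be_deleted.log") {
    {
        std::ofstream out(log, std::ios::trunc);   // step 1: write the log
        for (const auto& f : files) out << f.string() << '\n';
    }
    for (const auto& f : files)                    // step 3: physically remove
        fs::remove(f);                             // missing files are not an error
    fs::remove(log);                               // step 4: batch complete
}

// On startup, replay an interrupted batch if the log is still present.
void recover(const fs::path& log = "to_be_deleted.log") {
    std::ifstream in(log);
    if (!in) return;                               // no log, nothing to recover
    std::string line;
    while (std::getline(in, line))
        fs::remove(line);
    in.close();
    fs::remove(log);
}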
On *nix, moving files within a single filesystem is a very low-cost operation: it works by making a hard link to the new name and then unlinking the original file. It doesn't even change the file's modification time.
If you could move the files into a single directory, then you could rename that directory to get it out of the way as a truly atomic op, and then delete the files (and directory) later in a slower, non-atomic fashion.
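For example, a tiny sketch of that rename-then-delete idea (std::rename maps to the atomic rename(2) within a single filesystem; the paths are illustrative):

#include <cstdio>       // std::rename
#include <filesystem>

bool atomically_retire(const char* dir, const char* retired_dir) {
    if (std::rename(dir, retired_dir) != 0)      // atomic: either it all moved or nothing did
        return false;
    std::filesystem::remove_all(retired_dir);    // slow, non-atomic cleanup afterwards
    return true;
}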
Are you sure you don't just want a database? They all have transaction commit and rollback built-in.
I think what you are really looking for is the ability to have a transaction. Because the disk can only write one sector at a time, you can only delete the files one at a time. What you need is the ability to roll back the previous deletions if one of the deletes fails. Tasks like this are usually reserved for databases. Whether or not your file system can do transactions depends on which file system and OS you are using. NTFS on Windows Vista supports Transactional NTFS. I'm not too sure how it works, but it could be useful.
Also, there is something called a shadow copy on Windows, which in the Linux world is called an LVM snapshot. Basically it's a snapshot of the disk at a point in time. You could take a snapshot directly before doing the delete, and if the delete isn't successful, copy the files back out of the snapshot. I've created shadow copies using WMI in VBScript, and I'm sure similar APIs exist for C/C++ as well.
One thing about shadow copies and LVM snapshots: they work on the whole partition, so you can't take a snapshot of just a single directory. However, taking a snapshot of the whole disk takes only a couple of seconds. So you would take a snapshot, delete the files, and then, if unsuccessful, copy the files back out of the snapshot. This would be slow, but depending on how often you plan to roll back, it might be acceptable. The other idea would be to restore the entire snapshot. This may or may not be good, as it would roll back all changes on the entire disk; not good if your OS or other important files are located there. If the partition only contains the files you want to delete, recovering the entire snapshot may be easier and quicker.
Instead of moving the files, make symbolic links into the temporary directory. Then if things are OK, delete the files. Or, just make a list of the files somewhere and then delete them.
Couldn't you just build the list of pathnames to delete, write this list out to a file to_be_deleted.log, make sure that file has hit the disk (fsync()), and then start doing the deletes? After all the deletes have been done, remove the to_be_deleted.log transaction log.
When you start up the application, it should check for the existence of to_be_deleted.log, and if it's there, replay the deletes in that file (ignoring "does not exist" errors).
The basic answer to your question is "No." The more complex answer is that this requires support from the filesystem, and very few filesystems out there have that kind of support. Apparently Windows NT has a transactional filesystem which does support this. It's possible that Btrfs for Linux will support this as well.
In the absence of direct support, I think the hardlink, move, remove option is the best you're going to get.
I think the copy-and-then-delete method is pretty much the standard way to do this. Do you know for a fact that you can't tolerate the additional I/O?
I wouldn't count myself an expert on file systems, but I would imagine that any implementation for performing a transaction would need to first attempt to perform all of the desired actions, and then it would need to go back and commit those actions. I.e., you can't avoid performing more I/O than doing it non-atomically.
Do you have an abstraction layer (e.g. a database) for reaching the files? (If your software goes directly to the filesystem, then my proposal does not apply.)
If the condition is "right" to delete the files, change the state to "deleted" in your abstraction layer and begin a background job to "really" delete them from the filesystem.
Of course this proposal incurs a certain cost at opening/closing of the files but saves you some I/O on symlink creation etc.
On Windows Vista or newer, Transactional NTFS should do what you need:
#include <windows.h>
#include <ktmw32.h>                    // CreateTransaction, Commit/RollbackTransaction
#pragma comment(lib, "KtmW32.lib")     // Kernel Transaction Manager

HANDLE txn = CreateTransaction(NULL, 0, 0, 0, 0, 0 /* or a timeout in ms */, L"Deleting stuff");
if (txn == INVALID_HANDLE_VALUE) {
    /* explode */
}

/* Repeat DeleteFileTransacted for every file in the batch before committing. */
if (!DeleteFileTransacted(filename, txn)) {
    RollbackTransaction(txn); // You saw nothing.
    CloseHandle(txn);
    die_horribly();
}

if (!CommitTransaction(txn)) {
    CloseHandle(txn);
    die_horribly();
}
CloseHandle(txn);