I have a memory problem. For 12 years our software (C++, 32-bit) has used home-made tables to store data.
The tables are stored on disk. When we want to use their data, they are loaded into memory and stay there.
Some tables are very big, with more than 2 million rows. When we load them into memory they take
up to 400 MB. Due to the 32-bit address space and memory fragmentation we can currently load a maximum of two such tables into memory before other operations no longer get enough memory.
The software is installed on more than 3000 clients. The OS on the clients ranges from Win7 to Win10 (32-bit and 64-bit), plus a few insignificant XP and Vista systems.
So we discussed a good (fast, proper) way to get out of this problem. Here are some ideas:
switching to 64-bit
switching from our own tables to SQLite or EJDB
opening every table in its own process and communicating with that process to get data from the table
extending our own tables so that they can read directly from disk
All of these ideas are more or less proper, practicable and fast (in implementation speed and execution speed). The
advantages and disadvantages of each idea are very complicated and would go beyond the scope of this question.
Does someone have another good idea to solve this problem?
[update]
I will try to explain this from a different angle. First, the software is
installed on a wide base of different Windows versions, from XP to W10, on all
kinds of computers. The software can be used on single desktops as well as on terminal
servers with a central LAN data pool (just a folder on a file server).
It collects articles in a special way, so there is a lot of information about
all kinds of article data, and also price information from different vendors.
So there is a big need to hide/encrypt this information from outsiders.
The current database is like an in-memory table of string, double or long values.
Each row can contain a different set of columns, but most of the tables are like
a structured database table. The whole table data is encrypted and zipped in one block.
Once loaded, the whole data is expanded in memory, where we can access it very fast.
If an index is needed, we build it with a std::map inside the software.
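For illustration, a minimal sketch of roughly how such an in-memory table with a std::map index could look (the "article number" key and the row layout are simplifications, not our real schema):

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <variant>
#include <vector>

// One cell can hold a string, a double or a long, as described above.
using Cell = std::variant<std::string, double, long>;

// A row is simply a set of cells; different rows may carry different columns.
using Row = std::vector<Cell>;

struct Table {
    std::vector<Row> rows;                         // all rows fully expanded in memory
    std::map<std::string, std::size_t> byArticle;  // index: article number -> row position

    // Returns the row for an article number, or nullptr if it is not indexed.
    const Row* find(const std::string& articleNo) const {
        auto it = byArticle.find(articleNo);
        return it == byArticle.end() ? nullptr : &rows[it->second];
    }
};
```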
We tried to compare our current table data against SQLite and EJDB. A file that contains
about half a million simple article records takes 3.5 MB in our format, 28 MB in SQLite and 100 MB (in
several files) in EJDB. SQLite and EJDB store the data as plain strings or simple binary values
(for example a double), so with a good editor you can match an article number with a price very
easily.
The software uses about 40 DLLs with several dependencies on third-party libraries, so switching from
32 to 64 bit is a challenge. It also does not solve our problem with 32-bit terminal servers at our
client installations.
Going to a real database (like MySQL, MongoDB, etc.) is a big challenge too, as we update our
data every month on this wide base of computers, and there is not always an internet connection to use a
real client-server model.
So what can we do?
Use SQLite or EJDB or something else and encrypt our data in each field?
Reprogram our database so it uses smaller chunks of data that stay on disk and are loaded
on demand as they are needed? Only the indexes would stay in memory; the data on disk
could be managed with a B-tree strategy (see the sketch below).
Time is short, so reinventing the wheel does not help. What would you do or use in such
a situation?
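For that second option, a rough sketch of what I mean: keep only a key-to-offset index in memory and read single records from disk on demand (the record layout and key type are placeholders):

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// In-memory index entry: where the encrypted/compressed record lives on disk.
struct Location { long offset; long length; };

class OnDiskTable {
public:
    explicit OnDiskTable(const std::string& path) : file_(std::fopen(path.c_str(), "rb")) {}
    ~OnDiskTable() { if (file_) std::fclose(file_); }

    // The index would be built once, or loaded from a small index block in the file.
    void addToIndex(const std::string& key, Location loc) { index_[key] = loc; }

    // Only the requested record is read into memory; nothing else stays resident.
    std::vector<char> load(const std::string& key) const {
        auto it = index_.find(key);
        if (it == index_.end() || !file_) return {};
        std::fseek(file_, it->second.offset, SEEK_SET);
        std::vector<char> buf(static_cast<std::size_t>(it->second.length));
        std::fread(buf.data(), 1, buf.size(), file_);
        return buf;  // decrypt/unzip this single record here
    }

private:
    std::FILE* file_;
    std::map<std::string, Location> index_;  // key -> (offset, length), small enough for RAM
};
```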
Regarding "Due to 32-bit and memory fragmentation we can actually load maximum 2 such tables": aren't you by any chance "loading" these tables by allocating one large chunk of memory and reading the table content from disk into it? If so, you should switch to loading the tables via smaller memory-mapped blocks (probably 4 MB each, which corresponds to the large memory page size). This way you should be able to utilize most of the 3.5 GB address space available to a 32-bit program.
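For illustration, a minimal sketch of that approach with the Win32 API (error handling is mostly omitted; the file name and window size are just examples):

```cpp
#include <windows.h>
#include <cstdint>

int main() {
    // Open and map the table file; no large private allocation is made.
    HANDLE file = CreateFileW(L"bigtable.dat", GENERIC_READ, FILE_SHARE_READ,
                              nullptr, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return 1;

    HANDLE mapping = CreateFileMappingW(file, nullptr, PAGE_READONLY, 0, 0, nullptr);
    if (!mapping) { CloseHandle(file); return 1; }

    const uint64_t viewSize = 4ull * 1024 * 1024;  // 4 MB window, as suggested above
    uint64_t offset = 0;                           // must be a multiple of 64 KB; the last
                                                   // window may need a smaller size

    // Map one window, read the rows in it, unmap it; only this window uses address space.
    void* view = MapViewOfFile(mapping, FILE_MAP_READ,
                               static_cast<DWORD>(offset >> 32),
                               static_cast<DWORD>(offset & 0xFFFFFFFFu),
                               static_cast<SIZE_T>(viewSize));
    if (view) {
        // ... parse the table rows inside this 4 MB window ...
        UnmapViewOfFile(view);
    }

    CloseHandle(mapping);
    CloseHandle(file);
    return 0;
}
```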
Related
Okay, I know this question feels weird, but let me present my problem. I am building a Qt based application with SQLite as my database. I noticed a few things. Whenever you perform operations like manipulating rows one by one directly on the SQLite file, it seems slow, because it is doing I/O operations on a file stored on the hard drive. When I use an SSD instead of an HDD the speed is considerably improved, because the SSD has higher I/O speed. But if I load a table into a QSqlTableModel, make all the changes and then save, the speed is good, because the data is fetched from the SQLite file in one query and kept in RAM, so there are fewer I/O operations.

So it got me thinking: is it possible to keep the SQLite file in RAM while my application runs, perform all my operations there, and then, when the user chooses to close, save the file back to the HDD? One might ask why I don't just use QSqlTableModel itself, but some of my cases involve creating and deleting tables, which Qt doesn't support out of the box; we need to execute a query for that. So if anyone can point me to a way to achieve this in Qt, that would be great!
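One possible approach (not Qt-specific) is SQLite's online backup API: copy the disk database into a :memory: database at startup and copy it back on close. A minimal sketch with the C API (the file name is an example; in a Qt application you would either use the C API directly alongside Qt or get hold of the driver's sqlite3 handle):

```cpp
#include <sqlite3.h>

// Copies the whole database from 'from' into 'to' using SQLite's backup API.
static int copyDatabase(sqlite3* to, sqlite3* from) {
    sqlite3_backup* b = sqlite3_backup_init(to, "main", from, "main");
    if (!b) return sqlite3_errcode(to);
    sqlite3_backup_step(b, -1);   // copy everything in one pass
    sqlite3_backup_finish(b);
    return sqlite3_errcode(to);
}

int main() {
    sqlite3* disk = nullptr;
    sqlite3* mem  = nullptr;
    sqlite3_open("app.db", &disk);    // on-disk database
    sqlite3_open(":memory:", &mem);   // in-memory working copy

    copyDatabase(mem, disk);          // load: disk -> RAM at startup

    // ... run all queries against 'mem' while the application is open ...

    copyDatabase(disk, mem);          // save: RAM -> disk when the user closes

    sqlite3_close(mem);
    sqlite3_close(disk);
    return 0;
}
```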
I am very new to databases and I am trying to implement an offline map viewer. How efficient would QSqlDatabase be?
To make it extreme: for example, is it possible to download all satellite images at all detail levels of the US from Google's map server, store them in a local SQLite database, and still perform real-time queries based on my current GPS location?
The Qt database driver for SQLite uses SQLite internally (surprise!). So the question is more like: is SQLite the right database to use? My answer: I would not use it to store geographical data; consider looking for a database that is optimized for this task.
If that is not an option: SQLite is really efficient. First check whether your data is within its limits. Do not forget to create indexes and analyze the database. Then it should be able to handle your task. Here I assume you just want to get an image by its geographical position (other solutions can be a lot faster because your data is sortable; if I remember correctly, SQLite is not optimized for that).
As you will store large blobs, you may want to have a look at the Internal Versus External BLOBs in SQLite document. Maybe that already gives you the answer.
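As an illustration of the "create indexes" point, a hedged sketch of a tile store keyed by zoom level and tile coordinates, where the composite primary key also serves as the lookup index (the schema, file name and coordinate values are assumptions, not something from the question):

```cpp
#include <sqlite3.h>

int main() {
    sqlite3* db = nullptr;
    sqlite3_open("tiles.db", &db);

    // One blob per tile, addressable by (zoom, x, y); the primary key doubles as the index.
    const char* ddl =
        "CREATE TABLE IF NOT EXISTS tiles ("
        "  zoom INTEGER NOT NULL,"
        "  x    INTEGER NOT NULL,"
        "  y    INTEGER NOT NULL,"
        "  image BLOB NOT NULL,"
        "  PRIMARY KEY (zoom, x, y));";
    sqlite3_exec(db, ddl, nullptr, nullptr, nullptr);

    // Look up the tile under the current GPS position (x/y computed from lat/lon beforehand).
    sqlite3_stmt* stmt = nullptr;
    sqlite3_prepare_v2(db, "SELECT image FROM tiles WHERE zoom = ?1 AND x = ?2 AND y = ?3;",
                       -1, &stmt, nullptr);
    sqlite3_bind_int(stmt, 1, 15);     // zoom level (example values)
    sqlite3_bind_int(stmt, 2, 17412);  // tile x
    sqlite3_bind_int(stmt, 3, 11518);  // tile y
    if (sqlite3_step(stmt) == SQLITE_ROW) {
        const void* img = sqlite3_column_blob(stmt, 0);
        int size = sqlite3_column_bytes(stmt, 0);
        // ... hand img/size to the map viewer ...
        (void)img; (void)size;
    }
    sqlite3_finalize(stmt);
    sqlite3_close(db);
    return 0;
}
```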
I am working on a project which needs to deal with large seismic data in SEGY format (from several GB to TB). The data represents the 3D underground structure.
The data structure is like:
1st trace, 2,3,5,3,5,....,6
2nd trace, 5,6,5,3,2,....,3
3rd trace, 7,4,5,3,1,....,8
...
What I want to ask is: in order to read and process the data fast, do I have to convert it into another format, or is it better to read from the original SEGY file? And is there an existing C package to do that?
If you need to access it multiple times and
if you need to access it randomly and
if you need to access it fast
then load it into a database once.
Do not reinvent the wheel.
When dealing with data of that size, you may not want to convert it into another form unless you have to, though some software does just that. I found a list of free geophysics software on Wikipedia that looks promising; many of the packages are open source and read/write SEGY files.
Since you are a newbie to programming, you may want to consider whether the Python library segpy suits your needs rather than a C/C++ option.
Several GB is rather medium-sized, if we are talking about poststack data.
You can use SEGY and convert on the fly, or you can invent your own format; it depends on what you need to do. Without changing the SEGY format it is enough to create indexes to the traces. If the SEGY is stored as inlines, access through inlines is faster, although crossline access is not very bad either.
If it is 3D seismic, the best way to get equally quick access to all inlines/crosslines is to have your own format based on blocks of, e.g., 8x8 traces; loading the blocks and selecting traces, the access time can be very quick - 2-3 seconds. Or you can use an SSD disk, or 2.5x as much RAM as your SEGY.
To quickly access time slices you have two ways - 3D blocks or a second file stored as time slices (the quickest way). I did this kind of thing 10 years ago - the access time to a 12 GB SEGY was acceptable: 2-3 seconds in all 3 directions.
SEGY in database? Wow ... ;)
The answer depends on the type of data you need to extract from the SEG-Y file.
If you only need to extract the headers (textual header, binary header, extended textual file headers and trace headers), they can easily be extracted from the SEG-Y file by opening it as binary and reading the relevant information from the byte locations given in the data exchange format specification (rev 2). The extraction may depend on the type of data (post-stack or pre-stack), and some headers may require conversion from one format to another (e.g. the textual headers are mostly encoded in EBCDIC). The complete details about the byte locations and encoding formats can be found in the documentation mentioned above.
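For example, per the SEG-Y rev 1 layout (3200-byte textual header followed by a 400-byte binary header), a few binary-header fields can be read straight from their documented byte positions; a minimal sketch, assuming big-endian data and a hypothetical file name:

```cpp
#include <cstdint>
#include <cstdio>

// Reads a big-endian 16-bit integer from the given absolute byte offset in the file.
static uint16_t readBE16(std::FILE* f, long offset) {
    unsigned char b[2] = {0, 0};
    std::fseek(f, offset, SEEK_SET);
    std::fread(b, 1, 2, f);
    return static_cast<uint16_t>((b[0] << 8) | b[1]);
}

int main() {
    std::FILE* f = std::fopen("survey.segy", "rb");  // file name is just an example
    if (!f) return 1;

    // SEG-Y rev 1: 3200-byte textual header, then the 400-byte binary header.
    uint16_t sampleInterval  = readBE16(f, 3216);  // bytes 3217-3218: sample interval (microseconds)
    uint16_t samplesPerTrace = readBE16(f, 3220);  // bytes 3221-3222: samples per data trace
    uint16_t formatCode      = readBE16(f, 3224);  // bytes 3225-3226: data sample format code

    std::printf("dt=%u us, ns=%u, format=%u\n", sampleInterval, samplesPerTrace, formatCode);
    std::fclose(f);
    return 0;
}
```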
Extracting the trace data is a bit trickier and depends on various factors such as the encoding, whether the number of trace samples is given in the trace headers, etc. A careful reading of the documentation, and getting to know the type of SEG-Y data you are working with, will make this task a lot easier.
Since you are working with the extracted data, I would recommend using already existing libraries (segpy is one of the best Python libraries I came across). There are also numerous freely available SEG-Y readers; a very nice list has already been mentioned by Daniel Waechter. You can choose whichever one suits your requirements and the file format it supports.
I recently tried to do something similar using C++ (although it has only been tested on post-stack data). The project can be found here.
I have a server-client application where clients can edit data in a file stored on the server side. The problem is that the file is too large to load into memory (8 GB+). There could be around 50 string replacements per second invoked by the connected clients, so copying the whole file and replacing the specified string with the new one is out of the question.
I was thinking about collecting all changes in a cache on the server side and performing the replacements once a certain amount of data has accumulated. At that point I would apply the update by copying the file in small chunks and replacing the specified parts.
This is the only idea I came up with, but I was wondering whether there might be another way, or what problems I could run into with this method.
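To make the idea concrete, a rough sketch of the chunked replace step. It assumes the replacement has the same length as the original string (unequal lengths would force rewriting the rest of the file), uses an overlap so matches on chunk boundaries are not missed, and for an 8 GB+ file the long offsets would have to become 64-bit (_fseeki64/fseeko):

```cpp
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Patches every occurrence of 'from' with 'to' in place, reading the file in small chunks.
// Assumes from.size() == to.size() so the file length never changes.
void patchInPlace(const char* path, const std::string& from, const std::string& to) {
    std::FILE* f = std::fopen(path, "r+b");
    if (!f || from.empty() || from.size() != to.size()) return;

    const std::size_t chunkSize = 1 << 20;              // 1 MB working window
    std::vector<char> buf(chunkSize + from.size() - 1); // extra bytes overlap the next window
    long base = 0;

    for (;;) {
        std::fseek(f, base, SEEK_SET);
        std::size_t got = std::fread(buf.data(), 1, buf.size(), f);
        if (got < from.size()) break;                    // nothing left that could match

        for (std::size_t i = 0; i + from.size() <= got; ++i) {
            if (std::memcmp(buf.data() + i, from.data(), from.size()) == 0) {
                std::fseek(f, base + static_cast<long>(i), SEEK_SET);
                std::fwrite(to.data(), 1, to.size(), f); // overwrite the match in place
            }
        }
        base += static_cast<long>(chunkSize);            // windows overlap by from.size()-1 bytes
    }
    std::fclose(f);
}
```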
When you have more than 8 GB of data which is edited by many users simultaneously, you are far beyond what can be handled with a flat file.
You seriously need to move this data into a database. Regarding your comment that "the file content is no fit for a database": sorry, but I don't believe you. And regarding your remark that "many people can edit it": that is one more reason to use a database. On a filesystem, only one user at a time can have write access to a file, but a database allows concurrent write access for multiple users.
We could help you come up with a database schema if you open a new question telling us how your data is structured exactly and what your use cases are.
You could use some form of indexing on your data (in a separate file) to allow quick access to the relevant parts of this gigantic file. We have been doing this successfully with large files (~200-400 GB), but as Phillipp mentioned, you should move that data to a database, especially for the read/write access. Some frameworks (like OSG) already come with a database back end for 3D terrain data, so you can peek there to see how they do it.
So I have an app which needs to read a lot of small data records (e.g. an app that processes lots of customer records).
Normally, for server systems that do this kind of thing, you would use a database which handles a) caching the most recently used data, b) indexing it, and c) storing it for efficient retrieval from the file system.
Right now, my app just has a std::map<> that maps a data id to the data itself, which is pretty small (around 120 bytes per record). The data ids themselves are mapped directly onto the file system.
However, this kind of system does not handle unloading of data (if memory starts to run out) or efficient storage of the data (granted, iOS uses flash storage, so the OS should handle the caching), but still.
Are there libraries that can handle this kind of thing, or would I have to build my own caching system from scratch? I'm not too bothered about indexing, only a) caching/unloading and c) storing for efficient retrieval. I feel wary about storing thousands (or potentially hundreds of thousands) of files on the file system.
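For reference, the caching/unloading part is essentially an LRU cache; a minimal, platform-neutral C++ sketch of what rolling one from scratch would look like (the record type and capacity are placeholders):

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Keeps at most 'capacity' records in memory; the least recently used one is evicted first.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns the cached record, or nullptr if it has to be (re)loaded from storage.
    const std::string* get(const std::string& id) {
        auto it = map_.find(id);
        if (it == map_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second);  // mark as most recently used
        return &it->second->second;
    }

    void put(const std::string& id, std::string record) {
        auto it = map_.find(id);
        if (it != map_.end()) {
            it->second->second = std::move(record);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (map_.size() >= capacity_ && !order_.empty()) {   // unload the oldest entry
            map_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(id, std::move(record));
        map_[id] = order_.begin();
    }

private:
    std::size_t capacity_;
    std::list<std::pair<std::string, std::string>> order_;  // most recent entry at the front
    std::unordered_map<std::string,
                       std::list<std::pair<std::string, std::string>>::iterator> map_;
};
```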
There is SQLite, which can be used on iOS, or you can use Core Data, which is already available on the device.
Why don't you trust property list files and NSDictionary? They're exactly for this purpose.