I have a directory structure of individual databases (around 500) that are accessed via individual connections. However, processing queries can be quite slow. Profiling has given me the hint that the reason is that every connection (set up via sqlite3_open_v2) uses the default VFS, which, after enough connections, holds 500 entries, and every SQLite function that searches through this list takes some time.
Now my question:
Would it be possible to speed up the process by creating an individual VFS for each connection, since I never access more than one table per connection? If so, how can this be achieved?
Regards
I also contacted SQLite support. The problem is known and a fix will go into a future release. Thanks anyways!
I'm working on retrieving data such as Products and Orders from eCommerce platforms like BigCommerce, Shopify, etc., and saving it in our own databases. To improve the data retrieval speed from their APIs, we're planning to use the Bluebird library.
Earlier, the data retrieval logic fetched one page at a time. Since we're planning to make concurrent calls, "n" pages will be retrieved concurrently.
For example, BigCommerce allows us to make up to 3 concurrent calls at a time. So we need to make the concurrent calls in such a way that we never retrieve the same page more than once, and if a request fails, the request for that page is resent.
What's the best way to implement this? One idea that comes to mind:
One possible solution - keep an index of ongoing requests in the database and update it when each API call completes, so we know which ones were unsuccessful.
Is there a better way of doing this? Any suggestions/ideas on this would be highly appreciated.
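For what it's worth, here is a minimal sketch of the approach described above, using Bluebird's Promise.map with a concurrency limit of 3 and a simple per-page retry. The endpoint URL, the response shape (body.data), and MAX_RETRIES are illustrative assumptions, not part of any platform SDK:

import Bluebird from "bluebird";

const MAX_RETRIES = 3; // hypothetical retry budget per page

// Hypothetical fetch of one page; replace the URL and response shape with the
// real BigCommerce/Shopify client call.
async function fetchPage(page: number): Promise<unknown[]> {
  const res = await fetch(`https://api.example.com/v3/products?page=${page}`);
  if (!res.ok) throw new Error(`page ${page} failed with HTTP ${res.status}`);
  const body = await res.json();
  return body.data;
}

// Retry a failed page a few times so a transient error does not lose that page.
async function fetchPageWithRetry(page: number, attempt = 1): Promise<unknown[]> {
  try {
    return await fetchPage(page);
  } catch (err) {
    if (attempt >= MAX_RETRIES) throw err;
    return fetchPageWithRetry(page, attempt + 1);
  }
}

// Each page number appears exactly once in `pages`, so no page is requested twice,
// and Bluebird keeps at most 3 requests in flight at any moment.
export function fetchAllPages(totalPages: number) {
  const pages = Array.from({ length: totalPages }, (_, i) => i + 1);
  return Bluebird.map(pages, (page) => fetchPageWithRetry(page), { concurrency: 3 });
}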
So I'm currently developing a messaging application to learn the process, and I'm using Redis as a cache together with WebSockets to push real-time messages.
And then this question popped into my mind:
Is it possible to use only Redis to run a whole service (a messaging application, for example)?
NOTE: This implies removing any other form of database (we're only keeping strings).
I know you can set up Redis to be persistent, but is that enough? Is it robust enough? Would it be a safe enough move, or totally insane?
What are your thoughts? I'd really like to know, and if you think it is possible, I'll give it a shot.
Thanks!
A few companies use Redis as their only or primary database, so it is definitely not insane.
You can develop and run a full service using Redis as a backend, as long as you understand and accept the tradeoffs it implies.
By this I mean:
that you can use a Redis server as a high-performance database as long as your whole data set can reside in memory. This may imply reducing the size of your data, or choosing not to store some of it and instead having your app compute it on read access or import it from another source;
that if you can't store all of your data in the memory of a single server, you can use a Redis cluster, but it will limit the available Redis features (see the implemented subset);
that you have to think about the potential data loss when a server crashes, and determine whether it is acceptable or not. It may be OK to lose some data if the process which produced it is robust and will create it again when the database restarts (for example when the data stored in Redis comes from an import process, which will start again from the last imported item). You can also use several Redis instances with different persistence configurations: one which writes to disk each time a key is modified, avoiding potential data loss but with much lower performance, and another one to store non-critical data, which is written to disk every couple of seconds.
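As a small illustration of that last point, this is one way the two-instance split could look from application code with the ioredis client. The ports are assumptions, and in practice the appendonly/appendfsync settings would normally live in each instance's redis.conf rather than being set at runtime:

import Redis from "ioredis";

// Instance for critical data: append-only file fsynced on every write
// (minimal loss window, much lower write throughput).
const critical = new Redis(6379);

// Instance for non-critical data: fsync roughly once per second.
const nonCritical = new Redis(6380);

async function configurePersistence(): Promise<void> {
  await critical.call("CONFIG", "SET", "appendonly", "yes");
  await critical.call("CONFIG", "SET", "appendfsync", "always");
  await nonCritical.call("CONFIG", "SET", "appendonly", "yes");
  await nonCritical.call("CONFIG", "SET", "appendfsync", "everysec");
}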
Redis may be used to store structured data, not only strings, using hashes. Each time you would create an index in a relational model, you have to create a data structure in Redis. For example, if you want to store Person objects, you create a HASH for each of them to store their properties, including a unique ID. If you want to be able to get people by city, you create a SET for each city, and you insert the ID of each newly created Person into the corresponding SET. That way you can get the list of people in a given city. This is just an example; you have to define the model and data structures to be used according to your application.
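Here is a minimal sketch of that Person/city model with the ioredis client; the key naming convention (person:<id>, city:<name>) is my own choice, not something Redis imposes:

import Redis from "ioredis";

const redis = new Redis(); // localhost:6379 by default

async function createPerson(id: string, name: string, city: string): Promise<void> {
  // One HASH per person, holding its properties.
  await redis.hset(`person:${id}`, "name", name, "city", city);
  // One SET per city, playing the role an index would play in a relational model.
  await redis.sadd(`city:${city.toLowerCase()}`, id);
}

async function peopleInCity(city: string): Promise<Record<string, string>[]> {
  const ids = await redis.smembers(`city:${city.toLowerCase()}`);
  return Promise.all(ids.map((id) => redis.hgetall(`person:${id}`)));
}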
I have implemented data sync using MS Sync Framework 2.1 over WCF to sync multiple SQL Express databases with a central SQL Server. Syncing happens every three minutes through a Windows service. Recently, we noticed that a huge amount of data is being exchanged over the network (~100 MB every 15 minutes). When I checked using Fiddler, the client calls the service with a GetKnowledge request four times in a session, and each response is around 6 MB in size, although there are no changes at all in either database. This does not seem normal. How do I optimize the system to reduce such heavy traffic? Please help.
I have defined two scopes: the first one has 15 tables, all download-only; the second one has 3 tables with upload-only direction.
The XML response has a huge number of <range> tags under the coreFragments/coreFragment/ranges tag, which contributes the major portion of the response size.
Let me know if any additional information is required.
It must be the sync knowledge. Do you do lots of deletes, or do you have lots of replicas? Try running a metadata cleanup and see if it compacts the sync knowledge.
Creating one-to-one scopes and re-provisioning fixed the issue. I am still not sure what caused the original issue.
Do you happen to have any join tables and use an ORM? If you do, then this post might help:
https://kumarkrish.wordpress.com/2015/01/07/microsoft-sync-frameworks-heavy-traffic/
Sitecore 6.6
I'm speaking with Sitecore Support about this as well, but thought I'd reach out to the community too.
We have a custom agent that syncs media on the file system with the media library. It's a new agent and we made the mistake of not monitoring the database size. It should be importing about 8 gigs of data, but the database ballooned to 713 GB in a pretty short amount of time. Turns out the "Blobs" table in both "master" and "web" databases is holding pretty much all of this space.
I attempted to use the "Clean Up Databases" tool from the Control Panel. I only selected one of the databases. This ran for 6 hours before it bombed due to consuming all the available locks on the SQL Server:
Exception: System.Data.SqlClient.SqlException
Message: The instance of the SQL Server Database Engine cannot obtain a LOCK
resource at this time. Rerun your statement when there are fewer active users.
Ask the database administrator to check the lock and memory configuration for
this instance, or to check for long-running transactions.
It then rolled everything back. Note: I increased the SQL and DataProvider timeouts to infinity.
Anyone else deal with something like this? It would be good if I could 'clean up' the databases in smaller chunks to avoid overwhelming the SQL Server.
Thanks!
Thanks for the responses, guys.
I also spoke with support and they were able to provide a SQL script that will clean the Blobs table:
DECLARE @UsableBlobs table(
ID uniqueidentifier
);

-- Collect the blob IDs that are still referenced from the Fields table.
INSERT INTO @UsableBlobs
select convert(uniqueidentifier, [Value]) as BlobID from [Fields]
where [Value] != ''
and (FieldId = '{40E50ED9-BA07-4702-992E-A912738D32DC}' or FieldId = '{DBBE7D99-1388-4357-BB34-AD71EDF18ED3}')

-- Delete unreferenced blobs in batches so a single statement doesn't exhaust locks.
delete top (1000) from [Blobs]
where [BlobId] not in (select ID from @UsableBlobs)
The only change I made to the script was to add the "top (1000)" so that it deleted in smaller chunks. I eventually upped that number to 200,000 and it would run for about an hour at a time.
Regarding cause, we're not quite sure yet. We believe our custom agent was running too frequently, causing the inserts to stack on top of each other.
Also note that there was a Sitecore update that apparently addressed a problem with the Blobs table getting out of control. The update was 6.6, Update 3.
I faced such a problem previously, and we contacted Sitecore Support.
They gave us a Sitecore Support DLL and suggested a Web.config change for the DataProvider -- from the main type="Sitecore.Data.$(database).$(database)DataProvider, Sitecore.Kernel" to the new one.
The reason I am posting on this question of yours is that most of the time taken for us was in cleaning up Blobs, and they gave us this DLL to increase the Blobs cleanup speed. So I think it might help you too.
Hence, I would suggest that you request Sitecore Support in this case as well; I am sure you will get the best solution for your case.
Hope this helps you!
Regards,
Varun Shringarpure
If you have a staging environment, I would recommend taking a copy of the database and trying to shrink it. Part of the database size might also be related to the transaction log.
If you have a DBA, please get them involved.
I've got a big users table that I'm caching in a C++ web service (BitTorrent tracker). The entire table is refetched every 5 minutes. This has some drawbacks, like data being up to 5 minutes old and refetching lots of data that hasn't changed.
Is there a simple way to fetch just the changes since last time?
Ideally I'd not have to change the queries that update the data.
Two immediate possibilities come to mind:
MySQL Query Cache
Memcached (or similar) Caching Layer
I would try the query cache first as it is likely far easier to set up. Do some basic tests/benchmarks and see if it fits your needs. Memcached will likely be very similar to your existing cache but, as you mention, you'll need to find a better way of invalidating stale cache entries (something that the query cache does for you).