Clean an in-memory database

I'm using Spring Batch with an HSQLDB in-memory database for the Spring Batch metadata. My application needs to run continuously, so this database keeps growing and becomes a memory problem. I need a way to clean it periodically. I already thought about using a stored procedure that deletes old data according to a condition, called periodically by a dedicated thread using Spring's StoredProcedure class.
If you have alternative solutions, I'm open to them.
Thanks

Finally, I used a background task scheduled to run every X minutes. This task uses JDBC to clean the database.
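For reference, a minimal sketch of what that cleanup task can look like, assuming Spring's @Scheduled support (which needs @EnableScheduling in your configuration), a JdbcTemplate, and the default Spring Batch metadata table names; the interval and the one-day retention window below are placeholders:

import java.sql.Timestamp;
import java.time.LocalDateTime;
import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
public class BatchMetadataCleanupTask {

    private final JdbcTemplate jdbcTemplate;

    public BatchMetadataCleanupTask(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // Placeholder interval: run every 30 minutes.
    @Scheduled(fixedRate = 30 * 60 * 1000)
    public void purgeOldJobMetadata() {
        // Placeholder retention: drop metadata for executions older than one day.
        Timestamp cutoff = Timestamp.valueOf(LocalDateTime.now().minusDays(1));

        // Delete child rows before parent rows to satisfy foreign key constraints.
        jdbcTemplate.update(
            "DELETE FROM BATCH_STEP_EXECUTION_CONTEXT WHERE STEP_EXECUTION_ID IN "
            + "(SELECT SE.STEP_EXECUTION_ID FROM BATCH_STEP_EXECUTION SE "
            + "JOIN BATCH_JOB_EXECUTION JE ON SE.JOB_EXECUTION_ID = JE.JOB_EXECUTION_ID "
            + "WHERE JE.CREATE_TIME < ?)", cutoff);
        jdbcTemplate.update(
            "DELETE FROM BATCH_STEP_EXECUTION WHERE JOB_EXECUTION_ID IN "
            + "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)", cutoff);
        jdbcTemplate.update(
            "DELETE FROM BATCH_JOB_EXECUTION_CONTEXT WHERE JOB_EXECUTION_ID IN "
            + "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)", cutoff);
        jdbcTemplate.update(
            "DELETE FROM BATCH_JOB_EXECUTION_PARAMS WHERE JOB_EXECUTION_ID IN "
            + "(SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?)", cutoff);
        jdbcTemplate.update("DELETE FROM BATCH_JOB_EXECUTION WHERE CREATE_TIME < ?", cutoff);
        // Finally remove job instances that no longer have any executions.
        jdbcTemplate.update(
            "DELETE FROM BATCH_JOB_INSTANCE WHERE JOB_INSTANCE_ID NOT IN "
            + "(SELECT JOB_INSTANCE_ID FROM BATCH_JOB_EXECUTION)");
    }
}

The deletion order mirrors the foreign-key dependencies of the metadata schema; if your setup uses a table prefix other than BATCH_, adjust the names accordingly.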

Related

Trigger ALL cloud run instances at once to do async job (rebuild cache)

I have a Cloud Run service with multiple instances running or idle.
I want all the instances to do an async job periodically (to rebuild a cache).
Example of async job:
Periodically check if there is a new version of a JSON file on the object storage bucket
Do some processing on the JSON and store it as a variable (cache) that will be used by the API endpoints. So I do not need to contact database on each request.
Options on how to do it:
setInterval() to call rebuildCacheIfNeeded(). Cloud Run cannot do async tasks in the background (instances are assigned CPU resources only while handling a request).
A webcron will not work. Only one instance would handle the request, and the cache would be rebuilt only on that instance.
Pub/Sub on a new file added to the bucket. Can Pub/Sub be set up so that all instances are woken up and all rebuild the cache? If yes, this would be the best solution.
Call rebuildCacheIfNeeded() on each request and keep the HTTP connection open until the cache is rebuilt. I would like to avoid this for obvious reasons.
Kill all instances of Cloud Run when a new file is added to the bucket. Cloud Run should be stateless, so this is the only solution that complies with the statelessness rule. But how do I kill all instances without doing a full redeploy?
Any other possible solutions that I am missing?
Thank you
Please do not suggest "Just use a database"... The cached data is small, and I would like to avoid database latency and a possible point of failure.
You are trying to use side-effects of a service that is neither predictable nor manageable. That will lead to problems today and possibly failure when features are updated or new features are released. Design your application to use documented features.
There is no documented method to achieve your objective.

Cron on Heroku to update files

I'm currently developing a screener webapp on Heroku. The data is saved as CSV (because it is a screener, it takes a lot of time to run if using an API). Right now, I update it manually by updating and saving the data locally, then pushing it to Heroku.
Is there a way to set up a cron job so that the CSV data is updated once a day?
Heroku dynos have an ephemeral filesystem. Any changes you make will be lost when the dynos restart. This happens frequently (at least once per day).
For this reason, file-based data stores are not recommended on Heroku. If you move your data into a client-server database (Heroku's own Postgres service would be a good fit, or you can pick something else), you could easily update it whenever you like.
The Heroku Scheduler can be used to run tasks daily. Write a script to update your database table and schedule it to run whenever you like.
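As one minimal sketch of such a script, assuming a Java app, Heroku Postgres, and Heroku's JDBC_DATABASE_URL environment variable; the screener table, its columns, and the fetchLatestRows() placeholder are illustrative stand-ins for however you currently build your CSV (the same pattern works in any language the Scheduler can run):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;

public class RefreshScreenerData {

    public static void main(String[] args) throws Exception {
        // Heroku exposes a JDBC-ready Postgres URL (with credentials) as JDBC_DATABASE_URL.
        String url = System.getenv("JDBC_DATABASE_URL");

        try (Connection conn = DriverManager.getConnection(url)) {
            conn.setAutoCommit(false);
            try (PreparedStatement upsert = conn.prepareStatement(
                    "INSERT INTO screener (ticker, score) VALUES (?, ?) "
                    + "ON CONFLICT (ticker) DO UPDATE SET score = EXCLUDED.score")) {
                for (Map.Entry<String, Double> row : fetchLatestRows().entrySet()) {
                    upsert.setString(1, row.getKey());
                    upsert.setDouble(2, row.getValue());
                    upsert.addBatch();
                }
                upsert.executeBatch();
            }
            conn.commit();
        }
    }

    // Placeholder: replace with whatever currently computes the screener data.
    private static Map<String, Double> fetchLatestRows() {
        return Map.of("EXAMPLE", 1.0);
    }
}

Register this as a daily command in Heroku Scheduler (e.g. a java -jar invocation) and the table is refreshed without ever touching the dyno's filesystem.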

Running a background task involving a third-party service in Django

The use case is this: we need to pull in data from a third-party service and update the database with fresh records every week. The different ways I have been able to explore have been
either creating a custom django-admin command
or running a background task using Celery (and probably ELK for logging)
I just want to know which way is more feasible and simpler. Is there another way I could explore? What I want is to monitor the task for the first few runs and then just rely on the logs.

Writing to Cloud SQL via Dataflow pipeline is very slow

I managed to connect to Cloud SQL via JdbcIO:
DataSourceConfiguration.create("com.mysql.jdbc.Driver","jdbc:mysql://google/?cloudSqlInstance=::&socketFactory=com.google.cloud.sql.mysql.SocketFactory&user=&password=")
This works; however, the batch writes take between 2 and 5 minutes for 1000 records, which is terrible. I have tried different networks to see if this was related, and the results were consistent.
Anyone have any ideas?
Where are you initializing this connection? If you are doing this inside your DoFn, it will create latency as the socket is built up and torn down on each bundle.
Have a look at DoFn.Setup; it provides a clean way to initialize resources that will be persisted across bundle calls.
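A rough sketch of that pattern with Beam's Java SDK, where the JDBC connection is opened once per DoFn instance in @Setup and reused across bundles; the table, column, and element type below are illustrative assumptions, not taken from the question:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import org.apache.beam.sdk.transforms.DoFn;

public class WriteToCloudSqlFn extends DoFn<String, Void> {

    private final String jdbcUrl; // the same Cloud SQL socket-factory URL as above
    private transient Connection connection;

    public WriteToCloudSqlFn(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    @Setup
    public void setup() throws Exception {
        // Runs once per DoFn instance; the connection survives across bundles.
        connection = DriverManager.getConnection(jdbcUrl);
    }

    @ProcessElement
    public void processElement(ProcessContext c) throws Exception {
        // Illustrative insert; my_table and its column are placeholders.
        try (PreparedStatement stmt =
                connection.prepareStatement("INSERT INTO my_table (payload) VALUES (?)")) {
            stmt.setString(1, c.element());
            stmt.executeUpdate();
        }
    }

    @Teardown
    public void teardown() throws Exception {
        if (connection != null) {
            connection.close();
        }
    }
}

If batching is also a bottleneck, the same structure lets you accumulate statements per bundle and flush them from a @FinishBundle method.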

How to use SQLite :memory: database in webpy for unittesting

I want to use a SQLite in memory (":memory:") DB for the tests in my webapp. I'm using nosetests for the tests, and webpy as framework.
I want to populate the DB in the setup() function, and then run all my tests. My problem is that webpy closes all the open DB connections after each request, and the SQLite :memory: DB only lasts until you close the connection, so only the first test is actually run correctly and all the others fail.
My choices are either to run the tests on a disk backed DB, or to recreate the entire DB in memory at the beginning of each individual test.
Do you know how can I prevent webpy from closing DB connections after each request?
Can you think of any other way to get an in memory SQLite DB that lasts for more than one request using webpy?
Maybe you could run the tests on a DB stored on the disk, but using a RAM disk. In Windows, you can install a driver to set up a RAM disk (some instructions here). In Linux, I believe you want to set up tmpfs.
A RAM disk will act exactly like a hard disk but will operate completely from memory, so you avoid some of the overhead of loading files to/from the hard disk.
Untested:
class NoCloseDB(web.db.SqliteDB):
    def _unload_context(self):
        pass  # this keeps the _ctx.db attribute alive

web.db.register_database('sqlite', NoCloseDB)  # overrides the previous registration
Notice that this can only work if you run web.py in a way that uses only one operating system process. If a request is dispatched across multiple processes, each one will still get its own database.