Creating a Version Node in Titan - clojure

I'm new to graph databases and to Titan. I'm embedding Titan in a Clojure app. When the app starts up, it creates a BerkeleyDB-backed Titan store.
I want to know/do three things:
1. Is this database new? If so, create a version node with version 0 and run the migration procedures to bring the "schema" to the newest version.
2. If not, does it have a version node? If not, throw an exception.
3. If the database was preexisting and has a version node, run the migration procedures to bring the "schema" up to date.
How do I do this in Titan? Is there a best practice for this?
EDIT:
OK, on further review, I think using a hard-coded vertexid makes the most sense. There's a TitanTransaction.containsVertex(long vertexid). Are there any drawbacks to this approach? I guess I don't know how vertexids are allocated and what their reserved ranges are, so this smells dangerous. I'm new to graph DBs, but I think in Neo4j creating a reference node from the root node is recommended. But Titan discourages root node usage because it becomes a supernode. IDK...

1- I don't know if there is a way to see if the database is new through Titan. You could check to see if the directory where BerkeleyDB will be stored exists before you start Titan.
2/3- Probably your best bet would be a hardcoded vertex with an indexed property "version". Do a lookup within the (nearly empty) index on "version" at the start and base your logic on those results.
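A minimal sketch of that lookup-or-create logic, written against Titan's Blueprints (Java) API, which is what Titanium wraps. The property names "kind" and "schema_version", the path, and the migration step are made up for illustration, and method names shift a bit between Titan versions:

    import com.thinkaurelius.titan.core.TitanFactory;
    import com.thinkaurelius.titan.core.TitanGraph;
    import com.tinkerpop.blueprints.Vertex;

    TitanGraph graph = TitanFactory.open("/path/to/berkeleydb");

    // Index the marker property once so the lookup below doesn't scan the graph.
    if (!graph.getIndexedKeys(Vertex.class).contains("kind"))
        graph.createKeyIndex("kind", Vertex.class);

    Vertex versionNode = null;
    for (Vertex v : graph.getVertices("kind", "version-node"))
        versionNode = v; // at most one should ever exist

    if (versionNode == null) {
        // Fresh database: create the version node at version 0, then migrate.
        versionNode = graph.addVertex(null);
        versionNode.setProperty("kind", "version-node");
        versionNode.setProperty("schema_version", 0);
    }

    int current = (Integer) versionNode.getProperty("schema_version");
    // ... run migrations from `current` up to the latest version,
    // bumping "schema_version" as each one completes ...
    graph.commit();

This sidesteps hard-coded vertex ids entirely, so you don't need to care how Titan allocates them or what ranges are reserved.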
As an aside, you might be interested in Titanium[0]. We're gearing up for a big release in the next week or so that should make it much more useful[1].
[0] http://titanium.clojurewerkz.org/
[1] http://blog.clojurewerkz.org/blog/2013/04/17/whats-going-on-with-titanium/

Related

Efficiently deleting the older documents in Elasticsearch

I'm storing application logs in Elasticsearch. I want to delete logs older than N months. The app writes the logs to the index named my-log-index. What is the most efficient way? These are the approaches I found, but I'm not sure which is best:
Use Delete by query API. Run this periodically.
Use an alias such as my-log-alias instead of the index name, roll over to a new index using the Rollover API every N months, and delete old indices periodically.
The first approach uses an expensive delete; it also only soft-deletes documents until segments merge. The second one looks more efficient. Which one is better, or is there a better way?
Elasticsearch version: 6.2.3 (I know it is EOL but can't upgrade right now)
Rollover with ILM (Index Lifecycle Management) is the way to go: a lifecycle policy rolls the alias over to a fresh index and deletes indices past a certain age automatically. Note that ILM only ships with Elasticsearch 6.6+, so on 6.2.3 you would pair the Rollover API with a periodic cleanup job (e.g. Curator) until you can upgrade.
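For illustration, here's what the rollover call looks like from the Java low-level REST client, using the pre-6.4 performRequest signature since you're on 6.2.3. The alias name, host and conditions are examples:

    import java.util.Collections;
    import org.apache.http.HttpHost;
    import org.apache.http.entity.ContentType;
    import org.apache.http.nio.entity.NStringEntity;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;

    RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();

    // POST /my-log-alias/_rollover - run this periodically (e.g. from cron).
    // If a condition is met, Elasticsearch creates a new index and repoints
    // the write alias; dropping a whole old index later is then cheap.
    NStringEntity conditions = new NStringEntity(
            "{ \"conditions\": { \"max_age\": \"30d\" } }",
            ContentType.APPLICATION_JSON);
    Response resp = client.performRequest(
            "POST", "/my-log-alias/_rollover", Collections.emptyMap(), conditions);

    // Later, delete an entire rolled-over index in one shot:
    // client.performRequest("DELETE", "/my-log-index-000001");

    client.close();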

cassandra get null data after replace node

I have an issue with Cassandra version 2.1.9.
After replacing a node (https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html) we get null data, probably until read repair or hint replay makes the data consistent (15-20 min).
It's a 17-node cluster with 150 GB+ of data on each node; manual repair takes too much time to be practical.
We use the DataStax Java driver 3.1.0 to connect to the cluster.
I do not know if it's relevant, but we use LeveledCompactionStrategy.
After the replacement is done, is the node already consistent, or do the hints still need to be replayed?
Any tips on how to safely replace a node?
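Not a full answer, but one common mitigation for the window while the replacement node is still catching up: read at LOCAL_QUORUM (this assumes RF >= 3), so a single stale replica can never satisfy a read by itself and any digest mismatch triggers read repair on the spot. A sketch with the Java driver 3.1 you mention; the contact point and keyspace are placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ConsistencyLevel;
    import com.datastax.driver.core.QueryOptions;
    import com.datastax.driver.core.Session;

    // With RF >= 3, a LOCAL_QUORUM read must agree across two replicas, so one
    // not-yet-consistent replacement node cannot return nulls on its own.
    Cluster cluster = Cluster.builder()
            .addContactPoint("10.0.0.1")
            .withQueryOptions(new QueryOptions()
                    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
            .build();
    Session session = cluster.connect("my_keyspace");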

Titan on AWS - Prevent dropping of edges and vertices

Is there a way to run a Gremlin server so that the drop command is prevented? I never actually drop any edges or vertices, so I'd like the added assurance that it can't be done by mistake.
You could have some luck developing your own TraversalStrategy and intercepting the behavior of the .drop() step, preventing it from actually deleting data. However, users could still bypass the Gremlin/TinkerPop API and directly manipulate the graph instance to remove graph elements (Vertex, Edge and Property).
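A sketch of such a strategy against TinkerPop 3.x (helper and package names move around a little between versions), modeled on how ReadOnlyStrategy itself is written:

    import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
    import org.apache.tinkerpop.gremlin.process.traversal.TraversalStrategy;
    import org.apache.tinkerpop.gremlin.process.traversal.step.filter.DropStep;
    import org.apache.tinkerpop.gremlin.process.traversal.strategy.AbstractTraversalStrategy;
    import org.apache.tinkerpop.gremlin.process.traversal.util.TraversalHelper;

    // Verification strategy that rejects any traversal containing a drop() step.
    public final class NoDropStrategy
            extends AbstractTraversalStrategy<TraversalStrategy.VerificationStrategy>
            implements TraversalStrategy.VerificationStrategy {

        private static final NoDropStrategy INSTANCE = new NoDropStrategy();

        private NoDropStrategy() {}

        @Override
        public void apply(final Traversal.Admin<?, ?> traversal) {
            if (!TraversalHelper.getStepsOfAssignableClassRecursively(DropStep.class, traversal).isEmpty())
                throw new IllegalStateException("drop() is disabled on this graph");
        }

        public static NoDropStrategy instance() {
            return INSTANCE;
        }
    }

    // Usage: GraphTraversalSource g = graph.traversal().withStrategies(NoDropStrategy.instance());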
Depending on your use case, you might just want to disable any mutation to the graph, and not just the removal of elements:
At the Titan level, you can use the storage.read-only option, which makes the Titan storage backend read-only. See Titan v1.0.0 documentation, Ch. 12 - Configuration reference, 12.3.23. storage.
You can also handle this at the TinkerPop level with the ReadOnlyStrategy Traversal strategy. See TinkerPop v3.0.1 documentation on ReadOnlyStrategy.
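Applying it is a one-liner; assuming `graph` is your Graph instance, any traversal built from this source fails as soon as a mutating step is added:

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.process.traversal.strategy.verification.ReadOnlyStrategy;

    GraphTraversalSource g = graph.traversal().withStrategies(ReadOnlyStrategy.instance());
    // g.V(123).drop() now throws a verification error before touching the backend

On Gremlin Server you could bind `g` with the strategy already applied in the server's init script, so remote clients only ever see the restricted traversal source.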

Storing, tracking and updating an SQLite database version in C++ application

I have an application written in C++ which uses an SQLite database to store information. I need a way of assigning a version number to the state of the database, so that if a newer 'state' (version) is available, I can update the current database state to match it.
I am wondering whether it would be good practice to store the information required for this to happen in a table. I would need the version number, and then some way of storing the tables and their columns related to each version number. This would allow me to make comparisons etc.
I realise that the question Set a version to a SQLite database file is related; however, it doesn't quite answer mine, as I am unsure whether my approach is correct and, if so, how to go about achieving it.
All help is much appreciated.
Use PRAGMA user_version to read and store an integer value in the database file.
When the version in your code and database file are the same, do nothing. When they are different, upgrade/downgrade accordingly and update the version number.
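A minimal sketch of that check-and-migrate loop. It's shown via JDBC, but the same two PRAGMA statements work verbatim through the sqlite3 C/C++ API; applyMigration is a hypothetical stand-in for your per-version DDL:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    static final int LATEST_VERSION = 3; // bump whenever the schema changes

    static void migrateIfNeeded(Connection conn) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            int version;
            try (ResultSet rs = stmt.executeQuery("PRAGMA user_version")) {
                version = rs.getInt(1); // 0 on a brand-new database file
            }
            while (version < LATEST_VERSION) {
                applyMigration(stmt, version); // hypothetical: DDL for version -> version + 1
                version++;
            }
            stmt.execute("PRAGMA user_version = " + LATEST_VERSION); // PRAGMA takes a literal, not a bind parameter
        }
    }

This avoids a separate version table entirely: the counter lives in the database file header, so it is present (as 0) even in a brand-new, empty file.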

Updating a field in all records in elasticsearch

I'm new to Elasticsearch, so this is probably something quite trivial, but I haven't figured out anything better than fetching everything, processing it with a script, and updating the records one by one.
I want to make something like a simple SQL update:
UPDATE RECORD SET SOMEFIELD = SOMEXPRESSION
My intent is to replace the actual bogus data with some data that makes more sense (so the expression is basically randomly choosing from a pool of valid values).
There are a couple of open issues about making it possible to update documents by query.
The technical challenge is that Lucene (the text search engine library that Elasticsearch uses under the hood) segments are read-only: you can never modify an existing document. What you have to do is delete the old version of the document (which, by the way, is only marked as deleted until a segment merge happens) and index the new one. That's what the existing update API does. An update by query might therefore take a long time and lead to issues, which is why it hasn't been released yet. A mechanism for interrupting running queries would also be nice to have for this case.
That said, there's the update by query plugin that exposes exactly this feature. Just beware of the potential risks before using it.
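Until then, the manual version of what you're already doing looks roughly like this with the old TransportClient: scroll over every document and issue a partial update per hit. Method names differ between Elasticsearch versions, and pickValidValue() is a stand-in for your random-choice logic:

    import java.util.Collections;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.search.SearchHit;

    static void updateAll(Client client, String index, String field) {
        SearchResponse resp = client.prepareSearch(index)
                .setScroll(new TimeValue(60000))
                .setSize(200)
                .get();
        while (resp.getHits().getHits().length > 0) {
            for (SearchHit hit : resp.getHits().getHits()) {
                client.prepareUpdate(index, hit.getType(), hit.getId())
                      .setDoc(Collections.singletonMap(field, (Object) pickValidValue())) // pickValidValue(): hypothetical
                      .get();
            }
            resp = client.prepareSearchScroll(resp.getScrollId())
                    .setScroll(new TimeValue(60000))
                    .get();
        }
    }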