I have an issue with Cassandra 2.1.9.
After replacing a node (https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html) we get null data, apparently until read repair and hint delivery make the data consistent again (15-20 min).
It's a 17-node cluster with 150 GB+ of data on each node, so a manual repair takes too much time to be practical.
We use the DataStax Java driver 3.1.0 to connect to the cluster.
I don't know if it's relevant, but we use LeveledCompactionStrategy.
Once the replace is done, is the node already consistent, or do the hints still need to be delivered?
Any tips on how to safely replace a node?
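For reference, a minimal sketch of how we build the Cluster with driver 3.1.0 (the contact point, keyspace and table names are placeholders, not our real ones); the QueryOptions line is where the default read consistency could be raised to LOCAL_QUORUM, so that a single not-yet-caught-up replica cannot answer a read on its own while a replacement settles:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class QuorumReadSketch {
    public static void main(String[] args) {
        // Placeholder contact point; the driver-wide default consistency is raised
        // to LOCAL_QUORUM so a read has to be confirmed by a majority of replicas.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")
                .withQueryOptions(new QueryOptions()
                        .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                .build();
        Session session = cluster.connect("my_keyspace");   // placeholder keyspace

        ResultSet rs = session.execute("SELECT id, value FROM my_table LIMIT 10");
        for (Row row : rs) {
            System.out.println(row);
        }

        cluster.close();
    }
}

We have not verified yet whether this actually works around the null reads, so this is only what we are considering.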
I'm looking at the API docs for polkadot.js (https://polkadot.js.org/docs/substrate/storage#staking),
but I could not figure out which one to use to actually query all the staking rewards for a given account ID / public address.
I was thinking I would have to loop over each era, but which call returns the staking rewards, so that I can calculate a total over time? Thank you very much!
In general, the node isn't used for querying historical state. Instead, you very likely want to use an indexer service that generates data which is much easier to query. There are a few options, but one of the best supported is substrate-archive, which I would suggest you use.
Alternatively, you can look at Substrate-compatible block explorers and see how they handle this in their source code.
Is there a way to run a Gremlin Server so that the drop command is prevented? I never actually drop any edges or vertices, so I'd like the added assurance that it can't be done by mistake.
You might have some luck developing your own TraversalStrategy to intercept the behavior of the .drop() step and prevent it from actually deleting data. However, people would still be able to bypass the Gremlin/TinkerPop API, manipulate the graph instance directly, and remove graph elements (Vertex, Edge and Property).
Depending on your use case, you might just want to disable any mutation to the graph, and not just the removal of elements:
At the Titan level, you can use the storage.read-only option, which makes the Titan storage backend read-only. See the Titan v1.0.0 documentation, Ch. 12 - Configuration Reference, 12.3.23. storage.
You can also handle this at the TinkerPop level with the ReadOnlyStrategy traversal strategy. See the TinkerPop v3.0.1 documentation on ReadOnlyStrategy.
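For illustration, a minimal sketch of the ReadOnlyStrategy approach using the TinkerPop 3.x Java API; TinkerGraph stands in for your Titan graph here, and any mutating step, including drop(), is rejected when the traversal is evaluated:

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.verification.ReadOnlyStrategy;
import org.apache.tinkerpop.gremlin.structure.Graph;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory;

public class ReadOnlyExample {
    public static void main(String[] args) {
        Graph graph = TinkerFactory.createModern();          // demo graph in place of Titan
        GraphTraversalSource g = graph.traversal()
                .withStrategies(ReadOnlyStrategy.instance());

        System.out.println(g.V().count().next());            // reads still work

        try {
            g.V().has("name", "marko").drop().iterate();      // mutation is refused
        } catch (IllegalStateException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}

On Gremlin Server the same idea should apply: if the g bound in the server's init script is built with withStrategies(ReadOnlyStrategy.instance()), remote clients only ever see the read-only traversal source.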
So we're using Berkeley DB, and our API uses the BDB C++ API. We recently added some new indexes to our database. After adding the new indexes, we needed to migrate all the old data so the new indexes would cover the old records, and since then, whenever we start up the process that writes to the database, we get these warnings:
BDB2058 Warning: Ignoring DB_SET_LOCK_TIMEOUT when joining the environment.
BDB2059 Warning: Ignoring DB_SET_TXN_TIMEOUT when joining the environment.
If I'm understanding those correctly, we now run the risk of deadlocking, since it's 'ignoring' the timeouts we set. I'm also seeing the process randomly hang when trying to write to the database. The only way to get around it right now is to restart the process. My question is whether anyone knows what would cause these warnings, or how I might go about debugging the environment instantiation to find out. Any help or suggestions would be appreciated.
The timeouts are likely a persistent global attribute of the dbenv environment, not an attribute of each usage instance of a dbenv.
You might try running db_recover on the database to remove the __db.NNN files.
Otherwise, you may have multiple processes sharing a dbenv, and the warning indicates that later processes are trying to change attributes that are already set.
I'm new to graph databases and to Titan. I'm embedding Titan in a Clojure app. When the app starts up, it creates a BerkeleyDB-backed Titan store.
I want to know/do 3 things:
1. Is this database new? If so, create a version node with version 0 and run the migration procedures to bring the "schema" to the newest version.
2. If not, does it have a version node? If not, throw an exception.
3. If the database was preexisting and has a version node, run the migration procedures to bring the "schema" up to date.
How do I do this in Titan? Is there a best practice for this?
EDIT:
OK, on further review, I think using a hard-coded vertex id makes the most sense. There's a TitanTransaction.containsVertex(long vertexid) method. Are there any drawbacks to this approach? I guess I don't know how vertex ids are allocated or what their reserved ranges are, so this smells dangerous. I'm new to graph DBs, but I think in Neo4j creating a reference node from the root node is recommended, whereas Titan discourages root node usage because it becomes a supernode. IDK...
1 - I don't know if there is a way to see whether the database is new through Titan. You could check whether the directory where BerkeleyDB stores its files exists before you start Titan.
2/3 - Probably your best bet would be a hard-coded vertex with an indexed property "version". Do a lookup in the (nearly empty) index on "version" at startup and base your logic on the results.
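For illustration, a minimal sketch of that idea in Java against the Blueprints API that Titan exposes (Titanium wraps the same operations from Clojure). The storage path, property names and the runMigrations hook are all made up for the example, and the "is it new?" check from point 1 just looks at whether the BerkeleyDB directory already exists:

import java.io.File;

import com.thinkaurelius.titan.core.TitanFactory;
import com.thinkaurelius.titan.core.TitanGraph;
import com.tinkerpop.blueprints.Vertex;

public class SchemaVersionCheck {

    static final String STORAGE_DIR = "/var/data/titan-berkeley"; // placeholder path
    static final String MARKER_KEY = "nodeType";                  // placeholder names
    static final String MARKER_VALUE = "schema-version";
    static final String VERSION_KEY = "schemaVersion";

    public static void main(String[] args) {
        boolean isNew = !new File(STORAGE_DIR).exists();   // (1) brand-new database?
        TitanGraph graph = TitanFactory.open(STORAGE_DIR);

        if (isNew) {
            // Index the marker property once, then create the version vertex at 0
            // and run all migrations up to the newest schema version.
            graph.createKeyIndex(MARKER_KEY, Vertex.class);
            Vertex versionNode = graph.addVertex(null);
            versionNode.setProperty(MARKER_KEY, MARKER_VALUE);
            versionNode.setProperty(VERSION_KEY, 0);
            graph.commit();
            // runMigrations(graph, 0);  // hypothetical migration hook
            return;
        }

        // (2)/(3) pre-existing database: look the version vertex up via the index.
        Vertex versionNode = null;
        for (Vertex v : graph.getVertices(MARKER_KEY, MARKER_VALUE)) {
            versionNode = v;
        }
        if (versionNode == null) {
            throw new IllegalStateException("existing database has no version node");
        }
        Integer version = versionNode.getProperty(VERSION_KEY);
        // runMigrations(graph, version);  // hypothetical migration hook
        graph.commit();
    }
}

If you prefer the hard-coded vertex id from your edit, the lookup collapses to a containsVertex/getVertex call instead, but the indexed marker vertex avoids depending on how Titan allocates and partitions ids.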
As an aside, you might be interested in Titanium [0]. We're gearing up for a big release in the next week or so that should make it much more useful [1].
[0] http://titanium.clojurewerkz.org/
[1] http://blog.clojurewerkz.org/blog/2013/04/17/whats-going-on-with-titanium/
I'm new to Elasticsearch, so this is probably something quite trivial, but I haven't figured out anything better than fetching everything, processing it with a script, and updating the records one by one.
I want to do something like a simple SQL update:
UPDATE RECORD SET SOMEFIELD = SOMEXPRESSION
My intent is to replace the current bogus data with data that makes more sense (so the expression basically picks randomly from a pool of valid values).
There are a couple of open issues about making it possible to update documents by query.
The technical challenge is that Lucene (the text search engine library that Elasticsearch uses under the hood) segments are read-only: you can never modify an existing document. What you need to do is delete the old version of the document (which, by the way, will only be marked as deleted until a segment merge happens) and index the new one. That's what the existing update API does. An update by query might therefore take a long time and lead to issues, which is why it hasn't been released yet. A mechanism for interrupting running queries would also be nice to have for this case.
But there is the update by query plugin, which exposes exactly that feature. Just be aware of the potential risks before using it.