Knowing the status of a transaction in BigchainDB - bigchaindb

I sent a transaction to my BigchainDB instance.
bdb = BigchainDB(url_of_instance_with_port)
If I run
bdb.transactions.retrieve(txid), I get the following:
{'inputs': [{'owners_before': ['31LBKyjyd6GEA7R88mnHwthMf2ygtyB8xGDv1dZHfkqu'],
'fulfills': None,
'fulfillment': 'pGSAIB3NDv0SG46f7PYdtgzMezpc8Kq8F1nR0Y1Brwe5zbkQgUDWwwaRXj4yx9Z49f76L33feu9iUhNYUs0B3oPyLllUDZrHopIoHRt0FeBLd_5Yt-m1riyc3-0jYNdM0VAyGwAG'}],
'outputs': [{'public_keys': ['31LBKyjyd6GEA7R88mnHwthMf2ygtyB8xGDv1dZHfkqu'],
'condition': {'details': {'type': 'ed25519-sha-256',
'public_key': '31LBKyjyd6GEA7R88mnHwthMf2ygtyB8xGDv1dZHfkqu'},
'uri': 'ni:///sha-256;cfREMIEIaEdXKEfokspCcfiu9jSYvlfeq6zwdj2x3Fw?fpt=ed25519-sha-256&cost=131072'},
'amount': '1'}],
'operation': 'CREATE',
'metadata': {...},
'asset': {...},
'version': '2.0',
'id': '61b7f06a106d7db4e5cbaaf23f160d934c7e6e5ef6ab1df7c1eefca1def6ea18'}
However, bdb.blocks.get(txid) times out and the block states remain the same i.e. even the block zero doesn't contain any transactions. Why? How may I debug further?
That brings up another question I have, what is the notion of mining in BigchainDB? When is a block created?

If the bdb.blocks.get(txid) call times out, then that might be a bug in the Python driver, unless the BigchainDB network was down when you made the call, in which case of course it timed out. Can you try again, i.e. post a new transaction and then do bdb.blocks.get(txid). If it times out again, then maybe try the same HTTP request with your web browser, i.e. put something like this in your address bar:
http://your-hostname:port/api/v1/blocks?transaction_id=blahblahblah
and press Enter and see if it works. If not, then something is wrong with the node. If it does work, then something is wrong with the Python driver.
If something is wrong with the Python driver, please file a new issue at https://github.com/bigchaindb/bigchaindb-driver/issues/new
About Mining
BigchainDB nodes don't do anything like the "mining" that some Bitcoin or Ethereum nodes do. The nodes in a BigchainDB network achieve Byzantine fault-tolerant (BFT) consensus in a different way, using the Tendermint consensus mechanism. The theory and practice of BFT consensus go back to the late 1970s (although the BFT terminology only arose in the early 1980s), so it's not a new idea; Tendermint is just one of the newer BFT consensus algorithms/protocols.

Related

DDD - Concurrency with quantity

Hi everyone,
I'm a little bit lost with a problem thinking in ddd way.
Imagine you have an application to sell concert ticket. So you have an entity which is called Concert with the quantity number and a method to buy a ticket.
class Concert {
constructor(
public id: string,
public name: string,
public ticketQuantity: number,
) {}
buyTicket() {
this.ticketQuantity = this.ticketQuantity - 1;
}
}
The command looks like this:
async execute(command: BookConcertCommand): Promise<void> {
const concert = await this.concertRepository.findById(command.concertId);
concert.buyTicket();
await this.concertRepository.save(concert);
}
Imagine, your application has to carry a lot of users and 1000 users try to buy a ticket at the same when the ticketQuantity is 500.
How can you ensure the invariant of the quantity can't be lower than 0 ?
How can you deal with concurrency here because even if two users try to buy a ticket at the same time the data can be false ?
What are the patterns we can use to ensure consistency and concurrency ?
Optimistic or pessismistic concurrency can't be a solution because it will frustrate a lot of users and we try to put all our logic domain into our domain so we can't put any logic inside sql/db or use a transactional script approach.
How can you ensure the invariant of the quantity can't be lower than 0
You include logic in your domain model that only assigns a ticket if at least one unassigned ticket is available.
You include locking (either optimistic or pessimistic) to ensure "first writer wins" -- the loser(s) in a data race should abort or retry.
If your book of record was just data in memory, then you would ensure that all attempts to buy tickets for concert 12345 must first acquire the same lock. In effect, you serialize the requests so that the business logic is running one at a time only.
If your book of record was a relational database, then within the context of each transaction you might perform a "select for update" to get a copy of the current data, and perform the update in the same transaction. The database will raise it's flavor of concurrent modification exception to the connections that lose the race.
Alternatively, you use something like the semantics of a conditional-write / compare and swap: you get an unlocked copy of the concert from the book of record, make your changes, then send a "update this record if it still looks like the unlocked copy" message, if you get the response announcing you've won the race, congratulations - you're done. If not, you retry or fail.
Optimistic or pessismistic concurrency can't be a solution because it will frustrate a lot of users
Of course it can
If the concert is overbooked, they are going to be frustrated anyway
The business logic doesn't have to run synchronously with the request - it might be acceptable to write down that they want a ticket, and then contact them asynchronously to let them know that a ticket has been assigned to them
It may be helpful to review some of Udi Dahan's writing on collaborative and competitive domains; for instance, this piece from 2011.
In a collaborative domain, an inherent property of the domain is that multiple actors operate in parallel on the same set of data. A reservation system for concerts would be a good example of a collaborative domain – everyone wants the “good seats” (although it might be better call that competitive rather than collaborative, it is effectively the same principle).
You might be following these steps:
1- ReserveRequested -> ReserveRequestAccepted -> TicketReserved
2- ReserveRequested -> ReserveRequestRejected
When somebody clicks on the buy ticket button, you should create a reserve request entity, and then you can process the reservation in the background and by a queue system.
On the user side, you can return a unique reserve request-id to check the result of the process. So the frontend developer should fetch the result of process periodically until it succeeds or fails.

Lambda SQL Server RDS Connection Leak

Problem
I'm using mssql v6.2.0 in a Lambda that is invoked frequently (consistently ~25 concurrent invocations under standard load).
I seem to be having trouble with connection pooling or something because I keep having tons of open DB connections which overwhelm my database (SQL Server on RDS) causing the Lambdas to just time out waiting for query results.
I have read the docs, various similar questions, Github issues, etc. but nothing has worked for this particular issue.
Things I've Learned Already
I did learn that pooling is possible across invocations due to the fact that variables outside the handler function are shared across invocations in the same container. This makes me think I should see just a few connections for each container running my Lambda, but I don't know how many that is so it's hard to verify. Bottom line is that pooling should keep me from having tons and tons of open connections, so something isn't working right.
There are several different ways to use mssql and I have tried several of them. Notably I've tried specifying max pool size with both large and small values but got the same results.
AWS recommends that you check to see if there's already a pool before trying to create a new one. I tried that to no avail. It was something like pool = pool || await createPool()
I know that RDS Proxy exists to help with situations like this, but it appears it isn't offered (at this time) for SQL Server instances.
I do have the ability to slow down my data a bit, but this has a slight impact on the performance of the product as a whole, so I don't want to do that just to avoid solving a DB connections issue.
Left unchecked, I saw as many as 700 connections to the DB at once, leading me to think there's a leak of some kind and it's maybe not just a reasonable result of high usage.
I didn't find a way to shorten the TTL for connections on the SQL Server side as recommended by this re:Invent slide. Perhaps that is part of the answer?
Code
'use strict';
/* Dependencies */
const sql = require('mssql');
const fs = require('fs').promises;
const path = require('path');
const AWS = require('aws-sdk');
const GeoJSON = require('geojson');
AWS.config.update({ region: 'us-east-1' });
var iotdata = new AWS.IotData({ endpoint: process.env['IotEndpoint'] });
/* Export */
exports.handler = async function (event) {
let myVal= event.Records[0].Sns.Message;
// Gather prerequisites in parallel
let [
query1,
query2,
pool
] = await Promise.all([
fs.readFile(path.join(__dirname, 'query1.sql'), 'utf8'),
fs.readFile(path.join(__dirname, 'query2.sql'), 'utf8'),
sql.connect(process.env['connectionString'])
]);
// Query DB for updated data
let results = await pool.request()
.input('MyCol', sql.TYPES.VarChar, myVal)
.query(query1);
// Prepare IoT Core message
let params = {
topic: `${process.env['MyTopic']}/${results.recordset[0].TopicName}`,
payload: convertToGeoJsonString(results.recordset),
qos: 0
};
// Publish results to MQTT topic
try {
await iotdata.publish(params).promise();
console.log(`Successfully published update for ${myVal}`);
//Query 2
await pool.request()
.input('MyCol1', sql.TYPES.Float, results.recordset[0]['Foo'])
.input('MyCol2', sql.TYPES.Float, results.recordset[0]['Bar'])
.input('MyCol3', sql.TYPES.VarChar, results.recordset[0]['Baz'])
.query(query2);
} catch (err) {
console.log(err);
}
};
/**
* Convert query results to GeoJSON for API response
* #param {Array|Object} data - The query results
*/
function convertToGeoJsonString(data) {
let result = GeoJSON.parse(data, { Point: ['Latitude', 'Longitude']});
return JSON.stringify(result);
}
Question
Please help me understand why I'm getting runaway connections and how to fix it. For bonus points: what's the ideal strategy for handling high DB concurrency on Lambda?
Ultimately this service needs to handle several times the current load -- I realize this becomes a quite intense load. I'm open to options like read replicas or other read-performance-boosting measures as long as they're compatible with SQL Server, and they're not just a cop out for writing proper DB access code.
Please let me know if I can improve the question. I know there are similar ones out there but I have read/tried a lot of them and didn't find them to help. Thanks in advance!
Related Material
https://forums.aws.amazon.com/thread.jspa?messageID=678029 (old, but similar)
https://www.slideshare.net/AmazonWebServices/best-practices-for-using-aws-lambda-with-rdsrdbms-solutions-srv320 re:Invent slide deck
https://www.jeremydaly.com/reuse-database-connections-aws-lambda/ Relevant info but for MySQL instead of SQL Server
Answer
I finally found the answer after 4 days of effort. All I needed to do was scale up the DB. The code is actually fine as-is.
I went from db.t2.micro to db.t3.small (or 1 vCPU, 1GB RAM to 2 vCPU and 2GB RAM) at a net cost of roughly $15/mo.
Theory
In my case, the DB probably couldn't handle the processing (which involves several geographic calculations) for all my invocations at once. I did see CPU go up, but I assumed that was a result of the high open connections. When the queries slowed down, the concurrent invocations pile up as Lambdas start to wait for results, finally causing them to time out and not close their connections properly.
Comparisions:
db.t2.micro:
200+ DB connections (goes up continuously if you leave it running)
50+ concurrent invocations
5000+ ms Lambda duration when things slow down, ~300ms under no load
db.t3.small:
25-35 DB connections (constantly)
~5 concurrent invocations
~33 ms Lambda duration <-- ten times faster!
CloudWatch Dashboard
Summary
I think this issue was confusing to me because it didn't smell like a capacity issue. Almost every time I've dealt with high DB connections in the past, it has been a code error. Having tried options there, I thought it was "some magical gotcha of serverless" that I needed to understand. In the end it was as simple as changing DB tiers. My takeaway is that DB capacity issues can manifest themselves in ways other than high CPU and memory usage, and that high connections may be a result of something besides a code bug.
Update (4 months in)
This continues to work very well. I'm impressed that doubling the DB resources seems to have given > 2x performance. Now, when due to load (or a temporary bug during development), the db connections get really high (even over 1k) the DB handles it. I'm not seeing any issues at all with db connections timing out or the database getting bogged down due to load. Since the original time of writing I've added several CPU-intensive queries to support reporting workloads, and it continues to handle all these loads simultaneously.
We've also deployed this setup to production for one customer since the time of writing and it handles that workload without issue.
So a connection pool is no good on Lambda at all what you can do is reuse connections.
Trouble is every Lambda execution opens a pool it'll just flood the DB like you're getting, you want 1 connection per lambda container, you can use a db class like so (this is rough but lemmy know if you've got questions)
export default class MySQL {
constructor() {
this.connection = null
}
async getConnection() {
if (this.connection === null || this.connection.state === 'disconnected') {
return this.createConnection()
}
return this.connection
}
async createConnection() {
this.connection = await mysql.createConnection({
host: process.env.dbHost,
user: process.env.dbUser,
password: process.env.dbPassword,
database: process.env.database,
})
return this.connection
}
async query(sql, params) {
await this.getConnection()
let err
let rows
[err, rows] = await to(this.connection.query(sql, params))
if (err) {
console.log(err)
return false
}
return rows
}
}
function to(promise) {
return promise.then((data) => {
return [null, data]
}).catch(err => [err])
}
What you need to understand is A lambda execution is a little virtual machine that does a task and then stops, it does sit there for a while and if anyone else needs it then it gets reused along with the container and connection for a single task there's never multiple connections to a single lambda.
Hope this helps let me know if ya need any more detail! Oh and welcome to stackoverflow, that's a well-constructed question.

Should I have concern about datastoreRpcErrors?

When I run dataflow jobs that writes to google cloud datastore, sometime I see the metrics show that I had one or two datastoreRpcErrors:
Since these datastore writes usually contain a batch of keys, I am wondering in the situation of RpcError, if some retry will happen automatically. If not, what would be a good way to handle these cases?
tl;dr: By default datastoreRpcErrors will use 5 retries automatically.
I dig into the code of datastoreio in beam python sdk. It looks like the final entity mutations are flushed in batch via DatastoreWriteFn().
# Flush the current batch of mutations to Cloud Datastore.
_, latency_ms = helper.write_mutations(
self._datastore, self._project, self._mutations,
self._throttler, self._update_rpc_stats,
throttle_delay=_Mutate._WRITE_BATCH_TARGET_LATENCY_MS/1000)
The RPCError is caught by this block of code in write_mutations in the helper; and there is a decorator #retry.with_exponential_backoff for commit method; and the default number of retry is set to 5; retry_on_rpc_error defines the concrete RPCError and SocketError reasons to trigger retry.
for mutation in mutations:
commit_request.mutations.add().CopyFrom(mutation)
#retry.with_exponential_backoff(num_retries=5,
retry_filter=retry_on_rpc_error)
def commit(request):
# Client-side throttling.
while throttler.throttle_request(time.time()*1000):
try:
response = datastore.commit(request)
...
except (RPCError, SocketError):
if rpc_stats_callback:
rpc_stats_callback(errors=1)
raise
...
I think you should first of all determine which kind of error occurred in order to see what are your options.
However, in the official Datastore documentation, there is a list of all the possible errors and their error codes . Fortunately, they come with recommended actions for each.
My advice is that your implement their recommendations and see for alternatives if they are not effective for you

Can Amazon Simple Workflow (SWF) be made to work with jRuby?

For uninteresting reasons, I have to use jRuby on a particular project where we also want to use Amazon Simple Workflow (SWF). I don't have a choice in the jRuby department, so please don't say "use MRI".
The first problem I ran into is that jRuby doesn't support forking and SWF activity workers love to fork. After hacking through the SWF ruby libraries, I was able to figure out how to attach a logger and also figure out how to prevent forking, which was tremendously helpful:
AWS::Flow::ActivityWorker.new(
swf.client, domain,"my_tasklist", MyActivities
) do |options|
options.logger= Logger.new("logs/swf_logger.log")
options.use_forking = false
end
This prevented forking, but now I'm hitting more exceptions deep in the SWF source code having to do with Fibers and the context not existing:
Error in the poller, exception:
AWS::Flow::Core::NoContextException: AWS::Flow::Core::NoContextException stacktrace:
"aws-flow-2.4.0/lib/aws/flow/implementation.rb:38:in 'task'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:292:in 'respond_activity_task_failed'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:204:in 'respond_activity_task_failed_with_retry'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:335:in 'process_single_task'",
"aws-flow-2.4.0/lib/aws/decider/task_poller.rb:388:in 'poll_and_process_single_task'",
"aws-flow-2.4.0/lib/aws/decider/worker.rb:447:in 'run_once'",
"aws-flow-2.4.0/lib/aws/decider/worker.rb:419:in 'start'",
"org/jruby/RubyKernel.java:1501:in `loop'",
"aws-flow-2.4.0/lib/aws/decider/worker.rb:417:in 'start'",
"/Users/trcull/dev/etl/flow/etl_runner.rb:28:in 'start_workers'"
This is the SWF code at that line:
# #param [Future] future
# Unused; defaults to **nil**.
#
# #param block
# The block of code to be executed when the task is run.
#
# #raise [NoContextException]
# If the current fiber does not respond to `Fiber.__context__`.
#
# #return [Future]
# The tasks result, which is a {Future}.
#
def task(future = nil, &block)
fiber = ::Fiber.current
raise NoContextException unless fiber.respond_to? :__context__
context = fiber.__context__
t = Task.new(nil, &block)
task_context = TaskContext.new(:parent => context.get_closest_containing_scope, :task => t)
context << t
t.result
end
I fear this is another flavor of the same forking problem and also fear that I'm facing a long road of slogging through SWF source code and working around problems until I finally hit a wall I can't work around.
So, my question is, has anyone actually gotten jRuby and SWF to work together? If so, is there a list of steps and workarounds somewhere I can be pointed to? Googling for "SWF and jRuby" hasn't turned up anything so far and I'm already 1 1/2 days into this task.
I think the issue might be that aws-flow-ruby doesn't support Ruby 2.0. I found this PDF dated Jan 22, 2015.
1.2.1
Tested Ruby Runtimes The AWS Flow Framework for Ruby has been tested
with the official Ruby 1.9 runtime, also known as YARV. Other versions
of the Ruby runtime may work, but are unsupported.
I have a partial answer to my own question. The answer to "Can SWF be made to work on jRuby" is "Yes...ish."
I was, indeed, able to get a workflow working end-to-end (and even make calls to a database via JDBC, the original reason I had to do this). So, that's the "yes" part of the answer. Yes, SWF can be made to work on jRuby.
Here's the "ish" part of the answer.
The stack trace I posted above is the result of SWF trying to raise an ActivityTaskFailedException due to a problem in some of my activity code. That part is my fault. What's not my fault is that the superclass of ActivityTaskFailedException has this code in it:
def initialize(reason = "Something went wrong in Flow",
details = "But this indicates that it got corrupted getting out")
super(reason)
#reason = reason
#details = details
details = details.message if details.is_a? Exception
self.set_backtrace(details)
end
When your activity throws an exception, the "details" variable you see above is filled with a String. MRI is perfectly happy to take a String as an argument to set_backtrace(), but jRuby is not, and jRuby throws an exception saying that "details" must be an Array of Strings. This exception blows through all the nice error catching logic of the SWF library and into this code that's trying to do incompatible things with the Fiber library. That code then throws a follow-on exception and kills the activity worker thread entirely.
So, you can run SWF on jRuby as long as your activity and workflow code never, ever throws exceptions because otherwise those exceptions will kill your worker threads (which is not the intended behavior of SWF workers). What they are designed to do instead is communicate the exception back to SWF in a nice, trackable, recoverable fashion. But, the SWF code that does the communicating back to SWF has, itself, code that's incompatible with jRuby.
To get past this problem, I monkey-patched AWS::Flow::FlowException like so:
def initialize(reason = "Something went wrong in Flow",
details = "But this indicates that it got corrupted getting out")
super(reason)
#reason = reason
#details = details
details = details.message if details.is_a? Exception
details = [details] if details.is_a? String
self.set_backtrace(details)
end
Hope that helps someone in the same situation as me.
I'm using JFlow, it lets you start SWF flow activity workers with JRuby.

Celery Storing unrecoverable task failures for later resubmission

I'm using the djkombu transport for my local development, but I will probably be using amqp (rabbit) in production.
I'd like to be able to iterate over failures of a particular type and resubmit. This would be in the case of something failing on a server or some edge case bug triggered by some new variation in data.
So I could be resubmitting jobs up to 12 hours later after some bug is fixed or a third party site is back up.
My question is: Is there a way to access old failed jobs via the result backend and simply resubmit them with the same params etc?
You can probably access old jobs using:
CELERY_RESULT_BACKEND = "database"
and in your code:
from djcelery.models import TaskMeta
task = TaskMeta.objects.filter(task_id='af3185c9-4174-4bca-0101-860ce6621234')[0]
but I'm not sure you can find the arguments that the task is being started with ... Maybe something with TaskState...
I've never used it this way. But you might want to consider the task.retry feature?
An example from celery docs:
#task()
def task(*args):
try:
some_work()
except SomeException, exc:
# Retry in 24 hours.
raise task.retry(*args, countdown=60 * 60 * 24, exc=exc)
From IRC
<asksol> dpn`: task args and kwargs are not stored with the result
<asksol> dpn`: but you can create your own model and store it there
(for example using the task_sent signal)
<asksol> we don't store anything when the task is sent, only send a
message. but it's very easy to do yourself
This was what I was expecting, but hoped to avoid.
At least I have an answer now :)