Riak MapReduce - map works, reduce receives very small subset of results

I'm using Riak 2.0.0b1 on Ubuntu 12.10 (up to date). This is a developer box, so I have only one Riak instance - no clusters, etc.
I've put about 100k JSON documents (about 300 bytes each) into a bucket and am now trying to mapreduce over it. The data is random, and I also have a 2i index on one of the keys, which divides the dataset into 10 almost even parts of ~10k documents each.
This query works as expected:
curl -XPOST -d'{
"inputs": {"bucket": "bucket", "index": "idx_bin", "key": "10"},
"query": [
{
"map": {
"language": "javascript",
"source": "Riak.mapValuesJson"
}
}
]
}' http://localhost:8080/mapred -H 'Content-Type: application/json' | python -m json.tool | egrep '^ {4}\{' | wc -l
9974
I got about 10k results. Now, if I want to do something in the reduce step, I get an answer which doesn't make sense:
curl -XPOST -d'{
"inputs": {"bucket": "bucket", "index": "idx_bin", "key": "10"},
"query": [
{
"map": {
"language": "javascript",
"source": "Riak.mapValuesJson"
}
},
{
"reduce": {
"language": "javascript",
"source": "function(o) { return [o.length] }"
}
}
]
}' http://localhost:8080/mapred -H 'Content-Type: application/json' | python -m json.tool
[
15
]
I'd like to see either an error here, if I'm reaching some (un)documented limit, or a full list of objects, not 15. (This number differs between runs; sometimes there are a couple more.) I went to the configs and did this:
javascript.map_pool_size = 64
javascript.reduce_pool_size = 64
javascript.maximum_stack_size = 32MB
javascript.maximum_heap_size = 64MB
That didn't help at all.
What is going on, and how do I get all objects in the reduce phase?

The reduce function is called many times. The map function runs on about 1/3 of the vnodes in the cluster (that's 22 times in a cluster with ring_size 64), and the reduce function is called each time results are available from a map function. Its first argument is a list containing both the result from the previous run of the reduce function and the new results from the map function. In your case, you counted the values returned from the first vnode; that count was then passed along as one more value with the second vnode's results, where it was counted as just a single value.
What you will need to do is have the reduce function return a value/object that is easily differentiated from the other values, such as:
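function(o) {
  var prevCount = 0;    // sum of running totals carried over from earlier reduce runs
  var countObjects = 0; // how many list entries are such carried-over totals
  for (var i = 0; i < o.length; i++) {
    var e = o[i];
    // typeof null is 'object', so rule it out before reading a property
    if (e !== null && typeof e === 'object' && typeof e.reduce_running_total === 'number') {
      prevCount += e.reduce_running_total;
      countObjects += 1;
    }
  }
  // raw map results (o.length - countObjects) plus the carried-over totals
  return [{"reduce_running_total": o.length + prevCount - countObjects}];
}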
Or, you could save some network bandwidth: instead of having the map phase return all of the objects, have the map function return a literal [1] for each key found. The reduce function then simply sums all the numbers in its input list and returns the total as a one-element list.
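As a rough illustration of that second approach, here is a minimal sketch. The names mapOne, reduceSum and simulateReduce are made up for this example, and simulateReduce is only a local stand-in that assumes the reduce phase is re-invoked once per batch with the previous result included; it is not how Riak is actually implemented.

```javascript
// Map phase: emit a single 1 per object instead of the whole document.
// Riak passes more arguments to a map function; only the first is named
// here because the others are unused.
function mapOne(value) {
  return [1];
}

// Reduce phase: sum whatever numbers arrive. Because the output is again a
// list holding one number, feeding it back in as input stays correct.
function reduceSum(values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return [total];
}

// Hypothetical local harness (not part of Riak): calls the reduce function
// once per batch, passing the previous result together with the new batch.
function simulateReduce(batches, reduceFn) {
  var acc = [];
  for (var i = 0; i < batches.length; i++) {
    acc = reduceFn(acc.concat(batches[i]));
  }
  return acc;
}

// Three batches of mapped objects: 4, 3 and 5 items.
var batches = [[1, 1, 1, 1], [1, 1, 1], [1, 1, 1, 1, 1]];
var summed = simulateReduce(batches, reduceSum);
var naive = simulateReduce(batches, function (o) { return [o.length]; });
console.log(summed); // [ 12 ]
console.log(naive);  // [ 6 ]
```

The naive o.length version undercounts in the same way as in the question, because each earlier count re-enters the list as a single element.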

Related

How do I extract this field using JQ?

JQ makes me irrationally angry and I despise its very existence. I have been trying to extract the value of the tf-name key for 30 minutes and I'm going to become a hermit if I have to make another attempt at this. Please, how do I get tmp as a result?
➜ aws resourcegroupstaggingapi get-resources --tag-filters Key=tf-name,Values=tmp --profile=test | jq
{
"ResourceTagMappingList": [
{
"ResourceARN": "arn:aws:s3:::stack-hate-1234567890salt",
"Tags": [
{
"Key": "aws:cloudformation:stack-name",
"Value": "hatestack"
},
{
"Key": "aws:cloudformation:stack-id",
"Value": "arn:aws:cloudformation:us-west-2:6666666666666:stack/FSStack/hate-hate-hate-123456789"
},
{
"Key": "tf-name",
"Value": "tmp"
},
{
"Key": "aws:cloudformation:logical-id",
"Value": "hatebucket1234"
},
{
"Key": "aws-cdk:auto-delete-objects",
"Value": "true"
}
]
}
]
}

It's quite simple to extract once you know the type of the top-level objects. You can see that ResourceTagMappingList is a list of records, so it should be followed by the [] notation, which most beginners with jq tend to miss. So
.ResourceTagMappingList[].Tags | from_entries."tf-name"
should get you the desired value. See the jqplay demo.
The way it works is that the from_entries function takes an array of objects with key and value fields (the capitalized Key and Value used here are accepted as well) and transforms it into a single object that maps each key to its value. From there, we extract only the tf-name key. The double quotes are needed because the key name contains the meta-character -, so it must be quoted for the whole string to be treated as the key name.
Another way would be to use a select statement, like below:
.ResourceTagMappingList[].Tags[] | select(.Key == "tf-name").Value
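To make the two filters less magical, here is a small JavaScript sketch of what they do to the data. The fromEntries function is a simplified, hypothetical stand-in for jq's from_entries (it only handles the Key/Value and key/value spellings), and the response object is a trimmed copy of the output in the question.

```javascript
// Simplified stand-in for jq's from_entries: turns [{Key, Value}, ...]
// into one object keyed by Key.
function fromEntries(entries) {
  const result = {};
  for (const entry of entries) {
    const key = entry.Key !== undefined ? entry.Key : entry.key;
    const value = entry.Value !== undefined ? entry.Value : entry.value;
    result[key] = value;
  }
  return result;
}

// Trimmed copy of the response from the question.
const response = {
  ResourceTagMappingList: [
    {
      ResourceARN: "arn:aws:s3:::stack-hate-1234567890salt",
      Tags: [
        { Key: "aws:cloudformation:stack-name", Value: "hatestack" },
        { Key: "tf-name", Value: "tmp" }
      ]
    }
  ]
};

// Roughly: .ResourceTagMappingList[].Tags | from_entries."tf-name"
const viaFromEntries = response.ResourceTagMappingList.map(
  (resource) => fromEntries(resource.Tags)["tf-name"]
);

// Roughly: .ResourceTagMappingList[].Tags[] | select(.Key == "tf-name").Value
const viaSelect = response.ResourceTagMappingList.flatMap((resource) =>
  resource.Tags.filter((tag) => tag.Key === "tf-name").map((tag) => tag.Value)
);

console.log(viaFromEntries, viaSelect);
```

Both arrays hold one entry per resource in the list, which is what the [] after ResourceTagMappingList expresses in jq.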

Finding all obligations in spl-token-lending program

If an Obligation becomes unhealthy, it can be liquidated by calling the LiquidateObligation instruction. However, I cannot liquidate it if I don't know it exists, and the process of finding such Obligations is still unclear to me.
What is the expected way for me to find all currently "working" Obligations?

The only way to get all of the Obligation accounts is to use the getProgramAccounts RPC endpoint with a filter, which fetches every account owned by the lending program that has a certain size. Since an Obligation has a size of 916 bytes according to the code (https://github.com/solana-labs/solana-program-library/blob/9123a80a6a5b5f8a378a56c4501f99df7debda55/token-lending/program/src/state/obligation.rs#L329), you can do:
curl YOUR_RPC_ENDPOINT_HERE -X POST -H "Content-Type: application/json" -d '
{
"jsonrpc": "2.0",
"id": 1,
"method": "getProgramAccounts",
"params": [
"LENDING_PROGRAM_PUBKEY_IN_BASE_58",
{
"filters": [
{
"dataSize": 916
}
]
}
]
}
'
This was adapted from https://docs.solana.com/developing/clients/jsonrpc-api#example-35
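If you want to do the same from code, a minimal sketch could look like the following. The names buildObligationQuery and findObligationPubkeys are made up for illustration, and post is an assumed helper (url, body) that sends the JSON body and resolves to the parsed JSON response, injected so the sketch does not depend on a particular HTTP client.

```javascript
// Size in bytes of an Obligation account, as stated in the answer above.
const OBLIGATION_LEN = 916;

// Builds the same JSON-RPC body as the curl example.
function buildObligationQuery(lendingProgramId) {
  return {
    jsonrpc: "2.0",
    id: 1,
    method: "getProgramAccounts",
    params: [lendingProgramId, { filters: [{ dataSize: OBLIGATION_LEN }] }]
  };
}

// Hypothetical helper: runs the query and keeps only the account addresses.
// `post` is an assumption: (url, body) -> Promise of the parsed response.
async function findObligationPubkeys(post, rpcUrl, lendingProgramId) {
  const response = await post(rpcUrl, buildObligationQuery(lendingProgramId));
  return response.result.map((entry) => entry.pubkey);
}
```

Each returned address still has to be fetched and decoded to decide whether that Obligation is unhealthy; the size filter only narrows the search to accounts of the right type.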

How to get all the Transactions from a block in NEAR protocol using a single call?

I want to fetch all the transactions from a block using a single RPC call.
I know we can fetch them using the chunk id, but in that case we have to make a call for each chunk.

Unfortunately, it's impossible to do in a single call. However, it is possible in N+1 calls, where N is the number of shards.
Request a block (by height, hash or finality, depending on your needs; let's assume you need the latest):
https://docs.near.org/docs/api/rpc/block-chunk#block-details
HTTPie example:
$ http post https://rpc.testnet.near.org/ id=dontcare jsonrpc=2.0 method=block params:='{"finality": "final"}'
Collect the chunk hashes from the response. You can find them in the response JSON:
{
"id": "dontcare",
"jsonrpc": "2.0",
"result": {
"author": "node1",
"chunks": [
{
...
"chunk_hash": "6ZJzhK4As3UGkyH2kxHmRFYoV7hiyXareMo1qzyxS624",
Using jq:
$ http post https://rpc.testnet.near.org/ id=dontcare jsonrpc=2.0 method=block params:='{"finality": "final"}' | jq '.result.chunks[].chunk_hash'
"GchAtNdcc16bKvnTa7RA3xkYAt2eMg22Qkmc9FfFTrK2"
"8P6u7zwsLvYMH5vbV4hnaCaL7FKuPnfJU4yNJY52WCd2"
"8p1XaC4BzCBVUhfYWyf6nBXF4m9uzJVEJmHCYnBMLuUn"
"7TkVTzCGMyxNnumX6ZsES5v3Wa3UnBZZAavF9zjMzDKC"
Then you need to perform one query per chunk, like:
$ http post https://rpc.testnet.near.org/ id=dontcare jsonrpc=2.0 method=chunk params:='{"chunk_id": "GchAtNdcc16bKvnTa7RA3xkYAt2eMg22Qkmc9FfFTrK2"}' | jq '.result.transactions[]'
{
"signer_id": "art.artcoin.testnet",
"public_key": "ed25519:4o6mz55p1mNmfwg5EeTDXdtYFxQev672eU5wy5RjRCbw",
"nonce": 570906,
"receiver_id": "art.artcoin.testnet",
"actions": [
{
"FunctionCall": {
"method_name": "submit_asset_price",
"args": "eyJhc3NldCI6ImFCVEMiLCJwcmljZSI6IjM4MzQyOTEyMzgzNTEifQ==",
"gas": 30000000000000,
"deposit": "0"
}
}
],
"signature": "ed25519:2E6Bs8U1yRtAtYuzNSB1PUXeAywrTbXMpcM8Z8w6iSXNEtLRDd1aXDCGrv3fBTn1QZC7MoistoEsD5FzGSTJschi",
"hash": "BYbqKJq3c9qW77wspsmQG3KTKAAfZcSeoTLWXhk6KKuz"
}
And this way you can collect all the transactions from the block.
Alternatively, as @amgando said, you can query the Indexer for Explorer database using public credentials:
https://github.com/near/near-indexer-for-explorer#shared-public-access
But please be aware that the number of connections to the database is limited, and it is often not easy to get connected because a lot of people around the globe are using it.
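The N+1 procedure above can be sketched in JavaScript as follows. The function names are made up for this example, and rpc is an assumed helper (method, params) that resolves to the "result" field of the JSON-RPC response; makeRpc shows one possible way to build it on the global fetch available in recent Node versions and browsers.

```javascript
// `rpc` is a hypothetical helper: (method, params) -> Promise of the
// JSON-RPC "result" field. Injecting it keeps the sketch transport-agnostic.
async function getBlockTransactions(rpc, blockParams) {
  // Call 1: fetch the block to learn its chunk hashes.
  const block = await rpc("block", blockParams);
  const chunkHashes = block.chunks.map((chunk) => chunk.chunk_hash);

  // Calls 2..N+1: fetch every chunk and gather its transactions.
  const chunks = await Promise.all(
    chunkHashes.map((hash) => rpc("chunk", { chunk_id: hash }))
  );
  return chunks.flatMap((chunk) => chunk.transactions);
}

// One possible `rpc`, assuming a global fetch is available.
function makeRpc(url) {
  return async (method, params) => {
    const response = await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ jsonrpc: "2.0", id: "dontcare", method, params })
    });
    return (await response.json()).result;
  };
}
```

With these in place, getBlockTransactions(makeRpc("https://rpc.testnet.near.org/"), { finality: "final" }) would resolve to the transactions of the latest final block. The chunk requests run concurrently, so the total latency is closer to two round trips than to N+1.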

Since NEAR is sharded, a "chunk" is what we call the piece of a block that was handled by a single shard.
To build up the entire block you can either:
- construct the block from its chunk parts
- use an indexer to capture what you need in real time

CouchDB query to get the doc with MAX timestamp

My CouchDB documents have the format below. Because prices change, there can be multiple documents with the same product_id & store_id:
{
"_id": "6b645d3b173b4776db38eb9fe6014a4c",
"_rev": "1-86a1d9f0af09beaa38b6fbc3095f06a8",
"product_id": "6b645d3b173b4776db38eb9fe60148ab",
"store_id": "0364e82c13b66325ee86f99f53049d39",
"price": "12000",
"currency": "AUD_$",
"time": 1579000390326
}
I need to get the latest document (by time, the timestamp) for a given product_id & store_id.
With my current solution I have to do two queries, as below.
First, I get the latest timestamp. This view returns the latest timestamp for a given product_id & store_id:
"max_time_by_product_store_id": {
"reduce": "function(keys, values) {var ids = []
values.forEach(function(time) {
if (!isNaN(time)){
ids.push(time);
}
});
return Math.max.apply(Math, ids)
}",
"map": "function (doc) {emit([doc.store_id, doc.product_id], doc.time);}"
}
Then, based on that timestamp, I query again to get the document, using three parameters (store_id, product_id & time), as below:
"store_product_time": {
"map": "function (doc) {
emit([doc.store_id, doc.product_id, doc.time]);
}"
}
This works perfectly for me, but my problem is that I need two DB queries to get the document. I am looking for a solution that fetches the document in one DB query.
The CouchDB selector syntax also has no way to get the document by MAX value.

With CouchDB's /db/_find, you can sort the result in descending order and limit it to one document, as follows:
{
"selector": {
"_id": {
"$gt": null
}
},
"sort": [
{
"time": "desc"
}
],
"limit": 1
}
Using curl:
curl -H 'Content-Type: application/json' -X POST http://localhost:5984/<db>/_find -d '{"selector":{"_id":{"$gt":null}},"sort":[{"time": "desc"}],"limit": 1}'
Please note that an index must be created beforehand for the sort field time (see /db/_index).
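The query above matches every document, so it returns the newest document overall. One way to narrow it to a given store_id and product_id, as the question asks, is sketched below. This goes beyond what the answer states and rests on assumptions: the function names and the index name are made up, and the idea that a compound index on store_id, product_id and time can serve the sort should be checked against your CouchDB version.

```javascript
// Builds a /db/_find body for the newest document of one store/product pair.
// Restricting the selector to store_id and product_id is an addition to the
// answer's query, which uses {"_id": {"$gt": null}} to match everything.
function latestPriceQuery(storeId, productId) {
  return {
    selector: {
      store_id: storeId,
      product_id: productId,
      time: { $gt: null } // same "field exists" idiom as in the answer
    },
    // Sort on every indexed field, in index order and in one direction,
    // so that a single compound index can serve the query (assumption).
    sort: [{ store_id: "desc" }, { product_id: "desc" }, { time: "desc" }],
    limit: 1
  };
}

// Body for POST /db/_index. The index name is hypothetical.
function latestPriceIndex() {
  return {
    index: { fields: ["store_id", "product_id", "time"] },
    name: "store-product-time",
    type: "json"
  };
}

console.log(JSON.stringify(latestPriceQuery("0364e82c13b66325ee86f99f53049d39",
  "6b645d3b173b4776db38eb9fe60148ab")));
```

The printed JSON can be posted to /<db>/_find in the same way as in the curl command above, after the index body has been posted once to /<db>/_index.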

Trello API: getting boards / lists / cards information

Using Trello API:
- I've been able to get all the cards that are assigned to a Trello user
- I've been able to get all the boards that are assigned to an Organization
But I can't get any API call that returns all the lists that are in an Organization or User.
Is there any function that allows that?
Thanks in advance

For users who want the easiest way to access the id of a list:
Use the ".json" hack!
Add ".json" at the end of your board URL to display the same output as the API query for that board, directly in your browser (no other tool needed, no hassle dealing with authentication).
For instance, if the URL of your board is:
https://trello.com/b/EI6aGV1d/blahblah
point your browser to
https://trello.com/b/EI6aGV1d/blahblah.json
And you will obtain something like
{
"id": "5a69a1935e732f529ef0ad8e",
"name": "blahblah",
"desc": "",
"descData": null,
"closed": false,
[...]
"cards": [
{
"id": "5b2776eba95348dd45f6b745",
"idMemberCreator": "58ef2cd98728a111e6fbd8d3",
"data": {
"list": {
"name": "Bla blah blah blah blah",
"id": "5a69a1b82f62a7af027d0378"
},
"board": {
[...]
There you can just search for the name of your list to easily find its id next to it.
Tip: use a JSON viewer extension to make your browser display nicely formatted JSON. Personally, I use https://github.com/tulios/json-viewer/tree/0.18.0 but I guess there are a lot of good alternatives out there.

I don't believe there is a method in the Trello API to do this, so you'll have to get a list of boards for a user or organization:
GET /1/members/[idMember or username]/boards
Which returns (truncated to show just the parts we care about):
[{
"id": "4eea4ffc91e31d1746000046",
"name": "Example Board",
"desc": "This board is used in the API examples",
...
"shortUrl": "https://trello.com/b/OXiBYZoj"
}, {
"id": "4ee7e707e582acdec800051a",
"name": "Public Board",
"desc": "A board that everyone can see",
...
"shortUrl": "https://trello.com/b/IwLRbh3F"
}]
Then get the lists for each board:
GET /1/boards/[board_id]/lists
Which returns (truncated to only show the list id and name):
[{
"id": "4eea4ffc91e31d174600004a",
"name": "To Do Soon",
...
}, {
"id": "4eea4ffc91e31d174600004b",
"name": "Doing",
...
}, {
"id": "4eea4ffc91e31d174600004c",
"name": "Done",
...
}]
And go through this response for each board to build a list of all the lists a user or organization has.
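The two-step approach above could be sketched like this. The function name listsForMember is made up, and get is an assumed helper that takes an API path such as "/1/boards/<id>/lists", handles the key and token, and resolves to the parsed JSON response.

```javascript
// `get` is a hypothetical helper: (path) -> Promise of parsed JSON for a
// Trello API path. Authentication is assumed to happen inside it.
async function listsForMember(get, memberIdOrUsername) {
  // Step 1: all boards of the member.
  const boards = await get(`/1/members/${memberIdOrUsername}/boards`);

  // Step 2: the lists of each board, fetched concurrently.
  const perBoard = await Promise.all(
    boards.map(async (board) => {
      const lists = await get(`/1/boards/${board.id}/lists`);
      // Tag each list with its board so the flattened result stays traceable.
      return lists.map((list) => ({
        boardId: board.id,
        boardName: board.name,
        listId: list.id,
        listName: list.name
      }));
    })
  );
  return perBoard.flat();
}
```

This makes one request per board, so for a member with many boards it may be worth limiting how many requests run at once.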

You can do it by calling
GET /1/organizations/[idOrg]/boards?lists=all
It is here: https://developers.trello.com/advanced-reference/organization#get-1-organizations-idorg-or-name-boards
Look at the arguments.
There are several filters and fields. You can customize it.

To get all your boards, use
Trello.get("/members/me/boards")
This worked for me using client.js.

Here is a quick and dirty bash script using curl and jq to get any given board's ID or list's ID:
key="<your-key>"
token="<your-token>"
trelloUsername="<your-trello-username>"
boardName="<board-name>"
listName="<list-name>"
boardID=$(curl -s --request GET --url "https://api.trello.com/1/members/$trelloUsername/boards?key=$key&token=$token" --header 'Accept: application/json' | jq -r ".[] | select(.name == \"$boardName\").id")
echo "boardID: ${boardID}"
listID=$(curl -s --request GET --url "https://api.trello.com/1/boards/$boardID/lists?key=$key&token=$token" | jq -r ".[] | select(.name == \"$listName\").id")
echo "listID: ${listID}"
Example output:
boardID: 5eab513c719d2d681bafce0e
listID: 5eab519e66dd4272gb720e22
