CouchDB Map syntax - mapreduce

I have documents in CouchDB (v. 2.1.1) as follows:
{
"xyz": "a",
"abc": "def"
},
{
"xyz": "a",
"ghi": "jkl"
},
{
"xyz": "a",
"mno": "pqr"
},
{
"xyz": "a",
"stu": "vwx"
},
{
"xyz": "a",
"bcd": 1000
}
If I run a simple map function, for example:
function (doc) {
  if (doc.xyz) {
    emit(doc.xyz, doc.abc);
  }
}
I get:
{
"id": "4c3406a1d92942b4fb10d1314e0061a9",
"key": "a",
"value": "def"
},
{
"id": "4c3406a1d92942b4fb10d1314e006ccf",
"key": "a",
"value": null
},
{
"id": "4c3406a1d92942b4fb10d1314e00787f",
"key": "a",
"value": null
},
{
"id": "4c3406a1d92942b4fb10d1314e00871e",
"key": "a",
"value": null
},
{
"id": "4c3406a1d92942b4fb10d1314e00906a",
"key": "a",
"value": null
}
I want to try and eliminate the 'null' outputs.
I am looking at having a CouchDB database with many small documents containing small snippets of information rather than having larger documents containing much more information per document.
My question is: is my document design a good one and, if so, how do I get just what I am looking for rather than rows of 'nulls'? If my storage design is not ideal, what kind of design should I be looking at to simplify the output, given my plan to have many small 'docs'?
EDIT:
Having looked at possible answers, I have decided that having numerous small documents, as I described in my question, is not giving me the kind of benefit I imagined it would.
I was unable to find a satisfactory map function that produced readable output.
However, I investigated the 'Mango' query system available in recent versions of CouchDB, and with these queries I was able to get acceptable output from a database like the one in my example.
This is what I did:
curl -X POST http://admin:123@127.0.0.1:5984/ptn/_find -d '{"selector": {"$or": [{"abc": {"$gt": null}},{"ghi": {"$gt": null}}]},"fields": ["abc","ghi"]}' -H "Content-Type:application/json"
Un-minified:
{
"selector": {
"$or": [
{
"abc": {
"$gt": null
}
},
{
"ghi": {
"$gt": null
}
}
]
},
"fields": [
"abc",
"ghi"
]
}
The output:
{"docs":[
{"abc":"def"},
{"ghi":"jkl"}
]
.....
A concise answer.
Sorting can be done but sorted fields must be indexed. Indexing is in any case advised for larger data sets.
Reference:
http://docs.couchdb.org/en/2.1.1/api/database/find.html
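As a minimal sketch of how the request bodies above could be generated for an arbitrary list of fields: the helper names `buildFindBody` and `buildIndexBody`, and the index name, are made up for illustration. The `_find` body mirrors the query shown above, and the second body follows the `index`/`name`/`type` shape that CouchDB's `_index` endpoint accepts for JSON indexes.

```javascript
// Hypothetical helper: _find body matching documents that carry ANY of the given fields.
// {"$gt": null} is the same test used in the query above.
function buildFindBody(fields) {
  return {
    selector: { $or: fields.map((f) => ({ [f]: { $gt: null } })) },
    fields: fields,
  };
}

// Hypothetical helper: body for POST /{db}/_index so that `field` can be sorted on.
// The index name is an arbitrary choice made for this sketch.
function buildIndexBody(field) {
  return { index: { fields: [field] }, name: field + "-index", type: "json" };
}

console.log(JSON.stringify(buildFindBody(["abc", "ghi"])));
console.log(JSON.stringify(buildIndexBody("abc")));
```

Either object can be serialized with JSON.stringify and passed to curl's -d option in the same way as the hand-written query.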
As my question asked for a map function, this perhaps cannot be regarded as a valid answer, but for me it is an answer. I have tried the 'Mango' query system a little on other databases and it seems to be more useful and powerful than I thought it was, although it offers no means of totaling etc.
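For the map-function route the question originally asked about, one possible approach (a sketch, not a tested CouchDB deployment) is to emit a row only when the value field is actually present. The `emit` function and the `rows` array below are local stand-ins so the function can be tried outside CouchDB; inside a design document only the `map` function body would be used.

```javascript
// Map function: emit only when both the key field and the value field exist,
// so documents without "abc" produce no row at all instead of a null value.
function map(doc) {
  if (doc.xyz && doc.abc !== undefined) {
    emit(doc.xyz, doc.abc);
  }
}

// Local stand-in for CouchDB's emit(), used only for this demonstration.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// The five sample documents from the question.
const docs = [
  { xyz: "a", abc: "def" },
  { xyz: "a", ghi: "jkl" },
  { xyz: "a", mno: "pqr" },
  { xyz: "a", stu: "vwx" },
  { xyz: "a", bcd: 1000 },
];

docs.forEach(map);
console.log(rows); // [ { key: 'a', value: 'def' } ]
```

The same guard can be repeated for any other field that a view is meant to expose.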

Related

Best way to load 1MM JSON records into AWS Redshift with Kinesis Firehose?

I've got a bunch of JSON records that I want to add to an Amazon Redshift instance from S3, via Kinesis Firehose. It's several hundred files, give or take, that have 1,000 or so records each, and each file looks like the below sample. For my purposes, I don't care about the info entry, at least for now. I have a working Kinesis Firehose service that can update my Redshift DB with the sample stock ticker data, so that part is OK. My questions are (and hopefully this shouldn't actually be split into two different posts):
This is in large part a learning exercise, so if it's overkill for what I'm trying to do, that's OK. If there's a reason it's actually a bad idea, let me know.
If I want to just ignore the info field, do I have to use a Lambda to strip it, or is there a way to do that without one? If so, are there any tricks that wouldn't be the same as writing a script to process from a regular textfile? As I'm typing this I realize I could probably just put info in the DB and never touch it, but if there's a reason not to do that, or a cleaner way than that, I'd appreciate hearing it.
When I have individual manufacturers with a set of features, and there could be dozens of features per manufacturer, does it make sense to make a separate DB table for features, or am I coming at it from a Python dict/Perl hash perspective that doesn't make sense for a SQL DB when I need to tie them back together later?
Sample:
{
"info": {
"generated_on": "2022-08-09 19:25:34",
"version": "v1"
},
"manufacturer": [
{
"name": "Audi",
"id": 1,
"num_features": 2,
"features": [
{
"name": "seat heaters",
"standard": "N",
"cost": 100
},
{
"name": "A/C",
"standard": "Y",
"cost": 0
}
]
},
{
"name": "BMW",
"id": 2,
"num_features": 3,
"features": [
{
"name": "seat heaters",
"standard": "Y",
"cost": 0
},
{
"name": "backup camera",
"standard": "N",
"cost": 500
},
{
"name": "A/C",
"standard": "Y",
"cost": 0
}
]
}
]
}
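To make the separate-table option from the last question concrete, here is a rough sketch that flattens one file of the shape shown above into two flat row sets linked by the manufacturer id. The function name `flatten` and the column name `manufacturer_id` are assumptions chosen for illustration, and the sketch says nothing about how the rows are then delivered to Redshift. The `info` entry is dropped simply by never copying it.

```javascript
// Hypothetical helper: split one input file into manufacturer rows and feature rows.
function flatten(file) {
  const manufacturers = [];
  const features = [];
  for (const m of file.manufacturer) {
    manufacturers.push({ id: m.id, name: m.name });
    for (const f of m.features) {
      // Each feature row points back to its manufacturer through manufacturer_id.
      features.push({
        manufacturer_id: m.id,
        name: f.name,
        standard: f.standard,
        cost: f.cost,
      });
    }
  }
  return { manufacturers: manufacturers, features: features };
}

// The sample file from the question.
const sample = {
  info: { generated_on: "2022-08-09 19:25:34", version: "v1" },
  manufacturer: [
    { name: "Audi", id: 1, num_features: 2, features: [
      { name: "seat heaters", standard: "N", cost: 100 },
      { name: "A/C", standard: "Y", cost: 0 },
    ] },
    { name: "BMW", id: 2, num_features: 3, features: [
      { name: "seat heaters", standard: "Y", cost: 0 },
      { name: "backup camera", standard: "N", cost: 500 },
      { name: "A/C", standard: "Y", cost: 0 },
    ] },
  ],
};

const flat = flatten(sample);
// One JSON object per line, one line per row.
console.log(flat.manufacturers.map((r) => JSON.stringify(r)).join("\n"));
console.log(flat.features.map((r) => JSON.stringify(r)).join("\n"));
```

With this layout, the features of a manufacturer can be tied back together later by joining the two row sets on the manufacturer id.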

Amazon Product Advertising API v5 - How to retrieve different types of books (hardcopy, paperback, Kindle...) with one API call

I’m a developer for a company where we have been using Amazon's great Product Advertising API (PA-API) for many years to fetch book information. We’re currently using the Java SDK and API v5.
Issue
We provide our customers with direct affiliate links to the related hardcopy or ebook on different Amazon stores. We do this by creating a SearchItems request with the ISBN (for example 9780399562396) as the keyword and no specific search index. In the past, the response contained two items and therefore two ASINs, one for the hardcopy and one for the ebook (distinguishable by the ItemInfo product group). For some time now, however, the response has normally contained only one item, the hardcopy product.
I have already tried different approaches with the great Scratchpad.
Questions
The interesting thing is that when I explicitly set the search index to “Books” or “KindleStore”, the API responds with the expected item (a book for “Books” and a Kindle edition for “KindleStore”). We check this by looking at ItemInfo.Classifications. However, if I search in the index “All” or don’t specify one, it returns only one item (normally the hardcopy), which seems quite strange to me. Should the API/search index even behave like this?
Furthermore, I was not able to figure out how to search in two indexes within the same request. It seems that this is not supported at all, but I would expect that this, at least, would return two items.
Therefore, I would like to ask whether somebody can explain how to retrieve both ASINs (Kindle + hardcopy book) of the same ISBN with one request. Of course it is possible to create two separate requests, one per product group; however, since the API rates are tied to the shipped item revenue, we would like to avoid unnecessary API requests.
Some examples with and without an explicit search index
In the following examples I am looking for the hardcopy or Kindle edition of the book with ISBN 9780262043649 by doing a SearchItems request.
a) Hardcopy with given search index
Payload
{
"Keywords": "9780262043649",
"Resources": [
"ItemInfo.Classifications",
"ItemInfo.Title"
],
"SearchIndex": "Books",
"PartnerTag": "*********",
"PartnerType": "Associates",
"Marketplace": "www.amazon.com",
"Operation": "SearchItems"
}
Response
{
"SearchResult": {
"Items": [
{
"ASIN": "0262043645",
"DetailPageURL": "https://www.amazon.com/dp/0262043645?tag=getabstractcom&linkCode=osi&th=1&psc=1",
"ItemInfo": {
"Classifications": {
"Binding": {
"DisplayValue": "Hardcover",
"Label": "Binding",
"Locale": "en_US"
},
"ProductGroup": {
"DisplayValue": "Book",
"Label": "ProductGroup",
"Locale": "en_US"
}
},
"Title": {
"DisplayValue": "Novacene: The Coming Age of Hyperintelligence (The MIT Press)",
"Label": "Title",
"Locale": "en_US"
}
}
}
],
"SearchURL": "https://www.amazon.com/s?k=9780262043649&i=stripbooks&rh=p_n_availability%3A-1&tag=getabstractcom&linkCode=osi",
"TotalResultCount": 1
}
}
b) Kindle with given search index
Payload
{
"Keywords": "9780262043649",
"Resources": [
"ItemInfo.Classifications",
"ItemInfo.Title"
],
"SearchIndex": "KindleStore",
"PartnerTag": "******",
"PartnerType": "Associates",
"Marketplace": "www.amazon.com",
"Operation": "SearchItems"
}
Response
{
"SearchResult": {
"Items": [
{
"ASIN": "B08BT4MM18",
"DetailPageURL": "https://www.amazon.com/dp/B08BT4MM18?tag=getabstractcom&linkCode=osi&th=1&psc=1",
"ItemInfo": {
"Classifications": {
"Binding": {
"DisplayValue": "Kindle Edition",
"Label": "Binding",
"Locale": "en_US"
},
"ProductGroup": {
"DisplayValue": "Digital Ebook Purchas",
"Label": "ProductGroup",
"Locale": "en_US"
}
},
"Title": {
"DisplayValue": "Novacene: The Coming Age of Hyperintelligence",
"Label": "Title",
"Locale": "en_US"
}
}
}
],
"SearchURL": "https://www.amazon.com/s?k=9780262043649&i=digital-text&rh=p_n_availability%3A-1&tag=getabstractcom&linkCode=osi",
"TotalResultCount": 1
}
}
c) No specific search index
Payload
{
"Keywords": "9780262043649",
"Resources": [
"ItemInfo.Classifications",
"ItemInfo.Title"
],
"PartnerTag": "*******",
"PartnerType": "Associates",
"Marketplace": "www.amazon.com",
"Operation": "SearchItems"
}
Response
{
"SearchResult": {
"Items": [
{
"ASIN": "B08BT4MM18",
"DetailPageURL": "https://www.amazon.com/dp/B08BT4MM18?tag=getabstractcom&linkCode=osi&th=1&psc=1",
"ItemInfo": {
"Classifications": {
"Binding": {
"DisplayValue": "Kindle Edition",
"Label": "Binding",
"Locale": "en_US"
},
"ProductGroup": {
"DisplayValue": "Digital Ebook Purchas",
"Label": "ProductGroup",
"Locale": "en_US"
}
},
"Title": {
"DisplayValue": "Novacene: The Coming Age of Hyperintelligence",
"Label": "Title",
"Locale": "en_US"
}
}
}
],
"SearchURL": "https://www.amazon.com/s?k=9780262043649&rh=p_n_availability%3A-1&tag=getabstractcom&linkCode=osi",
"TotalResultCount": 1
}
}
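For reference, the two-request fallback mentioned above can be sketched as follows. This does not solve the one-request goal; it only shows how the payloads from examples a) and b) could be generated and how the two responses could be merged. The function names are hypothetical, the actual sending and signing of the requests is left out, and the responses are trimmed copies of the ones shown above.

```javascript
// Hypothetical helper: one SearchItems payload per search index, same shape as in a) and b).
function buildSearchPayloads(isbn, partnerTag) {
  return ["Books", "KindleStore"].map((searchIndex) => ({
    Keywords: isbn,
    Resources: ["ItemInfo.Classifications", "ItemInfo.Title"],
    SearchIndex: searchIndex,
    PartnerTag: partnerTag,
    PartnerType: "Associates",
    Marketplace: "www.amazon.com",
    Operation: "SearchItems",
  }));
}

// Hypothetical helper: collect ASINs from several responses, keyed by the Binding display value.
function asinsByBinding(responses) {
  const result = {};
  for (const response of responses) {
    const items = (response.SearchResult && response.SearchResult.Items) || [];
    for (const item of items) {
      const binding = item.ItemInfo.Classifications.Binding.DisplayValue;
      result[binding] = item.ASIN;
    }
  }
  return result;
}

// Trimmed copies of the responses from examples a) and b).
const booksResponse = { SearchResult: { Items: [
  { ASIN: "0262043645", ItemInfo: { Classifications: { Binding: { DisplayValue: "Hardcover" } } } },
] } };
const kindleResponse = { SearchResult: { Items: [
  { ASIN: "B08BT4MM18", ItemInfo: { Classifications: { Binding: { DisplayValue: "Kindle Edition" } } } },
] } };

const payloads = buildSearchPayloads("9780262043649", "PARTNER_TAG_PLACEHOLDER");
console.log(payloads.map((p) => p.SearchIndex)); // [ 'Books', 'KindleStore' ]
console.log(asinsByBinding([booksResponse, kindleResponse]));
// { Hardcover: '0262043645', 'Kindle Edition': 'B08BT4MM18' }
```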
Research/Further information
Documentation of the API
Search Index of Amazon
Scratchpad
Many thanks for any advice.

FB Graph API - Filtering results by specific IDs?

I am making a request to a specific node and edge using the graph API:
https://graph.facebook.com/v2.6/NODE_ID/EDGE_NAME
Example:
https://graph.facebook.com/v2.6/00000000000000/reports
which returns the results below:
"data": [
{
"id": "111111111111111",
"name": "Report A"
},
{
"id": "22222222222222",
"name": "Report B"
},
{
"id": "33333333333333",
"name": "Report C"
}
]
The above is literally returning a list of reports by id/name that exist under a specific company.
If I want to filter the results by specific reports, how can I go about doing this?
I tried variations such as the below, but they haven't worked and still return all reports:
https://graph.facebook.com/v2.6/00000000000000/reports?ids=22222222222222
I know I can use the report ID as the node to access it directly:
https://graph.facebook.com/v2.6/22222222222222/
But I want to view the properties of a subset of reports that belong to the company, so I was thinking I could build an array to do this.
https://graph.facebook.com/v2.6/00000000000000/reports?ids=22222222222222,33333333333333
Expected Result:
"data": [
{
"id": "111111111111111",
"name": "Report A"
},
{
"id": "22222222222222",
"name": "Report B"
},
{
"id": "33333333333333",
"name": "Report C"
}
]
This seems like it should work based on the below documentation, but it does not...
https://developers.facebook.com/docs/graph-api/using-graph-api
Could it be because the edge I'm accessing isn't able to recognize these IDs for some reason...? I know it's hard to say without knowing what I'm doing, but I can't disclose fully as it's proprietary...
Any advice is appreciated.
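Two possible workarounds, sketched under assumptions: filter the edge response on the client, or use the Graph API's multiple-ID lookup, where the ids parameter is placed on the root path rather than on the edge. Whether the root lookup works for these particular report nodes is an assumption, since the edge is proprietary. The function names are hypothetical and the access token is omitted.

```javascript
// Hypothetical helper: keep only the entries of an edge response whose id is wanted.
function filterByIds(response, ids) {
  const wanted = new Set(ids);
  return response.data.filter((entry) => wanted.has(entry.id));
}

// Hypothetical helper: multiple-ID lookup URL; note that ids sits on the root path, not on the edge.
function multiIdUrl(ids) {
  return "https://graph.facebook.com/v2.6/?ids=" + ids.join(",");
}

// The edge response from the question.
const edgeResponse = {
  data: [
    { id: "111111111111111", name: "Report A" },
    { id: "22222222222222", name: "Report B" },
    { id: "33333333333333", name: "Report C" },
  ],
};

const subset = ["22222222222222", "33333333333333"];
console.log(filterByIds(edgeResponse, subset).map((r) => r.name)); // [ 'Report B', 'Report C' ]
console.log(multiIdUrl(subset));
```

Client-side filtering still transfers the full list, so it mainly helps when the list of reports is short.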

How do I use JSON with U2/Universe

The U2/UniVerse JSON (UDO) functions include UDOSetProperty. How would one set a property if it has multiple values, for example if I have multiple emails?
Example: UDOSetProperty(udoHandle, "to", value)
"to": [
{
"email": "recipientEmail@example.com",
"name": "Recipient Name",
"type": "to"
}
],
Not sure if you are trying to add another "to" array element or if you want to add a 2nd "email" only.
So working with your example:
"to": [
{
"email": "recipientEmail@example.com",
"name": "Recipient Name",
"type": "to"
},
{
"email": "recipient2Email@example.com",
"name": "Recipient2 Name",
"type": "to"
}
],
If you wanted to create the above JSON from scratch with the UDO commands, the following functions should help, in this order:
Create the initial/root object: UDOCreate(UDO_OBJECT, udoHandle)
Create the array: UDOCreate(UDO_ARRAY, thisArray)
Use UDOCreate and UDOSetProperty to create theEmailObject you want to add to the array, then append it to the array with UDOArrayAppendItem(thisArray, theEmailObject)
Add the array to the root object with UDOSetProperty(udoHandle, "to", thisArray)
Note that the important part is that there are several dedicated functions for dealing with arrays.
Mike
I created a program that builds the JSON with the U2 UDO functions and added it to GitHub:
https://github.com/RocketSoftware/multivalue-lab/blob/master/U2/Demos/UDO/JSON/The-Basics/arrayExample
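As a language-neutral illustration of the construction order described in the steps above, the following sketch mirrors them in plain JavaScript. It is not UniBasic and does not call the UDO API; the comments only name the UDO function each line stands in for, and `makeRecipient` is a made-up helper.

```javascript
// Plain JavaScript mirror of the UDO steps; each comment names the UDO call it stands in for.
const root = {};        // UDOCreate(UDO_OBJECT, udoHandle)
const toArray = [];     // UDOCreate(UDO_ARRAY, thisArray)

// Stands in for UDOCreate(UDO_OBJECT, ...) followed by one UDOSetProperty per field.
function makeRecipient(email, name) {
  return { email: email, name: name, type: "to" };
}

// UDOArrayAppendItem(thisArray, theEmailObject), once per recipient
toArray.push(makeRecipient("recipientEmail@example.com", "Recipient Name"));
toArray.push(makeRecipient("recipient2Email@example.com", "Recipient2 Name"));

// UDOSetProperty(udoHandle, "to", thisArray)
root.to = toArray;

console.log(JSON.stringify(root, null, 2));
```

The point of the ordering is that each recipient is a complete object before it is appended, and the array is attached to the root only once.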

Writing a simple group by with map-reduce (Couchbase)

I'm new to the whole map-reduce concept, and I'm trying to write a simple map-reduce function.
I'm currently working with Couchbase Server as my NoSQL database.
I want to get a list of all my types:
key: 1, value: null
key: 2, value: null
key: 3, value: null
Here are my documents:
{
"type": "1",
"value": "1"
}
{
"type": "2",
"value": "2"
}
{
"type": "3",
"value": "3"
}
{
"type": "1",
"value": "4"
}
What I've been trying to do is:
Write a map function:
function (doc, meta) {
emit(doc.type, 0);
}
Using built-in reduce function:
_count
But I'm not getting the expected result.
How can I get all types?
UPDATE
Please note that the types are in different documents; as I understand it, reduce works on a single document and doesn't execute outside of it.
By default it will reduce all key groups into one. The feature you want is called group_level.
This is the equivalent of reduce=true:
~ $ curl 'http://localhost:8092/so/_design/dev_test/_view/test?group_level=0'
{"rows":[
{"key":null,"value":4}
]
}
But here is how you can get the reduction by the first level of the key:
~ $ curl 'http://localhost:8092/so/_design/dev_test/_view/test?group_level=1'
{"rows":[
{"key":"1","value":2},
{"key":"2","value":1},
{"key":"3","value":1}
]
}
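To see why the two queries above return different rows, here is a small local simulation of a `_count` reduce over the rows the map function emits for the four sample documents. It is only a sketch of the grouping idea, not Couchbase code; `countByGroupLevel` is a hypothetical name, and it handles only the scalar keys used here, not compound array keys.

```javascript
// Hypothetical simulation of _count with group_level for scalar (non-array) keys.
function countByGroupLevel(rows, groupLevel) {
  // View rows are ordered by key, so sort a copy first.
  const sorted = rows.slice().sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  const counts = new Map();
  for (const row of sorted) {
    // group_level=0 puts every row into one group; a higher level groups by the key itself.
    const groupKey = groupLevel === 0 ? null : row.key;
    counts.set(groupKey, (counts.get(groupKey) || 0) + 1);
  }
  return Array.from(counts, ([key, value]) => ({ key: key, value: value }));
}

// Rows produced by emit(doc.type, 0) for the four sample documents.
const emitted = [
  { key: "1", value: 0 },
  { key: "2", value: 0 },
  { key: "3", value: 0 },
  { key: "1", value: 0 },
];

console.log(countByGroupLevel(emitted, 0)); // [ { key: null, value: 4 } ]
console.log(countByGroupLevel(emitted, 1));
// [ { key: '1', value: 2 }, { key: '2', value: 1 }, { key: '3', value: 1 } ]
```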
There is also a blog post about this: http://blog.couchbase.com/understanding-grouplevel-view-queries-compound-keys
There is also a corresponding option in the Couchbase admin console.