CouchDB - Get one document lesser than and greater than given key in single request - mapreduce

I have CouchDB database containing documents with key as timestamp and some value associated with it. Example
[
{"_id": "2018/03/11 10:23:00 +000", "value": "..."},
{"_id": "2018/03/11 14:25:02 +000", "value": "..."},
{"_id": "2018/03/12 09:29:12 +000", "value": "..."},
...
]
Given a timestamp I need to get one document which is lesser than or equal to the given timestamp and one doc which is greater than the given timestsamp.
For example, given TS: "2018/03/11 23:00:00 +000" i it should return
[
{"_id": "2018/03/11 14:25:02 +000", "value": "..."},
{"_id": "2018/03/12 09:29:12 +000", "value": "..."},
]
The way I know it is to do two different requests to get the greater one and lesser one independently. Is there a way to do this in one request?

Related

Error when importing CSV file into Amazon Personalize

I am trying to import a CSV file into Amazon Personalize
my schema looks like this:
{
"type": "record",
"name": "Items",
"namespace": "com.amazonaws.personalize.schema",
"fields": [
{
"name": "ITEM_ID",
"type": "string"
},
{
"name": "AUTHOR",
"type": "string",
"categorical": true
},
{
"name": "COUNTRY",
"type": "string",
"categorical": true
},
{
"name": "CITY",
"type": "string",
"categorical": true
},
{
"name": "STYLES",
"type": "string",
"categorical": true
},
{
"name": "CATEGORIES",
"type": "string",
"categorical": true
}
],
"version": "1.0"
}
the first few rows of data look like this:
ITEM_ID,AUTHOR,COUNTRY,CITY,STYLES,CATEGORIES
5b4253a7e12434f55875381e,5acd193f48ed4b9b3add5be6,US,city_us_austin,5ad45bc575eb016f3cdb562b|571aa21888a4fd9934f0fd7b|571aa21888a4fd9934f0fd79|5ad45e8c75eb016f3cdb563f|5b4ea35abaa12285687a1f47,593a866a082c26444eab2d3c|5a8e4820fc112d414fbc1be3
5b4253a7e12434f55875381f,5acd193f48ed4b9b3add5be6,US,city_us_jackson,571aa21888a4fd9934f0fd82|57600e419e4959cd069658eb|5ad45c3a75eb016f3cdb5631|571aa21888a4fd9934f0fd7b|57aaa7094a393f531ace43f0|575e6d8e34ca56f742bea1c8|571aa21888a4fd9934f0fd8f,593a866a082c26444eab2d3c|5a8e4820fc112d414fbc1be3
I get the error
Failed to create a data import job for item dataset.
Input csv has rows that do not conform to the dataset schema. Please ensure all required data fields are present and that they are of the type specified in the schema.
How can I figure out what is wrong with the CSV (it's thousands of lines long), so I have not idea if its a general mistake, or something wrong on a specific line?
In my experience, so long as the dataset is not >250 thousand records, you can still use Excel to check the data utilizing data filters and corresponding search functions. If it's more than that, look into using Notepad++ and RegEx. Your problem may be one of the following things:
(1) There's a missing comma. This would misalign your data and keep it from being processed.
(2) There's a missing ITEM_ID value. For Items, Personalize requires ITEM_ID and at least one metadata field. It might give this error if there is an instance where you are missing ITEM_ID or have ITEM_ID but no other metadata field values.
(3) STYLES and/or CATEGORIES exceeds 256 characters. There is probably a limit on String length, but I can't get a clear answer on this from the developer's guide. I would guess it's 256 characters. If I was betting money, this would be my guess on your problem.
Here is a different approach to solve the problem, maybe will be useful for other cases. I had the same issue, but when dealing with int columns having null values. Pandas by default converts the columns to float data type - something AWS Personalize dataset import job will not accept if you have dedfined these columns as int or long. Long story short, converting these columns to int solves the problem:
df.column_name = df.column_name.astype(pd.Int32Dtype())

Watson Assistant: Problem with extracting value for pattern entity

I am trying to get the value for the first group match of a pattern entity from the json response of Watson Assistant. The pattern is a simple regex to recognize sequences of numbers: ([0-9]+)
The json response looks like this:
"entity": "ID",
"location": [
18,
23
],
"value": "id",
"confidence": 1.0,
"groups": [
{
"group": "group_0",
"location": [
18,
23
]
}
]
},
{
"entity": "sys-number",
"location": [
18,
23
],
"value": "12345",
"confidence": 1.0,
"metadata": {
"numeric_value": 12345.0
}
}
]
So, the group is matched alright, but the field "value" is populated with the String literal from the entity config. I would expected to find the actual value there (which is the one the value field of the next entity, sys-number).
How do I need to change the config so that the value is included as-is in the value field (or somewhere else) and so that I don't have to extract the entity from the text string using the location values? Is it possible at all?
Thanks a lot
Cheers,
Martin
To access value of pattern based entity, you can either use <? #entity_name.literal ?> or <? #entity_name.groups[0] ?> - if there are more groups captured. You can find more info in the doc: https://cloud.ibm.com/docs/services/assistant?topic=assistant-entities

CouchDB Map syntax

I have documents in CouchDB (v. 2.1.1) as follows:
{
"xyz": "a",
"abc": "def"
},
{
"xyz": "a",
"ghi": "jkl"
},
{
"xyz": "a",
"mno": "pqr"
},
{
"xyz": "a",
"stu": "vwx"
},
{
"xyz": "a",
"bcd": 1000
}
If I run a simple map function, for example:
function (doc) {
if (doc.xyz ){
emit(doc.xyz, doc.abc);}}
I get:
{
"id": "4c3406a1d92942b4fb10d1314e0061a9",
"key": "a",
"value": "def"
},
{
"id": "4c3406a1d92942b4fb10d1314e006ccf",
"key": "a",
"value": null
},
{
"id": "4c3406a1d92942b4fb10d1314e00787f",
"key": "a",
"value": null
},
{
"id": "4c3406a1d92942b4fb10d1314e00871e",
"key": "a",
"value": null
},
{
"id": "4c3406a1d92942b4fb10d1314e00906a",
"key": "a",
"value": null
}
I want to try and eliminate the 'null' outputs.
I am looking at having a CouchDB database with many small documents containing small snippets of information rather than having larger documents containing much more information per document.
My question is, is my document design a good one and if so how do I get just what I am looking for rather than rows of 'nulls'. If my storage design is not ideal, what kind of design should I be looking at to simplify the output given my plan to have many small 'docs'.
EDIT:
Having looked at possible answers, I have decided that having numerous small documents as I described in my question is not giving me the kind of benefit I imangined they would.
I was unable to get a satisfactory solution to the map function to get readable answers.
However, I investigated the 'Mango' query system available in recent updates of CouchDB and I was able using these queries to get acceptable output from a database like my supplied one.
This is what I did:
curl -X POST http://admin:123#127.0.0.1:5984/ptn/_find -d '{"selector": {"$or": [{"abc": {"$gt": null}},{"ghi": {"$gt": null}}]},"fields": ["abc","ghi"]}' -H "Content-Type:application/json"
Un-minified:
{
"selector": {
"$or": [
{
"abc": {
"$gt": null
}
},
{
"ghi": {
"$gt": null
}
}
]
},
"fields": [
"abc",
"ghi"
]
}
The output:
{"docs":[
{"abc":"def"},
{"ghi":"jkl"}
]
.....
A concise answer.
Sorting can be done but sorted fields must be indexed. Indexing is in any case advised for larger data sets.
Reference:
http://docs.couchdb.org/en/2.1.1/api/database/find.html
As my question required a map function, this perhaps cannot be regarded as a valid answer but for me it is an answer. I have tried the 'Mango' query system a little on other databases and it seems to be more useful/powerful than I thought is was although it offers no means of totaling etc.

When predicting, what are the valid values for dataFormat?

Problem
Using the REST API, I have trained and deployed a model that I now want to use for prediction. I've defined the collections for prediction input and output and uploaded a json file formatted accordingly to the cloud storage. However, when trying to create a prediction job I cannot figure out what value to use for the dataFormat field, which is a required parameter. Is there any way to list all valid values?
What I've tried
My requests look like the one below. I've tried JSON, NEWLINE_DELIMITED_JSON (like when importing data into BigQuery), and even the json mime type application/json, in pretty much all different cases I can think of (upper and lower combined with snake, camel, etc.).
{
"jobId": "my_predictions_123",
"predictionInput": {
"modelName": "projects/myproject/models/mymodel",
"inputPaths": [
"gs://model-bucket/data/testset.json"
],
"outputPath": "gs://model-bucket/predictions/0/",
"region": "us-central1",
"dataFormat": "JSON"
},
"predictionOutput": {
"outputPath": "gs://my-bucket/predictions/1/"
}
}
All my attempts have only gotten me this back though:
{
"error": {
"code": 400,
"message": "Invalid value at 'job.prediction_input.data_format' (TYPE_ENUM), \"JSON\"",
"status": "INVALID_ARGUMENT",
"details": [
{
"#type": "type.googleapis.com/google.rpc.BadRequest",
"fieldViolations": [
{
"field": "job.prediction_input.data_format",
"description": "Invalid value at 'job.prediction_input.data_format' (TYPE_ENUM), \"JSON\""
}
]
}
]
}
}
From Cloud ML API reference document https://cloud.google.com/ml/reference/rest/v1beta1/projects.jobs#DataFormat, the data format field in your request should be "TEXT" for all text inputs (including JSON, CSV, etc).

How do I use JSON with U2/Universe

U2/Universe JSON document have the following UDOSetProperty, how would one set the value if it has multiple values? For example if I have multiple emails.
example: UDOSetProperty(udoHandle, "to", value)
"to": [
{
"email": "recipientEmail#example.com",
"name": "Recipient Name",
"type": "to"
}
],
Not sure if you are trying to add another "to" array element or if you want to add a 2nd "email" only.
So working with your example:
"to": [
{
"email": [ "recipientEmail#example.com",
"name": "Recipient Name",
"type": "to"
},
{
"email": [ "recipient2Email#example.com",
"name": "Recipient2 Name",
"type": "to"
}
],
If you wanted to create the above JSON from scratch, with the UDO commands, the steps would be:
Using the following functions should help you with what you are trying to do:
Create the initial/root object UDOCreate(UDO_OBJECT,
udoHandle)
Create the array UDOCreate(UDO_ARRAY,
thisArray)
( Use UDOCreate and UDOSetProperty to create the theEmailObject you
want to add to the array, and then add it to the object with
UDOArrayAppendItem( thisArray, theEmailObject )
Then add the array to the root object eith UDOSetProperty(udoHandle,
"TO", thisArray)
Note the part that is important is that there are several functions for dealing with arrays.
Mike
Created a program that builds the JSON with the U2 UDO functions, and added it to github:
https://github.com/RocketSoftware/multivalue-lab/blob/master/U2/Demos/UDO/JSON/The-Basics/arrayExample