Couchbase view using “WHERE” clause dynamically - mapreduce

I have Json Documents in the following format
Name :
Class :
City :
Type :
Age :
Level :
Mother :
Father :
I have a map function like this
function(doc,meta)
{
emit([doc.Name,doc.Age,doc.Type,doc.Level],null);
}
What I can do is give "name" and filter out all results but what I also want to do is give "age" only and filter out on that. For that couchbase does not provide functionality to skip "Name" key. So I have to create a new map function which has "Age" as first key, but I also have to query on only "Level" key also so like this. I would have to create many map functions for each field which obviously is not feasible so is there anything I can do apart from making new map function to achieve this type of functionality?
I can't us n1ql because I have 150 million documents so it will take a lot of time.

First of all - that is not a very good reduce function.
it does not have any filtering
function header should be function(doc, meta)
if you have mixture between json and binary objects - add meta.type == "json"
Now for the things you can do:
If you are using v4 and above (v4.1 in much more recommended) you can use N1QL and use it very similar to SQL language. (I didn't understand why you can't use n1ql)
You can emit multiple items in multiple order
i.e. if I have doc in the format of
{
"name": "Roi",
"age": 31
}
I can emit to the index two values:
function (doc, meta) {
if (meta.type=="json") {
emit(doc.name, null);
emit(doc.age, null);
}
}
Now I can query by 2 values.
this is much better than creating 2 views.
Anyway, if you have something to filter by - it is always recommended.

Related

DynamoDB equivalent of SQL's key or attribute 'not in("value1", "value2"...)'

I am trying to run some dynamoDB operations with AWS.DynamoDB.DocumentClient from aws-sdk module but I am unable to find an easy solution to select items where an attribute is not equals to an array of values.
e.g
attribute <> ["value1", "value2]
This is equivalent to a simple typical SQL operation in the form of:
select * from sometable where attribute not in("value1", "value2"...);
After trying out different ScanFilter and QueryFilter following the documentation here, it seems that the AttributeValueList for NE and NOT_CONTAINS does not accept multiple values.
I wish to arrive at the results as shown below without having to define multiple 'AND' queries
I have since arrived at this solution but it seems clumsy and I would have to write logic to create the filter condition string and ExpressionAttributeValues as the filter condition is dynamic.
FilterExpression: 'answer <> :answer1 AND answer <> :answer2',
ExpressionAttributeValues : {
':answer1' : "test1",
':answer2' : "test2"
}
I have therefore, 2 questions:
Is there a better way of doing this?
Is there a length limit to the string of KeyConditionExpression? I
am very sure there is but I cannot seem to find information with
regards to this.
There is no other way to achieve what you need. If all these values have something in common, and you know it in the write time, you can insert them with some kind of prefix and create GSI where they will be a sort key. In this case you'll be able to query them by prefix in the key condition expression. Otherwise, what you've suggested is your only option.
4KB for all of the expressions combined. As described in the Expression Parameters:
Expression parameters include ProjectionExpression, ConditionExpression, UpdateExpression, and FilterExpression.
The maximum length of any expression string is 4 KB. For example, the size of the ConditionExpression a=b is 3 bytes.

How to query a DynamoDB table to retreive only items which have a specific value inside an string-set attribute?

The items in my table have an attribute of type string set. I'll stick to the example from the documentation and call the set "colors". As the name indicates the set holds various strings representing colors in each item. This would look like
this.
Now I want to query the table so that I retrieve all items where a specific color is within the set. So in regards to the attached picture I would like to query for the color "Green" and want to receive the items Picture2 and Picture3.
Is there a way to do this?
Since the amount of all possible colors and items is huge plus the fact that only a very small amount of colors are associated to an item, a scan would be very inefficient. So far I tried to create a global secondary index (GSI) but it seems that its not possible in the way I want it or am I wrong?
Unless the field you are searching for is built into the primary key or secondary index, scan will be your only option.
The scan operation will allow you to use the contains keyword to search the set
let params = {
TableName : 'TABLE_NAME',
FilterExpression: "contains(#color, :color)",
ExpressionAttributeNames: {
"#color": "color",
},
ExpressionAttributeValues: {
":color": "Blue",
}
};
documentClient.scan(params, function(err, data) {
console.log(data);
});
According to the docs on secondary indexes, you cannot build an index using a set as the primary key
The key schema for the index. Every attribute in the index key schema must be a top-level attribute of type String, Number, or Binary. Other data types, including documents and sets, are not allowed.

dynamodb - scan items where map contains a key

I have a table that contains a field (not a key field), called appsMap, and it looks like this:
appsMap = { "qa-app": "abc", "another-app": "xyz" }
I want to scan all rows whose appsMap contains the key "qa-app" (the value is not important, just the key). I tried something like this but it doesn't work in the way I need:
FilterExpression = '#appsMap.#app <> :v',
ExpressionAttributeNames = {
"#app": "qa-app",
"#appsMap": "appsMap"
},
ExpressionAttributeValues = {
":v": { "NULL": True }
},
ProjectionExpression = "deviceID"
What's the correct syntax?
Thanks.
There is a discussion on the subject here:
https://forums.aws.amazon.com/thread.jspa?threadID=164470
You might be missing this part from the example:
ExpressionAttributeValues: {":name":{"S":"Jeff"}}
However, just wanted to echo what was already being said, scan is an expensive procedure that goes through every item and thus making your database hard to scale.
Unlike with other databases, you have to do plenty of setup with Dynamo in order to get it to perform at it's great level, here is a suggestion:
1) Convert this into a root value, for example add to the root: qaExist, with possible values of 0|1 or true|false.
2) Create secondary index for the newly created value.
3) Make query on the new index specifying 0 as a search parameter.
This will make your system very fast and very scalable regardless of how many records you get in there later on.
If I understand the question correctly, you can do the following:
FilterExpression = 'attribute_exists(#0.#1)',
ExpressionAttributeNames = {
"#0": "appsMap",
"#1": "qa-app"
},
ProjectionExpression = "deviceID"
Since you're not being a bit vague about your expectations and what's happening ("I tried something like this but it doesn't work in the way I need") I'd like to mention that a scan with a filter is very different than a query.
Filters are applied on the server but only after the scan request is executed, meaning that it will still iterate over all data in your table and instead of returning you each item, it applies a filter to each response, saving you some network bandwidth, but potentially returning empty results as you page trough your entire table.
You could look into creating a GSI on the table if this is a query you expect to have to run often.

How to use MapReduce when extracting a group of document id's by some criteria from CouchDB

I'm in my first week of CouchDB experimentation and trying to stop thinking in SQL. I have a collection of documents (5000 event files) that all have some ID value that will be common to groups of documents. So there might be 10 that all have TheID: 'foobar'.
(In case someone asks - TheID is not an auto-increment value from a relational database - it is a unique id assigned by a partner company of ours. I cannot redesign my source data to identify itself some other way, I have to use this TheID field to recognise groups of documents.)
I want to query my list of documents:
{ _id: 'document1', Message: { TheID: 'foobar' } }
{ _id: 'document2', Message: { TheID: 'xyz' } }
{ _id: 'document3', Message: { TheID: 'xyz' } }
{ _id: 'document4', Message: { TheID: 'foobar' } }
{ _id: 'document5', Message: { TheID: 'wibble' } }
{ _id: 'document6', Message: { TheID: 'foobar' } }
I want the results:
'foobar': [ 'document1', 'document4', 'document6' ]
'xyz': [ 'document2', 'document3' ]
'wibble': [ 'document5' ]
The aim is to represent groups of documents on our UI grouped by TheID, so the user can see all documents for a specific TheID together, and select that TheID to drill into the data querying just by that TheID value. Yes, the string id of each document is useful - in our case, the _id value of each document is the source event identifier, so it is a unique and useful value that the user is going to want to see in the list on screen.
In SQL one might order by or group by the TheID field and iterate the result set appropriately. I doubt this thinking is any use at all with a CouchDB query.
I know that I can use a map function to extract the TheID value for each document, for example:
function (doc) {
emit(doc.Message.TheID, 1);
}
or perhaps
function (doc) {
emit(doc._id, doc.Message.TheID);
}
I'm not sure exactly what I should emit as the key and value. Even if this is useful, I'm getting the feeling that I should not use a reduce function to try to 'reduce' the large map output (1 result row per document in the database) to what I want (3 results each with a list of document id's).
http://guide.couchdb.org/draft/views.html says "A common mistake new CouchDB users make is attempting to construct complex aggregate values with a reduce function. Full reductions should result in a scalar value, like 5, and not, for instance, a JSON hash with a set of unique keys and the count of each."
I thought I might be able to use reduce to scan the results of the map and somehow collect all results that have a common TheID value into a single result object. What I see when reading the reduce documentation is that it will be given arrays of keys and values that contain fairly unpredictable collections, driven by the structure of the btree underlying the map results. It won't be given arrays guaranteed to contain all similar TheID values that I could scan for. This approach seems completely broken.
So, is a map/reduce pair the right thing to do here? Should I look at using a 'show' or 'list' instead? I'm intending to build a mustache based HTML template engine around the results, so 'list' seems the wrong way to go.
Thanks in advance for any guidance.
EDIT I have done some local dev and come up with what I think is a broken solution. Hopefully this will show you the direction I'm trying to go in. See a public cloud based CouchDB I created at https://neek.iriscouch.com/_utils/database.html?test/_design/test/_view/collectByTheID
This is public. If you would like to play, please copy it to a new view, don't pollute this one in case others come in and want to see the original.
map function:
function(doc) {
emit(doc.Message.TheID, doc._id);
}
reduce function:
function(keys, values, rereduce) {
if (!rereduce) {
return values;
} else {
var ret = [];
values.forEach(function (ar) {
ret.concat(ar);
});
return ret;
}
}
Results:
"foobar" ["document6", "document4", "document1"]
"wibble" ["document5"]
"xyz" ["document3", "document2"]
The reduce function first leaves the array of values alone, and on the second pass concatenates them together. However when I run this on my large 5000+ document database it comes up with some TheID values with empty document id arrays. I believe this suffers from the problem I mentioned before, where the array of values passed to reduce are build dependent on the btree structure of the map they are extractd from and are not guaranteed to contain a complete set of values for given keys.
Make use of the group_level feature:
Map:
emit([doc.message.TheID, doc._id], null)
Reduce:
You must include a reduce to use group_level, it can be empty as below or something else, i.e. _count
function(keys, values){
return null;
}
A query with group_level=1 would return:
/_design/d/_view/v?group_level=1
[
{key: ["foobar"], value: null},
{key: ["xyz"], value: null},
{key: ["wibble"], value: null}
]
You would use this query to populate the top level in your grouping UI. When the user expands a category, you would do another query with group_level 2 and start and end keys:
/_design/d/_view/v?group_level=2&startkey=["foobar"]&endkey=["foobar",{}]
[
{key: ["foobar", "document6"], value: null},
{key: ["foobar", "document4"], value: null},
{key: ["foobar", "document1"], value: null}
]
This doesn't produce the output exactly as you are requesting, however, I think you'll find it flexible enough

Extjs dynamic filter

I have created a filter that contains unik values of column:
{
header : 'Вопрос А',
dataIndex: 'answerA',
itemId: 'answerA',
width : 100,
filter: {
type: 'list',
options: this.getStore().collect('answerA'),
},
editor: 'textfield'
}
When the user inputs new value in the cell of the column this value must appear in the filter. How can I do this?
I have looked at Extjs Grid Filter - Dynamic ListFilter but it doesn't help me much.
You should carefully look at StringFilter.js file which is at ux/grid/filter. I did this myself by adding a combobox to each filter. On combo expand I update its store by parsing the corresponding column data. Besides I advise you to collect unique values, for this task use Ext.Array.unique function. Otherwise, if two cells contain the same data both values will go to the store, which is not good, because any of them will produce the same filtration. I do not attach my code, because it is heavily dependent on my custom data types, but the idea is quite easy, I think.