How to write CouchDB views? - mapreduce

I have a list of documents like this one in my CouchDB database:
{
"_id": "9",
"_rev": "1-f5a9a0b76c6ae1fe5e20f1a1f9e6f8ba",
"Project": "Vaibhava",
"Type": "activity",
"Name": "Civil_Clearence",
"PercentComplete": "",
"DateAndTime": "",
"SourcePMSId": "1049",
"ProgressUpdatedToPMSFlag": "NO",
"UserId": "Kundan",
"ParentId": "5"
}
How do I write a view function so that, when I pass a doc._id as the key, I get all siblings of that document (the documents with the same ParentId)?

As said in another answer, it is not possible to do that with a single request.
However, you can do the following instead:
Define a map view (with no reduce) indexed on ParentId:
function(o) {
  // Index each document under its parent's ID; field names are case-sensitive.
  if (o.ParentId) {
    emit(o.ParentId, null);
  }
}
First, request the document itself to learn the ID of its parent:
GET /myDatabase/myObject
Then query the view with that parent ID as the key:
GET /myDatabase/_design/myApp/_view/myView?key="itsParent"&include_docs=true
Needing several requests should not cause much harm here, since their number (2) is constant.
Moreover, you can hide them behind a single request handled by Node.js.
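
The two-request flow above can be sketched in Python as follows. This is only an illustration: `fetch_json` is a hypothetical callable (not part of any CouchDB client) that performs a GET against the server and returns the decoded JSON, and the design document and view names are the placeholders used in this answer.

```python
import json
from urllib.parse import quote


def find_siblings(fetch_json, db, doc_id, design="myApp", view="myView"):
    """Return every document that shares a parent with `doc_id`.

    `fetch_json` is a hypothetical helper: it takes a URL path and returns
    the decoded JSON body of a GET request against the CouchDB server.
    """
    # Request 1: read the document to learn its parent's ID.
    doc = fetch_json(f"/{quote(db)}/{quote(doc_id, safe='')}")
    parent_id = doc["ParentId"]

    # Request 2: query the view. View keys are JSON values, so the string
    # is JSON-encoded (wrapped in quotes) and then URL-encoded.
    key = quote(json.dumps(parent_id), safe="")
    result = fetch_json(
        f"/{quote(db)}/_design/{design}/_view/{view}"
        f"?key={key}&include_docs=true"
    )
    return [row["doc"] for row in result["rows"]]
```

Note that the result includes the document you started from, since its own ParentId matches the key; filter it out if you only want the other siblings.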

Unfortunately, you would need to chain together two map-reduce functions to achieve this result and that functionality is not available in CouchDB. See this question for further information.


How to add a map to a map array in AWS DynamoDB only when the id does not exist?

Here is my DynamoDB structure.
{"books": [
{
"name": "Hello World 1",
"id": "1234"
},
{
"name": "Hello World 2",
"id": "5678"
}
]}
I want to set a ConditionExpression that checks whether the id already exists before adding a new item to the books array. Here is my ConditionExpression; I am using API Gateway to access DynamoDB.
"ConditionExpression": "NOT contains(#lu.books.id,:id)",
"ExpressionAttributeValues": {":id": {
"S": "$input.path('$.id')"
}
}
Result when I test the API: whether or not the id exists, the item is added to the array.
Any suggestions on how to do this? Thanks!
Unfortunately, you can't. However, there is a workaround.
Store the books in separate rows. For example:
PK             SK
BOOK_LU#<ID>   BOOK_NAME#<book name>#BOOK_ID#<BOOK_ID>
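Now you can use the 'attribute_not_exists' condition function when putting a book, so that the write fails if an item with the same key already exists (if_not_exists is only valid in the SET clause of an update expression, not in a condition expression):
"ConditionExpression": "attribute_not_exists(PK)"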
The con is that if you were previously fetching the list as part of another object, you will have to change that.
The pro is that you can now easily work with the books, and you won't hit the maximum row (item) size limit if the list of books grows too large.
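
A minimal sketch of the one-item-per-book idea, written as a function that builds the parameters of a DynamoDB PutItem request. It makes two assumptions that are not in the answer above: the sort key is derived from the book ID alone (a simplification, so that a repeated ID maps to the same key), and the condition is `attribute_not_exists(PK)`, the condition function that succeeds only when no item with that primary key exists yet. The function name and table name are hypothetical.

```python
def build_put_book_request(table_name, lu_id, book_id, book_name):
    """Build PutItem parameters that add a book only if its key is unused."""
    return {
        "TableName": table_name,
        "Item": {
            "PK": {"S": f"BOOK_LU#{lu_id}"},
            # Assumption: the sort key depends only on the book ID.
            "SK": {"S": f"BOOK_ID#{book_id}"},
            "name": {"S": book_name},
        },
        # Checked against the existing item with the same PK and SK, if any.
        "ConditionExpression": "attribute_not_exists(PK)",
    }
```

When an item with the same key already exists, DynamoDB rejects the write with a ConditionalCheckFailedException instead of overwriting it. Because the condition is evaluated only against the item with the same PK and SK, a sort key that also contains the book name would let the same ID in again under a different name.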

How can users access my Elasticsearch database in my Django SaaS?

Let's say I have a SaaS with a Django backend that processes users' data and writes everything to Elasticsearch. I would now like to let users search and request their data stored in ES using every search request that ES supports. Obviously, a user should only have access to their own data, not to other users' data. I am aware that this can be done in many different ways, but which solution is safe and best? At the moment I store everything in one index and type, as shown below, but I can change this.
"_index": "example_index",
"_type": "example_type",
"_id": "H2s-lGsdshEzmewdKtL",
"_score": 1,
"_source": {
"user_id": 1,
"field1": "example1",
"field2": "example2",
"field3": "example3"
}
I think the best way would be to associate every document with a user_id. The user would send, for example, a GET request with a body and an Authorization header containing a token. I would use the token to look up the user's ID, for example like this:
key = request.META.get('HTTP_AUTHORIZATION').split()[1]
user_id = Token.objects.get(key=key).user_id
After this I would forward the request to ES so that only data that matches the query and belongs to this user is returned. I could do this as shown above, where each document also carries a user_id field, for example by using post_filter.
To every request I would add something like this:
,
"post_filter": {
"match": {
"user_id": 1
}
}
For example, the user sends a GET request with this body:
{
"query": {
"regexp": {
"tag": ".*example.*"
}
}
}
and I change it in my backend and forward the request to ES with this body:
{
"query": {
"regexp": {
"tag": ".*example.*"
}
},
"post_filter": {
"match": {
"user_id": 1
}
}
}
However, including this field in _source does not seem like a good idea to me, and I am almost sure this can be solved more efficiently than with post_filter. I see a lot of information about authorization in ES, but I can't find how to associate a document with a user_id and then search only that user's documents without post-filtering. Any ideas?
UPDATE
My current solution is shown below; however, as I mentioned, I don't believe it is optimal. If anyone has an idea how to solve this in the way described above, I would be grateful for help.
I send, for example:
{
"query": {
"regexp": {
"tag": ".*test.*"
}
}
}
In the Django backend I just do:
# Resolve the user from the token in the Authorization header.
key = request.META.get('HTTP_AUTHORIZATION').split()[1]
user_id = Token.objects.get(key=key).user_id
# Add the per-user post_filter to the client's search body.
body = json.loads(request.body)
body['post_filter'] = {"match": {"user_id": user_id}}
res = es.search(index="pictures", doc_type="picture", body=body)
# Return only the stored documents, not the ES metadata.
output = []
for hit in res['hits']['hits']:
    output.append(hit["_source"])
return Response(
    {'output': output},
    status=status.HTTP_200_OK)
As of Elasticsearch 7.1, basic security is included in the free version of Elasticsearch. Thanks to that, you can control your users' access per index.
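
For the single-index layout in the question, one possible alternative to post_filter is to wrap the client's query in a bool query and put the user restriction in its filter clause. The sketch below only rewrites the request body; the function name is hypothetical, and it assumes user_id is indexed as a numeric or keyword field so that a term query matches it exactly.

```python
import copy


def scope_to_user(body, user_id):
    """Return a copy of a search body whose query matches only one user's docs."""
    scoped = copy.deepcopy(body)
    # Fall back to match_all when the client sent no query at all.
    user_query = scoped.pop("query", {"match_all": {}})
    scoped["query"] = {
        "bool": {
            "must": [user_query],
            # Filter clauses restrict the hits without affecting scoring.
            "filter": [{"term": {"user_id": user_id}}],
        }
    }
    return scoped
```

The practical difference is that post_filter is applied after aggregations are computed, so aggregations in a client-supplied body would still see other users' documents, whereas a filter inside the query restricts both hits and aggregations. Any other top-level keys a client can send would still need to be reviewed before forwarding the body.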

PUT to an array in existing collection

The following is the schema of my API endpoint. I want to test a PUT using Postman, where I add a "skill".
How would you enter the information in Postman so that you are just passing a new "skill" to the model?
{
"_id": "579a6fa26a0b6484172ae284",
"firstname": "Max",
"lastname": "Headron",
"skills": [
{"skill":"Can Type", "level":"Great"},
{"skill":"Can Run", "level":"Good"}
]
}
Would you use dot notation in a form field? That is, instead of adding "firstname", which is at the top level of the object, you might enter "skills.skill" and the "value".
Hopefully this is a clearer description.
Just change the body of the request.
See https://www.getpostman.com/docs/requests.

Populating search results with meta data in Amazon CloudSearch

Unfortunately, Amazon CloudSearch does not support nested JSON, meaning that the below document structure is not valid.
[{
"type": "add",
"id": 1,
"fields": {
"company_name": "My Company",
"services": [
{
"id": 123,
"name": "Construction",
"logo": "logo1.png"
},
{
"id": 456,
"name": "Programming",
"logo": "logo2.png"
}
]
}
}]
Basically I cannot nest an array of objects under the services key. In this particular scenario, only the nested name field has to be searchable, so what I could do is the following:
[{
"type": "add",
"id": 1,
"fields": {
"company_name": "My Company",
"services": [ "Construction", "Programming" ]
}
}]
The above JSON is valid, and I can still search for the service names. However, now I have lost some meta data about my services that I need when displaying the search results. Is there any way in which I can add the meta data to the document in Amazon CloudSearch and have it returned with my search results, such that I can use it when displaying the results?
Or do I have to fetch this additional meta data from my database afterwards to populate the search results with the additional data required to display the results? This does not seem feasible because it complicates my code much more than if I could fetch this data straight from CloudSearch. This would also impact the performance of the search, even though I could use caching - but I kind of want to avoid that if possible, because I don't need it for anything else right now.
So my questions are:
Can I somehow add the meta data for services to the CloudSearch documents and have it returned with my search results?
If not, should I then extract this data from my data store upon receiving the search results from CloudSearch?
Do you have any other solutions or ideas? Are there any best practices with this?
Thank you in advance!
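
The flattening described above, together with the second option (re-attaching the metadata from the data store after the search returns), can be sketched as follows. Both function names are hypothetical, and `services_by_name` stands in for whatever lookup the primary data store provides; the sketch also assumes service names are unique, which the question does not state.

```python
def to_cloudsearch_doc(company):
    """Flatten a company record into an "add" operation with service names only."""
    return {
        "type": "add",
        "id": company["id"],
        "fields": {
            "company_name": company["company_name"],
            # Only the searchable part of each service is indexed.
            "services": [service["name"] for service in company["services"]],
        },
    }


def attach_service_metadata(hit_fields, services_by_name):
    """Return a copy of a hit's fields with full service records re-attached."""
    enriched = dict(hit_fields)
    enriched["services"] = [
        services_by_name[name] for name in hit_fields["services"]
    ]
    return enriched
```

With this split, CloudSearch only ever sees the flat list of names, and the display metadata is merged in by one dictionary lookup per service.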

Returning record(s) after store pushPayload call

Is there a better way to return the record(s) after DS.Store#pushPayload is called? This is what I'm doing...
var payload = { id: 1, title: "Example" };
store.pushPayload('post', payload);
return store.getById('post', payload.id);
But, with regular DS.Store#push you get the inserted record returned. The only difference between the two, from what I can tell, is that DS.Store#pushPayload serializes the payload data with the correct serializers.
DS.Store#pushPayload is able to take an array of items, not just one, and may contain side-loaded data. It processes a full payload and expects root keys in the payload:
{
"posts": [{
"id": 1,
"title": "title",
"comments": [1]
}],
"comments": [
//.. and so on ...
]
}
DS.Store#push expects a single record which has been normalized and contains no side loaded data (notice there is no root key):
{
"id": 1,
"title": "title",
"comments": [1]
}
For this reason, it makes sense for push to return the record, but for pushPayload to return nothing.
When you use pushPayload, a second lookup with store.find('post', 1) (or store.getById('post', 1)) is the way to go; I don't believe there is a better way.
As of this PR, pushPayload can return an array of all the records pushed into the store once the 'ds-pushpayload-return' feature flag has been enabled.
At the moment, this feature isn't available in a standard or beta release; you'll have to use
"ember-data": "emberjs/data#master",
(i.e. Canary) in your package.json in order to access it. I'm not sure when the feature will be generally available.