couchbase view using multiple keys to get result - mapreduce

I have the following document
{
"Credit_Amount": 99,
"Acc_no": 138,
"Job_No": "esmwga",
"Source_No": "x",
"Temp": 1017,
"Document_No": "gaf",
"Debit_Amount": 67,
"User_Id": "xirbzsewiw"
}
and my map function is this
function (doc, meta) {
if(doc.Type == "GLEntry")
{
emit([doc.Acc_no,doc.User_Id],[doc.Credit_Amount,doc.Debit_Amount]);
}
}
and this is my reduce function
function(key,values,rereduce){
var sum1=0,sum2=0;
for(var i=0;i<values.length;++i)
{
sum1+=values[i][0];
sum2+=values[i][1];
}
return ([sum1,sum2])
}
when I pass this key
[138,"xirbzsewiw"]
group level 2
I get this output
[ 99, 67 ]
But when I give this as the key
[138]
group level 1
I get an empty result. My understanding is that with group level 1 the view groups by account number only, so it should give the same output. Am I doing something wrong?

Abhi is correct: the result set for your specified key is empty, so the reduce is also empty. You can check that by querying with reduce=false.
You are probably thinking of another question you asked, where you used startkey and endkey to get a range. With startkey and endkey you do not need to specify exact keys; the first partial match is treated as the start or end. In your example, if you query with startkey=[138]&endkey=[139]&inclusive_end=false you should see the result you expect.
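The difference between an exact key lookup and a range lookup can be imitated outside Couchbase. The sketch below is plain Python, not the Couchbase SDK: the rows list, the second row, and the helper names are made up for illustration. Python's list ordering happens to agree with view collation for these particular keys (an array sorts before a longer array that starts with the same elements), which is an assumption that does not hold for every key type.

```python
# Toy stand-in for the view index: (key, value) pairs as emitted by the map function.
rows = [
    ([138, "xirbzsewiw"], [99, 67]),
    ([139, "someone"], [5, 1]),  # hypothetical extra row
]

def query_key(rows, key):
    """Exact match, like ?key=..."""
    return [value for k, value in rows if k == key]

def query_range(rows, startkey, endkey):
    """Half-open range, like ?startkey=...&endkey=...&inclusive_end=false."""
    return [value for k, value in rows if startkey <= k < endkey]

print(query_key(rows, [138]))           # no emitted key equals [138]
print(query_range(rows, [138], [139]))  # [138, "xirbzsewiw"] lies inside the range
```

The exact lookup finds nothing because no emitted key is literally [138], while the range picks up every key that starts with 138.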

Here is what I think is probably happening here:
Your map function emits keys like: [138,"xirbzsewiw"]
When you query with key, you need to pass a key that your map function actually emits, even when a reduce is applied. In the second case you're passing [138], which is not a key in the index, so it matches nothing and you see no output.
group_level does not filter rows. It controls how many leading elements of the array key are used to group the matching rows before they are reduced. You might want to read up on group-level view queries, if you haven't already, to get a better understanding of group_level.
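To make the grouping step concrete, here is a rough Python imitation of what group_level does to rows that have already been selected. It is only a sketch: grouped_reduce and reduce_sums are hypothetical helpers, and the second row is invented so that level 1 and level 2 give different results.

```python
from collections import defaultdict

def reduce_sums(values):
    """Same arithmetic as the reduce function in the question."""
    return [sum(v[0] for v in values), sum(v[1] for v in values)]

def grouped_reduce(rows, group_level):
    """Group rows by the first `group_level` key elements, then reduce each group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[tuple(key[:group_level])].append(value)
    return {k: reduce_sums(v) for k, v in groups.items()}

rows = [
    ([138, "xirbzsewiw"], [99, 67]),
    ([138, "otheruser"], [1, 3]),  # hypothetical second user on the same account
]
print(grouped_reduce(rows, 2))  # one group per (account, user) pair
print(grouped_reduce(rows, 1))  # one group per account
```

Grouping only ever sees the rows that the key or range selection returned, so an exact key that matches nothing leaves nothing to group.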


Terraform Splat Expression Giving "Invalid template interpolation value"

I am using data sources in Terraform to fetch a list of ids of my security groups as such:
data "aws_security_groups" "test" {
filter {
name = "group-name"
values = ["the-name"]
}
}
output "security_group_id" {
value = "The id is ${data.aws_security_groups.test.ids[*]}"
}
However, this is giving me the following error:
Error: Invalid template interpolation value
on main.tf line 11, in output "security_group_id":
11: value = "The id is ${data.aws_security_groups.test.ids[*]}"
|----------------
| data.aws_security_groups.test.ids is list of string with 1 element
Cannot include the given value in a string template: string required.
But if I use data.aws_security_groups.test.ids[0] instead it displays the ID.
Can someone help me to display the list of IDs?
First, I want to note that you don't necessarily need to combine this list with a string message at all if you don't want to, because Terraform will accept output values of any type:
output "security_group_ids" {
value = data.aws_security_groups.test.ids
}
If having them included as part of a bigger string is important for your underlying problem then you'll need to make a decision about how you want to present these multiple ids in your single string. There are various different ways you could do that, depending on what you intend to do with this information.
One relatively-straightforward answer would be to make the string include a JSON representation of the list using jsonencode, like this:
output "security_group_id_message" {
value = "The ids are ${jsonencode(data.aws_security_groups.test.ids)}"
}
If you want a more human-friendly presentation then you might prefer to use a multi-line string instead, in which case you can customize the output using string templates.
output "security_group_id_message" {
value = <<-EOT
The ids are:
%{ for id in data.aws_security_groups.test.ids ~}
- ${id}
%{ endfor ~}
EOT
}
Or, for an answer somewhere in between, you could use join to just concatenate the values together with a simple delimiter, like this:
output "security_group_id_message" {
value = "The ids are ${join(",", data.aws_security_groups.test.ids)}"
}
Note that I removed the [*] from your reference in all of these examples, since it isn't really doing anything here: data.aws_security_groups.test.ids is already an iterable collection, and so is compatible with all of the language features I used in the examples above.
IIRC the provider considers this ids attribute to be a set of strings rather than a list of strings, and so that [*] suffix could potentially be useful in other situations to force converting the set into a list if you need it to be typed that way, although if that is your intent then I'd suggest using one of the following instead so that it's clearer to a future reader what it does:
sort(data.aws_security_groups.test.ids) (if it being in lexical order is important to the behavior; Terraform uses lexical sorting by default anyway, but calling sort is a good prompt to a reader unfamiliar with Terraform to look up that function to see what the actual sort order is.)
tolist(data.aws_security_groups.test.ids) (functionally equivalent to sort above when it's a set of strings, but avoids the implication that the specific ordering is important, if all that matters is that it's a list regardless of the ordering)

Randomly set one-third of na's in a column to one value and the rest to another value

I'm trying to impute missing values in a dataframe df. I have a column A with 300 NaNs. I want to randomly set two-thirds of them to value1 and the rest to value2.
Please help.
EDIT: I'm actually trying to do this on dask, which does not support item assignment. This is what I have currently. Initially, I thought I would try to convert all NAs to value1
da.where(df.A.isnull() == True, 'value1', df.A)
I got the following error:
ValueError: need more than 0 values to unpack
As the comment suggests, you can solve this with Series.where.
The following will work, but I cannot promise how efficient this is. (I suspect it may be better to produce a whole column of replacements at once with numpy.random.choice.)
import random
df['A'] = df['A'].where(~df['A'].isnull(),
                        lambda df: df.map(
                            lambda x: random.choice(['value1', 'value1', x])))
Explanation: if the value is not null (NaN), keep the original. Where it is null, replace it with the corresponding value of the series produced by the first lambda. That lambda maps each value to a random choice that is 'value1' about two-thirds of the time and the original value (still NaN) about one-third of the time. The NaNs that remain can then be filled with value2, for example with fillna.
Note that, depending on your data, this likely has changed the data type of the column.
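For comparison, here is a minimal sketch of the "whole column at once" idea using NumPy. It works on an in-memory pandas Series, not on dask, and fill_na_split is a hypothetical helper name. It shuffles the positions of the NaNs and assigns value1 to the first two-thirds and value2 to the rest, so the split is exact rather than random per element.

```python
import numpy as np
import pandas as pd

def fill_na_split(series, value1, value2, frac=2 / 3, seed=None):
    """Fill NaNs: a random `frac` share gets value1, the remainder gets value2."""
    rng = np.random.default_rng(seed)
    na_positions = np.flatnonzero(series.isna().to_numpy())
    n_first = int(round(len(na_positions) * frac))
    shuffled = rng.permutation(na_positions)
    out = series.astype(object)  # astype returns a copy; object dtype can hold strings
    out.iloc[shuffled[:n_first]] = value1
    out.iloc[shuffled[n_first:]] = value2
    return out

s = pd.Series([1.0, np.nan] * 300)  # toy column with 300 NaNs
filled = fill_na_split(s, "value1", "value2", seed=0)
print((filled == "value1").sum(), (filled == "value2").sum())
```

On dask, something similar could presumably be applied per partition with map_partitions, in which case the two-thirds split would hold within each partition rather than across the whole column.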

compare two dictionary, one with list of float value per key, the other one a value per key (python)

I have a query sequence that I blasted online using NCBIWWW.qblast. In my XML BLAST result file I obtained, for the query sequence, a list of hits (i.e. gi|). Each hit (gi|) has multiple hsps. I made a dictionary my_dict1 with gi| as the key and appended the bit scores as values, so there are multiple values for each key.
my_dict1 = {
gi|1002819492|: [437.702, 384.47, 380.86, 380.86, 362.83],
gi|675820360| : [2617.97, 2614.37, 122.112],
gi|953764029| : [414.258, 318.66, 122.112, 86.158],
gi|675820410| : [450.653, 388.08, 386.27] }
Then I looked for max value in each key using:
for key, value in my_dict1.items():
max_value = max(value)
And made a second dictionary my_dict2:
my_dict2 = {
gi|1002819492|: 437.702,
gi|675820360| : 2617.97,
gi|953764029| : 414.258,
gi|675820410| : 450.653 }
I want to compare both dictionaries so that I can extract the hsp with the highest bit score. I am also including other parameters like query coverage and identity percentage (not shown here). The goal is to get the best gi| with the highest bit score, coverage and identity percentage.
I tried many things to compare both dictionaries, like this:
First code :
matches[]
if my_dict1.keys() not in my_dict2.keys():
matches[hit_id] = bit_score
else:
matches = matches[hit_id], bit_score
Second code:
if hit_id not in matches.keys():
matches[hit_id]= bit_score
else:
matches = matches[hit_id], bit_score
Third code:
intersection = set(set(my_dict1.items()) & set(my_dict2.items()))
However, I always end up with two types of errors:
1 ) TypeError: list indices must be integers, not unicode
2 ) ... float not iterable...
Please I need some help and guidance. Thank you very much in advance for your time. Best regards.
It's not clear what you're trying to do. What is hit_id? What is bit_score? It looks like your second dict is always going to have the same keys as your first if you're creating it by pulling the max value for each key of the first dict.
You say you're trying to compare them, but don't really state what you're actually trying to do. Find those with values under a certain max? Find those with the highest max?
Your first code doesn't work because, I assume, you're trying to use a dict key as an index into matches, which you define as a list. That's probably where your first error is coming from, though you haven't given the lines where the error actually occurs.
See in-code comments below:
# First off, this needs to be a dict.
matches = {}
# This will never happen if you've created these dicts as you stated.
if my_dict1.keys() not in my_dict2.keys():
    matches[hit_id] = bit_score  # Not clear what bit_score is?
else:
    # Also not sure what you're trying to do here. This will assign a tuple
    # to matches with whatever the value of matches[hit_id] is and bit_score.
    matches = matches[hit_id], bit_score
Regardless, we really need more information and the full code to figure out your actual goal and what's going wrong.
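If the goal turns out to be "find the best-scoring hsp for each hit, then rank the hits", something along these lines might be a starting point. This is a guess at the intent, not a fix for the original code: the keys are written as strings, and best_hsp_index and ranked are names introduced here for illustration.

```python
my_dict1 = {
    "gi|1002819492|": [437.702, 384.47, 380.86, 380.86, 362.83],
    "gi|675820360|": [2617.97, 2614.37, 122.112],
    "gi|953764029|": [414.258, 318.66, 122.112, 86.158],
    "gi|675820410|": [450.653, 388.08, 386.27],
}

# Best bit score per hit (this reproduces my_dict2 from the question).
my_dict2 = {hit: max(scores) for hit, scores in my_dict1.items()}

# Position of the best-scoring hsp within each hit's list of scores.
best_hsp_index = {hit: scores.index(my_dict2[hit]) for hit, scores in my_dict1.items()}

# Hits ranked from highest to lowest best score.
ranked = sorted(my_dict2, key=my_dict2.get, reverse=True)
print(ranked[0], my_dict2[ranked[0]])
```

Coverage and identity could be ranked the same way once they are stored per hsp, for example by sorting on a tuple of the three values.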

Why is this freemarker code failing when there is a comma in the list?

I have a map that contains a list (all the values in the list are strings):
["diameter":["1", "2", "3"]]
["length":["2", "3", "4"]]
I iterate through it in freemarker:
<#list product.getSortedVariantMap.keySet() as variantCode>
<#list product.getSortedVariantMap[variantCode] as variantValue>
This works fine. However if one of the strings contains a comma like this:
def returnValue = ["diameter":["3,5"]]
I get the following error:
?size is unsupported for: freemarker.ext.beans.SimpleMethodModel
The problematic instruction:
----------
==> list product.getSortedVariantMap[variantCode] as variantValue [on line 200, column 41 in product.htm]
I have no idea what the error could be, a comma in a string shouldn't create that error.
It depends on the FreeMarker configuration, but product.getSortedVariantMap most probably returns the method itself, not its return value. You should write product.sortedVariantMap. (Although I don't understand why it doesn't fail earlier, on product.getSortedVariantMap.keySet(). Maybe your example is not exactly what you actually run?)

Using a CouchDB view, can I count groups and filter by key range at the same time?

I'm using CouchDB. I'd like to be able to count occurrences of values of specific fields within a date range that can be specified at query time. I seem to be able to do parts of this, but I'm having trouble understanding the best way to pull it all together.
Assuming documents that have a timestamp field and another field, e.g.:
{ date: '20120101-1853', author: 'bart' }
{ date: '20120102-1850', author: 'homer'}
{ date: '20120103-2359', author: 'homer'}
{ date: '20120104-1200', author: 'lisa'}
{ date: '20120815-1250', author: 'lisa'}
I can easily create a view that filters documents by a flexible date range. This can be done with a view like the one below, called with key range parameters, e.g. _view/all-docs?startkey=20120101-0000&endkey=20120201-0000.
all-docs/map.js:
function(doc) {
emit(doc.date, doc);
}
With the data above, this would return a CouchDB view containing just the first 4 docs (the only docs in the date range).
I can also create a query that counts occurrences of a given field, like this, called with grouping, i.e. _view/author-count?group=true:
author-count/map.js:
function(doc) {
emit(doc.author, 1);
}
author-count/reduce.js:
function(keys, values, rereduce) {
return sum(values);
}
This would yield something like:
{
"rows": [
{"key":"bart","value":1},
{"key":"homer","value":2},
{"key":"lisa","value":2}
]
}
However, I can't find the best way to both filter by date and count occurrences. For example, with the data above, I'd like to be able to specify range parameters like startkey=20120101-0000&endkey=20120201-0000 and get a result like this, where the last doc is excluded from the count because it is outside the specified date range:
{
"rows": [
{"key":"bart","value":1},
{"key":"homer","value":2},
{"key":"lisa","value":1}
]
}
What's the most elegant way to do this? Is this achievable with a single query? Should I be using another CouchDB construct, or is a view sufficient for this?
You can get pretty close to the desired result with a list:
{
_id: "_design/authors",
views: {
authors_by_date: {
map: function(doc) {
emit(doc.date, doc.author);
}
}
},
lists: {
count_occurrences: function(head, req) {
start({ headers: { "Content-Type": "application/json" }});
var result = {};
var row;
while(row = getRow()) {
var val = row.value;
if(result[val]) result[val]++;
else result[val] = 1;
}
return JSON.stringify(result);
}
}
}
This design can be requested as such:
http://<couchurl>/<db>/_design/authors/_list/count_occurrences/authors_by_date?startkey=<startDate>&endkey=<endDate>
This will be slower than a normal map-reduce, and is a bit of a workaround. Unfortunately, this is the only way to do a multi-dimensional query, "which CouchDB isn’t suited for".
The result of requesting this design will be something like this:
{
"bart": 1,
"homer": 2,
"lisa": 2
}
What we do is basically emit a lot of elements, then use a list to group them as we want. A list can be used to display a result in any way you want, but it will often be slower. Whereas a normal map-reduce result can be cached and only changes according to the diffs, the list has to be built anew every time it is requested.
It is pretty much as slow as getting all the elements resulting from the map (the overhead of orchestrating the data is mostly negligible): a lot slower than getting the result of a reduce.
If you want to use the list for a different view, you can simply exchange it in the URL you request:
http://<couchurl>/<db>/_design/authors/_list/count_occurrences/<view>
Read more about lists on the couchdb wiki.
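To see what the list function computes, here is the same logic as a small Python sketch run against the sample documents from the question. Nothing here talks to CouchDB: count_authors is a hypothetical stand-in for the view plus the list function, and the range check imitates startkey/endkey with the end key included.

```python
from collections import Counter

docs = [
    {"date": "20120101-1853", "author": "bart"},
    {"date": "20120102-1850", "author": "homer"},
    {"date": "20120103-2359", "author": "homer"},
    {"date": "20120104-1200", "author": "lisa"},
    {"date": "20120815-1250", "author": "lisa"},
]

def count_authors(docs, startkey, endkey):
    # View step: rows keyed by date, restricted to the requested key range.
    rows = sorted((d["date"], d["author"]) for d in docs)
    in_range = [author for date, author in rows if startkey <= date <= endkey]
    # List step: tally the row values, as the while loop over getRow() does.
    return dict(Counter(in_range))

print(count_authors(docs, "20120101-0000", "20120201-0000"))
```

As with the real list function, every row in the range is visited on each call, which is why this approach costs about as much as reading the unreduced view.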
You need to create a combined view:
combined/map.js:
function(doc) {
emit([doc.date, doc.author], 1);
}
combined/reduce.js:
_sum
This way you will be able to filter documents by start/end date.
startkey=["20120101-0000", "a"]&endkey=["20120201-0000", "a"]
Although your problem is hard to solve in the general case, knowing some more restrictions on the possible queries can help a lot. E.g. if you know you will search on ranges that cover full days or months, you can use arrays of [year, month, day, time] instead of the string:
emit([doc.date_year, doc.date_month, doc.date_day, doc.date_time, doc.author], doc);
Even if you cannot predict that all possible queries will fit into grouping based on this key type, splitting the key may help you to optimize your range queries and decrease the number of lookups needed (at the cost of some extra space).
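As a rough illustration of the split key, the Python sketch below derives [year, month, day, time, author] from the date string used in the question (the answer assumes separate date_year, date_month, etc. fields instead). split_key is a hypothetical helper, and Python list comparison stands in for view collation, which agrees for keys of this shape.

```python
def split_key(doc):
    """Turn a date like '20120103-2359' into [2012, 1, 3, '2359', author]."""
    day, time = doc["date"].split("-")
    return [int(day[:4]), int(day[4:6]), int(day[6:8]), time, doc["author"]]

docs = [
    {"date": "20120101-1853", "author": "bart"},
    {"date": "20120102-1850", "author": "homer"},
    {"date": "20120103-2359", "author": "homer"},
    {"date": "20120104-1200", "author": "lisa"},
    {"date": "20120815-1250", "author": "lisa"},
]

keys = sorted(split_key(d) for d in docs)

# All of January 2012: every key from [2012, 1] up to, but not including, [2012, 2].
january = [k for k in keys if [2012, 1] <= k < [2012, 2]]
print(len(january))
```

With keys in this form, whole months or days can be selected with short prefix ranges, and group_level can roll results up by year, month or day.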