Kotlin - group by on a list of Maps

I have a fieldList variable.
val fieldList: List<MutableMap<String, String>>
// fieldList Data :
[ {
"field_id" : "1",
"section_id" : "1",
"section_name" : "section1",
"field_name" : "something_1"
}, {
"field_id" : "2",
"section_id" : "1",
"section_name" : "section1",
"field_name" : "something_2"
}, {
"field_id" : "3",
"section_id" : "2",
"section_name" : "section2",
"field_name" : "something_3"
}, {
"field_id" : "4",
"section_id" : "3",
"section_name" : "section3",
"field_name" : "something_4"
} ]
And I want to group by section_id.
The results should be as follows:
val result: List<MutableMap<String, Any>>
// result Data :
[
{
"section_id": "1",
"section_name": "section1",
"field": [
{
"id": “1”,
"name": "something_1"
},
{
"id": “2”,
"name": "something_2"
}
]
},
{
"section_id": "2",
"section_name": "section2",
"field": [
{
"id": “3”,
"name": "something_3"
}
]
},
.
.
.
]
What is the most idiomatic way of doing this in Kotlin?
I have an ugly-looking working version in Java, but I am quite sure Kotlin has a nicer way of doing it; I just haven't found it yet.
Any ideas?
Thanks

Another way:
val newList = originalList.groupBy { it["section_id"] }.values
    .map {
        mapOf(
            "section_id" to it[0]["section_id"]!!,
            "section_name" to it[0]["section_name"]!!,
            "field" to it.map { mapOf("id" to it["field_id"], "name" to it["field_name"]) }
        )
    }
Playground
Also, as broot mentioned, prefer using data classes instead of such maps.

Assuming we are guaranteed that the data is correct and we don't have to validate it, meaning:
all fields always exist,
section_name is always the same for a given section_id,
this is how you can do it:
val result = fieldList.groupBy(
    keySelector = { it["section_id"]!! to it["section_name"]!! },
    valueTransform = {
        mutableMapOf(
            "id" to it["field_id"]!!,
            "name" to it["field_name"]!!,
        )
    }
).map { (section, fields) ->
    mutableMapOf(
        "section_id" to section.first,
        "section_name" to section.second,
        "field" to fields
    )
}
However, I suggest not using maps and lists, but proper data classes. Using a Map to store known properties and using Any to store either String or List is just very inconvenient to use and error-prone.
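For illustration, a minimal sketch of what a data-class version could look like (the Field and Section classes and the groupFields helper are made-up names for this example, not from the original post):
// Hypothetical models replacing the Map<String, Any> structure.
data class Field(val id: String, val name: String)
data class Section(
    val sectionId: String,
    val sectionName: String,
    val fields: List<Field>
)

fun groupFields(fieldList: List<Map<String, String>>): List<Section> =
    fieldList
        .groupBy { it.getValue("section_id") to it.getValue("section_name") }
        .map { (section, rows) ->
            Section(
                sectionId = section.first,
                sectionName = section.second,
                fields = rows.map { Field(it.getValue("field_id"), it.getValue("field_name")) }
            )
        }
With this shape, the grouped fields are typed Field objects rather than Map<String, Any> values, so a mistyped key becomes a compile-time error instead of a runtime surprise.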

Related

Query with id, nested array and range in Elastic Search (Open Search AWS)

I have ES documents like the ones below:
{
"_id" : "test#domain.com",
"age" : 12,
"hobbiles" : ["Singing", "Dancing"]
},
{
"_id" : "test1#domain.com",
"age" : 7,
"hobbiles" : ["Coding", "Chess"]
}
I am storing email as the id, plus age and hobbiles; hobbiles is a nested type and age is a long. I want to query by id, age and hobbiles, something like below:
Select * FROM tbl where _id IN ('val1', 'val2') AND age > 5 AND hobbiles should match with Chess or Dancing
How can I do this in Elasticsearch? I am using OpenSearch 1.3 (latest) on AWS.
I suspect that the field hobbiles is a keyword; in that case, here is the query I'd suggest:
PUT test
{
  "mappings": {
    "properties": {
      "age": {
        "type": "long"
      },
      "hobbiles": {
        "type": "keyword"
      }
    }
  }
}
POST test/_doc/test#domain.com
{
  "age": 12,
  "hobbiles": [
    "Singing",
    "Dancing"
  ]
}
POST test/_doc/test1#domain.com
{
  "age": 7,
  "hobbiles": [
    "Coding",
    "Chess"
  ]
}
GET test/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "_id": [
              "test1#domain.com",
              "test#domain.com"
            ]
          }
        }
      ],
      "must": [
        {
          "range": {
            "age": {
              "gt": 5
            }
          }
        },
        {
          "terms": {
            "hobbiles": [
              "Coding",
              "Chess"
            ]
          }
        }
      ]
    }
  }
}

Elasticsearch - sort by score of array matches within multiple arrays

Indexed documents
{
  "book_id":"book01",
  "pages":[
    { "page_id":1, "words":["1", "2", "xx"] },
    { "page_id":2, "words":["4", "5", "xx"] },
    { "page_id":3, "words":["7", "8", "xx"] }
  ]
}
{
  "book_id":"book02",
  "pages":[
    { "page_id":1, "words":["1", "xx", "xx"] },
    { "page_id":2, "words":["4", "xx", "xx"] },
    { "page_id":3, "words":["7", "xx", "xx"] }
  ]
}
Input data
{
  "book_id":"book_new",
  "pages":[
    { "page_id":1, "words":["1", "2", "3"] },
    { "page_id":2, "words":["4", "5", "6"] },
    { "page_id":3, "words":["xx", "xx", "xx"] }
  ]
}
I have a number of books that have multiple pages. Each page has a list of words.
I would like to search for books that have more than a threshold number of similar pages.
Thresholds
min_word_match_score : 2 (minimum score of words match between two pages)
min_page_match_score : 2 (minimum number of similar pages between two books)
Key terms
similar pages: Two pages that have at least min_word_match_score same words
similar book: Two books that have at least min_page_match_score similar pages
Expected result
Based on the specified thresholds, the correct return should be only book01 because
book01-1 and book_new-1 have score 2 (>=min_word_match_score, totalScore++)
book01-2 and book_new-2 have score 2 (>=min_word_match_score, totalScore++)
book01 and book_new have 2 total scores (totalScore >= min_page_match_score)
Poor search query (not working)
"bool" : {
"should" : [
{
"match" : { "book_pages.visual_words" : {"query" : "1", "operator" : "OR"} },
"match" : { "book_pages.visual_words" : {"query" : "2", "operator" : "OR"} },
"match" : { "book_pages.visual_words" : {"query" : "3", "operator" : "OR"} }
}
],
"minimum_should_match" : 2
"adjust_pure_negative" : true,
"boost" : 1.0
}
}
I first tried to write the part of the query for the page match, but it doesn't search array by array; it just searches against the words of all pages. And I am not really sure how to manage the two different scores: the words-match score and the pages-match score.
Should I dig into inner_hits? Please help!
It may not be the best, but here are my two cents!
I don't think Elasticsearch provides the exact solution out of the box for this use case. The closest way to do what you want is to make use of More Like This query.
This query essentially helps you find documents that are similar to a document you provide as input.
Basically the algorithm is:
1. Find the top K terms with the highest tf-idf from the input document. You can specify, via min_term_freq, the minimum term frequency a word must have in the input to be considered, typically 1 or 2; looking at your use case it would be 1, meaning only words from the input document with a term frequency of at least 1 are considered.
2. Construct N disjunctive queries from these terms, i.e. combine them with the OR logical operator. N is configurable in the query request via the max_query_terms property; by default it is 25.
3. Execute the queries internally and return the most similar documents.
More accurately from this link,
The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer at the field, then selects the top K terms with highest tf-idf to form a disjunctive query of these terms.
Let's see how we can achieve some use-cases that you've mentioned.
Use Case 1: Find documents having a page with min_word_match_score 2.
Note that your pages field would need to be of nested type; with the default object type this scenario wouldn't be possible. I suggest you go through the aforementioned links to learn more about this.
Let's say I have two indexes
my_book_index - This would have the documents to be searched on
my_book_index_input - This would have the documents used as input documents
Both would have mapping structure as below:
{
  "mappings": {
    "properties": {
      "book_id": {
        "type": "keyword"
      },
      "pages": {
        "type": "nested"
      }
    }
  }
}
Sample Documents for my_book_index:
POST my_book_index/_doc/1
{
"book_id":"book01",
"pages":[
{ "page_id":1, "words":["11", "12", "13", "14", "105"] },
{ "page_id":2, "words":["21", "22", "23", "24", "205"] },
{ "page_id":3, "words":["31", "32", "33", "34", "305"] },
{ "page_id":4, "words":["41", "42", "43", "44", "405"] }
]
}
POST my_book_index/_doc/2
{
"book_id":"book02",
"pages":[
{ "page_id":1, "words":["11", "12", "13", "104", "105"] },
{ "page_id":2, "words":["21", "22", "23", "204", "205"] },
{ "page_id":3, "words":["301", "302", "303", "304", "305"] },
{ "page_id":4, "words":["401", "402", "403", "404", "405"] }
]
}
POST my_book_index/_doc/3
{
"book_id":"book03",
"pages":[
{ "page_id":1, "words":["11", "12", "13", "100", "105"] },
{ "page_id":2, "words":["21", "22", "23", "200", "205"] },
{ "page_id":3, "words":["301", "302", "303", "300", "305"] },
{ "page_id":4, "words":["401", "402", "403", "400", "405"] }
]
}
Sample Document for my_book_index_input:
POST my_book_index_input/_doc/1
{
"book_id":"book_new",
"pages":[
{ "page_id":1, "words":["11", "12", "13", "14", "15"] },
{ "page_id":2, "words":["21", "22", "23", "24", "25"] }
]
}
More Like This Query:
Use Case: Basically, I am interested in finding documents similar to the above input document, having 4 matches in page 1 or 4 matches in page 2.
POST my_book_index/_search
{
  "size": 10,
  "_source": "book_id",
  "query": {
    "nested": {
      "path": "pages",
      "query": {
        "more_like_this" : {
          "fields" : ["pages.words"],
          "like" : [
            {
              "_index": "my_book_index_input",
              "_id": 1
            }
          ],
          "min_term_freq" : 1,
          "min_doc_freq": 1,
          "max_query_terms" : 25,
          "minimum_should_match": 4
        }
      },
      "inner_hits": {
        "_source": ["pages.page_id", "pages.words"]
      }
    }
  }
}
Basically, I want to search my_book_index for all the documents that are similar to _doc 1 in the index my_book_index_input.
Notice each and every parameter in the query; I'd suggest you go through it line by line to understand all of this.
Note the response below when you execute that query:
Response:
{
"took" : 71,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 6.096043,
"hits" : [
{
"_index" : "my_book_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 6.096043,
"_source" : {
"book_id" : "book01" <---- Document 1 returns
},
"inner_hits" : {
"pages" : {
"hits" : {
"total" : {
"value" : 2, <---- Number of pages hit for this document
"relation" : "eq"
},
"max_score" : 6.096043,
"hits" : [
{
"_index" : "my_book_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "pages",
"offset" : 0
},
"_score" : 6.096043,
"_source" : {
"page_id" : 1, <---- Page 1 returns as it has 4 matches
"words" : [
"11",
"12",
"13",
"14",
"105"
]
}
},
{
"_index" : "my_book_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "pages",
"offset" : 1
},
"_score" : 6.096043,
"_source" : {
"page_id" : 2, <--- Page 2 returns as it also has 4 matches
"words" : [
"21",
"22",
"23",
"24",
"205"
]
}
}
]
}
}
}
}
]
}
}
Note that only the document with _id 1 (book_id book01) was returned. The reason is simple. I've mentioned the below properties in the query:
"min_term_freq" : 1,
"min_doc_freq": 1,
"max_query_terms" : 25,
"minimum_should_match": 4
Basically: only consider terms from the input document whose term frequency is at least 1, which appear in a minimum of 1 document, and require at least 4 of those terms to match in one nested document.
Change the parameters, e.g. min_doc_freq to 3 and minimum_should_match to 3, and you should see a few more documents.
Notice that you may not see every document fulfilling the above properties; that is because of the way the query is implemented. Remember the steps I mentioned at the beginning; that is likely why.
Use Case 2: Use Case 1 + return only those books where at least 2 pages match
I'm not sure if this is supported, i.e. adding a filter to inner_hits based on the count of inner_hits; however, I believe this is something you can add at your application layer. Basically, take the above response, calculate inner_hits.pages.hits.total.value, and return to the consumer only those documents that meet the threshold (a small sketch of this step follows the flow below). Below is how your request/response flow would look:
For Request: Client Layer (UI) ---> Service Layer --> Elasticsearch
For Response: Elasticsearch ---> Service Layer (filter logic for n pages match) --> Client Layer (or UI)
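For what it's worth, a minimal sketch of that service-layer filtering step in Kotlin (the data classes and the filterByPageMatches helper are assumptions made up for this example, not part of any Elasticsearch client API), assuming the response has already been deserialized into simple models:
// Hypothetical, simplified view of the parts of the MLT response we care about.
data class PageInnerHits(val totalValue: Int)             // inner_hits.pages.hits.total.value
data class BookHit(val bookId: String, val pages: PageInnerHits)

// Keep only the books where at least `minPageMatch` nested pages matched.
fun filterByPageMatches(hits: List<BookHit>, minPageMatch: Int = 2): List<BookHit> =
    hits.filter { it.pages.totalValue >= minPageMatch }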
This may not be the best solution, and at times it may not give you exactly the results you expect, but I'd suggest at least giving it a try, as the only other option besides this query is, sadly, to write your own custom client code making use of the Term Vectors API, as mentioned in this link.
Keep in mind the algorithm behind how the MLT query works and see if you can dig into why the results come back the way they do.
Not sure if this fully covers your case, but I hope it helps!

How do I get only the element values that match in the list in Elasticsearch?

Hi there,
I want to create an ES query that retrieves only the elements in the list that match.
Here is my ES index schema.
"test-es-2018":{
"aliases": {},
"mappings": {
"test-1": {
"properties": {
"categoryName": {
"type": "keyword",
"index": false
},
"genDate": {
"type": "date"
},
"docList": {
"properties": {
"rank": {
"type": "integer",
"index": false
},
"doc-info": {
"properties": {
"docId": {
"type": "keyword"
},
"docName": {
"type": "keyword",
"index": false
},
}
}
}
},
"categoryId": {
"type": "keyword"
},
}
}
}
}
There are documents listed in the category. Documents in the list have their own information.
*search query in Kibana.
source": {
"categoryName" : "food" ,
"genDate" : 1577981646638,
"docList" [
{
"rank": 2,
"doc-info": {...}
},
{
"rank": 1,
"doc-info": {...}
},
{
"rank": 5,
"doc-info": {...}
},
],
"categoryId": "201"
}
First, I want to get only the element values that match in the list.
I would like to see only the documents with rank 1 in the list. However, if I query using match as below, the result is the same as the *search query in Kibana above.
*match query in Kibana.
GET test-es-2018/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "docList.rank": 1 } }
      ]
    }
  }
}
In my opinion, it returns the entire list because the list contains a document with rank 1.
What I want is:
source": {
"categoryName" : "food" ,
"genDate" : 1577981646638,
"docList" [
{
"rank": 1,
"doc-info": {...}
},
],
"categoryId": "201"
}
Is this possible?
Second, I want to sort the docList by rank. I tried sorting by creating a query like the following, but it was not sorted.
*sort query in Kibana.
GET test-es-2018/_search
{
  "query" : {
    "bool" : {...}
  },
  "sort" : [
    {
      "docList.rank" : {
        "order" : "asc"
      }
    }
  ]
}
What I want is:
source": {
"categoryName" : "food" ,
"genDate" : 1577981646638,
"docList" [
{
"rank": 1,
"doc-info": {...}
},
{
"rank": 2,
"doc-info": {...}
},
{
"rank": 5,
"doc-info": {...}
},
],
"categoryId": "201"
}
I do not know how to access the list. Is there a good approach for both of these issues?
In general you could use source filtering to retrieve only part of the document, but this way it's not possible to exclude some fields based on their values.
As far as I know, Elasticsearch doesn't support changing the order of field values in the _source. The desired result can be partly achieved by using nested fields along with an inner_hits -> sort query expression. This way, sorted sub-hits are returned in the inner_hits section of the response.
P.S. Typically, when working with Elasticsearch, you should consider an indexed document to be the smallest indivisible search unit.

loopbackjs "inq" for array of objects

I have an array-of-objects field in a LoopBack model. I want to use the "inq" option to filter by day. I have already seen the docs, but those cover arrays of strings, not the case I am dealing with.
weekDays": [
{
"day": "Monday",
"startTime": "03:45",
"endTime": "04:23"
},
{
"day": "Wednesday",
"startTime": "03:23",
"endTime": "12:23"
}
The syntax for an array of strings is like {weekDays: {inq: []}}; what modification has to be done for this case?
You can do this in the following ways:
1. Simple find method
db.getCollection('user').find({'weekDays.day' : {$in: ["Monday"]}})
2. Using aggregate
db.getCollection('user').aggregate([
{$unwind:'$weekDays'},
{$match : {'weekDays.day' : {$in : ['Monday']}}},
{ "$group": {
"_id": "$id",
"weekDays" : { "$push": "$weekDays" },
}},
])
3. Aggregate in LoopBack
var collection = ModelName.getDataSource().connector.collection("myCollection");
collection.aggregate(
[
{ $unwind:'$weekDays' },
{ $match : {'weekDays.day' : {$in : ['Monday']}}},
{ "$group": { "_id": "$id", "weekDays" : { "$push": "$weekDays" }}},
],
  function(err, data) {
    if (err) {
      // handle the error
    } else {
      console.log(data);
    }
  }
);

Regex inside array in mongoDB

I want to do a query inside an array in MongoDB with a regex. The collection has documents like this:
{
"_id" : ObjectId("53340d07d6429d27e1284c77"),
"company" : "New Company",
"worktypes" : [
{
"name" : "Pompas",
"works" : [
{
"name" : "name 2",
"code" : "A00011",
"price" : "22,22"
},
{
"name" : "name 3",
"code" : "A00011",
"price" : "22,22"
},
{
"name" : "name 4",
"code" : "A00011",
"price" : "22,22"
},
{
"code" : "asdasd",
"name" : "asdads",
"price" : "22"
},
{
"code" : "yy",
"name" : "yy",
"price" : "11"
}
]
},
{
"name" : "name 4",
"works" : [
{
"code" : "A112",
"name" : "Nombre",
"price" : "11,2"
}
]
},
{
"name" : "ee",
works":[
{
"code" : "aa",
"name" : "aa",
"price" : "11"
},
{
"code" : "A00112",
"name" : "Nombre",
"price" : "12,22"
}
]
}
]
}
I need to find documents by the company name where any work inside them matches a regex on the work's code or name.
I have this:
var companyquery = { "company": "New Company"};
var regQuery = new RegExp('^A0011.*$', 'i');
db.categories.find({$and: [companyquery,
{$or: [
{"worktypes.works.$.name": regQuery},
{"worktypes.works.$.code": regQuery}
]}]})
But it doesn't return any results. I think the error is in trying to search inside the array with the dot and $.
Any idea?
Edit:
With this:
db.categories.find({$and: [{"company":"New Company"},
{$or: [
{"worktypes.works.name": {"$regex": "^A00011$|^a00011$"}},
{"worktypes.works.code": {"$regex": "^A00011$|^a00011$"}}
]}]})
This is the result:
{
"_id" : ObjectId("53340d07d6429d27e1284c77"),
"company" : "New Company",
"worktypes" : [
{
"name" : "Pompas",
"works" : [
{
"name" : "name 2",
"code" : "A00011",
"price" : "22,22"
},
{
"code" : "aa",
"name" : "aa",
"price" : "11"
},
{
"code" : "A00112",
"name" : "Nombre",
"price" : "12,22"
},
{
"code" : "asdasd",
"name" : "asdads",
"price" : "22"
},
{
"code" : "yy",
"name" : "yy",
"price" : "11"
}
]
},
{
"name" : "name 4",
"works" : [
{
"code" : "A112",
"name" : "Nombre",
"price" : "11,2"
}
]
},
{
"name" : "Bombillos"
},
{
"name" : "Pompas"
},
{
"name" : "Bombillos 2"
},
{
"name" : "Other type"
},
{
"name" : "Other new type"
}
]
}
The regex doesn't filter the results correctly.
You are using a JavaScript native RegExp object for the regular expression; however, for MongoDB to process the regular expression it needs to be sent as part of the query document, and that is not the same thing.
Also, the regex will not match the values that you want. It could actually be ^A00011$ for the exact match, but your case-insensitive match causes a problem, forcing a larger scan of any possible index, so there is a better way to write that. Also see the documentation for the problems with case-insensitive matches.
Use the $regex operator instead:
db.categories.find({
"$and": [
{"company":"New Company"},
{ "$or": [
{ "worktypes.works.name": { "$regex": "^A00011$|^a00011$" }},
{ "worktypes.works.code": { "$regex": "^A00011$|^a00011$" }}
]}
]
})
Also, the positional $ placeholders are not valid in a query; they are only used in a projection or an update to refer to the first matching element found by the query.
But your actual problem seems to be that you are trying to only get the elements of an array that "match" your conditions. You cannot do this with .find() and for that you need to use .aggregate() instead:
db.categories.aggregate([
// Always makes sense to match the actual documents
{ "$match": {
"$and": [
{"company":"New Company"},
{ "$or": [
{ "worktypes.works.name": { "$regex": "^A00011$|^a00011$" }},
{ "worktypes.works.code": { "$regex": "^A00011$|^a00011$" }}
]}
]
}},
// Unwind the worktypes array
{ "$unwind": "$worktypes" },
// Unwind the works array
{ "$unwind": "$worktypes.works" },
// Then use match to filter only the matching entries
{ "$match": {
"$or": [
{ "worktypes.works.name": { "$regex": "^A00011$|^a00011$" } },
{ "worktypes.works.code": { "$regex": "^A00011$|^a00011$" } }
]
}},
/* Stop */
// If you "really" need the arrays back then include all the following
// Otherwise the steps up to here actually got you your results
// First put the "works" array back together
{ "$group": {
"_id": {
"_id": "$_id",
"company": "$company",
"workname": "$worktypes.name"
},
"works": { "$push": "$worktypes.works" }
}},
// Then put the "worktypes" array back
{ "$group": {
"_id": "$_id._id",
"company": { "$first": "$_id.company" },
"worktypes": {
"$push": {
"name": "$_id.workname",
"works": "$works"
}
}
}}
])
So what .aggregate() does with all of these stages is it breaks the array elements into normal document form so they can be filtered using the $match operator. In that way, only the elements that "match" are returned.
What "find" is correctly doing is matching the "document" that meets the conditions. Since documents contain the elements that match then they are returned. The two principles are very different things.
When you mean to "filter" use aggregate.
I think there is a typo: the regex should be ^A00011.*$ (triple 0 instead of double 0).
You can try the aggregate method with aggregation array operators; this query is supported from MongoDB 4.2:
$match to match your condition
$addFields to add/edit a field in the document
$map to iterate over the worktypes array
$filter to iterate over the works array; it returns the result filtered by the provided condition
$regexMatch to match the regex expression, the same as we did in the $match stage; it returns a boolean, so we check the $or condition here
$mergeObjects to merge the current worktypes object with the updated works property
a second $addFields to remove empty works results
$filter to iterate over the worktypes array and check the negative condition, removing documents whose works array is empty
db.categories.aggregate([
{
$match: {
$and: [
{ "company": "New Company" },
{
$or: [
{ "worktypes.works.name": { "$regex": "^A00011$|^a00011$" } },
{ "worktypes.works.code": { "$regex": "^A00011$|^a00011$" } }
]
}
]
}
},
{
$addFields: {
worktypes: {
$map: {
input: "$worktypes",
in: {
$mergeObjects: [
"$$this",
{
works: {
$filter: {
input: "$$this.works",
cond: {
$or: [
{
$regexMatch: {
input: "$$this.name",
regex: "^A00011$|^a00011$"
}
},
{
$regexMatch: {
input: "$$this.code",
regex: "^A00011$|^a00011$"
}
}
]
}
}
}
}
]
}
}
}
}
},
{
$addFields: {
worktypes: {
$filter: {
input: "$worktypes",
cond: { $ne: ["$$this.works", []] }
}
}
}
}
])
Playground