Represent a hierarchical data.frame as a nested list - regex

How can I cleanly convert a data.frame with hierarchical information to JSON (or a nested list)?
Let's say we have the following data.frame:
df <- data.frame(
id = c('1', '1.1', '1.1.1', '1.2'),
value = c(10, 5, 5, 5))
# id value
# 1 10
# 1.1 5
# 1.1.1 5
# 1.2 5
Then I would like to end up with the following JSON:
{
"id": "1",
"value": 10,
"children": [
{
"id": "1.1",
"value": 5,
"children": [
{
"id": "1.1.1",
"value": 5
}
]
},
{
"id": "1.2",
"value": 5
}
]
}
Where id defines the hierarchical structure, and . is a delimiter.
My intention is to make it easy to convert data from R into hierarchical D3 visualisations (e.g. Partition Layout or Zoomable Treemaps). It would also be nice to be able to add more "value" columns, e.g. value, size, weight, etc.
Thank you!
EDIT: I reverted to the original question, so it is easier to follow all the answers (sorry for all the editing).

I tend to have RJSONIO installed which does this:
R> df <- data.frame(id = c('1', '1.1', '1.1.1', '1.2'), value = c(10, 5, 5, 5))
R> RJSONIO::toJSON(df)
[1] "{\n \"id\": [ \"1\", \"1.1\", \"1.1.1\", \"1.2\" ],\n\"value\": [ 10, 5, 5, 5 ] \n}"
R> cat(RJSONIO::toJSON(df), "\n")
{
"id": [ "1", "1.1", "1.1.1", "1.2" ],
"value": [ 10, 5, 5, 5 ]
}
R>
That is not your desired output, but the nesting / hierarchy you want is not present in the data.frame itself. I think if you nest a data.frame inside a list you will get there.
Edit: For your revised question, here is the R output of reading your spec'ed JSON back in:
R> RJSONIO::fromJSON("/tmp/foo.json")
$id
[1] "1"
$value
[1] 10
$children
$children[[1]]
$children[[1]]$id
[1] "1.1"
$children[[1]]$value
[1] 5
$children[[1]]$children
$children[[1]]$children[[1]]
$children[[1]]$children[[1]]$id
[1] "1.1.1"
$children[[1]]$children[[1]]$value
[1] 5
$children[[2]]
$children[[2]]$id
[1] "1.2"
$children[[2]]$value
[1] 5
R>

A possible solution.
First I define the following functions:
# Function to get the number of hierarchical dimensions (occurrences of the delimiter + 1)
ch_dim <- function(x, delimiter = ".") {
x <- as.character(x)
chr.count <- function(x) length(which(unlist(strsplit(x, NULL)) == delimiter))
if (length(x) > 1) {
sapply(x, chr.count) + 1
} else {
chr.count(x) + 1
}
}
# Function to convert a hierarchical data.frame to a nested list
lst_fun <- function(ch, id_col = "id", num = min(d), stp = max(d)) {
# Convert data.frame to character
ch <- data.frame(lapply(ch, as.character), stringsAsFactors=FALSE)
# Get number of hierarchical dimensions
d <- ch_dim(ch[[id_col]])
# Convert to list
lapply(ch[d == num,][[id_col]], function(x) {
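# Descendants of x: ids that start with x followed by a literal "."
# (plain prefix match, so "." is not treated as a regex wildcard)
tt <- ch[startsWith(ch[[id_col]], paste0(x, ".")),]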
current <- ch[ch[[id_col]] == x,]
if (stp != num && nrow(tt) > 0) {
c(current, list(children = lst_fun(tt, id_col, num + 1, stp)))
} else { current }
})
}
then convert the data.frame to a list:
lst <- lst_fun(df, "id")
and finally, the JSON:
s <- RJSONIO::toJSON(lst)
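For comparison, the same nesting idea can be sketched outside R. The following is a minimal Python sketch, not a port of the functions above: it assumes every row carries a dotted id, that a node's parent id is everything before the last delimiter, and that rows whose parent is missing should be treated as roots. The name nest_rows is made up for this illustration.

```python
import json

def nest_rows(rows, id_key="id", delimiter="."):
    """Turn flat rows with dotted ids into a list of nested root nodes."""
    nodes = {}   # id -> node dict (copies of the input rows)
    roots = []   # nodes whose parent id is not present in the input
    # Sort by depth so that every parent is created before its children.
    for row in sorted(rows, key=lambda r: r[id_key].count(delimiter)):
        node = dict(row)
        nodes[node[id_key]] = node
        # "1.1.1" -> parent "1.1"; an id without a delimiter has no parent.
        parent_id, sep, _ = node[id_key].rpartition(delimiter)
        parent = nodes.get(parent_id) if sep else None
        if parent is None:
            roots.append(node)
        else:
            parent.setdefault("children", []).append(node)
    return roots

rows = [
    {"id": "1", "value": 10},
    {"id": "1.1", "value": 5},
    {"id": "1.1.1", "value": 5},
    {"id": "1.2", "value": 5},
]
tree = nest_rows(rows)
print(json.dumps(tree[0], indent=2))
```

Because each row is copied as a whole dict, extra columns such as size or weight would be carried along without changes to the function.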

Related

How to remove Map from list in dart

How can I remove a Map from a list based on the value of a key in the map, in Dart?
void main() {
List<Map> names = [
{"id": 1, "name": "Bob"},
{"id": 2, "name": "Alex"},
];
names.forEach((element) {
element.keys.where((key) => element[key] == 1).forEach((names.remove));
});
print(names);
}
I tried the above code, but it does not work for me.
Thanks
names.removeWhere((element) => element["id"] == 1);

customized sorting using search term in django

I am searching for the term "john" in a list of dicts.
The list looks like this:
"response": [
{
"name": "Alex T John"
},
{
"name": "Ajo John"
},
{
"name": "John",
}]
I am using :
response_query = sorted(response, key = lambda i: i['name'])
response_query only returns the results in ascending order, but I need names whose first name matches the search term to come first.
Expected result:
{
"name": "John"
},
{
"name": "Ajo John"
},
{
"name": "Alex T John",
}
The first name containing search term should appear first.
If you need to sort with priorities, you can use a key function that returns a tuple. In your particular case, as far as I understand the question, this will work:
response_query = sorted(
response,
key=lambda i: (len(i['name'].split()) > 1, i['name'])
)
In other words, I added the condition len(i['name'].split()) > 1, which returns False (and therefore sorts first) if the name consists of a single word, and True otherwise.
If the priority condition should instead be that the name starts with the search term, the key becomes:
term = 'john'
...
response_query = sorted(
response,
key=lambda i: (not i['name'].lower().startswith(term), i['name'])
)
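To see why the tuple key gives the expected order, here is a small self-contained check against the data from the question. It is only a sketch: the helper name priority_key is made up for this example, and it relies on Python comparing tuples element by element, with False sorting before True.

```python
response = [
    {"name": "Alex T John"},
    {"name": "Ajo John"},
    {"name": "John"},
]
term = "john"

def priority_key(item):
    name = item["name"]
    # First element: False (sorts first) when the name starts with the term.
    # Second element: the name itself, used to order ties alphabetically.
    return (not name.lower().startswith(term), name)

ordered = sorted(response, key=priority_key)
print([item["name"] for item in ordered])
# ['John', 'Ajo John', 'Alex T John']
```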

Elastic search 5, search from list by sublist

I'm trying to search for objects that have a list property.
I need to be able to select all objects that contain all of the sublist items.
ex :
If my object has [A,B,C], it should be returned for the following queries:
[A], [A,B], [A,B,C], [A,C], [C,A] ... (Input order doesn't have to match)
But if the sublist contains any element that is not part of the object list, it should not be returned.
ex :
[D], [A,D] ...
Those queries should not match.
I've managed to do it for queries where the whole sublist exists, but not when some item of the sublist doesn't exist.
Any ideas ?
Thanks !
Pass the sublist items as a comma-separated string in a match query and set the operator to "and", as follows:
Sample of document:
{
"Id": 1,
"Name": "One",
"tags": ["A","B","C"]
}
For sublist:[A,B]:
{
"query": {
"match": {
"tags": {
"query": "A,B",
"operator": "and"
}
}
}
}
I tested this in Elasticsearch 5.6.0 and 6.1.2.
Assuming A, B, C, etc. are mapped as keyword types, multiple bool query filter clauses would be one way:
var response = client.Search<User>(s => s
.Query(q => +q
.Term(f => f.Badges, "A") && +q
.Term(f => f.Badges, "B") && +q
.Term(f => f.Badges, "C")
)
);
generates the following query
{
"query": {
"bool": {
"filter": [
{
"term": {
"badges": {
"value": "A"
}
}
},
{
"term": {
"badges": {
"value": "B"
}
}
},
{
"term": {
"badges": {
"value": "C"
}
}
}
]
}
}
}
A user document would need to have at least all of A, B and C badges to be considered a match.
A user document may well have other badges in addition to A, B and C; if you need to find documents that have exactly A, B and C, take a look at the terms_set query with a minimum_should_match* value set to the number of passed terms.
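The generated query has one term filter per requested item, so the request body can be built for any sublist. Below is a minimal Python sketch that only constructs the JSON body as a dict; it does not call any Elasticsearch client, the helper name all_terms_query is hypothetical, and the field name "badges" is taken from the generated query above.

```python
import json

def all_terms_query(field, values):
    """Build a bool query with one term filter per value (all must match)."""
    return {
        "query": {
            "bool": {
                # Every filter clause must match, so a document is returned
                # only if it contains each requested value.
                "filter": [{"term": {field: {"value": v}}} for v in values]
            }
        }
    }

body = all_terms_query("badges", ["A", "B"])
print(json.dumps(body, indent=2))
```

With this shape, a query for ["A", "D"] would not return a document holding only A, B and C, because the filter on D fails.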

Sort documents based on first character in field value

I have a set of data like this:
[{name: "ROBERT"}, {name: "PETER"}, {name: "ROBINSON"} , {name: "ABIGAIL"}]
I want to make a single mongodb query that can find:
Any data which name starts with letter "R" (regex: ^R)
Followed by any data which name contains letter "R" NOT AS THE FIRST CHARACTER, like: peteR, adleR, or caRl
so it produces:
[{name: "ROBERT"}, {name: "ROBINSON"}, {name: "PETER"}]
It basically displays any data that contains the character "R", but sorted so that data with "R" as the first character appears before the rest.
So far I've come up with two separate queries, followed by an operation to eliminate duplicated results and then join them. Is there a more efficient way to do this in Mongo?
What you want is to add a weight to your documents and sort them accordingly.
First, select only the documents that match your criteria, using $match with a regular expression.
Then $project the documents and add a "weight" based on the first character of the string, using the $cond conditional operator.
The condition is an $eq comparison that gives the document weight 1 if the lowercased first character of the name is "r", and 0 otherwise.
The $toLower and $substr string aggregation operators lowercase the name and extract its first character, respectively.
Finally, $sort the documents by weight in descending order.
db.coll.aggregate(
[
{ "$match": { "name": /R/i } },
{ "$project": {
"_id": 0,
"name": 1,
"w": {
"$cond": [
{ "$eq": [
{ "$substr": [ { "$toLower": "$name" }, 0, 1 ] },
"r"
]},
1,
0
]
}
}},
{ "$sort": { "w": -1 } }
]
)
which produces:
{ "name" : "ROBERT", "w" : 1 }
{ "name" : "ROBINSON", "w" : 1 }
{ "name" : "PETER", "w" : 0 }
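The logic of the three stages can also be followed in plain Python. This is only an emulation to illustrate the match, weight and sort steps on the sample data; it involves no MongoDB at all, and the function name weighted_matches is made up for the example.

```python
import re

docs = [
    {"name": "ROBERT"},
    {"name": "PETER"},
    {"name": "ROBINSON"},
    {"name": "ABIGAIL"},
]

def weighted_matches(documents, letter="r"):
    # Step 1 (like $match): keep names containing the letter, ignoring case.
    pattern = re.compile(re.escape(letter), re.IGNORECASE)
    matched = [d for d in documents if pattern.search(d["name"])]
    # Step 2 (like $project with $cond): weight 1 if the name starts with it.
    weighted = [
        {"name": d["name"], "w": 1 if d["name"][:1].lower() == letter else 0}
        for d in matched
    ]
    # Step 3 (like $sort): highest weight first.
    return sorted(weighted, key=lambda d: d["w"], reverse=True)

for doc in weighted_matches(docs):
    print(doc)
```

One difference worth noting: Python's sort is stable, so documents with equal weight keep their input order, whereas sorting on the weight alone gives no such ordering guarantee among ties.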
Try this:
db.collectioname.find({"name": /R/})

traverse through nested dictionary having lists to get value

I need to traverse the nested dictionary below and get the values "REM" and "signmeup-3.4.208.zip". Can anyone help me get these values out?
{"actions":[{},{"parameters":[{"name":"ReleaseRequest","value":"REM"},{"name":"Artifact","value":"signmeup-3.4.2088.zip"}]},{"causes":[{"shortDescription":"Started by user ","userId":"sbc","userName":"xyz"}]},{},{},{},{},{},{"parameters":[{"name":"DESCRIPTION_SETTER_DESCRIPTION","value":"inf-xyz"}]},{}],"artifacts":[{"displayPath":"INT_backup.xml","fileName":"INT_backup.xml","relativePath":"INT_backup.xml"},{"displayPath":"Invalidlist.txt","fileName":"Invalidlist.txt","relativePath":"Invalidlist.txt"},{"displayPath":"OUT_backup.xml","fileName":"OUT_backup.xml","relativePath":"OUTP_backup.xml"}],"building":False,"description":"inf-ECR2.2088.zip","duration":1525074,"estimatedDuration":1303694,"executor":None,"fullDisplayName":"inf-#33","id":"2015-07-27_18-17-00","keepLog":False,"number":33,"result":"SUCCESS","timestamp":1438046220000,"url":"inf/33/","builtOn":"Windows_Slave","changeSet":{"items":[],"kind":None},"culprits":[]}
>>> d = {
... "actions": [
... {},
... {"parameters": [
... {"name": "ReleaseRequest", "value": "REM"},
... {"name": "Artifact", "value": "signmeup-3.4.208.zip"}
... ]},
... {"causes": [{"shortDescription": "user"}]}
... ]
... }
To get each value:
>>> d['actions'][1]['parameters'][0]['value']
'REM'
>>> d['actions'][1]['parameters'][1]['value']
'signmeup-3.4.208.zip'
To get all values:
>>> [param['value'] for param in d['actions'][1]['parameters']]
['REM', 'signmeup-3.4.208.zip']
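The lookups above depend on the parameters sitting at index 1 of actions. If that position is not guaranteed, one option is to search by parameter name instead. The following is a sketch under that assumption; find_parameter_values is a hypothetical helper, and it treats actions without a "parameters" key as empty.

```python
def find_parameter_values(build, wanted_names):
    """Collect parameter values by name from every action that has parameters."""
    found = {}
    for action in build.get("actions", []):
        # Empty actions ({}) and actions with other keys are skipped here.
        for param in action.get("parameters", []):
            if param.get("name") in wanted_names:
                found[param["name"]] = param.get("value")
    return found

d = {
    "actions": [
        {},
        {"parameters": [
            {"name": "ReleaseRequest", "value": "REM"},
            {"name": "Artifact", "value": "signmeup-3.4.208.zip"},
        ]},
        {"causes": [{"shortDescription": "user"}]},
    ]
}
values = find_parameter_values(d, {"ReleaseRequest", "Artifact"})
print(values)
# {'ReleaseRequest': 'REM', 'Artifact': 'signmeup-3.4.208.zip'}
```

If the same parameter name appears in more than one action, this version keeps the last value it encounters.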