Finding complete sequences using a RavenDb index - mapreduce

I have documents in RavenDb that may look something like this:
{ "Id": "obj/1", "Version": 1 },
{ "Id": "obj/1", "Version": 2 },
{ "Id": "obj/1", "Version": 3 },
{ "Id": "obj/1", "Version": 4 },
{ "Id": "obj/2", "Version": 1 },
{ "Id": "obj/2", "Version": 2 },
{ "Id": "obj/2", "Version": 3 },
{ "Id": "obj/3", "Version": 1 },
{ "Id": "obj/3", "Version": 3 }
I'm trying to create an index that would give me:
The sequences "obj/1" and "obj/2", preferably grouped by Id.
Not the sequence "obj/3", since it's not yet complete (version 2 is missing).
How would I do this?

I managed to solve it. I'm not sure it is the optimal solution but it seems to work.
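public class SequenceIndex : AbstractIndexCreationTask<MyObject>
{
    // The constructor name must match the class name.
    public SequenceIndex()
    {
        // Map: emit the Id and Version of every MyObject document.
        Map = objects => from d in objects
                         orderby d.Version
                         select new
                         {
                             Id = d.Id,
                             Version = d.Version
                         };

        // TransformResults (available in older RavenDB versions): group by Id and
        // keep only groups whose versions form a gap-free run.
        TransformResults = (database, results) =>
            from result in results
            group result by result.Id into g
            where g.Select(d => d.Version + 1)
                   .Except(g.Select(d => d.Version))
                   .Count() == 1
            select new
            {
                Id = g.Key,
                Objects = g
            };
    }
}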
With this query:
var events = session.Query<MyObject, SequenceIndex>()
    .As<MySequenceObjectView>()
    .ToArray();
I group the documents by Id and then take every Version + 1 except every Version. For a complete sequence 1, 2, 3 that is {2, 3, 4} except {1, 2, 3}, which leaves only 4; that is why I use Count() == 1. If there is a hole in the sequence, the count is greater than 1, so the group is excluded from the results.
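The set trick can be checked outside RavenDB. Below is a minimal Python sketch of the same logic (not RavenDB code; `complete_sequences` and `DOCS` are hypothetical names), using Python set difference in place of LINQ's `Except`, which is also set-based:

```python
from collections import defaultdict

# Sample documents from the question (Id/Version pairs).
DOCS = [
    {"Id": "obj/1", "Version": 1}, {"Id": "obj/1", "Version": 2},
    {"Id": "obj/1", "Version": 3}, {"Id": "obj/1", "Version": 4},
    {"Id": "obj/2", "Version": 1}, {"Id": "obj/2", "Version": 2},
    {"Id": "obj/2", "Version": 3},
    {"Id": "obj/3", "Version": 1}, {"Id": "obj/3", "Version": 3},
]


def complete_sequences(docs):
    """Group docs by Id and keep only groups whose versions have no gaps."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["Id"]].append(doc)

    result = {}
    for key, items in groups.items():
        versions = {d["Version"] for d in items}
        # {v + 1} minus {v}: a contiguous run leaves exactly one value (max + 1);
        # each gap in the versions adds one more leftover value.
        leftover = {v + 1 for v in versions} - versions
        if len(leftover) == 1:
            result[key] = sorted(items, key=lambda d: d["Version"])
    return result


print(sorted(complete_sequences(DOCS)))  # ['obj/1', 'obj/2']
```

Note that the check only detects holes: a group with versions 2 and 3 also leaves a single value (4) and passes. If sequences must start at 1, an extra condition such as `min(versions) == 1` would be needed.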

Related

Map JSON data to Athena table

I am not sure how to come up with the CREATE TABLE statement for the JSON data below. I checked the deserializer libraries Athena supports, but I was not able to figure it out. Can someone please advise?
{
"1": [
{
"a": {
"id_info": [
{
"id": "a1",
"id_type": "a1fx",
"mt": 0,
"pv": 1
}
]
},
"b": {
"id_info": [
{
"id": "b1",
"id_type": "b1fx",
"mt": 0,
"pv": 1
}
]
}
}
]
}
My expected output when I run a SELECT query is:
key,category,id,id_type,mt,pv
1,a,a1,a1fx,0,1
1,b,b1,b1fx,0,1
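To pin down what the query has to do, here is a Python sketch of the flattening (not Athena DDL or SQL): one row per `id_info` entry, carrying the top-level key and the category name ("a", "b") along. In Athena this kind of flattening is typically expressed with CROSS JOIN UNNEST over array and map columns; the exact table definition is left open here. `flatten` and `SAMPLE` are hypothetical names.

```python
import json

# The JSON document from the question, compacted onto a few lines.
SAMPLE = """{"1": [{"a": {"id_info": [{"id": "a1", "id_type": "a1fx", "mt": 0, "pv": 1}]},
                 "b": {"id_info": [{"id": "b1", "id_type": "b1fx", "mt": 0, "pv": 1}]}}]}"""


def flatten(doc):
    """Return one (key, category, id, id_type, mt, pv) row per id_info entry."""
    rows = []
    for key, entries in doc.items():            # top-level key, e.g. "1"
        for entry in entries:                   # each element of that array
            for category, obj in entry.items(): # "a", "b", ...
                for info in obj.get("id_info", []):
                    rows.append((key, category, info["id"], info["id_type"],
                                 info["mt"], info["pv"]))
    return rows


print("key,category,id,id_type,mt,pv")
for row in flatten(json.loads(SAMPLE)):
    print(",".join(str(col) for col in row))
```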

How to know count of JSON objects in the file?

I have a JSON file:
{
"media": [
{
"id": 1234,
"order": 1,
},
{
"id": 1385,
"order": 3,
},
{
"id": 1289,
"order": 2,
}
]
}
It contains blocks (objects) with the same field names ("id", "order"); in this example there are 3 blocks, so I want to get 3 as the answer. How can I do it, preferably with the Boost library? The next JSON file I get may contain 4 or 5 of the same kind of block, such as
{
"id": 1234,
"order": 1,
},
and I want to know the number of those blocks in the JSON file.

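Simplest:
#include <boost/json/src.hpp> // header-only build of Boost.JSON
#include <iostream>

static auto sample = R"({
    "media": [{
        "id": 1234,
        "order": 1,
    },
    {
        "id": 1385,
        "order": 3,
    },
    {
        "id": 1289,
        "order": 2,
    }]
})";

int main() {
    // The sample contains trailing commas, which strict JSON rejects,
    // so allow them explicitly (designated initializer: C++20).
    auto v = boost::json::parse(sample, {}, {.allow_trailing_commas = true});
    // "media" is an array; its size is the number of blocks.
    std::cout << v.at("media").as_array().size() << "\n";
}
Prints 3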
Note that there are much more efficient ways to do this, e.g. with boost::json::stream_parser (https://www.boost.org/doc/libs/1_79_0/libs/json/doc/html/json/ref/boost__json__stream_parser.html) or even just boost::json::basic_parser (https://www.boost.org/doc/libs/1_79_0/libs/json/doc/html/json/ref/boost__json__basic_parser.html).
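For comparison, the same idea as a minimal Python sketch: parse the document and take the length of the "media" array. Python's standard `json` module follows strict JSON and raises `json.JSONDecodeError` on trailing commas, so the sample below is written without them. `count_media_blocks` is a hypothetical helper name.

```python
import json

# The sample from the question, without the trailing commas.
SAMPLE = """{"media": [{"id": 1234, "order": 1},
                      {"id": 1385, "order": 3},
                      {"id": 1289, "order": 2}]}"""


def count_media_blocks(text):
    """Return the number of objects in the top-level "media" array."""
    return len(json.loads(text)["media"])


print(count_media_blocks(SAMPLE))  # 3
```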

Django count manytomany relation shows wrong number

I currently have a queryset that aims to extract the list of colors used by products and return a response with each color and the number of products it is attached to. The colors live in their own model, ProductColor, which is referenced through a ManyToMany relationship on the Product model.
In the example below, there are 3 products with these colors registered:
Product 1: White
Product 2: White, Red
Product 3: White
The desired output is something like:
[
{
"colors__name": "White",
"count": 3
},
{
"colors__name": "Red",
"count": 1
},
]
However, the closest I get is:
[
{
"colors__name": "White",
"count": 1
},
{
"colors__name": "White",
"count": 1
},
{
"colors__name": "Red",
"count": 1
},
{
"colors__name": "White",
"count": 1
}
]
The queryset is structured like this:
products = Product.objects.filter(
    category__parent__name__iexact=category,
    status='available'
).values('colors__name', count=Count('colors', distinct=True))
I've tried to add .distinct() at the end of the queryset, but then it returns:
[
{
"colors__name": "White",
"count": 1
},
{
"colors__name": "Red",
"count": 1
}
]
I've also tried using an annotation through .annotate(count=Count('colors')), but then it returns:
[
{
"colors__name": "White",
"count": 7
},
{
"colors__name": "Red",
"count": 3
}
]
How can I make sure it displays the correct count next to the correct color (White: 3, Red: 1)?
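A hedged pointer, not tested against these models: in Django, calling `.values('colors__name')` before `.annotate(count=Count('id', distinct=True))` makes the annotation group by color, which is the usual pattern for per-color counts; appending `.order_by()` clears a default model ordering that older Django versions could otherwise add to the GROUP BY. The plain-Python sketch below only pins down the counting the query should produce on the example data (each product counts once per color); `PRODUCT_COLORS` and `color_counts` are hypothetical stand-ins, not Django code.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the example data: product -> color names.
PRODUCT_COLORS = {
    "Product 1": ["White"],
    "Product 2": ["White", "Red"],
    "Product 3": ["White"],
}


def color_counts(product_colors):
    """Count how many distinct products each color is attached to."""
    counts = Counter()
    for colors in product_colors.values():
        # set(): a product counts at most once per color, like COUNT(DISTINCT ...)
        counts.update(set(colors))
    return [{"colors__name": name, "count": n} for name, n in counts.most_common()]


print(color_counts(PRODUCT_COLORS))
```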

How do I extract data from "List" field

I'm getting JSON data from a web service and trying to make a table. Datadisk is presented as a List, and clicking into each item navigates further down the hierarchy, as shown in the screenshots below. I need to concatenate the storageAccountType of each item with a | sign: if Greg-VM had 2 list items, with Standard_LRS for the first and Premium_LRS for the second, the new column should show Standard_LRS | Premium_LRS for that row.
The input returned by the function is below:
[
{
"name": "rhazuremspdemo",
"disk": {
"id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/AzureMSPDemo/providers/Microsoft.Compute/disks/rhazuremspdemo_OsDisk_1_346353b875794dd4a7a5c5938abfb7df",
"storageAccountType": "StandardSSD_LRS"
},
"datadisk": []
},
{
"name": "w12azuremspdemo",
"disk": {
"id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/AzureMSPDemo/providers/Microsoft.Compute/disks/w12azuremspdemo_OsDisk_1_09788205f8eb429faa082866ffee0f18",
"storageAccountType": "Premium_LRS"
},
"datadisk": []
},
{
"name": "Greg-VM",
"disk": {
"id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/GREG/providers/Microsoft.Compute/disks/Greg-VM_OsDisk_1_63ed471fef3e4f568314dfa56ebac5d2",
"storageAccountType": "Premium_LRS"
},
"datadisk": [
{
"name": "Data",
"createOption": "Attach",
"diskSizeGB": 10,
"managedDisk": {
"id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/GREG/providers/Microsoft.Compute/disks/Data",
"storageAccountType": "Standard_LRS"
},
"caching": "None",
"toBeDetached": false,
"lun": 0
},
{
"name": "Disk2",
"createOption": "Attach",
"diskSizeGB": 10,
"managedDisk": {
"id": "/subscriptions/24ba3e4c-45e3-4d55-8132-6731cf25547f/resourceGroups/GREG/providers/Microsoft.Compute/disks/Disk2",
"storageAccountType": "Standard_LRS"
},
"caching": "None",
"toBeDetached": false,
"lun": 1
}
]
}
]
How do I do that?
Thanks,
G

This should help you. It steps through the process.
If you have a scenario like this
you can use Add Custom Column and type the following expression:
=Table.Column([TableName], "ColumnName")
to get it as a list.
Now you can left-click on the Custom column and choose Extract Values...
Choose Custom, enter | as your delimiter, and click OK.
The values that were in the list will now appear in the same row, separated by the delimiter.
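To make the intended result concrete, here is a Python sketch (not Power Query M) of the same transformation applied to the JSON from the question: each VM's data-disk storageAccountType values are joined with " | ", and VMs without data disks get an empty string. The input is trimmed to the fields the sketch reads; `VMS` and `datadisk_types` are hypothetical names.

```python
# Trimmed version of the web service output from the question
# (disk ids and other fields omitted; only what the sketch reads is kept).
VMS = [
    {"name": "rhazuremspdemo", "datadisk": []},
    {"name": "w12azuremspdemo", "datadisk": []},
    {"name": "Greg-VM", "datadisk": [
        {"name": "Data", "managedDisk": {"storageAccountType": "Standard_LRS"}},
        {"name": "Disk2", "managedDisk": {"storageAccountType": "Standard_LRS"}},
    ]},
]


def datadisk_types(vms, sep=" | "):
    """Map each VM name to its data disks' storageAccountType values joined by sep."""
    return {
        vm["name"]: sep.join(
            disk["managedDisk"]["storageAccountType"]
            for disk in vm.get("datadisk", [])
        )
        for vm in vms
    }


for name, types in datadisk_types(VMS).items():
    print(f"{name}: {types}")
```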

How do I make a user-required JSON?

I have a JSON file containing three objects. The 2nd and 3rd objects are missing some fields that I need, and for those missing fields I need to fill in my own values. My code is below.
What I have tried so far:
with open("final.json") as data1:
    a = json.load(data1)
final = []
for item in a:
    d = {}
    d["AppName"] = item["name"]
    d["AppId"] = item["id"]
    d["Health"] = item["health"]
    d["place1"] = item["cities"][0]["place1"]
    d["place2"] = item["cities"][0]["place2"]
print(final)
Error: I am getting a KeyError.
My input JSON file has data similar to this:
[{
"name": "python",
"id": 1234,
"health": "Active",
"cities": {
"place1": "us",
"place2": "newyork"
}
},
{
"name": "java",
"id": 2345,
"health": "Active"
}, {
"name": "python",
"id": 1234
}
]
I am expecting this output:
[{
"name": "python",
"id": 1234,
"health": "Active",
"cities": {
"place1": "us",
"place2": "newyork"
}
},
{
"name": "java",
"id": 2345,
"health": "Null",
"cities": {
"place1": "0",
"place2": "0"
}
}, {
"name": "python",
"id": 1234,
"health": "Null",
"cities": {
"place1": "0",
"place2": "0"
}
}
]

I see two issues with the code that you have posted.
First, you are referring to the 'cities' field in your input JSON as if it were a list when it is, in fact, an object.
Second, to handle JSON containing objects which may be missing certain fields, you should use the Python dictionary get method. This method takes a key and an optional value to return if the key is not found (default is None).
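final = []
for item in a:
    d = {}
    d["AppName"] = item["name"]
    d["AppId"] = item["id"]
    # .get() returns the fallback instead of raising KeyError
    d["Health"] = item.get("health", "Null")
    # "cities" is a dict, not a list, so look keys up directly
    cities = item.get("cities", {})
    d["place1"] = cities.get("place1", "0")
    d["place2"] = cities.get("place2", "0")
    final.append(d)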
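The loop above produces flat records with renamed keys (AppName, AppId, ...). If the goal is instead the exact shape of the expected output, with the original keys and a nested "cities" object, here is a minimal sketch under that assumption. `fill_defaults` is a hypothetical helper; the "Null" and "0" fallbacks come from the question.

```python
import json

# Input from the question.
ITEMS = [
    {"name": "python", "id": 1234, "health": "Active",
     "cities": {"place1": "us", "place2": "newyork"}},
    {"name": "java", "id": 2345, "health": "Active"},
    {"name": "python", "id": 1234},
]

# Fallback values taken from the expected output in the question.
DEFAULT_HEALTH = "Null"
DEFAULT_CITIES = {"place1": "0", "place2": "0"}


def fill_defaults(items):
    """Return copies of items with a missing "health" or "cities" filled in."""
    result = []
    for item in items:
        filled = dict(item)  # shallow copy so the input is not modified
        filled.setdefault("health", DEFAULT_HEALTH)
        # Merge so a partially present "cities" object keeps its own values.
        filled["cities"] = {**DEFAULT_CITIES, **item.get("cities", {})}
        result.append(filled)
    return result


print(json.dumps(fill_defaults(ITEMS), indent=2))
```

Fields that are already present are left unchanged; only missing ones receive the fallback values.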
