Encode a Field using Conditions in Vega-Lite - powerbi

OBJECTIVE
I'm trying to add data labels to my chart. However, I have multiple bars layered on top of each other, and I need the data labels to hover over different bars depending on whether the values in a certain field are positive or negative.
ATTEMPT
This could be achieved by changing the "field" property of the "y" encoding using a condition. I've spent some time exploring the Vega-Lite documentation and experimenting, but I can't get it to work no matter what I try; Vega-Lite seems to ignore my condition. I'm also curious whether I can apply conditions to "mark" rather than "encoding": when values are negative, I'd like to change "dy" from -10 to 10.
Any suggestions?
"mark": {
  "type": "text",
  "dy": -10
},
"encoding": {
  "text": {
    "field": "field_one"
  },
  "y": {
    "condition": {
      "test": "datum['test_data'] < 0",
      "field": "field_one"
    },
    "field": "field_two"
  }
  ...
}

A couple of solutions exist.
Two Text Encodings:
One solution is to create two text layers with different fields and dy offsets, and hide one or the other depending on whether the value is positive or negative.
{
  "mark": {
    "type": "text",
    "dy": -10
  },
  "encoding": {
    "text": {
      "condition": {
        "test": "datum['test_data'] >= 0",
        "type": "quantitative",
        "field": "test_data"
      },
      "value": ""
    },
    "y": {
      "field": "field_one",
      "type": "quantitative"
    }
  }
},
{
  "mark": {
    "type": "text",
    "dy": 10
  },
  "encoding": {
    "text": {
      "condition": {
        "test": "datum['test_data'] < 0",
        "type": "quantitative",
        "field": "test_data"
      },
      "value": ""
    },
    "y": {
      "field": "field_two",
      "type": "quantitative"
    }
  }
}
Transformation and Calculation
The other solution only addresses the "dy" problem; it was answered on GitHub using a different technique involving transform and calculate.
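As a rough illustration of the transform/calculate idea (a sketch only, not the GitHub answer itself, reusing the field names from the attempt above): a calculate transform can derive the field a single text layer should anchor to, based on the sign of test_data. The label_anchor name is made up for this example.
"transform": [
  {
    "calculate": "datum['test_data'] < 0 ? datum['field_one'] : datum['field_two']",
    "as": "label_anchor"
  }
],
"mark": {
  "type": "text",
  "dy": -10
},
"encoding": {
  "text": {
    "field": "field_one"
  },
  "y": {
    "field": "label_anchor",
    "type": "quantitative"
  }
}
Flipping dy itself would still need either the two-layer approach above or the specific technique from the GitHub answer.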

Related

Show negative values in Deneb column chart in red

I've been playing around with the Deneb visual in Power BI Desktop and (amongst many other things) have been trying to create a simple column chart that shows negative values in red and positive values in green. I can't for the life of me get it working: I believe the condition/test in my script is correct, but it refuses to 'fire' when it's true.
I've read through the condition page of the Vega-Lite documentation https://vega.github.io/vega-lite/docs/condition.html and have a condition section within the encoding/color.
I've added Month End and MonthYear columns from my Calendar table and an EBITDA measure from a fact table to the Deneb visual:
Month End     MonthYear   EBITDA
31/7/2021     "Jul-21"    8277.56
31/8/2021     "Aug-21"    -15123.66
30/9/2021     "Sep-21"    9502.11
31/10/2021    "Oct-21"    13090.99
{
  "data": {"name": "dataset"},
  "mark": "bar",
  "encoding": {
    "x": {
      "field": "MonthYear",
      "sort": {"field": "Month End"}
    },
    "y": {
      "field": "EBITDA",
      "aggregate": "sum"
    },
    "color": {
      "condition": {
        "test": "datum['EBITDA']<0",
        "value": "red"
      },
      "value": "green"
    }
  }
}
If I adjust the condition to be "test": "1==1" then the 'true' path works, so I assume I've got something wrong with my test line, though it seems correct per a lot of blogs, Stack Overflow questions, etc.
I've also tried using a "transform" to create a new Neg field in the Deneb dataset and referring to that field in my test, but it still won't adjust the colour.
It doesn’t like your aggregation. It looks like the data you are sending in is already aggregated by Power BI. If so, this will work:
"y": {
"field": "b",
"type": "quantitative"
},
View sample in the Vega Editor
If your data isn’t aggregated, add an aggregate transform like this:
"transform": [
{"aggregate": [{
"op": "sum",
"field": "b",
"as": "bsum"
}],
"groupby": ["a"]}
],
"mark": "bar",
"encoding": {
"x": {
"field": "a",
"sort": {"field": "a"}
},
"y": {
"field": "bsum",
"type": "quantitative"
},
"color": {
"condition": {
"test": "datum['bsum']<0",
"value": "red"
},
"value": "green"
}
}
}
Open the Chart in the Vega Editor

Deneb plot (Vega-Lite) for Power BI: How to use a free-scale y-axis with facet

I am using the Deneb custom visual to repeat a visual for different tasks. Is it possible to only show the relevant y-axis values? The following data is used:
The following Vega-Lite JSON is used:
{
  "data": {"name": "dataset"},
  "mark": {
    "type": "bar",
    "opacity": 1,
    "tooltip": true,
    "cornerRadius": 15
  },
  "encoding": {
    "x": {
      "field": "Earliest StartDate",
      "type": "temporal"
    },
    "y": {
      "field": "MachGrpCode",
      "type": "nominal",
      "axis": {
        "title": null,
        "grid": true,
        "tickBand": "extent"
      }
    },
    "row": {
      "field": "ProdHeaderOrdNr",
      "header": {"labelAngle": 0}
    }
  },
  "resolve": {
    "axis": {
      "x": "independent",
      "y": "independent"
    }
  }
}
Which results in:
Is it possible to only use the relevant task values (for 022 --> erase the 6700 row)?
From what I've seen, this is a Vega bug. The recommended work-around is to use vconcat with a filter transform. If you only have a few ProdHeaderOrdNr values, this is doable.
Open the Chart in the Vega Editor
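To illustrate the work-around, here is a sketch only (it assumes ProdHeaderOrdNr is stored as text, and uses "022" from the question plus a hypothetical "023" as the only order numbers). Each concatenated panel filters to one ProdHeaderOrdNr, so its y-axis only lists the MachGrpCode values that actually occur for that order:
{
  "data": {"name": "dataset"},
  "vconcat": [
    {
      "title": "022",
      "transform": [{"filter": "datum['ProdHeaderOrdNr'] == '022'"}],
      "mark": {"type": "bar", "opacity": 1, "tooltip": true, "cornerRadius": 15},
      "encoding": {
        "x": {"field": "Earliest StartDate", "type": "temporal"},
        "y": {
          "field": "MachGrpCode",
          "type": "nominal",
          "axis": {"title": null, "grid": true, "tickBand": "extent"}
        }
      }
    },
    {
      "title": "023",
      "transform": [{"filter": "datum['ProdHeaderOrdNr'] == '023'"}],
      "mark": {"type": "bar", "opacity": 1, "tooltip": true, "cornerRadius": 15},
      "encoding": {
        "x": {"field": "Earliest StartDate", "type": "temporal"},
        "y": {
          "field": "MachGrpCode",
          "type": "nominal",
          "axis": {"title": null, "grid": true, "tickBand": "extent"}
        }
      }
    }
  ]
}
Because each subview computes its scales from its own filtered data, no resolve block is needed; the trade-off is one hand-written panel per order number.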

Access an Array Item by Index in AWS DynamoDB Query Results "Items" in a Step Function

I have this dynamodb:Query in my step function:
{
  "Type": "Task",
  "Resource": "arn:aws:states:::aws-sdk:dynamodb:query",
  "Next": "If nothing returned by query Or Study not yet Zipped",
  "Parameters": {
    "TableName": "TEST-StudyProcessingTable",
    "ScanIndexForward": false,
    "Limit": 1,
    "KeyConditionExpression": "OrderID = :OrderID",
    "FilterExpression": "StudyID = :StudyID",
    "ExpressionAttributeValues": {
      ":OrderID": {
        "S.$": "$.body.order_id"
      },
      ":StudyID": {
        "S.$": "$.body.study_id"
      }
    }
  },
  "ResultPath": "$.processed_files"
}
The results come in as an array called Items, which is nested under my ResultPath (processed_files.Items):
{
  "body": {
    "order_id": "1001",
    "study_id": "1"
  },
  "processed_files": {
    "Count": 1,
    "Items": [
      {
        "Status": {
          "S": "unzipped"
        },
        "StudyID": {
          "S": "1"
        },
        "ZipFileS3Key": {
          "S": "path/to/the/file"
        },
        "UploadSet": {
          "S": "4"
        },
        "OrderID": {
          "S": "1001"
        },
        "UploadSet#StudyID": {
          "S": "4#1"
        }
      }
    ],
    "LastEvaluatedKey": {
      "OrderID": {
        "S": "1001"
      },
      "UploadSet#StudyID": {
        "S": "4#1"
      }
    },
    "ScannedCount": 1
  }
}
My question is: how do I access the items inside this array from a Choice state in a Step Function?
I need to run the query and then decide something based on the results, by checking an item in a condition in a Choice state.
The problem is that since this is an array I can't access it using regular JsonPath (like Items.item), and in my next step the choice condition does NOT accept an index like processed_files.Items['0'].Status.
OK, so the answer was simple: all you need to do is use a number instead of a string for the array index, like this:
processed_files.Items[0].Status
I was originally misled by an error I received which said that it expected a ' or '[' after the first '['. I mistakenly thought this meant it only accepts strings.
I was wrong, it works like any other array.
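For completeness, here is a hedged sketch of a Choice state using that path. The state names in Next and Default are hypothetical, and the IsPresent guard is just one way to avoid a runtime failure when Items comes back empty:
{
  "Type": "Choice",
  "Comment": "Sketch only: the state names in Next and Default are hypothetical",
  "Choices": [
    {
      "And": [
        {"Variable": "$.processed_files.Items[0].Status.S", "IsPresent": true},
        {"Variable": "$.processed_files.Items[0].Status.S", "StringEquals": "unzipped"}
      ],
      "Next": "Handle Unzipped Study"
    }
  ],
  "Default": "Handle Missing Or Zipped Study"
}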
I hope this helps somebody one day.

How to interpret user search query (in Elasticsearch)

I would like to serve my visitors the best results possible when they use our search feature.
To achieve this I would like to interpret the search query.
For example, a user searches for 'red beds for kids 120cm'.
I would like to interpret it as following:
Category-Filter is "beds" AND "children"
Color-filter is red
Size-filter is 120cm
Are there ready-to-go tools for Elasticsearch?
Will I need NLP in front of Elasticsearch?
Elasticsearch is pretty powerful on its own and is very much capable of returning the most relevant results to full-text search queries, provided that data is indexed and queried adequately.
Under the hood it always performs text analysis for full-text searches (for fields of type text). A text analyzer consists of character filters, a tokenizer, and token filters.
For instance, a synonym token filter can replace kids with children in the user query.
On top of that, search queries on modern websites are often facilitated via category selectors in the UI, which can easily be implemented by querying keyword fields in Elasticsearch.
It might be enough to model your data correctly and tune its indexing to implement the search you need - and if that is not enough, you can always add an extra layer of NLP-like logic on the client side, like #2ps suggested.
Now let me show a toy example of what you can achieve with a synonym token filter and copy_to feature.
Let's define the mapping
Let's pretend that our products are characterized by the following properties: Category, Color, and Size.LengthCM.
The mapping will look something like:
PUT /my_index
{
  "mappings": {
    "properties": {
      "Category": {
        "type": "keyword",
        "copy_to": "DescriptionAuto"
      },
      "Color": {
        "type": "keyword",
        "copy_to": "DescriptionAuto"
      },
      "Size": {
        "properties": {
          "LengthCM": {
            "type": "integer",
            "copy_to": "DescriptionAuto"
          }
        }
      },
      "DescriptionAuto": {
        "type": "text",
        "analyzer": "MySynonymAnalyzer"
      }
    }
  },
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "MySynonymAnalyzer": {
            "tokenizer": "standard",
            "filter": [
              "MySynonymFilter"
            ]
          }
        },
        "filter": {
          "MySynonymFilter": {
            "type": "synonym",
            "lenient": true,
            "synonyms": [
              "kid, kids => children"
            ]
          }
        }
      }
    }
  }
}
Notice that we selected type keyword for the fields Category and Color.
Now, what about these copy_to and synonym?
What will copy_to do?
Every time we send an object for indexing into our index, the value of the keyword field Category will be copied to the full-text field DescriptionAuto. This is what copy_to does.
What will synonym do?
To enable synonym we need to define a custom analyzer, see MySynonymAnalyzer which we defined under "settings" above.
Roughly, it will replace every token that matches something on the left of => with the token on the right.
What will the documents look like?
Let's insert a few example documents:
POST /my_index/_doc
{
  "Category": [
    "beds",
    "adult"
  ],
  "Color": "red",
  "Size": {
    "LengthCM": 150
  }
}

POST /my_index/_doc
{
  "Category": [
    "beds",
    "children"
  ],
  "Color": "red",
  "Size": {
    "LengthCM": 120
  }
}

POST /my_index/_doc
{
  "Category": [
    "couches",
    "adult",
    "family"
  ],
  "Color": "blue",
  "Size": {
    "LengthCM": 200
  }
}

POST /my_index/_doc
{
  "Category": [
    "couches",
    "adult",
    "family"
  ],
  "Color": "red",
  "Size": {
    "LengthCM": 200
  }
}
As you can see, DescriptionAuto is not present in the original documents - though due to copy_to we will be able to query it.
Let's see how.
Performing the search!
Now we can try out our index with a simple query_string query:
POST /my_index/_search
{
  "query": {
    "query_string": {
      "query": "red beds for kids 120cm",
      "default_field": "DescriptionAuto"
    }
  }
}
The results will look something like the following:
"hits": {
...
"max_score": 2.3611186,
"hits": [
{
...
"_score": 2.3611186,
"_source": {
"Category": [
"beds",
"children"
],
"Color": "red",
"Size": {
"LengthCM": 120
}
}
},
{
...
"_score": 1.0998137,
"_source": {
"Category": [
"beds",
"adult"
],
"Color": "red",
"Size": {
"LengthCM": 150
}
}
},
{
...
"_score": 0.34116736,
"_source": {
"Category": [
"couches",
"adult",
"family"
],
"Color": "red",
"Size": {
"LengthCM": 200
}
}
}
]
}
The document with categories beds and children and color red is on top, and its relevance score is more than twice that of the runner-up!
How can I check how Elasticsearch interpreted the user's query?
It is easy to do via analyze API:
POST /my_index/_analyze
{
  "text": "red bed for kids 120cm",
  "analyzer": "MySynonymAnalyzer"
}
{
  "tokens": [
    {
      "token": "red",
      "start_offset": 0,
      "end_offset": 3,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bed",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "for",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "children",
      "start_offset": 12,
      "end_offset": 16,
      "type": "SYNONYM",
      "position": 3
    },
    {
      "token": "120cm",
      "start_offset": 17,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    }
  ]
}
As you can see, there is no token kids, but there is a token children.
On a side note, in this example Elasticsearch wasn't able to parse the size of the bed: the token 120cm didn't match anything, since all sizes are integers like 120, 150, etc. Another layer of tweaking would be needed to extract 120 from the 120cm token.
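One possible tweak for that, offered as a sketch rather than as part of the answer above: a pattern_replace character filter can strip the cm suffix before tokenization, so 120cm becomes the token 120 and can match the LengthCM values copied into DescriptionAuto. Only the settings block is shown (MyCmStripper is a made-up name; the mappings stay as defined earlier, and the index would need to be recreated or reindexed):
"settings": {
  "index": {
    "analysis": {
      "char_filter": {
        "MyCmStripper": {
          "type": "pattern_replace",
          "pattern": "(\\d+)\\s*cm",
          "replacement": "$1"
        }
      },
      "analyzer": {
        "MySynonymAnalyzer": {
          "char_filter": ["MyCmStripper"],
          "tokenizer": "standard",
          "filter": ["MySynonymFilter"]
        }
      },
      "filter": {
        "MySynonymFilter": {
          "type": "synonym",
          "lenient": true,
          "synonyms": ["kid, kids => children"]
        }
      }
    }
  }
}
With these settings the analyze call above should emit 120 instead of 120cm.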
I hope this gives an idea of what can be achieved with Elasticsearch's built-in text analysis capabilities!

How do you handle large relationship data attributes and compound documents?

If an article has several comments (think thousands over time), should data.relationships.comments return with a limit?
{
  "data": [
    {
      "type": "articles",
      "id": 1,
      "attributes": {
        "title": "Some title"
      },
      "relationships": {
        "comments": {
          "links": {
            "related": "https://www.foo.com/api/v1/articles/1/comments"
          },
          "data": [
            { "type": "comment", "id": "1" },
            ...
            { "type": "comment", "id": "2000" }
          ]
        }
      }
    }
  ],
  "included": [
    {
      "type": "comments",
      "id": 1,
      "attributes": {
        "body": "Lorem ipsum"
      }
    },
    ...
    {
      "type": "comments",
      "id": 2000,
      "attributes": {
        "body": "Lorem ipsum"
      }
    }
  ]
}
This starts to feel concerning when you think of compound documents (http://jsonapi.org/format/#document-compound-documents), which means the included section will list all comments as well, making the JSON payload quite large.
If you want to limit the number of records you get at a time from a long list, use pagination (JSON API spec).
I would load the comments separately with store.query (ember docs), like so:
store.query('comments', { author_id: <author_id>, page: 3 });
which will return the relevant subset of comments.
If you don't initially want to make two requests per author, you could include the first 'page' in the author's request as you're doing now.
You may also want to look into an addon like Ember Infinity (untested), which will provide an infinite scrolling list and automatically make pagination requests.
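On the JSON:API side, here is a sketch of what a paginated fetch of the related comments could look like (the page[number]/page[size] parameter style and the page size of 50 are assumptions; the spec only reserves the page query-parameter family and the first/prev/next/last pagination link names):
GET https://www.foo.com/api/v1/articles/1/comments?page[number]=2&page[size]=50

{
  "data": [
    { "type": "comments", "id": "51", "attributes": { "body": "Lorem ipsum" } },
    ...
    { "type": "comments", "id": "100", "attributes": { "body": "Lorem ipsum" } }
  ],
  "links": {
    "first": "https://www.foo.com/api/v1/articles/1/comments?page[number]=1&page[size]=50",
    "prev": "https://www.foo.com/api/v1/articles/1/comments?page[number]=1&page[size]=50",
    "next": "https://www.foo.com/api/v1/articles/1/comments?page[number]=3&page[size]=50",
    "last": "https://www.foo.com/api/v1/articles/1/comments?page[number]=40&page[size]=50"
  }
}
The article document can then carry just the related link for the comments relationship, leaving the long id list and the included comments out of the compound document.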