How to use the JSON output from Amazon Transcribe Medical

The output is JSON containing the whole text and the segments for each speaker, like so:
"speaker_labels": {
"speakers": 2,
"segments": [
{
"start_time": "0.94",
"speaker_label": "spk_0",
"end_time": "3.065",
"items": [
{
"start_time": "1.01",
"speaker_label": "spk_0",
"end_time": "1.22"
},
.
.
.
followed by each word and its timestamps:
"items": [
{
"start_time": "1.01",
"end_time": "1.22",
"alternatives": [{ "confidence": "1.0", "content": "word" }],
"type": "pronunciation"
},
{
"start_time": "1.22",
"end_time": "1.81",
"alternatives": [{ "confidence": "1.0", "content": "word" }],
"type": "pronunciation"
},
{
"alternatives": [{ "confidence": "0.0", "content": "another word" }],
"type": "punctuation"
},
.
.
.
I need each speaker's words. How am I supposed to get that data without a lot of logic comparing the start and end times of the words against the start and end times of the speaker segments?
The result I expect:
spk_0 : words words words
spk_1 : words words more words
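One way to avoid comparing time ranges: in the excerpt above, each entry in a segment's items carries the same start_time string as the matching word in the word-level items list, so that string can serve as a lookup key. Below is a minimal sketch under that assumption. The helper name words_by_speaker is hypothetical, and its argument is assumed to be the object that holds both speaker_labels and items.

```python
def words_by_speaker(results):
    """Group transcript words into consecutive speaker turns."""
    # Map each word's start_time string to the speaker label from the segments.
    speaker_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    turns = []  # each entry is [speaker_label, [tokens]]
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation has no timestamps, so attach it to the previous word.
            if turns and turns[-1][1]:
                turns[-1][1][-1] += content
            continue
        # "unknown" is a fallback for words missing from every segment.
        speaker = speaker_at.get(item["start_time"], "unknown")
        if not turns or turns[-1][0] != speaker:
            turns.append([speaker, []])
        turns[-1][1].append(content)
    return [(speaker, " ".join(words)) for speaker, words in turns]
```

Printing each returned pair as f"{speaker} : {text}" gives lines in the shape shown in the expected result.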

How to build a multi-dimensional JSON native query for Druid?

I have data with multiple dimensions stored in a Druid cluster, for example, data on movies and the revenue they earned in each country where they were screened.
I'm trying to build a query that returns a table of all the movies, the total revenue of each, and the revenue per country.
I managed to do it in Turnilo, which generated the following Druid query:
[
[
{
"queryType": "timeseries",
"dataSource": "movies_source",
"intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
"granularity": "all",
"aggregations": [
{
"name": "__VALUE__",
"type": "doubleSum",
"fieldName": "revenue"
}
]
},
{
"queryType": "topN",
"dataSource": "movies_source",
"intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
"granularity": "all",
"dimension": {
"type": "default",
"dimension": "movie_id",
"outputName": "movie_id"
},
"aggregations": [
{
"name": "revenue",
"type": "doubleSum",
"fieldName": "revenue"
}
],
"metric": "revenue",
"threshold": 50
}
],
[
{
"queryType": "topN",
"dataSource": "movies_source",
"intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
"granularity": "all",
"filter": {
"type": "selector",
"dimension": "movie_id",
"value": "some_movie_id"
},
"dimension": {
"type": "default",
"dimension": "country",
"outputName": "country"
},
"aggregations": [
{
"name": "revenue",
"type": "doubleSum",
"fieldName": "revenue"
}
],
"metric": "revenue",
"threshold": 5
}
]
]
But it doesn't work when I use it as the body of a Postman request. I get:
{
"error": "Unknown exception",
"errorMessage": "Unexpected token (START_ARRAY), expected VALUE_STRING: need JSON String that contains type id (for subtype of org.apache.druid.query.Query)\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 2, column: 3]",
"errorClass": "com.fasterxml.jackson.databind.exc.MismatchedInputException",
"host": null
}
How should I build the corresponding query so that it works with Postman?
I am not familiar with Turnilo, but have you tried using the Druid console to write SQL and convert it to a native query with the "Explain SQL query" option under the "Run/..." menu?
Your native queries seem to be doing a Top N instead of listing all movies, so I think the SQL might be something like:
SELECT movie_id, country, SUM(revenue) AS total_revenue
FROM movies_source
WHERE __time BETWEEN '2021-11-18 00:01:00' AND '2021-11-21 00:01:00'
GROUP BY movie_id, country
ORDER BY total_revenue DESC
LIMIT 50
I don't have the data source to test with, but I tested a query with a similar structure against the sample wikipedia data:
SELECT namespace, cityName, sum(sum_added) total
FROM "wikipedia" r
WHERE cityName IS NOT NULL
AND __time BETWEEN '2015-09-12 00:00:00' AND '2015-09-15 00:00:00'
GROUP BY namespace, cityName
ORDER BY total DESC
limit 50
which results in the following Native query:
{
"queryType": "groupBy",
"dataSource": {
"type": "table",
"name": "wikipedia"
},
"intervals": {
"type": "intervals",
"intervals": [
"2015-09-12T00:00:00.000Z/2015-09-15T00:00:00.001Z"
]
},
"virtualColumns": [],
"filter": {
"type": "not",
"field": {
"type": "selector",
"dimension": "cityName",
"value": null,
"extractionFn": null
}
},
"granularity": {
"type": "all"
},
"dimensions": [
{
"type": "default",
"dimension": "namespace",
"outputName": "d0",
"outputType": "STRING"
},
{
"type": "default",
"dimension": "cityName",
"outputName": "d1",
"outputType": "STRING"
}
],
"aggregations": [
{
"type": "longSum",
"name": "a0",
"fieldName": "sum_added",
"expression": null
}
],
"postAggregations": [],
"having": null,
"limitSpec": {
"type": "default",
"columns": [
{
"dimension": "a0",
"direction": "descending",
"dimensionOrder": {
"type": "numeric"
}
}
],
"limit": 50
},
"context": {
"populateCache": false,
"sqlOuterLimit": 101,
"sqlQueryId": "cd5aabed-5e08-49b7-af63-fe82c125d3ee",
"useApproximateCountDistinct": false,
"useApproximateTopN": false,
"useCache": false
},
"descending": false
}
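As for the error itself: the message in the question (Unexpected token (START_ARRAY) ... for subtype of org.apache.druid.query.Query) indicates that the endpoint expected a single query object but received an array. In Postman that means sending one of the inner objects at a time as the body. The same idea as a minimal Python sketch, assuming the Turnilo output is a list of lists of native queries as shown in the question. The URL http://localhost:8888/druid/v2/ is an assumption (a local router on the default port), and flatten_queries and post_query are hypothetical helper names.

```python
import json
import urllib.request


def flatten_queries(node):
    """Yield each native query object from arbitrarily nested lists."""
    if isinstance(node, dict):
        yield node
    elif isinstance(node, list):
        for child in node:
            yield from flatten_queries(child)


def post_query(query, url="http://localhost:8888/druid/v2/"):
    """POST one native query object; the URL is an assumed local address."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires a reachable Druid endpoint):
# for query in flatten_queries(turnilo_output):
#     print(post_query(query))
```

Each response then has to be combined on the client side, which is presumably what Turnilo does with its batch of queries.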

"Buffalo" query fails to return "Buffalo Exchange" from /discover endpoint

NOTE: this question is specifically for support staff of the HERE Developer API because they ask freemium users to post support questions on Stack Overflow rather than trying to contact them directly. If you're not a member of their staff and you're unable to help or if the question is unclear to you, don't worry about it. :)
For some reason the /discover endpoint doesn't return the "Buffalo Exchange" place that's at my specified coordinates, but only returns 2 localities that are much further away. This is the query that I'm using: https://discover.search.hereapi.com/v1/discover?at=34.003975%2C-118.484823&q=Buffalo&limit=20&apiKey=<insert API KEY>. These are the results I currently receive:
{
"items": [
{
"title": "Buffalo, NY, United States",
"id": "here:cm:namedplace:21018816",
"resultType": "locality",
"localityType": "city",
"address": {
"label": "Buffalo, NY, United States",
"countryCode": "USA",
"countryName": "United States",
"stateCode": "NY",
"state": "New York",
"county": "Erie",
"city": "Buffalo",
"postalCode": "14202"
},
"position": {
"lat": 42.88544,
"lng": -78.87846
},
"distance": 3551940,
"mapView": {
"west": -78.9168,
"south": 42.82603,
"east": -78.79492,
"north": 42.96651
}
},
{
"title": "Buffalo City, Eastern Cape, South Africa",
"id": "here:cm:namedplace:23402337",
"resultType": "locality",
"localityType": "city",
"address": {
"label": "Buffalo City, Eastern Cape, South Africa",
"countryCode": "ZAF",
"countryName": "South Africa",
"state": "Eastern Cape",
"county": "Buffalo City",
"city": "Buffalo City",
"postalCode": "5201"
},
"position": {
"lat": -33.0148,
"lng": 27.9038
},
"distance": 16910944,
"mapView": {
"west": 27.15745,
"south": -33.28749,
"east": 28.08053,
"north": -32.674
}
}
]
}
You can see that for both places the resultType is "locality".
Now compare that to the first result of a similar query that searches for the term "Exchange" instead of "Buffalo". All other query params are the same. This is the URL: https://discover.search.hereapi.com/v1/discover?at=34.003975%2C-118.484823&q=Exchange&limit=20&apiKey=<insert API KEY>, and this is how the results begin (not shown fully because there are many results):
{
"items": [
{
"title": "Buffalo Exchange",
"id": "here:pds:place:8403fv6k-b15f290ec4f409deea99318f7388bbd6",
"resultType": "place",
"address": {
"label": "Buffalo Exchange, 2449 Main St, Santa Monica, CA 90405, United States",
"countryCode": "USA",
"countryName": "United States",
"stateCode": "CA",
"state": "California",
"county": "Los Angeles",
"city": "Santa Monica",
"district": "City of Santa Monica",
"street": "Main St",
"postalCode": "90405",
"houseNumber": "2449"
},
"position": {
"lat": 34.00342,
"lng": -118.48483
},
"access": [
{
"lat": 34.00331,
"lng": -118.48493
}
],
"distance": 61,
"categories": [
{
"id": "600-6800-0090",
"name": "Women's Apparel",
"primary": true
},
{
"id": "600-6800-0000",
"name": "Clothing & Accessories"
},
{
"id": "600-6800-0089",
"name": "Men's Apparel"
},
{
"id": "600-6900-0251",
"name": "Used/Second-hand Merchandise Stores"
}
],
"references": [
{
"supplier": {
"id": "core"
},
"id": "1211447153"
},
{
"supplier": {
"id": "yelp"
},
"id": "5PzeN6hGLBPmJpCJ2ZmfCQ"
}
],
"contacts": [
{
"phone": [
{
"value": "+13103147300"
},
{
"value": "+13103924301",
"categories": [
{
"id": "600-6800-0000"
}
]
}
],
"fax": [
{
"value": "(520) 622-7015",
"categories": [
{
"id": "600-6800-0000"
}
]
}
],
"www": [
{
"value": "http://www.buffaloexchange.com",
"categories": [
{
"id": "600-6800-0000"
},
{
"id": "600-6900-0251"
}
]
}
],
"email": [
{
"value": "contact@bufex.com",
"categories": [
{
"id": "600-6800-0000"
}
]
}
]
}
],
"openingHours": [
{
"categories": [
{
"id": "600-6800-0000"
}
],
"text": [
"Mon-Sun: 11:00 - 20:00"
],
"isOpen": false,
"structured": [
{
"start": "T110000",
"duration": "PT09H00M",
"recurrence": "FREQ:DAILY;BYDAY:MO,TU,WE,TH,FR,SA,SU"
}
]
},
{
"categories": [
{
"id": "600-6800-0090"
},
{
"id": "600-6900-0251"
}
],
"text": [
"Mon-Sat: 11:00 - 20:00",
"Sun: 11:00 - 19:00"
],
"isOpen": false,
"structured": [
{
"start": "T110000",
"duration": "PT09H00M",
"recurrence": "FREQ:DAILY;BYDAY:MO,TU,WE,TH,FR,SA"
},
{
"start": "T110000",
"duration": "PT08H00M",
"recurrence": "FREQ:DAILY;BYDAY:SU"
}
]
}
]
},
...
}
You can see that the first result has the name "Buffalo Exchange" and the "resultType" is "place". This is the result I want. The question is why does this result fail to show up when the search query is "Buffalo"? Of course with the /discover endpoint I can't specify the category IDs I want to search, that's only available via the /browse endpoint. But with the /browse endpoint I can't specify a specific search term like "Buffalo" or "Exchange".
Update: this problem also happens with the "Bison" query in Alberta, Canada. The query for this is https://discover.search.hereapi.com/v1/discover?at=56.745531%2C-111.351341&q=Exchange&limit=20&apiKey=<insert API KEY>. This query yields only 10 results, and only 4 of them have resultType of "place".
We recommend that application developers use both Autosuggest and Discover. Autosuggest returns the nearby Buffalo Exchange because it treats the query as incomplete, whereas Discover treats the query as complete.
HERE Geocoding and Search is meant to provide relevant responses to user queries.
Geocoding and Search API (Autosuggest): https://developer.here.com/documentation/geocoding-search-api/dev_guide/topics/endpoint-autosuggest-brief.html
Geocoding and Search API (Discover): https://developer.here.com/documentation/geocoding-search-api/dev_guide/topics/endpoint-discover-brief.html
The guides linked above can help you adjust the query for more accurate results.
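To compare the two endpoints with the same parameters as in the question, the request URLs can be built side by side. This is only a sketch: build_search_url is a hypothetical helper, the API key is a placeholder, and it assumes the Autosuggest endpoint accepts the same at, q, limit and apiKey parameters as the Discover URL in the question.

```python
from urllib.parse import urlencode

# Base URLs for the two endpoints discussed above.
ENDPOINTS = {
    "discover": "https://discover.search.hereapi.com/v1/discover",
    "autosuggest": "https://autosuggest.search.hereapi.com/v1/autosuggest",
}


def build_search_url(endpoint, query, at, api_key, limit=20):
    """Build a request URL for the given endpoint ("discover" or "autosuggest")."""
    params = {"at": at, "q": query, "limit": limit, "apiKey": api_key}
    return f"{ENDPOINTS[endpoint]}?{urlencode(params)}"


# Same position and partial query, sent to each endpoint.
for name in ("autosuggest", "discover"):
    print(build_search_url(name, "Buffalo", "34.003975,-118.484823", "<insert API KEY>"))
```

Requesting both URLs shows how each endpoint interprets the same partial term.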

Elastic Search Sort

I have a table for some activities like
[
{
"id": 123,
"name": "Ram",
"status": 1,
"activity": "Poster Design"
},
{
"id": 123,
"name": "Ram",
"status": 1,
"activity": "Poster Design"
},
{
"id": 124,
"name": "Leo",
"categories": [
"A",
"B",
"C"
],
"status": 1,
"activity": "Brochure"
},
{
"id": 134,
"name": "Levin",
"categories": [
"A",
"B",
"C"
],
"status": 1,
"activity": "3D Printing"
}
]
I want to get this data from Elasticsearch 5.5 sorted on the activity field, but I need all the documents with name = "Ram" first and then the rest, in a single query.
You can use a function_score query to boost documents that match a filter (in this case, "ram" in name).
The following query should work for you:
POST sort_index/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"boost": "5",
"functions": [{
"filter": {
"match": {
"name": "ram"
}
},
"weight": 1000
}],
"score_mode": "max"
}
},
"sort": [
{
"_score": {
"order": "desc"
}
},
{
"activity.keyword": {
"order": "desc"
}
}
]
}
I would suggest using a bool query combined with a should clause.
You will also need to use the sort clause on your field.
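A minimal sketch of that bool/should idea, written as a Python dict that serializes to the request body. It assumes the activity.keyword sub-field from the query above exists, and build_pinned_sort_query is a hypothetical helper name. Wrapping the match in constant_score is my own choice here, so that every matching document gets the same score and the tie is then broken by the activity sort.

```python
import json


def build_pinned_sort_query(name, sort_field="activity.keyword", order="asc"):
    """Build a search body that lists documents matching `name` first."""
    return {
        "query": {
            "bool": {
                # Keep every document in the result set.
                "must": {"match_all": {}},
                # Documents matching the name get a fixed score bonus.
                "should": [
                    {
                        "constant_score": {
                            "filter": {"match": {"name": name}},
                            "boost": 10,
                        }
                    }
                ],
            }
        },
        # Score first (pinned group on top), then the requested field.
        "sort": [{"_score": {"order": "desc"}}, {sort_field: {"order": order}}],
    }


print(json.dumps(build_pinned_sort_query("ram"), indent=2))
```

The printed JSON can be sent as the body of a _search request against the index.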

Use regex in Powershell v2 to get values from a json file

How would I access the following values using regex in PowerShell, and assign each one to an individual variable?
id (i.e. get the value: TOKEN_ID) - under token
id (i.e. get the value: TENANT_ID) - under token, tenant
adminURL (i.e. get the value: http://10.100.0.222:35357/v2.0) - the first value under serviceCatalog,endpoints
As I am using PowerShell v2, I can't use the ConvertFrom-Json cmdlet. So far I've tried converting the document to an XML file using a third-party PS script, but it doesn't always get it right. I'd like to use regex, but I am not very comfortable with it.
$json =
"{
"access": {
"metadata": {
"is_admin": 0,
"roles": [
"9fe2ff9ee4384b1894a90878d3e92bab"
]
},
"serviceCatalog": [
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:8774/v2/TENANT_ID",
"id": "0eb78b6d3f644438aea327d9c57b7b5a",
"internalURL": "http://10.100.0.222:8774/v2/TENANT_ID",
"publicURL": "http://8.21.28.222:8774/v2/TENANT_ID",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "nova",
"type": "compute"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:9696/",
"id": "3f4b6015a2f9481481ca03dace8acf32",
"internalURL": "http://10.100.0.222:9696/",
"publicURL": "http://8.21.28.222:9696/",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "neutron",
"type": "network"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:8776/v2/TENANT_ID",
"id": "16f6416588f64946bdcdf4a431a8f252",
"internalURL": "http://10.100.0.222:8776/v2/TENANT_ID",
"publicURL": "http://8.21.28.222:8776/v2/TENANT_ID",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "cinder_v2",
"type": "volumev2"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:8779/v1.0/TENANT_ID",
"id": "be48765ae31e425cb06036b1ebab694a",
"internalURL": "http://10.100.0.222:8779/v1.0/TENANT_ID",
"publicURL": "http://8.21.28.222:8779/v1.0/TENANT_ID",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "trove",
"type": "database"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:9292",
"id": "1adfcb5414304f3596fb81edb2dfb514",
"internalURL": "http://10.100.0.222:9292",
"publicURL": "http://8.21.28.222:9292",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "glance",
"type": "image"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:8777",
"id": "350f3b91d73f4b3ab8a061c94ac31fbb",
"internalURL": "http://10.100.0.222:8777",
"publicURL": "http://8.21.28.222:8777",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "ceilometer",
"type": "metering"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:8000/v1/",
"id": "2198b0d32a604e75a5cc1e13276a813d",
"internalURL": "http://10.100.0.222:8000/v1/",
"publicURL": "http://8.21.28.222:8000/v1/",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "heat-cfn",
"type": "cloudformation"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:8776/v1/TENANT_ID",
"id": "7c193c4683d849ca8e8db493722a4d8c",
"internalURL": "http://10.100.0.222:8776/v1/TENANT_ID",
"publicURL": "http://8.21.28.222:8776/v1/TENANT_ID",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "cinder",
"type": "volume"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:8773/services/Admin",
"id": "11fac8254be74d7d906110f0069e5748",
"internalURL": "http://10.100.0.222:8773/services/Cloud",
"publicURL": "http://8.21.28.222:8773/services/Cloud",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "nova_ec2",
"type": "ec2"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:8004/v1/TENANT_ID",
"id": "38fa4f9afce34d4ca0f5e0f90fd758dd",
"internalURL": "http://10.100.0.222:8004/v1/TENANT_ID",
"publicURL": "http://8.21.28.222:8004/v1/TENANT_ID",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "heat",
"type": "orchestration"
},
{
"endpoints": [
{
"adminURL": "http://10.100.0.222:35357/v2.0",
"id": "256cdf78ecb04051bf0f57ec11070222",
"internalURL": "http://10.100.0.222:5000/v2.0",
"publicURL": "http://8.21.28.222:5000/v2.0",
"region": "RegionOne"
}
],
"endpoints_links": [],
"name": "keystone",
"type": "identity"
}
],
"token": {
"audit_ids": [
"gsjrNoqFSQeuLUo0QeJprQ"
],
"expires": "2014-12-15T15:09:29Z",
"id": "TOKEN_ID",
"issued_at": "2014-12-15T14:09:29.794527",
"tenant": {
"description": "Auto created account",
"enabled": true,
"id": "TENANT_ID",
"name": "USERNAME"
}
},
"user": {
"id": "USER_ID",
"name": "USERNAME",
"roles": [
{
"name": "_member_"
}
],
"roles_links": [],
"username": "USERNAME"
}
}
}"
If you are using .NET 3.5 or higher on your machines with PowerShell 2.0, you can use a JSON serializer (from the linked answer):
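# Load the assembly that contains JavaScriptSerializer (needs .NET 3.5 or higher)
[System.Reflection.Assembly]::LoadWithPartialName("System.Web.Extensions") | Out-Null
$json = "{a:1,b:2,c:{nested:true}}"
$ser = New-Object System.Web.Script.Serialization.JavaScriptSerializer
# DeserializeObject returns nested dictionaries and arrays
$obj = $ser.DeserializeObject($json)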
This would be preferable to using regex.
For admin URL for example, you'd refer to:
$obj.access.serviceCatalog[0].endpoints[0].adminURL
Using RegEx Anyway
if ($json -match '(?s)"serviceCatalog".+?"endpoints".+?"adminURL"[^"]+"(?<adminUrl>[^"]+)".+?"token".+?"id"[^"]+"(?<tokenID>[^"]+)".+?"tenant".+?"id"[^"]+"(?<tenantID>[^"]+)') {
$Matches['adminUrl']
$Matches['tokenID']
$Matches['tenantID']
}
RegEx Breakdown:
(?s) tells the regex engine that . matches anything, including newlines (by default it wouldn't).
Of course all of the "whatever" parts just match literally.
.+? matches 1 or more of any character (including newlines since we're using (?s)), and the ? makes it non-greedy.
[^"]+ this matches 1 or more characters that are not a double quote.
() is a capturing group. By using (?<name>) we can refer back to the group later by name rather than number, just a nicety.
So the basic idea is to look for the literals, then get to a point where we can capture the values needed. After a successful -match in PowerShell, the automatic $Matches variable is populated with the full match and the captured groups.
Note that this relies on the values being in the order they are in the posted JSON. If they were in a different order it would fail.
To work around that you could split this into 3 different regex matches.
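The three-separate-matches idea can be sketched as follows. This illustration is in Python rather than PowerShell, and the pattern strings and the extract_values helper are my own assumptions. The patterns use plain numbered groups, so they should carry over to -match, where the values would be read from $Matches[1]. Each pattern is searched independently, so the relative order of serviceCatalog and token no longer matters. The token pattern still assumes that the token's own "id" appears before the nested tenant object, as it does in the posted JSON, and the adminURL pattern returns the first adminURL in the document.

```python
import re

# One pattern per value, so the sections may appear in any order.
PATTERNS = {
    "token_id": r'"token"\s*:\s*\{.*?"id"\s*:\s*"([^"]+)"',
    "tenant_id": r'"tenant"\s*:\s*\{.*?"id"\s*:\s*"([^"]+)"',
    "admin_url": r'"adminURL"\s*:\s*"([^"]+)"',
}


def extract_values(json_text):
    """Run each pattern on its own and collect the first captured group."""
    values = {}
    for name, pattern in PATTERNS.items():
        # re.DOTALL plays the role of (?s): "." also matches newlines.
        match = re.search(pattern, json_text, re.DOTALL)
        values[name] = match.group(1) if match else None
    return values
```

A value that is not found comes back as None instead of making the whole extraction fail.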

Why are my application's wall posts marked as links?

My application creates posts on the user's wall, but they are marked as links.
I see other applications doing the same thing, but for some reason Facebook doesn't mark their posts as links. I tried to find the difference between their posts and mine in the Graph API Explorer, and the only difference is the status_type attribute (which is not even documented). Mine is set to "shared_story", while other applications have "app_created_story". I tried to set it manually, but Facebook still uses shared_story even if I set it to app_created_story.
Data from the Graph API Explorer (the first post is from my application):
[{
"id": "100001302536317_168939309896943",
"from": {
"name": "Андрей Реаскович",
"id": "100001302536317"
},
"story": "Андрей Реаскович shared a link.",
"story_tags": {
"0": [
{
"id": "100001302536317",
"name": "Андрей Реаскович",
"offset": 0,
"length": 16,
"type": "user"
}
]
},
"picture": "http://external.ak.fbcdn.net/safe_image.php?d=AQBcrRQnipKHB6g6&w=90&h=90&url=http%3A%2F%2Fstatic.jackpotrush.com%2Fimages%2Fviral%2Fgames%2Fjoker_jester.png",
"link": "http://apps.facebook.com/jackrush/?ref=bonus-game",
"name": "I Won Big on a Bonus Round at Jackpot Rush!",
"caption": " ",
"description": "I Just Won 200 Coins at a Jackpot Rush Bonus Round and it Feels Amazing!",
"icon": "http://static.ak.fbcdn.net/rsrc.php/v2/yD/r/aS8ecmYRys0.gif",
"actions": [
{
"name": "Comment",
"link": "http://www.facebook.com/100001302536317/posts/168939309896943"
},
{
"name": "Like",
"link": "http://www.facebook.com/100001302536317/posts/168939309896943"
}
],
"privacy": {
"description": "Only Me",
"value": "SELF"
},
"type": "link",
"status_type": "shared_story",
"application": {
"name": "Jackrush Development",
"namespace": "jackrush",
"id": "434595133235542"
},
"created_time": "2012-11-12T08:48:18+0000",
"updated_time": "2012-11-12T08:48:18+0000",
"comments": {
"count": 0
}
},
{
"id": "100001302536317_420171444702936",
"from": {
"name": "Андрей Реаскович",
"id": "100001302536317"
},
"picture": "http://platform.ak.fbcdn.net/www/app_full_proxy.php?app=242829452408559&v=1&size=z&cksum=cd0fa7a15202d05700bf1464fd989d42&src=http%3A%2F%2Fc.cdn.blueshellgames.com%2Fs%2F%3Aac05b878%2Fmonte%2Fi%2Ffeed_story_image-logo_black.png",
"link": "http://apps.facebook.com/luckyslotsgame/?action=shareLevelUpReward&posterID=fb%3A100001302536317&k_utag=df3f4636aabab65e&levelUpToken=6553%3AsVFCJmW43zW_Nni9vBB9xg%3D%3D&level=4&refID=fb%3A100001302536317&utm_medium=viral&utm_source=levelUp",
"name": "First 5 Clickers get 150 Free Coins!",
"caption": "Андрей just reached level 4 and is giving away 150 Free Coins to celebrate!",
"description": "Play the best slots on Facebook! Get free chips daily! Unlock new slot machines and new bonus games! Don't miss out, play now!",
"icon": "http://photos-d.ak.fbcdn.net/photos-ak-snc7/v85006/51/242829452408559/app_2_242829452408559_9481.gif",
"actions": [
{
"name": "Comment",
"link": "http://www.facebook.com/100001302536317/posts/420171444702936"
},
{
"name": "Like",
"link": "http://www.facebook.com/100001302536317/posts/420171444702936"
},
{
"name": "Claim BONUS Coins",
"link": "http://apps.facebook.com/luckyslotsgame/?action=shareLevelUpReward&posterID=fb%3A100001302536317&k_utag=df3f4636aabab65e&levelUpToken=6553%3AsVFCJmW43zW_Nni9vBB9xg%3D%3D&level=4&refID=fb%3A100001302536317&utm_medium=viral&utm_source=levelUp"
}
],
"privacy": {
"description": "Public",
"value": "EVERYONE"
},
"type": "link",
"status_type": "app_created_story",
"application": {
"name": "Lucky Slots",
"namespace": "luckyslotsgame",
"id": "242829452408559"
},
"created_time": "2012-11-12T08:13:25+0000",
"updated_time": "2012-11-12T08:13:25+0000",
"comments": {
"count": 0
}
}]