I'm trying to validate a message with a JSON schema in WSO2 Micro Integrator 1.2.0.
<validate cache-schema="true">
    <schema key="conf:schema/eip_dit_oko_jsonschema_stage_0_input_params.json"/>
    <on-fail>
        <payloadFactory media-type="json">
            <format>{"Error":"$1","Error Details":"$2"}</format>
            <args>
                <arg evaluator="xml" expression="$ctx:ERROR_MESSAGE"/>
                <arg evaluator="xml" expression="$ctx:ERROR_DETAIL"/>
            </args>
        </payloadFactory>
        <property name="HTTP_SC" scope="axis2" type="STRING" value="500"/>
        <respond/>
    </on-fail>
</validate>
If the schema file is in the registry
<item>
    <file>eip_dit_oko_jsonschema_stage_0_input_params.json</file>
    <path>/_system/config/schema</path>
    <mediaType>application/json</mediaType>
    <properties/>
</item>
then the sequence fails with
[2022-11-22 21:14:33,492] ERROR {ValidateMediator} - {api:eip_dit_oko_api_stage_0} Error creating a new schema objects for schemas : [Value {name ='null', keyValue ='conf:schema/eip_dit_oko_jsonschema_stage_0_input_params.json'}] org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.jaxp.validation.Util.toSAXParseException(Util.java:74)
    at com.sun.org.apache.xerces.internal.jaxp.validation.Util.toSAXException(Util.java:62)
    at com.sun.org.apache.xerces.internal.jaxp.validation.XMLSchemaFactory.newSchema(XMLSchemaFactory.java:258)
    at org.apache.synapse.mediators.builtin.ValidateMediator.mediate(ValidateMediator.java:429)
    ...
Obviously, the integrator tries to read the JSON schema file as XML.
If I try to follow this answer about using a local entry instead of the registry
<validate cache-schema="true">
    <schema key="eip_dit_oko_jsonschema_stage_0_input_params"/>
    ...
</validate>
<?xml version="1.0" encoding="UTF-8"?>
<localEntry key="eip_dit_oko_jsonschema_stage_0_input_params" xmlns="http://ws.apache.org/ns/synapse"><![CDATA[{ "$schema": "http://json-schema.org/draft-04/schema", "id": "http://example.com/example.json", "type": "object", "title": "The root schema", "required": [ "getData" ], "properties": { "getData": { "id": "#getData", "type": "object", "title": "The getData schema", "required": [ "p_limit", "p_offset" ], "properties": { "p_limit": { "id": "#p_limit", "type": "integer", "title": "The p_limit schema" }, "p_offset": { "id": "#p_offset", "type": "integer", "title": "The p_offset schema" }, "p_defect_text": { "id": "#/properties/defect_text", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_district_code": { "id": "#p_district_code", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "integer" } }, "p_okrug_code": { "id": "#p_okrug_code", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "integer" } }, "p_status": { "id": "#p_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_sys_status": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_ticket": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_season": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_critical": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_owner_name": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_address": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_address_like": { "id": "#p_address_like", "type": "string" },"p_id_object": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "integer" } }, "p_id_300": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_type_object": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_id_systems": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_defect_el1": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_defect_el": { "id": "#p_sys_status", "type": "array", "items": {"id": "#/properties/defect_text/items","type": "string" } }, "p_sys_sla": { "id": "#p_sys_status", "type": "string", "format":"date-time", "title": "The p_sys_status schema" }, "p_sys_sla_from": { "id": "#p_sys_status", "type": "string", "format":"date-time", "title": "The p_sys_status schema" }, "p_sys_sla_to": { "id": "#p_sys_status", "type": "string", "format":"date-time", "title": "The p_sys_status schema" }, "p_data_creation_from": { "id": "#p_sys_status", "type": "string", "format":"date-time", "title": "The p_sys_status schema" }, "p_data_creation_to": { "id": "#p_sys_status", "type": "string", "format":"date-time", "title": "The p_sys_status schema" }, "p_view_date_from_from": { "id": "#p_sys_status", "type": "string", "format":"date-time", "title": "The p_sys_status schema" }, "p_view_date_to_to": { "id": "#p_sys_status", "type": "string", "format":"date-time", "title": 
"The p_sys_status schema" }, "p_view_date_from_to": { "id": "#p_sys_status", "type": "string", "format":"date-time", "title": "The p_sys_status schema" }, "p_view_date_to_from": { "id": "#p_sys_status", "type": "string", "format":"date-time", "title": "The p_sys_status schema" }, "p_deadline": { "id": "#p_sys_status", "type": "number", "title": "The p_sys_status schema" } } } }}]]></localEntry>
then mediation fails like this
[2022-11-22 16:13:17,517] WARN {SynapseConfigUtils} - Cannot convert object to a StreamSource
EDIT
Request:
curl -v http://localhost:8290/api/stage -H 'Content-Type: application/json' -d '{"getData": {"p_season": ["winter"], "p_limit": 10, "p_offset": 0}}'
One reason for this issue is that the payload you are sending is not JSON, or that you are not sending the Content-Type: application/json header. But I would assume that if you were sending the incorrect content type, it would fail before reaching the Validate mediator. So my guess is that you are not sending any payload with the request at all. Are you trying to test this with a GET request? As per the code, if you don't send a JSON payload the mediator goes down the XML path, which can cause the issue you are facing, so make sure you send a valid JSON payload with your request.
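To confirm what the mediator actually receives, you could log the message just before <validate> with the standard Synapse log mediator (a diagnostic sketch, not part of the fix):

<!-- Log the full message (headers and body) to verify that a
     JSON payload actually reaches the validate mediator. -->
<log level="full"/>

If the logged body is empty or not JSON, the problem is with the request rather than with the schema.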
I have data with multiple dimensions stored in a Druid cluster. For example, data about movies and the revenue they earned from each country where they were screened.
I'm trying to build a query whose result will be a table of all the movies, the total revenue of each of them, and the revenue per country.
I managed to do it in Turnilo - it generated the following Druid query for me:
[
    [
        {
            "queryType": "timeseries",
            "dataSource": "movies_source",
            "intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
            "granularity": "all",
            "aggregations": [
                {
                    "name": "__VALUE__",
                    "type": "doubleSum",
                    "fieldName": "revenue"
                }
            ]
        },
        {
            "queryType": "topN",
            "dataSource": "movies_source",
            "intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
            "granularity": "all",
            "dimension": {
                "type": "default",
                "dimension": "movie_id",
                "outputName": "movie_id"
            },
            "aggregations": [
                {
                    "name": "revenue",
                    "type": "doubleSum",
                    "fieldName": "revenue"
                }
            ],
            "metric": "revenue",
            "threshold": 50
        }
    ],
    [
        {
            "queryType": "topN",
            "dataSource": "movies_source",
            "intervals": "2021-11-18T00:01Z/2021-11-21T00:01Z",
            "granularity": "all",
            "filter": {
                "type": "selector",
                "dimension": "movie_id",
                "value": "some_movie_id"
            },
            "dimension": {
                "type": "default",
                "dimension": "country",
                "outputName": "country"
            },
            "aggregations": [
                {
                    "name": "revenue",
                    "type": "doubleSum",
                    "fieldName": "revenue"
                }
            ],
            "metric": "revenue",
            "threshold": 5
        }
    ]
]
But it doesn't work when I try to use it as the body of a Postman request - I get
{
    "error": "Unknown exception",
    "errorMessage": "Unexpected token (START_ARRAY), expected VALUE_STRING: need JSON String that contains type id (for subtype of org.apache.druid.query.Query)\n at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 2, column: 3]",
    "errorClass": "com.fasterxml.jackson.databind.exc.MismatchedInputException",
    "host": null
}
How should I build the corresponding query so that it works with Postman?
I am not familiar with Turnilo, but have you tried using the Druid console to write SQL and convert it to a native request with the "Explain SQL query" option under the "Run/..." menu?
Your native queries seem to be doing a Top N instead of listing all movies, so I think the SQL might be something like:
SELECT movie_id, country_id, SUM(revenue) total_revenue
FROM movies_source
WHERE __time BETWEEN '2021-11-18 00:01:00' AND '2021-11-21 00:01:00'
GROUP BY movie_id, country_id
ORDER BY total_revenue DESC
LIMIT 50
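Incidentally, if the goal is just to run this from Postman, you don't have to translate to a native query at all: Druid also exposes a SQL endpoint that accepts the SQL directly. A sketch (the host and port depend on your setup; 8888 is the default router port):

POST http://<router-or-broker>:8888/druid/v2/sql
Content-Type: application/json

{
    "query": "SELECT movie_id, country_id, SUM(revenue) AS total_revenue FROM movies_source WHERE __time BETWEEN '2021-11-18 00:01:00' AND '2021-11-21 00:01:00' GROUP BY movie_id, country_id ORDER BY total_revenue DESC LIMIT 50"
}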
I don't have your data source to test against, but I tested with the sample wikipedia data using a similar query structure:
SELECT namespace, cityName, SUM(sum_added) total
FROM "wikipedia" r
WHERE cityName IS NOT NULL
  AND __time BETWEEN '2015-09-12 00:00:00' AND '2015-09-15 00:00:00'
GROUP BY namespace, cityName
ORDER BY total DESC
LIMIT 50
which results in the following Native query:
{
    "queryType": "groupBy",
    "dataSource": {
        "type": "table",
        "name": "wikipedia"
    },
    "intervals": {
        "type": "intervals",
        "intervals": [
            "2015-09-12T00:00:00.000Z/2015-09-15T00:00:00.001Z"
        ]
    },
    "virtualColumns": [],
    "filter": {
        "type": "not",
        "field": {
            "type": "selector",
            "dimension": "cityName",
            "value": null,
            "extractionFn": null
        }
    },
    "granularity": {
        "type": "all"
    },
    "dimensions": [
        {
            "type": "default",
            "dimension": "namespace",
            "outputName": "d0",
            "outputType": "STRING"
        },
        {
            "type": "default",
            "dimension": "cityName",
            "outputName": "d1",
            "outputType": "STRING"
        }
    ],
    "aggregations": [
        {
            "type": "longSum",
            "name": "a0",
            "fieldName": "sum_added",
            "expression": null
        }
    ],
    "postAggregations": [],
    "having": null,
    "limitSpec": {
        "type": "default",
        "columns": [
            {
                "dimension": "a0",
                "direction": "descending",
                "dimensionOrder": {
                    "type": "numeric"
                }
            }
        ],
        "limit": 50
    },
    "context": {
        "populateCache": false,
        "sqlOuterLimit": 101,
        "sqlQueryId": "cd5aabed-5e08-49b7-af63-fe82c125d3ee",
        "useApproximateCountDistinct": false,
        "useApproximateTopN": false,
        "useCache": false
    },
    "descending": false
}
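Note also what the error you got means: "Unexpected token (START_ARRAY) ... need JSON String that contains type id" is Jackson failing to deserialize a JSON array as a single org.apache.druid.query.Query. The native query endpoint (/druid/v2) expects exactly one query object per request, whereas Turnilo fires several queries and stitches the results together, which is why its array-of-arrays body is rejected. If you do want to run the native form from Postman or curl, send each query object separately, for example (host placeholder assumed):

curl -X POST 'http://<router-or-broker>:8888/druid/v2?pretty' \
  -H 'Content-Type: application/json' \
  -d @single_query.json

where single_query.json contains one of the query objects above, without the surrounding arrays.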
Below is one element from my JSON, which I want to put into respective columns in table form in AWS Athena.
Like this:

Date             | From           | To
Sat, 11 Sep 2021 | info#hello.com | xyz#hotmail.com
"headers": [
{
"name": "Date",
"value": "Sat, 11 Sep 2021"
},
{
"name": "From",
"value": "info#hello.com"
},
{
"name": "To",
"value": "xyz#hotmail.com"
},
{
"name": "Message-ID",
"value": "<873411463.53966.1631381472705.JavaMail.ec2-user#ip-10-0-61-104.ap-south-1.compute.internal>"
},
{
"name": "Subject",
"value": "Hello there"
},
{
"name": "MIME-Version",
"value": "1.0"
},
{
"name": "Content-Type",
"value": "text/html; charset=UTF-8"
},
{
"name": "Content-Transfer-Encoding",
"value": "7bit"
}
]```
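Since the headers are stored as an array of name/value structs, one way to pivot them into columns in Athena is to UNNEST the array and aggregate the values per message. A sketch, assuming a hypothetical table emails with a unique message key msg_id and a column headers of type array<struct<name:string,value:string>> (adjust the names to your actual schema):

SELECT
    msg_id,
    MAX(IF(h.name = 'Date', h.value)) AS date,
    MAX(IF(h.name = 'From', h.value)) AS from_addr,
    MAX(IF(h.name = 'To', h.value)) AS to_addr
FROM emails
CROSS JOIN UNNEST(headers) AS t (h)
GROUP BY msg_id

Each MAX(IF(...)) picks out the value of one header name, turning the name/value rows into columns.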
I am attempting to build a small web application for our internal team to use to view our CloudWatch logs. Right now I'm very early in development and am simply trying to access the logs via Postman using https://logs.us-east-1.amazonaws.com, as specified in the official AWS API documentation.
I have followed the steps to set up my POST request to the endpoint with the following headers:
[Screenshot: Postman-generated headers]
Also, following the documentation, I have provided the Action in the body of the POST request:
{"Action": "DescribeLogGroups"}
Using the AWS CLI this works fine and I can see all my log groups.
When I send this request to https://logs.us-east-1.amazonaws.com I get back:
{
    "Output": {
        "__type": "com.amazon.coral.service#UnknownOperationException",
        "message": null
    },
    "Version": "1.0"
}
The status code is 200.
Things I have tried:
Removing the body of the request altogether -> results in "internal server error"
Appending /describeloggroups to the URL with no body -> results in "internal server error"
I'm truly not sure what I'm doing wrong here.
The best way is to set the X-Amz-Target header to Logs_20140328.DescribeLogGroups.
Here is an example request: https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_DescribeLogGroups.html#API_DescribeLogGroups_Example_1_Request
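If you just want a quick sanity check outside Postman, recent curl versions (7.75+) can sign SigV4 requests themselves. A sketch (region and credentials are placeholders you'd substitute):

# Sign the request with SigV4 for the "logs" service in us-east-1.
curl https://logs.us-east-1.amazonaws.com \
  --user "$AWS_ACCESS_KEY_ID:$AWS_SECRET_ACCESS_KEY" \
  --aws-sigv4 "aws:amz:us-east-1:logs" \
  -H 'X-Amz-Target: Logs_20140328.DescribeLogGroups' \
  -H 'Content-Type: application/x-amz-json-1.1' \
  -d '{}'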
Below is a Postman collection you can try. Save it as a file and import it into Postman with File -> Import. It also requires you to set the credential and region variables in Postman.
{
    "info": {
        "name": "CloudWatch Logs",
        "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
    },
    "item": [
        {
            "name": "DescribeLogs",
            "request": {
                "auth": {
                    "type": "awsv4",
                    "awsv4": [
                        {
                            "key": "sessionToken",
                            "value": "{{SESSION_TOKEN}}",
                            "type": "string"
                        },
                        {
                            "key": "service",
                            "value": "logs",
                            "type": "string"
                        },
                        {
                            "key": "region",
                            "value": "{{REGION}}",
                            "type": "string"
                        },
                        {
                            "key": "secretKey",
                            "value": "{{SECRET_ACCESS_KEY}}",
                            "type": "string"
                        },
                        {
                            "key": "accessKey",
                            "value": "{{ACCESS_KEY_ID}}",
                            "type": "string"
                        }
                    ]
                },
                "method": "POST",
                "header": [
                    {
                        "warning": "This is a duplicate header and will be overridden by the Content-Type header generated by Postman.",
                        "key": "Content-Type",
                        "type": "text",
                        "value": "application/json"
                    },
                    {
                        "key": "X-Amz-Target",
                        "type": "text",
                        "value": "Logs_20140328.DescribeLogGroups"
                    },
                    {
                        "warning": "This is a duplicate header and will be overridden by the host header generated by Postman.",
                        "key": "host",
                        "type": "text",
                        "value": "logs.{{REGION}}.amazonaws.com"
                    },
                    {
                        "key": "Accept",
                        "type": "text",
                        "value": "application/json"
                    },
                    {
                        "key": "Content-Encoding",
                        "type": "text",
                        "value": "amz-1.0"
                    }
                ],
                "body": {
                    "mode": "raw",
                    "raw": "{}"
                },
                "url": {
                    "raw": "https://logs.{{REGION}}.amazonaws.com",
                    "protocol": "https",
                    "host": [
                        "logs",
                        "{{REGION}}",
                        "amazonaws",
                        "com"
                    ]
                }
            },
            "response": []
        }
    ],
    "protocolProfileBehavior": {}
}
Try copying this into a JSON file and importing it into Postman, then add the missing variables.
I used it to make a DescribeLogGroups call to the "logs" service. Look in the docs here
https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/API_DescribeLogGroups.html#API_DescribeLogGroups_Example_1_Request
for more information about the headers and body.
PS: The session token is optional; I didn't need it in my case.
Hope it works for anyone who needs it.
{
    "info": {
        "_postman_id": "8660f3fc-fc6b-4a71-84ba-739d8b4ea7c2",
        "name": "CloudWatch Logs",
        "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
    },
    "item": [
        {
            "name": "DescribeLogs",
            "request": {
                "auth": {
                    "type": "awsv4",
                    "awsv4": [
                        {
                            "key": "service",
                            "value": "{{AWS_SERVICE_NAME}}",
                            "type": "string"
                        },
                        {
                            "key": "region",
                            "value": "{{AWS_REGION}}",
                            "type": "string"
                        },
                        {
                            "key": "secretKey",
                            "value": "{{AWS_SECRET_ACCESS_KEY}}",
                            "type": "string"
                        },
                        {
                            "key": "accessKey",
                            "value": "{{AWS_ACCESS_KEY_ID}}",
                            "type": "string"
                        },
                        {
                            "key": "sessionToken",
                            "value": "",
                            "type": "string"
                        }
                    ]
                },
                "method": "POST",
                "header": [
                    {
                        "key": "X-Amz-Target",
                        "value": "Logs_20140328.DescribeLogGroups",
                        "type": "text"
                    },
                    {
                        "key": "Content-Encoding",
                        "value": "amz-1.0",
                        "type": "text"
                    }
                ],
                "body": {
                    "mode": "raw",
                    "raw": "{}",
                    "options": {
                        "raw": {
                            "language": "json"
                        }
                    }
                },
                "url": {
                    "raw": "https://{{AWS_SERVICE_NAME}}.{{AWS_REGION}}.amazonaws.com",
                    "protocol": "https",
                    "host": [
                        "{{AWS_SERVICE_NAME}}",
                        "{{AWS_REGION}}",
                        "amazonaws",
                        "com"
                    ]
                }
            },
            "response": []
        }
    ]
}
I'm currently creating an Avro schema to store Twitter data streams.
My data source is in JSON:
{
    'id': '123456789',
    'text': 'bla bla bla...',
    'entities': {
        'hashtags': [{'text': 'hashtag1'}, {'text': 'hashtag2'}]
    }
}
In Cassandra, I can define a collection (sets or lists) to store the hashtag data.
But I have no idea how to define this structure in Apache Avro.
Here's my best try:
{"namespace": "ln.twitter",
"type": "record",
"name": "main",
"fields": [
{"name": "id","type": "string"},
{"name": "text","type": "string"},
{"name": "hashtags","type": "string"} // is there any better format for this ?
]
}
Need your advice please.
Thanks,
Yusata.
The entities field needs explicit records (or maps) inside. Here's a schema that should work:
{
    "type": "record",
    "name": "Main",
    "fields": [
        {
            "name": "id",
            "type": "string"
        },
        {
            "name": "text",
            "type": "string"
        },
        {
            "name": "entities",
            "type": {
                "type": "record",
                "name": "Entities",
                "fields": [
                    {
                        "name": "hashtags",
                        "type": {
                            "type": "array",
                            "items": {
                                "type": "record",
                                "name": "Hashtag",
                                "fields": [
                                    {
                                        "name": "text",
                                        "type": "string"
                                    }
                                ]
                            }
                        }
                    }
                ]
            }
        }
    ]
}
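If you don't actually need the {"text": ...} wrapper objects and are willing to flatten the hashtags to plain strings at ingest time (an assumption about your pipeline), the hashtags field could simply be an array of strings:

{
    "name": "hashtags",
    "type": {"type": "array", "items": "string"}
}

The record-based version above stays closer to the original Twitter payload, though, so no transformation is needed when writing.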
In case it's helpful, you can use this tool to generate an (anonymous) Avro schema from any valid JSON record. You'll then just need to add names to the record types.
You can try it on your example after switching its ' to ":
{
    "id": "123456789",
    "text": "bla bla bla...",
    "entities": {"hashtags": [{"text": "hashtag1"}, {"text": "hashtag2"}]}
}