AWS Redshift - Loading GeoJSON Into Geometry Field

Currently it is not possible to load GeoJSON directly into a Redshift GEOMETRY column using the COPY command, but a workaround has been suggested at:
Copying GeoJSON data from S3 to Redshift
This involves ingesting the geometry as WKT and then converting it to GEOMETRY using a spatial function. However, I'm not entirely sure how to get from GeoJSON to WKT, although I am sure there must be a converter available.
This is where my limited understanding of spatial data comes in. Let's say I want to load weather GeoJSON objects like the one shown below into a table in Redshift.
If I understand it correctly, each Feature in the FeatureCollection would be a row in the table, with just the contents of the JSON geometry field loaded into a column of type GEOMETRY, i.e. this column does not take any of the properties or attributes of the feature. The properties would then be loaded into completely separate columns using conventional data types. Then, if I wanted to export that feature as GeoJSON, I would have to stitch the geometry and properties back together again.
Is that correct?
Or does the GEOMETRY type actually have the facility to store the properties as well as the contents of the geometry field?
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "id": "abcd1234",
      "geometry": {
        "type": "MultiPolygon",
        "coordinates": [
          [
            [
              [7.51, 48.04],
              [8.12, 48.05],
              [8.19, 47.95]
            ]
          ]
        ]
      },
      "geometry_name": "contour",
      "properties": {
        "identifier": "abcd1234",
        "analysisTime": "2019-04-06T14:15:00Z",
        "convectionCellType": "CELL_BASE",
        "speed": 3,
        "area": "MSG",
        "phasetype": null,
        "top": 10363,
        "intensityValue": null,
        "ice": true,
        "created_at": "2020-07-21T12:01:25.651Z"
      }
    }
  ],
  "totalFeatures": 1,
  "numberMatched": 1,
  "numberReturned": 1,
  "timeStamp": "2021-02-03T16:39:48.963Z",
  "crs": {
    "type": "name",
    "properties": {
      "name": "urn:ogc:def:crs:EPSG::4326"
    }
  }
}

I found this Gist which provides the following Python snippet to do the conversion:
from shapely.geometry import shape
# A GeoJSON geometry object (only the "geometry" part of a Feature)
o = {
    "coordinates": [[[23.314208, 37.768469], [24.039306, 37.768469], [24.039306, 38.214372], [23.314208, 38.214372], [23.314208, 37.768469]]],
    "type": "Polygon"
}
geom = shape(o)
# Now it's very easy to get a WKT/WKB representation
print(geom.wkt)
print(geom.wkb)
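To make the row-per-Feature idea from the question concrete, here is a minimal sketch in plain Python (no shapely). It assumes only Point, Polygon and MultiPolygon geometries with 2D coordinates, and all function names are hypothetical. Each Feature becomes one row: the geometry as WKT (which could then be converted inside Redshift with a spatial function such as ST_GeomFromText) and the properties kept separately as ordinary values. The last function stitches a geometry and its properties back into a Feature for export.

```python
import json


def _ring_to_wkt(ring):
    # A ring is a list of [x, y] positions; extra dimensions are ignored in this sketch.
    return "(" + ", ".join(f"{pos[0]} {pos[1]}" for pos in ring) + ")"


def _polygon_to_wkt(rings):
    return "(" + ", ".join(_ring_to_wkt(ring) for ring in rings) + ")"


def geometry_to_wkt(geometry):
    """Convert a GeoJSON geometry dict to WKT (Point, Polygon, MultiPolygon only)."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return f"POINT ({coords[0]} {coords[1]})"
    if kind == "Polygon":
        return "POLYGON " + _polygon_to_wkt(coords)
    if kind == "MultiPolygon":
        return "MULTIPOLYGON (" + ", ".join(_polygon_to_wkt(p) for p in coords) + ")"
    raise ValueError(f"unsupported geometry type: {kind}")


def features_to_rows(feature_collection):
    """Yield one (feature_id, wkt, properties) tuple per Feature."""
    for feature in feature_collection["features"]:
        yield (
            feature.get("id"),
            geometry_to_wkt(feature["geometry"]),
            feature.get("properties") or {},
        )


def row_to_feature(feature_id, geometry, properties):
    """Stitch a GeoJSON geometry dict and its properties back into a Feature."""
    return {
        "type": "Feature",
        "id": feature_id,
        "geometry": geometry,
        "properties": properties,
    }


# A cut-down version of the weather example from the question.
sample = json.loads("""
{"type": "FeatureCollection", "features": [{"type": "Feature", "id": "abcd1234",
  "geometry": {"type": "MultiPolygon",
               "coordinates": [[[[7.51, 48.04], [8.12, 48.05], [8.19, 47.95]]]]},
  "properties": {"speed": 3, "ice": true}}]}
""")
for row in features_to_rows(sample):
    print(row)
```

On the way out, the geometry could be fetched from Redshift as GeoJSON text (for example via ST_AsGeoJSON), parsed with json.loads, and passed to row_to_feature together with the property columns. This supports the reading in the question: the geometry column holds only the shape, and the properties live in their own columns.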

Related

Can I create a Django Rest Framework API with GeoJSON format without having a model

I have a Django app that requests data from an external API, and my goal is to convert that data, which is returned in list/dictionary format, into a new REST API with a GeoJSON format.
I came across django-rest-framework-gis, but I don't know if I could use it without having a Model. If so, how?
I think the best way is to use the Python library geojson:
pip install geojson
If you do not have a Model as in GeoDjango, you have to explicitly describe the geometry from the data you have.
from geojson import Point, Feature, FeatureCollection
data = [
    {
        "id": 1,
        "address": "742 Evergreen Terrace",
        "city": "Springfield",
        "lon": -123.02,
        "lat": 44.04
    },
    {
        "id": 2,
        "address": "111 Spring Terrace",
        "city": "New Mexico",
        "lon": -124.02,
        "lat": 45.04
    }
]
def to_geojson(entries):
    features = []
    for entry in entries:
        # GeoJSON positions are ordered (longitude, latitude)
        point = Point((entry["lon"], entry["lat"]))
        # Work on a copy so the caller's dictionaries are not modified
        properties = dict(entry)
        del properties["lon"]
        del properties["lat"]
        feature = Feature(geometry=point, properties=properties)
        features.append(feature)
    return FeatureCollection(features)
if __name__ == '__main__':
    my_geojson = to_geojson(data)
    print(my_geojson)
1. Create the point geometry from lon, lat (this could also be another geometry type).
2. Create a feature with the created geometry and add the dictionary as properties. Note that I deleted the lon and lat entries from the dictionary so they do not show up as properties.
3. Create a feature collection from the multiple features.
Result:
{"features": [{"geometry": {"coordinates": [-123.02, 44.04], "type":
"Point"}, "properties": {"address": "742 Evergreen Terrace", "city":
"Springfield", "id": 1}, "type": "Feature"}, {"geometry":
{"coordinates": [-124.02, 45.04], "type": "Point"}, "properties":
{"address": "111 Spring Terrace", "city": "New Mexico", "id": 2},
"type": "Feature"}], "type": "FeatureCollection"}
More Info here: Documentation Geojson Library

How to highlight custom extractions using a2i's crowd-textract-analyze-document?

I would like to create a human review loop for images that have undergone OCR using Amazon Textract and entity extraction using Amazon Comprehend.
My process is:
1. send the image to Textract to extract the text
2. send the text to Comprehend to extract entities
3. find the Block IDs in Textract's output of the entities extracted by Comprehend
4. add new Blocks of type KEY_VALUE_SET to Textract's JSON output per the docs
5. create a Human Task with the crowd-textract-analyze-document element in the template and feed it the modified Textract output
What fails to work in this process is step 5. My custom entities are not rendered properly. By "fails to work" I mean that the entities are not highlighted on the image when I click them in the sidebar. There is no error in the browser's console.
Has anyone tried such a thing?
Sorry for not including examples. I will remove secrets/PII from my files and attach them to the question.
I used the AWS documentation of the a2i-crowd-textract-detection human task element to generate the value of the initialValue attribute. It appears that the doc for that attribute is incorrect. The doc shows that the value should be in the same format as the output of Textract, namely:
[
  {
    "BlockType": "KEY_VALUE_SET",
    "Confidence": 38.43309020996094,
    "Geometry": { ... },
    "Id": "8c97b240-0969-4678-834a-646c95da9cf4",
    "Relationships": [
      { "Type": "CHILD", "Ids": [...]},
      { "Type": "VALUE", "Ids": [...]}
    ],
    "EntityTypes": ["KEY"],
    "Text": "Foo bar"
  },
]
However, a2i-crowd-textract-detection expects the input to have lowerCamelCase attribute names (rather than UpperCamelCase). For example:
[
  {
    "blockType": "KEY_VALUE_SET",
    "confidence": 38.43309020996094,
    "geometry": { ... },
    "id": "8c97b240-0969-4678-834a-646c95da9cf4",
    "relationships": [
      { "Type": "CHILD", "ids": [...]},
      { "Type": "VALUE", "ids": [...]}
    ],
    "entityTypes": ["KEY"],
    "text": "Foo bar"
  },
]
I opened a support case about this documentation error to AWS.
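For anyone who needs to convert existing Textract output, the renaming can be sketched roughly as below. This is only an illustration of the casing change described above, not an official AWS utility; the function names and the `keep` parameter are hypothetical. Since the example above still shows "Type" capitalised inside the relationships, `keep` lets specific keys pass through unchanged if that turns out to be required.

```python
def lower_first(name):
    """Turn 'BlockType' into 'blockType'; empty strings are returned unchanged."""
    return name[:1].lower() + name[1:]


def to_lower_camel(value, keep=frozenset()):
    """Recursively rename dict keys from UpperCamelCase to lowerCamelCase.

    Keys listed in `keep` are left as they are (hypothetical escape hatch,
    for instance for a key that must stay capitalised).
    """
    if isinstance(value, dict):
        return {
            (key if key in keep else lower_first(key)): to_lower_camel(item, keep)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [to_lower_camel(item, keep) for item in value]
    # Strings, numbers, booleans and None are returned untouched.
    return value


block = {
    "BlockType": "KEY_VALUE_SET",
    "Id": "8c97b240-0969-4678-834a-646c95da9cf4",
    "Relationships": [{"Type": "CHILD", "Ids": ["child-id"]}],
    "EntityTypes": ["KEY"],
    "Text": "Foo bar",
}
print(to_lower_camel(block, keep={"Type"}))
```

Only keys are renamed; values such as "KEY_VALUE_SET" or "CHILD" are not touched.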

Mask RCNN - How to use one JSON File for each image in the dataset?

I have been using Mask-RCNN for work, where I have to do custom object detection. For this I have labelled all my images using either a polygon or a circle, depending on the geometry of the object in the given image. I now have an annotations folder with a separate annotation file for each image, i.e. I have 128 images and hence 128 annotations.json files.
But according to the information given in the Mask RCNN GitHub repo, we need only one annotation JSON file.
So my question is:
Can I change the export_boxes and load_mask functions given in the code to accommodate my setup, and if so, how do I do that? Also, consider that I have two shape types in my JSON files: one is polygon and the other is circle. Or should I merge all the JSON files into one? If I go ahead with the merging, would the result have the correct formatting?
A part of the JSON file which contains both a polygon and a circle is given below:
{ "label": "anchor", "points": [ [ 35.70270270270271, 18.37837837837838 ], [ 60.56756756756755, 15.675675675675675 ], [ 70.29729729729729, 32.43243243243243 ], [ 59.486486486486484, 49.729729729729726 ], [ 38.40540540540539, 49.729729729729726 ], [ 30.29729729729729, 37.2972972972973 ] ], "group_id": null, "shape_type": "polygon", "flags": {} },
{ "label": "anchor", "points": [ [ 244.35135135135135, 168.64864864864865 ], [ 250.2972972972973, 183.78378378378378 ] ], "group_id": null, "shape_type": "circle", "flags": {} }
I would like to add something else: I have seen other issues related to this problem, but I couldn't find any decisive answers, and hence I am opening a new question here.
Please help me out.
With regards,
Yash.
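One possible direction for the merging option, sketched under assumptions rather than taken from the Mask RCNN repo: read every per-image file, turn circles into polygons so that a load_mask-style function only has to handle one shape type, and collect everything in a single dict keyed by file name. The sketch assumes that each file holds an object with a "shapes" list (hypothetical key) and that a circle's two points are its centre and a point on its circumference. The function names are hypothetical too.

```python
import json
import math
from pathlib import Path


def circle_to_polygon(points, n_vertices=32):
    """Approximate a circle by a regular polygon.

    Assumes `points` is [centre, point_on_circumference].
    """
    (cx, cy), (px, py) = points
    radius = math.hypot(px - cx, py - cy)
    return [
        [cx + radius * math.cos(2 * math.pi * i / n_vertices),
         cy + radius * math.sin(2 * math.pi * i / n_vertices)]
        for i in range(n_vertices)
    ]


def merge_annotations(annotation_dir):
    """Merge every *.json file in a folder into one dict keyed by file name."""
    merged = {}
    for path in sorted(Path(annotation_dir).glob("*.json")):
        data = json.loads(path.read_text())
        polygons = []
        # "shapes" is an assumed key; adjust it to the real file layout.
        for shape in data.get("shapes", []):
            points = shape["points"]
            if shape["shape_type"] == "circle":
                points = circle_to_polygon(points)
            polygons.append({"label": shape["label"], "points": points})
        merged[path.name] = polygons
    return merged
```

The merged dict can be written out once with json.dump. Whether that layout matches what the repo's loader expects still has to be checked against the dataset class being used.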

When predicting, what are the valid values for dataFormat?

Problem
Using the REST API, I have trained and deployed a model that I now want to use for prediction. I've defined the collections for prediction input and output and uploaded a correspondingly formatted JSON file to Cloud Storage. However, when trying to create a prediction job, I cannot figure out what value to use for the dataFormat field, which is a required parameter. Is there any way to list all valid values?
What I've tried
My requests look like the one below. I've tried JSON, NEWLINE_DELIMITED_JSON (like when importing data into BigQuery), and even the JSON MIME type application/json, in pretty much every case style I can think of (upper and lower combined with snake, camel, etc.).
{
  "jobId": "my_predictions_123",
  "predictionInput": {
    "modelName": "projects/myproject/models/mymodel",
    "inputPaths": [
      "gs://model-bucket/data/testset.json"
    ],
    "outputPath": "gs://model-bucket/predictions/0/",
    "region": "us-central1",
    "dataFormat": "JSON"
  },
  "predictionOutput": {
    "outputPath": "gs://my-bucket/predictions/1/"
  }
}
All my attempts have only gotten me this back though:
{
  "error": {
    "code": 400,
    "message": "Invalid value at 'job.prediction_input.data_format' (TYPE_ENUM), \"JSON\"",
    "status": "INVALID_ARGUMENT",
    "details": [
      {
        "@type": "type.googleapis.com/google.rpc.BadRequest",
        "fieldViolations": [
          {
            "field": "job.prediction_input.data_format",
            "description": "Invalid value at 'job.prediction_input.data_format' (TYPE_ENUM), \"JSON\""
          }
        ]
      }
    ]
  }
}
According to the Cloud ML API reference (https://cloud.google.com/ml/reference/rest/v1beta1/projects.jobs#DataFormat), the dataFormat field in your request should be "TEXT" for all text inputs (including JSON, CSV, etc.).
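As a rough illustration, the request body from the question could be assembled like this with dataFormat set to "TEXT". The helper name is hypothetical, the field names are copied from the question's own request, and the predictionOutput part of that request is left out of this sketch.

```python
import json


def build_prediction_job(job_id, model_name, input_paths, output_path, region,
                         data_format="TEXT"):
    """Return a JSON-serialisable body for a prediction job request.

    "TEXT" is the default because the answer above says it covers all
    text inputs, JSON included.
    """
    return {
        "jobId": job_id,
        "predictionInput": {
            "modelName": model_name,
            "inputPaths": list(input_paths),
            "outputPath": output_path,
            "region": region,
            "dataFormat": data_format,
        },
    }


body = build_prediction_job(
    "my_predictions_123",
    "projects/myproject/models/mymodel",
    ["gs://model-bucket/data/testset.json"],
    "gs://model-bucket/predictions/0/",
    "us-central1",
)
print(json.dumps(body, indent=2))
```

Any other enum value should be taken from the DataFormat section of the reference page rather than guessed.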

How do I use JSON with U2/Universe

When building a JSON document in U2/Universe with UDOSetProperty, how would one set the value of a property that has multiple values? For example, if I have multiple emails.
Example: UDOSetProperty(udoHandle, "to", value)
"to": [
  {
    "email": "recipientEmail@example.com",
    "name": "Recipient Name",
    "type": "to"
  }
],
I am not sure whether you are trying to add another "to" array element or whether you want to add a second "email" only.
So, working with your example:
"to": [
  {
    "email": "recipientEmail@example.com",
    "name": "Recipient Name",
    "type": "to"
  },
  {
    "email": "recipient2Email@example.com",
    "name": "Recipient2 Name",
    "type": "to"
  }
],
If you wanted to create the above JSON from scratch with the UDO commands, the following functions should help you. The steps would be:
1. Create the initial/root object: UDOCreate(UDO_OBJECT, udoHandle)
2. Create the array: UDOCreate(UDO_ARRAY, thisArray)
3. Use UDOCreate and UDOSetProperty to create the theEmailObject you want to add to the array, and then add it to the array with UDOArrayAppendItem(thisArray, theEmailObject)
4. Then add the array to the root object with UDOSetProperty(udoHandle, "to", thisArray)
Note that the important part is that there are several functions for dealing with arrays.
Mike
I created a program that builds the JSON with the U2 UDO functions and added it to GitHub:
https://github.com/RocketSoftware/multivalue-lab/blob/master/U2/Demos/UDO/JSON/The-Basics/arrayExample