How to join whosonfirst results with other layers? (Pelias API)

I've configured the Pelias API, but my results don't include the whosonfirst locality that api.geocode.earth returns for the same query.
I do have all the supporting services (placeholder, pip, libpostal) configured and running correctly.
Config for whosonfirst import job:
"whosonfirst": {
"sqlite": true,
"datapath": "/data/whosonfirst",
"dataHost": "https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-ua-latest.db.bz2",
"importPostalcodes": "true",
"countryCode": "UA",
"importPlace": [ 101752483 ]
}
I can see that some whosonfirst data has been imported into Elasticsearch successfully.
How can I add this information to my venues/streets layers results?
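To see whether the admin lookup attaches whosonfirst fields at all, one can query the API and inspect each feature's parent fields; a minimal sketch with Python's requests, assuming the API runs on localhost:4000 (the query text is just a made-up example):
import requests

# Query the local Pelias API (adjust host/port and text to your setup).
resp = requests.get(
    "http://localhost:4000/v1/search",
    params={"text": "Khreshchatyk Street, Kyiv", "layers": "street,venue"},
)
resp.raise_for_status()

# Each feature's properties should carry the whosonfirst admin hierarchy
# (locality, region, country, ...) when the admin lookup succeeds.
for feature in resp.json()["features"]:
    props = feature["properties"]
    print(props.get("name"), "->", props.get("locality"), props.get("region"), props.get("country"))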

Related

AWS SageMaker endpoint received client (400) error

I've deployed a TensorFlow multi-label classification model using a SageMaker endpoint as follows:
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type="ml.m5.2xlarge", endpoint_name='testing-2')
It gets deployed and works fine when I invoke it from the SageMaker Jupyter notebook instance:
sample = ['this movie was extremely good']
output=predictor.predict(sample)
output:
{'predictions': [[0.00370046496,
4.32942124e-06,
0.00080883503,
9.25126587e-05,
0.00023958087,
0.000130862]]}
However, I am unable to send a request to the deployed endpoint from other notebooks or SageMaker Studio, and I'm unsure of the request format.
I've tried several variations of the input format and they all failed. The error message is below:
Request:
{
  "body": {
    "text": "Testing model's prediction on this text"
  },
  "contentType": "application/json",
  "endpointName": "testing-2",
  "customURL": "",
  "customHeaders": [
    {
      "Key": "sm_endpoint_name",
      "Value": "testing-2"
    }
  ]
}
Error:
Error invoking endpoint: Received client error (400) from primary with message "{ "error": "Failed to process element:
0 key: text of 'instances' list. Error: INVALID_ARGUMENT: JSON object: does not have named input: text" }".
See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/testing-2
in account 793433463428 for more information.
Is there any way to find out exactly how the model expects the request format to be?
Earlier I had the same model on my local system and the way I tested it was using this curl request:
curl -s -H 'Content-Type: application/json' -d '{"text": "what ugly posts"}' http://localhost:7070/sentiment
And it worked fine without any issues.
I've tried different formats and replaced the "text" key inside the body with other words like "input", "body", nothing at all, etc.
Based on your description above, I assume you are deploying the TensorFlow model using the SageMaker TensorFlow container.
If you want to view what your model expects as input, you can use the saved_model CLI on the exported model directory (here the 1 directory is the SavedModel version folder):
1
├── keras_metadata.pb
├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index

!saved_model_cli show --all --dir {"1"}
After you have confirmed the input name above, you can invoke the endpoint as follows:
import json
import boto3

client = boto3.client('runtime.sagemaker')

data = {"instances": ['this movie was extremely good']}

# Explicitly declare the JSON content type so the TensorFlow Serving
# container parses the payload correctly.
response = client.invoke_endpoint(EndpointName=<EndpointName>,
                                  ContentType='application/json',
                                  Body=json.dumps(data))
response_body = response['Body']
print(response_body.read())
The same payload can then also be used in Studio when invoking the endpoint.
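Alternatively, from another notebook or Studio you should be able to attach to the existing endpoint with the SageMaker Python SDK instead of raw boto3; a rough sketch, assuming SDK v2 and the endpoint name testing-2 from the question. This mirrors the predictor object that sagemaker_model.deploy() returned in the original notebook:
from sagemaker.tensorflow import TensorFlowPredictor

# Attach to the already-deployed endpoint; this predictor serializes the
# payload into the format the TensorFlow Serving container expects.
predictor = TensorFlowPredictor(endpoint_name='testing-2')

print(predictor.predict(['this movie was extremely good']))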

AWS Glue - Kafka Connection using SASL/SCRAM

I am trying to create an AWS Glue Streaming job that reads from Kafka (MSK) clusters using SASL/SCRAM client authentication for the connection, per
https://aws.amazon.com/about-aws/whats-new/2022/05/aws-glue-supports-sasl-authentication-apache-kafka/
The connection configuration has the following properties (plus adequate subnet and security groups):
"ConnectionProperties": {
"KAFKA_SASL_SCRAM_PASSWORD": "apassword",
"KAFKA_BOOTSTRAP_SERVERS": "theserver:9096",
"KAFKA_SASL_MECHANISM": "SCRAM-SHA-512",
"KAFKA_SASL_SCRAM_USERNAME": "auser",
"KAFKA_SSL_ENABLED": "false"
}
And the actual API method call is:
df = glue_context.create_data_frame.from_options(
    connection_type="kafka",
    connection_options={
        "connectionName": "kafka-glue-connector",
        "security.protocol": "SASL_SSL",
        "classification": "json",
        "startingOffsets": "latest",
        "topicName": "atopic",
        "inferSchema": "true",
        "typeOfData": "kafka",
        "numRetries": 1,
    }
)
When running, the logs show the client attempting to connect to the brokers using Kerberos, and it runs into:
22/10/19 18:45:54 INFO ConsumerConfig: ConsumerConfig values:
    sasl.mechanism = GSSAPI
    security.protocol = SASL_SSL
    security.providers = null
    send.buffer.bytes = 131072
    ...
org.apache.kafka.common.errors.SaslAuthenticationException: Failed to configure SaslClientAuthenticator
Caused by: org.apache.kafka.common.KafkaException: Principal could not be determined from Subject, this may be a transient failure due to Kerberos re-login
How can I authenticate the AWS Glue job using SASL/SCRAM? What properties do I need to set in the connection and in the method call?
Thank you
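As a side note, the stored connection properties can be read back with boto3 to confirm they match what was configured; a small sketch, assuming default AWS credentials and the connection name used in the job:
import boto3

# Fetch the Glue connection and print its properties (the password is
# redacted when HidePassword=True).
glue = boto3.client('glue')
conn = glue.get_connection(Name='kafka-glue-connector', HidePassword=True)

for key, value in conn['Connection']['ConnectionProperties'].items():
    print(f'{key} = {value}')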

Importing an Azure Data Explorer ARM template fails

I exported the ARM template for my Azure Data Explorer from the develop environment.
Now I'm trying to import it into the test environment, but the process fails:
New-AzResourceGroupDeployment : 12:18:24 - The deployment 'template' failed with error(s). Showing 3 out of 10 error(s). Status Message: [BadRequest] Validation Errors found: mapping does not exist (Code:EventHubValidationErrorFound) Status Message: [BadRequest] Validation Errors found: mapping does not exist (Code:EventHubValidationErrorFound) Status Message: [BadRequest] Validation Errors found: mapping does not exist (Code:EventHubValidationErrorFound) CorrelationId: b27cdf8e-c583-4dee-8dbc-2b0e4876b8ca
I have several data connections from my Azure Data Explorer to an Event Hub:
{
  "type": "Microsoft.Kusto/Clusters/Databases/EventHubConnections",
  "apiVersion": "2018-09-07-preview",
  "name": "[concat(parameters('Clusters_xyzazne_name'), '/asd/asd-fondi')]",
  "location": "North Europe",
  "dependsOn": [
    "[resourceId('Microsoft.Kusto/Clusters/Databases', parameters('Clusters__name'), 'DNA_R_NRT')]",
    "[resourceId('Microsoft.Kusto/Clusters', parameters('Clusters__name'))]"
  ],
  "kind": "EventHub",
  "properties": {
    "eventHubResourceId": "[concat(parameters('namespaces_ehub_externalid'), '/eventhubs/fondi')]",
    "consumerGroup": "fondi_consumer",
    "tableName": "fondi",
    "mappingRuleName": "fondi_mapping",
    "dataFormat": "multijson"
  }
}
I'm trying to import an Azure Data Explorer ARM template into another environment, but it fails.
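Since the validation error says "mapping does not exist", one thing worth verifying is whether the ingestion mapping referenced by mappingRuleName (fondi_mapping) actually exists in the target database. A sketch using the azure-kusto-data package, with placeholder cluster and database names:
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholders - replace with the test environment's cluster URI and database.
cluster = 'https://<your-cluster>.<region>.kusto.windows.net'
database = '<your-database>'

kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster)
client = KustoClient(kcsb)

# List the JSON ingestion mappings defined on the target table; 'fondi_mapping'
# has to be present before the EventHub data connection can be deployed.
response = client.execute_mgmt(database, '.show table fondi ingestion json mappings')
for row in response.primary_results[0]:
    print(row['Name'])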

How to run a Google Cloud Build trigger via cli / rest api / cloud functions?

Is there such an option? My use case would be running a trigger for a production build (which deploys to production). Ideally, that trigger wouldn't need to listen for any change, since it would be invoked manually via a chatbot.
I saw the video CI/CD for Hybrid and Multi-Cloud Customers (Cloud Next '18) announcing API trigger support, but I'm not sure if that's what I need.
I did the same thing a few days ago.
You can submit your builds using gcloud or the REST API.
gcloud:
gcloud builds submit --no-source --config=cloudbuild.yaml --async --format=json
REST API:
Send your cloudbuild.yaml as JSON, with an auth token, to this URL: https://cloudbuild.googleapis.com/v1/projects/standf-188123/builds?alt=json
example cloudbuild.yaml:
steps:
- name: 'gcr.io/cloud-builders/docker'
id: Docker Version
args: ["version"]
- name: 'alpine'
id: Hello Cloud Build
args: ["echo", "Hello Cloud Build"]
example rest_json_body:
{"steps": [{"args": ["version"], "id": "Docker Version", "name": "gcr.io/cloud-builders/docker"}, {"args": ["echo", "Hello Cloud Build"], "id": "Hello Cloud Build", "name": "alpine"}]}
This now seems to be possible via API:
https://cloud.google.com/cloud-build/docs/api/reference/rest/v1/projects.triggers/run
request.json:
{
  "projectId": "*****",
  "commitSha": "************"
}
curl request (using a gcloud command to obtain the access token):
PROJECT_ID="********" TRIGGER_ID="*******************"; curl -X POST -T request.json -H "Authorization: Bearer $(gcloud config config-helper \
--format='value(credential.access_token)')" \
https://cloudbuild.googleapis.com/v1/projects/"$PROJECT_ID"/triggers/"$TRIGGER_ID":run
You can use the Google API client library to create build jobs with Python:
import operator
from functools import reduce
from typing import Dict, List, Union

from google.oauth2 import service_account
from googleapiclient import discovery


class GcloudService():
    def __init__(self, service_token_path, project_id: Union[str, None]):
        self.project_id = project_id
        self.service_token_path = service_token_path
        self.credentials = service_account.Credentials.from_service_account_file(self.service_token_path)


class CloudBuildApiService(GcloudService):
    def __init__(self, *args, **kwargs):
        super(CloudBuildApiService, self).__init__(*args, **kwargs)
        scoped_credentials = self.credentials.with_scopes(['https://www.googleapis.com/auth/cloud-platform'])
        self.service = discovery.build('cloudbuild', 'v1', credentials=scoped_credentials, cache_discovery=False)

    def get(self, build_id: str) -> Dict:
        return self.service.projects().builds().get(projectId=self.project_id, id=build_id).execute()

    def create(self, image_name: str, gcs_name: str, gcs_path: str, env: Dict = None):
        args: List[str] = self._get_env(env) if env else []
        opt_params: List[str] = [
            '-t', f'gcr.io/{self.project_id}/{image_name}',
            '-f', f'./{image_name}/Dockerfile',
            f'./{image_name}'
        ]
        build_cmd: List[str] = ['build'] + args + opt_params
        body = {
            "projectId": self.project_id,
            "source": {
                'storageSource': {
                    'bucket': gcs_name,
                    'object': gcs_path,
                }
            },
            "steps": [
                {
                    "name": "gcr.io/cloud-builders/docker",
                    "args": build_cmd,
                },
            ],
            "images": [
                f'gcr.io/{self.project_id}/{image_name}'
            ],
        }
        return self.service.projects().builds().create(projectId=self.project_id, body=body).execute()

    def _get_env(self, env: Dict) -> List[str]:
        env: List[str] = [['--build-arg', f'{key}={value}'] for key, value in env.items()]
        # Flatten array
        return reduce(operator.iconcat, env, [])
Here is the documentation so that you can implement more functionality: https://cloud.google.com/cloud-build/docs/api
Hope this helps.
If you just want to create a function that you can invoke directly, you have two choices:
An HTTP trigger with a standard API endpoint
A pubsub trigger that you invoke by sending a message to a pubsub topic
The first is the more common approach, as you are effectively creating a web API that any client can call with an HTTP library of their choice.
You should be able to manually trigger a build using curl and a json payload.
For details see: https://cloud.google.com/cloud-build/docs/running-builds/start-build-manually#running_builds.
Given that, you could write a Python Cloud function to replicate the curl call via the requests module.
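A minimal sketch of such a Cloud Function, assuming an HTTP trigger, that the caller passes a hypothetical trigger ID and branch in the request body, and that the function's service account is allowed to run Cloud Build triggers:
import google.auth
import requests
from google.auth.transport.requests import Request


def run_trigger(request):
    """HTTP Cloud Function that starts a Cloud Build trigger via the REST API."""
    # Hypothetical payload: {"trigger_id": "...", "branch": "master"}
    payload = request.get_json()

    # Use the function's default service account credentials.
    credentials, project_id = google.auth.default(
        scopes=['https://www.googleapis.com/auth/cloud-platform']
    )
    credentials.refresh(Request())

    url = (
        f'https://cloudbuild.googleapis.com/v1/projects/{project_id}'
        f'/triggers/{payload["trigger_id"]}:run'
    )
    resp = requests.post(
        url,
        headers={'Authorization': f'Bearer {credentials.token}'},
        json={'branchName': payload.get('branch', 'master')},
    )
    return resp.text, resp.status_code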
I was in search of the same thing (Fall 2022) and, while I haven't tested it yet, I wanted to answer before I forget. It now appears to be available via: gcloud beta builds triggers run TRIGGER
You can trigger a function via
gcloud functions call NAME --data 'THING'
Inside your function you can do pretty much anything possible within Google's public APIs.
If you just want to directly trigger Google Cloud Build from git, then it's probably advisable to use release version tags - so your chatbot might add a release tag to your release branch in git, at which point Cloud Build will start the build.
More info here: https://cloud.google.com/cloud-build/docs/running-builds/automate-builds

Is there any easy way / API to find out the number of pipelines on a gocd server?

Sorry for the brief question, but just wondering if there's an API to find out the number of pipelines on a GoCD server.
The Pipeline Groups API will give you what you need after some JSON parsing.
$ curl 'https://ci.example.com/go/api/config/pipeline_groups' \
-u 'username:password'
Returns:
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
[
  {
    "pipelines": [
      {
        "stages": [
          {
            "name": "up42_stage"
          }
        ],
        "name": "up42",
        "materials": [
          {
            "description": "URL: https://github.com/gocd/gocd, Branch: master",
            "fingerprint": "2d05446cd52a998fe3afd840fc2c46b7c7e421051f0209c7f619c95bedc28b88",
            "type": "Git"
          }
        ],
        "label": "${COUNT}"
      }
    ],
    "name": "first"
  }
]
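To turn that response into a pipeline count, a short Python sketch (using the same credentials as the curl call above):
import requests

# Sum the pipelines across all pipeline groups returned by the API.
resp = requests.get(
    'https://ci.example.com/go/api/config/pipeline_groups',
    auth=('username', 'password'),
)
resp.raise_for_status()

print(sum(len(group['pipelines']) for group in resp.json()))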
You can grab the config.xml file and parse it, either from the config repo or via HTTP.
As an alternative, you can just get the cctray file from your server at http://yourgoserver/go/cctray.xml and parse it.
It contains information about all the pipelines (including their stages).
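Since each stage (and job) appears as its own Project entry in cctray.xml, with names like "pipeline :: stage", counting distinct pipelines takes a little parsing; a sketch, assuming the feed is reachable without authentication:
import xml.etree.ElementTree as ET

import requests

resp = requests.get('http://yourgoserver/go/cctray.xml')
resp.raise_for_status()

# Project names look like "pipeline :: stage" or "pipeline :: stage :: job",
# so the pipeline name is everything before the first separator.
root = ET.fromstring(resp.content)
pipelines = {p.attrib['name'].split(' :: ')[0] for p in root.iter('Project')}
print(len(pipelines))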
I would recommend using yagocd:
from yagocd import Yagocd
go = Yagocd(server='https://build.gocd.io')
# login as guest
go._session.get('https://build.gocd.io/go/plugin/interact/gocd.guest.user.auth.plugin/index')
print(len(list(go.pipelines)))
Yes, of course. You can get the desired output in different ways. The first easy way is to get the number of pipelines and other statistical information from the GoCD support URL (https://example.com/go/api/support), which requires admin privileges.
If the user does not have admin privileges, use the GoCD pipeline_groups API instead. The command below should give you the exact count with jq (a JSON processor):
$ curl 'https://example.com/go/api/config/pipeline_groups' -u 'username:password' | jq -r '.[] | .pipelines[].name' | wc -l
NOTE: Still, only GoCD administrator users can get the actual number of pipelines.