How to create a data catalog in Amazon Glue externally?

I want to create a data catalog externally in Amazon Glue. Is there any way?

The AWS Glue Data Catalog holds metadata about various data sources within AWS, e.g. S3, DynamoDB etc.
Instead of using Crawlers or the AWS Console, you can populate the Data Catalog directly through the AWS Glue API, which has operations for the different structures, like Database, Table etc. AWS provides SDKs for several languages, e.g. boto3 for Python, with an easy-to-use object-oriented API. So as long as you know the structure of your data, you can use these methods:
create_database()
create_table()
create_partition()
batch_create_partition()
Create Database definition:
from pprint import pprint
import boto3

client = boto3.client('glue')
response = client.create_database(
    DatabaseInput={
        'Name': 'my_database',  # Required
        'Description': 'Database created with boto3 API',
        'Parameters': {
            'my_param_1': 'my_param_value_1'
        },
    }
)
pprint(response)

# Output
{
    'ResponseMetadata': {
        'HTTPHeaders': {
            'connection': 'keep-alive',
            'content-length': '2',
            'content-type': 'application/x-amz-json-1.1',
            'date': 'Fri, 11 Oct 2019 12:37:12 GMT',
            'x-amzn-requestid': '12345-67890'
        },
        'HTTPStatusCode': 200,
        'RequestId': '12345-67890',
        'RetryAttempts': 0
    }
}
Create Table definition:
response = client.create_table(
    DatabaseName='my_database',
    TableInput={
        'Name': 'my_table',
        'Description': 'Table created with boto3 API',
        'StorageDescriptor': {
            'Columns': [
                {
                    'Name': 'my_column_1',
                    'Type': 'string',
                    'Comment': 'This is very useful column',
                },
                {
                    'Name': 'my_column_2',
                    'Type': 'string',
                    'Comment': 'This is not as useful',
                },
            ],
            'Location': 's3://some/location/on/s3',
        },
        'Parameters': {
            'classification': 'json',
            'typeOfData': 'file',
        }
    }
)
pprint(response)

# Output
{
    'ResponseMetadata': {
        'HTTPHeaders': {
            'connection': 'keep-alive',
            'content-length': '2',
            'content-type': 'application/x-amz-json-1.1',
            'date': 'Fri, 11 Oct 2019 12:38:57 GMT',
            'x-amzn-requestid': '67890-12345'
        },
        'HTTPStatusCode': 200,
        'RequestId': '67890-12345',
        'RetryAttempts': 0
    }
}
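The answer lists create_partition() but does not show it, so here is a minimal sketch of how a partition could be registered. It assumes the table was created with PartitionKeys in its TableInput (the example table above has none) and that the data is laid out in Hive-style key=value folders. The helper names build_partition_input and add_partition and the year/month keys are hypothetical. The client is passed in, so any boto3 Glue client can be used.

```python
def build_partition_input(values, table_location, partition_keys):
    """Build a PartitionInput dict for one partition of a table.

    Assumes a Hive-style layout (key=value folders) under the table's
    S3 location; adjust the Location if your data is laid out differently.
    """
    suffix = "/".join(
        f"{key}={value}" for key, value in zip(partition_keys, values)
    )
    return {
        "Values": list(values),
        "StorageDescriptor": {
            "Location": f"{table_location.rstrip('/')}/{suffix}",
        },
    }


def add_partition(client, database, table, partition_input):
    """Register one partition through the Glue create_partition call."""
    return client.create_partition(
        DatabaseName=database,
        TableName=table,
        PartitionInput=partition_input,
    )


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client('glue')
# partition = build_partition_input(
#     ['2019', '10'], 's3://some/location/on/s3', ['year', 'month'])
# add_partition(client, 'my_database', 'my_table', partition)
```

In practice the StorageDescriptor of a partition usually also repeats the table's columns and format settings; they are left out here to keep the sketch short.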

Related

Why Records: [] is empty when i consume data from kinesis stream by python script?

I am trying to consume data from a Kinesis data stream. The stream was created and data is produced to it successfully, but when I run this consumer script in Python:
import boto3
import json
from datetime import datetime
import time

my_stream_name = 'test'
kinesis_client = boto3.client('kinesis', region_name='us-east-1')
response = kinesis_client.describe_stream(StreamName=my_stream_name)
my_shard_id = response['StreamDescription']['Shards'][0]['ShardId']
shard_iterator = kinesis_client.get_shard_iterator(StreamName=my_stream_name,
                                                   ShardId=my_shard_id,
                                                   ShardIteratorType='LATEST')
my_shard_iterator = shard_iterator['ShardIterator']
record_response = kinesis_client.get_records(ShardIterator=my_shard_iterator,
                                             Limit=2)
while 'NextShardIterator' in record_response:
    record_response = kinesis_client.get_records(ShardIterator=record_response['NextShardIterator'],
                                                 Limit=2)
    print(record_response)
    # wait for 5 seconds
    time.sleep(5)
the message data in the output is empty ('Records': []):
{'Records': [], 'NextShardIterator':
'AAAAAAAAAAFFVFpvvveOquLUe7WO9nZAcYNQdcS6f6a+YGrrrjZo1gULcu/ZYxC7AB+xVlUhgL9UFPrQ22qmcQa6iIsmuKWl26buBk3utXlVqiGuDUYSgqMOtkp0Y7pJwa6N/I0fYfl2PLTXp5Qz8+5ZYuTW1KDt+PeSU3992bwgdOm7744cxcSnYFaQuHqfa0vLlaRBTOACVz4fwjggUBN01WdsoEjKmgtfNmuHSA7s9LLNzAapMg==',
'MillisBehindLatest': 0, 'ResponseMetadata': {'RequestId':
'e451dd27-c867-cf3d-be83-edbe95e9da9f', 'HTTPStatusCode': 200,
'HTTPHeaders': {'x-amzn-requestid':
'e451dd27-c867-cf3d-be83-edbe95e9da9f', 'x-amz-id-2':
'ClSlC3gRJuEqL9YJcHgC2N/TLSv56o+6406ki2+Zohnfo/erFVMDpPqkEWT+XAeeHXCdhYBbnOeZBPyesbXnVs45KQG78eRU',
'date': 'Thu, 14 Apr 2022 14:23:21 GMT', 'content-type':
'application/x-amz-json-1.1', 'content-length': '308'},
'RetryAttempts': 0}}
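One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: a LATEST iterator starts just after the most recent record, so only records produced after the iterator was created are returned, and an individual get_records call can also come back empty. The sketch below reads from TRIM_HORIZON (the oldest record still in the shard) and collects records over several polls. The function name read_shard and its parameters are hypothetical, and the client is passed in so the function works with any boto3 Kinesis client.

```python
import time


def read_shard(client, stream_name, shard_id, iterator_type="TRIM_HORIZON",
               limit=2, max_polls=5, delay_seconds=0.0):
    """Poll one shard and return every record seen across all polls."""
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType=iterator_type,
    )["ShardIterator"]

    records = []
    for _ in range(max_polls):
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        # An empty list here is normal; keep polling with the next iterator.
        records.extend(response["Records"])
        iterator = response.get("NextShardIterator")
        if iterator is None:  # the shard has been closed
            break
        time.sleep(delay_seconds)
    return records


# Usage (needs AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client('kinesis', region_name='us-east-1')
# print(read_shard(client, 'test', my_shard_id, delay_seconds=5))
```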

Celery Django Body Encoding

Hi, does anyone know how the body of a Celery JSON message is encoded before it is placed in the queue (I use Redis in my case)?
{'body': 'W1sic2hhd25AdWJ4LnBoIiwge31dLCB7fSwgeyJjYWxsYmFja3MiOiBudWxsLCAiZXJyYmFja3MiOiBudWxsLCAiY2hhaW4iOiBudWxsLCAiY2hvcmQiOiBudWxsfV0=',
'content-encoding': 'utf-8',
'content-type': 'application/json',
'headers': {'lang': 'py',
'task': 'export_users',
'id': '6e506f75-628e-4aa1-9703-c0185c8b3aaa',
'shadow': None,
'eta': None,
'expires': None,
'group': None,
'retries': 0,
'timelimit': [None, None],
'root_id': '6e506f75-628e-4aa1-9703-c0185c8b3aaa',
'parent_id': None,
'argsrepr': "('<email#example.com>', {})",
'kwargsrepr': '{}',
'origin': 'gen187209#ubuntu'},
'properties': {'correlation_id': '6e506f75-628e-4aa1-9703-c0185c8b3aaa',
'reply_to': '403f7314-384a-30a3-a518-65911b7cba5c',
'delivery_mode': 2,
'delivery_info': {'exchange': '', 'routing_key': 'celery'},
'priority': 0,
'body_encoding': 'base64',
'delivery_tag': 'dad6b5d3-c667-473e-a62c-0881a7349684'}}
Some background: I have a Node.js project which needs to trigger my Celery tasks (Django). The background tasks are all in the Django app, but the trigger and the details will come from the Node.js app.
Thanks in advance.

It may just be simpler to use the nodejs celery client
https://github.com/mher/node-celery/blob/master/celery.js
to invoke a celery task from nodejs.
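For the encoding itself, the message in the question states 'body_encoding': 'base64' and 'content-type': 'application/json', so the body appears to be a JSON document that is then base64-encoded. Below is a minimal Python sketch of that round trip. The [args, kwargs, embed] layout is my reading of Celery's message protocol version 2 and should be treated as an assumption; the function names and the sample email address are hypothetical. A worker also expects matching headers and properties, which this sketch does not build.

```python
import base64
import json


def encode_body(args, kwargs, embed=None):
    """Serialize [args, kwargs, embed] as JSON, then base64-encode it."""
    if embed is None:
        embed = {"callbacks": None, "errbacks": None,
                 "chain": None, "chord": None}
    payload = json.dumps([args, kwargs, embed])
    return base64.b64encode(payload.encode("utf-8")).decode("ascii")


def decode_body(body):
    """Reverse of encode_body: base64-decode, then parse the JSON."""
    return json.loads(base64.b64decode(body).decode("utf-8"))


body = encode_body(["user@example.com", {}], {})
args, kwargs, embed = decode_body(body)
```

Passing the 'body' value from the question to decode_body should show the task arguments in the same three-part layout.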

Send Image over AWS SNS Notification with Boto3

I am trying to send an image stored in AWS Lambda /tmp/ folder.
I extract the image from AWS Kinesis Video Stream and write them to /tmp/ using opencv.
I am testing the code locally using boto3 but this would normally be in a lambda function.
import boto3
import base64

if __name__ == '__main__':
    client = boto3.client('sns')
    phone_number = '+12345678900'
    img = open('tmp/image10.jpeg', 'rb').read()
    response = client.publish(
        PhoneNumber=phone_number,
        Message="here is a picture: ",
        MessageAttributes={
            'store': {"DataType": "Binary", "BinaryValue": base64.b64encode(img)}
        }
    )
    print(response)
I am getting a success response and the text in Message argument but not the image:
{'MessageId': '2d435caa-bb24-5198-9bdc-04ecef05c0ef', 'ResponseMetadata': {'RequestId': '5t97d18b-p15f-554b-b93f-819df53e64bc', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '5c97d98b-a15f-554b-bf3f-819df53e64bc', 'content-type': 'text/xml', 'content-length': '294', 'date': 'Fri, 30 Oct 2020 17:49:20 GMT'}, 'RetryAttempts': 0}}

How get Cognito users list in JSON-format

I want to back up my Cognito users with Lambda, but I can't get the Cognito user list in JSON format with boto3. This is what I do:
import boto3
import os
import json
from botocore.exceptions import ClientError

COGNITO_POOL_ID = os.getenv('POOL_ID')
S3_BUCKET = os.getenv('BACKUP_BUCKET')
ENV_NAME = os.getenv('ENV_NAME')
filename = ENV_NAME + "-cognito-backup.json"
REGION = os.getenv('REGION')
cognito = boto3.client('cognito-idp', region_name=REGION)
s3 = boto3.resource('s3')

def lambda_handler(event, context):
    try:
        response = (cognito.list_users(UserPoolId=COGNITO_POOL_ID, AttributesToGet=['email_verified', 'email']))['Users']
        data = json.dumps(str(response)).encode('UTF-8')
        s3object = s3.Object(S3_BUCKET, filename)
        s3object.put(Body=(bytes(data)))
    except ClientError as error:
        print(error)
But I get a single string, and I'm not sure it is JSON at all:
[{'Username': 'user1', 'Attributes': [{'Name': 'email_verified', 'Value': 'true'}, {'Name': 'email', 'Value': 'user1#xxxx.com'}], 'UserCreateDate': datetime.datetime(2020, 2, 10, 13, 13, 34, 457000, tzinfo=tzlocal()), 'UserLastModifiedDate': datetime.datetime(2020, 2, 10, 13, 13, 34, 457000, tzinfo=tzlocal()), 'Enabled': True, 'UserStatus': 'FORCE_CHANGE_PASSWORD'}]
I need something like this:
[
    {
        "Username": "user1",
        "Attributes": [
            {
                "Name": "email_verified",
                "Value": "true"
            },
            {
                "Name": "email",
                "Value": "user1#xxxx.com"
            }
        ],
        "Enabled": "true",
        "UserStatus": "CONFIRMED"
    }
]

Try this:
import ast
import json
print(ast.literal_eval(json.dumps(response)))
This is for the dict response returned by the SDK.
Edit: I just realized that the list_users response also contains a UserCreateDate object, so json.dumps will complain about the conversion because the value of the UserCreateDate key is a datetime. If you take that key out, this works without the ast module:
import json
data = {'Username': 'Google_11761250', 'Attributes': [{'Name': 'email', 'Value': 'abc#gmail.com'}], 'Enabled': True, 'UserStatus': 'EXTERNAL_PROVIDER'}
print(json.dumps(data))
> {"Username": "Google_11761250", "Attributes": [{"Name": "email", "Value": "abc#gmail.com"}], "Enabled": true, "UserStatus": "EXTERNAL_PROVIDER"}

You can check the type of the output with
type(output)
I guess it is a list, so you can convert it to JSON and pretty-print it with:
print(json.dumps(output, indent=4))
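Both answers run into the datetime values in the list_users response. A minimal sketch of one way around that, using only the standard library: pass a default function to json.dumps that converts datetime objects to ISO 8601 strings. The names encode_datetime, users_to_json and sample_users are hypothetical, and the sample data is modelled on the output shown in the question (with a UTC timezone assumed in place of tzlocal()).

```python
import json
from datetime import datetime, timezone


def encode_datetime(value):
    """Fallback for json.dumps: turn datetime objects into ISO strings."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def users_to_json(users):
    """Serialize a list of user dicts to pretty-printed JSON."""
    return json.dumps(users, default=encode_datetime, indent=4)


sample_users = [{
    'Username': 'user1',
    'Attributes': [{'Name': 'email_verified', 'Value': 'true'}],
    'UserCreateDate': datetime(2020, 2, 10, 13, 13, 34, 457000,
                               tzinfo=timezone.utc),
    'Enabled': True,
    'UserStatus': 'FORCE_CHANGE_PASSWORD',
}]

backup = users_to_json(sample_users)
print(backup)
```

In the Lambda from the question, the result of users_to_json(response) could be encoded as UTF-8 and passed as the Body, instead of json.dumps(str(response)), which serializes the Python repr as one JSON string.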

Django x-www-form-urlencoded request

I'm trying to do the following request with Django:
I tried the following code but it doesn't work:
import requests

data = {'username': 'admin',
        'password': 123,
        'grant_type': 'password',
        'client_id': 'xxxx',
        'client_secret': 'xxxx'}
headers = {'content-type': 'application/x-www-form-urlencoded'}
r = requests.post(url, data=data, headers=headers)
Thanks for your help !

It is form-encoded by default.
Typically, you want to send some form-encoded data — much like an HTML
form. To do this, simply pass a dictionary to the data argument. Your
dictionary of data will automatically be form-encoded when the request
is made.
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print r.text
{
"origin": "179.13.100.4",
"files": {},
"form": {
"key2": "value2",
"key1": "value1"
},
"url": "http://httpbin.org/post",
"args": {},
"headers": {
"Content-Length": "23",
"Accept-Encoding": "identity, deflate, compress, gzip",
"Accept": "*/*",
"User-Agent": "python-requests/0.8.0",
"Host": "127.0.0.1:7077",
"Content-Type": "application/x-www-form-urlencoded"
},
"data": ""
}
http://docs.python-requests.org/en/v0.10.7/user/quickstart/#make-a-post-request
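To see what "form-encoded" means for the payload in the quoted example, here is a small sketch using the standard library's urllib.parse.urlencode. This is only an illustration of the resulting body and not how requests is implemented internally; for a simple dict of strings the output should be the same key=value pairs joined by &.

```python
from urllib.parse import urlencode

payload = {'key1': 'value1', 'key2': 'value2'}

# Each pair becomes key=value, and pairs are joined with '&'.
body = urlencode(payload)
print(body)
print(len(body))
```

The body is 23 characters long, which matches the "Content-Length": "23" header in the quoted response.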
