Extract connections in airflow and use in boto3 - amazon-web-services

I am. trying to extract connections from airflow as
key,pass = BaseHook.get_connection('aws_default')
print(conn.get_extra())
and use it in my boto3 connection as:
account_id = boto3.client('sts',aws_access_key_id=key,
aws_secret_access_key=pass).get_caller_identity().get('Account').
I am getting this error, how to resolve this?
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such table: connection

Related

Cannot connect to Cloud SQL using Apache-Beam JDBC

I am trying to connect to Cloud SQL by using Python SDK io.jdbc module, more specifically ReadFromJdbc class, which is documented here- https://beam.apache.org/releases/pydoc/current/apache_beam.io.jdbc.html
Based on it and info on connecting to Cloud MySQL using JDBC here- https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/blob/main/docs/jdbc-mysql.md I wrote the following code
import apache_beam as beam
import apache_beam.io.jdbc as jdbc
import typing
import apache_beam.coders as coders
from apache_beam.options.pipeline_options import PipelineOptions
pipeline_options = {
'project': 'project-name',
'runner': 'DataflowRunner',
'region': 'europe-central2',
'staging_location':"gs://temp",
'temp_location':"gs://temp",
'template_location':"gs://templates/temp_name"
}
pipeline_options = PipelineOptions.from_dictionary(pipeline_options)
serviceAccount = r'path\to\serviceaccount.json'
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = serviceAccount
ExampleRow = typing.NamedTuple('ExampleRow',
[('id', int), ('migration', str)])
coders.registry.register_coder(ExampleRow, coders.RowCoder)
with beam.Pipeline(options=pipeline_options) as p:
res = (
p
| "Read database list" >> jdbc.ReadFromJdbc(
table_name='table',
driver_class_name='com.mysql.jdbc.Driver',
jdbc_url='jdbc:mysql:///<DATABASE_NAME>?cloudSqlInstance=<INSTANCE_CONNECTION_NAME>&socketFactory=com.google.cloud.sql.mysql.SocketFactory&user=<MYSQL_USER_NAME>&password=<MYSQL_USER_PASSWORD>',
username='user',
password='pass',
query = "select id, migration from db.table;",
fetch_size=1,
classpath=["com.google.cloud.sql:mysql-socket-factory-connector-j-8:1.7.2"],
expansion_service = 'host:6666'
)
| "Print results" >> beam.io.WriteToText(r'gs://output/out.csv')
)
For the expansion service I have set up WLS2 python environment as documented here- https://beam.apache.org/documentation/sdks/java-multi-language-pipelines/#advanced-start-an-expansion-service
Unfortunately, I get this error:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:6666: WSA Error"
debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNAVAILABLE: ipv4:127.0.0.1:6666: WSA Error {grpc_status:14, created_time:"2022-12-08T15:43:05.445755053+00:00"}"
I tried to switch expansion_service to a specific IP that I got from wls hostname -I but it produced the same result, even though you can reach it (tested with ping and hosted a webserver).
Am I doing something completely wrong? I find it hard to believe that it's so hard to connect to Cloud SQL, so I must be...
Transforms under apache_beam.io.jdbc module are cross-language transforms implemented in the Beam Java SDK. Hence, during the pipeline construction, Python SDK will connect to a Java expansion service to expand these transforms. You followed the instructions to create a Python expansion service.
I think the easiest thing to do will be to use the default expansion service.
First, install Java runtime in the computer from where the pipeline is constructed and make sure that java command is available.
Use the following transform to read from Cloud SQL,
p | "Read database list" >> jdbc.ReadFromJdbc(
table_name='table',
driver_class_name='com.mysql.jdbc.Driver',
jdbc_url='jdbc:mysql:///<DATABASE_NAME>?cloudSqlInstance=<INSTANCE_CONNECTION_NAME>&socketFactory=com.google.cloud.sql.mysql.SocketFactory&user=<MYSQL_USER_NAME>&password=<MYSQL_USER_PASSWORD>',
username='user',
password='pass',
query = "select id, migration from db.table;",
fetch_size=1,
classpath=["com.google.cloud.sql:mysql-socket-factory-connector-j-8:1.7.2"]
)

How to check the cluster status of Redshift using boto3 module in Python?

I need to check if the Redshift cluster is paused or available using Python. I am aware of the module boto3 which further provides a method describe_clusters(). I am not sure how to proceed further to write a Python script for that.
You could try
import boto3
import pandas as pd
DWH_CLUSTER_IDENTIFIER = 'Your clusterId'
KEY='Your AWS Key'
SECRET='Your AWS Secret'
# Get client to connect redshift
redshift = boto3.client('redshift',
region_name="us-east-1",
aws_access_key_id=KEY,
aws_secret_access_key=SECRET
)
# Get cluster list as it is in AWS console
myClusters = redshift.describe_clusters()['Clusters']
if len(myClusters) > 0:
df = pd.DataFrame(myClusters)
myCluster=df[df.ClusterIdentifier==DWH_CLUSTER_IDENTIFIER.lower()]
print("My cluster status is: {}".format(myCluster['ClusterAvailabilityStatus'].item()))
else:
print('No clusters available')

Connect to RabbitMQ instance remotely (created on AWS)

I'm having trouble connecting to a RabbitMQ instance (it's my first time doing so). I've spun one up on AWS, and been given access to an admin panel which I'm able to access.
I'm trying to connect to the RabbitMQ server in python/pika with the following code:
import pika
import logging
logging.basicConfig(level=logging.DEBUG)
credentials = pika.PlainCredentials('*******', '**********')
parameters = pika.ConnectionParameters(host='a-25c34e4d-a3eb-32de-abfg-l95d931afc72f.mq.us-west-1.amazonaws.com',
port=5671,
virtual_host='/',
credentials=credentials,
)
connection = pika.BlockingConnection(parameters)
I get pika.exceptions.IncompatibleProtocolError: StreamLostError: ("Stream connection lost: ConnectionResetError(54, 'Connection reset by peer')",) when I run the above.
you're trying to connect through AMQP protocol and AWS is using AMQPS, you should add ssl_options to your connection parameters like this
import ssl
logging.basicConfig(level=logging.DEBUG)
credentials = pika.PlainCredentials('*******', '**********')
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
parameters = pika.ConnectionParameters(host='a-25c34e4d-a3eb-32de-abfg-l95d931afc72f.mq.us-west-1.amazonaws.com',
port=5671,
virtual_host='/',
credentials=credentials,
ssl_options=pika.SSLOptions(context)
)
connection = pika.BlockingConnection(parameters)

Call MSSQL with Windows Auth from AWS Lambda

Is it possible to call MSSQL from AWS Lambda when MSSQL is part of windows domain and only allows windows authentication?
E.g. Is there any way to run Lambda function under AD user?
or use AD connector to associate AD user with IAM account and run Lambda under that account?
Yes, we can use pymssql python module inside the lambda function to connect to mssql server and execute your commands. You need to zip the pymssql module as a lambda deployment package.
here is the example of command and connection:
import boto3
import pymssql
#setup connection
conn = pymssql.connect("serverurl", "username", "password", "dbname")
#setup cursor, so that you can use it to execute your commands
cursor = conn.cursor()
cursor.execute("""
CREATE TABLE persons (
id INT NOT NULL,
name VARCHAR(100),
salesrep VARCHAR(100),
PRIMARY KEY(id)""")
# you must call commit() to persist your data if you don't set autocommit to True
conn.commit()
But, to connect using Windows Authentication, here is some mods:
When connecting using Windows Authentication, this is how to combine the database’s hostname and instance name, and the Active Directory/Windows Domain name and the username.
conn = pymssql.connect(
host=r'dbhostname\myinstance',
user=r'companydomain\username',
password=PASSWORD,
database='DatabaseOfInterest'
)

Why is my lambda not able to talk to elasticache?

I have an Redis ElastiCache cluster that has a FQDN for the primary node in the format: master.clustername.x.euw1.cache.amazonaws.com. I also have a Route53 record with the CNAME pointing at that FQDN.
I have a .net core lambda in the same VPC as the cluster, with access to the cluster via security groups. The lambda talks to the cluster using the Redis library developed by Stack Overflow (Github repo here for reference).
If I give the lambda the hostname the FQDN for the Redis cluster (the one that starts with master) I can connect, save data and read it.
If I give the lambda the CNAME (and that CNAME gives the same IP address as the FQDN when I ping it from my local machine and also if I use Dns.GetHostEntry within the lambda) it doesn't connect and I get the following error message:
One or more errors occurred. (It was not possible to connect to the redis server(s); to create a disconnected multiplexer, disable AbortOnConnectFail. SocketFailure on PING): AggregateException
at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
at lambda_method(Closure , Stream , Stream , LambdaContextInternal )
at StackExchange.Redis.ConnectionMultiplexer.ConnectImpl(Func`1 multiplexerFactory, TextWriter log) in c:\code\StackExchange.Redis\StackExchange.Redis\StackExchange\Redis\ConnectionMultiplexer.cs:line 890
at lambda.Redis.RedisClientBuilder.Build(String redisHost, String redisPort, Int32 redisDbId) in C:\BuildAgent\work\91d24911506461d0\src\Lambda\Redis\RedisClientBuilder.cs:line 9
at lambda.Ioc.ServiceBuilder.GetRedisClient() in C:\BuildAgent\work\91d24911506461d0\src\Lambda\IoC\ServiceBuilder.cs:line 18
at lambda.Ioc.ServiceBuilder.GetServices() in C:\BuildAgent\work\91d24911506461d0\src\Lambda\IoC\ServiceBuilder.cs:line 11
at Handlers.OrderHandler.Run(SNSEvent request, ILambdaContext context) in C:\BuildAgent\work\91d24911506461d0\src\Lambda\Handlers\OrderHandler.cs:line 26
Has anyone seen anything similar to this?
It turned out that because I was using an SSL certificate on the elasticache cluster and the SSL certificate was bound the the master. endpoint whereas I was trying to connect to the CNAME, the certificate validation was failing.
So I ended up querying the Route53 record within the code to get the master endpoint and it worked.
Possible workaround to isolate problems from your client lib -- follow AWS' tutorial and rewrite your Lambda to something like the code below (example in Python).
from __future__ import print_function
import time
import uuid
import sys
import socket
import elasticache_auto_discovery
from pymemcache.client.hash import HashClient
#elasticache settings
elasticache_config_endpoint = "your-elasticache-cluster-endpoint:port"
nodes = elasticache_auto_discovery.discover(elasticache_config_endpoint)
nodes = map(lambda x: (x[1], int(x[2])), nodes)
memcache_client = HashClient(nodes)
def handler(event, context):
"""
This function puts into memcache and get from it.
Memcache is hosted using elasticache
"""
#Create a random UUID... this will the sample element we add to the cache.
uuid_inserted = uuid.uuid4().hex
#Put the UUID to the cache.
memcache_client.set('uuid', uuid_inserted)
#Get item (UUID) from the cache.
uuid_obtained = memcache_client.get('uuid')
if uuid_obtained.decode("utf-8") == uuid_inserted:
# this print should go to the CloudWatch Logs and Lambda console.
print ("Success: Fetched value %s from memcache" %(uuid_inserted))
else:
raise Exception("Value is not the same as we put :(. Expected %s got %s" %(uuid_inserted, uuid_obtained))
return "Fetched value from memcache: " + uuid_obtained.decode("utf-8")
Reference: https://docs.aws.amazon.com/lambda/latest/dg/vpc-ec-deployment-pkg.html