How to stop Dataproc Job from airflow

From Airflow we can submit Dataproc jobs via DataprocSubmitJobOperator.
We can stop Dataproc jobs in the development environment via the GCP Console, but not in the production environment.
Is there any way we can kill Dataproc jobs directly via Airflow, if the Dataproc job ID is provided as a parameter?

At the moment there is no operator for this action, but DataprocHook has a cancel_job method, so you can create a custom operator:
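from typing import TYPE_CHECKING, Optional, Sequence, Tuple, Union
from airflow.models import BaseOperator
from airflow.providers.google.cloud.hooks.dataproc import DataprocHook
from google.api_core.gapic_v1.method import DEFAULT, _MethodDefault
from google.api_core.retry import Retry
from google.cloud.dataproc_v1 import Job
if TYPE_CHECKING:
    from airflow.utils.context import Context
class MyDataprocCancelJobOperator(BaseOperator):
    """Starts a job cancellation request."""
    # job_id is templated so it can be passed in at run time, e.g. "{{ params.job_id }}"
    template_fields: Sequence[str] = ("job_id", "region", "project_id", "impersonation_chain")
    def __init__(
        self,
        *,
        job_id: str,
        project_id: str,
        region: Optional[str] = None,
        retry: Union[Retry, _MethodDefault] = DEFAULT,
        timeout: Optional[float] = None,
        metadata: Sequence[Tuple[str, str]] = (),
        gcp_conn_id: str = "google_cloud_default",
        impersonation_chain: Optional[Union[str, Sequence[str]]] = None,
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self.job_id = job_id
        self.project_id = project_id
        self.region = region
        self.retry = retry
        self.timeout = timeout
        self.metadata = metadata
        self.gcp_conn_id = gcp_conn_id
        self.impersonation_chain = impersonation_chain
    def execute(self, context: "Context"):
        hook = DataprocHook(gcp_conn_id=self.gcp_conn_id, impersonation_chain=self.impersonation_chain)
        job = hook.cancel_job(
            job_id=self.job_id,
            project_id=self.project_id,
            region=self.region,
            retry=self.retry,
            timeout=self.timeout,
            metadata=self.metadata,
        )
        # Convert the protobuf Job into a plain dict so it can be pushed to XCom
        return Job.to_dict(job)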
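For reference, DataprocHook.cancel_job wraps the Dataproc JobControllerClient, talking to the job's regional endpoint. If you would rather not maintain a custom operator, a minimal sketch is to call the google-cloud-dataproc client directly from a PythonOperator or TaskFlow task. It assumes the google-cloud-dataproc package is installed; the function names are illustrative, and the client is injectable so the logic can be exercised without GCP.

```python
"""Cancel a Dataproc job with the google-cloud-dataproc client (sketch)."""
from typing import Any, Optional


def regional_endpoint(region: str) -> str:
    # Dataproc jobs belong to a region; the client must use that region's endpoint.
    return f"{region}-dataproc.googleapis.com:443"


def cancel_dataproc_job(project_id: str, region: str, job_id: str, client: Optional[Any] = None) -> Any:
    """Request cancellation of a Dataproc job and return the Job the API sends back."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stand-in client.
        from google.cloud import dataproc_v1

        client = dataproc_v1.JobControllerClient(
            client_options={"api_endpoint": regional_endpoint(region)}
        )
    # This only starts a cancellation request; the job is not necessarily
    # cancelled yet when the call returns.
    return client.cancel_job(project_id=project_id, region=region, job_id=job_id)
```

Inside a task you would call something like cancel_dataproc_job(project_id, region, params["job_id"]). Because the call only requests cancellation, poll the job (for example with the client's get_job) if a downstream task needs to wait until it is actually cancelled.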

It seems that the Airflow Dataproc integration doesn't include an operator to cancel a job. See https://airflow.apache.org/docs/apache-airflow-providers-google/stable/operators/cloud/dataproc.html. Maybe you can raise a feature request with the Airflow community.

Related

How to get the list of Nitro system based EC2 instance type by CLI?

I know this page lists the instance types that are based on the Nitro system, but I would like to get the list dynamically with the CLI (for example, using aws ec2 describe-instances). Is it possible to get the Nitro-based instance types other than by parsing the static page? If so, could you tell me how?

You'd have to write a bit of additional code to get that information. aws ec2 describe-instances will give you the InstanceType property. You should use a programming language to parse the JSON, extract InstanceType and then call describe-instance-types like so: https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-instance-types.html?highlight=nitro
From the JSON you get back, extract Hypervisor. That'll give you nitro if the instance type is Nitro-based.
Here's some Python code that might work. I have not tested it fully, but you can tweak it to get the results you want.
"""List all EC2 instances"""
import boto3
def ec2_connection():
    """Connect to AWS using API"""
    region = 'us-east-2'
    # Placeholder credentials; the default credential chain (environment
    # variables, ~/.aws/credentials or an instance role) is usually preferable
    aws_key = 'xxx'
    aws_secret = 'xxx'
    session = boto3.Session(
        aws_access_key_id=aws_key,
        aws_secret_access_key=aws_secret
    )
    ec2 = session.client('ec2', region_name=region)
    return ec2
def get_reservations(ec2):
    """Get the list of reservations; each one holds a list of instances"""
    response = ec2.describe_instances()
    return response['Reservations']
def process_instances(reservations, ec2):
    """Print the hypervisor of the matching instance"""
    if len(reservations) == 0:
        print('No instance found. Quitting')
        return
    for reservation in reservations:
        for instance in reservation['Instances']:
            # get friendly name of the server
            # only try this for mysql1.local server
            friendly_name = get_friendly_name(instance)
            if friendly_name.lower() != 'mysql1.local':
                continue
            # get the hypervisor based on the instance type
            hypervisor = get_instance_info(instance['InstanceType'], ec2)
            # print findings
            print(f'{friendly_name} // {instance["InstanceType"]} is {hypervisor}')
            break
def get_instance_info(instance_type, ec2):
    """Get hypervisor from the instance type"""
    response = ec2.describe_instance_types(
        InstanceTypes=[instance_type]
    )
    return response['InstanceTypes'][0]['Hypervisor']
def get_friendly_name(instance):
    """Get friendly name of the instance"""
    # Instances without any tags have no 'Tags' key at all
    tags = instance.get('Tags', [])
    for tag in tags:
        if tag['Key'] == 'Name':
            return tag['Value']
    return 'Unknown'
def run():
    """Main method to call"""
    ec2 = ec2_connection()
    reservations = get_reservations(ec2)
    process_instances(reservations, ec2)
if __name__ == '__main__':
    run()
    print('Done')

In the above answer, the statement "From the JSON you get back, extract hypervisor. That'll give you Nitro if the instance is Nitro" is no longer accurate.
As per the latest AWS documentation for the describe-instances hypervisor filter:
hypervisor - The hypervisor type of the instance (ovm | xen). The value xen is used for both Xen and Nitro hypervisors.

Cleaned up, verified working code below:
# Get all instance types that run on Nitro hypervisor
import boto3
def get_nitro_instance_types():
    """Get all instance types that run on Nitro hypervisor"""
    ec2 = boto3.client('ec2', region_name='us-east-1')
    response = ec2.describe_instance_types(
        Filters=[
            {
                'Name': 'hypervisor',
                'Values': [
                    'nitro',
                ]
            },
        ],
    )
    instance_types = []
    # Note: this reads only the first page of results; follow NextToken
    # (or use a paginator) to get every matching instance type
    for instance_type in response['InstanceTypes']:
        instance_types.append(instance_type['InstanceType'])
    return instance_types
print(get_nitro_instance_types())
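describe_instance_types returns its results in pages, so a single call may not list every Nitro type. A minimal sketch that follows pagination with boto3's built-in paginator for DescribeInstanceTypes; the injectable ec2 argument and the sorted output are assumptions made for convenience and testability.

```python
"""List every instance type whose hypervisor is Nitro, following pagination (sketch)."""
from typing import Any, List


def get_nitro_instance_types(ec2: Any = None, region: str = "us-east-1") -> List[str]:
    if ec2 is None:
        import boto3  # imported lazily so a stand-in client can be passed instead

        ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instance_types")
    pages = paginator.paginate(Filters=[{"Name": "hypervisor", "Values": ["nitro"]}])
    # Flatten every page's InstanceTypes list into plain type names
    names = [info["InstanceType"] for page in pages for info in page["InstanceTypes"]]
    return sorted(names)
```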
Example output as of 12/06/2022 below:
['r5dn.8xlarge', 'x2iedn.xlarge', 'r6id.2xlarge', 'r6gd.medium',
'm5zn.2xlarge', 'r6idn.16xlarge', 'c6a.48xlarge', 'm5a.16xlarge',
'im4gn.2xlarge', 'c6gn.16xlarge', 'c6in.24xlarge', 'r5ad.24xlarge',
'r6i.xlarge', 'c6i.32xlarge', 'x2iedn.2xlarge', 'r6id.xlarge',
'i3en.24xlarge', 'i3en.12xlarge', 'm5d.8xlarge', 'c6i.8xlarge',
'r6g.large', 'm6gd.4xlarge', 'r6a.2xlarge', 'x2iezn.4xlarge',
'c6i.large', 'r6in.24xlarge', 'm6gd.xlarge', 'm5dn.2xlarge',
'd3en.2xlarge', 'c6id.8xlarge', 'm6a.large', 'is4gen.xlarge',
'r6g.8xlarge', 'm6idn.large', 'm6a.2xlarge', 'c6i.4xlarge',
'i4i.16xlarge', 'm5zn.6xlarge', 'm5.8xlarge', 'm6id.xlarge',
'm5n.16xlarge', 'c6g.16xlarge', 'r5n.12xlarge', 't4g.nano',
'm5ad.12xlarge', 'r6in.12xlarge', 'm6idn.12xlarge', 'g5.2xlarge',
'trn1.32xlarge', 'x2gd.8xlarge', 'is4gen.4xlarge', 'r6gd.xlarge',
'r5a.xlarge', 'r5a.2xlarge', 'c5ad.24xlarge', 'r6a.xlarge',
'r6g.medium', 'm6id.12xlarge', 'r6idn.2xlarge', 'c5n.2xlarge',
'g5.4xlarge', 'm5d.xlarge', 'i3en.3xlarge', 'r5.24xlarge',
'r6gd.2xlarge', 'c5d.large', 'm6gd.12xlarge', 'm6id.2xlarge',
'm6i.large', 'z1d.2xlarge', 'm5a.4xlarge', 'm5a.2xlarge',
'c6in.xlarge', 'r6id.16xlarge', 'c7g.8xlarge', 'm5dn.12xlarge',
'm6gd.medium', 'im4gn.8xlarge', 'm5dn.large', 'c5ad.4xlarge',
'r6g.16xlarge', 'c6a.24xlarge', 'c6a.16xlarge']

How to get list of active connections on RDS using boto3

In the RDS console I can see the Current activity (number of connections) of the RDS instance.
I want to know how I can get the current activity value using boto3. The current value shown in the console is 0.
I tried
response = client.describe_db_instances()
but it didn't return the number of active connections.

You can get that data from CloudWatch. RDS publishes its metrics (including DatabaseConnections) there, and the RDS dashboard just renders a few of them.
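If you only need the current connection count of one known instance, a small sketch is to ask CloudWatch for the DatabaseConnections metric in the AWS/RDS namespace, filtered by the DBInstanceIdentifier dimension. The 10-minute window, the 60-second period and the Maximum statistic are assumptions; the client is injectable so the function can be tested without AWS.

```python
"""Read the latest DatabaseConnections value for one RDS instance (sketch)."""
import datetime
from typing import Any, Optional


def latest_connection_count(instance_id: str, cloudwatch: Any = None, minutes: int = 10) -> Optional[float]:
    if cloudwatch is None:
        import boto3  # imported lazily so a stand-in client can be passed instead

        cloudwatch = boto3.client("cloudwatch")
    end = datetime.datetime.now(datetime.timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/RDS",
        MetricName="DatabaseConnections",
        Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        StartTime=end - datetime.timedelta(minutes=minutes),
        EndTime=end,
        Period=60,
        Statistics=["Maximum"],
    )
    datapoints = response["Datapoints"]
    if not datapoints:
        return None  # no data in the window, e.g. the instance is stopped
    # Datapoints are not returned in time order, so pick the newest explicitly
    latest = max(datapoints, key=lambda point: point["Timestamp"])
    return latest["Maximum"]
```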

@ivan thanks for the directions.
I created the following Python script to find instances with 0 connections and delete them. I hope it helps someone.
import datetime
import sys
import boto3
class RDSTermination:
    # Standard constructor for RDSTermination class
    def __init__(self, cloudwatch_object, rds_object):
        self.cloudwatch_object = cloudwatch_object
        self.rds_object = rds_object
    # Getters and setters for variables.
    @property
    def cloudwatch_object(self):
        return self._cloudwatch_object
    @cloudwatch_object.setter
    def cloudwatch_object(self, cloudwatch_object):
        self._cloudwatch_object = cloudwatch_object
    @property
    def rds_object(self):
        return self._rds_object
    @rds_object.setter
    def rds_object(self, rds_object):
        self._rds_object = rds_object
    # Fetch the most recent average connection count for every RDS instance
    # that reported DatabaseConnections in the last two hours
    def _get_instance_connection_info(self):
        rds_instances_connection_details = {}
        end_time = datetime.datetime.now(datetime.timezone.utc)
        response = self.cloudwatch_object.get_metric_data(
            MetricDataQueries=[
                {
                    'Id': 'fetching_data_for_something',
                    'Expression': "SEARCH('{AWS/RDS,DBInstanceIdentifier} MetricName=\"DatabaseConnections\"', 'Average', 300)",
                    'ReturnData': True
                },
            ],
            EndTime=end_time,
            StartTime=end_time - datetime.timedelta(hours=2),
            ScanBy='TimestampDescending',
            MaxDatapoints=123
        )
        # response is a dictionary with MetricDataResults as key
        for instance_info in response['MetricDataResults']:
            if len(instance_info['Timestamps']) > 0:
                # With TimestampDescending, index 0 is the newest datapoint
                rds_instances_connection_details[instance_info['Label']] = instance_info['Values'][0]
        return rds_instances_connection_details
    # Fetches the list of all instances and their status.
    def _fetch_all_rds_instance_state(self):
        all_rds_instance_state = {}
        response = self.rds_object.describe_db_instances()
        instance_details = response['DBInstances']
        for instance in instance_details:
            all_rds_instance_state[instance['DBInstanceIdentifier']] = instance['DBInstanceStatus']
        return all_rds_instance_state
    # Keep only instances with 0 connections that are in the 'available' state;
    # stopped instances are skipped
    def _get_instance_allowed_for_deletion(self):
        instances = self._get_instance_connection_info()
        all_instance_state = self._fetch_all_rds_instance_state()
        instances_to_delete = []
        if not instances:
            print("Check if instance connection_info is empty")
        for instance_name, connections in instances.items():
            # .get() skips metric labels that don't match a current instance
            if connections == 0.0 and all_instance_state.get(instance_name) == 'available':
                instances_to_delete.append(instance_name)
        return instances_to_delete
    # Delete the instances in the final list, i.e. instances with 0 connections
    # and status 'available'
    def terminate_rds_instances(self, dry_run=True):
        if dry_run:
            message = 'DRY-RUN'
        else:
            message = 'DELETE'
        rdsnames = self._get_instance_allowed_for_deletion()
        if len(rdsnames) > 0:
            for rdsname in rdsnames:
                try:
                    response = self.rds_object.describe_db_instances(
                        DBInstanceIdentifier=rdsname
                    )
                    termination_protection = response['DBInstances'][0]['DeletionProtection']
                except Exception as e:
                    print('[ERROR]: reading details ' + str(e))
                    sys.exit(1)
                if termination_protection is True:
                    try:
                        print("Removing deletion protection for {}".format(rdsname))
                        if not dry_run:
                            response = self.rds_object.modify_db_instance(
                                DBInstanceIdentifier=rdsname,
                                DeletionProtection=False
                            )
                    except Exception as e:
                        print(
                            "[ERROR]: Could not modify db deletion protection "
                            "due to following error:\n " + str(e))
                        sys.exit(1)
                try:
                    if not dry_run:
                        response = self.rds_object.delete_db_instance(
                            DBInstanceIdentifier=rdsname,
                            SkipFinalSnapshot=True,
                        )
                    print('[{}]: RDS instance {} deleted'.format(message, rdsname))
                except Exception as e:
                    print("[ERROR]: could not delete rds instance {}: {}".format(rdsname, e))
        else:
            print("No RDS instance marked for deletion")
if __name__ == "__main__":
    cloud_watch_object = boto3.client('cloudwatch', region_name='us-east-1')
    rds_object = boto3.client('rds', region_name='us-east-1')
    rds_termination_object = RDSTermination(cloud_watch_object, rds_object)
    rds_termination_object.terminate_rds_instances(dry_run=True)

Sending EMR Logs to CloudWatch

Is there a way to send EMR logs to CloudWatch instead of S3? We would like to have all our services' logs in one location. It seems the only thing you can do is set up alarms for monitoring, but that doesn't cover logging.
https://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html
Would I have to install the CloudWatch agent on the nodes in the cluster? https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AgentReference.html

You can install the CloudWatch agent via an EMR bootstrap action and configure it to watch the log directories. It then pushes the logs to Amazon CloudWatch Logs.
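As a sketch of the agent side, the CloudWatch agent reads a JSON configuration whose logs.logs_collected.files.collect_list section names the files to tail. The Python below only builds that JSON; the log group name and the EMR log paths are assumptions to adapt to your cluster and applications. A bootstrap action would then install the agent, write this file on each node, and load it with amazon-cloudwatch-agent-ctl's fetch-config action.

```python
"""Build a CloudWatch agent configuration that tails EMR log files (sketch)."""
import json
from typing import Dict, List


def build_agent_config(log_group: str, file_paths: List[str]) -> Dict:
    collect_list = [
        {
            "file_path": path,  # the agent accepts * wildcards here
            "log_group_name": log_group,
            # {instance_id} is expanded by the agent, giving one stream per node
            "log_stream_name": "{instance_id}",
        }
        for path in file_paths
    ]
    return {"logs": {"logs_collected": {"files": {"collect_list": collect_list}}}}


if __name__ == "__main__":
    config = build_agent_config(
        "emr-cluster-logs",  # hypothetical log group name
        # Assumed paths; check where your EMR release writes its logs
        ["/var/log/hadoop-yarn/*.log", "/var/log/spark/*.log"],
    )
    print(json.dumps(config, indent=2))
```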

You can read the logs from S3, push them to CloudWatch using boto3, and delete them from S3 if you no longer need them. In some use cases the stdout.gz log needs to be in CloudWatch for monitoring purposes.
See the boto3 documentation on put_log_events.
import boto3
import botocore.session
import logging
import time
import gzip
logger = logging.getLogger(__name__)
def get_session(service_name):
    # Build a client from the default botocore credential chain and configured region
    session = botocore.session.get_session()
    credentials = session.get_credentials()
    region = session.get_config_variable('region')
    return boto3.client(
        service_name=service_name,
        region_name=region,
        aws_access_key_id=credentials.access_key,
        aws_secret_access_key=credentials.secret_key,
        aws_session_token=credentials.token
    )
def get_log_file(s3, bucket, key):
    log_file = None
    try:
        obj = s3.get_object(Bucket=bucket, Key=key)
        compressed_body = obj['Body'].read()
        log_file = gzip.decompress(compressed_body)
    except Exception as e:
        logger.error(f"Error reading from bucket : {e}")
        raise
    return log_file
def create_log_events(logs, batch_size):
    log_event_batch = []
    log_event_batch_collection = []
    try:
        for line in logs.splitlines():
            if not line:
                # PutLogEvents rejects empty messages
                continue
            log_event = {'timestamp': int(round(time.time() * 1000)), 'message': line.decode('utf-8')}
            if len(log_event_batch) < batch_size:
                log_event_batch.append(log_event)
            else:
                log_event_batch_collection.append(log_event_batch)
                log_event_batch = [log_event]
    except Exception as e:
        logger.error(f"Error creating log events : {e}")
        raise
    if log_event_batch:
        # Never send an empty final batch
        log_event_batch_collection.append(log_event_batch)
    return log_event_batch_collection
def create_log_stream_and_push_log_events(logs, log_group, log_stream, log_event_batch_collection, delay):
    try:
        logs.create_log_stream(logGroupName=log_group, logStreamName=log_stream)
    except logs.exceptions.ResourceAlreadyExistsException:
        # The stream name is per day, so a second run on the same day reuses it
        pass
    seq_token = None
    try:
        for log_event_batch in log_event_batch_collection:
            log_event = {
                'logGroupName': log_group,
                'logStreamName': log_stream,
                'logEvents': log_event_batch
            }
            if seq_token:
                log_event['sequenceToken'] = seq_token
            response = logs.put_log_events(**log_event)
            # Sequence tokens are no longer required, so the response may not include one
            seq_token = response.get('nextSequenceToken')
            time.sleep(delay)
    except Exception as e:
        logger.error(f"Error pushing log events : {e}")
        raise
The caller function:
def main():
    s3 = get_session('s3')
    logs = get_session('logs')
    BUCKET_NAME = 'Your_Bucket_Name'
    KEY = 'logs/emr/Path_To_Log/stdout.gz'
    BATCH_SIZE = 10000  # PutLogEvents accepts at most 10,000 events (and 1,048,576 bytes) per call
    PUSH_DELAY = 0.2  # Pause between calls to stay under the PutLogEvents request quota
    LOG_GROUP = 'test_log_group'  # Destination log group (must already exist)
    LOG_STREAM = '{}-{}'.format(time.strftime('%Y-%m-%d'), 'logstream.log')
    log_file = get_log_file(s3, BUCKET_NAME, KEY)
    log_event_batch_collection = create_log_events(log_file, BATCH_SIZE)
    create_log_stream_and_push_log_events(logs, LOG_GROUP, LOG_STREAM, log_event_batch_collection, PUSH_DELAY)
if __name__ == '__main__':
    main()
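Capping a batch at 10,000 events is not enough on its own: PutLogEvents also limits a batch to 1,048,576 bytes, counted as the UTF-8 size of every message plus 26 bytes per event. A sketch of a batcher that respects both limits and skips empty lines; it takes already-decoded strings (an assumption, so decode the gunzipped bytes first), and the single shared timestamp mirrors the code above. A single line larger than the limit is not handled here.

```python
"""Split log lines into PutLogEvents batches within the count and size limits (sketch)."""
import time
from typing import Dict, List, Optional

MAX_EVENTS_PER_BATCH = 10_000
MAX_BATCH_BYTES = 1_048_576
EVENT_OVERHEAD_BYTES = 26  # extra bytes CloudWatch Logs counts for each event


def batch_log_events(lines: List[str], timestamp_ms: Optional[int] = None) -> List[List[Dict]]:
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    batches: List[List[Dict]] = []
    current: List[Dict] = []
    current_bytes = 0
    for line in lines:
        if not line:
            continue  # empty messages are rejected by PutLogEvents
        size = len(line.encode("utf-8")) + EVENT_OVERHEAD_BYTES
        full = len(current) >= MAX_EVENTS_PER_BATCH or current_bytes + size > MAX_BATCH_BYTES
        if current and full:
            batches.append(current)
            current, current_bytes = [], 0
        current.append({"timestamp": timestamp_ms, "message": line})
        current_bytes += size
    if current:
        batches.append(current)  # never emit an empty batch
    return batches
```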

Required Cloudformation Script for Blue/Green deployment on ECS

I am trying to write a CloudFormation template for AWS ECS with blue/green deployment support. This blue/green feature was added recently by AWS in ECS, and I couldn't find any reference for configuring it in a CloudFormation template. They have documented how to do it through the UI, but not through CloudFormation. I guess AWS might not have updated their CloudFormation documentation yet, as it is a new feature. Any help finding the documentation would be appreciated. Thanking you in advance.

Support for blue/green deployment in CloudFormation has been added. It can be found here in the documentation:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-ecs-service-deploymentcontroller.html
In the "Type" property you can choose "CODE_DEPLOY" as the deployment type. Hope this helps!
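With DeploymentController, blue/green becomes a single property on the AWS::ECS::Service resource. The sketch below builds that resource as a Python dict and prints it as a JSON template; the cluster and task definition names are placeholders. A working blue/green setup also needs load balancers with two target groups and an AWS CodeDeploy application and deployment group, which are omitted here.

```python
"""Emit a CloudFormation snippet for an ECS service deployed by CodeDeploy (sketch)."""
import json


def ecs_service_resource(cluster: str, task_definition: str, desired_count: int = 1) -> dict:
    return {
        "Type": "AWS::ECS::Service",
        "Properties": {
            "Cluster": cluster,
            "TaskDefinition": task_definition,
            "DesiredCount": desired_count,
            # CODE_DEPLOY hands deployments over to AWS CodeDeploy (blue/green)
            "DeploymentController": {"Type": "CODE_DEPLOY"},
        },
    }


template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    # Placeholder names; replace with your own cluster and task definition
    "Resources": {"MyService": ecs_service_resource("my-cluster", "my-task:1")},
}
print(json.dumps(template, indent=2))
```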

Currently CloudFormation does not support the DeploymentController parameter, in which you can specify CODE_DEPLOY.
Keep yourself updated by visiting this page for documentation updates:
https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-ecs-service.html
For now, use a custom CloudFormation resource. Use the Boto3 library to create the service with the CODE_DEPLOY setting. Read more here:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ecs.html#ECS.Client.create_service
This is the Python class which can create/delete/update an ECS service:
import boto3
from botocore.exceptions import ClientError
from typing import Any, Dict
client = boto3.client('ecs')
class Service:
    @staticmethod
    def create(**kwargs: Any) -> Dict[str, Any]:
        kwargs = dict(
            cluster=kwargs.get('cluster'),
            serviceName=kwargs.get('serviceName'),
            taskDefinition=kwargs.get('taskDefinition'),
            loadBalancers=kwargs.get('loadBalancers'),
            serviceRegistries=kwargs.get('serviceRegistries'),
            desiredCount=kwargs.get('desiredCount'),
            clientToken=kwargs.get('clientToken'),
            launchType=kwargs.get('launchType'),
            platformVersion=kwargs.get('platformVersion'),
            role=kwargs.get('role'),
            deploymentConfiguration=kwargs.get('deploymentConfiguration'),
            placementConstraints=kwargs.get('placementConstraints'),
            placementStrategy=kwargs.get('placementStrategy'),
            networkConfiguration=kwargs.get('networkConfiguration'),
            healthCheckGracePeriodSeconds=kwargs.get('healthCheckGracePeriodSeconds'),
            schedulingStrategy=kwargs.get('schedulingStrategy'),
            deploymentController=kwargs.get('deploymentController'),
            tags=kwargs.get('tags'),
            enableECSManagedTags=kwargs.get('enableECSManagedTags'),
            propagateTags=kwargs.get('propagateTags'),
        )
        # Drop only unset parameters, so valid falsy values such as desiredCount=0 are kept
        kwargs = {key: value for key, value in kwargs.items() if value is not None}
        return client.create_service(**kwargs)
    @staticmethod
    def update(**kwargs: Any) -> Dict[str, Any]:
        filtered_kwargs = dict(
            cluster=kwargs.get('cluster'),
            service=kwargs.get('serviceName'),
            desiredCount=kwargs.get('desiredCount'),
            taskDefinition=kwargs.get('taskDefinition'),
            deploymentConfiguration=kwargs.get('deploymentConfiguration'),
            networkConfiguration=kwargs.get('networkConfiguration'),
            platformVersion=kwargs.get('platformVersion'),
            forceNewDeployment=kwargs.get('forceNewDeployment'),
            healthCheckGracePeriodSeconds=kwargs.get('healthCheckGracePeriodSeconds')
        )
        try:
            filtered_kwargs = {key: value for key, value in filtered_kwargs.items() if value is not None}
            return client.update_service(**filtered_kwargs)
        except ClientError as ex:
            if ex.response['Error']['Code'] == 'InvalidParameterException':
                if 'use aws codedeploy' in ex.response['Error']['Message'].lower():
                    # For services using the blue/green (CODE_DEPLOY) deployment controller,
                    # only the desired count, deployment configuration, and health check grace period
                    # can be updated using this API. If the network configuration, platform version, or task definition
                    # need to be updated, a new AWS CodeDeploy deployment should be created.
                    filtered_kwargs = dict(
                        cluster=kwargs.get('cluster'),
                        service=kwargs.get('serviceName'),
                        desiredCount=kwargs.get('desiredCount'),
                        deploymentConfiguration=kwargs.get('deploymentConfiguration'),
                        healthCheckGracePeriodSeconds=kwargs.get('healthCheckGracePeriodSeconds'),
                    )
                    filtered_kwargs = {key: value for key, value in filtered_kwargs.items() if value is not None}
                    return client.update_service(**filtered_kwargs)
            elif ex.response['Error']['Code'] == 'ServiceNotActiveException':
                # We cannot update an ECS service if it is inactive.
                return {'Code': 'ServiceNotActiveException'}
            elif ex.response['Error']['Code'] == 'ServiceNotFoundException':
                # If for some reason the service was not found, don't update and return.
                return {'Code': 'ServiceNotFoundException'}
            raise
    @staticmethod
    def delete(**kwargs: Any) -> Dict[str, Any]:
        kwargs = dict(
            cluster=kwargs.get('cluster'),
            service=kwargs.get('serviceName'),
            force=True
        )
        kwargs = {key: value for key, value in kwargs.items() if value is not None}
        return client.delete_service(**kwargs)
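A sketch of the Lambda side of such a custom resource, assuming the Service class above is passed in as service. CloudFormation sends a Create, Update or Delete request and waits for a JSON reply PUT to the pre-signed ResponseURL; the handler below dispatches on RequestType and always replies, so a failure doesn't leave the stack waiting. The function names are hypothetical. Note that CloudFormation passes custom-resource properties as strings, so numeric fields such as desiredCount may need converting before they reach create_service.

```python
"""Minimal CloudFormation custom-resource handler driving an ECS service (sketch)."""
import json
import urllib.request
from typing import Any, Callable, Dict


def build_response(event: Dict[str, Any], status: str, physical_id: str, reason: str = "") -> Dict[str, Any]:
    return {
        "Status": status,  # "SUCCESS" or "FAILED"
        "Reason": reason or "See the CloudWatch Logs of this function",
        "PhysicalResourceId": physical_id,
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }


def send_response(url: str, body: Dict[str, Any]) -> None:
    data = json.dumps(body).encode("utf-8")
    # The pre-signed S3 URL expects a PUT with an empty Content-Type
    request = urllib.request.Request(url, data=data, method="PUT", headers={"Content-Type": ""})
    urllib.request.urlopen(request)


def handle(event: Dict[str, Any], service: Any,
           send: Callable[[str, Dict[str, Any]], None] = send_response) -> Dict[str, Any]:
    props = event.get("ResourceProperties", {})
    physical_id = event.get("PhysicalResourceId", props.get("serviceName", "ecs-service"))
    try:
        request_type = event["RequestType"]
        if request_type == "Create":
            service.create(**props)
        elif request_type == "Update":
            service.update(**props)
        elif request_type == "Delete":
            service.delete(**props)
        body = build_response(event, "SUCCESS", physical_id)
    except Exception as exc:  # report the failure instead of leaving the stack hanging
        body = build_response(event, "FAILED", physical_id, str(exc))
    send(event["ResponseURL"], body)
    return body
```

In the Lambda entry point this would be something like def lambda_handler(event, context): return handle(event, Service).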

How can I set a timeout on Dataflow?

I am using Composer to run my Dataflow pipeline on a schedule. If the job is taking over a certain amount of time, I want it to be killed. Is there a way to do this programmatically either as a pipeline option or a DAG parameter?

Not sure how to do it as a pipeline config option, but here is an idea.
You could launch an App Engine task queue task with its countdown set to your timeout value. When the task runs, it can check whether your job is still running:
https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/list
If it is, you can call update on it with requestedState set to JOB_STATE_CANCELLED:
https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/update
https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs#jobstate
This is done through the googleapiclient library: https://developers.google.com/api-client-library/python/apis/discovery/v1
Here is an example of how to use it:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
from google.appengine.api import app_identity
# InterimAdminResourceHandler is the answerer's own base request handler (not shown)
class DataFlowJobsListHandler(InterimAdminResourceHandler):
    def get(self, resource_id=None):
        """
        Wrapper to this:
        https://cloud.google.com/dataflow/docs/reference/rest/v1b3/projects.jobs/list
        """
        if resource_id:
            self.abort(405)
        else:
            credentials = GoogleCredentials.get_application_default()
            service = discovery.build('dataflow', 'v1b3', credentials=credentials)
            project_id = app_identity.get_application_id()
            # Filter values accepted by jobs.list: UNKNOWN, ALL, TERMINATED or ACTIVE
            _filter = self.request.GET.pop('filter', 'UNKNOWN').upper()
            jobs_list_request = service.projects().jobs().list(
                projectId=project_id,
                filter=_filter)  # e.g. 'ACTIVE'
            jobs_list = jobs_list_request.execute()
            return {
                '$cursor': None,
                'results': jobs_list.get('jobs', []),
            }
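A sketch of the check the delayed task could run: given a job resource from jobs.list (its id, currentState and createTime fields), it cancels the job through projects.locations.jobs.update with requestedState set to JOB_STATE_CANCELLED once it has been running longer than the timeout. Using the regional projects.locations.jobs variant, the set of skipped states and the function names are assumptions; service is the object returned by discovery.build('dataflow', 'v1b3', ...).

```python
"""Cancel a Dataflow job that has run longer than a timeout (sketch)."""
import datetime
from typing import Any, Dict, Optional

# Terminal states, plus states in which the job is already stopping
SKIP_STATES = {
    "JOB_STATE_DONE", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED", "JOB_STATE_UPDATED",
    "JOB_STATE_DRAINED", "JOB_STATE_CANCELLING", "JOB_STATE_DRAINING",
}


def parse_rfc3339(value: str) -> datetime.datetime:
    # e.g. "2024-01-01T10:00:00.123456Z"; fractional seconds are dropped for portability
    base = value.split(".")[0].rstrip("Z")
    return datetime.datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=datetime.timezone.utc)


def cancel_if_overdue(service: Any, project_id: str, location: str, job: Dict[str, Any],
                      timeout: datetime.timedelta, now: Optional[datetime.datetime] = None) -> bool:
    """Return True if a cancellation was requested."""
    if job.get("currentState") in SKIP_STATES:
        return False
    now = now or datetime.datetime.now(datetime.timezone.utc)
    if now - parse_rfc3339(job["createTime"]) < timeout:
        return False
    service.projects().locations().jobs().update(
        projectId=project_id,
        location=location,
        jobId=job["id"],
        body={"requestedState": "JOB_STATE_CANCELLED"},
    ).execute()
    return True
```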
