How to enable DEBUG logging for an AWS Glue script only - amazon-web-services

I am struggling to enable DEBUG logging for a Glue script using PySpark only.
I have tried:
import...
def quiet_logs(sc):
    logger = sc._jvm.org.apache.log4j
    logger.LogManager.getLogger("org").setLevel(logger.Level.ERROR)
    logger.LogManager.getLogger("akka").setLevel(logger.Level.ERROR)

def main():
    # Get the Spark Context
    sc = SparkContext.getOrCreate()
    sc.setLogLevel("DEBUG")
    quiet_logs(sc)
    context = GlueContext(sc)
    logger = context.get_logger()
    logger.debug("I only want to see this..., and for all others, only ERRORS")
    ...
I have '--enable-continuous-cloudwatch-log' set to true, but simply cannot get the log trail to only write debug messages for my own script.

I haven't managed to do exactly what you want, but I was able to do something similar by setting up a separate custom log, and this might achieve what you're after.
import logging
import os
import sys

from awsglue.utils import getResolvedOptions
from watchtower import CloudWatchLogHandler

args = getResolvedOptions(sys.argv, ["JOB_RUN_ID"])
job_run_id = args["JOB_RUN_ID"]
os.environ["AWS_DEFAULT_REGION"] = "eu-west-1"

# Write to a dedicated stream in the continuous-logging output group
lsn = f"{job_run_id}_custom"
cw = CloudWatchLogHandler(
    log_group="/aws-glue/jobs/logs-v2", stream_name=lsn, send_interval=4
)

slog = logging.getLogger()
slog.setLevel(logging.DEBUG)
slog.handlers = []
slog.addHandler(cw)

slog.info("hello from the custom logger")
Now anything you log to slog will go to a separate log stream, accessible as one of the entries under the job's 'output' logs.
Note that you need to include watchtower via --additional-python-modules when you run the Glue job.
More info here
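As a rough illustration of that note, here is a minimal sketch of passing the module at run time with boto3 (the job name my-glue-job is a placeholder; you could equally set --additional-python-modules as a default argument on the job definition):

import boto3

glue = boto3.client("glue")

# Install watchtower into the job's Python environment for this run
glue.start_job_run(
    JobName="my-glue-job",  # placeholder job name
    Arguments={"--additional-python-modules": "watchtower"},
)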

This can be done with a regular Python logging setup.
I have tested this in a Glue job and only my own debug message was visible. The job was configured with "Continuous logging" and the messages ended up in the "Output logs" stream.
import logging
# Warning level on root
logging.basicConfig(level=logging.WARNING, format='%(asctime)s [%(levelname)s] [%(name)s] %(message)s')
logger = logging.getLogger(__name__)
# Debug level only for this logger
logger.setLevel(logging.DEBUG)
logger.debug("DEBUG_LOG test")
You can also mute specific loggers:
logging.getLogger('botocore.vendored.requests.packages.urllib3.connectionpool').setLevel(logging.WARN)
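If several libraries are chatty, the same idea can be applied in a loop; which loggers are actually noisy in your job is an assumption here:

import logging

# Silence a few typically verbose libraries; adjust to whatever shows up in your logs
for noisy in ("boto3", "botocore", "urllib3", "py4j"):
    logging.getLogger(noisy).setLevel(logging.WARNING)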

Related

Django: Logging to custom files every day

I'm running Django 3.1 on Docker and I want to log to different files every day. I have a couple of crons running and also Celery tasks. I don't want to log to one file because a lot of processes will be writing to it, and debugging/reading the file will be difficult.
If I have cron tasks my_cron_1, my_cron_2, and my_cron_3,
I want to be able to log to files with the date appended:
MyCron1_2020-12-14.log
MyCron2_2020-12-14.log
MyCron3_2020-12-14.log
MyCron1_2020-12-15.log
MyCron2_2020-12-15.log
MyCron3_2020-12-15.log
MyCron1_2020-12-16.log
MyCron2_2020-12-16.log
MyCron3_2020-12-16.log
Basically, I want to be able to pass in a name to a function that will write to a log file.
Right now I have a class MyLogger
import logging
class MyLogger:
    def __init__(self, filename):
        # Gets or creates a logger
        self._filename = filename

    def log(self, message):
        message = str(message)
        print(message)
        logger = logging.getLogger(__name__)
        # set log level
        logger.setLevel(logging.DEBUG)
        # define file handler and set formatter
        file_handler = logging.FileHandler('logs/' + self._filename + '.log')
        #formatter = logging.Formatter('%(asctime)s : %(levelname)s: %(message)s')
        formatter = logging.Formatter('%(asctime)s : %(message)s')
        file_handler.setFormatter(formatter)
        # add file handler to logger
        logger.addHandler(file_handler)
        # Logs
        logger.info(message)
I call the class like this
logger = MyLogger("FirstLogFile_2020-12-14")
logger.log("ONE")
logger1 = MyLogger("SecondLogFile_2020-12-14")
logger1.log("TWO")
FirstLogFile_2020-12-14 will have ONE and TWO, but it should only have ONE.
SecondLogFile_2020-12-14 will have TWO.
Why is this? Why are the logs being written to the incorrect file? What's wrong with my code?
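A likely cause: logging.getLogger(__name__) returns the same module-level logger for every MyLogger instance, and each log() call adds another FileHandler to it, so records end up in both files. A rough sketch of one way around this (naming each logger after its file and attaching the handler only once) might look like:

import logging

class MyLogger:
    def __init__(self, filename):
        self._filename = filename
        # One logger per file name, so instances don't share handlers
        self._logger = logging.getLogger(filename)
        self._logger.setLevel(logging.DEBUG)
        if not self._logger.handlers:  # attach the file handler only once
            handler = logging.FileHandler('logs/' + filename + '.log')
            handler.setFormatter(logging.Formatter('%(asctime)s : %(message)s'))
            self._logger.addHandler(handler)

    def log(self, message):
        message = str(message)
        print(message)
        self._logger.info(message)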

Watchtower configuration for logging Python logs in CloudWatch

I am developing a REST service in Python which is deployed as a Lambda on AWS. Initially nothing was logged to CloudWatch, so I introduced watchtower.
Below is my logging.ini file.
[loggers]
keys=root
[handlers]
keys=screen, WatchtowerHandler
[formatters]
keys=logfileformatter
[logger_root]
level=DEBUG
handlers=screen
[logger_xyz]
level=DEBUG
handlers=screen, WatchtowerHandler
qualname=xyz
[formatter_logfileformatter]
format=%(asctime)s %(name)-12s: %(levelname)s %(message)s
class=logging.Formatter
[handler_logfile]
class=handlers.RotatingFileHandler
level=NOTSET
args=('log/xyz.log','a',100000,100)
formatter=logfileformatter
[handler_screen]
class=StreamHandler
args = (sys.stdout)
formatter=logfileformatter
[handler_WatchtowerHandler]
class=watchtower.CloudWatchLogHandler
formatter=formatter
send_interval=1
args= ()
The above works fine for logging from the config module, e.g.:
LOG.info("dev config detected")
But I am not able to log LOG.info() calls from any other code in the application, specifically the REST calls, even though the logging setup is the same everywhere.
You can use either Watchtower or the Cloudwatch Handler package:
Watchtower: https://github.com/kislyuk/watchtower
import watchtower, logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
logger.addHandler(watchtower.CloudWatchLogHandler())
logger.info("Hi")
logger.info(dict(foo="bar", details={}))
Cloudwatch Handler: https://pypi.org/project/cloudwatch/
import logging
from cloudwatch import cloudwatch
#Create the logger
logger = logging.getLogger('my_logger')
#Create the formatter
formatter = logging.Formatter('%(asctime)s : %(levelname)s - %(message)s')
# ---- Create the Cloudwatch Handler ----
handler = cloudwatch.CloudwatchHandler('AWS_KEY_ID','AWS_SECRET_KEY','AWS_REGION','AWS_LOG_GROUP','AWS_LOG_STREAM')
#Pass the formater to the handler
handler.setFormatter(formatter)
#Set the level
logger.setLevel(logging.WARNING)
#Add the handler to the logger
logger.addHandler(handler)
#USE IT!
logger.warning("Watch out! Something happened!")

Looking for a boto3 Python example of injecting an AWS Pig step into an already running EMR?

I'm looking for a good boto3 example of an AWS EMR cluster that is already running and into which I wish to inject a Pig step. Previously, I used boto 2.42 like this:
from boto.emr.connection import EmrConnection
from boto.emr.step import InstallPigStep, PigStep

# AWS_ACCESS_KEY = '' # REQUIRED
# AWS_SECRET_KEY = '' # REQUIRED
# conn = EmrConnection(AWS_ACCESS_KEY, AWS_SECRET_KEY)
# loop next element on bucket_compare list
pig_file = 's3://elasticmapreduce/samples/pig-apache/do-reports2.pig'
INPUT = 's3://elasticmapreduce/samples/pig-apache/input/access_log_1'
OUTPUT = ''  # REQUIRED, S3 bucket for job output
pig_args = ['-p', 'INPUT=%s' % INPUT,
            '-p', 'OUTPUT=%s' % OUTPUT]
pig_step = PigStep('Process Reports', pig_file, pig_args=pig_args)
steps = [InstallPigStep(), pig_step]
conn.run_jobflow(name='prs-dev-test', steps=steps,
                 hadoop_version='2.7.2-amzn-2', ami_version='latest',
                 num_instances=2, keep_alive=False)
The main problem now is that boto3 doesn't provide from boto.emr.connection import EmrConnection, nor from boto.emr.step import InstallPigStep, PigStep, and I can't find an equivalent set of modules.
After a bit of checking, I've found a very simple way to inject Pig script commands from within Python using the AWS CLI and the subprocess module. One can build the aws emr add-steps command and inject the desired Pig steps into an already running EMR cluster with:
import subprocess

# The AWS CLI (awscli) must be installed and configured; it is invoked as a shell command
cmd = 'aws emr add-steps --cluster-id j-GU07FE0VTHNG --steps Type=PIG,Name="AggPigProgram",ActionOnFailure=CONTINUE,Args=[-f,s3://dev-end2end-test/pig_scripts/AggRuleBag.pig,-p,INPUT=s3://dev-end2end-test/input_location,-p,OUTPUT=s3://end2end-test/output_location]'
push = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
output, _ = push.communicate()  # wait for the CLI call to finish
print(push.returncode)
Of course, you'll have to find your cluster ID (JobFlowId) using something like:
aws emr list-clusters --active
with the same subprocess-and-Popen approach as above. And of course you can add monitoring to your heart's delight instead of just a print statement.
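If you'd rather stay in boto3 for that lookup as well, here is a minimal sketch (the cluster states below are my assumption of what counts as "active"):

import boto3

emr = boto3.client("emr")

# List clusters that can still accept steps
response = emr.list_clusters(ClusterStates=["STARTING", "RUNNING", "WAITING"])
for cluster in response["Clusters"]:
    print(cluster["Id"], cluster["Name"], cluster["Status"]["State"])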
Here is how to add a new step to an existing EMR cluster job flow for a Pig job using boto3.
Note: your script log file, input and output directories should have the complete path in the format 's3://<bucket>/<directory>/<file_or_key>'.
import boto3

emrcon = boto3.client("emr")
cluster_id1 = cluster_status_file_content  # Retrieved from S3, where it was recorded on creation
step_id = emrcon.add_job_flow_steps(
    JobFlowId=str(cluster_id1),
    Steps=[{
        'Name': str(pig_job_name),
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['pig', "-l", str(pig_log_file_full_path),
                     "-f", str(pig_job_run_script_full_path),
                     "-p", "INPUT=" + str(pig_input_dir_full_path),
                     "-p", "OUTPUT=" + str(pig_output_dir_full_path)]
        }
    }]
)
Please see the screenshot of the EMR console to monitor the step.
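If you would rather monitor programmatically than in the console, a rough sketch using the boto3 step_complete waiter (reusing the names from the snippet above) might look like:

# add_job_flow_steps returns the IDs of the newly added steps
new_step_id = step_id['StepIds'][0]

# Block until the step finishes (raises a WaiterError if it fails or times out)
waiter = emrcon.get_waiter('step_complete')
waiter.wait(ClusterId=str(cluster_id1), StepId=new_step_id)
print("Pig step finished")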

Python logger is printing messages twice on both the PyCharm console and the log file

Similar questions were caused by the logger being configured twice. So maybe in my case it is caused by the second getLogger() call. Any ideas how I can fix it?
import logging
import logging.handlers
logger = logging.getLogger("")
logger.setLevel(logging.DEBUG)
handler = logging.handlers.RotatingFileHandler(
    "the_log.log", maxBytes=3000000, backupCount=2)
formatter = logging.Formatter(
    '[%(asctime)s] {%(filename)s:%(lineno)d} %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
# This is required to print messages to the console as well as the log file.
logging.getLogger().addHandler(logging.StreamHandler())
Using a config file, e.g. logging.config.fileConfig('logging.ini'):
[logger_root]
level=ERROR
handlers=stream_handler
[logger_webserver]
level=DEBUG
handlers=stream_handler
qualname=webserver
propagate=0
You have to set logger.propagate = 0 (Python 3 Docs) when you're configuring the root logger and using non-root loggers at the same time.
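The same fix without a config file, as a minimal sketch:

import logging

# Named (non-root) logger with its own handler
logger = logging.getLogger("webserver")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

# Stop records from also bubbling up to the root logger's handlers,
# which is what produces the duplicate output
logger.propagate = False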
I know this was asked a long time ago, but it's the top result on DuckDuckGo.

Where are python logs default stored when ran through IPython notebook?

In an IPython notebook cell I wrote:
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
handler = logging.FileHandler('model.log')
handler.setLevel(logging.INFO)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
Notice that I am supplying a file name, but not a path.
Where could I find that log? (ran a 'find' and couldn't locate it...)
There are multiple ways to set the IPython working directory. If you don't set any of that in your IPython profile/config, environment, or notebook, the log should be in your working directory. Also try $ ipython locate to print the default IPython directory path; the log may be there.
What about giving it an absolute file path to see if it works at all?
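To check where a relative filename actually resolves, a quick cell like this would do:

import os

# 'model.log' is created relative to the notebook's working directory
print(os.getcwd())
print(os.path.abspath('model.log'))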
Other than that, the call to logging.basicConfig doesn't seem to do anything inside an IPython notebook:
# In:
import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()
logger.debug('root debug test')
There's no output.
As per the docs, logging.basicConfig doesn't do anything if the root logger already has handlers configured for it. This seems to be the case; IPython apparently already has the root logger set up. We can confirm it:
# In:
import logging
logger = logging.getLogger()
logger.handlers
# Out:
[<logging.StreamHandler at 0x106fa19d0>]
So we can try setting the root logger level manually:
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
logger.debug('root debug test')
which yields formatted output in the notebook.
Now onto setting up the file logger:
# In:
import logging
# set root logger level
root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
# setup custom logger
logger = logging.getLogger(__name__)
handler = logging.FileHandler('model.log')
handler.setLevel(logging.INFO)
logger.addHandler(handler)
# log
logger.info('test info my')
which results in writing the output both to the notebook and the model.log file, which for me is located in a directory I started IPython and notebook from.
Mind that repeated calls to this piece of code without restarting the IPython kernel will create and attach yet another handler to the logger on every run, and the number of messages logged to the file with each call will grow.
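One way to guard against that (a small sketch, not part of the original answer) is to attach the file handler only if one isn't already there:

import logging

logger = logging.getLogger(__name__)

# Only attach a FileHandler once per kernel session
if not any(isinstance(h, logging.FileHandler) for h in logger.handlers):
    handler = logging.FileHandler('model.log')
    handler.setLevel(logging.INFO)
    logger.addHandler(handler)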
Declare the path of the log file in the basicConfig like this:
log_file_path = "/your/path/model.log"  # must be a file path, not just a directory
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    filename=log_file_path,
                    filemode='w')
You can then start logging, and you can also give the console a different log format if you want:
# define a Handler which writes INFO messages or higher to the sys.stderr
console = logging.StreamHandler()
console.setLevel(logging.DEBUG)
# set a format which is simpler for console use
formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s')
# tell the handler to use this format
console.setFormatter(formatter)
# add the handler to the root logger
logging.getLogger().addHandler(console)
logger = logging.getLogger()
et voilà.