Get youtube-dl to write to Amazon S3

I have a function right now that runs youtube-dl to convert a video.
def start_audio_extraction(url, audio_filename):
    localfile = 'music/%s.mp3' % audio_filename
    temp_filepath = os.environ.get(s3.Object(bucketname, localfile))
    ydl_opts = {
        'format': 'bestaudio/best',   # choice of quality
        'extractaudio': True,         # only keep the audio
        'outtmpl': temp_filepath,     # name the location
        'noplaylist': True,           # only download single song, not playlist
        'prefer-ffmpeg': True,
        # 'verbose': True,
        'postprocessors': [{
            'key': 'FFmpegMetadata'
        },
        {
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
        'logger': MyLogger(),
        'progress_hooks': [my_hook],
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        result = ydl.download([url])
    return result
But the problem is that when I run this, I end up getting this error:
File "/home/john/.virtualenvs/yout/local/lib/python2.7/site-packages/youtube_dl/YoutubeDL.py", line 578, in prepare_filename
tmpl = compat_expanduser(outtmpl)
File "/home/john/.virtualenvs/yout/local/lib/python2.7/site-packages/youtube_dl/compat.py", line 353, in compat_expanduser
if not path.startswith('~'):
AttributeError: 'NoneType' object has no attribute 'startswith'
I tried asking in the youtube-dl repository and was told that outtmpl must be a string.
Since I believe that the s3 object is a lambda function, is my only solution to move hosting over to Amazon?

You can use something like goofys to redirect youtube-dl's output to S3.
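For example, a minimal sketch, assuming the bucket has already been mounted with goofys (e.g. goofys my-bucket /mnt/my-bucket; the mount point name is hypothetical), would simply pass a plain filesystem path under that mount as outtmpl:
import youtube_dl

def start_audio_extraction(url, audio_filename):
    # Path under the goofys mount; the finished file ends up in the S3 bucket
    temp_filepath = '/mnt/my-bucket/music/%s.%%(ext)s' % audio_filename
    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': temp_filepath,   # outtmpl must be a plain string path
        'noplaylist': True,
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        return ydl.download([url])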

Missing field 'SetIdentifier' in Change when deleting/upserting with ChangeResourceRecordSets Boto3

I've struggled for a week trying to delete/upsert a simple Route 53 resource record with Boto3 v1.10.39.
My code:
resp = r53_client.change_resource_record_sets(
    HostedZoneId=<ZONE_ID>,
    ChangeBatch={
        'Comment': 'del_ip',
        'Changes': [
            {
                'Action': 'DELETE',
                'ResourceRecordSet': {
                    'Name': <SUBDOMAIN>,
                    'Type': 'A',
                    'Region': 'us-east-1',
                    'TTL': 300,
                    'ResourceRecords': [{'Value': <OLD_IP>}]
                }
            }
        ]
    }
)
Error msg:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 272, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/lib/python2.7/dist-packages/botocore/client.py", line 576, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidInput: An error occurred (InvalidInput) when calling the ChangeResourceRecordSets operation: Invalid request: Missing field 'SetIdentifier' in Change with [Action=DELETE, Name=<SUBDOMAIN>., Type=A, SetIdentifier=null]
Troubleshooting steps:
1. I've gone through all the relevant documentation (Boto3, AWS CLI, AWS developer guide); the 'SetIdentifier' field is only required when the record set is not a simple one (weighted, multivalue, failover, etc.):
   SetIdentifier (string)
   Resource record sets that have a routing policy other than simple: An identifier that differentiates among multiple resource record sets that have the same combination of name and type, such as multiple weighted resource record sets named acme.example.com that have a type of A. In a group of resource record sets that have the same name and type, the value of SetIdentifier must be unique for each resource record set.
   For information about routing policies, see Choosing a Routing Policy in the Amazon Route 53 Developer Guide.
2. Tried the same operation with the AWS CLI; same error as when calling with Boto3.
3. Called list_resource_record_sets and verified that no set identifier value exists (a sketch of this check is shown after this list).
4. Tried adding 'SetIdentifier': None, 'SetIdentifier': '' and 'SetIdentifier': ; none of them works, which makes sense as the original record set doesn't have this property at all.
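For reference, a minimal sketch of that list_resource_record_sets check (step 3), using the same placeholders as in the code above:
import boto3

r53_client = boto3.client('route53')

# <ZONE_ID> and <SUBDOMAIN> are placeholders, as in the question
resp = r53_client.list_resource_record_sets(
    HostedZoneId='<ZONE_ID>',
    StartRecordName='<SUBDOMAIN>',
    StartRecordType='A',
    MaxItems='1'
)
for rrset in resp['ResourceRecordSets']:
    # A simple record set has no 'SetIdentifier' (and no 'Region') key at all
    print rrset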
Environment:
OS: amzn-ami-hvm-2015.09.1.x86_64-gp2
Python: v2.7.14
BOTO3: v1.10.39
botocore: 1.13.39
aws-cli: v1.16.301
I'm wondering if this is a bug in the AWS API.
I created an issue in the Boto3 GitHub repository and got help from them: removing the 'Region' attribute fixes my issue, and it works like a charm for my code once I do that.
resp = r53_client.change_resource_record_sets(
    HostedZoneId=<ZONE_ID>,
    ChangeBatch={
        'Comment': 'del_ip',
        'Changes': [
            {
                'Action': 'DELETE',
                'ResourceRecordSet': {
                    'Name': <SUBDOMAIN>,
                    'Type': 'A',
                    'TTL': 300,
                    'ResourceRecords': [{'Value': <OLD_IP>}]
                }
            }
        ]
    }
)

App Engine stackdriver logging to Global log instead of service log

I'm trying to set up logging for a Django app hosted as an App Engine service on GAE.
I have set up the logging successfully, except that the log entries are showing up in the global log for the entire project instead of the log for that service. I would like the logs to show up only in the specific service's logs.
This is my Django logging config:
from google.cloud import logging as google_cloud_logging

log_client = google_cloud_logging.Client()
log_client.setup_logging()

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'stackdriver_logging': {
            'class': 'google.cloud.logging.handlers.CloudLoggingHandler',
            'client': log_client
        },
    },
    'loggers': {
        '': {
            'handlers': ['stackdriver_logging'],
            'level': 'INFO',
        }
    },
}
And I am able to successfully log to the global project log by calling it like this:
def fetch_orders(request):
    logger.error('test error')
    logger.critical('test critical')
    logger.warning('test warning')
    logger.info('test info')
    return redirect('dashboard')
I want to figure out whether I can configure the logger to always use the log for the service it's running in.
EDIT:
I tried the suggestion below; however, it now returns the following error:
Traceback (most recent call last):
File "/env/lib/python3.7/site-packages/google/cloud/logging/handlers/transports/background_thread.py", line 122, in _safely_commit_batch
batch.commit()
File "/env/lib/python3.7/site-packages/google/cloud/logging/logger.py", line 381, in commit
entries = [entry.to_api_repr() for entry in self.entries]
File "/env/lib/python3.7/site-packages/google/cloud/logging/logger.py", line 381, in <listcomp>
entries = [entry.to_api_repr() for entry in self.entries]
File "/env/lib/python3.7/site-packages/google/cloud/logging/entries.py", line 318, in to_api_repr
info = super(StructEntry, self).to_api_repr()
File "/env/lib/python3.7/site-packages/google/cloud/logging/entries.py", line 241, in to_api_repr
info["resource"] = self.resource._to_dict()
AttributeError: 'ConvertingDict' object has no attribute '_to_dict'
I can override this in the package source code to make it work; however, the GAE environment requires that I use the package as supplied by Google for cloud logging. Is there any way to go from here?
To my understanding, it should be possible to accomplish what you want using the resource option of CloudLoggingHandler. In the Stackdriver Logging (and Stackdriver Monitoring) API, each object (log line, time-series point) is associated with a "resource": something that exists in a project, can be provisioned, and is either the source of logs or time series or the thing the logs or time series are being written about. When the resource option is omitted, CloudLoggingHandler defaults to global, as you have observed.
There are a number of monitored resource types, including gae_app, which can be used to represent a specific version of a particular service that is deployed on GAE. Based on your code, this would look something like:
from google.cloud.logging import resource

def get_monitored_resource():
    project_id = get_project_id()
    gae_service = get_gae_service()
    gae_service_version = get_gae_service_version()
    resource_type = 'gae_app'
    resource_labels = {
        'project_id': project_id,
        'module_id': gae_service,
        'version_id': gae_service_version
    }
    return resource.Resource(resource_type, resource_labels)

GAE_APP_RESOURCE = get_monitored_resource()

LOGGING = {
    # ...
    'handlers': {
        'stackdriver_logging': {
            'class': 'google.cloud.logging.handlers.CloudLoggingHandler',
            'client': log_client,
            'resource': GAE_APP_RESOURCE,
        },
    },
    # ...
}
In the code above, the functions get_project_id, get_gae_service, and get_gae_service_version can be implemented in terms of the environment variables GOOGLE_CLOUD_PROJECT, GAE_SERVICE, and GAE_VERSION available in the Python flexible environment, as documented in The Flexible Python Runtime, for example:
import os

def get_project_id():
    return os.getenv('GOOGLE_CLOUD_PROJECT')
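Following the same pattern with the environment variables mentioned above, the other two helpers could look like:
def get_gae_service():
    return os.getenv('GAE_SERVICE')

def get_gae_service_version():
    return os.getenv('GAE_VERSION')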

Getting error of PipelineActivity must have one and only one member when using Boto3 and Create_Pipeline

I have a Python program that uses boto3 to create an IoT Analytics path. My program was able to successfully create the channel and the datastore, but it fails when I try to connect the two through the create_pipeline function. My code is as follows:
dactivity = [{
    "channel": {
        "channelName": channel["channelName"],
        "name": IoTAConfig["channelName"],
        "next": IoTAConfig["datastoreName"]
    },
    "datastore": {
        "datastoreName": ds["datastoreName"],
        "name": IoTAConfig["datastoreName"]
    }
}]

pipeline = iota.create_pipeline(
    pipelineActivities=dactivity,
    pipelineName=IoTAConfig["pipelineName"]
)
The error code is as follows:
Traceback (most recent call last):
File "createFullGG.py", line 478, in <module>
createIoTA()
File "createFullGG.py", line 268, in createIoTA
pipelineName = IoTAConfig["pipelineName"]
File "/usr/lib/python2.7/site-packages/botocore/client.py", line 320, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/lib/python2.7/site-packages/botocore/client.py", line 623, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidRequestException: An error occurred (InvalidRequestException) when calling the UpdatePipeline operation: PipelineActivity must have one and only one member
According to the documentation pipeline activities can contain from 1 to 25 entries as long as they are in an array of 1 object. I have no idea why this continues to fail. Any help is appreciated.
The public documentation looks a little confusing because of the way optional elements are represented, but the good news is that this is an easy fix.
A corrected version of what you are trying would be written as:
dactivity = [
    {
        "channel": {
            "channelName": channel["channelName"],
            "name": IoTAConfig["channelName"],
            "next": IoTAConfig["datastoreName"]
        }
    },
    {
        "datastore": {
            "datastoreName": ds["datastoreName"],
            "name": IoTAConfig["datastoreName"]
        }
    }
]

response = client.create_pipeline(
    pipelineActivities=dactivity,
    pipelineName=IoTAConfig["pipelineName"]
)
So it's an array of activities that you are providing, like [ {A1}, {A2} ], if that makes sense.
Does that help?
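If it helps to double-check the result, here is a short sketch (reusing the iota client and IoTAConfig dict from the question) that reads the pipeline back after creation so each activity shows up as its own object:
# Sketch only; iota and IoTAConfig come from the question above
check = iota.describe_pipeline(
    pipelineName=IoTAConfig["pipelineName"]
)
# Each activity (channel, datastore) should appear as a separate object
print check["pipeline"]["activities"]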

json.loads() giving "ValueError: Expecting property name:"

I'm trying to convert the str data type to a dict in Python using the json.loads() function.
But I'm getting this error:
File "/usr/local/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/l`enter code here`ocal/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python2.7/json/decoder.py", line 380, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
In comments you gave the following example of the text you are trying to load as JSON:
[
    {
        'entity_class': 'Hardware Entities (AHV)',
        'status': 'recommended',
        'uuid': 'de74178a-cbc7-4c69-ae2f-9e7042bf8e98',
        'zprotobuf': 'eNolzD1LAzEYB/DFScHBRXAKoUMr5khiLi/dCgpdCg7qIiLPJU/KQS4nyeEL6ne3p+vv/3K8CGiUMBaY77xhymvHAGVkDg1XsosWnT1bcI42tqZlwbWOqWAis5oD08J3VinOOwMXp',
        'version': '2.4-1542668003',
        'dependencies': '[{"entity_class": "Dell Update Manager", "version": "1.8-0.112", "exact": "false", "entity_model": "PT Agent on AHV (el6)"}]',
        'entity_uuid': '00e8f575-d959-4d7f-860a-61cb84400b7a',
        'order': 4,
    }
]
JSON should use double quotes, not single quotes. The above looks like native Python data-structure syntax. Here is the above dumped from native Python to JSON:
import json

native_python_data = [
    {
        'entity_class': 'Hardware Entities (AHV)',
        'status': 'recommended',
        'uuid': 'de74178a-cbc7-4c69-ae2f-9e7042bf8e98',
        'zprotobuf': 'eNolzD1LAzEYB/DFScHBRXAKoUMr5khiLi/dCgpdCg7qIiLPJU/KQS4nyeEL6ne3p+vv/3K8CGiUMBaY77xhymvHAGVkDg1XsosWnT1bcI42tqZlwbWOqWAis5oD08J3VinOOwMXp',
        'version': '2.4-1542668003',
        'dependencies': [
            {
                "entity_class": "Dell Update Manager",
                "version": "1.8-0.112",
                "exact": "false",
                "entity_model": "PT Agent on AHV (el6)"
            }
        ],
        'entity_uuid': '00e8f575-d959-4d7f-860a-61cb84400b7a',
        'order': 4,
    }
]

json_string_with_escaping = json.dumps(native_python_data, indent=4)
print json_string_with_escaping
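As a quick sanity check (not part of the original answer, just to illustrate the quoting point), the dumped string now parses cleanly with json.loads, whereas the single-quoted original raises the same error as in the question:
parsed = json.loads(json_string_with_escaping)   # valid JSON, parses fine
print parsed[0]['entity_class']                  # -> Hardware Entities (AHV)

# The single-quoted form fails with the error from the question:
# json.loads("[{'entity_class': 'Hardware Entities (AHV)'}]")
# ValueError: Expecting property name: line 1 column 2 (char 1)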

AWS: Boto3 configuring bucket lifecycle - Malformed XML

The following code should enable versioning on a bucket/list of buckets, and then set the lifecycle configuration.
import boto3

# Create session
s3 = boto3.resource('s3')
s3Client = boto3.client('s3')

# Bucket list
buckets = ['BUCKETNAMEHERE']

# iterate through list of buckets
for bucket in buckets:
    # Enable Versioning
    bucketVersioning = s3.BucketVersioning(bucket)
    bucketVersioning.enable()

    # Configure Lifecycle
    s3Client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={
            'Rules': [
                {
                    'Status': 'Enabled',
                    'NoncurrentVersionTransitions': [
                        {
                            'NoncurrentDays': 7,
                            'StorageClass': 'GLACIER'
                        },
                    ],
                    'NoncurrentVersionExpiration': {
                        'NoncurrentDays': 30
                    }
                },
            ]
        }
    )

print "Versioning and lifecycle have been enabled for buckets."
However, whenever I run this I get the following error:
File "putVersioning.py", line 42, in <module>
'NoncurrentDays': 30
File "/home/user/.local/lib/python2.7/site-packages/botocore/client.py", line 253, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/home/user/.local/lib/python2.7/site-packages/botocore/client.py", line 557, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (MalformedXML) when calling the PutBucketLifecycleConfiguration operation: The XML you provided was not well-formed or did not validate against our published schema
As far as I can tell, everything looks correct?
According to the docs here, you need to add the Filter element, which is required by the Amazon API but, confusingly enough, not required by boto. I added the deprecated Prefix argument instead of Filter, and it seems to be working too.
This one works for me:
client.put_bucket_lifecycle_configuration(
Bucket=s3_bucket,
LifecycleConfiguration={
'Rules': [
{
'Expiration': {'Days': 5},
'Filter': {'Prefix': 'folder1/'},
'ID': 'id',
'Status': 'Enabled'
}
]
})
To see an actual schema, create a new rule in S3 and then use client.get_bucket_lifecycle_configuration(Bucket=s3_bucket).
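Applied to the rule from the question, that would mean adding a Filter (and optionally an ID) to the existing rule, roughly as in the sketch below; the empty Prefix is an assumption that makes the rule apply to every object in the bucket, and the rule name is arbitrary:
s3Client.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'noncurrent-version-rule',  # arbitrary rule name
                'Filter': {'Prefix': ''},         # empty prefix = whole bucket
                'Status': 'Enabled',
                'NoncurrentVersionTransitions': [
                    {
                        'NoncurrentDays': 7,
                        'StorageClass': 'GLACIER'
                    },
                ],
                'NoncurrentVersionExpiration': {
                    'NoncurrentDays': 30
                }
            },
        ]
    }
)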