Deserialize the protobuf message in AirFlow - amazon-web-services

If I understand correctly, to deserialize a protobuf message I have to generate a Python class with the command protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/scheme.proto (https://developers.google.com/protocol-buffers/docs/pythontutorial).
But how can I do that in Airflow? Let's say I'm just a user and can't execute commands on the command line.
And a deeper question: how do I do this in Airflow on AWS? We install new Python packages through a text file, simply by adding a line with the package name and version. If I build a package containing the Python class generated from a .proto file, how can I install it on AWS?

Related

Zip Go file using AWS Lambda Tool

I am trying to generate an exe file with this command on Windows 10:
go.exe get -u github.com/aws/aws-lambda-go/cmd/build-lambda-zip
The file comes back as linux_amd64/build-lambda-zip instead of build-lambda-zip.exe.
Has anyone experienced this and knows what the fix is?
I am using the AWS docs here: https://docs.aws.amazon.com/lambda/latest/dg/golang-package.html
If you want to build the binary, use the install command ("Compile and install packages and dependencies") with the $GOOS variable overridden:
GOOS=windows go install github.com/aws/aws-lambda-go/cmd/build-lambda-zip
The exe file will be stored in $GOBIN.
There is another way to get the AWS Lambda tools.
I found build-lambda-zip.exe in:
%USERPROFILE%\.dotnet\tools\.store\amazon.lambda.tools\4.0.0\amazon.lambda.tools\4.0.0\tools\netcoreapp2.1\any\Resources\build-lambda-zip.exe
If it's not there, we can get it from AWS directly by running this command:
dotnet tool update -g Amazon.Lambda.Tools

Unable to execute a step on a running EMR

I have an EMR 5.28.1 cluster running in AWS, but I forgot to install some Python libraries as part of the bootstrap action. Now that the cluster is running, I am trying to add a step via the EMR console. Here are my settings:
JAR: s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
Main class: None
Arguments: s3://xxxx/install_python_libraries.sh
Unfortunately, I get the following error.
Cannot run program "s3://xxxxx/install_python_libraries.sh" (in directory "."): error=2, No such file or directory
I am not sure what I am doing wrong. The shell script looks like this.
#!/bin/bash -xe
# Non-standard and non-Amazon Machine Image Python modules:
sudo pip-3.6 install boto3
sudo pip-3.6 install xmltodict
I also tried this with 'command-runner.jar' but I get the same error. Can you please help me figure out the problem so I can do this via the console? I would like to install the libraries on all nodes, master and core.
Thanks
The issue is the line-ending (EOL) type of the .sh file.
In other words, if the file has Windows line endings ("\r\n"), it will not run and returns the file-not-found error.
Convert it to Unix line endings ("\n") with something like Notepad++ and it will run fine.
(In Notepad++: Edit > EOL Conversion > Unix (LF), then save and try again.)
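If Notepad++ is not at hand, the same conversion can be scripted. The following is only a minimal sketch, not part of the original answer: the function names are made up for illustration, and it assumes the script is small enough to read into memory before you upload it back to S3.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows line endings (CRLF) with Unix line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


def convert_file_in_place(path: str) -> None:
    """Rewrite the file at `path` with Unix line endings.

    The file is opened in binary mode so Python does not translate
    newlines on its own while reading or writing.
    """
    with open(path, "rb") as fh:
        data = fh.read()
    with open(path, "wb") as fh:
        fh.write(crlf_to_lf(data))


# Example with the first lines of the script from the question:
windows_text = b"#!/bin/bash -xe\r\nsudo pip-3.6 install boto3\r\n"
unix_text = crlf_to_lf(windows_text)
```

After running convert_file_in_place on a local copy of install_python_libraries.sh, upload the file again and re-add the step.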

Jenkins post build python script file

I am new to Jenkins, especially to using Python scripts in Jenkins. The problem I am facing is as follows:
I am trying to run a Python script from a Python file in the post-build step of Jenkins. To my understanding, I have added all the plugins required for that purpose, i.e. the Post-BuildScript plugin, the Python Jenkins plugin, etc.
Now when I build, the console output shows that an invalid script command caused the failure. I have attached the results below. Can anybody help me with that, please?
In the post-build step I am providing the full (absolute) path to the Python script file, i.e.
[Screenshot: Execute Python script path]
[Screenshot: results]
It may be useful to mention that I have also tried using just the path without writing python before it, and tried both forward and backward slashes in the path, without any success.
I have managed to resolve this issue. There are two parts to the solution:
First, if you want to run a simple Python script in post-build, add a post-build step for Execute Python Script (this requires installing the plugin for post-build steps). In the window created by that post-build step you can simply enter any Python command to run.
Second, if you want to run a list of commands from a Python script file from the same post-build step window, make sure to put all the Python files you want to execute into the Jenkins workspace project directory (for the project that Jenkins is building).
Moreover, for Python 2.7, to execute that Python script file you simply need to write the script as
execfile("file.py")
One more thing to remember is to add the path to python.exe to the environment variables.
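Note that execfile only exists in Python 2. If the Jenkins node runs Python 3, one possible substitute is runpy.run_path from the standard library. The sketch below is an assumption about how that could look, not what the answer above used; the helper name run_script_file and the temporary file.py are made up for the demonstration.

```python
import os
import runpy
import tempfile


def run_script_file(path):
    """Execute the Python file at `path` and return its global names as a dict."""
    return runpy.run_path(path, run_name="__main__")


# Demonstration: a temporary directory stands in for the Jenkins workspace.
with tempfile.TemporaryDirectory() as workspace:
    script = os.path.join(workspace, "file.py")
    with open(script, "w") as fh:
        fh.write("answer = 6 * 7\n")
    result = run_script_file(script)
    demo_answer = result["answer"]
```

In a real post-build step you would pass the path of the script inside the workspace project directory instead of creating a temporary file.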

How to import Spark packages in AWS Glue?

I would like to use the GraphFrames package. If I were to run pyspark locally, I would use the command:
~/hadoop/spark-2.3.1-bin-hadoop2.7/bin/pyspark --packages graphframes:graphframes:0.6.0-spark2.3-s_2.11
But how would I run an AWS Glue script with this package? I found nothing in the documentation...
You can provide a path to extra libraries packaged into zip archives located in s3.
Please check out this doc for more details
It's possible to use graphframes as follows:
Download the graphframes Python library package file, e.g. from here. Unzip the .tar.gz and then re-archive it as a .zip. Put it somewhere in S3 that your Glue job has access to.
When setting up your glue job:
Make sure that your Python Library Path references the zip file
For job parameters, you need {"--conf": "spark.jars.packages=graphframes:graphframes:0.6.0-spark2.3-s_2.11"}
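When the job is configured through code instead of the console, the two settings above can be expressed as a dictionary of job arguments. This is a rough sketch under assumptions: the helper name and the S3 path are hypothetical, and it relies on the console's "Python library path" field corresponding to Glue's --extra-py-files special parameter.

```python
def build_glue_default_arguments(library_zip_s3_path, spark_packages):
    """Build the job-argument dictionary for a Glue job.

    "--extra-py-files" points at the zipped Python library in S3, and
    "--conf" carries the Spark package coordinates from the answer above.
    """
    return {
        "--extra-py-files": library_zip_s3_path,
        "--conf": "spark.jars.packages=" + ",".join(spark_packages),
    }


glue_args = build_glue_default_arguments(
    "s3://my-bucket/libs/graphframes.zip",  # hypothetical bucket and key
    ["graphframes:graphframes:0.6.0-spark2.3-s_2.11"],
)
```

A dictionary like this could then be supplied as DefaultArguments when creating the job with boto3's Glue client (create_job), alongside the job name, role, and command.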
Everyone looking for an answer, please read this comment.
In order to use an external package in AWS Glue pySpark or Python-shell:
1) Clone the repo from the following URL:
https://github.com/bhavintandel/py-packager/tree/master
git clone git@github.com:bhavintandel/py-packager.git
cd py-packager
2) Add your required packages to requirements.txt. For example:
pygeohash
Update the version and project name in setup.py. For example:
VERSION = "0.1.0"
PACKAGE_NAME = "dependencies"
3) Run "command1" below to create a .zip package for PySpark, or "command2" to create egg files for Python shell.
command1:
sudo make build_zip
command2:
sudo make bdist_egg
These commands generate the package in the dist folder.
4) Finally, upload this package from the dist directory to an S3 bucket. Then go to the AWS Glue job console, edit the job, find the script libraries option, click the folder icon next to "Python library path", and select your S3 path.
Then import the package in your Glue script:
import pygeohash as pgh
Done!
Also set the --user-jars-first: "true" parameter in the Glue job.

Boto.conf not found

I am running a Flask app on an AWS EC2 server and have been using boto to access data stored in DynamoDB. After accidentally adding boto.conf to a git commit (and pushing it, then pulling on the server), I found that my Python code can no longer locate the boto.conf file. I rolled back the changes with git, but the problem remains.
The python module and boto.conf file exist in the same directory, but when the module calls
boto.config.load_credential_file('boto.conf')
I get the Flask error IOError: [Errno 2] No such file or directory: 'boto.conf'.
As per the documentation:
I'm not really sure why you are using boto.config.load_credential_file. In general, boto picks up its config from a file called either ~/.boto or /etc/boto.cfg.
You can also look at this question on SO, which also covers how to set up the configuration for boto: Getting Credentials File in the boto.cfg for Python
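If you do keep calling load_credential_file, one thing worth checking is the working directory: a relative name such as 'boto.conf' is resolved against the directory the process was started from, not the directory of the module. The sketch below is an illustration of building an absolute path next to the module; the helper name path_beside is made up, and whether this is the actual cause here is an assumption.

```python
import os


def path_beside(module_file, filename):
    """Return the absolute path of `filename` in the same directory as `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, filename)


# Inside the Flask module this would be used roughly as:
#   boto.config.load_credential_file(path_beside(__file__, "boto.conf"))
example = path_beside(os.path.join("app", "views.py"), "boto.conf")
```

Here example ends with app/boto.conf (using the platform's path separator) regardless of where the server process was launched from.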