AWS Elastic Beanstalk Python 3.7 Deployment Location - amazon-web-services

I'm trying to upgrade an existing application on AWS from the now deprecated Python 3.4 platform to 3.7 with Amazon Linux 2/3.0.1, and in the process I ran into an issue with where the application source code is deployed on the EC2 instance.
From some empirical testing, I found that instead of /opt/python/current/app directory that most if not all AWS documentations say (e.g Troubleshooting issues with the EB CLI - AWS Elastic Beanstalk), with Python 3.7 it is actually deployed in /var/app/current/. I wasn't able to find any documentation regarding this change, and it is causing some issues with the application. I'm wondering is there any reason that this change is made? And if it is possible to revert it, how to do so?
Thanks in advance!

This is because the 3.7 Python Elastic Beanstalk distribution uses Amazon Linux 2 which is fundamentally different from the AMI predecessor. If you opt to use Python 3.6 instead you should be able to avoid this issue, as it runs on the earlier Linux version where deployments still occur in /opt/var/app/current. Most tutorials I've found are designed to work with this older rollout, including the most up-to-date Amazon start guide.
If you have the time, try migrating your code to the newer version, as this seems to be the workflow Amazon is embracing going forward, for all newer versions of Python (such as 3.8 and others yet to come).

Related

How to get a local Cloud Foundry Instance?

I’m looking to learn about Cloud Foundry and I’m trying to get a development instance of it set up on my local Windows 10 PC. But I’m not having any luck.
I’m finding a lot of information about PCF Dev which was deprecated a while ago. I also looked at the replacement for PCF Dev, CF Dev (https://github.com/cloudfoundry-attic/cfdev). Its git page mentions that its repository is no longer receiving updates. I still went ahead and tried installing it using the instructions in the README:
cf install-plugin -r CF-Community cfdev
But the link it uses to download the plugin is broken:
Starting download of plugin binary from repository CF-Community...
Get "https://d3p1cc0zb2wjno.cloudfront.net/cfdev/cfdev-v0.0.18-rc.36-windows.exe": dial tcp: lookup d3p1cc0zb2wjno.cloudfront.net: no such host
Can anyone recommend a way to get a development instance of Cloud Foundry set up on my local machine so I can play around with it?
Thanks
Yes, steer clear of pcf-dev and cf-dev, they may still work but are definitely not getting updates so will be way out of date by now.
My understanding, although I haven't tried this process in a while, is that the way to run locally is with VirtualBox. You can run one locally using bosh-deployment & cf-deployment and Virtualbox.
For instructions installing Bosh in VirtualBox using bosh-deployment, see the Install Section to install Bosh.
With Bosh installed, follow the deployment guide to get CF installed. You can skip to step 4, since you're installing into VirtualBox. Be sure to read the entire document before you begin, however pay specific attention to this section which has specific instructions for running locally.

Versions of Python & Spark to work with VS Code Notebooks

I'm developing scripts for AWS Glue, and trying to mimic development environment as close as possible to their specs here. Since it's a bit costly to run a Notebook server/development endpoint, I set everything up on local machine instead, develop scripts on VS Code Notebook, due to its usefulness.
There're some troubles with Notebook setup due to incompatible versions between installed Python & Spark.
For Python, I have gone through some harsh time to clean up, and its version is 3.8.3 now
For Spark, I use the manual method with version of 2.4.3, since I plan to use Scala alongside at later time. I install the findspark package to load that version as expected.
And it doesn't work! The error was TypeError: an integer is required (got type bytes)
I've searched around, and people said to downgrade to Python 3.7 using pyenv, and I got 3.7.7 installed but still had the same error
As a last resort, I tried pip install pyspark. it's Spark 3.0.0, and works fine, but not as expected.
Hope there's someone have experiences of this matter
A better approach would be to install the glue dependencies on docker then ssh into that docker container using VS code to mimic exact glue local dev environment.
I've written a blog about the same if you like to refer
https://towardsdatascience.com/develop-glue-jobs-locally-using-docker-containers-bffc9d95bd1

Ambari-agent "CERTIFICATE_VERIFY_FAILED", Is it safe to disable the certificate verification in Python?

Ambari version: 2.2.2.18
HDP stack: 2.4.3
OS: centos 7.3
Issue description:
Ambari-server can't communicate with Ambari agent. I can see below error in the ambari-agent logs:
ERROR 2017-09-18 06:35:34,684 NetUtil.py:84 - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:579)
ERROR 2017-09-18 06:35:34,684 NetUtil.py:85 - SSLError: Failed to connect. Please check openssl library versions.
I am facing this issue recently and it appears this can be replicated consistently after the instances are restarted. (I am using EC2 instances).
I am able to register agent nodes successfully, install HDP cluster, run yarn jobs etc.. no problem at all. Once i restart my instances, I see this problem.
There are some solutions already posted for this problem like:
Downgrade the Python from 2.7 to lower. This is a known problem of
Ambari with Python 2.7
Control the certificate verification by disabling it.
Set "verify = disable"; under /etc/python/cert-verification.cfg
I don't want to play with Python as it can disrupt lot many things like Cassandra, yum package manager etc...
Second work around is very much easy and it works well!
Now comes my question :- Is it safe to disable the certificate verification in Python ? i.e. by setting property verify = disable
Generally, it's a bad idea. If somebody has access to port on server that is used for agent-server communication (8443 if I'm not mistaken), he can register as agent and get all your cluster configs&passwords. Or classic man-in-the-middle attack would allow to do the same by reading your unencrypted traffic. A bit more difficult attack would allow to send commands to agents (probably with root permissions).
Your issue sounds like you reprovisioned your ambari-server host, and left old ambari-agent instances running, or maybe your certificates became outdated? At first connection to ambari-server, agents generate certificates and send to server. Server signs these certificates with it's own key, so now server-agent connection is encrypted. Did you try to remove old certificates and restart server&agents as suggested here?
How did we investigate this issue and What solution we adopted:
Investigation Details:
Downgrading to Python 2.6 is not feasible as there are OS dependencies and as per Suggestion from 'Dmitriusan' in the previous comment, it's not a good idea to disable certificate verification in Python.
We use AWS EC2
With Python 2.7, JDK 1.8 and Cent OS 7.2 there is no issue. Everything is smooth.
With Python 2.7, JDK 1.8 and Cent OS 7.3 and Centos 7.4 we are seeing this issue.
Issue which I have reported here, is with respect to Centos 7.3 and with Centos 7.4 Issue is slightly different. Certificate verification fails while adding nodes to the cluster itself.
Downgrading from centos 7.3 to 7.2 is not straight forward. And AWS EC2 market place provides Centos 7.0 Image and when we create instance from this image, it applies security and patch updates resulting in Centos 7.3.
We can create our own Image of Centos 7.2 from existing servers but, It's always good to be with the latest update for the OS for security reasons.
To describe it shortly, we had workarounds but not a solution.
Solution which we adopted:
After series of tests, we decided to upgrade to Centos 7.4, HDP-2.6.3.0, and Ambari 2.6.0.0
With Centos 7.4 and Ambari Version 2.6.0.0, we don't see this issue even though I have 'Python 2.7.5' installed.
So this looks to be an Issue with Ambari
Older version of Ambari (2.4.2) does not recognize the force TLS configuration. We upgraded Ambari to 2.6.2 and heart beat started working.

How to run pdftk on elastic beanstalk

I am trying to run pdftk on an Elastic Beanstalk. The first problem I run into is that I cannot install pdftk on an instance of a Amazon Linux AMI because one of the dependencies (gcj) is not supported.
One of the options I am looking at is creating my own AMI and using that for my Elastic Beanstalk. Amazon recommends not doing this, and there are no community images for EB and Ubuntu.
Another option is using Docker. I am not as familiar with Docker, but I think I would be able to install pdftk in a container and then deploy that to EB. I am using Codeship for deployments and it looks like they have some options for Docker. (This is the options I'm currently exploring)
The last option I can think of is writing a library for encrypting pdfs on my own. I had a look at the encryption specifications for pdfs and I think this is not a time efficient option.
Has any one had a similar problem and found a good solution to the problem?
UPDATE:
After some more research I discovered that the issue was not with Amazon Linux bug with Fedora. Fedora dropped gcj because there was a lack of maintainers on the project, then dropped pdftk because it depends on gcj.
If you need another pdf tool kit I have found podofo to be a good replacement for what I've needed.
First I apologise for resurrecting an old thread! Recently we wanted to create an Elastic Beanstalk worker environment that uses pdftk. Of course we also stumbled on the same issue, so this is what we did and it works for us so far. I hope it'll work for others too.
In the .ebextensions folder add the linked configs:
The needed LaTeX packages:
packages.config
You'll also need to add the el5 library in order to install libgcj.
01_el5_yum.config
Next add this config with the commands to install libgcj, pdftk and pdfjam
02_pdftk.config
And that should be it.
In case anyone comes here having problems with pdftk - poppler-utils also cover some tasks done by pdftk (in my case it was pdf splitting) and can be easily set up on an EB instance through .ebextensions:
packages:
yum:
poppler-utils: []

Python/Django Elastic Beanstalk now failing on deploy

I'm working on a project that I haven't touched in about 4 months. Before everything on the deploy was working fine, but now I'm getting an error when trying to deploy an update.
Failed to pull Docker image amazon/aws-eb-python:3.4.2-onbuild-3.5.1: Pulling repository amazon/aws-eb-python time="2016-01-17T01:40:45Z" level="fatal" msg="Could not reach any registry endpoint" . Check snapshot logs for details. Hook /opt/elasticbeanstalk/hooks/appdeploy/pre/03build.sh failed. For more detail, check /var/log/eb-activity.log using console or EB CLI.
In the eb-activity log, it further states [CMD-AppDeploy/AppDeployStage0/AppDeployPreHook/03build.sh] : Activity execution failed, because: Pulling repository amazon/aws-eb-python before repeating what was shown in the UI.
The original was using a Preconfigured Docker 64bit Debian jessie v1.3.1 running Python 3.4. I've tried upgrading to the latest, which is version 2.0.6, but it never completes (don't need to get into specifics of that error, separate issue and I'd like to stay on 1.3.1 if possible). I've also tried upgrading to the latest 1.x but it has the same result of upgrading to 2.0.6.
Any ideas, or anything else I should be looking for clues?
Docker Hub has deprecated pulls from Docker clients on 1.5 and earlier. Make sure that your docker client version is at least above 1.5. See https://blog.docker.com/2015/10/docker-hub-deprecation-1-5/ for more information.