EMR serverless using Docker- how to install JAR files - dockerfile

I am trying to install EMR serverless, for which i have two options
Using Terraform script - which let me chose initial size, max memory etc. however i do not have an option to install jar files / external libraries
Using docker image - which doen't let me select initial size, max memory etc.
I am thinking of using terraform script within docker however i dont know how to install JAR files on it. Can someone please share some thoughts.
Also my libraries are internal and written in JAVA / Scala
TY
I tried docker as well as TF

Related

How to compile C++ inside RapidsAI Docker Container

When inside the RapidsAI docker image with examples, how does one recompile the C++ code after modifying? I've tried running the build scripts from a terminal sessions inside Jupyter but it cannot find CMake.
In order to be able to recompile the C++ code in a docker container you need to use the RAPIDS Docker + Dev Env container provided on https://rapids.ai/start.html.
The RAPIDS Docker + Examples container installs the RAPIDS libraries using conda install and does not contain the source C++ code or Cmake.
If you would like to continue to use the RAPIDS Docker + Examples container then I would suggest:
First uninstall the existing library that you want to modify from the container.
Then pull the source code of the desired library and make the
required modifications.
Once the above steps are done please follow
the steps provided in the libraries github repo to build it from
source.

Node.JS native addons on LINUX [duplicate]

I'm using AWS Lambda, which involves creating an archive of my node.js script, including the node_modules folder and uploading that to their infrastructure to run.
This works fine, except when it comes to node modules with native bindings (using node-gyp). Because the binding was complied and project archived on my local computer (OS X), it is not compatible with AWS's (Amazon Linux) servers.
How can I cross-compile/install a node module (specifically, node-sqlite3) so when I upload it to another server arch it runs?
While not really a solution to your problem, a very easy workaround could be to simply compile the native addons on a Linux machine.
For your particular situation, I would use Vagrant. Vagrant can create virtual machines and configure them within seconds.
Find an OS image that resembles Amazon's Linux distro (Fedora, CentOS, others that use yum as package manager - see Wiki)
Use a simple configuration script that, when run by Vagrant on machine startup, will run npm install (optionally it might also remove the node_modules folder before to ensure a clean installation)
For extra comfort, the script can also create the zip file for deployment
Once the installation finishes, the script will shutdown the VM to avoid unnecessary consumption of system resources
Deploy!
It might require some tuning if the linked libraries are not at the same place on the target machine but generally this seems to me like the best and quickest solution.
While installing the app using Vagrant might be sufficient in some cases, I have found it necessary to build the app on Linux which is as close to Lambda's Amazon Linux AMI as possible.
You can read the original answer here: https://stackoverflow.com/a/34019739/303184
Steps to make it work:
Spawn new EC2 instance. Make sure it is based on exactly the same image as your AWS Lambda runtime. You can review Lambda env details here: http://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html. In our case, it was Amazon Linux AMI called amzn-ami-hvm-2015.03.0.x86_64-gp2.
Install nvm and use it to install the same version of Node.js as on the AWS Lambda. At the time of writing this, it was v0.10.36. You can refer to http://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html again to find out.
You will probably need to install git & g++ compiler on the EC2. You can do this running
sudo yum install git gcc-c++
Finally, clone your app to your new EC2 and install your app's dependecies:
nvm use 0.10.36
npm install --production
You can then easily download the node_modules using scp or such.
Same lines as Robert's answer, when I had to work on my MAC in a different OS I use vm ware like Oracle's free virtualizer VirtualBox to get a linux on my mac, no cost to me. Or sign up for a new AWS account, you get a micro for a year free. Use that to get your linux box, do whatever you need there.
AWS has a page describing how to deal with native NPM modules: https://aws.amazon.com/blogs/compute/nodejs-packages-in-lambda/

AWS credentilas in dockerfile while building image

I am newbie to docker and AWS. I wanted to have container running image for Maven and Java. I was able to refer https://github.com/carlossg/docker-maven/blob/master/jdk-8/Dockerfile and could create dockerfile for the same. Through terminal, I could see new container is created , with image of Java and Maven. Upto this point it is simple , took me while to figure out though.
(1) I think this is the way you can always have Maven plus Java image , and there is no other way with lot less files/ coding. Is it right? This is just for my information. The real question is the next one.
(2) If I get image of AWS Cli, once the container starts I can login to AWS using the credentials. I know how to do it using terminal. Not a big deal. If I want to have CI/CD pipeline, where do I provide the command - docker build -t <imageName> . and command to start container. Right now I use mac terminal, but not sure how it would play out in CI/CD. I did research on here but nothing conclusive. Does it go inside .yml file?
(3) How do I send the AWS credentials while building docker image? I do not want to put into dockerfile. How do you guys do it so its safe?

Can you load standard zeppelin interpreter settings from S3?

Our company is building up a suite of common internal Spark functions and jobs, and I'd like to make sure that our data scientists have access to all of these when they prototype in Zeppelin.
Ideally, I'd like a way for them to start up a Zeppelin notebook on AWS EMR, and have the dependency jar we build automatically loaded onto it without them having to manually type in the maven information manually every time (private repo location/credentials, package info, etc).
Right now we have the dependency jar loaded on S3, and with some work we could get a private maven repository to host it on.
I see that ZEPPELIN_INTERPRETER_DIR saves off interpreter settings, but I don't think it can load from a common default location (like S3, or something)
Is there a way to tell Zeppelin on an EMR cluster to load it's interpreter settings from a common location? I can't be the first person to want this.
Other thoughts I've had but have not tried yet:
Have a script that uses aws cmd line options to start a EMR cluster with all the necessary settings pre-made for you. (Could also upload the .jar dependency if we can't get maven to work)
Use a infrastructure-as-code framework to start up the clusters with the required settings.
I don't believe it's possible to tell EMR to load settings from a common location. The first thought you included is the way to go imo - you would aws emr create ... and that create would include a shell script step to replace /etc/zeppelin/conf.dist/interpreter.json by downloading the interpreter.json of interest from S3, and then hard restart zeppelin (sudo stop zeppelin; sudo start zeppelin).

Blender on IBM Cloud (Cloud Foundry)

I'm currently developing a web application (Django 2.0) application.
My app will be deployed on IBM Cloud (Cloud Foundry) using python build-pack.
One of my requirements is to install blender.
Everything else is very well, but for blender installation.
What I've tried so far was:
I tried access my app using SSH connection, but surely I don't have root access to apt-get install blender!!
And tried to include blender in packages.json file and push that file using cf push my-app.
But nothing worked for me.
In another shorter question: what is the main approach in Cloud Foundry Apps to install packages like when we use apt-get install in Ubuntu / Debian.
Please correct me if I did anything wrong, or guide me with headlines to solve this problem!!
I see a couple options for you to install packages if they cannot be installed using the regular requirements file (which is the preferred way):
Download the relevant libraries and put them in subfolders of the app before pushing it. The libraries will be uploaded. That is how I would do it.
Once you have an SSH connection, use secure copy (scp) to upload the files and place them in the subfolders where they are expected.
Regarding Blender, the question is what you need in addition to having the code copied over. Does it need a running daemon? Are there more dependencies? You would need to share more information about your specific app to answer that. Maybe, packaging everything as one or more containers and run it on Kubernetes or a combination of Cloud Foundry and Kubernetes is a better way.