AWS EMR step doesn't find jar imported from s3 - amazon-web-services

I am attempting to run a Spark application on AWS EMR in client mode. I have set up a bootstrap action to import the needed files and the jar from S3, and a step to run a single Spark job.
However, when the step executes, the jar I imported isn't found. Here is the stderr output:
19/12/01 13:42:05 WARN DependencyUtils: Local jar /mnt/var/lib/hadoop/steps/s-2HLX7KPZCA07B/~/myApplicationDirectory does not exist, skipping.
I am able to successfully import the jar and the other files the application needs from my S3 bucket to the master instance; the bootstrap action simply copies them to /home/ec2-user/myApplicationDirectory/myJar.jar.
However, I don't understand why the step is looking for the jar at /mnt/var/lib/hadoop/...etc.
Here are the relevant parts of the CLI configuration:
--steps '[{"Args":["spark-submit",
"--deploy-mode","client",
"--num-executors","1",
“--driver-java-options","-Xss4M",
"--conf","spark.driver.maxResultSize=20g",
"--class”,”myApplicationClass”,
“~/myApplicationDirectory”,
“myJar.jar",
…
application specific arguments and paths to folders here
…],
”Type":"CUSTOM_JAR",
thanks for any help,

It looks like it doesn't understand the ~ as referring to the home directory. Try changing "~/myApplicationDirectory" to "/home/ec2-user/myApplicationDirectory".
A little warning: in the sample in your question, straight quotation marks " are mixed with "smart" ones “. Make sure the "smart" quotation marks don't end up in your configuration file, or you will get very confusing error messages.
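For illustration, a sketch of the corrected step arguments, assuming the bootstrap action leaves the jar at /home/ec2-user/myApplicationDirectory/myJar.jar (note that spark-submit expects the application jar as a single path argument, so the directory and jar name are combined here; the application-specific arguments are elided as in the question):
--steps '[{"Args":["spark-submit",
"--deploy-mode","client",
"--num-executors","1",
"--driver-java-options","-Xss4M",
"--conf","spark.driver.maxResultSize=20g",
"--class","myApplicationClass",
"/home/ec2-user/myApplicationDirectory/myJar.jar",
...application specific arguments and paths here...
],"Type":"CUSTOM_JAR",...}]'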

Related

AWSDeploy to re-deploy ASP.NET WebAPI ELB application isn't working

I am using the Visual Studio AWS add-on/plugin to deploy my application, but want to move to a CI/CD server and scripted deployment.
I've installed the AWS SDK for Windows and want to use the awsdeploy.exe command-line tool to accomplish this.
I've used msbuild and a publish profile to create the .zip deployable of my application (an ASP.NET WebAPI project).
I've put together the following command line command:
awsdeploy.exe -r -w -v -l "C:\<path_to>\deploylog.txt" "-DDeploymentPackage=C:\<path_to>\my_app.zip" "-DAWSAccessKey=<my_access_key>" "-DAWSSecretKey=<my_secret_key>" "C:\<path_do>\AWSDeployConfiguration.txt"
The "AWSDeployConfiguration.txt" file is what was generated by VisualStudio when I did the first deployment.
RESULT:
The console output and the text written to the log is:
INFO - Scanning configuration.
INFO - ...inspecting application '<my_app_name>' for environment '<my_environment_name>' and version 'v20180918223701'
Nothing happens with the ELB application.
What am I missing and/or how do I get more information to figure this out?
I posted this question on the AWS forums and got the following answer, which also worked for me.
Hi! I had the same problem as you when running this from cmd. But if you check what the application returns, you will see that the value is 3. Generally, anything != 0 is an error.
What did I do?
1. I checked with Process Monitor (https://learn.microsoft.com/en-us/sysinternals/downloads/procmon) whether the application makes any network request to AWS - no, it doesn't even try.
2. I decided to recompile awsdeploy.exe and found out that the main procedure has a try...catch without any logging that just returns 3. I added some logs and got a detailed error (see the image attached to the forum post).
After a few attempts I got a list of missing DLL files:
AWSSDK.MobileAnalytics.dll
AWSSDK.CognitoIdentity.dll
I found all these files in C:\Program Files (x86)\AWS SDK for .NET\bin and simply copied them to C:\Program Files (x86)\AWS Tools\Deployment Tool (next to awsdeploy.exe).
Now the deploy is working again.
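For illustration, copying the DLLs from a command prompt would look something like this (paths taken from the answer above; adjust if your SDK layout differs):
copy "C:\Program Files (x86)\AWS SDK for .NET\bin\AWSSDK.MobileAnalytics.dll" "C:\Program Files (x86)\AWS Tools\Deployment Tool"
copy "C:\Program Files (x86)\AWS SDK for .NET\bin\AWSSDK.CognitoIdentity.dll" "C:\Program Files (x86)\AWS Tools\Deployment Tool"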

GCloud Error: Source code size exceeds the limit

I'm following the api.ai tutorial on basic fulfillment and conversation setup to make a chat bot, and when I try to deploy the function with the command:
gcloud beta functions deploy --stage-bucket venky-bb7c4.appspot.com --trigger-http
(where 'venky-bb7c4.appspot.com' is the bucket_name)
It returns the following error message:
ERROR: (gcloud.beta.functions.deploy) OperationError: code=3, message=Source code size exceeds the limit
I've searched but haven't found any answer; I don't know where the error is.
This is the JS file that appears in the tutorial:
/**
 * HTTP Cloud Function.
 * @param {Object} req Cloud Function request context.
 * @param {Object} res Cloud Function response context.
 */
exports.helloHttp = function helloHttp (req, res) {
  const response = "This is a sample response from your webhook!"; // Default response from the webhook to show it's working
  res.setHeader('Content-Type', 'application/json'); // Requires application/json MIME type
  // "speech" is the spoken version of the response, "displayText" is the visual version
  res.send(JSON.stringify({ "speech": response, "displayText": response }));
};
Neither of these worked for me. The way I was able to fix this was to make sure I was running the deploy from my project directory (the directory containing index.js)
The command creates a zip with the whole content of your current directory (except the node_modules subdirectory), not just the JS file (this is because your function may use other resources).
The error you see is because the size of the (uncompressed) files in the directory is bigger than 512MB.
The easiest way to solve this is by moving the .js file to its own directory and deploying from there (you can use --local-path to point to the directory containing the source file if you want your working directory to be different from the directory with the function source).
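A minimal sketch of that approach, reusing the deploy command from the question (the directory name is just a placeholder, and helloHttp is the exported function name):
mkdir hello-http-function
cp index.js hello-http-function/
cd hello-http-function
gcloud beta functions deploy helloHttp --stage-bucket venky-bb7c4.appspot.com --trigger-http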
I tried the --source option and deploying from the index.js folder, and still a different problem existed.
This error usually happens if the code that is being uploaded is large. In my tests I found that more than 100MB led to the mentioned error.
To resolve this there are two solutions:
1. Update .gcloudignore to ignore the folders which aren't required for your function.
2. If option 1 doesn't resolve it, create a bucket in Cloud Storage and pass it with the --stage-bucket option.
Create a new bucket for deployment (one time)
gsutil mb gs://my-cloud-functions-deployment-bucket
The bucket name you create needs to be globally unique, or gsutil reports that the bucket already exists.
Deploy
gcloud functions deploy subscribers-firestoreDatabaseChange \
  --trigger-topic firestore-database-change \
  --region us-central1 \
  --runtime nodejs10 \
  --update-env-vars "REDIS_HOST=10.128.0.2" \
  --stage-bucket my-cloud-functions-deployment-bucket
I had similar problems while deploying cloud functions. What worked for me was specifying the source folder of the JS files:
gcloud functions deploy functionName --trigger-http --source path_to_project_root_folder
Also be sure to list all unnecessary folders in .gcloudignore so they are excluded from the upload.
Ensure the package folder has a .gitignore file (excluding node_modules).
The most recent version of gcloud requires it in order not to upload node_modules. My code size went from 119MB to 17KB.
Once I added the .gitignore file, the log also printed:
created .gcloudignore file. See `gcloud topic gcloudignore` for details.
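For reference, a minimal .gcloudignore along these lines might look like the following (entries are illustrative; the auto-generated file may differ):
.gcloudignore
.git
.gitignore
node_modules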

S3-Dist-Cp Failing on EMR5

I am facing issues with the s3-dist-cp command in emr-5.0.0. In my application, I need to push some files from HDFS to S3, and I am using the s3-dist-cp command to achieve this. It was working fine in emr-4.2.0, but it's not working in emr-5.0.0. If I run the command manually it works fine, but it fails in my application. I didn't make any changes to my application to run it on emr-5.
Do I need to make any changes to use emr-5? Has anything changed in the way the s3-dist-cp command is used in emr-5?
I am using the following command:
s3-dist-cp --src /user/hive/warehouse/abc.text --dest s3n://bucket/abc.text
s3-dist-cp (s3-dist-cp.jar) is only available on the master node.
The following is the location of the application:
/usr/share/aws/emr/s3-dist-cp/
The s3-dist-cp.jar is not available on the slave nodes.
You can log in to a slave machine and verify it.
So the reason for your application failure might be that on the new EMR you are using some workflow management tool which deploys the application onto the slaves and starts it from there. As s3-dist-cp is not available there, it fails.
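A quick way to verify this, assuming SSH access to the nodes (host names are placeholders):
ssh hadoop@<master-node> 'ls /usr/share/aws/emr/s3-dist-cp/'   # jar is present here
ssh hadoop@<slave-node> 'ls /usr/share/aws/emr/s3-dist-cp/'    # per the answer above, missing on slaves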
Workaround
First option: bundle the jar with your application and use the following command:
hadoop jar s3-dist-cp.jar --src location --dest location
Second option: bootstrap the s3-dist-cp jar onto the cluster.
You can even run it as a Java program.
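As an illustration of the second option, a bootstrap-action sketch that copies the jar from your own S3 bucket to every node (the bucket name and local path are placeholders; you would upload s3-dist-cp.jar there first):
#!/bin/bash
# hypothetical bootstrap script: make s3-dist-cp.jar available on every node
aws s3 cp s3://my-bootstrap-bucket/s3-dist-cp.jar /home/hadoop/s3-dist-cp.jar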
First thing, s3n:// is now deprecated, start using s3:// for S3 paths.
Secondly, if you're merely copying a file into S3 from a local file on your cluster, you can use aws s3 cp:
aws s3 cp /user/hive/warehouse/abc.text s3://bucket/abc.text
The syntax that you have used for s3-dist-cp is incorrect. Please try again with the command below.
s3-dist-cp --src hdfs:///user/hive/warehouse/abc.text --dest s3n://bucket/abc.text
Let me know if this solves your problem.

Error starting Spark in EMR 4.0

I created an EMR 4.0 instance in AWS with all available applications, including Spark. I did it manually, through the AWS Console. I started the cluster and SSHed to the master node when it was up. There I ran pyspark. I am getting the following error when pyspark tries to create the SparkContext:
2015-09-03 19:36:04,195 ERROR Thread-3 spark.SparkContext
(Logging.scala:logError(96)) - -ec2-user, access=WRITE,
inode="/user":hdfs:hadoop:drwxr-xr-x at
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
I haven't added any custom applications or bootstrap actions, and I expected everything to work without errors. Not sure what's going on. Any suggestions will be greatly appreciated.
Log in as the user "hadoop" (http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-connect-master-node-ssh.html). It has all the proper environment and related settings for working as expected. The error you are receiving is due to logging in as "ec2-user".
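For example, connecting as that user might look like this (the key file and public DNS name are placeholders for your own values):
ssh -i ~/mykey.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com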
I've been working with Spark on EMR this week, and found a few weird things relating to user permissions and relative paths.
It seems that running Spark from a directory which you don't 'own', as a user, is problematic. In some situations Spark (or some of the underlying Java pieces) want to create files or folders, and they think that pwd - the current directory - is the best place to do that.
Try going to the home directory
cd ~
then running pyspark.

AWS Code Deploy Error on Before Install Cannot Solve

So I am attempting to set up CodeDeploy for my application and I keep getting an error during the BeforeInstall part of the deployment. Below is the error.
Error Code: UnknownError
Script Name:
Message: No such file or directory - /opt/codedeploy-agent/deployment-root/06100f1b-5495-42d9-bd01-f33d59fb5deb/d-NL5K1THE8/deployment-archive/appspec.yml
Log Tail:
I assumed this meant the YAML file was in the wrong place. However, it is in the root directory of my revision. I have tried using a simple AppSpec file like the one below instead of a more complex one.
## YAML Template.
---
version: 0.0
os: linux
files:
  - source: /
    destination: /home/ubuntu/www
More or less, since this is a first deployment, I want it to add all files in the revision to the public directory on the web server.
I am tearing my hair out over this and I feel it is a simple issue. I have the IAM policies and roles correct, and I have the CodeDeploy agent set up and running on the instance I am trying to deploy to.
It seems to think you had a successful deploy at some point.
Go into /opt/codedeploy-agent/deployment-root/deployment-instructions/ and delete all the files in there. Then it won't look for this last deploy.
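On the instance, that amounts to something like the following (run with care; it clears the agent's record of previous deployments):
sudo rm -f /opt/codedeploy-agent/deployment-root/deployment-instructions/*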
I just had this SAME problem and I figured it out! Make sure your AppSpec file has the right EXTENSION! I was using yaml and not yml, now everything works perfectly.
I made it work like this:
I had a couple of failed deployments for various reasons.
The thing is that CodeDeploy keeps, on the EC2 instance under the path /opt/codedeploy-agent/deployment-root/, a folder named after the ID of the failed deployment [a very long alphanumeric string].
Delete this folder, create a new deployment [from the AWS console], and redeploy the application. This way the appspec.yml file that is in the wrong place will be deleted.
It should now succeed.
Extra notes:
CodeDeploy does not overwrite files that were not created by its own deployment.
CodeDeploy does not deploy into a folder that already contains code [files], as it does not want to interfere with different CodeDeploy deployments and/or other CI/CD tools [like Jenkins].
It only deploys into a path where it has already deployed code with that specific deployment.
You can empty the folder where your deployment is supposed to happen and redeploy your code via CodeDeploy.
When you log in to the host, do you see the appspec.yml file in the directory there? If not, are you positive it has been checked in with the rest of your deployed code?
Just encountered this issue too. In my case, the revision zip file extracts into a directory when deployed. Because of that, /opt/codedeploy-agent/deployment-root/xxx/xxx/deployment-archive contains the parent directory of my revision files (instead of the actual revision files).
The key is to compress your revision without the parent directory. In a Mac terminal:
cd your-app-directory-containing-appspec
zip -r app.zip .