Error in executing Customised WordCount jar in AWS EMR - amazon-web-services

Hi, I am trying to execute a customised WordCount jar on AWS EMR.
My WordCount jar itself works: I added it as a step without job arguments and it ran successfully. The problem only appears when I run it with job arguments.
In my S3 bucket I have 2 folders:
Jar location -> s3n://word-count123/WordCount.jar
Jar arguments -> s3n://word-count123/input
s3n://word-count123/output
The input folder contains one txt file and the output folder contains one txt file.
Am I doing something wrong? I can't seem to figure it out. Thanks.
P.S. I don't want to execute it from the CLI.

I just executed an existing WordCount jar with the same arguments. It seems to be a problem with my JAR.

Related

CodeDeploy pipeline not finding AppSpec.yml - but is clearly available

I had this running months ago, so I know it works, but I have created a new EC2 instance to deploy my code and am stuck at the first hurdle.
My Deployment Details run as follows:
Application Stop - succeeded
Download Bundle - succeeded
BeforeInstall - Failed
Upon looking at the failed event, I get:
The CodeDeploy agent did not find an AppSpec file within the unpacked revision directory at revision-relative path "appspec.yml". The revision was unpacked to directory "C:\ProgramData/Amazon/CodeDeploy/57f7ec1b-0452-444e-840c-4deb4566e82d/d-WH9HTZAW0/deployment-archive", and the AppSpec file was expected but not found at path "C:\ProgramData/Amazon/CodeDeploy/57f7ec1b-0452-444e-840c-4deb4566e82d/d-WH9HTZAW0/deployment-archive/appspec.yml". Consult the AWS CodeDeploy Appspec documentation for more information at http://docs.aws.amazon.com/codedeploy/latest/userguide/reference-appspec-file.html
Thing is, if I jump onto my EC2 and copy and paste the full path, sure enough I see the YML file, along with the files that were in a ZIP file within my S3 bucket, so they've been successfully sent to the EC2 and unzipped.
So I'm sure it's not a permissions thing: the connection is clearly being made, and the S3 bucket, CodeDeploy and my EC2 instance are all happy.
I read various posts on StackOverflow about changing the AppSpec.yml file to "appspec.yml", "AppSpec.yaml", "appspec.yaml", and still nothing works.
Anything obvious to try out?
OK, after a few days back and forth, the solution was incredibly annoying (and embarrassing)...
On my EC2 instance, the "File name extensions" option was unticked, so my AppSpec.yml was actually AppSpec.yml.txt
If anyone else has a similar issue, do check this first!
How are you zipping the file? A lot of the time users end up "double-zipping". To check: when you unzip the .zip file, does it give you the files directly, or a folder?
When we zip a folder on Windows, it creates a folder inside the zip archive, so the CodeDeploy agent cannot find the AppSpec file at the archive root. To zip the artifact, select all the files themselves, then right-click and zip them in the same location. This avoids creating a new folder inside the zip.

AWS EMR step doesn't find jar imported from s3

I am attempting to run a Spark application on AWS EMR in client mode. I have set up a bootstrap action to import the needed files and the jar from S3, and I have a step to run a single Spark job.
However when the step executes, the jar I have imported isn't found. Here is the stderr output:
19/12/01 13:42:05 WARN DependencyUtils: Local jar /mnt/var/lib/hadoop/steps/s-2HLX7KPZCA07B/~/myApplicationDirectory does not exist, skipping.
I am able to successfully import the jar and other needed files for the application from my S3 bucket to the master instance. I simply import them to /home/ec2-user/myApplicationDirectory/myJar.jar via a bootstrap action.
However, I don't understand why the step is looking for the jar at /mnt/var/lib/hadoop/... etc.
Here are the relevant parts of the CLI configuration:
--steps '[{"Args":["spark-submit",
"--deploy-mode","client",
"--num-executors","1",
“--driver-java-options","-Xss4M",
"--conf","spark.driver.maxResultSize=20g",
"--class”,”myApplicationClass”,
“~/myApplicationDirectory”,
“myJar.jar",
…
application specific arguments and paths to folders here
…],
”Type":"CUSTOM_JAR",
Thanks for any help.
It looks like it doesn't understand the ~ as referring to the home directory. Try changing "~/myApplicationDirectory" to "/home/ec2-user/myApplicationDirectory".
A little warning: in the sample in your question, straight quotation marks " are mixed with "smart" ones “. Make sure the "smart" quotation marks don't end up in your configuration file, or you will get very confusing error messages.
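Both points can be checked before the step is submitted. Tilde expansion is a shell feature and only applies to an unquoted ~ at the start of a word, so a ~ inside a JSON string reaches the program literally unless something expands it first. Below is a minimal shell sketch, not anything EMR provides: the function names expand_home and find_smart_quotes are made up for this example, and /home/ec2-user is just the home directory mentioned in the question.

```shell
# expand_home PATH HOME_DIR
# Replace a leading "~" or "~/" in PATH with HOME_DIR; print other paths unchanged.
# (Hypothetical helper for preparing step arguments before submitting them.)
expand_home() {
  case "$1" in
    "~")
      printf '%s\n' "$2"
      ;;
    "~/"*)
      _rest=${1#"~/"}   # strip the literal "~/" prefix
      printf '%s/%s\n' "$2" "$_rest"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

# find_smart_quotes FILE
# Print (with line numbers) every line of FILE that contains a curly quote.
# Exit status is 0 if any were found and 1 if none were found.
find_smart_quotes() {
  grep -n -F -e '“' -e '”' -e '‘' -e '’' -- "$1"
}
```

For example, expand_home '~/myApplicationDirectory' /home/ec2-user prints /home/ec2-user/myApplicationDirectory, and running find_smart_quotes on the file that holds the --steps JSON lists the lines that still need their quotes retyped.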

How to write a bootstrap action to download a file to each node in EMR?

I'm trying to download a postgres driver to each node of my cluster. I wrote the following bootstrap action, but it doesn't seem to have worked:
#!/bin/bash
aws s3 cp s3://path/to/driver/jars/postgresql-9.4.1210.jre7.jar .
I know this must be an easy thing to do, but I can't seem to find an obvious example.
The bootstrap action you have looks fine and is probably working. It's just that you are probably assuming that it will download the file to the same directory where you land when ssh'ing to the cluster, which is /home/hadoop, but that is not the case. The working directory of bootstrap actions is somewhere under /var/lib/bootstrap-actions, if I remember correctly.
It would be easier to find the file you've downloaded if you change "." to something like "/home/hadoop". You could also create some other new directory to which to download the file as part of this script (using "sudo mkdir" and "sudo chown" if necessary).

AWS Code Deploy Error on Before Install Cannot Solve

I am attempting to set up CodeDeploy for my application and I keep getting an error during the BeforeInstall part of the deployment. Below is the error.
Error Code: UnknownError
Script Name:
Message: No such file or directory - /opt/codedeploy-agent/deployment-root/06100f1b-5495-42d9-bd01-f33d59fb5deb/d-NL5K1THE8/deployment-archive/appspec.yml
Log Tail:
I assumed this meant the YAML file was in the wrong place. However it is in the root directory of my revision. I have tried using a simple AppSpec file like so instead of a more complex one.
## YAML Template.
---
version: 0.0
os: linux
files:
  - source: /
    destination: /home/ubuntu/www
Since this is a first deployment, I more or less want it to add all files in the revision to the public directory on the web server.
I am tearing my hair out over this and I feel it is a simple issue. I have the IAM policies and roles correct, and I have CodeDeploy set up and running on the instance I am trying to deploy to.
It seems to think you had a successful deploy at some point.
Go into /opt/codedeploy-agent/deployment-root/deployment-instructions/ and delete all the files in there. Then it won't look for this last deploy.
I just had this SAME problem and I figured it out! Make sure your AppSpec file has the right EXTENSION! I was using yaml and not yml, now everything works perfectly.
I made it work like this:
I had a couple of failed deployments for various reasons.
The thing is that CodeDeploy (CD) keeps, on the EC2 instance under /opt/codedeploy-agent/deployment-root/, a folder named after the ID of the failed deployment [a very long alphanumeric string].
Delete this folder, create a new deployment [from the AWS console UI] and redeploy the application. This way the appspec.yml file that is in the wrong place will be deleted.
It should now succeed.
Extra notes:
CD does not overwrite files [that were not created by its own specific deployment].
CodeDeploy does not deploy into a folder that already contains code [files], as it does not want to interfere with other CD deployments and/or other CI/CD tools [like Jenkins].
It only deploys into a path where the existing code was put there by that same deployment.
You can empty the folder you want to deploy into and then redeploy your code via CD.
When you log in to the host, do you see the appspec.yml file in the directory there? If not, are you positive it has been checked in with the rest of your deployed code?
Just encountered this issue too. In my case, the revision zip file extracts into a directory when deployed. Because of that /opt/codedeploy-agent/deployment-root/xxx/xxx/deployment-archive contains the parent directory of my revision files (instead of the actual revision files).
The key is to compress your revision without the parent directory. In the Mac terminal:
cd your-app-directory-containing-appspec
zip -r app.zip .
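To tell the cases in this thread apart (wrong extension versus an extra parent directory), it can help to look at what actually sits in the unpacked revision directory on the instance. A minimal sketch, assuming a find that supports -maxdepth and -iname (GNU and BSD find both do); check_appspec is a made-up helper name, not part of the CodeDeploy agent:

```shell
# check_appspec DIR
# DIR is the directory the revision was unpacked to (the deployment-archive folder).
# Succeeds if DIR/appspec.yml exists. Otherwise it lists near misses such as
# appspec.yaml, appspec.yml.txt, or an appspec.yml sitting one directory too deep.
check_appspec() {
  if [ -f "$1/appspec.yml" ]; then
    echo "ok: $1/appspec.yml"
    return 0
  fi
  echo "missing: $1/appspec.yml"
  # Case-insensitive search in DIR itself and in its immediate subdirectories.
  find "$1" -maxdepth 2 -iname 'appspec*'
  return 1
}
```

Running check_appspec /opt/codedeploy-agent/deployment-root/xxx/xxx/deployment-archive (with the real IDs in place of xxx) either confirms the file is where the agent expects it or shows the path it ended up at instead.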

Spark - Writing into HDFS does not complete successfully

My question is similar to (Spark writing to hdfs not working with the saveAsNewAPIHadoopFile method). I am using Spark 1.1.0 on CDH 5.2.1.
I am trying to save a file to HDFS through Spark's saveAsTextFile method. The job completes successfully, but when I look into the output path I see a _temporary folder with the data files inside it, spread across various task and attempt folders. This suggests Spark is marking the job as succeeded before the files have been completely moved into the right output folder in HDFS. The same issue occurs with the saveAsParquetFile method. Please let me know if you have any idea what is causing this.
Thanks