hadoop 3.3.1 show job history error: Exception in thread "main" java.lang.IllegalArgumentException: JobId string : /output_dir is not properly formed - hadoop3

hadoop 3.3.1
I have successfully run a program:
hadoop jar units.jar com.clx.bigdata.ProcessUnits /input_dir /output_dir
From the printed messages I get the job id: job_1625033931379_0001
I can also see the job history list on the web page at 'http://localhost:19888/jobhistory'.
But when I run
hadoop job -history /output_dir
it returns this error:
2021-06-30 14:54:56,356 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at /0.0.0.0:8032
Exception in thread "main" java.lang.IllegalArgumentException: JobId string : /output_dir is not properly formed
at org.apache.hadoop.mapreduce.JobID.forName(JobID.java:156)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:401)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1277)
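For reference, in Hadoop 3.x the -history option expects a job ID (or a job history file), not the job's output directory, which is what the "JobId string : /output_dir is not properly formed" message is complaining about. A hedged example using the job id printed above:

mapred job -history job_1625033931379_0001

(hadoop job -history job_1625033931379_0001 should behave the same way, since hadoop job is the deprecated alias for mapred job.)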

Related

Deployment of process archive ‘null’: Cannot deploy process archive ‘null’ to default process: no such process engine exists: processEngine is null

I am getting the error below when I deploy my WAR file on the Tomcat server, and the deployment fails. Please advise.
localhost.log
19-Dec-2022 11:29:41.718 INFO [Catalina-utility-2] org.apache.catalina.core.ApplicationContext.log 2 Spring WebApplicationInitializers detected on classpath
19-Dec-2022 11:29:42.073 SEVERE [Catalina-utility-2] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [org.camunda.bpm.platform.example.migration.ExampleProcessApplication]
org.camunda.bpm.engine.ProcessEngineException: ENGINE-08043 Exception while performing ‘Deployment of Process Application camunda-example-migrate-on-deployment-1.0-SNAPSHOT’ => 'Deployment of process archive ‘null’: Cannot deploy process archive ‘null’ to process engine ‘default’ no such process engine exists: processEngine is null
I have a processes.xml file with the following configuration:

Why does eb deploy fail on the first deploy?

I have created 2 Node.js environments, and they work fine.
Later I created a new environment, and eb deploy env_name gives me the following error:
Creating application version archive "app-ceb7-200713_223016".
Uploading appname/app-ceb7-xxxxx.zip to S3. This may take a while.
Upload Complete.
2020-07-13 17:00:23 INFO Environment update is starting.
2020-07-13 17:00:44 INFO Deploying new version to instance(s).
2020-07-13 17:00:50 ERROR [Instance: i-066f599db7ee34] Command failed on instance.
An unexpected error has occurred [ErrorCode: 0000000001].
2020-07-13 17:00:50 INFO Command execution completed on all instances. Summary: [Successful: 0, Failed: 1].
2020-07-13 17:00:50 ERROR Unsuccessful command execution on instance id(s) 'i-06682f599db76ee34'. Aborting the operation.
2020-07-13 17:00:50 ERROR Failed to deploy application.
ERROR: ServiceError - Failed to deploy application.
Why does this happen? Thanks in advance.
Based on the comments.
To solve the issue, EB logs were inspected to identify the error message:
[app-deploy] - [Install customer specified node.js version]. Stop running the command. Error: unsupported node version >=6, please specify any of node versions in [v12.0.0 v12.1.0 v12.10.0 v12.11.0 v12.11.1 v12.12.0 v12.13.0 v12.13.1 v12.14.0 v12.14.1 v12.15.0 v12.16.0 v12.16.1 v12.16.2 v12.16.3 v12.17.0 v12.18.0 v12.18.1 v12.2.0 v12.3.0 v12.3.1 v12.4.0 v12.5.0 v12.6.0 v12.7.0 v12.8.0 v12.8.1 v12.9.0 v12.9.1]
Then the correct Node.js version was specified in package.json:
"engines": { "node": "12.13.0"},

AWS EMR pyspark notebook fails with `Failed to run command /usr/bin/virtualenv (...)`

I have created a basic EMR cluster in AWS, and I'm trying to use the Jupyter Notebooks provided through the AWS Console. Launching the notebooks seems to work fine, and I'm also able to run basic python code in notebooks started with the pyspark kernel. Two variables are set up in the notebook: spark is a SparkSession instance, and sc is a SparkContext instance. Displaying sc yields <SparkContext master=yarn appName=livy-session-0> (the output can of course vary slightly depending on the session).
The problem arises once I perform operations that actually hit the spark machinery. For example:
sc.parallelize(list(range(10))).map(lambda x: x**2).collect()
I am no spark expert, but I believe this code should distribute the integers from 0 to 9 across the cluster, square them, and return the results in a list. Instead, I get a lengthy stack trace, mostly from the JVM, but also some python components. I believe the central part of the stack trace is the following:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 116, ip-XXXXXXXXXXXXX.eu-west-1.compute.internal, executor 17): java.lang.RuntimeException: Failed to run command: /usr/bin/virtualenv -p python3 --system-site-packages virtualenv_application_1586243436143_0002_0
The full stack trace is here.
A bit of digging in the AWS portal led me to log output from the nodes. stdout from one of the nodes includes the following:
The path python3 (from --python=python3) does not exist
I tried running the /usr/bin/virtualenv command on the master node manually (after logging in through SSH), and that worked fine, but the error was of course still present after I did that.
While this error occurs most of the time, I was able to get this working in one session, where I could run several operations against the spark cluster as I was expecting.
Technical information on the cluster setup:
emr-6.0.0
Applications installed are "Ganglia 3.7.2, Spark 2.4.4, Zeppelin 0.9.0, Livy 0.6.0, JupyterHub 1.0.0, Hive 3.1.2". Hadoop is also included.
3 nodes (one of them as master), all r5a.2xlarge.
Any ideas what I'm doing wrong? Note that I am completely new to EMR and Spark.
Edit: Added the stdout log and information about running the virtualenv command manually on the master node through ssh.
I have switched to using emr-5.29.0, which seems to resolve the problem. Perhaps this is an issue with emr-6.0.0? In any case, I have a functional workaround.
The issue for me was that the virtualenv was being created on the executors with a Python path that didn't exist. Pointing the executors to the right one did the job for me:
"spark.pyspark.python": "/usr/bin/python3.7"
Here is how I reconfigured the Spark app at the beginning of the notebook:
{"conf":{"spark.pyspark.python": "/usr/bin/python3.7",
"spark.pyspark.virtualenv.enabled": "true",
"spark.pyspark.virtualenv.type": "native",
"spark.pyspark.virtualenv.bin.path":"/usr/bin/virtualenv"}
}
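For anyone reproducing this in an EMR notebook backed by Livy/sparkmagic: a block like the one above is normally applied through the %%configure cell magic in the first cell, roughly as sketched below (the -f flag restarts the Spark session with the new settings; this is standard sparkmagic usage rather than anything specific to this cluster):

%%configure -f
{ "conf": {
    "spark.pyspark.python": "/usr/bin/python3.7",
    "spark.pyspark.virtualenv.enabled": "true",
    "spark.pyspark.virtualenv.type": "native",
    "spark.pyspark.virtualenv.bin.path": "/usr/bin/virtualenv"
  }
}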

Deploying customized JAR in AWS failing with Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS

I am trying to deploy a customized JAR in AWS (MapReduce). I have to read files from S3, and the S3 paths are given as command line arguments. While the cluster is running, I see the following in the 'Steps' section of the cluster:
Status:FAILED
Reason:Illegal Argument.
Log File:s3://aws-logs-502743756123-us-east-1/elasticmapreduce/j-3U1NGY5JNUBK2/steps/s-O3W3I4RU4NXS/stderr.gz
Details:Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://****/input, expected: hdfs://ip-172-31-45-130.ec2.internal:8020
JAR location: s3://****/ChainMapperDriver.jar
Main class: None
Arguments: ChainMapperDriver s3://****/input s3://****/output/
Action on failure: Terminate cluster
ChainMapperDriver is the name of the Main Class.
Do I have to do anything in the Java code that I have written to handle the case where the files are in S3? Your help is greatly appreciated.
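For what it's worth, a "Wrong FS: s3n://..., expected: hdfs://..." error typically means the code resolved the s3n:// path against the cluster's default FileSystem (HDFS), e.g. via FileSystem.get(conf). Below is a hedged Java sketch of the usual pattern that avoids it, resolving the FileSystem from the path itself (paths and variable names are illustrative, not taken from the actual ChainMapperDriver):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Resolve the FileSystem from the Path instead of using the default (HDFS) one.
Configuration conf = new Configuration();
Path input = new Path("s3n://bucket/input");  // placeholder; use the path from your command line args
FileSystem fs = input.getFileSystem(conf);    // yields the S3 FileSystem for s3n:// paths
// ...use fs and input when checking or deleting paths, rather than FileSystem.get(conf)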

Spring-xd hdfs sink - Error creating bean

I am getting the following exception when deploying a stream with hdfs as the sink in spring-xd.
Error creating bean with name 'hadoopConfiguration': Invocation of init method failed; nested exception is java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
I have my spring-xd app running on yarn successfully. Appreciate your help.
The problem is with the configuration of siteMapreduceAppClassPath in servers.yml. The classpath should include the path of the hadoop-core jar; since that jar is not included with the app, it throws NoClassDefFoundError.
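A hedged sketch of what that entry might look like in servers.yml (the exact nesting depends on the Spring XD version, and the path is a placeholder for wherever the Hadoop MapReduce jars live on your nodes):

spring:
  yarn:
    # Placeholder path: point this at the jar(s) that contain
    # org.apache.hadoop.mapred.JobConf on your cluster.
    siteMapreduceAppClasspath: "/usr/lib/hadoop-mapreduce/*"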