SPARK_CONF not found in AWS EMR - amazon-web-services

I am trying to deploy a spark application in EMR and facing the following issue.
java.io.FileNotFoundException: File does not exist: hdfs://ip-10-184-176-172.ec2.internal:8020/user/hadoop/.sparkStaging/application_1446113189622_0004/__spark_conf__2712437380309904293.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I am deploying in cluster mode using the emr console UI. In the first line it specifies the SPARK_CONF zip is uploaded in the hdfs location but the error says file not found on the same location. have anyone faced similar issue?

Issue Resolved. I was using unsupported JAVA version. EMR has java 7 and my application was developed in java 8.

Related

UnsupportedClassVersionError with mysql jdbc driver in AWS Data Pipeline

I am trying to run a Data Pipeline job in AWS. I added the field "Jdbc Driver Jar Uri" and placed the jar file in my s3 bucket, per instructions here, because it seems "Connector/J" that is installed by AWS Data Pipeline does not work.
I'm using mysql-connector-java-8.0.23 and my mysql database version is the same.
java.lang.UnsupportedClassVersionError: com/mysql/jdbc/Driver : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:808)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:443)
at java.net.URLClassLoader.access$100(URLClassLoader.java:65)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.net.URLClassLoader$1.run(URLClassLoader.java:349)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:348)
at java.lang.ClassLoader.loadClass(ClassLoader.java:430)
at java.lang.ClassLoader.loadClass(ClassLoader.java:363)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at amazonaws.datapipeline.database.JdbcDriverInitializer.getDriver(JdbcDriverInitializer.java:75)
at amazonaws.datapipeline.database.ConnectionFactory.getRdsDatabaseConnection(ConnectionFactory.java:158)
at amazonaws.datapipeline.database.ConnectionFactory.getConnection(ConnectionFactory.java:74)
at amazonaws.datapipeline.database.ConnectionFactory.getConnectionWithCredentials(ConnectionFactory.java:302)
at amazonaws.datapipeline.connector.SqlDataNode.createConnection(SqlDataNode.java:100)
at amazonaws.datapipeline.connector.SqlDataNode.getConnection(SqlDataNode.java:94)
at amazonaws.datapipeline.connector.SqlDataNode.prepareStatement(SqlDataNode.java:162)
at amazonaws.datapipeline.connector.SqlInputConnector.open(SqlInputConnector.java:49)
at amazonaws.datapipeline.connector.SqlInputConnector.<init>(SqlInputConnector.java:26)
at amazonaws.datapipeline.connector.SqlDataNode.getInputConnector(SqlDataNode.java:79)
at amazonaws.datapipeline.activity.copy.SingleThreadedCopyActivity.processAll(SingleThreadedCopyActivity.java:47)
at amazonaws.datapipeline.activity.copy.SingleThreadedCopyActivity.runActivity(SingleThreadedCopyActivity.java:35)
at amazonaws.datapipeline.activity.CopyActivity.runActivity(CopyActivity.java:22)
at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136)
at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105)
at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:81)
at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
at java.lang.Thread.run(Thread.java:748)
I've looked at this question for a solution, but I wasn't able to figure out how to adapt those answers to solving it in AWS Data Pipeline.
Can someone explain what steps need to be taken to fix this ClassVersion error?

Zeppelin error org.apache.thrift.transport.TTransportException

I'm using Zeppelin in local connected to an Dev AWS Glue EndPoint. Until two weeks ago all worked fine, but now I'm getting this error:
org.apache.thrift.transport.TTransportException at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at
org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
at
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175) at
org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The spark interpreter is configured as AWS tells:
As I know the problem is that zeppelin can't connect to spark interpreter. What I'm missing?
Many thanks.

AWS EMR Mapreduce failure

We have an installation of AWS EMR in a client environment. The encryption in transit and the encryption at rest has been enabled using security configuration. We continue to get the below mapreduce errors when we execute a simple Hive query.
Diagnostic Messages for this Task:
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError:
error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:377)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by:
java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:366)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:288)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:282)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
Please let me know if anyone has faced this error before.

WSO2-CEP KafkaReceiver deployment

I have defined a KafkaReceiver usind WSO2-CEP in management console.
It doesn't work and I can read this error on server-console log
[2016-02-26 10:17:25,412] INFO {org.wso2.carbon.event.input.adapter.core.internal.InputAdapterRuntime} - Connecting receiver kafkaReceiver
[2016-02-26 10:17:25,415] ERROR {org.wso2.carbon.event.input.adapter.core.internal.InputAdapterRuntime} - Error initializing Input Adapter 'kafkaReceiver, hence this will be suspended indefinitely, Cannot access kafka context due to missing jars
org.wso2.carbon.event.input.adapter.core.exception.InputEventAdapterRuntimeException: Cannot access kafka context due to missing jars
at org.wso2.carbon.event.input.adapter.kafka.KafkaEventAdapter.createConsumerConfig(KafkaEventAdapter.java:114)
at org.wso2.carbon.event.input.adapter.kafka.KafkaEventAdapter.createKafkaAdaptorListener(KafkaEventAdapter.java:132)
at org.wso2.carbon.event.input.adapter.kafka.KafkaEventAdapter.connect(KafkaEventAdapter.java:66)
at org.wso2.carbon.event.input.adapter.core.internal.InputAdapterRuntime.start(InputAdapterRuntime.java:72)
at org.wso2.carbon.event.input.adapter.core.internal.InputAdapterRuntime.startPolling(InputAdapterRuntime.java:62)
at org.wso2.carbon.event.input.adapter.core.internal.CarbonInputEventAdapterService.startPolling(CarbonInputEventAdapterService.java:187)
at org.wso2.carbon.event.receiver.core.internal.CarbonEventReceiverService.startPolling(CarbonEventReceiverService.java:620)
at org.wso2.carbon.event.receiver.core.internal.CarbonEventReceiverManagementService.startPolling(CarbonEventReceiverManagementService.java:99)
at org.wso2.carbon.event.processor.manager.core.internal.CarbonEventManagementService$2.run(CarbonEventManagementService.java:184)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: kafka/consumer/ConsumerConfig
at org.wso2.carbon.event.input.adapter.kafka.KafkaEventAdapter.createConsumerConfig(KafkaEventAdapter.java:112)
... 15 more
Caused by: java.lang.ClassNotFoundException: kafka.consumer.ConsumerConfig cannot be found by org.wso2.carbon.event.input.adapter.kafka_5.0.3
at org.eclipse.osgi.internal.loader.BundleLoader.findClassInternal(BundleLoader.java:501)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:421)
at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:412)
at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:107)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 16 more
If we read the class source at this link https://github.com/wso2/carbon-analytics-common/blob/master/components/event-receiver/event-input-adapters/org.wso2.carbon.event.input.adapter.kafka/src/main/java/org/wso2/carbon/event/input/adapter/kafka/KafkaEventAdapter.java
I read that the error is in ConsumerConfig creation
In order to use kafka you need to add kafka download and copy following client JAR files to /repository/components/lib/ directory
kafka_2.10-0.8.2.1.jar
zkclient-0.3.jar
scala-library-2.10.4.jar
zookeeper-3.4.6.jar
metrics-core-2.2.0.jar
kafka-clients-0.8.2.1.jar
[1]. https://docs.wso2.com/display/CEP400/Supporting+Different+Transports#SupportingDifferentTransports-KafkaTransport

SAAJMetaFactoryImpl not found JBOSS EAP 6.3

I am trying to deploy my application pharma.ear onto JBOSS EAP 6.3 but getting an error:
17:23:42,622 SEVERE [javax.enterprise.resource.webservices.jaxws.server.http] (ServerService Thread Pool -- 78) WSSERVLET11: failed to parse runtime descriptor: java.lang.Error: javax.xml.soap.SOAPException: Unable to create SAAJ meta-factoryProvider com.sun.xml.internal.messaging.saaj.soap.SAAJMetaFactoryImpl not found: java.lang.Error: javax.xml.soap.SOAPException: Unable to create SAAJ meta-factoryProvider com.sun.xml.internal.messaging.saaj.soap.SAAJMetaFactoryImpl not found
at com.sun.xml.ws.api.SOAPVersion.<init>(SOAPVersion.java:181) [jaxws-rt.jar:2.1.3]
at com.sun.xml.ws.api.SOAPVersion.<clinit>(SOAPVersion.java:83) [jaxws-rt.jar:2.1.3]
at com.sun.xml.ws.api.BindingID.<clinit>(BindingID.java:318) [jaxws-rt.jar:2.1.3]
at com.sun.xml.ws.transport.http.DeploymentDescriptorParser.createBinding(DeploymentDescriptorParser.java:325) [classes:2.1.3]
at com.sun.xml.ws.transport.http.DeploymentDescriptorParser.parseAdapters(DeploymentDescriptorParser.java:264) [classes:2.1.3]
at com.sun.xml.ws.transport.http.DeploymentDescriptorParser.parse(DeploymentDescriptorParser.java:156) [classes:2.1.3]
at com.sun.xml.ws.transport.http.servlet.WSServletContextListener.contextInitialized(WSServletContextListener.java:108) [jaxws-rt.jar:2.1.3]
at org.apache.catalina.core.StandardContext.contextListenerStart(StandardContext.java:3339) [jbossweb-7.4.8.Final-redhat-4.jar:7.4.8.Final-redhat-4]
at org.apache.catalina.core.StandardContext.start(StandardContext.java:3777) [jbossweb-7.4.8.Final-redhat-4.jar:7.4.8.Final-redhat-4]
at org.jboss.as.web.deployment.WebDeploymentService.doStart(WebDeploymentService.java:161) [jboss-as-web-7.4.0.Final-redhat-19.jar:7.4.0.Final-redhat-19]
at org.jboss.as.web.deployment.WebDeploymentService.access$000(WebDeploymentService.java:59) [jboss-as-web-7.4.0.Final-redhat-19.jar:7.4.0.Final-redhat-19]
at org.jboss.as.web.deployment.WebDeploymentService$1.run(WebDeploymentService.java:94) [jboss-as-web-7.4.0.Final-redhat-19.jar:7.4.0.Final-redhat-19]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_17]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_17]
at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_17]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_17]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_17]
at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_17]
at org.jboss.threads.JBossThread.run(JBossThread.java:122)
Caused by: javax.xml.soap.SOAPException: Unable to create SAAJ meta-factoryProvider com.sun.xml.internal.messaging.saaj.soap.SAAJMetaFactoryImpl not found
at javax.xml.soap.SAAJMetaFactory.getInstance(SAAJMetaFactory.java:83) [jboss-saaj-api_1.3_spec-1.0.3.Final-redhat-1.jar:1.0.3.Final-redhat-1]
at javax.xml.soap.MessageFactory.newInstance(MessageFactory.java:144) [jboss-saaj-api_1.3_spec-1.0.3.Final-redhat-1.jar:1.0.3.Final-redhat-1]
at com.sun.xml.ws.api.SOAPVersion.<init>(SOAPVersion.java:178) [jaxws-rt.jar:2.1.3]
I have saaj-impl.jar and saaj-api.jar in my pharma.war/WEB-INF/lib.
Can anyone please help me in resolving this issue.
Can you deploy the ear file on another app server, e.g. JBoss 7.1.1.Final. This is just to verify that the issue is not with your ear file.
Also try not bundle the SAAJ jar with your war/ear file. Let JBoss pick it up from it own modules. In Maven you can do that by provided