AWS EMR MapReduce failure

We have an installation of AWS EMR in a client environment. Encryption in transit and encryption at rest have been enabled using a security configuration. We keep getting the MapReduce errors below when we execute a simple Hive query.
Diagnostic Messages for this Task:
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#1
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:377)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkReducerHealth(ShuffleSchedulerImpl.java:366)
at org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:288)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.openShuffleUrl(Fetcher.java:282)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:323)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:193)
Please let me know if anyone has faced this error before.

Related

UnsupportedClassVersionError with mysql jdbc driver in AWS Data Pipeline

I am trying to run a Data Pipeline job in AWS. I added the field "Jdbc Driver Jar Uri" and placed the jar file in my S3 bucket, per instructions here, because it seems the Connector/J that AWS Data Pipeline installs does not work.
I'm using mysql-connector-java-8.0.23, and my MySQL database version is the same.
java.lang.UnsupportedClassVersionError: com/mysql/jdbc/Driver : Unsupported major.minor version 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:808)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:443)
at java.net.URLClassLoader.access$100(URLClassLoader.java:65)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.net.URLClassLoader$1.run(URLClassLoader.java:349)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:348)
at java.lang.ClassLoader.loadClass(ClassLoader.java:430)
at java.lang.ClassLoader.loadClass(ClassLoader.java:363)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)
at amazonaws.datapipeline.database.JdbcDriverInitializer.getDriver(JdbcDriverInitializer.java:75)
at amazonaws.datapipeline.database.ConnectionFactory.getRdsDatabaseConnection(ConnectionFactory.java:158)
at amazonaws.datapipeline.database.ConnectionFactory.getConnection(ConnectionFactory.java:74)
at amazonaws.datapipeline.database.ConnectionFactory.getConnectionWithCredentials(ConnectionFactory.java:302)
at amazonaws.datapipeline.connector.SqlDataNode.createConnection(SqlDataNode.java:100)
at amazonaws.datapipeline.connector.SqlDataNode.getConnection(SqlDataNode.java:94)
at amazonaws.datapipeline.connector.SqlDataNode.prepareStatement(SqlDataNode.java:162)
at amazonaws.datapipeline.connector.SqlInputConnector.open(SqlInputConnector.java:49)
at amazonaws.datapipeline.connector.SqlInputConnector.<init>(SqlInputConnector.java:26)
at amazonaws.datapipeline.connector.SqlDataNode.getInputConnector(SqlDataNode.java:79)
at amazonaws.datapipeline.activity.copy.SingleThreadedCopyActivity.processAll(SingleThreadedCopyActivity.java:47)
at amazonaws.datapipeline.activity.copy.SingleThreadedCopyActivity.runActivity(SingleThreadedCopyActivity.java:35)
at amazonaws.datapipeline.activity.CopyActivity.runActivity(CopyActivity.java:22)
at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136)
at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105)
at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:81)
at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
at java.lang.Thread.run(Thread.java:748)
I've looked at this question for a solution, but I wasn't able to figure out how to adapt those answers to solving it in AWS Data Pipeline.
Can someone explain what steps need to be taken to fix this ClassVersion error?
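One way to confirm the mismatch is to check which Java release the driver's classes were compiled for. Class-file major version 52 corresponds to Java 8, so the error suggests that the JVM loading the driver is older than Java 8. The following is a minimal sketch, assuming a local copy of the driver jar; the function names are illustrative and not part of any AWS or MySQL tooling.

```python
import struct
import zipfile

CLASS_MAGIC = 0xCAFEBABE  # first four bytes of every Java class file


def class_major_version(class_bytes: bytes) -> int:
    """Return the class-file major version read from the 8-byte header."""
    if len(class_bytes) < 8:
        raise ValueError("too short to be a class file")
    # Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, _minor, major = struct.unpack(">IHH", class_bytes[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return major


def required_java_release(major: int) -> int:
    """Map a class-file major version to the minimum Java release (52 -> 8)."""
    if major < 49:
        raise ValueError("class file predates Java 5")
    return major - 44


def jar_required_java(jar_path: str) -> int:
    """Return the highest Java release required by any class in the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        majors = [
            class_major_version(jar.read(name))
            for name in jar.namelist()
            # Skip META-INF, where multi-release jars keep newer class variants.
            if name.endswith(".class") and not name.startswith("META-INF/")
        ]
    if not majors:
        raise ValueError("no class files found in jar")
    return required_java_release(max(majors))
```

Calling `jar_required_java` on the uploaded driver jar shows the minimum Java release it needs. If that is newer than the Java version the task runner uses, a driver build that targets the older release would be the thing to try.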

PySpark JDBC read giving errors. How to fix?

I am connecting to RDS MySQL using JDBC in PySpark. I have tried almost everything I found on Stack Overflow for debugging, but I am still unable to make it work.
from pyspark.sql import SparkSession
# mysql_jar is the path to the MySQL Connector/J jar (defined elsewhere)
spark = SparkSession.builder.config("spark.jars", mysql_jar) \
    .master("local[*]").appName("PySpark_MySQL_test").getOrCreate()
df = spark.read.format("jdbc").option("url", "jdbc:mysql://hostname.amazonaws.com:1150/dbname?user=user_name&password=password") \
    .option("driver", "com.mysql.cj.jdbc.Driver").option("dbtable", "table_name").load()
I have tried the same connection details with Python's pymysql library; it connects and returns the result.
But here I get the error below and am unable to solve it.
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o38.load.
: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:827)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:447)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:237)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:199)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.create(ConnectionProvider.scala:68)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:62)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
I experienced the same issue, and it works now. The root cause is that Spark uses the master node to connect to MySQL but uses the worker nodes to execute the tasks. So a connection from one machine can succeed while the job still raises a communications error. Based on this, open the security rules on the MySQL side so that every Spark node can connect to MySQL.
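To check this, test whether the MySQL port is reachable from each node that takes part in the job. The following is a minimal sketch using only the standard library; `can_reach` is an illustrative name, and the host and port are whatever your JDBC URL uses.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Running `can_reach("hostname.amazonaws.com", 1150)` on the driver machine and on every worker shows which nodes the database's security rules are blocking. A `True` result only proves that the port is open, not that the credentials are accepted.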
For anyone coming here for an answer while using Docker, give the solution below a try.
Use the following configuration:
source_df = spark.read.format('jdbc').options(
    url='jdbc:mysql://host.docker.internal:3306/superset?useSSL=false&allowPublicKeyRetrieval=true',
    driver='com.mysql.cj.jdbc.Driver',
    dbtable='table',
    user='root',
    password='root').load()
I tried localhost, 127.0.0.1, and even the IP address from docker inspect as the host, but none of them worked. Then I changed it to host.docker.internal and it worked.
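When trying several hosts like this, it can help to build the JDBC URL from its parts rather than editing one long string. This is a small sketch under that assumption; `mysql_jdbc_url` is a hypothetical helper, not part of Spark or Connector/J.

```python
from urllib.parse import urlencode


def mysql_jdbc_url(host: str, port: int, database: str, **params: str) -> str:
    """Assemble a MySQL JDBC URL, appending params as a query string."""
    base = f"jdbc:mysql://{host}:{port}/{database}"
    query = urlencode(params)  # keeps keyword order, escapes special characters
    return f"{base}?{query}" if query else base


url = mysql_jdbc_url(
    "host.docker.internal", 3306, "superset",
    useSSL="false", allowPublicKeyRetrieval="true",
)
```

The resulting `url` can be passed as the `url` option in the configuration above, so switching hosts means changing only the first argument.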

AWS Glue spark job - Snowflake connection error

I am getting the error below while running an AWS Glue job with Spark (Glue version 2.0, Spark 2.4, Python 3).
Has anyone encountered a similar issue using AWS Glue and Snowflake?
2021-04-27 13:59:23,858 ERROR [main] glue.ProcessLauncher (Logging.scala:logError(91)): Exception in User Class
java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1862)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
at org.apache.spark.executor.CoarseGrainedExecutorBackendPlugin$class.launch(CoarseGrainedExecutorBackendWrapper.scala:10)
at org.apache.spark.executor.CoarseGrainedExecutorBackendWrapper$$anon$1.launch(CoarseGrainedExecutorBackendWrapper.scala:15)
at org.apache.spark.executor.CoarseGrainedExecutorBackendWrapper.launch(CoarseGrainedExecutorBackendWrapper.scala:19)
at org.apache.spark.executor.CoarseGrainedExecutorBackendWrapper$.main(CoarseGrainedExecutorBackendWrapper.scala:5)
at org.apache.spark.executor.CoarseGrainedExecutorBackendWrapper.main(CoarseGrainedExecutorBackendWrapper.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazonaws.services.glue.SparkProcessLauncherPlugin$class.invoke(ProcessLauncher.scala:44)
at com.amazonaws.services.glue.ProcessLauncher$$anon$1.invoke(ProcessLauncher.scala:75)
at com.amazonaws.services.glue.ProcessLauncher.launch(ProcessLauncher.scala:114)
at com.amazonaws.services.glue.ProcessLauncher$.main(ProcessLauncher.scala:26)
at com.amazonaws.services.glue.ProcessLauncher.main(ProcessLauncher.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
... 17 more
Caused by: java.io.IOException: Failed to connect to /172.36.143.34:41447

AWS Glue: Command failed with error code 1

I've been battling this error for weeks now. I have tried many different applications and cannot find any pattern to it. Sometimes, if I change the job bookmark setting to paused, enabled, or back to disabled, the job starts working. I have a Java jar that the Glue job references, and I call a few methods from it. Sometimes, if I rebuild the artifact, the job starts working and no longer throws this error. I have another job that uses the exact same jar and never throws the error. I have tried creating a new job to start over, but I see the same issue. Here is the error stack from the logs. The application code is a simple DataFrame read from S3 and a write to another location in S3.
val df = spark.read.parquet(source)
df.write.mode("overwrite").format("parquet").save(destination)
The error stack:
18/04/30 14:40:35 ERROR ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: com.amazonaws.services.glue.AWSGlue.getJobBookmark(Lcom/amazonaws/services/glue/model/GetJobBookmarkRequest;)Lcom/amazonaws/services/glue/model/GetJobBookmarkResult;
java.lang.NoSuchMethodError: com.amazonaws.services.glue.AWSGlue.getJobBookmark(Lcom/amazonaws/services/glue/model/GetJobBookmarkRequest;)Lcom/amazonaws/services/glue/model/GetJobBookmarkResult;
at com.amazonaws.services.glue.util.JobBookmarkServiceShim$$anonfun$2.apply(JobBookmarkUtils.scala:54)
at com.amazonaws.services.glue.util.JobBookmarkServiceShim$$anonfun$2.apply(JobBookmarkUtils.scala:54)
at scala.util.Try$.apply(Try.scala:192)
at com.amazonaws.services.glue.util.JobBookmarkServiceShim.<init>(JobBookmarkUtils.scala:54)
at com.amazonaws.services.glue.util.JobBookmark$.configure(JobBookmarkUtils.scala:178)
at com.amazonaws.services.glue.util.Job$.init(Job.scala:68)
at com.amazonaws.services.glue.util.Job$.init(Job.scala:32)
at NetezzaRawToRefined$.main(script_2018-04-30-14-39-54.scala:16)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazonaws.services.glue.util.GlueExceptionWrapper$$anonfun$1.apply$mcV$sp(GlueExceptionWrapper.scala:29)
at com.amazonaws.services.glue.util.GlueExceptionWrapper$$anonfun$1.apply(GlueExceptionWrapper.scala:29)
at com.amazonaws.services.glue.util.GlueExceptionWrapper$$anonfun$1.apply(GlueExceptionWrapper.scala:29)
at scala.util.Try$.apply(Try.scala:192)
at com.amazonaws.services.glue.util.GlueExceptionWrapper$.delayedEndpoint$com$amazonaws$services$glue$util$GlueExceptionWrapper$1(GlueExceptionWrapper.scala:28)
at com.amazonaws.services.glue.util.GlueExceptionWrapper$delayedInit$body.apply(GlueExceptionWrapper.scala:11)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at com.amazonaws.services.glue.util.GlueExceptionWrapper$.main(GlueExceptionWrapper.scala:11)
at com.amazonaws.services.glue.util.GlueExceptionWrapper.main(GlueExceptionWrapper.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)
18/04/30 14:40:35 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NoSuchMethodError: com.amazonaws.services.glue.AWSGlue.getJobBookmark(Lcom/amazonaws/services/glue/model/GetJobBookmarkRequest;)Lcom/amazonaws/services/glue/model/GetJobBookmarkResult;)
java.lang.NoSuchMethodError: com.amazonaws.services.glue.AWSGlue.getJobBookmark(Lcom/amazonaws/services/glue/model/GetJobBookmarkRequest;)Lcom/amazonaws/services/glue/model/GetJobBookmarkResult;
This error typically happens when the runtime has a different version of a jar than the one you packaged. I suggest making sure that you package the same version of the jar that the runtime provides.
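A quick way to see whether the packaged jar carries its own copy of the conflicting classes is to list what it bundles under the package named in the stack trace. This is a minimal sketch, assuming a locally built jar; `classes_under` is an illustrative helper.

```python
import zipfile
from typing import List


def classes_under(jar_path: str, package: str) -> List[str]:
    """List the .class entries in a jar that live under the given package."""
    prefix = package.replace(".", "/") + "/"
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name for name in jar.namelist()
            if name.startswith(prefix) and name.endswith(".class")
        )
```

For example, `classes_under("my-job.jar", "com.amazonaws.services.glue")` (the jar name is hypothetical) returns a non-empty list if the artifact bundles AWS Glue SDK classes. In that case those classes may shadow, or be shadowed by, the version the Glue runtime provides.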

DataStax Enterprise on AWS - Futures Timed Out After [120] when running Spark job

We are currently running into the following error when attempting to run a Spark job on DSE 4.8 Analytics:
ERROR 2016-04-11 20:59:42,825 UserGroupInformation.java:1128 - org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:ubuntu cause:java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.DseSecureRunner.(DseSecureRunner.scala:24)
at org.apache.spark.DseSecureRunner$.main(DseSecureRunner.scala:34)
at org.apache.spark.DseSecureRunner.main(DseSecureRunner.scala)
Caused by: java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1138)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:67)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
... 7 more
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1125)
... 11 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:97)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:159)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
... 14 more
This error occurs on the two worker nodes, while the node that runs the driver works fine.
We have the following security group configuration for a three-node cluster created using the DataStax AMI of the 2.6 flavor.
My security group was taken from the documentation, with one minor exception.
The following port was omitted:
8983 Custom TCP Rule TCP 0.0.0.0/0 Solr port and Demo applications web site port (Portfolio, Search, Search log, Weather Sensors)
The only way to get around this error was to add the following rule:
ALL TCP  TCP (6)  ALL  cluster-security-group (using the picture as reference this would be sg-bbc40aff)
Which leads me to believe that some process is trying to communicate with nodes in the cluster via another port.
http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/install/installAMIsecurity.html
Has anyone run into this problem running Spark jobs using DSE Analytics on AWS?
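For reference, the workaround rule above (all TCP ports, with the cluster's own security group as the source) can be written out as a boto3 ingress permission. This is a sketch under assumptions: the helper names are illustrative, boto3 must be installed and configured with credentials for `apply_rule` to work, and sg-bbc40aff is the group ID mentioned above.

```python
def all_tcp_within_group(group_id: str) -> dict:
    """Build an ingress permission allowing all TCP ports from the same group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 0,
        "ToPort": 65535,
        # A self-reference: traffic is allowed from members of this group.
        "UserIdGroupPairs": [{"GroupId": group_id}],
    }


def apply_rule(group_id: str) -> None:
    """Add the rule to the security group (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[all_tcp_within_group(group_id)],
    )
```

Calling `apply_rule("sg-bbc40aff")` would add the rule. Because the source is the group itself rather than 0.0.0.0/0, the open ports stay limited to traffic between cluster nodes.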
Thanks