Zeppelin error org.apache.thrift.transport.TTransportException - amazon-web-services

I'm using Zeppelin in local connected to an Dev AWS Glue EndPoint. Until two weeks ago all worked fine, but now I'm getting this error:
org.apache.thrift.transport.TTransportException at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at
org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at
org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at
org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
at
org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175) at
org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262) at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The spark interpreter is configured as AWS tells:
As I know the problem is that zeppelin can't connect to spark interpreter. What I'm missing?
Many thanks.

Related

Pyspark read jdbc giving errors . How to fix?

I am connecting to RDS MySQL using JDBC in pyspark . I have tried almost everything that I found on Stackoverflow for debugging but still, i am unable to make it work .
spark = SparkSession.builder.config("spark.jars", mysql_jar) \
.master("local[*]").appName("PySpark_MySQL_test").getOrCreate()
df= spark.read.format("jdbc").option("url", "jdbc:mysql://hostname.amazonaws.com:1150/dbname?user=user_name&password=password") \
.option("driver", "com.mysql.cj.jdbc.Driver").option("dbtable", "table_name").load()
I have tried using the same connection details in pymysql library of python it connects and brings back the result.
But here I getting the below error and am unable to solve it.
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o38.load.
: com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64)
at com.mysql.cj.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:827)
at com.mysql.cj.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:447)
at com.mysql.cj.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:237)
at com.mysql.cj.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:199)
at org.apache.spark.sql.execution.datasources.jdbc.connection.BasicConnectionProvider.getConnection(BasicConnectionProvider.scala:49)
at org.apache.spark.sql.execution.datasources.jdbc.connection.ConnectionProvider$.create(ConnectionProvider.scala:68)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$createConnectionFactory$1(JdbcUtils.scala:62)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:355)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.mysql.cj.exceptions.CJCommunicationsException: Communications link failure
I have experienced the same issues.Now it is worked.The core reason is spark use master node to connect mysql and use work nodes to execute task.So you can connect mysql while raise communication error.Based on this theory,you can open the security rules on mysql to let all spark node can connect to mysql
For anyone coming here for an answer using Docker give the below solution a try.
use the below configuration
source_df = spark.read.format('jdbc').options(
url='jdbc:mysql://host.docker.internal:3306/superset?useSSL=false&allowPublicKeyRetrieval=true',
driver='com.mysql.cj.jdbc.Driver',
dbtable='table',
user='root',
password='root').load()
I have tried the host with localhost, 127.0.0.1, and even the IPAddress from docker inspect but didn't work then changed it to host.docker.internal and it worked.

Developing Glue ETL script locally - java.lang.IllegalStateException: Connection pool shut down

I'm currently developing ETL scripts locally using the AWS Glue ETL library.
I'm facing an issue when extracting data from S3 bucket as DynamicFrame.
When I want to convert to a DataFrame using toDF(), it will always trigger this exception:
py4j.protocol.Py4JJavaError: An error occurred while calling o52.toDF
...
ERROR Executor: Exception in task 5.0 in stage 3.0 (TID 29)
java.lang.IllegalStateException: Connection pool shut down
at org.apache.http.util.Asserts.check(Asserts.java:34)
at org.apache.http.pool.AbstractConnPool.lease(AbstractConnPool.java:191)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.requestConnection(PoolingHttpClientConnectionManager.java:267)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
at com.amazonaws.http.conn.$Proxy15.requestConnection(Unknown Source)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:176)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1330)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5062)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5008)
at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1490)
at org.apache.hadoop.fs.s3a.S3AInputStream.reopen(S3AInputStream.java:148)
at org.apache.hadoop.fs.s3a.S3AInputStream.lazySeek(S3AInputStream.java:281)
at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:364)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:179)
at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:163)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:284)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at com.amazonaws.services.glue.readers.BufferedStream.read(DynamicRecordReader.scala:91)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:489)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:126)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:215)
I tried the same code on AWS Glue DevEndpoint and it works fine. Any idea how to resolve this?
Please go with java8, your issue will be resolved
check java -version
I had the same issue when running the dev env with following specs:
Scala version 2.11.12
Spark version 2.4.3
Glue 1.0.0
To fix it, add the following line to the spark configuration in $SPARK_HOME/conf/spark-defaults.conf
spark.master local
Alternatively, depending on how you are running your job, you can configure this dynamically if you are in control of the spark context. i.e.
from pyspark.conf import SparkConf
from pyspark.context import SparkContext
conf = SparkConf()
conf.setMaster("local").setAppName("My app")
sc = SparkContext(conf=conf)
I have found this happens when running in local mode with multiple threads, I have found that increasing fs.s3.connection.maximum or fs.s3a.connection.maximum does not fix the issue. Although this post indicates that it should https://kb.databricks.com/jobs/job-fails-connection-pool.html

SPARK_CONF not found in AWS EMR

I am trying to deploy a spark application in EMR and facing the following issue.
java.io.FileNotFoundException: File does not exist: hdfs://ip-10-184-176-172.ec2.internal:8020/user/hadoop/.sparkStaging/application_1446113189622_0004/__spark_conf__2712437380309904293.zip
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
I am deploying in cluster mode using the emr console UI. In the first line it specifies the SPARK_CONF zip is uploaded in the hdfs location but the error says file not found on the same location. have anyone faced similar issue?
Issue Resolved. I was using unsupported JAVA version. EMR has java 7 and my application was developed in java 8.

SAAJMetaFactoryImpl not found JBOSS EAP 6.3

I am trying to deploy my application pharma.ear onto JBOSS EAP 6.3 but getting an error:
17:23:42,622 SEVERE [javax.enterprise.resource.webservices.jaxws.server.http] (ServerService Thread Pool -- 78) WSSERVLET11: failed to parse runtime descriptor: java.lang.Error: javax.xml.soap.SOAPException: Unable to create SAAJ meta-factoryProvider com.sun.xml.internal.messaging.saaj.soap.SAAJMetaFactoryImpl not found: java.lang.Error: javax.xml.soap.SOAPException: Unable to create SAAJ meta-factoryProvider com.sun.xml.internal.messaging.saaj.soap.SAAJMetaFactoryImpl not found
at com.sun.xml.ws.api.SOAPVersion.<init>(SOAPVersion.java:181) [jaxws-rt.jar:2.1.3]
at com.sun.xml.ws.api.SOAPVersion.<clinit>(SOAPVersion.java:83) [jaxws-rt.jar:2.1.3]
at com.sun.xml.ws.api.BindingID.<clinit>(BindingID.java:318) [jaxws-rt.jar:2.1.3]
at com.sun.xml.ws.transport.http.DeploymentDescriptorParser.createBinding(DeploymentDescriptorParser.java:325) [classes:2.1.3]
at com.sun.xml.ws.transport.http.DeploymentDescriptorParser.parseAdapters(DeploymentDescriptorParser.java:264) [classes:2.1.3]
at com.sun.xml.ws.transport.http.DeploymentDescriptorParser.parse(DeploymentDescriptorParser.java:156) [classes:2.1.3]
at com.sun.xml.ws.transport.http.servlet.WSServletContextListener.contextInitialized(WSServletContextListener.java:108) [jaxws-rt.jar:2.1.3]
at org.apache.catalina.core.StandardContext.contextListenerStart(StandardContext.java:3339) [jbossweb-7.4.8.Final-redhat-4.jar:7.4.8.Final-redhat-4]
at org.apache.catalina.core.StandardContext.start(StandardContext.java:3777) [jbossweb-7.4.8.Final-redhat-4.jar:7.4.8.Final-redhat-4]
at org.jboss.as.web.deployment.WebDeploymentService.doStart(WebDeploymentService.java:161) [jboss-as-web-7.4.0.Final-redhat-19.jar:7.4.0.Final-redhat-19]
at org.jboss.as.web.deployment.WebDeploymentService.access$000(WebDeploymentService.java:59) [jboss-as-web-7.4.0.Final-redhat-19.jar:7.4.0.Final-redhat-19]
at org.jboss.as.web.deployment.WebDeploymentService$1.run(WebDeploymentService.java:94) [jboss-as-web-7.4.0.Final-redhat-19.jar:7.4.0.Final-redhat-19]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) [rt.jar:1.7.0_17]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) [rt.jar:1.7.0_17]
at java.util.concurrent.FutureTask.run(FutureTask.java:166) [rt.jar:1.7.0_17]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [rt.jar:1.7.0_17]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [rt.jar:1.7.0_17]
at java.lang.Thread.run(Thread.java:722) [rt.jar:1.7.0_17]
at org.jboss.threads.JBossThread.run(JBossThread.java:122)
Caused by: javax.xml.soap.SOAPException: Unable to create SAAJ meta-factoryProvider com.sun.xml.internal.messaging.saaj.soap.SAAJMetaFactoryImpl not found
at javax.xml.soap.SAAJMetaFactory.getInstance(SAAJMetaFactory.java:83) [jboss-saaj-api_1.3_spec-1.0.3.Final-redhat-1.jar:1.0.3.Final-redhat-1]
at javax.xml.soap.MessageFactory.newInstance(MessageFactory.java:144) [jboss-saaj-api_1.3_spec-1.0.3.Final-redhat-1.jar:1.0.3.Final-redhat-1]
at com.sun.xml.ws.api.SOAPVersion.<init>(SOAPVersion.java:178) [jaxws-rt.jar:2.1.3]
I have saaj-impl.jar and saaj-api.jar in my pharma.war/WEB-INF/lib.
Can anyone please help me in resolving this issue.
Can you deploy the ear file on another app server, e.g. JBoss 7.1.1.Final. This is just to verify that the issue is not with your ear file.
Also try not bundle the SAAJ jar with your war/ear file. Let JBoss pick it up from it own modules. In Maven you can do that by provided

WSO2 BAM New Error - SSTableBatchOpen

Yesterday, I configured BAM with AM and MySQL.
But, today, when I have rebooted the system, a new error appears.
This is not the only problem I have encountered.
Here is the error :
[2014-05-14 09:00:08,372] ERROR {org.apache.cassandra.service.AbstractCassandraDaemon} - Exception in thread Thread[SSTableBatchOpen:1,5,main]
java.lang.NegativeArraySizeException
at org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:241)
at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:217)
at org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:206)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:171)
at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:153)
at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:242)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Where could it come from?