ProtocolException when writing to S3 from Spark - amazon-web-services

I am trying to read from an HDFS location and save to an AWS S3 bucket as follows:
spark.sql("select * from sometbl")
.coalesce(1)
.write.mode(SaveMode.Overwrite)
.format("csv")
.save(s"s3a://mybucket/foldername")
But I get this exception:
Caused by: org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:257)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: null
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1113)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1063)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4229)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4176)
at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1852)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:146)
at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:134)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:133)
at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:44)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Caused by: com.amazonaws.thirdparty.apache.http.client.ClientProtocolException
at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at com.amazonaws.thirdparty.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1235)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055)
... 17 more
Caused by: com.amazonaws.thirdparty.apache.http.ProtocolException: Content-Length header already present
at com.amazonaws.thirdparty.apache.http.protocol.RequestContent.process(RequestContent.java:96)
at com.amazonaws.thirdparty.apache.http.protocol.ImmutableHttpProcessor.process(ImmutableHttpProcessor.java:132)
at com.amazonaws.thirdparty.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:182)
at com.amazonaws.thirdparty.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
... 22 more
Is there a bug in the S3 client? These are the versions used:
"com.amazonaws" % "aws-java-sdk" % "1.7.4",
"org.apache.hadoop" % "hadoop-aws" % "2.7.3"

Related

WSO2 VFS Transport Sender error while writing to a Windows SFTP path

I'm stuck on more errors in the same code mentioned in the question linked here.
I get the errors given below:
(Error - 1)
ERROR {VFSUtils} - Couldn't verify the lock java.io.IOException: inputstream is closed
at com.jcraft.jsch.ChannelSftp.fill(ChannelSftp.java:2911)
at com.jcraft.jsch.ChannelSftp.header(ChannelSftp.java:2935)
at com.jcraft.jsch.ChannelSftp.access$500(ChannelSftp.java:36)
at com.jcraft.jsch.ChannelSftp$2.read(ChannelSftp.java:1417)
at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:290)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
at org.apache.commons.vfs2.util.MonitorInputStream.read(MonitorInputStream.java:91)
at java.base/java.io.BufferedInputStream.fill(BufferedInputStream.java:252)
at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:292)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
at org.apache.commons.vfs2.util.MonitorInputStream.read(MonitorInputStream.java:91)
at java.base/java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.synapse.commons.vfs.VFSUtils.verifyLock(VFSUtils.java:356)
at org.apache.synapse.commons.vfs.VFSUtils.acquireLock(VFSUtils.java:231)
at org.apache.synapse.commons.vfs.VFSUtils.acquireLock(VFSUtils.java:175)
at org.apache.synapse.transport.vfs.VFSTransportSender.acquireLockForSending(VFSTransportSender.java:396)
at org.apache.synapse.transport.vfs.VFSTransportSender.writeFile(VFSTransportSender.java:279)
at org.apache.synapse.transport.vfs.VFSTransportSender.sendMessage(VFSTransportSender.java:189)
at org.apache.axis2.transport.base.AbstractTransportSender.invoke(AbstractTransportSender.java:112)
at org.apache.axis2.engine.AxisEngine.send(AxisEngine.java:442)
at org.apache.axis2.description.OutOnlyAxisOperationClient.executeImpl(OutOnlyAxisOperation.java:297)
at org.apache.axis2.client.OperationClient.execute(OperationClient.java:149)
at org.apache.synapse.core.axis2.Axis2FlexibleMEPClient.send(Axis2FlexibleMEPClient.java:678)
at org.apache.synapse.core.axis2.Axis2Sender.sendOn(Axis2Sender.java:86)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.send(Axis2SynapseEnvironment.java:586)
at org.apache.synapse.endpoints.AbstractEndpoint.send(AbstractEndpoint.java:409)
at org.apache.synapse.endpoints.AddressEndpoint.send(AddressEndpoint.java:77)
at org.apache.synapse.mediators.builtin.SendMediator.mediate(SendMediator.java:123)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:109)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:71)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:214)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:109)
at org.apache.synapse.mediators.eip.splitter.CloneMediator.mediate(CloneMediator.java:213)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:249)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.mediateFromContinuationStateStack(Axis2SynapseEnvironment.java:820)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.injectMessage(Axis2SynapseEnvironment.java:322)
at org.apache.synapse.core.axis2.SynapseCallbackReceiver.handleMessage(SynapseCallbackReceiver.java:608)
at org.apache.synapse.core.axis2.SynapseCallbackReceiver.receive(SynapseCallbackReceiver.java:207)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
at org.apache.synapse.transport.passthru.ClientWorker.run(ClientWorker.java:298)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
WARN {VFSTransportSender} - Couldn't get the lock for the file : sftp://{sftp_path}/abc.json, retry : 1 scheduled after : 30000
(Error - 2)
ERROR {VFSUtils} - Cannot get the lock for the file : sftp://{sftp_path}/abc.json before processing org.apache.commons.vfs2.FileSystemException: Could not write to "sftp://{sftp_path}/xyz.json.lock".
at org.apache.commons.vfs2.provider.AbstractFileObject.getOutputStream(AbstractFileObject.java:1197)
at org.apache.commons.vfs2.provider.DefaultFileContent.getOutputStream(DefaultFileContent.java:413)
at org.apache.commons.vfs2.provider.DefaultFileContent.getOutputStream(DefaultFileContent.java:392)
at org.apache.synapse.commons.vfs.VFSUtils.createLockFile(VFSUtils.java:262)
at org.apache.synapse.commons.vfs.VFSUtils.acquireLock(VFSUtils.java:220)
at org.apache.synapse.commons.vfs.VFSUtils.acquireLock(VFSUtils.java:175)
at org.apache.synapse.transport.vfs.VFSTransportSender.acquireLockForSending(VFSTransportSender.java:396)
at org.apache.synapse.transport.vfs.VFSTransportSender.writeFile(VFSTransportSender.java:279)
at org.apache.synapse.transport.vfs.VFSTransportSender.sendMessage(VFSTransportSender.java:189)
at org.apache.axis2.transport.base.AbstractTransportSender.invoke(AbstractTransportSender.java:112)
at org.apache.axis2.engine.AxisEngine.send(AxisEngine.java:442)
at org.apache.axis2.description.OutOnlyAxisOperationClient.executeImpl(OutOnlyAxisOperation.java:297)
at org.apache.axis2.client.OperationClient.execute(OperationClient.java:149)
at org.apache.synapse.core.axis2.Axis2FlexibleMEPClient.send(Axis2FlexibleMEPClient.java:678)
at org.apache.synapse.core.axis2.Axis2Sender.sendOn(Axis2Sender.java:86)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.send(Axis2SynapseEnvironment.java:586)
at org.apache.synapse.endpoints.AbstractEndpoint.send(AbstractEndpoint.java:409)
at org.apache.synapse.endpoints.AddressEndpoint.send(AddressEndpoint.java:77)
at org.apache.synapse.mediators.builtin.SendMediator.mediate(SendMediator.java:123)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:109)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:71)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:214)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:109)
at org.apache.synapse.mediators.eip.splitter.CloneMediator.mediate(CloneMediator.java:213)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:249)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.mediateFromContinuationStateStack(Axis2SynapseEnvironment.java:820)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.injectMessage(Axis2SynapseEnvironment.java:322)
at org.apache.synapse.core.axis2.SynapseCallbackReceiver.handleMessage(SynapseCallbackReceiver.java:608)
at org.apache.synapse.core.axis2.SynapseCallbackReceiver.receive(SynapseCallbackReceiver.java:207)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
at org.apache.synapse.transport.passthru.ClientWorker.run(ClientWorker.java:298)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: 4:
at com.jcraft.jsch.ChannelSftp.put(ChannelSftp.java:882)
at com.jcraft.jsch.ChannelSftp.put(ChannelSftp.java:709)
at com.jcraft.jsch.ChannelSftp.put(ChannelSftp.java:703)
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.doGetOutputStream(SftpFileObject.java:519)
at org.apache.commons.vfs2.provider.AbstractFileObject.getOutputStream(AbstractFileObject.java:1193)
... 35 more
Caused by: java.io.IOException: Pipe closed
at java.base/java.io.PipedInputStream.read(PipedInputStream.java:307)
at java.base/java.io.PipedInputStream.read(PipedInputStream.java:377)
at com.jcraft.jsch.ChannelSftp.fill(ChannelSftp.java:2909)
at com.jcraft.jsch.ChannelSftp.header(ChannelSftp.java:2935)
at com.jcraft.jsch.ChannelSftp.put(ChannelSftp.java:757)
... 39 more
(Error - 3)
ERROR {VFSUtils} - Couldn't release the lock for the file : sftp://{sftp_path}/def.json after processing
ERROR {VFSTransportSender} - IO Error while creating response file : sftp://{sftp_path}/def.json org.apache.commons.vfs2.FileSystemException: Could not write to "sftp://{sftp_path}/def.json".
at org.apache.commons.vfs2.provider.AbstractFileObject.getOutputStream(AbstractFileObject.java:1197)
at org.apache.commons.vfs2.provider.DefaultFileContent.getOutputStream(DefaultFileContent.java:413)
at org.apache.synapse.transport.vfs.VFSTransportSender.populateResponseFile(VFSTransportSender.java:352)
at org.apache.synapse.transport.vfs.VFSTransportSender.writeFile(VFSTransportSender.java:280)
at org.apache.synapse.transport.vfs.VFSTransportSender.sendMessage(VFSTransportSender.java:189)
at org.apache.axis2.transport.base.AbstractTransportSender.invoke(AbstractTransportSender.java:112)
at org.apache.axis2.engine.AxisEngine.send(AxisEngine.java:442)
at org.apache.axis2.description.OutOnlyAxisOperationClient.executeImpl(OutOnlyAxisOperation.java:297)
at org.apache.axis2.client.OperationClient.execute(OperationClient.java:149)
at org.apache.synapse.core.axis2.Axis2FlexibleMEPClient.send(Axis2FlexibleMEPClient.java:678)
at org.apache.synapse.core.axis2.Axis2Sender.sendOn(Axis2Sender.java:86)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.send(Axis2SynapseEnvironment.java:586)
at org.apache.synapse.endpoints.AbstractEndpoint.send(AbstractEndpoint.java:409)
at org.apache.synapse.endpoints.AddressEndpoint.send(AddressEndpoint.java:77)
at org.apache.synapse.mediators.builtin.SendMediator.mediate(SendMediator.java:123)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:109)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:71)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:214)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:109)
at org.apache.synapse.mediators.eip.splitter.CloneMediator.mediate(CloneMediator.java:213)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:249)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.mediateFromContinuationStateStack(Axis2SynapseEnvironment.java:820)
at org.apache.synapse.core.axis2.Axis2SynapseEnvironment.injectMessage(Axis2SynapseEnvironment.java:322)
at org.apache.synapse.core.axis2.SynapseCallbackReceiver.handleMessage(SynapseCallbackReceiver.java:608)
at org.apache.synapse.core.axis2.SynapseCallbackReceiver.receive(SynapseCallbackReceiver.java:207)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
at org.apache.synapse.transport.passthru.ClientWorker.run(ClientWorker.java:298)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: 4:
at com.jcraft.jsch.ChannelSftp.put(ChannelSftp.java:882)
at com.jcraft.jsch.ChannelSftp.put(ChannelSftp.java:709)
at com.jcraft.jsch.ChannelSftp.put(ChannelSftp.java:703)
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.doGetOutputStream(SftpFileObject.java:519)
at org.apache.commons.vfs2.provider.AbstractFileObject.getOutputStream(AbstractFileObject.java:1193)
... 31 more
Caused by: java.io.IOException: Pipe closed
at java.base/java.io.PipedInputStream.read(PipedInputStream.java:307)
at java.base/java.io.PipedInputStream.read(PipedInputStream.java:377)
at com.jcraft.jsch.ChannelSftp.fill(ChannelSftp.java:2909)
at com.jcraft.jsch.ChannelSftp.header(ChannelSftp.java:2935)
at com.jcraft.jsch.ChannelSftp.put(ChannelSftp.java:757)
... 35 more
I'm reading from and writing to multiple files on a Windows SFTP server.
Can this be an issue with my SFTP server, or can I make changes to the WSO2 config files or add some property to the proxy that would fix this issue?
It looks to me like there's a permissions issue on the SFTP server. I would check those first and, if they are correct, try disabling the lock mechanism.
I'm not sure whether it works if you set transport.vfs.Locking=false in the SFTP URL; another way is to set <property name="transport.vfs.Locking" scope="transport" value="false"/> before you write to the SFTP endpoint.
https://ei.docs.wso2.com/en/latest/micro-integrator/references/synapse-properties/transport-parameters/vfs-transport-parameters/
Hope that helps.

PySpark SparkSession error when trying to write parquet files to S3 bucket: org.apache.spark.SparkException: Task failed while writing rows

I'm new to Spark and data engineering as a whole. I wrote a Spark application (on my local machine) that's meant to use Spark SQL to push Parquet files to an S3 bucket. My code fails at this point in the file:
Config:
os.environ['AWS_ACCESS_KEY_ID'] = config['AWS']['AWS_ACCESS_KEY_ID']
os.environ['AWS_SECRET_ACCESS_KEY'] = config['AWS']['AWS_SECRET_ACCESS_KEY']
spark = SparkSession.builder\
.config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.2.0,com.amazonaws:aws-java-sdk:1.12.369")\
.config('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider')\
.getOrCreate()
Part of the code that fails (where calendar_dim, etc. are Spark SQL DataFrames):
output_path = "s3a://i94immigration/"
calendar_dim.write.parquet(output_path + "calendar_dim", partitionBy=['year', 'month', 'week'], mode="overwrite")
us_demographics_dim.write.parquet(output_path + "us_demographics_dim", partitionBy='state_code', mode="overwrite")
us_airport_dim.write.parquet(output_path + "us_airport_dim", mode="overwrite")
country_dim.write.parquet(output_path + "country_dim", mode="overwrite")
immigration_fact.write.parquet(output_path + "immigration_fact", mode="overwrite")
The full error:
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o865.parquet.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:651)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:278)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:116)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:860)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:390)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:363)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:239)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:793)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 12.0 failed 1 times, most recent failure: Lost task 0.0 in stage 12.0 (TID 53) (192.168.0.22 executor driver): org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:655)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:348)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$21(FileFormatWriter.scala:256)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:772)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:329)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:36)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:155)
at org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.renewCurrentWriter(FileFormatDataWriter.scala:298)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataSingleWriter.write(FileFormatDataWriter.scala:365)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:331)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:338)
... 9 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:245)
... 42 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:655)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:348)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$21(FileFormatWriter.scala:256)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:136)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Lcom/google/common/util/concurrent/ListeningExecutorService;IZ)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:772)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)
at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:329)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:482)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420)
at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409)
at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:36)
at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:155)
at org.apache.spark.sql.execution.datasources.BaseDynamicPartitionDataWriter.renewCurrentWriter(FileFormatDataWriter.scala:298)
at org.apache.spark.sql.execution.datasources.DynamicPartitionDataSingleWriter.write(FileFormatDataWriter.scala:365)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:331)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1538)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:338)
... 9 more
I can confirm I am able to read and write simple CSV/TXT files to the S3 bucket. Also, the Parquet objects are created normally when writing to my local storage. I've done a lot of googling/reading and I'm not sure where I'm going wrong here. Would love some guidance.
It turns out there was a dependency issue with the Hadoop version I was using; the NoSuchMethodError on SemaphoredDelegatingExecutor usually means the hadoop-aws artifact does not match the Hadoop libraries on Spark's classpath. Changing the hadoop-aws version to 3.2.2 solved the issue for me.
spark = SparkSession.builder\
.config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.2.2,com.amazonaws:aws-java-sdk:1.12.369")\
.config('spark.hadoop.fs.s3a.aws.credentials.provider', 'org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider')\
.getOrCreate()

AWS Glue job failing when running on huge data

I am reading a bunch of .gz files from an S3 bucket, doing some transformations, and then writing the transformed data back to S3 in Parquet format (a rough sketch of the pipeline follows the log below). I do not get the error when executing on a smaller number of files, but when the data gets huge I get the error below. Even after changing the number of DPUs for the job execution, the error remains the same.
18/11/23 04:54:32 INFO MultipartUploadOutputStream: close closed:false
s3://path to s3 bucket/part-xxx.snappy.parquet
18/11/23 04:54:32 ERROR FileFormatWriter: Job job_xxx_0017 aborted.
18/11/23 04:54:32 ERROR Executor: Exception in task 154.1 in stage 17.0 (TID 4186)
org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:270)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:189)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:188)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:402)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:171)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
... 8 more
18/11/23 04:54:32 ERROR Utils: Aborting task
java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:402)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:171)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:189)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:188)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/11/23 04:54:32 ERROR Utils: Aborting task
java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:402)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:171)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:189)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:188)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/11/23 04:54:32 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
AWS Glue has the proper permissions to write data to the target directories.
Any help will be much appreciated.
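For reference, a rough sketch of the read/transform/write pattern described above (the bucket names, the JSON input format, and the filter are placeholders, not from the job). The failure in the log occurs while the output committer merges task output on S3 (EmrFileSystem.getFileStatus under FileOutputCommitter.mergePaths), i.e. at commit time rather than inside the transformation itself:
import org.apache.spark.sql.SparkSession

// Sketch only; paths, input format, and the transformation are assumptions.
val spark = SparkSession.builder().appName("gz-to-parquet").getOrCreate()

// .gz inputs are decompressed transparently on read
val raw = spark.read.json("s3://source-bucket/input/*.gz")

// stand-in for the real transformations
val transformed = raw.filter(raw("some_column").isNotNull)

transformed.write
  .mode("overwrite")
  .parquet("s3://target-bucket/output/")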

SolrCloud connection timeout pool

I have a SolrCloud cluster and it is throwing the following exception; it happens when Solr does something with the replicas:
null:org.apache.solr.common.SolrException: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://10.14.0.72:8080/solr/IE_big_index_shard1_0_replica2
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:407)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2033)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:652)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:460)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://10.14.0.72:8080/solr/IE_big_index_shard1_0_replica2
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:591)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:198)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
... 1 more
Caused by: org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
at org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:226)
at org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:195)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:423)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:482)
... 12 more

WSO2 BAM 2.1.0 installation

I have installed WSO2 BAM 2.1.0 in my local environment and I have redirected my logs to BAM. I can see my logs in Cassandra; however, when I try to execute service_stats on BAM I get an exception. You can see the errors below.
Do you have any suggestions?
[2013-01-22 12:33:53,659] ERROR {org.wso2.carbon.rssmanager.core.service.RSSAdmin} - Error occurred while retrieving the database list of the tenant 'carbon.super'
org.wso2.carbon.rssmanager.core.RSSManagerException: Error occurred while retrieving all databases
at org.wso2.carbon.rssmanager.core.internal.dao.RSSDAOImpl.getAllDatabases(RSSDAOImpl.java:291)
at org.wso2.carbon.rssmanager.core.internal.manager.RSSManager.getDatabases(RSSManager.java:146)
at org.wso2.carbon.rssmanager.core.service.RSSAdmin.getDatabases(RSSAdmin.java:101)
at org.wso2.carbon.rssmanager.core.service.RSSAdmin.getDatabasesForTenant(RSSAdmin.java:336)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.axis2.rpc.receivers.RPCUtil.invokeServiceClass(RPCUtil.java:212)
at org.apache.axis2.rpc.receivers.RPCMessageReceiver.invokeBusinessLogic(RPCMessageReceiver.java:117)
at org.apache.axis2.receivers.AbstractInOutMessageReceiver.invokeBusinessLogic(AbstractInOutMessageReceiver.java:40)
at org.apache.axis2.receivers.AbstractMessageReceiver.receive(AbstractMessageReceiver.java:110)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
at org.apache.axis2.transport.http.HTTPTransportUtils.processHTTPPostRequest(HTTPTransportUtils.java:172)
at org.apache.axis2.transport.http.AxisServlet.doPost(AxisServlet.java:146)
at org.wso2.carbon.core.transports.CarbonServlet.doPost(CarbonServlet.java:231)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.eclipse.equinox.http.servlet.internal.ServletRegistration.handleRequest(ServletRegistration.java:90)
at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:111)
at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:67)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
at org.wso2.carbon.tomcat.ext.servlet.DelegationServlet.service(DelegationServlet.java:68)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.wso2.carbon.tomcat.ext.filter.CharacterSetFilter.doFilter(CharacterSetFilter.java:61)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.wso2.carbon.tomcat.ext.valves.CompositeValve.invoke(CompositeValve.java:172)
at org.wso2.carbon.tomcat.ext.valves.CarbonStuckThreadDetectionValve.invoke(CarbonStuckThreadDetectionValve.java:156)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at org.wso2.carbon.tomcat.ext.valves.CarbonContextCreatorValve.invoke(CarbonContextCreatorValve.java:52)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1653)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.h2.jdbc.JdbcSQLException: Column "D.ID" not found; SQL statement:
SELECT d.ID AS DATABASE_ID, d.NAME, d.TENANT_ID, s.NAME AS RSS_INSTANCE_NAME, s.SERVER_URL, s.TENANT_ID AS RSS_INSTANCE_TENANT_ID, d.TYPE FROM RM_SERVER_INSTANCE s, RM_DATABASE d WHERE s.ID = d.RSS_INSTANCE_ID AND d.TENANT_ID = ? [42122-140]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:327)
at org.h2.message.DbException.get(DbException.java:167)
at org.h2.message.DbException.get(DbException.java:144)
at org.h2.expression.ExpressionColumn.optimize(ExpressionColumn.java:127)
at org.h2.expression.Alias.optimize(Alias.java:47)
at org.h2.command.dml.Select.prepare(Select.java:738)
at org.h2.command.Parser.prepare(Parser.java:202)
at org.h2.command.Parser.prepareCommand(Parser.java:214)
at org.h2.engine.Session.prepareLocal(Session.java:434)
at org.h2.engine.Session.prepareCommand(Session.java:384)
at org.h2.jdbc.JdbcConnection.prepareCommand(JdbcConnection.java:1071)
at org.h2.jdbc.JdbcPreparedStatement.<init>(JdbcPreparedStatement.java:71)
at org.h2.jdbc.JdbcConnection.prepareStatement(JdbcConnection.java:234)
at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:126)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:99)
at org.wso2.carbon.ndatasource.rdbms.ConnectionRollbackOnReturnInterceptor.invoke(ConnectionRollbackOnReturnInterceptor.java:51)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:99)
at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:67)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:99)
at org.apache.tomcat.jdbc.pool.interceptor.ConnectionState.invoke(ConnectionState.java:153)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:99)
at org.wso2.carbon.ndatasource.rdbms.ConnectionRollbackOnReturnInterceptor.invoke(ConnectionRollbackOnReturnInterceptor.java:51)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:99)
at org.apache.tomcat.jdbc.pool.interceptor.AbstractCreateStatementInterceptor.invoke(AbstractCreateStatementInterceptor.java:67)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:99)
at org.apache.tomcat.jdbc.pool.interceptor.ConnectionState.invoke(ConnectionState.java:153)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:99)
at org.apache.tomcat.jdbc.pool.TrapException.invoke(TrapException.java:41)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:99)
at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:63)
at $Proxy20.prepareStatement(Unknown Source)
at org.wso2.carbon.rssmanager.core.internal.dao.RSSDAOImpl.getAllDatabases(RSSDAOImpl.java:277)
... 44 more
[2013-01-22 12:33:53,674] ERROR {org.wso2.carbon.analytics.hive.multitenancy.HiveRSSMetastoreManager} - Error initializing tenant Hive meta store..
org.wso2.carbon.rssmanager.ui.stub.RSSAdminRSSManagerExceptionException: RSSAdminRSSManagerExceptionException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
at java.lang.Class.newInstance0(Class.java:372)
at java.lang.Class.newInstance(Class.java:325)
at org.wso2.carbon.rssmanager.ui.stub.RSSAdminStub.getDatabasesForTenant(RSSAdminStub.java:819)
at org.wso2.carbon.analytics.hive.multitenancy.HiveRSSMetastoreManager.getHiveMetaDatabase(HiveRSSMetastoreManager.java:259)
at org.wso2.carbon.analytics.hive.multitenancy.HiveRSSMetastoreManager.getRSSMetaStore(HiveRSSMetastoreManager.java:236)
at org.wso2.carbon.analytics.hive.multitenancy.HiveRSSMetastoreManager.prepareRSSMetaStore(HiveRSSMetastoreManager.java:212)
at org.wso2.carbon.hive.data.source.access.util.DataSourceAccessUtil.getMetaStoreConnectionURL(DataSourceAccessUtil.java:85)
at org.wso2.carbon.hive.data.source.access.util.DataSourceAccessUtil.getJdoConnectionUrl(DataSourceAccessUtil.java:98)
at org.apache.hadoop.hive.metastore.HiveContext.getCurrentContext(HiveContext.java:104)
at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:63)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
at java.sql.DriverManager.getConnection(DriverManager.java:620)
at java.sql.DriverManager.getConnection(DriverManager.java:200)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:234)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:217)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
[2013-01-22 12:33:53,678] ERROR {org.wso2.carbon.analytics.hive.multitenancy.HiveRSSMetastoreManager} - Error while retrieving setting the hive meta store for tenant domain:carbon.super
[2013-01-22 12:33:53,680] ERROR {org.apache.hadoop.hive.metastore.HiveContext} - Unable to fetch the JDO Connection URL for meta store..
java.lang.NullPointerException
at org.wso2.carbon.analytics.hive.multitenancy.HiveRSSMetastoreManager.getMetaDataStoreConnectionURL(HiveRSSMetastoreManager.java:275)
at org.wso2.carbon.hive.data.source.access.util.DataSourceAccessUtil.getMetaStoreConnectionURL(DataSourceAccessUtil.java:89)
at org.wso2.carbon.hive.data.source.access.util.DataSourceAccessUtil.getJdoConnectionUrl(DataSourceAccessUtil.java:98)
at org.apache.hadoop.hive.metastore.HiveContext.getCurrentContext(HiveContext.java:104)
at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:63)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
at java.sql.DriverManager.getConnection(DriverManager.java:620)
at java.sql.DriverManager.getConnection(DriverManager.java:200)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:234)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:217)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
[2013-01-22 12:33:53,682] ERROR {org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl} - Error during query execution..
java.util.concurrent.ExecutionException: java.lang.RuntimeException: Unable to fetch the JDO Connection URL for meta store..
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl.execute(HiveExecutorServiceImpl.java:91)
at org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask.execute(HiveScriptExecutorTask.java:60)
at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:56)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.RuntimeException: Unable to fetch the JDO Connection URL for meta store..
at org.apache.hadoop.hive.metastore.HiveContext.handleException(HiveContext.java:265)
at org.apache.hadoop.hive.metastore.HiveContext.getCurrentContext(HiveContext.java:113)
at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:63)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
at java.sql.DriverManager.getConnection(DriverManager.java:620)
at java.sql.DriverManager.getConnection(DriverManager.java:200)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:234)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:217)
... 5 more
Caused by: java.lang.NullPointerException
at org.wso2.carbon.analytics.hive.multitenancy.HiveRSSMetastoreManager.getMetaDataStoreConnectionURL(HiveRSSMetastoreManager.java:275)
at org.wso2.carbon.hive.data.source.access.util.DataSourceAccessUtil.getMetaStoreConnectionURL(DataSourceAccessUtil.java:89)
at org.wso2.carbon.hive.data.source.access.util.DataSourceAccessUtil.getJdoConnectionUrl(DataSourceAccessUtil.java:98)
at org.apache.hadoop.hive.metastore.HiveContext.getCurrentContext(HiveContext.java:104)
... 11 more
[2013-01-22 12:33:53,684] ERROR {org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask} - Error while executing script : service_stats_336
org.wso2.carbon.analytics.hive.exception.HiveExecutionException: Error during query execution..
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl.execute(HiveExecutorServiceImpl.java:97)
at org.wso2.carbon.analytics.hive.task.HiveScriptExecutorTask.execute(HiveScriptExecutorTask.java:60)
at org.wso2.carbon.ntask.core.impl.TaskQuartzJobAdapter.execute(TaskQuartzJobAdapter.java:56)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Unable to fetch the JDO Connection URL for meta store..
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl.execute(HiveExecutorServiceImpl.java:91)
... 9 more
Caused by: java.lang.RuntimeException: Unable to fetch the JDO Connection URL for meta store..
at org.apache.hadoop.hive.metastore.HiveContext.handleException(HiveContext.java:265)
at org.apache.hadoop.hive.metastore.HiveContext.getCurrentContext(HiveContext.java:113)
at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:63)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
at java.sql.DriverManager.getConnection(DriverManager.java:620)
at java.sql.DriverManager.getConnection(DriverManager.java:200)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:234)
at org.wso2.carbon.analytics.hive.impl.HiveExecutorServiceImpl$ScriptCallable.call(HiveExecutorServiceImpl.java:217)
... 5 more
Caused by: java.lang.NullPointerException
at org.wso2.carbon.analytics.hive.multitenancy.HiveRSSMetastoreManager.getMetaDataStoreConnectionURL(HiveRSSMetastoreManager.java:275)
at org.wso2.carbon.hive.data.source.access.util.DataSourceAccessUtil.getMetaStoreConnectionURL(DataSourceAccessUtil.java:89)
at org.wso2.carbon.hive.data.source.access.util.DataSourceAccessUtil.getJdoConnectionUrl(DataSourceAccessUtil.java:98)
at org.apache.hadoop.hive.metastore.HiveContext.getCurrentContext(HiveContext.java:104)
... 11 more
Can you please post the complete exception trace? Have you added tenants to BAM, and are you running in tenant mode? What is the port offset you are using here?
You may need to update $BAM_HOME/repository/conf/advanced/hive-rss-config.xml with the correct details, as explained below:
rss-server-url - you can use the same BAM server, an external BAM server, or a Storage Server for this. Please put the correct server you are going to use for RSS. If you are just running in standalone mode, then https://127.0.0.1:9443+<port-offset>/ will work.
rss-server-admin-userName - the username of a SuperAdmin/superadmin user with all privileges.
rss-server-admin-password - the password of that SuperAdmin/superadmin user.
Please restart the server for the changes to take effect.