How to store a Spark pair RDD as a file in HDFS? - hdfs

Hi, I created a Kafka topic with 3 partitions and 2 replicas. I publish messages/records from Kafka to Spark Streaming (for some processing) and then store the data in HDFS. I tried to store the pair RDD as a text file, but it is not working.
This code is not working:
JavaPairInputDStream<String, String> directKafkaStream = KafkaUtils
        .createDirectStream(ssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class, kafkaParams,
                topics);
directKafkaStream.foreachRDD(rdd -> {
    // skip empty micro-batches so we don't write empty output directories
    if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(path);
    }
});
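One pitfall worth noting, independent of the error below: saveAsTextFile fails with a FileAlreadyExistsException if the target directory already exists, so calling it once per micro-batch against a fixed path only works for the first batch. A minimal sketch, assuming the same directKafkaStream and base path as above, that gives each batch its own output directory:

directKafkaStream.foreachRDD((rdd, time) -> {
    if (!rdd.isEmpty()) {
        // write each micro-batch under its own directory, e.g. <path>/batch-1483979140000
        rdd.saveAsTextFile(path + "/batch-" + time.milliseconds());
    }
});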
console output:
17/01/09 17:25:39 INFO KafkaRDD: Computing topic filebeat, partition 1 offsets 20 -> 32
17/01/09 17:25:39 INFO VerifiableProperties: Verifying properties
17/01/09 17:25:39 INFO VerifiableProperties: Property group.id is overridden to
17/01/09 17:25:39 INFO VerifiableProperties: Property zookeeper.connect is overridden to localhost:2181
17/01/09 17:25:39 INFO KafkaRDD: Computing topic filebeat, partition 0 offsets 22 -> 34
17/01/09 17:25:39 INFO VerifiableProperties: Verifying properties
17/01/09 17:25:39 INFO VerifiableProperties: Property group.id is overridden to
17/01/09 17:25:39 INFO VerifiableProperties: Property zookeeper.connect is overridden to localhost:2181
17/01/09 17:25:40 INFO JobScheduler: Added jobs for time 1483979140000 ms
17/01/09 17:25:40 ERROR Utils: Aborting task
java.lang.NoClassDefFoundError: org/apache/kafka/common/message/KafkaLZ4BlockOutputStream
at kafka.message.ByteBufferMessageSet$.decompress(ByteBufferMessageSet.scala:65)
at kafka.message.ByteBufferMessageSet$$anon$1.makeNextOuter(ByteBufferMessageSet.scala:179)
at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:192)
at kafka.message.ByteBufferMessageSet$$anon$1.makeNext(ByteBufferMessageSet.scala:146)
at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:764)
at org.apache.spark.streaming.kafka.KafkaRDD$KafkaRDDIterator.getNext(KafkaRDD.scala:211)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1203)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1325)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1211)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1190)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.message.KafkaLZ4BlockOutputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 22 more
17/01/09 17:25:40 ERROR Utils: Aborting task
In fact, my pom.xml contains:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.9.0.0</version>
</dependency>
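The NoClassDefFoundError above points at a client version mismatch rather than at the save itself: spark-streaming-kafka-0-8 is built against Kafka 0.8.2.x, where kafka.message.ByteBufferMessageSet references org.apache.kafka.common.message.KafkaLZ4BlockOutputStream, a class the 0.9.0.0 kafka-clients jar no longer provides under that package. A minimal sketch of the likely fix, assuming nothing else in the build needs the 0.9 client, is to align kafka-clients with the 0.8 integration:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-8_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <!-- matches the Kafka version spark-streaming-kafka-0-8 was built against -->
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.8.2.1</version>
</dependency>

Alternatively, dropping the explicit kafka-clients entry altogether lets Maven resolve the version transitively from spark-streaming-kafka-0-8.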

Related

Unable to create indexes or communicate with AWS OpenSearch using a Spring Boot application

I want to integrate AWS Elasticsearch (OpenSearch) with my Spring Boot application. For this, I followed the tutorial at https://medium.com/@prasanth_rajendran/how-to-integrate-spring-boot-elasticsearch-data-with-aws-445e6fc72998 with slight changes.
The error that I am getting is "Elasticsearch version 6 or more is required", and I am not able to fix it.
Here are the code snippets from my application
The dependencies I used
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-elasticsearch</artifactId>
    <version>1.12.244</version>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    <version>2.7.0</version>
</dependency>
<dependency>
    <groupId>io.awspring.cloud</groupId>
    <artifactId>spring-cloud-starter-aws</artifactId>
    <version>2.4.1</version>
</dependency>
This is the parent pom dependency
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.7.0</version>
    <relativePath/> <!-- lookup parent from repository -->
</parent>
The application.yml looks like this
server:
  port: 8087
spring:
  application:
    name: AWS ELK Demo
management:
  health:
    elasticsearch:
      enabled: false
cloud:
  aws:
    credentials:
      access-key: my-access-key
      secret-key: my-secret-key
    region:
      static: eu-central-1
Configuration class for Rest High-Level Client
@Configuration
@EnableElasticsearchRepositories(basePackages = "com.example.awselkdemo.repository")
public class ElasticSearchRestClientConfiguration {

    @Bean
    public RestHighLevelClient client() {
        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials("myuser", "myuser-password")
        );
        RestClientBuilder builder =
                RestClient.builder(httpHost()).setHttpClientConfigCallback(
                        httpClientBuilder -> httpClientBuilder
                                .setDefaultCredentialsProvider(credentialsProvider)
                );
        return new RestHighLevelClient(builder);
    }

    @Bean
    public ElasticsearchOperations elasticsearchTemplate() {
        return new ElasticsearchRestTemplate(client());
    }

    private HttpHost httpHost() {
        return new HttpHost(
                "search-my-test-cluster-jashsah67ashas.eu-central-1.es.amazonaws.com",
                443,
                "https"
        );
    }
}
And the exception I get is
2022-06-22 09:40:07.726 INFO 1299 --- [ main] o.apache.catalina.core.StandardService : Stopping service [Tomcat]
2022-06-22 09:40:07.739 INFO 1299 --- [ main] ConditionEvaluationReportLoggingListener :
Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2022-06-22 09:40:07.757 ERROR 1299 --- [ main] o.s.boot.SpringApplication : Application run failed
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'photoRepository' defined in com.bosch.awselkdemo.repository.PhotoRepository defined in @EnableElasticsearchRepositories declared on ElasticSearchRestClientConfiguration: Invocation of init method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.data.elasticsearch.repository.support.SimpleElasticsearchRepository]: Constructor threw exception; nested exception is org.springframework.data.elasticsearch.UncategorizedElasticsearchException: Elasticsearch version 6 or more is required; nested exception is ElasticsearchException[Elasticsearch version 6 or more is required]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1804) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:620) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:542) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:335) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:234) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:333) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:934) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918) ~[spring-context-5.3.20.jar:5.3.20]
at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:583) ~[spring-context-5.3.20.jar:5.3.20]
at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:147) ~[spring-boot-2.7.0.jar:2.7.0]
at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:734) ~[spring-boot-2.7.0.jar:2.7.0]
at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:408) ~[spring-boot-2.7.0.jar:2.7.0]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:308) ~[spring-boot-2.7.0.jar:2.7.0]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1306) ~[spring-boot-2.7.0.jar:2.7.0]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1295) ~[spring-boot-2.7.0.jar:2.7.0]
at com.bosch.awselkdemo.AwsElkDemoApplication.main(AwsElkDemoApplication.java:10) ~[classes/:na]
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [org.springframework.data.elasticsearch.repository.support.SimpleElasticsearchRepository]: Constructor threw exception; nested exception is org.springframework.data.elasticsearch.UncategorizedElasticsearchException: Elasticsearch version 6 or more is required; nested exception is ElasticsearchException[Elasticsearch version 6 or more is required]
at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:224) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.data.repository.core.support.RepositoryFactorySupport.lambda$instantiateClass$5(RepositoryFactorySupport.java:578) ~[spring-data-commons-2.7.0.jar:2.7.0]
at java.base/java.util.Optional.map(Optional.java:265) ~[na:na]
at org.springframework.data.repository.core.support.RepositoryFactorySupport.instantiateClass(RepositoryFactorySupport.java:578) ~[spring-data-commons-2.7.0.jar:2.7.0]
at org.springframework.data.repository.core.support.RepositoryFactorySupport.getTargetRepositoryViaReflection(RepositoryFactorySupport.java:543) ~[spring-data-commons-2.7.0.jar:2.7.0]
at org.springframework.data.elasticsearch.repository.support.ElasticsearchRepositoryFactory.getTargetRepository(ElasticsearchRepositoryFactory.java:74) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
at org.springframework.data.repository.core.support.RepositoryFactorySupport.getRepository(RepositoryFactorySupport.java:324) ~[spring-data-commons-2.7.0.jar:2.7.0]
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.lambda$afterPropertiesSet$5(RepositoryFactoryBeanSupport.java:322) ~[spring-data-commons-2.7.0.jar:2.7.0]
at org.springframework.data.util.Lazy.getNullable(Lazy.java:230) ~[spring-data-commons-2.7.0.jar:2.7.0]
at org.springframework.data.util.Lazy.get(Lazy.java:114) ~[spring-data-commons-2.7.0.jar:2.7.0]
at org.springframework.data.repository.core.support.RepositoryFactoryBeanSupport.afterPropertiesSet(RepositoryFactoryBeanSupport.java:328) ~[spring-data-commons-2.7.0.jar:2.7.0]
at org.springframework.data.elasticsearch.repository.support.ElasticsearchRepositoryFactoryBean.afterPropertiesSet(ElasticsearchRepositoryFactoryBean.java:69) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1863) ~[spring-beans-5.3.20.jar:5.3.20]
at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1800) ~[spring-beans-5.3.20.jar:5.3.20]
... 16 common frames omitted
Caused by: org.springframework.data.elasticsearch.UncategorizedElasticsearchException: Elasticsearch version 6 or more is required; nested exception is ElasticsearchException[Elasticsearch version 6 or more is required]
at org.springframework.data.elasticsearch.core.ElasticsearchExceptionTranslator.translateExceptionIfPossible(ElasticsearchExceptionTranslator.java:72) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
at org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate.translateException(ElasticsearchRestTemplate.java:601) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
at org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate.execute(ElasticsearchRestTemplate.java:584) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
at org.springframework.data.elasticsearch.core.RestIndexTemplate.doExists(RestIndexTemplate.java:106) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
at org.springframework.data.elasticsearch.core.AbstractIndexTemplate.exists(AbstractIndexTemplate.java:130) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
at org.springframework.data.elasticsearch.repository.support.SimpleElasticsearchRepository.<init>(SimpleElasticsearchRepository.java:87) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:na]
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:na]
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:na]
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) ~[na:na]
at org.springframework.beans.BeanUtils.instantiateClass(BeanUtils.java:211) ~[spring-beans-5.3.20.jar:5.3.20]
... 29 common frames omitted
Caused by: org.elasticsearch.ElasticsearchException: Elasticsearch version 6 or more is required
at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2701) ~[elasticsearch-rest-high-level-client-7.17.3.jar:7.17.3]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:2171) ~[elasticsearch-rest-high-level-client-7.17.3.jar:7.17.3]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:2154) ~[elasticsearch-rest-high-level-client-7.17.3.jar:7.17.3]
at org.elasticsearch.client.IndicesClient.exists(IndicesClient.java:1279) ~[elasticsearch-rest-high-level-client-7.17.3.jar:7.17.3]
at org.springframework.data.elasticsearch.core.RestIndexTemplate.lambda$doExists$2(RestIndexTemplate.java:106) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
at org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate.execute(ElasticsearchRestTemplate.java:582) ~[spring-data-elasticsearch-4.4.0.jar:4.4.0]
... 37 common frames omitted
Hitting the domain endpoint at AWS gives me this information
{
    "name": "5a8d2c313919cc9hjh3287ehje872",
    "cluster_name": "3873287387:my-test-cluster",
    "cluster_uuid": "shas_asjasihhjdjhq",
    "version": {
        "distribution": "opensearch",
        "number": "1.2.4",
        "build_type": "tar",
        "build_hash": "unknown",
        "build_date": "2022-04-18T07:00:42.948341Z",
        "build_snapshot": false,
        "lucene_version": "8.10.1",
        "minimum_wire_compatibility_version": "6.8.0",
        "minimum_index_compatibility_version": "6.0.0-beta1"
    },
    "tagline": "The OpenSearch Project: https://opensearch.org/"
}
Looking forward to some suggestions.
Thanks
I had exactly the same problem. I replaced all my org.elasticsearch libraries with the new libraries from org.opensearch, and then it worked!
<!-- https://mvnrepository.com/artifact/org.opensearch.client/opensearch-rest-high-level-client -->
<dependency>
    <groupId>org.opensearch.client</groupId>
    <artifactId>opensearch-rest-high-level-client</artifactId>
    <version>1.2.4</version>
</dependency>
You can read Migrating to the OpenSearch Java high-level REST client for more information.
https://opensearch.org/docs/latest/clients/java-rest-high-level/
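As a minimal sketch of what that swap looks like, assuming the OpenSearch 1.x high-level client (whose API mirrors the Elasticsearch 7.10 client it was forked from), the client construction stays essentially the same and mostly just changes imports:

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
// org.opensearch.client.* replaces the org.elasticsearch.client.* equivalents
import org.opensearch.client.RestClient;
import org.opensearch.client.RestClientBuilder;
import org.opensearch.client.RestHighLevelClient;

public class OpenSearchClientFactory {

    // Builds a high-level client for a hypothetical HTTPS endpoint with basic auth
    public static RestHighLevelClient build(String host, String user, String password) {
        final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials(user, password));
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, 443, "https"))
                .setHttpClientConfigCallback(httpClientBuilder ->
                        httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider));
        return new RestHighLevelClient(builder);
    }
}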

How to use Google Cloud Storage for checkpoint location in streaming query?

I'm trying to run a Spark Structured Streaming job and save checkpoints to Google Cloud Storage. I have a couple of jobs; one without aggregation works perfectly, but the second one, with aggregations, throws an exception. I found that some people have similar issues with checkpointing on S3 because S3 doesn't support read-after-write semantics (https://blog.yuvalitzchakov.com/improving-spark-streaming-checkpoint-performance-with-aws-efs/), but GCS does, so everything should be OK. I would be glad if anybody would share their experience with checkpointing.
val writeToKafka = stream.writeStream
  .format("kafka")
  .trigger(ProcessingTime(5000))
  .option("kafka.bootstrap.servers", "localhost:29092")
  .option("topic", "test_topic")
  .option("checkpointLocation", "gs://test/check_test/Job1")
  .start()
[Executor task launch worker for task 1] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka version : 2.0.0
[Executor task launch worker for task 1] INFO org.apache.kafka.common.utils.AppInfoParser - Kafka commitId : 3402a8361b734732
[Executor task launch worker for task 1] INFO org.apache.spark.sql.execution.datasources.v2.DataWritingSparkTask - Committed partition 0 (task 1, attempt 0stage 1.0)
[Executor task launch worker for task 1] INFO org.apache.spark.sql.execution.streaming.CheckpointFileManager - Writing atomically to gs://test/check_test/Job1/state/0/0/1.delta using temp file gs://test/check_test/Job1/state/0/0/.1.delta.8a93d644-0d8e-4cb9-82b5-6418b9e63ffd.TID1.tmp
[Executor task launch worker for task 1] ERROR org.apache.spark.TaskContextImpl - Error in TaskCompletionListener
java.lang.NullPointerException
at com.google.cloud.hadoop.fs.gcs.GoogleHadoopOutputStream.write(GoogleHadoopOutputStream.java:114)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at net.jpountz.lz4.LZ4BlockOutputStream.finish(LZ4BlockOutputStream.java:261)
at net.jpountz.lz4.LZ4BlockOutputStream.close(LZ4BlockOutputStream.java:193)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:303)
at org.apache.commons.io.IOUtils.closeQuietly(IOUtils.java:274)
at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider.org$apache$spark$sql$execution$streaming$state$HDFSBackedStateStoreProvider$$cancelDeltaFile(HDFSBackedStateStoreProvider.scala:508)
at org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider$HDFSBackedStateStore.abort(HDFSBackedStateStoreProvider.scala:150)
at org.apache.spark.sql.execution.streaming.state.package$StateStoreOps$$anonfun$1$$anonfun$apply$1.apply(package.scala:65)
at org.apache.spark.sql.execution.streaming.state.package$StateStoreOps$$anonfun$1$$anonfun$apply$1.apply(package.scala:64)
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:131)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:117)
at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130)
at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:137)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[Executor task launch worker for task 1] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 1.0 (TID 1)
org.apache.spark.util.TaskCompletionListenerException: null
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:137)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[task-result-getter-1] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.spark.util.TaskCompletionListenerException: null
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:137)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[task-result-getter-1] ERROR org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 1.0 failed 1 times; aborting job
[task-result-getter-1] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Removed TaskSet 1.0, whose tasks have all completed, from pool
[dag-scheduler-event-loop] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Cancelling stage 1
[dag-scheduler-event-loop] INFO org.apache.spark.scheduler.TaskSchedulerImpl - Killing all running tasks in stage 1: Stage cancelled
[dag-scheduler-event-loop] INFO org.apache.spark.scheduler.DAGScheduler - ResultStage 1 (start at Job1.scala:53) failed in 9.863 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.spark.util.TaskCompletionListenerException: null
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:137)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
[stream execution thread for [id = f130d772-fc9e-4b0f-a81e-942af0741ae9, runId = 7dc1cb33-c5f2-4ebe-8707-251de2503ee1]] INFO org.apache.spark.scheduler.DAGScheduler - Job 0 failed: start at Job1.scala:53, took 20.926657 s
[stream execution thread for [id = f130d772-fc9e-4b0f-a81e-942af0741ae9, runId = 7dc1cb33-c5f2-4ebe-8707-251de2503ee1]] ERROR org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec - Data source writer org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@228cec9e is aborting.
[stream execution thread for [id = f130d772-fc9e-4b0f-a81e-942af0741ae9, runId = 7dc1cb33-c5f2-4ebe-8707-251de2503ee1]] ERROR org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec - Data source writer org.apache.spark.sql.execution.streaming.sources.MicroBatchWriter@228cec9e aborted.
[stream execution thread for [id = f130d772-fc9e-4b0f-a81e-942af0741ae9, runId = 7dc1cb33-c5f2-4ebe-8707-251de2503ee1]] ERROR org.apache.spark.sql.execution.streaming.MicroBatchExecution - Query [id = f130d772-fc9e-4b0f-a81e-942af0741ae9, runId = 7dc1cb33-c5f2-4ebe-8707-251de2503ee1] terminated with error
org.apache.spark.SparkException: Writing job aborted.
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3384)
at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2783)
at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2783)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2783)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:532)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:531)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.spark.util.TaskCompletionListenerException: null
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:137)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:64)
... 35 more
Caused by: org.apache.spark.util.TaskCompletionListenerException: null
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:137)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" org.apache.spark.sql.streaming.StreamingQueryException: Writing job aborted.
=== Streaming Query ===
Identifier: [id = f130d772-fc9e-4b0f-a81e-942af0741ae9, runId = 7dc1cb33-c5f2-4ebe-8707-251de2503ee1]
Current Committed Offsets: {}
Current Available Offsets: {KafkaV2[Subscribe[NormalizedEvents]]: {"NormalizedEvents":{"0":46564}}}
Current State: ACTIVE
Thread State: RUNNABLE
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: org.apache.spark.SparkException: Writing job aborted.
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3384)
at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2783)
at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2783)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2783)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:532)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:531)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
... 1 more
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.apache.spark.util.TaskCompletionListenerException: null
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:137)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1887)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1875)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1874)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1874)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2108)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2057)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2046)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:64)
... 35 more
Caused by: org.apache.spark.util.TaskCompletionListenerException: null
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:137)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[Thread-1] INFO org.apache.spark.SparkContext - Invoking stop() from shutdown hook
[Thread-1] INFO org.spark_project.jetty.server.AbstractConnector - Stopped Spark@1ce93c18{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
[Thread-1] INFO org.apache.spark.ui.SparkUI - Stopped Spark web UI at http://10.25.12.222:4041
[dispatcher-event-loop-0] INFO org.apache.spark.MapOutputTrackerMasterEndpoint - MapOutputTrackerMasterEndpoint stopped!
[Thread-1] INFO org.apache.spark.storage.memory.MemoryStore - MemoryStore cleared
[Thread-1] INFO org.apache.spark.storage.BlockManager - BlockManager stopped
[Thread-1] INFO org.apache.spark.storage.BlockManagerMaster - BlockManagerMaster stopped
[dispatcher-event-loop-1] INFO org.apache.spark.scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint - OutputCommitCoordinator stopped!
[Thread-1] INFO org.apache.spark.SparkContext - Successfully stopped SparkContext
[Thread-1] INFO org.apache.spark.util.ShutdownHookManager - Shutdown hook called
[Thread-1] INFO org.apache.spark.util.ShutdownHookManager - Deleting directory /private/var/folders/_t/7m21x7313gs74_yfv4txsr69b8yh87/T/temporaryReader-75fdf46f-7de0-4ca7-9c77-8bd034e4f5a3
[Thread-1] INFO org.apache.spark.util.ShutdownHookManager - Deleting directory /private/var/folders/_t/7m21x7313gs74_yfv4txsr69b8yh87/T/spark-bde783f1-fa66-420f-87e7-5c1895ab7ccc
Checkpointing of Spark Streaming jobs to Google Cloud Storage was fixed. The fix will be included in the GCS connector 2.1.4 and 2.2.0 releases.
You cannot use GCS as a checkpoint store if you do aggregations in your stream, at least with connector version 2.1.3 (Hadoop 2). It's perfectly fine if your transforms don't include any groupBy, but if they do, you should save your checkpoints to HDFS or something else.
I got the same issue trying to write a stream to GCS with Spark 2.4.4. There is no problem using GCS as the write-stream sink, but I got the same NullPointerException when using GCS as the checkpoint location. As I am running Spark on Google Dataproc, I can use the HDFS capabilities of the Dataproc nodes.
I had to port code from a private cloud to GCS. After some experimenting, these are the changes that I made in order to run the code:
For GCS I set up a dual-region bucket and configured a retention policy for it (I know it's weird, but I found this worked for me), though I set it to only one day. You can set up a lifecycle policy as well if you want.
I used OutputMode.Append instead of Update.
I replaced agg with the flatMapGroupsWithState function.
For example, here is the sample code:
events.withWatermark(eventTime = "timestamp", delayThreshold = configs(waterMarkConst))
  .groupBy("timestamp", "name").agg(expr("sum(count) as cnt"))
  .select("timestamp", "name", "cnt").toDF().as[(Timestamp, String, Double)]
  .map(record => M(record._2, record._3, record._1))
which was replaced by the following code:
events.withWatermark(eventTime = "timestamp", delayThreshold = configs(waterMarkConst))
  .groupByKey(m => m._1 + "." + m._2)
  .flatMapGroupsWithState(OutputMode.Append(), GroupStateTimeout.EventTimeTimeout())(updateSentMetricsAggregatedState)

WSO2 - Problems while trying to START WSO2

Following is the error when I try to start the WSO2 server (on Ubuntu). It's a SQLite database. Looking at the error, it seems to be DB related. What could have gone wrong? Kindly give me some guidelines on how I can fix it:
testteam@gmcs:~/gauri/wso2am-2.1.0/bin$ bash wso2server.sh
JAVA_HOME environment variable is set to /usr/lib/jvm/java-8-oracle
CARBON_HOME environment variable is set to /home/testteam/gauri/wso2am-2.1.0
Using Java memory options: -Xms256m -Xmx1024m
[2017-08-21 11:50:23,345] INFO - QpidBundleActivator Setting BundleContext in PluginManager
[2017-08-21 11:50:25,532] INFO - CarbonCoreActivator Starting WSO2 Carbon...
[2017-08-21 11:50:25,532] INFO - CarbonCoreActivator Operating System : Linux 4.10.0-32-generic, amd64
[2017-08-21 11:50:25,532] INFO - CarbonCoreActivator Java Home : /usr/lib/jvm/java-8-oracle/jre
[2017-08-21 11:50:25,532] INFO - CarbonCoreActivator Java Version : 1.8.0_131
[2017-08-21 11:50:25,533] INFO - CarbonCoreActivator Java VM : Java HotSpot(TM) 64-Bit Server VM 25.131-b11,Oracle Corporation
[2017-08-21 11:50:25,533] INFO - CarbonCoreActivator Carbon Home : /home/testteam/gauri/wso2am-2.1.0
[2017-08-21 11:50:25,533] INFO - CarbonCoreActivator Java Temp Dir : /home/testteam/gauri/wso2am-2.1.0/tmp
[2017-08-21 11:50:25,533] INFO - CarbonCoreActivator User : testteam, en-IN, Asia/Kolkata
[2017-08-21 11:50:25,839] WARN - ValidationResultPrinter Carbon is configured to use the default keystore (wso2carbon.jks). To maximize security when deploying to a production environment, configure a new keystore with a unique password in the production server profile.
[2017-08-21 11:50:26,229] INFO - KafkaEventAdapterServiceDS Successfully deployed the Kafka output event adaptor service
[2017-08-21 11:50:26,370] INFO - ManagementModeConfigurationLoader CEP started in Single node mode
[2017-08-21 11:50:26,458] INFO - TemplateDeployerServiceTrackerDS Successfully deployed the execution manager tracker service
[2017-08-21 11:50:30,147] ERROR - DefaultRealm nullType class java.lang.reflect.InvocationTargetException
org.wso2.carbon.user.core.UserStoreException: nullType class java.lang.reflect.InvocationTargetException
at org.wso2.carbon.user.core.common.DefaultRealm.createObjectWithOptions(DefaultRealm.java:401)
at org.wso2.carbon.user.core.common.DefaultRealm.initializeObjects(DefaultRealm.java:222)
at org.wso2.carbon.user.core.common.DefaultRealm.init(DefaultRealm.java:127)
at org.wso2.carbon.user.core.common.DefaultRealmService.initializeRealm(DefaultRealmService.java:263)
at org.wso2.carbon.user.core.common.DefaultRealmService.<init>(DefaultRealmService.java:100)
at org.wso2.carbon.user.core.common.DefaultRealmService.<init>(DefaultRealmService.java:113)
at org.wso2.carbon.user.core.internal.Activator.startDeploy(Activator.java:68)
at org.wso2.carbon.user.core.internal.BundleCheckActivator.start(BundleCheckActivator.java:61)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl$1.run(BundleContextImpl.java:711)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl.startActivator(BundleContextImpl.java:702)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl.start(BundleContextImpl.java:683)
at org.eclipse.osgi.framework.internal.core.BundleHost.startWorker(BundleHost.java:381)
at org.eclipse.osgi.framework.internal.core.AbstractBundle.resume(AbstractBundle.java:390)
at org.eclipse.osgi.framework.internal.core.Framework.resumeBundle(Framework.java:1176)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.resumeBundles(StartLevelManager.java:559)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.resumeBundles(StartLevelManager.java:544)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.incFWSL(StartLevelManager.java:457)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.doSetStartLevel(StartLevelManager.java:243)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.dispatchEvent(StartLevelManager.java:438)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.dispatchEvent(StartLevelManager.java:1)
at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:230)
at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:340)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.wso2.carbon.user.core.common.DefaultRealm.createObjectWithOptions(DefaultRealm.java:355)
... 22 more
Caused by: org.wso2.carbon.user.core.UserStoreException: Error occurred while checking is existing domain : PRIMARY for tenant : -1234
at org.wso2.carbon.user.core.util.UserCoreUtil.persistDomain(UserCoreUtil.java:839)
at org.wso2.carbon.user.core.common.AbstractUserStoreManager.persistDomain(AbstractUserStoreManager.java:4064)
at org.wso2.carbon.user.core.jdbc.JDBCUserStoreManager.<init>(JDBCUserStoreManager.java:280)
at org.wso2.carbon.user.core.jdbc.JDBCUserStoreManager.<init>(JDBCUserStoreManager.java:222)
... 27 more
Caused by: org.wso2.carbon.user.core.UserStoreException: DB error occurred while checking is existing domain : PRIMARY & tenant id : -1234
at org.wso2.carbon.user.core.util.UserCoreUtil.isExistingDomain(UserCoreUtil.java:991)
at org.wso2.carbon.user.core.util.UserCoreUtil.persistDomain(UserCoreUtil.java:827)
... 30 more
Caused by: org.h2.jdbc.JdbcSQLException: General error: "java.lang.RuntimeException: rowCount expected 7545 got 13 T90.I92" [50000-175]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:332)
at org.h2.message.DbException.get(DbException.java:161)
at org.h2.message.DbException.convert(DbException.java:284)
at org.h2.table.RegularTable.addRow(RegularTable.java:137)
at org.h2.store.PageStore.redo(PageStore.java:1535)
at org.h2.store.PageLog.recover(PageLog.java:318)
at org.h2.store.PageStore.recover(PageStore.java:1371)
at org.h2.store.PageStore.openExisting(PageStore.java:361)
at org.h2.store.PageStore.open(PageStore.java:285)
at org.h2.engine.Database.getPageStore(Database.java:2298)
at org.h2.engine.Database.open(Database.java:626)
at org.h2.engine.Database.openDatabase(Database.java:244)
at org.h2.engine.Database.<init>(Database.java:239)
at org.h2.engine.Engine.openSession(Engine.java:56)
at org.h2.engine.Engine.openSession(Engine.java:160)
at org.h2.engine.Engine.createSessionAndValidate(Engine.java:139)
at org.h2.engine.Engine.createSession(Engine.java:122)
at org.h2.engine.Engine.createSession(Engine.java:28)
at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:323)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:105)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:90)
at org.h2.Driver.connect(Driver.java:73)
at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:278)
at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:182)
at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:701)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:635)
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:188)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:127)
at org.wso2.carbon.user.core.util.DatabaseUtil.getDBConnection(DatabaseUtil.java:565)
at org.wso2.carbon.user.core.util.UserCoreUtil.isExistingDomain(UserCoreUtil.java:973)
... 31 more
Caused by: java.lang.RuntimeException: rowCount expected 7545 got 13 T90.I92
at org.h2.message.DbException.throwInternalError(DbException.java:231)
at org.h2.table.RegularTable.checkRowCount(RegularTable.java:168)
at org.h2.table.RegularTable.addRow(RegularTable.java:120)
... 57 more
[2017-08-21 11:50:30,165] ERROR - Activator Cannot start User Manager Core bundle
org.wso2.carbon.user.core.UserStoreException: Cannot initialize the realm.
at org.wso2.carbon.user.core.common.DefaultRealmService.initializeRealm(DefaultRealmService.java:273)
at org.wso2.carbon.user.core.common.DefaultRealmService.<init>(DefaultRealmService.java:100)
at org.wso2.carbon.user.core.common.DefaultRealmService.<init>(DefaultRealmService.java:113)
at org.wso2.carbon.user.core.internal.Activator.startDeploy(Activator.java:68)
at org.wso2.carbon.user.core.internal.BundleCheckActivator.start(BundleCheckActivator.java:61)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl$1.run(BundleContextImpl.java:711)
at java.security.AccessController.doPrivileged(Native Method)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl.startActivator(BundleContextImpl.java:702)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl.start(BundleContextImpl.java:683)
at org.eclipse.osgi.framework.internal.core.BundleHost.startWorker(BundleHost.java:381)
at org.eclipse.osgi.framework.internal.core.AbstractBundle.resume(AbstractBundle.java:390)
at org.eclipse.osgi.framework.internal.core.Framework.resumeBundle(Framework.java:1176)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.resumeBundles(StartLevelManager.java:559)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.resumeBundles(StartLevelManager.java:544)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.incFWSL(StartLevelManager.java:457)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.doSetStartLevel(StartLevelManager.java:243)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.dispatchEvent(StartLevelManager.java:438)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.dispatchEvent(StartLevelManager.java:1)
at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:230)
at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:340)
Caused by: org.wso2.carbon.user.core.UserStoreException: nullType class java.lang.reflect.InvocationTargetException
at org.wso2.carbon.user.core.common.DefaultRealm.initializeObjects(DefaultRealm.java:322)
at org.wso2.carbon.user.core.common.DefaultRealm.init(DefaultRealm.java:127)
at org.wso2.carbon.user.core.common.DefaultRealmService.initializeRealm(DefaultRealmService.java:263)
... 19 more
Caused by: org.wso2.carbon.user.core.UserStoreException: nullType class java.lang.reflect.InvocationTargetException
at org.wso2.carbon.user.core.common.DefaultRealm.createObjectWithOptions(DefaultRealm.java:401)
at org.wso2.carbon.user.core.common.DefaultRealm.initializeObjects(DefaultRealm.java:222)
... 21 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.wso2.carbon.user.core.common.DefaultRealm.createObjectWithOptions(DefaultRealm.java:355)
... 22 more
Caused by: org.wso2.carbon.user.core.UserStoreException: Error occurred while checking is existing domain : PRIMARY for tenant : -1234
at org.wso2.carbon.user.core.util.UserCoreUtil.persistDomain(UserCoreUtil.java:839)
at org.wso2.carbon.user.core.common.AbstractUserStoreManager.persistDomain(AbstractUserStoreManager.java:4064)
at org.wso2.carbon.user.core.jdbc.JDBCUserStoreManager.<init>(JDBCUserStoreManager.java:280)
at org.wso2.carbon.user.core.jdbc.JDBCUserStoreManager.<init>(JDBCUserStoreManager.java:222)
... 27 more
Caused by: org.wso2.carbon.user.core.UserStoreException: DB error occurred while checking is existing domain : PRIMARY & tenant id : -1234
at org.wso2.carbon.user.core.util.UserCoreUtil.isExistingDomain(UserCoreUtil.java:991)
at org.wso2.carbon.user.core.util.UserCoreUtil.persistDomain(UserCoreUtil.java:827)
... 30 more
Caused by: org.h2.jdbc.JdbcSQLException: General error: "java.lang.RuntimeException: rowCount expected 7545 got 13 T90.I92" [50000-175]
at org.h2.message.DbException.getJdbcSQLException(DbException.java:332)
at org.h2.message.DbException.get(DbException.java:161)
at org.h2.message.DbException.convert(DbException.java:284)
at org.h2.table.RegularTable.addRow(RegularTable.java:137)
at org.h2.store.PageStore.redo(PageStore.java:1535)
at org.h2.store.PageLog.recover(PageLog.java:318)
at org.h2.store.PageStore.recover(PageStore.java:1371)
at org.h2.store.PageStore.openExisting(PageStore.java:361)
at org.h2.store.PageStore.open(PageStore.java:285)
at org.h2.engine.Database.getPageStore(Database.java:2298)
at org.h2.engine.Database.open(Database.java:626)
at org.h2.engine.Database.openDatabase(Database.java:244)
at org.h2.engine.Database.<init>(Database.java:239)
at org.h2.engine.Engine.openSession(Engine.java:56)
at org.h2.engine.Engine.openSession(Engine.java:160)
at org.h2.engine.Engine.createSessionAndValidate(Engine.java:139)
at org.h2.engine.Engine.createSession(Engine.java:122)
at org.h2.engine.Engine.createSession(Engine.java:28)
at org.h2.engine.SessionRemote.connectEmbeddedOrServer(SessionRemote.java:323)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:105)
at org.h2.jdbc.JdbcConnection.<init>(JdbcConnection.java:90)
at org.h2.Driver.connect(Driver.java:73)
at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:278)
at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:182)
at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:701)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:635)
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:188)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:127)
at org.wso2.carbon.user.core.util.DatabaseUtil.getDBConnection(DatabaseUtil.java:565)
at org.wso2.carbon.user.core.util.UserCoreUtil.isExistingDomain(UserCoreUtil.java:973)
... 31 more
Caused by: java.lang.RuntimeException: rowCount expected 7545 got 13 T90.I92
at org.h2.message.DbException.throwInternalError(DbException.java:231)
at org.h2.table.RegularTable.checkRowCount(RegularTable.java:168)
at org.h2.table.RegularTable.addRow(RegularTable.java:120)
... 57 more
[2017-08-21 11:50:53,572] INFO - TaglibUriRule TLD skipped. URI: http://tiles.apache.org/tags-tiles is already defined
You are using API Manager 2.1 with an SQLite database, but we haven't tested this database against this product. Also, from the logs, it seems you are using the H2 database driver class.
Can you use the SQLite database driver class org.sqlite.JDBC and see?
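If it helps, here is a minimal standalone check of that driver, assuming the sqlite-jdbc jar has been dropped onto the server classpath; the class name is the one suggested above, but the database file path is purely illustrative:
import java.sql.Connection;
import java.sql.DriverManager;

// Minimal sketch: verify org.sqlite.JDBC is loadable and can open a database file.
// The path below is an assumption; point it at the actual database file.
public class SqliteDriverCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("org.sqlite.JDBC"); // throws ClassNotFoundException if the jar is missing
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:/path/to/apimgt.db")) {
            System.out.println("Connected with driver: " + conn.getMetaData().getDriverName());
        }
    }
}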

How to resolve the HCat Error - ClassNotFound - HCatOutputFormat not found

How do I resolve the error below? I have exported the Hcat-core.jar before running the code; kindly help (see the sketch after the full trace below).
java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found
Full Trace:
2016-07-28 20:12:48,465 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1468985268798_44020_000002
2016-07-28 20:12:48,653 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-07-28 20:12:48,690 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Executing with tokens:
2016-07-28 20:12:48,690 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 44020 cluster_timestamp: 1468985268798 } attemptId: 2 } keyId: 618886960)
2016-07-28 20:12:48,811 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Using mapred newApiCommitter.
2016-07-28 20:12:49,675 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: OutputCommitter set in config null
2016-07-28 20:12:49,744 INFO [main] org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.mapreduce.v2.app.MRAppMaster failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:478)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1560)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:458)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:377)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1518)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1515)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1448)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2195)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:222)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:474)
... 11 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hive.hcatalog.mapreduce.HCatOutputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2193)
... 13 more
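One thing worth checking: exporting the jar on the client (for example via HADOOP_CLASSPATH) only affects the JVM that submits the job, while the ClassNotFoundException above is thrown inside the MRAppMaster. The jar has to be shipped with the job itself. A minimal sketch of the programmatic route, assuming the HCatalog jar has already been copied to HDFS at an illustrative path:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hive.hcatalog.mapreduce.HCatOutputFormat;

public class ShipHCatJar {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "hcat-write");
        job.setJarByClass(ShipHCatJar.class);
        // Add the HCatalog jar to the job's classpath so the AM and tasks can
        // resolve HCatOutputFormat; the HDFS location below is an assumption.
        job.addFileToClassPath(new Path("/libs/hive-hcatalog-core.jar"));
        job.setOutputFormatClass(HCatOutputFormat.class);
        // ... remaining job setup (HCatOutputFormat.setOutput, mapper, schema) goes here
    }
}
Equivalently, if the driver goes through ToolRunner, passing -libjars with the local path to the same jar on the command line achieves the same distribution.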

Error running Hadoop pipes Program: "Server failed to authenticate"

While trying to run a C++ program on my Hadoop cluster by referring to this (link), I got the error mentioned below.
I referred to related posts (this) regarding the error and tried tweaking my Makefile, but I am still unable to resolve the issue.
(I am using Fedora 12; I am also not able to locate a .configure file to add to.)
My Makefile looks like this:
CC = g++
HADOOP_INSTALL = /home/hadoop/Desktop/Cloudera/hadoop-0.20.2-cdh3u1
SSL_INSTALL = /usr/include/openssl
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include -I$(SSL_INSTALL)
wordcount: wordcount.cpp
$(CC) $(CPPFLAGS) $< -Wall -Wextra -L$(SSL_INSTALL) -lssl -lcrypto -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
-lhadooputils -lpthread -g -O2 -o $@
I am unable to figure out how to resolve this error.
Any help will be appreciated.
[hadoop@01HW394491 Desktop]$ hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input /user/hadoop/dtest -output /user/hadoop/dipeshtryWC -program /user/hadoop/dbin/wordcount
11/11/25 15:55:07 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
11/11/25 15:55:07 INFO util.NativeCodeLoader: Loaded the native-hadoop library
11/11/25 15:55:07 WARN snappy.LoadSnappy: Snappy native library not loaded
11/11/25 15:55:07 INFO mapred.FileInputFormat: Total input paths to process : 3
11/11/25 15:55:07 INFO mapred.JobClient: Running job: job_201111161101_0145
11/11/25 15:55:08 INFO mapred.JobClient: map 0% reduce 0%
11/11/25 15:55:14 INFO mapred.JobClient: Task Id : attempt_201111161101_0145_m_000000_0, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
attempt_201111161101_0145_m_000000_0: Server failed to authenticate. Exiting
11/11/25 15:55:14 INFO mapred.JobClient: Task Id : attempt_201111161101_0145_m_000002_0, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
attempt_201111161101_0145_m_000002_0: Server failed to authenticate. Exiting
11/11/25 15:55:15 INFO mapred.JobClient: Task Id : attempt_201111161101_0145_m_000001_0, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
attempt_201111161101_0145_m_000001_0: Server failed to authenticate. Exiting
11/11/25 15:55:19 INFO mapred.JobClient: Task Id : attempt_201111161101_0145_m_000000_1, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
attempt_201111161101_0145_m_000000_1: Server failed to authenticate. Exiting
11/11/25 15:55:19 INFO mapred.JobClient: Task Id : attempt_201111161101_0145_m_000002_1, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
attempt_201111161101_0145_m_000002_1: Server failed to authenticate. Exiting
11/11/25 15:55:19 INFO mapred.JobClient: Task Id : attempt_201111161101_0145_m_000001_1, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
attempt_201111161101_0145_m_000001_1: Server failed to authenticate. Exiting
11/11/25 15:55:24 INFO mapred.JobClient: Task Id : attempt_201111161101_0145_m_000000_2, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
attempt_201111161101_0145_m_000000_2: Server failed to authenticate. Exiting
11/11/25 15:55:24 INFO mapred.JobClient: Task Id : attempt_201111161101_0145_m_000002_2, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
attempt_201111161101_0145_m_000002_2: Server failed to authenticate. Exiting
11/11/25 15:55:25 INFO mapred.JobClient: Task Id : attempt_201111161101_0145_m_000001_2, Status : FAILED
java.io.IOException
at org.apache.hadoop.mapred.pipes.OutputHandler.waitForAuthentication(OutputHandler.java:188)
at org.apache.hadoop.mapred.pipes.Application.waitForAuthentication(Application.java:194)
at org.apache.hadoop.mapred.pipes.Application.<init>(Application.java:149)
at org.apache.hadoop.mapred.pipes.PipesMapRunner.run(PipesMapRunner.java:68)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:391)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
attempt_201111161101_0145_m_000001_2: Server failed to authenticate. Exiting
11/11/25 15:55:30 INFO mapred.JobClient: Job complete: job_201111161101_0145
11/11/25 15:55:30 INFO mapred.JobClient: Counters: 8
11/11/25 15:55:30 INFO mapred.JobClient: Job Counters
11/11/25 15:55:30 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=56297
11/11/25 15:55:30 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
11/11/25 15:55:30 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
11/11/25 15:55:30 INFO mapred.JobClient: Rack-local map tasks=9
11/11/25 15:55:30 INFO mapred.JobClient: Launched map tasks=12
11/11/25 15:55:30 INFO mapred.JobClient: Data-local map tasks=3
11/11/25 15:55:30 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
11/11/25 15:55:30 INFO mapred.JobClient: Failed map tasks=1
11/11/25 15:55:30 INFO mapred.JobClient: Job Failed: NA
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1246)
at org.apache.hadoop.mapred.pipes.Submitter.runJob(Submitter.java:248)
at org.apache.hadoop.mapred.pipes.Submitter.run(Submitter.java:479)
at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:494)
[hadoop@01HW394491 Desktop]$