Hive RegexSerDe Multiline Log matching showing NULL values after one line - regex

I am trying to load Apache Tomcat logs where a single record spans multiple lines, but Hive only loads the first line and shows NULL values for the remaining lines until it reaches the next record.
I have tried regexes from earlier posts but they are not working. The same regular expression works in regex testing tools; here is a link to one of them:
http://rubular.com/r/nVrWhuwg1c
It properly recognizes the lines and separates them into the correct groups, but the same pattern does not work when I try it in Hive.
Sample record:
12-Dec-2013 12:03:26.988 WARNING [localhost-startStop-1] org.apache.tomcat.util.scan.StandardJarScanner.scan Failed to scan [file:/usr/share/java/jsp-api-2.3.jar] from classloader hierarchy
java.io.FileNotFoundException: /usr/share/java/jsp-api-2.3.jar (No such file or directory)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:225)
at java.util.zip.ZipFile.<init>(ZipFile.java:155)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at org.apache.tomcat.util.scan.JarFileUrlJar.<init>(JarFileUrlJar.java:60)
at org.apache.tomcat.util.scan.JarFactory.newInstance(JarFactory.java:49)
at org.apache.tomcat.util.scan.StandardJarScanner.process(StandardJarScanner.java:338)
at org.apache.tomcat.util.scan.StandardJarScanner.scan(StandardJarScanner.java:288)
at org.apache.catalina.startup.ContextConfig.processJarsForWebFragments(ContextConfig.java:1898)
at org.apache.catalina.startup.ContextConfig.webConfig(ContextConfig.java:1126)
at org.apache.catalina.startup.ContextConfig.configureStart(ContextConfig.java:775)
at org.apache.catalina.startup.ContextConfig.lifecycleEvent(ContextConfig.java:299)
at org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:94)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5105)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:752)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:728)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:596)
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1805)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
13-Dec-2016 12:03:24.988 WARNING [localhost-startStop-1] org.apache.tomcat.util.scan.StandardJarScanner.scan Failed to scan [file:/usr/share/java/el-api-3.0.jar] from classloader hierarchy
java.io.FileNotFoundException: /usr/share/java/el-api-3.0.jar (No such file or directory)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:225)
at java.util.zip.ZipFile.<init>(ZipFile.java:155)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at org.apache.tomcat.util.scan.JarFileUrlJar.<init>(JarFileUrlJar.java:60)
at org.apache.tomcat.util.scan.JarFactory.newInstance(JarFactory.java:49)
at org.apache.tomcat.util.scan.StandardJarScanner.process(StandardJarScanner.java:338)
at org.apache.tomcat.util.scan.StandardJarScanner.scan(StandardJarScanner.java:288)
at org.apache.jasper.servlet.TldScanner.scanJars(TldScanner.java:262)
at org.apache.jasper.servlet.TldScanner.scan(TldScanner.java:104)
at org.apache.jasper.servlet.JasperInitializer.onStartup(JasperInitializer.java:101)
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5196)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:752)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:728)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:596)
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1805)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The CREATE TABLE query I am trying:
CREATE EXTERNAL TABLE apache_cat_log_test
(
logdatetime STRING,
logtype STRING,
requestid STRING,
verbosedata STRING,
msg STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "^(\\d{2}-\\w{3}-\\d{4})\\s+(.*?)\\s+(\\w+)\\s*(.*?)\\s*$((?:\\s*(?!\\d{2}).*?$)*)"
)
STORED AS TEXTFILE
location '/catlogs';
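For illustration, here is a minimal standalone sketch (plain Kotlin run outside Hive, with shortened sample lines and the MULTILINE flag added as assumptions for the demo) of why the rubular test and Hive disagree: with the default TEXTFILE storage, Hive hands RegexSerDe one physical line per record and matches it against the whole pattern, so the final capture group is never offered the continuation lines together with the header line, and the continuation lines on their own fail the match entirely, which is what produces the NULL rows.
import java.util.regex.Pattern

fun main() {
    // Same pattern as in SERDEPROPERTIES above; \$ keeps the dollar sign literal in a Kotlin string.
    val pattern = Pattern.compile(
        "^(\\d{2}-\\w{3}-\\d{4})\\s+(.*?)\\s+(\\w+)\\s*(.*?)\\s*\$((?:\\s*(?!\\d{2}).*?\$)*)",
        Pattern.MULTILINE // rubular treats ^ and $ as line anchors; Hive's RegexSerDe does not set this flag
    )

    // A shortened version of the sample record from the question.
    val record = listOf(
        "12-Dec-2013 12:03:26.988 WARNING [localhost-startStop-1] org.apache.tomcat.util.scan.StandardJarScanner.scan Failed to scan",
        "java.io.FileNotFoundException: /usr/share/java/jsp-api-2.3.jar (No such file or directory)",
        "at java.util.zip.ZipFile.open(Native Method)"
    )

    // Matching the whole multi-line record (what the rubular test effectively does)
    // captures the stack-trace tail in group 5.
    val whole = pattern.matcher(record.joinToString("\n"))
    if (whole.find()) println("group 5 over the full record: ${whole.group(5).trim()}")

    // Hive's text input delivers one line per record; the continuation lines
    // do not match the pattern at all, so those rows come back as all NULLs.
    record.forEach { line ->
        println("matches=${pattern.matcher(line).matches()} <- ${line.take(40)}")
    }
}
In other words, before this regex can capture the stack-trace lines, the record handed to the SerDe has to contain the whole multi-line event, which the default line-based text input does not provide.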

Related

Resolving LinearPointer is throwing an error

I was trying to use LinearPointer in my application. When I try to resolve the state data, where I want the latest unconsumed version, it throws an IllegalArgumentException. I am adding the logs.
{actor_id=internalShell, actor_owning_identity=O=PartyB, L=Tokyo, C=JP, actor_store_id=NODE_CONFIG, fiber-id=10000004, flow-id=56a4acbd-971f-4e79-8b89-314a910de58a, invocation_id=b191db29-3ad2-4276-9136-217269398148, invocation_timestamp=2020-09-25T12:29:40.292Z, origin=internalShell, session_id=99a63055-16d6-4515-82e4-145399148cae, session_timestamp=2020-09-25T12:28:11.858Z, thread-id=160}
[WARN ] 2020-09-25T12:29:40,533Z [Node thread-1] interceptors.DumpHistoryOnErrorInterceptor. - Flow [56a4acbd-971f-4e79-8b89-314a910de58a] error - List has more than one element. [errorCode=1af3t2i, moreInformationAt=https://errors.corda.net/OS/4.5/1af3t2i] {actor_id=internalShell, actor_owning_identity=O=PartyB, L=Tokyo, C=JP, actor_store_id=NODE_CONFIG, fiber-id=10000004, flow-id=56a4acbd-971f-4e79-8b89-314a910de58a, invocation_id=b191db29-3ad2-4276-9136-217269398148, invocation_timestamp=2020-09-25T12:29:40.292Z, origin=internalShell, session_id=99a63055-16d6-4515-82e4-145399148cae, session_timestamp=2020-09-25T12:28:11.858Z, thread-id=160}
java.lang.IllegalArgumentException: List has more than one element.
at kotlin.collections.CollectionsKt___CollectionsKt.single(_Collections.kt:480) ~[kotlin-stdlib-1.2.71.jar:1.2.71-release-64 (1.2.71)]
at net.corda.core.contracts.LinearPointer.resolve(StatePointer.kt:197) ~[corda-core-4.5.jar:?]
at statePointer.example.flows.UpdateVehicleFlow.call(UpdateVehicleFlow.kt:44) ~[?:?]
at statePointer.example.flows.UpdateVehicleFlow.call(UpdateVehicleFlow.kt:28) ~[?:?]
at net.corda.node.services.statemachine.FlowStateMachineImpl.run(FlowStateMachineImpl.kt:299) ~[corda-node-4.5.jar:?]
at net.corda.node.services.statemachine.FlowStateMachineImpl.run(FlowStateMachineImpl.kt:66) ~[corda-node-4.5.jar:?]
at co.paralleluniverse.fibers.Fiber.run1(Fiber.java:1092) ~[quasar-core-0.7.12_r3-jdk8.jar:0.7.12_r3]
at co.paralleluniverse.fibers.Fiber.exec(Fiber.java:788) ~[quasar-core-0.7.12_r3-jdk8.jar:0.7.12_r3]
at co.paralleluniverse.fibers.RunnableFiberTask.doExec(RunnableFiberTask.java:100) ~[quasar-core-0.7.12_r3-jdk8.jar:0.7.12_r3]
at co.paralleluniverse.fibers.RunnableFiberTask.run(RunnableFiberTask.java:91) ~[quasar-core-0.7.12_r3-jdk8.jar:0.7.12_r3]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_201]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_201]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_201]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_201]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_201]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_201]
at net.corda.node.utilities.AffinityExecutor$ServiceAffinityExecutor$1$thread$1.run(AffinityExecutor.kt:63) ~[corda-node-4.5.jar:?]
The code that I was trying is:
val queryCriteria = QueryCriteria.LinearStateQueryCriteria(
        null,
        listOf(vehicleState!!.state.data.serviceStationDynamic.resolve(serviceHub).state.data.linearId.id),
        null, Vault.StateStatus.UNCONSUMED)
// Use the vaultQuery with the previously created queryCriteria to fetch the ServiceStation to be used as input
// in the transaction.
val serviceStationStateAndRef = serviceHub.vaultService.queryBy<ServiceStation>(queryCriteria).states.singleOrNull()
val chargeFromDynamicPointer = serviceStationStateAndRef!!.state.data.serviceCharge
The linearId of a LinearState is supposed to stay the same for a particular state throughout its evolution: when a linear state evolves, all of its historic states share the same linearId. However, it must be unique across different linear states.
The error message suggests that you have more than one linear state with the same linearId, which is why resolve is failing.
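As a quick way to confirm this inside the flow, here is a hedged diagnostic sketch that reuses the query style from the question; it assumes pointer.id exposes the UUID the LinearPointer points to, so the failing resolve() call is not needed just to read the id:
// Diagnostic only: count how many UNCONSUMED ServiceStation states share the pointed-to linearId.
val pointedId = vehicleState!!.state.data.serviceStationDynamic.pointer.id
val duplicateCheck = QueryCriteria.LinearStateQueryCriteria(
        null, listOf(pointedId), null, Vault.StateStatus.UNCONSUMED)
val matches = serviceHub.vaultService.queryBy<ServiceStation>(duplicateCheck).states
logger.info("UNCONSUMED ServiceStation states with linearId $pointedId: ${matches.size}")
// resolve() / single() only succeed when exactly one state comes back; a count greater than one
// means two different ServiceStation states were issued with the same linearId, and the duplicate
// issuance has to be fixed (or one of the states consumed) before the pointer can be resolved.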

WSO2 ESB: Configuring db connection in dblookup mediator is failing

Using Integration Studio, I created an ESB project. Then I created a proxy service and added a dblookup mediator, configured to use a PostgreSQL database. The configuration is as follows.
connection_type as DB_CONNECTION
In database configuration window, I choose connection type as postgresql.
I chose "get from server" radio button and selected "42.2.5" from the combo list and entered the connection parameters.
Connection DB Driver: com.postgres.jdbc.Driver (tried with org.postgresql.Driver)
jdbc url connection: jdbc:postgresql://localhost:5432/EDH_DATABASE
connection username: postgres
password: entered
The test connection works fine, but when running the proxy through Integration Studio it gives the following error.
NOTE: I copied postgresql-42.2.5.jar to IntegrationStudio.app/Contents/Eclipse/runtime/microesb/lib
[2020-01-08 17:39:25,612] ERROR {org.apache.synapse.mediators.db.DBLookupMediator} - SQL Exception occurred while executing statement : select * from teacher; against DataSource : jdbc:postgresql://localhost:5432/EDH_DATABASE org.apache.commons.dbcp.SQLNestedException: Cannot load JDBC driver class 'com.postgres.jdbc.Driver'
at org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1429)
at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1371)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at org.apache.synapse.mediators.db.DBLookupMediator.processStatement(DBLookupMediator.java:58)
at org.apache.synapse.mediators.db.AbstractDBMediator.mediate(AbstractDBMediator.java:243)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:109)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:71)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
at org.apache.synapse.core.axis2.ProxyServiceMessageReceiver.receive(ProxyServiceMessageReceiver.java:224)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
at org.apache.synapse.transport.passthru.ServerWorker.processNonEntityEnclosingRESTHandler(ServerWorker.java:367)
at org.apache.synapse.transport.passthru.ServerWorker.processEntityEnclosingRequest(ServerWorker.java:412)
at org.apache.synapse.transport.passthru.ServerWorker.run(ServerWorker.java:181)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.postgres.jdbc.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.eclipse.osgi.internal.framework.ContextFinder.loadClass(ContextFinder.java:139)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1420)
... 16 more
The postgresql-42.2.5.jar does not contain the class "com.postgresql.jdbc.Driver".
So I tried "org.postgresql.Driver" as the Connection DB Driver field value, but it still gives the same error. In spite of using "org.postgresql.Driver" in the configuration field, the error below still refers to the old driver class.
[2020-01-08 17:39:25,612] ERROR {org.apache.synapse.mediators.db.DBLookupMediator} - SQL Exception occurred while executing statement : select * from teacher; against DataSource : jdbc:postgresql://localhost:5432/EDH_DATABASE org.apache.commons.dbcp.SQLNestedException: Cannot load JDBC driver class 'com.postgres.jdbc.Driver'
at org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1429)
at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1371)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at org.apache.synapse.mediators.db.DBLookupMediator.processStatement(DBLookupMediator.java:58)
at org.apache.synapse.mediators.db.AbstractDBMediator.mediate(AbstractDBMediator.java:243)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:109)
at org.apache.synapse.mediators.AbstractListMediator.mediate(AbstractListMediator.java:71)
at org.apache.synapse.mediators.base.SequenceMediator.mediate(SequenceMediator.java:158)
at org.apache.synapse.core.axis2.ProxyServiceMessageReceiver.receive(ProxyServiceMessageReceiver.java:224)
at org.apache.axis2.engine.AxisEngine.receive(AxisEngine.java:180)
at org.apache.synapse.transport.passthru.ServerWorker.processNonEntityEnclosingRESTHandler(ServerWorker.java:367)
at org.apache.synapse.transport.passthru.ServerWorker.processEntityEnclosingRequest(ServerWorker.java:412)
at org.apache.synapse.transport.passthru.ServerWorker.run(ServerWorker.java:181)
at org.apache.axis2.transport.base.threads.NativeWorkerPool$1.run(NativeWorkerPool.java:172)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: com.postgres.jdbc.Driver
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.eclipse.osgi.internal.framework.ContextFinder.loadClass(ContextFinder.java:139)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.commons.dbcp.BasicDataSource.createConnectionFactory(BasicDataSource.java:1420)
... 16 more
Any help would be greatly appreciated.
It seems the correct driver name is "org.postgresql.Driver". In your error logs, even though you have changed the driver name from "com.postgres.jdbc.Driver" to "org.postgresql.Driver", it is still looking for the former. Please make sure that you have stopped the running Micro Integrator by clicking the stop button, then run the proxy again via "Run As -> Run on Micro Integrator".
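As an independent sanity check outside the ESB (a sketch only; the password placeholder is not from the question), you can confirm that the copied postgresql-42.2.5.jar really provides org.postgresql.Driver and that the URL and user from your configuration work, before restarting the Micro Integrator and re-running the proxy:
import java.sql.DriverManager

fun main() {
    // Run with postgresql-42.2.5.jar on the classpath; a ClassNotFoundException here means the
    // driver jar is not where the runtime is looking for it.
    Class.forName("org.postgresql.Driver")

    // Same JDBC URL, user and statement as in the dblookup configuration from the question.
    val conn = DriverManager.getConnection("jdbc:postgresql://localhost:5432/EDH_DATABASE", "postgres", "<password>")
    val rs = conn.createStatement().executeQuery("select * from teacher")
    while (rs.next()) println(rs.getString(1))
    conn.close()
}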

Is there a way to easily import a custom Plugin in Cloud Data Fusion?

I'm setting up a pipeline using Cloud Data Fusion, and I wanted to import my own custom Plugin. Is there an easy way to import it?
I already tried to use the Import button in the Studio section but it gave me some problems with the artifacts. I also tried adding a new entity using the + button and uploading the .jar and .json files, but it does not return any message.
However, these errors actually show up in the App Fabric log:
2019-06-13 08:37:15,020 - ERROR [appfabric-executor-30:i.c.c.c.HttpExceptionHandler#70] - Unexpected error: request=PUT /v3/namespaces/default/artifacts/org.myCustom.plugin/versions/1.0-SNAPSHOT/properties user=<null>:
java.lang.NullPointerException: null
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:280) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactMeta.<init>(ArtifactMeta.java:53) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactStore.lambda$updateArtifactProperties$19(ArtifactStore.java:648) ~[na:na]
at io.cdap.cdap.spi.data.sql.SqlTransactionRunner.run(SqlTransactionRunner.java:74) ~[na:na]
at io.cdap.cdap.spi.data.sql.RetryingSqlTransactionRunner.run(RetryingSqlTransactionRunner.java:64) ~[na:na]
at io.cdap.cdap.spi.data.transaction.TransactionRunners.run(TransactionRunners.java:92) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactStore.updateArtifactProperties(ArtifactStore.java:637) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.DefaultArtifactRepository.writeArtifactProperties(DefaultArtifactRepository.java:289) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.AuthorizationArtifactRepository.writeArtifactProperties(AuthorizationArtifactRepository.java:216) ~[na:na]
at io.cdap.cdap.gateway.handlers.ArtifactHttpHandler.writeProperties(ArtifactHttpHandler.java:341) ~[na:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_212]
at io.cdap.http.internal.HttpMethodInfo.invoke(HttpMethodInfo.java:82) ~[io.cdap.http.netty-http-1.2.0.jar:na]
at io.cdap.http.internal.HttpDispatcher.channelRead(HttpDispatcher.java:45) [io.cdap.http.netty-http-1.2.0.jar:na]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.cdap.http.internal.NonStickyEventExecutorGroup$NonStickyOrderedEventExecutor.run(NonStickyEventExecutorGroup.java:254) [io.cdap.http.netty-http-1.2.0.jar:na]
at io.netty.util.concurrent.UnorderedThreadPoolEventExecutor$NonNotifyRunnable.run(UnorderedThreadPoolEventExecutor.java:277) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
2019-06-13 08:37:24,504 - DEBUG [appfabric-executor-27:i.c.c.a.g.DefaultProgramRunnerFactory#73] - Using runtime provider io.cdap.cdap.app.runtime.spark.Spark2ProgramRuntimeProvider#444b1b21 for program type Spark
2019-06-13 08:37:24,524 - DEBUG [appfabric-executor-27:i.c.c.a.g.DefaultProgramRunnerFactory#73] - Using runtime provider io.cdap.cdap.app.runtime.spark.Spark2ProgramRuntimeProvider#444b1b21 for program type Spark
2019-06-13 08:37:27,200 - ERROR [appfabric-executor-26:i.c.c.c.HttpExceptionHandler#70] - Unexpected error: request=PUT /v3/namespaces/default/artifacts/org.myCustom.plugin/versions/1.0-SNAPSHOT/properties user=<null>:
java.lang.NullPointerException: null
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:280) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactMeta.<init>(ArtifactMeta.java:53) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactStore.lambda$updateArtifactProperties$19(ArtifactStore.java:648) ~[na:na]
at io.cdap.cdap.spi.data.sql.SqlTransactionRunner.run(SqlTransactionRunner.java:74) ~[na:na]
at io.cdap.cdap.spi.data.sql.RetryingSqlTransactionRunner.run(RetryingSqlTransactionRunner.java:64) ~[na:na]
at io.cdap.cdap.spi.data.transaction.TransactionRunners.run(TransactionRunners.java:92) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactStore.updateArtifactProperties(ArtifactStore.java:637) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.DefaultArtifactRepository.writeArtifactProperties(DefaultArtifactRepository.java:289) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.AuthorizationArtifactRepository.writeArtifactProperties(AuthorizationArtifactRepository.java:216) ~[na:na]
at io.cdap.cdap.gateway.handlers.ArtifactHttpHandler.writeProperties(ArtifactHttpHandler.java:341) ~[na:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_212]
at io.cdap.http.internal.HttpMethodInfo.invoke(HttpMethodInfo.java:82) ~[io.cdap.http.netty-http-1.2.0.jar:na]
at io.cdap.http.internal.HttpDispatcher.channelRead(HttpDispatcher.java:45) [io.cdap.http.netty-http-1.2.0.jar:na]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.cdap.http.internal.NonStickyEventExecutorGroup$NonStickyOrderedEventExecutor.run(NonStickyEventExecutorGroup.java:254) [io.cdap.http.netty-http-1.2.0.jar:na]
at io.netty.util.concurrent.UnorderedThreadPoolEventExecutor$NonNotifyRunnable.run(UnorderedThreadPoolEventExecutor.java:277) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
2019-06-13 08:38:43,782 - DEBUG [appfabric-executor-30:i.c.c.a.g.DefaultProgramRunnerFactory#73] - Using runtime provider io.cdap.cdap.app.runtime.spark.Spark2ProgramRuntimeProvider#444b1b21 for program type Spark
2019-06-13 08:38:43,803 - DEBUG [appfabric-executor-30:i.c.c.a.g.DefaultProgramRunnerFactory#73] - Using runtime provider io.cdap.cdap.app.runtime.spark.Spark2ProgramRuntimeProvider#444b1b21 for program type Spark
2019-06-13 08:38:46,441 - ERROR [appfabric-executor-38:i.c.c.c.HttpExceptionHandler#70] - Unexpected error: request=PUT /v3/namespaces/default/artifacts/org.myCustom.plugin/versions/1.0-SNAPSHOT/properties user=<null>:
java.lang.NullPointerException: null
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:280) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactMeta.<init>(ArtifactMeta.java:53) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactStore.lambda$updateArtifactProperties$19(ArtifactStore.java:648) ~[na:na]
at io.cdap.cdap.spi.data.sql.SqlTransactionRunner.run(SqlTransactionRunner.java:74) ~[na:na]
at io.cdap.cdap.spi.data.sql.RetryingSqlTransactionRunner.run(RetryingSqlTransactionRunner.java:64) ~[na:na]
at io.cdap.cdap.spi.data.transaction.TransactionRunners.run(TransactionRunners.java:92) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactStore.updateArtifactProperties(ArtifactStore.java:637) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.DefaultArtifactRepository.writeArtifactProperties(DefaultArtifactRepository.java:289) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.AuthorizationArtifactRepository.writeArtifactProperties(AuthorizationArtifactRepository.java:216) ~[na:na]
at io.cdap.cdap.gateway.handlers.ArtifactHttpHandler.writeProperties(ArtifactHttpHandler.java:341) ~[na:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_212]
at io.cdap.http.internal.HttpMethodInfo.invoke(HttpMethodInfo.java:82) ~[io.cdap.http.netty-http-1.2.0.jar:na]
at io.cdap.http.internal.HttpDispatcher.channelRead(HttpDispatcher.java:45) [io.cdap.http.netty-http-1.2.0.jar:na]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.cdap.http.internal.NonStickyEventExecutorGroup$NonStickyOrderedEventExecutor.run(NonStickyEventExecutorGroup.java:254) [io.cdap.http.netty-http-1.2.0.jar:na]
at io.netty.util.concurrent.UnorderedThreadPoolEventExecutor$NonNotifyRunnable.run(UnorderedThreadPoolEventExecutor.java:277) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
2019-06-13 08:40:20,681 - DEBUG [appfabric-executor-36:i.c.c.a.g.DefaultProgramRunnerFactory#73] - Using runtime provider io.cdap.cdap.app.runtime.spark.Spark2ProgramRuntimeProvider#444b1b21 for program type Spark
2019-06-13 08:40:20,703 - DEBUG [appfabric-executor-36:i.c.c.a.g.DefaultProgramRunnerFactory#73] - Using runtime provider io.cdap.cdap.app.runtime.spark.Spark2ProgramRuntimeProvider#444b1b21 for program type Spark
2019-06-13 08:40:22,990 - ERROR [appfabric-executor-29:i.c.c.c.HttpExceptionHandler#70] - Unexpected error: request=PUT /v3/namespaces/default/artifacts/org.myCustom.plugin/versions/1.0-SNAPSHOT/properties user=<null>:
java.lang.NullPointerException: null
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:280) ~[com.google.guava.guava-13.0.1.jar:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactMeta.<init>(ArtifactMeta.java:53) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactStore.lambda$updateArtifactProperties$19(ArtifactStore.java:648) ~[na:na]
at io.cdap.cdap.spi.data.sql.SqlTransactionRunner.run(SqlTransactionRunner.java:74) ~[na:na]
at io.cdap.cdap.spi.data.sql.RetryingSqlTransactionRunner.run(RetryingSqlTransactionRunner.java:64) ~[na:na]
at io.cdap.cdap.spi.data.transaction.TransactionRunners.run(TransactionRunners.java:92) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.ArtifactStore.updateArtifactProperties(ArtifactStore.java:637) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.DefaultArtifactRepository.writeArtifactProperties(DefaultArtifactRepository.java:289) ~[na:na]
at io.cdap.cdap.internal.app.runtime.artifact.AuthorizationArtifactRepository.writeArtifactProperties(AuthorizationArtifactRepository.java:216) ~[na:na]
at io.cdap.cdap.gateway.handlers.ArtifactHttpHandler.writeProperties(ArtifactHttpHandler.java:341) ~[na:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_212]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_212]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_212]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_212]
at io.cdap.http.internal.HttpMethodInfo.invoke(HttpMethodInfo.java:82) ~[io.cdap.http.netty-http-1.2.0.jar:na]
at io.cdap.http.internal.HttpDispatcher.channelRead(HttpDispatcher.java:45) [io.cdap.http.netty-http-1.2.0.jar:na]
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:38) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:353) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at io.cdap.http.internal.NonStickyEventExecutorGroup$NonStickyOrderedEventExecutor.run(NonStickyEventExecutorGroup.java:254) [io.cdap.http.netty-http-1.2.0.jar:na]
at io.netty.util.concurrent.UnorderedThreadPoolEventExecutor$NonNotifyRunnable.run(UnorderedThreadPoolEventExecutor.java:277) [io.netty.netty-all-4.1.16.Final.jar:4.1.16.Final]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_212]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_212]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_212]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_212]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_212]
I'm new to Cloud Data Fusion and I have never used CDAP before. Is there an explanation for these errors?
You can import a pipeline, but since you have created a plugin, you can upload it by clicking the green '+' and then selecting "Plugin" > "Upload", which you have done. Could you share your plugin JSON?
Ok, I tried that way as well. My JSON looked as follows:
{
"parents": [
"system:cdap-data-pipeline[4.0.0,6.1.0)",
"system:cdap-data-streams[4.0.0,6.1.0)"
],
"artifact":{
"name" : "org.myCustom.plugin",
"version": "1.0-SNAPSHOT",
"scope": "SYSTEM"
},
"properties": {
"widgets.ErrorCollector-errortransform": "{\"metadata\":{\"spec-version\":\"1.0\"},\"configuration-groups\":[{\"label\":\"Error Collector Configuration\",\"properties\":[{\"widget-type\":\"textbox\",\"label\":\"Message Field\",\"name\":\"messageField\",\"plugin-function\":{\"method\":\"POST\",\"widget\":\"outputSchema\",\"plugin-method\":\"getSchema\"},\"widget-attributes\":{\"default\":\"errMsg\"}},{\"widget-type\":\"textbox\",\"label\":\"Code Field\",\"name\":\"codeField\",\"widget-attributes\":{\"default\":\"errCode\"}},{\"widget-type\":\"textbox\",\"label\":\"Stage Field\",\"name\":\"stageField\",\"widget-attributes\":{\"default\":\"errStage\"}}]}],\"outputs\":[]}",
"widgets.FilesetDelete-postaction": "{\"metadata\":{\"spec-version\":\"1.0\"},\"configuration-groups\":[{\"label\":\"Fileset Delete Configuration\",\"properties\":[{\"widget-type\":\"dataset-selector\",\"label\":\"FileSet Name\",\"name\":\"filesetName\"},{\"widget-type\":\"textbox\",\"label\":\"FileSet directory\",\"name\":\"directory\"},{\"widget-type\":\"textbox\",\"label\":\"Delete Regex\",\"name\":\"deleteRegex\"}]}],\"outputs\":[]}",
"widgets.FilesetMove-action": "{\"metadata\":{\"spec-version\":\"1.0\"},\"configuration-groups\":[{\"label\":\"Fileset Delete Configuration\",\"properties\":[{\"widget-type\":\"dataset-selector\",\"label\":\"Source FileSet\",\"name\":\"sourceFileset\"},{\"widget-type\":\"dataset-selector\",\"label\":\"Destination FileSet\",\"name\":\"destinationFileset\"},{\"widget-type\":\"textbox\",\"label\":\"Filter Regex\",\"name\":\"filterRegex\"}]}],\"outputs\":[]}",
"widgets.StringCase-transform-custom": "{\"metadata\":{\"spec-version\":\"1.0\"},\"artifact\":{\"name\":\"org.myCustom.plugin\",\"version\":\"1.0-SNAPSHOT\",\"scope\":\"SYSTEM\"},\"display-name\":\"My Custom Transformation\",\"icon\":{\"type\":\"builtin|link|inline\",\"arguments\":{\"url\":\"https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcT1IRBT7dHXIhCkjmoy8esJsWY2Gv89tuoIbLVEwi16fTb5FbcF\",\"data\":\"data:image/png;base64,...\"}},\"configuration-groups\":[{\"label\":\"String Case Configuration\",\"properties\":[{\"widget-type\":\"csv\",\"label\":\"Fields to upper case\",\"name\":\"upperFields\",\"description\":\"List of fields to upper case.\",\"widget-attributes\":{\"delimiter\":\",\"}},{\"widget-type\":\"csv\",\"label\":\"Fields to lower case\",\"name\":\"lowerFields\",\"description\":\"List of fields to lower case.\",\"widget-attributes\":{\"delimiter\":\",\"}}]}],\"outputs\":[]}",
"widgets.StringCase-transform": "{\"metadata\":{\"spec-version\":\"1.0\"},\"configuration-groups\":[{\"label\":\"String Case Configuration\",\"properties\":[{\"widget-type\":\"csv\",\"label\":\"Fields to upper case\",\"name\":\"upperFields\",\"description\":\"List of fields to upper case.\",\"widget-attributes\":{\"delimiter\":\",\"}},{\"widget-type\":\"csv\",\"label\":\"Fields to lower case\",\"name\":\"lowerFields\",\"description\":\"List of fields to lower case.\",\"widget-attributes\":{\"delimiter\":\",\"}}]}],\"outputs\":[]}",
"widgets.TextFileSet-batchsink": "{\"metadata\":{\"spec-version\":\"1.0\"},\"configuration-groups\":[{\"label\":\"Text FileSet Sink Configuration\",\"properties\":[{\"widget-type\":\"dataset-selector\",\"label\":\"FileSet Name\",\"name\":\"fileSetName\"},{\"widget-type\":\"textbox\",\"label\":\"Field separator\",\"name\":\"fieldSeparator\"},{\"widget-type\":\"textbox\",\"label\":\"Field separator\",\"name\":\"outputDir\"}]}],\"outputs\":[]}",
"widgets.TextFileSet-batchsource": "{\"metadata\":{\"spec-version\":\"1.0\"},\"configuration-groups\":[{\"label\":\"Text FileSet Source Configuration\",\"properties\":[{\"widget-type\":\"dataset-selector\",\"label\":\"FileSet Name\",\"name\":\"fileSetName\"},{\"widget-type\":\"textbox\",\"label\":\"Input files within the FileSet\",\"name\":\"files\"},{\"widget-type\":\"select\",\"label\":\"Create FileSet if it does not exist\",\"name\":\"createIfNotExists\",\"widget-attributes\":{\"values\":[\"true\",\"false\"],\"default\":\"false\"}},{\"widget-type\":\"select\",\"label\":\"Delete data read on pipeline run success\",\"name\":\"deleteInputOnSuccess\",\"widget-attributes\":{\"values\":[\"true\",\"false\"],\"default\":\"false\"}}]}],\"outputs\":[{\"widget-type\":\"non-editable-schema-editor\",\"schema\":{\"position\":\"long\",\"text\":\"string\"}}]}",
"widgets.WordCount-batchaggregator": "{\"metadata\":{\"spec-version\":\"1.0\"},\"configuration-groups\":[{\"label\":\"Word Count Aggregator Configuration\",\"properties\":[{\"widget-type\":\"textbox\",\"label\":\"Field Name\",\"name\":\"field\"}]}],\"outputs\":[{\"widget-type\":\"non-editable-schema-editor\",\"schema\":{\"word\":\"string\",\"count\":\"long\"}}]}",
"widgets.WordCount-sparkcompute": "{\"metadata\":{\"spec-version\":\"1.0\"},\"configuration-groups\":[{\"label\":\"Word Count Compute Configuration\",\"properties\":[{\"widget-type\":\"textbox\",\"label\":\"Field Name\",\"name\":\"field\"}]}],\"outputs\":[{\"widget-type\":\"non-editable-schema-editor\",\"schema\":{\"word\":\"string\",\"count\":\"long\"}}]}",
"widgets.WordCount-sparksink": "{\"metadata\":{\"spec-version\":\"1.0\"},\"configuration-groups\":[{\"label\":\"Word Count Sink Configuration\",\"properties\":[{\"widget-type\":\"textbox\",\"label\":\"Field Name\",\"name\":\"field\"},{\"widget-type\":\"dataset-selector\",\"label\":\"Table Name\",\"name\":\"tableName\"}]}],\"outputs\":[]}",
"doc.ErrorCollector-errortransform": "# Error Collector\r\n\r\n\r\nDescription\r\n-----------\r\nThe ErrorCollector plugin takes errors emitted from the previous stage and flattens them by adding\r\nthe error message, code, and stage to the record and outputting the result.\r\n\r\nUse Case\r\n--------\r\nThe plugin is used when you want to capture errors emitted from another stage and pass them along\r\nwith all the error information flattened into the record. For example, you may want to connect a sink\r\nto this plugin in order to store and later examine the error records.\r\n\r\nProperties\r\n----------\r\n**messageField:** The name of the error message field to use in the output schema. Defaults to 'errMsg'.\r\nIf this is not specified, the error message will be dropped.\r\n\r\n**codeField:** The name of the error code field to use in the output schema. Defaults to 'errCode'.\r\nIf this is not specified, the error code will be dropped.\r\n\r\n**stageField:** The name of the error stage field to use in the output schema. Defaults to 'errStage'.\r\nIf this is not specified, the error stage will be dropped.\r\n\r\n\r\nExample\r\n-------\r\nThis example adds the error message, error code, and error stage as the 'errMsg', 'errCode', and 'errStage' fields.\r\n\r\n {\r\n \"name\": \"ErrorCollector\",\r\n \"type\": \"errortransform\",\r\n \"properties\": {\r\n \"messageField\": \"errMsg\",\r\n \"codeField\": \"errCode\",\r\n \"stageField\": \"errStage\"\r\n }\r\n }\r\n\r\nFor example, suppose the plugin receives this error record:\r\n\r\n +============================+\r\n | field name | type | value |\r\n +============================+\r\n | A | int | 10 |\r\n | B | int | 20 |\r\n +============================+\r\n\r\nwith error code 17, error message 'invalid', from stage 'parser'. It will add the error information\r\nto the record and output:\r\n\r\n +===============================+\r\n | field name | type | value |\r\n +===============================+\r\n | A | int | 10 |\r\n | B | int | 20 |\r\n | errMsg | string | invalid |\r\n | errCode | int | 17 |\r\n | errStage | string | parser |\r\n +===============================+\r\n",
"doc.FilesetDelete-postaction": "# FilesetDelete Post Action\r\n\r\nDescription\r\n-----------\r\n\r\nIf a pipeline run succeeds, deletes files in a FileSet that match a configurable regex.\r\n\r\nUse Case\r\n--------\r\n\r\nThis post action is used if you need to clean up some files after a successful pipeline run.\r\n\r\nProperties\r\n----------\r\n\r\n**filesetName:** The name of the FileSet to delete files from.\r\n\r\n**directory:** The directory in the FileSet to delete files from. Macro enabled.\r\n\r\n**deleteRegex:** Delete files that match this regex.\r\n\r\nExample\r\n-------\r\n\r\nThis example deletes any files that have the '.crc' extension from the 2016-01-01 directory of a FileSet named 'users'.\r\n\r\n {\r\n \"name\": \"TextFileSet\",\r\n \"type\": \"batchsource\",\r\n \"properties\": {\r\n \"fileSetName\": \"users\",\r\n \"directory\": \"2016-01-01\",\r\n \"deleteRegex\": \".*\\\\.crc\"\r\n }\r\n }\r\n",
"doc.FilesetMove-action": "# FilesetMove Action\r\n\r\nDescription\r\n-----------\r\n\r\nMoves files from one FileSet into another FileSet.\r\n\r\nUse Case\r\n--------\r\n\r\nThis action may be used at the start of a pipeline run to move a subset of files from one FileSet into another\r\nFileSet to process. Or it may be used at the end of a pipeline run to move a subset of files from the output FileSet\r\nto some other location for further processing.\r\n\r\nProperties\r\n----------\r\n\r\n**sourceFileset:** The name of the FileSet to move files from\r\n\r\n**destinationFileSet:** The name of the FileSet to move files to\r\n\r\n**filterRegex:** Filter any files whose name matches this regex.\r\nDefaults to '^\\\\.', which filters any files that begin with a period.\r\n\r\nExample\r\n-------\r\n\r\nThis example moves files from the 'staging' FileSet into the 'input' FileSet.\r\n\r\n {\r\n \"name\": \"TextFileSet\",\r\n \"type\": \"batchsource\",\r\n \"properties\": {\r\n \"sourceFileset\": \"staging\",\r\n \"destinationFileset\": \"input\"\r\n }\r\n }\r\n",
"doc.StringCase-transform": "# String Case Transform\r\n\r\nDescription\r\n-----------\r\n\r\nChanges configured fields to lowercase or uppercase.\r\n\r\nUse Case\r\n--------\r\n\r\nThis transform is used whenever you need to uppercase or lowercase one or more fields.\r\n\r\nProperties\r\n----------\r\n\r\n**lowerFields:** Comma separated list of fields to lowercase.\r\n\r\n**upperFields:** Comma separated list of fields to uppercase.\r\n\r\nExample\r\n-------\r\n\r\nThis example lowercases the 'name' field and uppercases the 'id' field:\r\n\r\n {\r\n \"name\": \"StringCase\",\r\n \"type\": \"transform\",\r\n \"properties\": {\r\n \"lowerFields\": \"name\",\r\n \"upperFields\": \"id\"\r\n }\r\n }\r\n",
"doc.TextFileSet-batchsink": "# Text FileSet Batch Sink\r\n\r\nDescription\r\n-----------\r\n\r\nWrites to a CDAP FileSet in text format. One line is written for each record\r\nsent to the sink. All record fields are joined using a configurable separator.\r\n\r\n\r\nUse Case\r\n--------\r\n\r\nThis source is used whenever you need to write to a FileSet in text format.\r\n\r\nProperties\r\n----------\r\n\r\n**fileSetName:** The name of the FileSet to write to.\r\n\r\n**fieldSeparator:** The separator to join input record fields on. Defaults to ','.\r\n\r\n**outputDir:** The output directory to write to. Macro enabled.\r\n\r\nExample\r\n-------\r\n\r\nThis example writes to a FileSet named 'users', using the '|' character to separate record fields:\r\n\r\n {\r\n \"name\": \"TextFileSet\",\r\n \"type\": \"batchsink\",\r\n \"properties\": {\r\n \"fileSetName\": \"users\",\r\n \"fieldSeparator\": \"|\",\r\n \"outputDir\": \"${outputDir}\"\r\n }\r\n }\r\n\r\nBefore running the pipeline, the 'outputDir' runtime argument must be specified.\r\n",
"doc.TextFileSet-batchsource": "# Text FileSet Batch Source\r\n\r\nDescription\r\n-----------\r\n\r\nReads from a CDAP FileSet in text format. Outputs records with two fields -- position (long), and text (string).\r\n\r\nUse Case\r\n--------\r\n\r\nThis source is used whenever you need to read from a FileSet in text format.\r\n\r\nProperties\r\n----------\r\n\r\n**fileSetName:** The name of the FileSet to read from.\r\n\r\n**createIfNotExists:** Whether to create the FileSet if it does not exist. Defaults to false.\r\n\r\n**deleteInputOnSuccess:** Whether to delete the data read if the pipeline run succeeded. Defaults to false.\r\n\r\n**files:** A comma separated list of files in the FileSet to read. Macro enabled.\r\n\r\nExample\r\n-------\r\n\r\nThis example reads from a FileSet named 'users' and deletes the data it read if the pipeline run succeeded:\r\n\r\n {\r\n \"name\": \"TextFileSet\",\r\n \"type\": \"batchsource\",\r\n \"properties\": {\r\n \"fileSetName\": \"users\",\r\n \"deleteInputOnSuccess\": \"true\",\r\n \"files\": \"${inputFiles}\"\r\n }\r\n }\r\n\r\nBefore running the pipeline, the 'inputFiles' runtime argument must be specified.\r\n",
"doc.WordCount-batchaggregator": "# Word Count Batch Aggregator\r\n\r\nDescription\r\n-----------\r\n\r\nFor the configured input string field, counts the number of times each word appears in that field.\r\nRecords output will have two fields -- word (string), and count (long).\r\n\r\nUse Case\r\n--------\r\n\r\nThis plugin is used whenever you want to count the number of times each word appears in a field.\r\n\r\nProperties\r\n----------\r\n\r\n**field:** The name of the string field to count words in.\r\n\r\nExample\r\n-------\r\n\r\nThis example counts the words in the 'text' field:\r\n\r\n {\r\n \"name\": \"WordCount\",\r\n \"type\": \"batchaggregator\",\r\n \"properties\": {\r\n \"field\": \"text\"\r\n }\r\n }\r\n",
"doc.WordCount-sparkcompute": "# Word Count Spark Compute\r\n\r\nDescription\r\n-----------\r\n\r\nFor the configured input string field, counts the number of times each word appears in that field.\r\nRecords output will have two fields -- word (string), and count (long).\r\n\r\nUse Case\r\n--------\r\n\r\nThis plugin is used whenever you want to count the number of times each word appears in a field.\r\n\r\nProperties\r\n----------\r\n\r\n**field:** The name of the string field to count words in.\r\n\r\nExample\r\n-------\r\n\r\nThis example counts the words in the 'text' field:\r\n\r\n {\r\n \"name\": \"WordCount\",\r\n \"type\": \"sparkcompute\",\r\n \"properties\": {\r\n \"field\": \"text\"\r\n }\r\n }\r\n",
"doc.WordCount-sparksink": "# Word Count Spark Sink\r\n\r\nDescription\r\n-----------\r\n\r\nFor the configured input string field, counts the number of times each word appears in that field.\r\nThe results are written to a CDAP KeyValueTable.\r\n\r\nUse Case\r\n--------\r\n\r\nThis plugin is used whenever you want to count and save the number of times each word appears in a field.\r\n\r\nProperties\r\n----------\r\n\r\n**field:** The name of the string field to count words in.\r\n\r\n**tableName:** The name of KeyValueTable to store the results in.\r\n\r\nExample\r\n-------\r\n\r\nThis example counts the words in the 'text' field and stores the results in the 'wordcounts' KeyValueTable:\r\n\r\n {\r\n \"name\": \"WordCount\",\r\n \"type\": \"sparksink\",\r\n \"properties\": {\r\n \"field\": \"text\",\r\n \"tableName\": \"wordcounts\"\r\n }\r\n }\r\n"
}
}
When you tried to upload the plugin through the UI, what error message did you get? Also, did you build the plugin JSON yourself or through one of the CDAP plugin templates?

AWS Glue job failing when running on huge data

I am reading a bunch of gz files from an S3 bucket, doing some transformations, and then writing the transformed data back to S3 in Parquet format. I do not get the error when executing on a smaller number of files, but when the data gets huge the job fails with the error below. Even after changing the number of DPUs for the job execution, the error remains the same.
18/11/23 04:54:32 INFO MultipartUploadOutputStream: close closed:false s3://path to s3 bucket/part-xxx.snappy.parquet
18/11/23 04:54:32 ERROR FileFormatWriter: Job job_xxx_0017 aborted.
18/11/23 04:54:32 ERROR Executor: Exception in task 154.1 in stage 17.0 (TID 4186)
org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:270)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:189)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:188)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:402)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:171)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
... 8 more
18/11/23 04:54:32 ERROR Utils: Aborting task
java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:402)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:171)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:189)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:188)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/11/23 04:54:32 ERROR Utils: Aborting task
java.lang.NullPointerException
at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:509)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:402)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:539)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:502)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.performCommit$1(SparkHadoopMapRedUtil.scala:50)
at org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:76)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitTask(HadoopMapReduceCommitProtocol.scala:171)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:258)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:189)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:188)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
18/11/23 04:54:32 INFO SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
AWS Glue has the proper permissions to write data to the target directories.
Any help will be much appreciated.

apache zeppelin hive jdbc mapreduce java.sql.SQLException

The JDBC interpreter for Hive in Zeppelin works fine for non-MR queries. For queries that trigger MapReduce jobs, I am getting the error below.
%Hive
select * from table where month=2;
I get the following exception:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.commons.dbcp2.DelegatingStatement.execute(DelegatingStatement.java:291)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:577)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:660)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:489)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)