I am doing a simple inner join between two tables, but I keep getting the warning shown below. Other posts say it is OK to ignore the warning, but my jobs end in failure and do not progress.
The tables are pretty big (12 billion rows), but I am only adding three columns from one table to the other.
When I reduce the dataset to a few million rows and run the script in an Amazon SageMaker Jupyter notebook, it works fine. But when I run it on the EMR cluster for the entire dataset, it fails. I even ran the specific snappy partition that it seemed to fail on, and it worked in SageMaker.
The job has no problem reading from one of the tables; it is the other table that seems to cause the problem:
INFO FileScanRDD: Reading File path: s3a://path/EES_FD_UVA_HIST/date=2019-10-14/part-00056-ddb83da5-2e1b-499d-a52a-cad16e21bd2c-c000.snappy.parquet, range: 0-102777097, partition values: [18183]
20/04/06 15:51:58 WARN S3AbortableInputStream: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
20/04/06 15:51:58 WARN S3AbortableInputStream: Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
20/04/06 15:52:03 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
20/04/06 15:52:03 INFO MemoryStore: MemoryStore cleared
20/04/06 15:52:03 INFO BlockManager: BlockManager stopped
20/04/06 15:52:03 INFO ShutdownHookManager: Shutdown hook called
This is my code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
uvalim=spark.read.parquet("s3://path/UVA_HIST_WITH_LIMITS")
uvaorg=spark.read.parquet("s3a://path/EES_FD_UVA_HIST")
config=uvalim.select('SEQ_ID','TOOL_ID', 'DATE' ,'UL','LL')
uva=uvaorg.select('SEQ_ID', 'TOOL_ID', 'TIME_STAMP', 'RUN_ID', 'TARGET', 'LOWER_CRITICAL', 'UPPER_CRITICAL', 'RESULT', 'STATUS')
uva_config=uva.join(config, on=['SEQ_ID','TOOL_ID'], how='inner')
uva_config.write.mode("overwrite").parquet("s3a://path/Uvaconfig.parquet")
Is there a way to debug this?
Update: Based on Emerson's suggestion:
I ran it with debug logging. It ran for 9 hours and failed before I killed the YARN application.
For some reason the stderr did not have much output.
This is the stderr output:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/301/__spark_libs__1712836156286367723.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/04/07 05:04:13 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 5653@ip-10-210-13-51
20/04/07 05:04:13 INFO SignalUtils: Registered signal handler for TERM
20/04/07 05:04:13 INFO SignalUtils: Registered signal handler for HUP
20/04/07 05:04:13 INFO SignalUtils: Registered signal handler for INT
20/04/07 05:04:15 INFO SecurityManager: Changing view acls to: yarn,hadoop
20/04/07 05:04:15 INFO SecurityManager: Changing modify acls to: yarn,hadoop
20/04/07 05:04:15 INFO SecurityManager: Changing view acls groups to:
20/04/07 05:04:15 INFO SecurityManager: Changing modify acls groups to:
20/04/07 05:04:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
20/04/07 05:04:15 INFO TransportClientFactory: Successfully created connection to ip-10-210-13-51.ec2.internal/10.210.13.51:35863 after 168 ms (0 ms spent in bootstraps)
20/04/07 05:04:16 INFO SecurityManager: Changing view acls to: yarn,hadoop
20/04/07 05:04:16 INFO SecurityManager: Changing modify acls to: yarn,hadoop
20/04/07 05:04:16 INFO SecurityManager: Changing view acls groups to:
20/04/07 05:04:16 INFO SecurityManager: Changing modify acls groups to:
20/04/07 05:04:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
20/04/07 05:04:16 INFO TransportClientFactory: Successfully created connection to ip-10-210-13-51.ec2.internal/10.210.13.51:35863 after 20 ms (0 ms spent in bootstraps)
20/04/07 05:04:16 INFO DiskBlockManager: Created local directory at /mnt1/yarn/usercache/hadoop/appcache/application_1569338404918_1241/blockmgr-2adfe133-fd28-4f25-95a4-2ac1348c625e
20/04/07 05:04:16 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/hadoop/appcache/application_1569338404918_1241/blockmgr-3620ceea-8eee-42c5-af2f-6975c894b643
20/04/07 05:04:17 INFO MemoryStore: MemoryStore started with capacity 3.8 GB
20/04/07 05:04:17 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@ip-10-210-13-51.ec2.internal:35863
20/04/07 05:04:17 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
20/04/07 05:04:17 INFO Executor: Starting executor ID 1 on host ip-10-210-13-51.ec2.internal
20/04/07 05:04:18 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34073.
20/04/07 05:04:18 INFO NettyBlockTransferService: Server created on ip-10-210-13-51.ec2.internal:34073
20/04/07 05:04:18 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/04/07 05:04:18 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, ip-10-210-13-51.ec2.internal, 34073, None)
20/04/07 05:04:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, ip-10-210-13-51.ec2.internal, 34073, None)
20/04/07 05:04:18 INFO BlockManager: external shuffle service port = 7337
20/04/07 05:04:18 INFO BlockManager: Registering executor with local external shuffle service.
20/04/07 05:04:18 INFO TransportClientFactory: Successfully created connection to ip-10-210-13-51.ec2.internal/10.210.13.51:7337 after 19 ms (0 ms spent in bootstraps)
20/04/07 05:04:18 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, ip-10-210-13-51.ec2.internal, 34073, None)
20/04/07 05:04:20 INFO CoarseGrainedExecutorBackend: Got assigned task 0
20/04/07 05:04:20 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/04/07 05:04:21 INFO TorrentBroadcast: Started reading broadcast variable 0
20/04/07 05:04:21 INFO TransportClientFactory: Successfully created connection to ip-10-210-13-51.ec2.internal/10.210.13.51:38181 after 17 ms (0 ms spent in bootstraps)
20/04/07 05:04:21 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 39.4 KB, free 3.8 GB)
20/04/07 05:04:21 INFO TorrentBroadcast: Reading broadcast variable 0 took 504 ms
20/04/07 05:04:22 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 130.2 KB, free 3.8 GB)
20/04/07 05:04:23 INFO CoarseGrainedExecutorBackend: eagerFSInit: Eagerly initialized FileSystem at s3://does/not/exist in 5155 ms
20/04/07 05:04:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 53157 bytes result sent to driver
20/04/07 05:04:25 INFO CoarseGrainedExecutorBackend: Got assigned task 2
20/04/07 05:04:25 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
20/04/07 05:04:25 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 53114 bytes result sent to driver
20/04/07 05:04:25 INFO CoarseGrainedExecutorBackend: Got assigned task 3
20/04/07 05:04:25 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
20/04/07 05:04:25 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
20/04/07 05:04:25 INFO DiskBlockManager: Shutdown hook called
20/04/07 05:04:25 INFO ShutdownHookManager: Shutdown hook called
Can you switch to using s3 instead of s3a? I believe s3a is not recommended for use on EMR. Additionally, you can run your job in debug mode:
sc = spark.sparkContext
sc.setLogLevel('DEBUG')
Read the document below, which talks about s3a:
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-file-systems.html
So after troubleshooting with the debug logs, I came to the conclusion that it was indeed a memory issue.
The cluster I was using was running out of memory after loading a few days' worth of data. Each day was about 2 billion rows.
So I tried running my script one day at a time, which the cluster seemed able to handle.
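Running one day at a time can be scripted by generating the per-day partition paths and processing them in a loop. A minimal sketch, assuming the `date=` partition layout visible in the log output above (the base path here is a placeholder, and `daily_partitions` is a hypothetical helper, not part of the original job):

```python
from datetime import date, timedelta

def daily_partitions(start, end, base="s3a://path/EES_FD_UVA_HIST"):
    """Yield one partition path per day, so each Spark run stays small.

    `base` and the `date=YYYY-MM-DD` layout are assumptions taken from
    the FileScanRDD log line above.
    """
    d = start
    while d <= end:
        yield f"{base}/date={d.isoformat()}"
        d += timedelta(days=1)

# Each of these paths can then be fed to spark.read.parquet() one at a time.
paths = list(daily_partitions(date(2019, 10, 14), date(2019, 10, 16)))
```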
However, when handling days where the data was slightly larger (7 billion rows), it gave me an
executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
error. This post by Jumpman solved the problem by simply extending the spark.dynamicAllocation.executorIdleTimeout value.
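For reference, that timeout can be extended at submit time. A sketch with an illustrative value, where `your_job.py` stands in for the actual script:

```
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.executorIdleTimeout=3600s \
  your_job.py
```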
So thank you @Emerson and @Jumpman!
Related
I am trying to execute my Spark Scala application on an AWS EMR cluster by creating a Spark application step.
My cluster contains 4 m3.xlarge instances.
I start my application using this command:
spark-submit --deploy-mode cluster --class Main s3://mybucket/myjar_2.11-0.1.jar s3n://oc-mybucket/folder arg1 arg2
My application takes 3 parameters, the first one being a folder.
Unfortunately, after starting the application I see that only one executor (plus the master) is active and 3 executors are dead, so all tasks are working only on the first one (see image).
I tried many ways to activate those executors, but without any result ("spark.default.parallelism", "spark.executor.instances" and "spark.executor.cores"). What should I do so that all the executors are active and processing data?
Also, when looking at Ganglia, the CPU is always under 35%. Is there a way to get the CPU working at more than 75%?
Thank you.
UPDATE
This is the stderr content of the dead executors:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/14/__spark_libs__3671437061469038073.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
20/08/15 23:28:56 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 14765@ip-172-31-39-255
20/08/15 23:28:56 INFO SignalUtils: Registered signal handler for TERM
20/08/15 23:28:56 INFO SignalUtils: Registered signal handler for HUP
20/08/15 23:28:56 INFO SignalUtils: Registered signal handler for INT
20/08/15 23:28:57 INFO SecurityManager: Changing view acls to: yarn,hadoop
20/08/15 23:28:57 INFO SecurityManager: Changing modify acls to: yarn,hadoop
20/08/15 23:28:57 INFO SecurityManager: Changing view acls groups to:
20/08/15 23:28:57 INFO SecurityManager: Changing modify acls groups to:
20/08/15 23:28:57 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
20/08/15 23:28:58 INFO TransportClientFactory: Successfully created connection to ip-172-31-36-83.eu-west-1.compute.internal/172.31.36.83:37115 after 186 ms (0 ms spent in bootstraps)
20/08/15 23:28:58 INFO SecurityManager: Changing view acls to: yarn,hadoop
20/08/15 23:28:58 INFO SecurityManager: Changing modify acls to: yarn,hadoop
20/08/15 23:28:58 INFO SecurityManager: Changing view acls groups to:
20/08/15 23:28:58 INFO SecurityManager: Changing modify acls groups to:
20/08/15 23:28:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
20/08/15 23:28:58 INFO TransportClientFactory: Successfully created connection to ip-172-31-36-83.eu-west-1.compute.internal/172.31.36.83:37115 after 2 ms (0 ms spent in bootstraps)
20/08/15 23:28:58 INFO DiskBlockManager: Created local directory at /mnt1/yarn/usercache/hadoop/appcache/application_1597532473783_0002/blockmgr-d0d258ba-4345-45d1-9279-f6a97b63f81c
20/08/15 23:28:58 INFO DiskBlockManager: Created local directory at /mnt/yarn/usercache/hadoop/appcache/application_1597532473783_0002/blockmgr-e7ae1e29-85fa-4df9-acf1-f9923f0664bc
20/08/15 23:28:58 INFO MemoryStore: MemoryStore started with capacity 2.6 GB
20/08/15 23:28:59 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@ip-172-31-36-83.eu-west-1.compute.internal:37115
20/08/15 23:28:59 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
20/08/15 23:28:59 INFO Executor: Starting executor ID 3 on host ip-172-31-39-255.eu-west-1.compute.internal
20/08/15 23:28:59 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40501.
20/08/15 23:28:59 INFO NettyBlockTransferService: Server created on ip-172-31-39-255.eu-west-1.compute.internal:40501
20/08/15 23:28:59 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/08/15 23:29:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(3, ip-172-31-39-255.eu-west-1.compute.internal, 40501, None)
20/08/15 23:29:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(3, ip-172-31-39-255.eu-west-1.compute.internal, 40501, None)
20/08/15 23:29:00 INFO BlockManager: external shuffle service port = 7337
20/08/15 23:29:00 INFO BlockManager: Registering executor with local external shuffle service.
20/08/15 23:29:00 INFO TransportClientFactory: Successfully created connection to ip-172-31-39-255.eu-west-1.compute.internal/172.31.39.255:7337 after 20 ms (0 ms spent in bootstraps)
20/08/15 23:29:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(3, ip-172-31-39-255.eu-west-1.compute.internal, 40501, None)
20/08/15 23:29:03 INFO CoarseGrainedExecutorBackend: eagerFSInit: Eagerly initialized FileSystem at s3://does/not/exist in 3363 ms
20/08/15 23:30:02 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
20/08/15 23:30:02 INFO DiskBlockManager: Shutdown hook called
20/08/15 23:30:02 INFO ShutdownHookManager: Shutdown hook called
Is this problem related to memory?
spark-submit does not use all executors by default; you can specify the number of executors with --num-executors, along with --executor-cores and --executor-memory.
For instance, to increase the number of executors (which by default is 2):
spark-submit --num-executors N   # where N is the desired number of executors, e.g. 5, 10, 50
See the example in the docs here.
If that doesn't help, or gets overridden by spark-submit, you can set spark.executor.instances in the conf/spark-defaults.conf file (or similar) so you don't have to specify it explicitly on the command line.
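For example, a conf/spark-defaults.conf fragment might look like this (all values are illustrative, not recommendations; tune them to your instance types):

```
spark.executor.instances   10
spark.executor.cores       2
spark.executor.memory      4g
```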
For CPU utilization, you should look into --executor-cores and change it either in spark-submit or in the conf. Increasing the CPU cores will hopefully increase the usage.
Update:
As pointed out by @Lamanus, and as I double-checked, EMR releases greater than 4.4 have spark.dynamicAllocation.enabled set to true. I suggest you double-check the partitioning of your data: with dynamic allocation enabled, the number of executor instances depends on the number of partitions, which varies according to the stage in the DAG execution. Also, with dynamic allocation you can try out spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.dynamicAllocation.maxExecutors to control the executors.
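As a sketch, those dynamic-allocation knobs can be passed at submit time (the numbers below are illustrative, not recommendations):

```
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.initialExecutors=4 \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.maxExecutors=20 \
  --deploy-mode cluster --class Main s3://mybucket/myjar_2.11-0.1.jar
```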
This may be a bit late, but I found this AWS Big Data blog post insightful for ensuring that most of the cluster is utilised and achieving as much parallelism as possible.
https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/
More specifically:
Number of executors per instance = (total number of virtual cores per instance - 1) / spark.executor.cores
Total executor memory = total RAM per instance / number of executors per instance
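Plugging one of the question's m3.xlarge nodes into those two formulas gives a worked example. The instance specs (4 vCPUs, 15 GiB RAM) and the choice of 3 cores per executor are assumptions here, not values from the original post:

```python
# Worked example of the two sizing formulas above for one m3.xlarge node.
vcores_per_instance = 4      # assumption: m3.xlarge has 4 vCPUs
ram_per_instance_gib = 15    # assumption: m3.xlarge has 15 GiB RAM
spark_executor_cores = 3     # illustrative choice, leaving 1 core for the OS/daemons

# Number of executors per instance = (total virtual cores - 1) / spark.executor.cores
executors_per_instance = (vcores_per_instance - 1) // spark_executor_cores

# Total executor memory = total RAM per instance / executors per instance
# (a slice of this should be left for spark.executor.memoryOverhead)
total_executor_memory_gib = ram_per_instance_gib // executors_per_instance

print(executors_per_instance, total_executor_memory_gib)  # 1 15
```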
You can then control the number of parallel tasks during stages using spark.default.parallelism or repartitioning.
I'm trying to configure a Siddhi application on WSO2 Stream Processor with two sources (both files), but it doesn't work (with one source it works fine). Even when I split it into two Siddhi applications, it doesn't work. The logs in both situations are the same, as below:
[2018-01-25 08:51:20,583] INFO {org.quartz.impl.StdSchedulerFactory} - Using default implementation for ThreadExecutor
[2018-01-25 08:51:20,586] INFO {org.quartz.simpl.SimpleThreadPool} - Job execution threads will use class loader of thread: Timer-0
[2018-01-25 08:51:20,599] INFO {org.quartz.core.SchedulerSignalerImpl} - Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
[2018-01-25 08:51:20,599] INFO {org.quartz.core.QuartzScheduler} - Quartz Scheduler v.2.3.0 created.
[2018-01-25 08:51:20,600] INFO {org.quartz.simpl.RAMJobStore} - RAMJobStore initialized.
[2018-01-25 08:51:20,601] INFO {org.quartz.core.QuartzScheduler} - Scheduler meta-data: Quartz Scheduler (v2.3.0) 'polling-task-runner' with instanceId 'NON_CLUSTERED'
Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
NOT STARTED.
Currently in standby mode.
Number of jobs executed: 0
Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 1 threads.
Using job-store 'org.quartz.simpl.RAMJobStore' - which does not support persistence. and is not clustered.
[2018-01-25 08:51:20,601] INFO {org.quartz.impl.StdSchedulerFactory} - Quartz scheduler 'polling-task-runner' initialized from an externally provided properties instance.
[2018-01-25 08:51:20,601] INFO {org.quartz.impl.StdSchedulerFactory} - Quartz scheduler version: 2.3.0
[2018-01-25 08:51:20,601] INFO {org.quartz.core.QuartzScheduler} - Scheduler polling-task-runner_$_NON_CLUSTERED started.
[2018-01-25 08:51:20,604] INFO {org.quartz.core.QuartzScheduler} - Scheduler polling-task-runner_$_NON_CLUSTERED started.
[2018-01-25 08:51:20,605] ERROR {org.wso2.carbon.connector.framework.server.polling.PollingTaskRunner} - Exception occurred while scheduling job org.quartz.ObjectAlreadyExistsException: Unable to store Trigger with name: 'scheduledPoll' and group: 'group1', because one already exists with this identification.
at org.quartz.simpl.RAMJobStore.storeTrigger(RAMJobStore.java:415)
at org.quartz.simpl.RAMJobStore.storeJobAndTrigger(RAMJobStore.java:252)
at org.quartz.core.QuartzScheduler.scheduleJob(QuartzScheduler.java:855)
at org.quartz.impl.StdScheduler.scheduleJob(StdScheduler.java:249)
at org.wso2.carbon.connector.framework.server.polling.PollingTaskRunner.start(PollingTaskRunner.java:74)
at org.wso2.carbon.connector.framework.server.polling.PollingServerConnector.start(PollingServerConnector.java:57)
at org.wso2.carbon.transport.remotefilesystem.server.connector.contractimpl.RemoteFileSystemServerConnectorImpl.start(RemoteFileSystemServerConnectorImpl.java:75)
at org.wso2.extension.siddhi.io.file.FileSource.deployServers(FileSource.java:537)
at org.wso2.extension.siddhi.io.file.FileSource.connect(FileSource.java:370)
at org.wso2.siddhi.core.stream.input.source.Source.connectWithRetry(Source.java:130)
at org.wso2.siddhi.core.SiddhiAppRuntime.start(SiddhiAppRuntime.java:335)
at org.wso2.carbon.stream.processor.core.internal.StreamProcessorService.deploySiddhiApp(StreamProcessorService.java:280)
at org.wso2.carbon.stream.processor.core.internal.StreamProcessorDeployer.deploySiddhiQLFile(StreamProcessorDeployer.java:81)
at org.wso2.carbon.stream.processor.core.internal.StreamProcessorDeployer.deploy(StreamProcessorDeployer.java:170)
at org.wso2.carbon.deployment.engine.internal.DeploymentEngine.lambda$deployArtifacts$0(DeploymentEngine.java:291)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at org.wso2.carbon.deployment.engine.internal.DeploymentEngine.deployArtifacts(DeploymentEngine.java:282)
at org.wso2.carbon.deployment.engine.internal.RepositoryScanner.sweep(RepositoryScanner.java:112)
at org.wso2.carbon.deployment.engine.internal.RepositoryScanner.scan(RepositoryScanner.java:68)
at org.wso2.carbon.deployment.engine.internal.DeploymentEngine.start(DeploymentEngine.java:121)
at org.wso2.carbon.deployment.engine.internal.DeploymentEngineListenerComponent.onAllRequiredCapabilitiesAvailable(DeploymentEngineListenerComponent.java:216)
at org.wso2.carbon.kernel.internal.startupresolver.StartupComponentManager.lambda$notifySatisfiableComponents$7(StartupComponentManager.java:266)
at java.util.ArrayList.forEach(ArrayList.java:1249)
at org.wso2.carbon.kernel.internal.startupresolver.StartupComponentManager.notifySatisfiableComponents(StartupComponentManager.java:252)
at org.wso2.carbon.kernel.internal.startupresolver.StartupOrderResolver$1.run(StartupOrderResolver.java:204)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Can anybody throw out an idea on how to overcome this?
Thanks.
Thanks for pointing this issue out.
It seems this occurs due to scheduling two polling tasks with the same ID.
I have created an issue for this in the Git repository [1]. The fix will be shipped with an update soon.
[1] https://github.com/wso2/product-sp/issues/463
Best Regards!
The following ERROR is logged on the gateway worker nodes on start-up.
2016-08-23 12:32:42,344 [-] [Timer-5] ERROR KeyTemplateRetriever Exception when retrieving throttling data from remote endpoint
Unexpected character (<) at position 0.
at org.json.simple.parser.Yylex.yylex(Unknown Source)
at org.json.simple.parser.JSONParser.nextToken(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.json.simple.parser.JSONParser.parse(Unknown Source)
at org.wso2.carbon.apimgt.gateway.throttling.util.KeyTemplateRetriever.retrieveKeyTemplateData(KeyTemplateRetriever.java:100)
at org.wso2.carbon.apimgt.gateway.throttling.util.KeyTemplateRetriever.loadKeyTemplatesFromWebService(KeyTemplateRetriever.java:111)
at org.wso2.carbon.apimgt.gateway.throttling.util.KeyTemplateRetriever.run(KeyTemplateRetriever.java:54)
at java.util.TimerThread.mainLoop(Timer.java:555)
at java.util.TimerThread.run(Timer.java:505)
Despite the error, the gateway worker nodes start up and the environment can be used to successfully invoke a sample API.
All the APIM nodes bar the traffic manager, however, report these warnings:
2016-08-22 16:40:56,652 [-] [Timer-5] WARN KeyTemplateRetriever Failed retrieving throttling data from remote endpoint: Connection refused. Retrying after 15 seconds...
2016-08-22 16:40:56,653 [-] [Timer-4] WARN BlockingConditionRetriever Failed retrieving Blocking Conditions from remote endpoint: Connection refused. Retrying after 15 seconds...
Environment:
APIM 2.0.0 cluster
publisher (default profile)
store (default profile)
gw manager and 2 gw workers (default profiles)
traffic manager (using traffic-manager profile)
Database: MariaDB Server, wsrep_25.10.r4144
Userstore : Read/write LDAP
JVM: java version "1.8.0_92"
OS: CentOS Linux release 7.0.1406 (Core)
n.b. key manager un-configured using default pack settings
If you disable Advanced Throttling in api-manager.xml as below, that error will go away. If you enable it, a Key Manager node is required.
<EnableAdvanceThrottling>false</EnableAdvanceThrottling>
I encountered this issue recently, and the cause was that throttle#data#v1.war (repository/deployment/server/webapps/throttle#data#v1.war) had not been deployed at the time the worker started up.
If you have a distributed APIM 2.0 deployment, make sure the Key Manager is up and throttle#data#v1.war is deployed on the Key Manager before worker startup.
It's a really basic setup: I am using slf4j-simple.
I have the following route:
get("/fail", (req, res) -> {
    throw new RuntimeException("fail");
});
As expected, it returns a 500 Internal Server Error.
However, the logs show nothing about this. How can I get these bubbled exceptions to log?
These are the only logs I see:
[Thread-0] INFO org.eclipse.jetty.util.log - Logging initialized #164ms
[Thread-0] INFO spark.embeddedserver.jetty.EmbeddedJettyServer - == Spark has ignited ...
[Thread-0] INFO spark.embeddedserver.jetty.EmbeddedJettyServer - >> Listening on 0.0.0.0:4567
[Thread-0] INFO org.eclipse.jetty.server.Server - jetty-9.3.z-SNAPSHOT
[Thread-0] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector#35eae602{HTTP/1.1,[http/1.1]}{0.0.0.0:4567}
[Thread-0] INFO org.eclipse.jetty.server.Server - Started #259ms
You may implement your log operation inside the exception mapper.
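As a sketch of that approach, Spark's `exception()` handler can log the throwable before the 500 response is sent. The class name, logger name, and response body below are illustrative, not part of the original setup:

```java
import static spark.Spark.*;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class App {
    private static final Logger LOG = LoggerFactory.getLogger(App.class);

    public static void main(String[] args) {
        get("/fail", (req, res) -> {
            throw new RuntimeException("fail");
        });

        // Catch exceptions that bubble out of any route and log them,
        // instead of letting them be swallowed into a bare 500.
        exception(Exception.class, (e, req, res) -> {
            LOG.error("Unhandled exception on " + req.pathInfo(), e);
            res.status(500);
            res.body("Internal error");
        });
    }
}
```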
I followed the official guide to set up a cluster (Clustering AS 5.3.0, https://docs.wso2.com/display/CLUSTER420/Setting+up+a+Cluster).
But eventually I could not reach the management page at https://localhost:9443/carbon.
Manager node (10.13.46.34), with an "Error when passing date" error that I still don't know how to fix:
wso2server -Dsetup
[05-10 11:58:29]ERROR {org.wso2.carbon.registry.indexing.solr.SolrClient}-Error when passing date to create solr date format.java.text.ParseException: Unparseable date: "Tue May 03 17:35:
14 CST 2016"
[05-10 12:01:04]INFO {org.wso2.carbon.core.clustering.hazelcast.wka.WKABasedMembershipScheme}-Member joined [a9402117-a832-4eb6-b563-a58949ff784e]: /10.0.34.41:4200
[05-10 12:01:06]INFO {org.wso2.carbon.core.clustering.hazelcast.util.MemberUtils}-Added member: Host:10.0.34.41, Remote Host:null, Port: 4200, HTTP:9763, HTTPS:9443, Domain: wso2.as.domain, Sub-domain:worker, Active:true
[05-10 12:03:31]INFO {org.wso2.carbon.core.services.util.CarbonAuthenticationUtil}-'admin@carbon.super [-1234]' logged in at [2016-05-10 12:03:31,999+0800]
Worker node(10.0.34.44):
wso2server.bat -DworkerNode=true
......
......
[05-10 12:01:25]INFO {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}-Server :Application Server-5.3.0
[05-10 12:01:25]INFO {org.wso2.carbon.core.internal.StartupFinalizerServiceComponent}-WSO2 Carbon started in 88 sec
[05-10 12:01:26]INFO {org.wso2.carbon.ui.internal.CarbonUIServiceComponent} - Mgt Console URL : https://10.0.34.44:9443/carbon/
[05-10 12:02:20]INFO {org.wso2.carbon.core.services.util.CarbonAuthenticationUtil} - 'admin@carbon.super [-1234]' logged in at [2016-05-10 12:02:20,817+0800]
I can successfully log in to the manager node's management console (https://10.13.46.34:9443/carbon/),
but I fail to log in to the worker node's management console (https://10.0.34.44:9443/carbon/).
So, can anyone tell me how to get the manager node's console page to list the set of application servers? I want to manage all the nodes together.
And how do I deploy a web application to all the nodes in this AS cluster environment?
Thanks!
When you start a WSO2 AS node with -DworkerNode=true, you can't access the UI, because worker nodes are normally used to serve requests; the worker profile therefore doesn't contain the UI features.
According to your comment, you have one manager node and one worker node. You can use the deployment synchronizer to deploy webapps to the worker nodes. Basically, what happens is that when you deploy a webapp on the management node, it is committed to an SVN location and the worker node checks it out, so the worker node also gets a copy of the app.
You can refer to https://docs.wso2.com/display/CLUSTER44x/Configuring+SVN-Based+Deployment+Synchronizer for more details and setup.
Or you can simply copy the WAR file manually to the repository/deployment/server/webapps folder on the worker node.