Spark on Kubernetes: How to improve my performance? - amazon-web-services
I am running a Spark app on a Kubernetes cluster (at the moment, I do not have permission to resize or rescale this cluster). I think there are mostly 3 important issues impacting my performance, and I would like to ask this community if there is something I could do to improve them. I am going to order them according to the priorities I set myself.
1- I do not have permissions on the K8s cluster, and I think the configuration I set for my Spark app is not taking effect. This is just a guess, because I do not have much experience with K8s. I have configured my Spark app like this:
return SparkSession.builder \
    .config("spark.executor.instances", "5") \
    .config("spark.executor.cores", "4") \
    .config("spark.driver.cores", "2") \
    .config("spark.kubernetes.executor.request.cores", "3.5") \
    .config("spark.executor.memory", "10g") \
    .config("spark.driver.memory", "6g") \
    .config("spark.sql.broadcastTimeout", "600") \
    .config("spark.memory.fraction", "0.2") \
    .config("spark.kubernetes.memoryOverheadFactor", "0.2") \
But when the pod is created, I see this in the log:
'containers': [{'args': ['/bin/bash',
'-c',
'spark-submit --master '
'"k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_PORT_443_TCP_PORT}" '
'--deploy-mode client --name "${POD_NAME}" '
'--conf "spark.driver.host=${POD_IP}" '
'--conf spark.driver.memory=40g --conf '
'spark.driver.memoryOverhead=0.4 --conf '
'spark.eventLog.dir=s3a://*****-spark/history/ireland '
'--conf spark.eventLog.enabled=true --conf '
The driver's memory is 40g instead of 6g, and the driver's memoryOverhead is 0.4 instead of 0.2. When I see this, I get confused and I am not sure whether the number of cores per executor and the executor memory are applied as I configured them.
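To double-check which values are actually in effect at runtime, something like this can be used to read the configuration back from the running session (a minimal sketch; build_session() is a hypothetical helper wrapping the builder shown above):

spark = build_session()  # hypothetical helper that returns the SparkSession built above

# Print the resource-related settings the driver actually ended up with
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    if "memory" in key or "cores" in key or "instances" in key:
        print(key, "=", value)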
2- The last task of this Spark app writes a DataFrame into a Hive table (Parquet format) on an S3 bucket.
results.withColumn("make", F.col("name_make_dict")).filter(F.col("name_mod1_dict").isNotNull()) \
    .select(*json.loads(parser.get("config", "formatMatched"))) \
    .withColumnRenamed("name_mod1_dict", "name_model_dict") \
    .repartition(30, "inserted_at_date", "brand", "make") \
    .write.insertInto(f"{schema}.cars_features")
And I have checked the log:
[2022-04-16 10:32:23,040] {pod_launcher.py:156} INFO - b'22/04/16 10:32:23 INFO CodeGenerator: Code generated in 35.851449 ms\n'
[2022-04-16 10:32:23,097] {pod_launcher.py:156} INFO - b'22/04/16 10:32:23 INFO SparkContext: Starting job: insertInto at NativeMethodAccessorImpl.java:0\n'
[2022-04-16 10:32:23,100] {pod_launcher.py:156} INFO - b'22/04/16 10:32:23 INFO DAGScheduler: Registering RDD 146 (insertInto at NativeMethodAccessorImpl.java:0) as input to shuffle 9\n'
[2022-04-16 10:32:23,100] {pod_launcher.py:156} INFO - b'22/04/16 10:32:23 INFO DAGScheduler: Registering RDD 149 (insertInto at NativeMethodAccessorImpl.java:0) as input to shuffle 10\n'
[2022-04-16 10:32:23,100] {pod_launcher.py:156} INFO - b'22/04/16 10:32:23 INFO DAGScheduler: Got job 18 (insertInto at NativeMethodAccessorImpl.java:0) with 30 output partitions\n'
and it finishes very fast:
[2022-04-16 10:33:02,044] {pod_launcher.py:156} INFO - b'22/04/16 10:33:02 INFO TaskSchedulerImpl: Removed TaskSet 34.0, whose tasks have all completed, from pool \n'
[2022-04-16 10:33:02,048] {pod_launcher.py:156} INFO - b'22/04/16 10:33:02 INFO DAGScheduler: ResultStage 34 (insertInto at NativeMethodAccessorImpl.java:0) finished in 26.634 s\n'
[2022-04-16 10:33:02,048] {pod_launcher.py:156} INFO - b'22/04/16 10:33:02 INFO DAGScheduler: Job 18 finished: insertInto at NativeMethodAccessorImpl.java:0, took 38.948620 s\n'
but then it goes on like this and takes about 10 minutes to write the files to S3:
[2022-04-16 10:33:05,305] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:33:10,321] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:33:15,337] {base_job.py:197} DEBUG - [heartbeat]
... (heartbeat lines continue every ~5 seconds) ...
[2022-04-16 10:36:25,941] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:36:30,957] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:36:33,949] {pod_launcher.py:156} INFO - b'22/04/16 10:36:33 INFO FileFormatWriter: Write Job eefcf8a0-9ba9-4752-a51e-f4bc27e5ffcc committed.\n'
[2022-04-16 10:36:33,954] {pod_launcher.py:156} INFO - b'22/04/16 10:36:33 INFO FileFormatWriter: Finished processing stats for write job eefcf8a0-9ba9-4752-a51e-f4bc27e5ffcc.\n'
[2022-04-16 10:36:35,972] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:36:40,991] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:36:46,006] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:36:51,022] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:36:56,038] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:37:01,057] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:37:06,072] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:37:10,856] {pod_launcher.py:156} INFO - b"22/04/16 10:37:10 INFO FileUtils: Creating directory if it doesn't exist: s3a://*********/local/odyn/cars_pricing_cvt/cars_features/inserted_at_date=2022-04-16/brand=autovit/make=JAGUAR\n"
[2022-04-16 10:37:11,089] {base_job.py:197} DEBUG - [heartbeat]
[2022-04-16 10:37:12,166] {pod_launcher.py:156} INFO - b'22/04/16 10:37:12 INFO SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.\n'
[2022-04-16 10:37:12,655] {pod_launcher.py:156} INFO - b'22/04/16 10:37:12 INFO Hive: Renaming src: s3a://*********/local/odyn/cars_pricing_cvt/cars_features/.hive-staging_hive_2022-04-16_10-26-50_235_365683970089316725-1/-ext-10000/inserted_at_date=2022-04-16/brand=autovit/make=JAGUAR/part-00013-87f53444-2a15-4a4c-b7bc-0b3304ba9bb7.c000, dest: s3a://*********/local/odyn/cars_pricing_cvt/cars_features/inserted_at_date=2022-04-16/brand=autovit/make=JAGUAR/part-00013-87f53444-2a15-4a4c-b7bc-0b3304ba9bb7.c000, Status:true\n'
[2022-04-16 10:37:12,967] {pod_launcher.py:156} INFO - b'22/04/16 10:37:12 INFO Hive: New loading path = s3a://*********/local/odyn/cars_pricing_cvt/cars_features/.hive-staging_hive_2022-04-16_10-26-50_235_365683970089316725-1/-ext-10000/inserted_at_date=2022-04-16/brand=autovit/make=JAGUAR with partSpec {inserted_at_date=2022-04-16, brand=autovit, make=JAGUAR}\n'
[2022-04-16 10:37:13,045] {pod_launcher.py:156} INFO - b"22/04/16 10:37:13 INFO FileUtils: Creating directory if it doesn't exist: s3a://*********/local/odyn/cars_pricing_cvt/cars_features/inserted_at_date=2022-04-16/brand=gratka/make=LANCIA\n"
[2022-04-16 10:37:14,065] {pod_launcher.py:156} INFO - b'22/04/16 10:37:14 INFO SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.\n'
[2022-04-16 10:37:14,576] {pod_launcher.py:156} INFO - b'22/04/16 10:37:14 INFO Hive: Renaming src: s3a://*********/local/odyn/cars_pricing_cvt/cars_features/.hive-staging_hive_2022-04-16_10-26-50_235_365683970089316725-1/-ext-10000/inserted_at_date=2022-04-16/brand=gratka/make=LANCIA/part-00009-87f53444-2a15-4a4c-b7bc-0b3304ba9bb7.c000, dest: s3a://*********/local/odyn/cars_pricing_cvt/cars_features/inserted_at_date=2022-04-16/brand=gratka/make=LANCIA/part-00009-87f53444-2a15-4a4c-b7bc-0b3304ba9bb7.c000, Status:true\n'
[2022-04-16 10:37:14,803] {pod_launcher.py:156} INFO - b'22/04/16 10:37:14 INFO Hive: New loading path = s3a://*********/local/odyn/cars_pricing_cvt/cars_features/.hive-staging_hive_2022-04-16_10-26-50_235_365683970089316725-1/-ext-10000/inserted_at_date=2022-04-16/brand=gratka/make=LANCIA with partSpec {inserted_at_date=2022-04-16, brand=gratka, make=LANCIA}\n'
[2022-04-16 10:37:14,870] {pod_launcher.py:156} INFO - b"22/04/16 10:37:14 INFO FileUtils: Creating directory if it doesn't exist: s3a://*********/local/odyn/cars_pricing_cvt/cars_features/inserted_at_date=2022-04-16/brand=otomoto/make=MG\n"
[2022-04-16 10:37:15,808] {pod_launcher.py:156} INFO - b'22/04/16 10:37:15 INFO SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.\n'
[2022-04-16 10:37:16,106] {base_job.py:197} DEBUG - [heartbeat]
I have tried repartition(30, ...) as you can see above, and also coalesce(1), but the write speed is the same. I do not know why this is happening or whether it is expected behaviour. I have checked these (and an alternative I am considering is sketched after the links):
Extremely slow S3 write times from EMR/ Spark
Spark s3 write (s3 vs s3a connectors)
Spark Write to S3 Storage Option
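The slow part in the log above seems to be the per-partition Hive staging renames on S3 (rename on S3A is a copy plus delete, so committing many partitions this way is expensive). One alternative I am considering is to write the partitioned Parquet files directly to the table location and register the new partitions afterwards, which skips the .hive-staging rename step. This is only a sketch; the S3 path is a placeholder, and the same withColumn/filter/select chain as above would still be applied to results first:

# Sketch: write partitioned Parquet straight to the table location (placeholder
# path), then add the new partitions to the Hive metastore.
(results
    .repartition("inserted_at_date", "brand", "make")
    .write
    .mode("append")
    .partitionBy("inserted_at_date", "brand", "make")
    .parquet("s3a://<bucket>/<table-location>/cars_features"))

sparkSession.sql(f"MSCK REPAIR TABLE {schema}.cars_features")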
3- When I read some tables from Hive on S3 like this (for one day it is 40.4 MB across 2467 objects):
sparkSession \
    .sql(
        f"""select * from {schema}.f_car_info where inserted_at_date = to_date('{date}', "yyyy-MM-dd")""") \
    .select(F.col("id_ad"), F.col("name_make_ad"), F.col("name_make_dict"), F.col("name_model_ad"),
            F.col("name_version_ad"), F.col("name_generation_ad"), F.col("prod_year_ad"), F.col("brand"))
I see this in the log:
[2022-04-16 10:25:38,271] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO InMemoryFileIndex: Listing leaf files and directories in parallel under 352 paths. The first several paths are: s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=ABARTH, s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=ACURA, s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=ALFA ROMEO, s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=ASTON MARTIN, s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=AUDI, s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=AUSTIN, s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=BENTLEY, s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=BMW, s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=BUICK, s3a://*********/local/odyn/cars_pricing_cvt/f_car_info/inserted_at_date=2022-04-16/brand=autovit/name_make_dict=CADILLAC.\n'
[2022-04-16 10:25:38,564] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO SparkContext: Starting job: persist at NativeMethodAccessorImpl.java:0\n'
[2022-04-16 10:25:38,584] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO DAGScheduler: Got job 0 (persist at NativeMethodAccessorImpl.java:0) with 352 output partitions\n'
[2022-04-16 10:25:38,587] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO DAGScheduler: Final stage: ResultStage 0 (persist at NativeMethodAccessorImpl.java:0)\n'
[2022-04-16 10:25:38,587] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO DAGScheduler: Parents of final stage: List()\n'
[2022-04-16 10:25:38,587] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO DAGScheduler: Missing parents: List()\n'
[2022-04-16 10:25:38,593] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at persist at NativeMethodAccessorImpl.java:0), which has no missing parents\n'
[2022-04-16 10:25:38,689] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 88.3 KB, free 7.7 GB)\n'
[2022-04-16 10:25:38,721] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 31.9 KB, free 7.7 GB)\n'
[2022-04-16 10:25:38,724] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 100.126.80.1:34339 (size: 31.9 KB, free: 7.7 GB)\n'
[2022-04-16 10:25:38,727] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184\n'
[2022-04-16 10:25:38,749] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO DAGScheduler: Submitting 352 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at persist at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14))\n'
[2022-04-16 10:25:38,753] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO TaskSchedulerImpl: Adding task set 0.0 with 352 tasks\n'
[2022-04-16 10:25:38,794] {pod_launcher.py:156} INFO - b'22/04/16 10:25:38 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 100.126.159.37, executor 1, partition 0, PROCESS_LOCAL, 7494 bytes)\n'
[2022-04-16 10:25:45,006] {pod_launcher.py:156} INFO - b'22/04/16 10:25:45 INFO TaskSetManager: Finished task 350.0 in stage 0.0 (TID 350) in 107 ms on 100.126.111.179 (executor 5) (352/352)\n'
[2022-04-16 10:25:45,009] {pod_launcher.py:156} INFO - b'22/04/16 10:25:45 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool \n'
[2022-04-16 10:25:45,009] {pod_launcher.py:156} INFO - b'22/04/16 10:25:45 INFO DAGScheduler: ResultStage 0 (persist at NativeMethodAccessorImpl.java:0) finished in 6.371 s\n'
[2022-04-16 10:25:45,015] {pod_launcher.py:156} INFO - b'22/04/16 10:25:45 INFO DAGScheduler: Job 0 finished: persist at NativeMethodAccessorImpl.java:0, took 6.451078 s\n'
[2022-04-16 10:25:45,121] {pod_launcher.py:156} INFO - b'22/04/16 10:25:45 INFO PrunedInMemoryFileIndex: It took 6916 ms to list leaf files for 352 paths.\n'
Yes, 352 tasks! I imagine that when a table has many more files in the future, it will impact the read time. I have checked this:
Why so many tasks in my spark job? Getting 200 Tasks By Default
and used sqlContext.setConf("spark.sql.shuffle.partitions", "15") but saw no change.
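As far as I understand, spark.sql.shuffle.partitions only applies to shuffle stages, so it cannot change this job: the 352 tasks match the 352 partition directories being listed ("Listing leaf files and directories in parallel under 352 paths"), and Spark launches that distributed listing job once the path count exceeds spark.sql.sources.parallelPartitionDiscovery.threshold. A sketch of the knobs I found, with illustrative values rather than recommendations:

# Sketch: settings related to parallel partition-path listing and to how files
# are grouped into scan tasks; spark.sql.shuffle.partitions affects neither.
sparkSession.conf.set("spark.sql.sources.parallelPartitionDiscovery.threshold", "512")  # above 352, so listing stays on the driver
sparkSession.conf.set("spark.sql.files.maxPartitionBytes", str(256 * 1024 * 1024))      # pack more small files into each scan task

df = sparkSession.sql(
    f"""select * from {schema}.f_car_info where inserted_at_date = to_date('{date}', "yyyy-MM-dd")""")
print(df.rdd.getNumPartitions())  # number of scan partitions after grouping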
Could you please give me some ideas on these 3 issues?
Many thanks!!!
Hey, looks like you are working on some cool stuff! I can't comment on #2 and #3, but for #1 I can probably shed some light, even though I haven't really used Spark.
My guess is that, for Spark, the fields specified at runtime override whatever you are trying to do with SparkSession.builder in your code.
Those overriding runtime args can live at either the container image level OR the Kubernetes pod configuration level. Since you did not share that info, it is hard for me to figure out which one is your problem, but my guess is that it is at the pod definition level, because pod settings can override the container image settings.
Kubernetes Pod Definition Level
For example in a pod definition (or in a deployment) check this out:
https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/
Example Kubernetes Pod Definition
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: <container name>
      image: <your image>
      command:
        - '/bin/bash'
        - '-c'
      args:
        # With bash -c the whole spark-submit command has to be one string,
        # and the --conf flags set here are the ones that actually take effect.
        - >-
          spark-submit --master
          "k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_PORT_443_TCP_PORT}"
          --deploy-mode client --name "${POD_NAME}"
          --conf "spark.driver.host=${POD_IP}"
          --conf spark.driver.memory=40g
          --conf spark.driver.memoryOverhead=0.4
          --conf spark.eventLog.dir=s3a://*****-spark/history/ireland
          --conf spark.eventLog.enabled=true
          --conf ...
You could then change the args section of this YAML pod definition (in particular the --conf flags) to be what you want. I hope this helps you out, or at least points you in the right direction!