DataFrame.rdd.map().collect() does not work in PySpark [duplicate] - python-2.7

This question already has answers here:
java.io.IOException: Cannot run program "python" using Spark in Pycharm (Windows)
I am very new to Python and am using Python 2.7.
I am trying to run this simple piece of code.
I create the DataFrame from a CSV file; it has just two columns. I have tried the code snippets below, but every attempt fails:
newDf = fullDf.rdd.map(lambda x: str(x[1])).collect() # FAILS
newDf = fullDf.rdd.map(lambda x: x.split(",")[1]).collect() # FAILS
What is the issue here? The same thing works in Scala Spark.
My Spark version is 2.1.0 and my Python version is 2.7.
I don't understand this error:
18/03/08 17:47:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/03/08 17:47:17 ERROR Executor: Exception in task 1.0 in stage 3.0 (TID 5)
java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:120)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
at java.lang.ProcessImpl.start(ProcessImpl.java:137)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 13 more
18/03/08 17:47:17 ERROR TaskSetManager: Task 1 in stage 3.0 failed 1 times; aborting job
Traceback (most recent call last):
File "C:/Test.py", line 25, in <module>
newDf = fullDf.rdd.map(lambda x: x.split(",")[1]).collect()
File "C:\spark-2.1.0\python\lib\pyspark.zip\pyspark\rdd.py", line 809, in collect
File "C:\spark-2.1.0\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py", line 1133, in __call__
File "C:\spark-2.1.0\python\lib\pyspark.zip\pyspark\sql\utils.py", line 63, in deco
File "C:\spark-2.1.0\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 3.0 failed 1 times, most recent failure: Lost task 1.0 in stage 3.0 (TID 5, localhost, executor driver): java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:120)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessImpl.create(Native Method)
at java.lang.ProcessImpl.<init>(ProcessImpl.java:386)
at java.lang.ProcessImpl.start(ProcessImpl.java:137)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 13 more
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1944)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:935)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:934)
at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:453)
at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot run program "python": CreateProcess error=2, The system cannot find the file specified
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:120)
at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:116)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Caused by: java.io.IOException: CreateProcess error=2, The system cannot

I was using Anaconda Python (2.7), and during installation I did not check the option to add Python to the Windows PATH. That is why Spark could not find python.
To fix this, I used the SETX command to add Python and Conda to the PATH:
SETX PATH "%PATH%;C:\anaconda2\Scripts;C:\anaconda2;C:\spark-2.1.0\python\pyspark"
Reference -> Setting Anaconda Python
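An alternative to editing the machine-wide PATH is to point PySpark at the interpreter explicitly before the SparkSession is created. This is only a minimal sketch: the Anaconda location and the CSV path below are assumptions, and note that a Row is not a string, so the split(",") variant would still fail even once the workers can launch python.
import os

# Assumed Anaconda install location; adjust to the real path.
os.environ["PYSPARK_PYTHON"] = r"C:\anaconda2\python.exe"
os.environ["PYSPARK_DRIVER_PYTHON"] = r"C:\anaconda2\python.exe"

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("path-check").getOrCreate()
fullDf = spark.read.csv("C:/data/input.csv", header=True)  # hypothetical CSV path

# Rows support positional indexing, so take the second column directly.
secondColumn = fullDf.rdd.map(lambda row: str(row[1])).collect()
print(secondColumn[:5])
spark.stop()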

Related

Failed to read part of the files from s3 bucket with Spark

I have Spark 2.4.7 running on GCP Dataproc, and the task is to read some files from AWS S3.
With AWS credentials (access key and secret access key) appended to /etc/profile.d/spark_config.sh, /etc/*bashrc and /usr/lib/spark/conf/spark-env.sh at cluster creation, I can read SOME of the files from my S3 bucket, while reading all of them gives:
Traceback (most recent call last):
File "<stdin>", line 7, in <module>
File "/usr/lib/spark/python/pyspark/sql/readwriter.py", line 274, in json
return self._df(self._jreader.json(self._spark._sc._jvm.PythonUtils.toSeq(path)))
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o202.json.
: java.nio.file.AccessDeniedException: s3a://my_bucket/part-00000-a5adf948-1f65-4068-ae68-76c3708b4cf1-c000.txt.lzo: getFileStatus on s3a://my_bucket/part-00000-a5adf948-1f65-4068-ae68-76c3708b4cf1-c000.txt.lzo: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: Q3TR40YDDCWFKESZ; S3 Extended Request ID: MeZc+rmfDXDL2DEXWS8zpfrn1s3dbthI31pErLpVT14kTGjpaxUYEmBG53D9f1cWwVRVL1oCzeM=), S3 Extended Request ID: MeZc+rmfDXDL2DEXWS8zpfrn1s3dbthI31pErLpVT14kTGjpaxUYEmBG53D9f1cWwVRVL1oCzeM=
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:174)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:117)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:1923)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:1877)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:1812)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1632)
at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:2631)
at org.apache.spark.sql.execution.datasources.DataSource.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:575)
at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:245)
at scala.collection.immutable.List.foreach(List.scala:392)
at scala.collection.TraversableLike.flatMap(TraversableLike.scala:245)
at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:242)
at scala.collection.immutable.List.flatMap(List.scala:355)
at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:559)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:242)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:230)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:411)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
The code I use for reading:
spark.read \
    .option("io.compression.codecs", "com.hadoop.compression.lzo.LzoCodec") \
    .schema(StructType().add("id", StringType())) \
    .json("s3a://my_bucket/*.lzo")
Files in the location are inserted at different times using the same mechanism. Examining one of the "corrupted" files using AWS CLI, I see the error:
aws s3 cp s3://my_bucket/part-00000-499c7394-c5cf-4045-8a54-0e1d5bc87897-c000.txt.lzo - | head
download failed: s3://my_bucket/part-00000-499c7394-c5cf-4045-8a54-0e1d5bc87897-c000.txt.lzo to - An error occurred (403) when calling the HeadObject operation: Forbidden
My question is how to track down further why reads of some of the files fail. Thank you!
UPD:
Somehow the problematic files are unusually small compared to the rest: ~4 KiB in size versus 30-50 MiB for the other items.
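One way to narrow this down is to head every object with the same credentials and compare the metadata of the keys that succeed against the small ones that return 403 (size, server-side encryption, KMS key). A minimal boto3 sketch, with the bucket name as a placeholder:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")  # should resolve the same credential chain the cluster uses

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="my_bucket"):
    for obj in page.get("Contents", []):
        try:
            head = s3.head_object(Bucket="my_bucket", Key=obj["Key"])
            print(obj["Key"], obj["Size"], head.get("ServerSideEncryption"), head.get("SSEKMSKeyId"))
        except ClientError as exc:
            # A 403 here mirrors the Spark failure; compare these keys' metadata
            # from an account that can read them (e.g. the one that wrote them).
            print(obj["Key"], obj["Size"], "HeadObject failed:", exc.response["Error"]["Code"])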

Apache Django Internal Server Error (Ubuntu 18.04)

I have a Django/Wagtail application. I can run it in debug mode and with the Django development server, but when I try to run it on my Ubuntu 18.04 server using Apache2 and mod_wsgi I get "Internal Server Error". Apache log:
[Mon Jan 18 23:29:23.673557 2021] [mpm_event:notice] [pid 92324:tid 140613127453760] AH00491: caught SIGTERM, shutting down
Exception ignored in: <function BaseEventLoop.__del__ at 0x7fe301c1fc10>
Traceback (most recent call last):
File "/usr/lib/python3.8/asyncio/base_events.py", line 654, in __del__
NameError: name 'ResourceWarning' is not defined
Exception ignored in: <function Local.__del__ at 0x7fe301b95040>
Traceback (most recent call last):
File "/home/user/wow/lib/python3.8/site-packages/asgiref/local.py", line 96, in __del__
NameError: name 'TypeError' is not defined
Exception ignored in: <function Local.__del__ at 0x7fe301b95040>
Traceback (most recent call last):
File "/home/user/wow/lib/python3.8/site-packages/asgiref/local.py", line 96, in __del__
NameError: name 'TypeError' is not defined
Exception ignored in: <function Local.__del__ at 0x7fe301b95040>
Traceback (most recent call last):
File "/home/user/wow/lib/python3.8/site-packages/asgiref/local.py", line 96, in __del__
NameError: name 'TypeError' is not defined
[Mon Jan 18 23:29:24.194636 2021] [mpm_event:notice] [pid 92749:tid 139932982541376] AH00489: Apache/2.4.41 (Ubuntu) OpenSSL/1.1.1f mod_wsgi/4.6.8 Python/3.8 configured -- resuming normal operations
[Mon Jan 18 23:29:24.194764 2021] [core:notice] [pid 92749:tid 139932982541376] AH00094: Command line: '/usr/sbin/apache2'
[Mon Jan 18 23:29:31.468972 2021] [wsgi:error] [pid 92750:tid 139932932445952] WSGI without exception
my wsgi file:
import os
import time
import traceback
import signal
import sys
from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "americanss.settings.production")

application = get_wsgi_application()
try:
    application = get_wsgi_application()
    print('WSGI without exception')
except Exception:
    print('handing WSGI exception')
    # Error loading applications
    if 'mod_wsgi' in sys.modules:
        traceback.print_exc()
        os.kill(os.getpid(), signal.SIGINT)
        time.sleep(2.5)
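For comparison, the wsgi module Django generates is normally just the sketch below; the settings module is taken from the snippet above, while the project path is an assumption. Calling get_wsgi_application() once and letting any exception propagate keeps the real startup error visible in the Apache error log instead of swallowing it.
import os
import sys

# Hypothetical project directory (where manage.py lives); adjust to the real layout.
sys.path.insert(0, "/home/user/wow/americanss")

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "americanss.settings.production")

from django.core.wsgi import get_wsgi_application

# Any ImportError/ImproperlyConfigured raised here will show up in the Apache error log.
application = get_wsgi_application()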
From what I have found, these exceptions happen and nobody seems to know why, but most of the time they do not stop the app from running. In my case I get a 500 error, and I cannot tell whether it is related to the traceback, because no one else reports a 500 together with these errors.
apachectl -M:
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 143.110.226.199. Set the 'ServerName' directive globally to suppress this message
Loaded Modules:
core_module (static)
so_module (static)
watchdog_module (static)
http_module (static)
log_config_module (static)
logio_module (static)
version_module (static)
unixd_module (static)
access_compat_module (shared)
alias_module (shared)
auth_basic_module (shared)
authn_core_module (shared)
authn_file_module (shared)
authz_core_module (shared)
authz_host_module (shared)
authz_user_module (shared)
autoindex_module (shared)
deflate_module (shared)
dir_module (shared)
env_module (shared)
filter_module (shared)
mime_module (shared)
mpm_event_module (shared)
negotiation_module (shared)
reqtimeout_module (shared)
rewrite_module (shared)
setenvif_module (shared)
socache_shmcb_module (shared)
ssl_module (shared)
status_module (shared)
wsgi_module (shared)

AWS Glue cryptic: Py4JJavaError: An error occurred while calling o82.parquet

I'm using AWS Glue with a very simple PySpark script to concatenate some smaller Parquet files into larger ones. However, I keep getting the following error, and it is very cryptic. It clearly has to do with writing a file out, but I have no idea whether the cause is corrupt data, resources, or something else. Has anyone had an error like this while using AWS Glue, or Spark in general?
File "/mnt/yarn/usercache/root/appcache/application_1522879912040_0003/container_1522879912040_0003_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 644, in parquet
File "/mnt/yarn/usercache/root/appcache/application_1522879912040_0003/container_1522879912040_0003_01_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/mnt/yarn/usercache/root/appcache/application_1522879912040_0003/container_1522879912040_0003_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/mnt/yarn/usercache/root/appcache/application_1522879912040_0003/container_1522879912040_0003_01_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o82.parquet.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:147)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:101)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:492)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:494)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 63 in stage 3.0 failed 4 times, most recent failure: Lost task 63.3 in stage 3.0 (TID 986, ip-172-31-31-4.ec2.internal, executor 34): ExecutorLostFailure (executor 34 exited caused by one of the running tasks) Reason: Container marked as failed: container_1522879912040_0003_01_003509 on host: ip-172-31-31-4.ec2.internal. Exit status: -100. Diagnostics: Container released on a *lost* node
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1435)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1423)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1422)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1422)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:802)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:802)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1650)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1605)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1594)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:628)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1918)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1931)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1951)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:127)
... 31 more
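For reference, the compaction itself usually reduces to a read, a coalesce (or repartition) to the target number of output files, and a Parquet write. A minimal PySpark sketch follows; the S3 prefixes and the output-file count of 16 are assumptions, not values from the job above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("compact-parquet").getOrCreate()

# Hypothetical input/output prefixes; replace with the real locations.
df = spark.read.parquet("s3://my-bucket/small-files/")

# coalesce() merges partitions without a full shuffle; 16 output files is an assumption.
df.coalesce(16).write.mode("overwrite").parquet("s3://my-bucket/compacted/")

spark.stop()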

java.lang.ClassNotFoundException: scala.Int Error when loading typesafe configuration

I have an Akka project.
This is application-1.conf in the src/resources folder.
akka {
  loglevel = "INFO"
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2552
    }
    log-sent-messages = on
    log-received-messages = on
  }
}
Below is my actor, in package example7_2 under the src/main/scala/example7_2 folder.
package example7_2

import akka.actor.Actor

class SimpleActor extends Actor {
  override def receive: Receive = {
    case msg =>
      println(s"I have been created at ${self.path.address.hostPort} and received message $msg")
  }
}
My main app, HelloAkkaRemoting10, is below.
package example7_2

import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object HelloAkkaRemoting10 extends App {
  val actorSystem = ActorSystem("HelloAkkaRemoting1", ConfigFactory.load("application-1"))
}
When I run the application, I get the following error:
[error] (run-main-0) java.lang.ClassNotFoundException: scala.Int
[error] java.lang.ClassNotFoundException: scala.Int
[error] at sbt.internal.inc.classpath.ClasspathFilter.loadClass(ClassLoaders.scala:74)
[error] at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
[error] at java.lang.Class.forName0(Native Method)
[error] at java.lang.Class.forName(Class.java:348)
[error] at akka.actor.ReflectiveDynamicAccess.$anonfun$getClassFor$1(ReflectiveDynamicAccess.scala:21)
[error] at scala.util.Try$.apply(Try.scala:209)
[error] at akka.actor.ReflectiveDynamicAccess.getClassFor(ReflectiveDynamicAccess.scala:20)
[error] at akka.serialization.Serialization.$anonfun$bindings$3(Serialization.scala:313)
[error] at scala.collection.TraversableLike$WithFilter.$anonfun$map$2(TraversableLike.scala:739)
[error] at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:231)
[error] at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:462)
[error] at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:738)
[error] at akka.serialization.Serialization.<init>(Serialization.scala:311)
[error] at akka.serialization.SerializationExtension$.createExtension(SerializationExtension.scala:15)
[error] at akka.serialization.SerializationExtension$.createExtension(SerializationExtension.scala:12)
[error] at akka.actor.ActorSystemImpl.registerExtension(ActorSystem.scala:880)
[error] at akka.actor.ExtensionId.apply(Extension.scala:77)
[error] at akka.actor.ExtensionId.apply$(Extension.scala:77)
[error] at akka.serialization.SerializationExtension$.apply(SerializationExtension.scala:12)
[error] at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:203)
[error] at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:796)
[error] at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:793)
[error] at akka.actor.ActorSystemImpl._start(ActorSystem.scala:793)
[error] at akka.actor.ActorSystemImpl.start(ActorSystem.scala:809)
[error] at akka.actor.ActorSystem$.apply(ActorSystem.scala:244)
[error] at akka.actor.ActorSystem$.apply(ActorSystem.scala:287)
[error] at akka.actor.ActorSystem$.apply(ActorSystem.scala:262)
[error] at example7_2.HelloAkkaRemoting10$.delayedEndpoint$example7_2$HelloAkkaRemoting10$1(HelloAkkaRemoting10.scala:7)
[error] at example7_2.HelloAkkaRemoting10$delayedInit$body.apply(HelloAkkaRemoting10.scala:6)
[error] at scala.Function0.apply$mcV$sp(Function0.scala:34)
[error] at scala.Function0.apply$mcV$sp$(Function0.scala:34)
[error] at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
[error] at scala.App.$anonfun$main$1$adapted(App.scala:76)
[error] at scala.collection.immutable.List.foreach(List.scala:389)
[error] at scala.App.main(App.scala:76)
[error] at scala.App.main$(App.scala:74)
[error] at example7_2.HelloAkkaRemoting10$.main(HelloAkkaRemoting10.scala:6)
[error] at example7_2.HelloAkkaRemoting10.main(HelloAkkaRemoting10.scala)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[error] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[error] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[error] at java.lang.reflect.Method.invoke(Method.java:498)
[error] at sbt.Run.invokeMain(Run.scala:89)
[error] at sbt.Run.run0(Run.scala:83)
[error] at sbt.Run.execute$1(Run.scala:61)
[error] at sbt.Run.$anonfun$run$4(Run.scala:73)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
[error] at sbt.util.InterfaceUtil$$anon$1.get(InterfaceUtil.scala:10)
[error] at sbt.TrapExit$App.run(TrapExit.scala:252)
[error] at java.lang.Thread.run(Thread.java:748)
[error] java.lang.RuntimeException: Nonzero exit code: 1
[error] at sbt.Run$.executeTrapExit(Run.scala:120)
[error] at sbt.Run.run(Run.scala:73)
[error] at sbt.Defaults$.$anonfun$bgRunMainTask$6(Defaults.scala:1130)
[error] at sbt.Defaults$.$anonfun$bgRunMainTask$6$adapted(Defaults.scala:1125)
[error] at sbt.internal.BackgroundThreadPool.$anonfun$run$1(DefaultBackgroundJobService.scala:359)
[error] at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
[error] at scala.util.Try$.apply(Try.scala:209)
[error] at sbt.internal.BackgroundThreadPool$BackgroundRunnable.run(DefaultBackgroundJobService.scala:282)
[error] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error] at java.lang.Thread.run(Thread.java:748)
[error] (chapter7/compile:runMain) Nonzero exit code: 1
[error] Total time: 3 s, completed Oct 18, 2017 11:11:52 AM
Other miscellaneous details -
sbt version - 1.0.2
scala version - 2.12.3
akka version - 2.5.4
It looks to me like the ConfigFactory load is throwing the error, but I am not sure of the exact root cause. Please let me know if I am missing any configuration.
This appears to be a bug in sbt that has nothing to do with Typesafe Config and only slightly to do with Akka; I have filed it as https://github.com/sbt/sbt/issues/3736

Problems initialising Django 1.8 app using apache 2.4 mod_passenger 5.0 after centos 7 yum update

I have been using Apache/mod_passenger on CentOS 7 for a year now on my home server to run a basic Django 1.8 app.
No issues; I install updates with yum, etc.
Until today, when I noticed that my application's website was throwing an error. In the Apache log I can see this:
[ 2016-02-21 19:45:38.8531 15680/7f470c08e880 age/Cor/CoreMain.cpp:707 ]: Passenger core online, PID 15680
[ 2016-02-21 19:45:38.8584 15685/7f6eda27f880 age/Ust/UstRouterMain.cpp:504 ]: Starting Passenger UstRouter...
[ 2016-02-21 19:45:38.8591 15685/7f6eda27f880 age/Ust/UstRouterMain.cpp:317 ]: Passenger UstRouter online, PID 15685
[Sun Feb 21 19:45:38.865561 2016] [mpm_prefork:notice] [pid 15644] AH00163: Apache/2.4.6 (CentOS) OpenSSL/1.0.1e-fips mod_fcgid/2.3.9 Phusion_Passenger/5.0.25 mod_wsgi/3.4 Python/2.7.5 configured -- resuming normal operations
[Sun Feb 21 19:45:38.865602 2016] [core:notice] [pid 15644] AH00094: Command line: '/usr/sbin/httpd -D FOREGROUND'
[Sun Feb 21 19:46:58.467864 2016] [autoindex:error] [pid 15701] [client 86.121.21.124:64572] AH01276: Cannot serve directory /var/www/myProjects/testuser66/static/: No matching DirectoryIndex (index.html) found, and server-generated directory index forbidden by Options directive
App 15712 stdout:
App 15712 stderr: error: cannot open Packages index using db5 - Permission denied (13)
App 15712 stderr: error: cannot open Packages database in /var/lib/rpm
App 15712 stderr: Traceback (most recent call last):
App 15712 stderr: File "/usr/share/passenger/helper-scripts/wsgi-loader.py", line 325, in <module>
App 15712 stderr: app_module = load_app()
App 15712 stderr: File "/usr/share/passenger/helper-scripts/wsgi-loader.py", line 62, in load_app
App 15712 stderr:
App 15712 stderr: return imp.load_source('passenger_wsgi', startup_file)
App 15712 stderr: File "/var/www/myProjects/testuser66/passenger_wsgi.py", line 8, in <module>
App 15712 stderr: if sys.executable != INTERP: os.execl(INTERP, INTERP, *sys.argv)
App 15712 stderr: File "/usr/lib64/python2.7/os.py", line 312, in execl
App 15712 stderr: execv(file, args)
App 15712 stderr: OSError: [Errno 13] Permission denied
App 15712 stderr:
[ 2016-02-21 19:47:02.9959 15680/7f470bef7700 age/Cor/App/Implementation.cpp:304 ]: Could not spawn process for application /var/www/myProjects/testuser66: An error occurred while starting the web application. It exited before signalling successful startup back to Phusion Passenger.
Error ID: d7ea712a
Error details saved to: /tmp/passenger-error-kqxLLO.html
Message from application: An error occurred while starting the web application. It exited before signalling successful startup back to Phusion Passenger. Please read this article for more information about this problem.
Raw process output:
error: cannot open Packages index using db5 - Permission denied (13)
error: cannot open Packages database in /var/lib/rpm
Traceback (most recent call last):
File "/usr/share/passenger/helper-scripts/wsgi-loader.py", line 325, in <module>
app_module = load_app()
File "/usr/share/passenger/helper-scripts/wsgi-loader.py", line 62, in load_app
return imp.load_source('passenger_wsgi', startup_file)
File "/var/www/myProjects/testuser66/passenger_wsgi.py", line 8, in <module>
if sys.executable != INTERP: os.execl(INTERP, INTERP, *sys.argv)
File "/usr/lib64/python2.7/os.py", line 312, in execl
execv(file, args)
OSError: [Errno 13] Permission denied
[ 2016-02-21 19:47:02.9999 15680/7f4706242700 age/Cor/Con/CheckoutSession.cpp:277 ]: [Client 1-1] Cannot checkout session because a spawning error occurred. The identifier of the error is d7ea712a. Please see earlier logs for details about the error.
I used to have some trouble with Passenger after running yum update, but usually because of SELinux. I have checked permissions to the best of my knowledge, and that does not seem to be the case here.
Also, the "cannot open /var/lib/rpm" message is confusing to me. What does it have to do with Apache/Passenger? I checked the rpm database and deleted the yum cache; there are no signs of a corrupted rpm db.
Any help would be appreciated!
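For context, line 8 of passenger_wsgi.py in the traceback is the usual Passenger re-exec idiom sketched below (the virtualenv path and the final import are assumptions, not the actual file). The OSError: [Errno 13] is raised by os.execl, so it is worth checking that the Apache/Passenger user can still execute that interpreter and traverse every directory leading to it, including any SELinux context changes introduced by the update.
import os
import sys

# Hypothetical virtualenv interpreter for the project; adjust to the real path.
INTERP = "/var/www/myProjects/testuser66/venv/bin/python"

# Re-exec the loader under the project's interpreter if Passenger started it with
# the system one; os.execl raises OSError (errno 13) when INTERP is not executable,
# or a directory on the way to it is not traversable, for the current user.
if sys.executable != INTERP:
    os.execl(INTERP, INTERP, *sys.argv)

from testuser66.wsgi import application  # hypothetical Django project module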