AWS SageMaker notebook not working: how can I solve the issue? - amazon-web-services

The code failed because of a fatal error:
Error sending http request and maximum retry encountered..
Some things to try:
a) Make sure Spark has enough available resources for Jupyter to create a Spark context.
b) Contact your Jupyter administrator to make sure the Spark magics library is configured correctly.
c) Restart the kernel.
Note: There are no logs in CloudWatch that would help me figure out the issue.

Are you looking to run Spark queries? If not, you can use the Python kernel, or any kernel other than Sparkmagic, and proceed with your work.
If you do need Spark, see this blog and the documentation on using Spark with notebook instances.
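Sparkmagic kernels talk to Spark through an Apache Livy REST endpoint, so "Error sending http request" typically suggests that endpoint could not be reached. A minimal sketch for checking that from the notebook's Python kernel, assuming a Livy URL such as http://<emr-master-dns>:8998 (8998 is Livy's default port; the host is a placeholder for your own setup). It only issues GET /sessions and reports whether a 200 came back:

```python
import urllib.error
import urllib.request


def livy_reachable(livy_url: str, timeout: float = 5.0) -> bool:
    """Return True if a Livy server answers GET /sessions at livy_url.

    livy_url is an assumption, e.g. "http://<emr-master-dns>:8998";
    replace it with the endpoint your Sparkmagic config points at.
    """
    try:
        with urllib.request.urlopen(livy_url.rstrip("/") + "/sessions",
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error status,
        # or a malformed URL: in all cases Livy is not usable from here.
        return False
```

A False result points at networking (security groups, or the Livy URL in Sparkmagic's configuration, typically ~/.sparkmagic/config.json) rather than at the notebook code.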

Related

Dataproc custom image: Cannot complete creation

For a project, I have to create a Dataproc cluster that uses one of the outdated versions (for example, 1.3.94-debian10) that contain the Apache Log4j 2 vulnerability. The goal is to trigger the related alert (DATAPROC_IMAGE_OUTDATED) in order to check how SCC (Security Command Center) works (it is just for a test environment).
I tried to run the command gcloud dataproc clusters create dataproc-cluster --region=us-east1 --image-version=1.3.94-debian10 but got the following message: ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Selected software image version 1.3.94-debian10 is vulnerable to remote code execution due to a log4j vulnerability (CVE-2021-44228) and cannot be used to create new clusters. Please upgrade to image versions >=1.3.95, >=1.4.77, >=1.5.53, or >=2.0.27. For more information, see https://cloud.google.com/dataproc/docs/guides/recreate-cluster. This makes sense, since it protects the cluster.
I did some research and discovered that I will have to create a custom image with that version and create the cluster from it. The thing is, I have tried to read the documentation and find a tutorial, but I still can't understand how to get started, or how to run the generate_custom_image.py file, for example, since I am not comfortable with Cloud Shell (I prefer the console).
Can someone help? Thank you
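The custom-images tool is run from Cloud Shell (or any machine with gcloud). As a sketch, assuming the GoogleCloudDataproc/custom-images repository has been cloned and that its README's flag names (--image-name, --dataproc-version, --customization-script, --zone, --gcs-bucket) still apply, the Python below only builds and prints the command; it does not run it. Every value passed in is a hypothetical placeholder, and the exact form --gcs-bucket expects should be checked against the README. I cannot confirm that Dataproc will accept an image built on a blocked version such as 1.3.94-debian10, so treat this as a starting point for the test rather than a known-good recipe.

```python
import shlex


def custom_image_command(image_name: str, dataproc_version: str,
                         customization_script: str, zone: str,
                         gcs_bucket: str) -> list[str]:
    """Build the argv for custom-images' generate_custom_image.py.

    Flag names are taken from the custom-images README (an assumption
    that they are unchanged); all argument values are placeholders.
    """
    return [
        "python3", "generate_custom_image.py",
        f"--image-name={image_name}",
        f"--dataproc-version={dataproc_version}",
        f"--customization-script={customization_script}",
        f"--zone={zone}",
        f"--gcs-bucket={gcs_bucket}",
    ]


cmd = custom_image_command(
    image_name="outdated-dataproc-test",   # hypothetical image name
    dataproc_version="1.3.94-debian10",
    customization_script="noop.sh",        # hypothetical, may be an empty script
    zone="us-east1-b",                     # hypothetical zone in us-east1
    gcs_bucket="gs://my-log-bucket",       # hypothetical bucket for build logs
)
# Paste the printed line into Cloud Shell from inside the cloned repo.
print(shlex.join(cmd))
```

If the build succeeds, the resulting image would be passed to gcloud dataproc clusters create through its --image flag instead of --image-version.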

AWS Glue Development Endpoint Not Working Properly

I am trying to use a development endpoint to interactively run and edit ETL scripts, but there seem to be some issues with the development endpoint right after creating it: I am getting errors in the Scala/Python REPL, and I am also unable to set up an SSH tunnel to the remote interpreter.
Let me explain exactly what I did. I created a development endpoint in the AWS console with all the default configurations. While creating it, I only provided three things: 'Development endpoint name', 'IAM Role', and my 'pub ssh key'. This is how it looks after creation.
Right after creating the endpoint, I connect to the Spark/Python REPL. I can connect successfully, but within a couple of minutes of connecting, the REPL starts throwing errors before I have written a single line of code. This happens in every REPL on the development endpoint.
Also, when I try to set up an SSH tunnel to the remote interpreter to connect my local Zeppelin notebook, it throws "bind: Cannot assign requested address".
A couple of things do work, though:
I am able to SSH into the endpoint.
I created a SageMaker notebook in AWS Glue that is attached to this development endpoint, and this notebook seems to work fine, although it adds cost and I don't want to keep using it.
Can anyone help me figure out what I am doing wrong? Am I missing any important steps that need to be done on the machine right after creating the development endpoint?
Thanks in Advance!
I'm not sure about this error, but if you are working with smaller datasets, you may want to use a local Docker setup instead, as it adds no extra cost and lets you continue your development.
You can refer to this blog post on how to set it up:
https://towardsdatascience.com/develop-glue-jobs-locally-using-docker-containers-bffc9d95bd1

AWS Glue SageMaker Notebook "No module named awsglue.transforms"

I've created a SageMaker notebook to develop AWS Glue jobs, but when running through the provided example ("Joining, Filtering, and Loading Relational Data with AWS Glue") I get the following error:
Does anyone know what I've set up wrong (or haven't set up) that causes the import to fail?
You'll need to download the library files from here for Glue 0.9 or here for Glue 1.0 (check your Glue jobs to see which version they use).
Put the zip in S3 and reference it in the "Python library path" on your Dev Endpoint.
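The two steps above (upload the zip to S3, then set the dev endpoint's Python library path) can also be scripted. A sketch assuming boto3 and configured AWS credentials; the endpoint name, local file name, and S3 URI are hypothetical. In the Glue API, the console's "Python library path" corresponds to ExtraPythonLibsS3Path inside CustomLibraries for update_dev_endpoint:

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split "s3://bucket/key" into ("bucket", "key")."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an s3://bucket/key URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def attach_glue_libs(endpoint_name: str, local_zip: str, s3_zip_uri: str) -> None:
    """Upload the Glue library zip and point the dev endpoint's
    Python library path (ExtraPythonLibsS3Path) at it.

    Needs boto3 and AWS credentials; all arguments are placeholders.
    """
    import boto3  # imported lazily so split_s3_uri works without boto3

    bucket, key = split_s3_uri(s3_zip_uri)
    boto3.client("s3").upload_file(local_zip, bucket, key)
    boto3.client("glue").update_dev_endpoint(
        EndpointName=endpoint_name,
        CustomLibraries={"ExtraPythonLibsS3Path": s3_zip_uri},
        UpdateEtlLibraries=True,  # ask Glue to reload the custom libraries
    )
```

For example, attach_glue_libs("my-dev-endpoint", "glue-libs.zip", "s3://my-bucket/libs/glue-libs.zip") with your own names; the kernel should then be restarted before retrying the import.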
I had the same issue and the selected solution did not work for me.
I did manage to get it working by using CloudFormation (AWS::Glue::DevEndpoint).
Through trial and error I noticed that you can't specify both NumberOfNodes and NumberOfWorkers at the same time. You have to specify one or the other.
Using NumberOfNodes: 5 resulted in exactly the same error as in the question, but using NumberOfWorkers instead worked perfectly.
So to conclude, to fix this error you can use CloudFormation and make sure to use the NumberOfWorkers property.
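A minimal sketch of such a template, generated from Python so the "one or the other" rule from this answer is checked before deploying. The role ARN is hypothetical, and pairing NumberOfWorkers with WorkerType "G.1X" is an assumption; check the AWS::Glue::DevEndpoint reference for the values your account and Glue version support.

```python
import json
from typing import Optional


def dev_endpoint_template(role_arn: str,
                          number_of_workers: Optional[int] = None,
                          number_of_nodes: Optional[int] = None,
                          worker_type: str = "G.1X") -> dict:
    """Return a minimal CloudFormation template with one AWS::Glue::DevEndpoint."""
    # As reported in this answer, NumberOfNodes and NumberOfWorkers
    # cannot be combined, so require exactly one of them.
    if (number_of_workers is None) == (number_of_nodes is None):
        raise ValueError("set exactly one of number_of_workers or number_of_nodes")
    props: dict = {"RoleArn": role_arn}
    if number_of_workers is not None:
        # Assumption: NumberOfWorkers is used together with WorkerType.
        props["WorkerType"] = worker_type
        props["NumberOfWorkers"] = number_of_workers
    else:
        props["NumberOfNodes"] = number_of_nodes
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "GlueDevEndpoint": {"Type": "AWS::Glue::DevEndpoint", "Properties": props}
        },
    }


template = dev_endpoint_template(
    "arn:aws:iam::123456789012:role/MyGlueDevRole",  # hypothetical role ARN
    number_of_workers=5,
)
print(json.dumps(template, indent=2))
```

The printed JSON can be saved to a file and deployed as a CloudFormation stack from the console.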
Hm... this approach doesn't work for me.
I've put the zip in S3 and referenced it in "Python library path", and it still doesn't work.
Add the AWSGlueServiceNotebookRole policy to your dev endpoint's IAM role, restart your kernel, and rerun.

(GCP) How to keep a Jupyter session running after disconnecting from it on my local laptop?

I keep a Jupyter server running on a GCP VM instance using tmux.
The problem is that I want my model to keep fitting after I leave the Jupyter server session on my local laptop
(e.g. I turn off my laptop, but the Jupyter session stays alive and keeps fitting the model, and I can reconnect to that session later to check its status).
The only approach I came up with is to put the code in a .py file and run python3 fitting.py, but I want to run and fit the model in a Jupyter notebook so I can monitor it without adding extra code.
If there is a way to do this, please let me know.
Thanks!
Have you considered using the Fairing library? It comes pre-installed with GCP's new AI Platform Notebooks.
This library allows you to pack up your notebook and send it off for remote execution. A new notebook with the executed content will be saved to your Google Cloud Storage bucket. No active internet connection is required once you kick off the notebook run.
You can learn how to use it by creating a new GCP AI Platform Notebook and looking at the tutorials folder inside it. You can also find additional tutorials for Fairing here.

Spark step on EMR just hangs as "Running" after done writing to S3

I'm running a PySpark 2 job on EMR 5.1.0 as a step. Even after the script is done, with a _SUCCESS file written to S3 and the Spark UI showing the job as completed, EMR still shows the step as "Running". I've waited for over an hour to see if Spark was just trying to clean itself up, but the step never shows as "Completed". The last thing written in the logs is:
INFO MultipartUploadOutputStream: close closed:false s3://mybucket/some/path/_SUCCESS
INFO DefaultWriterContainer: Job job_201611181653_0000 committed.
INFO ContextCleaner: Cleaned accumulator 0
I didn't have this problem with Spark 1.6. I've tried a bunch of different hadoop-aws and aws-java-sdk jars to no avail.
I'm using the default Spark 2.0 configuration, so I don't think anything else (such as metadata) is being written. Also, the size of the data doesn't seem to affect this problem.
If you aren't already doing so, you should stop your Spark context:
sc.stop()
Also, if you are watching the Spark web UI in a browser, you should close it, as it sometimes keeps the Spark context alive. I recall seeing this on the Spark dev mailing list, but can't find the JIRA issue for it.
We experienced this problem and resolved it by running the job in cluster deploy mode using the following spark-submit option:
spark-submit --deploy-mode cluster
It had something to do with client mode: the driver runs on the master instance, and the spark-submit process was getting stuck even though the Spark context had closed. This caused the instance controller to keep polling the process, because it never received the completion signal. Running the driver on one of the other cluster nodes using the option above doesn't seem to have this problem. Hope this helps.
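The cluster-mode fix also applies when steps are added programmatically. A minimal sketch with boto3's EMR client, assuming the script lives in S3; the step name, S3 URI, and cluster ID are hypothetical. It builds a step that runs spark-submit through command-runner.jar with --deploy-mode cluster, and only contacts AWS when submit_step is called:

```python
def cluster_mode_step(name: str, script_s3_uri: str, *script_args: str) -> dict:
    """Build an EMR step definition that runs spark-submit in cluster deploy mode."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar executes the given command on the master node;
            # --deploy-mode cluster then places the driver in a YARN container.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_s3_uri, *script_args],
        },
    }


def submit_step(cluster_id: str, step: dict) -> str:
    """Add the step to a running cluster and return its step ID (needs boto3)."""
    import boto3  # imported lazily so cluster_mode_step works without boto3

    resp = boto3.client("emr").add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return resp["StepIds"][0]
```

For example, submit_step("j-XXXXXXXXXXXXX", cluster_mode_step("my-job", "s3://my-bucket/job.py")) with your own cluster ID and script location.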
I experienced the same issue with Spark on AWS EMR and solved it by calling sys.exit(0) at the end of my Python script. The same worked for a Scala program with System.exit(0).
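The two suggestions in these answers (stop the context, then exit explicitly) can be combined at the end of the driver script. A sketch assuming the job is wrapped in a function; run_job and my_job are hypothetical names, and the helper accepts anything with a stop() method (such as a SparkSession or SparkContext) so it can be exercised without a cluster:

```python
import sys
from typing import Callable, Protocol


class Stoppable(Protocol):
    """Anything with a stop() method, e.g. a SparkSession or SparkContext."""

    def stop(self) -> None: ...


def run_job(ctx: Stoppable, job: Callable[[Stoppable], None]) -> int:
    """Run job(ctx), always stop ctx, and return a process exit code."""
    try:
        job(ctx)
        return 0
    except Exception as exc:
        print(f"job failed: {exc}", file=sys.stderr)
        return 1
    finally:
        # Stop the context even when the job raised.
        ctx.stop()
```

In the actual PySpark script, the last line would be something like sys.exit(run_job(SparkSession.builder.getOrCreate(), my_job)). That this clears a hanging EMR step relies on the reports in these answers rather than on documented EMR behavior.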