AWS Glue Development Endpoint Not Working Properly

I am trying to use a development endpoint to interactively run and edit ETL scripts, but there seem to be issues with the development endpoint right after creating it: I am getting errors in the Scala/Python REPL and I am also unable to open an SSH tunnel to the remote interpreter.
Let me explain exactly what I did: I created a development endpoint in the AWS console with all the default configurations. While creating it I only provided three things, the 'Development endpoint name', an 'IAM Role', and my public SSH key.
Right after creating the endpoint I connect to the Spark/Python REPL. I am able to connect successfully, but within a couple of minutes the REPL starts throwing errors without my having written a single line of code. This happens in every REPL available on the development endpoint.
Also, when I try to open an SSH tunnel to the remote interpreter so that I can connect my local Zeppelin notebook, it fails with "bind: Cannot assign requested address".
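For reference, the tunnel command I'm running is essentially the one from the AWS Glue documentation for connecting a local Zeppelin notebook (key path and endpoint address are placeholders):
ssh -i <private-key.pem> -NTL 9007:169.254.76.1:9007 glue@<dev-endpoint-public-dns>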
A couple of things are working, though:
I am able to SSH into the endpoint.
I created a SageMaker notebook in AWS Glue attached to this development endpoint, and it seems to be working fine, although it adds extra cost and I don't want to keep using it.
Can anyone please help me figure out what I am doing wrong? Am I missing any important steps that need to be done on the machine right after creating the development endpoint?
Thanks in Advance!

I'm not sure about this particular error, but if you are working with smaller datasets you may want to use the Docker-based local development setup instead, as it adds no extra cost and you can carry on with your development.
You can refer to this blog post on how to set it up:
https://towardsdatascience.com/develop-glue-jobs-locally-using-docker-containers-bffc9d95bd1
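If you go that route, a minimal sketch of the setup the blog describes might look like this (the image tag follows the blog and newer tags may exist; the credential mount point is an assumption):
# pull the Glue 1.0 local-development image
docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01
# run it with AWS credentials mounted read-only and the Jupyter/Spark UI ports exposed,
# then start the REPL or notebook server inside the container as the blog shows
docker run -it -p 8888:8888 -p 4040:4040 \
    -v ~/.aws:/root/.aws:ro \
    --name glue_local \
    amazon/aws-glue-libs:glue_libs_1.0.0_image_01 bash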

Related

lyft/Cartography on EC2, is it possible?

I've been trying to run cartography on my EC2 account for the last 2 days. I have no previous knowledge of Neo4j, but following their installation process doesn't work.
First I tried to install Neo4j using the RPM instructions from the Neo4j website, with no success accessing Neo4j on port 7474 (error: Connection refused).
Then I gave up trying to make Neo4j work on a plain EC2 installation and used their Marketplace AMI. It works like a charm, but I don't know what is installed on that AMI. So I decided to install and run cartography on this instance.
My first problem was installing Python, pip, and Java correctly. After getting everything working, I discovered the Neo4j bolt port used my public IP, not localhost. After that I was finally able to execute cartography, but now it's giving me the following error:
neobolt.exceptions.ClientError: Supplied bookmark [FB:kcwQ40omSYgvSzKPpCQTXDOcCBSQ] does not conform to pattern neo4j:bookmark:v1:tx
Has anyone actually been able to use this? Every step along the way requires some specific libraries.
Thanks!
I maintain cartography and hope I can help (wish I had seen this earlier though, haha).
A few things to check:
Are you using Neo4j 4.x? cartography currently only supports 3.5.x (see the Docker sketch after this list).
To run against one AWS account:
AWS_PROFILE=profilename cartography --neo4j-uri <uri for your neo4j instance; usually bolt://localhost:7687>
To run against multiple accounts, set up an AWS config file and run:
AWS_CONFIG_FILE=/path/to/your/aws/config cartography --neo4j-uri <uri for your neo4j instance; usually bolt://localhost:7687> --aws-sync-all-profiles
(see https://github.com/lyft/cartography/blob/master/docs/setup/install.md#cartography-installation)
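If the Neo4j version turns out to be the issue, one quick way to get a supported 3.5.x instance is the official Docker image (a sketch, assuming Docker is available on the host; the password is a placeholder):
# run Neo4j 3.5 so cartography can reach it at bolt://localhost:7687
docker run -d --name neo4j \
    -p 7474:7474 -p 7687:7687 \
    -e NEO4J_AUTH=neo4j/<your-password> \
    neo4j:3.5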
If you have more questions, feel free to open a GitHub issue or start a thread on our Slack (we can also talk there about more specialized setups, e.g. if you're using containers or anything like that).

AWS is rejecting my credentials when I try to push to S3 from an Ubuntu server instead of my local machine

I have a client (as in a customer, not a program!) who runs a site powered by S3 and Cloudfront. I developed a data-driven Web interactive (powered by JS, of course) that I regularly update for them since the data updates frequently. I use the public/private key pair they provided me since I do not, quite reasonably, have access to their entire AWS account.
The client would like the ability to update the interactive any time, even if I'm not immediately on call. But it's unrealistic for me to ask them to replicate my local environment through a Docker container or what-have-you -- I'm not even sure they have admin access to their Macs. I do not have a direct contact to their DevOps team that manages the infrastructure. (It's a big company, and my points of contact are not developers.)
My plan was to spin up a small, cheap EC2 instance on my AWS account and install the necessary languages and libraries to replicate what I do from my machine: fetch the new data and process it (in R -- the raw files are very large so the interactive cannot fetch them directly), recompile the interactive (in Node.js, using Webpack), and push it live or to a staging server (using awscli, Amazon's Python client). I already have simple shell scripts to do all of this in one fell swoop, and I've confirmed that the client is able to SSH into my small instance using a .pem file I provided them and run one or two shell scripts.
The problem: The client's AWS account is rejecting the credentials they provided me when an upload request comes from my tiny Ubuntu 18.04 server instead of my local machine. There are many competing instructions out there for how to install awscli on Ubuntu -- aptitude, pip, direct download from Amazon, etc. I've tried them all, and am currently using this version on the server:
$ aws --version
aws-cli/1.18.104 Python/3.6.9 Linux/5.3.0-1030-aws botocore/1.17.27
I ran aws configure on the server, made absolutely certain I was using the correct key pair, and set it up fine (or so I thought). But when I test it with an innocuous command, I'm told the key is invalid:
$ aws s3 ls s3://[client's s3 instance]
An error occurred (InvalidAccessKeyId) when calling the ListObjectsV2 operation:
The AWS Access Key Id you provided does not exist in our records.
When I run this exact same command on my local machine, a MacBook, using the same credentials, it works just fine. I'm 100% certain I'm using the same key pair -- I checked manually.
So I'm perplexed as to why AWS rejects a request from an Ubuntu server but accepts one from a local machine. I've been through every StackOverflow post and message board I can find, but the diagnosis is usually that the key pair is wrong, which it isn't.
The only thing I can think of is that I'm using a different version of awscli locally. On my local machine, I'm still using an older version:
$ aws --version
aws-cli/1.16.10 Python/2.7.17 Darwin/19.6.0 botocore/1.12.0
I'm reluctant to update the CLI on my machine in case it breaks there too, but I can test in a VM. More to the point, I'm wondering if, under the hood, there's a more fundamental difference between running an aws command from a server versus a laptop. It's a tricky thing to Google for!
After all that, and checking 100 times, there was one character missing in the credentials on the server. Order is restored in the universe. Thanks!
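For anyone else hitting this, two standard awscli commands (not from the original post) make it easy to confirm exactly which credentials the server is using before comparing them character by character:
aws configure list          # shows the active access key (partially masked) and where it was loaded from
aws sts get-caller-identity # confirms the key is accepted and shows the account and ARN it maps to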

Error when trying to connect to a Cloud SQL instance using the Cloud Shell

I've had a Cloud SQL instance for about a year now.
I always accessed it the same way:
I would go to my project on the Cloud Console.
Click on the Cloud Shell icon at the top right (a small right-pointing arrow).
A black shell screen would pop up where I would type
gcloud sql connect <my instance> --user=root.
Enter my password.
Now, all of a sudden, I am getting an error message saying:
There was no instance found at projects//instances/ or you are not authorized to connect to it.
I am the owner of the project, and also have Admin rights to the Cloud SQL instance. The project and instance are still there, and my app that accesses the data stored in the instances' database is working fine - therefore I know the database is also present, otherwise my app wouldn't work.
I didn't touch or change anything in the Cloud SQL instance. Suddenly, I simply can't access my database using the exact same procedure I have been using almost every day over the past year now.
I am able to access the database using a local Python script on my laptop and the Cloud SQL Proxy, but I would like to access it from the Cloud Shell again.
Any ideas on what could the problem be?
gcloud components update - update all of your installed components to the latest version
gcloud init - reinitialize gcloud shell. It performs the following setup steps:
Authorizes gcloud and other SDK tools to access Google Cloud Platform using your user account credentials, or from an account of your choosing whose credentials are already available.
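Given the empty segments in the error path (projects//instances/), it may also be worth re-selecting the project explicitly before connecting; a sketch with placeholder names:
gcloud config set project <your-project-id>   # make sure the shell has an active project again
gcloud sql instances list                     # confirm the instance is visible under that project
gcloud sql connect <my instance> --user=root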
It seems like there was a problem with the GCP Cloud Shell (even though there was no mention of it on the GCP error tracking page). When I logged back in today and followed the same process as above, everything worked well.
It looks like GCP Cloud Shell can occasionally go rogue and start producing errors. Word of advice: don't panic when this happens (like I did) and start resetting, rebooting, and messing things up. Just wait a day and check back again.

Ipython notebook remote server on AWS

I'm running a remote IPython notebook server on an EC2 instance on AWS. The instance is running Ubuntu.
Followed this tutorial to set up, and everything seems to work - I can access the notebook via https with a password and run code.
However, I can't seem to save changes to the notebook - it says "saving notebook" and then nothing happens (i.e., it still shows 'unsaved changes' at the top).
Any ideas would be greatly appreciated.
Edit: It's not a permissions problem, since running in sudo doesn't help.
When creating a new notebook on the remote server, I am able to save. The problem only occurs for notebooks pulled from my git repository. Also, when I open a problematic notebook and delete all cells until it's absolutely empty, I can sometimes (!) save the empty notebook, and sometimes (!!) I still can't.
I've encountered an issue where notebooks wouldn't save on an nbserver on an AWS EC2 instance I set up in a similar manner via a different tutorial. It turned out I had to refresh and log in again with the password, because my browser would automatically log out after a certain period. It might help to close the nbserver page, go back to it, and see if it asks you to log in again.
Here are a few other things you can try:
copy a problematic notebook directly onto the server with scp (see the sketch below) and try to open and save it, as opposed to pulling it through the repo, to see if anything changes
check if the hanging "saving notebook" message appears for notebooks in certain directories
check the ipython console messages when you save a problematic notebook and see if anything there can help you pinpoint the issue
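For the first suggestion, copying a notebook straight onto the server might look like this (key path, user, host, and paths are placeholders):
scp -i ~/.ssh/my-key.pem problematic_notebook.ipynb ubuntu@<ec2-public-dns>:~/notebooks/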

Jenkins/Hudson call script on AWS EC2 Instance

Today, at my work, when we need to deploy a Play! Framework (1.2.7) app on our EC2 instance (AWS), we have to access the server and call a script that downloads all the source code, precompiles it, starts Play! Framework, and restarts nginx (everything in a single .sh script).
This process works fine today, but in emergencies it is very slow, because we need to access the EC2 instance (with a key pair) and, depending on the location, the internet connection is slow.
I want to know if it is possible to use Hudson/Jenkins to just call this script on my EC2 instances. I know that Hudson/Jenkins has a lot of functionality (test, build, etc.), but for now I just want to deploy my app (call the script on the EC2 instance).
If anyone knows another tool that helps, I will be very grateful.
Thanks.
You can use/build SBT plugins to run the remote command, or simply exec the local ssh with the command, but that can get quite hard to debug.
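As a rough sketch of the ssh approach, a Jenkins job's "Execute shell" build step could simply run the existing deploy script remotely (key path, user, host, and script path are placeholders):
ssh -i /var/lib/jenkins/.ssh/deploy-key.pem -o StrictHostKeyChecking=no ubuntu@<instance-public-dns> 'bash /home/ubuntu/deploy.sh'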
If you can build your instance to bootstrap itself from scratch, for example with a UserData script, then all you need to do is terminate your old instance and start a new one, and that is much easier to automate.
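A minimal UserData sketch along those lines, based only on what the question says the script does (repo URL, paths, and exact Play! commands are placeholders):
#!/bin/bash
# bootstrap sketch: fetch sources, resolve dependencies, precompile, start Play!, restart nginx
git clone https://example.com/my-app.git /opt/my-app
cd /opt/my-app
play deps
play precompile
play start --%prod
service nginx restart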