I have annotated dataset. I am trying to use this dataset at Linux virtual machine in google cloud. Please help with steps
If you have the gsutil tool installed in your VM (which you can easily install following this documentation), you can easily download the files to your VM using the following command:
gsutil cp -r gs://<YOUR-BUCKET-NAME>/<YOUR-FOLDER-PATH>/* <DESTINATION-DIRECTORY>
This would be the most straight-forward answer, however, if you need to automate this, you could use the Cloud Storage Client Libraries to do this process using code.
Related
When I download files to the bucket I use the command: gsutil -u absolute-bison-xxxxx cp $FILE gs://bucket_1 which works fine.
I am running downstream programs that I want the output to be saved to the same bucket, but when I for instance type: -output gs://bucket_1/file.out to specify the folder for the output, it does not recognise the bucket as a place to store the output. How do I set the path to the bucket?
From the command you have posted on your question, I think you are using Cloud SDK as stated in the documentation to copy files from your computer to your bucket.
If that is the case, following Kolban’s comment, you could use Cloud Storage FUSE to mount your Google Cloud Storage bucket as a file system.
In order to use Cloud Storage Fuse, you should be running an OS based on Linux kernel version 3.10 or newer, or Mac OS X 10.10.2 as stated on the installation notes; also, you could run it from Google Compute Engine VMs.
You could begin installing it following the installation notes and following this YouTube tutorial.
For more information about using Cloud Storage FUSE, or to file an issue, go to the Google Cloud [GitHub repository].
For a fully-supported file system product in Google Cloud, see Filestore.
You could use Firestore as recommended in the documentation. To be used locally on your machine, you could follow this guide.
On starting the SageMaker Studio server, I can only see a set of predefined kernels when
I select kernel for any notebook.
I create conda environments and persist them between sessions by pointing .condarc to a custom miniconda directory stored on EFS.
I want all notebooks to have access to environments stored in the custom miniconda directory. I can do that on the system terminal but can't seem to find a way to make the kernels available to notebooks.
I am aware of Life Cycle Configuration but that seems to be working only with notebooks instances rather than SageMaker Studio.
Desired outcomes
Ideally making custom kernels persistently available to notebooks but if that isn't feasible or requires custom docker image, I am happy with running a script manually every time I run the server.
What I have tried so far:
I ran the following which is a tweaked version of start.sh meant to be for Life Cycle Configuration.
#!/bin/bash
set -e
sudo -u sagemaker-user -i <<'EOF'
unset SUDO_UID
WORKING_DIR=/home/sagemaker-user/.SageMaker/custom-miniconda/
source "$WORKING_DIR/miniconda/bin/activate"
for env in $WORKING_DIR/miniconda/envs/*; do
BASENAME=$(basename "$env")
source activate "$BASENAME"
python -m ipykernel install --user --name "$BASENAME" --display-name "$BASENAME"
done
EOF
That didn't work and I couldn't access the kernels from the notebooks.
If you need a persistent custom kernel in SageMaker studio, you can create an ECR repository and build a docker image with custom environment configurations. This image can then be attached to the SageMaker studio notebooks. Reference link!
SageMaker studio now also supports the use of lifecycle configurations. Reference link!
On SSH, I've tried using gsutil -m cp filename* Desktop to copy a file from the VM Instance to my computer desktop, like Google Cloud's own example in its documentation. I got a message saying that the file was copied successfully, but no mention of downloading anything, and I don't see the relevant file on my desktop. I've tried specifying the full desktop address instead of just 'Desktop', but SSH does not recognize the address.
Is there a way I can directly download files from the VM Instance to my desktop without having to go through a Google Cloud bucket?
In my opinion this is the easiest way. Directly to local machine in a few steps.
I am struggling about how to upload my Cloud Code files that i had on Parse.com to my Parse Server hosted on AWS EB.
So far i have:
Parse Server hosted on AWS EB. To host it on AWS i used the Orange Deploy Button which basically makes all stuff easier for people without having to install the Parse Server locally and upload it later to AWS.
iOS App written in objective C connected to the Parse server and working perfectly
Parse Dashboard locally on my mac connected to the Parse Server on AWS
The only thing that i would need is to upload all my cloud code files to the Parse Server. How could i do this? I have researched a lot over Google, stackoverflow, etc without success. There is some information but its unclear. Thanks in advance.
Finally and thanks to Ran Hassid i now have a Fully functional Parse Server on AWS with Cloud Code. For those who are in the same situation where i was, here is the answer to my question:
Go to this link here and follow all the steps (By the time i asked the question, the information provided by this link of AWS wasn't that clear as it is now. They improved the explanations and the info.)
After you finish all the previous steps from the link. You would have a Parse Server on AWS working.
Now the part of CLOUD CODE. Just create a folder in your MAC or PC wherever you like. Let's say on the desktop and called it Parse Server AWS (You can call it whatever you want)
Install the EB CLI which is the Command line interface to user Terminal (On Mac) or the equivalent on windows to work with the parse server you just set up on AWS (Similar to CloudCode with Parse CLI). The easy way to install it is running this command:
brew install awsebcli
Now open terminal on mac (or the equivalent on windows) and go to the folder that you just created on the step 3.
Run the next command. It will ask you to select the location of your parse server, and then the name.
eb init
Now this command. It will download all the files from AWS of your parse server to this folder you are in.
eb labs download
Finally, you will have a folder called Cloud where you can put all your cloud code files in.
When you finish just run the command:
eb deploy
Now you have your parse server with all your cloud code files working on AWS.
Now any change you need to make to your cloudCode files, just change the local files inside this folder just created on step 3 and run again the command from the step 9. Just exactly as you used to do with Parse Deploy command
Hopefully this information will help many people as it helped to me.
Have a happy coding!
parse-server cloud code is a bit different from Parse.com cloud code. In Parse.com we use the Parse CLI in order to modify and deploy our cloud code (parse deploy ...) in parse-server your cloud code exist under the following path of your parse project ./cloud/main.js* so your cloud code endpoint is the main.js file which by default located under the **cloud folder of your parse project. If you really want you can change this path but to keep it simple use the default location.
Now about deployment. in parse-server you need to redeploy your parse server again when you do some modification to your cloud code. Another option is to edit your cloud code remotely but from my POV its better to redeploy it
Is it possible or advisable to run WebHCat on an Amazon Elastic MapReduce cluster?
I'm new to this technology and I was wonder if it was possible to use WebHCat as a REST interface to run Hive queries. The cluster in question is running Hive.
I wasn't able to get it working but WebHCat is actually installed by default on Amazon's EMR instance.
To get it running you have to do the following,
chmod u+x /home/hadoop/hive/hcatalog/bin/hcat
chmod u+x /home/hadoop/hive/hcatalog/sbin/webhcat_server.sh
export TEMPLETON_HOME=/home/hadoop/.versions/hive-0.11.0/hcatalog/
export HCAT_PREFIX=/home/hadoop/.versions/hive-0.11.0/hcatalog/
/home/hadoop/hive/hcatalog/webhcat_server.sh start
You can then confirm that it's running on port 50111 using curl,
curl -i http://localhost:50111/templeton/v1/status
To hit 50111 on other machines you have to open the port up in the EC2 EMR security group.
You then have to configure the users you going to "proxy" when you run queries in hcatalog. I didn't actually save this configuration, but it is outlined in the WebHCat documentation. I wish they had some concrete examples there but basically I ended up configuring the local 'hadoop' user as the one that run the queries, not the most secure thing to do I am sure, but I was just trying to get it up and running.
Attempting a query then gave me this error,
{"error":"Server IPC version 9 cannot communicate with client version
4"}
The workaround was to switch off of the latest EMR image (3.0.4 with Hadoop 2.2.0) and switch to a Hadoop 1.0 image (2.4.2 with Hadoop 1.0.3).
I then hit another issues where it couldn't find the Hive jar properly, after struggling with the configuration more, I decided I had dumped enough time into trying to get this to work and decided to communicate with Hive directly (using RBHive for Ruby and JDBC for the JVM).
To answer my own question, it is possible to run WebHCat on EMR, but it's not documented at all (Googling lead me nowhere which is why I created this question in the first place, it's currently the first hit when you search "WebHCat EMR") and the WebHCat documentation leaves a lot to be desired. Getting it to work seems like a pain, though my hope is that by writing up the initial steps someone will come along and take it the rest of the way and post a complete answer.
I did not test it but, it should be doable.
EMR allows to customise the bootstrap actions, i.e. the scripts run where the nodes are started. You can use bootstrap actions to install additional software and to change the configuration of applications on the cluster
See more details at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-bootstrap.html.
I would create a shell script to install WebHCat and test your script on a regular EC2 instance first (outside the context of EMR - just as a test to ensure your script is OK)
You can use EC2's user-data properties to test your script, typically :
#!/bin/bash
curl http://path_to_your_install_script.sh | sh
Then - once you know the script is working - make it available to the cluster on a S3 bucket and follow these instructions to include your script as custom bootstrap action of your cluster.
--Seb