I am using Packer to create AMI's.
So Packer creates a temporary security group and keypair etc and launches an Insatnce.In my usecase after installing all the packages I need to run some test.Hence the results are generated on the same Instance which was launched by packer.
I want those results.Basically I want to download the results file on the machine which triggered the packer build.Is there a way in packer itself to download any or any other way.
Thanks in Advance for any help.
I Actually did not get any help on this.
The solution I found on this in a hacky way was:
I created a bash script which basically uploads the result.xml(Test result) to S3.
And do not forget to give correct permission to the s3 bucket, so that the packer instance can access it.
And download the result file after the packer run is completed from S3.
For anyone who runs into this today, you can use the direction option to switch to download:
https://www.packer.io/docs/provisioners/file#direction
Related
I am running a python code in EC2 instance where I am loading a Huggingface model using the from_pretrained() method. I get the error
OSError: Couldn't reach server at 'https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json' to download pretrained model configuration file.
while trying to initialize the reader. To get over this, I downloaded the file manually and provided the local JSON path. That worked fine but then I see issues in loading the tokenizer too.
OSError: Couldn't reach server at '{}' to download vocabulary files.
I think my network settings of EC2 are not correct due to which I am unable to connect to external Huggingface repository.
I tried relaxing the inbound rules for EC2 to IP version|Type|Protocol|Port range|Destination=>IPv4|All|traffic|All|All|0.0.0.0/0 but even that doesn't help. The outbound rules are already IPv4|All|traffic|All|All|0.0.0.0/0.
I also tried creating an IAM role with policy AmazonS3ReadOnlyAccess and attached it to the EC2 instance but still getting the same error.
Could someone point what needs to be done to solve this. Thanks.
Here is how i fixed this issue.
i installed pyopenssl like this :
!pip install pyopenssl
then i restarted terminal and re-ran the code and it fixed the issue for me,thanks
might be your network is using proxy
this might help
$ proxies = {"http": 'foo.bar:3128', addyourproxy:'foo.bar:4012'}
$ from transformers import pipeline
$ qt_ans = pipeline('question-answering')
How can one download files from a GCP Storage bucket to a Container-Optimised OS (COS) on instance startup?
I know of the following solutions:
gcloud compute copy-files
SSH through console
SCP
Yet all of these have to be done manually and externally after an instance is started.
There is also cloud init, yet I can't find any info on how to copy files from a Storage bucket. Examples seem to be suggesting that it's better to include content of files in the cloud init file directly, which is not something I want to do because security. Is it possible to download files from Storge bucket using cloud init?
I considered using a startup script, yet COS lacks CLI tools such as gcloud or gsutil to be able to run any such commands in a startup script.
I know I could copy the files manually and then save the image as a boot disk, but I'm hoping there are solutions that avoid having to do so.
Most of all, I'm assuming I'm not asking for something impossible, given that COS instance setup allows me to specify Docker volumes that I could mount onto the starting container. This seems to suggest I should be able to have some private files on the instance the moment COS will attempt to run my image on startup. But how?
Trying to execute a startup-script with a cloud-sdk image and copying files there as suggested by Guillaume didn't work for me for a while, showing this log. Eventually I realised that the cloud-sdk image is 2.41GB when uncompressed and takes over 2 minutes to complete pulling. I tried again with an empty COS instance and the startup script completed successfully, downloading the data from a Storage bucket.
However, a 2.41GB image and over 2 minutes of boot time sound like a bit of an overkill to download a 2KB file. Don't they?
I'm glad to see a working solution to my question (thanks Guillaume!) although I'm still wondering: isn't there a nicer way to do this? I feel that this method is even less tidy than manually putting the files on the COS instance and then creating a machine image to use in the future.
Based on Guillaume's answer I created and published a gsutil wrapper image, available as voyz/gsutil_wrap. This way I am able to run a startup-script with the following command:
docker run -v /host/path:/container/path \
--entrypoint gsutil voyz/gsutil_wrap \
cp gs://bucket/path /container/path
It's essentially a copy of what Guillaume suggested, except it is using an image containing only a minimum setup required to run gsutil. As a result it weighs 0.22GB and pulls within 10-20 seconds on average - as opposed to 2.41GB and over 2 minutes respectively for the google/cloud-sdk image suggested by Guillaume.
Also, credit to this incredibly useful StackOverflow answer that allows gsutil to use the default service account for authentication.
The startup-script is the correct location to do this. And YES, COS lacks some useful library.
BUT you can run container! And, for example, the Google Cloud SDK container!
So, add this startup-script in the VM metadata:
key -> startup-script
value ->
docker run -v /local/path/to/copy/files:/dummy/container/path \
--entrypoint gsutil google/cloud-sdk \
cp gs://your_bucket/path/to/file /dummy/container/path
Note: the startup script is ran in root mode. Perform a chmod/chown in your startup script if you need to change the file access mode.
Let me know if you need more explanation on this command line
Of course, with a fresh COS image, the startup time is quite long (pull the container image and extract it).
To reduce the startup time, you can "bake" your image. I mean, start with a COS, download/install what you want on it (or only perform a docker pull of the googkle/cloud-sdk container) and create a custom image from this.
Like this, all the required dependencies will be present on the image and the boot start will be quicker.
I was copying huge files in my r5a.4xlarge ec2 instance. While copying I got an error message that the disk size is full so cannot copy further.
I closed that session and start another session using the command aws ssm start-session --target <instance id>
all I get is a message that - Starting session with SessionId: <sessionid> and nothing happens.
Earlier it used to start the session very smoothly.
Can anyone help? All I want to do is enter the instance and delete the copied files.
If you know exactly where the files are located, you can try using User Data to execute clean up on next restart: https://aws.amazon.com/premiumsupport/knowledge-center/execute-user-data-ec2/
However it might not work, since cloud-init needs to create file on filesystem in order to function. In this case you can try detaching root volume, attaching it to another instance and cleaning it up from there.
Have you attempted to delete the file with SSM Run Command?
https://docs.aws.amazon.com/systems-manager/latest/userguide/rc-console.html
You can use the existing AWS-RunShellScript (or AWS-RunPowerShellScript) to execute the CLI commands to delete the files you wish to delete.
I have installed jenkins on my local machine (on premises). I have my server (Linux) in AWS Cloud. I need to share logs with developers with out giving server access to them. I need to create a jenkins job by running that job they should get the logs from server.
How can i do that ?? If any one following the same process to get the data from cloud please help me in solving this... Thanks in advance.
Use the SSH Agent plugin to securely setup your private key
Use SCP to copy the log files to the local workspace
Archive those files to the Jenkins job
You could write a pipeline script to do this. Something like:
node ("linux") {
sshagent (credentials: ['deploy-dev']) {
sh 'scp user#awshostnamehere:/somepath/somelogfile .'
archive somelogfile
}
}
Note that this requires you to fill in the blanks. To get this to work you would have to:
Setup an SSH private key credential named deploy-dev
Setup a build agent with the label 'linux' or change that to a label of an agent you do have.
I am facing issues with s3-dist-cp command in emr-5.0.0 version. In my application, I need to push some files from hdfs to S3. I am using s3-dist-cp command to achieve this. It was working fine in emr-4.2.0. But its not working in emr-5.0.0. If I run the command manually it works fine. But it fails in my application. I didn't make any change in my application to run it on emr-5.
Do I need to make any change if I need to use emr-5? Has there been any change in way we use s3-dist-cp command in emr-5?
I am using following command:
s3-dist-cp --src /user/hive/warehouse/abc.text --dest s3n://bucket/abc.text
s3-dist-cp is only available on the master node(s3-dist-cp.jar).
The following is the location of the application.
/usr/share/aws/emr/s3-dist-cp/
The s3-dist-cp.jar is not available in the slave nodes.
You can login into slave machine and verify it.
So the reason your application failure might be, In new emr you might be using some workflow management tool which deploy the application on slaves and start from there. As s3 s3-dist-cp is not available and it fails.
Work Around
First Option
bundle the jar and use following commands
hadoop jar s3-dist-cp.jar --src location --dest location
Second
Boot Strap the s3-dist-cp.jars on the cluster
You can even run it as java program
First thing, s3n:// is now deprecated, start using s3:// for S3 paths.
Secondly, if you're merely copying a file into S3 from a local file on your cluster, you can use aws s3 cp:
aws s3 cp /user/hive/warehouse/abc.text s3://bucket/abc.text
The syntax that you have used for s3-dist-cp is incorrect. Please try again with the command below.
s3-dist-cp --src hdfs:///user/hive/warehouse/abc.text --dest s3n://bucket/abc.text
Let me know if this solves your proble.