Azure: Run Command Doesn't Work on a VM Created From a Snapshot - azure-virtual-machine

We have 2 VMs, VM1 is the original VM, VM2 is created from a Snapshot of VM1.
Run Command works fine on VM1, but on VM2 it times out when executed with PowerShell's Invoke-AzVMRunCommand. When using "Run command" from the Portal, no changes appear on the VM (and there is no timeout).
What can be the reason for that?
We are initiating Run Command via PowerShell:
Invoke-AzVMRunCommand -ResourceGroupName Restore-RG -VMName DB-SLV-IMG-VM-LatestRestore-VM -CommandId 'RunPowerShellScript' -ScriptPath C:\Scripts\Script1.ps1
And the contents of Script1.ps1 are:
. C:\Scripts\TargetScript.ps1
The contents of the TargetScript.ps1 on VM1 and VM2 are:
New-Item C:\Scripts\File1

First, make sure you followed the correct steps to create the Azure VM from that snapshot.
To create an Azure VM from a snapshot, you need to create a managed disk from the snapshot and then attach the new managed disk as the OS disk. For more information, see the sample in Create a VM from a snapshot with PowerShell.
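For reference, here is a minimal sketch of that flow using the Az PowerShell module (the snapshot, disk, NIC and VM names and the VM size below are illustrative placeholders, not values from the question):
# Create a managed disk from the snapshot, then build a VM that attaches it as the OS disk
$snapshot   = Get-AzSnapshot -ResourceGroupName 'Restore-RG' -SnapshotName 'VM1-OSDisk-Snapshot'
$diskConfig = New-AzDiskConfig -Location $snapshot.Location -SourceResourceId $snapshot.Id -CreateOption Copy
$osDisk     = New-AzDisk -ResourceGroupName 'Restore-RG' -DiskName 'VM2-OSDisk' -Disk $diskConfig

$vmConfig = New-AzVMConfig -VMName 'VM2' -VMSize 'Standard_D2s_v3'
$vmConfig = Set-AzVMOSDisk -VM $vmConfig -ManagedDiskId $osDisk.Id -CreateOption Attach -Windows

# Assumes a NIC has already been created for the new VM
$nic      = Get-AzNetworkInterface -ResourceGroupName 'Restore-RG' -Name 'VM2-NIC'
$vmConfig = Add-AzVMNetworkInterface -VM $vmConfig -Id $nic.Id

New-AzVM -ResourceGroupName 'Restore-RG' -Location $snapshot.Location -VM $vmConfig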
Based on my validation: if Run Command has already run successfully on VM1, you should see Script1.ps1 under C:\Packages\Plugins\Microsoft.CPlat.Core.RunCommandWindows\1.1.5\Downloads on the target VM.
If you then take a snapshot, create a managed disk from that snapshot, and create a new VM2 by attaching the new managed disk, VM2 will contain the same data as VM1.
If you run the command against VM2 at this point, the following message will show up unless you first remove the file at C:\Scripts\File1. This is expected:
The file 'C:\Scripts\File1' already exists.
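If the goal is for the script to succeed on repeated runs, one option (a sketch of a workaround, not part of the original answer) is to make TargetScript.ps1 idempotent so an existing file does not cause an error:
# Create the file only if it is not already there (alternatively, New-Item ... -Force overwrites it)
if (-not (Test-Path 'C:\Scripts\File1')) {
    New-Item 'C:\Scripts\File1' -ItemType File
}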

Related

AWS: Userdata block on EC2 launch template is not running the provided powershell script

I'm working on migrating a VM from Azure to AWS. I successfully migrated it using a migration service, which boots up a VM on completion. I then created an AMI out of that VM, which also turned out to be successful. But when I try creating an EC2 instance or an Auto Scaling group out of this AMI, I'm unable to curl http://169.254.169.254/ or any of the EC2 metadata. This is because the EC2 instance is still using the gateway from the previous Azure configuration to make internal network calls. When I run the InitializeInstance.ps1 script that ships inside the EC2 instance, the instance picks up the right gateway, external IP, etc.
But since I'm going to run them as Auto Scaling groups, I cannot run this script every time the ASG spins up a new EC2 instance based on load. Hence I tried executing the script in the 'User Data' part of the launch template that this ASG uses, but that doesn't seem to deliver the expected results. Help me out in finding a way to solve this.
EC2 launch template -- UserData:
<powershell> C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1 </powershell>
I'm assuming that the EC2 instance pulls the user data script from 'http://169.254.169.254/latest/user-data', and since this times out, it's not able to execute the user data script. Correct me if I'm wrong.
What I have tried:
Executing the script through the shell of the VM, but this is tedious and not a great practice.
Using the User Data in the EC2 launch template, but that apparently is not executing the listed scripts, since I even tried a simple PowerShell script to create a new file and the file was never created.
<powershell>
$file = $env:SystemRoot + "\Temp\" + (Get-Date).ToString("MM-dd-yy-hh-mm")
New-Item $file -ItemType file
</powershell>
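One commonly documented approach, assuming the AMI runs EC2Launch (Windows Server 2016/2019), is to schedule initialization on the source instance before creating the AMI, so that every instance the ASG launches runs InitializeInstance.ps1 (and therefore user data) at first boot. The path and switch below come from the EC2Launch documentation and may differ for your agent version:
# Run on the source instance, then create the AMI; initialization will run at the next boot
& 'C:\ProgramData\Amazon\EC2-Windows\Launch\Scripts\InitializeInstance.ps1' -Schedule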

Cannot SSH into the GCP VM instances that used to work

I created a few GCP VM instances yesterday all using the same configuration but running different tasks.
I could SSH into those instances via the GCP console and they were all working fine.
Today I want to check if the tasks are done, but I cannot SSH into any of those instances via the browser anymore. The error message reads:
Connection via Cloud Identity-Aware Proxy Failed
Code: 4010
Reason: destination read failed
You may be able to connect without using the Cloud Identity-Aware Proxy.
So I retried with Cloud Identity-Aware Proxy disabled. But then it reads:
Connection Failed
An error occurred while communicating with the SSH server. Check the server and the network configuration.
Running
gcloud compute instances list
displayed all my instances and the status is RUNNING.
But when I ran
gcloud compute instances get-serial-port-output [instance-name]
using the [instance-name] returned by the command above (this is to check whether the instance's boot disk has run out of free space), it returned:
(gcloud.compute.instances.get-serial-port-output) Could not fetch serial port output: The resource '...' was not found
Some extra info:
I'm accessing the VM instances from the same network (my home internet) and everything else is the same
I'm the owner of the project
My account is using a GCP free trial with $300 credit
The instances have machine type c2-standard-4 and are using a Linux Deep Learning image
The gcloud config looks right to me:
$ gcloud config list
[component_manager]
disable_update_check = True
[compute]
gce_metadata_read_timeout_sec = 5
[core]
account = [my_account]
disable_usage_reporting = True
project = [my_project]
[metrics]
environment = devshell
Update:
I reset one of the instances and now I can successfully SSH into that instance. However the job running on the instance stopped after reset.
I want to keep the jobs running on the other instances. Is there a way to SSH into other instances without reset?
Your issue is on the VM side. The tasks you're running leave the SSH service unable to accept incoming connections, and only after the restart were you able to connect.
You should be able to see the instance's serial console output using gcloud compute instances get-serial-port-output [instance-name], but if for some reason you can't, you may instead try the GCP console: go to the instance's details, click on Serial port 1 (console), and you will see the output.
You may even interact with your VM (log in) via the serial console. This is particularly useful if something stopped the SSH service, but for that you need a login/password, so first you have to access the VM or use a startup script to add a user with your password. But then again, this requires a restart.
In either case, it seems that restarting your VMs is the best option. But you may try to figure out what is causing the SSH service to stop after some time by inspecting the logs, or you can collect your own metrics (disk space, memory, CPU, etc.) using cron, for example with df -Th /mountpoint/path | tail -n1 >> /name_of_the_log_file.log.
You can, for example, use cron to check on and restart the SSH service.
And if something doesn't work as it is supposed to (according to the documentation), go to the Issue Tracker and create a new issue to get more help.

AWS EFS Backup using Datapipeline

I want to take a backup of my EFS production environment.
I have set up the solution as defined in the efsbackup walkthrough, with 2 EFS file systems: Production and Backup.
I created 4 security groups:
efs-mt-sg (EFS SG), with access granted to efs-ec2-sg on the NFS port.
efs-ec2-sg (EC2 SG).
efs-backup-mt-sg (backup EFS SG), with access granted to efs-ec2-backup-sg on the NFS port.
efs-ec2-backup-sg (backup EC2 SG).
I have set up a Data Pipeline using the template 1-Node-EFSBackupPipeline.json.
Now when I activate this pipeline, it runs with status Finished and the Stdout logs show all the commands executed, but I don't see any backup in my backup EFS when I mount it on an EC2 instance and compare its size with the production EFS.
Whereas, when I add an EC2 instance using security groups efs-ec2-sg and efs-ec2-backup-sg and run all the commands specified in efs-backup.sh manually, it works well and I can see the backup files in that EFS file system. How can I get this data pipeline to work?
This is precisely the problem I had when I first set up this backup system back in Dec '16. It turns out the base AMI that Data Pipeline spins up to execute the backups isn't properly configured with the NFS tools needed, so it attempts the mount, fails, and subsequently just trots along as if everything is OK. Define the AMI in your pipeline: check the latest JSONs on GitHub, plug in the latest AMI for your region (easily found with a Google search), and you should be good to go.
Or you can just edit your pipeline and add the ImageId attribute, with the AMI ID as its value, to your various EC2Resources...

How to run Python Spark code on Amazon Aws?

I have written Python code in Spark and I want to run it on Amazon's Elastic MapReduce.
My code works great on my local machine, but I am slightly confused about how to run it on AWS.
More specifically, how should I transfer my Python code over to the master node? Do I need to copy my Python code to my S3 bucket and execute it from there, or should I SSH into the master and scp my Python code to the Spark folder on the master?
For now, I tried running the code locally on my terminal and connecting to the cluster address (I did this by reading the output of Spark's --help flag, so I might be missing a few steps here):
./bin/spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.1 \
--master spark://hadoop#ec2-public-dns-of-my-cluster.compute-1.amazonaws.com \
mypythoncode.py
I tried it with and without my permissions file, i.e.
-i permissionsfile.pem
However, it fails and the stack trace shows something along the lines of:
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3n URL, or by setting the fs.s3n.awsAccessKeyId or fs.s3n.awsSecretAccessKey properties (respectively).
at org.apache.hadoop.fs.s3.S3Credentials.initialize(S3Credentials.java:66)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.initialize(Jets3tNativeFileSystemStore.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
......
......
Is my approach correct, and do I just need to resolve the access issues to get going, or am I heading in the wrong direction?
What is the right way of doing it?
I searched a lot on YouTube but couldn't find any tutorials on running Spark on Amazon's EMR.
If it helps, the dataset I am working with is part of Amazon's public datasets.
Go to EMR and create a new cluster [recommendation: start with 1 node only, just for testing purposes].
Click the checkbox to install Spark; you can uncheck the other boxes if you don't need those additional programs.
Configure the cluster further by choosing a VPC and a security key (SSH key, a.k.a. PEM key).
Wait for it to boot up. Once your cluster says "Waiting", you're free to proceed.
[Spark submission via the GUI] In the GUI, you can add a Step, select a Spark job, upload your Spark file to S3, and then choose the path to that newly uploaded S3 file. Once it runs it will either succeed or fail. If it fails, wait a moment, and then click "view logs" on that step's row in the list of steps. Keep tweaking your script until you've got it working.
[Submission via the command line] SSH into the driver node following the SSH instructions at the top of the page. Once inside, use a command-line text editor to create a new file, paste the contents of your script in, and then run spark-submit yourNewFile.py. If it fails, you'll see the error output straight in the console. Tweak your script and re-run until it works as expected.
Note: running jobs from your local machine against a remote cluster is troublesome, because you may actually be making your local instance of Spark responsible for some expensive computations and for data transfer over the network. That's why you want to submit AWS EMR jobs from within EMR.
There are typically two ways to run a job on an Amazon EMR cluster (whether for Spark or other job types):
Log in to the master node and run Spark jobs interactively. See: Access the Spark Shell
Submit jobs to the EMR cluster. See: Adding a Spark Step
If you have Apache Zeppelin installed on your EMR cluster, you can use a web browser to interact with Spark.
The error you are experiencing says that the files were accessed via the s3n: protocol, which requires AWS credentials to be provided. If, instead, the files were accessed via s3:, I suspect the credentials would be sourced from the IAM role that is automatically assigned to nodes in the cluster, and this error would be resolved.

AWS EC2 - Run a Script on Instance Launch With Windows Server 2012

I would like to run a script to clear out a folder (ie: C:/myfolder) on Windows Server 2012. I thought about adding an item to the Startup Scripts list under Edit Group Policy, but this would clear out my folder any time any of my servers rebooted. I only want the folder cleared out on a new instance launch from an existing AMI.
What's the best way to achieve this?
The best way to achieve this is EC2 User Data, which is essentially a user-defined script that is executed during instance launch. On Windows, you can run user data as cmd or powershell. User Data is provided when you make a request to launch a new instance.
The existing AMI needs to be configured to run user data at launch. This can be managed from the EC2 Config Service, which Amazon provides pre-installed on community AMIs of Windows Server 2012. By default, the EC2 Config Service will execute the user data during the first launch, and then set itself to not execute user data again unless you manually change it to do so.
Here's an example from the AWS documentation where the caller is invoking Rename-Computer via powershell:
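A minimal sketch of such a user data block (the computer name below is a placeholder):
<powershell>
Rename-Computer -NewName "myNewComputerName" -Restart
</powershell>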
To empty out the folder without deleting the folder itself, your script will probably look something like this:
<powershell>
Remove-Item "C:\myfolder\*" -Force -Recurse
</powershell>
When running user data, it is important to be aware of what the cmdlets you're executing do, and in particular when to use the -Force flag to skip interactive prompts. Some cmdlets will situationally prompt for input, and when you're executing user data that will cause your script to hang, because it is being executed by the system user during startup.
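As an illustrative sketch (the extra flags are suggestions, not part of the original answer), the cleanup above can be made explicitly non-interactive so it never blocks at boot:
<powershell>
# -Force handles hidden/read-only items, -Confirm:$false suppresses confirmation prompts,
# and -ErrorAction SilentlyContinue keeps a missing folder from failing the startup script
Remove-Item "C:\myfolder\*" -Recurse -Force -Confirm:$false -ErrorAction SilentlyContinue
</powershell>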