Length of HostIds returned by AWS

I'm building a platform that requires allocating hosts programmatically. The API returns a HostId, which is a string (source: boto3 docs). If anyone has experience with AWS, could you tell me whether this string has a constant length? And if it does, how long is it?
This is from the perspective of designing a database, specifically setting a maximum length for the field. I don't want to allocate superfluous space for the host IDs.

Since December 2016, AWS has used a new ID format: a resource-type prefix followed by a 17-character string.
Each resource type has its own prefix, for example:
an instance is i-xxxx...
an EBS snapshot is snap-xxxx...
a reserved instance is r-xxx...
...
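For Dedicated Hosts specifically, the prefix is h-, so a post-2016 HostId such as h-0123456789abcdef0 is 19 characters long (the 2-character "h-" prefix plus the 17-character suffix). A minimal boto3 sketch to confirm the length empirically; the region, availability zone, and instance type are placeholder values:

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Allocate one Dedicated Host (assumes sufficient quota and permissions).
resp = ec2.allocate_hosts(
    AvailabilityZone="us-east-1a",
    InstanceType="m5.large",
    Quantity=1,
)

for host_id in resp["HostIds"]:
    # Post-2016 IDs look like "h-0123456789abcdef0":
    # the "h-" prefix plus 17 characters, 19 characters in total.
    print(host_id, len(host_id))

If you want headroom against future format changes, a slightly larger field (say, VARCHAR(32)) costs little in most databases.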

Replicating data from SQL Server to BigQuery

I've been trying to follow Google's instructions on replicating data from SQL Server to BigQuery, available here: https://cloud.google.com/data-fusion/docs/tutorials/replicating-data/sqlserver-to-bigquery. Following the instructions to the letter, step by step, always results in this odd error when creating the Cloud Data Fusion instance:
Invalid argument (HTTP 400): retry budget exhausted (3 attempts): cloud-control2-saas::GCE_BAD_REQUEST: Invalid value for field 'networkPeering.name': '*******'. Must be a match of regex '(?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?)'.
******* is the project ID with the VPC network suffix after a dash, and it looks a bit like this (I've changed the values):
website.com:api-project-0000000000-default
This value is being assigned somewhere by Google; I am not given a choice to select or enter it in the instructions when creating the instance.
Googling the error doesn't show me anything useful, and sadly I do not have the budget to acquire GCP support in this instance to ask them why their instructions appear not to work.
I've already checked quotas, billing, service account permissions, etc. I've also tried both a new VPC as well as a shared VPC with all the settings from the guide.
I would appreciate someone more experienced in this area pointing me in the right direction, or if someone has some understanding of where else to check what could be wrong, I would appreciate it.
The instructions do point at creating a peering connection, but they require the Cloud Data Fusion instance to be created before configuring the peering connection, and since I can't create the Cloud Data Fusion instance, I am unsure what exactly I am supposed to do.
Appreciate the help!
According to this documentation, I assume you're creating a VPC network before creating a private instance.
networkPeering.name is a combination of your project ID and VPC network. The error you're getting is due to the incorrect naming of the networkPeering name, i.e. the value of networkPeering.name does not match the regex (?:[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?), which in your case is caused by the project ID: website.com:api-project-xxxxxxxxx.
Also note that the networkPeering name can be at most 63 characters long, per the regex.
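You can check a candidate name against that regex yourself before creating the instance; a small Python sketch (the example values mirror the ones in the question):

import re

# GCE naming rule from the error message: a lowercase letter, optionally
# followed by up to 62 more lowercase letters, digits, or hyphens, where
# the last character must not be a hyphen (63 characters maximum).
NAME_RE = re.compile(r"[a-z](?:[-a-z0-9]{0,61}[a-z0-9])?")

def is_valid_peering_name(name):
    return NAME_RE.fullmatch(name) is not None

# The domain-scoped project ID fails because of the dot and the colon:
print(is_valid_peering_name("website.com:api-project-0000000000-default"))  # False
print(is_valid_peering_name("api-project-0000000000-default"))              # True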

Best way to retire an index

I am retiring an old Elasticsearch index in AWS that has not received a new document since 2016. However, something is still trying to search it.
I still want to deprecate this index in a manner where I can get back to the original state quickly. I have created a manual snapshot of the index and it is sitting in S3. I was planning on deleting the domain, but, from what I understand, that deletes everything billable under AWS, including the endpoint. As I mentioned above, I want to be able to get back to the original state of the index. This domain contains a series of indexes; the largest index is 20.5 GB. I was going to delete the large index and resize the cluster to a smaller instance size and footprint. Will this work, or will the index be unsearchable?
I've no experience using Elasticsearch on AWS, but I have an idea about your index.
You say the index has received no new documents for a long time. If this also means no deletions and no updates, you could theoretically just take this index to a new cluster, using either snapshot + restore, or a cross-cluster reindex. Continue operating your old cluster until you're sure the new one is working well.
Again, I'm not familiar with AWS terminology, but it sounds like this approach translates to using separate "domains": first fully ensure the new "domain" is working with the right hardware spec and data, and then delete the old "domain".
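For reference, the snapshot-and-restore path looks roughly like this against the Elasticsearch REST API. The endpoint, bucket, role, repository, and index names are all placeholders, and on the AWS service the repository-registration request must additionally be SigV4-signed:

import requests

ES = "https://search-my-domain.us-east-1.es.amazonaws.com"  # placeholder endpoint

# 1. Register an S3 repository to hold snapshots.
requests.put(ES + "/_snapshot/my_s3_repo", json={
    "type": "s3",
    "settings": {
        "bucket": "my-es-snapshots",
        "region": "us-east-1",
        "role_arn": "arn:aws:iam::123456789012:role/es-snapshot-role",
    },
})

# 2. Snapshot just the index being retired.
requests.put(ES + "/_snapshot/my_s3_repo/retired-index-snap",
             json={"indices": "my-old-index"})

# 3. Later, restore it into the new cluster/domain from the same repository.
requests.post(ES + "/_snapshot/my_s3_repo/retired-index-snap/_restore",
              json={"indices": "my-old-index"})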
TL;DR -> yes!
The backup to S3 will work, but the documents will be unsearchable because in order to downsize the storage you have to delete the index.
But if someday you want to restore the data from S3 back to the index, you can.
You can resize instances and storage with no downtime; however, that takes a long time, and you pay extra for the machines while they are resizing.
Example: say you change your storage size from 100 GB to 99 GB. The Elasticsearch service will spin up another instance, copy all your data from the old instance to the new one, and then delete the old one. The same goes for instance sizes: machine up, cluster sync, machine down. While they are syncing, you pay for both.
Your plan will work; ES is very flexible.
If you really don't trust AWS, just make a JSON export from the index and keep it on S3 too, just in case things go south.
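Such an export can be as simple as paging through the index with the scroll API; a sketch with the official elasticsearch Python client (the endpoint and index name are placeholders):

import json

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

# Placeholder endpoint; an AWS domain may additionally require request signing.
es = Elasticsearch("https://search-my-domain.us-east-1.es.amazonaws.com")

with open("my-old-index.jsonl", "w") as f:
    # scan() pages through the whole index using the scroll API.
    for doc in scan(es, index="my-old-index", query={"query": {"match_all": {}}}):
        f.write(json.dumps(doc["_source"]) + "\n")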

How to Query Route53 hosted zone to check for an existing record set?

I am new to Amazon Route 53. As of now, I am able to create a hosted zone and a resource record set in my Amazon account. But now I want to search whether a record set already exists in my hosted zone. For example:
Hosted zone "abc.com" has two record sets in it:
A.abc.com
B.abc.com
Now I want to query my hosted zone and find out whether A.abc.com already exists in abc.com.
So, is there any API I can use where I can pass my Amazon credentials, my hosted zone, and the record set being searched for, and get back whether that record set already exists? Kindly guide me.
After research, I found out that there is ListResourceRecordSets, which will give me back the list for a particular zone. But I don't want the list; I just want to check whether the entry already exists.
I have been able to perform this check efficiently using the ListResourceRecordSets API method with the name and maxitems parameters. You haven't specified how you are accessing the API, so I'm going to explain this using the standard AWS REST API.
Given your example:
Call the API passing A.abc.com as the name parameter and 1 as the maxitems parameter. Your request will look like this: https://route53.amazonaws.com/2013-04-01/hostedzone/{YOUR_HOSTED_ZONE_ID}/rrset?name=A.abc.com.&maxitems=1
Note that I've added a trailing dot (".") to the end of the resource name A.abc.com. The API reference indicates that it may affect result sort order so I add it just in case.
You will get back an XML result in this format:
<?xml version="1.0"?>
<ListResourceRecordSetsResponse xmlns="https://route53.amazonaws.com/doc/2013-04-01/">
  <ResourceRecordSets>
    <ResourceRecordSet>
      <Name>A.abc.com.</Name>
      <Type>A</Type>
      <TTL>3600</TTL>
      <ResourceRecords>
        <ResourceRecord>
          <Value>SOME_IP_ADDRESS</Value>
        </ResourceRecord>
      </ResourceRecords>
    </ResourceRecordSet>
  </ResourceRecordSets>
  <IsTruncated>true</IsTruncated>
  <NextRecordName>B.abc.com.</NextRecordName>
  <NextRecordType>A</NextRecordType>
  <MaxItems>1</MaxItems>
</ListResourceRecordSetsResponse>
Now you're going to have to do some parsing. Check the result to see if there is one ResourceRecordSet and if its Name property matches the name of the resource record you are looking for (you probably want to do a case-insensitive compare of the two values). Keep in mind that the Name property has that trailing period (".") at the end, so add it to the name you're searching for before doing the comparison.
If there is exactly one resource record set and the name matches the one you are looking for, it exists. If either one of those checks fails, then it does not exist.
Granted, this isn't as simple as a GetResourceRecordSet operation would be, but at least it keeps you from having to query the entire zone and parse a bunch of records. You also won't run into the long delays or throttling issues that you may hit using the CLI --query option.
With the AWS CLI, the equivalent option is --start-record-name (paired with --max-items). I can vouch for the fact that the JavaScript SDK will let you do this using the StartRecordName parameter.
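The same single-call check, sketched with boto3 (the hosted-zone ID is a placeholder; StartRecordName and StartRecordType correspond to the name and type parameters of the REST call):

import boto3

def record_exists(zone_id, name, rtype="A"):
    # Route 53 stores names with a trailing dot, so normalize first.
    if not name.endswith("."):
        name += "."
    r53 = boto3.client("route53")
    resp = r53.list_resource_record_sets(
        HostedZoneId=zone_id,
        StartRecordName=name,
        StartRecordType=rtype,
        MaxItems="1",  # boto3 expects a string here
    )
    sets = resp["ResourceRecordSets"]
    return (len(sets) == 1
            and sets[0]["Name"].lower() == name.lower()
            and sets[0]["Type"] == rtype)

print(record_exists("Z2LD58HEXAMPLE", "A.abc.com"))  # hypothetical zone ID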
There is no way to filter the API call, but there is a way to filter the data returned. Using the CLI you can do this with the --query option.
From the documentation: "To view all the resource record sets of a particular name, use the --query parameter to filter them out. For example:"
aws route53 list-resource-record-sets --hosted-zone-id Z2LD58HEXAMPLE --query "ResourceRecordSets[?Name == 'A.abc.com.']"

AWS elasticsearch log rotation

I want to use AWS Elasticsearch to store the logs of my application. Since there is a huge amount of data going into AWS Elasticsearch (~30 GB daily), I would keep only 3 days of data. Is there any way to schedule data removal from AWS Elasticsearch or do log rotation? What happens if the AWS Elasticsearch storage is full?
Thanks for the help
A possible way is to set the index parameter of the Logstash elasticsearch output to something like logstash-%{appname}-%{date_format}. You can then use the Curator tool to delete old indices by number of days or so.
This SO answer pretty much explains the same. Hope it helps!
I assume you are using the Amazon Elasticsearch Service?
The storage type is an EBS volume with a fixed amount of disk space. If you want to keep only the last three days, I assume you then have three indices, like these:
my-index-2017.01.30
my-index-2017.01.31
my-index-2017.02.01
Basically, you can write a simple script which deletes indices older than 3 days; a sketch follows below. With the REST API, in Sense it is just DELETE my-index-2017.01.30.
I recommend using Elasticsearch Curator for the job. See https://www.elastic.co/guide/en/elasticsearch/client/curator/current/delete_indices.html
I'm not sure if the service interface itself has an option for that, but Elasticsearch Curator should do the job for you.
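A minimal version of such a cleanup script (the endpoint and index pattern are placeholders; on AWS the requests may also need to be SigV4-signed):

from datetime import datetime, timedelta, timezone

import requests

ES = "https://search-my-domain.us-east-1.es.amazonaws.com"  # placeholder
cutoff = datetime.now(timezone.utc) - timedelta(days=3)

# _cat/indices with h=index returns one bare index name per line.
for line in requests.get(ES + "/_cat/indices/my-index-*?h=index").text.splitlines():
    index = line.strip()
    day = datetime.strptime(index, "my-index-%Y.%m.%d").replace(tzinfo=timezone.utc)
    if day < cutoff:
        requests.delete(ES + "/" + index)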
Update for 2020:
AWS ES now supports Index State Management, which lets you define custom management policies to automate routine tasks and apply them to indices and index patterns. You no longer need to set up and manage external processes to run your index operations.
For example, you can define a policy that moves your index into a read_only state after 30 days and then ultimately deletes it after 90 days; a policy of that shape is sketched below.
Index State Management - https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/ism.html
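Roughly what such a policy looks like when created through the ISM API; state names and thresholds are illustrative, and the exact schema can differ between versions, so treat this as a sketch:

import requests

ES = "https://search-my-domain.us-east-1.es.amazonaws.com"  # placeholder

policy = {
    "policy": {
        "description": "Read-only after 30 days, delete after 90 days.",
        "default_state": "hot",
        "states": [
            {"name": "hot", "actions": [],
             "transitions": [{"state_name": "read_only",
                              "conditions": {"min_index_age": "30d"}}]},
            {"name": "read_only", "actions": [{"read_only": {}}],
             "transitions": [{"state_name": "delete",
                              "conditions": {"min_index_age": "90d"}}]},
            {"name": "delete", "actions": [{"delete": {}}], "transitions": []},
        ],
    }
}

requests.put(ES + "/_opendistro/_ism/policies/log-retention", json=policy)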

Is it possible to get a time for state transition for an Amazon EC2 instance?

I'm accessing EC2 with the aws-sdk for Ruby. I have an array of instances from describe_instances().
This provides me with the state of the instances and even a state transition reason. But how can I get a time for the state transition?
Edit
So I have:
client = Aws::EC2::Client.new
resp = client.describe_instances(filters: filters)
and I would need
resp.reservations[0].instances[0].state_transition_time #=> Time
similar to
resp.reservations[0].instances[0].state_transition_reason #=> String
This information is not available via the Amazon EC2 API at this time. The aws-sdk gem returns all of the information available from the DescribeInstances operation as documented here: http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeInstances.html
The State Transition Reason is not always populated with a date and time and may not even be populated at all, per the documentation. I have not found any hints in the documentation specifying the conditions under which you DO get a date/time, but in my experience the date/time is present in the State Transition Reason for between 30 and 90 days. After that, the reason seems to persist, but the date is dropped from the string.
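When the timestamp is present, the reason string looks like "User initiated (2016-01-23 14:21:30 GMT)", so you can extract the time yourself. A small sketch (shown in Python for brevity; the same regex works from Ruby):

import re
from datetime import datetime

def transition_time(reason):
    # Matches the "(YYYY-MM-DD hh:mm:ss GMT)" suffix when it exists.
    m = re.search(r"\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) GMT\)", reason)
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S") if m else None

print(transition_time("User initiated (2016-01-23 14:21:30 GMT)"))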
All of the documentation that I can find is listed here:
Attribute Definition
EC2 API - Ruby