Is there a way to avoid some default values being pushed into a metric through Pushgateway - prometheus-pushgateway

I'm exploring data scraping options through Pushgateway in Prometheus. As the Pushgateway mechanism goes, it always overwrites an existing record unless a unique label is present.
At times I see a different instance (IP address) and pod (Kubernetes pod name) in the metric, producing two different records, which makes it impossible to identify the latest data.
Is there a way to control the default values being inserted through Pushgateway?
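For illustration: if the varying instance/pod labels are attached by whatever process is doing the push, one common workaround is to push under a fixed grouping key, so each push replaces the same group instead of creating a new series. A minimal sketch with the Python prometheus_client library, using an illustrative job name and Pushgateway address:

    from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

    registry = CollectorRegistry()
    g = Gauge('batch_job_last_success_unixtime',
              'Last time the batch job succeeded',
              registry=registry)
    g.set_to_current_time()

    # Pushing under a fixed grouping key means every push overwrites the same
    # group on the Pushgateway, so a pod restart or IP change does not create
    # a second copy of the series.
    push_to_gateway(
        'pushgateway.example.com:9091',              # hypothetical Pushgateway address
        job='nightly_batch',                         # illustrative job name
        registry=registry,
        grouping_key={'instance': 'batch-pusher'},   # fixed value instead of pod/IP
    )

If the instance label is instead added when Prometheus scrapes the Pushgateway, setting honor_labels: true on that scrape job keeps the pushed labels rather than attaching scrape-time ones.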

Related

How to get the AWS RDS maximum connections from the AWS API?

Is there a way to get the maximum number of connections for an RDS database from the AWS API?
I know that you can get the current number of connections from the DatabaseConnections Cloudwatch metric, but I'm looking to get the maximum/limit of connections possible for the database.
I also know that you can get it from within the database. For example, in Postgres, you can run:
postgres=> show max_connections;
However, I would like to get the value from outside the database.
I read in this documentation about the max_connections DB instance parameter.
In most cases, the max_connections instance parameter is a value like LEAST({DBInstanceClassMemory/9531392},5000), which depends on the DBInstanceClassMemory formula variable.
I read in this documentation that DBInstanceClassMemory can depend on several factors and is lower than the memory figures shown in the instance class tables.
Is there a way to get the DBInstanceClassMemory value from the API?
It looks like the AWS Console is able to get the value from outside of the database. See the red line in the graph below:
Edit: I found the JavaScript function that calculates the maximum number of connections in the AWS Console. It's called getEstimatedDefaultMaxConnections and it basically just divides the instance class' memory (from the instance class table) by the default memory-per-maximum-connections value (i.e. the default formula listed in the documentation). So, it ignores the fact that DBInstanceClassMemory will be less than the instance class' memory and it also ignores any changes you make to the max_connections DB instance parameter.
Is there a way for me to get that value using the API or to calculate it based on the DBInstanceClassMemory value (if it is available via the API)?
I ended up calculating an estimate of the maximum number of connections by fetching the max_connections DB parameter of the database, parsing it, and evaluating it.
To get an estimate of the DBInstanceClassMemory value, I first fetched all of the available instance types using describe-instance-types and saved them to a file. I set DBInstanceClassMemory to 90% of each instance type's memory to account for the memory lost to OS and RDS processes.
Then I:
Iterated through all of my RDS instances using DescribeDBInstances,
Fetched the DB parameters for each database using DescribeDBParameters and filtered for the max_connections parameter, and
Parsed the max_connections parameter and evaluated the formula using my estimated DBInstanceClassMemory for the database, as sketched below.
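A rough boto3 sketch of that flow: it fetches the instance types directly instead of from a saved file, applies the same 90% memory assumption, and only parses the common LEAST({DBInstanceClassMemory/divisor},cap) form of the parameter, so treat the result as an estimate:

    import re
    import boto3

    ec2 = boto3.client('ec2')
    rds = boto3.client('rds')

    # Map EC2 instance type -> memory in bytes (db.m5.large uses m5.large's figures).
    memory_by_type = {}
    for page in ec2.get_paginator('describe_instance_types').paginate():
        for itype in page['InstanceTypes']:
            memory_by_type[itype['InstanceType']] = itype['MemoryInfo']['SizeInMiB'] * 1024 * 1024

    def estimate_max_connections(db_instance):
        # Assume DBInstanceClassMemory is roughly 90% of the instance class memory.
        ec2_type = db_instance['DBInstanceClass'].removeprefix('db.')
        class_memory = memory_by_type[ec2_type] * 0.9

        # Find the max_connections parameter in the database's parameter group.
        group = db_instance['DBParameterGroups'][0]['DBParameterGroupName']
        formula = None
        for p_page in rds.get_paginator('describe_db_parameters').paginate(DBParameterGroupName=group):
            for param in p_page['Parameters']:
                if param['ParameterName'] == 'max_connections':
                    formula = param.get('ParameterValue')
        if formula is None:
            return None

        # Evaluate the common form: LEAST({DBInstanceClassMemory/9531392},5000)
        match = re.match(r'LEAST\(\{DBInstanceClassMemory/(\d+)\},(\d+)\)', formula)
        if match:
            divisor, cap = int(match.group(1)), int(match.group(2))
            return min(int(class_memory / divisor), cap)
        return int(formula)  # some engines use a plain numeric value

    for page in rds.get_paginator('describe_db_instances').paginate():
        for db in page['DBInstances']:
            print(db['DBInstanceIdentifier'], estimate_max_connections(db))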

Fetch desired ID contents from multiple servers

Imagine I have a distributed system with 500 servers. I have a main database server that stores some metadata, and each entry's primary key is the content ID. The actual content related to each content ID is spread across the 500 servers. But not all content IDs have their content on the 500 servers yet; say only half of them do.
How could I find the content IDs that have not been deployed to any of the 500 servers yet?
I'm thinking of using a MapReduce-style approach to solve this, but I'm not sure what the process would look like.
Given the context in the question:
You can build a table in your database containing the contentID-to-instance mapping.
Whenever an instance has data for a given contentID, it needs to make a call and register the contentID.
If your instances can crash and you need to remove their content, you can implement a health check that updates your database every 30 seconds to 1 minute.
Now, whenever you need the instanceID for a given contentID (and whether it has been loaded or not), you can refer to the table above and check whether the contentID has an instanceID with a health-check time within the last minute.
Note: You can also consider using ZooKeeper or an in-memory datastore like Redis for storing this data.
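As a rough illustration of that registration/health-check idea, here is a sketch using Redis; the key layout, function names, and the 60-second freshness window are assumptions rather than anything from the question:

    import time
    import redis

    r = redis.Redis(host='localhost', port=6379)
    FRESHNESS_SECONDS = 60  # assumed health-check window

    def register_content(instance_id, content_id):
        # Called by a server once it actually holds the content for content_id.
        r.hset(f'content:{content_id}', instance_id, time.time())

    def heartbeat(instance_id, content_ids):
        # Called every ~30-60 seconds by each server for the content it still holds.
        now = time.time()
        pipe = r.pipeline()
        for cid in content_ids:
            pipe.hset(f'content:{cid}', instance_id, now)
        pipe.execute()

    def undeployed_content(all_content_ids):
        # Content IDs from the metadata DB with no live instance registered.
        cutoff = time.time() - FRESHNESS_SECONDS
        missing = []
        for cid in all_content_ids:
            heartbeats = r.hgetall(f'content:{cid}')
            if not any(float(ts) >= cutoff for ts in heartbeats.values()):
                missing.append(cid)
        return missing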

How to achieve strong delete consistency in DynamoDB Global Tables

In DynamoDB global tables, if one region receives a delete request on a record and another region receives an update at the same time, how can we ensure the delete operation takes precedence and the record does not survive conflict resolution?
In other words, can we achieve strong delete consistency for global tables?
You will never be able to guarantee strong consistency for global tables.
However, it sounds like there is a specific race condition you are trying to prevent where an update overwrites a delete, and that is possible to prevent.
The simplest way to guarantee that a delete is not followed by an update is by using a specific region as the “master” region for every item. If you need to update or delete the item, use the endpoint for the master region. The drawback is that cross-region writes will have much higher latency than same region writes. However, this may be an acceptable trade-off depending on the details of your application.
How do you go about implementing this? You could add a regionId attribute to your table, and every time you create an item, you set a specific region which should be the master region for that item. Whenever you go to update/delete an item, read the item to find the item’s master region and make the update/delete request to the appropriate regional endpoint.
So that’s the principle, but there’s actually something to make it even easier for you. DynamoDB adds a few special attributes to all items in a global table (see Global Tables – How it Works), and one of those attributes is aws:rep:updateregion which is the region where it was last updated. Just make sure when you need to update or delete an item, you read that attribute and then use the endpoint for that region to make the update/delete request.
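A minimal sketch of that read-then-route pattern with boto3; the table name, key, and home region are placeholders, and aws:rep:updateregion is populated by the original (2017.11.29) version of global tables:

    import boto3

    TABLE_NAME = 'MyGlobalTable'   # placeholder global table name
    HOME_REGION = 'us-east-1'      # region this service normally talks to

    def delete_item(key):
        # Read the item locally to discover which region last wrote it.
        local_table = boto3.resource('dynamodb', region_name=HOME_REGION).Table(TABLE_NAME)
        item = local_table.get_item(Key=key).get('Item')
        if item is None:
            return

        # Fall back to the home region if the replication attribute is absent.
        master_region = item.get('aws:rep:updateregion', HOME_REGION)

        # Route the delete to that region's endpoint so a concurrent update
        # sent elsewhere is less likely to resurrect the item.
        master_table = boto3.resource('dynamodb', region_name=master_region).Table(TABLE_NAME)
        master_table.delete_item(Key=key)

    # Example: delete_item({'pk': 'user#123'})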
Are you looking for the update to fail with an error, or is it enough that the record gets deleted if it was deleted prior to the update?
If the latter, then that's pretty much what would happen: the items get deleted, sometimes before the update and other times after, but they always get deleted. The only difference is that some of the updates would appear to succeed while others would fail, depending on the order of operations.
However, if you need the updates to always fail, then I'm afraid you need to come up with a distributed global lock: it would be costly and slow.
If you want to see for yourself, I recommend setting up a test: create a global table and add a bunch of items (say 10,000) and then, with two DynamoDB clients from the same EC2 instance, perform DELETE and UPDATE requests in two different regions in a tight loop. At the end you should see that all items are deleted.
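A rough sketch of such a test with boto3; the table name, regions, and key schema are assumptions, and the global table is assumed to already be created and seeded with items item-0 through item-9999:

    import threading
    import boto3

    TABLE = 'GlobalTestTable'   # assumed global table replicated in both regions
    NUM_ITEMS = 10_000          # assumed pre-seeded as item-0 .. item-9999

    def delete_loop(region):
        client = boto3.client('dynamodb', region_name=region)
        for i in range(NUM_ITEMS):
            client.delete_item(TableName=TABLE, Key={'pk': {'S': f'item-{i}'}})

    def update_loop(region):
        client = boto3.client('dynamodb', region_name=region)
        for i in range(NUM_ITEMS):
            client.update_item(
                TableName=TABLE,
                Key={'pk': {'S': f'item-{i}'}},
                UpdateExpression='SET touched = :t',
                ExpressionAttributeValues={':t': {'BOOL': True}},
            )

    # Deletes in one region race against updates in the other.
    t1 = threading.Thread(target=delete_loop, args=('us-east-1',))
    t2 = threading.Thread(target=update_loop, args=('us-west-2',))
    t1.start(); t2.start(); t1.join(); t2.join()
    # Per the answer above, once replication settles, a scan in either region
    # should show the items deleted, even though some updates appeared to succeed.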

Impact of On-Demand mode on Audit table data for Amazon DynamoDB

I am working on an Amazon DynamoDB audit table.
The read/write capacity mode was set to "Provisioned". Now, the mode has been changed to "On-Demand". I have an "Audit Table" (which captures audit information like the date and time of each operation, user details, etc.) associated with DynamoDB.
My questions on this are:
1) How does this impact the data that gets created in the "Audit Table"?
2) Will the data be deleted automatically on a timely basis?
3) If not, what is the maximum amount of data that a table (the audit table in this case) can persist?
Please let me know if you need any more information from my side.
Waiting for your answers on my questions.
Thanks and regards,
Mahesh Bongale
Changing the capacity mode does not affect the data already in the table. Provisioned mode just means the table runs with whatever read/write capacity you set, while On-Demand capacity (similar to an auto-scaling mode) always delivers the throughput needed by your application. More info: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html
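If you want to double-check the current mode from code, DescribeTable reports it; a small boto3 sketch with a placeholder table name:

    import boto3

    dynamodb = boto3.client('dynamodb')
    TABLE_NAME = 'AuditTable'   # placeholder

    # Check which capacity mode the table is currently using.
    table = dynamodb.describe_table(TableName=TABLE_NAME)['Table']
    mode = table.get('BillingModeSummary', {}).get('BillingMode', 'PROVISIONED')
    print(f'{TABLE_NAME} is using {mode} capacity')

    # Switching between modes only changes how throughput is billed and managed;
    # the items already stored in the table are untouched.
    if mode == 'PROVISIONED':
        dynamodb.update_table(TableName=TABLE_NAME, BillingMode='PAY_PER_REQUEST')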
No, absolutely not, unless you specifically add code that deletes old data or set a TTL attribute on your items. More info: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/TTL.html
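If you do want old audit records to expire automatically, the TTL route looks roughly like this with boto3 (the table name, attribute name, and retention period are assumptions):

    import time
    import boto3

    dynamodb = boto3.client('dynamodb')
    AUDIT_TABLE = 'AuditTable'   # placeholder table name
    RETENTION_DAYS = 90          # assumed retention period

    # Enable TTL on the audit table, keyed on an epoch-seconds attribute.
    dynamodb.update_time_to_live(
        TableName=AUDIT_TABLE,
        TimeToLiveSpecification={'Enabled': True, 'AttributeName': 'expires_at'},
    )

    # Each audit record then carries its own expiry; DynamoDB deletes it at no
    # extra cost, typically within a day or so of the timestamp passing.
    dynamodb.put_item(
        TableName=AUDIT_TABLE,
        Item={
            'audit_id': {'S': 'example-123'},
            'performed_at': {'S': '2024-01-01T12:00:00Z'},
            'expires_at': {'N': str(int(time.time()) + RETENTION_DAYS * 86400)},
        },
    )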
There is no specific limit on the number of rows in a given table; it can grow as large as you want. There are limits on a few other things, though; some can be lifted if you ask AWS and some cannot: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html

DynamoDB local db limits - use for initial beta-go-live

Given DynamoDB's pricing, the thought came to mind to use DynamoDB Local on an EC2 instance for the go-live of our startup SaaS solution. I've been trying to find something like a data sheet for the local DB specifying limits such as the number of tables or records, or the general size of the DB file. Possibly, we could even run a few local DB instances on dedicated EC2 servers, as we know at login which user needs to be connected to which DB.
Does anybody have any information on the local DB limits or on this approach? Also, does anybody know of any legal/licensing issues with using DynamoDB Local in that way?
Every item in DynamoDB Local will end up as a row in the SQLite database file. So the limits are based on SQLite's limitations.
The maximum number of rows in a table is 2^64, but the database file size limit (140 terabytes) will likely be reached first.
Note: because of the above, the number of items you can store in DynamoDB Local will be smaller with the preview version of DynamoDB Local with Streams support. This is because, to support Streams, the update records for items are also stored. For example, if you are only doing inserts, each item is effectively stored twice: once in a table containing the item data and once in a table containing the INSERT UpdateRecord data for that item (more records are generated if the item is updated over time).
Be aware that DynamoDB Local was not designed for the same performance, availability, and durability as the production service.