How to get the AWS RDS maximum connections from the AWS API? - amazon-web-services

Is there a way to get the maximum number of connections for an RDS database from the AWS API?
I know that you can get the current number of connections from the DatabaseConnections Cloudwatch metric, but I'm looking to get the maximum/limit of connections possible for the database.
I also know that you can get it from within the database. For example, in Postgres, you can run:
postgres=> show max_connections;
However, I would like to get the value from outside the database.
I read in this documentation about the max_connections DB instance parameter.
In most cases, the max_connections instance parameter is a formula like LEAST({DBInstanceClassMemory/9531392},5000), which depends on the DBInstanceClassMemory formula variable.
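As a worked example (hypothetical figures), here is that default formula evaluated for an instance class advertised with 4 GiB of memory:

# Worked example: the default formula LEAST({DBInstanceClassMemory/9531392},5000)
# applied to a hypothetical instance class with 4 GiB of advertised memory.
memory_bytes = 4 * 1024**3                 # 4 GiB
print(min(memory_bytes // 9531392, 5000))  # -> 450 connections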
I read in this documentation that DBInstanceClassMemory can depend on several factors and is lower than the memory figures shown in the instance class tables.
Is there a way to get the DBInstanceClassMemory value from the API?
It looks like the AWS Console is able to get the value from outside of the database. See the red line in the graph below:
[screenshot: AWS Console connection graph, with the connection limit drawn as a red line]
Edit: I found the JavaScript function that calculates the maximum number of connections in the AWS Console. It's called getEstimatedDefaultMaxConnections and it basically just divides the instance class' memory (from the instance class table) by the default memory-per-maximum-connections value (i.e. the default formula listed in the documentation). So, it ignores the fact that DBInstanceClassMemory will be less than the instance class' memory and it also ignores any changes you make to the max_connections DB instance parameter.
Is there a way for me to get that value using the API or to calculate it based on the DBInstanceClassMemory value (if it is available via the API)?

I ended up calculating an estimate of the number of maximum connections by fetching the max_connections DB parameter of the database, parsing it and evaluating it.
To get an estimate of the DBInstanceClassMemory value, I first fetched all of the available instance types using describe-instance-types and saved the results to a file. I set DBInstanceClassMemory to 90% of each instance type's memory to account for the memory lost to the OS and RDS processes.
Then I:
iterated through all of my RDS instances using DescribeDBInstances,
fetched the DB parameters for each database using DescribeDBParameters and filtered for the max_connections parameter, and
parsed the max_connections formula and evaluated it with my estimated DBInstanceClassMemory for the database, as sketched below.
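Here is a minimal sketch of that procedure using boto3. The formula regex and the assumption that every instance uses an instance-level parameter group (Aurora engines use cluster parameter groups instead) are my simplifications, not an exact reproduction of what I ran:

import re
import boto3

ec2 = boto3.client("ec2")
rds = boto3.client("rds")

def instance_memory_bytes():
    """Map EC2 instance type (e.g. 'r5.large') to advertised memory in bytes."""
    memory = {}
    for page in ec2.get_paginator("describe_instance_types").paginate():
        for itype in page["InstanceTypes"]:
            memory[itype["InstanceType"]] = itype["MemoryInfo"]["SizeInMiB"] * 1024 * 1024
    return memory

def estimate_max_connections(value, class_memory_bytes):
    """Evaluate a plain integer or the default LEAST({DBInstanceClassMemory/N},cap) formula."""
    if value.isdigit():
        return int(value)
    m = re.match(r"LEAST\(\{DBInstanceClassMemory/(\d+)\},(\d+)\)", value)
    if not m:
        raise ValueError(f"Unrecognized max_connections value: {value!r}")
    return min(class_memory_bytes // int(m.group(1)), int(m.group(2)))

memory_by_type = instance_memory_bytes()
for page in rds.get_paginator("describe_db_instances").paginate():
    for db in page["DBInstances"]:
        # RDS instance classes are the EC2 type with a 'db.' prefix.
        ec2_type = db["DBInstanceClass"].removeprefix("db.")
        # Estimate: ~90% of advertised memory is left after OS/RDS overhead.
        class_memory = int(memory_by_type[ec2_type] * 0.9)
        group = db["DBParameterGroups"][0]["DBParameterGroupName"]
        for p_page in rds.get_paginator("describe_db_parameters").paginate(
                DBParameterGroupName=group):
            for param in p_page["Parameters"]:
                if param["ParameterName"] == "max_connections":
                    estimate = estimate_max_connections(
                        param.get("ParameterValue", ""), class_memory)
                    print(db["DBInstanceIdentifier"], estimate)

Keep in mind the result is only an estimate: the real DBInstanceClassMemory value is not exposed by the API, so the 90% factor is a guess.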

Related

Is there a way to avoid some default values being pushed into metric through Pushgateway

I'm exploring data scraping options through the Pushgateway in Prometheus. As the Pushgateway mechanism goes, it always overwrites an existing record unless a unique label is present.
At times I see a different instance (IP address) or pod (k8s pod name) in the metric, creating 2 different records, because of which I'm unable to identify the latest data.
Is there a way we can control the default values being inserted through the Pushgateway?
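For what it's worth, a minimal sketch (my own illustration, not from the question) of pinning the grouping key with prometheus_client, so that repeated pushes overwrite a single record instead of creating one per instance/pod:

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
g = Gauge("batch_last_success", "Last successful batch run", registry=registry)
g.set_to_current_time()

# Pushes with the same job + grouping_key replace the previous record;
# pinning 'instance' to a constant stops pod/IP churn from creating new ones.
push_to_gateway(
    "pushgateway:9091",             # placeholder address
    job="my_batch_job",
    registry=registry,
    grouping_key={"instance": ""},  # constant grouping key
)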

Fetch desired ID contents from multiple servers

Imagine I have a distributed system with 500 servers. I have a main database server that stores some metadata, and each entry's primary key is a content ID. The actual content related to each content ID is spread across the 500 servers, but not every content ID's content is on those servers yet; say only half of them are.
How could I find out the contentIDs that are not deployed to any one of the 500 servers yet?
I’m thinking using map reduce style way to solve this but not sure how would the process be like.
Given the context in the question:
You can build a table in your database containing the contentID-to-instance mapping.
Whenever an instance has data for a given content ID, it needs to make a call and register that contentID.
If your instances can crash and you need to remove their content from the mapping, you can implement a health check that updates your database every 30 seconds to 1 minute.
Now, whenever you need the instanceID for a given contentID (and whether it has been loaded at all), you can refer to the table above and check whether the contentID has an instanceID whose last health-check time is within 1 minute, as sketched below.
Note: You can also consider using ZooKeeper or an in-memory datastore like Redis for storing this data.
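A minimal sketch of this registration-plus-health-check idea using Redis key expiry; all names are placeholders, and a 60-second TTL stands in for the "health-check time within 1 minute" rule:

import time
import redis  # pip install redis

r = redis.Redis(host="localhost", port=6379)
HEARTBEAT_TTL = 60  # seconds; a missed heartbeat lets the key expire

def register_content(instance_id: str, content_id: str) -> None:
    """Called by an instance once it has loaded a piece of content."""
    r.sadd(f"content:{content_id}:instances", instance_id)
    heartbeat(instance_id)

def heartbeat(instance_id: str) -> None:
    """Run every ~30 seconds on each instance to refresh liveness."""
    r.set(f"instance:{instance_id}:alive", int(time.time()), ex=HEARTBEAT_TTL)

def live_instances_for(content_id: str) -> list[str]:
    """Instances that hold the content and still have a fresh heartbeat."""
    members = r.smembers(f"content:{content_id}:instances")
    return [m.decode() for m in members
            if r.exists(f"instance:{m.decode()}:alive")]

def is_deployed(content_id: str) -> bool:
    return bool(live_instances_for(content_id))

Content IDs for which is_deployed returns False are the ones not yet held by any live server.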

AWS Neptune Node counts timing out

We're running a large bulk load into AWS Neptune and can no longer query the graph to get node counts without the query timing out. What options do we have to ensure we can audit the total counts in the graph?
The count queries fail from both curl and a SageMaker notebook.
There are a few things you could consider.
The easiest is to just increase the timeout specified in the cluster and/or instance parameter group, so that the query can (hopefully) complete.
If your Neptune engine version is 1.0.5.x then you can use the DFE engine to improve Gremlin count performance. You just need to enable the DFE engine using DFEQueryEngine=viaQueryHint in the cluster parameter group.
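Both of those are cluster parameter group changes. A minimal sketch with boto3 (the group name is a placeholder; neptune_query_timeout is in milliseconds):

import boto3

neptune = boto3.client("neptune")
neptune.modify_db_cluster_parameter_group(
    DBClusterParameterGroupName="my-neptune-cluster-params",  # placeholder
    Parameters=[
        {"ParameterName": "neptune_query_timeout",
         "ParameterValue": "600000",          # e.g. 10 minutes
         "ApplyMethod": "immediate"},
        {"ParameterName": "DFEQueryEngine",
         "ParameterValue": "viaQueryHint",
         "ApplyMethod": "immediate"},
    ],
)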
If you get the status of the load, it will show you a value for the number of records processed so far. In this context a record is not a row from a CSV file or RDF format file; instead it is the count of triples loaded in the RDF case, and the count of property values and labels in the property graph case. As a simple example, imagine a CSV file with 100 rows where each row has 6 columns: not counting the ID column, that is a label plus 4 properties, i.e. 5 records per row, so the total number of records to load will be 100*5 = 500. If you have sparse rows then the calculation will only be approximate, unless you add up every non-ID column with an actual value.
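A minimal sketch of reading that counter from the loader status endpoint (the endpoint and load id are placeholders):

import requests

NEPTUNE = "https://your-neptune-endpoint:8182"
resp = requests.get(f"{NEPTUNE}/loader/your-load-id",
                    params={"details": "true"})
overall = resp.json()["payload"]["overallStatus"]
print(overall["status"], "-", overall["totalRecords"], "records processed")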
If you have the Neptune streams feature enabled, you can inspect the stream and find the last vertex or edge created. Note that enabling streams just for this purpose may not be the ideal choice, since writing to the stream adds some overhead and will slow down the load.
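If streams are enabled, a minimal sketch of peeking at the most recent property-graph change (the endpoint is a placeholder; RDF graphs use /sparql/stream instead):

import requests

NEPTUNE = "https://your-neptune-endpoint:8182"
resp = requests.get(f"{NEPTUNE}/propertygraph/stream",
                    params={"iteratorType": "LATEST", "limit": 1})
for record in resp.json().get("records", []):
    print(record["op"], record["data"])  # last committed change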

DynamoDB local db limits - use for initial beta-go-live

Given DynamoDB's pricing, the thought came to mind to use DynamoDB Local on an EC2 instance for the go-live of our startup SaaS solution. I've been trying to find a data sheet for the local DB specifying limits such as the number of tables, the number of records, or the overall size of the DB file. Possibly, we could even run a few local DB instances on dedicated EC2 servers, since we know at login which user needs to be connected to which DB.
Does anybody have any information on the local DB's limits or on this approach? Also, does anybody know of any legal/licensing issues with using DynamoDB Local in that way?
Every item in DynamoDB Local will end up as a row in the SQLite database file. So the limits are based on SQLite's limitations.
SQLite's maximum number of rows in a table is 2^64, but its database file size limit (140 terabytes) will likely be reached first.
Note: because of the above, the number of items you can store in DynamoDB Local will be smaller with the preview version that has Streams support, because the update records for items are stored as well. E.g., if you only do inserts, each item is effectively stored twice: once in a table containing the item data and once in a table containing the INSERT UpdateRecord for that item (more records are generated if the item is updated over time).
Be aware that DynamoDB Local was not designed for the same performance, availability, and durability as the production service.
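If you do try this approach, a minimal sketch of pointing the SDK at a DynamoDB Local process on an EC2 host (the host name is a placeholder; DynamoDB Local listens on port 8000 by default and accepts any non-empty credentials):

import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://my-ec2-host:8000",  # placeholder host
    region_name="us-east-1",                 # required by the SDK, ignored locally
    aws_access_key_id="local",
    aws_secret_access_key="local",
)
print(list(dynamodb.tables.all()))  # quick connectivity check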

API Gateway generating 11 sql queries per second on REG_LOG

We have sysdig running on our WSO2 API Gateway machine and we notice that it fires a large number of SQL queries at the database for a minute, then waits a minute and repeats.
The queries look like this:
SELECT REG_PATH, REG_USER_ID, REG_LOGGED_TIME, REG_ACTION, REG_ACTION_DATA
FROM REG_LOG
WHERE REG_LOGGED_TIME>'2016-02-29 09:57:54'
AND REG_LOGGED_TIME<'2016-03-02 11:43:59.959' AND REG_TENANT_ID=-1234
There is no load on the server. What is causing this? What can we do to avoid this?
[screenshot: sysdig trace of the API Gateway process]
This particular query is the result of the registry indexing task that runs in the background. The REG_LOG table is queried periodically to retrieve the latest registry actions. The indexing task cannot be stopped. However, one can configure the frequency of the indexing task through the following parameter in registry.xml. See [1] for more information.
indexingFrequencyInSeconds
If this table has filled up, one can clean out the data using a simple SQL query, as sketched below. However, when deleting the records, one must be careful not to delete all of the data: the latest record for each resource path should be left in the REG_LOG table, since reindexing requires at least one reference to each resource path.
Also, if required, before clearing up the REG_LOG table, you can take a dump of the data in case you do not want to lose old records. Hope this answer provides the information you require.
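As an illustration only, a sketch of such a cleanup for a MySQL-backed registry, keeping the newest REG_LOG row per resource path; the connection details are placeholders, tenancy is ignored, and you should test this on a copy first:

import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(host="db-host", user="wso2",
                               password="change-me", database="wso2reg")
cur = conn.cursor()
# Delete every row that is older than the newest entry for its path.
cur.execute("""
    DELETE l FROM REG_LOG l
    JOIN (SELECT REG_PATH, MAX(REG_LOGGED_TIME) AS latest
          FROM REG_LOG GROUP BY REG_PATH) keep
      ON l.REG_PATH = keep.REG_PATH
    WHERE l.REG_LOGGED_TIME < keep.latest
""")
conn.commit()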
[1] - https://docs.wso2.com/display/Governance510/Configuration+for+Indexing