Verify how long a VM has been running or stopped using Kusto Query Language - azure-virtual-machine

I am new to Kusto Query Language and need some help: how can I check how long a VM has been shut down, or, if it is running, how long it has been running? I am just starting to learn KQL, so any help is appreciated.

Update 1:
Original answer:
When a VM is stopped, events named Deallocate Virtual Machine are sent to the AzureActivity table. When a VM is started, events named Start Virtual Machine are sent to the AzureActivity table.
So it is easy to find out whether the VM is running or stopped with the query below (in Azure Monitor -> Logs):
AzureActivity
| where OperationName in ("Deallocate Virtual Machine","Start Virtual Machine")
| project TimeGenerated,OperationName
| top 1 by TimeGenerated desc
If the query result contains Deallocate Virtual Machine, the VM is stopped; otherwise, it is running.
Next, since we know the VM's status (say it is stopped), we can write a query to calculate how long it has been stopped: subtract the time of the last deallocation event from the current time. The query looks like this:
let stop_time = AzureActivity
| where OperationName == "Deallocate Virtual Machine"
| project TimeGenerated
| top 1 by TimeGenerated desc;
print stopped_for = now() - toscalar(stop_time)
You can calculate the running time the same way when the VM is currently running: filter on "Start Virtual Machine" instead of "Deallocate Virtual Machine" in the stop_time expression.
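If you prefer to check this from code rather than the portal, here is a minimal Python sketch using the azure-monitor-query SDK to run the same query and compute the elapsed time client-side. The workspace ID and the 30-day lookback are assumptions you would replace with your own values.
from datetime import datetime, timedelta, timezone
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
AzureActivity
| where OperationName in ("Deallocate Virtual Machine", "Start Virtual Machine")
| project TimeGenerated, OperationName
| top 1 by TimeGenerated desc
"""

response = client.query_workspace(
    workspace_id="<your-workspace-id>",  # assumption: your Log Analytics workspace
    query=query,
    timespan=timedelta(days=30),         # assumption: how far back to look
)
row = response.tables[0].rows[0]
time_generated, operation = row[0], row[1]
state = "stopped" if operation == "Deallocate Virtual Machine" else "running"
print(f"VM has been {state} for {datetime.now(timezone.utc) - time_generated}")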

Related

Why does my Python app always cold start twice on AWS lambda?

I have a lambda, in Python where I am loading a large machine learning model during the cold start. The code is something like this:
uuid = uuid4()
app_logger.info("Loading model... %s" % uuid)
endpoints.embedder.load()
def create_app() -> FastAPI:
app = FastAPI()
app.include_router(endpoints.router)
return app
app_logger.info("Creating app... %s" % uuid)
app = create_app()
app_logger.info("Loaded app. %s" % uuid)
handler = Mangum(app)
The first time after deployment, AWS Lambda seems to start the Lambda twice as seen by the two different UUIDs. Here are the logs:
2023-01-05 21:44:40.083 | INFO | myapp.app:<module>:47 - Loading model... 76a5ac6f-a4fc-490e-b21c-83bb5ef458eb
2023-01-05 21:44:42.406 | INFO | myapp.embedder:load:31 - Loading embedding model
2023-01-05 21:44:50.626 | INFO | myapp.app:<module>:47 - Loading model... c633a9c6-bcfc-44d5-bacf-9834b39ee300
2023-01-05 21:44:51.878 | INFO | myapp.embedder:load:31 - Loading embedding model
2023-01-05 21:45:00.418 | INFO | myapp.app:<module>:59 - Creating app... c633a9c6-bcfc-44d5-bacf-9834b39ee300
2023-01-05 21:45:00.420 | INFO | myapp.app:<module>:61 - Loaded app. c633a9c6-bcfc-44d5-bacf-9834b39ee300
This happens consistently. It executes it for 10 seconds the first time, then seems to restart and do it again. There are no errors in the logs that indicate why this would be. I have my Lambda configured to run with 4G of memory and it always loads with < 3GB used.
Any ideas why this happens and how to avoid it?
To summarize all the learnings in the comments so far:
AWS limits the init phase to 10 seconds. This is explained here: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtime-environment.html
If the init exceeds 10 seconds, the function is initialized again, this time without the limit
If you hit the 10 second limit, there are two ways to deal with this:
Init the model lazily, during the first invocation instead of at import time (see the sketch after this list). The downsides are that you don't get the init phase's CPU boost and lower-cost initialization.
Use provisioned concurrency. Init is not limited to 10 seconds, but it is more expensive and you can still run into the same problem as without it, e.g. if you get a burst in usage.
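For the first option, a minimal sketch of what lazy initialization can look like. The FastAPI/Mangum shape mirrors the question; load_model() is a stand-in for the app's own endpoints.embedder.load().
from fastapi import FastAPI
from mangum import Mangum

_model = None

def load_model():
    # placeholder: load the real embedding model here
    return object()

def get_model():
    global _model
    if _model is None:         # first invocation pays the load cost...
        _model = load_model()  # ...inside the handler, where the 10 s
    return _model              # init-phase limit does not apply

app = FastAPI()

@app.get("/embed")
def embed(text: str):
    model = get_model()  # loaded lazily, reused on warm invocations
    return {"loaded": model is not None}

handler = Mangum(app)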
Moving my model to EFS does improve startup time compared to S3 and Docker layer caching, but not enough to get init under 10 seconds. It might work for other use cases with slightly smaller models, though.
Perhaps someday SnapStart will address this problem for Python. Until then, I am going back to EC2.

GCP - Initiate a shutdown of an instance a certain time after it started (for example, 3 hours after start)

I have instances in GCP.
I can schedule a time to start and stop using the scheduler.
But I don't want a specific time of day; I want it a specific amount of time after the instance was started.
For example: stop the instance after it has been up and running for 8 hours.
You can add the contents of a startup script directly to a VM when you create the VM.
You can also pass a Linux startup script directly to an existing VM:
In the Cloud Console, go to the VM instances page and click the instance you want to pass the startup script to.
Click Edit.
Under Automation, specify the following:
#! /bin/bash
shutdown -P +60
-P Instructs the system to shut down and then power down.
The time argument specifies when to perform the shutdown operation.
The time can be formatted in different ways:
First, it can be an absolute time in the format hh:mm, where hh is the hour (1 or 2 digits, from 0 to 23) and mm is the minute of the hour (in two digits).
Second, it can be in the format +m, where m is the number of minutes to wait (so +480 for the 8 hours in your example).
Also, the word now is the same as specifying +0; it shuts the system down immediately.

CPU usage abruptly goes to 99% on EC2

I have an EC2 instance in AWS. Every few hours the CPU usage goes to 99%, and I am unable to find the process causing it.
Is there any flag I can set to see the culprit process when I restart the EC2 instance?
I am running Ubuntu 20 on the instance, and the instance type is t2.micro.
Below are the processes that I am running:
MySQL
Mongo
A Spring Boot service
If these processes were causing the issue, I would expect it to happen a few minutes after I start them, but instead it happens unpredictably after a few hours.
You can use the top program to see what's consuming the most CPU.
This program is normally used from a terminal window, as it refreshes the display every few seconds (by default, 3, but you can change this). If you consistently see this performance issue, then simply log in, run top, and look at what it says are the top CPU consumers.
You can also use it in the case where CPU consumption spikes and then drops: if that consistently happens with one program, its cumulative CPU time will reflect that fact. Change the sort order to the TIME+ field to see it.
Finally, if you don't want to keep a terminal window open, you can run top in "batch" mode and write the output to a file. Here's how to invoke it every second, and only capture the top 10 CPU consumers:
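# 17 lines = top's 7 header lines in batch mode + the 10 busiest processes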
while true ; do top -b -n 1 | head -17 >> /tmp/top.log ; sleep 1 ; done

Using AWS RDS, getting too many DB connections with only 2 app users?

I am using Amazon RDS with a medium-sized instance (db.t2.medium), which has a max connections limit of around 400, yet the connection count is almost at that limit even when only 2 users are using the app. The app is used through mobile APIs only (Android); no calls are made from anywhere else.
What might be the issue, and where are all these connections coming from?
Could a DDoS lead to this? But we bought a brand new server.
You're probably not closing connections when you're done with them.
Log into the database as the root user and execute this query:
select HOST, COMMAND, count(*) from INFORMATION_SCHEMA.PROCESSLIST group by 1, 2;
It will give you output that looks like this:
+-----------+---------+----------+
| HOST      | COMMAND | count(*) |
+-----------+---------+----------+
| localhost | Query   |        1 |
| localhost | Sleep   |        1 |
+-----------+---------+----------+
If you have two users with stable IP addresses, you'll probably see four lines of output: two for each user, with a high count for Sleep. This indicates that you're leaving connections open.
If you're running on mobile, however, the IP addresses may not be stable. You'll need to do a second level of analysis to see if they're all from the same ISP(s).
The only way that a DDoS would fill up your connection pool is if you've leaked the database password. If that's the case, you should consider your database compromised and start over (with more attention to security).
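For completeness, a minimal sketch of the fix on the application side, using PyMySQL (the host, credentials, and query are placeholders): open a connection, use it, and always close it.
import pymysql

def fetch_user(user_id):
    # host, credentials, and schema below are placeholders
    conn = pymysql.connect(host="mydb.xxxx.rds.amazonaws.com",
                           user="app", password="...", database="app")
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT name FROM users WHERE id = %s", (user_id,))
            return cur.fetchone()
    finally:
        conn.close()  # without this, each call leaks a connection that
                      # sits in Sleep until wait_timeout expires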

Autoscale workers on Digital Ocean

I have 3 servers: DB, Web, and Worker. The worker just processes Sidekiq jobs all day long.
As soon as the queue is over 100,000 jobs, I want a second worker instance, and I struggle a little with how to think about doing it (and if the queue is above 300,000 I need 3 workers, and so on). My plan:
I take my worker and make a snapshot.
Via the DigitalOcean API I create a new instance based on that image.
As soon as the instance boots, it needs to update the code from the Git repository.
I need to tell the database server that it is allowed to receive connections from this instance's IP.
As soon as the queue is below 20,000 I can kill the instance.
Is this the right way of doing it, or are there better ways? Am I missing something?
Additional Question:
On the DB server I only have MySQL and Redis; no Ruby or anything else, so there is no Rails to run there. If my worker decides to create another worker, the new one needs access to MySQL. It seems impossible to grant that access from a remote machine, and it looks like I need to create the access from the DB server itself:
mysql> show grants;
+-----------------------------------------------------------------------------------------+
| Grants for rails@162.243.10.147                                                         |
+-----------------------------------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'rails'@'162.243.10.147' IDENTIFIED BY PASSWORD <secret> |
| GRANT ALL PRIVILEGES ON `followrado`.* TO 'rails'@'162.243.10.147'                      |
+-----------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
mysql> CREATE USER 'rails'@'162.243.243.127' IDENTIFIED BY 'swag';
ERROR 1227 (42000): Access denied; you need (at least one of) the CREATE USER privilege(s) for this operation
Is this the right way of doing it, or are there better ways? Am I missing something?
Yes - this seems reasonable.
As soon as the queue is below 20,000 I can kill the instance.
Maybe let it linger for a while before killing it, in case the queue goes up again.
On the DB server I only have MySQL and Redis; no Ruby or anything else, so there is no Rails to run there. If my worker decides to create another worker, the new one needs access to MySQL. It seems impossible to grant that access from a remote machine, and it looks like I need to create the access from the DB server itself.
Yes, you need to create the access from the DB server. In general, access is granted to the entire VPC CIDR rather than to single server IPs (single-IP grants are more common with static instances), especially if you plan to launch dynamic instances with constantly changing IPs.
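A minimal sketch of the scaling loop itself, assuming Sidekiq's default Redis layout (each queue is a Redis list named queue:<name>) and the DigitalOcean v2 droplets API. The image ID, region, size, Redis host, and thresholds are placeholders taken from the question.
import time
import redis
import requests

DO_TOKEN = "..."  # your DigitalOcean API token
HEADERS = {"Authorization": f"Bearer {DO_TOKEN}"}
DROPLETS_API = "https://api.digitalocean.com/v2/droplets"

r = redis.Redis(host="db.internal", port=6379)  # the Redis on your DB server

def desired_workers(queue_len):
    # 1 worker, plus one more per 100,000 queued jobs, per the question
    return 1 + queue_len // 100_000

def create_worker(n):
    # boot a droplet from the pre-baked worker snapshot (image ID assumed);
    # its startup script should pull the latest code from Git
    resp = requests.post(DROPLETS_API, headers=HEADERS, json={
        "name": f"worker-{n}",
        "region": "nyc3",
        "size": "s-1vcpu-1gb",
        "image": 12345678,  # snapshot ID of the worker image
    })
    resp.raise_for_status()

while True:
    queued = r.llen("queue:default")  # Sidekiq's default queue
    # compare desired_workers(queued) with the droplets you already have,
    # create new ones when over 100,000, and destroy idle ones only after
    # the queue has stayed below 20,000 for a while
    time.sleep(60)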