Why does my Compute Engine instance show as running normally while all its ports are unreachable, when everything worked fine a few hours ago?
A few days ago my instance was attacked. Google sent me an email saying the instance was performing mining activity, and its resources were suspended. After I appealed, I deleted the instance and recreated it. Now every instance I create works for only a few hours; then all of its ports become unreachable and the IP cannot be pinged.
If someone could tell me what to do, I would really appreciate it.
I would recommend you contact GCP support for this; they will be able to investigate your issue internally and tell you specifically the cause of the issue and the next steps for resolution.
https://cloud.google.com/support-hub
Related
I've been running a few ML training sessions on a GCE VM (with Colab). At first they save me a good deal of time and computing resources, but, like everything Google so far, eventually the runtime disconnects and I cannot reconnect to my VM even though it still exists. a) How do we reconnect to a runtime when the VM exists, we have been disconnected, and Colab says it cannot reconnect to the runtime?
b) How do we avoid disconnecting in the first place? I am using Colab Pro+ and paying for VMs, and they always cut out at some point, so it's just another week of time gone out the window. I must be doing something wrong; there's no way we're paying just to lose all of our progress and time over and over, restarting in the hope it doesn't collapse again (it's been about two weeks of lost time, and I'm wondering why GCE VMs can't just run a job for four days without collapsing at some point). What am I doing wrong? I just want to pay for an external resource that runs the jobs I pay for, with no connect/disconnect/lose-everything issue every few days. I don't understand why Google does this.
I am working on a new Amazon Redshift database that I recently set up.
I am experiencing an issue where, after I connect to the database, I can run queries without any problem. However, if I spend some time without running anything (like, 5 minutes), the next query or command I try never finishes.
I am using DBeaver Community 21.2.2 to interact with the connection, and it stays on "Executing query" forever. The only way I can get it to work is by cancelling, disconnecting from Redshift, and reconnecting; then it executes correctly. Until I stop using it for a few minutes, and then it happens all over again.
I thought this was a DBeaver issue, as we have a Metabase connected to this same cluster without any problems. But today I tried manipulating this cluster from R using RJDBC, and the same thing happens: I can run queries until I pause, and then the next query never finishes until I disconnect and reconnect.
I'm sorry if I wasn't able to explain it clearly; I searched for similar issues but couldn't find any.
I suspect that the queries in question never even reach the database. You can check this by reviewing svl_statementtext to see whether the query appears at all. Put a unique comment in the query to help confirm it is actually the query in question.
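A minimal sketch of the tagging idea in Python (the helper name and tag format are my own invention; run the tagged SQL through whatever client you already use):

```python
import uuid

# Sketch: prepend a unique comment to a query so it can be located in
# svl_statementtext afterwards. The "trace:" tag format is illustrative.
def tag_query(sql: str) -> tuple[str, str]:
    tag = uuid.uuid4().hex[:12]
    return f"/* trace:{tag} */ {sql}", tag

tagged_sql, tag = tag_query("select count(*) from my_table")

# After running tagged_sql through your client, search the statement log:
check_sql = (
    "select starttime, text from svl_statementtext "
    f"where text like '%trace:{tag}%' order by starttime"
)
```

If the tag never shows up in svl_statementtext, the query was dropped before the database ever saw it.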
Since I've seen similar behavior before, I'll write up one way this can happen. In that case the queries were never seen by the database, or the connection was being dropped mid-execution. The cause was network switches and their configuration.
Typical network connections are fairly quick: you ask for a web page and it is delivered, and the connection is complete. When you click a link, a new connection is established and also ends quickly. These actions are atomic from a network-connection point of view. Database connections are different: one connection is made, and many round trips of data happen while it stays open. That's fine, and with the right network configuration these connections can sit open and idle for days.
The problem comes in when the operators of the network equipment decide that connections with no data flowing are "stale" after some fixed amount of time. They do this so the equipment can "forget" those connections and focus on active ones. ISPs drop idle connections aggressively so they can handle the volume of traffic and connections flowing through their equipment. This doesn't cause issues for web pages and APIs, but database connections get clobbered.
When this happens, it looks exactly like what you describe. Both sides (client and database) think the connection is still active, but the network equipment has dropped it. Nothing gets through, yet no notification is sent to either party. You will likely see corresponding open sessions on the Redshift side for these dropped connections, with the database just waiting for the client to issue a command on each of them. An administrator will need to terminate these sessions for them to go away.
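For reference, a sketch of that admin cleanup, leaning on the documented Redshift system objects stv_sessions and pg_terminate_backend (the helper below only builds the SQL strings; run them with your own client as an admin user):

```python
# Sketch: list open sessions, then terminate a stale one by its process id.
list_sessions_sql = (
    "select process, user_name, starttime "
    "from stv_sessions order by starttime"
)

def terminate_session_sql(process_id: int) -> str:
    # pg_terminate_backend closes the session with the given process id
    return f"select pg_terminate_backend({int(process_id)})"
```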
Now, the one thing that doesn't align with my experience is how quickly your connections are being marked "stale". In my case, my ISP was closing connections idle for more than 30 minutes; you seem to be timing out much faster than that. Some corporate firewalls are configured with short idle timeouts on routes from the private network out to the internet, so there are cases where the timeouts can be short. The networks inside AWS do not have these timeouts, so if your connections stay entirely within AWS, this isn't your answer.
There are a few ways to address this. The easy way is to set up a tunnel into AWS that sends "keep alive" packets every 30 seconds or so. You will need an EC2 instance at AWS, so it isn't cost-free. SSH tunneling is the usual tool for this, and there are write-ups online for setting it up.
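A related trick is enabling OS-level TCP keepalives on the client socket, which achieves a similar effect to tunnel keep-alive packets. A minimal Python sketch (the TCP_KEEPIDLE/TCP_KEEPINTVL constants are Linux-specific, hence the guard):

```python
import socket

# Sketch: ask the OS to send keepalive probes on an otherwise idle
# connection, so middleboxes keep seeing traffic on it.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
if hasattr(socket, "TCP_KEEPIDLE"):
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)   # idle seconds before first probe
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 30)  # seconds between probes
```

With an SSH tunnel, the equivalent knob is `ServerAliveInterval 30` in your ssh config.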
The hard (but likely most correct) way is to work with network experts to understand where and why the timeout is happening. If the timeout cannot be changed, it may be possible to configure a different network topology for your use case; network peering or a VPN could address it.
In some cases you can avoid JDBC and ODBC connections altogether. These protocols are valid but old, and most networking no longer works this way, which is why they suffer from these issues. The Redshift Data API lets you issue SQL to Redshift in a single request and check on completion later. Each API call is an independent connection, so there is no possibility of "timing out" between them. The downside is that this process is not interactive and therefore not supported by workbenches.
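A sketch of the Data API flow, assuming boto3 is installed and credentials are configured (the actual AWS calls are left commented out; the polling helper works against any `describe_statement`-shaped callable):

```python
import time

# The real calls, for reference (assumption: credentials configured):
#
#   import boto3
#   client = boto3.client("redshift-data")
#   resp = client.execute_statement(
#       ClusterIdentifier="my-cluster", Database="dev",
#       DbUser="awsuser", Sql="select count(*) from my_table")
#   status = wait_for_statement(client.describe_statement, resp["Id"])

def wait_for_statement(describe, statement_id, interval=2.0, timeout=300.0):
    """Poll until the statement reaches a terminal state.

    Each poll is an independent HTTP request, so there is no long-lived
    connection for a middlebox to drop."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = describe(Id=statement_id)["Status"]
        if status in ("FINISHED", "FAILED", "ABORTED"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"statement {statement_id} did not finish")
```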
So does this match what you have going on?
When I create a Dataproc instance and connect via JupyterHub, it constantly disconnects, so any work in a Jupyter notebook over the JupyterHub connection is lost. This happens very frequently and appears to affect many users, not just me (it happened to a class of about six people I teach). The errors I see all center around "Failed to fetch".
This seems uncharacteristically poor for Google. Is there any way to fix it, or is it a fundamental problem with Dataproc and GCP? I don't have premium support, so I don't know how to write to Google directly about it.
I have been testing an Ubuntu instance on GCE for the last couple of weeks as a possible home for one of our web servers. Last week, everything suddenly stopped working: I could not SSH into a shell, and I couldn't even visit the site through my browser anymore. I logged into the dashboard and nothing seemed wrong. I had several colleagues try the site, and it loaded for them without any issues. I could not find any setting in the dashboard that would suggest some kind of block, so I assumed I must have triggered some kind of anti-spam system. I decided to leave it alone for a few days before messing with it further, but after six days of not touching it at all, I still cannot visit the site or log in via SSH.
Then, to verify they were blocking my IP address and that it wasn't just something wrong with my machine, I switched my IP, and everything started behaving as expected again: I could reach the site in my browser and could once again SSH into the VM. After switching back to my previous static IP, it went back to blocking both the webpage and SSH.
My problem is that this isn't a permanent solution for me. I have many servers that only allow login from my previous IP address, so I'd rather fix the issue with this VM than change all those systems to allow a new IP address. Any help finding the solution would be greatly appreciated.
Please let me know if I can provide any additional info to help find the problem.
followup info:
The way our network is set up, the IP we get from DHCP is the real-world IP our device is seen with (I think we own a block or something).
This is the first time I've done anything with a GCE VM.
Edit: added additional information
I've been running an EC2 instance through Laravel Forge for about 2,000 hours, and this morning I got this error while trying to reach it:
SQLSTATE[08006] [7] could not connect to server: Connection refused Is
the server running on host "172...***" and accepting TCP/IP
connections on port 5432?
After SSHing into the server, I get a similar error when trying to run a command. I've dug through AWS but don't see any errors being thrown. I double-checked the instance's IP address to make sure it hadn't changed for any reason. Of course, I'm a little behind on my backups for the application, so I'm hoping someone might have ideas about what else I can do to try to access this data. I haven't made any changes to the app in about 10 days, but I found the error while pushing an update. I have six other instances of the same app that weren't affected (thankfully), which makes me even more confused about the cause of the issue.
In case anyone comes across a similar issue, here's what had happened: an error running in the background had filled the EC2 instance's hard drive with log output. Since the default Laravel/Forge image runs the database inside the EC2 instance, once the disk ran out of room everything stopped working. I was able to SSH in and delete the log, though, and everything started working again.
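For anyone hitting the same thing, a small Python sketch of that recovery step (a throwaway temp file stands in for the real log; on a Forge box the offender is often something like `storage/logs/laravel.log`, but that path is an assumption, so check `du` output on your own machine first):

```python
import os
import tempfile

# Sketch: a throwaway file stands in for the runaway log.
log_path = os.path.join(tempfile.mkdtemp(), "laravel.log")
with open(log_path, "w") as f:
    f.write("production.ERROR: something failed\n" * 50_000)

# Truncate rather than delete: any process holding the file open keeps a
# valid handle, and the disk space is reclaimed immediately.
os.truncate(log_path, 0)
```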
To prevent the issue from happening again, I then created an Amazon RDS instance and used that rather than the database inside the EC2 instance. It's about three or four times the price of just an EC2 instance, but still not that much, and the confidence I now have in the system is well worth it.