I am using Vertex AI endpoints to serve a deep learning service.
My service takes approximately 30 seconds to 2 minutes to respond on CPU, depending on the size of the input. I noticed that when the input is large enough that the request takes more than one minute, the API fails with this error:
<!DOCTYPE html>
<html lang=en>
<meta charset=utf-8>
<meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
<title>Error 502 (Server Error)!!1</title>
<style>
*{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}
</style>
<a href=//www.google.com/><span id=logo aria-label=Google></span></a>
<p><b>502.</b> <ins>That’s an error.</ins>
<p>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds. <ins>That’s all we know.</ins>
When I retry, I keep getting the same error. Once I decrease the input size, the API starts working again. For these reasons, I believe this is a timeout issue.
So my question is: how can I change the timeout value in Vertex AI endpoints? I read through all the documentation, and it doesn't seem to be mentioned anywhere.
Thank you.
There is an upper limit on the timeout of about 60 seconds, plus some extra overhead, and it isn't configurable. So anything approaching 2 minutes would definitely explain the error you're getting.
Are there ways to speed up the model serving? For example, deploying on faster hardware or applying other model optimizations. If you're running a custom container, perhaps take advantage of more cores and reduce any external dependencies.
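If your input can be split, another workaround is to chunk it client-side so each prediction request stays under that ceiling. A minimal sketch, not a Vertex AI API: `predict` stands in for whatever function sends one request to your endpoint, and the chunk size is a placeholder you'd tune to your latency budget.

```python
def chunked(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def predict_in_chunks(predict, instances, chunk_size=8):
    """Call `predict` once per chunk and concatenate the results.

    Keeping each chunk small keeps the per-request latency
    under the endpoint's timeout, at the cost of more requests.
    """
    results = []
    for chunk in chunked(instances, chunk_size):
        results.extend(predict(chunk))
    return results
```

This only helps if your model can score pieces of the input independently; if one large input is a single indivisible example, you'd need faster hardware instead.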
I have a healthy stream being sent to AWS IVS.
When I use the very same JavaScript code provided by AWS to play the stream, it doesn't work:
I got the code from here
<script src="https://player.live-video.net/1.7.0/amazon-ivs-player.min.js"></script>
<video id="video-player" playsinline></video>
<script>
  if (IVSPlayer.isPlayerSupported) {
    const player = IVSPlayer.create();
    player.attachHTMLVideoElement(document.getElementById('video-player'));
    player.load("https://b9ce18423ed5.someregion.playback.live-video.net/api/video/v1/someregion.242198732211.channel.geCBmnQ6exSM.m3u8");
    player.play();
  }
</script>
The playback URL is coming from the IVS channel.
When running this code, nothing happens, and the video tag's source is set to:
<video id="video-player" playsinline="" src="blob:null/b678db19-6b9a-42fc-979e-1e0eda4a3b46"></video>
There is no code from my side; it's only AWS code. Is this a bug, or am I doing something wrong?
Thanks.
You can confirm whether the stream is healthy using the AWS Management Console. If it's loading in the Live stream tab, then it should play in the AWS IVS player you've integrated.
I used the code below, and upon successful streaming, it loaded fine.
<!DOCTYPE html>
<html>
<head>
    <!-- Set reference to the AWS IVS Player -->
    <script src="https://player.live-video.net/1.8.0/amazon-ivs-player.min.js"></script>
</head>
<body>
    <!-- Create the video tag -->
    <video id="video-player" playsinline controls muted="true" height="500px" width="700px"></video>
    <script>
        // Once the AWS IVS Player is loaded, it creates a global object, "IVSPlayer".
        // We use it to create a player instance that loads the playback URL and
        // plays it in the attached video element.
        if (IVSPlayer.isPlayerSupported) {
            const player = IVSPlayer.create();
            player.attachHTMLVideoElement(document.getElementById('video-player'));
            player.load("*PLACE_YOUR_PLAYBACK_URL_HERE{.m3u8 extension}*");
            player.play();
        } else {
            console.warn("Error: Browser not supported!");
        }
    </script>
</body>
</html>
Here is the reason why, in my case, the IVS stream didn't play. Maybe it will help someone else.
It wasn't playing because the streamed video was completely black, so the player treated the stream as "empty". Once there was actual content in the video, it played properly.
I am trying to run an RNA-Seq application with Google Cloud Functions. To run this application I need to be able to have over 800 functions running concurrently. This has been achieved using AWS Lambda, but I have not been able to do this on Google Cloud Functions.
When I attempt to run hundreds of basic HTTP requests with the default HTTP trigger, I start getting tons of 500 errors:
<html><head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<title>500 Server Error</title>
</head>
<body text=#000000 bgcolor=#ffffff>
<h1>Error: Server Error</h1>
<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>
<h2></h2>
</body></html>
If I check the logs for my function, I see no error messages! The cloud console makes it seem like everything is perfect even though my requests are failing.
How should I go about diagnosing this problem? It looks like something is wrong on Google's end, as my code works fine when requests do go through. Does Google limit the number of HTTP requests you can make?
Any help would be really appreciated.
The scaling limit for HTTP-type functions is different from that for background functions. Please read the documentation about scalability to be clear on the limits.
Background functions scale gradually up to 1,000 concurrent invocations. Since you're writing an HTTP function, this does not apply.
For HTTP functions, note that rates are limited by the amount of outbound network bandwidth generated by the function (among other things). You will have to take a close look at what your function is actually doing to figure out if it's exceeding the documented rate limits.
If you can't limit what the function is doing internally to meet the scalability limits, one thing you can try is sharding your functions. Instead of one HTTP function, create two, and split traffic between them. The stated limits are per-function (not per-project), so you should be able to handle more load that way.
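A minimal client-side sketch of that sharding idea, assuming the same function is deployed twice; the project name, function names, and URLs here are hypothetical placeholders.

```python
import hashlib

# Hypothetical URLs for two deployed copies of the same function.
FUNCTION_URLS = [
    "https://us-central1-myproject.cloudfunctions.net/worker-a",
    "https://us-central1-myproject.cloudfunctions.net/worker-b",
]


def pick_shard(request_key: str, urls=FUNCTION_URLS) -> str:
    """Deterministically route a request key to one of the function URLs.

    Hashing the key means the same input always goes to the same copy,
    while distinct keys spread roughly evenly across the copies.
    """
    digest = hashlib.md5(request_key.encode("utf-8")).digest()
    return urls[digest[0] % len(urls)]
```

Your client would then POST each request to `pick_shard(key)` instead of a single fixed URL; adding a third copy is just another entry in the list.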
I have followed Splash's FAQ for production setups and my system currently looks like this:
1 Scrapy container with 6 concurrent requests.
1 HAProxy container that load-balances across the Splash containers.
2 Splash containers with 3 slots each.
I use docker stats to monitor my setup, and I never see more than 7% CPU usage or more than 55% memory usage.
I still get a lot of
DEBUG: Retrying <GET https://the/url/ via http://haproxy:8050/execute> (failed 1 times): 504 Gateway Time-out
For every successful request I get 6-7 of these timeouts.
I have experimented with changing the number of slots on the Splash containers and the number of concurrent requests. I've also tried running a single Splash container behind HAProxy. I keep getting these errors.
I'm running on an AWS EC2 t2.micro instance, which has 1 GB of memory.
I suspect that the issue is still related to the splash instance getting flooded. Is there any advice you can give me to reduce the load of the Splash instances? Is there a good ratio between slots and concurrency requests? Should I throttle requests?
I'm using AppFabric to share sessions between two or more different web applications.
But I got this error: "Expected the Session item 'FULL_NAME' to be present in the cache, but it could not be found, indicating that the current Session is corrupted. Please verify that LRU eviction has been disabled for the cache used to store Session."
My configuration:
<dataCacheClient>
<hosts>
<host name="CACHE1" cachePort="22233" />
<host name="CACHE2" cachePort="22233" />
<host name="CACHE3" cachePort="22233" />
</hosts>
<machineKey validationKey="C7415df6847D0C0B5146F5605B5973EBD59kjh67EE6414ECD534A95F528F153B6B5F42CFFA9EBF65B2169F7DA5D801C0F9053454A159505253DC687A" decryptionKey="3AE9EE73F1A2781B73DEC6C3fgdgdfD28E0C730284DD314118DA8B" validation="SHA1" decryption="AES" />
<sessionState timeout="40" mode="Custom" customProvider="AFCacheSessionStateProvider">
<providers>
<add name="AFCacheSessionStateProvider" type="Microsoft.Web.DistributedCache.DistributedCacheSessionStateStoreProvider, Microsoft.Web.DistributedCache" cacheName="XXXXX" shareID="YYYYY" retryCount="10" useBlobMode="false" />
</providers>
</sessionState>
Does anyone know what the problem is?
It's not the ASP.NET config you need to check; it's the cache itself.
On a cache host, look at the output of the command Get-CacheConfig XXXXX. As the docs say, you will see output like the following; check the EvictionType:
CacheName : Cache1
TimeToLive : 10 mins
CacheType : Partitioned
Secondaries : 0
IsExpirable : True
EvictionType : LRU
NotificationsEnabled : False
For more information on eviction settings, see the Expiration and Eviction documentation. If your eviction type is not LRU, check cache memory usage, as that page states:
When the memory consumption of the cache service on a cache server
exceeds the low watermark threshold, AppFabric starts evicting objects
that have already expired. When memory consumption exceeds the high
watermark threshold, objects are evicted from memory, regardless of
whether they have expired or not, until memory consumption goes back
down to the low watermark.
There's an expiration troubleshooting page with more info.
C:>vmc stats smile
DL is deprecated, please use Fiddle
Getting stats for smile... OK
instance   cpu     memory          disk
0          0.5%    198.2K of 1G    54.6M of 2G
The 404 is coming from Tomcat, which means connections are being forwarded correctly. Looking at the location the 404 is coming from:
/WEB-INF/views/WEB-INF/views/home.jsp.jsp
one can only assume there is something suspect with the way requests are being routed within the application itself.