Memory crash on sending a 100MB+ file to S3 on Chrome

I'm currently using JavaScript to upload some video files to S3. The process works for files under 100MB, but for files of roughly 100MB and up on Chrome I run into an error (the same upload works on Safari). I am using ManagedUpload in this example, which should be handling multipart uploads in the background.
Code snippet:
...
let upload = new AWS.S3.ManagedUpload({
  params: {
    Bucket: 'my-bucket',
    Key: videoFileName,
    Body: videoHere,
    ACL: "public-read"
  }
});
upload.promise();
...
Chrome crashes with the error RESULT_CODE_INVALID_CMDLINE_URL, DevTools crashes, and in the Chrome terminal logs I get this:
[5573:0x3000000000] 27692 ms: Scavenge 567.7 (585.5) -> 567.7 (585.5) MB, 23.8 / 0.0 ms (average mu = 0.995, current mu = 0.768) allocation failure
[5573:0x3000000000] 28253 ms: Mark-sweep 854.6 (872.4) -> 609.4 (627.1) MB, 235.8 / 0.0 ms (+ 2.3 ms in 2 steps since start of marking, biggest step 1.4 ms, walltime since start of marking 799 ms) (average mu = 0.940, current mu = 0.797) allocation fa
<--- JS stacktrace --->
[5573:775:0705/140126.808951:FATAL:memory.cc(38)] Out of memory. size=0
[0705/140126.813085:WARNING:process_memory_mac.cc(93)] mach_vm_read(0x7ffee4199000, 0x2000): (os/kern) invalid address (1)
[0705/140126.880084:WARNING:system_snapshot_mac.cc(42)] sysctlbyname kern.nx: No such file or directory (2)
I've also tried using HTTP PUT; both approaches work for smaller files, but once the files get bigger they both crash.
Any ideas? I've been through tons of SO posts and AWS docs, but nothing has helped with this issue yet.
Edit: I've filed the issue with Chrome; it seems like it's an actual bug. Will update the post when I have an answer.

This issue came from loading the big file into memory (several times), which would crash Chrome before it even had a chance to upload.
The fix was using URL.createObjectURL (a URL pointing to the file) instead of FileReader.readAsDataURL (the entire file itself), and, when sending the file to your API, using const newFile = new File([await fetch(objectURL).then(res => res.blob())], 'example.mp4', { type: 'video/mp4' });
This worked for me because I was doing many conversions to get the readAsDataURL result into the file type I wanted; this way I use much less memory.
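For reference, here is a minimal sketch of that approach (it assumes the AWS SDK v2 ManagedUpload shown in the question; the function name, bucket, and file name are placeholders):

async function uploadVideo(file) {
  // Create a lightweight object URL that points at the file on disk,
  // instead of copying the whole file into memory with readAsDataURL.
  const objectURL = URL.createObjectURL(file);

  // Re-materialize a File from the object URL only at upload time.
  const blob = await fetch(objectURL).then(res => res.blob());
  const newFile = new File([blob], 'example.mp4', { type: 'video/mp4' });

  const upload = new AWS.S3.ManagedUpload({
    params: {
      Bucket: 'my-bucket',        // placeholder bucket
      Key: newFile.name,
      Body: newFile,
      ACL: 'public-read'
    }
  });

  await upload.promise();
  URL.revokeObjectURL(objectURL); // release the reference when done
}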

Related

Why would the sent_bytes in my AWS ELB logs vary (by a lot) for the same file?

I'm asking why my AWS ELB logs show a "sent_bytes" value that varies, even though the requests are for the same file.
I am troubleshooting a problem where a particular client repeatedly fails to download the same file, with an error suggesting the file was incomplete or corrupted after the download, or that the download timed out.
The error on the client side is "Expecting chunk trailer", which, based on search results, only appears in the Mono source code: https://github.com/mono/mono/blob/main/mcs/class/System/System.Net/MonoChunkStream.cs.
The client stack trace is:
2022-03-17 21:02:36.249 | Thread Pool Worker | ERROR | Error downloading dictionary MyFile.xlsx
System.Net.WebException: Expecting chunk trailer.
at System.Net.MonoChunkStream.ThrowExpectingChunkTrailer () <0x102eaae70 + 0x00050> in <5d6e75c514094a0784a0885f6eeb75ed#2cbb4e9e23352cd15a5fa1bbf9e005c1>:0
at System.Net.MonoChunkStream.FinishReading (System.Threading.CancellationToken cancellationToken) <0x102eab390 + 0x00497> in <5d6e75c514094a0784a0885f6eeb75ed#2cbb4e9e23352cd15a5fa1bbf9e005c1>:0
at System.Net.HttpWebRequest.RunWithTimeoutWorker[T] (System.Threading.Tasks.Task`1[TResult] workerTask, System.Int32 timeout, System.Action abort, System.Func`1[TResult] aborted, System.Threading.CancellationTokenSource cts) <0x102f49e10 + 0x00668> in <5d6e75c514094a0784a0885f6eeb75ed#2cbb4e9e23352cd15a5fa1bbf9e005c1>:0
at System.Net.WebConnectionStream.Read (System.Byte[] buffer, System.Int32 offset, System.Int32 count) <0x102eb65f0 + 0x00118> in <5d6e75c514094a0784a0885f6eeb75ed#2cbb4e9e23352cd15a5fa1bbf9e005c1>:0
at System.IO.Stream.CopyTo (System.IO.Stream destination, System.Int32 bufferSize) <0x10266f160 + 0x000bb> in <25bf495f7d6b4944aa395b3ab5293479#2cbb4e9e23352cd15a5fa1bbf9e005c1>:0
at System.IO.Stream.CopyTo (System.IO.Stream destination) <0x10266f110 + 0x00037> in <25bf495f7d6b4944aa395b3ab5293479#2cbb4e9e23352cd15a5fa1bbf9e005c1>:0
at MyCode.SomeClass.CodeThatIsDownloadingMyFileOverHttps() <0x104870670 + 0x0052b> in <475b4753e5e54e9f908ae74cc036795f#2cbb4e9e23352cd15a5fa1bbf9e005c1>:0
Using AWS Athena, I was able to query the download requests for this specific file and create a line chart of time vs. sent_bytes for those requests.
What it shows, which is interesting/weird, is that sent_bytes sometimes drops significantly.
What would explain the drop in sent_bytes for the same file?
If the client only receives a partial response, that would explain the "Expecting chunk trailer" error we see on the client.
Are there any docs you can point me to about sent_bytes behavior in AWS?
Could there be networking hardware doing content filtering and causing slowness/problems?
The Athena query looks like this:
SELECT time, sent_bytes
FROM my_elb_db.prod_logs
WHERE
  domain_name = 'my.site.com' AND request_verb = 'GET' AND
  request_url = 'https://my.site.com:443/path/to/MyFile.xlsx' AND
  from_iso8601_timestamp(time)
    BETWEEN parse_datetime('2022-03-01-18:17:00','yyyy-MM-dd-HH:mm:ss')
    AND parse_datetime('2022-03-22-18:18:00','yyyy-MM-dd-HH:mm:ss')
ORDER BY from_iso8601_timestamp(time)
LIMIT 1000
Notes:
The file is 1511 KB.
There are no ELB request logs that show a status code other than 200 for this file.
In the last 90 days, this user had 40 failures, mostly involving this file; such failures are very rare for other users.

Next.js export timeout configuration

I am building a website with Next.js that takes some time to build. It has to create a big dictionary, so when I run next dev it takes around 2 minutes to build.
The issue is that when I run next export to get a static version of the website, there is a timeout problem, because the build takes (as I said before) 2 minutes, which exceeds the 60-second limit preconfigured in Next.js.
The Next.js documentation (https://nextjs.org/docs/messages/static-page-generation-timeout) explains that you can increase the timeout limit, whose default is 60 seconds: "Increase the timeout by changing the staticPageGenerationTimeout configuration option (default 60 in seconds)."
However, it does not specify WHERE you can set that configuration option. In next.config.json? In package.json?
I could not find this information anywhere, and my blind attempts at putting this parameter in some of the files mentioned above did not work at all. So, does anybody know how to set the timeout for next export? Thank you in advance.
They were a bit clearer in the basic-features/data-fetching part of the docs that it should be placed in next.config.js.
I added this to mine and it worked (it got rid of the build error Error: Collecting page data for /path/[pk] is still timing out after 2 attempts. See more info here https://nextjs.org/docs/messages/page-data-collection-timeout):
// next.config.js
module.exports = {
  // time in seconds of no pages generating during static
  // generation before timing out
  staticPageGenerationTimeout: 1000,
}

What is the cause of these Google beta transcoding service job validation errors?

"failureReason": "Job validation failed: Request field config is
invalid, expected an estimated total output size of at most 400 GB
(current value is 1194622697155 bytes).",
The actual input file was only 8 seconds long. It was created using the Safari MediaRecorder API on macOS.
"failureReason": "Job validation failed: Request field
config.editList[0].startTimeOffset is 0s, expected start time less
than the minimum duration of all inputs for this atom (0s).",
The actual input file was 8 seconds long. It was created using the desktop Chrome MediaRecorder API, with mimeType "webm; codecs=vp9", on macOS.
Note that Stack Overflow wouldn't allow me to include the tag google-cloud-transcoder suggested by "Getting Support" (https://cloud.google.com/transcoder/docs/getting-support?hl=sr).
As Faniel mentioned, your first issue is that your video was less than 10 seconds long, which is below the 10-second minimum for the API.
Your second issue is that the "Duration" information is likely missing from the EBML headers of your .webm file. When you record with MediaRecorder, the duration of your video is set to N/A in the file headers, as it is not known in advance. This means the Transcoder API will treat the length of your video as Infinity / 0. Some consider this a bug in Chromium.
To confirm this is your issue, you can use ts-ebml or ffprobe to inspect the headers of your video. You can also use these tools to repair the headers. Read more about this here and here.
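For illustration, here is a minimal browser-side sketch of that repair, based on the ts-ebml README example (recordedBlob is assumed to be the Blob produced by MediaRecorder; ffprobe can do the same inspection from the command line):

import { Decoder, Reader, tools } from 'ts-ebml';

async function fixWebmDuration(recordedBlob) {
  const decoder = new Decoder();
  const reader = new Reader();
  reader.logging = false;

  // Decode the EBML elements and feed them to the reader so it can
  // work out the real duration and cue points of the recording.
  const buffer = await recordedBlob.arrayBuffer();
  decoder.decode(buffer).forEach(element => reader.read(element));
  reader.stop();

  // Rebuild the metadata so the header contains a proper Duration,
  // then splice it back onto the original body.
  const refinedMetadata = tools.makeMetadataSeekable(
    reader.metadatas, reader.duration, reader.cues
  );
  const body = buffer.slice(reader.metadataSize);
  return new Blob([refinedMetadata, body], { type: 'video/webm' });
}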
You can also try running the Transcoder API against this demo .webm, which has its duration information set correctly.
This Google documentation states that the input file must be at least 5 seconds in duration and should be stored in Cloud Storage (for example, gs://bucket/inputs/file.mp4). A job validation error can occur when the inputs are not properly packaged and don't contain duration metadata, or contain incorrect duration metadata. When the inputs are not properly packaged, we can explicitly specify startTimeOffset and endTimeOffset in the job config to set the correct duration. If the estimated total output size, computed from the bitrate in the job config and the duration reported by ffprobe (in seconds), is more than 400 GB, it can result in a job validation error. We can use the following formula to estimate the output size:
estimatedTotalOutputSizeInBytes = bitrateBps * outputDurationInSec / 8;
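As a quick, hypothetical check of that formula: an 8-second clip at a 5 Mbps output bitrate is only a few megabytes, so the 400 GB cap is normally only hit when the duration metadata is missing or treated as effectively infinite.

// Hypothetical numbers: 5 Mbps output bitrate, 8-second clip.
const bitrateBps = 5000000;
const outputDurationInSec = 8;
const estimatedTotalOutputSizeInBytes = bitrateBps * outputDurationInSec / 8;
console.log(estimatedTotalOutputSizeInBytes); // 5000000 bytes, i.e. about 5 MB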
Thanks for the question and feedback. The Transcoder API currently has a minimum duration of 10 seconds, which may be why the job wasn't successful.

Simple libtorrent Python client

I tried creating a simple libtorrent Python client (for a magnet URI), and I failed: the program never continues past "downloading metadata".
If you could help me write a simple client, it would be amazing.
P.S. When I choose a save path, is the save path the folder I want my data to be saved in, or the path to the data itself?
(I used code someone posted here.)
import libtorrent as lt
import time

ses = lt.session()
ses.listen_on(6881, 6891)

params = {
    'save_path': '/home/downloads/',
    'storage_mode': lt.storage_mode_t(2),
    'paused': False,
    'auto_managed': True,
    'duplicate_is_error': True}

link = "magnet:?xt=urn:btih:4MR6HU7SIHXAXQQFXFJTNLTYSREDR5EI&tr=http://tracker.vodo.net:6970/announce"
handle = lt.add_magnet_uri(ses, link, params)
ses.start_dht()

# wait until the metadata has been fetched from peers
print 'downloading metadata...'
while (not handle.has_metadata()):
    time.sleep(1)
print 'got metadata, starting torrent download...'

# report progress until the torrent reaches the seeding state
while (handle.status().state != lt.torrent_status.seeding):
    s = handle.status()
    state_str = ['queued', 'checking', 'downloading metadata',
                 'downloading', 'finished', 'seeding', 'allocating']
    print '%.2f%% complete (down: %.1f kB/s up: %.1f kB/s peers: %d) %s %.3f' % \
        (s.progress * 100, s.download_rate / 1000, s.upload_rate / 1000,
         s.num_peers, state_str[s.state], s.total_download / 1000000)
    time.sleep(5)
What happens is that the first while loop becomes infinite because the state never changes.
You have to add an s = handle.status(); call so the metadata status gets refreshed; then the state changes and the loop stops. Alternatively, put the first while loop inside the other one, which achieves the same thing.
Yes, the save path you specify is the one that the torrents will be downloaded to.
As for the metadata downloading part, I would add the following extensions first:
ses.add_extension(lt.create_metadata_plugin)
ses.add_extension(lt.create_ut_metadata_plugin)
Second, I would add a DHT bootstrap node:
ses.add_dht_router("router.bittorrent.com", 6881)
Finally, I would begin debugging the application by checking whether my network interface is binding correctly or whether any other errors come up (my experience with BitTorrent download problems, in general, is that they are network related). To get an idea of what's happening, I would use libtorrent-rasterbar's alert system:
ses.set_alert_mask(lt.alert.category_t.all_categories)
And make a thread (with the following code) to collect the alerts and display them:
while True:
    ses.wait_for_alert(500)   # wait up to 500 ms for an alert
    alert = ses.pop_alert()
    if not alert:
        continue
    print "[%s] %s" % (type(alert), alert.__str__())
Even with all of this working correctly, make sure the torrent you are trying to download actually has peers. Even if there are a few peers, none of them may be configured correctly or support metadata exchange (exchanging metadata is not a standard BitTorrent feature). Try to load a .torrent file (which doesn't require downloading metadata) and see if you can download it successfully, to rule out network issues.

Posting a Thumbnail via the Vimeo API 3.2

Two questions.
1) I'm trying to make my video's thumbnail a snapshot of a particular time in the video (2 seconds in). But after following the instructions found at this link, when I run the following code, the thumbnail does not change and the array is empty. I've tried different time formats, but I'm not sure which is right. Any suggestions on what might be going wrong?
https://developer.vimeo.com/api/playground/videos/%7Bvideo_id%7D/pictures
2) Is it possible to just upload the actual thumbnail file via the API?
$video_data = $lib->request('videos/107110137/pictures', array('time' => '00.02'), 'POST');
echo '<p>video_data after thumb change is <pre>';
print_r($video_data);
// Prints out:
Array
(
    [body] =>
    [status] => 0
    [headers] => Array
        (
        )
)
Thanks!
A status code of 0 means that curl was unable to reach the API servers.
The most common issue that leads to a 0 status code is an HTTPS certificate problem. Take a look at http://unitstep.net/blog/2009/05/05/using-curl-in-php-to-access-https-ssltls-protected-sites/. I do NOT recommend their quick fix, because it will leave you more vulnerable to man-in-the-middle attacks.
If that doesn't work, add a curl_error check to the _request function to find out more.