I am using Google Vertex AI online prediction:
In order to send an image, it has to be in a JSON file in uint8 format, and the request has to be less than 1.5 MB; when converting my image to uint8 it definitely exceeds 1.5 MB.
To get around this issue we can encode the uint8 data to base64, which brings the JSON file down to a few KB.
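Roughly, the encoding step looks like the sketch below (just stdlib base64 on the raw bytes; the "image_bytes" key is only a placeholder, the actual instance field name depends on what the deployed model expects):
import java.nio.file.{Files, Paths}
import java.util.Base64

// read the raw image bytes and base64-encode them so the JSON body stays small
val imageBytes = Files.readAllBytes(Paths.get("image.png"))
val b64 = Base64.getEncoder().encodeToString(imageBytes)

// "image_bytes" is a placeholder key; use whatever input name the model's serving signature expects
val requestBody = s"""{"instances": [{"image_bytes": {"b64": "$b64"}}]}"""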
When running the prediction I get Resource Exhausted: 429 received trailing metadata size exceeds limit. Does anyone know what the problem is?
Related
"failureReason": "Job validation failed: Request field config is
invalid, expected an estimated total output size of at most 400 GB
(current value is 1194622697155 bytes).",
The actual input file was only 8 seconds long. It was created using the Safari MediaRecorder API on macOS.
"failureReason": "Job validation failed: Request field
config.editList[0].startTimeOffset is 0s, expected start time less
than the minimum duration of all inputs for this atom (0s).",
The actual input file was 8 seconds long. It was created using the desktop Chrome MediaRecorder API, with mimeType "webm; codecs=vp9", on macOS.
Note that Stack Overflow wouldn't allow me to include the tag google-cloud-transcoder suggested by "Getting Support": https://cloud.google.com/transcoder/docs/getting-support?hl=sr
Like Faniel mentioned, your first issue is that your video was less than 10 seconds, which is below the API's 10-second minimum.
Your second issue is that the "Duration" information is likely missing from the EBML headers of your .webm file. When you record with MediaRecorder, the duration of your video is set to N/A in the file headers because it is not known in advance. This means the Transcoder API will treat the length of your video as Infinity / 0. Some consider this a bug in Chromium.
To confirm this is your issue, you can use ts-ebml or ffprobe to inspect the headers of your video. You can also use these tools to repair the headers. Read more about this here and here.
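For example, a quick duration check with ffprobe (assuming ffmpeg/ffprobe is installed; input.webm is a placeholder filename) will typically print N/A or 0 when the header is missing the duration:
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 input.webm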
Also, try running the Transcoder API against this demo .webm, which has its duration information set correctly.
This Google documentation states that the input file must be at least 5 seconds in duration and should be stored in Cloud Storage (for example, gs://bucket/inputs/file.mp4). A job validation error can occur when the inputs are not properly packaged and don't contain duration metadata, or contain incorrect duration metadata. When the inputs are not properly packaged, you can explicitly specify startTimeOffset and endTimeOffset in the job config to set the correct duration. A job validation error can also occur if the estimated total output size implied by the job config (the output bitrate times the duration reported by ffprobe) is more than 400 GB. You can use the following formula to estimate the output size.
estimatedTotalOutputSizeInBytes = bitrateBps * outputDurationInSec / 8;
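For example, with an output bitrate of 5,000,000 bps (5 Mbps) and an output duration of 600 seconds, the estimate is 5,000,000 * 600 / 8 = 375,000,000 bytes (about 375 MB), comfortably below the 400 GB limit; the values here are purely illustrative.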
Thanks for the question and feedback. The Transcoder API currently has a minimum duration of 10 seconds which may be why the job wasn't successful.
There is a limit of 256 KB as the maximum size of a message that can be published to AWS SNS. Can we compress a message using gzip and publish the compressed message to overcome the size limit?
You can gzip the message body -- however -- SNS message bodies only support UTF-8 character data. Gzipped data is binary, so that is not directly compatible with SNS because not every possible sequence of bytes is also a valid sequence of UTF-8 characters.
So, after gzipping your payload, you need to encode that binary data using a scheme such as base-64. Base-64 encodes arbitrary binary data (8 bits per byte) using only 64 symbols (2^6, so effectively 6 bits per encoded byte), so the byte count inflates by a factor of 8/6 (about 133%) as a result of this encoding. This means 192 KB of binary data encodes to 256 KB of base-64-encoded data, so the maximum allowable size of your message after gzip becomes 192 KB (since the SNS limit is 256 KB). But all the base-64 symbols are valid single-byte UTF-8 characters, which is a significant reason why this encoding is so commonly used despite its size increase. That, and the fact that gzip typically has a compression ratio far superior to 1.33:1 (which is the break-even point for gzip + base-64).
But if your messages will gzip to 192K or lower, this definitely does work with SNS (as well as SQS, which has the same character set and size limits).
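As a rough sketch of that flow from Scala with the AWS SDK for Java (the topic ARN and payload are placeholders, and the size check mirrors the 256 KB limit discussed above):
import java.io.ByteArrayOutputStream
import java.util.Base64
import java.util.zip.GZIPOutputStream
import com.amazonaws.services.sns.AmazonSNSClientBuilder
import com.amazonaws.services.sns.model.PublishRequest

// gzip the payload, then base-64 encode the compressed bytes so the result is valid UTF-8
def gzipBase64(payload: String): String = {
  val bos = new ByteArrayOutputStream()
  val gz = new GZIPOutputStream(bos)
  gz.write(payload.getBytes("UTF-8"))
  gz.close()
  Base64.getEncoder().encodeToString(bos.toByteArray)
}

val largeJsonPayload = """{"example": "your original message body"}"""  // placeholder payload
val topicArn = "arn:aws:sns:us-east-1:123456789012:my-topic"            // placeholder ARN
val message  = gzipBase64(largeJsonPayload)

// only publish if the encoded message still fits under the 256 KB SNS limit
if (message.getBytes("UTF-8").length <= 256 * 1024) {
  val sns = AmazonSNSClientBuilder.defaultClient()
  sns.publish(new PublishRequest(topicArn, message))
}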
Have you already taken a look at this? https://docs.aws.amazon.com/sns/latest/dg/sns-large-payload-raw-message-delivery.html
If you think the file can grow over time, I suggest another approach.
Put the file in an S3 bucket and attach an S3 Event Notification that publishes to the SNS topic, so all consumers are notified when a new file is ready to be processed.
In other words, the SNS message will be the location of the file and not the file itself.
Think about it.
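A minimal sketch of that pattern with the AWS SDK for Java from Scala (bucket and key names are placeholders; the S3-to-SNS event notification itself is configured on the bucket, not in code):
import java.io.File
import com.amazonaws.services.s3.AmazonS3ClientBuilder

// upload the large file; the bucket's event notification then publishes the object key
// to the SNS topic, so subscribers receive a pointer to the file rather than the payload
val s3 = AmazonS3ClientBuilder.defaultClient()
s3.putObject("my-bucket", "incoming/large-payload.json", new File("large-payload.json"))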
You can also use the SNS/SQS extended client library for large message payloads.
https://aws.amazon.com/about-aws/whats-new/2020/08/amazon-sns-launches-client-library-supporting-message-payloads-of-up-to-2-gb
Can I change the chunk size to always be the same? What I need is a fixed chunk size of 100 MB. I want to send files from the browser without passing them through the server. I use Signature Version 4. It would be cool if we could restrict the max file size and max chunk size.
I am doing an experiment to understand which file size behaves best with S3 and [EMR + Spark].
Input data :
Incompressible data: Random Bytes in files
Total Data Size: 20GB
Each folder has a different input file size: from 2 MB to 4 GB.
Cluster Specifications:
1 master + 4 nodes: c3.8xlarge
--driver-memory 5G \
--executor-memory 3G \
--executor-cores 2 \
--num-executors 60 \
Code :
scala> def time[R](block: => R): R = {
val t0 = System.nanoTime()
val result = block // call-by-name
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0) + "ns")
result
}
time: [R](block: => R)R
scala> val inputFiles = time{sc.textFile("s3://bucket/folder/2mb-10240files-20gb/*/*")};
scala> val outputFiles = time {inputFiles.saveAsTextFile("s3://bucket/folder-out/2mb-10240files-20gb/")};
Observations
2 MB - 32 MB: Most of the time is spent opening file handles [not efficient].
64 MB - 1 GB: Spark launches 320 tasks for all of these file sizes; the task count is no longer tied to the number of files making up the 20 GB of data. For example, 512 MB files meant 40 files for the 20 GB and could have been handled with just 40 tasks, but instead there were 320 tasks, each dealing with 64 MB of data.
4 GB file size: 0 bytes output [not able to handle it in memory / data not even splittable???]
Questions
Is there any default setting that forces the input split size to be 64 MB?
Since the data I am using is random bytes and is already compressed, how is Spark splitting this data further? If it can split this data, why is it not able to split the 4 GB object file?
Why does the compressed file size increase after uploading via Spark? The 2 MB compressed input file becomes 3.6 MB in the output bucket.
Since it is not specified, I'm assuming usage of gzip and Spark 2.2 in my answer.
Is there any default setting that forces the input split size to be 64 MB?
Yes, there is. Spark builds on Hadoop's file input APIs, and therefore treats S3 as a block-based file system even though it is an object store.
So the real question here is: which implementation of the S3 file system are you using (s3a, s3n, etc.)? A similar question can be found here.
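If you are on s3a, one setting worth experimenting with is the block size the connector reports to Spark, since that is what drives the split size (a spark-shell sketch; the 128 MB value and the s3a:// path are just examples, and with s3n the analogous property is fs.s3n.block.size, which commonly defaults to 64 MB and would explain the 64 MB tasks you saw):
// ask the S3A connector to report a larger block size so FileInputFormat creates fewer, larger splits
sc.hadoopConfiguration.set("fs.s3a.block.size", (128 * 1024 * 1024).toString)
val inputFiles = sc.textFile("s3a://bucket/folder/512mb-40files-20gb/*/*")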
Since the data I am using is random bytes and is already compressed, how is Spark splitting this data further? If it can split this data, why is it not able to split the 4 GB object file?
Spark docs indicate that it is capable of reading compressed files:
All of Spark’s file-based input methods, including textFile, support running on directories, compressed files, and wildcards as well. For example, you can use textFile("/my/directory"), textFile("/my/directory/*.txt"), and textFile("/my/directory/*.gz").
This means that your files were read quite easily and converted to a plaintext string for each line.
However, you are using compressed files. Assuming it is a non-splittable format such as gzip, the entire file is needed for decompression. You are running with 3 GB executors, which can satisfy the needs of the 4 MB - 1 GB files quite well, but can't handle a file larger than 3 GB at once (probably less after accounting for overhead).
Some further info can be found in this question. Details of splittable compression types can be found in this answer.
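If you want to confirm from the shell whether Hadoop treats a given codec as splittable, here is a quick sketch (the path is a placeholder):
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.compress.{CompressionCodecFactory, SplittableCompressionCodec}

// look up the codec Hadoop would pick for this file based on its extension
val codec = new CompressionCodecFactory(sc.hadoopConfiguration).getCodec(new Path("s3://bucket/folder/part-00000.gz"))
// gzip does not implement SplittableCompressionCodec; bzip2 does
val splittable = codec.isInstanceOf[SplittableCompressionCodec]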
Why does the compressed file size increase after uploading via Spark? The 2 MB compressed input file becomes 3.6 MB in the output bucket.
As a corollary to the previous point, this means that Spark decompressed the data while reading it as plain text. When re-uploading, it is no longer compressed. To compress the output, you can pass a compression codec as a parameter:
inputFiles.saveAsTextFile("s3://path", classOf[org.apache.hadoop.io.compress.GzipCodec])
There are other compression formats available.
I get the above error in Joomla 3.1.5 when I attempt to install a template from the Extension Manager. I increased the upload limit but it didn't solve my problem.
As well as increasing the upload_max_filesize, you also need to increase the post_max_size.
Try setting post_max_size to something like 20M.
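For example, in php.ini (the values are only a suggestion; post_max_size should be at least as large as upload_max_filesize, and the web server needs a restart afterwards):
upload_max_filesize = 20M
post_max_size = 20M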