ImageResizer and S3Reader2: The string was not recognized as a valid DateTime

I'm upgrading a website to a .NET MVC 5 site that uses ImageResizer, with the images stored on AWS S3. The images stored on S3 are fine, have public read access, and load without a problem when requested via their S3 URLs.
When I use the ImageResizer plugin S3Reader2 I get the following error on most of my images: "The string was not recognized as a valid DateTime. There is an unknown word starting at index 26."
You can find the ImageResizer Diagnostics here: Diagnostics
You can find the stack trace here: Stacktrace
Any help would be highly appreciated!

The failing blobs have an invalid Expires header set. Mon, 28 Apr 2025 21:50:04 G4T does not conform to the RFC date format for HTTP headers, and the AWS SDK correctly throws an exception when it encounters the malformed date.
The bad metadata should be replaced or removed from the failing blobs.
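One way to do that, assuming the objects are otherwise intact, is to copy each failing object onto itself with replaced metadata. A minimal sketch with boto3 (the bucket name, key, and replacement date are placeholders, not values from the question):

import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")
bucket, key = "my-image-bucket", "images/photo.jpg"  # placeholders

head = s3.head_object(Bucket=bucket, Key=key)

# Copy the object onto itself, replacing the stored headers so the
# malformed Expires value ("... G4T") is overwritten with a valid date
# (or simply left out).
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key},
    MetadataDirective="REPLACE",
    Metadata=head.get("Metadata", {}),
    ContentType=head.get("ContentType", "binary/octet-stream"),
    Expires=datetime(2026, 1, 1, tzinfo=timezone.utc),
)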

Related

Google BigQuery cannot read some ORC data

I'm trying to load ORC data files stored in GCS into BigQuery via bq load/bq mk and I'm hitting the error below. The data files were copied via the hadoop distcp command from an on-prem cluster's Hive instance, version 1.2. Most of the ORC files load successfully, but a few do not. There is no problem when I read this data from Hive.
Command I used:
$ bq load --source_format ORC hadoop_migration.pm hive/part-v006-o000-r-00000_a_17
Upload complete.
Waiting on bqjob_r7233761202886bd8_00000175f4b18a74_1 ... (1s) Current status: DONE
BigQuery error in load operation: Error processing job '<project>-af9bd5f6:bqjob_r7233761202886bd8_00000175f4b18a74_1': Error while reading data, error message:
The Apache Orc library failed to parse metadata of stripes with error: failed to open /usr/share/zoneinfo/GMT-00:00 - No such file or directory
Indeed, there is no such file, and I believe there shouldn't be.
Google doesn't turn up anything about this error message, but I've found a similar problem here: https://issues.apache.org/jira/browse/ARROW-4966. There is a workaround for on-prem servers of creating a symlink at /usr/share/zoneinfo/GMT-00:00, but I'm in the cloud.
Additionally, I found that if I extract the data from the ORC file via orc-tools into JSON format, I am able to load that JSON file into BigQuery. So I suspect that the problem is not in the data itself.
Has anybody come across such a problem?
The official Google support position is below. In short, BigQuery doesn't understand some time zone descriptions, and the suggestion was to change them in the data. Our workaround for this was to convert the ORC data to Parquet and then load it into the table.
Indeed this error can happen. Also when you try to execute a query from the BigQuery Cloud Console such as:
select timestamp('2020-01-01 00:00:00 GMT-00:00')
you’ll get the same error. It is not just related to the ORC import, it’s how BigQuery understands timestamps. BigQuery supports a wide range of representations as described in [1]. So:
“2020-01-01 00:00:00 GMT-00:00” -- incorrect timestamp string literal
“2020-01-01 00:00:00 abcdef” -- incorrect timestamp string literal
“2020-01-01 00:00:00-00:00” -- correct timestamp string literal
In your case the problem is with the representation of the time zone within the ORC file. I suppose it was generated that way by some external system. If you could get the " GMT-00:00" string (including the preceding space) replaced with just "-00:00", that would be a correct time zone representation. Can you change the configuration of the system which generated the file so it writes a proper time zone string?
Creating a symlink only masks the problem rather than solving it properly, and in BigQuery's case it is not possible anyway.
Best regards,
Google Cloud Support
[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#time_zones
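For reference, a minimal sketch of the ORC-to-Parquet workaround mentioned above, using pyarrow. This is an assumption: the original poster did not say which tool they used, the file names are placeholders, and whether pyarrow's ORC reader accepts the "GMT-00:00" writer time zone may depend on the environment.

import pyarrow.orc as orc
import pyarrow.parquet as pq

# Read the problematic ORC file into an Arrow table...
table = orc.ORCFile("part-00000.orc").read()

# ...and rewrite it as Parquet, which can then be loaded with
# `bq load --source_format PARQUET ...`
pq.write_table(table, "part-00000.parquet")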

"checksum must be specified in PUT API, when the resource already exists"

I am getting the following error while building a bot in AWS Lex:
"checksum must be specified in PUT API, when the resource already exists"
Can someone tell me what it means and how to fix it?
I was getting the same error when building my bot in the console. I found the answer here.
Refresh the page and then set the version of the bot to Latest.
The documentation states that you have to provide the checksum of a bot that already exists if you are trying to update it: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/LexModelBuildingService.html#putBot-property
"checksum — (String)
Identifies a specific revision of the $LATEST version.
When you create a new bot, leave the checksum field blank. If you specify a checksum you get a BadRequestException exception.
When you want to update a bot, set the checksum field to the checksum of the most recent revision of the $LATEST version. If you don't specify the checksum field, or if the checksum does not match the $LATEST version, you get a PreconditionFailedException exception."
That's the aws-sdk for JavaScript docs, but the same concept applies to any SDK as well as the AWS CLI.
This requires calling get-bot first, which returns the checksum of the bot among other data. Save that checksum somewhere and pass it in the params when you call put-bot.
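For example, a rough sketch of that get-then-put flow using boto3 (the bot name is a placeholder; the same pattern applies to the CLI and other SDKs):

import boto3

lex = boto3.client("lex-models")

# Fetch the current $LATEST revision; the response includes its checksum.
bot = lex.get_bot(name="MyBot", versionOrAlias="$LATEST")

params = {
    "name": "MyBot",
    "locale": bot["locale"],
    "childDirected": bot["childDirected"],
    # Omitting the checksum (or sending a stale one) for an existing bot
    # raises PreconditionFailedException.
    "checksum": bot["checksum"],
}

# PutBot replaces the whole $LATEST revision, so carry over the existing
# definition rather than sending only the changed fields.
for field in ("intents", "clarificationPrompt", "abortStatement",
              "idleSessionTTLInSeconds", "voiceId"):
    if field in bot:
        params[field] = bot[field]

lex.put_bot(**params)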
I would recommend using the tutorials here: https://docs.aws.amazon.com/lex/latest/dg/gs-console.html
That tutorial demonstrates using the AWS CLI, but the same concepts can be abstracted to use any SDK you desire.
I had the same problem.
I guess that once you have published a bot, you can no longer modify or build it.
Create another bot.

AWS S3 query string parameter causes method not allowed error

A GET request to this URL returns the file as expected...
curl -v 'http://xxx.s3.amazonaws.com/lineitems/58ecfff764a6036d96deaa69/bootstrap.min.js'
However when I add the specific query string parameter 'select' then I get a 405 method not allowed error.
curl -v 'http://xxx.s3.amazonaws.com/lineitems/58ecfff764a6036d96deaa69/bootstrap.min.js?select='
< HTTP/1.1 405 Method Not Allowed
< x-amz-request-id: 7F3339518976EB66
< x-amz-id-2: 8YmXqeME+Y5bLRdlMhDKQyrznjNJr/gw7ortpLjXqFDlPfYR1Ckqz+2Gr2/35/SWKaNviMLZLEk=
< Allow: POST
< Content-Type: application/xml
< Transfer-Encoding: chunked
< Date: Thu, 02 Nov 2017 10:50:33 GMT
< Server: AmazonS3
Other query string parameter names do not cause this problem. It only appears to be affecting files in this folder and has only started happening recently (in the last week).
I can't see anything unusual in the properties for the files in this folder and everything gets uploaded to the bucket by the same code.
I'm at a loss to explain why this is happening.
You'll need to use a different parameter name. select now has a meaning to S3, so it is no longer quietly discarded.
Update: The sudden appearance of the ?select subresource appears to have been when AWS began deploying a new feature, S3 Select, which allows JSON and CSV objects to actually be queried for a subset of their content, using SQL expressions. The feature was announced later the same month.
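For context, this is roughly what that feature looks like today via boto3 (the bucket, key, and CSV layout are made up for illustration; this is not a fix for the 405 in the question):

import boto3

s3 = boto3.client("s3")

# S3 Select runs a SQL expression server-side against a CSV or JSON object
# and streams back only the matching rows.
resp = s3.select_object_content(
    Bucket="example-bucket",
    Key="lineitems/items.csv",
    ExpressionType="SQL",
    Expression="SELECT s._1, s._3 FROM S3Object s WHERE s._3 = 'shipped'",
    InputSerialization={"CSV": {"FileHeaderInfo": "NONE"}},
    OutputSerialization={"CSV": {}},
)

for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode())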
The original answer follows.
For reasons that aren't readily explainable, select= in the query string causes S3 to interpret your request as... something different. Exactly what it is, is not clear.
<ResourceType>SELECT</ResourceType>
Interestingly, if you try a POST, you get an error message saying that POST is not allowed, either, but the Allow: POST is then no longer in the response headers.
The bucket logs show the request operation as REST.GET.SELECT, which doesn't seem to be documented, where a normal GET request is logged as REST.GET.OBJECT.
So you're triggering some unexpected behavior, and you'll need to use something different.
The fact that it previously worked tends to rule out my initial theory, that you were somehow prompting S3 to assume you wanted to make a deprecated SOAP request (which requires POST), but if it's really true that this was working all along, then I'm inclined instead to think that you may have inadvertently stumbled on a feature that has not yet been released.
Unofficially, S3 silently ignores most unexpected query string parameters. Signature V2 also ignores them completely (and actually requires them not to be signed, if I remember my test results of that algorithm correctly).
Officially, it seems you should be using a query string parameter beginning with x- if you definitely don't want the service to interpret it. This will also write the parameter to the logs, which might prove to be a useful side effect in the future for debugging purposes.
You can include custom information to be stored in the access log record for a request by adding a custom query-string parameter to the URL for the request. Amazon S3 will ignore query-string parameters that begin with x-, but will include those parameters in the access log record for the request, as part of the Request-URI field of the log record. (emphasis added)
http://docs.aws.amazon.com/AmazonS3/latest/dev/LogFormat.html
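In other words, something like the following (the object URL is a placeholder) keeps the extra information in the request and in the access logs without colliding with an S3 subresource:

import requests

# "select" is now an S3 subresource, so rename the parameter; S3 ignores
# query-string parameters that start with "x-" but still records them in
# the bucket access logs.
url = "http://example-bucket.s3.amazonaws.com/lineitems/58ecfff764a6036d96deaa69/bootstrap.min.js"
resp = requests.get(url, params={"x-select": ""})
print(resp.status_code)  # 200 for a public object instead of 405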

WireMock returns image that's corrupt

I've recorded a mock through WireMock that contains an image in the body. When I try to get the stub using Postman the response back is an image that won't load and the size of the content is roughly 20-50% larger than when I get the same image from the production server. In Google Chrome it says Resource interpreted as Document but transferred with MIME type image/jpeg.
I can't tell if this is an underlying issue with Jetty or WireMock. I read some related chatter on the user group about images being returned incorrectly, but I've tried the suggestion of removing the mapping stub and just keeping the __file - no luck. This seems like an encoding issue, but I don't know how to debug it further.
If you can hang in there until next week, we're putting the finishing touches on a brand-new recorder, and I've been specifically working through the encoding issues the current recorder suffers from.
Meanwhile, you might want to try turning off gzip in your client code.
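For example, a minimal sketch of what "turning off gzip" can look like on the client side, assuming a Python client and a locally running WireMock stub (the URL and path are placeholders):

import requests

# Advertise only "identity" so the stub does not gzip the response body;
# the bytes then come back exactly as stored under __files.
resp = requests.get(
    "http://localhost:8080/images/logo.jpg",
    headers={"Accept-Encoding": "identity"},
)

with open("logo.jpg", "wb") as f:
    f.write(resp.content)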

Azure Storage Copy Blob Console App - 403 Forbidden Error

Using Microsoft.WindowsAzure.Storage 2.1.0.3
I am attempting to write a console app to move documents from one Azure Storage account to another. The app lists all the containers using sourceClient.ListContainers(), loops through all containers in a foreach block getting a shared access token for each, and then fires a StartCopyFromBlob request for each blob. The destination blob has the same naming structure but is in a different account (e.g. sourceAzureUrl/testContainer/filename.ext -> destAzureUrl/testContainer/filename.ext).
Most of the files (98%) copy just fine, but certain requests fail with this exception: "The remote server returned an error: (403) Forbidden." When the CloudBlockBlob reference to the destination blob is created, the URL does not seem to be properly URL-escaped, which results in the exception. However, when this code is run inside an MVC controller, the request is somehow properly URL-escaped and completes without error. The copied blob keeps the unescaped name as well.
It appears that the cause is that the blob/file name contains a "[" and/or "]" character (e.g. Roger_Smith[1].doc). If the filename is URL-encoded beforehand, the request completes without error, but the filename in Azure Storage is the URL-escaped incarnation rather than the original (Roger_Smith%255b1%255d.doc instead of Roger_Smith[1].doc).
Is there a way to properly URL escape the copy request and still have the result blob have the unescaped name?
Can you confirm you are running with .NET 4.5? There is a URI escaping issue for the characters you mentioned (brackets) that was introduced in .NET 4.5 and is incompatible with .NET 4.0 (which is what the service uses on its side for validation). As such, it appears as if the message signature was incorrectly formed. We are working with the .NET team on a long-term resolution. In the meantime, you may consider running under .NET 4.0 (with .NET 4.5 installed to take advantage of GC improvements) or avoiding the bracket characters in the file name.