SageMaker ANSI escape codes - dockerfile

I'm using a library in a SageMaker training script that includes print statements with characters like tabs. When I look in my SM cloudwatch training logs, they're filled with ANSI escape codes like #011 (in place of tabs). This makes the logs much more difficult to read.
Is there any way I can prevent this behavior, whether through a modification of my Dockerfile or of my train.py script?
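One possible workaround on the train.py side, assuming the library's messages go through Python's normal print machinery, is to wrap stdout/stderr so tab characters (which the logging pipeline renders as #011) are replaced with spaces before they are written. This is only a sketch of the idea, not a SageMaker-specific API, and the class name is just illustrative:
import sys

class TabReplacingStream:
    """Wraps a text stream and rewrites tabs as spaces before writing."""
    def __init__(self, wrapped):
        self._wrapped = wrapped
    def write(self, text):
        return self._wrapped.write(text.replace("\t", "    "))
    def flush(self):
        self._wrapped.flush()

# Install the wrappers early in train.py so later print() calls are filtered.
sys.stdout = TabReplacingStream(sys.stdout)
sys.stderr = TabReplacingStream(sys.stderr)
Installing the wrapper at the top of train.py leaves the library's own print statements untouched while cleaning up what actually reaches the log stream.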

Related

How to fix mangled log rows with formatting in CloudWatch?

I'm looking at my logs in CloudWatch and it seems that there is some sort of formatting problem. This is what I see there:
[90m2023-02-13 21:07:21.521 [39m[1m[94mINFO[39m[22m [90m[PrepareNextTransaction dist/apps/admin/.next/server/chunks/813.js:5114 AsyncTask.handler][39m
If I run this app locally I can see the same rows properly:
2023-02-13 21:10:09.797 INFO [PrepareNextTransaction AsyncTask.execute]
It seems to me that CloudWatch doesn't understand the formatting, but I don't see a setting to fix this.
How can I do so?
The formatting problem you're encountering in CloudWatch is due to the usage of ANSI escape codes in your log output.
ANSI escape codes are sequences of characters used to format text in a terminal. For example, the code \033[31m can be used to change the text color to red.
However, not all applications that display text, such as CloudWatch, support ANSI escape codes.
It would be best to investigate how to properly configure your application's log output so that ANSI codes do not end up in your CloudWatch logs.
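If reconfiguring the logger to disable colors isn't an option, another approach is to strip the escape sequences from each line before it is shipped to CloudWatch. Here is a minimal sketch in Python, assuming the common CSI-style codes shown above (the pattern may need widening for other kinds of sequences):
import re

# Matches ESC [ ... <letter>, i.e. the CSI sequences used for colors and styles,
# such as \x1b[31m (red) or \x1b[39m (reset foreground color).
ANSI_CSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")

def strip_ansi(line: str) -> str:
    return ANSI_CSI_RE.sub("", line)

print(strip_ansi("\x1b[90m2023-02-13 21:07:21.521 \x1b[39m\x1b[1m\x1b[94mINFO\x1b[39m\x1b[22m"))
# -> 2023-02-13 21:07:21.521 INFO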

Trivy: Trim Table Output

We're trying to implement Trivy as the scanner solution in our pipelines, and the table visualization is awesome.
However, the end of the output includes information that is not so interesting to us, such as secrets and SSH keys (see image).
Is there a way to suppress that part of the output, so the end result of the scan is just the tables separating vulnerable language packages and OS packages?
Thanks in advance. =]
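One approach, assuming a reasonably recent Trivy release, is to restrict the scan to the vulnerability scanner so the secrets section never shows up in the table (older releases used --security-checks vuln instead of --scanners vuln). Sketched here as a Python wrapper around the CLI, with "myapp:latest" as a placeholder image:
import subprocess

# Run Trivy with only the vulnerability scanner enabled, so secret/SSH-key
# findings are omitted and the table is limited to OS and language packages.
result = subprocess.run(
    ["trivy", "image", "--scanners", "vuln", "--format", "table", "myapp:latest"],
    capture_output=True,
    text=True,
    check=False,
)
print(result.stdout)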

Google Cloud Dataflow removing accents and special chars with '??'

This is going to be quite a hit-or-miss question, as I don't really know which context or piece of code to give you; it's one of those "it works locally" situations (and it does!).
The situation here is that I have several services, and there's a step where messages are put on a PubSub topic, waiting for the Dataflow consumer to handle them and save them as .parquet files (I also have another one which sends that payload to an HTTP endpoint).
The thing is, the message in that service, prior to sending it to that PubSub topic, seems to be correct; Stackdriver logs show all the characters as they should be.
However, when I check the final output in .parquet or at the HTTP endpoint, I see, for example, h?? instead of hí, which seems pretty weird, since running everything locally produces the correct output.
I can only think it's a server-side encoding issue when Dataflow is deployed as a job rather than run locally.
Hope someone can shed some light on something this abstract.
The strange thing is that it works locally.
But as a workaround, the first thing that comes to mind is to set the encoding explicitly.
Are you at some point using a function to convert your string input to bytes?
If so, you could try to force getBytes() to use UTF-8 encoding by passing the charset as an argument, as in the following example from this Stack Overflow thread:
byte[] bytes = string.getBytes(StandardCharsets.UTF_8); // requires java.nio.charset.StandardCharsets
// feed bytes to Base64
// get bytes back from Base64
String decoded = new String(bytes, StandardCharsets.UTF_8);
Also:
- Have you tried setting the parquet.enable.dictionary option?
- Are your original files written in UTF-8 before conversion?
Google Cloud Dataflow (at least the Java SDK) replaces Spanish characters like 'ñ' and accented vowels like 'á' or 'é' with the symbol �, because the default charset of the JVM installed on the service workers is US-ASCII. So, if UTF-8 is not explicitly declared when you instantiate strings or convert them to and from byte arrays, the platform default encoding is used instead.
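For what it's worth, the failure mode described above is easy to reproduce outside Dataflow; here is a rough illustration in Python of what happens when UTF-8 bytes are decoded with an ASCII default rather than an explicit UTF-8 charset:
# 'hí' encoded as UTF-8 uses two bytes for the accented character.
data = "hí".encode("utf-8")

# Decoding with an ASCII default mangles it into replacement characters,
# analogous to the h?? seen in the .parquet output and at the HTTP endpoint...
print(data.decode("ascii", errors="replace"))  # 'h' followed by two � replacement characters

# ...while decoding with an explicit UTF-8 charset round-trips correctly.
print(data.decode("utf-8"))  # hí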

While writing records in a flat file using Informatica ETL job, greek characters are coming as boxes

While writing records to a flat file using an Informatica ETL job, Greek characters come out as boxes. We can see the original characters in the database. At the session level, we are using UTF-8 encoding. We have a multi-language application and need to process Chinese, Russian, Greek, Polish, Japanese, etc. characters. Please suggest.
Try changing your code page encoding. I also faced this kind of issue. We were using ANSI encoding, so we created a separate Integration Service with a different encoding and the file ran successfully.
There is an easy option. In the session properties, select the target flat file, then click Set File Properties. There you can change the code page and choose UTF-8. By default it is ANSI, which is why you are facing this issue.

AWS SDK in Perl

Is it possible to create an AWS SDK in Perl? I need to use the AWS transcoder service from my Perl script, but an AWS SDK is not available for Perl (http://aws.amazon.com/code). Or do they have any other method, such as using the PHP SDK from a Perl script?
The API is just "sending specific things over HTTP". You don't need a language specific library for that, although it does make things easier. Anyone can write such a wrapper, and some people already have done that for Perl.
Years later, there is now Paws, a Perl AWS interface. It's on CPAN.
It's fairly easy to write your own Perl modules to work with the AWS API. As remarked above, if you can make HTTP calls and create an HMAC signature, any language can do it.
However, there are already a lot of Perl modules on CPAN that address specific AWS functions, such as S3 or EC2. Go to http://www.cpan.org/src/ to search for what you need (e.g., SNS). You'll generally find something that will meet your need.
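To illustrate the "HTTP calls plus an HMAC signature" point in any language, here is a rough Python sketch of the core signing step only; the real AWS Signature Version 4 process adds date-scoped key derivation, canonical headers, and payload hashing, and the key and string-to-sign below are placeholders:
import hashlib
import hmac

secret_key = b"EXAMPLE_SECRET_ACCESS_KEY"           # placeholder credential
string_to_sign = "GET\n/example/canonical/request"  # placeholder canonical request

# The heart of request signing: an HMAC-SHA256 over the canonical request string.
signature = hmac.new(secret_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
print(signature)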
http://www.timkay.com/aws/
I have found Tim Kay's "aws" and "s3" tools quite useful. They are written in Perl.
It has the added advantage of --exec, so you can append commands directly to the output, in its original state from AWS. It has been a terror for me to have international characters and other junk floating about as a sad excuse for file names. With Tim's toolset, I was able to work around the problem by using --exec to fetch the (also unique) prefix of the filename and then act on it directly, instead of mucking about with metacharacters and other nonsense.
For example:
/123/456/789/You can't be serious that this is really a filename.txt
/123/456/901/Oh!Yes I can! *LOL* Honest!.txt
To nuke the first one:
aws ls --no-vhost mybucketname/123/456/789/ --exec='system "aws", "rm", "--no-vhost", "$bucket/$key"'
Simply put, the tool performs the equivalent of an "ls" on the S3 bucket for that prefix and returns ALL file names under that prefix, which are passed into the --exec command. From there, you can see I am blindly deleting whatever files it finds.
(Note: --no-vhost helps resolve bucket names with periods in them, so you don't need to use long URLs to get from point A to point B.)