Invalid null byte - field longer than 1 byte - amazon-web-services

I am copying data from a Redshift Manifest File stored in S3.
My copy command looks like
COPY <table name> FROM 's3://...' CREDENTIALS '<credentials>' FORMAT AS JSON 'auto' GZIP TRUNCATECOLUMNS ACCEPTINVCHARS EMPTYASNULL TIMEFORMAT AS 'auto' REGION '<region>' manifest;
The column in the table where I am facing this issue is of type varchar(255).
Value of this column in the s3 file looks like
"<column>":"\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000\\u0000..."
Error: Invalid null byte - field longer than 1 byte
I have tried using NULL AS '\0' as well. That didn't work. The error this gives is Invalid operation: NULL argument is not supported for JSON based COPY

It is not clear why you would want to store a run of ASCII zero characters in a string, so more information on what this is for would lead to a more useful workaround. The basic answer is 'don't do this'.
ASCII zero is defined as the null terminator character (aka NUL, which is not the same thing as NULL), and this character has special meaning in data streams. It is a control character and as such has no business being in your strings.
If you are trying to represent binary data in a string you should base64 encode the data first.
If you are trying to represent NULL this is done with null in the json - "column":null
More information on what you are doing will be helpful in proposing a solution.
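As a minimal sketch of the two suggestions above, assuming the records pass through a Python step before the gzipped JSON is uploaded to S3 (the function name clean_value and the sample record are hypothetical and not part of the original pipeline), a field made up only of NUL characters can be written as JSON null, and any other string containing NUL can be base64 encoded:

```python
import base64
import json


def clean_value(value):
    """Return a JSON-safe replacement for a string that contains NUL characters."""
    if not isinstance(value, str) or "\x00" not in value:
        return value  # nothing to fix
    if value.strip("\x00") == "":
        return None  # only NUL characters: write the field as JSON null
    # Mixed content: assume it is binary data and base64 encode its UTF-8 bytes.
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Hypothetical sample record, for illustration only.
record = {"id": 1, "column": "\x00" * 16, "payload": "ab\x00cd"}
cleaned = {key: clean_value(val) for key, val in record.items()}
line = json.dumps(cleaned)  # one JSON object per line, ready to gzip and upload
```

Whether a NUL-only field should really become NULL depends on what the data means, which is why more detail about the source would help.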

Related

SAS format is loaded but cannot be used

I loaded a format and my log says:
NOTE: Format $DEPOSIT is already on the library WORK.FORMATS.
NOTE: Format $DEPOSIT has been output.
But when I use it:
D_SYS = PUT(SOURCE,$DEPOSIT.);
I get:
ERROR 48-59: The format DEPOSIT was not found or could not be loaded.
If you try to apply a character format to a numeric value (and the reverse) then SAS will silently convert the format specification to match the data you are applying it to.
So you created the character format $DEPOSIT and are trying to apply it to the numeric variable SOURCE. So the error message is saying that the numeric format DEPOSIT does not exist.
Check that the variable SOURCE actually exists. SAS will create a numeric variable if you reference a variable that does not exist. If your variable really is numeric then you might get it to work if you convert SOURCE to character, but make sure to transform the numbers into character strings that match what the format expects.
D_SYS = PUT(cats(SOURCE),$DEPOSIT.);

Redshift varchar too narrow

I've got a table that I populate with tab-separated data from files whose encoding doesn't seem to be utf-8 exactly, like so:
CREATE TABLE tab (
url varchar(2000),
...
);
COPY tab
FROM 's3://input.tsv'
After the copy has completed I run
SELECT
MAX(LEN(url))
FROM tab
which returns 1525. I figure that since I'm wasting space, I might as well shrink the column by almost a quarter by using varchar(1525) instead of varchar(2000). But neither redoing the COPY nor setting up a new table and inserting the already imported data works. In both cases I get
error: Value too long for character type
Why won't the column hold these values?
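Your file might be in a multi-byte encoding. The length of a Redshift VARCHAR is counted in bytes, not characters, so MAX(LEN(url)) can understate the width you need; MAX(OCTET_LENGTH(url)) reports the length in bytes.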
From the LEN Function documentation:
The LEN function returns an integer indicating the number of characters in the input string. The LEN function returns the actual number of characters in multi-byte strings, not the number of bytes. For example, a VARCHAR(12) column is required to store three four-byte Chinese characters. The LEN function will return 3 for that same string.
The extra size of a VARCHAR will not waste disk space due to the compression methods used by Amazon Redshift, but it will waste in-memory buffer space when a block is read from disk and decompressed into memory.
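The difference between character count and byte count can be illustrated outside Redshift. The following is a small Python sketch, assuming the data is UTF-8; the helper names utf8_width and required_varchar are made up for this example:

```python
def utf8_width(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def required_varchar(values) -> int:
    """Smallest byte width that holds every value (0 for an empty input)."""
    return max((utf8_width(v) for v in values), default=0)


sample = "\U00020000" * 3   # three characters that each need four bytes in UTF-8
chars = len(sample)         # character count, analogous to what LEN reports
width = utf8_width(sample)  # byte count, which is what the VARCHAR must hold
```

Here chars is 3 while width is 12, matching the VARCHAR(12) example quoted from the documentation.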

Remove or replace '�' character in Informatica

We have a requirement wherein we need to replace or remove the '�' character (an unrecognizable, undefined character) present in our source. My workflow runs successfully, but when I check the records in the target they are not committed. I get the following error in Informatica:
Error executing query for record 37: 6706: The string contains an untranslatable character.
I tried functions like replace_chr, reg_replace, replace_str etc., but none seems to be working. Kindly advise on how to get rid of this. Any reply is greatly appreciated.
In your schema definitions you need to use the utf8 charset with the utf8_unicode_ci collation,
but for now you can do:
UPDATE tablename
SET columnToCheck = REPLACE(CONVERT(columnToCheck USING ascii), '?', '')
WHERE ...
or
update tablename
set columnToCheck = replace(columnToCheck , char(146), '');
Replace NonASCII Characters in MYSQL
You can replace the special characters in an expression transformation.
REPLACESTR(1,Column_Name,'?',NULL)
REPLACESTR - Function
1 - CaseFlag (a non-zero value makes the match case sensitive)
Column_Name - Column name which has a special character
? - Special character to replace
NULL - Replacement string (NULL removes the character)
You need to fetch rows with the appropriate character set defined on your connection. What is the connection you're using, ODBC or native? What's the DB?
Special characters are a challenge. Having checked the Informatica network, I can see there is a kludge involving replace_str that first sets a variable to the string with all non-special characters and then uses the resulting variable in a second replace_str, so that the final value has only the allowed characters: https://network.informatica.com/thread/20642 (awesome workaround by nico, so long as you can positively identify every character that should be allowed).
As an alternate kludge, I would also attempt something using an XML transformation somewhere within the mapping, as Informatica conveniently converts special characters to encoded values (decimal or hex, I can't remember). So long as you can live with these encoded values appearing in your target text you should be fine (and build some extra space into your strings to accommodate any bloat from the extra characters).
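Both ideas (keeping only explicitly allowed characters, or replacing special characters with encoded values) can be sketched outside Informatica. The Python below only illustrates the logic and is not Informatica expression syntax; the ALLOWED set and the function names are assumptions chosen for the example:

```python
import string

# Hypothetical whitelist: ASCII letters, digits, punctuation and the space character.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")


def keep_allowed(text: str, allowed=ALLOWED) -> str:
    """Drop every character that is not in the whitelist."""
    return "".join(ch for ch in text if ch in allowed)


def encode_special(text: str) -> str:
    """Replace every non-ASCII character with a decimal XML character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For instance, keep_allowed("caf\ufffd 1") drops the replacement character, while encode_special("caf\ufffd") turns it into the longer text "caf&#65533;", which is the kind of growth the target column would need room for.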

base64 encode null terminator

Hi I am currently trying to encode a string using the base64 encoding method in C++.
The string itself encodes fine however I would like to have an extra null character at the end of the decoded string (so the null character would also show up in the text file I want to save the decoded string into).
I am using this base64 code here -> http://www.adp-gmbh.ch/cpp/common/base64.html
I hope you can give me some advice on what I can do here to make this possible (I already tried writing two null characters at the end of the string I am encoding, but it seems as if the encoding method only reads up to the first occurrence of a null character).
A cursory look at the encoding function does not seem to show any special handling of NUL, and neither does the decoding function. Are you sure the issue is not in the way that you test for NUL in the decoded string?
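Base64 works on bytes, so a NUL byte is encoded like any other as long as the length of the input is passed explicitly. The sketch below uses Python's standard base64 module rather than the C++ code linked in the question, purely to show that a trailing NUL survives a round trip:

```python
import base64

data = b"hello\x00"                  # six bytes, the last one is NUL
encoded = base64.b64encode(data)     # b"aGVsbG8A"
decoded = base64.b64decode(encoded)  # the original six bytes, NUL included
```

One possible cause of the behaviour described in the question is building the input from a C string without a length: a std::string constructed from a plain char pointer stops at the first NUL, so the extra character never reaches the encoder.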

Encoded character buffer storage problem in MySQL varchar using C

I have an encoded character buffer array of size 512 in C, and a database field of type varchar in MySQL. Is it possible to store the encoded character buffer in the varchar field?
I have tried this, but the problem I face is that it stores only a limited part of the buffer in the database and ignores the rest. What is the actual problem, and how do I solve it?
It is not clear what you mean by encoded.
If you mean that you have an arbitrary string of byte values, then varchar is a bad fit because it will attempt to trim trailing spaces. A better choice in such cases is to use varbinary fields.
If the string you are inserting contains control characters, you might be best off converting it into a hex string and inserting it as follows:
create table xx (
v varbinary(512) not null );
insert into xx values ( 0x68656C6C6F20776F726C64);
This will prevent any component in the tool chain from choking on NUL characters and so forth.
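The hex literal in the insert statement above can be generated from the raw buffer. A minimal Python sketch follows (the question uses C, so this only illustrates the conversion; the helper name to_hex_literal is hypothetical):

```python
def to_hex_literal(data: bytes) -> str:
    """Format raw bytes as a hex literal of the form 0x... (two hex digits per byte)."""
    return "0x" + data.hex().upper()


buffer = b"hello world"  # stands in for the encoded buffer from the question
statement = f"insert into xx values ({to_hex_literal(buffer)});"
```

Because every byte, including NUL, becomes two hex digits, nothing in the statement text can be mistaken for a string terminator. Where the client library supports it, binding the buffer as a binary parameter is usually a cleaner alternative to building the statement text by hand.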
What size is your varchar declared for the table?
Often varchar fields are set to 255 bytes, not characters. Starting with MySQL 5.0.3 you can have longer varchar fields.
Sounds like you need a varchar(512) field, is that what you have?
See http://dev.mysql.com/doc/refman/5.0/en/char.html