ADF: replace special characters

I am trying to pass the following string to Azure Functions without single quotes, double quotes, or newline characters. I tried to use replace in ADF to remove the single quotes and newline characters as shown below, but it didn't work. Please help me replace all the special characters (', ", \n).
@replace(activity('CPY_ACTIVITY').output.errors[0].Message,''','')
@replace(activity('CPY_ACTIVITY').output.errors[0].Message,'\n','')
Below is the error raised by the Copy activity when loading into Snowflake. I am trying to pass this error to Azure Functions without the special characters.
ErrorCode=UserErrorOdbcOperationFailed,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ERROR [22007] Date 'SUNDAY' is not recognized\n File 'sample_file.csv', line 2, character 14\n Row 1, column sample_table[DAY_DATE:2]\n If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.,
Source=Microsoft.DataTransfer.Runtime.GenericOdbcConnectors,'
'Type=System.Data.Odbc.OdbcException,Message=ERROR [22007] Date 'SUNDAY' is not recognized\n File 'sample_file.csv', line 2, character 14\n Row 1, column sample_table[DAY_DATE:2]\n If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client.,
Source=Snowflake,'

Replace the characters using the replace() function as shown below.
[Screenshot: output before replacing]
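To replace single quotes, use ''''. Inside an ADF expression string literal, a single quote is escaped by doubling it, so '''' is a string containing exactly one single quote:
@replace(string(activity('Copy data1').output.errors[0].Message),'''','')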
[Screenshot: output after replacing single quotes]
To replace \n in the error output, type an actual line break between the quotes (press Enter in the expression editor) instead of typing \n, or add \n in the JSON code:
@replace(string(activity('Copy data1').output.errors[0].Message),'
','')
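Alternatively, include '\n' in the activity's JSON code to replace the (\n) line break. JSON decodes \n inside a string value as a real line-break character, so the expression ends up containing one. The JSON looks like this:
"typeProperties": {
    "variableName": "error_code",
    "value": {
        "value": "@replace(string(activity('Copy data1').output.errors[0].Message),'\n','')",
        "type": "Expression"
    }
}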
To replace double quotes ("):
@replace(string(activity('Copy data1').output.errors[0].Message),'"','')
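To check what the Azure Function should end up receiving, the Python sketch below (not ADF) applies the same three removals in one pass. The function name strip_special_chars and the shortened sample message are illustrative assumptions; inside the pipeline itself you would still use the replace() expressions above.

```python
def strip_special_chars(message: str, chars: str = "'\"\n") -> str:
    """Delete every occurrence of the given characters.

    The default set is the single quote, the double quote and the line feed,
    the same three characters removed by the replace() expressions above.
    A translation table removes all of them in a single pass.
    """
    return message.translate(str.maketrans("", "", chars))


# Hypothetical sample, shortened from the Snowflake error in the question.
sample = "Date 'SUNDAY' is not recognized\n File 'sample_file.csv', line 2, character 14"
print(strip_special_chars(sample))
# Date SUNDAY is not recognized File sample_file.csv, line 2, character 14
```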

Related

Remove newline after incorrect field splitting in csv file

I use Linux, and I'm trying to use sed for this. I download a CSV file with data to analyze from an institutional website. There are several thousand lines per CSV and many columns per row (I haven't counted them, but I don't think the exact number matters). The fields are quoted and separated by semicolons, so the format of each line is:
"Field 1";"Field 2";"Field 3"; .... ;"Field X";
Each correct line ends with a semicolon followed by '\n'. The problem is that, from time to time, a field incorrectly contains a newline; the fix is to delete that newline character so the two lines are joined back into one. Example of an incorrect line:
"Field 1";"Field 2";"Fi
eld 3";"Field X";
I've found that the \n can come right after the opening quote or somewhere between the quotes.
I've found a way to handle the first case, where the newline comes right after the quote:
sed ':a;N;$!ba;s/";"\n/";"/g' file.csv
but not for the other case, "any number of letters after the quote, not ending in a semicolon". I have a pattern file (to be used with -f) with these lines:
:a;N;$!ba;s/";"\n/";"/g
:a;N;$!ba;s/\([A-z]\)\n/\1/g
:a;N;$!ba;s/\([:alpha:]\)\n/\1/g
The first line of the pattern file works, but I've tried combinations of the second and third and I always get an empty file.
If the current line doesn't end with a semicolon, read the next line, append it to the pattern space, and remove the line break:
sed '/[^;]$/{N;s/\n//}' file
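The one-liner above appends only one following line per record, so a field broken across three or more lines would still come out split. A minimal Python sketch of the same rule that keeps appending until the record ends with a semicolon; it assumes, as the question states, that every complete record ends with ";", and the function name join_broken_records is hypothetical.

```python
def join_broken_records(lines):
    """Join physical lines until the accumulated text ends with ';'."""
    records = []
    buffer = ""
    for line in lines:
        buffer += line.rstrip("\r\n")  # drop the (possibly stray) line break
        if buffer.endswith(";"):       # a complete record ends with a semicolon
            records.append(buffer)
            buffer = ""
    if buffer:  # keep an unterminated last record instead of silently losing it
        records.append(buffer)
    return records


broken = ['"Field 1";"Field 2";"Fi\n', 'eld 3";"Field X";\n']
print(join_broken_records(broken))
# ['"Field 1";"Field 2";"Field 3";"Field X";']
```

Any iterable of lines works, including an open file object. Blank lines contribute no text and are therefore dropped.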

Remove leading 0 in String with letters and digits

I have a comma-separated file in which I need to remove the leading zeroes from the string in the first column. The text file looks like this:
ABC-0001,ab,0001
ABC-0010,bc,0010
I need the data to look like this:
ABC-1,ab,0001
ABC-10,bc,0010
I tried a command-line replace as below:
sed 's/ABC-0*[1-9]/ABC-[1-9]/g' file
I ended up with this output:
ABC-[1-9],ab,0001
ABC-[1-9]0,bc,0010
Can you please tell me what I am missing here?
Alternatively, I also tried to apply the formatting in the SQL that generates this file, as below:
select regexp_replace(key,'((0+)|1-9|0+)','(1-9|0+)') from file where key in ('ABC-0001','ABC-0010')
which gives this output:
ABC-(1-9|0+)1
ABC-(1-9|0+)1(1-9|0+)
Help with either solution would be much appreciated!
Try this:
sed -E 's/ABC-0*([1-9])/ABC-\1/g' file
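Here ([1-9]) is a capturing group, and \1 in the replacement inserts the digit it captured. In your version, [1-9] on the replacement side is just literal text, which is why it appeared in the output.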
To do it in the query in Oracle, assuming the key value with the zeroes you want to remove is in a column called "key" in a table called "file", it would look like this:
select regexp_replace(key, '(-)(0+)(.*)', '\1\3')
from file;
You need to capture the dash because the regex "consumes" it when it matches. The dash is followed by the second group, one or more 0s, which is followed by the rest of the field. Replace the match with captured groups 1 and 3, leaving out the 0s in between.
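The same capture-and-drop idea can be tried outside the database. A Python sketch using the pattern from the Oracle answer with the re module, applied only to the first comma-separated column so that the trailing 0001/0010 values stay untouched. Like the Oracle pattern, it would turn an all-zero key such as ABC-0000 into ABC-; the function names are hypothetical.

```python
import re

# Group 1: the dash, group 2: the run of zeros, group 3: the rest of the field.
LEADING_ZEROS = re.compile(r"(-)(0+)(.*)")


def strip_leading_zeros(key: str) -> str:
    # Keep groups 1 and 3 and drop group 2, like '\1\3' in regexp_replace.
    return LEADING_ZEROS.sub(r"\1\3", key, count=1)


def fix_line(line: str) -> str:
    # Only the first column is rewritten; the rest of the line is copied as-is.
    first, sep, rest = line.partition(",")
    return strip_leading_zeros(first) + sep + rest


for line in ["ABC-0001,ab,0001", "ABC-0010,bc,0010"]:
    print(fix_line(line))
# ABC-1,ab,0001
# ABC-10,bc,0010
```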

Line breaking issue to move csv file in Linux

I moved the CSV file to a Linux system in binary mode. The content of one field, the comments section, is split across multiple lines. I need to remove the newlines while keeping the same format. Please help with a shell or Perl command.
Here is an example of three records as they actually look:
[Screenshot: original content of the file]
After moving the file to Linux, the comments field is split into 4 lines. I want to keep the comments field in the same format, but without the newline characters:
"First line
Second line
Third line
all lines format should not change"
As I said in my comment above, the specs are not clear, but I suspect this is what you are trying to do. Here's a way to load data into Oracle with sqlldr when a field is enclosed in double quotes and contains linefeeds, and each record ends with a carriage return/linefeed pair. This can happen, for example, when the data comes from an Excel spreadsheet saved as .csv whose cells contain linefeeds.
Here's the data file as exported by Excel as .csv and viewed in gvim with control characters shown. The linefeeds appear as '$' and the carriage returns as '^M':
100,test1,"1line1$
1line2$
1line3"^M$
200,test2,"2line1$
2line2$
2line3"^M$
Construct the control file like this, using the "str" clause on the infile line to set the end-of-record character. It tells sqlldr that hex 0D (carriage return, or ^M) is the record separator, so the linefeeds inside the double quotes are not treated as record ends:
LOAD DATA
infile "test.dat" "str x'0D'"
TRUNCATE
INTO TABLE test
replace
fields terminated by ","
optionally enclosed by '"'
(
cola char,
colb char,
colc char
)
After loading, the data looks like this, with the linefeeds in the comment field (which I called colc) preserved:
SQL> select *
2 from test;
COLA                 COLB                 COLC
-------------------- -------------------- --------------------
100                  test1                1line1
                                          1line2
                                          1line3
200                  test2                2line1
                                          2line2
                                          2line3
SQL>
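If the goal is to remove the embedded line breaks rather than load them into Oracle, a quote-aware CSV parser can do it. The Python sketch below is an alternative to sqlldr, not part of the answer above; it relies on csv.reader keeping line breaks that occur inside double quotes as part of the field. The function name and the choice to replace each break with a space are assumptions.

```python
import csv
import io


def flatten_multiline_fields(text: str, replacement: str = " ") -> list:
    """Parse CSV text and replace line breaks that occur inside quoted fields."""
    # csv.reader keeps reading past a line break while it is inside double quotes,
    # so each quoted multi-line comment comes back as a single field.
    reader = csv.reader(io.StringIO(text, newline=""))
    return [
        [field.replace("\r\n", replacement).replace("\n", replacement) for field in row]
        for row in reader
    ]


# The same two records as the gvim view above: LF inside quotes, CR/LF at record end.
data = '100,test1,"1line1\n1line2\n1line3"\r\n200,test2,"2line1\n2line2\n2line3"\r\n'
for row in flatten_multiline_fields(data):
    print(row)
# ['100', 'test1', '1line1 1line2 1line3']
# ['200', 'test2', '2line1 2line2 2line3']
```

To write the cleaned rows back to a file, the returned rows can be passed to csv.writer.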

Hive Split function to split a variable on \n

Task: I want to split a column called "website" in a Hive table to get all the websites, which are delimited by a space or by \n.
Issue: When I use either of the following queries:
SELECT website,split(website, '[\\s]') as websites FROM temp_pages
SELECT website,split(website, '[\\s, \\n]') as websites FROM temp_pages
I am unable to achieve the desired results.
Here are the results I get.
Expected output (delimited by a space):
Input: http://www.insync4all.com http://www.insync4all.nl
Output: ["http://www.insync4all.com","http://www.insync4all.nl"]
Unexpected output (delimited by \n):
When there is a \n character, instead of splitting the websites on it, the output contains \\n:
Input: www.imtherealthing.com\nwww.childmodelmagazine.com
Output: ["www.imtherealthing.com\\nwww.childmodelmagazine.com"]
Can someone help me split the website field on \n? It would also be good to understand what is going wrong in the \n case.
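A likely explanation, judging from the output: '[\\s]' already matches a real line feed, so if the value contained one, the first query would have split on it. The doubled backslash in the result suggests that the stored value contains the two characters backslash and n, which no whitespace pattern matches. In Hive the pattern would then need to match a literal backslash, for example split(website, '\\s|\\\\n'), but that is an untested assumption. The Python sketch below only illustrates the difference between the two cases; split_websites is a hypothetical name.

```python
import re

# Split on whitespace OR on a literal backslash followed by "n" (two characters).
SPLIT_PATTERN = re.compile(r"\s+|\\n")


def split_websites(value: str) -> list:
    # Drop empty strings produced by leading or trailing separators.
    return [part for part in SPLIT_PATTERN.split(value) if part]


real_newline = "www.imtherealthing.com\nwww.childmodelmagazine.com"
literal_backslash_n = r"www.imtherealthing.com\nwww.childmodelmagazine.com"

print(re.split(r"\s", real_newline))         # whitespace split works on a real line feed
print(re.split(r"\s", literal_backslash_n))  # one element: there is no whitespace to split on
print(split_websites(literal_backslash_n))   # splits on the two-character sequence
```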

Stata: removing line feed control characters

I have a dataset that I export to a CSV file with the outsheet command. Some rows break across lines at a certain place. Using a hex editor, I found the line-feed control character "0a" in the record. In Stata, the value of the variable that produces the line break visually shows only 5 characters, but if I count the characters:
gen xlen = length(x)
I get 6. I could write a Perl program to get rid of this problem, but I would prefer to remove the control characters in Stata before exporting (for example, using regexr()). Does anyone know how to remove the control characters?
The char() function returns the character with a given ASCII code; line feed (hex 0a) is char(10). So you can delete such characters by replacing them with empty strings:
replace x = subinstr(x, char(10), "", .)
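For comparison, the same cleanup as it might look if the data were exported from Python rather than Stata: the sketch strips line feeds (char(10) in Stata, chr(10) in Python) from every string field before writing CSV. It also removes carriage returns, chr(13), which goes beyond the question and is an assumption; the function names are hypothetical.

```python
import csv
import io

LINE_BREAK_CHARS = (chr(10), chr(13))  # line feed (hex 0a) and carriage return (hex 0d)


def remove_line_breaks(value):
    """Return value with LF/CR removed; non-string values are returned unchanged."""
    if isinstance(value, str):
        for ch in LINE_BREAK_CHARS:
            value = value.replace(ch, "")
    return value


def export_rows(rows):
    """Write rows as CSV text after cleaning every field (like subinstr before outsheet)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([remove_line_breaks(v) for v in row])
    return out.getvalue()


x = "abc\nde"
print(len(x), len(remove_line_breaks(x)))  # 6 5
print(export_rows([["id", "x"], [1, x]]), end="")
```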