Replace n occurrences of a character in a string from the end - regex

I am struggling to come up with a solution to replace n occurrences of a character with another character in a string, starting from the end of the string. For example, if I want to replace the last 5 occurrences of "," with "|" in a string like
abc, def,,{"data":{"xyz":null,"uan":"5643df"},{"path":"/abc/def/xyz"}},546,453,,,
to get a result like
abc, def,,{"data":{"xyz":null,"uan":"5643df"},{"path":"/abc/def/xyz"}}|546|453|||
I have looked at multiple solutions that find the last occurrence, all occurrences, or the first 5 occurrences from the beginning, but nothing that does it from the end of the string. Reversing the string, doing it from the beginning, and then reversing the string again is not an option because of the sheer size of the file.

With GNU sed. Apply the substitution s/,([^,]*)$/|\1/ (the last comma plus the rest of the line becomes a pipe plus the rest of the line) five times:
echo 'a,b,c,d,e,f,g,h' | sed -r 's/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/;'
Output:
a,b,c|d|e|f|g|h

An awk version:
echo 'a,b,c,d,e,f,g,h' | awk -F, '{printf "%s",$1;for(i=2;i<=NF;i++) printf (NF-5<i?"|%s":",%s"),$i;print ""}'
a,b,c|d|e|f|g|h
It uses a loop to print each field, counting up to decide whether the field is preceded by , or |. Change the number to get a different result.
Example for the last two fields:
echo 'a,b,c,d,e,f,g,h' | awk -F, '{printf "%s",$1;for(i=2;i<=NF;i++) printf (NF-2<i?"|%s":",%s"),$i;print ""}'
a,b,c,d,e,f|g|h
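The awk loop above can be wrapped in a small shell function so that the count is a parameter instead of a hard-coded 5. This is only a sketch: the name `replace_last_n` is hypothetical, and it assumes bash (for `local`) and that `,` and `|` are the characters involved. Like the one-liner, it streams the input line by line, so nothing needs to be reversed or loaded whole.

```bash
# Hypothetical helper: replace the last N commas on each line with pipes.
# Usage: replace_last_n N [FILE...]   (reads stdin when no FILE is given)
replace_last_n() {
    local n=$1
    shift
    awk -F, -v n="$n" '{
        out = $1
        for (i = 2; i <= NF; i++) {
            # the separator before field i is one of the last n commas
            # exactly when i > NF - n
            out = out ((i > NF - n) ? "|" : ",") $i
        }
        print out
    }' "$@"
}
```

For example, `replace_last_n 5 file` behaves like the one-liner above, and `replace_last_n 2 file` matches the last-two example.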

This might work for you (GNU sed):
sed -E '/(,[^,]*){5}$/{s//\n&/;h;y/,/|/;H;g;s/\n.*\n//}' file
Insert a newline just before the fifth comma from the end of the line, make a copy, replace all commas with pipes, append the result to the copy, and remove everything between the first and last newlines.
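The hold-space trick above can be generalized to any count by building the interval `{N}` from a shell variable. A sketch assuming GNU sed (for `-E` and `\n` in the replacement) and bash; `replace_last_n_sed` is a hypothetical name. As in the original, lines with fewer than N commas are left untouched.

```bash
# Hypothetical helper: replace_last_n_sed N [FILE...]
replace_last_n_sed() {
    local n=$1
    shift
    # \$ keeps a literal $ (end-of-line anchor) inside the double quotes;
    # s//.../ reuses the address regex, so the newline lands just before
    # the Nth comma from the end of the line
    sed -E "/(,[^,]*){$n}\$/{s//\n&/;h;y/,/|/;H;g;s/\n.*\n//}" "$@"
}
```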
An alternative using GNU parallel and sed:
parallel -n0 -q echo 's/\(.*\),/\1|/' ::: {1..5} | sed -f - file
N.B. The first solution only amends a line if it has at least 5 commas, whereas the second solution amends a line regardless of how many commas it has.
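If GNU parallel is not available, the same idea (one `s/\(.*\),/\1|/` per occurrence) can be expressed with repeated `-e` options collected in a bash array. A minimal sketch; `replace_last_n_script` is a hypothetical name. Like the parallel version, it amends a line regardless of how many commas it has.

```bash
# Hypothetical helper: replace_last_n_script N [FILE...]
replace_last_n_script() {
    local n=$1 i
    local -a script=()
    shift
    # nothing to replace: pass the input through unchanged
    if (( n <= 0 )); then
        cat -- "$@"
        return
    fi
    # each greedy s/\(.*\),/\1|/ turns the current last comma into a pipe
    for ((i = 0; i < n; i++)); do
        script+=(-e 's/\(.*\),/\1|/')
    done
    sed "${script[@]}" "$@"
}
```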

Related

Sed handle patterns over multiple lines

For example, if I have a file like
yo#gmail.com, yo#
gmail.com yo#gmail
.com
And I want to replace the string yo#gmail.com.
If the file had the target string on a single line, then we could just use
sed "s/yo#gmail.com/e#email.com/g" file
So, is it possible to catch patterns that are spread across multiple lines without replacing the \n?
Something like this:
e#email.com, e#
email.com e#email
.com
Thank you.
You can do this (note that tr removes every newline, so the output is joined into a single line):
tr -d '\n' < file | sed 's/yo#gmail.com/e#email.com/g'
This might work for you (GNU sed):
sed -E 'N;s/yo#gmail\.com/e#email.com/g
h;s/(\S+)\n(\S+)/\1\2\n\1/;/yo#gmail\.com/!{g;P;D}
s//\ne#email.com/;:a;s/\n(\S)(\S+)\n\S/\1\n\2\n/;ta
s/(.*\n.*)\n/\1/;P;D' file
Append the following line to the pattern space.
Replace all occurrences of matching email address in both the first and second lines.
Make a copy of the pattern space.
Concatenate the last word of the first line with the first word of the second and keep the second line as is. If there is no match with the email address, revert the line, print/delete the first line and repeat.
Otherwise, replace the match and re-insert the newline according to the length of the first word of the second line (deleting the first word of the second line too).
Remove the newline used for scaffolding, print/delete the first line and repeat.
N.B. The lines will not be the same length as the originals if the replacement string's length differs from the matched string's length. Also, no attempt has been made to split the replacement string at the same relative position when the match and replacement strings differ in length.
Alternative:
echo $'yo#gmail.com\ne#email.com' |
sed -E 'N;s#(.*)\n(.*)#s/\\n\1/\\n\2/g#
:a;\#\\n([^/])(.*)\\n(.)?(.*/g)#{s//\1\\n\2\3\\n\4/;H;ba}
x;s/.//;s#\\n/g$#/g#gm;s#\\n/#/#;s/\./\\./g' |
sed -e 'N' -f - -e 'P;D' file
or:
echo 's/yo#gmail.com/e#email.com/' |
sed -E 'h;s#/#/\\n#g;:a;H;s/\\n([^/])/\1\\n/g;ta;x;s/\\n$//mg;s/\./\\./g' |
sed -zf - file
N.B. With the last alternative solution, the last sed invocation can be swapped for the first alternative solution's last sed invocation.

Skipping a part of a line using sed

I have a file with content like this: #1: 00001109
Each line is of the same format. I want the final output to be #1: 00 00 11 09.
I used the sed command sed 's/.\{2\}/& /g' to introduce a space every 2 characters, but that also puts spaces in the part before the colon, which I want to avoid. Can anyone advise how to proceed?
Could you please try the following, written and tested with the shown samples.
awk '{gsub(/../,"& ",$2);sub(/ +$/,"")} 1' Input_file
Explanation: first, gsub replaces every pair of characters in the 2nd field with the same pair followed by a space. Then a single sub removes the trailing space so that lines do not end with one.
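A variant of the awk answer above that builds the 2nd field chunk by chunk, so no trailing space is produced and no cleanup `sub()` is needed. This is a sketch; `group_field2` and its chunk-size argument are hypothetical. As with the original (which also modifies `$2`), reassigning a field makes awk rebuild the line with single spaces between fields.

```bash
# Hypothetical helper: group_field2 N [FILE...]
# Rebuilds the 2nd field as space-separated chunks of N characters.
group_field2() {
    local n=$1
    shift
    awk -v n="$n" '{
        out = ""
        for (i = 1; i <= length($2); i += n) {
            out = out (i > 1 ? " " : "") substr($2, i, n)
        }
        $2 = out
        print
    }' "$@"
}
```

`group_field2 2 Input_file` gives the requested `#1: 00 00 11 09`.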
With sed:
sed -E 's/[0-9]{2}/& /g;s/ +$//' Input_file
Explanation: globally substitute each pair of digits with the same pair followed by a space, then remove the trailing space left at the end of the line by the previous substitution.
This might work for you (GNU sed):
sed 's/[0-9][0-9]\B/& /g' file
After a pair of digits within a word, insert a space.
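The asker's core problem, applying the spacing only after the colon, can also be handled with a sed loop that anchors on `: `, so digits before the colon are skipped even if the prefix contains several of them. A sketch assuming GNU sed (for the one-line `:a;...;ta` label syntax); `group_after_colon` is a hypothetical name.

```bash
# Hypothetical helper: group_after_colon [FILE...]   (assumes GNU sed)
# Each pass inserts a space after the first digit pair following ": " that
# is still directly followed by another digit; "ta" repeats the pass until
# no substitution happens. Text before ": " is never touched.
group_after_colon() {
    sed -E ':a; s/(: ([0-9]{2} )*[0-9]{2})([0-9])/\1 \3/; ta' "$@"
}
```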
If perl happens to be an option for you, how about:
perl -pe '1 while s/(\d+)(\d\d)/$1 $2/g' file
You can use pure bash:
while read -r first last; do
    out=$first
    for ((i = 0; i < ${#last}; i += 2)); do
        out+=" ${last:i:2}"
    done
    printf '%s\n' "$out"
done < your_file.txt

Sed Match Number followed by string and return Number

Hi, I have a file containing the following:
7 Y-N2
8 Y-H
9 Y-O2
I want to match it with the following sed command and get the number at the beginning of the line:
abc=$(sed -n -E "s/([0-9]*)(^[a-zA-Z])($j)/\1/g" file)
$j is a variable and contains exactly Y-O2 or Y-H.
The number is not the line number.
The number is always followed by a letter.
There is whitespace before the number.
Echoing $abc prints an empty line.
Thanks
Many problems here:
there are spaces, you don't account for them
the ^ outside the brackets is an anchor (start of line), not part of the character class; since $j already starts with the letter, no extra class is needed
you're using the -n option, so you must use the p command or nothing will ever be printed (and the g option is useless here)
working command (I have changed -E to -r because -E was unsupported by my sed version; both should work):
sed -nr "s/^ *([0-9]+) +$j\$/\1/p" file
Note: awk seems more suited for the job. Ex:
awk -v j="$j" '$2 == j { print $1 }' file
Sed seems to be overly complex for this task, but with awk you can write:
awk -vk="$var" '$2==k{print $1}' file
With -vk="$var" we set the awk variable k to the value of the $var shell variable.
Then, we use the 'filter{command}' syntax, where the filter $2==k is that the second field is equal to the variable k. If we have a match, we print the first field with {print $1}.
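The awk filter can be wrapped in a function so that the result drops straight into a variable, as in the question's abc=$(...). `number_for` is a hypothetical name, and the sketch assumes bash (for `"${@:2}"`). The leading whitespace needs no special handling because awk's default field splitting ignores it.

```bash
# Hypothetical helper: number_for LABEL [FILE...]
# Prints the 1st field of every line whose 2nd field equals LABEL.
number_for() {
    awk -v k="$1" '$2 == k { print $1 }' "${@:2}"
}
```

Usage: `abc=$(number_for "$j" file)`.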
Try this:
abc=$(sed -n "s/^ *\([0-9]*\) *Y-[OH]2*.*/\1/p" file)
Explanations:
^ *: in lines starting with any number of spaces
\([0-9]*\): the following number is captured in a group (referenced later as \1)
 *: followed by any number of spaces
Y-[OH]2*: the string Y- followed by O or H, then zero or more 2s
\1/p: captured string \1 is output with p command

Regexp to catch the string between the first and second comma, where there's an alphabetical character in the number

First, I must mention my native language is French, so I may make English mistakes!
I am trying to use sed to catch and delete the lines where the second item in a CSV file contains characters other than digits.
Here is an example of an OK line:
2323421,9781550431209,,2012-07-24 13:30:57,False,2012-07-01 00:00:00,False,118,,1,246501
A line that must be deleted:
1901461,3002CAN,,2010-09-29 13:46:59,True,,True,,,,
or
2977837,9782/76132396,,2015-04-27 10:14:47,True,2015-04-26 00:00:00,True,,,,
etc...
I'm not sure this is possible, to be honest!
Thank you!
Here it is using sed
sed -e '/^[^,]*,[^,]*[^0-9,]/d'
A breakdown of the pattern:
^ Start of line
[^,]*, Everything up to the first comma inclusive
[^,]* Everything which isn't a comma
[^0-9,] A character that is neither a digit nor a comma
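To check the pattern against the three sample lines from the question, here is a quick sketch (the function name `keep_numeric_field2` is hypothetical). Only the first sample, whose 2nd field is all digits, should be printed. A line with an empty 2nd field would also survive, since there is no character for `[^0-9,]` to match.

```bash
# Hypothetical helper: delete lines whose 2nd CSV field contains anything
# other than digits.
keep_numeric_field2() {
    sed -e '/^[^,]*,[^,]*[^0-9,]/d' "$@"
}

printf '%s\n' \
    '2323421,9781550431209,,2012-07-24 13:30:57,False,2012-07-01 00:00:00,False,118,,1,246501' \
    '1901461,3002CAN,,2010-09-29 13:46:59,True,,True,,,,' \
    '2977837,9782/76132396,,2015-04-27 10:14:47,True,2015-04-26 00:00:00,True,,,,' |
    keep_numeric_field2
```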
Using awk you can do this:
awk -F, '$2 ~ /^[[:digit:]]+$/' file
Or (thanks to @ghoti):
awk -F, '$2 !~ /[^[:digit:]]/' file
to print only those lines where the 2nd column is an integer.
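The two awk filters are not quite equivalent: `$2 ~ /^[[:digit:]]+$/` requires at least one digit, whereas `$2 !~ /[^[:digit:]]/` is also true for an empty 2nd field. A small check, using made-up sample lines for illustration:

```bash
# made-up sample: a numeric, an empty, and a non-numeric 2nd field
sample=$(printf '%s\n' '1,42,x' '2,,y' '3,4a,z')

printf '%s\n' "$sample" | awk -F, '$2 ~ /^[[:digit:]]+$/'   # prints: 1,42,x
printf '%s\n' "$sample" | awk -F, '$2 !~ /[^[:digit:]]/'    # prints: 1,42,x and 2,,y
```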
Or using sed you can do:
sed -i.bak '/^[^,]*,[[:digit:]]*[^,[:digit:]]/d' file
Perl:
perl -F, -lane 'print if $F[1] =~ /^\d+$/' file
-a autosplits the line into the array @F (fields start at 0)
-F, splits the line on commas
print the line only if field 1 (the 2nd column) contains only digits: /^\d+$/

How to use sed to replace the first space with an empty string

I am having trouble loading a space delimited text file into a table. The data in this text file is generated by teragen, and hence, is just dummy data, where there are only 2 columns, and the first column has values of random special character strings.
Example:
~~~{ZRGHS|
~~~{qahVN)
I run into a problem and get rejected rows because some of these values contain a space as one of their random ASCII characters, which makes the loader think there are 3 columns when my table has 2, so they get rejected.
So what I want to do is remove only the first space from these rejected rows (which will need to be repeated multiple times over each row) and then try to reload them. Would sed be the best way to go about this, or would something else like tr be more appropriate?
Thanks!
From what I understand, you want to remove all spaces except the last one.
You can build a regex for that, or you can use the fact that it's very easy to keep only the first n occurrences, and reverse the line around it:
$ echo 'one two three four' | rev | sed 's/ //2g' | rev
onetwothree four
or, with a file:
rev myfile | sed 's/ //2g' | rev
Or you could remove one space at a time until only one space is left:
$ echo 'one two three four' | sed ':a;/ .* /{s/ //;ba}'
onetwothree four
with a file:
sed ':a;/ .* /{s/ //;ba}' myfile
Or, if you're in the mood, you can split the line, play with it, and assemble it back (GNU sed assumed):
$ echo 'one two three four' | sed -r 's/(.*)([^ ]+) ([^ ]+)$/\1\n\2 \3/;h;s/\n.*//;s/ //g;G;s/\n.*\n//'
onetwothree four
with a file:
sed -r 's/(.*)([^ ]+) ([^ ]+)$/\1\n\2 \3/;h;s/\n.*//;s/ //g;G;s/\n.*\n//' myfile
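The same "keep only the last space" result can also be had without sed, using bash parameter expansion. This swaps in a different technique from the answer above; `squeeze_but_last_space` is a hypothetical name, and it handles one line per call.

```bash
# Hypothetical helper: remove every space except the last one.
squeeze_but_last_space() {
    local line=$1 head tail
    case $line in
        *' '*) ;;                               # has at least one space
        *) printf '%s\n' "$line"; return ;;     # nothing to do
    esac
    head=${line% *}     # everything before the last space
    tail=${line##* }    # everything after the last space
    printf '%s %s\n' "${head// /}" "$tail"
}
```

With a file: `while IFS= read -r line; do squeeze_but_last_space "$line"; done < myfile`.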
To remove the first space from a line, use
echo "my line with spaces" | sed 's/ //'
Depending on the specifics of your approach (fixed column length? how are you adding the data?) there might be a better way to do this in a single step instead of parsing rejected rows over and over.
To strip the first character from a string:
function stringStripStart {
    printf '%s\n' "${1:1}"
}
Similarly, to remove the trailing character:
function stringStripEnd {
    local final_len=$(( ${#1} - 1 ))
    printf '%s\n' "${1:0:final_len}"
}
Note: for an empty string, an additional check needs to be added.
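A sketch of a variant that needs no empty-string check: the POSIX parameter expansions `${1#?}` (drop one leading character) and `${1%?}` (drop one trailing character) simply leave an empty string empty, because `?` has nothing to match. The function names are hypothetical.

```bash
# Hypothetical POSIX-sh variants of stringStripStart / stringStripEnd.
string_strip_start() { printf '%s\n' "${1#?}"; }   # "abc" -> "bc", "" -> ""
string_strip_end()   { printf '%s\n' "${1%?}"; }   # "abc" -> "ab", "" -> ""
```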