Skipping a part of a line using sed - regex

I have a file with content like so - #1: 00001109
Each line is of the same format. I want the final output to be #1: 00 00 11 09.
I used this sed command to insert a space after every 2 characters: sed 's/.\{2\}/& /g'. But that also adds spaces in the part before the colon, which I want to avoid. Can anyone advise how to proceed?

Could you please try the following, written and tested with the shown samples.
awk '{gsub(/../,"& ",$2);sub(/ +$/,"")} 1' Input_file
Explanation: gsub globally replaces each pair of characters in the second field with the same pair followed by a space. A single sub then removes the trailing space that this leaves at the end of the line.
With sed:
sed -E 's/[0-9]{2}/& /g;s/ +$//' Input_file
Explanation: Globally replace each pair of digits with the same pair followed by a space, then remove the trailing space (added by the first substitution) from the end of the line.
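Note that [0-9]{2} matches any two adjacent digits, so a label such as #10: would be split as well. A minimal sketch of one way around that, assuming every line has the form "label: digits" (space_pairs is a hypothetical helper name, not part of the answer above): anchor each substitution on the colon and loop until no pair is followed by another digit.

```shell
# Hypothetical helper: space out digit pairs only after the ": " delimiter.
# Each pass inserts one space after the leftmost pair that is still
# directly followed by a digit; "ta" repeats while a substitution succeeded.
space_pairs() {
  sed -e ':a' \
      -e 's/\(: \([0-9][0-9] \)*[0-9][0-9]\)\([0-9]\)/\1 \3/' \
      -e 'ta'
}

printf '%s\n' '#1: 00001109' '#10: 00001109' | space_pairs
# #1: 00 00 11 09
# #10: 00 00 11 09
```

Because the loop never inserts a space after the final pair, no trailing-space cleanup is needed.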

This might work for you (GNU sed):
sed 's/[0-9][0-9]\B/& /g' file
After a pair of digits within a word, insert a space.

If perl is an option, how about:
perl -pe '1 while s/(\d+)(\d\d)/$1 $2/g' file

You can use pure bash:
# Read the label and the digit string from each line
while read -r first last; do
out=$first
# Append the digits two at a time, each pair preceded by a space
for ((i = 0; i < ${#last}; i += 2)); do
out+=" ${last:i:2}"
done
printf '%s\n' "$out"
done < your_file.txt

Related

Replace n occurrences of a character in a string from the end

I am struggling to come up with a solution to replace n occurrences of a character with another character in a string starting from the end of the string. For example, if I want to replace last 5 occurrences of "," with "|" in a string like
abc, def,,{"data":{"xyz":null,"uan":"5643df"},{"path":"/abc/def/xyz"}},546,453,,,
to get a result like
abc, def,,{"data":{"xyz":null,"uan":"5643df"},{"path":"/abc/def/xyz"}}|546|453|||
I have looked at multiple solutions that help you find the last occurrence, all occurrences, or 5 occurrences from the beginning, but nothing that helps me do it from the end of the string. Reversing the string, doing it from the beginning, and then reversing the string again is not an option because of the sheer size of the file.
With GNU sed: five times in a row, replace the last comma (and the rest of the row after it) with a pipe followed by that same rest of the row (s/,([^,]*)$/|\1/):
echo 'a,b,c,d,e,f,g,h' | sed -r 's/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/; s/,([^,]*)$/|\1/;'
Output:
a,b,c|d|e|f|g|h
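If the count needs to vary, the repeated substitution can be generated rather than typed out. The following is only a sketch: replace_last_commas is a hypothetical helper name, and it writes the same substitution in basic regular expression syntax so that sed needs no -r.

```shell
# Hypothetical helper: replace the last N commas on each input line with pipes
# by repeating the "last comma" substitution N times.
replace_last_commas() {
  n=$1
  script=
  i=0
  while [ "$i" -lt "$n" ]; do
    # Same substitution as above, in BRE syntax: s/,\([^,]*\)$/|\1/
    script="${script}s/,\([^,]*\)\$/|\1/;"
    i=$((i + 1))
  done
  sed "$script"
}

echo 'a,b,c,d,e,f,g,h' | replace_last_commas 5
# a,b,c|d|e|f|g|h
```

Each pass leaves the already-converted pipes alone, because [^,]* happily matches across them.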
An awk version:
echo 'a,b,c,d,e,f,g,h' | awk -F, '{printf "%s",$1;for(i=2;i<=NF;i++) printf (NF-5<i?"|%s":",%s"),$i;print ""}'
a,b,c|d|e|f|g|h
It uses a loop to print each field, counting up to decide whether to join with , or |. The number can be changed to replace a different count of commas.
Example replacing only the last two commas:
echo 'a,b,c,d,e,f,g,h' | awk -F, '{printf "%s",$1;for(i=2;i<=NF;i++) printf (NF-2<i?"|%s":",%s"),$i;print ""}'
a,b,c,d,e,f|g|h
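The hard-coded number in the awk version can also be passed in as a variable. This is a sketch under the same assumptions as the answer (comma-separated input, one record per line); the function name replace_last_n and the awk variable n are made up for illustration.

```shell
# Hypothetical helper: join the last N fields with "|" and the rest with ",".
replace_last_n() {
  awk -F, -v n="$1" '{
    printf "%s", $1
    for (i = 2; i <= NF; i++)
      # Fields whose index is greater than NF - n get a pipe in front
      printf "%s%s", (NF - n < i ? "|" : ","), $i
    print ""
  }'
}

echo 'a,b,c,d,e,f,g,h' | replace_last_n 2
# a,b,c,d,e,f|g|h
```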
This might work for you (GNU sed):
sed -E '/(,[^,]*){5}$/{s//\n&/;h;y/,/|/;H;g;s/\n.*\n//}' file
Insert a newline just before the fifth comma from the end of a line, make a copy, replace all ,'s by |'s, append the current line to the copy and remove everything between the first and last newlines.
An alternative using GNU parallel and sed:
parallel -n0 -q echo 's/\(.*\),/\1|/' ::: {1..5} | sed -f - file
N.B. The first solution only amends a line if there are at least 5 commas, whereas the second solution amends a line regardless of how many commas there are.

A sed command to swap first and last character of each line

I want to write a one liner sed command to swap first and last character of every line of file. The below shown command is not working
sed 's/\(.\)\(.+\)\(.\)/\3\2\1/' input.txt
I even tried adding start of line and end of line characters
sed 's/^\(.\)\(.+\)\(.\)$/\3\2\1/' input.txt
It doesn't seem to match anything in the file.
sed -E 's/(.)(.+)(.)/\3\2\1/' input.txt
You need to escape the + (GNU sed):
sed 's/^\(.\)\(.\+\)\(.\)$/\3\2\1/' input.txt
If you like to try some other, here is a gnu awk version
awk '{a=$1;$1=$NF;$NF=a}1' FS= OFS= input.txt
This sets a to the first character, then sets the first character to the last one and the last one to a.
It needs GNU awk, since setting FS to the empty string is not defined in standard awk.
This works portably, using .* instead of .+:
echo abcd | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
prints
dbca
It also works with a two-character line such as ad:
echo ad | sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
prints
da
Not every sed understands .+; for example, it didn't work on OS X. Therefore I recommend using .* or simulating .+ with ..*, like
echo ad | sed 's/^\(.\)\(..*\)\(.\)$/\3\2\1/'
prints
ad # not swapped
echo 'are' | sed 's/\(.\)\(.*\)\(.\)/\3\2\1/'
No need for ^ or $, because sed takes the longest possible match by default (so the whole line).
Use * instead of +, because with + a line needs at least 3 characters to match, whereas a 2-character line should still have its start and end swapped.
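To make the edge cases concrete, here is a small sketch of the portable .* form wrapped in a helper (swap_ends is a hypothetical name) and run on lines of four, two, and one characters.

```shell
# Hypothetical helper: swap the first and last character of every line.
# Uses only basic regular expression syntax, so no -E or \+ is required.
swap_ends() {
  sed 's/^\(.\)\(.*\)\(.\)$/\3\2\1/'
}

printf '%s\n' abcd ad x | swap_ends
# dbca
# da
# x
```

A one-character line is left untouched, because the pattern needs at least two characters to match.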

pipe sed command to create multiple files

I need to get X to Y in the file with multiple occurrences, each time it matches an occurrence it will save to a file.
Here is an example file (demo.txt):
\x00START how are you? END\x00
\x00START good thanks END\x00
sometimes random things\x00\x00 inbetween it (ignore this text)
\x00START thats nice END\x00
And now after running a command each file (/folder/demo1.txt, /folder/demo2.txt, etc) should have the contents between \x00START and END\x00 (\x00 is null) in addition to 'START' but not 'END'.
/folder/demo1.txt should say "START how are you? ", /folder/demo2.txt should say "START good thanks".
So basically it should pipe "how are you?", and using 'echo' I can prepend the 'START'.
It's worth keeping in mind that I am dealing with a very large binary file.
I am currently using
sed -n -e '/\x00START/,/END\x00/ p' demo.txt > demo1.txt
but that's not working as expected (it's getting lines before the '\x00START' and doesn't stop at the first 'END\x00').
If you have GNU awk, try:
awk -v RS='\0START|END\0' '
length($0) {printf "START%s\n", $0 > ("folder/demo"++i".txt")}
' demo.txt
RS='\0START|END\0' defines a regular expression acting as the [input] Record Separator, which breaks the input file into records made up of the strings (byte sequences) between \0START and END\0 (\0 represents NUL (null char.) here).
Using a multi-character, regex-based record separator is NOT POSIX-compliant; GNU awk supports it (as does mawk in general, but seemingly not with NUL chars.).
Pattern length($0) ensures that the associated action ({...}) is only executed if the record is nonempty.
{printf "START%s\n", $0 > ("folder/demo"++i".txt")} outputs each nonempty record, preceded by "START", into file folder/demo{n}.txt, where {n} represents a sequence number starting with 1.
You can use grep for that:
grep -Po "START\s+\K.*?(?=END)" file
how are you?
good thanks
thats nice
Explanation:
-P To allow Perl regex
-o To extract only matched pattern
\K Discard the text matched so far from the reported match (works like a variable-length lookbehind)
(?=something) Positive lookahead
EDIT: To match \00 as START and END may appear in between:
echo -e '\00START hi how are you END\00' | grep -aPo '\00START\K.*?(?=END\00)'
hi how are you
EDIT2: The solution using grep only matches within a single line; for multi-line matches it's better to use perl instead. The syntax is very similar:
echo -e '\00START hi \n how\n are\n you END\00' | perl -ne 'BEGIN{undef $/ } /\A.*?\00START\K((.|\n)*?)(?=END)/gm; print $1'
hi
how
are
you
What's new here:
undef $/ Undefine INPUT separator $/ which defaults to '\n'
(.|\n)* Dot matches almost any character, but it does not match
\n so we need to add it here.
/gm Modifiers, g for global m for multi-line
I would translate the nulls into newlines so that grep can find your wanted text on a clean line by itself:
tr '\000' '\n' < yourfile.bin | grep "^START"
from there you can take it into sed as before.
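One way the remaining steps could look, as a sketch only: it assumes that no record contains a newline of its own, and split_records, its two arguments, and the demoN.txt naming are illustrative choices based on the question rather than anything this answer specifies.

```shell
# Hypothetical helper: split_records INFILE OUTDIR
# Turns NULs into newlines, keeps lines that begin with START, strips the
# trailing END, and writes each record to OUTDIR/demo1.txt, demo2.txt, ...
split_records() {
  infile=$1
  outdir=$2
  n=0
  tr '\000' '\n' < "$infile" | grep '^START' | sed 's/END$//' |
  while IFS= read -r rec; do
    n=$((n + 1))
    printf '%s\n' "$rec" > "$outdir/demo$n.txt"
  done
}

# Usage (paths are placeholders): split_records demo.txt /folder
```

Since tr, grep, and sed all stream their input, this should cope with a large file without loading it into memory.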

How to use sed to replace the first space with an empty string

I am having trouble loading a space delimited text file into a table. The data in this text file is generated by teragen, and hence, is just dummy data, where there are only 2 columns, and the first column has values of random special character strings.
Example:
~~~{ZRGHS|
~~~{qahVN)
I run into a problem and get rejected rows because some of these values have a space in them as a random ASCII character, which causes it to think that there are 3 columns, when my table has 2, so they get rejected.
So, what I want to do is remove only the first space from these rejected rows, which will need to be repeated multiple times over each row, and then try to reload them. Would sed be the best way to go about this, or would something else like tr be more appropriate?
Thanks!
From what I understand, you want to remove all spaces except the last one, so that exactly two columns remain.
You can build a regex for that, or you could use the fact that it's very easy to keep the first n occurrences:
$ echo 'one two three four' | rev | sed 's/ //2g' | rev
onetwothree four
or, with a file:
rev myfile | sed 's/ //2g' | rev
Or you could remove one space until there is only one space left:
$ echo 'one two three four' | sed ':a;/ .* /{s/ //;ba}'
onetwothree four
with a file:
sed ':a;/ .* /{s/ //;ba}' myfile
Or, if you're in the mood, you can split the line, play with it, and assemble it back (GNU sed assumed):
$ echo 'one two three four' | sed -r 's/(.*)([^ ]+) ([^ ]+)$/\1\n\2 \3/;h;s/\n.*//;s/ //g;G;s/\n.*\n//'
onetwothree four
with a file:
sed -r 's/(.*)([^ ]+) ([^ ]+)$/\1\n\2 \3/;h;s/\n.*//;s/ //g;G;s/\n.*\n//' myfile
To remove the first space from a line, use
echo "my line with spaces" | sed 's/ //'
Depending on the specifics of your approach (fixed column length? how are you adding the data?) there might be a better way to do this in a single step instead of parsing rejected rows over and over.
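A single-pass approach along those lines might look like the sketch below. It assumes the last space on each line is the real column separator and that the second column never contains a space; merge_leading_columns is a hypothetical helper name.

```shell
# Hypothetical helper: keep the last column as-is and squeeze every space
# out of whatever precedes it, so each row ends up with exactly two columns.
merge_leading_columns() {
  awk '
    NF < 2 { print; next }   # nothing to fix on rows without a separator
    {
      last = $NF
      sub(/ [^ ]*$/, "")     # drop the last column and its separator
      gsub(/ /, "")          # remove stray spaces from the first column
      print $0 " " last
    }'
}

printf '%s\n' 'one two three four' 'a b' | merge_leading_columns
# onetwothree four
# a b
```

Rows that already have two columns pass through unchanged, so the whole file can be filtered once before loading.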
To strip/remove the 1st character from a string:
function stringStripStart {
# Substring from offset 1 to the end
echo "${1:1}"
}
Similarly, to remove the trailing character:
function stringStripEnd {
local final_len=$(( ${#1} - 1 ))
echo "${1:0:$final_len}"
}
Note: for an empty string, some additional condition needs to be added.

Extract multiple occurrences on the same line using sed/regex

I am trying to loop through each line in a file and find and extract letters that start with ${ and end with }. So as the final output I am expecting only SOLDIR and TEMP(from inputfile.sh).
I have tried using the following script but it seems it matches and extracts only the second occurrence of the pattern TEMP. I also tried adding g at the end but it doesn't help. Could anybody please let me know how to match and extract both/multiple occurrences on the same line ?
inputfile.sh:
.
.
SOLPORT=\`grep -A 4 '\[LocalDB\]' \${SOLDIR}/solidhac.ini | grep \${TEMP} | awk '{print $2}'\`
.
.
script.sh:
infile='inputfile.sh'
while read line ; do
echo $line | sed 's%.*${\([^}]*\)}.*%\1%g'
done < "$infile"
May I propose a grep solution?
grep -oP '(?<=\${).*?(?=})'
It uses Perl-style lookaround assertions and lazily matches anything between '${' and '}'.
Feeding your line to it, I get
$ echo "SOLPORT=\`grep -A 4 '[LocalDB]' \${SOLDIR}/solidhac.ini | grep \${TEMP} | awk '{print $2}'\`" | grep -oP '(?<=\${).*?(?=})'
SOLDIR
TEMP
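Where grep was built without -P support, a similar result can be sketched with grep -o plus sed. This assumes a grep that supports -o (not required by POSIX, but widely available) and that the names contain no closing brace; extract_vars is a hypothetical helper name.

```shell
# Hypothetical helper: print every name found inside ${...}, one per line.
# grep -o emits each ${...} match on its own line; sed strips the wrapper.
extract_vars() {
  grep -o '\${[^}]*}' | sed 's/^\${//; s/}$//'
}

printf '%s\n' 'Hello ${var1}, how is your ${var2}' | extract_vars
# var1
# var2
```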
This might work for you (but maybe only for your specific input line):
sed 's/[^$]*\(${[^}]\+}\)[^$]*/\1\t/g;s/$[^{$]\+//g'
Extracting multiple matches from a single line using sed isn't as bad as I thought it'd be, but it's still fairly esoteric and difficult to read:
$ echo 'Hello ${var1}, how is your ${var2}' | sed -En '
# Replace ${PREFIX}${TARGET}${SUFFIX} with ${PREFIX}\a${TARGET}\n${SUFFIX}
s#\$\{([^}]+)\}#\a\1\n#
# Continue to next line if no matches.
/\n/!b
# Remove the prefix.
s#.*\a##
# Print up to the first newline.
P
# Delete up to the first newline and reprocess what is left of the line.
D
'
var1
var2
And all on one line:
sed -En 's#\$\{([^}]+)\}#\a\1\n#;/\n/!b;s#.*\a##;P;D'
Since POSIX extended regexes don't support non-greedy quantifiers or putting a newline escape in a bracket expression I've used a BEL character (\a) as a sentinel at the end of the prefix instead of a newline. A newline could be used, but then the second substitution would have to be the questionable s#.*\n(.*\n.*)##, which might involve a pathological amount of backtracking by the regex engine.