sed replace regular expression match - regex

A portion of my dataset, which is a pipe-delimited CSV file:
|B20005G |77|B20005G 077|$2,500 to $4,999|
|B20005G |78|B20005G 078|$5,000 to $7,499|
|B20005G |79|B20005G 079|$7,500 to $9,999|
I match the lines of the third field with this sed expression:
sed -n '/|[[:alnum:]]\{7\} [[:digit:]]\{3\}|/p'
Now, is there a way to tell sed to delete the space in the third field to get this:
|B20005G |77|B20005G077|$2,500 to $4,999|
|B20005G |78|B20005G078|$5,000 to $7,499|
|B20005G |79|B20005G079|$7,500 to $9,999|

Try this awk method:
awk -F'|' 'BEGIN {OFS="|"} {sub(/ +/,"",$4)}1' FileName
Output:
|B20005G |77|B20005G077|$2,500 to $4,999|
|B20005G |78|B20005G078|$5,000 to $7,499|
|B20005G |79|B20005G079|$7,500 to $9,999|

A regex like this
\([[:alnum:]]\{7\}\) \([[:digit:]]\{3\}\)
defines two groups, the parts between \( and \), which we can refer to in the substitution as \1 and \2, so
sed 's/\([[:alnum:]]\{7\}\) \([[:digit:]]\{3\}\)/\1\2/' myfile.txt
gets rid of the space between the two groups. (With -n and no p flag on the substitution, sed would print nothing, so -n is left out here.)
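To avoid matching the same shape elsewhere on a line, the pattern can also be anchored on the surrounding pipes, as the question's own match expression does. A minimal sketch, assuming the data arrives on standard input; the function name strip_field_space is made up for illustration:

```shell
# Hypothetical helper: remove the space inside a |XXXXXXX NNN| field.
# In basic regex syntax an unescaped | is a literal pipe.
strip_field_space() {
  sed 's/|\([[:alnum:]]\{7\}\) \([[:digit:]]\{3\}\)|/|\1\2|/'
}

printf '%s\n' '|B20005G |77|B20005G 077|$2,500 to $4,999|' | strip_field_space
# prints: |B20005G |77|B20005G077|$2,500 to $4,999|
```

The second field (77) is left alone because it does not have the seven-characters, space, three-digits shape.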

Related

Match multiple regex groups on different lines from a tex to print into csv

I have a beamer latex file, in that file some frames have the form
\frame{\frametitle{Title01}
Sub01\\
\begin{tabular}{|p{7cm}|}
\hline
\rowcolor{black}\\
\rowcolor{white}\\
\rowcolor{green}\\
\hline
\end{tabular}
}
I would like to end up with a csv format like
Title01,Sub01,black,white,green
Title02,Sub02,red,white,blue
So far I have managed to get all the titles with
sed -rn 's/^.*frametitle\{(.*)\}/\1,/pm' f.tex
I am failing to match the second group, Sub01 (for now together with the LaTeX line break \\), on the next line. A small selection of what I have tried so far:
sed -rn 's/^.*frametitle\{(.*)\}\n(.*)$/\1,\2/mp' f.tex
sed -rn 's/^.*frametitle\{(.*)\}$^(.*)$/\1,\2/mp' f.tex
sed -rn 's/^.*frametitle\{(.*)(\}\n)(.*)$/\1,\3/mp' f.tex
sed -rn 's/^.*frametitle\{(.*)\}\n(.*)\n/\1,\2/mp' f.tex
all matching either just the title or nothing at all.
This might work for you (GNU sed):
sed -n '/^\\frame{\\frametitle{\(.*\)}.*/{s//\1/;h;n;s/\([^\]*\).*/\1/;H;:a;n;/^\\rowcolor{\(.*\)}.*/{s//\1/;H};/^}/!ba;g;s/\n/,/gp}' file
This is a filtering job, so use the -n option to only print what you want.
The required data lies between a line starting with \frame{\frametitle{...} and a line starting with }.
Using the above criteria, copy the required matching data into the hold space and on encountering the end of the match, replace the current line by this copied data.
The data will be delimited by newlines, so replace these by commas and print out the result.
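The same commands may be easier to follow when written one per line with comments. This is only a sketch of the one-liner above, assuming GNU sed and frames laid out exactly as in the question; the wrapper name frames_to_csv is hypothetical:

```shell
# Hypothetical wrapper around the same GNU sed commands, one per line.
frames_to_csv() {
  sed -n '
    /^\\frame{\\frametitle{\(.*\)}.*/ {
      # reduce the line to the title and start the hold space with it
      s//\1/
      h
      # fetch the next line and keep the text before the first backslash
      n
      s/\([^\]*\).*/\1/
      H
      :a
      # keep reading; append each rowcolor argument to the hold space
      n
      /^\\rowcolor{\(.*\)}.*/ {
        s//\1/
        H
      }
      # loop until a line starting with } closes the frame
      /^}/!ba
      # bring the collected lines back, join them with commas, print
      g
      s/\n/,/gp
    }
  ' "$@"
}
```

Called as frames_to_csv f.tex, it should print one CSV line per frame, such as Title01,Sub01,black,white,green for the frame shown in the question.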
Like this using perl in multiline mode:
perl -0ne '
my @a = (
/.*?frametitle\{(\w+)\}\R # first line
(\w+) # second line
.*rowcolor\{(\w+).*rowcolor\{(\w+).*rowcolor\{(\w+) # other lines
/sx
);
END{
print join(",", @a) . "\n";
}
' file

How to replace text with comma [Linux on Windows]

We get these automated emails from our client that have this rough format:
VP##0-X1-#####-#[Revision #:Document title]
VP##0-X2-#####-#[Revision #:Document title]
VP##0-X3-#####-#[Revision #:Document title]
What I want to do:
replace [Revision with a comma
replace : with a comma
delete ]
So that I can convert this into a CSV and then use some excel magic to fill in our tracking sheet.
I've tried to use sed with this general format:
sed -i 's,[Revision ,\,,g' <FILE>
but I don't know how to get a comma in for this case.
This is what I want to get in the end:
VP##0-X1-#####-#,#,Document title
VP##0-X2-#####-#,#,Document title
VP##0-X3-#####-#,#,Document title
Any and all insight is appreciated.
I'm using Ubuntu on Windows.
sed 's/\[Revision /,/;s/:/,/;s/]//' inputfile
VP##0-X1-#####-#,#,Document title
VP##0-X2-#####-#,#,Document title
VP##0-X3-#####-#,#,Document title
There is no need for back-references or for several separate sed invocations. You can issue multiple substitutions from within a single sed command:
Syntax:
sed 's/a/A/' file
sed 's/b/B/' file
sed 's/c/C/' file
Can be combined into one command:
sed 's/a/A/;s/b/B/;s/c/C/' file #note the semicolon separating multiple replace operations.
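Applied to the e-mail format from the question, the combined form might look like the sketch below. The helper name revision_to_csv and the sample line are invented for illustration; the same three substitutions are given as separate -e expressions, which behaves like the semicolon form.

```shell
# Hypothetical helper: turn "ID[Revision N:Title]" lines into "ID,N,Title".
revision_to_csv() {
  sed -e 's/\[Revision /,/' -e 's/:/,/' -e 's/]//'
}

# The sample line below is invented for illustration.
printf '%s\n' 'VP120-X1-00042-1[Revision 3:Pump datasheet]' | revision_to_csv
# prints: VP120-X1-00042-1,3,Pump datasheet
```

Without the g flag each substitution touches only the first match on a line, so a colon inside the document title stays intact.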
You can use:
sed -Ei 's/(.*)(\[Revision)(.*)(:)(.*)(])/\1,\3,\5/' <FILE>
Testing it with one line and an echo:
$ echo "[VP##0-X1-#####-#[Revision #:Document title]" | sed -E 's/(.*)(\[Revision)(.*)(:)(.*)(])/\1,\3,\5/'
[VP##0-X1-#####-#, #,Document title
Explanation:
'(.*)(\[Revision)(.*)(:)(.*)(])
The regular expression in the first half of the sed command is divided into 6 groups defined by ().
Group 2 (\[Revision) will match "[Revision" and group 4 (:) will match ":", the parts of the string you want to replace.
/\1,\3,\5/'
In the second part of the command, the same groups can be used as the replacement text, so I used group 1 (\1) to preserve everything before "[Revision", then use a comma ',', then use group 3 (\3) (everything between "[Revision" and ":"), a comma ",", and finally group 5 (\5). Group 6 will match the final ']', so it is not used as you wanted to remove it.
The [ must be escaped since it is a special character for regular expressions. Also, it may be better to use another character than , as separator in the sed command. This should do the trick:
sed -i 's/\[Revision /,/g' <FILE>
With sed, / is a pretty common separator. Also, square brackets are special characters and need to be escaped.
replace [Revision with a comma
sed -i 's/\[Revision /,/g' <FILE>
replace : with a comma
sed -i 's/:/,/g' <FILE>
delete ]
sed -i 's/\]//g' <FILE>

sed replace "),(" with a newline "\n"

I have a SQL file and I want to put every statement on its own line. To do this I need to replace "),(" with "\n". I tried the following, but it doesn't work:
sed 's/\),\(/\n/' tables.sql
Thanks
In sed's default basic regex syntax, escaped parentheses \( and \) delimit capture groups, so literal parentheses are written unescaped. The syntax would be:
$ echo $'(a),(b)' | sed 's/),(/\\n/g'
(a\nb)
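The command above writes the two literal characters \n into the output. If an actual line break is wanted, one portable way is a backslash followed by a real newline in the replacement; GNU sed also accepts \n there. A minimal sketch, with a made-up function name:

```shell
# Hypothetical helper: replace every "),(" with a real newline.
# The backslash at the end of the first line escapes the newline that follows,
# which makes that newline the replacement text.
split_tuples() {
  sed 's/),(/\
/g'
}

printf '%s\n' '(a),(b),(c)' | split_tuples
```

For the sample input this prints (a, b and c) on three separate lines.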

Removing text between "|" delimiter and "," delimiter using shell script

I have a large multi-line file that is pulled from a database. Its fields are delimited by commas, and if a field has multiple values, the values are separated by "|".
example input:
name,title,email1|email2|email3,phone,address
In a shell script I need to remove "|email2|email3"
example output:
name,title,email1,phone,address
I need to do this for each line in the file.
Try sed:
sed "s/\|[^,]*//g"
Result:
h2co3-macbook:~ h2co3$ echo "name,title,email1|email2|email3,phone,address" | sed "s/\|[^,]*//g"
name,title,email1,phone,address
h2co3-macbook:~ h2co3$
Using sed:
sed -i 's/|[^,]*//g' filename
Note that in most regex flavors | is a special character that specifies alternation, and to match a literal | you need \|. sed's default basic regular expressions work the other way around: a bare | matches a literal pipe, and GNU sed uses \| for alternation (unless an extended regex option such as -E or -r is given).
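The difference between the two syntaxes can be seen side by side. A small sketch, assuming a sed that supports -E (GNU and BSD sed both do); the function names are hypothetical:

```shell
# Hypothetical helpers; both keep only the first of several |-separated values.
first_value_bre() {
  # basic regex: a bare | is a literal pipe
  sed 's/|[^,]*//g'
}

first_value_ere() {
  # extended regex (-E): the pipe must be escaped to be literal
  sed -E 's/\|[^,]*//g'
}

printf '%s\n' 'name,title,email1|email2|email3,phone,address' | first_value_bre
# prints: name,title,email1,phone,address
```

Both functions give the same result for the example line; only the escaping of the pipe differs.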
Use sed with inline option:
sed -i.bak 's/|[^|,]*//g' inFile
Live Demo: http://ideone.com/zKUVhl
This answer splits the input into fields and outputs the ones you want.
awk -F'[|,]' -v OFS=, '{print $1, $2, $3, $(NF-1), $NF}' file

Change CSV Delimiter with sed

I've got a CSV file that looks like:
1,3,"3,5",4,"5,5"
Now I want to change all the "," not within quotes to ";" with sed, so it looks like this:
1;3;"3,5";4;"5,5"
But I can't find a pattern that works.
If you are expecting only numbers then the following expression will work
sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g'
e.g.
$ echo '1,3,"3,5",4,"5,5"' | sed -e 's/,/;/g' -e 's/\("[0-9][0-9]*\);\([0-9][0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5"
You can't simply replace [0-9][0-9]* with .* to keep every comma that sits between quotes: .* is too greedy and matches too much. So you have to use a narrower class such as [a-z0-9]*:
$ echo '1,3,"3,5",4,"5,5",",6","4,",7,"a,b",c' | sed -e 's/,/;/g' -e 's/\("[a-z0-9]*\);\([a-z0-9]*"\)/\1,\2/g'
1;3;"3,5";4;"5,5";",6";"4,";7;"a,b";c
It also has the advantage over the first solution of being simple to understand. We just replace every , by ; and then correct every ; in quotes back to a ,
You could try something like this:
echo '1,3,"3,5",4,"5,5"' | sed -r 's|("[^"]*),([^"]*")|\1\x1\2|g;s|,|;|g;s|\x1|,|g'
This replaces the commas within quotes with the \x1 character, then replaces all remaining commas with semicolons, and finally turns the \x1 characters back into commas. It should work provided the file is well formed, contains no \x1 characters to begin with, and never has a double quote inside double quotes, like "a\"b".
Using gawk
gawk '{$1=$1}1' FPAT="([^,]+)|(\"[^\"]+\")" OFS=';' filename
Test:
[jaypal:~/Temp] cat filename
1,3,"3,5",4,"5,5"
[jaypal:~/Temp] gawk '{$1=$1}1' FPAT='([^,]+)|(\"[^\"]+\")' OFS=';' filename
1;3;"3,5";4;"5,5"
This might work for you:
echo '1,3,"3,5",4,"5,5"' |
sed 's/\("[^",]*\),\([^"]*"\)/\1\n\2/g;y/,/;/;s/\n/,/g'
1;3;"3,5";4;"5,5"
Here's an alternative solution that is longer but more flexible:
echo '1,3,"3,5",4,"5,5"' |
sed 's/^/\n/;:a;s/\n\([^,"]\|"[^"]*"\)/\1\n/;ta;s/\n,/;\n/;ta;s/\n//'
1;3;"3,5";4;"5,5"
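A different technique from the sed answers above, offered only as a sketch: split each line on the double quote with awk, so that odd-numbered pieces lie outside quotes and even-numbered pieces lie inside, then change commas in the odd pieces only. It assumes fields never contain escaped or doubled quotes or embedded newlines; the function name is made up.

```shell
# Hypothetical helper: change unquoted commas to semicolons.
# With " as the field separator, pieces 1, 3, 5, ... are outside quotes.
csv_commas_to_semicolons() {
  awk -F'"' -v OFS='"' '{ for (i = 1; i <= NF; i += 2) gsub(/,/, ";", $i) } 1' "$@"
}

printf '%s\n' '1,3,"3,5",4,"5,5"' | csv_commas_to_semicolons
# prints: 1;3;"3,5";4;"5,5"
```

Because quoted pieces are never touched, a quoted field may hold any number of commas, and adjacent quoted fields such as "a","b" are still separated correctly.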