Sed dynamic backreference replacement - regex

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
I would like to keep using sed for this. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get the counter to increase. Any ideas?
From a more general perspective, my question is the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.

This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line unchanged. If the pattern is present, it increments (or, on the first match, initialises) the counter in the hold space and then appends the counter to the current line. The pattern is then replaced using the shell variables PRE and POST together with the counter. Lastly, the current line is checked for further occurrences of the pattern and the procedure is repeated if necessary.
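As a hedged illustration of the command above, here is a small run with made-up values. PATTERN, PRE and POST are placeholders chosen for this sketch, and number_matches is a hypothetical wrapper name. PATTERN is assumed to contain no capture groups of its own, since a group would shift the \1 and \2 numbering. GNU sed is required for the e flag.

```shell
# Hypothetical values, chosen only for this demonstration
PATTERN='@@'
PRE='<'
POST='>'

# Wrapper (name made up here) around the one-liner from the answer, reading stdin
number_matches() {
  sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}'
}

result=$(printf '%s\n' 'a @@ b @@ c' 'no match' '@@' | number_matches)
printf '%s\n' "$result"
```

Each occurrence of @@ should come out wrapped with its own number, with the counter carried from line to line in the hold space.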

You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
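__images_counter=1
while IFS= read -r line; do
  # Rebuild the suffix on every pass so that it picks up the current counter value
  __tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
  printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
  # Advance the counter only when the line contained an image
  case $line in *'[[Image('*) __images_counter=$((__images_counter + 1)) ;; esac
done < input_file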
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
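Since the question mentions AWK as a fallback, here is a minimal sketch of that route (not of the sed-plus-grep idea above). It assumes the simplified replacement from the question's first example, \includegraphics{NAME}\label{img-N}, without the figure environment. label_images is a name made up for this sketch.

```shell
# Hypothetical helper: numbers every [[Image(...)]] occurrence read from stdin
label_images() {
  awk '{
    out = ""
    # Consume the line one [[Image(...)]] match at a time
    while (match($0, /\[\[Image\([^)]*\)\]\]/)) {
      # "[[Image(" is 8 characters long and ")]]" is 3
      name = substr($0, RSTART + 8, RLENGTH - 11)
      out = out substr($0, 1, RSTART - 1) "\\includegraphics{" name "}\\label{img-" (++n) "}"
      $0 = substr($0, RSTART + RLENGTH)
    }
    # Whatever is left after the last match is copied through unchanged
    print out $0
  }'
}

result=$(printf '%s\n' 'Intro [[Image(a.png)]]' 'text' '[[Image(b.png)]] and [[Image(c.png)]]' | label_images)
printf '%s\n' "$result"
```

Because the counter n lives in awk rather than in the shell, it keeps increasing across lines and also handles several images on one line.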

Related

How to use sed to migrate from process.env.MY_VAR to env.get('MY_VAR').required() using sed regex?

I'd like to migrate from dotenv to env-var npm package for a dozen of repositories.
Therefore I am looking for a smart and easy way to search and replace a pattern on every file.
My goal is to move from this pattern process.env.MY_VAR to env.get('MY_VAR').required()
And to move from this pattern process.env.MY_VAR || DEFAULT_VALUE to env.get('MY_VAR').required().default('DEFAULT_VALUE')
For reference, I found this command clear; grep -r "process\.env\." --exclude-dir=node_modules | sed -r -n 's|^.*\.([[:upper:]_]+).*$|\1=|p' > .env.example to generate .env.example
Apparently I can use sed -e "s/pattern/result/" <file list> but I am not sure how to catch the pattern, and return this same pattern in the result.
I think you have already figured out the main parts of the answer. But I'm unclear about what you mean by MY_VAR: is it actually the name MY_VAR, or just a placeholder for any variable name consisting only of uppercase characters and underscores? I expect it to be the latter. In that case you could go with something like this:
sed "s/\<process\.env\.\([A-Z_]*\)\>/env.get('\1').required()/g" <file list>
This will read all the files and output them all to stdout with the replacement done. But I guess you should use -i for in-place replacement directly in the file (be careful!).
Since you have several replacements, you could give each one separately, like:
sed -i -e "s/pattern1/result1/" -e "s/pattern2/result2/" <file list>
NOTE: The thing described above could for sure be done in multiple other ways, this is only one solution to my interpretation of your problem!
I would suggest that you work through some tutorials on regexp to start off with. It is a handy tool that is present in one form or another in most programming languages and programming tools (sed being just one such tool).
sed -E '
s/(^|[^[:alnum:]_])process\.env\.([[:alnum:]_]+) \|\| ([[:alnum:]_]+)($|[^[:alnum:]_])/\1env.get('\''\2'\'').required().default('\''\3'\'')\4/g
s/(^|[^[:alnum:]_])process\.env\.([[:alnum:]_]+)($|[^[:alnum:]_])/\1env.get('\''\2'\'').required()\3/g
' myfile
It's essential that the two substitute commands happen in the above order, because the second pattern also matches the first pattern (which we don't want).
The pattern (^|[^[:alnum:]_]) is just a more portable version of the \< word boundary symbol.
Remember you can use the -i flag with sed to edit the file in place.
Running this on the third paragraph in your question (for example), we get:
My goal is to move from this pattern env.get('MY_VAR').required() to env.get('MY_VAR').required()
And to move from this pattern env.get('MY_VAR').required().default('DEFAULT_VALUE') to env.get('MY_VAR').required().default('DEFAULT_VALUE')
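To try the two substitutions without touching real sources, something like the following sketch may help. The file contents are made up for the demonstration. The -i.bak form is used so that sed keeps a backup copy next to the edited file.

```shell
# Scratch file with made-up contents, so no real source file is touched
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
const port = process.env.PORT || 3000;
const key = process.env.API_KEY;
EOF

# -i.bak edits in place but keeps the original as "$tmp.bak".
# The "|| default" rule has to come first, as explained above.
sed -E -i.bak \
  -e "s/(^|[^[:alnum:]_])process\.env\.([[:alnum:]_]+) \|\| ([[:alnum:]_]+)(\$|[^[:alnum:]_])/\1env.get('\2').required().default('\3')\4/g" \
  -e "s/(^|[^[:alnum:]_])process\.env\.([[:alnum:]_]+)(\$|[^[:alnum:]_])/\1env.get('\2').required()\3/g" \
  "$tmp"

result=$(cat "$tmp")
printf '%s\n' "$result"
```

Note that the default value ends up quoted (3000 becomes '3000'), which is what the replacement asks for. Numeric defaults may need a manual pass afterwards.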

sed and regular expression: unexpected replacement pattern

I am trying to use a small bash script using sed to append a string, but I do not seem to be able to make it work.
I want to append a string to another string pattern:
Strings in input file:
Xabc
Xdef
Desired output:
XabcZ
XdefZ
Here is the script:
#!/bin/bash
instring="$2"
sed -r "s/${instring}/${instring}Z/g" $1
Where $1 is the file name and $2 is the string pattern I am looking for
Then I run the script:
bash script.test.sh test.txt X
output:
XZabc
XZdef
As expected.
but if I use regular expressions:
bash script.test.sh test.txt X...
All I get is:
X...Z
X...Z
So obviously it is not reading it correctly in the replacement part of the command. Same thing if I use X[a-z09] (but there may be "_" in my strings, and I want to include those as well). I had a look at several previous similar topics, but I do not seem to be able to implement any of the solutions correctly (bear with a newbie...). Thank you for your kind help.
EDIT: After receiving the answers from Glenn Jackman (accepted solution) and RavinderSingh13, I would like to clarify two important points for whoever is having a similar issue:
1) Glenn Jackman's solution did not work at first because I needed to convert the text file from DOS to Unix line endings. I tried dos2unix, but for some reason it did not work (maybe I forgot to write the output over the old file?). I later did it using sed -i 's/\r$//' test.txt; that solved the issue, and Glenn's solution now works. Having a DOS-formatted text file has been the source of much trouble, for me at least.
2) I probably did not make clear that I only wanted to target specific lines in the input files; my example only has target strings, but the actual file has strings that I do not want to edit. That was probably the source of the misunderstanding with RavinderSingh13's script, which actually works, but targets every single line.
Hope this can help future readers. Thank you, Stackers, you saved the day once again :)
What you have (sed -r "s/${instring}/${instring}Z/g" $1) uses the variable as a pattern on the left-hand side and as plain text on the right-hand side.
What you want to do is:
sed -r "s/${instring}/&Z/g" $1
# ....................^
where the & marker is replaced by whatever text the pattern matched. In the documentation for The s Command:
[T]he replacement can contain unescaped & characters which reference the whole matched portion of the pattern space.
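A minimal sketch of the & approach with a regular expression as the pattern. append_z is a made-up helper name, and the sample lines (including the Yghi line that should stay untouched) are invented for the demonstration. The pattern is assumed not to contain a / character, which would clash with the delimiter.

```shell
# Hypothetical helper: append Z to every match of the extended regex given as $1
append_z() {
  # & in the replacement stands for whatever text the pattern matched
  sed -E "s/$1/&Z/g"
}

dots=$(printf '%s\n' 'Xabc' 'Xdef' 'Yghi' | append_z 'X...')
class=$(printf '%s\n' 'Xabc' 'X_de' 'Yghi' | append_z 'X[a-z0-9_]+')
printf '%s\n' "$dots" "$class"
```

Lines that do not match the pattern pass through unchanged, so only the X lines get the suffix.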
EDIT: In case you need to pass a regex to the script, the following may help. (My previous solution only appended a character to the end of each line.)
cat script.ksh
value="$2"
sed "s/$value/&Z/" "$1"
After running the script:
./script.ksh Input_file 'X.*'
XabcZ
XdefZ
After seeing the OP's comment about matching everything that starts with either a lowercase or an uppercase letter, run the script like this instead:
./script.ksh Input_file '[A-Za-z].*'
Could you please try the following and let me know if it helps?
cat script.ksh
value="$2"
sed "s/$/$value/" "$1"
After running the script, I get the following output on the terminal:
./script.ksh Input_file Z
XabcZ
XdefZ
You could add the -i option to sed in the code above if you want to save the output back into Input_file itself.

Linux script to parse each line, check the regex and modify the line

I'm trying to write a Linux bash script that takes as input a CSV file with lines in the following format (some fields can be blank):
something,something,,number,something,something,something,something,something,something,,,
something,something.something,,number,something,something,something,something,something,something,,,
and I need the output in the following format (if a line contains a '.', the field has to be split into substring1,substring2 and one ',' character removed; otherwise the line is left unchanged):
something,something,,number,something,something,something,something,something,something,,,
something,something,something,number,something,something,something,something,something,something,,,
I tried to parse each line of the file and check whether it matches a regex, but the command starts a never-ending loop (I don't know why), and moreover I don't know how to split the substring to get substring1,substring2 as output.
for f in /filepath/filename.csv
do
while read p; do
if [[$p == .\..]] ; then echo $p; fi
done <$f
done
Thanks in advance!
I can't provide you with a working code at the moment but a piece of quick advice:
1. Try with tool called sed
2. Learn about "capture groups" for regex to get info on how to divide the text based on expressions.
To separate strings AWK will be useful
echo "Hello.world" | awk -F"." '{print "STR1="$1", STR2="$2 }'
Hope it will help.
As your task is more about transforming unrelated lines of text than about parsing the fields of CSV-formatted files, sed is indeed the tool to go with.
Learning to use sed properly, even for the most basic tasks, is synonymous with learning regular expressions. The following invocation of sed transforms your input sample into your expected output:
sed 's/\.\([^,]*\),/,\1/g' input.csv >output.csv
In the above example, s/// is the replacement command.
From the manpage:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful,
replace that portion matched with replacement. [...]
Explaining the regexp and replacement of the above command is probably out of the scope for the question, so I'll finish my answer here... Hope it helps!
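For readers who do want the breakdown, here is a small annotated sketch of the same substitution. split_dotted_field is a made-up wrapper name, and the two sample lines are shortened stand-ins for the rows in the question.

```shell
# Hypothetical wrapper around the substitution from the answer, reading stdin
split_dotted_field() {
  # \.          a literal dot
  # \([^,]*\)   the rest of that field (everything up to the next comma), captured as \1
  # ,           the comma right after it; consuming it removes one of the two commas
  # replacement: a comma in place of the dot, followed by the captured text
  sed 's/\.\([^,]*\),/,\1/g'
}

result=$(printf '%s\n' 'a,b,,4,e' 'a,b.c,,4,e' | split_dotted_field)
printf '%s\n' "$result"
```

The first line has no dot and is left alone. In the second, b.c,, becomes b,c, so the total number of commas stays the same. Because of the g flag, any other field containing a dot (a decimal number, say) would be split as well.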
Ok, i managed to use regexp, but the following command seems not working again:
sed '\([^,]*\),\([^,]*\)\.\([^,]*\),,\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,\11,\12,'
sed: -e expression #1, char 125: unknown command: `\'
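A likely cause, offered as an assumption since the thread does not confirm it: the expression has no leading s/, so sed does not read it as a substitution command. Separately, sed only provides the backreferences \1 to \9, so \10 would be read as \1 followed by a literal 0. A shortened sketch that only captures the fields around the dot (fix_dotted_field is a made-up name):

```shell
# Hypothetical helper: note the leading s/ and the use of only \1 to \3
fix_dotted_field() {
  sed 's/^\([^,]*\),\([^,]*\)\.\([^,]*\),,/\1,\2,\3,/'
}

result=$(printf '%s\n' 'a,b.c,,4,e' | fix_dotted_field)
printf '%s\n' "$result"
```

Everything after the matched prefix is copied through untouched, so the remaining fields do not need capture groups of their own.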

Converting Regex to Sed

I have the following regex.
/http:\/\/([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9\-]+:[a-zA-Z0-9\-]+\/[a-zA-Z]+\.[a-zA-Z]+/g
Which identifies matching URLs (https://regex101.com/r/sG9zR7/1). I need to modify it so that I can use it on the command line and have it print the results, so I changed it to the following:
sed -n 's/.*\(http:\/\/\([a-zA-Z0-9\-]+\.\)+[a-zA-Z0-9\-]+:[a-zA-Z0-9\-]+\/[a-zA-Z]+\.[a-zA-Z]+\).*/\1/p' filename
(I was trying to add bold to the characters added but could not)
The additions were as follows:
sed -n 's/.*( (in the beginning )
\ (For the inner parenthesis)
).*/\1/p' filename (at the end)
However, I get no results when I execute it.
Make it a habit to use a delimiter other than / when dealing with URLs. It makes the pattern easier to read.
sed -r -n 's~.*(http://([a-z0-9\-]+\.)+[a-z0-9\-]+:[a-z0-9\-]+/[a-z]+\.[a-z]+).*~\1~ip' file
Note that I use i modifier for ignorecase.
As hwnd comments, you should put -r flag to sed command as well since your pattern requires + to be treated in a special manner.
The working command is:
sed -rn 's~.*(http://([a-z0-9\-]+.)*[a-z0-9\-]+:[0-9]+\/[a-z0-9]+.[a-z]+).*~\1~ip' Filename
With the assistance of the sample supplied (thank you hjpotler92), I was able to figure out that the escape character did not need to be applied to certain characters. I will have to find out when and how it is applied when using the -r option.
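On when the backslash is needed, a short side-by-side sketch may help. Without -r/-E sed uses basic syntax, where grouping is written \( \) and a bare + is an ordinary character (GNU sed accepts \+ as an extension). With -r/-E the same operators are written without the backslash, and \( would instead mean a literal parenthesis. The sample line and URL are made up, and the \+ form assumes GNU sed.

```shell
# Made-up sample input for the comparison
url_line='see http://www.example.com:8080/index.html for details'

# Basic syntax: groups and repetition need a backslash (\+ is a GNU extension)
bre=$(printf '%s\n' "$url_line" |
  sed -n 's~.*\(http://\([a-z0-9-]\+\.\)\+[a-z0-9-]\+:[0-9]\+/[a-z0-9]\+\.[a-z]\+\).*~\1~p')

# Extended syntax (-E, or -r in GNU sed): the same operators without the backslash
ere=$(printf '%s\n' "$url_line" |
  sed -E -n 's~.*(http://([a-z0-9-]+\.)+[a-z0-9-]+:[0-9]+/[a-z0-9]+\.[a-z]+).*~\1~p')

printf '%s\n' "$bre" "$ere"
```

Both commands should print the same URL. The literal dot still needs its backslash in either syntax.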
You can achieve the same with an xpath query via xidel:
xidel file.html -e '//a/@href[fn:matches(.,"http://[^/]*:")]/fn:substring-after(.,"=")'

How to print only matches with sed?

Okay, this is an easy one, but I can't figure it out.
Basically I want to extract all links ([^<>]*) from a big html file.
I tried to do this with sed, but I get all kinds of results, just not what I want. I know that my regexp is correct, because I can replace all the links in a file:
sed 's_[^<>]*_TEST_g'
If I run that on something like
<div>A google link</div>
<div>A google link</div>
I get
<div>TEST</div>
<div>TEST</div>
How can I get rid of everything else and just print the matches instead? My preferred end result would be:
A google link
A google link
PS. I know that my regexp is not the most flexible one, but it's enough for my intentions.
Match the whole line, put the interesting part in a group, replace by the content of the group. Use the -n option to suppress non-matching lines, and add the p modifier to print the result of the s command.
sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
Note that if there are multiple links on the line, this only prints the last link. You can improve on that, but it goes beyond simple sed usage. The simplest method is to use two steps: first insert a newline after every closing tag so that each link ends up on its own line, then extract the links.
sed -n -e 's!</a>!&\n!gp' | sed -n -e 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
This still doesn't handle HTML comments, <pre>, links that are spread over several lines, etc. When parsing HTML, use an HTML parser.
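A small run of the two-step idea, as a sketch only. The HTML line and its URLs are invented for the demonstration, extract_links is a made-up name, and GNU sed is assumed because \n is used in the replacement.

```shell
# Made-up input with two links on one line
html='<div><a href="https://example.com/1">first</a> and <a href="https://example.com/2">second</a></div>'

# Hypothetical helper: split after each closing tag, then keep only the link on each line
extract_links() {
  sed 's!</[Aa]>!&\n!g' |
    sed -n 's!^.*\(<[Aa] [^<>]*>.*</[Aa]>\).*$!\1!p'
}

result=$(printf '%s\n' "$html" | extract_links)
printf '%s\n' "$result"
```

After the first sed each link sits on its own line, so the greedy .* in the second command can no longer swallow an earlier link.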
If you don't mind using perl in place of sed, it can cope with very diverse input:
perl -n -e 's+(<a href=.*?</a>)+ print $1, "\n" +eg;'
Assuming that there is only one hyperlink per line the following may work...
sed -e 's_.*<a href=_<a href=_' -e 's_>.*_>_'
This might work for you (GNU sed):
sed '/<a href\>/!d;s//\n&/;s/[^\n]*\n//;:a;$!{/>/!{N;ba}};y/\n/ /;s//&\n/;P;D' file