Perl regex: replace all backslashes with double-backslashes

Within a set of large files, I need to replace all occurrences of "\" with "\\". I'd like to use Perl for this purpose. Right now, I have the following:
perl -spi.bak -e '/s/\\/\\\\/gm' inputFile
This command was suggested to me, but it results in no change to inputFile (except an updated timestamp). Thinking that the problem might be that the "\"s were not surrounded by blanks, I tried
perl -spi.bak -e '/s/.\\./\\\\/gm' inputFile
Again, this had no effect on the file. Finally, I thought I might be missing a semicolon, so I tried:
perl -spi.bak -e '/s/.\\./\\\\/gm;' inputFile
This also has no effect. I know that my file contains "\"s, for example in the following line:
("C:\WINDOWS\system32\iac25_32.ax","Indeo audio)
I'm not sure whether there is a problem with the regex, or if something is wrong with the way I'm invoking Perl. I have a basic understanding of regexes, but I'm an absolute beginner when it comes to Perl.
Is there anything obviously wrong here? One thing I notice is that the command returns quite quickly, despite the fact that inputFile is ~10MB in size.

The hard part with handling backslashes in command lines is knowing how many processes are going to manipulate the command line - and what their quoting rules are.
On Unix, under any shell, the first command line you show would work once the stray leading / before the s is removed.
You appear to be on Windows, and there, you have the DOS command 'shell' to deal with.
I would put the replacement into a file and pass that to Perl:
#!/bin/perl -spi.bak
s/\\/\\\\/g;
That should do the trick - save as 'subber.pl' and then run:
perl subber.pl file1 ...

How about this? It should replace every \ with two \s.
s/\\/\\\\/g

perl -pi -e 's/\\/\\\\/g' inputfile
will replace all of them in one file

This
s/\\/\\\\/g
works for me.
You've got a stray / in front of the s at the beginning of the substitution.
Don't use
.\\.
otherwise it will trash whatever's before and after the \ in the file.

perl -spi.bak -e 's/.\\./\\\\/gm;' inputFile
maybe?
Why did you type that leading /?

"You appear to be on Windows, and there, you have the DOS command 'shell' to deal with."
Hopefully I am not splitting hairs, but Windows hasn't come with DOS for a long time now. I think ME (circa 1999/2000) was the last version that still came with DOS or was built on DOS. Since XP, the "command" shell has been cmd.exe (a sort of DOS simulation), but the effects of running a Perl one-liner in a command shell might still be the same as running it in a DOS shell. Maybe someone can verify that.

Related

sed and regular expression: unexpected replacement pattern

I am trying to use a small bash script using sed to append a string, but I do not seem to be able to make it work.
I want to append a string to another string pattern:
Strings in input file:
Xabc
Xdef
Desired output:
XabcZ
XdefZ
Here is the script:
#!/bin/bash
instring="$2"
sed -r "s/${instring}/${instring}Z/g" $1
Where $1 is the file name and $2 is the string pattern I am looking for
Then I run the script:
bash script.test.sh test.txt X
output:
XZabc
XZdef
As expected.
but if I use regular expressions:
bash script.test.sh test.txt X...
All I get is:
X...Z
X...Z
So obviously it is not reading it correctly in the replacement part of the command. Same thing if I use X[a-z09] (but there may be "_" in my strings, and I want to include those as well). I had a look at several previous similar topics, but I do not seem to be able to implement any of the solutions correctly (bear with a newbie...). Thank you for your kind help.
EDIT: After receiving the answers from Glenn Jackman (accepted solution) and RavinderSingh13, I would like to clarify two important points for whoever is having a similar issue:
1) Glenn Jackman's solution did not work at first because I needed to convert the text file from DOS to Unix line endings. I tried dos2unix, but for some reason it did not work (maybe I forgot to write the output over the old file?). I later did it using sed -i 's/\r$//' test.txt; that solved the issue, and Glenn's solution now works. Having a DOS-formatted text file has been the source of a lot of trouble, for me at least.
2) I probably did not make clear that I only wanted to target specific lines in the input files; my example contains only target strings, but the actual file has strings that I do not want to edit. That was probably the source of the misunderstanding with RavinderSingh13's script, which actually works, but targets every single line.
Hope this can help future readers. Thank you, Stackers, you saved the day once again :)
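To make the line-ending issue from point 1 concrete, here is a small sketch. It assumes GNU sed (for the \r escape), and the file contents simply mimic the example above with DOS line endings added.

```shell
# Assumption: GNU sed (for \r); the file mimics the question's test.txt
# but with DOS (CRLF) line endings.
tmpfile=$(mktemp)
printf 'Xabc\r\nXdef\r\n' > "$tmpfile"

# With CRLF endings, .* also swallows the trailing carriage return,
# so Z lands after the \r instead of at the visible end of the line.
dos_first=$(sed 's/X.*/&Z/' "$tmpfile" | head -n 1)

# Strip the carriage returns first, then the same substitution behaves.
unix_result=$(sed 's/\r$//' "$tmpfile" | sed 's/X.*/&Z/')
printf '%s\n' "$unix_result"
```

On a terminal, the first variant tends to display as if Z had overwritten the start of the line, which makes the problem hard to spot without something like od -c.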
What you have (sed -r "s/${instring}/${instring}Z/g" $1) uses the variable as a pattern on the left-hand side and as plain text on the right-hand side.
What you want to do is:
sed -r "s/${instring}/&Z/g" $1
# ....................^
where the & marker is replaced by whatever text the pattern matched. In the documentation for The s Command:
[T]he replacement can contain unescaped & characters which reference the whole matched portion of the pattern space.
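A minimal sketch of the & approach, assuming GNU sed (for -r). The function name append_z and the extra "other" line are illustrative additions, not part of the original script.

```shell
# Assumption: GNU sed; the input mirrors the question's test.txt plus one
# line that should stay untouched. append_z is an illustrative name.
append_z() {
    # $1 = file, $2 = pattern; & stands for whatever text the pattern matched.
    sed -r "s/$2/&Z/g" "$1"
}

tmpfile=$(mktemp)
printf 'Xabc\nXdef\nother\n' > "$tmpfile"

out=$(append_z "$tmpfile" 'X...')
printf '%s\n' "$out"
```

Because the pattern is pasted straight into the sed expression, a pattern containing / would break the command; a different delimiter would be needed in that case.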
EDIT: In case you need to pass a regex to the script, the following may help; my previous solution only appended a character to the end of each line.
cat script.ksh
value="$2"
sed "s/$value/&Z/" "$1"
After running the script:
./script.ksh Input_file 'X.*'
XabcZ
XdefZ
After seeing the OP's comment about matching everything that starts with a lowercase or capital letter, run the script like this:
./script.ksh Input_file '[A-Za-z].*'
Could you please try the following and let me know if it helps?
cat script.ksh
value="$2"
sed "s/$/$value/" "$1"
After running the script I get the following output on the terminal:
./script.ksh Input_file Z
XabcZ
XdefZ
You can add sed's -i option to the above code if you want to save the output back into Input_file itself.

Linux script to parse each line, check the regex and modify the line

I'm trying to write a Linux bash script that takes as input a CSV file with lines in the following format (some fields can be blank):
something,something,,number,something,something,something,something,something,something,,,
something,something.something,,number,something,something,something,something,something,something,,,
and I need to produce the following format (if a line contains a ., it has to split the two substrings into substring1,substring2 and remove one , character; otherwise do nothing):
something,something,,number,something,something,something,something,something,something,,,
something,something,something,number,something,something,something,something,something,something,,,
I tried to parse each line of the file and check whether it matches a regex, but the command starts a never-ending loop (I don't know why), and moreover I don't know how to split the substring to get substring1,substring2 as output:
for f in /filepath/filename.csv
do
while read p; do
if [[$p == .\..]] ; then echo $p; fi
done <$f
done
Thanks in advance!
I can't provide you with working code at the moment, but here is some quick advice:
1. Try a tool called sed.
2. Learn about "capture groups" in regexes to see how to divide the text based on expressions.
To separate strings, AWK will be useful:
echo "Hello.world" | awk -F"." '{print "STR1="$1", STR2="$2 }'
Hope it will help.
As your task is more about transforming unrelated lines of text than about parsing fields of CSV-formatted files, sed is indeed the tool to use.
Learning to use sed properly, even for the most basic tasks, is synonymous with learning regular expressions. The following invocation of sed transforms your input sample into your expected output:
sed 's/\.\([^,]*\),/,\1/g' input.csv >output.csv
In the above example, s/// is the replacement command.
From the manpage:
s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful,
replace that portion matched with replacement. [...]
Explaining the regexp and replacement of the above command is probably out of the scope for the question, so I'll finish my answer here... Hope it helps!
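For readers who do want the pieces spelled out, here is a small annotated sketch. The sample rows are shortened versions of the ones in the question, and the temporary file stands in for input.csv.

```shell
# Assumption: sample rows shortened from the question; any POSIX sed should do.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
something,something,,number,something
something,something.something,,number,something
EOF

# \.            a literal dot
# \([^,]*\)     capture everything up to the next comma (group 1)
# ,             the comma that ends the field
# Replacement:  a comma followed by group 1, which drops the dot and
#               consumes one of the two commas that followed the field.
out=$(sed 's/\.\([^,]*\),/,\1/g' "$tmpfile")
printf '%s\n' "$out"
```

Note that the expression acts on any dot in any field, so a file with decimal numbers or dotted names elsewhere on the line would need a more specific pattern.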
OK, I managed to use a regexp, but the following command does not work either:
sed '\([^,]*\),\([^,]*\)\.\([^,]*\),,\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),\([^,]*\),/\1,\2,\3,\4,\5,\6,\7,\8,\9,\10,\11,\12,'
sed: -e expression #1, char 125: unknown command: `\'

Perl & Regex within Windows CMD Line

Is there any way to accomplish matching + storing all in one cmd line? So instead of saving the matches to an array, i.e.
($matches) = $filecontents =~ m/.../g
...the matches would save to a *.txt file? I have been experimenting for a couple of days now, and believe that I am close to a solution. But a few nuances of Perl and Windows CMD Prompt are preventing me from accomplishing this task. Here's what I most recently tried:
% perl -p -i.bak -e "m/(?<=")(\d\.\d+)(?=")/g" filename.extension
I am a beginner with the CMD line, and I am running Windows 7 (soon to be switching over to Linux). Obviously I need to specify a file to which I can save my matches. The trouble is, this is where my knowledge drops off. Could someone give me a hand with this? Any comments are appreciated. Thank you!
If I understand correctly, you want to pull out all of the matches from an entire file, and write those results to a separate file.
This will work if the below results are what you're after. I don't have a Windows box to test on, but this should work (you might have to use double quotes on the outside of the one-liner and escape the ones inside, but I'm not sure).
This one-liner iterates without printing (-n) the 'file.txt' file, and prints a match combined with a newline if there is one into the 'results.txt' file via command-line redirection:
perl -ne 'print "$_\n" for m/(?<=")(\d\.\d+)(?=")/g' file.txt > results.txt
Input file:
$ cat file.txt
one
two "9.162"
three one "6.3"
five one six
Output file:
$ cat results.txt
9.162
6.3

Sed dynamic backreference replacement

I am trying to use sed for transforming wikitext into latex code. I am almost done, but I would like to automate the generation of the labels of the figures like this:
[[Image(mypicture.png)]]
... into:
\includegraphics{mypicture.png}\label{img-1}
For this I would like to keep using sed. The current regex and bash code I am using is the following:
__tex_includegraphics="\\\\includegraphics[width=0.95\\\\textwidth]{$__images_dir\/"
__tex_figure_pre="\\\\begin{figure}[H]\\\\centering$__tex_includegraphics"
__tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"\
... but I cannot get that counter to increase. Any ideas?
More generally, my question is the following: can I use a backreference in sed to create a replacement that is different for each match? That is, each time sed matches the pattern, can I use \1 as the input of a function and use the result of this function as the replacement?
I know it is a tricky question and I might have to use AWK for this. However, if somebody has a solution, I would appreciate his or her help.
This might work for you (GNU sed):
sed -r ':a;/'"$PATTERN"'/{x;/./s/.*/echo $((&+1))/e;/./!s/^/1/;x;G;s/'"$PATTERN"'(.*)\n(.*)/'"$PRE"'\2'"$POST"'\1/;ba}' file
This looks for a PATTERN contained in a shell variable and, if it is not present, prints the current line. If the pattern is present, it increments or primes the counter in the hold space and then appends said counter to the current line. The pattern is then replaced using the shell variables PRE and POST and the counter. Lastly, the current line is checked for further cases of the pattern and the procedure is repeated if necessary.
You could read the file line-by-line using shell features, and use a separate sed command for each line. Something like
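exec 0<input_file
while IFS= read -r line; do
  # Rebuild the suffix on every pass so it picks up the current counter value.
  __tex_figure_post="}\\\\label{img-$__images_counter}\\\\end{figure}"
  printf '%s\n' "$line" | sed -e "s/\[\[Image(\([^)]*\))\]\].*/$__tex_figure_pre\1$__tex_figure_post/g"
  __images_counter=$((__images_counter + 1))
done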
(This won't work if there are multiple matches in a line, though.)
For the second part, my best idea is to run sed or grep to find what is being matched, and then run sed again with the value of the function of the matched text substituted into the command.
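A rough sketch of that two-step idea: find the matching lines in the shell, extract the captured text with sed, and build the replacement in the shell where a counter (or any function) is available. The function name number_images and the simplified LaTeX output are illustrative; this is not the asker's full template, and it assumes at most one image per line and replaces the whole line.

```shell
# Assumptions: one image per line; number_images and the simplified
# LaTeX output are illustrative, not the asker's full template.
number_images() {
    local counter=1 line name
    while IFS= read -r line; do
        case $line in
            *'[[Image('*')]]'*)
                # Pull the file name out with sed, then build the replacement
                # in the shell, where the counter is available.
                name=$(printf '%s\n' "$line" | sed 's/.*\[\[Image(\([^)]*\))\]\].*/\1/')
                printf '\\includegraphics{%s}\\label{img-%d}\n' "$name" "$counter"
                counter=$((counter + 1))
                ;;
            *)
                # Lines without an image pass through unchanged.
                printf '%s\n' "$line"
                ;;
        esac
    done
}

out=$(printf '%s\n' 'intro' '[[Image(a.png)]]' 'text' '[[Image(b.png)]]' | number_images)
printf '%s\n' "$out"
```

Here the counter only advances on lines that actually contain an image, so the labels come out as img-1, img-2 and so on without gaps.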

sed error "unterminated 's' command" troubleshooting

I am building a script that will, among other things, replace a pattern in an XML file with a folder path.
The sed command I am trying to use is:
SEDCMD="s|PATHTOEXPORT|$2|"
where $2 is the command-line parameter that has the folder path in it.
This is later called:
sed -e $SEDCMD $FILTER > $TEMPFILTER
However, on running the command, I am getting an "unterminated 's' command" error.
How can I get around this? I've tried changing the characters used to separate the regex (from / to |). And I've tried quoting (in different ways) the command-line parameter.
The shell is word-splitting the contents of $SEDCMD before sed sees them. If you’re using this from a shell script, including a Makefile, you should always protect all your expanded variables with double quotes. The double quotes still allow variable interpolation but protect any shell metacharacters from further interpretation.
sed -e "$SEDCMD" "$FILTER" > "$TEMPFILTER"
I assume that $FILTER and $TEMPFILTER are filenames? I’ve quoted them, too, just in case they contain evil things like whitespace or other sorts of shell metacharacters; bizarre, yes, but it’s been known to happen. I regularly run rename 's/\s+/_/g' on filenames to clean them of whitespace, but for the others, you'll have to take a more careful approach; e.g., what to do with stars vs question marks vs brackets and parens, etc.
If you add -x and/or -v to your shell command line, you’ll get some trace debugging, which I think would likely have shown where you went amiss here.
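A small sketch of the failure and the fix, assuming the folder path contains a space (the question does not say what was in $2, so that part is a guess). The file name filter.xml and the path are invented for the demo.

```shell
# Assumption: the path contains a space; file names are invented for the demo.
workdir=$(mktemp -d)
printf '<path>PATHTOEXPORT</path>\n' > "$workdir/filter.xml"

export_path='/tmp/my exports'          # contains a space on purpose
SEDCMD="s|PATHTOEXPORT|$export_path|"

# Unquoted: the shell splits $SEDCMD at the space, so sed receives only
# 's|PATHTOEXPORT|/tmp/my' as its script and rejects it.
if sed -e $SEDCMD "$workdir/filter.xml" >/dev/null 2>&1; then
    unquoted_status=ok
else
    unquoted_status=failed
fi

# Quoted: the whole expression reaches sed as one argument.
quoted=$(sed -e "$SEDCMD" "$workdir/filter.xml")
printf '%s\n%s\n' "$unquoted_status" "$quoted"
```

Running the unquoted line under sh -x shows the two separate words being handed to sed, which is the kind of trace the -x suggestion above is getting at.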