Powershell, how to exclude the "\" character in a Regex format - regex

Based on this topic: How do I replace a line in a file using PowerShell?
<setting name="Media.MediaLinkServerUrl" value=" "/>
The regex to select whatever is between value=" and "/> is
$regex = '(?<=<setting name="Media\.MediaLinkServerUrl" value=")[^"]*'
It works well!
But what if the line is:
<setting name="Media.MediaLinkServerUrl" value=" \/>
I tried:
$regex = '(?<=<setting name="Media\.MediaLinkServerUrl" value=")[^\]*'
and
$regex = '(?<=<setting name="Media\.MediaLinkServerUrl" value=")[^\\]*'
but it doesn't work!
I know \ is a reserved regex character, but [^.]* or [^|]* for example work well.
So how do I select between \ and \ with a regex in PowerShell in the example above?
Thanks a lot!
PS: I can't comment or ask on the original post because I don't have 50 reputation, sorry.
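For reference, a negated character class that excludes the backslash is normally written [^\\], because in [^\] the sequence \] escapes the closing bracket and leaves the class unterminated. The sketch below uses Python's re module rather than PowerShell, and the value http://example is a made-up placeholder. The pattern is the second one tried in the question; in a single-quoted PowerShell string it should reach the .NET regex engine unchanged, so if it still fails there, the cause may lie outside the character class (this last point is an assumption, not something the question confirms).

```python
import re

# Hypothetical input line: the value is terminated by a backslash, not a quote.
line = r'<setting name="Media.MediaLinkServerUrl" value="http://example\/>'

# [^\\]* matches any run of characters that are not a backslash.
# Writing [^\]* instead escapes the closing bracket, so the class never ends.
pattern = re.compile(r'(?<=<setting name="Media\.MediaLinkServerUrl" value=")[^\\]*')

match = pattern.search(line)
value = match.group(0) if match else None
print(value)
```

The lookbehind is fixed-width, so the same pattern text is accepted by engines that do not support variable-length lookbehind.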

Related

Regex to avoid all words between {{ and }}

I am using https://github.com/tbroadley/spellchecker-cli.
I have a JSON file that I'd like to run spellChecker on and it looks like this:
{
"abc.editGroupsMaxLengthError": "Maximum {{charLen}} characters"
}
I would like to know how all words between {{ and }} can be ignored by the spellchecker.
I tried with
[A-Za-z]+}}
as documented here https://github.com/tbroadley/spellchecker-cli#ignore-regexes to ignore regex.
but it doesn't seem to use }} or {{ for some reason.
How can this be fixed?
You can wrap your {{...}} substrings with <!-- spellchecker-disable --> / <!-- spellchecker-enable --> tags, see this GitHub issue.
So, make sure your JSON looks like
{
"abc.editGroupsMaxLengthError": "Maximum <!-- spellchecker-disable -->{{charLen}}<!-- spellchecker-enable --> characters"
}
And the result will be
C:\Users\admin\Documents\1>spellchecker spellchecker -f spellchecker_test.json
Spellchecking 1 file...
spellchecker_test.json: no issues found
To wrap the {{...}} strings in a certain file in Windows you could use PowerShell, e.g., for a spellchecker_test.json file:
powershell -Command "& {(Get-Content spellchecker_test.json -Raw) -replace '(?s){{.*?}}','<!-- spellchecker-disable -->$&<!-- spellchecker-enable -->' | Set-Content spellchecker_test.json}"
In *nix, Perl is preferable:
perl -0777 -i -pe 's/\{\{.*?}}/<!-- spellchecker-disable -->$&<!-- spellchecker-enable -->/gs' spellchecker_test.json
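Where neither PowerShell nor Perl is available, the same wrapping can be sketched in Python. This is only an illustration of the substitution; the function name wrap_placeholders and the inline sample are assumptions, and reading and writing the actual file is left out.

```python
import re

DISABLE = "<!-- spellchecker-disable -->"
ENABLE = "<!-- spellchecker-enable -->"


def wrap_placeholders(text: str) -> str:
    """Surround every {{...}} span with the disable/enable comments."""
    # re.DOTALL lets a placeholder span line breaks, like the (?s) flag.
    return re.sub(
        r"\{\{.*?\}\}",
        lambda m: f"{DISABLE}{m.group(0)}{ENABLE}",
        text,
        flags=re.DOTALL,
    )


sample = '{\n"abc.editGroupsMaxLengthError": "Maximum {{charLen}} characters"\n}'
wrapped = wrap_placeholders(sample)
print(wrapped)
```

Note that running the substitution twice on the same text wraps the placeholders twice, so it is best applied once to the original file.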

How do I perform a regex test in bash that starts with spaces and includes quotation marks?

I'm trying to write a bash script that will change the fill color of certain elements within SVG files. I'm inexperienced with shell scripting, but I'm good with regexes (...in JS).
Here's the SVG tag I want to modify:
<!-- is the target because its ID is exactly "the.target" -->
<path id="the.target" d="..." style="fill:#000000" />
Here's the bash code I've got so far:
local newSvg="" # will hold newly-written SVG file content
while IFS="<$IFS" read tag
do
if [[ "${tag}" =~ +id *= *"the\.target" ]]; then
tag=$(echo "${tag}" | sed 's/fill:[^;];/fill:${color};/')
fi
newSvg="${newSvg}${tag}"
done < ${iconSvgPath} # is an argument to the script
Explained: I'm using read (splitting the file on < via custom IFS) to read the SVG content tag by tag. For each tag, I test to see if it includes an id property with the exact value I want. If it doesn't, I add this tag as-is to a newSvg string that I will later write to a file. If the tag does have the desired ID, I'll use sed to replace fill:STUFF; with fill:${myColor};. (Note that my sed is also failing, but that's not what I'm asking about here.)
It fails to find the right line with the test [[ "${tag}" =~ +id *= *"the\.target" ]].
It succeeds if I change the test to [[ "${tag}" =~ \"the\.target\" ]].
I'm not happy with the working version because it's too brittle. While I don't intend to support all the flexibility of XML, I would like to be tolerant of semantically irrelevant whitespace, as well as the id property being anywhere within the tag. Ideally, the regex I'd like to write would express:
id (preceded by at least one whitespace)
followed by zero or more whitespaces
followed by =
followed by zero or more whitespaces
followed by "the.target"
I think I'm not delimiting the regex properly inside the [[ ... =~ REGEX ]] construction, but none of the answers I've seen online use any delimiters whatsoever. In javascript, regex literals are bounded (e.g. / +id *= *"the\.target"/), so it's straightforward beginning a regex with a whitespace character that you care about. Also, JS doesn't have any magic re: *, whereas bash is 50% magic-handling-of-asterisks.
Any help is appreciated. My backup plan is maybe to try to use awk instead (which I'm no better at).
EDIT: My sed was really close. I forgot to add + after the [^;] set. Oof.
It is much easier if you define the regular expression pattern in a variable:
tag=' id = "the.target"'
pattern=' +id *= *"the\.target"'
if [[ $tag =~ $pattern ]]; then
echo matched.
fi
Thank you for giving us such a clear example that regex is not the way to solve this problem.
An SVG file is an XML file, and a possible tool to modify these is xmlstarlet.
Try this script I called modifycolor:
#!/bin/bash
# invoke as: modifycolor <svg.file> <target_id> <new_color>
xmlstarlet edit \
--update "//path[@id = '$2']/@style" --value "fill:#$3" \
"$1"
Assuming the svg file is test.svg, invoke it as:
./modifycolor test.svg the.target ff0000
You will be astonished by the result.
If you want to paste a piece of code inside your bash script, try this:
target="the.target"
newSvg=$(xmlstarlet edit \
--update "//path[@id = '${target}']/@style" --value "fill:#${myColor}" \
"${iconSvgPath}")
Thanks to folks for pointing out the mistakes in my bash-fu, I came up with this code which does what I said I wanted. I will not be marking this as the accepted answer because, as folks have observed, regex is a bad way to operate on XML. Sharing this for posterity.
local newSvg="" # will hold newly-written SVG code
while IFS="<$IFS" read tag
do
if [[ "${tag}" =~ \ +id\ *=\ *\"the\.target\" ]]; then
tag=$(echo "${tag}" | sed -E 's/fill:[^;]+;/fill:'"${color}"';/')
fi
newSvg="${newSvg}${tag}"
done < ${iconSvgPath}
Fixes:
escape the whitespace in the regex: =~ \ +id\ *=\ *
for sed, switch to double-quotes for the variable in the pattern
also for sed, I added the -E extended regex flag in order to support the + quantifier after the negated set [^;]
Re: XML, I'll be comparing the list of available CLI-friendly XML parsers to the set of tools commonly available on my users' machines.
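If Python happens to be among the tools available on those machines, its standard library already includes an XML parser. The sketch below is only an illustration under that assumption: the function name set_fill and the inline SVG are made up, and, like the xmlstarlet answer, it replaces the whole style attribute rather than editing only the fill part.

```python
import xml.etree.ElementTree as ET


def set_fill(svg_text: str, target_id: str, color: str) -> str:
    """Return the SVG with the style of the element carrying target_id replaced."""
    root = ET.fromstring(svg_text)
    for element in root.iter():
        # Compare the id attribute exactly; whitespace around = is the parser's job.
        if element.get("id") == target_id:
            element.set("style", f"fill:#{color}")
    return ET.tostring(root, encoding="unicode")


svg = (
    '<svg>'
    '<path id="the.target" d="M0 0" style="fill:#000000" />'
    '<path id="other" d="M1 1" style="fill:#000000" />'
    '</svg>'
)
updated = set_fill(svg, "the.target", "ff0000")
print(updated)
```

A real SVG file usually declares the SVG namespace; ElementTree then rewrites the tags with an ns0: prefix unless the namespace is registered first with ET.register_namespace.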

How to colorize expression without its boundary in shell script

Having an input such as:
./Tomcatv8.1/projects.xml: <jdbc url="jdbc:sqlserver://localhost:1433;databaseName=MMMABC" />
./Tomcatv8.2/projects.xml: <jdbc url="jdbc:sqlserver://localhost:1433;databaseName=MMMABC_New" />
./Tomcatv8.3/projects.xml: <jdbc url="jdbc:sqlserver://localhost:1433;databaseName=ABC_20170407_STG" />
./Tomcatv8.5/projects.xml: <jdbc url="jdbc:sqlserver://localhost:1433;databaseName=UPGABC_New" />
I want to colorize the database name.
I used
grep --color=auto -E "[a-zA-Z0-9_]+\""
It works quite well, except that it also highlights the final " sign, which is used as a boundary in my regexp.
How do I highlight just the database name?
You may enclose the " into a lookahead and use a PCRE regex with grep:
grep --color=auto -P "[a-zA-Z0-9_]+(?=\")"
                   ^               ^^^^^^
See the regex demo
The (?=\") only checks if the text matches the pattern, but the value is not added to the resulting match. See more about lookarounds in regex here.
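The effect of the lookahead can be checked outside grep as well. The following Python sketch applies the same pattern to one of the input lines from the question; the variable names are illustrative only.

```python
import re

line = (
    './Tomcatv8.2/projects.xml: '
    '<jdbc url="jdbc:sqlserver://localhost:1433;databaseName=MMMABC_New" />'
)

# (?=") requires a double quote right after the name but does not consume it,
# so the quote stays out of the matched text.
match = re.search(r'[a-zA-Z0-9_]+(?=")', line)
name = match.group(0) if match else None
print(name)
```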

How to cut html tag with content from huge multiline file with perl, sed or awk (tags in same and different lines all is mixed)? [duplicate]

This question already has answers here:
How to cut html tag from very large multiline text file with content with use perl, sed or awk?
I am trying to strip every <math>.*?</math> span from a file. That is easy when a tag opens and closes on one line, but how do I do it when the content spans multiple lines, and when a single line may contain several tags or only part of one?
I prepared some test text from Wikipedia to show the problem:
: <math>A =
\begin{bmatrix}
a_{1,1} & a_{1,2} & \dots \\
a_{2,1} & a_{2,2} & \dots \\
\vdots & \vdots & \ddots
\end{bmatrix}
</math> oraz <math>B =
\begin{bmatrix}
b_{1,1} & b_{1,2} & \dots \\
b_{2,1} & b_{2,2} & \dots \\
\vdots & \vdots & \ddots
\end{bmatrix}
=
\begin{bmatrix}
B_1 \\
B_2 \\
\vdots
\end{bmatrix}
</math>,
We discussed the problem on Stack Overflow and received good solutions, but they do not work if a line contains a closing tag followed by a new opening tag, like </math> oraz <math>. The markup is correct, since the tags are paired, but the solutions fail on it.
I am not an expert in awk, sed, or perl; I only know regex well.
Perl suggestion (not working on this example):
cat dirt-math-2.txt | perl -wlne '
unless(((/.*<math>/../<\/math>/)||0) > 1){s/<math>//;print}
' | less
Awk suggestion (not working on this example):
cat dirt-math-2.txt | awk '
sub(/<math>.*/, "") {print; cut=1}
/<\/math>/ {cut=0; next}
!cut' | less
The file to parse is the whole Polish-language Wikipedia, so it needs to be processed without loading 6 GB into memory. Thank you in advance for any suggestions. I asked a similar question before, but it is not the same.
Here's a Perl solution. It works by accumulating lines from the file into a buffer $text and then removing all <math>...</math> pairs. If the resulting buffer has no opening <math> tag then it is printed and emptied. That way, text from the file will only be stored in memory until it has no unpaired <math> tags, and normally it will contain only a single line of input.
The program expects the path to the input file as a parameter on the command line. It has been tested against your sample data in this and your previous questions, and works fine.
use strict;
use warnings;
my $text;
while ( <> ) {
$text .= $_;
$text =~ s/<math>.*?<\/math>//sg;
if ( $text !~ /<math>/ ) {
print $text;
$text = '';
}
}
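The same buffering idea can be sketched in Python for comparison. This is an illustration rather than a port of the program above: the function name strip_math and the shortened sample are assumptions, and unlike the Perl version it emits any leftover text when the input ends inside an unclosed tag.

```python
import re
from typing import Iterable, Iterator

MATH_SPAN = re.compile(r"<math>.*?</math>", re.DOTALL)


def strip_math(lines: Iterable[str]) -> Iterator[str]:
    """Yield the input with every <math>...</math> span removed."""
    buffer = ""
    for line in lines:
        buffer += line
        # Remove every complete pair currently held in the buffer.
        buffer = MATH_SPAN.sub("", buffer)
        # Flush only when no opening tag is still waiting for its close.
        if "<math>" not in buffer:
            yield buffer
            buffer = ""
    if buffer:
        # Input ended inside an unclosed tag: emit the remainder unchanged.
        yield buffer


sample_lines = [
    ": <math>A =\n",
    "\\begin{bmatrix}\n",
    "a_{1,1}\n",
    "\\end{bmatrix}\n",
    "</math> oraz <math>B =\n",
    "b_{1,1}\n",
    "</math>,\n",
]
result = "".join(strip_math(sample_lines))
print(repr(result))
```

Because the function accepts any iterable of lines, an open file object can be passed directly, so only the text between an unmatched opening tag and its close is held in memory.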
A way with sed:
sed -r ':a;/<math>/{:b;s!<math>([^<]|<[^/]|</[^m]|</m[^a]|</ma[^t]|</mat[^h]|</math[^>])*</math>!!g;ta;N;bb;}' file
details:
:a; # defines the label "a"
/<math>/ { # condition: if the pattern space contains "<math>"
:b; # defines the label "b"
# try to replace (the ugly alternation emulates a non-greedy quantifier)
s!<math>([^<]|<[^/]|</[^m]|</m[^a]|</ma[^t]|</mat[^h]|</math[^>])*</math>!!g;
ta; # if something is replaced go to label "a"
N; # else append the next line to the pattern space
bb; # and go to label "b"
}

How to find and replace instances with regex

I'm trying to reformat some data that I have that isn't playing well when I copy text from a pdf.
Cordless
9B12071R
CHARGER, 3.6V,LI-ION
Cordless
9B12073R
CHARGER,NI-CD,FRAMER
Framing / Sheathing tools
F28WW
WIRE COLLATED FRAMIN
Framing / Sheathing tools
N89C-1
COIL FRAMING NAILR
Framing / Sheathing tools
N80CB-HQ
I want to have it formatted like this:
Cordless 9B12071R CHARGER, 3.6V,LI-ION
Cordless 9B12073R CHARGER,NI-CD,FRAMER
....
What I'm trying to do is a find and replace that replaces the first two newlines "\n" with a tab "\t" and leaves the third "\n" intact.
The first thing I do is replace all "\n" with "\t" which is easy. After that, I want to replace the third "\t" with "\n". How would I do that using regex?
For EditPadPro, paste this into the Search box
([A-Za-z /]+)
([A-Za-z0-9_-]+)
(.*)
Paste this into the Replace box
\1 \2 \3
And that should do it. Basically you can add carriage returns and tabs using Ctrl+Enter and Ctrl+Tab in EditPadPro.
I had to add a carriage return to your text in the question as it's missing the last line I think. All the others are in triples of data.
Alright, here is the PHP code that does exactly what you want:
<?php
$s = "Cordless
9B12071R
CHARGER, 3.6V,LI-ION
Cordless
9B12073R
CHARGER,NI-CD,FRAMER";
$p = '/(Cordless.*?)\\n(.+?)\\n(CHARGER.+?)(\\n|$)/s';
$r = '\\1' . "\t" . '\\2' . "\t" . '\\3' . "\n";
echo preg_replace($p, $r, $s);
?>
OUTPUT:
>php -q regex.php
Cordless 9B12071R CHARGER, 3.6V,LI-ION
Cordless 9B12073R CHARGER,NI-CD,FRAMER
Is this a regex job or can you rely on the line number?
$ perl -nE 'chomp; print $_, $.%3? "\t": "\n"' file
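The line-number approach can also be sketched in Python. It relies on the same assumption as the one-liner, namely that every record is exactly three lines long; the function name join_triples is made up for the example.

```python
from typing import Iterable, List


def join_triples(lines: Iterable[str], group_size: int = 3) -> List[str]:
    """Join each run of group_size consecutive lines into one tab-separated row."""
    rows: List[str] = []
    current: List[str] = []
    for raw in lines:
        current.append(raw.rstrip("\n"))
        if len(current) == group_size:
            rows.append("\t".join(current))
            current = []
    if current:
        # Keep an incomplete trailing record instead of dropping it.
        rows.append("\t".join(current))
    return rows


sample = [
    "Cordless", "9B12071R", "CHARGER, 3.6V,LI-ION",
    "Cordless", "9B12073R", "CHARGER,NI-CD,FRAMER",
]
rows = join_triples(sample)
print("\n".join(rows))
```

If a record is missing a line, as the last one in the question's data appears to be, every later row shifts, so the input should be checked for completeness first.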
EDIT (after comment)
If you have to do this in an editor, then this works in vim:
%s/\(.\+\)\n\(\C[A-Z0-9-]\+\)\n\(.\+\)/\1^I\2^I\3/
The important bit here is the assumption that a line that consists entirely of A-Z, 0-9 and - constitutes a part number. ^I is a tab, you type tab and vim prints ^I. (I hope your editor has this many steroids!)