looking for RexEx to replace a string within a string

looking for RexEx to replace a string within a string - regex

I have to replace a string in a text file with Powershell. The text file has this content:
app.framework.locale=de_DE
app.gf.data.profile=somevalues,somevalues[,...]
app.gf.data.profile.path=C:\somepath\somesubpath
app.basket.currencies=currencies
I want to set the profile and the profile.path. So everything behind the "=" should be replaced, regardless of how long the string behind the "=" is.
I am reading the file, replacing the string, and writing back the file with these commands:
change the profile:
(Get-Content $TXT_FILE ).Replace('app.gf.data.profile=default,',"app.gf.data.profile=default,$Man") | Out-File $TXT_FILE -Encoding ASCII
change the profile path:
(Get-Content $TXT_FILE ).Replace('app.gf.data.profile.path=',"app.gf.data.profile.path=$Path") | Out-File $TXT_FILE -Encoding ASCII
As you can see, my replacement script is not correct, the rest behind the "=" will remain. I think I need a regex or some kind of wildcards to fit my needs.

You need to use
(Get-Content $TXT_FILE) `
-replace '^(app\.gf\.data\.profile=).*', "`$1,$Man" `
-replace '^(app\.gf\.data\.profile\.path=).*', "`${1}$Path" | `
Out-File $TXT_FILE -Encoding ASCII
Here, -replace is used to enable regex replacing, and both regular expressions follow a similar logic:
^ - matches start of a line
(app\.gf\.data\.profile=) - captures a literal string (literal dots in regex must be escaped) into Group 1 ($1 or ${1})
.* - matches the rest of the line.
${1} is used in the second case as the variable $Path follows the backreference immediately, and if it starts with a digit, the replacement will not be as expected.

Related

Powershell: append text after string in file

Problem: I am trying to append a string after a tag. I got a large text file, and I only need to append some text after the tag (including the text xxxxxx) <xxxxxx>, and I cannot seem to figure it out just yet.
Currently im trying this with regex: <[(xxxxxx)]+>, which according to regex101.com does match the exact tag <xxxxxx>, but when I use this in Powershell it returns a lot of other stuff.
How can I make sure that Powershell only matches <xxxxxx> ? And to append some string after <xxxxxx> ?
Sample snippet from the text file: PredefinedSettings=<xxxxxx><abc test123 /abc></xxxxxx>
Sample PS command: Get-Content .\samplefile.ini | Select-String -Pattern "<[(xxxxxx)]+>"
Which returns the entire line PredefinedSettings=<xxxxxx><abc test123 /abc></xxxxx> instead of just <xxxxxx>

If you want to output just the matched text, you can do the following:
Select-String -Path sample.ini -Pattern '<(/?xxxxxx)>' -AllMatches | Foreach-Object {
$_.Matches.Groups[1].Value # Outputs matched text between `<>`
$_.Matches.Value # Outputs all matched text
}
The -AllMatches switch will allow matching beyond the first match. So it would return <xxxxxx> and </xxxxxx>.
If you want to replace text in a file, you can do the following:
(Get-Content .\samplefile.ini) -replace '<(/?xxxxxx)>','<$1Text>' |
Set-Content .\sampplefile.ini
If your replacement text is in a variable, you will need to escape the $ for the capture group.
$Text = 'replacement Text'
(Get-Content .\samplefile.ini) -replace '<(/?xxxxxx)>',"<`$1$Text>" |
Set-Content .\sampplefile.ini
$1 is the capture group 1 data matched within the first (). Depending on your Text, it may be wise to name your capture group. If Text is 23OtherText, <$123OtherText> will attempt to substitute capture group 123. Using a named capture group, you can do the following:
(Get-Content .\samplefile.ini) -replace '<(?<Tag>/?xxxxxx)>','<${Tag}Text>' |
Set-Content .\sampplefile.ini
/? matches zero or more / characters.
-replace will return all text not matched and all text replaced by the operator.

I hope I got your question right.
In regex Quantifiers are greedy so it will select from the first open tag to the last closing tag, you can change that by using a ?.
So your Regex will be <[(xxxxxx)]+?>.

Replace text + optional newline in file

I've been through other similar questions and tried their advice, but it wouldn't help.
I'm trying to delete a specific line of text in a text file.
My code which works
(Get-Content -Path "MyPath.txt" -Raw).Replace('this is the line', '') | Set-Content "MyPath.txt" -Encoding UTF8
Now this works but leaves an ugly empty line in the text file. I wanted to also replace an optional newline character by adding this regex at the end of the line
\n?
and this wouldn't work. The other threads made other recommendations and I've tried all combinations but just can't match. I'm using windows style ending (CRLF)
Both using -Raw and not using it
\n
\r\n
`n
`r`n
I haven't even added the regex question mark at the end (or non-capturing group in case it needs the \r\n syntax).

The [string] type's .Replace() method doesn't support regexes (regular expressions), whereas PowerShell's -replace operator does.
However, the simplest solution in this case is to take advantage of the fact that the -ne operator acts as a filter with an array-valued LHS (as other comparison operators do):
#(Get-Content -Path MyPath.txt) -ne 'this is the line' |
Set-Content MyPath.txt -Encoding UTF8
Note how Get-Content is called without -Raw in order to return an array of lines, from which -ne then filters out the line of (non)-interest; #(...), the array-subexpression operator ensures that the output is an array even if the file happens to contain just one line.
The assumption is that string 'this is the line' matches the whole line (case-insensitively).
If that is not the case, instead of -ne you could use -notlike with a wildcard expression or -notmatch with a regex (e.g.,
-notmatch 'this is the line' or -notlike '*this is the line')

Powershell regex match sequence doesn't work although it matches in Sublime Text find and replace

I am trying to create a Powershell regex statement to remove the top five lines of this output from a git diff file that has already been modified with Powershell regex.
[1mdiff --git a/uk1.adoc b/uk2.adoc</span>+++
[1mindex b5d3bf7..90299b8 100644</span>+++
[1m--- a/uk1.adoc</span>+++
[1m+++ b/uk2.adoc</span>+++
[36m## -1,9 +1,9 ##</span>+++
= Heading
Body text
Image shown because binary code doesn't show in the text
The following statement matches the text so the '= Heading' line is placed at the top of the page if I replace with nothing.
^[^=]*.[+][\n]
But in Powershell, it isn't matching the text.
Get-Content "result2.adoc" | % { $_ -Replace '^[^=]*.[+][\n]', '' } | Out-File "result3.adoc";
Any ideas about why it doesn't work in Powershell?
My overall goal is to create a diff file of two versions of an AsciiDoc file and then replace the ASCII codes with HTML/CSS code to display the resulting AsciiDoc file with green/red track changes.

The simplest - and faster - approach is to read the input file as a single, multiline string with Get-Content -Raw and let the regex passed to -replace operate across multiple lines:
(Get-Content -Raw result2.adoc) -replace '(?s)^.+?\n(?==)' |
Set-Content result3.adoc
(?s) activates in-line option s which makes . match newline (\n) characters too.
^.+?\n(?==) matches from the start of the string (^) any number of characters (including newlines) (.+), non-greedily (?)
until a newline (\n) followed by a = is found.
(?=...) is a look-ahead assertion, which matches = without consuming it, i.e., without considering it part of the substring that matched.
Since no replacement operand is passed to -replace, the entire match is replace with the implied empty string, i.e., what was matched is effectively removed.
As for what you tried:
The -replace operator passes its LHS through if no match is found, so you cannot use it to filter out non-matching lines.
Even if you match an undesired line in full and replace it with '' (the empty string), it will show up as an empty line in the output when sent to Set-Content or Out-File (>).
As for your specific regex, ^[^=]*.[+][\n] (whether or not the first ^ is followed by an ESC (0x1b) char.):
[\n] (just \n would suffice) tries to match a newline char. after a literal + ([+]), yet lines read individually with Get-Content (without -Raw) by definition are stripped of their trailing newline, so the \n will never match; instead, use $ to match the end of a line.
Instead of % (the built-in alias for the ForEach-Object cmdlet) you could have used ? (the built-in alias for the Where-Object cmdlet) to perform the desired filtering:
Get-Content result2.adoc | ? { $_ -notmatch '^\e\[' }
$_ -notmatch '^\e[' returns $True only for lines that don't start (^) with an ESC character (\e, whose code point is 0x1b) followed by a literal (\) [, thereby effectively filtering out the lines before the = Heading line.
However, the multi-line -replace command at the top is a more direct and faster expression of your intent.

Here is the code I ended up with after help from #mklement0. This Powershell script creates MS Word-style track changes for two versions of an AsciiDoc file. It creates the Diff file, uses regex to replace ASCII codes with HTML/CSS tags, removes the Diff header (thank you!), uses AsciiDoctor to create an HTML file and then PrinceXML to create a PDF file of the output that I can send to document reviewers.
git diff --color-words file1.adoc file2.adoc > result.adoc;
Get-Content "result.adoc" | % {
$_ -Replace '(=+ ?)([A-Za-z\s]+)(\[m)', '$1$2' `
-Replace '\[32m', '+++<span style="color: #00cd00;">' `
-Replace '\[31m', '+++<span style="color: #cd0000; text-decoration: line-through;">' `
-Replace '\[m', '</span>+++' } | Out-File -encoding utf8 "result2.adoc" ;
(Get-Content -Raw result2.adoc) -replace '(?s)^.+?\n(?==)', '' | Out-File -encoding utf8 "result3.adoc" ;
asciidoctor result3.adoc -o result3.html;
prince result3.html --javascript -o result3.pdf;
Read-Host -Prompt "Press Enter to exit"
Here's a screenshot of the result using some text from Wikipedia:

Regular Expressions in powershell split

I need to strip out a UNC fqdn name down to just the name or IP depending on the input.
My examples would be
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
I want to end up with just tom or 123.43.234.23
I have the following code in my array which is striping out the domain name perfect, but Im still left with \\tom
-Split '\.(?!\d)')[0]

Your regex succeeds in splitting off the tokens of interest in principle, but it doesn't account for the leading \\ in the input strings.
You can use regex alternation (|) to include the leading \\ at the start as an additional -split separator.
Given that matching a separator at the very start of the input creates an empty element with index 0, you then need to access index 1 to get the substring of interest.
In short: The regex passed to -split should be '^\\\\|\.(?!\d)' instead of '\.(?!\d)', and the index used to access the resulting array should be [1] instead of [0]:
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' |
ForEach-Object { ($_ -Split '^\\\\|\.(?!\d)')[1] }
The above yields:
tom
123.43.234.23
Alternatively, you could remove the leading \\ in a separate step, using -replace:
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' |
ForEach-Object { ($_ -Split '\.(?!\d)')[0] -replace '^\\\\' }
Yet another alternative is to use a single -replace operation, which does not require a ForEach-Object call (doesn't require explicit iteration):
'\\tom.overflow.corp.com', '\\123.43.234.23.overflow.corp.com' -replace
'?(x) ^\\\\ (.+?) \.\D .+', '$1'
Inline option (?x) (IgnoreWhiteSpace) allows you to make regexes more readable with insignificant whitespace: any unescaped whitespace can be used for visual formatting.
^\\\\ matches the \\ (escaped with \) at the start (^) of each string.
(.+?) matches one or more characters lazily.
\.\D matches a literal . followed by something other than a digit (\d matches a digit, \D is the negation of that).
.+ matches one or more remaining characters, i.e., the rest of the input.
$1 as the replacement operand refers to what the 1st capture group ((...)) in the regex matched, and, given that the regex was designed to consume the entire string, replaces it with just that.

I'm stealing Lee_Daileys $InSTuff
but appending a RegEx I used recently
$InStuff = -split #'
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
'#
$InStuff |ForEach-Object {($_.Trim('\\') -split '\.(?!\d{1,3}(\.|$))')[0]}
Sample Output:
tom
123.43.234.23
As you can see here on RegEx101 the dots between the numbers are not matched

The Select-String function uses regex and populates a MatchInfo object with the matches (which can then be queried).
The regex "(\.?\d+)+|\w+" works for your particular example.
"\\tom.overflow.corp.com", "\\123.43.234.23.overflow.corp.com" |
Select-String "(\.?\d+)+|\w+" | % { $_.Matches.Value }

while this is NOT regex, it does work. [grin] i suspect that if you have a really large number of such items, then you will want a regex. they do tend to be faster than simple text operators.
this will get rid of the leading \\ and then replace the domain name with .
# fake reading in a text file
# in real life, use Get-Content
$InStuff = -split #'
\\tom.overflow.corp.com
\\123.43.234.23.overflow.corp.com
'#
$DomainName = '.overflow.corp.com'
$InStuff.ForEach({
$_.TrimStart('\\').Replace($DomainName, '')
})
output ...
tom
123.43.234.23

Regex in Powershell fails to check for newlines

I'm trying to get the first block of releasenotes...
(See sample content in the code)
Whenever I use something simple it works, it only breaks when I try to
search across multiple lines (\n). I'm using (Get-Content $changelog | Out-String) because that gives back 1 string instead of an array from each line.
$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
(Get-Content $changelog | Out-String) | Select-String -Pattern $regex -AllMatches
<#
SAMPLE:
------
v1.0.23
- Adds an IContainer API.
- Bugfixes.
v1.0.22
- Hotfix: Language operators.
v1.0.21
- Support duplicate query parameters.
v1.0.20
- Splitting up the ICommand interface.
- Fixing the referrer header empty field value.
#>
The result I need is:
v1.0.23
- Adds an IContainer API.
- Bugfixes.
Update:
Using options..
$changelog = 'C:\Source\VSTS\AcmeLab\AcmeLab Core\changelog.md'
$regex = '(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
Get-Content -Path $changelog -Raw | Select-String -Pattern $regex -AllMatches
I also get nothing.. (no matter if I use \n or \r\n)

Unless you're stuck with PowerShell v2, it's simpler and more efficient to use Get-Content -Raw to read an entire file as a single string; besides, Out-String adds an extra newline to the string.[1]
Since you're only looking for the first match, you can use the -match operator - no need for Select-String's -AllMatches switch.
Note: While you could use Select-String without it, it is more efficient to use the -match operator, given that you've read the entire file into memory already.
Regex matching is by default always case-insensitive in PowerShell, consistent with PowerShell's overall case-insensitivity.
Thus, the following returns the first block, if any:
if ((Get-Content -Raw $changelog) -match '(?m)^v\d+\.\d+\.\d+.*(\r?\n-\s?.*)+') {
# Match found - output it.
$Matches[0]
}
* (?m) turns on inline regex option m (multi-line), which causes anchors ^ and $ to match the beginning and end of individual lines rather than the overall string's.
\r?\n matches both CRLF and LF-only newlines.
You could make the regex slightly more efficient by making the (...) subexpression non-capturing, given that you're not interested in what it captured: (?:...).
Note that -match itself returns a Boolean (with a scalar LHS), but information about the match is recorded in the automatic $Matches hashtable variables, whose 0 entry contains the overall match.
As for what you tried:
'([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
doesn't work, because by default $ only matches at the very end of the input string, at the end of the last line (though possibly before a final newline).
To make $ to match the end of each line, you'd have to turn on the multiline regex option (which you did in your 2nd attempt).
As a result, nothing matches.
'(?smi)([Vv][0-9]+\.[0-9]+\.[0-9]+\n)(^-.*$\n)+'
doesn't work as intended, because by using option s (single-line) you've made . match newlines too, so that a greedy subexpression such as .* will match the remainder of the string, across lines.
As a result, everything from the first block on matches.
[1] This problematic behavior is discussed in GitHub issue #14444.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

looking for RexEx to replace a string within a string - regex

Related

Powershell: append text after string in file

Replace text + optional newline in file

Powershell regex match sequence doesn't work although it matches in Sublime Text find and replace

Regular Expressions in powershell split

Regex in Powershell fails to check for newlines

Categories

Resources