Powershell capture group - insert some text after the matching line - regex

I am trying to insert a line containing a PowerShell statement into multiple script files.
The content of the files falls into one of these 3 cases:
"param($installPath)"
- no CRLF characters, only 1st line
"param($installPath)`r`n`"
- with CRLF characters, no 2nd line
"param($installPath)`r`n`some text on the second line"
- with CRLF characters, non-empty 2nd line
I want to insert some text (a PowerShell statement, `r`n$myvar = somethingelse) immediately below the first line, appending the `r`n characters to the first line if they don't already exist:
"param($installPath)`r`nmyvar = somethingelse"
- ADD CRLF character first to the 1st line and ADD $myvar = somethingelse on the 2nd line
"param($installPath)`r`n`$myvar = somethingelse"
- ONLY ADD "$myvar = somethingelse" on the 2nd line, since CRLF already exists (no need to add the ending rn)
"param($installPath)`r`n`$myvar = somethingelse`r`n`some text on the second line"**
- ADD "$myvar = somethingelse'r'n" on the 2nd line (CRLF already exists on the first line) and APPEND CRLF to it so the existing text on the second line will move to 3rd line.
I was trying to use this regular expression:
"^param(.+)(?:(rn))"
and this replacement, but with no success ($1 is the first capture group, $2 is non capture group which I ignore even if something is found and I explicitly add CRLF after $1 capture group)
"$1rnmyvar = somethingelse"
Thanks,
Rad

The following use of -replace seems to match your requirements
$content = "param(some/path)"
#$content = "param(some/path)`r`n"
#$content = "param(some/path)`r`n`some text on the second line"
$content = $content -replace "^(param\(.+\))(?:\r\n$)?",
( '$1' + "`r`n`$myvar = somethingelse" )
Write-Host "`n$content"
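Note that references to capture groups such as $1 have to be in single quotes (or have the $ escaped with a backtick); inside double quotes PowerShell would expand $1 as a variable before the regex engine sees it.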
The optional, uncaptured (?:\r\n$) group ensures that CRLF is removed if there is nothing, i.e. the end of string $, following it.
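For comparison, the same substitution can be sketched with Python's re module. This is only an illustration under assumptions: the helper name insert_after_param and the sample strings are hypothetical, and it relies on Python's $ (without re.MULTILINE) matching at the very end of the string, as the .NET pattern above does.

```python
import re

# Same pattern as the -replace call above; "^" only matches at the start
# of the string because re.MULTILINE is not set.
PARAM_LINE = re.compile(r"^(param\(.+\))(?:\r\n$)?")

def insert_after_param(content: str, statement: str) -> str:
    """Hypothetical helper: put `statement` on the line after the param(...) line."""
    # A function replacement avoids escaping problems with "$" or "\" in `statement`.
    return PARAM_LINE.sub(lambda m: m.group(1) + "\r\n" + statement, content, count=1)

cases = [
    "param($installPath)",                                    # no CRLF
    "param($installPath)\r\n",                                # CRLF, no second line
    "param($installPath)\r\nsome text on the second line",    # CRLF and a second line
]
for case in cases:
    print(repr(insert_after_param(case, "$myvar = somethingelse")))
```

In the third case the optional group matches nothing, so the existing CRLF stays in place and pushes the old second line down to line three.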
Edit
If what follows param is not known, the following regex could be used instead.
It uses [^\r\n] to capture characters that are not newlines.
"^(param[^\r\n]*)(?:\r\n$)?"

Related

Regular expression not capturing optional group

I'm using the following regular expression pattern:
.*(?<line>^\s*Extends\s+#(?<extends>[_A-Za-z0-9]+)\s*$)?.*
And the following text:
Name #asdf
Extends #extendedClass
Origin #id
What I don't understand is that both of the caught group results (line and extends) are empty, but when I remove the last question mark from the expression the groups are caught.
The line group must be optional since the Extends line is not always present.
I created a fiddle using this expression, which can be accessed at https://regexr.com/4rekk
EDIT
I forgot to mention that I'm using the multiline and dotall flags along with the expression.
It's already been mentioned that the leading .* is capturing everything when you make your (?<line>) group optional. The following is not directly related to your question but it may be useful information (if not, just ignore):
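The effect of the greedy leading .* can be reproduced with Python's re module, where named groups are written (?P<name>...). A minimal sketch, assuming the three-line sample from the question; the variable names are illustrative only:

```python
import re

text = "Name #asdf\nExtends #extendedClass\nOrigin #id"
flags = re.MULTILINE | re.DOTALL

# Optional group: the greedy .* swallows the whole text, and because the
# group may match nothing, the engine never needs to backtrack into it.
optional = re.search(
    r".*(?P<line>^\s*Extends\s+#(?P<extends>[_A-Za-z0-9]+)\s*$)?.*", text, flags)

# Required group: now .* must give characters back until the group matches.
required = re.search(
    r".*(?P<line>^\s*Extends\s+#(?P<extends>[_A-Za-z0-9]+)\s*$).*", text, flags)

# One way around it: search for the line directly and treat "no match" as absent.
direct = re.search(r"^\s*Extends\s+#(?P<extends>[_A-Za-z0-9]+)\s*$", text, re.MULTILINE)
extends = direct.group("extends") if direct else None

print(optional.group("extends"), required.group("extends"), extends)
```

Searching for the line on its own keeps it optional at the program level, without relying on an optional group after a greedy wildcard.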
You need to be careful elsewhere. You are using ^ and $ to match the start and end of lines as well as the start and end of the string. But the $ character will not consume the newline character that marks the end of a line. So:
'Line 1\nLine 2'.match(/^Line 1$^Line 2/m) returns null
while
'Line 1\nLine 2'.match(/^Line 1\n^Line 2/m) returns a match
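The same behaviour can be checked in Python, whose re.MULTILINE flag corresponds to the m flag used above. This is just a sketch of the two calls, with illustrative variable names:

```python
import re

text = "Line 1\nLine 2"

# "$" asserts a position before the newline but does not consume it,
# so "^" can never match immediately afterwards.
without_newline = re.search(r"^Line 1$^Line 2", text, re.MULTILINE)

# Consuming the newline explicitly lets the second "^" match.
with_newline = re.search(r"^Line 1\n^Line 2", text, re.MULTILINE)

print(without_newline, with_newline.group(0) == text)
```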
So in your case if you were trying to capture all three lines, any of which were optional, you would write the regex for one of the lines as follows to make sure you consume the newline:
/(?<line>^\s*Extends\s+#(?<extends>[_A-Za-z0-9]+)[^\S\n]*\n)?/ms
Where you had specified \s*$, I have [^\S\n]*\n. [^\S\n]* is a double negative: it matches zero or more characters that are neither non-whitespace nor the newline character, so it consumes all whitespace characters except the newline.
If you wanted to look for any of the three lines in your example (any or all are optional), then the following code snippet should do it. I have used the RegExp function to create the regex so that it can be split across multiple lines. Unfortunately, it takes a string as its argument and so the backslash characters have to be doubled up:
let s = ` Name #asdf
Extends #extendedClass
Origin #id
`;
let regex = new RegExp(
"(?<line0>^\\s*Name\\s+#(?<name>[_A-Za-z0-9]+)[^\\S\\n]*\\n)?" +
"(?<line>^\\s*Extends\\s+#(?<extends>[_A-Za-z0-9]+)[^\\S\\n]*\\n)?" +
"(?<line2>^\\s*Origin\\s+#(?<id>[_A-Za-z0-9]+)[^\\S\\n]*\\n)?",
'm'
);
let m = s.match(regex);
console.log(m.groups);
The above code snippet seems to have a problem under Firefox (an invalid regex flag, 's', is flagged on a line that doesn't exist in the above snippet). See the following regex demo.
And without named capture groups:
let s = ` Name #asdf
Extends #extendedClass
Origin #id
`;
let regex = new RegExp(
"(^\\s*Name\\s+#([_A-Za-z0-9]+)[^\\S\\n]*\\n)?" +
"(^\\s*Extends\\s+#([_A-Za-z0-9]+)[^\\S\\n]*\\n)?" +
"(^\\s*Origin\\s+#([_A-Za-z0-9]+)[^\\S\\n]*\\n)?",
'm'
);
let m = s.match(regex);
console.log(m);
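For readers working outside JavaScript, the three optional lines can be sketched in Python as well. This is an illustration only: the helper constants IDENT and TAIL are names introduced here, and re.VERBOSE is deliberately avoided because the pattern contains literal # characters.

```python
import re

IDENT = r"[_A-Za-z0-9]+"
TAIL = r"[^\S\n]*\n"  # trailing horizontal whitespace, then the newline itself

pattern = re.compile(
    rf"(?P<line0>^\s*Name\s+#(?P<name>{IDENT}){TAIL})?"
    rf"(?P<line>^\s*Extends\s+#(?P<extends>{IDENT}){TAIL})?"
    rf"(?P<line2>^\s*Origin\s+#(?P<id>{IDENT}){TAIL})?",
    re.MULTILINE,
)

all_three = pattern.match(" Name #asdf\nExtends #extendedClass\nOrigin #id\n").groupdict()
no_extends = pattern.match("Name #asdf\nOrigin #id\n").groupdict()

print(all_three["name"], all_three["extends"], all_three["id"])
print(no_extends["name"], no_extends["extends"], no_extends["id"])
```

Because every group is optional, the pattern also matches the empty string, so a caller would still need to check which groups are None.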

AS3 keep blank lines but remove leading spaces before word after line breaks

Actual:
A new line begins
Another line begins
Here's another
Expected:
A new line begins
Another line begins
Here's another
So far I have tried this which removes all the leading spaces before word after line breaks:
var regex:RegExp = /(\r?\n|\r)+(\s+|\s+$)/g;
var newText:String = abcd.replace(regex, "\n");
Alert.show(StringUtil.trim(newText));
But I can't work out how to add a condition that leaves blank lines as they are.
A simple option is to match and remove only the spaces at the start of lines, and never newlines:
var regex:RegExp = /^[ \t]+/gm;
var newText:String = abcd.replace(regex, "");
Use the /m (multiline) flag, so ^ matches at the beginning of every line.
Match only spaces and tabs, not line breaks.
Simply remove them.
This will also remove spaces from space-only lines. If that's a problem you can use ^[ \t]+(?=\S).
Working example: https://regex101.com/r/gdMZLZ/2
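The same two patterns behave the same way in Python's re module, which may be easier to experiment with than AS3. A small sketch, with a made-up sample string standing in for abcd:

```python
import re

text = "  A new line begins\n\n    Another line begins\n   \n  Here's another"

# Remove leading spaces/tabs on every line; newlines are never touched,
# so blank lines survive (space-only lines become empty lines).
stripped = re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)

# Variant with the lookahead: space-only lines are left exactly as they are.
keep_space_only = re.sub(r"^[ \t]+(?=\S)", "", text, flags=re.MULTILINE)

print(repr(stripped))
print(repr(keep_space_only))
```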

How to remove the first ":"-terminated word from each line in a txt file?

Please see my textfile data below
roydwk27:teenaibuchytilibu5762sumonkhan:IJQRiq&76:8801627574057
deonnarsi15:latashajcclaypoolejcv5946sumonkhan:JKVWjv&20:8801627573929
ernaalo68:lindaohschletteoha1797sumonkhan:OPYZoy&84:8801628302709
dorathyshi56:fredrickaslperkinsonsle8932sumonkhan:STJKsj&30:8801621846709
londassg15:nataliaunmcredmondung5478sumonkhan:UVDEud&61:8801624792536
xiaoexu39:miriamfyboatwrightfyr3810sumonkhan:IJZAiz&47:8801626854856
I want to delete the first word on each line, up to and including the first :
For example:
roydwk27:
deonnarsi15:
ernaalo68:
dorathyshi56:
What I actually want: if a line already starts with the field containing sumonkhan, there is no problem, but if that field is preceded by something followed by :, that prefix needs to be removed.
Lines like the following, as they appear in my .txt file, are fine:
nataliaunmcredmondung5478sumonkhan:UVDEud&61:8801624792536
miriamfyboatwrightfyr3810sumonkhan:IJZAiz&47:8801626854856
Every line contains sumonkhan. If a line starts with that field, as above, it is fine; otherwise delete the leading word and its :, not the whole line.
I hope this regex would help you. This regex deletes everything until first colon(:).
If you are reading a file then, read it line by line and run following regex on each line.
$str = 'roydwk27:teenaibuchytilibu5762sumonkhan:IJQRiq&76:8801627574057';
$str =~ s/^(?:.*?):(.*)/$1/g;
This code is in perl, you can re-write equivalent code in any other language.
See this demo at regex101.com.
^[\w\d]+:(.*)
^ // match the beginning of a line
[\w\d]+ // match one or more word characters (letters, digits, underscore); \d is redundant because \w already includes digits
: // match ":" literally
( // start of the capturing group
.* // match any characters
) // end of capturing group
Group 1 of each match now holds the text you want to keep. Note the g (global) and m (multiline) modifiers.
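Both patterns above remove everything up to the first colon on every line. If lines that already start with the sumonkhan field must be left alone, a negative lookahead can guard the removal. The following Python sketch assumes that reading of the question, and that the literal text sumonkhan always marks the field to keep; the names LEADING_FIELD, data and cleaned are illustrative.

```python
import re

# Assumption: a first field that contains "sumonkhan" must be kept.
# [^:\n]* stays within one field on one line.
LEADING_FIELD = re.compile(r"^(?![^:\n]*sumonkhan)[^:\n]*:", re.MULTILINE)

data = (
    "roydwk27:teenaibuchytilibu5762sumonkhan:IJQRiq&76:8801627574057\n"
    "nataliaunmcredmondung5478sumonkhan:UVDEud&61:8801624792536\n"
)

# Only the first line has an extra leading field, so only it is changed.
cleaned = LEADING_FIELD.sub("", data)
print(cleaned)
```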

Regular expression to get only the first word from each line

I have a text file
#sp_id int,
#sp_name varchar(120),
#sp_gender varchar(10),
#sp_date_of_birth varchar(10),
#sp_address varchar(120),
#sp_is_active int,
#sp_role int
Here, I want to get only the first word from each line. How can I do this? The spaces between the words may be space or tab etc.
Here is what I suggest:
Find what: ^([^ \t]+).*
Replace with: $1
Explanation: ^ matches the start of line, ([^ \t]+) matches 1 or more (due to +) characters other than space and tab (due to [^ \t]), and then any number of characters up to the end of the line with .*.
In case you might have leading whitespace, you might want to use
^\s*([^ \t]+).*
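Outside an editor's find/replace dialog, the same idea can be sketched in Python. This is an illustration, not the answer's exact pattern: it uses \S+ for the word so that a match can never run across a newline, and the sample text is abbreviated from the question.

```python
import re

text = "#sp_id int,\n#sp_name\tvarchar(120),\n  #sp_role int"

# Skip optional leading spaces/tabs, then capture the first run of
# non-whitespace characters on each line.
first_words = re.findall(r"^[ \t]*(\S+)", text, flags=re.MULTILINE)
print(first_words)
```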
I did something similar with this:
import re

with open('handles.txt', 'r') as handles:
    handlelist = [line.rstrip('\n') for line in handles]
# first run of word characters on each line
newlist = [re.findall(r"\w+", line)[0] for line in handlelist]
This gets a list containing all the lines in the document,
then it changes each line to a string and uses regex to extract the first word (ignoring white spaces)
My file (handles.txt) contained info like this:
JoIyke - personal twitter link;
newMan - another twitter handle;
yourlink - yet another one.
The code will return this list:
['JoIyke', 'newMan', 'yourlink']
Find What: ^(\S+).*$
Replace by : \1
You can simply use this to get the first word. Here we capture the first word in a group and replace the whole line with the captured group.
Find the first word of each line with /^\w+/gm.

Visual Studio regex to remove all comments and blank lines in VB.NET code using a macro

I was trying to remove all comments and empty lines in a file with the help of a macro. Now I came up with this solution which deletes the comments(there is some bug described below) but is not able to delete the blank lines in between -
Sub CleanCode()
Dim regexComment As String = "(REM [\d\D]*?[\r\n])|(?<SL>\'[\d\D]*?[\r\n])"
Dim regexBlank As String = "^[\s|\t]*$\n"
Dim replace As String = ""
Dim selection As EnvDTE.TextSelection = DTE.ActiveDocument.Selection
Dim editPoint As EnvDTE.EditPoint
selection.StartOfDocument()
selection.EndOfDocument(True)
DTE.UndoContext.Open("Custom regex replace")
Try
Dim content As String = selection.Text
Dim resultComment As String = System.Text.RegularExpressions.Regex.Replace(content, regexComment, replace)
Dim resultBlank As String = System.Text.RegularExpressions.Regex.Replace(resultComment, regexBlank, replace)
selection.Delete()
selection.Collapse()
Dim ed As EditPoint = selection.TopPoint.CreateEditPoint()
ed.Insert(resultBlank)
Catch ex As Exception
DTE.StatusBar.Text = "Regex Find/Replace could not complete"
Finally
DTE.UndoContext.Close()
DTE.StatusBar.Text = "Regex Find/Replace complete"
End Try
End Sub
So, here is what it should look like before and after running the macro.
BEFORE
Public Class Class1
Public Sub New()
''asdasdas
Dim a As String = "" ''asdasd
''' asd ad asd
End Sub
Public Sub New(ByVal strg As String)
Dim a As String = ""
End Sub
End Class
AFTER
Public Class Class1
Public Sub New()
Dim a As String = ""
End Sub
Public Sub New(ByVal strg As String)
Dim a As String = ""
End Sub
End Class
There are two main problems with the macro:
It cannot delete the blank lines in between.
If there is a piece of code which goes like this
Dim a as String = "Name='Soham'"
Then After running the macro it becomes
Dim a as String = "Name='"
To get rid of a line that contains whitespace or nothing, you can use this regex:
(?m)^[ \t]*[\r\n]+
Your regex, ^[\s|\t]*$\n would work if you specified Multiline mode ((?m)), but it's still incorrect. For one thing, the | matches a literal |; there's no need to specify "or" in a character class. For another, \s matches any whitespace character, including TAB (\t), carriage-return (\r), and linefeed (\n), making it needlessly redundant and inefficient. For example, at the first blank line (after the end of the first Sub), the ^[\s|\t]* will initially try to match everything before the word Public, then it will back off to the end of the previous line, where the $\n can match.
But a blank line, in addition to being empty or containing only horizontal whitespace (spaces or TABs), may also contain a comment. I choose to treat these "comment-only" lines as blank lines because it's relatively easy to do, and it simplifies the task of matching comments in non-blank lines, which is much harder. Here's my regex:
^[ \t]*(?:(?:REM|')[^\r\n]*)?[\r\n]+
After consuming any leading horizontal whitespace, if I see a REM or ' signifying a comment, I consume that and everything after it until the next line separator. Notice that the only thing that's required to be present is the line separator itself. Also notice the absence of the end anchor, $. It's never necessary to use that when you're explicitly matching the line separators, and in this case it would break the regex. In Multiline mode, $ matches only before a linefeed (\n), not before a carriage-return (\r). (This behavior of the .NET flavor is incorrect and rather surprising, given Microsoft's longstanding preference for \r\n as a line separator.)
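The blank-line pattern is not specific to .NET. As a rough check, here it is applied with Python's re module, where re.MULTILINE stands in for (?m). The sample source text and the names BLANK_OR_COMMENT_ONLY, source and cleaned are made up for this sketch.

```python
import re

# Blank lines and comment-only lines, including their line separators.
BLANK_OR_COMMENT_ONLY = re.compile(
    r"^[ \t]*(?:(?:REM|')[^\r\n]*)?[\r\n]+", re.MULTILINE)

source = "\r\n".join([
    "Public Sub New()",
    "    ''asdasdas",
    "    Dim a As String = \"\" ''asdasd",
    "",
    "    REM another comment-only line",
    "End Sub",
    "",
])

cleaned = BLANK_OR_COMMENT_ONLY.sub("", source)
print(cleaned)
```

The trailing comment on the Dim line is intentionally left in place; only whole lines are removed at this stage.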
Matching the remaining comments is a fundamentally different task. As you've discovered, simply searching for REM or ' is no good because you might find it in a string literal, where it does not signify the start of a comment. What you have to do is start from the beginning of the line, consuming and capturing anything that's not the beginning of a comment or a string literal. If you find a double-quote, go ahead and consume the string literal. If you find a REM or ', stop capturing and go ahead and consume the rest of the line. Then you replace the whole line with just the captured portion--i.e., everything before the comment. Here's the regex:
(?mn)^(?<line>[^\r\n"R']*(("[^"]*"|(?!REM)R)[^\r\n"R']*)*)(REM|')[^\r\n]*
Or, more readably:
(?mn) # Multiline and ExplicitCapture modes
^ # beginning of line
(?<line> # capture in group "line"
[^\r\n"R']* # any number of "safe" characters
(
(
"[^"]*" # a string literal
|
(?!REM)R # 'R' if it's not the beginning of 'REM'
)
[^\r\n"R']* # more "safe" characters
)*
) # stop capturing
(?:REM|') # a comment sigil
[^\r\n]* # consume the rest of the line
The replacement string would be "${line}". Some other notes:
Notice that this regex does not end with [\r\n]+ to consume the line separator, like the "blank lines" regex does.
It doesn't end with $ either, for the same reason as before. The [^\r\n]* will greedily consume everything before the line separator, so the anchor isn't needed.
The only thing that's required to be present is the REM or '; we don't bother matching any line that doesn't contain a comment.
ExplicitCapture mode means I can use (...) instead of (?:...) for all the groups I don't want to capture, but the named group, (?<line>...), still works.
Gnarly as it is, this regex would be a lot worse if VB supported multiline comments, or if its string literals supported backslash escapes.
I don't do VB, but here's a demo in C#.
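For anyone who wants to experiment without Visual Studio, the trailing-comment pattern can be translated to Python roughly as follows. Python has no ExplicitCapture mode, so this sketch writes the inner groups as (?:...) and the named group as (?P<line>...); the function name strip_trailing_comments is hypothetical.

```python
import re

TRAILING_COMMENT = re.compile(
    r"""^(?P<line>[^\r\n"R']*(?:(?:"[^"]*"|(?!REM)R)[^\r\n"R']*)*)(?:REM|')[^\r\n]*""",
    re.MULTILINE,
)

def strip_trailing_comments(code: str) -> str:
    """Replace each commented line with the part captured before the comment."""
    return TRAILING_COMMENT.sub(lambda m: m.group("line"), code)

samples = [
    'Dim a as String = "Name=\'Soham\'"',   # apostrophes inside a string literal
    'Dim a As String = "" \'\'asdasd',      # real trailing comment
    'Return "" \' returns nothing',         # line starting with a capital R
]
for sample in samples:
    print(repr(strip_trailing_comments(sample)))
```

Like the original, this leaves any whitespace that preceded the comment, and it matches REM case-sensitively.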
I've just checked with the two examples from above; '+{.+}$ should do. Optionally, you could go with ('|'')+{.+}$, but the first solution also replaces the XML descriptions.
''' <summary>
''' Method Description
''' </summary>
''' <remarks></remarks>
Sub Main()
''first comment
Dim a As String = "" 'second comment
End Sub
Edit: if you use ('+{.+}$|^$\n) it deletes a) all comments and b) all empty lines. However, if a comment is followed by an End Sub/Function, that line is pulled up one line, which results in a compiler error.
Before
''' <summary>
'''
''' </summary>
''' <remarks></remarks>
Sub Main()
''first comment
Dim a As String = "" 'second comment
End Sub
''' <summary>
'''
''' </summary>
''' <returns></returns>
''' <remarks></remarks>
Public Function asdf() As String
Return "" ' returns nothing
End Function
After
Sub Main()
Dim a As String = ""
End Sub
Public Function asdf() As String
Return ""
End Function
Edit: To delete any empty lines, search for the regex ^$\n and replace it with an empty string.
Delete the comments first using this regex
'+\s*(\W|\w).+
'+ - one or more ' for the beginning of each comment.
\s* - if there are spaces after the comment.
(\W|\w).+ - anything that follows except for line terminators.
Then remove the blank lines left using the regex Mr. Alan Moore provided.