Find the first parameter of specific method calls with regex

I have some methods in a text file starting with:
#Include("myparameter1", data);
bla1("myparameter2", data);
#Include("myparameter3", data);
That is my regex:
#"\((.*?)\)
It finds myparameter1, 2 and 3.
I only want myparameter1 and myparameter3, literally and without the quotes.
I also tried prepending Include to the regex, but it had no effect.
If regex is too hard for getting myparameter1, it would even be OK if only the Include part worked, because I could split the result at the "," and then trim the quotes...

Assuming there are no escaped double quotes within the double quotes, you could match the required substring using ([^"]*), which means match and capture zero or more characters that are not double quotes.
C# string:
"#Include\\(\"([^\"]*)\""
C# verbatim string:
@"#Include\(""([^""]*)"""
The required result is in Groups[1], for example:
// input holds the file text; pattern is one of the strings above
foreach (Match match in Regex.Matches(input, pattern)) {
    Console.WriteLine(match.Groups[1].Value);
}
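For comparison, here is a minimal sketch of the same capture-group idea in Python rather than C#. The sample text is the one from the question; the variable names are only illustrative.

```python
import re

text = '''#Include("myparameter1", data);
bla1("myparameter2", data);
#Include("myparameter3", data);'''

# Group 1 captures everything between the quotes that follow "#Include(".
pattern = re.compile(r'#Include\("([^"]*)"')

# With exactly one capture group, findall returns the group contents.
first_params = pattern.findall(text)
print(first_params)  # ['myparameter1', 'myparameter3']
```

The bla1(...) call is skipped because the literal #Include( prefix is part of the pattern.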

Related

Need RegEx Pattern to get text between delimiters at start of text

My source text could be any number of characters between "[" and "]" at the beginning of the line. I will have ONLY one line.
For example:
[1] and some other text here
[10] more text, but maybe some brackets [KEY]
[1000000] a lot more text
I want to match/return the text between the "[" and "]".
EDIT AFTER ANSWER PROVIDED
The first answer, provided by @nickb, worked for me with this AppleScript:
Note that I had to convert the RegEx to a quoted string to use in AS. This uses the Satimage AppleScript Additions find text command, which provides the RegEx engine for AppleScript.
set strRegEx to "^\\[(.*?)\\]" -- Original: "^\[(.*?)\]"
set strTextToSearch to "[10] My Note title with [KEY] "
set strCaptureGroup to find text strRegEx in strTextToSearch using {"\\1"} with regexp and string result
log strCaptureGroup
-->10
The simplest regex you could use would be this:
^\[(.*?)\]
You can see it matching your input here.
Alternatively, a pure AppleScript solution:
set theText to "[1] and some other text here
[10] more text, but maybe some brackets [KEY]
[1000000] a lot more text"
set resultList to {}
set {TID, text item delimiters} to {text item delimiters, "]"}
repeat with aLine in (get paragraphs of theText)
    if aLine starts with "[" then set end of resultList to text 2 thru -1 of text item 1 of aLine
end repeat
set text item delimiters to TID
resultList -- {"1", "10", "1000000"}
I think this will fit your criteria:
^\[([^]]*)\].*
With the stuff in brackets in the first matching group returned.
You can try running the following regular expression on each line:
[^\[]\w+[^\]]
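I tested it at regex101 and it matches the contents inside the [], excluding the brackets, but only when the contents are at least three word characters long (as in [1000000]); for shorter contents such as [1] or [10] it matches later text instead.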
/^\[(.*?)\]/
is really the simplest regex for this case, but the overall match includes the surrounding brackets too.
The exact value (without brackets) is stored in 1st capture group.
If you don't want to match brackets, you will need this:
/(?<=^\[).*?(?=\])/
… unless you're using an older JavaScript engine: lookbehinds were only added to JavaScript in ES2018.
In that case you'll need this regex:
/[^\[\]]+/
(assuming that every input will start with […] component, and will not be empty)
Which regex to use depends on how you are going to apply it to the input. Some of the answers here have a trailing .* and some do not. Both are correct; it depends on what exactly you are trying to match and, crucially, on how you ask for a match.
For example, in Java, with the regex ^\[(.*?)\], if you feed it the whole string "[1000000] a lot more text" and call matches(), it will return false, because the pattern does not account for the trailing text outside the brackets. If you call find() on the same string instead, it will succeed, because find() scans for a matching substring and returns true at the first one it finds, while matches() returns true only if the entire input matches the regex. Each further call to find() looks for the next matching substring until the end of the input is reached.
Personally, I like to use regexes that account for the entire input and use capture groups to isolate the text I want to grab. But your mileage may vary.
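The same distinction can be sketched in Python, where re.fullmatch plays roughly the role of Java's matches() and re.search roughly the role of find(). This is an analogy under that assumption, not the Java API itself.

```python
import re

line = "[1000000] a lot more text"

# Pattern without a trailing .* only describes the bracketed prefix.
prefix_only = re.compile(r"^\[(.*?)\]")
# Pattern with a trailing .* describes the whole line.
whole_line = re.compile(r"^\[([^\]]*)\].*")

# fullmatch must consume the entire input, like Java's matches().
full_prefix = prefix_only.fullmatch(line)
full_whole = whole_line.fullmatch(line)

# search stops at the first matching substring, like Java's find().
found = prefix_only.search(line)

print(full_prefix)          # None
print(full_whole.group(1))  # 1000000
print(found.group(1))       # 1000000
```

Either pattern yields the bracket contents in group 1; only the whole-line pattern survives a full-input match.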

VIM - Replace based on a search regex

I've got a file with several (1000+) records like :
lbc3.*'
ssa2.*'
lie1.*'
sld0.*'
ssdasd.*'
I can find them all by :
/s[w|l].*[0-9].*$
What I want to do is replace the final part of each match with \.*'
I can't use :%s//s[w|l].*[0-9].*$/\\\\\.\*' because it replaces the whole string, and I only need to replace the end of it, from
.'
to
\.'
So the file output is like:
lbc3\\.*'
ssa2\\.*'
lie1\\.*'
sld0\\.*'
ssdasd\\.*'
Thanks.
In general, the solution is to use a capture. Put \(...\) around the part of the regex that matches what you want to keep, and use \1 to include whatever matched that part of the regex in the replacement string:
s/\(s[w|l].*[0-9].*\)\.\*'$/\1\\.*'/
Since you're really just inserting a backslash between two strings that you aren't changing, you could use a second set of parens and \2 for the second one:
s/\(s[w|l].*[0-9].*\)\(\.\*'\)$/\1\\\2/
Alternatively, you could use \zs and \ze to delimit just the part of the string you want to replace:
s/s[w|l].*[0-9].*\zs\ze\.\*'$/\\/
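The capture idea is not specific to Vim. As a rough sketch, the same substitution in Python's re module looks like this, where (...) and \1 correspond to Vim's \(...\) and \1. The two sample records are hypothetical ones chosen so that the question's s[w|l] pattern matches them.

```python
import re

# Hypothetical records that the question's pattern s[w|l]... matches.
lines = ["sld0.*'", "swx9.*'"]

# Group 1 keeps the unchanged prefix; \1 puts it back, then \\ adds one backslash.
with_group = [re.sub(r"(s[w|l].*[0-9].*)\.\*'$", r"\1\\.*'", ln) for ln in lines]

# A lookahead gives an empty match just before .*' at the end of the line,
# similar in spirit to \zs\ze, so only a backslash is inserted.
# Simplification: this variant does not check the s[w|l]...[0-9] prefix.
with_lookahead = [re.sub(r"(?=\.\*'$)", r"\\", ln) for ln in lines]

for ln in with_group:
    print(ln)
```

Both lists contain the records with a single backslash inserted before the final .*' part.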

regular expression matching issue

I've got a string which has the following format
some_string = ",,,xxx,,,xxx,,,xxx,,,xxx,,,xxx,,,xxx,,,"
This is the content of a text file called f.
I want to search for a specific term within the xxx (let's say that term is 'silicon').
Note that the xxx can all be different and can contain any special characters (including metacharacters) except for a newline.
import re
match = re.findall(r",{3}(.*?silicon.*?),{3}", f.read())  # f is the open text file
print(match)
But this doesn't seem to work because it returns results which are in the format:
["xxx,,,xxx,,,xxx,,,xxx,,,silicon", "xxx,,,xxx,,,xxx,,,xxsiliconxx"] but I only want it to return ["silicon", "xxsiliconxx"]
What am I doing wrong?
Try the following regex:
(?<=,{3})(?:(?!,{3}).)*?silicon.*?(?=,{3})
Example:
>>> s = ',,,xxx,,,silicon,,,xxx,,,xxsiliconxx,,,xxx'
>>> re.findall(r'(?<=,{3})(?:(?!,{3}).)*?silicon.*?(?=,{3})', s)
['silicon', 'xxsiliconxx']
I am assuming that the content in the xxx can contain commas, just not three consecutive commas or it would end the field. If the content in the xxx sections cannot contain any commas, you can use the following instead:
(?<=,{3})[^,\r\n]*?silicon.*?(?=,{3})
The reason your current approach doesn't work is that even though .*? will try to match as few characters as possible, the match will still start as early as possible. So for example the regex a*?b would match the entire string "aaaab". The only time the regex will advance the starting position is when the regex fails to match, and since ,,, can be matched by the .*?, your match will always start at the beginning of the string or just after the previous match.
The lookbehind and lookahead address the issue raised by JaredC in the comments: re.findall() won't return overlapping matches, so the leading and trailing ,,, must not be part of the match.
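The behaviour described above can be checked with a short script. It reuses the example string from this answer and compares the question's pattern with the lookaround version; the variable names are illustrative.

```python
import re

# A lazy quantifier still starts at the leftmost possible position.
lazy = re.search(r"a*?b", "aaaab").group(0)

s = ",,,xxx,,,silicon,,,xxx,,,xxsiliconxx,,,xxx"

# The question's pattern: each match starts at the first available ",,,"
# and .*? then runs across field separators until it reaches "silicon".
naive = re.findall(r",{3}(.*?silicon.*?),{3}", s)

# The lookaround pattern: separators are checked but not consumed.
fixed = re.findall(r"(?<=,{3})(?:(?!,{3}).)*?silicon.*?(?=,{3})", s)

print(lazy)   # aaaab
print(naive)  # ['xxx,,,silicon', 'xxsiliconxx']
print(fixed)  # ['silicon', 'xxsiliconxx']
```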

Regex for excluding characters

I'm trying to strip a string of all special characters except a few, plus remove everything between brackets (square or any other, including the brackets!). My current regex is:
^[a-zA-Z0-9äöüÄÖÜ;#.]*$
\\[.+\\]
\\<.+\\>
\\s+
All sequences that match one of the above are removed
It works fine on e.g.:
Foo Bar[Foo.Bar#google.com]
reducing it to FooBar, but not on e.g.:
Foo
foo#bar.com
where it removes them completely.
Update: Updating regex as per OP's edit.
You can use the following regex and replace the match with empty string.
\[.*?\]|<.*?>|\s|[^a-zA-Z0-9äöüÄÖÜ;#.]
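As a quick check, here is a sketch in Python that applies this regex to the inputs from the question. It also shows why the question's first pattern deletes strings such as foo#bar.com: anchored with ^ and $, it matches a whole string made only of allowed characters, so removing the match removes everything.

```python
import re

# Regex from the answer: bracketed spans, whitespace, and any disallowed character.
cleanup = re.compile(r"\[.*?\]|<.*?>|\s|[^a-zA-Z0-9äöüÄÖÜ;#.]")

samples = ["Foo Bar[Foo.Bar#google.com]", "Foo", "foo#bar.com"]
cleaned = [cleanup.sub("", s) for s in samples]
print(cleaned)  # ['FooBar', 'Foo', 'foo#bar.com']

# The question's first pattern matches an entire string of allowed characters,
# so replacing its match with "" leaves nothing behind.
whitelist_whole = re.compile(r"^[a-zA-Z0-9äöüÄÖÜ;#.]*$")
emptied = whitelist_whole.sub("", "foo#bar.com")
print(repr(emptied))  # ''
```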
To remove anything between brackets, including the brackets themselves, you could use the following regex and replace it with an empty string:
/\[[^\]]*\]/
To remove special characters, you could use the one below. It matches every character except those listed inside the character class, so you could once again replace the matches with the empty string.
/[^a-zA-Z0-9äöüÄÖÜ;#]/
You could use them in sequence or build a bigger one.
In Ruby, I have the following test:
irb(main):001:0> s = "Foo Bar[Foo.Bar#google.com]"
=> "Foo Bar[Foo.Bar#google.com]"
irb(main):005:0* s.gsub(/\[[^\]]*\]|[^a-zA-Z0-9äöüÄÖÜ;#]/, "")
=> "FooBar"
Note that the space has disappeared.

Matching everything except a specified regex

I have a huge file, and I want to blow away everything in the file except for what matches my regex. I know I can get matches and just extract those, but I want to keep my file and get rid of everything else.
Here's my regex:
"Id":\d+
How do I say "Match everything except "Id":\d+". Something along the lines of
!("Id":\d+) (pseudo regex) ?
I want to use it with a Regex Replace function. In English I want to say:
Get all text that isn't "Id":\d+ and replace it with an empty string.
Try this:
// Requires: using System.IO; using System.Text.RegularExpressions;
string path = @"c:\temp.txt"; // your file here
string pattern = @".*?(Id:\d+\s?).*?|.+";
Regex rx = new Regex(pattern);
var lines = File.ReadAllLines(path);
using (var writer = File.CreateText(path))
{
    foreach (string line in lines)
    {
        // Keep only the captured Id:Number parts of each line
        string result = rx.Replace(line, "$1");
        if (result == "")
            continue;
        writer.WriteLine(result);
    }
}
The pattern will preserve spaces between multiple Id:Number occurrences on the same line. If you only have one Id per line you can remove the \s? from the pattern. File.CreateText will open and overwrite your existing file. If a replacement results in an empty string it will be skipped over. Otherwise the result will be written to the file.
The first part of the pattern matches Id:Number occurrences. It includes an alternation for .+ to match lines where Id:Number does not appear. The replacement uses $1 to replace the match with the contents of the first group, which is the actual Id part: (Id:\d+\s?).
Well, the opposite of \d is \D in Perl-ish regexes. Does .NET have something similar?
Sorry, but I totally don't get what your problem is. Shouldn't it be easy to grep the matches into a new file?
Yoo wrote:
Get all text that isn't "Id":\d+ and replace it with an empty string.
A logical equivalent would be:
Get all text that matches "Id":\d+ and place it in a new file. Replace the old file with the new one.
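A minimal sketch of that "extract the matches instead" approach, written in Python rather than .NET. The helper name keep_only_ids and the sample input are hypothetical; only the regex comes from the question.

```python
import re

def keep_only_ids(text):
    """Return just the substrings matching "Id":<digits>, in order."""
    return re.findall(r'"Id":\d+', text)

# Hypothetical input resembling the kind of file described in the question.
sample = '[{"Id":12,"Name":"a"},{"Id":7,"Name":"b"}]'
kept = keep_only_ids(sample)
print("\n".join(kept))
```

Writing the joined result to a new file and then swapping it in for the old one gives the same outcome as deleting everything that does not match.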
I haven't used .NET before, but the following works in Java:
System.out.println("abcd Id:12351abcdf".replaceAll(".*(Id:\\d+).*","$1"));
It produces the output
Id:12351
Strictly speaking, it doesn't meet the criterion of matching everything except Id:\d+, but it does the job.