I am doing this in Groovy.
Input:
hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig
hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging
I want to get the last column, basically anything that starts with abc_.
I tried the following regex (it works for the second line but not the first):
/abc_.*/
but on the first line that gives me everything from abc_batch onward.
I am looking for a regex that will fetch me anything that starts with abc_,
but I cannot use /^abc_.*/ since the whole string does not start with abc_.
It sounds like you're looking for "words" (i.e., sequences that don't include spaces) that begin with abc_. You might try:
/\babc_.*\b/
The \b means "word boundary" in most regular expression flavors, including the Java flavor that Groovy uses.
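To check the word-boundary idea quickly outside Groovy, here is a small sketch using Python's re module (Java's, and therefore Groovy's, \b behaves the same way on these ASCII inputs). One assumption of mine: it uses \w* instead of .*, because .* would also run past spaces if anything followed the abc_ word.

```python
import re

lines = [
    "hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig",
    "hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging",
]

# \b only matches where a word character meets a non-word character, so the
# "abc_" buried inside "hip_abc_batch" (preceded by "_", itself a word
# character) is skipped. \w* stops at the next space instead of running on.
pattern = re.compile(r"\babc_\w*")

results = [pattern.findall(line) for line in lines]
print(results)
```

Each line yields exactly one match: the last column.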
Try this:
/(?m)\s(abc_.*)$/
Here is a commented version so you can understand how it works:
\s # match one whitespace character
(abc_.*) # capture a string that starts with "abc_" and is followed
# by any character zero or more times
$ # match the end of the line
Because the regular expression uses the multi-line switch ("m", written inline as (?m) in Groovy, which has no trailing regex flags), $ matches at the end of each line rather than only at the end of the entire string.
You don't need to trim the whitespace, because capture group 1 contains just the text. I believe this is how to grab the value of a capture group in Groovy:
def matcher = (yourString =~ /(?m)\s(abc_.*)$/)
// this is how you would extract the value from
// the matcher object
matcher[0][1]
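As a rough cross-check of the multi-line behaviour and of group 0 versus group 1, here is a Python sketch (not Groovy): re.MULTILINE plays the role of the "m" switch, and iterating over the matches is roughly what walking matcher[i][1] does in Groovy.

```python
import re

text = (
    "hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig\n"
    "hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging\n"
)

# re.MULTILINE lets $ match at the end of every line, not just the whole text.
pattern = re.compile(r"\s(abc_.*)$", re.MULTILINE)

# Group 0 is the whole match (leading whitespace included); group 1 is the text.
first = pattern.search(text)
print(repr(first.group(0)), repr(first.group(1)))

# One capture per line: the last column of each row.
last_columns = [m.group(1) for m in pattern.finditer(text)]
print(last_columns)
```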
I think you are looking for this: \s(abc_[a-zA-Z_]*)$
If you are using Perl and you read all lines into one string, don't forget to set the m option on your regex (it stands for "treat string as multiple lines").
Oh, and Regex Coach is your free friend.
Related
I need to remove everything from the colon following orange onward.
Example:
apple:orange:banana:grapes:
Becomes:
apple:orange
I've looked up a million different references for this and cannot find a solution.
I am currently doing this in Notepad++ using the Find/Replace function.
Find what : (^[a-z]+:[a-z]+).*$
(^[a-z]+:[a-z]+) First capturing group. Match alphabetic characters at start of string, a colon, alphabetic characters.
.*$ Match anything up to the end of the string.
Replace with : \1
\1 Replace with captured group one.
You could of course make the expression more general:
Find what : (^[^:]+:[^:]+).*$
(^[^:]+:[^:]+) Capturing group. Match anything other than a colon at start of string, a colon, anything other than a colon.
.*$ Match anything up to end of string.
Replace with : \1
\1 Replace with captured group one.
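If you want to test both Find/Replace pairs outside Notepad++, here is a Python sketch. The second line of input is a made-up example. Python's "." never matches "\n" unless re.DOTALL is passed, which is roughly what unticking ". matches newline" does, and re.MULTILINE makes ^ and $ work per line.

```python
import re

# First line is from the question; the second is a hypothetical extra row.
text = "apple:orange:banana:grapes:\nkiwi:lemon:lime:\n"

# Specific version: letters only on each side of the first colon.
specific = re.sub(r"(^[a-z]+:[a-z]+).*$", r"\1", text, flags=re.MULTILINE)
# General version: anything except a colon on each side.
general = re.sub(r"(^[^:]+:[^:]+).*$", r"\1", text, flags=re.MULTILINE)
print(specific)
print(general)
```

Note that [^:] also matches line breaks, so the general version can reach into the next line if a line has only one colon.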
As revo pointed out in a comment, you should disable the ". matches newline" option when using the patterns above in Notepad++.
If I understand you correctly, you can use the plugin ConyEdit to do this. You can use its command cc.dac <nth>/<regex>/[<mode>] [-options]. cc.dac means: delete after column.
For Example:
With ConyEdit running in the background, copy the text and the command line below, then paste:
apple:orange:banana:grapes:
cc.dac 2/:/ -d
How big is the file?
If it's small, you could write a simple program like the following Java snippet; most programming languages support this kind of operation.
import java.util.Arrays;
String input = "apple:orange:banana:grapes:";
String[] arrOfStr = input.split(":");
// arrays have no indexOf, so search through a List view of the array
int index = Arrays.asList(arrOfStr).indexOf("orange");
// copy everything up to and including "orange"
String[] arrOfStrSub = Arrays.copyOfRange(arrOfStr, 0, index + 1);
String output = String.join(":", arrOfStrSub); // "apple:orange"
I want to find all instances of a word (say myword), with the added condition that the word is followed by whitespace, "#", "#", or the end of input.
Input string:
"myword# myword mywordrick myword# myword"
I want the regex to match everything except mywordrick:
myword#
myword
myword#
myword
I am able to match against the first 3 with myword[##\s]
I thought myword[##\s\z] would match against all 4, but I only get 3
I try myword[\z] and get no matches
I try myword\z and get 1 match.
I figure \z inside [] doesn't work, because [] is character-based logic rather than position-based logic.
Is there a way to use a single regex to match the expressions I am interested in? I do not want to use both myword[##\s] and myword\z unless I really have to.
Your regex would be:
myword(?:[##\s]|$)
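It matches the string myword together with the following #, # or whitespace character (\s), or myword on its own when it is followed by $. Without multi-line mode, $ means the end of the input; in multi-line mode it also matches at the end of each line.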
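Here is a quick Python check of that pattern against the question's input; the character class is copied as written, and the repeated # in it is harmless. The second pattern is my own variation, not part of the answer: a lookahead (?=...) checks the following character without consuming it, so every match is just myword.

```python
import re

text = "myword# myword mywordrick myword# myword"

# (?:...) groups the alternatives without capturing; $ covers end of input.
consuming = re.compile(r"myword(?:[##\s]|$)")
matches = consuming.findall(text)
print(matches)

# Variation (assumption): a lookahead leaves the trailing symbol unconsumed.
lookahead = re.compile(r"myword(?=[##\s]|$)").findall(text)
print(lookahead)
```

With the first pattern the trailing symbol or space is part of each match; mywordrick is skipped by both.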
I am attempting to edit a CSV file; below is a sample line from it:
|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
The |MIGRATE| at the beginning of the line needs to be modified without changing the second MIGRATE, so that the line would read:
|MIGRATE|;|MIG_IN|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
There are 7700 or so lines so if I am forced to do this manually I will probably cry a little.
Thanks in advance!
Just replace all the ones you want not changed with another word temporarily, then replace the rest with what you want. I'm not sure what you're asking here, but from what I can guess this might help.
It seems like you could just search for:
^\|MIGRATE\|
And replace with:
|MIGRATE|;|MIG_IN|
Make sure you've checked 'Regular expression' in the 'Search Mode' options.
Explanation: The ^ is a begin anchor; it will match the beginning of the line, ensuring that it does not match the second |MIGRATE|. The \ characters are required to escape the | characters since they normally have special meaning in regular expressions, and you want to match a literal |.
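The same anchored replacement can be sketched in Python for a quick test. With the whole file read into one string, re.MULTILINE makes ^ match at the start of every line, so all ~7700 lines are handled in one call. Here the sample line is simply repeated twice to stand in for the file.

```python
import re

line = "|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U"
text = (line + "\n") * 2  # stand-in for the real file contents

# ^ anchors each match at a line start, so the second |MIGRATE| is untouched.
# The pipes are escaped in the pattern; in the replacement they are literal.
fixed = re.sub(r"^\|MIGRATE\|", "|MIGRATE|;|MIG_IN|", text, flags=re.MULTILINE)
print(fixed)
```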
You can use beginning of line anchors:
Find:
^(\|MIGRATE\|)
Replace with:
$1;|MIG_IN|
Just make sure that you are using the regular expression mode of the Search&Replace.
If you want to be a bit fancier, you can use a positive lookbehind:
Find:
(?<=^\|MIGRATE\|)
Replace with:
;|MIG_IN|
^ will match only at the beginning of a line.
( ... ) is called a capture group, and it saves the contents of the match in a variable you can use. In the first regex, I accessed the variable using $1 in the replacement. The first capture gets stored in $1, the second in $2, and so on.
| is a special character meaning "or" in regex (it matches one character or group of characters or another; e.g. a|b matches a or b). As such, you need to escape it with a backslash to make a regex match a literal |.
In my second regex, I used (?<= ... ), which is called a positive lookbehind. It makes sure that the part to be matched is preceded by what's inside it. For instance, (?<=a)b matches a b only if it has an a before it, so the b in ab matches but no b in bb does.
The regex101 website also explains the details of a regex, and you can try out your own regexes there.
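For comparison, here are both variants in Python. One difference to watch for: Python writes the back-reference in the replacement as \1 (or \g<1>) rather than $1. The lookbehind version matches an empty string right after the leading |MIGRATE|, so the replacement is simply inserted there.

```python
import re

line = "|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U"
text = (line + "\n") * 2  # stand-in for a multi-line file

# Capture group: \1 puts the captured |MIGRATE| back, followed by the new field.
with_group = re.sub(r"^(\|MIGRATE\|)", r"\1;|MIG_IN|", text, flags=re.MULTILINE)

# Positive lookbehind: a zero-width match after a line-initial |MIGRATE|.
with_lookbehind = re.sub(r"(?<=^\|MIGRATE\|)", ";|MIG_IN|", text, flags=re.MULTILINE)

print(with_group == with_lookbehind)
```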
I have never used TPerlRegEx before, and this is my first time using regular expressions.
I am looking for a small example using TPerlRegEx in Delphi XE2 to remove the brackets and quotes as follows:
input string:
["some text"]
result:
some text
The input is a single line with no nested brackets or quotes. I have used RegexBuddy to create and test the regex, but it is not giving me the result I want.
This works in Regex Buddy:
Regex:
\["(.+?)"\]
Replace:
$1
Use like this:
var
RegEx: TPerlRegEx;
begin
RegEx := TPerlRegEx.Create(nil);
try
Regex.RegEx := '\["(.+?)"\]';
Regex.Subject := SubjectString; // ["any text between brackets and quotes"]
Regex.Replacement := '$1';
Regex.ReplaceAll;
Result := Regex.Subject;
finally
RegEx.Free;
end;
end;
How it works:
Match the character "[" literally «\[»
Match the character """ literally «"»
Match the regular expression below and capture its match into backreference number 1 «(.+?)»
Match any single character that is not a line break character «.+?»
Between one and unlimited times, as few times as possible, expanding as needed (lazy) «+?»
Match the character """ literally «"»
Match the character "]" literally «\]»
Created with RegexBuddy
Examples abound, including one in the documentation, so I assume the question is really about what values to assign to which properties to get the specific desired output.
Set the RegEx property to the regular expression that you want to match, and set Replacement to the value you want the matched sequences to be replaced with. One way might be to set RegEx to \[|\]|" and Replacement to the empty string. That will remove all brackets and quotation marks from anywhere in the string.
To instead remove just the pairs of brackets and quotation marks that surround the string, try setting RegEx to ^\["(.*)"\]$ and Replacement to \1. That will match the entire string and replace it with the first matched subexpression, which excludes the four surrounding characters. To turn a string like ["foo"] ["bar"] into foo bar, remove the start and end anchors and add a non-greedy qualifier: \["(.*?)"\].
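The three options above can be tried out in Python's re module before you wire them into TPerlRegEx. This is a Python sketch, not the Delphi API; re.sub takes the pattern, the replacement and the subject in one call instead of setting properties. The last line shows why the non-greedy qualifier matters when the anchors are removed.

```python
import re

# Option 1: delete every bracket and quotation mark anywhere in the string.
strip_all = re.sub(r'\[|\]|"', "", '["some text"]')

# Option 2: anchored, so only the pair wrapping the whole string is removed.
anchored = re.sub(r'^\["(.*)"\]$', r"\1", '["some text"]')

# Option 3: unanchored and non-greedy, so each ["..."] pair is handled separately.
each_pair = re.sub(r'\["(.*?)"\]', r"\1", '["foo"] ["bar"]')

# For contrast: a greedy (.*) spans from the first [" to the last "].
greedy = re.sub(r'\["(.*)"\]', r"\1", '["foo"] ["bar"]')

print(strip_all, anchored, each_pair, greedy, sep=" | ")
```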
Once you've set up the regular expression and the replacement, then you're ready to assign Subject to the string you want to process. Finally, call ReplaceAll, and when it's finished, the new string will be in Subject again.
I'm parsing through code using a Perl-REGEX parsing engine in my IDE and I want to grab any variables that look like
$hash->{ hash_key04}
and nuke the rest of the code.
So far my very basic regex doesn't do what I expected:
(.*)(\$hash\-\>\{[\w\s]+\})(.*)
(
\$
hash
\-\>
\{
[\w\s]+
\}
)
I know to use replace for this ($1, $2, etc.), but matching (.*) before and after the target string doesn't seem to capture all the rest of the code!
UPDATE:
I tried matching everything that isn't a null character, but of course that's too greedy:
([^\0]*)
What regex should I use to keep only the string pattern and remove the rest?
The problem is I want to be left with the list of $hash->{} strings after the replace runs in the IDE.
This is better approached from the other direction. Instead of trying to delete everything you don't want, what about extracting everything you do want?
my @vars = $src_text =~ /(\$hash->\{[\w\s]+\})/g;
Breaking down the regex:
/( # start of capture group
\$hash-> # prefix string with $ escaped
\{ # opening escaped delimiter
[\w\s]+ # any word characters or space
\} # closing escaped delimiter
)/g; # match repeatedly returning a list of captures
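The extraction approach translates directly to Python, where re.findall with one capture group returns the list of captures, much like Perl's //g in list context. The source text below is a hypothetical stand-in for the code being parsed.

```python
import re

# Hypothetical source text standing in for the code in the IDE.
src = """my $a = $hash->{ hash_key04};
print "unrelated";
foo($hash->{other key}, $hash->{bar});
"""

# One capture group, so findall returns just the captured strings.
found = re.findall(r"(\$hash->\{[\w\s]+\})", src)
print(found)
```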
Here is another way that might fit within your IDE better:
s/(\$hash->\{[\w\s]+\})|./$1/gs;
This regex tries to match one of your hash variables at each location, and if it fails, it deletes the next character and then tries again, which after running over the whole file will have deleted everything you don't want.
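Here is a Python sketch of the same delete-everything-else substitution, using the same hypothetical source text. re.DOTALL plays the role of Perl's /s so newlines are deleted too. Since Python 3.5, a group that did not participate in the match is replaced with an empty string, so \1 expands to nothing when only "." matched.

```python
import re

# Hypothetical source text standing in for the code in the IDE.
src = """my $a = $hash->{ hash_key04};
print "unrelated";
foo($hash->{other key}, $hash->{bar});
"""

# At each position, try to match a $hash->{...} first (kept via \1);
# otherwise "." consumes one character and \1 is empty, deleting it.
kept = re.sub(r"(\$hash->\{[\w\s]+\})|.", r"\1", src, flags=re.DOTALL)
print(kept)
```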
It depends on your language. What you want is group 2 (the second parenthesized group). In Perl that would be $2, in Vim it would be \2, and so on.
It depends on the platform, but generally, replace the pattern with an empty string.
In javascript,
// prints "the la in ing"
console.log('the latest in testing'.replace(/test/g, ''));
In bash
$ echo 'the latest in testing' | sed 's/test//g'
the la in ing
In C#
Console.WriteLine(Regex.Replace("the latest in testing", "test", ""));
etc
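In Python, the same idea uses re.sub with an empty replacement string:

```python
import re

# An empty replacement deletes every match of the pattern.
result = re.sub(r"test", "", "the latest in testing")
print(result)  # the la in ing
```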
By default the wildcard . won't match newlines. You can enable newlines in its matching set using a flag depending on what regex standard you're using and under what language/api. Or you can add them explicitly yourself by defining a character set:
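[\s\S]* <- Matches any character, including newline and carriage return. (Inside a character class, . is a literal dot, so [.\n\r] would match only a dot, a newline or a carriage return.)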
Combine this with capture groups to grab desired variables from your code and skip over lines which contain no capture group.
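Here is a small Python check of the newline behaviour. re.DOTALL is Python's name for the flag that makes "." match newlines, like Perl's /s. The last pattern shows that a "." inside a character class is only a literal dot.

```python
import re

text = "start\r\nmiddle\nend"

# By default "." stops at "\n", so this cannot reach "end".
default = re.fullmatch(r"start.*end", text)
# re.DOTALL lets "." match newlines too.
dotall = re.fullmatch(r"start.*end", text, flags=re.DOTALL)
# [\s\S] is "whitespace or non-whitespace", i.e. any character, no flag needed.
char_class = re.fullmatch(r"start[\s\S]*end", text)
# Inside [...] the "." is literal: this only matches ".", "\n" and "\r".
literal_dot = re.fullmatch(r"start[.\n\r]*end", text)

print(default, bool(dotall), bool(char_class), literal_dot)
```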
If you want help constructing the proper regex for your context you'll need to paste some input text and specify what the output should be.
I think you want to add a ^ to the beginning of the regex, s/^.*(PATTERN)(.*)$/$1/, so that it starts at the beginning of the line and goes to the end, removing everything except that pattern.