I have a block of text that looks something like
par.dm_std;
par.dm_POM;
par.dm_CaCO3;
and I want it to look like
par.dm_std = dm_std;
par.dm_POM = dm_POM;
par.dm_CaCO3 = dm_CaCO3;
So I am essentially trying to copy everything after the "." and put an equals sign before and a semicolon afterward. I tried to run a query replace with
par\.\(.*\) -> par\.\1 = \1;
but then emacs returns the error message
Invalid use of `\' in replacement text
I can't figure out for the life of me what I am doing wrong here?
By the way, this is matlab code I am working with.
You should not escape . in the replacement text. You also should have a literal ; at the end of the match expression; otherwise, it will be included in \1 and you'll get an extra semicolon before the equal sign.
Replace regexp: par\.\(.*\);
Replace with: par.\1 = \1;
Apparently, I should have used replace-regexp rather than query-replace-regexp.
With the former, everything just works.
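For readers outside Emacs, the same capture-and-reuse idea can be sketched in Python. This is only an illustration, assuming Python's re module, where groups are written ( ) instead of \( \); the variable names are made up for the example.

```python
import re

# Sample input taken from the question.
text = "par.dm_std;\npar.dm_POM;\npar.dm_CaCO3;"

# The literal ';' sits outside the group, so it is not captured into \1.
pattern = re.compile(r"par\.(.*);")

# In the replacement text the '.' needs no escaping; \1 reuses the capture.
result = pattern.sub(r"par.\1 = \1;", text)
print(result)
```

Each line comes back with the captured name repeated after the equals sign, e.g. par.dm_POM = dm_POM;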
I have a line: "a herf = sdfsjkdhfks http://www.google.com 134"
I want to get the "http://www.google.com" part only if there is a "<" at the beginning and a ">" in the end
For now my regex is "(?i)(http)(s:| :).+\.[A-Za-z]{2,}/?"
How can I check that the angle brackets exist without making them part of the match? I mean, I do not want the angle brackets to appear in the match output.
In this case, the output should be null because there are no angle brackets, but if there are, I want the output to be just "www.google.com".
Thanks in advance
Include the bracket as part of your regex, then as a second step after you've found the match, strip it out of that result string before you return the result.
If you're anchoring the angled brackets to the start and end of the regex, this could be as simple as something like .substring(1,matchedString.length()-1).
This will get the link part skipping any thing at the start and end.
import re
content = "<ahref = 123 http://googl 235>"
re.findall(r"<a[\s]*href[\s]*=.*(http://[^> ]*)[\s]*.*>", content)
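A capturing group can also do the "strip the brackets afterward" step for you: the brackets are required for a match, but only the group is returned. Below is a minimal Python sketch, assuming the brackets wrap the URL itself. The helper name extract_bracketed_url and the simplified URL pattern are made up for illustration and are not the asker's regex.

```python
import re

# The angle brackets must be present, but only the parenthesised part is returned.
BRACKETED_URL = re.compile(r"<(https?://[^>\s]+)>", re.IGNORECASE)

def extract_bracketed_url(line):
    """Return the URL inside <...>, or None when no bracketed URL is found."""
    match = BRACKETED_URL.search(line)
    return match.group(1) if match else None

without_brackets = extract_bracketed_url("a href = sdfsjkdhfks http://www.google.com 134")
with_brackets = extract_bracketed_url("a href = sdfsjkdhfks <http://www.google.com> 134")
print(without_brackets)  # None
print(with_brackets)     # http://www.google.com
```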
I have a regular expression that parses lines in a driver inf file to extract just the variable names and values ignoring whitespace and end of line comments that begin with a semicolon.
It looks like this:
"^([^=\s]+)[ ]*=[ ]*([^;\r\n]+)(?<! )"
Most of the time it works just fine as per the example here: regex example 1
However, when it encounters a line that has a tab character anywhere between the variable name and the equals sign, the expression fails as per the example here: regex example 2
I have tried replacing "\s" with "\t" and "\x09" and it still doesn't work. I have edited the text file that contains the tab character with a hex editor and confirmed that it is indeed ASCII "09". I don't want to use a positive character match as the variable could actually contain quite a large number of special characters.
The appearance of the literal "=" seems to cause the problem but I cannot understand why.
For example, if I strip back the expression to this: regex example 3
and use the line with the tab character in it, it works fine. But as soon as I add the literal "=" as per the example here: regex example 4, it no longer matches, appearing to ignore the tab character.
The two [ ]* match only space characters (U+0020 SPACE) and not other whitespace characters.
Change both to [ \t]* to match tabs as well. The result would now look like:
"^([^=\s]+)[ \t]*=[ \t]*([^;\r\n]+)(?<! )"
You've just added the \t tab character in the wrong part I think.
This was your example 2 (not working):
^([^=\s]+)[ ]*=[ ]*([^;\r\n]+)(?<! )
This is your example 2 ... working (with a tab):
^([^=\s]+)[ \t]*=[ ]*([^;\r\n]+)(?<! )
            ^^ tab here
Seems to do the trick and match your first example: http://regex101.com/r/kQ1zH4/1
^([^=\s]+)\s*=\s*([^;\r\n]+)(?<!\s)
Try this. See the demo:
http://regex101.com/r/tV8oH3/2
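To see the difference between [ ]* and [ \t]* outside regex101, here is a small Python sketch. The sample line is made up, and Python's re module is assumed to behave like the asker's engine for these constructs (it does support the fixed-width lookbehind used here).

```python
import re

# Original pattern: only spaces are allowed around '='.
SPACES_ONLY = re.compile(r"^([^=\s]+)[ ]*=[ ]*([^;\r\n]+)(?<! )")
# Fixed pattern: spaces or tabs are allowed around '='.
SPACES_OR_TABS = re.compile(r"^([^=\s]+)[ \t]*=[ \t]*([^;\r\n]+)(?<! )")

# Made-up INF-style line with a tab between the name and '='.
line = "DriverVer\t= 1.0 ; end-of-line comment"

print(SPACES_ONLY.match(line))              # None: the tab is not matched by [ ]*
print(SPACES_OR_TABS.match(line).groups())  # ('DriverVer', '1.0')
```

The name group [^=\s]+ stops at the tab because \s covers tabs, so something after it has to consume the tab before the literal = can match.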
I have a huge file, and I want to blow away everything in the file except for what matches my regex. I know I can get matches and just extract those, but I want to keep my file and get rid of everything else.
Here's my regex:
"Id":\d+
How do I say "Match everything except "Id":\d+". Something along the lines of
!("Id":\d+) (pseudo regex) ?
I want to use it with a Regex Replace function. In english I want to say:
Get all text that isn't "Id":\d+ and replace it with an empty string.
Try this:
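using System.IO;
using System.Text.RegularExpressions;

string path = @"c:\temp.txt"; // your file here
// Group 1 captures each Id:Number; the |.+ alternative consumes text with no Id in it
string pattern = @".*?(Id:\d+\s?).*?|.+";
Regex rx = new Regex(pattern);
var lines = File.ReadAllLines(path);
using (var writer = File.CreateText(path))
{
    foreach (string line in lines)
    {
        // Replace every match with group 1 (empty when the .+ alternative matched)
        string result = rx.Replace(line, "$1");
        if (result == "")
            continue;
        writer.WriteLine(result);
    }
}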
The pattern will preserve spaces between multiple Id:Number occurrences on the same line. If you only have one Id per line you can remove the \s? from the pattern. File.CreateText will open and overwrite your existing file. If a replacement results in an empty string it will be skipped over. Otherwise the result will be written to the file.
The first part of the pattern matches Id:Number occurrences. It includes an alternation for .+ to match lines where Id:Number does not appear. The replacement uses $1 to replace the match with the contents of the first group, which is the actual Id part: (Id:\d+\s?).
Well, the opposite of \d is \D in Perl-ish regexes. Does .NET have something similar?
Sorry, but I totally don't get what your problem is. Shouldn't it be easy to grep the matches into a new file?
Yoo wrote:
Get all text that isn't "Id":\d+ and replace it with an empty string.
A logical equivalent would be:
Get all text that matches "Id":\d+ and place it in a new file. Replace the old file with the new one.
I haven't used .NET before, but the following works in Java:
System.out.println("abcd Id:12351abcdf".replaceAll(".*(Id:\\d+).*","$1"));
produces output
Id:12351
Strictly speaking, it doesn't match everything except Id:\d+, but it does the job.
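The "collect the matches instead of deleting everything else" idea from the answers above can be sketched in Python as well. The helper name keep_only_ids and the sample text are invented for illustration; the pattern is the one from the question, quotes included.

```python
import re

# Pattern from the question: a quoted "Id" key followed by digits.
ID_PATTERN = re.compile(r'"Id":\d+')

def keep_only_ids(text):
    """Return only the "Id":<digits> matches, one per line."""
    return "\n".join(ID_PATTERN.findall(text))

# Made-up sample; the last record contains the word Id but no "Id":<digits> pair.
sample = '{"Id":12351,"Name":"abc"}\n{"Name":"x"}\n{"Id":7,"Other":"Id"}'
kept = keep_only_ids(sample)
print(kept)
```

Writing the returned string back over the original file would give the same end result as replacing all non-matching text with an empty string.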
I suspect this has already been answered somewhere, but I can't find it, so...
I need to extract a string from between two tokens in a larger string, in which the second token will probably appear again meaning... (pseudo code...)
myString = "A=abc;B=def_3%^123+-;C=123;" ;
myB = getInnerString(myString, "B=", ";" ) ;
method getInnerString(inStr, startToken, endToken){
return inStr.replace( EXPRESSION, "$1");
}
so, when I run this using expression ".+B=(.+);.+"
I get "def_3%^123+-;C=123;" presumably because it just looks for the LAST instance of ';' in the string, rather than stopping at the first one it comes to.
I've tried using (?=) in search of that first ';' but it gives me the same result.
I can't seem to find a regExp reference that explains how one can specify the "NEXT" token rather than the one at the end.
any and all help greatly appreciated.
Similar question on SO:
Regex: To pull out a sub-string between two tags in a string
Regex to replace all \n in a String, but no those inside [code] [/code] tag
Replace patterns that are inside delimiters using a regular expression call
RegEx matching HTML tags and extracting text
You're using a greedy pattern by not specifying the ? in it. Try this:
".+B=(.+?);.+"
Try this:
B=([^;]+);
Here [^;]+ matches one or more characters that are not a ;, so the group captures everything between B= and the first ; that follows.
(This is a continuation of the conversation from the comments to Evan's answer.)
Here's what happens when your (corrected) regex is applied: First, the .+ matches the whole string. Then it backtracks, giving up most of the characters it just matched until it gets to the point where the B= can match. Then the (.+?) matches (and captures) everything it sees until the next part, the semicolon, can match. Then the final .+ gobbles up the remaining characters.
All you're really interested in is the "B=" and the ";" and whatever's between them, so why match the rest of the string? The only reason you have to do that is so you can replace the whole string with the contents of the capturing group. But why bother doing that if you can access contents of the group directly? Here's a demonstration (in Java, because I can't tell what language you're using):
String s = "A=abc;B=def_3%^123+-;C=123;";
Pattern p = Pattern.compile("B=(.*?);");
Matcher m = p.matcher(s);
if (m.find())
{
    System.out.println(m.group(1));
}
Why do a 'replace' when a 'find' is so much more straightforward? Probably because your API makes it easier; that's why we do it in Java. Java has several regex-oriented convenience methods in its String class: replaceAll(), replaceFirst(), split(), and matches() (which returns true iff the regex matches the whole string), but not find(). And there's no convenience method for accessing capturing groups, either. We can't match the elegance of Perl one-liners like this:
print $1 if 'A=abc;B=def_3%^123+-;C=123;' =~ /B=(.*?);/;
...so we content ourselves with hacks like this:
System.out.println("A=abc;B=def_3%^123+-;C=123;"
.replaceFirst(".+B=(.*?);.+", "$1"));
Just to be clear, I'm not saying not to use these hacks, or that there's anything wrong with Evan's answer--there isn't. I just think we should understand why we use them, and what trade-offs we're making when we do.
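For comparison, here is how the greedy, lazy, and negated-class variants discussed in this thread behave on the asker's string. This is a Python sketch using re.search, so the surrounding .+ parts are dropped; the variable names are only for illustration.

```python
import re

s = "A=abc;B=def_3%^123+-;C=123;"

# Greedy: .+ runs to the end, then backs off only as far as the LAST ';'.
greedy = re.search(r"B=(.+);", s).group(1)
# Lazy: .+? grows one character at a time and stops at the FIRST ';'.
lazy = re.search(r"B=(.+?);", s).group(1)
# Negated class: [^;]+ cannot cross a ';' at all.
negated = re.search(r"B=([^;]+);", s).group(1)

print(greedy)   # def_3%^123+-;C=123
print(lazy)     # def_3%^123+-
print(negated)  # def_3%^123+-
```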
How can I create a regular expression that will grab delimited text from a string? For example, given a string like
text ###token1### text text ###token2### text text
I want a regex that will pull out ###token1###. Yes, I do want the delimiter as well. By adding another group, I can get both:
(###(.+?)###)
/###(.+?)###/
if you want the ###'s then you need
/(###.+?###)/
the ? means non greedy, if you didn't have the ?, then it would grab too much.
e.g. '###token1### text text ###token2###' would all get grabbed.
My initial answer had a * instead of a +. * means 0 or more. + means 1 or more. * was wrong because that would allow ###### as a valid thing to find.
For playing around with regular expressions, I highly recommend http://www.weitz.de/regex-coach/ for Windows. You can type in the string you want and your regular expression and see what it's actually doing.
Your selected text will be stored in \1 or $1 depending on where you are using your regular expression.
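The greedy versus non-greedy difference described above is easy to check with a short Python sketch (assuming Python's re module; the variable names are just for the example):

```python
import re

text = "text ###token1### text text ###token2### text text"

# Non-greedy: each match stops at the nearest closing ###.
lazy_matches = re.findall(r"###.+?###", text)
# Greedy: one match that runs from the first ### to the last ###.
greedy_matches = re.findall(r"###.+###", text)
# With a group, findall returns only the text between the delimiters.
inner_only = re.findall(r"###(.+?)###", text)

print(lazy_matches)    # ['###token1###', '###token2###']
print(greedy_matches)  # ['###token1### text text ###token2###']
print(inner_only)      # ['token1', 'token2']
```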
In Perl, you actually want something like this:
$text = 'text ###token1### text text ###token2### text text';
while($text =~ m/###(.+?)###/g) {
    print $1, "\n";
}
Which will give you each token in turn within the while loop. The (.+?) ensures that you get the shortest bit between the delimiters, preventing it from thinking the token is 'token1### text text ###token2'.
Or, if you just want to save them, not loop immediately:
@tokens = $text =~ m/###(.+?)###/g;
Assuming you want to match ###token2### as well...
/###.+###/
Use () and \x, where x is the group number. A naive example that assumes the text within the tokens is always delimited by #:
text (#+.+#+) text text (#+.+#+) text text
The stuff in the () can then be grabbed by using \1 and \2 in the replacement expression (\1 for the first set, \2 for the second), assuming you're doing a search/replace in an editor. For example, the replacement expression could be:
token1: \1, token2: \2
For the above example, that should produce:
token1: ###token1###, token2: ###token2###
If you're using a regexp library in a program, you'd presumably call a function to get at the contents of the first and second tokens, which you've indicated with the ()s around them.
When you are using delimiters like this, you basically grab the opening delimiter, then anything that does not match the closing delimiter, followed by the closing delimiter. One caution: in cases like the example above, [^#] alone would not work as a check that the end delimiter is absent, since a single # inside the token would cause the regex to fail (i.e. "###foo#bar###"). For the case above, the regex to parse it would be the following, assuming empty tokens are allowed (if not, change * to +):
###([^#]|#[^#]|##[^#])*###
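Here is a quick Python check of that pattern against tokens that contain one or two # characters. The sample text is invented, and the group is written as non-capturing (?:...) only so that re.findall returns whole matches.

```python
import re

# Inside the delimiters, allow: a non-#, a single # plus a non-#, or ## plus a non-#.
TOKEN = re.compile(r"###(?:[^#]|#[^#]|##[^#])*###")
# Naive version that forbids any # inside the token.
NAIVE = re.compile(r"###[^#]*###")

text = "a ###foo#bar### b ###x##y### c"

tokens = TOKEN.findall(text)
naive_tokens = NAIVE.findall(text)
print(tokens)        # ['###foo#bar###', '###x##y###']
print(naive_tokens)  # ['### b ###'], which pairs the wrong delimiters
```

The naive pattern not only misses both tokens, it also matches the text between them, because the closing ### of one token and the opening ### of the next look like a delimiter pair.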