Regular expression extraction in text editors

I'm kind of new to programming, so forgive me if this is terribly obvious (which would be welcome news).
I do a fair amount of PHP development in my free time using preg_match, and I write most of my expressions using the free (open source?) Regex Tester.
Frequently, however, I find myself wanting to quickly extract something, and the only way I know to do it is to write my expression and then script it, which is probably laughable, but welcome to my reality. :-)
What I'd like is something like a simple text editor that I can feed my expression to (given a file or a buffer full of pasted text) and have it parse the expression and return a document with only the results.
What I usually find are regex search/replace functions. In Notepad++, for example, I can easily find (and replace) all instances using an expression, but I simply don't know how to extract only the matches...
And this is probably terribly obvious, but can an expression match only the inverse? Then I could use something like this (just the expression I'm currently working on):
([^<]*)
And replace everything that doesn't match with nothing. But I'm sure this is something common and simple, so I'd really appreciate any pointers.
FWIW, I know grep and could do it using that, but I'm hoping there are better GUI-fied solutions I'm simply ignorant of.
Thanks.
Zach
What I was hoping for would be something that worked in a more standard set of GUI tools (i.e., the tools I might already be using). I appreciate all the responses, but using perl or vi or grep is what I was hoping to avoid; otherwise I would have just scripted it myself (of course I did), since they're all relatively powerful, low-level tools.
Maybe I wasn't clear enough. As a senior systems administrator the cli tools are familiar to me, I'm quite fond of them. Working at home however I find most of my time is spent in a gui, like Netbeans or Notepad++. I just figure there would be a simple way to achieve the regex based data extraction using those tools (since in these cases I'd already be using them).
Something vaguely like what I was referring to would be this, which will take an expression on the first line and a URL on the second line and then extract (return) the data.
It's ugly (I'll take it down after tonight since it's probably riddled with problems).
Anyway, thanks for your responses. I appreciate it.

If you want a text editor with good regex support, I highly recommend Vim. Vim's regex engine is quite powerful and is well-integrated into the editor. e.g.
:g!/regex/d
This says to delete every line in your buffer which doesn't match pattern regex.
:g/regex/s/another_regex/replacement/g
This says on every line that matches regex, do another search/replace to replace text matching another_regex with replacement.
If you want to use command-line grep, a Perl/Ruby/Python/PHP one-liner, or any other tool, you can filter the current buffer's text through that tool and update the buffer to reflect the results:
:%!grep regex
:%!perl -nle 'print if /regex/'
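For anyone without Vim at hand, the same "keep only the lines that match" idea can be sketched in a few lines of Python. This is only an illustration; the function name and sample text are made up for the example.

```python
import re

def keep_matching_lines(text, pattern):
    """Return only the lines of `text` that contain a match for `pattern`.

    Roughly what :g!/regex/d leaves behind in a Vim buffer.
    """
    regex = re.compile(pattern)
    return "\n".join(line for line in text.splitlines() if regex.search(line))

# Hypothetical sample buffer: only lines containing digits survive.
sample = "alpha 1\nbeta\ngamma 22\n"
filtered = keep_matching_lines(sample, r"\d+")
print(filtered)
```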

Have you tried nregex.com ?
http://www.nregex.com/nregex/default.aspx
There's a plugin for Netbeans here, but development looks stalled:
http://wiki.netbeans.org/Regex
http://wiki.netbeans.org/RegularExpressionsModuleProposal
You might also try The Regulator:
http://sourceforge.net/projects/regulator/

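Most regex tools will allow you to match the opposite of the regex.
Usually this is an option of the tool rather than of the pattern itself, such as the ! in Vim's :g!/regex/ or grep's -v flag; inside the pattern, engines that support it offer negative lookahead, (?!...).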

I know grep has been mentioned, and you don't want a cli tool, but I think ack deserves to be mentioned.
ack is a tool like grep, aimed at
programmers with large trees of
heterogeneous source code.
ack is written purely in Perl, and
takes advantage of the power of Perl's
regular expressions.

A good text editor can be used to perform the actions you are describing. I use EditPadPro for search and replace, and it has some other nice features, including code coloring for most major formats. The search panel includes a regular expression mode: you input a regex and search for the first instance, which shows whether your expression matches the right information, and then you have the option to replace either one match at a time or all instances at once.
http://www.editpadpro.com

My suggestion is grep, and cygwin if you're stuck on a Windows box.
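echo "text" | grep -oE '([^<]*)'
OR
cat filename | grep -oE '([^<]*)'
(Quote the pattern so the shell doesn't interpret the parentheses; with GNU or BSD grep, -o prints only the matched text rather than the whole line.)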

What I'd like is something like a
simple text editor that I can feed my
expression to (given a file or a
buffer full of pasted text) and have
it parse the expression and return a
document with only the results.
You have just described grep. This is exactly what grep does. What's wrong with it?
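If a script is acceptable after all, "return a document with only the results" comes down to listing every match. Here is a minimal Python sketch; the function name, the sample HTML, and the pattern are assumptions made for illustration, not the asker's actual data.

```python
import re

def extract_matches(text, pattern):
    """Return every match of `pattern` in `text` as a list.

    With exactly one capture group in the pattern, re.findall returns
    the text of that group for each match instead of the whole match.
    """
    return re.findall(pattern, text)

# Hypothetical input: pull out the link texts only.
html = '<a href="x">First</a> <a href="y">Second</a>'
results = extract_matches(html, r">([^<]+)</a>")
print("\n".join(results))
```

Printing one result per line gives roughly the "document with only the results" described in the question.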

Related

How to match and replace n number of times with RegEx

I'm using TextWrangler, the free version of BBEdit on the Mac, which I understand uses the PCRE engine.
What I want to do is match a specific number of lines and replace as well.
After a lot of searching I came up with this:
(^(.*\r)){25}
This lets me match up to 25 lines. It works great, but the problem comes when I want to actually replace something. I can't figure out how to do it.
For example, I would like to replace all of the returns "\r" with tabs "\t".
Hopefully this is actually possible. I'd appreciate any help. Thanks!

Regular expressions belong to the domain of searching. A regex by itself cannot replace anything; a programming language or editor uses the regex as the search part of its search-and-replace function. Thus, how to do 25 replacements is purely a matter of that programming language or editor. If it does not provide such a capability, either directly in search-and-replace or through a macro, loop, or similar, then you cannot do it automatically.
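In a scripting language, for instance, the limit is typically an argument of the replace call rather than part of the pattern. A minimal Python sketch, assuming the goal is to turn the first n carriage returns into tabs (the function name and sample text are invented for the example):

```python
import re

def replace_first_n(text, n):
    """Replace only the first `n` carriage returns in `text` with tabs."""
    # The `count` argument of re.sub caps the number of replacements.
    return re.sub(r"\r", "\t", text, count=n)

# Hypothetical sample with classic Mac line endings; limit of 2 instead of 25.
sample = "line1\rline2\rline3\rline4\r"
result = replace_first_n(sample, 2)
print(repr(result))
```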

Is there a function to create a regex pattern from a string input?

I'm lousy at regular expressions but occasionally they're the only thing that's the right solution for a problem.
Is there something in the .NET framework that allows you to input an unencoded string and get a pattern from it? Which you could then modify as required?
e.g. I want to remove a CDATA section that contains a file from some XML but I can't work out what the right pattern is for <![CDATA[hugepileofrandombinarydataherethatalsoneedstogo]]> and I don't want to ask for help each time I'm stuck on a regex pattern.

Such tools exist; search for "regex generator".
But, as suggested in the comments, it's better to learn regex. Simple patterns are easy. In your case, something like <!\[.*?]]> would do.
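On the "get a pattern from an unencoded string" part: most regex libraries can at least escape a literal string so it is safe to embed in a pattern (Python has re.escape, and .NET has a comparable Regex.Escape method). A rough Python sketch of building the CDATA pattern that way, with a made-up XML snippet:

```python
import re

# Escape the literal delimiters, then allow anything (lazily) in between.
# re.DOTALL lets "." match newlines, since the payload may span lines.
cdata_pattern = re.compile(
    re.escape("<![CDATA[") + r".*?" + re.escape("]]>"),
    re.DOTALL,
)

# Hypothetical XML fragment, not the asker's real document.
xml = "<file><![CDATA[huge pile of\nrandom data]]></file>"
stripped = cdata_pattern.sub("", xml)
print(stripped)
```

The escaped pieces can then be edited by hand if the pattern needs to be loosened further.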

There are Regex Design tools like expresso...
http://www.ultrapico.com/expresso.htm

It's not perfect but as there is no suitable .Net component the text to regex page at txt2re.com is the best I've seen for those people who occasionally need to build a regex to match a string but don't have the time to relearn regex each time they want to use one.

Regex - match a string not contain a 'semi-word'

I tried to make regex syntax for that but I failed.
I have 2 variables
PlayerInfo[playerid][pLevel]
and
Character[playerid]
and I want to catch only the second variable. I mean only the word that doesn't contain PlayerInfo but contains [playerid].
"(\S+)\[playerid\]" catches both words, and (\S+[^PlayerInfo])\[playerid\] skips some variables: those that contain p, l, a, y ...
I need to replace, in Notepad++, all variables like Text[playerid] with ExClass [playerid][Text]

A couple of plausible solutions:
Notepad++ has a plugin called Python Script. Running regex from there
gives full regex functionality, the Python version anyway, and a lot
of powerful potential beyond that. And I use the online Python regex tester to help out.
The RegRexReplace plugin helps create regex plugins in Notepad++, so when you do hit a limitation, you find out a lot quicker.
Or of course default to your alternate editor (I'm assuming you have
one?), or this online regex tool, which is absolutely amazing. You
can perform the action on the text online as well.
(I'd try to build a regex for you, but I'm a bit lost as to what you're looking for, unless Ivo Abeloos got it. If you're still coming up short, maybe post a code example along with the values displayed?)
Good luck!

It seems that Notepad++ has supported negative lookbehind since v6.

In notepad++ you could try to replace (.+)\[(.+)\] with ExClass\[\2\]\[\1\]

Try to use negative lookbehind.
(?<!PlayerInfo)\[playerid\]
EDIT: unfortunately notepad++ does not support negative lookbehind.
I tried to make a workaround based on the following naive idea:
(.[^o]|[^f]o)[playerid]
But this expression does not work either. Notepad++ seems to fail on the alternation operator. Thus the answer is: it is impossible to do exactly what you want. Try to solve the problem another way or use a different tool.
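As one such different tool, an engine with lookbehind support can do this in a single pass. Below is a sketch in Python (for example via a script or the Python Script plugin); the sample line is invented, and the ExClass[playerid][Name] output form without a space is an assumption.

```python
import re

# Capture a whole identifier followed by [playerid], but reject it when
# the identifier ends in "PlayerInfo" (fixed-width negative lookbehind).
pattern = re.compile(r"\b(\w+)(?<!PlayerInfo)\[playerid\]")

# Hypothetical source line containing both kinds of variable.
source = "PlayerInfo[playerid][pLevel] = Character[playerid];"
converted = pattern.sub(r"ExClass[playerid][\1]", source)
print(converted)
```

The \b anchor makes the capture start at the beginning of the identifier, so the engine cannot match just a suffix of PlayerInfo.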

Do calculation on captured number in regex before using it in replacement

Using a regex, I am able to find a bunch of numbers that I want to replace. However, I want to replace the number with another number that is calculated using the original - captured - number.
Is that possible in notepad++ using a kind of expression in the replacement-part?
Edit: Maybe a strange thought, but could the calculation be done in the search part, generating a second captured number that would effectively be the result?

Even if it is possible, it will almost certainly be "messy" - why not do the replacements with a simple script instead? For example..
#!/usr/bin/env ruby
f = File.new("f1.txt", File::RDWR)
contents = f.read()
contents.gsub!(/\d+/){|m|
  m.to_i + 1 # convert the current match to an integer, and add one
}
f.truncate(0) # empty the existing file
f.seek(0) # seek to the start of the file, before writing again
f.write(contents) # write modified file
f.close()
..and the output:
$ cat f1.txt
This was one: 1
This two two: 2
$ ruby replacer.rb
$ cat f1.txt
This was one: 2
This two two: 3
In reply to jeroen's comment,
I was actually interested if the possibility existed in the regular expression itself as they are so widespread
A regular expression is really just a simple pattern matching syntax. Doing anything more advanced than search/replace with the matches would be up to the text editor, but the usefulness of this is very limited, and it can be achieved via the scripting that most editors allow (Notepad++ has a plugin system, although I've no idea how easy it is to use).
Basically, if regex search-and-replace will not achieve what you want, I would say either use your editor's scripting ability or use an external script.
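For comparison, the same idea as an external script in Python: the replacement argument can be a function that receives each match and returns the computed text. This is a sketch only; the function name and the "add a fixed amount" calculation are just examples.

```python
import re

def increment_numbers(text, delta=1):
    """Replace every run of digits in `text` with that number plus `delta`."""
    # re.sub calls the lambda once per match; it must return a string.
    return re.sub(r"\d+", lambda m: str(int(m.group()) + delta), text)

# Same sample content as the f1.txt file shown above.
before = "This was one: 1\nThis two two: 2\n"
after = increment_numbers(before)
print(after, end="")
```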

Is that possible in notepad++ using a kind of expression in the replacement-part?
Interpolated evaluation of regular-expression matches is a relatively advanced feature that I probably would not expect to find in a general-purpose text editing application. I played around with Notepad++ a bit but was unable to get this to work, nor could I find anything in the documentation that suggests this is possible.

Hmmm... I'd have to recommend AWK to do this.
http://en.wikipedia.org/wiki/AWK

Notepad++ has limited regular expressions built in. There are extensions that add a bit more to the regular expression find and replace, but I've found those hard to use. I would recommend writing a little external program to do it for you. Ruby, Perl, or Python would each be great for it, if you know those languages. I use Ruby and have had lots of success with it.

Regex Search and Replace Program

Is there a simple and lightweight program to search over a text file and replace a string with regex?

For searching: grep - simple and fast. Included with Linux, here's a Windows version, not sure about Mac.
For replacing: sed. Here's a Windows version, not sure about Mac.
Of course, if you want to actually open up a file and see its contents while you search and replace, you can use emacs for that. Or ConTEXT. Or vim. Or what have you. ;)
See also this question.

Perl excels at this, with its -i, -n, -p and -e switches. See the slides from my talk Field Guide To The Perl Command Line Switches for examples.
Others have mentioned sed and awk, and it's no surprise that Perl was inspired by them. However, Perl may well be easier to get and install for you and/or your users.

There's also sed, which is a useful tool to learn the basics of - great for doing quick regex based substitutions.
Quick example, to change "foo" to "bar" in input.txt ...
sed -e 's/foo/bar/g' input.txt > output.txt
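If sed isn't available, roughly the same substitution can be scripted. A minimal Python sketch, assuming UTF-8 text files small enough to read into memory (the function name is made up for the example):

```python
import re
from pathlib import Path

def substitute_file(src, dst, pattern, replacement):
    """Read `src`, replace every match of `pattern`, and write to `dst`."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(re.sub(pattern, replacement, text), encoding="utf-8")
```

Calling substitute_file("input.txt", "output.txt", r"foo", "bar") would mirror the sed command above; note that Python's regex syntax differs from sed's in some details.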

Many decent text editors have the option as well, vim, emacs, EditPlus and so on.

sed or awk. I recommend the book sed&awk to master the subject or the booklet sed&awk pocket reference for a quick reference. Of course mastering regular expressions is a must...

You didn't mention what platform you're using... If you are interested in a relatively simple GUI tool, there's regexxer. Otherwise, the commandline tools such as sed that were mentioned earlier can be very useful.

It depends if you're dealing with one or many files. At the risk of being pilloried, I'm assuming you're using Windows because you didn't specify a platform.
For one file at a time, Notepad2 does the trick and is extremely fast, lightweight and portable.
For search/replace over multiple files at once, try Agent Ransack.

Try WildGem: http://www.skytopia.com/software/wildgem
I'm the creator. Small, super-fast, portable and self-contained. You can use Regex, but it also has its own simple language syntax to make queries much easier in theory.
I quote:
Unlike similar programs, WildGem is fast with a dual split display, and updates or highlights matches as you type in realtime. A unique colour coded syntax allows you to easily find/replace text without worrying about having to escape special symbols.

Not knowing the platform, I'd say the ad that popped up on this page might be appropriate: PowerGREP. I don't know anything about it, but it sounds similar to what you're looking for.

Use emacs or xemacs. It has a perfect regexp replacement function. You can even use constructions like \1 (or \2 or \3) in your replacement to get back a matched expression that was marked with ( ) around it. To prevent a vi-emacs clash: vi has similar constructions. I'm not sure of any modern editors that support this functionality.
Tip: Try out a simple replacement first; it can be a bit unclear, as you might end up having to add '\' to escape the special regexp constructions...
