I'm looking for an application/text editor that [closed] - regex

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
can best help me systematically modify the "replace" field of a regex search as it encounters each match.
For example, I have an xml file that needs the phrase "id = $number" inserted at regular points in the text, and basically, $number++ each time the regex matches (id = 1, id = 2, etc) until the end of the file.
I know I could just write a bash/perl/python script or some such, but I'd like it to be at least moderately user-friendly so I could teach my intelligent (but less technically-inclined) workers how to use it and make their own modifications. Regexing is not a problem for them.
The closest I've come so far is Notepad++'s Column Editor and 'increase [number] by' function, but with this I have to write a separate regex to align everything, add the increments, and then write another to put it back. Unfortunately, I need to use this function on too many different types of files and 'replace's to make macros feasible.
Ideally, the program would also be available for both Windows & Linux (WINE is acceptable but native is much preferred), and have a 'VI/VIM input' option (if it's a text editor), but these are of secondary importance.
Of course, it'd be nice if there is an OSS solution, and I'd be glad to donate $20-$50 to the developer(s) if it provides the solution I'm looking for.
Apologies for the length, and thanks so much for your help!

Emacs (version 22 and later) can do what you're looking for. See Steve Yegge's blog for a really interesting read about it. I think this should work:
M-x replace-regexp
Replace regexp: insert pattern regexp here
Replace regexp with: id = \#
\# is a special metacharacter that gets replaced by the total number of replacements that have occurred so far, starting from 0. If you want the list to start from 1 instead of 0, use the following replacement string:
id = \,(1+ \#)

JEdit can probably help you:
http://www.jedit.org/
You can do all kinds of regex replacing with it, and even have the replacement computed from the result of a BeanShell snippet.

UltraEdit32 is great and I believe it has the features you need. There is a free 30-day download so you can make sure. :)

I know you want an app available on Windows/Linux, but there's another solution on Mac : TextWrangler, and it's free.

Take a look at UltraEdit32. It's very good. Not free, but available in Windows, Linux and Mac platforms. It has regex based search & replace.

This script should let you do what you want in Vim.

Vim functions can do the incrementing number trick and aren't too hard to write. For example the Vim wiki says how to do this. See also :h sub-replace-\=.
" Return the current value of the global counter g:c, then increment it.
function! Counter()
  let i = g:c
  let g:c = g:c + 1
  return i
endfunction
:let c=1|%s/<\w\+\zs/\=' id="' . Counter() . '"'/g
We've probably left user-friendliness long behind at this point but Vim's Ruby support can do this kind of thing easily too:
:ruby c=0
:rubydo $_.gsub!(/<\w+/){|m| c += 1; m + ' id="' + c.to_s + '"'}
Or Perl:
:perl $c=1
:perldo s/<\w+/$& . ' id="' . $c++ . '"'/eg
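For comparison, the same counter idea can be sketched outside an editor. This is only a minimal Python sketch, not something taken from the answers above: the function name `number_tags` and the sample XML are made up for illustration. It passes a callable to `re.sub` so that each match pulls the next value from a counter.

```python
import itertools
import re

def number_tags(text, start=1):
    """Append id="N" after every opening tag name, with N counting up from `start`."""
    counter = itertools.count(start)
    # <\w+ matches "<" plus a tag name; closing tags ("</...") are skipped
    # because "/" is not a word character.
    return re.sub(r"<\w+", lambda m: f'{m.group(0)} id="{next(counter)}"', text)

sample = "<root><item>a</item><item>b</item></root>"
print(number_tags(sample))
```

Swapping the pattern or the format string is all it takes to adapt it to another file type, which is roughly what the Vim one-liners above do as well.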

To me, this sounds like it might be a job for awk, rather than a job for an editor.


Regular expressions in ignored_words in Sublime Text 3 spell_check?

I'm trying to spellcheck a LaTeX file. I would like the spellchecker to ignore strings containing a number. In my settings file I have
"ignored_words":
[
"textbf",
"renewenvironment",
etc...
]
If I add something like ".*[0-9].*" to "ignored_words" it doesn't seem to do anything. Is there a way to accomplish this?
It is not possible to use regex in spell checking at this point.
ST uses Hunspell as its spell checker. Adding regex to Hunspell is an open feature request. Not being closed means there is some hope that it may be on a long term enhancement list, maybe.
Until Hunspell adds this capability it seems impossible to achieve what you are seeking in ST.
Keeping an eye on the feature request may be worth it to see if there is any progress.

Regex: finding a number between a range with decimals [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 8 years ago.
I cannot for the life of me get my head around this regex stuff. After a few days of fiddling around, I find myself seeking help from those wiser than I. Could any of you kind souls write me a line (or lines) that will find and match a number between 0.00 and x.xx? I do need the decimals, however, so hopefully this can be done.
I actually tried using
(\b|^)(0.00|0.01|0.02)(\b|$)
until x.xx and so forth but I couldn't fit the rest of it in because I need it to go into the 100.00+. Would anyone mind whipping something up real quick for me? : ) I would appreciate it more than you can imagine! Thanks very much for your time.
Ray.
Edit:
So I forgot to explain what I'm trying to achieve here. I'm using it in conjunction with a Chrome add-on called Page Monitor (life saver, folks, try it out when you have time to kill!), which pings every time a website updates. This also works for shares, but I'm trying to make it alert me only when the price drops below a certain point, e.g. $4.99 per share. Will (\b|^)([0-9]+\.[0-9]{2})(\b|$) and ([0-9]+.[0-9]+) suffice?
Why isn't this good enough: ([0-9]+\.[0-9]+) ?
If you can give an example of input and what is the output you expect, it would be easier to write a regex.
Updated: $ sign is a reserved character in RegEx, it means end-of-line, so you need to use \$, if you plan on using it.
So your regex would be \$([0-9]+\.[0-9]+), this would capture your $4.99 and $5.10, etc, not just $4.99
Regexs in general are good at capturing data, less at analyzing it, but if you must, you can do this to determine when the price goes below $4.99 =>
\$(([0-3]\.[0-9]+)|(4\.[0-8][0-9])|(4\.9[0-8]))
It should be obvious that it's a waste of resources :)
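Where you can run code instead of a bare regex (which may not be the case inside Page Monitor), the usual split is to let the regex capture the price and do the comparison numerically. A minimal Python sketch, assuming prices always look like `$` followed by digits, a dot, and two decimals; the name `prices_below` is made up for illustration:

```python
import re
from decimal import Decimal

# Capture the numeric part of amounts such as $4.99 or $12.00.
PRICE = re.compile(r"\$([0-9]+\.[0-9]{2})")

def prices_below(text, limit):
    """Return every dollar amount in `text` that is strictly below `limit`."""
    limit = Decimal(limit)
    return [Decimal(p) for p in PRICE.findall(text) if Decimal(p) < limit]

print(prices_below("was $5.10, now $4.98 (fees $12.00)", "4.99"))
```

Changing the threshold is then a one-argument change rather than a rewrite of the pattern.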
You didn't provide enough info, but this will match if the number is the entire value or if it is within a larger string, as long as the number is not embedded in something else like "foo8.9bar". It matches one or more digits on the left side of the decimal point and exactly 2 digits on the right side:
(\b|^)([0-9]+\.[0-9]{2})(\b|$)
(\b|^) and (\b|$) are redundant here: the match starts and ends with a digit, so \b already matches at the start and end of the string.
this regex: (\d+\.\d{2}) should do it.

how to do vi search and replace within a range in sublime text

I enabled Vintage mode in Sublime Text, but some important Vim commands are missing. Say I want to do a search and replace like so:
:10,25s/searchedText/toReplaceText/gc
That is, I want to search for searchedText and replace it with toReplaceText on lines 10 to 25, and be prompted every time (i.e. yes/no).
How do I do this with Sublime Text? Every time I hit : it gives me this funny menu. Is there any way around that?
If you so much would like to see vim in action, try the other way around; ie enable sublime stuff in vim.
Here are 2 links that might come in handy:
subvim and vim multiple cursors (Which is one amazing feature in sublime that lacks in native vim).
Hope that gets you creative ;)
Unfortunately vintage mode does not understand ranges. The best way I know how to do this is with incremental search:
highlight the first occurrence of searchedText on line 10
hit cmnd/ctrl D to have Sublime find the next occurrence
If you want the next occurrence ignored, hit cmnd/ctrl K followed by cmnd/ctrl D
Once you have highlighted all the occurrences, you can replace them all at once, as Sublime has left cursors behind on every occurrence you opted in on.
VintageEx gives you a Vim-like command-line where you can at least perform substitutions. Well, that's how far I went when trying it. I don't know how extended the subset of Vim commands it implements is but I'd guess that it's not as large as the original and, like with Vintage, probably different and unsettling enough to keep a relatively experienced Vimmer out.
Anyway, I just tried it again and indeed you can more or less do the kind of substitution you are looking for, which instantly makes ST a lot more useful:
:3,5s/foo/bar/g
:.,5s/bar/foo/g
:,5s/foo/bar/g
:,+5s/bar/foo/g
Unfortunately, it doesn't support the /c flag.
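For what it's worth, the effect of a ranged substitution like `:10,25s/…/…/g` is easy to describe in code. This is just a sketch in Python of what the range does, not a Sublime Text plugin; the function name `substitute_in_range` is made up, and it leaves out the interactive `c` confirmation.

```python
import re

def substitute_in_range(text, first, last, pattern, replacement):
    """Apply re.sub only to lines first..last (1-based, inclusive)."""
    lines = text.split("\n")
    # Clamp the upper bound so a range past the end of the text is harmless.
    for i in range(first - 1, min(last, len(lines))):
        lines[i] = re.sub(pattern, replacement, lines[i])
    return "\n".join(lines)

doc = "foo\nfoo\nfoo\nfoo"
print(substitute_in_range(doc, 2, 3, r"foo", "bar"))
```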
A plugin named Vintageous offers more features, including a search function. It's available through Package Control.
Although this question is already answered, I figured this would add some value:
The full functionality of vi search/replace is available in the RubyMine IDE once you install the IdeaVim plugin. The IDE is perfect for Ruby on Rails, by the way.

Regular expression to search for Gadaffi [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
I'm trying to search for the word Gadaffi, which can be spelled in many different ways. What's the best regular expression to search for this?
This is a list of 30 variants:
Gadaffi
Gadafi
Gadafy
Gaddafi
Gaddafy
Gaddhafi
Gadhafi
Gathafi
Ghadaffi
Ghadafi
Ghaddafi
Ghaddafy
Gheddafi
Kadaffi
Kadafi
Kaddafi
Kadhafi
Kazzafi
Khadaffy
Khadafy
Khaddafi
Qadafi
Qaddafi
Qadhafi
Qadhdhafi
Qadthafi
Qathafi
Quathafi
Qudhafi
Kad'afi
My best attempt so far is:
\b[KG]h?add?af?fi$\b
But I still seem to be missing some variants. Any suggestions?
Easy... (Qadaffi|Khadafy|Qadafi|...)... it's self-documented, maintainable, and assuming your regexp engine actually compiles regular expressions (rather than interpreting them), it will compile to the same DFA that a more obfuscated solution would.
Writing compact regular expressions is like using short variable names to speed up a program. It only helps if your compiler is brain-dead.
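As a rough illustration of the "just list them" approach, here is a minimal Python sketch. It uses only a handful of the spellings to keep it short; the names `SPELLINGS` and `pattern` are made up for the example.

```python
import re

# A few of the spellings from the question; extend the list as needed.
SPELLINGS = ["Gaddafi", "Kadafi", "Khadafy", "Qadhafi", "Kad'afi"]

# re.escape keeps any regex metacharacters in the names literal.
alternation = "|".join(re.escape(s) for s in SPELLINGS)
pattern = re.compile(r"\b(?:" + alternation + r")\b")

print(pattern.findall("Gaddafi, Kad'afi and Godfrey walk into a bar"))
```

Nothing outside the list can match, which is the whole point of that answer.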
\b[KGQ]h?add?h?af?fi\b
Arabic transcription is (Wiki says) "Qaḏḏāfī", so maybe adding a Q. And one H ("Gadhafi", as the article (see below) mentions).
Btw, why is there a $ at the end of the regex?
Btw, nice article on the topic:
Gaddafi, Kadafi, or Qaddafi? Why is the Libyan leader’s name spelled so many different ways?.
EDIT
To match all the names in the article you've mentioned later, this should match them all. Let's just hope it won't match a lot of other stuff :D
\b(Kh?|Gh?|Qu?)[aeu](d['dt]?|t|zz|dhd)h?aff?[iy]\b
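A quick way to check a compact pattern like this is to run it over the list from the question and then try a few spellings that are not on the list. A minimal Python sketch; the three extra words are my own picks for illustration:

```python
import re

VARIANTS = """Gadaffi Gadafi Gadafy Gaddafi Gaddafy Gaddhafi Gadhafi Gathafi
Ghadaffi Ghadafi Ghaddafi Ghaddafy Gheddafi Kadaffi Kadafi Kaddafi Kadhafi
Kazzafi Khadaffy Khadafy Khaddafi Qadafi Qaddafi Qadhafi Qadhdhafi Qadthafi
Qathafi Quathafi Qudhafi Kad'afi""".split()

compact = re.compile(r"\b(Kh?|Gh?|Qu?)[aeu](d['dt]?|t|zz|dhd)h?aff?[iy]\b")

# Spellings from the question that the pattern fails to match.
missed = [v for v in VARIANTS if not compact.fullmatch(v)]
print(len(VARIANTS), "variants, missed:", missed)

# Words that are not on the list but match anyway.
extras = [w for w in ("Gazzafy", "Kutaffy", "Godfrey") if compact.fullmatch(w)]
print("also matches:", extras)
```

The second list is the price of compactness: anything the pattern accepts beyond the 30 known spellings is a potential false positive.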
One interesting thing to note from your list of potential spellings is that there's only 3 Soundex values for the contained list (if you ignore the outlier 'Kazzafi')
G310, K310, Q310
Now, there are false positives in there ('Godby' also is G310), but by combining the limited metaphone hits as well, you can eliminate them.
<?php
// Soundex and Metaphone codes that the known spellings produce.
$soundexMatch = array('G310','K310','Q310');
$metaphoneMatch = array('KTF','KTHF','FTF','KHTF','K0F');
$text = "This is a big glob of text about Mr. Gaddafi. Even using compound-Khadafy terms in here, then we might find Mr Qudhafi to be matched fairly well. For example even with apostrophes sprinkled randomly like in Kad'afi, you won't find false positives matched like godfrey, or godby, or even kabbadi";
$wordArray = preg_split('/[\s,.;-]+/',$text);
$matches = array();
foreach ($wordArray as $item){
    // Keep a word only when both its Soundex and its Metaphone code are on the lists.
    $rate = in_array(soundex($item),$soundexMatch) + in_array(metaphone($item),$metaphoneMatch);
    if ($rate > 1){
        $matches[] = preg_quote($item, '/');
    }
}
// Wrap every accepted word in <b> tags.
$pattern = implode("|",$matches);
$text = preg_replace("/($pattern)/","<b>$1</b>",$text);
echo $text;
?>
A few tweaks, and let's say some Cyrillic transliteration, and you'll have a fairly robust solution.
Using CPAN module Regexp::Assemble:
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';   # say is not enabled by default
use Regexp::Assemble;
my $ra = Regexp::Assemble->new;
$ra->add($_) for qw(Gadaffi Gadafi Gadafy Gaddafi Gaddafy
Gaddhafi Gadhafi Gathafi Ghadaffi Ghadafi
Ghaddafi Ghaddafy Gheddafi Kadaffi Kadafi
Kaddafi Kadhafi Kazzafi Khadaffy Khadafy
Khaddafi Qadafi Qaddafi Qadhafi Qadhdhafi
Qadthafi Qathafi Quathafi Qudhafi Kad'afi);
say $ra->re;
This produces the following regular expression:
(?-xism:(?:G(?:a(?:d(?:d(?:af[iy]|hafi)|af(?:f?i|y)|hafi)|thafi)|h(?:ad(?:daf[iy]|af?fi)|eddafi))|K(?:a(?:d(?:['dh]a|af?)|zza)fi|had(?:af?fy|dafi))|Q(?:a(?:d(?:(?:(?:hd)?|t)h|d)?|th)|u(?:at|d)h)afi))
I think you're over complicating things here. The correct regex is as simple as:
\u0627\u0644\u0642\u0630\u0627\u0641\u064a
It matches the concatenation of the seven Arabic Unicode code points that forms the word القذافي (i.e. Gadaffi).
If you want to avoid matching things that no-one has used (ie avoid tending towards ".+") your best approach would be to create a regular expression that's just all the alternatives (eg. (Qadafi|Kadafi|...)) then compile that to a DFA, and then convert the DFA back into a regular expression. Assuming a moderately sensible implementation that would give you a "compressed" regular expression that's guaranteed not to contain unexpected variants.
If you've got a concrete listing of all 30 possibilities, just concatenate them all together with a bunch of "ors". Then you can be sure that it only matches the exact things you've listed, and no more. Your RE engine will probably be able to optimize it further, and, well, with 30 choices even if it doesn't it's still not a big deal. Trying to fiddle around with manually turning it into a "clever" RE can't possibly turn out better and may turn out worse.
(G|Gh|K|Kh|Q|Qh|Q|Qu)(a|au|e|u)(dh|zz|th|d|dd)(dh|th|a|ha|)(\x27|)(a|)(ff|f)(i|y)
Certainly not the most optimized version, split on syllables to maximize matches while trying to make sure we don't get false positives.
Well since you are matching small words why don't you try a similarity search engine with the Levenshtein distance? You can allow at most k insertions or deletions. This way you can change the distance function to other things that work better for your specific problem. There are many functions available in the simMetrics library.
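To make the idea concrete, here is a minimal sketch of an edit-distance check in plain Python. It does not use the simMetrics library mentioned above; the reference spelling "gaddafi" and the threshold k=3 are arbitrary assumptions you would want to tune against real data.

```python
def levenshtein(a, b):
    """Edit distance where insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        previous = current
    return previous[-1]

def close_to(word, target="gaddafi", k=3):
    """True when `word` is within k edits of the reference spelling."""
    return levenshtein(word.lower(), target) <= k

for w in ("Gadafi", "Qadhafi", "Kazzafi", "Godfrey"):
    print(w, levenshtein(w.lower(), "gaddafi"), close_to(w))
```

A larger k catches more variants and more unrelated words, so the threshold plays the same role as the looseness of a hand-written regex.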
A possible alternative is the online tool for generating regular expressions from examples: http://regex.inginf.units.it.
Give it a chance!
Why not do a mixed approach? Something between a list of all possibilities and a complicated Regex that matches far too much.
Regex is about pattern matching, and I can't see a pattern for all variants in the list. Trying to find one will also match things like "Gazzafy" or "Quud'haffi", which are most probably not used variants and definitely not on the list.
But I can see patterns for some of the variants, and so I ended up with this:
\b(?:Gheddafi|Gathafi|Kazzafi|Kad'afi|Qadhdhafi|Qadthafi|Qudhafi|Qu?athafi|[KG]h?add?h?aff?[iy]|Qad[dh]?afi)\b
At the beginning I list the ones where I can't see a pattern, then followed by some variants where there are patterns.
See it here on www.rubular.com
I know this is an old question, but...
Neither of these two regexes is the prettiest, but they are optimized and both match ALL the variations in the original post.
"Little Beauty" #1
(?:G(?:a(?:d(?:d(?:af[iy]|hafi)|af(?:f?i|y)|hafi)|thafi)|h(?:ad(?:daf[iy]|af?fi)|eddafi))|K(?:a(?:d(?:['dh]a|af?)|zza)fi|had(?:af?fy|dafi))|Q(?:a(?:d(?:(?:(?:hd)?|t)h|d)?|th)|u(?:at|d)h)afi)
"Little Beauty" #2
(?:(?:Gh|[GK])adaff|(?:(?:Gh|[GKQ])ad|(?:Ghe|(?:[GK]h|[GKQ])a)dd|(?:Gadd|(?:[GKQ]a|Q(?:adh|u))d|(?:Qad|(?:Qu|[GQ])a)t)h|Ka(?:zz|d'))af)i|(?:Khadaff|(?:(?:Kh|G)ad|Gh?add)af)y
Rest in Peace, Muammar.
Just an addendum: you should add "Gheddafi" as alternate spelling. So the RE should be
\b[KG]h?[ae]dd?af?fi$\b
[GQK][ahu]+[dtez]+\'?[adhz]+f{1,2}(i|y)
In parts:
[GQK]
[ahu]+
[dtez]+
\'?
[adhz]+
f{1,2}(i|y)
Note: Just wanted to give a shot at this.
What else starts with Q, G, or K, has a d, z or t in the middle, and ends in "fi" that people actually search for?
/\b[GQK].+[dzt].+fi\b/i
Done.
>>> import re
>>> a = re.compile(r"\b[GQK].+[dzt].+fi\b", re.I)
>>> print(re.search(a, "Gadasadasfiasdas") is not None)
False
>>> print(re.search(a, "Gadasadasfi") is not None)
True
>>> print(re.search(a, "Qa'dafi") is not None)
True
Interesting that I'm getting downvoted. Can someone leave some false positives in the comments?

I'm going to be teaching a few developers regular expressions - what are some good homework problems? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
I'm thinking of presenting questions in the form of "here is your input: [foo], here are the capture groups/results: [bar]" (and maybe writing a small script to test their answers for my results).
What are some good regex questions to ask? I need everything from beginner questions like "validate a 4 digit number" to "extract postal codes from addresses".
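The "small script to test their answers" could look something like the following. This is only a sketch of one possible harness in Python; the function name `check` and the sample exercise are made up for illustration.

```python
import re

def check(pattern, cases):
    """Return (input, expected, actual) for every case the pattern gets wrong.

    `cases` maps an input string to the tuple of capture groups expected from
    re.search, or to None when the pattern must not match at all.
    """
    failures = []
    for text, expected in cases.items():
        m = re.search(pattern, text)
        actual = m.groups() if m else None
        if actual != expected:
            failures.append((text, expected, actual))
    return failures

# Hypothetical exercise: "capture a standalone 4-digit number".
cases = {"pin 1234 ok": ("1234",), "12345": None, "abc": None}
print(check(r"\b(\d{4})\b", cases))  # an answer that passes
print(check(r"(\d{4})", cases))      # a common answer that does not
```

Including inputs that must not match is what separates the beginner answers from the careful ones.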
A few that I can think of off the top of my head:
Phone numbers in any format e.g. 555-5555, 555 55 55 55, (555) 555-555 etc.
Remove all html tags from text.
Match social security number (Finnish one is easy;)
All IP addresses
IP addresses with shorthand netmask (xx.xx.xx.xx/yy)
There's a bunch of examples of various regular expression techniques over at www.regular-expressions.info, covering everything from simple literal matching to backreferences and lookahead.
To keep things a bit more interesting than the usual email/phone/url stuff, try looking for more original exercises. Avoid boredom.
For example, have a look at the Forsyth-Edwards Notation, which is used for describing a particular board position of a chess game.
Have your students validate and extract all the bits of information from a string like this:
rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2
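One possible reference answer for the instructor, as a hedged Python sketch: it checks only the overall shape of the six fields and pulls them out with named groups. It does not verify that each rank adds up to eight squares or that the position is legal; the group names are my own choice.

```python
import re

# One rank: piece letters and empty-square counts, at most eight characters.
RANK = r"[pnbrqkPNBRQK1-8]{1,8}"

FEN = re.compile(
    rf"^(?P<board>{RANK}(?:/{RANK}){{7}})"  # eight ranks separated by "/"
    r" (?P<side>[wb])"                      # side to move
    r" (?P<castling>-|[KQkq]{1,4})"         # castling availability
    r" (?P<en_passant>-|[a-h][36])"         # en passant target square
    r" (?P<halfmove>\d+)"                   # halfmove clock
    r" (?P<fullmove>\d+)$"                  # fullmove number
)

m = FEN.match("rnbqkbnr/pp1ppppp/8/2p5/4P3/5N2/PPPP1PPP/RNBQKB1R b KQkq - 1 2")
print(m.groupdict())
```

Tightening it (for example, rejecting ranks that do not sum to eight) makes a natural follow-up exercise.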
Additionally, have a look at algebraic chess notation, used to describe moves. Extract chess moves out of a piece of text (and make them bold).
1. e4 e5 2. Nf3 Black now defends his pawn 2...Nc6 3. Bb5 Black threatens c4
Validate phone numbers (extract area code + rest of number with grouping). This assumes US phone numbers; otherwise generalize for your style.
Play around with validating email addresses (you probably want to tell the students that the full regular expression is hugely complicated, but for simple addresses it is pretty straightforward).
regexplib.com has a good library you can search through for examples.
How about extracting the first name, middle name, last name, and personal suffix (Jr., III, etc.) from a format like:
Smith III, John Paul
How about Reg Ex to remove line breaks and tabs from the input
I would start with the common ones:
validate email
validate phone number
separate the parts of a URL
Be cruel. Tell them to parse HTML.
RegEx match open tags except XHTML self-contained tags
Are you teaching them theory of finite automata as well?
Here is a good one: parse the addresses of churches correctly from this badly structured format (copy and paste it as text first)
http://www.churchangel.com/WEBNY/newhart.htm
I'm a fan of parsing date strings. Define a few common date formats, as well as time and date-time formats. These are often good exercises because some dates are simple mixes of digits and punctuation. There's a limited degree of freedom in parsing dates.
Just to throw them for a loop, why not reword a question or two to suggest that they write a regular expression to generate data fitting a specific pattern like email addresses, phone numbers, etc.? It's the same thing as validating, but can help them get out of the mindset that regex is just for validation (whereas the data generation tool in visual studio uses regex to randomly generate data).
Rather than teaching examples based from the data set, I would do examples from the perspective of the rule set to get basics across. Give them simple examples to solve that leads them to use ONE of several basic groupings in each solution. Then have a couple of "compound" regex's at the end.
Simple:
s/abc/def/
Spinners and special characters:
s/a\s*b/abc/
Character classes:
s/[abc]/def/
Backreference:
s/ab(c)/def$1/
Anchors:
s/^fred/wilma/
s/rubble$/and betty/
Modifiers:
s/Abcd/def/gi
After this, I would give a few examples illustrating the pitfalls of trying to match html tags or other strings that shouldn't be done with regex's to show the limitations.
Try to think of some tests whose answers can't simply be found with Google.
If you ask for an email validator, they will have no trouble finding one.
Try something like a "five-proof" test:
The input is 5 digits, and the sum of the digits must be divisible by five: 12345 gives 1+2+3+4+5 = 15, and 15 / 5 = 3(.0)
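To grade that last exercise you need a reference checker. Here is a minimal sketch in Python (the function name `passes_five_test` is made up); it uses a regex only for the "exactly five digits" part and plain arithmetic for the sum. A pure regex for the divisibility part exists in principle but gets large, which is probably the lesson.

```python
import re

def passes_five_test(s):
    """True if `s` is exactly five ASCII digits whose sum is divisible by five."""
    if not re.fullmatch(r"[0-9]{5}", s):
        return False
    return sum(int(ch) for ch in s) % 5 == 0

for candidate in ("12345", "12346", "1234", "00000"):
    print(candidate, passes_five_test(candidate))
```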