Algorithm to get a Regex

What I have in mind: I put one or a few strings in, and the algorithm gives me a regex that matches them.
Is there an "easy" way to do this, or does something like this already exist?
Edit 1: Yes, I'm trying to find a way to generate a regex.
Edit 2: Regulazy is not what I am looking for. The typical use for the code I want is finding the right regex for something like article numbers:
I put in 123456, the regex should be \d{6}
I put in nb-123456, the regex should be \w{2}-\d{6}
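For single samples like these, one simple approach is to split the string into runs of the same character class and emit one quantified token per run. The sketch below is a minimal illustration under my own assumptions: the three classes (digit, letter, everything else) and the names `char_class` and `generalize` are hypothetical, and letters map to \w only to mirror the question.

```python
import re
from itertools import groupby


def char_class(ch: str) -> str:
    """Assign a character to one of three hypothetical token classes."""
    if ch.isdigit():
        return "digit"
    if ch.isalpha() or ch == "_":
        return "word"
    return "literal"


def generalize(sample: str) -> str:
    """Collapse runs of same-class characters into \\d{n}, \\w{n} or escaped literals."""
    parts = []
    for cls, run in groupby(sample, key=char_class):
        chunk = "".join(run)
        if cls == "digit":
            parts.append(rf"\d{{{len(chunk)}}}")
        elif cls == "word":
            # \w also matches digits; it is used here only to mirror the question's \w{2}
            parts.append(rf"\w{{{len(chunk)}}}")
        else:
            # punctuation and other characters are kept as escaped literals
            parts.append(re.escape(chunk))
    return "".join(parts)


print(generalize("123456"))  # \d{6}
print(generalize("nb-123456"))
```

For nb-123456, re.escape also escapes the hyphen, so the result is written \w{2}\-\d{6}; it matches the same strings as the pattern in the question.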

If you have Emacs you can use regexp-opt. For example, evaluating:
(regexp-opt (list "my" "list" "of" "some" "strings" "to" "search"))
returns
"list\\|my\\|of\\|s\\(?:earch\\|ome\\|trings\\)\\|to"

Perl can do it: http://www.hakank.org/makeregex/
So can Ruby: http://www.toolbox-mag.de/data/makeregex.html
Note: not a perfect solution.
And there is a CLI tool: txt2regex.
There was txt2re, once upon a time...

It sounds like you want an algorithm that generates a regular grammar from some samples. In many cases there are many possible grammars for a given set of examples; there can even be infinitely many. The possibilities can of course be narrowed by a second set of required non-matches, which can reduce them to zero if the non-matching strings are too inclusive.
txt2re does something like this.
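To make the point about multiple grammars concrete, here is a small Python check (hypothetical helper `consistent`) that keeps only candidate patterns that fully match every positive sample and none of the negative ones. The candidate list is my own illustration and includes the catch-all .* from another answer.

```python
import re


def consistent(candidates, positives, negatives=()):
    """Keep patterns that fully match all positives and no negatives."""
    return [
        p for p in candidates
        if all(re.fullmatch(p, s) for s in positives)
        and not any(re.fullmatch(p, s) for s in negatives)
    ]


candidates = [r".*", r"\d{6}", r"\d+", r"123456|654321"]
positives = ["123456", "654321"]

# With positives only, every candidate survives, including .*
print(consistent(candidates, positives))
# Negative samples prune .* and \d+, but two very different patterns remain
print(consistent(candidates, positives, negatives=["12345", "abc"]))
```

Even with non-matches, the answer is not unique, so a generator has to pick among the survivors by some preference (shortest, most general, and so on).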

How about the following (matches every string)?
.*

I think that Regulazy by Roy Osherove does this to a certain extent, or it may be Regulator. Both are on this page:
http://weblogs.asp.net/rosherove/pages/tools-and-frameworks-by-roy-osherove.aspx

If your input strings are not random but follow some rules, you can use a lexer/parser generator (e.g. JFlex) to build a regex generator that produces a regex for the given strings.

Look at txt2re.
This site has a form that takes a sample string and generates a regex pattern that matches it.
It then generates the corresponding script for the following languages: Perl, PHP, Python, Java, JavaScript, ColdFusion, C, C++, Ruby, VB, VBScript, J#.net, C#.net, C++.net, VB.net

Related

What is the proper way to check if a string contains a set of words in regex?

I have a string, let's say, jkdfkskjak some random string containing a desired word
I want to check if the given string has a word from a set of words, say {word1, word2, word3} in latex.
I can easily do it in Java, but I want to achieve it using regex. I am very new to regular expressions.
If you want to recognise the words even when they are part of a larger word, use:
(word1|word2|...|wordn)
If you want them to appear as isolated words, then
\b(word1|word2|...|wordn)\b
should be the answer.
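The difference between the two patterns can be checked with a short Python sketch. `contains_any` is a hypothetical helper; it builds the alternation with re.escape so that words containing regex metacharacters stay literal, and it assumes a non-empty word list.

```python
import re


def contains_any(text: str, words: list[str], whole_words: bool = False) -> bool:
    """Return True if any of `words` occurs in `text` (assumes `words` is non-empty)."""
    # re.escape keeps words such as "c++" literal inside the alternation
    alternation = "|".join(re.escape(w) for w in words)
    if whole_words:
        # \b behaves as intended only when each word starts and ends with a word character
        pattern = rf"\b(?:{alternation})\b"
    else:
        pattern = f"(?:{alternation})"
    return re.search(pattern, text) is not None


words = ["word1", "word2", "word3"]
print(contains_any("xword2y", words))                    # True
print(contains_any("xword2y", words, whole_words=True))  # False
```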
I don't fully understand the context (what kind of text you have, or what the words will be), but I can offer you an easy, literal solution: programmatically generate a regex such as (dormammu|bargain) and then search the text with it, e.g. "dormammu I come to bargain". I have no clue about latex, but I think that is not your question.
You can tinker with it at [regex101][1].
If you are having trouble understanding it, [regexone][2] is the place to go; it's a good start for beginners.
[1]: http://regex101.com
[2]: https://regexone.com/

Futile attempt to run regular expression find/replace in MS Word using groups on Mac

According to received wisdom, MS Word (more or less) supports find/replace with regular expressions. I have a simple regular expression:
^(C[[:alpha:]]*)(\d*)(.*)$
which I'm running on the following data:
indSIMDdecile
CSdeccrim12006
CSdeccrim12006
CSdeccrim12009
CSdeccrim12009
CSdeccrim12012
CSdeccrim12012
CSdeceduc12004
CSdeceduc12004
CSdeceduc12006
CSdeceduc12006
CSdeceduc12009
CSdeceduc12009
CSdeceduc12012
CSdeceduc12012
CSdecemp12004.x
I'm interested in returning the first word prior to the digit 1, which works as demonstrated on regex101.
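For reference, the same find/replace can be reproduced outside Word on a copy of the text. This Python sketch is my own illustration: Python's re module has no POSIX [[:alpha:]] class, so [A-Za-z] (ASCII letters only) stands in for it, and re.MULTILINE makes ^ and $ apply per line.

```python
import re

data = """indSIMDdecile
CSdeccrim12006
CSdeceduc12004
CSdecemp12004.x"""

# [A-Za-z] replaces the POSIX [[:alpha:]] class, which Python's re does not support
pattern = re.compile(r"^(C[A-Za-z]*)(\d*)(.*)$", re.MULTILINE)

# Replace each matching line with group 1; lines not starting with C are left alone
result = pattern.sub(r"\1", data)
print(result)
```

The output keeps indSIMDdecile unchanged and reduces the other lines to CSdeccrim, CSdeceduc and CSdecemp.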
Problem
I would like to do the same in MS Word (v. 15.18 on Mac). After getting error messages when trying to supply unsuitable syntax, I learned that MS Word does not support the full regex syntax. I simplified my expression to something along the lines of:
but the search does not find any strings and nothing gets replaced. Hence my question: is it possible to use regex in MS Word on Mac?
The linked help website hints that something like that should be possible, but so far no luck.
The simple answer is "no", if you mean "Does Mac Word have a UI feature that lets you use one of the modern dialects of regex?" Word's Find/Replace only supports its own Regular Expression syntax.
In this case, I think the following will give you what you need:
Find with wildcards:
(C)([!1]#)(1)
and a replace by
\1
(If you also had to find "C1", then that doesn't work, and unfortunately nor does
(C)([!1]{0,})(1)
because Word does not allow 0 in the {,} pattern)
But there is a problem with "#". If the text the "#" is looking for is long, the find/replace may fail. There is supposed to be a 255 limit, but it seems rather more arbitrary than that. (I have long suspected a buffer overrun type error in the Word code, but perhaps there is a simpler explanation).
If you mean "Is there any way to use modern regex with Word?", then the answer is "Yes, but you only get to operate on a copy of the text in the document." You will need to write your own code to do the 'replace' part of the find/replace, which means you would have to deal with issues, such as preserving formatting, that Word's built-in find/replace might get right for you.
On the Windows side, people who want a better regex than Word's often use VBScript's RegExp object because it is easy to use from VBA. VBA itself only really has the "Like" operator, which also has only fairly crude pattern-matching abilities. I think there are examples of VBScript RegExp use on StackOverflow. On the Mac side, you would either have to use VBA and "shell out" to one of the built-in Mac/Unix utilities to do your finding (and perhaps replacing), or perhaps use AppleScript or JavaScript application scripting to do it. As far as I can remember, AppleScript does not have a 'modern' regex built in either.
[As a bit of history, Word's "regular expressions" were I think introduced in Word 6, around 1993, at a time when most dialects of regex were much more crude than they are today. I don't think Word's version has moved along much at all - it probably added some Unicode support at some point, but that's probably about it. I assume that people using modern regex don't regard it as regex at all, and I personally prefer not to call Word's Regular Expressions 'regex' precisely for that reason.]

regex for parsing string in matlab

Just a small question. I need to parse
xyz1/Allrun1Mbps_10000us.sca
and extract values between "_" and "us" (here for example 10000). I am not able to create the regex properly for it.
According to this, MATLAB supports lookarounds. (?<=_).*?(?=us) uses lookarounds to check that _ is present before the match and us after it, without including either in the result. You can replace _ and us depending on your needs.
You could also use the site I mentioned to craft your own regular expressions from now on. The first Google result for "matlab regex" has all the answers you need.
You can use _(.*?)us; the first capture group ($1) gives the result.
Or, more specifically, _(\d+?)us
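Both answers can be checked quickly. In MATLAB itself this would presumably be something like regexp(str, '(?<=_).*?(?=us)', 'match'), but the sketch here is written in Python, whose re module supports the same fixed-width lookbehind and lookahead.

```python
import re

s = "xyz1/Allrun1Mbps_10000us.sca"

# Lookaround version: _ and us are checked but not included in the match
lookaround = re.search(r"(?<=_).*?(?=us)", s).group()

# Capture-group version: the whole match includes _ and us, group 1 holds the digits
captured = re.search(r"_(\d+?)us", s).group(1)

print(lookaround, captured)  # 10000 10000
```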

Is there a function to create a regex pattern from a string input?

I'm lousy at regular expressions, but occasionally they're the only right solution for a problem.
Is there something in the .NET framework that lets you input an unencoded string and get a pattern from it, which you could then modify as required?
E.g. I want to remove a CDATA section that contains a file from some XML, but I can't work out the right pattern for <![CDATA[hugepileofrandombinarydataherethatalsoneedstogo]]>, and I don't want to ask for help each time I'm stuck on a regex pattern.
Such tools exist; search Google for "regex generator".
But, as suggested in the comments, you'd be better off learning regex. Simple patterns are easy: something like <!\[.*?]]> in your case.
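As a sketch of the CDATA case, here is a slightly more specific variant of that pattern in Python (`strip_cdata` is a hypothetical helper). One assumption worth flagging: the payload may contain newlines, so '.' must be allowed to cross them (re.DOTALL in Python, RegexOptions.Singleline in .NET). For the "literal string to pattern" part of the question, Python's re.escape and .NET's Regex.Escape both turn arbitrary text into a pattern that matches it exactly.

```python
import re


def strip_cdata(xml: str) -> str:
    """Remove every CDATA section from `xml`."""
    # DOTALL lets '.' match newlines in the payload; the lazy '.*?' stops at the
    # first ']]>', which is safe because CDATA content cannot contain ']]>'
    return re.sub(r"<!\[CDATA\[.*?\]\]>", "", xml, flags=re.DOTALL)


doc = "<file><![CDATA[binary\ndata]]></file>"
print(strip_cdata(doc))  # <file></file>

# re.escape turns a literal string into a pattern that matches it exactly
literal_pattern = re.escape("<![CDATA[")
```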
There are regex design tools such as Expresso:
http://www.ultrapico.com/expresso.htm
It's not perfect, but as there is no suitable .NET component, the text-to-regex page at txt2re.com is the best I've seen for people who occasionally need to build a regex to match a string but don't have the time to relearn regex each time they want to use one.

How to use regex to extract nested patterns

Hi, I'm struggling with some regex.
I've got a string like this:
a:b||c:{d:e||f:g}||h:i
Basically, these are name-value pairings. I want to be able to parse out the pairings so I get:
a:b
c:{d:e||f:g}
h:i
Then I can further parse the pairings contained in { } if required.
It is the nesting that is making me scratch my head. Any regex experts out there that can give me a hand?
thanks,
Rob
Arbitrarily nested patterns are not regular. So, no, you can't just use a regex to parse this.
Is there any limit on the depth of nesting in your strings? If not, your language is not regular and regular expressions are the wrong tool, as you are already discovering.
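Without a depth limit, a plain depth counter is usually simpler than any regex. This Python sketch (hypothetical `split_top_level`, assuming braces are balanced and never appear inside names or plain values) splits only on || at nesting depth 0, so each {...} value can be split again in the same way.

```python
def split_top_level(s, sep="||", open_ch="{", close_ch="}"):
    """Split `s` on `sep`, ignoring separators inside braces."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(s):
        ch = s[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced {close_ch!r} at index {i}")
        elif depth == 0 and s.startswith(sep, i):
            parts.append(s[start:i])  # separator at top level: close this part
            i += len(sep)
            start = i
            continue
        i += 1
    if depth != 0:
        raise ValueError("unbalanced braces")
    parts.append(s[start:])
    return parts


pairs = split_top_level("a:b||c:{d:e||f:g}||h:i")
print(pairs)  # ['a:b', 'c:{d:e||f:g}', 'h:i']

# Recurse into a braced value by stripping its outer braces
inner = split_top_level(pairs[1].split(":", 1)[1][1:-1])
print(inner)  # ['d:e', 'f:g']
```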