Convert compiled regex to string - regex

I don't have much experience in Go, but basically I want to print my regex on the screen after using it. I can't find anything on Google. This seems like something rather easy to do, but I have tried several things and nothing worked.
var swagger_regex = regexp.MustCompile(`[0-9][.][0-9]`)
.... some code here ....
fmt.Println("Your '_.swagger' attribute does not match " + string(swagger_regex))

The regexp.Regexp type has a Regexp.String() method which does this exactly:
String returns the source text used to compile the regular expression.
You don't even have to call it manually, as the fmt package checks and calls the String() method if the type of the passed value has it.
Example:
r := regexp.MustCompile(`[0-9][.][0-9]`)
fmt.Println("Regexp:", r)
// If you need the source text as a string:
s := r.String()
fmt.Println("Regexp:", s)
Output (try it on the Go Playground):
Regexp: [0-9][.][0-9]
Regexp: [0-9][.][0-9]

How to replace parts of a string in lua "in a single pass"?

I have the following string of anchors (where I want to change the contents of the href) and a Lua table of replacements, which maps each word to its replacement:
s1 = '<a href="word7">'
replacementTable = {}
replacementTable["word1"] = "potato1"
replacementTable["word2"] = "potato2"
replacementTable["word3"] = "potato3"
replacementTable["word4"] = "potato4"
replacementTable["word5"] = "potato5"
The expected result should be:
<a href="word7">
I know I could do this by iterating over each element in the replacementTable and processing the string each time, but my gut feeling tells me that if the string is very big and/or the replacement table becomes big, this approach is going to perform poorly.
So I thought it would be best to do the following: apply the regular expression to find all the matches, get an iterator over the matches, and replace each match with its value in the replacementTable.
Something like this would be great (writing it in Javascript because I don't know yet how to write lambdas in Lua):
var newString = patternReplacement(s1, '<a[^>]* href="([^"]*)"', function(match) { return replacementTable[match] })
Where the first parameter is the string, the second one the regular expression and the third one a function that is executed for each match to get the replacement. This way I think s1 gets parsed once, being more efficient.
Is there any way to do this in Lua?
In your example, this simple code works:
print((s1:gsub("%w+",replacementTable)))
The point is that gsub already accepts a table of replacements.
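For readers more at home outside Lua, the same single-pass idea can be sketched in Go: the string is scanned once, and each word is looked up in a map as it is found. This is only an illustration under stated assumptions: `replaceWords` is a hypothetical helper, the character class stands in for Lua's `%w` on ASCII input, and the two-anchor sample string is invented for the demo.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRE plays the role of the Lua pattern "%w+" for ASCII input.
var wordRE = regexp.MustCompile(`[A-Za-z0-9]+`)

// replaceWords is a hypothetical helper: it walks the string once and
// swaps every word that has an entry in table.
func replaceWords(s string, table map[string]string) string {
	return wordRE.ReplaceAllStringFunc(s, func(word string) string {
		if repl, ok := table[word]; ok {
			return repl
		}
		return word // no entry: keep the original text, as gsub does
	})
}

func main() {
	table := map[string]string{"word1": "potato1", "word2": "potato2"}
	// Sample input invented for this sketch.
	fmt.Println(replaceWords(`<a href="word1"><a href="word7">`, table))
}
```

Words without an entry in the map, such as `a`, `href` and `word7` here, pass through unchanged.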
In the end, the solution that worked for me was the following one:
local updatedBody = string.gsub(body, '(<a[^>]* href=")(/[^"%?]*)([^"]*")', function(leftSide, url, rightSide)
    local replacedUrl = url
    if (urlsToReplace[url]) then replacedUrl = urlsToReplace[url] end
    return leftSide .. replacedUrl .. rightSide
end)
It leaves out any query string parameters, giving me just the URI. I know it's a bad idea to parse HTML bodies with regular expressions, but in my case, where I needed a lot of performance, this ran a lot faster and did the job.

Remove all text from string after a sequence of words in Scala

I am trying to assemble a UDF in Scala that takes a column from a data frame and manipulates it to remove HTML and other useless pieces of text.
The column I need to modify is very messy: sometimes there is HTML, sometimes there is not. Searching SO, I found a regex solution to remove HTML.
What I'd like to accomplish now is to find a regex that can find a specific word in the text and delete all the text after that word.
I think I understand from this SO answer that the regex should be something like \).* if you want to remove all after ), so I am trying to adapt this to my case, unsuccessfully due to my lack of knowledge about regex.
I have strings like:
I am interested to hear from you, thanks Sent from iPhone other stuff I want to delete....
I'd like to retain the first part of the string up to "Sent from" excluded, so a perfect output would be:
I am interested to hear from you, thanks
What I have so far is something like:
val toStringNoHTML = udf[String, String](_.toString
  // code from SO as linked above
  .replaceAll("""<(?!\/?a(?=>|\s.*>))\/?.*?>""", " ")
  // delete all text after key word
  .replaceAll("""'Sent from'.*""", "")
  // remove all punctuation
  .replaceAll("""[\p{Punct}\n]""", " ")
)
While the HTML gets removed, the "Sent from" and all the text after it do not. Any hint on how to adjust the regex to make it work?
EDIT
As pointed out in the comments, a small typo prevented my code from working. Thanks for the help:
.replaceAll("""'Sent from'.*""", "")
should be
.replaceAll("""Sent from.*""", "")
Instead of doing multiple replaceAll(pattern, blank) I'd be tempted to start with an extraction.
val msgRE = "(.*>)?(.*)Sent from.*".r
val result = udfStr match {
  case msgRE(_, msg) => Some(msg.trim) // .replaceAll() can be added here
  case _ => None
}
Here the result is an Option[String] but that really depends on how you want to handle the non-matching input.
If more cleaning is needed after the extraction then replaceAll() can be added where indicated (or the extraction pattern can be better refined).
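The extract-then-clean idea is not tied to Scala. Below is a rough sketch of it in Go, offered as an assumption-laden illustration rather than a port of the answer: `textBeforeSignature` is a hypothetical helper, the pattern cuts at the first "Sent from" (the Scala pattern above is greedy and cuts at the last one), and `(?s)` is added so that text spanning several lines is handled too.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// (?s) lets "." match newlines; the lazy .*? stops at the first "Sent from".
var beforeSignature = regexp.MustCompile(`(?s)^(.*?)Sent from`)

// textBeforeSignature is a hypothetical helper. It returns the trimmed text
// in front of the marker, and false when the marker is absent.
func textBeforeSignature(s string) (string, bool) {
	m := beforeSignature.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return strings.TrimSpace(m[1]), true
}

func main() {
	msg := "I am interested to hear from you, thanks Sent from iPhone other stuff I want to delete...."
	if text, ok := textBeforeSignature(msg); ok {
		fmt.Println(text)
	}
}
```

Returning a boolean alongside the text mirrors the Option[String] in the answer: the caller decides what to do with input that has no marker.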

Need Regex Help

Can anybody help me with a regex? I have a string with digits like:
X024123099XYAAXX99RR
I need a regex to check whether a user has entered the correct information. The rule should also have a fallback so that partial input is checked from left to right.
For example, when tested these inputs should return TRUE:
X024
X024123099X
X024123099XYA
X024123099XYAAXX99R
And these ones should return FALSE:
XX024
X02412AA99X
X024123099XYAAXX9911
And so on. The regex must check for the correct syntax, beginning from the left.
I have something like this, but it does not seem to be correct:
\w\d{0,12}\w{0,6}\d{0,2}\w{0,2}
Big thanks for any help (I'm new to regex)
You could take OpenSauce's regex and then hack it to pieces to allow partial matches:
^[A-Z](\d{0,9}$|\d{9}([A-Z]{0,6}$|[A-Z]{6}(\d{0,2}$|\d{2}([A-Z]{0,2}$))))
It's not pretty but as far as I can tell it encodes your requirements.
Essentially I took each case of something like \d{9} and replaced it with something like (\d{0,9}$|\d{9}<rest of regex>).
I added ^ and $ because otherwise it will match substrings in an otherwise invalid string. For example, it will see an invalid string like XX024 and think it is okay because it contains X024.
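To check the nested pattern against the sample inputs from the question, a small harness along these lines could be used. It is written in Go as an assumption, since the question names no language; `isValidPrefix` is a hypothetical helper name, and the pattern is copied from above unchanged.

```go
package main

import (
	"fmt"
	"regexp"
)

// The pattern from the answer, copied as-is.
var prefixRE = regexp.MustCompile(`^[A-Z](\d{0,9}$|\d{9}([A-Z]{0,6}$|[A-Z]{6}(\d{0,2}$|\d{2}([A-Z]{0,2}$))))`)

// isValidPrefix is a hypothetical helper that reports whether s is a valid
// complete string or a valid left-to-right prefix of one.
func isValidPrefix(s string) bool {
	return prefixRE.MatchString(s)
}

func main() {
	inputs := []string{
		"X024", "X024123099X", "X024123099XYA", "X024123099XYAAXX99R",
		"XX024", "X02412AA99X", "X024123099XYAAXX9911",
	}
	for _, in := range inputs {
		fmt.Printf("%s %v\n", in, isValidPrefix(in))
	}
}
```

The first four inputs report true and the last three report false, matching the lists in the question.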
If I understand you correctly, your strings should match the regex
[A-Z]\d{9}[A-Z]{6}\d{2}[A-Z]{2}
but you also want to check if a string could be a prefix of a matching string, is that correct? You might be able to express this in a single regex, but I can't think of a way to do so that's easy to read.
You haven't said which language you're using, but if your language gives you a way to tell whether the end of the input string was reached while checking the regex, that would give you an easy way to get what you want. E.g. in Java, the method Matcher.hitEnd tells you whether the end was reached, so the code below:
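import static java.lang.System.out;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// The class name is illustrative; the original snippet showed only the members.
public class PrefixCheck {
    static Pattern pattern = Pattern.compile( "[A-Z]\\d{9}[A-Z]{6}\\d{2}[A-Z]{2}" );
    static Matcher matcher = pattern.matcher( "" );
    public static void main(String[] args) {
        String[] strings = {
            "X024",
            "X024123099X",
            "X024123099XYA",
            "X024123099XYAAXX99R",
            "XX024",
            "X02412AA99X",
            "X024123099XYAAXX9911"
        };
        for ( String string : strings ) {
            out.format( "%s %s\n", string, inputOK(string) ? "OK" : "not OK" );
        }
    }
    // OK if the input matches in full, or if the matcher ran out of input
    // during the attempt, i.e. the input is a prefix of a valid string.
    static boolean inputOK(String input) {
        return matcher.reset(input).matches() || matcher.hitEnd();
    }
}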
gives output:
X024 OK
X024123099X OK
X024123099XYA OK
X024123099XYAAXX99R OK
XX024 not OK
X02412AA99X not OK
X024123099XYAAXX9911 not OK

Using a Variable in an AS3, Regexp

Using Actionscript 3.0 (Within Flash CS5)
A standard regex to match any digit is:
var myRegexPattern:RegExp = /\d/g;
What would the regex look like to incorporate a string variable to match?
(this example is an 'IDEAL' not a 'WORKING' snippet) ie:
var myString:String = "MatchThisText";
var myRegexPatter_WithString:RegExp = /\d[myString]/g;
I've seen some workarounds which involve creating multiple regex instances, then combine them by source, with the variable in question, which seems wrong. OR using the flash string to regex creator, but it's just plain sloppy with all the double and triple escape sequences required.
There must be some pain free way that I can't find in the live docs or on google. Does AS3 hold this functionality even? If not, it really should.
Or am I missing a much easier way of avoiding this task, one I'm simply naive to because I'm new to regex?
I've actually blogged about this, so I'll just point you there: http://tyleregeto.com/using-vars-in-regular-expressions-as3 It talks about the possible solutions, but there is no ideal one like you mention.
EDIT
Here is a copy of the important parts of that blog entry:
Here is a regex to strip the tags from a block of text.
/<("[^"]*"|'[^']*'|[^'">])*>/ig
This nifty expression works like a charm. But I wanted to update it so the developer could limit which tags it stripped to those specified in an array. Pretty straightforward stuff: to use a variable value in a regex you first need to build it as a string and then convert it. Something like the following:
var exp:String = 'start-exp' + someVar + 'more-exp';
var regex:RegExp = new RegExp(exp);
Pretty straightforward. So when approaching this small upgrade, that's what I did. Of course one big problem was pretty clear.
var exp:String = '/<' + tag + '("[^"]*"|'[^']*'|[^'">])*>/';
Guess what, invalid string! Better escape those quotes in the string. Whoops, that will break the regex! I was stumped. So I opened up the language reference to see what I could find. The "source" parameter, (which I've never used before,) caught my eye. It returns a String described as "the pattern portion of the regular expression." It did the trick perfectly. Here is the solution:
var start:RegExp = /</;
var end:RegExp = /("[^"]*"|'[^']*'|[^'">])*>/ig;
var complete:RegExp = new RegExp(start.source + tag + end.source);
You can reduce it down to this for convenience:
var complete:RegExp = new RegExp(/</.source + tag + /("[^"]*"|'[^']*'|[^'">])*>/.source, 'ig');
As Tyler correctly points out (and his answer works just fine), you can assemble your regex as a string and then pass this string to the RegExp constructor with the new RegExp("pattern", "flags") syntax.
function assembleRegex(myString) {
    var re = new RegExp('\\d' + myString, "i");
    return re;
}
Note that when using a string to store a regex pattern, you do need to add some extra backslashes to get it to work right (e.g. to get a \d in the regex, you need to specify \\d in the string). Note also that the string pattern does not use the forward slash delimiters. In other words, the following two statements are equivalent:
var re1 = /\d/ig;
var re2 = new RegExp("\\d", "ig");
Additional note: You may need to process the myString variable to escape any backslashes it might contain (if they are to be interpreted as literal). If this is the case the function becomes:
function assembleRegex(myString) {
    // The g flag makes sure every backslash is escaped, not just the first one
    myString = myString.replace(/\\/g, '\\\\');
    var re = new RegExp('\\d' + myString);
    return re;
}
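The build-the-pattern-as-a-string technique carries over to other regex libraries, and some of them ship a helper that escapes every metacharacter in the variable part, not only backslashes. As a hedged illustration outside AS3, here is how it might look in Go, where `regexp.QuoteMeta` does the escaping; `digitThen` is a hypothetical function name, and `(?i)` stands in for the "i" flag.

```go
package main

import (
	"fmt"
	"regexp"
)

// digitThen is a hypothetical helper: it builds a case-insensitive pattern
// that matches one digit followed by the literal text.
// QuoteMeta escapes every regex metacharacter in literal.
func digitThen(literal string) (*regexp.Regexp, error) {
	return regexp.Compile(`(?i)\d` + regexp.QuoteMeta(literal))
}

func main() {
	re, err := digitThen("MatchThisText")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(re.MatchString("7MatchThisText"))
}
```

Because the variable part is quoted, a value such as `a.b` only matches a literal dot and not any character.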

looking for a regular expression to extract all text outputs to user from js file

I have some huge JS files containing texts/messages/... that are output for a human being. The problem is that they don't all go through the same method.
But I want to find them all to refactor the code.
Now I am searching for a regular expression to find those messages.
...his.submit_register = function(){
if(!this.agb_accept.checked) {
out_message("This is a Messge tot the User in English." , "And the Title of the Box. In English as well");
return fals;
}
this.valida...
What I want to find is all the strings which are not source code.
In this case I want to get back:
This is a Messge tot the User in
English. And the Title of the Box. In
English as well
I tried something like: /\"(\S+\s{1})+\S\"/, but this won't work ...
Thanks for help
It's not possible to parse JavaScript source code using regular expressions because JavaScript is not a regular language. You can write a regular expression that works most of the time:
/"(.*?)"/
The ? means that the match is not greedy.
Note: this will not correctly handle strings that contain escaped quotes.
A simple Java regex solving your problem (assuming that the message doesn't contain a " character):
Pattern p = Pattern.compile("\"(.+?)\"");
The extraction code:
Matcher m;
for(String line : lines) {
    m = p.matcher(line);
    while(m.find()) {
        System.out.println(m.group(1));
    }
}