Regular expression that allows square-brackets - regex

This works perfectly:
result2 = Regex.Replace(result2, "[^A-Za-z0-9/.,>#:\s]", "", RegexOptions.Compiled)
But I need to allow square brackets ([ and ]).
Does this look correct to allow Brackets without changing what is allowed and not allowed from the above?
result2 = Regex.Replace(result2, "[^A-Za-z0-9\[\]/.,>#:\s]", "", RegexOptions.Compiled)
Reason I need a second opinion is that I think if this is correct something else is blocking it that is out of my control.

I can't say any one person did or did not answer the question or try to help; I would split the solution among everyone if I could, because it made me think. The key was to escape each bracket with a single \ . Thanks, everyone, for your help.
result = Regex.Replace(result, "[^A-Za-z0-9/\[\].,>#\s]", "", RegexOptions.Compiled)
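For anyone who wants to check the escaped-bracket class in isolation, here is a minimal sketch in Python rather than .NET (the character-class syntax is the same for this pattern). The function name strip_disallowed and the sample input are made up for illustration.

```python
import re

# Same negated class as in the question, with [ and ] escaped so that
# they are treated as literal characters inside the class.
DISALLOWED = re.compile(r"[^A-Za-z0-9\[\]/.,>#:\s]")

def strip_disallowed(text: str) -> str:
    """Remove every character that is not in the allowed set."""
    return DISALLOWED.sub("", text)

if __name__ == "__main__":
    # The brackets survive; '=', ';' and '!' are stripped.
    print(strip_disallowed("a[1]=b;c!"))  # a[1]bc
```

If brackets still disappear with this class, the pattern itself is unlikely to be the cause, and something upstream is probably filtering the input.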

The gimmick @tripleee mentioned does indeed work in .NET. Just make sure ] is the first character (or in this case, the first after the ^).
result2 = Regex.Replace(result2, "[^][A-Za-z0-9/.,>#:\s]", "");
But be careful about porting the regex to other flavors. Some will treat it as a syntax error, and some will treat it as two atoms: [^] and [A-Za-z0-9/.,>#:\s], the first of which matches literally anything but nothing--i.e., any character including newlines.
On a side note, why are you using the RegexOptions.Compiled option? That's something you should use only when you know you need it. The increased performance will almost never be significant, and it comes with a pretty high price tag, as explained here.
http://msdn.microsoft.com/en-us/library/8zbs0h2f.aspx

Related

Using a regular expression to insert text in a match

Regular Expressions are incredible. I'm in my regex infancy so help solving the following would be greatly appreciated.
I have to search through a string to match for a P character that's not surrounded by operators, power or negative signs. I then have to insert a multiplication sign. Case examples are:
33+16*55P would become 33+16*55*P
2P would become 2*P
P( 33*sin(45) ) would become P*(33*sin(45))
I have written some regex that I think handles this although I don't know how using regex I can insert a character:
The regex I've written is:
[^\^\+\-\/\*]?P+[^\^\+\-\/\*]
The language where the RegEx will be used is ActionScript 3.
A live example of the regex can be seen at:
http://www.regexr.com/39pkv
I would be massively grateful if someone could show me how to insert a multiplication sign in the middle of the match, i.e. P2 becomes P*2, and 22.5P becomes 22.5*P.
ActionScript 3 has search, match and replace functions that all utilise regular expressions. I'm unsure how I'd use string.replace( expression, replaceText ) in this context.
Many thanks in advance
Welcome to the wonder (and inevitable frustration that will lead to tearing your hair out) that is regular expressions. You should probably read over the documentation on using regular expressions in ActionScript, as well as this similar question.
You'll need to combine RegExp.test() with the String.replace() function. I don't know ActionScript, so I don't know if it will work as is, but based on the documentation linked above, the code below should be a good start for testing and for getting an idea of what your solution might look like. I think @Vall3y is right. To get the replace right, you'd want to first check for anything leading up to a P, then for anything after a P. So two functions are probably easier to get right without getting too fancy with the regex:
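private function multiplyBeforeP(str:String):String {
    // A regex literal avoids the doubled backslashes that new RegExp("...") needs.
    // "g" replaces every occurrence; the preceding character is required here
    // so that a P at the start of the string does not become "*P".
    var pattern:RegExp = /([^\^+\-\/*])P/gi;
    return str.replace(pattern, "$1*P");
}
private function multiplyAfterP(str:String):String {
    // Same idea for the character that follows the P.
    var pattern:RegExp = /P([^\^+\-\/*])/gi;
    return str.replace(pattern, "P*$1");
}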
Regex is used to find patterns in strings. It cannot be used to manipulate them by itself. You will need to use ActionScript for that.
Many programming languages have a string.replace method that accepts a regex pattern. Since you have two cases (inserting before and after the P), a simple solution would be to split your regex into two patterns, for example [^\^\+\-\/\*]?P+ and P+[^\^\+\-\/\*] (these might need adjustment), and replace each match with the corresponding string ("*P" and "P*").
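The two-pass idea can be sketched as follows. This is Python rather than ActionScript 3, and the function name insert_multiplication is hypothetical. It also assumes that whitespace and parentheses should not trigger an inserted sign on the "wrong" side (no "(*P" and no "P*)"), which goes slightly beyond the character class in the question.

```python
import re

# Pass 1: a P preceded by something that is not an operator, power sign,
# whitespace or opening parenthesis gets a '*' in front of it.
BEFORE_P = re.compile(r"([^\^+\-/*\s(])P")
# Pass 2: a P followed by something that is not an operator, power sign,
# whitespace or closing parenthesis gets a '*' after it.
AFTER_P = re.compile(r"P([^\^+\-/*\s)])")

def insert_multiplication(expr: str) -> str:
    """Insert the implicit multiplication signs around P."""
    expr = BEFORE_P.sub(r"\1*P", expr)
    return AFTER_P.sub(r"P*\1", expr)

if __name__ == "__main__":
    for sample in ["33+16*55P", "2P", "P(33*sin(45))", "P2", "2*P+3"]:
        print(sample, "->", insert_multiplication(sample))
```

The captured neighbour is written back through the \1 backreference, which plays the same role as $1 in ActionScript's String.replace.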

Regex Replacing characters with zero

I have the following string 3}HFB}4AF4}1 -M}1.
I have searched for this string using the regex :
([0-9])(\})([A-Z]{3})(\})([0-9][A-Z]{2}[0-9])(\})([0-9])(\s\-)([A-Z])(\})([0-9]).
I want to replace the } with 0. The result I am looking for is 30HFB04AF401-M01. Any assistance is appreciated. The tool I am using is RegexBuddy.
A possible solution
Problem solved? In JavaScript at least :-)
"3}HFB}4AF4}1 -M}1".replace(/\}/g, "0");
// "30HFB04AF401 -M01"
I'm missing the point, right?
Assuming the language is JavaScript, we can write something like
"dfghj456783}HFB}4AF4}1 -M}1fghjkl8765".replace(/(?:[\d\w\s]+)([0-9]}[A-Z]{3}}[0-9][A-Z]{2}[0-9]}[0-9] -[A-Z]}[0-9])(?:[\d\w\s]+)/g, function () {
return arguments[1].replace(/}/g, "0");
});
What's possible in other languages though may be a different story.
Try the home of RegexBuddy for details.
So you've already got an expression to find instances of the string. Now you can either use groups to replace the characters, or you can use a separate regular expression over the string you found, simply replacing the } character within group(0) (which is the entire matched part of the input). I would certainly prefer the latter.
Fred seems to have created the replacement method for you already, so I won't repeat it here.
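For reference, the "match first, then replace inside the match" approach can be sketched like this in Python. The function name normalize_codes is made up, and dropping the space before the hyphen is an assumption based on the result the question asks for (30HFB04AF401-M01).

```python
import re

# The question's pattern without the capture groups: only the whole
# match, m.group(0), is needed for this approach.
CODE = re.compile(r"[0-9]\}[A-Z]{3}\}[0-9][A-Z]{2}[0-9]\}[0-9]\s-[A-Z]\}[0-9]")

def normalize_codes(text: str) -> str:
    """Replace } with 0 (and drop the space) inside matched codes only."""
    def fix(match: re.Match) -> str:
        return match.group(0).replace("}", "0").replace(" ", "")
    return CODE.sub(fix, text)

if __name__ == "__main__":
    print(normalize_codes("3}HFB}4AF4}1 -M}1"))  # 30HFB04AF401-M01
```

Any } outside a matched code is left alone, which is the advantage over a blanket replace.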
I have managed to find a solution to the formatting in the JGSoft flavor used by RegexBuddy. Thanks to all who provided suggestions that helped me channel my thoughts in the right direction.
Solution (I am still a beginner with regex, so the syntax might not be efficient, but it does the job!)
I used named groups instead of numbered groups with backreferences and the $ syntax.
To replace } with 0 in the string 3}HFB}4AF4}1 -M}1 or any similar string, I used the following search and replacement syntax:
Search : (?<Gp1>([0-9]))(?:})(?<Gp2>([A-Z]){3})(?:})(?<Gp3>([0-9])([A-Z]{2})([0-9]))(?:})(?<Gp4>([0-9]))(?:\s-)(?<Gp5>([A-Z]))(?:})(?<Gp6>[0-9])
Replace : ${Gp1}0${Gp2}0${Gp3}0${Gp4}-${Gp5}0${Gp6}
Result : 30HFB04AF401-M01

regex: trying to improve this regex

I am using this regex :
[']?[%]?[^"]#([^#]*)#[%]?[']?
on this text:
insert into table (id,name,age) values ('#var1#' ,#var2#,'#var3#', 3, 'name') where id = '#id#' like ""
and test=<cfqueryparam value="#id#">
For some reason it is catching the comma between #var2# and '#var3#'
but when I include a [^,] it starts doing weird stuff.
Can someone help me with this one.
As I read my regex now, it should find anything that:
might have a single quote
might have a percentage
doesn't have a double quote
then has a hash (#)
followed by no hash, but all other characters
then has a hash and followed by a percentage or quote
So why does the regex break when I add "no comma" in front?
Updated Question:
Okay, I'll try to explain. A query can look like this:
SELECT e.*, m.man_id, m.man_title, c.cat_id, c.cat_name
FROM ec_products e, ec_categories c, ec_manufacturers m
WHERE c.cat_id = e.prod_category AND
e.prod_manufacturer = m.man_id AND
e.prod_title LIKE <cfqueryparam value="%#attributes.keyword#%"> and
test='#var1#'
ORDER BY e.prod_title
Now I want every value between ##, but not the values that are surrounded by a queryparam tag. So in the example I do want #var1# but not #attributes.keyword#. Reason for this is that all params in the query that are not surrounded by a tag are unsafe and can cause SQL injection. My current regex is
(?!")'?%?#(?!\d)[\w.\(\)]+#%?'?(?!")
and it is almost there. It does find attributes.keyword because of the %. I just want anything that has ## but is not surrounded by double quotes, so not "##". This will give me all unsafe params in the SQL, like '#var#', or #aNumber#, or '%##', or '%##%', or '##%', but NOT things like:
<cfqueryparam value="#variable#">
I hope my intentions are clear.
I think you might be misunderstanding [^"]. It doesn't mean "doesn't have a double quote", but rather means, "one character, which is not a double-quote". Similarly, [^,] means "one character, which is not a comma". So your regex:
[']?[%]?[^"]#([^#]*)#[%]?[']?
will match — for example — this:
2#,'#
which consists of zero single-quotes, zero percent-signs, one character-which-is-not-a-double-quote (namely 2), one hash-sign, two characters-which-are-not-hash-signs (namely ,'), one hash-sign, zero percent-signs, and zero apostrophes. The ,' is what will be captured by the parentheses.
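To see this behaviour concretely, here is a small sketch that runs the original pattern in Python (an assumption: Python's syntax for this particular pattern is the same as ColdFusion's). The helper name full_matches is made up. It lists the complete text of each match, not just the capture group.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(r"""[']?[%]?[^"]#([^#]*)#[%]?[']?""")

SQL = '''insert into table (id,name,age) values ('#var1#' ,#var2#,'#var3#', 3, 'name') where id = '#id#' like ""'''

def full_matches(text: str) -> list:
    """Return the whole matched text for every match."""
    return [m.group(0) for m in PATTERN.finditer(text)]

if __name__ == "__main__":
    for matched in full_matches(SQL):
        print(repr(matched))
    # The string from the explanation above matches in full as well,
    # with ,' as the captured group.
    print(PATTERN.fullmatch("2#,'#").group(1))
```

The second match comes out as ,#var2# because [^"] has to consume one character, and the comma is the character sitting in front of the hash.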
Update for updated question:
I don't think that what you describe is possible using just a ColdFusion regex, because it would require "lookbehind" (to ensure that something is not preceded by a double-quote), which ColdFusion regexes apparently (according to a Google-search) do not support. However:
This StackOverflow answer gives a way of using Java regexes in ColdFusion. If you use that technique, then you can use the Java regex '?%?(?<!")(?<!"')(?<!"%)(?<!"'%)#(?!\d)[\w.()]+#(?!%?'?")%?'? to ensure that there's no preceding double-quote.
You never mentioned how you're actually using this regex. Would it work for you to match .'?%?#(?!\d)[\w.()]+#%?'?(?!") (i.e., to match not just the section of interest, but also the preceding character), and then separately confirm that the matched substring doesn't start with a double-quote?
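As a rough check of the lookbehind version, the Java pattern above also runs unchanged in Python, because every lookbehind in it has a fixed width. The sketch below applies it to the query from the question; the helper name unsafe_params is hypothetical, and this only demonstrates the pattern, not a reliable injection scanner.

```python
import re

# The Java regex quoted above, copied verbatim.
UNSAFE = re.compile(r"""'?%?(?<!")(?<!"')(?<!"%)(?<!"'%)#(?!\d)[\w.()]+#(?!%?'?")%?'?""")

QUERY = """SELECT e.*, m.man_id, m.man_title, c.cat_id, c.cat_name
FROM ec_products e, ec_categories c, ec_manufacturers m
WHERE c.cat_id = e.prod_category AND
e.prod_manufacturer = m.man_id AND
e.prod_title LIKE <cfqueryparam value="%#attributes.keyword#%"> and
test='#var1#'
ORDER BY e.prod_title"""

def unsafe_params(sql: str) -> list:
    """Return every #...# reference that is not wrapped in double quotes."""
    return UNSAFE.findall(sql)

if __name__ == "__main__":
    print(unsafe_params(QUERY))  # ["'#var1#'"]
```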
I also feel compelled to mention, since it sounds like you're trying to use regex-based pattern-matching to help detect and address points of possible SQL injection, that this is a bad idea; you will never be able to do this perfectly, so if anything, I think it will end up increasing your risk of SQL injection (by increasing your reliance on a buggy methodology).
Preserving your capture group from the initial regex, here is a revised expression.
'?%?(?!")#([^#]+)#%?'?
Based on the information you provided this should be correct.
'?%?(?!")#[^#]+#%?'?

What is the difference between "a{1}" and "a" in regex?

Some string was matched with the following regex:
([0-9]\s+){1}
Why did the author use {1} at the end of the regex?
Can I safely remove it?
Yes, there is no difference at all. Possibly it was left over from tweaks made while the regex was being built and tested.
{1} limits the preceding group to exactly one repetition, that is, one digit followed by whitespace in your example, which is what the group matches without it anyway.
It is probably a leftover from debugging/writing the query when the author experimented with {1,2} or so.
Yes, you can remove it.
If it is the result of interpreted code (log or debug output from a script, for example), the 1 could be the value of a variable.
If it is written directly in a script, {1} is the default behavior, so the result is the same (the parser just has slightly more to interpret).
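A quick way to convince yourself is to run both forms over the same inputs and compare the results. The sketch below does that in Python; the sample strings and the helper names are made up for illustration.

```python
import re

WITH_QUANTIFIER = re.compile(r"([0-9]\s+){1}")
WITHOUT_QUANTIFIER = re.compile(r"([0-9]\s+)")

SAMPLES = ["1 2 3", "7\t\tx", "no digits", "42  ", ""]

def describe(pattern: re.Pattern, text: str) -> list:
    """Return (span, captured text) for every match of pattern in text."""
    return [(m.span(), m.group(1)) for m in pattern.finditer(text)]

def same_matches(text: str) -> bool:
    """True when both patterns match the same spans with the same captures."""
    return describe(WITH_QUANTIFIER, text) == describe(WITHOUT_QUANTIFIER, text)

if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), same_matches(sample))
```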

How do I make this "Use of uninitialized value" warning go away?

Let's say I want to write a regular expression to change all <abc>, <def>, and <ghi> tags into <xyz> tags.. and I also want to change their closing tags to </xyz>. This seems like a reasonable regex (ignore the backticks; StackOverflow has trouble with the less-than signs if I don't include them):
`s!<(/)?(abc|def|ghi)>!<${1}xyz>!g;`
And it works, too. The only problem is that for opening tags, the optional $1 variable gets assigned undef, and so I get a "Use of uninitialized value..." warning.
What's an elegant way to fix this? I'd rather not make this into two separate regexs, one for opening tags and another for closing tags, because then there are two copies of the taglist that need to be maintained, instead of just one.
Edit: I know I could just turn off warnings in this region of the code, but I don't consider that "elegant".
Move the question mark inside the capturing bracket. That way $1 will always be defined, but may be a zero-length string.
How about:
`s!(</?)(abc|def|ghi)>!${1}xyz>!g;`
s!<(/?)(abc|def|ghi)>!<${1}xyz>!g;
The only difference is changing "(/)?" to "(/?)". You have already identified several functional solutions. This one has the elegance you asked for, I think.
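The difference between the two forms is easy to observe in any engine that exposes unmatched groups. Here is a sketch in Python, where None plays roughly the role of Perl's undef (an analogy, not an exact equivalence); the function name rename_tags is made up.

```python
import re

# Quantifier outside the group: the group may not participate at all.
OPTIONAL_GROUP = re.compile(r"<(/)?(abc|def|ghi)>")
# Quantifier inside the group: the group always participates,
# possibly capturing an empty string.
OPTIONAL_INSIDE = re.compile(r"<(/?)(abc|def|ghi)>")

def rename_tags(html: str) -> str:
    """Rename abc/def/ghi tags to xyz, keeping the optional slash."""
    return OPTIONAL_INSIDE.sub(r"<\1xyz>", html)

if __name__ == "__main__":
    print(OPTIONAL_GROUP.match("<abc>").group(1))         # None
    print(repr(OPTIONAL_INSIDE.match("<abc>").group(1)))  # ''
    print(rename_tags("<abc>text</def>"))                 # <xyz>text</xyz>
```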
Here is one way:
s!<(/?)(abc|def|ghi)>!<$1xyz>!g;
Update: Removed irrelevant comment about using (?:pattern).
You could just make your first match be (</?), and get rid of the hard-coded < on the "replace" side. Then $1 would always have either < or </. There may be more elegant solutions to address the warning issue, but this one should handle the practical problem.
To make the regex capture $1 in either case, try:
s!<(/|)?(abc|def|ghi)>!<${1}xyz>!g;
     ^
     note the pipe symbol, meaning '/' or ''
For '<abc>' this will capture the '' between '<' and 'abc>', and for '</abc>' it will capture the '/' between '<' and 'abc>'.
I'd rather not make this into two
separate regexs, one for opening tags
and another for closing tags, because
then there are two copies of the
taglist that need to be maintained
Why? Put your taglist into a variable and interpolate that variable into as many regexes as you like. I'd consider this even with a single regex, because it's much more readable with a complicated regex (and what regex isn't complicated?).
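The single-taglist idea looks roughly like this when sketched in Python instead of Perl. The names TAGS, OPEN_TAG, CLOSE_TAG and rename_tags are hypothetical; the point is only that the list is written once and interpolated into both patterns.

```python
import re

# The tag list lives in exactly one place.
TAGS = ["abc", "def", "ghi"]
TAG_ALTERNATION = "|".join(re.escape(tag) for tag in TAGS)

# Two separate patterns, both built from the same alternation.
OPEN_TAG = re.compile(rf"<(?:{TAG_ALTERNATION})>")
CLOSE_TAG = re.compile(rf"</(?:{TAG_ALTERNATION})>")

def rename_tags(html: str, new_name: str = "xyz") -> str:
    """Rename opening and closing tags using the shared tag list."""
    html = OPEN_TAG.sub(f"<{new_name}>", html)
    return CLOSE_TAG.sub(f"</{new_name}>", html)

if __name__ == "__main__":
    print(rename_tags("<abc>one</abc> <ghi>two</ghi> <b>three</b>"))
```

Adding a tag then means editing TAGS only, which addresses the maintenance worry in the question.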
Be careful, inasmuch as HTML is a bit harder than it looks at first glance. For example, do you want to change "<abc foo='bar'>" to "<xyz foo='bar'>"? Your regex won't. Do you want to change "<img alt='<abc>'>"? The regex will. Instead, you might want to do something like this:
use HTML::TreeBuilder;
my $tree = HTML::TreeBuilder->new_from_content("<abc>asdf</abc>");
for my $tag (qw<abc def ghi>) {
    for my $elem ($tree->look_down(_tag => $tag)) {
        $elem->tag('xyz');
    }
}
print $tree->as_HTML;
That keeps you from having to do the fiddly bits of parsing HTML yourself.
Add
no warnings 'uninitialized';
or
s!<(/)?(abc|def|ghi)>! join '', '<', ${1}||'', 'xyz>' !ge;