Validating HTML form fields using regex in Perl

I have a couple of quick questions about using regex to validate some fields in a form, but I seem to be having some problems.
Here is the code:
$userNameReg = "[a-zA-Z0-9_]+";
$passwordReg = "([a-zA-Z]*)([A-Z]+)([0-9]+)";
$emailReg = "[a-zA-Z0-9_]@[a-zA-Z]\.[a-zA-Z]{2,3}";
if ($onLoad !=1)
{
@controlValue = ($userName, $password, $phoneNumber, $email);
@regex = ($userNameReg, $passwordReg, "phoneNumber", $emailReg);
@validated;
for ($i=0; $i<4; $i++)
{
$retVal= validatecontrols ($controlValue[$i], $regex[$i]);
if ($retVal)
{
$count++;
}
if (!$retVal)
{
$validated[$i]="*"
}
}
sub validatecontrols
{
$ctrlVal = shift();
$regexVal = shift();
if ($ctrlVal =~ /$regexVal/)
{
return 1;
}
if ($ctrlVal !~ /$regexVal/)
{
return 0;
}
}
}
So what happens is that it still validates input containing special characters, and I can't understand why. It does flag a value that is a single special character, but if the special character is part of a word (in the middle, at the beginning, or at the end), the value validates.
Also, please disregard the phone number part; I haven't gotten to it yet. I still have to create a regex that validates the phone number: digits only, first digit greater than 2.
Thank you all in advance for your help and insight.
Cheers

My guess is that you're missing the start and end anchors: [a-zA-Z0-9_]+ should be ^[a-zA-Z0-9_]+$. Without anchors, the match succeeds as soon as any part of the string fits the pattern; with them, the pattern only matches the full string.
I also strongly recommend enabling use strict; and use warnings;. They can save you from a lot of typos. Just add the following to the beginning of the script:
use strict;
use warnings;
use strict; makes perl reject undeclared variables. In most cases you'll need to add my to the first use of each variable (for example my $ctrlVal).
In validatecontrols you don't need the second if statement. You can just return false like this:
sub validatecontrols
{
my $ctrlVal = shift();
my $regexVal = shift();
if ($ctrlVal =~ /$regexVal/)
{
return 1;
}
return 0;
}
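To make the effect of the anchors concrete, here is a minimal sketch in Python rather than Perl (the regex behaviour shown is the same in both). The helper name is_valid and the sample inputs are assumptions made up for illustration; only the character class comes from the question.

```python
import re

# Same character class as $userNameReg in the question.
USERNAME_RE = re.compile(r"[a-zA-Z0-9_]+")


def is_valid(value, pattern):
    """Return True only if the whole value matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(value) is not None


# Unanchored search succeeds because the substring "bad" alone fits the pattern.
unanchored = USERNAME_RE.search("bad!name") is not None

# Whole-string matching rejects the same input because of the "!".
anchored = is_valid("bad!name", USERNAME_RE)
```

One related detail: in Perl, $ also matches just before a trailing newline, so \A and \z are the stricter choice if the input might end with one.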

Related

how to exclude a string if it's in a URL using regex?

I'm replacing a number of different strings, but I only want them replaced in normal text, not rewritten when they appear as part of a link in a document. The regex to find the strings is very straightforward: /word|anotherword|athirdword/gi, but that means that if there's a link that contains anotherword, it gets found and replaced as well, breaking the link.
I think I just need a part in my regex that says "but just ignore anything that starts with http or https", but I'm not sure how to write that.
Thanks so much!
Edit: here's what I'm doing with the JavaScript:
if (node.nodeType === 3) {
var text = node.nodeValue;
var replacedText = text.replace(/word|anotherword|athirdword/gi, 'replaced text');
if (replacedText !== text) {
element.replaceChild(document.createTextNode(replacedText), node);
}
}
The result replaces those three strings anywhere on a page, which is great, except it changes http://www.foo.com/the-whole-world into http://www.foo.com/the-whole-replaced text, which obviously breaks the link.
I would normally try negative lookbehind, but lookbehind support differs greatly between regex flavors, so a pattern that relies on it is not portable.
For JavaScript, you can instead match an optional URL prefix and leave the match untouched when the prefix is present:
str.replace(/(https?:[\/.\-a-z0-9]+)?(word|anotherword|athirdword)/gi, function($0, $1){
// Keep the match unchanged when it is part of a URL; otherwise replace it.
return $1 ? $0 : 'replaced text';
});
You can split the string first, then do a conditional replace:
function condReplace(str) {
var sentences = [];
var res = str.split(/(https?:\/\/[^\s]+)(?:\s+|$)/i);
res.forEach(function(entry) {
if (entry) {
if (entry.match(/^https?:\/\//i)) {
sentences.push(entry);
} else {
sentences.push(entry.replace(/word|anotherword|athirdword/gi, "REPLACED"));
}
}
});
document.write(sentences.join(" "));
}
var str = "http://sometext.com/word.doc and This is a word normal text anotherword containing a anotherword another link http://www.foo.com/the-whole-word. This is a single word.";
condReplace(str);
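Both answers rely on the same idea: let the regex consume a whole URL first, and only replace a word when no URL was matched. A minimal sketch of that idea in Python, assuming a URL is any run of non-whitespace characters after http:// or https:// (the function name replace_outside_urls is made up for illustration):

```python
import re

# Alternative 1 captures a whole URL; alternative 2 captures a target word.
PATTERN = re.compile(r"(https?://\S+)|(word|anotherword|athirdword)", re.IGNORECASE)


def replace_outside_urls(text, replacement="replaced text"):
    """Replace the target words, but return URLs unchanged."""
    def repl(match):
        # Group 1 is set only when the match is a URL.
        return match.group(1) if match.group(1) else replacement
    return PATTERN.sub(repl, text)


result = replace_outside_urls("a word at http://www.foo.com/the-whole-word ok")
```

Because the regex engine scans left to right, a URL is swallowed as one match before any word inside it can be considered on its own.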

Writing a bubble sort using Perl regular expressions

I'm beginning to learn perl and I'm writing a simple bubble sort using regular expressions. However, I can't get it to sort properly (alphabetically, delimiting by whitespace). It just ends up returning the same string. Can someone help? I'm sure it's something really simple. Thanks:
#!/usr/bin/perl
use warnings;
use strict;
my $document=<<EOF;
This is the beginning of my text...#more text here;
EOF
my $continue = 1;
my $swaps = 0;
my $currentWordNumber = 0;
while($continue)
{
$document =~ m#^(\w+\s+){$currentWordNumber}#g;
if($document =~ m#\G(\w+)(\s+)(\w+)#)
{
if($3 lt $1)
{
$document =~ s#\G(\w+)(\s+)(\w+)#$3$2$1#;
$swaps++;
}
else
{
pos($document) = 0;
}
$currentWordNumber++;
}
else
{
$continue = 0 if ($swaps == 0);
$swaps = 0;
$currentWordNumber = 0;
}
}
print $document;
SOLVED: I figured out the problem. I wasn't taking into account punctuation after a word.
If you just want to sort all the words, you don't have to use regular expressions... Simply splitting up the text by newlines and white spaces should be much faster:
sub bsort {
my @x = @_;
for my $i (0..$#x) {
for my $j (0..$i) {
@x[$i, $j] = @x[$j, $i] if $x[$i] lt $x[$j];
}
}
return @x;
}
print join (" ", bsort(split(/\s+/, $document)));
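For comparison, here is a minimal sketch in Python of a classic bubble sort over the split words, using a swapped flag in the same role as $swaps in the question. The function name bubble_sort is an assumption for illustration, and punctuation is left attached to the words.

```python
def bubble_sort(words):
    """Return a sorted copy of words using adjacent swaps."""
    items = list(words)
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(items) - 1):
            # Swap neighbours that are out of order.
            if items[i + 1] < items[i]:
                items[i], items[i + 1] = items[i + 1], items[i]
                swapped = True
    return items


sorted_words = bubble_sort("This is the beginning of my text".split())
```

The loop stops after the first full pass that makes no swaps, which is the same termination condition the question's while loop is aiming for.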

perl script to read content between marks

In Perl, how can I read the contents between two markers? The source data looks like this:
START_HEAD
ddd
END_HEAD
START_DATA
eee|234|ebf
qqq| |ff
END_DATA
--Generate at 2011:23:34
I only want to get the data between "START_DATA" and "END_DATA". How can I do this?
sub readFile(){
open(FILE, "<datasource.txt") or die "file is not found";
while(<FILE>){
if(/START_DATA/){
record(\*FILE);#start record;
}
}
}
sub record($){
my $fileHandle = $_[0];
while(<fileHandle>){
print $_."\n";
if(/END_DATA/) return ;
}
}
I wrote this code, but it doesn't work. Do you know why?
Thanks
You can use the range operator:
perl -ne 'print if /START_DATA/ .. /END_DATA/'
The output will include the *_DATA lines, too, but it should not be so hard to get rid of them.
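The range operator is essentially a flag that switches on at the start marker and off at the end marker. A minimal sketch of that logic in Python, written so that the marker lines themselves are dropped; the function name lines_between and its default marker arguments are assumptions for illustration:

```python
def lines_between(lines, start="START_DATA", end="END_DATA"):
    """Yield the lines strictly between the start and end markers."""
    inside = False
    for line in lines:
        if not inside:
            # Switch on at the start marker, but do not emit the marker itself.
            if start in line:
                inside = True
            continue
        if end in line:
            inside = False
            continue
        yield line


SAMPLE = """START_HEAD
ddd
END_HEAD
START_DATA
eee|234|ebf
qqq| |ff
END_DATA
--Generate at 2011:23:34"""

data = list(lines_between(SAMPLE.splitlines()))
```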
Besides a few typos, your code is not too far off. Had you used
use strict;
use warnings;
you might have figured it out yourself. Here's what I found:
Don't use prototypes unless you need them and know what they do. A sub declaration with a prototype looks like sub my_function (prototype) {, but you can leave out the prototype and just write sub my_function {.
while (<fileHandle>) { is missing the $ sign that marks fileHandle as a scalar variable; without it, perl reads from a bareword filehandle named fileHandle. It should be <$fileHandle>.
print $_."\n"; will add an extra newline, because each line read already ends with one. Just print; will do what you expect.
if(/END_DATA/) return; is a syntax error. The braces around the block are not optional in Perl, unless you use the statement-modifier form.
Use either:
return if (/END_DATA/);
or
if (/END_DATA/) { return }
Below is the cleaned up version. I commented out your open() while testing, so this would be a functional code example.
use strict;
use warnings;
readFile();
sub readFile {
#open(FILE, "<datasource.txt") or die "file is not found";
while(<DATA>) {
if(/START_DATA/) {
recordx(\*DATA); #start record;
}
}
}
sub recordx {
my $fileHandle = $_[0];
while(<$fileHandle>) {
print;
if (/END_DATA/) { return }
}
}
__DATA__
START_HEAD
ddd
END_HEAD
START_DATA
eee|234|ebf
qqq| |ff
END_DATA
--Generate at 2011:23:34
This is also pretty simple to do with a single regular expression and the /s flag, which lets . match newlines (/m, by contrast, only changes where ^ and $ match). With the whole file read into one string, you can do /start_data(.+?)end_data/is.

Why isn't my Perl script finding bad indentation with my regex match?

My work's coding standard uses this bracket indentation:
some declaration
{
stuff = other stuff;
};
control structure, function, etc()
{
more stuff;
for(some amount of time)
{
do something;
}
more and more stuff;
}
I'm writing a perl script to detect incorrect indentation. Here's what I have in the body of a while(<some-file-handle>):
# $prev holds the previous line in the file
# $current holds the current line in the file
if($prev =~ /^(\t*)[^;]+$/ and $current =~ /^(?<=!$1\t)[\{\}].+$/) {
print "$file @ line ${.}: Bracket indentation incorrect\n";
}
Here, I'm trying to match:
$prev: A line not ended with a semi-colon, followed by...
$current: A line not having the number of leading tabs+1 of the previous line.
This doesn't seem to match anything, at the moment.
The $prev pattern needs some modification.
It should be something like \t*, then .+, then not ending in a semicolon.
Also, the $current pattern should match:
anything ending in ; or { or } that does not have the previous line's number of leading tabs plus one.
EDIT
Here is Perl code to try the $prev pattern:
#!/usr/bin/perl -l
open(FP,"example.cpp");
while(<FP>)
{
if($_ =~ /^(\t*)[^;]+$/) {
print "got the line: $_";
}
}
close(FP);
//example.cpp
for(int i = 0;i<10;i++)
{
//not this;
//but this
}
//output
got the line: {
got the line: //but this
got the line: }
It did not detect the line with the for loop.
Am I missing something?
I see a couple of problems:
Your $prev regex only matches lines that have no ; anywhere, so it breaks on lines like for (int x = 1; x < 10; x++).
If the indent of the opening { is incorrect, you will not detect that.
Try this instead; it only cares whether the line ends in ; or { (optionally followed by whitespace).
/^(\s*).*[^{;]\s*$/
Now change your strategy: if you see a line which does not end in { or ;, increment the indent counter.
If you see a line which ends in }; or }, decrement the indent counter.
Compare all lines against this:
/^\t{$counter}[^\s]/
so...
$counter = 0;
if (!($curr =~ /^\t{$counter}[^\s]/)) {
# error detected
}
# A line ending in } or }; closes a block.
if ($curr =~ /\};?\s*$/) {
$counter--;
} elsif ($curr =~ /^(\s*).*[^{;]\s*$/) {
$counter++;
}
sorry for not styling my code according to your standards... :)
And you intend to only count tabs (not spaces) for indentation?
Writing this kind of checker is complicated. Just think about all the possible constructs that use braces but should not change indentation:
s{some}{thing}g
qw{ a b c }
grep { defined } @a
print "This is just a { provided to confuse";
print <<END;
This {
$is = not $code
}
END
But anyway, if the issues above aren't important to you, consider whether the semicolon matters at all in your regex. After all, writing
while($ok)
{
sort { some_op($_) }
grep { check($_) }
my_func(
map { $_->[0] } @list
);
}
should be possible.
Have you considered looking at Perltidy?
Perltidy is a Perl script that reformats Perl code to a set of standards. Granted, what you have isn't part of the Perl standard, but you can probably tweak the curly braces via the configuration file Perltidy uses. If all else fails, you can hack through the code. After all, Perltidy is just a Perl script.
I haven't really used it, but it might be worth looking into. Your problem is trying to locate all the various edge cases and making sure you're handling them correctly. You can parse 100 programs only to find that the 101st reveals problems in your formatter. Perltidy has been used by thousands of people on millions of lines of code. If there is an issue, it has probably already been found.

Want to Encode text during Regex.Replace call

I have a regex call that I need help with.
I haven't posted my regex, because it is not relevant here.
What I want to do is, during the Replace, also modify the ${text} portion by calling Html.Encode on the entire text that the regex matches.
Basically, wrap the entire matched text in the bold tag, but also Html.Encode the text in between the bold tags.
RegexOptions regexOptions = RegexOptions.Compiled | RegexOptions.IgnoreCase;
text = Regex.Replace(text, regexBold, @"<b>${text}</b>", regexOptions);
There is an incredibly easy way of doing this in .NET. It's called a MatchEvaluator, and it lets you do all sorts of cool find-and-replace operations. Essentially, you pass Regex.Replace the name of a method that takes a Match object as its only parameter and returns a string. Do whatever makes sense for your particular match (HTML-encode it), and the string you return will replace the entire text of the match in the input string.
Example: Let's say you want to find all the places where two numbers are being added (in text) and replace the expression with the actual result. You can't do that with a plain replacement pattern, but with a MatchEvaluator it becomes easy.
public void Stuff()
{
string pattern = @"(?<firstNumber>\d+)\s*(?<operator>[*+/-])\s*(?<secondNumber>\d+)";
string input = "something something 123 + 456 blah blah 100 - 55";
string output = Regex.Replace(input, pattern, MatchMath);
//output will be "something something 579 blah blah 45"
}
private static string MatchMath(Match match)
{
try
{
double first = double.Parse(match.Groups["firstNumber"].Value);
double second = double.Parse(match.Groups["secondNumber"].Value);
switch (match.Groups["operator"].Value)
{
case "*":
return (first * second).ToString();
case "+":
return (first + second).ToString();
case "-":
return (first - second).ToString();
case "/":
return (first / second).ToString();
}
}
catch { }
return "NaN";
}
Find out more at http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.matchevaluator.aspx
Don't use Regex.Replace in this case; iterate over the matches instead:
foreach (Match match in Regex.Matches(...))
{
//do your stuff here
}
Here's an implementation of this that I've used to pick out special replace strings from content and localize them.
protected string FindAndTranslateIn(string content)
{
return Regex.Replace(content, @"\{\^(.+?);(.+?)?}", new MatchEvaluator(TranslateHandler), RegexOptions.IgnoreCase);
}
public string TranslateHandler(Match m)
{
if (m.Success)
{
string key = m.Groups[1].Value;
key = FindAndTranslateIn(key);
string def = string.Empty;
if (m.Groups.Count > 2)
{
def = m.Groups[2].Value;
if(def.Length > 1)
{
def = FindAndTranslateIn(def);
}
}
if (group == null)
{
return Translate(key, def);
}
else
{
return Translate(key, group, def);
}
}
return string.Empty;
}
From the match evaluator delegate you return whatever should replace the match, so where I have returns you would have bold tags and an encode call. Mine also supports recursion, so it is a little overcomplicated for your needs, but you can pare the example down.
This is equivalent to iterating over the collection of matches and doing part of the Replace method's job yourself. It just saves you some code, and you get to use a fancy shmancy delegate.
If you do a Regex.Match, the resulting Match object's group at index 0 is the portion of the input that matched the regex.
You can use this to stitch in the bold tags and encode the text there.
Can you fill in the code inside {} to add the bold tag, and encode the text?
I'm confused as to how to apply the changes to the entire text block AND replace the section in the text variable at the end.
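The callback approach handles both concerns at once: the replace call rebuilds the whole string, and the callback decides what each match becomes. Here is a minimal sketch of the same idea in Python rather than C#, where re.sub with a function plays the role of Regex.Replace with a MatchEvaluator and html.escape stands in for Html.Encode. Since the question's regex was not posted, the *...* pattern below is purely a made-up placeholder, as is the function name bold_and_encode; the only thing taken from the question is the named group text.

```python
import html
import re

# Hypothetical pattern: text between asterisks, captured in a group named "text".
BOLD_RE = re.compile(r"\*(?P<text>.+?)\*", re.IGNORECASE)


def bold_and_encode(text, pattern=BOLD_RE):
    """Wrap each match in <b> tags and HTML-encode the captured text."""
    def evaluator(match):
        # The returned string replaces the entire match in the input.
        return "<b>" + html.escape(match.group("text")) + "</b>"
    return pattern.sub(evaluator, text)


encoded = bold_and_encode("a *x < y* b")
```

Text outside the matches is copied through untouched, so nothing needs to be stitched back together by hand.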