Custom vallidator to ban a specific wordlist - regex

I need a custom validator to ban a specific list of banned words from a textarea field.
I need exactly this type of implementation, I know that it's not logically correct to let the user type part of a query but it's exactly what I need.
I tried with a regExp but it has a strange behaviour.
My RegExp
/(drop|update|truncate|delete|;|alter|insert)+./gi
my Validator
export function forbiddenWordsValidator(sqlRe: RegExp): ValidatorFn {
return (control: AbstractControl): { [key: string]: any } | null => {
const forbidden = sqlRe.test(control.value);
return forbidden ? { forbiddenSql: { value: control.value } } : null;
};
}
my formControl:
whereCondition: new FormControl("", [
Validators.required,
forbiddenWordsValidator(this.BAN_SQL_KEYWORDS)...
It works only in certain cases and I don't understand why does the same string works one time and doesn't work if i delete a char and rewrite it or sometimes if i type a whitespace the validator returns ok.

There are several issues here:
The global g modifier leads to unexpected alternated results when used in RegExp#test and similar methods that move the regex index after a valid match, it must be removed
. at the end requires any 1 char other than line break char, hence it must be removed.
Use
/drop|update|truncate|delete|;|alter|insert/i
Or, to match the words as whole words use
/\b(?:drop|update|truncate|delete|alter|insert)\b|;/i
This way, insert in insertion and drop in dropout won't get "caught" (=matched).
See the regex demo.

it's not a great idea to give such power to the user

Related

Regex for finding the name of a method containing a string

I've got a Node module file containing about 100 exported methods, which looks something like this:
exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};
Goal: What I'd like to do is figure out how to grab the name of any method which contains a call to fooMethod, and return the correct method names: methodTwo and methodThree. I wrote a regex which gets kinda close:
exports\.(\w+).*(\n.*?){1,}fooMethod
Problem: using my example code from above, though, it would effectively match methodOne and methodThree because it finds the first instance of export and then the first instance of fooMethod and goes on from there. Here's a regex101 example.
I suspect I could make use of lookaheads or lookbehinds, but I have little experience with those parts of regex, so any guidance would be much appreciated!
Edit: Turns out regex is poorly-suited for this type of task. #ctcherry advised using a parser, and using that as a springboard, I was able to learn about Abstract Syntax Trees (ASTs) and the recast tool which lets you traverse the tree after using various tools (acorn and others) to parse your code into tree form.
With these tools in hand, I successfully built a script to parse and traverse my node app's files, and was able to find all methods containing fooMethod as intended.
Regex isn't the best tool to tackle all the parts of this problem, ideally we could rely on something higher level, a parser.
One way to do this is to let the javascript parse itself during load and execution. If your node module doesn't include anything that would execute on its own (or at least anything that would conflict with the below), you can put this at the bottom of your module, and then run the module with node mod.js.
console.log(Object.keys(exports).filter(fn => exports[fn].toString().includes("fooMethod(")));
(In the comments below it is revealed that the above isn't possible.)
Another option would be to use a library like https://github.com/acornjs/acorn (there are other options) to write some other javascript that parses your original target javascript, then you would have a tree structure you could use to perform your matching and eventually return the function names you are after. I'm not an expert in that library so unfortunately I don't have sample code for you.
This regex matches (only) the method names that contain a call to fooMethod();
(?<=exports\.)\w+(?=[^{]+\{[^}]+fooMethod\(\)[^}]+};)
See live demo.
Assuming that all methods have their body enclosed within { and }, I would make an approach to get to the final regex like this:
First, find a regex to get the individual methods. This can be done using this regex:
exports\.(\w+)(\s|.)*?\{(\s|.)*?\}
Next, we are interested in those methods that have fooMethod in them before they close. So, look for } or fooMethod.*}, in that order. So, let us name the group searching for fooMethod as FOO and the name of the method calling it as METH. When we iterate the matches, if group FOO is present in a match, we will use the corresponding METH group, else we will reject it.
exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})
Explanation:
exports\.(?<METH>\w+): Till the method name (you have already covered this)
(\s|.)*?\{(\s|.)*?: Some code before { and after, non-greedy so that the subsequent group is given preference
(\}|(?<FOO>fooMethod)(\s|.)*?\}): This has 2 parts:
\}: Match the method close delimiter, OR
(?<FOO>fooMethod)(\s|.)*?\}): The call to fooMethod followed by optional code and method close delimiter.
Here's a JavaScript code that demostrates this:
let p = /exports\.(?<METH>\w+)(\s|.)*?\{(\s|.)*?(\}|(?<FOO>fooMethod)(\s|.)*?\})/g
let input = `exports.methodOne = async user_id => {
// other method contents
};
exports.methodTwo = async user_id => {
// other method contents
fooMethod();
};
exports.methodThree = async user_id => {
// other method contents
fooMethod();
};';`
let match = p.exec( input );
while( match !== null) {
if( match.groups.FOO !== undefined ) console.log( match.groups.METH );
match = p.exec( input )
}

One-liner to extract domain from email address

How to optionally extract domain from local-part#domain? My attempt is
Try(email.split("#")(1)).toOption
but seems there should be a way without depending on exception handling. Ideally, I am after one-liner.
Not one liner, and only works on 2.13. But this seems very clear to me.
def extractDomain(email: String): Option[String] = email match {
case s"${_}#${domain}" => Some(domain)
case _ => None
}
(Note, if there are more than one # sign, this will just split on the first one).
email.dropWhile(_ != '#').drop(1)
email.split("#").lastOption
These are equivalent ONLY if what's passed is an email address.
If the string passed doesn't include # then lastOption will still return a Some() of the entire string, whereas your solution will return a None.
So if you can trust your input then this answer provides a cleaner approach.
You can use Some(email.split("#")(1)), this will split the String and then wrap in Some, which is instance of Option.
Let me cheat a little: I will prepare separate file Email.scala with extractor:
object Email{
def unapply(mail: String): Option[(String, String)] = {
mail match {
case s"$user#$domain" => Some(user, domain)
case _ => None
}
}
}
and then it can be used with pattern matching:
val Email(_, domain) = "test#domain.com"
Not a one-liner, but I always match on array extractors when I do String.split (pre-2.13), I think it's short enough and reads much better than getting parts by index.
email.split("#", 2) match {
case Array(_, domainPart # _*) => domainPart.headOption
}
limit = 2 makes sure that domainPart has at most 1 element.
Note you don't need a catch-all in this case, since split will always return at least one value in the array (although makes sense to cover it with tests to protect against future changes).

Regular expression logic

I'm developing a web application using Angular 6. I have a question:
I'm creating a custom input component (for text input) such as:
#Component({
selector: 'input-text',
templateUrl: './input-text.component.html'
]
})
export class InputTextComponent {
#Input() pattern?: string;
}
I would like a user can insert a regular expression for the validation of the input field, in this way:
<input-text pattern="^[a-z0-9_-]{8,15}$"></input-text>
The template of my component is defined like this:
<input type="text" [attr.pattern]="pattern"/>
Unfortunately I know absolutely nothing about regular expressions.
I would like to do two things:
1 - Create a method that checks the validity of the regular expression and changes the visual style.
2 - Make sure that if the input (with a pattern field) is inserted into a form, the attribute form.valid remains false until the expression is valid.
Thanks for your help!
Check regex validity
You can simply catch exceptions thrown by the RegExp constructor when instanciating it.
try {
const regex = new RegExp(pattern);
} catch (error) {
// If it goes here, then the regex model is not correct
console.error(error.message)
}
Change the visual style
You can simply use the ngClass attribute to change your input style.
If you enter the catch statement, set a style variable to change the class like so
private hasBadInput: boolean;
// [...]
catch (error) {
hasBadInput= true;
}
Then apply a specific class in that case:
<input-text [ngClass]="{'yourErrorClass': hasBadInput}"><input-text>
Form validity
You did well using [attr.pattern], the form should automatically consider the entered pattern. You should try your form with a hard written regex before, and then use the input one.
Follow this official guideline to create Angular 2+ forms.

URL validation in typescript

I want to make a custom validator that should check the input Url is valid or not.
I want to use the following regex that I tested in expresso, but comes off invalid when used in typescript (the compiler fails to parse it):
(((ht|f)tp(s?))\://)?((([a-zA-Z0-9_\-]{2,}\.)+[a-zA-Z]{2,})|((?:(?:25[0-5]|2[0-4]\d|[01]\d\d|\d?\d)(?(\.?\d)\.)){4}))(:[a-zA-Z0-9]+)?(/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?
The above url checks for optional http:\\\ and also will validate an Ip address
The following url's should be valid :
192.1.1.1
http://abcd.xyz.in
https://192.1.1.126
abcd.jhjhj.lo
The following url's should be invalid:
192.1
http://hjdhfjfh
168.18.5
Kindly assist
The forward slashes / are not escaped in the regex.
What is valid or invalid in Javascript is valid or invalid in Typescript and vice-versa.
There may be another option for you, that relies on the URL class. The idea is to try converting the string into a URL object. If that fails, the string does not contain a valid URL.
public isAValidUrl(value: string): boolean {
try {
const url = new URL(value);
return isValid(url.pathname);
} catch (TypeError) {
return false;
}
}
isValid(value: URL): boolean {
// you may do further tests here, e.g. by checking url.pathname
// for certain patterns
}
Alternatively to returning a boolean you may return the created URL or null instead of a boolean or - if that exists in JavaScript or TypeScript: something like an Optional<URL>. You should adapt the method's name then, of course.

How to find and replace SkippedTokensTrivia using Roslyn

I'm trying to fix the following VBA statement (converting some old code just for fun and to learn Roslyn, not at all looking for anything perfect) to remove the Set keyword so it's a valid VB.NET statement:
Set f = New Foo()
When I look at it through the Syntax Visualizer, I see it turns into trailing trivia.
I'm trying to figure out how to find it using a query. I tried several approaches but all of the following came up empty:
var attempt1 = root.DescendantTokens().Where(t=>t.IsKind(SyntaxKind.SkippedTokensTrivia));
var attempt2 = root.DescendantTokens().Where(t => t.IsKind(SyntaxKind.SetKeyword));
var attempt3 = root.DescendantTrivia().Where(t => t.IsKind(SyntaxKind.SetKeyword));
var attempt4 = root.DescendantNodes()
.OfType<EmptyStatementSyntax>()
.Where(e => e.DescendantTokens().Any(t => t.IsKeyword()));
(Yes, I'm using C# to work with a VisualBasicSyntaxTree)
I can't seem to just find the SetKeyword token that appears in the visualizer, so I thought maybe it's doing some more heavy lifting to piece together what it really is (is that what's meant by structured trivia?). I read something in the documentation that mentioned the compiler can choose to represent it a couple of different ways, so I thought that may be what's going on here.
The query was just the first thing I tried, but in reality I have a SyntaxRewriter I'm using to visit the code to find and fix all such problems (I'm already able to fix missing parentheses around ArgumentLists, for example) but in this case I can't seem to figure out which Visit method to override.
So again, 1) how to query for these from the root and 2) the best override to select from a rewriter. I've been beating my face on the keyboard for two days on this which exponentially increases the likelihood that I'm having a cranio/recto-insertion moment and I need one of you kind souls to pull me out of it.
Cheers!
Brian
Edit: Fixed typo in query attempt1
So it appears that when the compiler reaches an error condition, it will skip all tokens up to the next point where it can recover and continue parsing (the end of the line in this case). The node representing this error condition is an EmptyStatement with trailing syntax trivia containing the rest of the text as parsed tokens.
So if you're going to rewrite a node, you'll want to rewrite EmptyStatements. But you don't want to write just any empty statement, just the ones with the "BC30807" diagnostic code.
public override SyntaxNode VisitEmptyStatement(EmptyStatementSyntax node)
{
var diagnostic = GetLetSetDiagnostic(node);
if (diagnostic == null)
return base.VisitEmptyStatement(node);
return RewriteLetSetStatement(node);
}
private Diagnostic GetLetSetDiagnostic(EmptyStatementSyntax node)
{
//'Let' and 'Set' assignment statements are no longer supported.
const string code = "BC30807";
return node.GetDiagnostics().SingleOrDefault(n => n.Id == code);
}
The implementation of the RewriteLetSetStatement() method is a bit of a mystery to me, I'm not sure how it can be implemented utilizing the compiler services effectively, I don't think that this is a use case that it covers well. The trivia retains the parsed tokens, but there's not much you can do with those tokens AFAIK.
Ideally, we'd just want to ignore the Set token from the tokens and throw it back into the parser to be reparsed. And as far as I can tell, that's not possible, we can only parse from text.
So, I guess the next best thing to do would be to take the text, rewrite it to remove the Set and parse the text again.
private SyntaxNode RewriteLetSetStatement(EmptyStatementSyntax node)
{
var letSetTokens = node.GetTrailingTrivia()
.Where(triv => triv.IsKind(SyntaxKind.SkippedTokensTrivia))
.SelectMany(triv => triv.GetStructure().ChildTokens())
.TakeWhile(tok => new[] {SyntaxKind.LetKeyword, SyntaxKind.SetKeyword}
.Contains(tok.VisualBasicKind()));
var span = new RelativeTextSpan(node.FullSpan);
var newText = node.GetText().WithChanges(
// replacement spans must be relative to the text
letSetTokens.Select(tok => new TextChange(span.GetSpan(tok.Span), ""))
);
return SyntaxFactory.ParseExecutableStatement(newText.ToString());
}
private class RelativeTextSpan(private TextSpan span)
{
public TextSpan GetSpan(TextSpan token)
{
return new TextSpan(token.Start - span.Start, token.Length);
}
}