EDIT: Please feel free to add other validations to this simple directive that would be useful to others.
--
I'm trying to create an Angular directive that limits the characters that can be typed into a text box. I've been successful with a few common use cases (alphabetical, alphanumeric and numeric), but with the popular regular expressions for validating email addresses, dates and currency I can't get the directive to work, since I need to negate the regex. At least that's what I think it needs to do.
Any assistance for currency (optional thousand separator and cents), date (mm/dd/yyyy) and email is greatly appreciated. I'm not strong with regular expressions at all.
Here's what I have currently:
http://jsfiddle.net/corydorning/bs05ys69/
HTML
<div ng-app="example">
<h1>Validate Directive</h1>
<p>The Validate directive allows us to restrict the characters an input can accept.</p>
<h3><code>alphabetical</code> <span style="color: green">(works)</span></h3>
<p>Restricts input to alphabetical (A-Z, a-z) characters only.</p>
<label><input type="text" validate="alphabetical" ng-model="validate.alphabetical"/></label>
<h3><code>alphanumeric</code> <span style="color: green">(works)</span></h3>
<p>Restricts input to alphanumeric (A-Z, a-z, 0-9) characters only.</p>
<label><input type="text" validate="alphanumeric" ng-model="validate.alphanumeric" /></label>
<h3><code>currency</code> <span style="color: red">(doesn't work)</span></h3>
<p>Restricts input to US currency characters with comma for thousand separator (optional) and cents (optional).</p>
<label><input type="text" validate="currency.us" ng-model="validate.currency" /></label>
<h3><code>date</code> <span style="color: red">(doesn't work)</span></h3>
<p>Restricts input to the mm/dd/yyyy date format only.</p>
<label><input type="text" validate="date" ng-model="validate.date" /></label>
<h3><code>email</code> <span style="color: red">(doesn't work)</span></h3>
<p>Restricts input to email format only.</p>
<label><input type="text" validate="email" ng-model="validate.email" /></label>
<h3><code>numeric</code> <span style="color: green">(works)</span></h3>
<p>Restricts input to numeric (0-9) characters only.</p>
<label><input type="text" validate="numeric" ng-model="validate.numeric" /></label>
JavaScript
angular.module('example', [])
.directive('validate', function () {
var validations = {
// works
alphabetical: /[^a-zA-Z]*$/,
// works
alphanumeric: /[^a-zA-Z0-9]*$/,
// doesn't work - need to negate?
// taken from: http://stackoverflow.com/questions/354044/what-is-the-best-u-s-currency-regex
currency: /^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?$/,
// doesn't work - need to negate?
// taken from here: http://stackoverflow.com/questions/15196451/regular-expression-to-validate-datetime-format-mm-dd-yyyy
date: /(?:0[1-9]|1[0-2])\/(?:0[1-9]|[12][0-9]|3[01])\/(?:19|20)[0-9]{2}/,
// doesn't work - need to negate?
// taken from: http://stackoverflow.com/questions/46155/validate-email-address-in-javascript
email: /^([\w-]+(?:\.[\w-]+)*)@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$/i,
// works
numeric: /[^0-9]*$/
};
return {
require: 'ngModel',
scope: {
validate: '@'
},
link: function (scope, element, attrs, modelCtrl) {
var pattern = validations[scope.validate] || scope.validate
;
modelCtrl.$parsers.push(function (inputValue) {
var transformedInput = inputValue.replace(pattern, '')
;
if (transformedInput != inputValue) {
modelCtrl.$setViewValue(transformedInput);
modelCtrl.$render();
}
return transformedInput;
});
}
};
});
I am pretty sure there is a better way, and regex is probably not the best tool for this, but here is my suggestion.
This way you can only restrict which characters are allowed as input and force the user to follow the proper format. You will also need to validate the final input after the user finishes typing, but that is another story.
The alphabetic, numeric and alphanumeric cases are quite simple, both for filtering the input and for validating it, because it is clear what you can type and what a proper final input is. With dates, emails and currency, however, you cannot validate each keystroke with a regex for a fully valid value: the user has to type it in first, and in the meantime the input is necessarily invalid in terms of the final format. So one thing is to restrict the user to typing only digits and / for a date, like 12/12/1988; but in the end you still need to check whether they typed a proper date or, for example, just 12/12/126. This needs to be checked when the user submits the answer, when the text field loses focus, etc.
To validate only the characters as they are typed, you can try this:
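As a sketch of that final check (run on blur or submit, not on every keystroke), a plain function could verify that a finished mm/dd/yyyy string is a real calendar date. `isValidUsDate` is a hypothetical helper, not part of the directive; it relies only on the built-in `Date` constructor.

```javascript
// Hypothetical helper: full validation of a finished mm/dd/yyyy value,
// meant to run on blur/submit after the keystroke filter has done its job.
function isValidUsDate(text) {
  const m = /^(\d{2})\/(\d{2})\/(\d{4})$/.exec(text);
  if (!m) return false; // wrong shape, e.g. "12/12/126"

  const month = Number(m[1]);
  const day = Number(m[2]);
  const year = Number(m[3]);

  // new Date() silently rolls impossible dates over (02/30 becomes March 1 or 2),
  // so build the date and check that all three parts survived unchanged.
  const d = new Date(year, month - 1, day);
  return d.getFullYear() === year && d.getMonth() === month - 1 && d.getDate() === day;
}

console.log(isValidUsDate('12/12/1988')); // true
console.log(isValidUsDate('12/12/126'));  // false
```

Years below 0100 are rejected as a side effect, because `Date` maps two-digit years to 19xx.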
JSFiddle DEMO
First change:
var transformedInput = inputValue.replace(pattern, '')
to
var transformedInput = inputValue.replace(pattern, '$1')
Then use these regular expressions:
/^([a-zA-Z]*(?=[^a-zA-Z]))./ - alphabetic
/^([a-zA-Z0-9]*(?=[^a-zA-Z0-9]))./ - alphanumeric
/(\.((?=[^\d])|\d{2}(?![^,\d.]))|,((?=[^\d])|\d{3}(?=[^,.$])|(?=\d{1,2}[^\d]))|\$(?=.)|\d{4,}(?=,)).|[^\d,.$]|^\$/ - currency (allows strings like 343243.34, 1,123,345.34 or .05, with or without $)
/^(((0[1-9]|1[012])|(\d{2}\/\d{2}))(?=[^\/])|((\d)|(\d{2}\/\d{2}\/\d{1,3})|(.+\/))(?=[^\d])|\d{2}\/\d{2}\/\d{4}(?=.)).|^(1[3-9]|[2-9]\d)|((?!^)(3[2-9]|[4-9]\d)\/)|[3-9]\d{3}|2[1-9]\d{2}|(?!^)\/\d\/|^\/|[^\d/]/ - date (00-12/00-31/0000-2099)
/^(\d*(?=[^\d]))./ - numeric
/^([\w.$-]+\@[\w.]+(?=[^\w.])|[\w.$-]+\@(?=[^\w.-])|[\w.@-]+(?=[^\w.$@-])).$|\.(?=[^\w-@]).|[^\w.$@-]|^[^\w]|\.(?=@).|@(?=\.)./i - email
Generally, they use this pattern:
([valid characters or structure] captured in group $1)(?= positive lookahead for not allowed characters) any character
In effect, all valid characters are captured in group $1, and if the user types an invalid character, the whole match is replaced with the already captured valid characters from group $1. This is complemented by parts that exclude some obviously invalid sequences, like @@ in an email or 34...2 in currency.
Once you understand how these regular expressions work, complex as they look, I think they are easy to extend by adding more allowed or disallowed characters.
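To see the `$1` mechanism on its own, the numeric pattern from the list above can be run through `String.prototype.replace` outside of AngularJS. `filterKeystroke` is a hypothetical wrapper name, not part of the answer's demo.

```javascript
// Numeric pattern from the list above: group $1 captures the run of digits
// before the first non-digit, and the trailing "." consumes that non-digit.
const numericPattern = /^(\d*(?=[^\d]))./;

// Hypothetical wrapper around the replace call used in the parser.
function filterKeystroke(inputValue, pattern) {
  // With no invalid character the pattern does not match and the input is
  // returned unchanged; otherwise the match collapses to the valid prefix $1.
  return inputValue.replace(pattern, '$1');
}

console.log(filterKeystroke('123', numericPattern));  // "123"
console.log(filterKeystroke('12a', numericPattern));  // "12"
console.log(filterKeystroke('12a3', numericPattern)); // "123"
```

Only the first invalid character is removed per call. That is enough when the parser runs on every keystroke, but a pasted value such as "1a2b" comes back as "12b".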
Regular expressions for validating currency, dates and emails are easy to find, so I find it redundant to post them here.
Off-topic: the currency part of your demo is not working because of validate="currency.us" instead of validate="currency"; at least it works after this change.
In my opinion it is impossible to create regular expressions that will work for matching things like dates or emails with the
parser you use. This is mainly because you would need non-capturing groups in your
regular expressions (which is possible), which are not replaced by the
inputValue.replace(pattern, '') call you have in your parser function. And this is the
part that is not possible in JavaScript. JavaScript replaces what you put in non-capturing
groups as well.
So... you'll need to go for a different approach. I would suggest to go for positive
regular expressions, which will yield a match when the input is valid.
Then you need of course to change the code of your parser. You could for instance
decide to chop off characters from the end of the input text until what remains passes
the regular expression test. This you could code as follows:
modelCtrl.$parsers.push(function (inputValue) {
var transformedInput = inputValue;
while (transformedInput && !pattern.exec(transformedInput)) {
// validation fails: chop off last character and try again
transformedInput = transformedInput.slice(0, -1);
}
if (transformedInput !== inputValue) {
modelCtrl.$setViewValue(transformedInput);
modelCtrl.$render();
}
return transformedInput;
});
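Outside of AngularJS, the same chop-from-the-end loop can be written as a plain function and tried against any anchored pattern. This is a sketch: `chopUntilValid` and the three-digit pattern are hypothetical examples, not part of the original parser.

```javascript
// Plain-function version of the parser loop: remove characters from the end
// until what remains passes the test, or nothing is left.
// Use a pattern without the /g flag, because test() on a global regex is stateful.
function chopUntilValid(inputValue, pattern) {
  let transformed = inputValue;
  while (transformed && !pattern.test(transformed)) {
    transformed = transformed.slice(0, -1);
  }
  return transformed;
}

// Hypothetical pattern: at most three digits.
const upToThreeDigits = /^\d{0,3}$/;
console.log(chopUntilValid('12', upToThreeDigits));    // "12"
console.log(chopUntilValid('12345', upToThreeDigits)); // "123"
console.log(chopUntilValid('12a', upToThreeDigits));   // "12"
```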
Now life has become a bit easier. Just pay attention that you make your regular
expressions in such a way that they do not reject partial input. So "01/" should be
considered valid for a date, otherwise the user can never get to type in a date. On
the other hand, as soon as it becomes clear that adding characters will no longer
allow for valid input, the regular expression should reject it. So "101" should be
rejected as a date, as you can never add characters at the end to make it a valid date.
Also, all of these regular expressions should check the whole input, so as a consequence
they need to make use of the ^ and $ symbols.
Here is what the regular expression for a (partial) date could look like:
^([0-9]{0,2}|[0-9]{2}[\/]([0-9]{0,2}|[0-9]{2}[\/][0-9]{0,4}))$
This means: an input of 0 to 2 digits is valid, or exactly 2 digits followed by a slash, followed by either:
0 to 2 digits, or
exactly 2 digits followed by a slash, followed by 0 to 4 digits
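A quick way to confirm that such a pattern accepts every prefix of a valid value but rejects dead ends is to run a few partial inputs through it. This sketch uses the partial-date expression above unchanged; the sample strings are arbitrary.

```javascript
// Partial-date pattern from above.
const partialDate = /^([0-9]{0,2}|[0-9]{2}[\/]([0-9]{0,2}|[0-9]{2}[\/][0-9]{0,4}))$/;

// Each of these can still grow into a complete mm/dd/yyyy value.
const prefixes = ['', '0', '01', '01/', '01/3', '01/31', '01/31/', '01/31/20', '01/31/2015'];
// None of these can become valid by typing more characters at the end.
const deadEnds = ['101', '1/', '01/311', '01/31/20155', '01-'];

for (const s of prefixes) console.log(JSON.stringify(s), partialDate.test(s)); // all true
for (const s of deadEnds) console.log(JSON.stringify(s), partialDate.test(s)); // all false
```

The pattern checks only the shape, so "13/45/" is still accepted; range checks belong in the final validation.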
Admittedly, not as smart as the one you had found, but that one would need a lot of editing to allow for partially entered dates. It is possible, but
it represents a very long expression with a lot of brackets and |.
Once you have all the regular expressions set up, you could think to further improve
the parser. One idea would be to not let it chop off characters from the end, but to
let it test all strings with one character removed somewhere compared to the original,
and see which one passes the test. If there is no way found to remove one character and have
success, then remove two consecutive characters in any place of the input value,
then three, ... etc, until you find a value that passes the test or arrive at an empty value.
This will work better for cases where the user inserts characters halfway through their input.
Just an idea...
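The idea in the previous paragraph could be sketched as follows: try deleting runs of 0, 1, 2, ... consecutive characters at every position and keep the first candidate that passes. This is only an illustration; `repairInput` is a hypothetical name, and it reuses the partial-date pattern from above.

```javascript
// Partial-date pattern from above (accepts every prefix of mm/dd/yyyy).
const partialDate = /^([0-9]{0,2}|[0-9]{2}[\/]([0-9]{0,2}|[0-9]{2}[\/][0-9]{0,4}))$/;

// Remove the shortest run of consecutive characters, anywhere in the input,
// that makes the rest pass the pattern. len = 0 tests the input unchanged.
function repairInput(input, pattern) {
  for (let len = 0; len <= input.length; len++) {
    for (let start = 0; start + len <= input.length; start++) {
      const candidate = input.slice(0, start) + input.slice(start + len);
      if (pattern.test(candidate)) return candidate;
    }
  }
  return ''; // nothing passed; fall back to an empty value
}

// The user typed a stray "x" right after the first slash:
console.log(repairInput('12/x31/1999', partialDate)); // "12/31/1999"
```

For comparison, the chop-from-the-end parser would reduce the same input to "12/". The search tries on the order of n² candidates, which is negligible for short form fields.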
import { Directive, ElementRef, EventEmitter, HostListener, Input, Output, Renderer2 } from '@angular/core';
import { ControlValueAccessor, NG_VALUE_ACCESSOR } from '@angular/forms';
import { CurrencyPipe, DecimalPipe } from '@angular/common';
import { ValueChangeEvent } from '@goomTool/goom-elements/events/value-change-event.model';
const noOperation = () => {
};
@Directive({
selector: '[formattedNumber]',
providers: [{
provide: NG_VALUE_ACCESSOR,
useExisting: FormattedNumberDirective,
multi: true
}]
})
export class FormattedNumberDirective implements ControlValueAccessor {
@Input() public configuration;
@Output() public valueChange: EventEmitter<ValueChangeEvent> = new EventEmitter();
public locale: string = process.env.LOCALE;
private el: HTMLInputElement;
// Keeps track of the value without formatting
private innerInputValue: any;
private specialKeys: string[] =
['Backspace', 'Tab', 'End', 'Home', 'Enter', 'Shift', 'ArrowRight', 'ArrowLeft', 'Delete'];
private onTouchedCallback: () => void = noOperation;
private onChangeCallback: (a: any) => void = noOperation;
constructor(private elementRef: ElementRef,
private decimalPipe: DecimalPipe,
private currencyPipe: CurrencyPipe,
private renderer: Renderer2) {
this.el = elementRef.nativeElement;
}
public writeValue(value: any) {
if (value !== this.innerInputValue) {
if (!!value) {
this.renderer.setAttribute(this.elementRef.nativeElement, 'value', this.getFormattedValue(value));
}
this.innerInputValue = value;
}
}
public registerOnChange(fn: any) {
this.onChangeCallback = fn;
}
public registerOnTouched(fn: any) {
this.onTouchedCallback = fn;
}
// On focus, drop the formatting and display the actual value
@HostListener('focus', ['$event.target.value'])
public onfocus(value) {
if (!!this.innerInputValue) {
this.el.value = this.innerInputValue;
}
}
// On blur, format the value with the configured pipe
@HostListener('blur', ['$event.target.value'])
public onBlur(value) {
this.innerInputValue = value;
if (!!value) {
this.el.value = this.getFormattedValue(value);
}
}
/**
* Allows special keys, a leading "0" or "." for values below 1 (e.g. "0." or ".5"),
* and values that match the configured regular expression
*
* @param event
*/
@HostListener('keydown', ['$event'])
public onKeyDown(event) {
// Allow Backspace, Tab, End, Home and the other special keys
if (this.specialKeys.indexOf(event.key) !== -1) {
if (event.key === 'Backspace') {
this.updateValue(this.getBackSpaceValue(this.el.value, event));
}
if (event.key === 'Delete') {
this.updateValue(this.getDeleteValue(this.el.value, event));
}
return;
}
const next: string = this.concatAtIndex(this.el.value, event);
if (this.configuration.angularPipe && this.configuration.angularPipe.length > 0) {
if (!this.el.value.includes('.')
&& (this.configuration.min == null || this.configuration.min < 1)) {
if (next.startsWith('0') || next.startsWith('0.') || next.startsWith('.')) {
if (next.length > 1) {
this.updateValue(next);
}
return;
}
}
}
/* pass your pattern in component regex e.g.
* regex = new RegExp(RegexPattern.WHOLE_NUMBER_PATTERN)
*/
if (next && !String(next).match(this.configuration.regex)) {
event.preventDefault();
return;
}
if (!!this.configuration.minFractionDigits && !!this.configuration.maxFractionDigits) {
if (!!next.split('\.')[1] && next.split('\.')[1].length > this.configuration.minFractionDigits) {
return this.validateFractionDigits(next, event);
}
}
this.innerInputValue = next;
this.updateValue(next);
}
private updateValue(newValue) {
this.onTouchedCallback();
this.onChangeCallback(newValue);
if (newValue) {
this.renderer.setAttribute(this.elementRef.nativeElement, 'value', newValue);
}
}
private validateFractionDigits(next, event) {
// create real-time pattern to validate min & max fraction digits
const regex = `^[-]?\\d+([\\.,]\\d{${this.configuration.minFractionDigits},${this.configuration.maxFractionDigits}})?$`;
if (!String(next).match(regex)) {
event.preventDefault();
return;
}
this.updateValue(next);
}
private concatAtIndex(current: string, event) {
return current.slice(0, event.currentTarget.selectionStart) + event.key +
current.slice(event.currentTarget.selectionEnd);
}
private getBackSpaceValue(current: string, event) {
return current.slice(0, event.currentTarget.selectionStart - 1) +
current.slice(event.currentTarget.selectionEnd);
}
private getDeleteValue(current: string, event) {
return current.slice(0, event.currentTarget.selectionStart) +
current.slice(event.currentTarget.selectionEnd + 1);
}
private transformCurrency(value) {
return this.currencyPipe.transform(value, this.configuration.currencyCode, this.configuration.display,
this.configuration.digitsInfo, this.locale);
}
private transformDecimal(value) {
return this.decimalPipe.transform(value, this.configuration.digitsInfo, this.locale);
}
private transformPercent(value) {
return this.decimalPipe.transform(value, this.configuration.digitsInfo, this.locale) + ' %';
}
private getFormattedValue(value) {
switch (this.configuration.angularPipe) {
case ('decimal'): {
return this.transformDecimal(value);
}
case ('currency'): {
return this.transformCurrency(value);
}
case ('percent'): {
return this.transformPercent(value);
}
default: {
return value;
}
}
}
}
----------------------------------
export const RegexPattern = Object.freeze({
PERCENTAGE_PATTERN: '^([1-9]\\d*(\\.)\\d*|0?(\\.)\\d*[1-9]\\d*|[1-9]\\d*)$', // e.g. '.12% ' or 12%
DECIMAL_PATTERN: '^(([-]+)?([1-9]\\d*(\\.|\\,)\\d*|0?(\\.|\\,)\\d*[1-9]\\d*|[1-9]\\d*))$', // e.g. '123.12'
CURRENCY_PATTERN: '^\\$?[-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\\.[0-9]{2})?$', // e.g. '$123.12'
KEY_PATTERN: '^[a-zA-Z\\-]+-[0-9]+', // e.g. ABC-1234
WHOLE_NUMBER_PATTERN: '^([-]?([1-9][0-9]*)|([0]+)$)$' // e.g. 1234
});
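The heart of the `onKeyDown` handler is: build the value the input would have after the key press (as `concatAtIndex` does) and reject the key if that value fails the configured pattern. A framework-free sketch of just that check, in plain JavaScript and reusing `WHOLE_NUMBER_PATTERN` from `RegexPattern`; `valueAfterKey` and `acceptsKey` are hypothetical names.

```javascript
// WHOLE_NUMBER_PATTERN from RegexPattern, compiled once.
const wholeNumber = new RegExp('^([-]?([1-9][0-9]*)|([0]+)$)$');

// Same idea as concatAtIndex: the typed key replaces the current selection.
function valueAfterKey(current, key, selectionStart, selectionEnd) {
  return current.slice(0, selectionStart) + key + current.slice(selectionEnd);
}

// Mirrors the regex check in onKeyDown: true means the key may be typed,
// false means the directive would call event.preventDefault().
function acceptsKey(current, key, selectionStart, selectionEnd, regex) {
  const next = valueAfterKey(current, key, selectionStart, selectionEnd);
  return !next || regex.test(next);
}

console.log(acceptsKey('12', '3', 2, 2, wholeNumber));  // true  ("123")
console.log(acceptsKey('12', 'a', 2, 2, wholeNumber));  // false ("12a")
console.log(acceptsKey('12', '0', 0, 0, wholeNumber));  // false ("012")
console.log(acceptsKey('123', '9', 0, 3, wholeNumber)); // true  ("9", selection replaced)
```

The directive itself also handles Backspace/Delete and the leading "0"/"." case before reaching this check.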
I am working on a sentiment analysis project, so I used the Stanford POS tagger to tag the sentences. I want to extract noun phrases from the sentences, but the tagger only tags individual nouns.
How do I get noun phrases from that? I code in Java.
I searched online and found this for building a noun phrase:
For noun phrases, this pattern or regular expression is the following:
(Adjective | Noun)* (Noun Preposition)? (Adjective | Noun)* Noun
i.e. zero or more adjectives or nouns, followed by an optional group of a noun and a preposition, followed again by zero or more adjectives or nouns, followed by a single noun.
I was trying to code it using Java's regular expression library (java.util.regex), but couldn't get the desired result.
Does anyone have code for it?
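One way to apply that pattern is to map each tagged word to a single letter and run an ordinary regular expression over the resulting letter string, so that match offsets line up with word positions. The sketch below is plain JavaScript rather than Java, takes already-tagged (word, Penn Treebank tag) pairs as input, and its letter mapping (JJ* to A, NN* to N, IN to P) is an assumption, not something the Stanford tagger provides.

```javascript
// Map a Penn Treebank tag to one letter: A = adjective, N = noun,
// P = preposition, O = anything else.
function tagToLetter(tag) {
  if (tag.startsWith('JJ')) return 'A';
  if (tag.startsWith('NN')) return 'N';
  if (tag === 'IN') return 'P';
  return 'O';
}

// (Adjective | Noun)* (Noun Preposition)? (Adjective | Noun)* Noun
// Every match ends in N, so it is never empty and exec() always advances.
const nounPhrase = /[AN]*(?:NP)?[AN]*N/g;

// tagged: array of [word, tag] pairs.
function extractNounPhrases(tagged) {
  const letters = tagged.map(([, tag]) => tagToLetter(tag)).join('');
  const phrases = [];
  nounPhrase.lastIndex = 0;
  let m;
  while ((m = nounPhrase.exec(letters)) !== null) {
    // One letter per word, so the match offsets index straight into `tagged`.
    const words = tagged.slice(m.index, m.index + m[0].length).map(([word]) => word);
    phrases.push(words.join(' '));
  }
  return phrases;
}

console.log(JSON.stringify(extractNounPhrases([['the', 'DT'], ['white', 'JJ'], ['tiger', 'NN']])));
// ["white tiger"]
```

A determiner such as "the" (DT) falls outside the pattern, which is why it is not part of the extracted phrase.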
I have coded this, and here is my solution.
It extracts all the noun phrases from a sentence, keeping only the nouns.
For example, if the NP is "the white tiger", it will extract "white tiger".
import java.util.List;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.Tree;
public static void maketree(String sent, int sno, Sentences sen)
{
try
{
LexicalizedParser parser = LexicalizedParser.loadModel("stanford-parser-full-2014-01-04\\stanford-parser-3.3.1-models\\edu\\stanford\\nlp\\models\\lexparser\\englishPCFG.ser.gz");
// Tokenize the sentence passed in (naive whitespace split),
// e.g. "Picture Quality of this camera is very good"
String[] words = sent.split(" ");
List<CoreLabel> rawWords = Sentence.toCoreLabelList(words);
Tree x = parser.apply(rawWords);
x.indexLeaves();
System.out.println(x);
findNP(x,sen);
}
catch (Exception e)
{
e.printStackTrace();
}
}
public static void findNP(Tree t, Sentences sent)
{
if (t.label().value().equals("NP"))
{
noun(t,sent);
}
else
{
for (Tree child : t.children())
{
findNP(child,sent);
}
}
}
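public static void noun(Tree t,Sentences sent)
{
// Collects runs of consecutive noun leaves (NN, NNS, NNP, NNPS) under this node.
// sent.nouns and the counter i are fields defined elsewhere (not shown in this answer).
String noun="";
for(Tree temp : t.children())
{
String val = temp.label().value();
if(val.equals("NN") || val.equals("NNS") || val.equals("NNP") || val.equals("NNPS"))
{
Tree nn[] = temp.children();
String ss = Sentence.listToString(nn[0].yield());
if(noun.isEmpty())
{
noun = ss;
}
else
{
noun = noun+" "+ss;
}
}
else
{
// A non-noun child ends the current run: store it, then search inside the child
if(!noun.isEmpty())
{
sent.nouns[i++] = noun;
noun = "";
}
noun(temp,sent);
}
}
if(!noun.isEmpty())
{
sent.nouns[i++] = noun;
}
}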
Could you please check the link and comment on this? Could you please tell me whether
"the white tiger" would give the same result with your code above? Probably the code is not complete, and that's why I am getting some errors.
For example:
sent.nouns[i++] = noun; // sent.nouns????? it seems to be undefined. Could you please post the complete code, or comment on the link below?
Here is the link:
Extract Noun phrase using stanford NLP
Thanks for the help
This took me a while to figure out, so I am posting my results here in the question now that it has been answered.
Question: How do I split a name field using an array of possible delimiters, while keeping each delimiter in the split array and excluding the empty entries the split may create?
Example: Sam Washington& Jenna
My issue was that the name parser I created was writing:
FirstName: Sam
LastName: Jenna
Using the following code I was able to parse it like this:
FirstName: Sam
LastName: Washington
FirstName2: Jenna
Be careful, however: if you are going to use my list of joiners, do not include string values that can be found in common names, such as "AND" and "OR".
These would split your names, e.g. "Andy" would become "And", "y".
Likewise, "Gregory" would become "Greg", "or", "y".
Hope this helps someone. If you have questions, please feel free to send me a message.
/// <summary>
/// remove bad name parts
/// </summary>
/// <param name="parts">name parsed for review</param>
public static void CheckBadNames(ref string[] parts)
{
string[] BadName = new string[] {"LIFE", "ESTATE" ,"(",")","*","AN","LIFETIME","INTREST","MARRIED",
"UNMARRIED","MARRIED/UNMARRIED","SINGLE","W/","/W","THE","ET",
"ALS","AS", "TENANT","WIFE", "HUSBAND", "NOT", "DRIVE" ,"INSURED",
"EXCLUDED","DISABLED" ,"LICENSED","TRUSTEE","ATSOT","A T S O T",
"AKA", "-ATSOT","OF","DBA","EVOCABLE","FAMILY","INTEREST","MASTER"};
string[] joiners = new string[9] { "&", @"AND\", @"OR\", "\\", "&/OR", "AND/OR", "&-OR", "/", "OF/AND" };
Restart:
List<string> list = new List<string>(parts); //convert array to list
foreach (string part in list)
{
if (BadName.Any(s => part.ToUpper().Equals(s)) || part == "-")
{
list.Remove(part);
parts = list.ToArray();
goto Restart;
}
//check to see if any part ends with joiner
if (joiners.Any(s => part.ToUpper().EndsWith(s)))
{
//check if by ends with means that it is just a joiner
if (joiners.Any(s => part.ToUpper().Equals(s)))
{
continue;
}
else //name part ends with a joiner EX. Washington&
{
foreach (string div in joiners.Where(s => part.ToUpper().Contains(s))) // each string that contains a joiner
{
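var temp = Regex.Split(part, "(" + Regex.Escape(div) + ")", RegexOptions.IgnoreCase).Where(x => x != String.Empty); // split, keeping the joiner (captured group); escape joiners such as "\" so they match literally; drop the empty strings left at either end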
int pos = list.IndexOf(part);
list.Remove(part);
for (int i = 0; i < temp.Count(); i++)
{
list.Insert(pos + i, temp.ElementAt(i));
}
parts = list.ToArray();
goto Restart;
}
}
}
}
if (parts.Count() == 0)
{
return;
}
if (joiners.Any(s => list.Last().ToUpper().Equals(s))) //remove last part if is a joiner
{
list.Remove(list.Last());
}
parts = list.ToArray(); // convert list back to array
}
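The core trick in the code above is that `Regex.Split` keeps the delimiter when the pattern wraps it in a capturing group. JavaScript's `String.prototype.split` behaves the same way, so the idea can be sketched as follows. The helper names and the short joiner list are hypothetical; joiners are escaped so characters like `\` and `/` match literally.

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Split a name part on any joiner, keep the joiners as separate entries,
// and drop the empty or whitespace-only pieces the split creates.
function splitKeepingJoiners(part, joiners) {
  // Longest joiners first, so "&/OR" is preferred over "&" at the same position.
  const alternatives = [...joiners].sort((a, b) => b.length - a.length).map(escapeRegExp);
  // The capturing group makes split() include each matched joiner in the result.
  const re = new RegExp('(' + alternatives.join('|') + ')', 'i');
  return part.split(re).map(p => p.trim()).filter(p => p !== '');
}

const joiners = ['&', '&/OR', 'AND/OR', '/', '\\'];
console.log(JSON.stringify(splitKeepingJoiners('Washington&', joiners)));      // ["Washington","&"]
console.log(JSON.stringify(splitKeepingJoiners('Smith &/or Jones', joiners))); // ["Smith","&/or","Jones"]
```

As the answer warns, adding a bare "AND" to the list would also split a name like "Andy" into "And" and "y".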
I'm trying to parse a really complicated CSV file, which is generated without any quotes around columns that contain commas.
The only hint I have is that commas with whitespace before or after them are part of the field.
Jake,HomePC,Microsoft VS2010, Microsoft Office 2010
Should be parsed to
Jake
HomePC
Microsoft VS2010, Microsoft Office 2010
Can anybody please advise on how to keep "\s," and ",\s" in the column body?
If your language supports lookbehind assertions, split on
(?<!\s),(?!\s)
In C#:
string[] splitArray = Regex.Split(subjectString,
@"(?<!\s) # Assert that the previous character isn't whitespace
, # Match a comma
(?!\s) # Assert that the following character isn't whitespace",
RegexOptions.IgnorePatternWhitespace);
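JavaScript supports lookbehind assertions since ES2018, so the same split works there with a regex literal; `splitCsvLine` is just an illustrative name.

```javascript
// Split only on commas that have no whitespace directly before or after them;
// commas such as the one in "VS2010, Microsoft" stay inside the field.
function splitCsvLine(line) {
  return line.split(/(?<!\s),(?!\s)/);
}

console.log(JSON.stringify(splitCsvLine('Jake,HomePC,Microsoft VS2010, Microsoft Office 2010')));
// ["Jake","HomePC","Microsoft VS2010, Microsoft Office 2010"]
```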
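split by r"(?<!\s),(?!\s)"
(A lookahead placed before the comma only inspects the comma itself, so it cannot detect whitespace in front of it; Python's re supports fixed-width lookbehind, which does.)
In Python you can do this like:
import re
re.split(r"(?<!\s),(?!\s)", s)  # s is your string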
Try this. It gave me the desired result which you have mentioned.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
StringBuilder testt = new StringBuilder("Jake,HomePC,Microsoft VS2010, Microsoft Office 2010,Microsoft VS2010, Microsoft Office 2010");
// Match only a comma that sits directly between two letters/digits. The lookarounds do not
// consume those characters, so consecutive separators (e.g. "a,b,c") are all found.
Pattern varPattern = Pattern.compile("(?<=[a-z0-9]),(?=[a-z0-9])", Pattern.CASE_INSENSITIVE);
Matcher varMatcher = varPattern.matcher(testt);
List<String> list = new ArrayList<String>();
int startIndex = 0;
while (varMatcher.find()) {
// the field runs from the end of the previous separator up to this comma
list.add(testt.substring(startIndex, varMatcher.start()));
startIndex = varMatcher.end();
}
// the last field (or the whole string when no separator was found)
list.add(testt.substring(startIndex));
for (String s : list) {
System.out.println(s);
}
Please note that the code is in Java.