Regex for file extensions, how to understand it - regex

In the basic-plus.html demo of jQuery-File-Upload there is a regex for the file extensions:
/(\.|\/)(gif|jpe?g|png)$/i
The second part, (gif|jpe?g|png)$/i, is obvious: it matches particular file extensions, case-insensitively.
But what about this part:
(\.|\/)
The dot is a single literal character, and | matches either the first or the second alternative?
jQuery-File-Upload basic-plus.html

The part (\.|\/) means:
\. : a literal dot (escaped); an unescaped dot would match any character.
| : OR
\/ : a literal slash (escaped)
So you're matching:
.gif, /gif, .GIF, /GIF...
.jpg, /jpg, .JPG, /JPG, .jpeg, /jpeg, .JPEG, /JPEG...
.png, /png, .PNG, /PNG...
This is used to check both the file extension (.jpg, .gif...) and the MIME type (image/jpeg, image/gif...).
Extract from the jquery.fileupload-validate.js source:
$.widget('blueimp.fileupload', $.blueimp.fileupload, {
    options: {
        // The regular expression for allowed file types, matches
        // against either file type or file name:
        acceptFileTypes: /(\.|\/)(gif|jpe?g|png)$/i,
        ...
    },
    processActions: {
        validate: function (data, options) {
            ...
            // Reject the file unless file.type or file.name matches
            if (options.acceptFileTypes &&
                    !(options.acceptFileTypes.test(file.type) ||
                    options.acceptFileTypes.test(file.name))) {
                file.error = settings.i18n('acceptFileTypes');
            }
            ...
        }
    }
});
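To see the "extension or MIME type" behaviour in isolation, here is a minimal sketch in plain JavaScript. The isAccepted helper and the sample file objects are made up for illustration; only the regex itself comes from the plugin options quoted above.

```javascript
// The same pattern as acceptFileTypes in the plugin options.
const acceptFileTypes = /(\.|\/)(gif|jpe?g|png)$/i;

// Hypothetical helper: mirrors the "type OR name" check in the validate action.
function isAccepted(file) {
  return acceptFileTypes.test(file.type) || acceptFileTypes.test(file.name);
}

console.log(isAccepted({ name: "photo.JPG", type: "" }));           // true  (".JPG" at the end of the name)
console.log(isAccepted({ name: "blob", type: "image/png" }));       // true  ("/png" at the end of the type)
console.log(isAccepted({ name: "notes.txt", type: "text/plain" })); // false
console.log(isAccepted({ name: "mypng", type: "" }));               // false (no dot or slash before "png")
```

The last case shows why the (\.|\/) group matters: without it, any name that merely ends in "png" would pass.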

Regex (with lookahead/lookbehind extensions) that matches anything in node_modules besides a certain subfolder

For a Jest config file where I am configuring code transformations, I am trying to create a pattern that will ignore all files in node_modules/.pnpm, except a certain package (a fork of #react-unicons), but I am consistently failing with the whole lookahead attempts, either matching too much or too little.
Is this even possible?
From the list of strings below, I want to filter out every line under .pnpm except those in the folder github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704. That means files from outside of node_modules/.pnpm need to be kept in the output.
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/icons/uil-times.js",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/icons/uil-arrow-left.js",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/.github/workflows/autotag.yml",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/icons/uil-transaction.js",
"/home/me/code/proj-foo/node_modules/",
"/home/me/code/proj-foo/node_modules/exif-js",
"/home/me/code/proj-foo/node_modules/eslint-plugin-testing-library",
"/home/me/code/proj-foo/node_modules/@typescript-eslint",
"/home/me/code/proj-foo/node_modules/chart.js",
"/home/me/code/proj-foo/node_modules/@iconscout",
"/home/me/code/proj-foo/node_modules/@iconscout/react-unicons",
"/home/me/code/proj-foo/node_modules/react-redux",
"/home/me/code/proj-foo/node_modules/.bin",
"/home/me/code/proj-foo/node_modules/stmux",
"/home/me/code/proj-foo/node_modules/only-allow",
"/home/me/code/proj-foo/node_modules/react-addons-deep-compare",
"/home/me/code/proj-foo/code/my-app/src/components/one-offs",
"/home/me/code/proj-foo/code/my-app/src/components/one-offs/MainGuiLayout.tsx",
"/home/me/code/proj-foo/code/my-app/src/components/one-offs/AppWrapper.jsx"
const files = [
"/home/me/code/proj-foo/node_modules/.pnpm/@babel+runtime@7.17.2/node_modules/@babel/runtime/regenerator/index.js",
"/home/me/code/proj-foo/node_modules/.pnpm/regenerator-runtime@0.13.9/node_modules/regenerator-runtime/runtime.js",
"/home/me/code/proj-foo/node_modules/.pnpm/lodash@4.17.21/node_modules/lodash/isEmpty.js",
"/home/me/code/proj-foo/node_modules/.pnpm/@material-ui+core@4.12.3_b8fdba992ce7d797017dc07106486496/node_modules/@material-ui/core/Zoom/Zoom.js",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/icons/uil-times.js",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/icons/uil-arrow-left.js",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/.github/workflows/autotag.yml",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/icons/uil-transaction.js",
"/home/me/code/proj-foo/node_modules/.pnpm/github-slugger@1.4.0",
"/home/me/code/proj-foo/node_modules/.pnpm/github-slugger@1.4.0/node_modules",
"/home/me/code/proj-foo/node_modules/.pnpm/github-slugger@1.4.0/node_modules/github-slugger",
"/home/me/code/proj-foo/node_modules/",
"/home/me/code/proj-foo/node_modules/exif-js",
"/home/me/code/proj-foo/node_modules/eslint-plugin-testing-library",
"/home/me/code/proj-foo/node_modules/@typescript-eslint",
"/home/me/code/proj-foo/node_modules/chart.js",
"/home/me/code/proj-foo/node_modules/@iconscout",
"/home/me/code/proj-foo/node_modules/@iconscout/react-unicons",
"/home/me/code/proj-foo/node_modules/react-redux",
"/home/me/code/proj-foo/node_modules/.bin",
"/home/me/code/proj-foo/node_modules/stmux",
"/home/me/code/proj-foo/node_modules/only-allow",
"/home/me/code/proj-foo/node_modules/react-addons-deep-compare",
"/home/me/code/proj-foo/code/my-app/src/components/one-offs",
"/home/me/code/proj-foo/code/my-app/src/components/one-offs/MainGuiLayout.tsx",
"/home/me/code/proj-foo/code/my-app/src/components/one-offs/AppWrapper.jsx"
]
const nodeModulesPattern = 'node_modules'
const nodeModulesPattern0 = 'node_modules/.pnpm'
const nodeModulesPattern1 = 'node_modules/(?!@iconscout)'
const nodeModulesPattern2 = 'node_modules/(?!\.pnpm)'
const nodeModulesPattern3 = 'node_modules/.pnpm/(?!(github.com\+ACME))/'
const re = new RegExp(nodeModulesPattern0)
// only print what does NOT match
for ( const line of files ) {
if(!re.test(line)) console.log("NO match: ", line);
}
A correct regex would mean the output would contain lines starting with "NO match: " for all of the following files (which is a subset of the files array above):
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/icons/uil-times.js",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/icons/uil-arrow-left.js",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/.github/workflows/autotag.yml",
"/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31e16c2abba8924f5d5970452912f18704/node_modules/@iconscout/react-unicons/icons/uil-transaction.js",
"/home/me/code/proj-foo/node_modules/",
"/home/me/code/proj-foo/node_modules/exif-js",
"/home/me/code/proj-foo/node_modules/eslint-plugin-testing-library",
"/home/me/code/proj-foo/node_modules/@typescript-eslint",
"/home/me/code/proj-foo/node_modules/chart.js",
"/home/me/code/proj-foo/node_modules/@iconscout",
"/home/me/code/proj-foo/node_modules/@iconscout/react-unicons",
"/home/me/code/proj-foo/node_modules/react-redux",
"/home/me/code/proj-foo/node_modules/.bin",
"/home/me/code/proj-foo/node_modules/stmux",
"/home/me/code/proj-foo/node_modules/only-allow",
"/home/me/code/proj-foo/node_modules/react-addons-deep-compare",
"/home/me/code/proj-foo/code/my-app/src/components/one-offs",
"/home/me/code/proj-foo/code/my-app/src/components/one-offs/MainGuiLayout.tsx",
"/home/me/code/proj-foo/code/my-app/src/components/one-offs/AppWrapper.jsx"
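Here is a suggestion: accept either
the lines that do not contain pnpm anywhere, or
the lines that contain the string github.com+ACME+react-unicons.
Note that this pattern matches the lines to keep, so the test in the loop above has to be inverted (print the lines that DO match):
^(((?!pnpm).)*|.*github\.com\+ACME\+react-unicons.*)$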
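If the pattern has to match the files to ignore instead (presumably what Jest's transformIgnorePatterns expects, and what the loop in the question prints the complement of), a negative lookahead placed right after .pnpm/ may be enough. This is only a sketch: the sample paths are shortened, made-up versions of the ones in the question, and it assumes the fork's folder name always starts with github.com+ACME+react-unicons.

```javascript
// Matches anything under node_modules/.pnpm/ EXCEPT the forked package folder.
// Assumption: the fork's folder name always starts with "github.com+ACME+react-unicons".
const ignorePattern = /node_modules\/\.pnpm\/(?!github\.com\+ACME\+react-unicons)/;

// Shortened, hypothetical sample of the paths from the question.
const sampleFiles = [
  "/home/me/code/proj-foo/node_modules/.pnpm/lodash@4.17.21/node_modules/lodash/isEmpty.js",
  "/home/me/code/proj-foo/node_modules/.pnpm/github-slugger@1.4.0/node_modules/github-slugger",
  "/home/me/code/proj-foo/node_modules/.pnpm/github.com+ACME+react-unicons@763e2d31/node_modules/@iconscout/react-unicons/icons/uil-times.js",
  "/home/me/code/proj-foo/node_modules/react-redux",
  "/home/me/code/proj-foo/code/my-app/src/components/one-offs/AppWrapper.jsx",
];

// Keep only what does NOT match, as in the loop from the question.
const kept = sampleFiles.filter((file) => !ignorePattern.test(file));
for (const file of kept) console.log("NO match: ", file);
```

Note that github-slugger is still filtered out even though it also starts with "github", because the lookahead compares the full escaped prefix including ".com+ACME".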

How can I match an exact string with variable text before and behind the string in PHP?

I use a small validation script that tells me whether a given URL is blocked by robots.txt.
For example, take a URL like http://www.example.com/dir/test.html
My current script tells me that the URL is blocked when robots.txt contains a line like:
Disallow: /test1.html
But it also says that the URL is blocked when there are lines like:
Disallow: /tes
That's wrong.
I googled something like "regex exact string" and found lots of solutions for the problem above.
But this leads to another problem. When I check for the exact string with the URL http://www.example.com/dir/test1/page.html and robots.txt contains a line like
Disallow: /test1/page.html
my script doesn't catch it, because it looks for
Disallow: /dir/test1/page.html
and says that the target page.html is not blocked, but it is!
How can I match an exact string with variable text before and after the string?
Here is the short version of the script:
/* example for $rules */
$rules = array("/tes", "/test", "/test1", "/test/page.html", "/test1/page.html", "/dir/test1/page.html");
/* examples for $parsed['path']:
"dir/test.html"
"dir/test1/page.html"
"test1/page.html" */
foreach ($rules as $rule) {
    // check if page is disallowed to us
    if (preg_match("/^$rule/", $parsed['path']))
        return false;
}
EDIT:
This is the whole function:
function robots_allowed($url, $useragent = false) {
// parse url to retrieve host and path
$parsed = parse_url($url);
$agents = array(preg_quote('*'));
if ($useragent)
$agents[] = preg_quote($useragent);
$agents = implode('|', $agents);
// location of robots.txt file
$robotstxt = !empty($parsed['host']) ? @file($parsed['scheme'] . "://" . $parsed['host'] . "/robots.txt") : "";
// if there isn't a robots, then we're allowed in
if (empty($robotstxt))
return true;
$rules = array();
$ruleApplies = false;
foreach ($robotstxt as $line) {
// skip blank lines
if (!$line = trim($line))
continue;
// following rules only apply if User-agent matches $useragent or '*'
if (preg_match('/^\s*User-agent: (.*)/i', $line, $match)) {
$ruleApplies = preg_match("/($agents)/i", $match[1]);
}
if ($ruleApplies && preg_match('/^\s*Disallow:(.*)/i', $line, $regs)) {
// an empty rule implies full access - no further tests required
if (!$regs[1])
return true;
// add rules that apply to array for testing
$rules[] = preg_quote(trim($regs[1]), '/');
}
}
foreach ($rules as $rule) {
// check if page is disallowed to us
if (preg_match("/^$rule/", $parsed['path']))
return false;
}
// page is not disallowed
return true;
}
The URL comes from user input.
Try everything at once, avoid the array.
/(?:\/?dir\/)?\/?tes(?:(?:t(?:1)?)?(?:\.html|(?:\/page\.html)?))/
https://regex101.com/r/VxL30W/1
(?: /?dir / )?
/?tes
(?:
(?:
t
(?: 1 )?
)?
(?:
\.html
|
(?: /page \. html )?
)
)
I've found a solution that matches /test, /test/hello or /test/ but does not match /testosterone or /hellotest:
(?:\/test$|\/test\/)
With PHP variables:
if (preg_match("/(?:" . $rule . "$|" . $rule . "\/)/", $parsed['path']))
Based on the function above.
https://regex101.com/r/DFVR5T/3
Can I use (?:\/ ...) or is that wrong?
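The "rule followed by end of string or by a slash" idea can be checked quickly outside PHP. Below is a minimal sketch in JavaScript rather than PHP; escapeRegExp and matchesRule are hypothetical helper names, and the escaping only plays the role that preg_quote plays in the function above.

```javascript
// Escape regex metacharacters so a rule is matched literally
// (similar in spirit to PHP's preg_quote).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when `rule` occurs in `path` as whole path segments,
// i.e. it is followed by the end of the string or by a slash.
function matchesRule(path, rule) {
  const quoted = escapeRegExp(rule);
  return new RegExp("(?:" + quoted + "$|" + quoted + "/)").test(path);
}

console.log(matchesRule("/test", "/test"));                            // true
console.log(matchesRule("/test/hello", "/test"));                      // true
console.log(matchesRule("/testosterone", "/test"));                    // false
console.log(matchesRule("/hellotest", "/test"));                       // false
console.log(matchesRule("/dir/test1/page.html", "/test1/page.html"));  // true
console.log(matchesRule("/dir/test.html", "/tes"));                    // false
```

Because the pattern is not anchored with ^, a rule is also found when it appears after a leading directory such as /dir, which is the behaviour asked for in the question.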

Angular Input Restriction Directive - Negating Regular Expressions

EDIT: Please feel free to add additional validations that would be useful for others, using this simple directive.
--
I'm trying to create an Angular directive that limits the characters that can be typed into a text box. I've been successful with a couple of common use cases (alphabetical, alphanumeric and numeric), but with the popular patterns for validating email addresses, dates and currency I can't get the directive to work, since I need it to negate the regex. At least that's what I think it needs to do.
Any assistance for currency (optional thousand separator and cents), date (mm/dd/yyyy) and email is greatly appreciated. I'm not strong with regular expressions at all.
Here's what I have currently:
http://jsfiddle.net/corydorning/bs05ys69/
HTML
<div ng-app="example">
<h1>Validate Directive</h1>
<p>The Validate directive allow us to restrict the characters an input can accept.</p>
<h3><code>alphabetical</code> <span style="color: green">(works)</span></h3>
<p>Restricts input to alphabetical (A-Z, a-z) characters only.</p>
<label><input type="text" validate="alphabetical" ng-model="validate.alphabetical"/></label>
<h3><code>alphanumeric</code> <span style="color: green">(works)</span></h3>
<p>Restricts input to alphanumeric (A-Z, a-z, 0-9) characters only.</p>
<label><input type="text" validate="alphanumeric" ng-model="validate.alphanumeric" /></label>
<h3><code>currency</code> <span style="color: red">(doesn't work)</span></h3>
<p>Restricts input to US currency characters with comma for thousand separator (optional) and cents (optional).</p>
<label><input type="text" validate="currency.us" ng-model="validate.currency" /></label>
<h3><code>date</code> <span style="color: red">(doesn't work)</span></h3>
<p>Restricts input to the mm/dd/yyyy date format only.</p>
<label><input type="text" validate="date" ng-model="validate.date" /></label>
<h3><code>email</code> <span style="color: red">(doesn't work)</span></h3>
<p>Restricts input to email format only.</p>
<label><input type="text" validate="email" ng-model="validate.email" /></label>
<h3><code>numeric</code> <span style="color: green">(works)</span></h3>
<p>Restricts input to numeric (0-9) characters only.</p>
<label><input type="text" validate="numeric" ng-model="validate.numeric" /></label>
JavaScript
angular.module('example', [])
.directive('validate', function () {
var validations = {
// works
alphabetical: /[^a-zA-Z]*$/,
// works
alphanumeric: /[^a-zA-Z0-9]*$/,
// doesn't work - need to negate?
// taken from: http://stackoverflow.com/questions/354044/what-is-the-best-u-s-currency-regex
currency: /^[+-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]{2})?$/,
// doesn't work - need to negate?
// taken from here: http://stackoverflow.com/questions/15196451/regular-expression-to-validate-datetime-format-mm-dd-yyyy
date: /(?:0[1-9]|1[0-2])\/(?:0[1-9]|[12][0-9]|3[01])\/(?:19|20)[0-9]{2}/,
// doesn't work - need to negate?
// taken from: http://stackoverflow.com/questions/46155/validate-email-address-in-javascript
email: /^([\w-]+(?:\.[\w-]+)*)@((?:[\w-]+\.)*\w[\w-]{0,66})\.([a-z]{2,6}(?:\.[a-z]{2})?)$/i,
// works
numeric: /[^0-9]*$/
};
return {
require: 'ngModel',
scope: {
validate: '@'
},
link: function (scope, element, attrs, modelCtrl) {
var pattern = validations[scope.validate] || scope.validate;
modelCtrl.$parsers.push(function (inputValue) {
var transformedInput = inputValue.replace(pattern, '');
if (transformedInput != inputValue) {
modelCtrl.$setViewValue(transformedInput);
modelCtrl.$render();
}
return transformedInput;
});
}
};
});
I am pretty sure there is a better way, and regex is probably not the best tool for this, but here is my proposal.
This way you can only restrict which characters are allowed and force the user into the proper format; you will still need to validate the final input once the user has finished typing, but that is another story.
The alphabetic, numeric and alphanumeric cases are quite simple, both for restricting and for validating input, as it is clear what you can type and what a proper final input is. But with dates, emails and currency you cannot check the input against a regex for the complete valid value, because the user has to type it in first, and in the meantime the input is necessarily invalid as a final value. So it is one thing to restrict the user to typing just digits and / for a date format like 12/12/1988, but in the end you still need to check whether they typed a proper date or just 12/12/126, for example. This needs to be checked when the user submits the answer, when the text field loses focus, etc.
To validate just the typed characters, you can try this:
JSFiddle DEMO
First change:
var transformedInput = inputValue.replace(pattern, '')
to
var transformedInput = inputValue.replace(pattern, '$1')
then use regular expressions:
/^([a-zA-Z]*(?=[^a-zA-Z]))./ - alphabetic
/^([a-zA-Z0-9]*(?=[^a-zA-Z0-9]))./ - alphanumeric
/(\.((?=[^\d])|\d{2}(?![^,\d.]))|,((?=[^\d])|\d{3}(?=[^,.$])|(?=\d{1,2}[^\d]))|\$(?=.)|\d{4,}(?=,)).|[^\d,.$]|^\$/ - currency (allows strings like 343243.34, 1,123,345.34, .05, with or without $)
^(((0[1-9]|1[012])|(\d{2}\/\d{2}))(?=[^\/])|((\d)|(\d{2}\/\d{2}\/\d{1,3})|(.+\/))(?=[^\d])|\d{2}\/\d{2}\/\d{4}(?=.)).|^(1[3-9]|[2-9]\d)|((?!^)(3[2-9]|[4-9]\d)\/)|[3-9]\d{3}|2[1-9]\d{2}|(?!^)\/\d\/|^\/|[^\d/] - date (00-12/00-31/0000-2099)
/^(\d*(?=[^\d]))./ - numeric
/^([\w.$-]+\@[\w.]+(?=[^\w.])|[\w.$-]+\@(?=[^\w.-])|[\w.@-]+(?=[^\w.$@-])).$|\.(?=[^\w-@]).|[^\w.$@-]|^[^\w]|\.(?=@).|@(?=\.)./i - email
Generally, they use this pattern:
([valid characters or structure] captured in group $1)(?= positive lookahead for not allowed characters) any character
In effect, it captures all valid characters in group $1, and if the user types an invalid character, the whole string is replaced with the valid characters already captured in group $1. It is complemented by a part that excludes some obviously invalid character sequences, like @@ in an email or 34...2 in a currency value.
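The capture-and-replace mechanism is easiest to see with the two simplest patterns from the list. This is a small sketch outside Angular; the restrict function is a made-up stand-in for the body of the $parsers callback.

```javascript
// Two of the restriction patterns from above; group 1 holds the valid prefix.
const numeric = /^(\d*(?=[^\d]))./;
const alphabetic = /^([a-zA-Z]*(?=[^a-zA-Z]))./;

// Hypothetical stand-in for the body of the $parsers function.
function restrict(inputValue, pattern) {
  return inputValue.replace(pattern, "$1");
}

console.log(restrict("123a", numeric));   // "123"
console.log(restrict("123", numeric));    // "123" (no invalid character, so nothing matches)
console.log(restrict("ab1", alphabetic)); // "ab"
```

When the input is entirely valid the lookahead never succeeds, so replace leaves the string untouched.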
Once you understand how these regular expressions work, and although they look quite complex, I think it is easy to extend them by adding further allowed or disallowed characters.
Regular expressions for validating currency, dates and emails are easy to find, so I find it redundant to post them here.
Off topic: the currency part in your demo is not working because of validate="currency.us" instead of validate="currency"; at least it works after this modification.
In my opinion it is impossible to create regular expressions that will work for matching things like dates or emails with the
parser you use. This is mainly because you would need non-capturing groups in your
regular expressions (which is possible), which are not replaced by the
inputValue.replace(pattern, '') call you have in your parser function. And this is the
part that is not possible in JavaScript. JavaScript replaces what you put in non-capturing
groups as well.
So... you'll need to go for a different approach. I would suggest to go for positive
regular expressions, which will yield a match when the input is valid.
Then you need of course to change the code of your parser. You could for instance
decide to chop off characters from the end of the input text until what remains passes
the regular expression test. This you could code as follows:
modelCtrl.$parsers.push(function (inputValue) {
var transformedInput = inputValue;
while (transformedInput && !pattern.exec(transformedInput)) {
// validation fails: chop off last character and try again
transformedInput = transformedInput.slice(0, -1);
}
if (transformedInput !== inputValue) {
modelCtrl.$setViewValue(transformedInput);
modelCtrl.$render();
}
return transformedInput;
});
Now life has become a bit easier. Just pay attention that you make your regular
expressions in such a way that they do not reject partial input. So "01/" should be
considered valid for a date, otherwise the user can never get to type in a date. On
the other hand, as soon as it becomes clear that adding characters will no longer
allow for valid input, the regular expression should reject it. So "101" should be
rejected as a date, as you can never add characters at the end to make it a valid date.
Also, all of these regular expressions should check the whole input, so as a consequence
they need to make use of the ^ and $ symbols.
Here is what the regular expression for a (partial) date could look like:
^([0-9]{0,2}|[0-9]{2}[\/]([0-9]{0,2}|[0-9]{2}[\/][0-9]{0,4}))$
This means: an input of 0 to 2 digits is valid, or exactly 2 digits followed by a slash, followed by either:
0 to 2 digits, or
exactly 2 digits followed by a slash, followed by 0 to 4 digits
Admittedly, not as smart as the one you had found, but that one would need a lot of editing to allow for partially entered dates. It is possible, but
it represents a very long expression with a lot of brackets and |.
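To try the chopping approach together with the partial-date expression outside Angular, here is a minimal sketch. The chopToValid function is a hypothetical stand-alone version of the parser loop above, with pattern.test used in place of pattern.exec.

```javascript
// Partial-date pattern from above: accepts every prefix of the shape dd/dd/dddd.
const partialDate = /^([0-9]{0,2}|[0-9]{2}[\/]([0-9]{0,2}|[0-9]{2}[\/][0-9]{0,4}))$/;

// Hypothetical stand-alone version of the parser loop: chop characters off
// the end until the remainder passes the pattern (or nothing is left).
function chopToValid(inputValue, pattern) {
  let transformed = inputValue;
  while (transformed && !pattern.test(transformed)) {
    transformed = transformed.slice(0, -1);
  }
  return transformed;
}

console.log(chopToValid("01/", partialDate));         // "01/"
console.log(chopToValid("101", partialDate));         // "10"
console.log(chopToValid("12/12/19888", partialDate)); // "12/12/1988"
```

This pattern only enforces the digit/slash layout; it does not reject impossible months or days, so a final validation is still needed.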
Once you have all the regular expressions set up, you could think to further improve
the parser. One idea would be to not let it chop off characters from the end, but to
let it test all strings with one character removed somewhere compared to the original,
and see which one passes the test. If there is no way found to remove one character and have
success, then remove two consecutive characters in any place of the input value,
then three, ... etc, until you find a value that passes the test or arrive at an empty value.
This will work better for cases where the user inserts characters halfway through their input.
Just an idea...
import { Directive, ElementRef, EventEmitter, HostListener, Input, Output, Renderer2 } from '@angular/core';
import { ControlValueAccessor, NG_VALUE_ACCESSOR } from '@angular/forms';
import { CurrencyPipe, DecimalPipe } from '@angular/common';
import { ValueChangeEvent } from '@goomTool/goom-elements/events/value-change-event.model';
const noOperation = () => {
};
@Directive({
selector: '[formattedNumber]',
providers: [{
provide: NG_VALUE_ACCESSOR,
useExisting: FormattedNumberDirective,
multi: true
}]
})
export class FormattedNumberDirective implements ControlValueAccessor {
@Input() public configuration;
@Output() public valueChange: EventEmitter<ValueChangeEvent> = new EventEmitter();
public locale: string = process.env.LOCALE;
private el: HTMLInputElement;
// Keeps track of the value without formatting
private innerInputValue: any;
private specialKeys: string[] =
['Backspace', 'Tab', 'End', 'Home', 'Enter', 'Shift', 'ArrowRight', 'ArrowLeft', 'Delete'];
private onTouchedCallback: () => void = noOperation;
private onChangeCallback: (a: any) => void = noOperation;
constructor(private elementRef: ElementRef,
private decimalPipe: DecimalPipe,
private currencyPipe: CurrencyPipe,
private renderer: Renderer2) {
this.el = elementRef.nativeElement;
}
public writeValue(value: any) {
if (value !== this.innerInputValue) {
if (!!value) {
this.renderer.setAttribute(this.elementRef.nativeElement, 'value', this.getFormattedValue(value));
}
this.innerInputValue = value;
}
}
public registerOnChange(fn: any) {
this.onChangeCallback = fn;
}
public registerOnTouched(fn: any) {
this.onTouchedCallback = fn;
}
// On focus, remove all formatting and display the actual value
@HostListener('focus', ['$event.target.value'])
public onfocus(value) {
if (!!this.innerInputValue) {
this.el.value = this.innerInputValue;
}
}
// On Blur set values to pipe format
@HostListener('blur', ['$event.target.value'])
public onBlur(value) {
this.innerInputValue = value;
if (!!value) {
this.el.value = this.getFormattedValue(value);
}
}
/**
* Allows special key, Unit Interval, value based on regular expression
*
* @param event
*/
@HostListener('keydown', ['$event'])
public onKeyDown(event) {
// Allow Backspace, tab, end, and home keys . .
if (this.specialKeys.indexOf(event.key) !== -1) {
if (event.key === 'Backspace') {
this.updateValue(this.getBackSpaceValue(this.el.value, event));
}
if (event.key === 'Delete') {
this.updateValue(this.getDeleteValue(this.el.value, event));
}
return;
}
const next: string = this.concatAtIndex(this.el.value, event);
if (this.configuration.angularPipe && this.configuration.angularPipe.length > 0) {
if (!this.el.value.includes('.')
&& (this.configuration.min == null || this.configuration.min < 1)) {
if (next.startsWith('0') || next.startsWith('0.') || next.startsWith('.')) {
if (next.length > 1) {
this.updateValue(next);
}
return;
}
}
}
/* pass your pattern in component regex e.g.
* regex = new RegExp(RegexPattern.WHOLE_NUMBER_PATTERN)
*/
if (next && !String(next).match(this.configuration.regex)) {
event.preventDefault();
return;
}
if (!!this.configuration.minFractionDigits && !!this.configuration.maxFractionDigits) {
if (!!next.split('\.')[1] && next.split('\.')[1].length > this.configuration.minFractionDigits) {
return this.validateFractionDigits(next, event);
}
}
this.innerInputValue = next;
this.updateValue(next);
}
private updateValue(newValue) {
this.onTouchedCallback();
this.onChangeCallback(newValue);
if (newValue) {
this.renderer.setAttribute(this.elementRef.nativeElement, 'value', newValue);
}
}
private validateFractionDigits(next, event) {
// create real-time pattern to validate min & max fraction digits
const regex = `^[-]?\\d+([\\.,]\\d{${this.configuration.minFractionDigits},${this.configuration.maxFractionDigits}})?$`;
if (!String(next).match(regex)) {
event.preventDefault();
return;
}
this.updateValue(next);
}
private concatAtIndex(current: string, event) {
return current.slice(0, event.currentTarget.selectionStart) + event.key +
current.slice(event.currentTarget.selectionEnd);
}
private getBackSpaceValue(current: string, event) {
return current.slice(0, event.currentTarget.selectionStart - 1) +
current.slice(event.currentTarget.selectionEnd);
}
private getDeleteValue(current: string, event) {
return current.slice(0, event.currentTarget.selectionStart) +
current.slice(event.currentTarget.selectionEnd + 1);
}
private transformCurrency(value) {
return this.currencyPipe.transform(value, this.configuration.currencyCode, this.configuration.display,
this.configuration.digitsInfo, this.locale);
}
private transformDecimal(value) {
return this.decimalPipe.transform(value, this.configuration.digitsInfo, this.locale);
}
private transformPercent(value) {
return this.decimalPipe.transform(value, this.configuration.digitsInfo, this.locale) + ' %';
}
private getFormattedValue(value) {
switch (this.configuration.angularPipe) {
case ('decimal'): {
return this.transformDecimal(value);
}
case ('currency'): {
return this.transformCurrency(value);
}
case ('percent'): {
return this.transformPercent(value);
}
default: {
return value;
}
}
}
}
----------------------------------
export const RegexPattern = Object.freeze({
PERCENTAGE_PATTERN: '^([1-9]\\d*(\\.)\\d*|0?(\\.)\\d*[1-9]\\d*|[1-9]\\d*)$', // e.g. '.12% ' or 12%
DECIMAL_PATTERN: '^(([-]+)?([1-9]\\d*(\\.|\\,)\\d*|0?(\\.|\\,)\\d*[1-9]\\d*|[1-9]\\d*))$', // e.g. '123.12'
CURRENCY_PATTERN: '\\$?[-]?[0-9]{1,3}(?:,?[0-9]{3})*(?:\\.[0-9]{2})?$', // e.g. '$123.12'
KEY_PATTERN: '^[a-zA-Z\\-]+-[0-9]+', // e.g. ABC-1234
WHOLE_NUMBER_PATTERN: '^([-]?([1-9][0-9]*)|([0]+)$)$' // e.g 1234
});

Regular expression for { } object for

I have a huge file with a lot of stuff that looks like:
skaune.malmo = rsr.path("m 640.4,516.9 3.9,-2.8 23.8,1.2 13.6,-5.2 3,-3.9 17.4,-1.8 1.2,-3.4 9.7,-4.6 0.5,-1.5 -7.3,-10.3 12.5,-4.5 -2.3,-4.1 3.9,-6.1 -3.4,-11.3 -2.7,-4.6 -2.3,-1.1 -6.4,3.4 -7.2,-0.7 -1.2,-6.8 -5.6,1.1 0,-10.6 -6.8,-5.7 0,-1.1 -11.4,-4.1 -4.1,4.1 0,2.3 1.9,1.5 0,3 -1.9,1.1 -1.1,1.6 -7.2,3 -5.3,0.4 -2.7,1.1 -5.2,5.7 -3.4,3.4 -10.2,5.2 -2.7,3.4 -3,4.6 -1.1,4.5 0,6.8 0,5.7 0.7,5 5.6,12.4 3.4,9.8 3.4,3.9").attr({id: 'path4',parent: 'Skanska_kommuner',fill: '#5eacdd','stroke-width': '0','stroke-opacity': '1'}).transform("t-40.677966,-76.271186").data('id', 'path4');
skaune.bjuv = rsr.path("m 643.8,137 -6.3,-2.8 -2.3,6.8 0,2.8 4.1,10.8 5,8 4.1,0 -0.7,7.3 -3.9,4 3.9,1.2 5.2,12.4 0,1.2 -1.1,2.7 -10.9,-1.1 -1.6,1.8 2.7,3.4 3,-1.2 1.1,5.7 -1.1,5.7 2.7,2.2 -2.7,-0.6 0,0.6 -0.7,5.7 3.4,1.2 1.1,-3 9.8,7.5 2.3,-3.4 3.8,2.3 7.5,-3 7.9,-4.5 -3.4,-11.8 10.2,5 1.6,-2.8 -6.8,-6.3 4.1,-11.3 -4.5,-5 5,-4.6 5.2,0.5 3.8,-8.4 1.8,-5.2 6.9,0 -2.3,-11.4 -3.4,0.7 -4.6,-3.4 -12.4,-14.7 -5.2,2.2 -14.8,-8.6 -3.8,0.7 -5.3,0 -1.5,0 1.5,3.9 -3.4,0 -5,6.8").attr({id: 'path6',parent: 'Skanska_kommuner',fill: '#5eacdd','stroke-width': '0','stroke-opacity': '1'}).transform("t-40.677966,-76.271186").data('id', 'path6');
skaune.astarp = rsr.path("m 713,149.4 3.9,-5.6 -8,-5 8,-13.2 -8,-8.4 -5.6,-18.1 -10.7,2.3 0,-6.8 -7.5,-9.1 -7.2,3.8 -9.8,0 0,3 -6.8,16.3 -3.8,4.5 -13.7,2.3 0,-3.4 -5.6,0.7 0,3.9 -14.8,3.3 1.2,5.7 -1.2,1.8 -1.8,2.8 1.8,4 1.2,2.8 2.7,1.8 7.9,3.4 1.8,-8 6.8,2.8 5,-6.8 3.4,0 -1.5,-3.9 1.5,0 5.3,0 3.8,-0.7 14.8,8.6 5.2,-2.2 12.4,14.7 8.4,6.8 0,-4.1 10.9,0").attr({id: 'path8',parent: 'Skanska_kommuner',fill: '#5eacdd','stroke-width': '0','stroke-opacity': '1'}).transform("t-40.677966,-76.271186").data('id', 'path8');
skaune.orkelljunga = rsr.path("m 753.8,1.6 10.7,-5.7 9.8,-9.1 0,-6.3 18.1,-16.4 15.9,-2.2 13.6,-15.5 19.3,-1.5 7.9,-6.4 12.5,14.8 4.5,-3.4 10.2,5.2 13.6,-3 6.8,-3.8 0.5,2.2 -7.3,1.6 -0.7,9.8 -20.4,10.6 -5,16.6 -4.5,1.6 6.1,25.6 -6.8,2.3 1.8,5.6 -4,2.3 -3.9,-9.1 -7.9,-1.1 -4.6,7.9 -4.5,0.5 0,9.1 -4.5,6.8 -25.7,-2.7 -27.6,32.8 -14.3,-2.2 -3.9,3.8 0,6.4 -6.8,6.1 -8.6,-3.9 1.8,-8.6 -7.9,0.7 -3.4,-7.9 -9.1,-5.7 -2.3,-7.9 11.4,-4.6 -10.9,-5.2 2.9,-3.8 1.2,0 6.8,-1.2 9,-9.1 10.2,-17.6 0,-8.4").attr({id: 'path10',parent: 'Skanska_kommuner',fill: '#5eacdd','stroke-width': '0','stroke-opacity': '1'}).transform("t-40.677966,-76.271186").data('id', 'path10');
and so on... Each statement calls .attr({/* a lot of different stuff I want to delete */}), and I want to replace that with a variable, i.e. attr.(style). So everything inside { ... }, braces included, should be replaced with style. How do I do that? What is the regexp string for { and everything inside }?
Any help appreciated.
You haven't said what kind of regex, but in most types I know you're looking for:
attr\({[^}]+}\)
...and replacing with attr(style) (or attr.(style), if that . after attr wasn't a typo). Depending on regex flavor, you may have to add or remove some backslashes there (for instance, with vim's default magic settings, I believe it would be attr({[^}]\+})). Basically:
Match attr({ literally
Match all characters within the {} using [^}]+
Match }) literally
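As a quick check of the three steps above, here is a sketch using JavaScript's regex flavor, where the braces are escaped to be safe. The source string is a shortened, made-up stand-in for one line of the file in the question.

```javascript
// Shortened, hypothetical stand-in for one line of the file from the question.
const source = `skaune.malmo = rsr.path("m 640.4,516.9").attr({id: 'path4',fill: '#5eacdd'}).transform("t-40,-76").data('id', 'path4');`;

// attr( + { + anything that is not a closing brace + } + )
const attrCall = /attr\(\{[^}]+\}\)/g;

const rewritten = source.replace(attrCall, "attr(style)");
console.log(rewritten);
```

The [^}]+ part relies on the object literal not containing a nested } of its own, which holds for the lines shown in the question.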

Unicode Regex; Invalid XML characters

The list of valid XML characters is well known, as defined by the spec it's:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
My question is whether or not it's possible to make a PCRE regular expression for this (or its inverse) without actually hard-coding the codepoints, by using Unicode general categories. An inverse might be something like [\p{Cc}\p{Cs}\p{Cn}], except that improperly covers linefeeds and tabs and misses some other invalid characters.
I know this isn't exactly an answer to your question, but it's helpful to have it here:
Regular Expression to match valid XML Characters:
[\u0009\u000a\u000d\u0020-\uD7FF\uE000-\uFFFD]
So to remove invalid chars from XML, you'd do something like
// filters control characters but allows only properly-formed surrogate sequences
private static Regex _invalidXMLChars = new Regex(
@"(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F\uFEFF\uFFFE\uFFFF]",
RegexOptions.Compiled);
/// <summary>
/// removes any unusual unicode characters that can't be encoded into XML
/// </summary>
public static string RemoveInvalidXMLChars(string text)
{
if (string.IsNullOrEmpty(text)) return "";
return _invalidXMLChars.Replace(text, "");
}
I had our resident regex / XML genius, he of the 4,400+ upvoted post, check this, and he signed off on it.
For systems that internally store code points in UTF-16, it is common to use surrogate pairs (xD800-xDFFF) for code points above 0xFFFF, and in those systems you must verify whether you can really use, for example, \u12345 or must specify it as a surrogate pair. (I just found out that in C# you can use \u1234 (16-bit) and \U00001234 (32-bit).)
According to Microsoft "the W3C recommendation does not allow surrogate characters inside element or attribute names." While searching the W3C's website I found C079 and C078, which might be of interest.
I tried this in Java and it works:
private String filterContent(String content) {
return content.replaceAll("[^\\u0009\\u000a\\u000d\\u0020-\\uD7FF\\uE000-\\uFFFD]", "");
}
Thank you Jeff.
The above solutions didn't work for me if the hex code was present in the XML, e.g.
<element>&#x8;</element>
The following code would break:
string xmlFormat = "<element>{0}</element>";
string invalid = "&#x8;";
string xml = string.Format(xmlFormat, invalid);
xml = Regex.Replace(xml, @"[\x01-\x08\x0B\x0C\x0E\x0F\u0000-\u0008\u000B\u000C\u000E-\u001F]", "");
XDocument.Parse(xml);
It returns:
XmlException: '', hexadecimal value 0x08, is an invalid character.
Line 1, position 14.
The following improved regex fixes the problem mentioned above:
&#x([0-8BCEFbcef]|1[0-9A-Fa-f]);|[\x01-\x08\x0B\x0C\x0E\x0F\u0000-\u0008\u000B\u000C\u000E-\u001F]
Here is a unit test that covers the first 300 Unicode characters and verifies that only invalid characters are removed:
[Fact]
public void validate_that_RemoveInvalidData_only_remove_all_invalid_data()
{
string xmlFormat = "<element>{0}</element>";
string[] allAscii = (Enumerable.Range('\x1', 300).Select(x => ((char)x).ToString()).ToArray());
string[] allAsciiInHexCode = (Enumerable.Range('\x1', 300).Select(x => "&#x" + (x).ToString("X") + ";").ToArray());
string[] allAsciiInHexCodeLoweCase = (Enumerable.Range('\x1', 300).Select(x => "&#x" + (x).ToString("x") + ";").ToArray());
bool hasParserError = false;
IXmlSanitizer sanitizer = new XmlSanitizer();
foreach (var test in allAscii.Concat(allAsciiInHexCode).Concat(allAsciiInHexCodeLoweCase))
{
bool shouldBeRemoved = false;
string xml = string.Format(xmlFormat, test);
try
{
XDocument.Parse(xml);
shouldBeRemoved = false;
}
catch (Exception e)
{
if (test != "<" && test != "&") //these char are taken care of automatically by my convertor so don't need to test. You might need to add these.
{
shouldBeRemoved = true;
}
}
int xmlCurrentLength = xml.Length;
int xmlLengthAfterSanitize = Regex.Replace(xml, @"&#x([0-8BCEFbcef]|1[0-9A-Fa-f]);|[\u0000-\u0008\u000B\u000C\u000E-\u001F]", "").Length;
if ((shouldBeRemoved && xmlCurrentLength == xmlLengthAfterSanitize) //it wasn't properly Removed
||(!shouldBeRemoved && xmlCurrentLength != xmlLengthAfterSanitize)) //it was removed but shouldn't have been
{
hasParserError = true;
Console.WriteLine(test + xml);
}
}
Assert.Equal(false, hasParserError);
}
Another way to remove invalid XML characters in C# is to use the XmlConvert.IsXmlChar method (available since .NET Framework 4.0):
public static string RemoveInvalidXmlChars(string content)
{
return new string(content.Where(ch => System.Xml.XmlConvert.IsXmlChar(ch)).ToArray());
}
Or you can check that all characters are XML-valid:
public static bool CheckValidXmlChars(string content)
{
return content.All(ch => System.Xml.XmlConvert.IsXmlChar(ch));
}
.Net Fiddle - https://dotnetfiddle.net/v1TNus
For example, the vertical tab symbol (\v) is not valid for XML, it is valid UTF-8, but not valid XML 1.0, and even many libraries (including libxml2) miss it and silently output invalid XML.
In PHP the regex would look like this:
protected function isStringValid($string)
{
$regex = '/[^\x{9}\x{a}\x{d}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';
return (preg_match($regex, $string, $matches) === 0);
}
This would handle all 3 ranges from the xml specification:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
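For comparison, the same three ranges can be written in JavaScript with the u flag, which makes the regex operate on code points instead of UTF-16 code units. This is a sketch only; removeInvalidXmlChars is a made-up helper name, and, like the PHP version, it hard-codes the code points rather than using Unicode categories.

```javascript
// Character class for everything that is NOT a valid XML 1.0 character.
// The "u" flag makes the regex work on code points, so \u{10000}-\u{10FFFF}
// is a real range and unpaired surrogates are matched (and removed) on their own.
const invalidXmlChars =
  /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/gu;

// Hypothetical helper name; strips the characters the XML spec does not allow.
function removeInvalidXmlChars(text) {
  return text.replace(invalidXmlChars, "");
}

console.log(removeInvalidXmlChars("a\u0008b\u000Bc")); // "abc" (backspace and vertical tab removed)
console.log(removeInvalidXmlChars("x\uD800y"));        // "xy"  (lone surrogate removed)
```

Tabs, line feeds, carriage returns and characters outside the BMP (for example emoji) pass through unchanged.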