Remove special characters using Pentaho - Replace in String - regex

I want to remove special characters such as ! @ # $ % ^ * _ = + | \ } { [ ] : ; < > ? / from a string field.
I used the "Replace in String" step and enabled "use RegEx". However, I do not know the right syntax to put in "Search" to remove all these characters from the string. If I put only one character in "Search", it is removed from the string. How can I remove all of them?
This is the picture of how I did it:

As per documentation, the regex flavor is Java. You may use
\p{Punct}
See the Java regex syntax reference:
\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
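For readers who want to try the idea outside Pentaho, here is a minimal sketch. It is JavaScript rather than Java, and JavaScript does not recognise the POSIX class name \p{Punct}, so the sketch spells out the same 32 ASCII punctuation characters as four ranges. The names ASCII_PUNCT and stripPunct are made up for illustration. In Pentaho itself, \p{Punct} in "Search" should be all that is needed.

```javascript
// ASCII punctuation as four ranges: !-/  :-@  [-`  {-~
// (the same characters that the Java reference lists for \p{Punct}).
const ASCII_PUNCT = /[!-\/:-@\[-`{-~]+/g;

// Hypothetical helper: delete every run of punctuation from a string.
function stripPunct(text) {
  return text.replace(ASCII_PUNCT, '');
}

console.log(stripPunct('He!!o, W@r|d #1')); // prints "Heo Wrd 1"
```

Letters, digits and whitespace are left alone, so only the punctuation disappears.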

Related

Regular expression not working with \ and ]

I have a regex for validating a password, which has to be at least 8 characters long and must contain a letter (upper and lower case), a number, and a special character from the set ^ $ * . [ ] { } ( ) ? - " ! @ # % & / \ , > < ' : ; | _ ~ ` .
I face 2 problems: after adding the \ to the regex, it is not recognized (other characters still work OK). If I add the ] as well, the expression no longer works (everything is invalid, though the pattern seems to be OK in the browser debug mode).
The regex string
static get PASSWORD_VALIDATION_REGEX(): string {
return '(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])' + // contains lowercase number uppercase
'(?=.*[\-~\$@!%#<>\|\`\\\/\[;:=\+\{\}\.\(\)*^\?&\"\,\'])' + // special
'.{8,}'; // more than allowed char
}
I used the regexp as a form validator and as a match in function
password: ['', {validators: [Validators.required,
Validators.pattern(StringUtils.PASSWORD_VALIDATION_REGEX)
],
updateOn: 'change'
}
]
//....
value.match(StringUtils.PASSWORD_VALIDATION_REGEX)
I tried using only (?=.*[\\]) for the special characters list; in that case I received a console error:
Invalid regular expression: /^(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])(?=.*[\]).{8,}$/: Unterminated character class
For '(?=.*[\]])' there is no console error, but the form validation reports the following 'pattern' error:
actualValue: "AsasassasaX000[[][]"
requiredPattern: "^(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])(?=.*[]]).{8,}$"
The same value and pattern fails on https://regex101.com/
Thanks for your help / suggestions in advance!
You have overescaped your pattern and failed to escape the ] char correctly. In JavaScript regex, ] inside a character class must be escaped.
If you are confused with how to define escapes inside a string literal (and it is rather confusing indeed), you should use a regex literal. One thing to remember about the regex use with Validators.pattern is that the string pattern is anchored by the Angular framework by enclosing the whole pattern with ^ and $, so these anchors must be present when you define the pattern as a regex literal.
Use
static get PASSWORD_VALIDATION_REGEX(): RegExp { // a regex literal is a RegExp, not a string
  return /^(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])(?=.*[-~$@!%#<>|`\\\/[\];:=+{}.()*^?&",']).{8,}$/;
}
Note the \] that matches a ] char and \\ to match \ inside [...].
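To see the escaping rules in action outside Angular, here is a small sketch in plain JavaScript. The regex is modelled on the pattern above; the exact special-character set and the helper name isValidPassword are illustrative assumptions, not part of the original code.

```javascript
// Lookahead-based pattern written as a regex literal.
// Inside [...]:  \] is a literal ],  \\ is a literal \,  \/ is a literal /.
const PASSWORD_RE =
  /^(?=.*[a-z])(?=.*[0-9])(?=.*[A-Z])(?=.*[-~$@!%#<>|`\\\/[\];:=+{}.()*^?&",']).{8,}$/;

// Hypothetical helper name, for illustration only.
function isValidPassword(value) {
  return PASSWORD_RE.test(value);
}

console.log(isValidPassword('Passw0rd]'));  // true: ] counts as a special character
console.log(isValidPassword('Passw0rd\\')); // true: the string ends in a single backslash
console.log(isValidPassword('Password1'));  // false: no special character
```

Writing the pattern as a literal means each backslash appears once per regex escape, with no extra layer of string escaping to keep track of.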

How to filter a string for invalid filename characters using regex

I don't want the user to type in anything invalid, so I am trying to remove it as they type. I made a regex that removes everything except word characters, but it also removes . , and -, and I need those characters to make the user happy :D
In short: this script removes bad characters from an input field using a regex.
Input field:
$CustomerInbox = New-Object System.Windows.Forms.TextBox #initialization -> initializes the input box
$CustomerInbox.Location = New-Object System.Drawing.Size(10,120) #Location -> where the label is located in the window
$CustomerInbox.Size = New-Object System.Drawing.Size(260,20) #Size -> defines the size of the inputbox
$CustomerInbox.MaxLength = 30 #sets max. length of the input box to 30
$CustomerInbox.add_TextChanged($CustomerInbox_OnTextEnter)
$objForm.Controls.Add($CustomerInbox) #adding -> adds the input box to the window
Function:
$ResearchGroupInbox_OnTextEnter = {
if ($ResearchGroupInbox.Text -notmatch '^\w{1,6}$') { #regex (Regular Expression) to check if it does match numbers, words or non of them!
$ResearchGroupInbox.Text = $ResearchGroupInbox.Text -replace '\W' #replaces all non words!
}
}
Bad Characters I don't want to appear:
~ " # % & * : < > ? / \ { | } #those are the 'bad characters'
Note that if you want to replace invalid file name chars, you could leverage the solution from How to strip illegal characters before trying to save filenames?
Answering your question, if you have specific characters, put them into a character class, do not use a generic \W that also matches a lot more characters.
Use
[~"#%&*:<>?/\\{|}]+
See the regex demo
Note that none of these chars except \ needs escaping inside a character class. Also, adding the + quantifier (which matches one or more occurrences of the quantified subpattern) streamlines the replacement: it matches whole consecutive chunks of characters and replaces each chunk at once with the replacement pattern (here, an empty string).
Note you may also need to account for filenames like con, lpt1, etc.
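The character-class approach above can be sketched in JavaScript as follows. The thread's own code is PowerShell, where the same class can be passed to -replace; the names BAD_CHARS and removeBadChars are hypothetical and only serve the illustration.

```javascript
// Characters the question lists as unwanted: ~ " # % & * : < > ? / \ { | }
// Only \ needs escaping inside the class; / is escaped here by convention
// for JavaScript regex literals.
const BAD_CHARS = /[~"#%&*:<>?\/\\{|}]+/g;

// Hypothetical helper: drop every run of unwanted characters.
function removeBadChars(text) {
  return text.replace(BAD_CHARS, '');
}

console.log(removeBadChars('re:port*2024?.v1-final,ok')); // prints "report2024.v1-final,ok"
```

Unlike \W, this leaves . , and - untouched, which is what the question asks for.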
To ensure the filename is valid, you should use the GetInvalidFileNameChars .NET method to retrieve all invalid characters and use a regex to check whether the filename is valid:
# Join the char[] into a single string before escaping it for use in a character class
[regex]$containsInvalidCharacter = '[{0}]' -f ([regex]::Escape(([System.IO.Path]::GetInvalidFileNameChars() -join '')))
if ($containsInvalidCharacter.IsMatch(($ResearchGroupInbox.Text)))
{
# filename is invalid...
}
$ResearchGroupInbox.Text -replace '~|"|#|%|\&|\*|:|<|>|\?|\/|\\|{|\||}'
Or, as @Wiketor suggests, you can shorten it to '[~"#%&*:<>?/\\{|}]+'

Regex check if a file has any extension

I am looking for a regex to test whether a file has any extension. I define it as: a file has an extension if there are no slashes after the last ".". The slashes are always backslashes.
I started with this regex
.*\..*[^\\]
Which translates to
.* Any char, any number of repetitions
\. Literal .
.* Any char, any number of repetitions
[^\\] Any char that is NOT in a class of [single slash]
This is my test data (excluding ##, which is my comments)
\path\foo.txt ## I only want to capture this line
\pa.th\foo ## But my regex also captures this line <-- PROBLEM HERE
\path\foo ## This line is correctly filtered out
What would be a regex to do this?
Your solution is almost correct. Use this:
^.*\.[^\\]+$
Sample at rubular.
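As a quick check, the anchored pattern can be run against the three sample paths from the question. This is a JavaScript sketch; the helper name hasExtension is an assumption made for the example.

```javascript
// The anchored pattern from the answer: a dot followed only by
// non-backslash characters up to the end of the string.
const HAS_EXTENSION = /^.*\.[^\\]+$/;

// Hypothetical helper wrapping the test.
function hasExtension(path) {
  return HAS_EXTENSION.test(path);
}

// The question's test data (backslashes doubled inside string literals).
console.log(hasExtension('\\path\\foo.txt')); // true
console.log(hasExtension('\\pa.th\\foo'));    // false: the only dot is in a directory name
console.log(hasExtension('\\path\\foo'));     // false: no dot at all
```

The $ anchor is what rules out the second path: after its only dot there is still a backslash before the end of the string.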
I wouldn't use a regular expression here. I'd split on \ and .
// Backslashes must be doubled inside JavaScript string literals
var path = '\\some\\path\\foo\\bar.htm',
    hasExtension = path.split('\\').pop().split('.').length > 1;
if (hasExtension) console.log('Weee!');
Here is a simpler function to check it.
const hasExtension = path => {
const lastDotIndex = path.lastIndexOf('.')
return lastDotIndex > 1 && path.length - 1 > lastDotIndex
}
if (hasExtension(path)) console.log('Sweet')
You can also try an even simpler approach:
(\.[^\\]+)$
Details:
$ = anchors the match at the end of the string
[^\\]+ = Any character except path separator one or more time
\. = looks for <dot> character before extension
Live Demo

Problem with not replacing the minus sign (-) with a blank using regex

I am using this regex expression to replace some characters with ""
I used it as
query=query.replace(/[^a-zA-Z 0-9 * ? : . + - ^ "" _]+/g,'');
When my query is +White+Diamond, I get the result +White+Diamond, but when the query is -White+diamond, I get White+diamond. This means - is replaced by "", which I don't want.
Please tell me what the problem is.
Inside a character class, a - between two characters means "from ... to ..." (a range). Escape your - with a backslash: \-.
What SteeveDroz said:
query=query.replace(/[^a-zA-Z0-9*?:.+\-^"_ ]+/g,'');
I'm assuming you want to keep spaces as well. If not, remove the final space from the character class.
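A short side-by-side sketch shows the difference between the two character classes. The sample query with a trailing ! is an assumption added so that both patterns have something to remove.

```javascript
// Original class: " - " (space, hyphen, space) is parsed as a range from
// space to space, so the hyphen itself is NOT in the allowed set.
const original = /[^a-zA-Z 0-9 * ? : . + - ^ "" _]+/g;

// Fixed class: \- is a literal hyphen, so it is kept.
const fixed = /[^a-zA-Z0-9*?:.+\-^"_ ]+/g;

const query = '-White+diamond!';
console.log(query.replace(original, '')); // prints "White+diamond"
console.log(query.replace(fixed, ''));    // prints "-White+diamond"
```

Placing the hyphen first or last in the class, for example [^...+^"_ -], would also make it literal without a backslash.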