Regex to match certain characters anywhere between two characters - regex

I want to detect (and return) any punctuation within brackets. The line of text I'm looking at will have multiple sets of brackets (which I can assume are properly formatted). So given something like this:
[abc.] [!bc]. [.e.g] [hi]
I'd want to detect all those cases and return something like [[.], [!], [..]].
I tried to do /{.*?([.,!?]+).*?}/g but then it returns true for [hello], [hi] which I don't want to match!
I'm using JS!

You can match substrings between square brackets and then remove all chars that are not punctuation:
const text = '[abc.] [!bc]. [.e.g]';
const matches = text.match(/\[([^\][]*)]/g).map(x => `[${x.replace(/[^.,?!]/g, '')}]`)
console.log(matches);
If you need the regex to be fully Unicode-aware, you can use an ECMAScript 2018+ solution based on Unicode property escapes:
const text = '[abc.] [!bc、]. [.e.g]';
const matches = text.match(/\[([^\][]*)]/g).map(x => `[${x.replace(/[^\p{P}\p{S}]/gu, '')}]`)
console.log(matches);
So,
\[([^\][]*)] matches a string between [ and ] with no other [ and ] inside
.replace(/[^.,?!]/g, '') removes all chars other than ., ,, ? and !
.replace(/[^\p{P}\p{S}]/gu, '') removes all chars other than Unicode punctuation proper and symbols.
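The snippets above return an empty [] for a bracket pair with no punctuation, such as [hi]. If those pairs should be skipped, as the question asks, the two steps can be combined roughly as follows. This is only a sketch: the helper name punctuationInBrackets is made up for illustration, and String.prototype.matchAll needs an ES2020+ runtime.

```javascript
// Hypothetical helper (not from the original answer): collect the punctuation
// found inside each [...] pair and skip pairs that contain none.
function punctuationInBrackets(text) {
  const result = [];
  // Same bracket pattern as above: [ ... ] with no nested brackets inside.
  for (const m of text.matchAll(/\[([^\][]*)]/g)) {
    // Keep only the punctuation characters the question cares about.
    const punct = m[1].replace(/[^.,?!]/g, '');
    if (punct.length > 0) {
      result.push(punct);
    }
  }
  return result;
}

console.log(punctuationInBrackets('[abc.] [!bc]. [.e.g] [hi]'));
// → [ '.', '!', '..' ]
```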

Related

How to create a regex consisting of specific characters occupying a specific order, but having different length

Let us say we have the word residence. It could be matched by the following regular expression in js:
'residence'.match(new RegExp(/[residence]{4,9}/, 'i'))
This is fine, but there is a problem for me:
All the letters are interchangeable. This expression could match also:
ceresiden, denceresid, ence etc...
I would like to have the order of the characters preserved. The regex should match strings like:
resid sidence ience rednce etc..
How can I achieve this?
Thanks!
Use lookahead to test the valid strings and then test the length:
var test = [
'ceresiden',
'denceresid',
'ence',
'res',
'den',
'resid',
'sidence',
'ience',
'rednce',
];
console.log(test.map(function (a) {
return a + ' :' + a.match(/^(?=r?e?s?i?d?e?n?c?e?$).{4,9}$/);
}));
So you want them in the same order, but possibly with any number of characters removed? Then just make each character optional: /(r?e?s?i?d?e?n?c?e?){4,9}/i
Note: the {4,9} quantifier here repeats the whole group; it does not limit the match to 4 to 9 characters. Because every letter is optional, the group can also match the empty string, so this unanchored pattern succeeds on any input. To enforce both the order and the length, anchor the pattern with ^ and $ and check the length separately.
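The lookahead approach can be generalised to any word. The sketch below builds the pattern from a string; the function name orderedSubsetRegex and its parameters are invented for illustration, and it assumes the length limits apply to the whole input.

```javascript
// Hypothetical helper: build a regex that accepts `word` with any letters
// dropped (order preserved) and a total length between min and max.
function orderedSubsetRegex(word, min, max) {
  const optional = [...word]
    // Escape regex metacharacters, then make each character optional.
    .map(ch => ch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&') + '?')
    .join('');
  // The lookahead checks the order; .{min,max} checks the length.
  return new RegExp(`^(?=${optional}$).{${min},${max}}$`, 'i');
}

const re = orderedSubsetRegex('residence', 4, 9);
const words = ['ceresiden', 'denceresid', 'ence', 'res', 'resid', 'sidence', 'ience', 'rednce'];
console.log(words.filter(w => re.test(w)));
// → [ 'ence', 'resid', 'sidence', 'ience', 'rednce' ]
```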

Regexp to extract studyinstanceuid from dump

I need to capture numbers and dots between brackets on lines containing the string 0020,000d, for example:
I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ] # 38, 1 StudyInstanceUID
Using this regexp 0020,000d.*\[([\.0-9]+)\] I can match the needed value only if there is no space inside the brackets. How can I match the needed value while ignoring any other character?
Edit
If I use this regexp 0020,000d.*\[([\.0-9(\s|^\s))]+)\] I can capture numbers and dots and/or spaces. Now, if the string contains a space, how can I capture everything but the space in a group?
To clarify, I want to extract the 1.2.410.200001.1104.20160720104648421 string.
Codifying my (apparently helpful) answer from the comments:
You just need to allow zero or more spaces after the numbers-and-dots sequence before the closing bracket:
0020,000d.*\[([.0-9]+) *\]
Also, please note that you don't need to escape a dot in a character class.
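In JavaScript, that pattern could be applied along these lines. This is a sketch only: the function name extractStudyInstanceUid is made up, and \s* is used in place of a literal space so that tabs are tolerated as well.

```javascript
// Hypothetical helper: pull the UID out of a dump line tagged 0020,000d.
function extractStudyInstanceUid(line) {
  // Capture digits and dots, then allow optional whitespace before the ].
  const m = line.match(/0020,000d.*\[([.0-9]+)\s*\]/);
  return m ? m[1] : null;
}

const line = 'I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ] # 38, 1 StudyInstanceUID';
console.log(extractStudyInstanceUid(line));
// → 1.2.410.200001.1104.20160720104648421
```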
Try this
let regex = /[.\d]+(?=\s*\])/g;
let str = 'I: (0020,000d) UI [1.2.410.200001.1104.20160720104648421 ]'
let result = str.match(regex);
console.log(result);

Surrounding one group with special characters in using substitute in vim

Given string:
some_function(inputId = "select_something"),
(...)
some_other_function(inputId = "some_other_label")
I would like to arrive at:
some_function(inputId = ns("select_something")),
(...)
some_other_function(inputId = ns("some_other_label"))
The key change here is the element ns( ... ) that surrounds the string available in the "" after the inputId
Regex
So far, I have come up with this regex:
:%substitute/\(inputId\s=\s\)\(\"[a-zA-Z]"\)/\1ns(/2/cgI
However, when deployed, it produces an error:
E488: Trailing characters
A simpler version of that regex works, the syntax:
:%substitute/\(inputId\s=\s\)/\1ns(/cgI
correctly inserts ns( after finding inputId = and produces the string
some_other_function(inputId = ns("some_other_label")
Challenge
I'm struggling to match the remaining part of the string, ex. "select_something") and return it as:
"select_something")).
You have many problems with your regex.
[a-zA-Z] will only match one letter. Presumably you want to match everything up to the next ", so you'll need a \+ and you'll also need to match underscores too. I would recommend \w\+. Unless more than [a-zA-Z_] might be in the string, in which case I would do .\{-}.
You have a /2 instead of \2. This is why you're getting E488.
I would do this:
:%s/\(inputId = \)\(".\{-}"\)/\1ns(\2)/cgI
Or use the match-start atom, \zs, so that only the quoted string is replaced:
:%s/inputId = \zs\".\{-}"/ns(&)/cgI
You can use a negated character class "[^"]*" to match a quoted string:
%s/\(inputId\s*=\s*\)\("[^"]*"\)/\1ns(\2)/g

Regex to create url friendly string

I want to create a url friendly string (one that will only contain letters, numbers and hyphens) from a user input to :
remove all characters which are not a-z, 0-9, space or hyphens
replace all spaces with hyphens
replace multiple hyphens with a single hyphen
Expected outputs :
my project -> my-project
test project -> test-project
this is # long str!ng with spaces and symbo!s -> this-is-long-strng-with-spaces-and-symbos
Currently i'm doing this in 3 steps :
$identifier = preg_replace('/[^a-zA-Z0-9\-\s]+/','',strtolower($project_name)); // remove all characters which are not a-z, 0-9, space or hyphens
$identifier = preg_replace('/(\s)+/','-',strtolower($identifier)); // replace all spaces with hyphens
$identifier = preg_replace('/(\-)+/','-',strtolower($identifier)); // replace all hyphens with single hyphen
Is there a way to do this with one single regex ?
Yeah, @Jerry is correct in saying that you can't do this in one replacement as you are trying to replace a particular string with two different items (a space or dash, depending on context). I think Jerry's answer is the best way to go about this, but something else you can do is use preg_replace_callback. This allows you to evaluate an expression and act on it according to what the match was.
$string = 'my project
test project
this is # long str!ng with spaces and symbo!s';
$string = preg_replace_callback('/([^A-Z0-9]+|\s+|-+)/i', function($m){$a = '';if(preg_match('/(\s+|-+)/i', $m[1])){$a = '-';}return $a;}, $string);
print $string;
Here is what this means:
/([^A-Z0-9]+|\s+|-+)/i This looks for any one of three alternatives (a run of characters that are not letters or digits, a run of whitespace, or a run of hyphens) and, if one of them matches, passes the match along to the function for evaluation.
function($m){ ... } This is the function that will evaluate the matches. $m will hold the matches that it found.
$a = ''; Set a default of an empty string for the replacement
if(preg_match('/(\s+|-+)/i', $m[1])){$a = '-';} If our match (the value stored in $m[1]) contains whitespace or a hyphen, then set $a to a dash instead of an empty string.
return $a; Since this is a function, we will return the value and that value will be plopped into the string wherever it found a match.
I don't think there's one way of doing that, but you could reduce the number of replaces and in an extreme case, use a one liner like that:
$text=preg_replace("/[\s-]+/",'-',preg_replace("/[^a-zA-Z0-9\s-]+/",'',$text));
It first removes every character that is not alphanumeric, whitespace or a dash, then replaces each run of spaces and dashes with a single dash.
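For comparison, the same two-step idea ported to JavaScript might look like the sketch below. The function name slugify is an assumption, not something from the thread, and the input is lowercased first as in the question's PHP.

```javascript
// Hypothetical JavaScript port of the two-step replacement described above.
function slugify(input) {
  return input
    .toLowerCase()
    // Step 1: drop everything that is not a letter, digit, whitespace or hyphen.
    .replace(/[^a-z0-9\s-]+/g, '')
    // Step 2: collapse each run of whitespace and hyphens into one hyphen.
    .replace(/[\s-]+/g, '-');
}

console.log(slugify('my project'));
// → my-project
console.log(slugify('this is # long str!ng with spaces and symbo!s'));
// → this-is-long-strng-with-spaces-and-symbos
```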
Since you want to replace each thing with something different, you will have to do this in multiple iterations.
Sorry D:

RegEx for including alphanumeric and special characters

I have a requirement to allow alphanumeric and certain other characters for a field. I am using this regular expression:
"^[a-zA-Z0-9!@#$&()-`.+,/\"]*$".
The allowed special characters are ! @ # $ & ( ) - ‘ . / + , “
But when I test the pattern with a string "test_for_extended_alphanumeric" , the string passes the test. I don't have "_" allowed in the pattern. What am I doing wrong?
You need to escape the hyphen:
"^[a-zA-Z0-9!@#$&()\\-`.+,/\"]*$"
If you don't escape it then it means a range of characters, like a-z.
In your character class the )-` is interpreted as a range in the same way as e.g. a-z, so it refers to any character with a decimal ASCII code from 41 ()) to 96 (`).
Since _ has code 95, it is within the range and therefore allowed, as are <, =, > etc.
To avoid this you can either escape the -, i.e. \-, or put the - at either the start or end of the character class:
/^[a-zA-Z0-9!@#$&()`.+,/"-]*$/
There is no need to escape the ", and note that because you are using the * quantifier, an empty string will also pass the test.
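The effect of the accidental range is easy to check. The sketch below uses a cut-down version of the character class (an assumption made to keep the example short); the variable names are illustrative only.

```javascript
// Cut-down illustration: ")-`" in the middle of a class is a range (ASCII 41..96),
// which includes "_" (95). Moving "-" to the end makes it a literal hyphen.
const accidentalRange = /^[a-zA-Z0-9()-`.]*$/;
const literalHyphen = /^[a-zA-Z0-9()`.-]*$/;

const input = 'test_for_extended_alphanumeric';
console.log(accidentalRange.test(input)); // → true  ("_" falls inside the range)
console.log(literalHyphen.test(input));   // → false ("_" is no longer allowed)
console.log(literalHyphen.test('a-b'));   // → true  (the hyphen itself still passes)
```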
Using this regex you allow all alphanumeric and special characters. Here \w allows letters, digits and the underscore, and \s allows whitespace.
[><?#+'`~^%&\*\[\]\{\}.!#|\\\"$';,:;=/\(\),\-\w\s+]*
The allowed special characters are ! # # $ & ( ) - ‘ . / + , “ = { } [ ] ? / \ |
Hyphens in character classes denote a range unless they are escaped or at the start or end of the character class. If you want to include hyphens, it's typically a good idea to put them at the front so you don't even have to worry about escaping:
^[-a-zA-Z0-9!@#$&()`.+,/\"]*$
By the way, _ does indeed fall between ) and the backtick in ASCII:
http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters
How about this.. which allows special characters and as well as alpha numeric
"[-~]*$"
Because I don't know how many special characters exist, it is difficult to check for special characters with a whitelist. It may be simpler to check that the string contains only letters and digits.
Kotlin example:
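import java.util.regex.Pattern

fun String.hasOnlyAlphabetOrNumber(): Boolean {
    // find() looks for any disallowed character anywhere in the string;
    // matches() would only succeed if the whole string were one such character.
    val p = Pattern.compile("[^a-zA-Z0-9]")
    return !p.matcher(this).find()
}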
Swift 4 example:
import Foundation

extension String {
    func hasOnlyAlphabetOrNumber() -> Bool {
        if self.isEmpty { return false }
        do {
            let pattern = "[^a-zA-Z0-9]"
            let regex = try NSRegularExpression(pattern: pattern, options: .caseInsensitive)
            // NSRange counts UTF-16 code units, so use utf16.count rather than count.
            let range = NSRange(location: 0, length: self.utf16.count)
            return regex.firstMatch(in: self, options: [], range: range) == nil
        } catch {
            return false
        }
    }
}
Regex sucks. Here is mine
/^[a-zA-Z\d-!##$%^&._"'()+,/;<>=|?[]\`~{}]$/
Mine is a little different than others but it is more self explanatory. You use \ in front of any special symbol like ] or . I had issues with -, , and ] so I had to put ], \, and move the - to the left. I also had issues with | but I moved it left and it fixed it.