How to Trim a Leading and Trailing char in regular expressions? - regex

I have a requirement to trim a leading and trailing character of a fixed length column.
Ex: I have column IdNumber which is of fixed length say 11, with below values
X3343438594
7743438534X
I want to trim the leading and trailing X, and result should look like this.
3343438594
7743438534

Try this (the column is 11 characters wide, so each value is one X plus 10 digits):
Search: ^X(?=\d{10}$)|(?<=^\d{10})X$
Replace: <blank>
Regex breakdown:
^X means "start of input then X"
(?=\d{10}$) means "followed by 10 digits then end"
| means "logical OR"
(?<=^\d{10}) means "preceded by start then 10 digits"
X$ means "X then end of input"
You want to delete all matches, so replace them with nothing.
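For illustration, the same search-and-replace could be sketched in JavaScript. This is a minimal sketch that assumes each value is one X plus 10 digits (11 characters in total, as in the sample values) and a regex engine with lookbehind support; the trimX name is made up for this example.

```javascript
// Hypothetical helper (not part of the original answer).
// Assumes an 11-character value: one X plus 10 digits.
function trimX(id) {
  // Remove a leading X that is followed by exactly 10 digits,
  // or a trailing X that is preceded by exactly 10 digits.
  return id.replace(/^X(?=\d{10}$)|(?<=^\d{10})X$/, '');
}

console.log(trimX('X3343438594')); // 3343438594
console.log(trimX('7743438534X')); // 7743438534
console.log(trimX('X12345'));      // X12345 (wrong length, left unchanged)
```

Values that do not fit the fixed-length shape are returned untouched, which is usually what you want for a column clean-up.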

var re = /(?=^X|X$)(([A-Z])(\d{10})(\s)(\d{10})([A-Z]))/;
var str = 'X3343438594 7743438534X';
var subst = '$3$4$5';
var result = str.replace(re, subst);
alert(result);
The lookahead (?=^X|X$) only lets the match begin at an X at the start of the string (its second branch, X$, would need the match to begin at the final character, which the rest of the pattern rules out). The main pattern then expects one letter followed by 10 digits (11 characters in total), a whitespace character, and 10 digits followed by one letter (another 11 characters). Note that it works on both values joined by a space, as in str, and the replacement $3$4$5 keeps only the two digit runs and the whitespace between them.

Related

Regex: Last Occurrence of a Repeating Character

So, I am looking for a Regex that is able to match with every maximal non-empty substring of consonants followed by a maximal non-empty substring of vowels in a String
e.g. In the following strings, you can see all expected matches:
"zcdbadaerfe" = {"zcdba", "dae", "rfe"}
"foubsyudba" = {"fou", "bsyu", "dba"}
I am very close! This is the regex I have managed to come up with so far:
([^aeiou].*?[aeiou])+
It returns the expected matches except for it only returns the first of any repeating lengths of vowels, for example:
String: "cccaaabbee"
Expected Matches: {"cccaaa", "bbee"}
Actual Matches: {"ccca", "bbe"}
I want to figure out how I can include the last found vowel character that comes before (a) a consonant or (b) the end of the string.
Thanks! :-)
Your pattern is slightly off. I suggest using this version:
[b-df-hj-np-tv-z]+[aeiou]+
This pattern says to match:
[b-df-hj-np-tv-z]+ a lowercase non vowel, one or more times
[aeiou]+ followed by a lowercase vowel, one or more times
Here is a working demo.
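A quick way to check the suggested pattern against the strings from the question is sketched below in JavaScript. The consonantVowelRuns name is an assumption made for this example, and y is treated as a consonant, as in the question's expected output.

```javascript
// Hypothetical helper: returns every run of consonants followed by a run of vowels.
function consonantVowelRuns(text) {
  // [b-df-hj-np-tv-z]+ : one or more lowercase consonants
  // [aeiou]+           : one or more lowercase vowels
  const pattern = /[b-df-hj-np-tv-z]+[aeiou]+/g;
  // String.prototype.match returns null when nothing matches.
  return text.match(pattern) || [];
}

console.log(consonantVowelRuns('cccaaabbee'));  // [ 'cccaaa', 'bbee' ]
console.log(consonantVowelRuns('zcdbadaerfe')); // [ 'zcdba', 'dae', 'rfe' ]
console.log(consonantVowelRuns('foubsyudba'));  // [ 'fou', 'bsyu', 'dba' ]
```

Because both quantifiers are greedy, each match extends to the last vowel of a run, which is the part the original lazy .*? was missing.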
const rgx = /[^aeiou]+[aeiou]+(?=[^aeiou])|.*[aeiou](?=\b)/g;
Segment            Description
[^aeiou]+          one or more of anything BUT vowels
[aeiou]+           one or more vowels
(?=[^aeiou])       matches only if followed by anything BUT a vowel
|                  OR
.*[aeiou](?=\b)    zero or more of any character, then a vowel that must be followed by a word boundary (here, the end of the string)
function lastVowel(str) {
  const rgx = /[^aeiou]+[aeiou]+(?=[^aeiou])|.*[aeiou](?=\b)/g;
  return [...str.matchAll(rgx)].flat();
}
const str1 = "cccaaabbee";
const str2 = "zcdbadaerfe";
const str3 = "foubsyudba";
console.log(lastVowel(str1));
console.log(lastVowel(str2));
console.log(lastVowel(str3));

Regex for string *11F23H3*: Start and end with *, 7 Uppercase literals or numbers in between

I need to check strings like *11F23H3* that start and end with a * and have 7 uppercase letters or numbers in between. So far I have:
if (!barcode.match('[*A-Z0-9*]')) {
  console.error(`ERROR: Barcode not valid`);
  process.exitCode = 1;
}
But this does not reject strings like *11111111111*. What would the correct regex look like?
You can use this regex:
/\*[A-Z0-9]{7}\*/
* is a regex metacharacter that needs to be escaped outside a character class
[A-Z0-9]{7} will match exactly 7 characters, each an uppercase letter or a digit
Code:
var re = /\*[A-Z0-9]{7}\*/;
if (!re.test(barcode)) {
  console.error(`ERROR: Barcode ${barcode} in row ${row} is not valid`);
  process.exitCode = 1;
}
Note that if barcode is only going to have this string then you should also use anchors like this to avoid matching any other text on either side of *:
var re = /^\*[A-Z0-9]{7}\*$/;
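The difference the anchors make can be seen in a small sketch. The isValidBarcode name and the sample inputs other than *11F23H3* and *11111111111* are assumptions made for this example.

```javascript
// Hypothetical validator built on the anchored pattern.
const ANCHORED = /^\*[A-Z0-9]{7}\*$/;
// Unanchored pattern, kept only for comparison.
const UNANCHORED = /\*[A-Z0-9]{7}\*/;

function isValidBarcode(barcode) {
  // The whole string must be: *, exactly 7 uppercase letters or digits, *.
  return ANCHORED.test(barcode);
}

console.log(isValidBarcode('*11F23H3*'));     // true
console.log(isValidBarcode('*11111111111*')); // false (11 characters between the asterisks)
console.log(isValidBarcode('xx*11F23H3*yy')); // false
console.log(UNANCHORED.test('xx*11F23H3*yy')); // true (extra text is ignored without anchors)
```

Neither regex uses the g flag, so calling test repeatedly on the same object is safe here.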

delete the words with length greater than X in R

In R, after removing the punctuation, numbers and non-ASCII characters, I am left with many very long words:
ques1<-gsub("[[:digit:]]"," ", ques1,perl=TRUE)
ques1<-gsub("[[:punct:]]"," ", ques1,perl=TRUE)
ques1<-iconv(ques1, "latin1", "ASCII", sub=" ")
ques1<-rm_white(ques1)
ques1
I checked that the longest word has 35 characters using
max(nchar(strsplit(ques1, " ")[[1]]))
[1] 35
Now I want to remove the words that have more than 10 characters, since I don't need them, such as
wwwhotmailcomlearnbyexample
Please help me out !!!
Use the following gsub:
ques1 = "A long sentence with long wwwhotmailcomlearnbyexample"
gsub("\\b[[:alpha:]]{11,}\\b", "", ques1, perl=TRUE)
The \\b[[:alpha:]]{11,}\\b regex will match words with length of 11 or more (\\b is a word boundary and [:alpha:] stands for any letter).
See IDEONE demo

How to format a string to replace all existing number inside a string to prefix with leading zero using regex

Does anyone know how to use a regex to prefix every digit inside a string with leading zeros?
E.g. ABC123 -> ABC000100020003
BCD02 -> BCD00000002
CD1A2 -> CD0001A0002
i.e. each occurrence of a digit is prefixed with leading zeros (4 digits in total for each occurrence).
Other characters to remain the same.
search /(\d)/g
and replace with 000\1
will do it.
demo here : http://regex101.com/r/aB8iE9
javascript demo here:
var str = "ABC123";
var res = str.replace(/(\d)/g, '000$1');
console.log(res);
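If the padding width might change, a replacer function with padStart is one possible variation. This is only a sketch: the padDigits name and the width parameter are assumptions made for this example, not part of the answer above.

```javascript
// Hypothetical helper: pads every single digit to `width` characters with zeros.
function padDigits(text, width = 4) {
  // The replacer is called once per matched digit.
  return text.replace(/\d/g, (digit) => digit.padStart(width, '0'));
}

console.log(padDigits('ABC123')); // ABC000100020003
console.log(padDigits('BCD02'));  // BCD00000002
console.log(padDigits('CD1A2'));  // CD0001A0002
```

With the default width of 4 this produces the same result as replacing each digit with 000$1.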

Regex: mask all but the last 5 digits, ignoring non-digits

I want to match a number containing 17-23 digits interspersed with spaces or hyphens, then replace all but the last five digits with asterisks. I can match with the following regex:
((?:(?:\d)([\s-]*)){12,18})(\d[\s-]*){5}
My problem is that I can't get the regex to group all instances of [\s-] in the first section, and I have no idea how to get it to replace the initial 12-18 digits with asterisks (*).
How about this:
s/\d(?=(?:[ -]*\d){5,22}(?![ -]*\d))/*/g
The positive lookahead ensures that there are at least 5 digits ahead of the just-matched digit, while the embedded negative lookahead ensures that there aren't more than 22.
However, there could still be more digits before the first-matched digit. That is, if there are 24 or more digits, this regex only operates on the last 23 of them. I don't know if that's a problem for you.
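The behaviour described above can be tried out with a JavaScript sketch of the same substitution. The maskLeadingDigits name and the sample numbers are assumptions made for this example; note that the pattern itself does not enforce the 17-digit minimum from the question.

```javascript
// Hypothetical helper wrapping the substitution from the answer.
function maskLeadingDigits(text) {
  // A digit is replaced only if 5 to 22 more digits follow it,
  // with optional spaces or hyphens in between.
  return text.replace(/\d(?=(?:[ -]*\d){5,22}(?![ -]*\d))/g, '*');
}

console.log(maskLeadingDigits('1234-5678-9012-34567')); // ****-****-****-34567
console.log(maskLeadingDigits('12345'));                // 12345

// With 24 digits, the first digit is left alone and only the last 23 are processed.
console.log(maskLeadingDigits('012345678901234567890123'));
```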
Even assuming that this is feasible with regex alone, I'd bet that it would be way slower than using the non-capturing version of your regex and then iterating over the match in reverse, leaving the first 5 digits alone and replacing the rest of them with '*'.
I think your regex is ok, but you might need to have a callback where you can insert the asterisks with another inline regex. The below is a Perl example.
s/((?:\d[\s-]*){12,18})((?:\d[\s-]*){4}\d)/ add_asterisks($1,$2) /xeg
use strict;
use warnings;
my $str = 'sequence of digits 01-2 3-456-7-190 123-416 78 ';
if ($str =~ s/((?:\d[\s-]*){12,18})((?:\d[\s-]*){4}\d)/ add_asterisks($1,$2) /xeg )
{
    print "New string: '$str'\n";
}
# Mask every digit in the leading part; the last five digits are returned as they are.
sub add_asterisks {
    my ($pre,$post) = @_;
    $pre =~ s/\d/*/g;
    return $pre . $post;
}
__END__
Output
New string: 'sequence of digits **-* *-***-*-*** ***-416 78 '
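The same callback idea can be sketched in JavaScript, where String.prototype.replace accepts a function as the replacement. The maskWithCallback name is an assumption made for this example; the regex and the input string are taken from the Perl code.

```javascript
// Hypothetical helper: masks the leading 12-18 digits and keeps the last 5.
function maskWithCallback(text) {
  const re = /((?:\d[\s-]*){12,18})((?:\d[\s-]*){4}\d)/g;
  // pre  = leading digits with their separators (group 1)
  // post = final five digits with their separators (group 2)
  return text.replace(re, (match, pre, post) => pre.replace(/\d/g, '*') + post);
}

const input = 'sequence of digits 01-2 3-456-7-190 123-416 78 ';
// Prints the same masked string as the Perl output above.
console.log(`New string: '${maskWithCallback(input)}'`);
```

Separators are preserved because only the digits inside the first group are rewritten.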
Here is a Java variant of Alan Moore's answer that uses word characters \w ([a-zA-Z0-9_]) instead of just digits \d. Note that it leaves the last four word characters unmasked.
It also works on strings of any length.
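import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String maskNumber(String number) {
    // Match a word character that still has at least 4 word characters after it.
    String regex = "\\w(?=(?:\\W*\\w){4,}(?!\\W*\\w))";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(number);
    // Replace every match in one pass rather than calling replaceFirst per match.
    return m.replaceAll("*");
}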
This example
String[] numbers = {
    "F4546-6565-55654-5457",
    "F4546-6565-55654-54-D57",
    "F4546-6565-55654-54-D;5.7",
    "F4546-6565-55654-54-g5.37",
    "hd6g83g.duj7*ndjd.(njdhg75){7dh i8}",
    "####.####.####.675D-45",
    "****.****.****.675D-45",
    "**",
    "12"
};
for (String number : numbers) {
    System.out.println(maskNumber(number));
}
Gives:
*****-****-*****-5457
*****-****-*****-*4-D57
*****-****-*****-*4-D;5.7
*****-****-*****-**-g5.37
*******.*********.(*******){*dh i8}
####.####.####.**5D-45
****.****.****.**5D-45
**
12
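For comparison, the word-character variant could be sketched in JavaScript with the number of unmasked characters as a parameter. The maskAllButLast name and the keep parameter are assumptions made for this example. The trailing negative lookahead is left out because, as far as I can tell, the open-ended quantifier makes it redundant.

```javascript
// Hypothetical helper: masks every word character except the last `keep` ones.
function maskAllButLast(text, keep = 4) {
  // A word character is masked when at least `keep` word characters follow it,
  // with any number of non-word characters in between.
  const re = new RegExp(`\\w(?=(?:\\W*\\w){${keep},})`, 'g');
  return text.replace(re, '*');
}

console.log(maskAllButLast('F4546-6565-55654-5457'));     // *****-****-*****-5457
console.log(maskAllButLast('F4546-6565-55654-54-D;5.7')); // *****-****-*****-*4-D;5.7
console.log(maskAllButLast('12'));                        // 12
console.log(maskAllButLast('F4546-6565-55654-5457', 5));  // *****-****-****4-5457
```

Passing 5 as the second argument gives the "all but the last five" behaviour asked for in the question.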