Need a regex to capture numbered citations [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 5 years ago.
Bit of a regex newbie... sorry. I have a document with IEEE style citations, or numbers in brackets. They can be one number, as in [23], or several, as in [5, 7, 14], or a range, as in [12-15].
What I have now is [\[|\s|-]([0-9]{1,3})[\]|,|-].
This is capturing single numbers, and the first number in a group, but not subsequent numbers or either number in a range.
Then I need to refer to that number in an expression like \1.
I hope this is clear! I have a suspicion I don't understand the OR operator.
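The pattern above points to a likely culprit. Inside square brackets, | is not the OR operator but an ordinary character. So [\[|\s|-] is a character class that matches '[', '|', whitespace or '-'. The small JavaScript check below shows the difference (the variable names are only for illustration):

```javascript
// Inside a character class, "|" is a literal character, not alternation.
const asClass = /[\[|\s|-]/;
console.log(asClass.test("|")); // true: the pipe itself is a member of the class
console.log(asClass.test("[")); // true
console.log(asClass.test(",")); // false

// Alternation only means OR outside brackets, e.g. in a non-capturing group.
const asAlternation = /(?:\[|\s|-)/;
console.log(asAlternation.test("|")); // false
console.log(asAlternation.test("-")); // true
```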

How about this?
(\[\d+\]|\[\d+-\d+\]|\[\d+(,\d+)*\])
Actually, this can be simplified even further to: (\[\d+-\d+\]|\[\d+(,\d+)*\])
use strict;
use warnings;
my @test = (
    "[5,7,14]",
    "[23]",
    "[12-15]"
);
foreach my $val (@test) {
    if ($val =~ /(\[\d+-\d+\]|\[\d+(,\d+)*\])/) {
        print "match $val!\n";
    }
    else {
        print "no match!\n";
    }
}
This prints:
match [5,7,14]!
match [23]!
match [12-15]!
Whitespace is not taken into account, but you can add it if you need to.
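The pattern above does not allow whitespace. One way to add it is to put \s* around the numbers, commas and hyphens. The JavaScript sketch below does that, assuming spaces may appear at any of those points. Its second step pulls the individual numbers out of each citation.

```javascript
// Jim's pattern, with optional whitespace (\s*) allowed inside the brackets.
const citation = /\[\s*\d+\s*-\s*\d+\s*\]|\[\s*\d+(?:\s*,\s*\d+)*\s*\]/g;
const text = "See [5, 7, 14], [23] and [12 - 15].";

const found = text.match(citation); // with the g flag: every full match
console.log(found); // [ '[5, 7, 14]', '[23]', '[12 - 15]' ]

// Pull the individual numbers out of each citation.
const numbers = found.map((c) => c.match(/\d+/g).map(Number));
console.log(numbers); // [ [ 5, 7, 14 ], [ 23 ], [ 12, 15 ] ]
```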

I think Jim's answer is helpful, but here is a generalization and some code for better understanding.
If the question were looking for a more complex but still plausible form like [1,3-5]:
(\[\d+(,\s?\d+|\d*-\d+)*\])
(the \s? allows an optional space after ',')
It validates:
[3,33-24,7]
[3-34]
[1,3-5]
[1]
[1, 2]
JavaScript code for replacing digits by links:
// define the input string:
var mytext = "[3,33-24,7]\n[3-34]\n[1,3-5]\n[1]\n[1, 2]";
// replace every matched [..] citation; the callback then replaces the numbers inside it
var newtext = mytext.replace(/(\[\d+(,\s?\d+|\d*-\d+)*\])/g,
  function (ci) { // ci is the matched citation `[..]`
    console.log(ci);
    // replace each number in `[..]` with a custom link
    return ci.replace(/\d+/g,
      function (digit) {
        return ''+digit+'' ;
      });
  });
console.log(newtext);
/*output:
'[3,33-24,7]
[3-34]
[1,3-5]
[1]
[1, 2]'
*/
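The same two-step replacement can be written more compactly with arrow functions. In this sketch the link format <a href="#ref-N">N</a> is a hypothetical choice; substitute whatever markup your document needs.

```javascript
// Hypothetical link format: each number N becomes <a href="#ref-N">N</a>.
function linkCitations(text) {
  // Same citation pattern as above, with non-capturing groups.
  return text.replace(/\[\d+(?:,\s?\d+|\d*-\d+)*\]/g, (citation) =>
    // Inside one citation, wrap every number in its own link.
    citation.replace(/\d+/g, (n) => `<a href="#ref-${n}">${n}</a>`)
  );
}

console.log(linkCitations("as shown in [1,3-5]"));
// as shown in [<a href="#ref-1">1</a>,<a href="#ref-3">3</a>-<a href="#ref-5">5</a>]
```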

Related

Issue with REGEX in javascript [duplicate]

This question already has answers here:
How to tell if a string contains a certain character in JavaScript?
(21 answers)
Closed 4 years ago.
I have an issue with a regex. It is probably my regex that is wrong, but I need some help.
I need to match every string containing "D"...
Test string 1 : D
Test string 2 : aaaaaaDqqqqq
Test string 3 : Dssssssss
Test string 4 : D4564646
Test string 5 : 1321313D2312
Test string 6 : ppppprrrrrr
My regex:
/^.+D.+|(:?^|\s)D$/gi
It works only for 1 and 2, but it should work for 1, 2, 3, 4 and 5.
In your case, the problem is the + quantifier, which matches between one and unlimited times, so it won't work if the letter "D" is at the beginning or the end of the string. Try this regex with an asterisk instead: ^.*D.*$, since * matches between zero and unlimited times.
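The difference between + and * can be checked against the question's six test strings. The g and i flags are left out here on purpose. With g, test() remembers lastIndex between calls on the same regex object, which would muddy the comparison.

```javascript
const samples = ["D", "aaaaaaDqqqqq", "Dssssssss", "D4564646", "1321313D2312", "ppppprrrrrr"];
const plus = /^.+D.+$/; // needs at least one character on each side of "D"
const star = /^.*D.*$/; // allows zero characters on either side of "D"

const plusResults = samples.map((s) => plus.test(s));
const starResults = samples.map((s) => star.test(s));
console.log(plusResults); // [ false, true, false, false, true, false ]
console.log(starResults); // [ true, true, true, true, true, false ]
```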
The following regex should work for you:
.*D.*
If all you need to do is test whether a string contains a D character, it's just /D/:
var tests = [
"D",
"aaaaaaDqqqqq",
"Dssssssss",
"D4564646",
"1321313D2312",
"ppppprrrrrr"
]
tests.forEach(str => console.log(/D/.test(str)))
Instead of using a regex, simply use the includes method:
var string = "aaaaaaDqqqqq",
substring = "D";
if(string.includes(substring)){
console.log("contain")
}else{
console.log("don't contain")
}

Java 8 Matcher says there are matches but groupCount is 0 [duplicate]

This question already has answers here:
Java RegEx Matcher.groupCount returns 0
(4 answers)
Closed 5 years ago.
Java 8 here. I have the following function:
public String extractArgs(String function) {
Pattern inRegex = Pattern.compile("in\\(.*\\)");
Matcher inMatch = inRegex.matcher(function);
log("num in(...) function matches: " + inMatch.groupCount() + " but does inMatch.matches()? " + inMatch.matches());
if(inMatch.groupCount() > 0) {
return inMatch.group(1);
} else {
return "";
}
}
When I pass it "in(hello)" as an argument, I get the following output:
num in(...) function matches: 0 but does inMatch.matches()? true
My understanding is that if inMatch.matches() returns true (which it does), I should have at least one match group (inMatch.groupCount() > 0).
I'm trying to compare the inputted arg string against the regex and (if there is a match) obtain the blurb of text that is contained inside the "in(...) function". Hence if I call extractArgs("in(hello)") then it should return the string "hello". Where am I going awry?!
There are no capture groups in your original regex. To introduce a capture group (which you want), you should define the regular expression along the lines of:
in\\((.*)\\)
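Note that I've added parentheses around where the arguments are supposed to go. This creates a capture group around .*. Also note that groupCount() reports how many capturing groups the pattern defines, not how many matches were found, which is why your original code printed 0. With the new pattern you can read the group after a successful match:
Matcher m = pattern.matcher("in(hello, World!)");
String args = m.matches() ? m.group(1) : ""; // hello, World!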
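JavaScript shows the same distinction. The array returned by exec() holds the whole match, followed by one entry per capture group. The sketch below reuses the pattern from this answer, with and without the added parentheses.

```javascript
// Without parentheses the pattern has no capture group: exec() returns only the whole match.
const noGroup = /in\(.*\)/.exec("in(hello)");
// With parentheses around .*, element [1] holds the captured arguments.
const withGroup = /in\((.*)\)/.exec("in(hello)");

console.log(noGroup.length - 1);   // 0 capture groups
console.log(withGroup.length - 1); // 1 capture group
console.log(withGroup[1]);         // hello
```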

Regular expression exact part of letters match in middle of the word including Dot[.] [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I am looking for a regular expression that matches exact letters in the middle of a word, including a dot (.).
Currently the regular expression I am using is "\\w*."+inputString +"\\w*", "i", but the period acts as a wildcard that matches any character in this expression.
E.g.:
inputData = {name: ['abc.12', 'abcdef', 'bc1454', 'test', 'rahul', 'bc.reju', 'rewbc.']}
inputString = "bc."
var wordFormat = new RegExp('\\w*'+inputString +'\\w*', 'i');
wordFormat.test(inputData);
Scenario 1: start of word.
Input: 'bc.'
Actual output: abc.12
Expected output: abc.12, bc.reju, rewbc.
The expected output should contain only one item, because the passed inputString matches only one item in the array of the object (inputData), so the expected number of output items is 1.
You can adapt the following regex by replacing "bc" with your search string:
\w*bc\w*\.\d*
Updated expression:
\w*bc\.\d*
Here is an example you can adapt to your requirement:
var regular = "bc."; // the actual input expression you want to test
var regularplaceholder = '[character]';
var regularexpression = "";
for (var index = 0; index < regular.length; index++)
{
    // wrap each character in [ ] so that '.' is matched literally
    regularexpression += regularplaceholder.replace('character', regular[index]);
}
regularexpression += '*';
if (new RegExp(regularexpression).test('test string')) {
    //your logic here
}
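The character-class loop above is one way to stop '.' from acting as a wildcard. Another approach is to escape every regex metacharacter in the input, then filter the question's sample items (written here as a plain array) for a literal, case-insensitive match. The helper names escapeRegExp and findContaining are hypothetical.

```javascript
// Escape every regex metacharacter so the search text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the items that contain `search` literally (case-insensitive).
function findContaining(items, search) {
  const re = new RegExp(escapeRegExp(search), "i"); // no g flag, so test() keeps no state
  return items.filter((item) => re.test(item));
}

const names = ["abc.12", "abcdef", "bc1454", "test", "rahul", "bc.reju", "rewbc."];
console.log(findContaining(names, "bc.")); // [ 'abc.12', 'bc.reju', 'rewbc.' ]
```

The \w* wrappers from the question are not needed here, because test() only has to find the text somewhere in each item.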

Regex permutations without repetition [duplicate]

This question already has answers here:
How to find all permutations of a given word in a given text?
(6 answers)
Closed 7 years ago.
I need a regex to check whether I can find an expression in a string.
For the string "abc" I would like to match the first occurrence of any of its permutations without repetition, in this case 6: abc, acb, bac, bca, cab, cba.
For example, in the string "adesfecabefgswaswabdcbaes" it would find a match at position 7.
I'd also need the same for a string with a repeated character, like "abbc". There are 12 cases for this: acbb, abcb, abbc, cabb, cbab, cbba, bacb, babc, bcab, bcba, bbac, bbca
For example, in the string "adbbcacssesfecabefgswaswabdcbaes" it would find a match at position 3.
I would also like to know how that would work for similar cases.
EDIT
I'm not looking for the combinations of the permutations, no. I already have those. What I'm looking for is a way to check if any of those permutations is in a given string.
EDIT 2
I think this regex covers my first question:
([abc])(?!\1)([abc])(?!\2|\1)[abc]
It can find all 6 permutations of "abc" in any sequence of characters.
Now I need to do the same when there is a repeated character, as in abbc (12 permutations).
([abc])(?!\1)([abc])(?!\2|\1)[abc]
You can use this without the g flag to get the position. See the demo; the position of the first group is what you want.
https://regex101.com/r/nS2lT4/41
https://regex101.com/r/nS2lT4/42
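With the answer's regex and no g flag, exec() returns the first match. Its index property is the position the question asks for, counted from 0. A minimal check in JavaScript:

```javascript
// The answer's pattern, used without the g flag.
const abcPermutation = /([abc])(?!\1)([abc])(?!\2|\1)[abc]/;
const firstMatch = abcPermutation.exec("adesfecabefgswaswabdcbaes");
console.log(firstMatch.index); // 6 (0-based), i.e. position 7 counting from 1
console.log(firstMatch[0]);    // cab
```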
The only reason you might "need a regex" is if you are working with a library or tool which only permits specifying certain kinds of rules with a regex. For instance, some editors can be customized to color certain syntactic constructs in a particular way, and they only allow those constructs to be specified as regular expressions.
Otherwise, you don't "need a regex", you "need a program". Here's one:
// are two arrays equal?
function array_equal(a1, a2) {
return a1.every(function(chr, i) { return chr === a2[i]; });
}
// are two strings permutations of each other?
function is_permutation(s1, s2) {
return array_equal(s1.split('').sort(), s2.split('').sort());
}
// make a function which finds permutations in a string
function make_permutation_finder(chars) {
var len = chars.length;
return function(str) {
for (var i = 0; i <= str.length - len; i++) { // <= so the last window is checked too
if (is_permutation(chars, str.slice(i, i+len))) return i;
}
return -1;
};
}
> finder = make_permutation_finder("abc");
> console.log(finder("adesfecabefgswaswabdcbaes"));
< 6
Regexps are far from being powerful enough to do this kind of thing.
However, there is an alternative, which is to precompute the permutations and build a dynamic regexp to find them. You did not provide a language tag, but here's an example in JS. Assuming you have the permutations in an array and don't have to worry about escaping special regexp characters, that's just
var regexp = new RegExp(permutations.join('|'));
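The precompute-and-join idea can be filled out end to end. The sketch below generates the distinct permutations (a Set drops the duplicates caused by repeated letters), escapes each one, joins them with |, and uses search() to get the 0-based position of the first match. The function names are hypothetical. The number of permutations grows factorially with the input length, so this only suits short strings.

```javascript
// Build every distinct permutation of a string; the Set drops duplicates such as the two b's in "abbc".
function distinctPermutations(s) {
  if (s.length <= 1) return [s];
  const out = new Set();
  for (let i = 0; i < s.length; i++) {
    const rest = s.slice(0, i) + s.slice(i + 1);
    for (const p of distinctPermutations(rest)) out.add(s[i] + p);
  }
  return [...out];
}

// Escape regex metacharacters so arbitrary input characters are matched literally.
const quoteForRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// 0-based index of the first occurrence of any permutation of `chars` in `text`, or -1.
function findPermutation(chars, text) {
  const re = new RegExp(distinctPermutations(chars).map(quoteForRegExp).join("|"));
  return text.search(re);
}

console.log(distinctPermutations("abbc").length);                         // 12
console.log(findPermutation("abc", "adesfecabefgswaswabdcbaes"));         // 6
console.log(findPermutation("abbc", "adbbcacssesfecabefgswaswabdcbaes")); // 2
```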

Regex: mask all but the last 5 digits, ignoring non-digits

I want to match a number containing 17-23 digits interspersed with spaces or hyphens, then replace all but the last five digits with asterisks. I can match with the following regex:
((?:(?:\d)([\s-]*)){12,18})(\d[\s-]*){5}
My problem is that I can't get the regex to group all instances of [\s-] in the first section, and I have no idea how to get it to replace the initial 12-18 digits with asterisks (*).
How about this:
s/\d(?=(?:[ -]*\d){5,22}(?![ -]*\d))/*/g
The positive lookahead ensures that there are at least 5 digits ahead of the just-matched digit, while the embedded negative lookahead ensures that there aren't more than 22.
However, there could still be more digits before the first-matched digit. That is, if there are 24 or more digits, this regex only operates on the last 23 of them. I don't know if that's a problem for you.
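The same substitution in JavaScript is a single replace() with the g flag. The second call illustrates the caveat above. With 25 digits, the first two are neither masked nor counted.

```javascript
// Alan Moore's pattern: a digit is masked when 5 to 22 more digits follow it,
// with only spaces or hyphens in between and no further digit after those.
const maskPattern = /\d(?=(?:[ -]*\d){5,22}(?![ -]*\d))/g;

console.log("1234-5678-9012-3456-7".replace(maskPattern, "*")); // ****-****-****-3456-7
// With 25 digits the first two are left alone, as described above.
console.log("1".repeat(25).replace(maskPattern, "*"));
```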
Even assuming that this is feasible with regex alone, I'd bet that it would be way slower than using the non-capturing version of your regex and then iterating over the match in reverse, leaving the first 5 digits encountered alone and replacing the rest of them with '*'.
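The reverse-iteration idea might look like this in JavaScript. It assumes the input is the number already found by the matching regex, so it does not re-check the 17-23 digit rule. The function name and the `keep` parameter are hypothetical.

```javascript
// Walk the string from the end, keep the first `keep` digits seen, and star out the rest.
function maskAllButLastDigits(str, keep = 5) {
  const chars = str.split("");
  let seen = 0;
  for (let i = chars.length - 1; i >= 0; i--) {
    if (chars[i] >= "0" && chars[i] <= "9") {
      seen++;
      if (seen > keep) chars[i] = "*"; // not one of the last `keep` digits
    }
  }
  return chars.join("");
}

console.log(maskAllButLastDigits("1234-5678-9012-3456-7")); // ****-****-****-3456-7
```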
I think your regex is OK, but you might need a callback in which you insert the asterisks with another inline regex. The following is a Perl example.
s/((?:\d[\s-]*){12,18})((?:\d[\s-]*){4}\d)/ add_asterisks($1,$2) /xeg
use strict;
use warnings;
my $str = 'sequence of digits 01-2 3-456-7-190 123-416 78 ';
if ($str =~ s/((?:\d[\s-]*){12,18})((?:\d[\s-]*){4}\d)/ add_asterisks($1,$2) /xeg )
{
print "New string: '$str'\n";
}
sub add_asterisks {
my ($pre, $post) = @_;
$pre =~ s/\d/*/g;
return $pre . $post;
}
__END__
Output
New string: 'sequence of digits **-* *-***-*-*** ***-416 78 '
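JavaScript supports the same callback approach: String.prototype.replace passes the capture groups to the replacement function. The sketch below reuses the Perl answer's pattern and input string.

```javascript
// Same pattern as the Perl answer; the callback receives the two capture groups.
const masked = "sequence of digits 01-2 3-456-7-190 123-416 78 ".replace(
  /((?:\d[\s-]*){12,18})((?:\d[\s-]*){4}\d)/g,
  (whole, pre, post) => pre.replace(/\d/g, "*") + post // mask only the leading digits
);
console.log(`New string: '${masked}'`);
// New string: 'sequence of digits **-* *-***-*-*** ***-416 78 '
```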
Here is a Java variant of Alan Moore's answer that uses all word characters (\w, i.e. [a-zA-Z0-9_]) instead of just digits (\d), and keeps the last four word characters.
It also works with strings of any length.
public String maskNumber(String number){
    // Mask each word character that is followed by at least 4 more word characters
    // (non-word characters in between are skipped by \W*).
    String regex = "\\w(?=(?:\\W*\\w){4,}(?!\\W*\\w))";
    // The lookahead is zero-width, so one replaceAll masks every matching character.
    return number.replaceAll(regex, "*");
}
This example:
String[] numbers = {
"F4546-6565-55654-5457",
"F4546-6565-55654-54-D57",
"F4546-6565-55654-54-D;5.7",
"F4546-6565-55654-54-g5.37",
"hd6g83g.duj7*ndjd.(njdhg75){7dh i8}",
"####.####.####.675D-45",
"****.****.****.675D-45",
"**",
"12"
};
for (String number : numbers){
System.out.println(maskNumber(number));
}
Gives:
*****-****-*****-5457
*****-****-*****-*4-D57
*****-****-*****-*4-D;5.7
*****-****-*****-**-g5.37
*******.*********.(*******){*dh i8}
####.####.####.**5D-45
****.****.****.**5D-45
**
12