Regex find all first unique occurrences of character in a string [closed] - regex

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have the following string:
1,2,3,a,b,c,a,b,c,1,2,3,c,b,a,2,3,1,
I would like to get only the first occurrence of each character, without changing the order. This would be:
1,2,3,a,b,c,
With this regex (found at https://stackoverflow.com/a/29480898/9307482) I can find the unique characters, but it keeps only the last occurrence of each, which changes the order.
(\w)(?!.*?\1) (https://regex101.com/r/3fqpu9/1)
It doesn't matter if the regex ignores the comma. The order is important.

Regular expressions are not meant for that purpose. You will need to use an index filter or a Set on the array of characters.
Since you didn't specify a language, I assume you are using JavaScript.
Example modified from: https://stackoverflow.com/a/14438954/1456201
String.prototype.uniqueChars = function() {
  return [...new Set(this)];
}
var unique = "1,2,3,a,b,c,a,b,c,1,2,3,c,b,a,2,3,1,".split(",").join('').uniqueChars();
console.log(unique); // Array(6) [ "1", "2", "3", "a", "b", "c" ]
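The index-filter variant mentioned above can be sketched as follows. This is only an illustration: `firstOccurrences` is a hypothetical helper name, and the sketch assumes the items are comma-separated as in the question.

```javascript
// Keep only the first occurrence of each item, preserving the original order.
// `firstOccurrences` is a hypothetical helper, not part of any library.
function firstOccurrences(input) {
  return input
    .split(',')
    .filter(item => item !== '') // drop the empty piece after the trailing comma
    .filter((item, index, all) => all.indexOf(item) === index);
}

console.log(firstOccurrences('1,2,3,a,b,c,a,b,c,1,2,3,c,b,a,2,3,1,').join(','));
// 1,2,3,a,b,c
```

An item is kept only when its position equals the position of its first appearance, so later duplicates are dropped without reordering anything.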

I would use something like this:
// each index represents one digit: 0-9
const digits = new Array(10);
// turn the string into an array of characters
const arr = '123abcabc123cba231'.split('');
// matches a single digit
const reg = /^[0-9]$/;
arr.forEach((val, index) => {
  // record the index only the first time a digit is seen
  if (reg.test(val) && digits[val] === undefined) {
    digits[val] = index;
  }
});
console.log(`occurrences: ${digits}`); // occurrences: ,0,1,2,,,,,,
To interpret the digits array: index 0 is empty, so the string contains no zero. Index 1 holds 0, so the first "1" appears at the first character of the string (array index zero). The first "2" appears at index 1, and so on.

A Perl way to do the job:
use Modern::Perl;
my $in = '4,d,e,1,2,3,4,a,b,c,d,e,f,a,b,c,1,2,3,c,b,a,2,3,1,';
my (%h, @r);
for (split ',', $in) {
    push @r, $_ unless exists $h{$_};
    $h{$_} = 1;
}
say join ',', @r;
Output:
4,d,e,1,2,3,a,b,c,f

Related

Get the number between two characters - Typescript [duplicate]

This question already has answers here:
RegExp in TypeScript
(5 answers)
Closed 2 years ago.
I am new to Typescript and trying to make a webhook in my Google Cloud Functions.
I have a string: C1234567890A460450P10TS1596575969702
I want to use regex to extract the number 1234567890 from that string.
The first character C is fixed and does not change; the character A after the number is variable and can be any other letter.
The regex that matches the number is (?<=C)(\d{10})(?=\w).
I want to know how to execute this regex in TypeScript so that I can get the number into a variable (e.g. const number = [the number extracted from the string] // value 1234567890).
Edit 1:
Based on the provided suggestions (which I had tried already before posting this question), here is the code I could make out of it:
const string = request.body.string;
let regxp = new RegExp('(?<=C)(\d{10})(?=\w)');
const number = regxp.exec(string);
response.send(number);
This gives a blank response.
There are two problems: you never parsed the returned string to a number with parseInt, and (?<=C) (positive lookbehind) is not always supported.
Also, your regular expression can be simplified to ^C\d{10}, followed by a .slice(1) to remove the C.
const string: string = request.body.string;
const matches = string.match(/^C\d{10}/);
let number: number;
if (matches !== null) {
  number = parseInt(matches[0].slice(1), 10);
} else {
  response.status(400).end(); // Assuming this is Express
  return;
}
// Send as text: Express 4 treats a bare number passed to send() as a status code
response.send(String(number)); // "1234567890"
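One further cause of the blank response is worth noting: inside an ordinary string literal, \d and \w lose their backslashes, so new RegExp('(?<=C)(\d{10})(?=\w)') ends up searching for literal "d" and "w" characters. The sketch below is plain JavaScript (which is also valid TypeScript) and assumes the input always starts with C; `extractNumber` is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the ten digits after the leading "C", or null.
function extractNumber(input) {
  // A regex literal keeps \d intact; with new RegExp('...') it must be written '\\d'.
  const match = /^C(\d{10})/.exec(input);
  return match === null ? null : Number(match[1]);
}

console.log(extractNumber('C1234567890A460450P10TS1596575969702')); // 1234567890
console.log(extractNumber('no match here'));                         // null
console.log(new RegExp('\\d{10}').source === /\d{10}/.source);       // true
```

Using a capture group instead of a lookbehind also avoids the compatibility concern raised above.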

Regular expression exact part of letters match in middle of the word including Dot[.] [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 5 years ago.
I am looking for a regular expression that matches an exact sequence of letters in the middle of a word, where the sequence may include a dot (.).
The regular expression I am currently using is "\\w*."+inputString +"\\w*", "i", but the period in this expression matches any character.
e.g.:
inputData = {name:[abc.12, abcdef, bc1454, test, rahul, bc.reju, rewbc.]}
inputString = "bc."
var wordFormat = new RegExp('\\w*'+inputString +'\\w*', 'i');
wordFormat.test(inputData);
Scenario 1: start of word.
input: 'bc.'
actual output: abc.12
expected output: abc.12, bc.reju, rewbc.
The expected output should contain only one item, because the inputString passed in matches only one item in the array of objects (inputData).
Here is a demo - Regex101
You can modify this by replacing the "bc" with your search string.
\w*bc\w*\.\d*
Updated expression:
\w*bc\.\d*
Here is an example that you can adapt to your requirements:
var regular = "bc."; // the literal input you want to search for
var regularPlaceholder = '[character]';
var pattern = "";
for (var index = 0; index < regular.length; index++) {
  // wrap each character in a character class so that "." is matched literally
  pattern += regularPlaceholder.replace('character', regular[index]);
}
var regularExpression = new RegExp(pattern, 'i'); // here: /[b][c][.]/i
if (regularExpression.test('test string')) {
  // your logic here
}
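To show the idea end to end, the sketch below escapes the search text so that "." is matched literally and then filters the array from the question. `escapeRegExp` and `findMatches` are hypothetical helper names, and the array is assumed to hold plain strings.

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the items that contain `search` literally, ignoring case.
function findMatches(items, search) {
  const pattern = new RegExp(escapeRegExp(search), 'i');
  return items.filter(item => pattern.test(item));
}

const names = ['abc.12', 'abcdef', 'bc1454', 'test', 'rahul', 'bc.reju', 'rewbc.'];
console.log(findMatches(names, 'bc.').join(', ')); // abc.12, bc.reju, rewbc.
```

Escaping with a backslash handles characters such as "]" and "\" that cannot simply be wrapped in a character class.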

Need a regex to capture numbered citations [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 5 years ago.
Bit of a regex newbie... sorry. I have a document with IEEE style citations, or numbers in brackets. They can be one number, as in [23], or several, as in [5, 7, 14], or a range, as in [12-15].
What I have now is [\[|\s|-]([0-9]{1,3})[\]|,|-].
This is capturing single numbers, and the first number in a group, but not subsequent numbers or either number in a range.
Then I need to refer to that number in an expression like \1.
I hope this is clear! I have a suspicion I don't understand the OR operator.
How about this?
(\[\d+\]|\[\d+-\d+\]|\[\d+(,\d+)*\])
Actually, this can be simplified even further to: (\[\d+-\d+\]|\[\d+(,\d+)*\])
my @test = (
    "[5,7,14]",
    "[23]",
    "[12-15]"
);
foreach my $val (@test) {
    if ($val =~ /(\[\d+-\d+\]|\[\d+(,\d+)*\])/ ) {
        print "match $val!\n";
    }
    else {
        print "no match!\n";
    }
}
This prints:
match [5,7,14]!
match [23]!
match [12-15]!
Whitespace is not taken into account, but you can add it if you need to.
I think Jim's answer is helpful, but here is a more general version with some code for better understanding.
If the question calls for more complex but possible citations such as [1,3-5]:
(\[\d+(,\s?\d+|\d*-\d+)*\])
        ^^^ optional space after ','
//validates:
[3,33-24,7]
[3-34]
[1,3-5]
[1]
[1, 2]
Demo for this Regex
JavaScript code for replacing digits by links:
// define the input string:
var mytext = "[3,33-24,7]\n[3-34]\n[1,3-5]\n[1]\n[1, 2]";
// replace each matching [..]; the callback then replaces the digits inside it
var newtext = mytext.replace(/(\[\d+(,\s?\d+|\d*-\d+)*\])/g,
  function(ci) { // ci is the matched citation `[..]`
    console.log(ci);
    // replace each number in `[..]` with a custom link
    return ci.replace(/\d+/g,
      function(digit) {
        return '' + digit + '';
      });
  });
console.log(newtext);
/*output:
'[3,33-24,7]
[3-34]
[1,3-5]
[1]
[1, 2]'
*/
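If the goal is to collect the cited numbers rather than to rewrite the text, a slightly simplified form of the same pattern can be applied in two passes. The sketch below is one possible approach, not the method of either answer; `citedNumbers` is a hypothetical helper, and it assumes ranges are written low-high with a plain hyphen.

```javascript
// Hypothetical helper: list every reference number cited in `text`,
// expanding ranges such as [12-15] into 12, 13, 14, 15.
function citedNumbers(text) {
  const numbers = [];
  const citations = text.match(/\[\d+(?:,\s?\d+|-\d+)*\]/g) || [];
  for (const citation of citations) {
    // Strip the brackets, then handle each comma-separated part.
    for (const part of citation.slice(1, -1).split(',')) {
      const [start, end = start] = part.trim().split('-').map(Number);
      for (let n = start; n <= end; n++) numbers.push(n);
    }
  }
  return numbers;
}

console.log(citedNumbers('See [23], [5, 7, 14] and [12-15].').join(', '));
// 23, 5, 7, 14, 12, 13, 14, 15
```

The first pass finds whole citations; the second pass works on the text between the brackets, which avoids having to capture a variable number of groups in a single regex.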

dict to remove smart quotes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
charmap = [
    (u"\u201c\u201d", "\""),
    (u"\u2018\u2019", "'")
]
_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)
print fixed
I was looking to write a similar script to replace smart quotes and curly apostrophes in text, as answered here. Would someone be kind enough to explain the two lines:
_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)
and possibly rewrite them in a longer-winded format with comments to explain exactly what is going on? I'm a little confused whether it's an inner/outer loop combo or sequential checking over items in a dictionary.
_map = dict((c, r) for chars, r in charmap for c in list(chars))
means:
_map = {}                  # an empty dictionary
for (c, r) in charmap:     # c - string of symbols to be replaced, r - replacement
    for chars in list(c):  # chars - individual symbol from c
        _map[chars] = r    # adding entry replaced:replacement to the dictionary
and
fixed = "".join(_map.get(c, c) for c in s)
means
fixed = ""  # an empty string
for c in s:
    fixed = fixed + _map.get(c, c)  # first "c" is the key, second is the default for "not found"
since the .join method simply concatenates the elements of a sequence, using the given string as a separator between them (in this case "", i.e. no separator).
It's faster and more straightforward to use the built-in string method translate:
#!python2
#coding: utf8
# Start with a Unicode string.
# Your codecs.open() will read the text in Unicode
text = u'''\
"Don't be dumb"
“You’re smart!”
'''
# Build a translation dictionary.
# Keys are Unicode ordinal numbers.
# Values can be ordinals, Unicode strings, or None (to delete)
charmap = { 0x201c : u'"',
            0x201d : u'"',
            0x2018 : u"'",
            0x2019 : u"'" }
print text.translate(charmap)
Output:
"Don't be dumb"
"You're smart!"

Regex permutations without repetition [duplicate]

This question already has answers here:
How to find all permutations of a given word in a given text?
(6 answers)
Closed 7 years ago.
I need a regex to check whether an expression can be found in a string.
For the string "abc" I would like to match the first appearance of any of its permutations without repetition, in this case 6: abc, acb, bac, bca, cab, cba.
For example, in the string "adesfecabefgswaswabdcbaes" it would find a match at position 7.
I also need the same for permutations of a string with a repeated character, like "abbc". There are 12 cases: acbb, abcb, abbc, cabb, cbab, cbba, bacb, babc, bcab, bcba, bbac, bbca
For example, in the string "adbbcacssesfecabefgswaswabdcbaes" it would find a match at position 3.
I would also like to know how this would work for similar cases.
EDIT
I'm not looking for the permutations themselves; I already have those. What I'm looking for is a way to check whether any of those permutations occurs in a given string.
EDIT 2
I think this regex covers my first question:
([abc])(?!\1)([abc])(?!\2|\1)[abc]
It can find all 6 permutations of "abc" in any sequence of characters.
Now I need to do the same when there is a repeated character, as in abbc (12 permutations).
([abc])(?!\1)([abc])(?!\2|\1)[abc]
You can use this without the g flag to get the position. See the demos; the position of the first group is what you want.
https://regex101.com/r/nS2lT4/41
https://regex101.com/r/nS2lT4/42
The only reason you might "need a regex" is if you are working with a library or tool which only permits specifying certain kinds of rules with a regex. For instance, some editors can be customized to color certain syntactic constructs in a particular way, and they only allow those constructs to be specified as regular expressions.
Otherwise, you don't "need a regex", you "need a program". Here's one:
// are two arrays equal?
function array_equal(a1, a2) {
  return a1.length === a2.length &&
    a1.every(function(chr, i) { return chr === a2[i]; });
}
// are two strings permutations of each other?
function is_permutation(s1, s2) {
  return array_equal(s1.split('').sort(), s2.split('').sort());
}
// make a function which finds permutations in a string
function make_permutation_finder(chars) {
  var len = chars.length;
  return function(str) {
    // <= so that a permutation at the very end of str is also found
    for (var i = 0; i <= str.length - len; i++) {
      if (is_permutation(chars, str.slice(i, i + len))) return i;
    }
    return -1;
  };
}
> finder = make_permutation_finder("abc");
> console.log(finder("adesfecabefgswaswabdcbaes"));
< 6
Regexps are far from being powerful enough to do this kind of thing.
However, there is an alternative, which is to precompute the permutations and build a dynamic regexp to find them. You did not provide a language tag, but here's an example in JS. Assuming you have the permutations and don't have to worry about escaping special regexp characters, that's just
regexp = new RegExp(permutations.join('|'));
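A minimal sketch of that precomputed approach, which also covers repeated characters such as "abbc". `uniquePermutations` and `findPermutation` are hypothetical helper names; as stated above, the sketch assumes the characters need no regexp escaping, and it is only practical for short strings because the number of permutations grows factorially.

```javascript
// Build every distinct permutation of `chars` (duplicates collapse via the Set).
function uniquePermutations(chars) {
  if (chars.length <= 1) return [chars];
  const result = new Set();
  for (let i = 0; i < chars.length; i++) {
    const rest = chars.slice(0, i) + chars.slice(i + 1);
    for (const tail of uniquePermutations(rest)) {
      result.add(chars[i] + tail);
    }
  }
  return [...result];
}

// Return the 0-based index of the first permutation of `chars` in `str`, or -1.
function findPermutation(chars, str) {
  const regexp = new RegExp(uniquePermutations(chars).join('|'));
  return str.search(regexp);
}

console.log(uniquePermutations('abbc').length);                           // 12
console.log(findPermutation('abc', 'adesfecabefgswaswabdcbaes'));         // 6
console.log(findPermutation('abbc', 'adbbcacssesfecabefgswaswabdcbaes')); // 2
```

The indices are 0-based, so they are one less than the positions quoted in the question.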