Find a substring between two optional markers [closed] - regex

Closed 9 years ago.
I was trying to extract substrings from a string of the following form:
dest=comp;jump
I'm looking for a regexp to retrieve comp, but both dest and jump are optional, in which case the = or ; is omitted. So these are all valid configurations:
dest=comp;jump
dest=comp
comp;jump
comp
dest, comp and jump are arbitrary strings that contain neither equals signs nor semicolons.
What I managed to come up with is the following:
(?:=)([^;=]*)(?:;)
Unfortunately, it doesn't work when either dest or jump is omitted.

How about:
(?:.*=|^)([^;]+)(?:;|$)
The string you're searching is in group 1.
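To check the pattern above against the four configurations from the question, here is a minimal Python sketch. The helper name extract_comp is an assumption made for illustration; only the pattern itself comes from the answer.

```python
import re

# Pattern taken from the answer above; group 1 holds the comp part.
PATTERN = re.compile(r"(?:.*=|^)([^;]+)(?:;|$)")

def extract_comp(instruction):
    """Return the comp part of a dest=comp;jump string, or None if nothing matches.

    extract_comp is a hypothetical helper name, not part of the original answer.
    """
    match = PATTERN.search(instruction)
    return match.group(1) if match else None

samples = ["dest=comp;jump", "dest=comp", "comp;jump", "comp"]
results = [extract_comp(s) for s in samples]
print(results)  # ['comp', 'comp', 'comp', 'comp']
```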

If the whole line has to have that form, this should do:
if line.chomp =~ /\A(?:[^;=]+=)?([^=;]+)(?:;[^;=]+)?\z/
puts $1
end
This will skip ill-formed lines like
"dest=dest=comp;jump;jump"
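The same whole-line check can be sketched in Python with re.fullmatch, which plays the role of the \A and \z anchors. The function name strict_comp is an assumption for illustration, and stripping "\r\n" only approximates Ruby's chomp.

```python
import re

# Same pattern as the Ruby answer; fullmatch anchors it at both ends.
STRICT = re.compile(r"(?:[^;=]+=)?([^=;]+)(?:;[^;=]+)?")

def strict_comp(line):
    """Return comp if the whole line is well-formed, otherwise None.

    strict_comp is a hypothetical name used only for this sketch.
    """
    match = STRICT.fullmatch(line.rstrip("\r\n"))  # rough stand-in for chomp
    return match.group(1) if match else None

print(strict_comp("dest=comp;jump"))            # comp
print(strict_comp("dest=dest=comp;jump;jump"))  # None
```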

I wouldn't try to make it all happen inside a single regular expression. That path makes it harder to read and maintain. Instead I'd break it into more atomic tests using case/when statements:
If you only want comp I'd use:
array = %w[
dest=comp;jump
dest=comp
comp;jump
comp
].map{ |str|
case str
when /.+=(.+);/, /=(.+)/, /(.+);/
$1
else
str
end
}
array
# => ["comp", "comp", "comp", "comp"]
The when clause breaks the complexity down into three small tests, each of them very easy to understand:
Does the string have both '=' and ';'? Then return the substring between those two characters.
Does the string have '='? Then return the last word.
Does the string have ';'? Then return the first word.
Return the word.
If you need to know which of your terms are being returned then a bit more code is needed:
%w[
dest=comp;jump
dest=comp
comp;jump
comp
].each{ |str|
dest, comp, jump = case str
when /(.+)=(.+);(.+)/
[$1, $2, $3]
when /(.+)=(.+)/
[$1, $2, nil]
when /(.+);(.+)/
[nil, $1, $2]
else
[nil, str, nil]
end
puts 'dest="%s" comp="%s" jump="%s"' % [dest, comp, jump]
}
# >> dest="dest" comp="comp" jump="jump"
# >> dest="dest" comp="comp" jump=""
# >> dest="" comp="comp" jump="jump"
# >> dest="" comp="comp" jump=""

I would just split the expression into two, which makes it easier to understand what is happening:
string = 'dest=comp;jump'
trimming_regexp = [/.*=/, /;.*/]
trimming_regexp.each{|exp| string.slice!(exp)}
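The same two-step trimming can be done without any regular expression. The following Python sketch is an illustration under that assumption rather than a translation of the Ruby code; comp_part is a hypothetical name.

```python
def comp_part(instruction):
    """Trim an optional 'dest=' prefix and an optional ';jump' suffix.

    comp_part is a hypothetical helper, not part of the original answer.
    """
    # Keep what follows the last '=' (the whole string if there is no '=').
    _, _, rest = instruction.rpartition("=")
    # Keep what precedes the first ';' (the whole string if there is no ';').
    comp, _, _ = rest.partition(";")
    return comp

for sample in ["dest=comp;jump", "dest=comp", "comp;jump", "comp"]:
    print(comp_part(sample))  # prints comp four times
```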

Related

Regex find all first unique occurrences of character in a string [closed]

Closed 3 years ago.
I have following string
1,2,3,a,b,c,a,b,c,1,2,3,c,b,a,2,3,1,
I would like to get only the first occurrence of each character without changing the order. This would be
1,2,3,a,b,c,
With this regex (found at https://stackoverflow.com/a/29480898/9307482) I can find them, but only the last occurrences. And this reverses the order.
(\w)(?!.*?\1) (https://regex101.com/r/3fqpu9/1)
It doesn't matter if the regex ignores the comma. The order is important.
Regular expressions are not meant for that purpose. You will need to use an index filter or a Set on an array of characters.
Since you don't have a language specified I assume you are using javascript.
Example modified from: https://stackoverflow.com/a/14438954/1456201
String.prototype.uniqueChars = function() {
return [...new Set(this)];
}
var unique = "1,2,3,a,b,c,a,b,c,1,2,3,c,b,a,2,3,1,".split(",").join('').uniqueChars();
console.log(unique); // Array(6) [ "1", "2", "3", "a", "b", "c" ]
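For comparison, the same order-preserving de-duplication can be sketched in Python. It relies on dict keys keeping insertion order (Python 3.7+); the function name first_occurrences is an assumption for illustration.

```python
def first_occurrences(csv_text):
    """Return each comma-separated item once, in order of first appearance.

    first_occurrences is a hypothetical name used only for this sketch.
    """
    # Drop the empty item produced by the trailing comma.
    items = [item for item in csv_text.split(",") if item]
    # dict.fromkeys keeps the first occurrence of every key, in order.
    return list(dict.fromkeys(items))

unique = first_occurrences("1,2,3,a,b,c,a,b,c,1,2,3,c,b,a,2,3,1,")
print(",".join(unique))  # 1,2,3,a,b,c
```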
I would use something like this:
// each index represents one digit: 0-9
const digits = new Array(10);
// make your string an array
const arr = '123abcabc123cba231'.split('');
// test for a single digit
const reg = /^[0-9]$/;
arr.forEach((val, index) => {
  // record the index only the first time a digit is seen
  if (reg.test(val) && digits[val] === undefined) {
    digits[val] = index;
  }
});
console.log(`occurrences: ${digits}`); // occurrences: ,0,1,2,,,,,,
To interpret the digits array: since index 0 is empty, the digit zero never occurs. Since index 1 holds 0, the first 1 appears at the first character of the string (array index zero). The first 2 appears at index 1, and so on.
A perl way to do the job:
use Modern::Perl;
my $in = '4,d,e,1,2,3,4,a,b,c,d,e,f,a,b,c,1,2,3,c,b,a,2,3,1,';
my (%h, @r);
for (split',',$in) {
    push @r, $_ unless exists $h{$_};
    $h{$_} = 1;
}
say join',',@r;
Output:
4,d,e,1,2,3,a,b,c,f

Need a regex to capture numbered citations [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 5 years ago.
Bit of a regex newbie... sorry. I have a document with IEEE style citations, or numbers in brackets. They can be one number, as in [23], or several, as in [5, 7, 14], or a range, as in [12-15].
What I have now is [\[|\s|-]([0-9]{1,3})[\]|,|-].
This is capturing single numbers, and the first number in a group, but not subsequent numbers or either number in a range.
Then I need to refer to that number in an expression like \1.
I hope this is clear! I have a suspicion I don't understand the OR operator.
How about this?
(\[\d+\]|\[\d+-\d+\]|\[\d+(,\d+)*\])
Actually this can be simplified even further to: (\[\d+-\d+\]|\[\d+(,\d+)*\])
my @test = (
"[5,7,14]",
"[23]",
"[12-15]"
);
foreach my $val (@test) {
if ($val =~ /(\[\d+-\d+\]|\[\d+(,\d+)*\])/ ) {
print "match $val!\n";
}
else {
print "no match!\n";
}
}
This prints:
match [5,7,14]!
match [23]!
match [12-15]!
Whitespace is not taken into account, but you can add it if you need to.
I think Jim's answer is helpful, but here is a generalization, with some code for better understanding:
If the question were looking for more complex cases such as [1,3-5]:
(\[\d+(,\s?\d+|\d*-\d+)*\])
(the \s? allows an optional space after ',')
//validates:
[3,33-24,7]
[3-34]
[1,3-5]
[1]
[1, 2]
JavaScript code for replacing digits by links:
//define input string:
var mytext = "[3,33-24,7]\n[3-34]\n[1,3-5]\n[1]\n[1, 2]" ;
//call replace of matching [..] that calls digit replacing it-self
var newtext = mytext.replace(/(\[\d+(,\s?\d+|\d*-\d+)*\])/g ,
function(ci){ //ci is matched citations `[..]`
console.log(ci);
//so replace each number in `[..]` with custom links
return ci.replace(/\d+/g,
function(digit){
return ''+digit+'' ;
});
});
console.log(newtext);
/*output:
'[3,33-24,7]
[3-34]
[1,3-5]
[1]
[1, 2]'
*/
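The two-step idea (match each bracketed citation, then pull the numbers out of it) can be sketched in Python as follows. The pattern is the one from this answer with the inner group made non-capturing so that findall returns whole citations; citation_numbers is a hypothetical name, and ranges such as [12-15] yield only their two endpoints.

```python
import re

# Pattern from the answer above, with a non-capturing inner group.
CITATION = re.compile(r"\[\d+(?:,\s?\d+|\d*-\d+)*\]")

def citation_numbers(text):
    """Collect every number that appears inside a bracketed citation.

    citation_numbers is a hypothetical helper; ranges are not expanded.
    """
    numbers = []
    for citation in CITATION.findall(text):
        numbers.extend(int(n) for n in re.findall(r"\d+", citation))
    return numbers

print(citation_numbers("see [23], [5, 7, 14] and [12-15]"))
# [23, 5, 7, 14, 12, 15]
```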

dict to remove smart quotes [closed]

Closed 6 years ago.
charmap = [
(u"\u201c\u201d", "\""),
(u"\u2018\u2019", "'")
]
_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)
print fixed
I was looking to write a script, similar to the one answered here, to replace smart quotes and curly apostrophes in text. Would someone be kind enough to explain these two lines:
_map = dict((c, r) for chars, r in charmap for c in list(chars))
fixed = "".join(_map.get(c, c) for c in s)
and possibly rewrite them in a longer-winded format with comments explaining exactly what is going on? I'm a little confused about whether it's an inner/outer loop combination or sequential checking over items in a dictionary.
_map = dict((c, r) for chars, r in charmap for c in list(chars))
means:
_map = {} # an empty dictionary
for chars, r in charmap: # chars - string of symbols to be replaced, r - replacement
    for c in chars: # c - individual symbol from chars
        _map[c] = r # add the entry symbol -> replacement to the dictionary
and
fixed = "".join(_map.get(c, c) for c in s)
means
fixed = "" # an empty string
for c in s:
    fixed = fixed + _map.get(c, c) # first "c" is the key, second is the default for "not found"
since the .join method simply concatenates the elements of a sequence, using the given string as a separator between them (in this case "", i.e. no separator).
It's faster and more straightforward to use the built in string function translate:
#!python2
#coding: utf8
# Start with a Unicode string.
# Your codecs.open() will read the text in Unicode
text = u'''\
"Don't be dumb"
“You’re smart!”
'''
# Build a translation dictionary.
# Keys are Unicode ordinal numbers.
# Values can be ordinals, Unicode strings, or None (to delete)
charmap = { 0x201c : u'"',
0x201d : u'"',
0x2018 : u"'",
0x2019 : u"'" }
print text.translate(charmap)
Output:
"Don't be dumb"
"You're smart!"
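The example above targets Python 2. As a hedged sketch of the same idea in Python 3, str.maketrans can build the translation table directly from a dict of single characters; the names SMART_QUOTES and straighten_quotes are assumptions for illustration.

```python
# Translation table: each curly quote maps to its plain ASCII counterpart.
SMART_QUOTES = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})

def straighten_quotes(text):
    """Replace curly quotes with straight ones (hypothetical helper name)."""
    return text.translate(SMART_QUOTES)

print(straighten_quotes("\u201cYou\u2019re smart!\u201d"))  # "You're smart!"
```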

Regex permutations without repetition [duplicate]

This question already has answers here:
How to find all permutations of a given word in a given text?
(6 answers)
Closed 7 years ago.
I need a regex to check whether I can find an expression in a string.
For the string "abc" I would like to match the first appearance of any of its permutations without repetition, in this case 6: abc, acb, bac, bca, cab, cba.
For example, in the string "adesfecabefgswaswabdcbaes" it would find a match at position 7.
I'd also need the same for permutations of a string with a repeated character, such as "abbc". There are 12 cases: acbb, abcb, abbc, cabb, cbab, cbba, bacb, babc, bcab, bcba, bbac, bbca
For example, in the string "adbbcacssesfecabefgswaswabdcbaes" it would find a match at position 3.
Also, I would like to know how this would work for similar cases.
EDIT
I'm not looking for the combinations of the permutations, no. I already have those. What I'm looking for is a way to check if any of those permutations is in a given string.
EDIT 2
This regex I think covers my first question
([abc])(?!\1)([abc])(?!\2|\1)[abc]
It can find all 6 permutations of "abc" in any sequence of characters.
Now I need to do the same when I have a repeated character, as in abbc (12 permutations).
([abc])(?!\1)([abc])(?!\2|\1)[abc]
You can use this without the g flag to get the position. See the demos. The position of the first group is what you want.
https://regex101.com/r/nS2lT4/41
https://regex101.com/r/nS2lT4/42
The only reason you might "need a regex" is if you are working with a library or tool which only permits specifying certain kinds of rules with a regex. For instance, some editors can be customized to color certain syntactic constructs in a particular way, and they only allow those constructs to be specified as regular expressions.
Otherwise, you don't "need a regex", you "need a program". Here's one:
// are two arrays equal?
function array_equal(a1, a2) {
return a1.every(function(chr, i) { return chr === a2[i]; });
}
// are two strings permutations of each other?
function is_permutation(s1, s2) {
return array_equal(s1.split('').sort(), s2.split('').sort());
}
// make a function which finds permutations in a string
function make_permutation_finder(chars) {
var len = chars.length;
return function(str) {
for (var i = 0; i <= str.length - len; i++) { // <= so the last window is checked too
if (is_permutation(chars, str.slice(i, i+len))) return i;
}
return -1;
};
}
> finder = make_permutation_finder("abc");
> console.log(finder("adesfecabefgswaswabdcbaes"));
< 6
Regexps are far from being powerful enough to do this kind of thing.
However, there is an alternative, which is to precompute the permutations and build a dynamic regexp to find them. You did not provide a language tag, but here's an example in JS. Assuming you have the permutations in an array named permutations and don't have to worry about escaping special regexp characters, that's just
regexp = new RegExp(permutations.join('|'));
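A minimal Python sketch of this precompute-and-join approach, assuming the permutations are generated with itertools.permutations and de-duplicated so that repeated characters such as "abbc" are handled. The function names are hypothetical, and re.escape covers the special-character case that the JS one-liner sets aside.

```python
import re
from itertools import permutations

def permutation_regex(chars):
    """Build an alternation of every distinct permutation of chars.

    permutation_regex is a hypothetical name used only for this sketch.
    """
    # A set removes duplicates caused by repeated characters ("abbc" -> 12).
    unique = sorted({"".join(p) for p in permutations(chars)})
    return re.compile("|".join(re.escape(p) for p in unique))

def first_permutation_index(chars, text):
    """Return the 0-based index of the first permutation found, or -1."""
    match = permutation_regex(chars).search(text)
    return match.start() if match else -1

print(first_permutation_index("abc", "adesfecabefgswaswabdcbaes"))          # 6
print(first_permutation_index("abbc", "adbbcacssesfecabefgswaswabdcbaes"))  # 2
```

The indices are 0-based, so they are one less than the positions quoted in the question. The number of alternatives grows factorially with the length of chars, so this only suits short inputs.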

Dynamic regexprep in MATLAB

I have the following strings in a long string:
a=b=c=d;
a=b;
a=b=c=d=e=f;
I want to first search for above mentioned pattern (X=Y=...=Z) and then output like the following for each of the above mentioned strings:
a=d;
b=d;
c=d;
a=b;
a=f;
b=f;
c=f;
d=f;
e=f;
In general, I want every variable to be set equal to the last variable on the extreme right of the string. Is there a way I can do this using regexprep in MATLAB? I am able to do it for a fixed-length string, but for variable length I have no idea how to achieve this. Any help is appreciated.
My attempt for the case of two equal signs is as follows:
funstr = regexprep(funstr, '([^;])+\s*=\s*+(\w+)+\s*=\s*([^;])+;', '$1 = $3; \n $2 = $3;\n');
This is not a regexp, but if you stick to MATLAB you can use the cellfun function to avoid an explicit loop:
str = 'a=b=c=d=e=f;' ; %// input string
list = strsplit(str,'=') ;
strout = cellfun( @(a) [a,'=',list{end}] , list(1:end-1), 'uni', 0).' %'// Horchler simplification of the previous solution below
%// this does the same as above, but is more convoluted
%// strout = cellfun( @(a,b) cat(2,a,'=',b) , list(1:end-1) , repmat(list(end),1,length(list)-1) , 'uni',0 ).'
Will give you:
strout =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
Note: As Horchler rightly pointed out in a comment, although the cellfun instruction lets you compact your code, it is just a disguised loop. Moreover, since it runs on cells, it is notoriously slow. You won't see the difference on such simple inputs, but reserve this approach for cases where performance is not a major concern.
Now if you like regex you must like black magic code. If all your strings are in a cell array from the start, there is a way to (over)abuse the cellfun capabilities to obscure your code and do it all in one line.
Consider:
strlist = {
'a=b=c=d;'
'a=b;'
'a=b=c=d=e=f;'
};
Then you can have all your substring with:
strout = cellfun( @(s)cellfun(@(a,b)cat(2,a,'=',b),s(1:end-1),repmat(s(end),1,length(s)-1),'uni',0).' , cellfun(@(s) strsplit(s,'=') , strlist , 'uni',0 ) ,'uni',0)
>> strout{:}
ans =
'a=d;'
'b=d;'
'c=d;'
ans =
'a=b;'
ans =
'a=f;'
'b=f;'
'c=f;'
'd=f;'
'e=f;'
This gives you a 3x1 cell array. One cell for each group of substring. If you want to concatenate them all then simply: strall = cat(2,strout{:});
I haven't had much experience w/ Matlab; but your problem can be solved by a simple string split function.
[parts, m] = strsplit( funstr, {' ', '='}, 'CollapseDelimiters', true )
Now, store the last part of parts; and iterate over parts until that:
len = length( parts );
for i = 1:len-1
    fprintf('%s = %s\n', parts{i}, parts{len});
end
Here fprintf prints each assignment on its own line; adjust the format string as needed.
There isn't a single Regex that you can write that will cover all the cases. As posted on this answer:
https://stackoverflow.com/a/5019658/3393095
However, you have a few alternatives to achieve your final result:
1. You can get all the values in the line with regexp, pick the last value, then use a for loop over the other values to generate the output. The regex to get the values would be this:
matchStr = regexp(str,'([^=;\s]*)','match')
2. If you want to use regexprep at all costs, you should write a pattern generator and a replacement-expression generator, based on the number of '=' in the input string, and pass their results as the arguments to regexprep.
3. You can forget about regex and split the input, then generate the output by looping over the values (similarly to alternative #1).
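The split-and-loop alternative is language-neutral. As a rough illustration only, here it is sketched in Python rather than MATLAB; expand_chain is a hypothetical name, and the sketch assumes each statement is a single chain ending in ';'.

```python
def expand_chain(statement):
    """Turn 'a=b=c=d;' into ['a=d;', 'b=d;', 'c=d;'].

    expand_chain is a hypothetical helper; it assumes one chained
    assignment per input string, terminated by ';'.
    """
    # Strip the trailing ';' and split the chain on '='.
    names = [part.strip() for part in statement.strip().rstrip(";").split("=")]
    # Everything except the last name is a target; the last one is the value.
    *targets, value = names
    return [f"{target}={value};" for target in targets]

for line in ["a=b=c=d;", "a=b;", "a=b=c=d=e=f;"]:
    print(expand_chain(line))
```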