Regex match scss function / mixin - regex

I am trying to match a function or mixin used in an SCSS string so I may remove it but I am having a bit of trouble.
For those unfamiliar with SCSS this is an example of the things I am trying to match (from bootstrap 4).
@mixin _assert-ascending($map, $map-name) {
  $prev-key: null;
  $prev-num: null;
  @each $key, $num in $map {
    @if $prev-num == null {
      // Do nothing
    } @else if not comparable($prev-num, $num) {
      @warn "Potentially invalid value for #{$map-name}: This map must be in ascending order, but key '#{$key}' has value #{$num} whose unit makes it incomparable to #{$prev-num}, the value of the previous key '#{$prev-key}' !";
    } @else if $prev-num >= $num {
      @warn "Invalid value for #{$map-name}: This map must be in ascending order, but key '#{$key}' has value #{$num} which isn't greater than #{$prev-num}, the value of the previous key '#{$prev-key}' !";
    }
    $prev-key: $key;
    $prev-num: $num;
  }
}
And a small function:
@function str-replace($string, $search, $replace: "") {
  $index: str-index($string, $search);
  @if $index {
    @return str-slice($string, 1, $index - 1) + $replace + str-replace(str-slice($string, $index + str-length($search)), $search, $replace);
  }
  @return $string;
}
So far I have the following regex:
@(function|mixin)\s?[[:print:]]+\n?([^\}]+)
However, it only matches up to the first } it finds, which makes it fail: it needs to find the closing curly brace that ends the definition, not the first one.
My thought is that a regex capable of matching a function definition could be adapted, but I can't find a good one with my Google-fu!
Thanks in advance!

I would not recommend using a regex for that: a regular expression in the classical sense cannot handle recursion, which is what you need in this case.
For instance:
@mixin test {
  body {
  }
}
contains two levels of scope ({ { } }), so your regex would have to count braces as they open and close in order to find the end of the mixin or function. A plain regex cannot do that.
This regex
/@mixin(.|\s)*\}/gm
will match the whole mixin, but if the input looks like this:
@mixin foo { … }
body { … }
it will match everything up to the last }, which includes the style definition for body. That is because the regex cannot know which } closes the mixin.
Have a look at this answer, it explains more or less the same thing, but for matching HTML elements.
Instead, you should use a parser: parse the whole stylesheet into a syntax tree, then remove the unneeded functions, and then write it back out as a string.

Indeed, as @philipp said, a regex can't replace the kind of syntax analysis a compiler does.
But here is a sed command which is a little ugly but could do the trick:
sed -r -e ':a' -e 'N' -e '$!ba' -e 's/\n//g' -e 's/}\s*@(function|mixin)/}\n@\1/g' -e 's/^@(function|mixin)\s*str-replace(\s|\()+.*}$//gm' <your file>
-e ':a' -e 'N' -e '$!ba' -e 's/\n//g' : Read the whole file in a loop and remove the newlines (see https://stackoverflow.com/a/1252191/7990687 for more information)
-e 's/}\s*@(function|mixin)/}\n@\1/g' : Make each @mixin or @function statement the start of a new line, and the preceding } the last character of the previous line
's/^@(function|mixin)\s*str-replace(\s|\()+.*}$//gm' : Remove the line corresponding to the @function str-replace or @mixin str-replace declaration
The output will lose its indentation, though, so you will have to reindent it afterwards.
I tried it on a file into which I copied and pasted the sample code you provided several times, so you will have to try it on your own file: there could be cases where the regex matches more than intended. If that happens, provide us with a test file so we can try to resolve these issues.

After much headache, here is the answer to my question!
The source needs to be split into lines and read line by line, maintaining a count of the open/closed braces to determine when the counter returns to 0.
$pattern = '/(?<remove>@(function|mixin)\s?[\w-]+[$,:"\'()\s\w\d]+)/';
$subject = file_get_contents('vendor/twbs/bootstrap/scss/_variables.scss'); // just a regular SCSS file containing what I've already posted.
$lines = explode("\n", $subject);
$total_lines = count($lines);
foreach ($lines as $line_no => $line) {
    if (preg_match($pattern, $line, $matches)) {
        $match = $matches['remove'];
        $counter = 0; // net number of currently open braces
        $open_braces = $closed_braces = 0;
        for ($i = $line_no; $i < $total_lines; $i++) {
            $current = $lines[$i];
            $open_braces = substr_count($current, "{");
            $closed_braces = substr_count($current, "}");
            $counter += ($open_braces - $closed_braces);
            if ($counter == 0) {
                // Braces are balanced again: lines $start..$end form the whole definition.
                $start = $line_no;
                $end = $i;
                foreach (range($start, $end) as $a) {
                    unset($lines[$a]);
                } // end foreach(range)
                break; // break out of the for loop
            } // end if($counter == 0)
        } // end for loop
    } // end preg_match
} // end foreach
And we have a $lines array without any functions or mixins.
There is probably a more elegant way to do this, but I don't have the time or the will to write an AST parser for SCSS.
This could quite easily be adapted into a hacked-together one, however!
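For comparison, the same brace-counting idea can be sketched in Python. This is only a sketch under some assumptions: strip_definitions and DECL are hypothetical names, each definition is assumed to start on its own line, and braces inside strings or comments are assumed to balance on the same line (as the #{...} interpolations in the examples above do). It is not a real SCSS parser.

```python
import re

# Hypothetical pattern: a line that starts an @function or @mixin definition.
DECL = re.compile(r"^\s*@(?:function|mixin)\b")


def strip_definitions(scss: str) -> str:
    """Drop @function/@mixin blocks by counting braces line by line."""
    kept = []
    removing = False   # True while we are inside a definition being dropped
    seen_open = False  # True once the definition's first "{" has been seen
    depth = 0          # net number of open braces inside the definition

    for line in scss.split("\n"):
        if not removing and DECL.match(line):
            removing = True
            seen_open = False
            depth = 0
        if removing:
            opens = line.count("{")
            depth += opens - line.count("}")
            seen_open = seen_open or opens > 0
            # The definition ends when its braces are balanced again.
            if seen_open and depth <= 0:
                removing = False
            continue
        kept.append(line)

    return "\n".join(kept)
```

Unlike a single regex, the counter knows which } closes the definition, so a style rule that follows a mixin is left untouched.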

Jenkinsfile/Groovy: how to use variables in regex pattern find-counts?

In the following declarative syntax pipeline:
pipeline {
agent any
stages {
stage( "1" ) {
steps {
script {
orig = "/path/to/file"
two_lev_down = (orig =~ /^(?:\/[^\/]*){2}(.*)/)[0][1]
echo "${two_lev_down}"
depth = 2
two_lev_down = (orig =~ /^(?:\/[^\/]*){depth}(.*)/)[0][1]
echo "${two_lev_down}"
}
}
}
}
}
...the regex is meant to skip the first two "/"-delimited path components and capture everything from the third "/" onward (here "/file").
The first, i.e. (orig =~ /^(?:\/[^\/]*){2}(.*)/)[0][1] works.
But the second, (orig =~ /^(?:\/[^\/]*){depth}(.*)/)[0][1] does not. It generates this error:
java.util.regex.PatternSyntaxException: Illegal repetition near index 10
^(?:/[^/]*){depth}(.*)
I assume the problem is the use of the variable depth instead of a hardcoded integer, since that's the only difference between the working code and error-generating code.
How can I use a Groovy variable in a regex pattern find-count? Or what is the Groovy-language idiomatic way to write a regex that returns everything after the nth occurrence of a pattern?
You are missing the $ in front of your variable. It should be:
orig = "/path/to/file"
depth = 2
two_lev_down = (orig =~ /^(?:\/[^\/]*){$depth}(.*)/)[0][1]
assert '/file' == two_lev_down
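The same technique, building the repetition count into the pattern from a variable, can be sketched in Python for comparison. The function name after_nth_segment is hypothetical; the only point is that the quantifier must end up in the pattern as a literal number such as {2}.

```python
import re


def after_nth_segment(path: str, depth: int) -> str:
    """Return what follows the first `depth` "/"-prefixed path components."""
    # In an f-string, "{{" and "}}" emit literal braces, so this yields
    # a pattern like ^(?:/[^/]*){2}(.*) when depth is 2.
    pattern = rf"^(?:/[^/]*){{{depth}}}(.*)"
    match = re.match(pattern, path)
    if match is None:
        raise ValueError(f"{path!r} has fewer than {depth} components")
    return match.group(1)


two_lev_down = after_nth_segment("/path/to/file", 2)
```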
Why?
In Groovy, string interpolation (via GString) works in several kinds of string literal, including:
ordinary double-quoted strings: "Hello $world, my name is ${name.toUpperCase()}"
slashy strings, usually used as regexp literals: /.{$depth}/
multi-line (triple) double-quoted strings:
def email = """
Dear ${user}.
Thank you for blablah.
"""

Regular expression is too complex error in tcl

I have not seen this error with a small list. The issue appeared once the list grew beyond 10k entries. Is there a limit on the number of regex patterns in Tcl?
puts "#LEVELSHIFTER_TEMPLATES_LIMITSFILE:$perc_limit(levelshifter_templates)"
puts "#length of templates is :[llength $perc_limit(levelshifter_templates)]"
if { [regexp [join $perc_limit(levelshifter_templates) |] $temp] }
#LEVELSHIFTER_TEMPLATES_LIMITSFILE:HDPELT06_LVLDBUF_CAQDP_1 HDPELT06_LVLDBUF_CAQDPNRBY2_1 HDPELT06_LVLDBUF_CAQDP_1....
#length of templates is :13520
ERROR: couldn't compile regular expression pattern: regular expression is too complex
If $temp is a single word and you're really just doing a literal test, you should invert the check. One of the easiest ways might be:
if {$temp in $perc_limit(levelshifter_templates)} {
# ...
}
But if you're doing that a lot (well, more than a small number of times, 3 or 4 say) then building a dictionary for this might be best:
# A one-off cost
foreach key $perc_limit(levelshifter_templates) {
# Value is arbitrary
dict set perc_limit_keys $key 1
}
# This is now very cheap
if {[dict exists $perc_limit_keys $temp]} {
# ...
}
If you've got multiple words in $temp, split and check (using the second technique, which is now definitely worthwhile). This is where having a helper procedure can be a good plan.
proc anyWordIn {inputString keyDictionary} {
foreach word [split $inputString] {
if {[dict exists $keyDictionary $word]} {
return true
}
}
return false
}
if {[anyWordIn $temp $perc_limit_keys]} {
# ...
}
Assuming you want to see if the value in temp is an exact match for one of the elements of the list in perc_limit(levelshifter_templates), here are a few ways that are better than trying to use regular expressions:
Using lsearch:
# Sort the list after populating it so we can do an efficient binary search
set perc_limit(levelshifter_templates) [lsort $perc_limit(levelshifter_templates)]
# ...
# See if the value in temp exists in the list
if {[lsearch -sorted $perc_limit(levelshifter_templates) $temp] >= 0} {
    # ...
}
Storing the elements of the list in a dict (or array if you prefer) ahead of time for an O(1) lookup:
foreach item $perc_limit(levelshifter_templates) {
    dict set lookup $item 1
}
# ...
if {[dict exists $lookup $temp]} {
    # ...
}
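For readers more used to Python, the two lookups above (hash-based and sorted binary search) might be sketched as follows. The function names are hypothetical and the template list is just the two names visible in the question's output; the real list would have thousands of entries.

```python
from bisect import bisect_left

# The two template names visible in the question's output.
templates = ["HDPELT06_LVLDBUF_CAQDP_1", "HDPELT06_LVLDBUF_CAQDPNRBY2_1"]

# One-off cost: build a set for average O(1) membership tests.
template_set = set(templates)


def is_known_template(name: str) -> bool:
    """Exact-match lookup, the analogue of `dict exists`."""
    return name in template_set


# Alternative: sort once, then binary-search, the analogue of `lsearch -sorted`.
sorted_templates = sorted(templates)


def is_known_template_sorted(name: str) -> bool:
    i = bisect_left(sorted_templates, name)
    return i < len(sorted_templates) and sorted_templates[i] == name
```

Neither version ever builds one enormous alternation pattern, which is what triggered the "too complex" error.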
I found a simple workaround for this problem: loop over the patterns in the list with foreach instead of joining them into one regex, which failed for a very long list.
foreach pattern $perc_limit(levelshifter_templates) {
    if {[regexp $pattern $temp]} {
        #puts "$fullpath: [is_std_cell_dev $dev]"
        puts "##matches: $pattern return 0"
        return 0
    }
}

Passing a parameter to a regular expression to match the first letter in a word in perl

So here is what I'm doing. This is for homework, and I know I can't come on here and get you guys to do my homework for me, but I'm stuck. We have to use Perl (my first time ever using it, so forgive my stupidity) to write a function starts_with that takes the parameters $str0 and $prefix. If $str0 starts with $prefix, the function returns true; if it doesn't, it returns false. Pretty simple. We have to use regular expressions, because that is the whole point of the exercise, so here is my code:
sub starts_with
{
$str0 = $_[0];
$prefix = $_[1];
if($prefix =~ /^($str0)/)
{
print $str0."\n";
print m/^(prefix)/."\n";
$startsWith = "Y"
}
if ($startsWith eq "Y")
{
print $str0." starts with ".$prefix."\n";
}
else
{
print $str0." does not start with ".$prefix."\n";
}
}
I'm almost ashamed to put this up here because I have no idea what I'm doing yet, but I am trying to learn. I don't know how to do true/false in Perl; that's why I have the $startsWith variable. You can fix that if you want. The part I need to fix is the line
if(str0 =~ /^($prefix)/)
I also need to find out how to refer to the first letter in str0...I think
A couple points without giving away the answer:
1) Arguments to functions are passed in a special array called @_, which is what you are accessing when you say $_[0] and $_[1]. This can be written much more concisely by assigning the argument list (@_) to your variables in list context:
sub starts_with {
my ($str0, $prefix) = @_;
...
}
2) This statement: if($prefix =~ /^($str0)/) tests the exact opposite of the condition you are trying to prove. It asks whether the prefix starts with the value of the variable $str0. What you really want to test is whether $str0 starts with $prefix.
It might also be useful to write your pattern with the explicit match operator, m/PATTERN/, which means "match this pattern".
3) You don't have a return statement in your function, so (as @M42 points out) the result of the last expression is returned; that expression is a print, which returns true. You probably want to return true or false explicitly.
See if you can use this to get started.
What I would do :
use Modern::Perl; # or use strict; use warnings; use feature qw/say/;
sub starts_with {
# better to use @_, the default argument array, instead of individual elements
# ...like $_[0]
my ($str, $pref) = @_;
# very short expression, the pattern matching return a boolean.
# \Q\E is there to treat the prefix as-is (no metacharacters)
return $str =~ /^\Q$pref\E/;
}
# using our function
if (starts_with("foobar", "f")) {
say "TRUE";
}
else {
say "FALSE";
}
Golfing it a bit...
sub starts_with { $_[0] =~ /^\Q$_[1]/ }
Don't hand that version in though :-)
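For comparison only (the exercise itself has to be done in Perl), here is how the same idea might look in Python. It is a sketch: re.escape plays the role of \Q...\E, so metacharacters in the prefix are matched literally, and re.match only matches at the start of the string, like the ^ anchor.

```python
import re


def starts_with(text: str, prefix: str) -> bool:
    """Return True if `text` begins with `prefix`, using a regex."""
    # re.escape neutralises metacharacters such as "." or "(" in the prefix.
    return re.match(re.escape(prefix), text) is not None


for word, prefix in [("foobar", "f"), ("foobar", "bar"), ("axb", "a.")]:
    verdict = "starts with" if starts_with(word, prefix) else "does not start with"
    print(f"{word} {verdict} {prefix}")
```

Without the escaping step, the prefix "a." would be treated as "a followed by any character" and would wrongly match "axb".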

regular expression help: catch this: |TrxId=475665|

For example I have a string:
MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|
and I want to catch this: |TrxId=475665|
after TrxId= it could be any numbers and any amount of them, so regex should catch as well:
|TrxId=111333| and |TrxId=0000011112222| and |TrxId=123|
TrxId=(\d+)
That would give a group (1) with the TrxId.
PS: Use global modifier.
The regex should look somewhat like this:
TrxId=[0-9]+
It will match TrxId= followed by at least one digit.
An example solution in Python:
In [107]: data = 'MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|'
In [108]: m = re.search(r'\|TrxId=(\d+)\|', data)
In [109]: m.group(0)
Out[109]: '|TrxId=475665|'
In [110]: m.group(1)
Out[110]: '475665'
/MsgNam\=.*?\|(TrxId\=\d+)\|.*/
for example in perl:
$a = "MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100111|";
$a =~ /MsgNam\=.*?\|(TrxId\=\d+)\|.*/;
print $1;
will print TrxId=475665
You know what your delimiters look like, so you don't need a regex, you need to split. Here's an implementation in Perl.
use strict;
use warnings;
my $input = "MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|WeaTxtTxt=100 111|";
my @first_array = split(/\|/,$input); #splitting $input on "|"
#Note: split drops trailing empty fields, so the final "|" in $input
#does not produce an extra element. Filtering on defined is only
#a safeguard here.
@first_array = grep{defined}@first_array;
#Also filter out elements that do not have an equals sign appearing.
@first_array = grep{/=/}@first_array;
#Now, put these elements into an associative array:
my %assoc_array;
foreach(@first_array)
{
if(/^([^=]+)=(.+)$/)
{
$assoc_array{$1} = $2;
}
else
{
#Something weird may be happening...
#we may have an element starting with "=" for example.
#Do what you want: throw a warning, die, silently move on, etc.
}
}
if(exists $assoc_array{TrxId})
{
print "|TrxId=" . $assoc_array{TrxId} . "|\n";
}
else
{
print "Sorry, TrxId not found!\n";
}
The code above yields the expected output:
|TrxId=475665|
Now, obviously this is more complex than some of the other answers, but it's also a bit more robust in that it allows you to search for more keys as well.
This approach does have a potential issue if your keys appear more than once. In that case, it's easy enough to modify the code above to collect an array reference of values for each key.
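The variant that collects every value for a repeated key could look roughly like this in Python (a sketch, not a translation of the Perl above; parse_fields is a hypothetical name). Each key maps to a list of values, so duplicates are kept rather than overwritten, and empty values such as WepNr= are kept as empty strings.

```python
from collections import defaultdict


def parse_fields(message: str) -> dict:
    """Split a "key=value|key=value|" message into key -> list of values."""
    fields = defaultdict(list)
    for part in message.split("|"):
        key, sep, value = part.partition("=")
        # Skip pieces without "=" (such as the empty piece after the
        # final "|") and pieces that start with "=".
        if sep and key:
            fields[key].append(value)
    return fields


data = ("MsgNam=WMS.WEATXT|VersionsNr=0|TrxId=475665|MndNr=0257|Werk=0000|"
        "WeaNr=0171581054|WepNr=|WeaTxtTyp=110|SpraNam=ru|WeaTxtNr=2|"
        "WeaTxtTxt=100 111|")
fields = parse_fields(data)
for trx_id in fields["TrxId"]:
    print(f"|TrxId={trx_id}|")
```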

Why does my regex fail when the number ends in 0?

This is a really basic regex question but since I can't seem to figure out why the match is failing in certain circumstances I figured I'd post it to see if anyone else can point out what I'm missing.
I'm trying to pull out the 2 sets of digits from strings of the form:
12309123098_102938120938120938
1321312_103810312032123
123123123_10983094854905490
38293827_1293120938129308
I'm using the following code to process each string:
if($string && $string =~ /^(\d)+_(\d)+$/) {
if(IsInteger($1) && IsInteger($2)) { print "success ('$1','$2')"; }
else { print "fail"; }
}
Where the IsInteger() function is as follows:
sub IsInteger {
my $integer = shift;
if($integer && $integer =~ /^\d+$/) { return 1; }
return;
}
This function seems to work most of the time but fails on the following for some reason:
1287123437_1268098784380
1287123437_1267589971660
Any ideas on why these fail while others succeed? Thanks in advance for your help!
This is an add-on to the answers from unicornaddict and ZyX: what are you trying to match?
If you're trying to match the sequences left and right of '_', unicornaddict is correct and your regex needs to be ^(\d+)_(\d+)$. Also, you can get rid of the first check ($string &&) and the IsInteger() function altogether: you already know it's an integer, because it matched (\d+).
if ($string =~ /^(\d+)_(\d+)$/) {
print "success ('$1','$2')";
} else {
print "fail\n";
}
If you're trying to match the last digit in each and wondering why it's failing, the culprit is the first check in IsInteger() (if($integer && ...). It's redundant anyway (you know it's an integer) and fails on 0 because, as ZyX notes, the string "0" evaluates to false.
Same thing applies though:
if ($string =~ /^(\d)+_(\d)+$/) {
print "success ('$1','$2')";
} else {
print "fail\n";
}
This will output success ('8','8') given the input 12309123098_102938120938120938
Because the second number ends in 0: (\d)+ keeps only the last repetition in the $N variable, and the string "0" is equivalent to false.
When in doubt, check what your regex is actually capturing.
use strict;
use warnings;
my @data = (
'1321312_103810312032123',
'123123123_10983094854905490',
);
for my $s (@data){
print "\$1=$1 \$2=$2\n" if $s =~ /^(\d)+_(\d)+$/;
# Output:
# $1=2 $2=3
# $1=3 $2=0
}
You probably intended the second of these two approaches.
(\d)+ # Repeat a regex group 1+ times,
# capturing only the last instance.
(\d+) # Capture 1+ digits.
In addition, both in your main loop and in IsInteger (which seems unnecessary, given the initial regex in the main loop), you are testing for truth rather than something more specific, such as defined or length. Zero, for example, is a valid integer but false.
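The capture behaviour is not specific to Perl. A short Python sketch shows the same difference between the two groupings on one of the failing inputs. Note one assumption that does not carry over: in Python the string "0" is truthy, so only the capture part of the problem is reproduced here, not the truthiness part.

```python
import re

s = "1287123437_1268098784380"

# (\d)+ repeats a one-digit group; only the last repetition is kept.
last_only = re.match(r"^(\d)+_(\d)+$", s).groups()

# (\d+) puts the quantifier inside the group and captures every digit.
full = re.match(r"^(\d+)_(\d+)$", s).groups()

print(last_only)  # ('7', '0')
print(full)       # ('1287123437', '1268098784380')
```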
Shouldn't the + be included in the group?
^(\d+)_(\d+)$ instead of ^(\d)+_(\d)+$
Many people have commented on your regex, but the actual problem is in your IsInteger (which you really don't need for your example): you checked for "truth" when you really want to check for defined:
sub IsInteger {
my $integer = shift;
if( defined $integer && $integer =~ /^\d+$/) { return 1; }
return;
}
You don't need most of the infrastructure in that subroutine though:
sub IsInteger {
defined $_[0] && $_[0] =~ /^\d+$/
}