Maven replacer plugin - repeat while matches exist - regex

I am using the maven replacer plugin and I've run into a situation where I have a regular expression that matches across lines which I need to run on the input file until all matches have been replaced. The configuration for this expression looks like this:
<regexFlags>
<regexFlag>DOTALL</regexFlag>
</regexFlags>
<replacements>
<replacement>
<token>\#([^\n\r=\#]+)\#=([^\n\r]*)(.*)(\#default\.\1\#=[^\n\r]*)(.*)</token>
<value>#$1#=$2$3$5</value>
</replacement>
</replacements>
The input could look like this:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#default.a.b.c#=QQQ
#asdfasd.fasdfs.asdfa#=23423
#default.h.i.j#=234
#default.RR.TT#=393993
and I want the output to look like this:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#asdfasd.fasdfs.asdfa#=23423
#default.RR.TT#=393993
The intention is to re-write the file, but without the tokens with a #default prefix, where another token without the prefix has already been defined.
#default.a.b.c#=QQQ and #default.h.i.j#=234 have been removed from the output because other tokens already define a.b.c and h.i.j.
The current problem I have is that the replacer plugin only replaces the first match, so my output looks like this:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#asdfasd.fasdfs.asdfa#=23423
#default.h.i.j#=234
#default.RR.TT#=393993
Here, #default.a.b.c#=QQQ is gone, which is correct, but #default.h.i.j#=234 is still present.
If I were writing this in code, I think I could probably just loop while attempting to match on the entire output, and break when there are no matches. Is there a way to do this with the replacer plugin?
Edit: I may have oversimplified my example. A more realistic one is:
#d.e.f#=y
#a.b.c#=x
#h.i.j#=aaaa
#default.a.b.c#=QQQ
#asdfasd.fasdfs.asdfa#=23423
#default.h.i.j#=234
#default.RR.TT#=393993
#x.y.z#=0
#default.q.r.s#=1
#l.m.n#=8.3
#q.r.s#=78
#blah.blah.blah#=blah
This shows that it's possible for a #default.x.x.x#=y token to precede an #x.x.x#=y token (as #default.q.r.s#=1 precedes #q.r.s#=78); my prior example wasn't clear about this. I do actually have an expression to capture this case. It looks a bit like this:
\#default\.([^\n\r=#|]+)#=([^\n\r|]*)(.*)#\1#=([^\n\r|]*)(.*)
I know line separators are missing from this even though they were in the other one - I was experimenting with removing all line separators and treating it as a single line but that hasn't helped. I can resolve this problem simply by running each replacement multiple times by copying and pasting the configurations a few times, but that is not a good solution and will fail eventually.

I don't believe this can be solved as is. One workaround is to reverse the line order of the file, apply a lookahead regex, and then reverse the result again:
pattern = #default\.(.*?)#[^\r\n]+(?=[\s\S]*#\1#)
Another way (depending on what the replacer plugin can do) is to run this pattern
#(.*)(#[\s\S]*)#default\.\1.*
and replace with #$1$2, in a loop, until there are no matches.
Then run this pattern
#default\.(.*)#.*(?=[\s\S]*\1)
and replace with nothing, again in a loop, until there are no matches.
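The loop described above is easy to sketch outside Maven. The following Python is only an illustration of the idea, not something the replacer plugin itself runs. The function names are made up for this example, and the two patterns are the ones from this answer with Python's \1 in place of $1. Each removed entry leaves an empty line behind, so the last step filters those out.

```python
import re

# Patterns copied from the answer; Python writes group references as \1, \2.
# DEFAULT_AFTER: a "#default.KEY" entry that appears after "#KEY#".
DEFAULT_AFTER = re.compile(r'#(.*)(#[\s\S]*)#default\.\1.*')
# DEFAULT_BEFORE: a "#default.KEY" entry that appears before a later KEY.
# Note: \1 is matched as a bare substring here, so this pattern is loose.
DEFAULT_BEFORE = re.compile(r'#default\.(.*)#.*(?=[\s\S]*\1)')


def replace_until_stable(pattern, replacement, text):
    """Apply the substitution repeatedly until it no longer matches."""
    while True:
        text, count = pattern.subn(replacement, text)
        if count == 0:
            return text


def drop_overridden_defaults(text):
    """Hypothetical helper: remove #default entries that have an override."""
    text = replace_until_stable(DEFAULT_AFTER, r'#\1\2', text)
    text = replace_until_stable(DEFAULT_BEFORE, '', text)
    # Removed entries leave empty lines behind; drop them.
    return '\n'.join(line for line in text.splitlines() if line)
```

A single pass is not enough because the first match swallows the #h.i.j# line inside its second group, which is the behaviour the question describes. The loop ends because every successful pass deletes text.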

It doesn't look like the replacer plugin can actually do what I want. I got around this by using regular expressions to build multiple filter files, and then applying them to the resource files.
My original goal had been to use regular expressions to build a single, clean, and tidy filter file. In the end, I discovered that I was able to get away with just using multiple filters (not as clean or tidy) and apply them in the correct order.

Related

Trying to eliminate second regex exec

I am wondering if there is a way to declare boundaries other than start of line or end of line, based on a value in the text. I am trying to optimize my code. Right now I find a section in my doc and extract it with a regular expression, then I run that extracted section through another expression.
For simplicity, my text looks like this:
<start><doc><font>123</font></doc><doc><font>234</font></doc><doc><font>345</font></doc><doc><font>456</font></doc><end>
Since my <start> is not at the start but somewhere in the doc, I have to find it first. I assume that, if it's possible, a single expression should be more efficient than running two regex execs to get the data. Any small gain will help, as my script will have to run at least one million times.
I'm not really sure about the efficiency. If your data is as simple and clean as printed in the question, this expression might be a start:
(<start>(<doc>(<font>.*?<\/font>)<\/doc>)<end>)
Otherwise, you might want to clean your data first, and maybe find some alternative solutions.
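The question does not say which language the script is written in, so the following is only a sketch that assumes Python. It shows one way to avoid copying the extracted section: the boundaries are located first, and the inner pattern is then restricted to that region through the pos and endpos arguments of the compiled pattern. The names START, END, FONT and font_values are made up for this example, and whether this is actually faster than two separate execs would need to be measured on the real data.

```python
import re

# Compile once, outside any hot loop.
START = re.compile(r'<start>')
END = re.compile(r'<end>')
FONT = re.compile(r'<font>(.*?)</font>')


def font_values(text):
    """Return the <font> contents found between <start> and <end>."""
    start = START.search(text)
    if start is None:
        return []
    end = END.search(text, start.end())
    if end is None:
        return []
    # pos/endpos limit the scan to the section without slicing the string.
    return FONT.findall(text, start.end(), end.start())
```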

Regex-Match while ignoring a char from Searchword

I am using an engineering program which lets me code formulas in order to filter out specific lines in a database. I am trying to look for a certain line in the database which contains e.g. "concrete" as a property.
In the code I can use regular expressions.
The regex I was using so far looked like this:
".*(concrete).*";
So if the line in the database contains concrete, I get the wanted result.
Now the problem is: I would like to replace the word concrete with a variable, so that it looks like this:
".*(#VARIABLE1).*";
(The syntax with the # works in the program, by the way.)
The problem is: if I set the variable to concrete, the program automatically substitutes 'concrete' for it. Obviously, the word concrete can't be found anymore, since the search term now contains the two ' symbols at the beginning and at the end.
Is there a way to ignore those two characters using the right regex?
What I want it to do is the following:
If a line in the database contains "25cm concrete in Grey",
I should get a match from the regex.
With the search term ".*(concrete).*"; it works; with the variable ".*(#VARIABLE1).*"; it doesn't.
EDIT:
The whole "formula" in the program looks like this:
if(Match(QTO(Typ:="Attribut{FloorsLayer_02_MaterialName}");".*(#V_QUALITY).*" ;"regex") ;QTO(Typ:="Attribut{Fläche}");0)
I want the if-condition to be true when the match inside is true.
The whole QTO function is just the program's syntax for passing a certain attribute into the match function; the middle part is my problem. I really don't know the programming language or anything, I'm new to this. Hope it helps!
This is more of a hack than a real solution, and I'm not sure it even works.
If you use the regex
.*(#VARIABLE1)?).*
and set the variable to the string ?concrete(
this will result in a regex looking like this:
.*('?concrete(')?).*
which makes the additional quote characters optional.
This relies on the following assumption:
the string (#VARIABLE1) gets replaced by ('<content of VARIABLE1>')
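The substitution can be checked outside the engineering program. The sketch below uses Python and a made-up expand function that mimics the assumed behaviour (the variable's content is inserted wrapped in single quotes). How the real program substitutes variables is not documented in the question, so this only shows that the resulting regex behaves as intended if that assumption holds.

```python
import re


def expand(template, name, value):
    """Hypothetical stand-in for the program's variable substitution.

    Assumption: '#NAME' is replaced by the value wrapped in single quotes.
    """
    return template.replace('#' + name, "'" + value + "'")


# Plain approach: the quotes end up inside the pattern and must match literally.
naive = expand(".*(#VARIABLE1).*", "VARIABLE1", "concrete")

# Hack from the answer: the variable content makes both quotes optional.
hacked = expand(".*(#VARIABLE1)?).*", "VARIABLE1", "?concrete(")

line = "25cm concrete in Grey"
naive_matches = re.fullmatch(naive, line) is not None
hacked_matches = re.fullmatch(hacked, line) is not None
```

Under this assumption, naive expands to .*('concrete').* and fails on the sample line, while hacked expands to .*('?concrete(')?).* and matches it.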

Notepad++ - Selecting or Highlighting multiple sections of repeated text IN 1 LINE

I have a text file in Notepad++ that contains about 66,000 words all in 1 line, and it is a set of 200 "lines" of output that are all unique and placed in 1 line in the basic JSON form {output:[{output1},{output2},...}]}.
There is a set of characters matching the RegEx expression "id":.........,"kind":"track" that occurs about 285 times in total, and I am trying to either single them out, or copy all of them at once.
Basically, without some super complicated RegEx terms, I am stuck because I can't figure out how to highlight all of them at once, and also the Remove Unbookmarked Lines feature does not apply because this is all in one line. I have only managed to be able to Mark every single occurrence.
So does this require a large number of steps to get the file into multiple lines and work from there, or is there something else I am missing?
Edit: I have come up with a set of Macro schemes that make the process of doing this manually work much faster. It's another alternative but still takes a few steps and quite some time.
Edit 2: I intended there to be an answer for actually just highlighting the different sections all at once, but I guess that is not possible. The answer here turns out to be more useful in my case, allowing me to have a list of IDs without everything else.
You seem to already have a regex which matches single instances of your pattern, so assuming it works and that we must use Notepad++ for this:
Replace .*?("id":.........,"kind":"track").*?(?="id":.........,"kind":"track"|$) with \1.
If this textfile is valid JSON, this opens you up to other, non-notepad++ options, like using Python with the json module.
Edited to remove unnecessary steps
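As a sketch of the non-Notepad++ route mentioned above, the IDs can also be listed with a few lines of Python. The real structure of the JSON is not shown in the question, so this example stays with a regex instead of the json module. It assumes the id value contains no comma, and [^,]+ stands in for the fixed-width run of dots. The name track_ids is made up for this example.

```python
import re

# Assumption: the id value never contains a comma, so [^,]+ cannot run
# past the current "id" field into a later "kind":"track".
TRACK_ID = re.compile(r'"id":([^,]+),"kind":"track"')


def track_ids(text):
    """Return the raw id values of every entry whose kind is "track"."""
    return TRACK_ID.findall(text)
```

Calling track_ids on the contents of the file returns one list entry per match, which can then be printed one per line.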

Regex for converting file path to package/namespace

Given the following file path:
/Users/Lawrence/MyProject/some/very/interesting/Code.scala
I would like to generate the following using a single regex replace (the root can be a constant):
some.very.interesting
This is for the purpose of generating a snippet for Sublime Text which can automatically insert the correct package/namespace header for my scala/java classes :)
Sublime Text uses the following syntax for their regex replace patterns (aka 'substitutions'):
{input/regex/replace/flags}
Hence why an iterative approach cannot be taken - it has to be done in one pass! Also, substitutions cannot be nested :(
If you know the maximum number of nested folders, you can specify that in your regex.
For 1 to 3 nested folders
Regex:/Users/Lawrence/MyProject/(\w+)/?(\w+)?/?(\w+)?/[^/]+$
Replace:$1.$2.$3
For 1 to 5 nested folders
Regex:/Users/Lawrence/MyProject/(\w+)/?(\w+)?/?(\w+)?/?(\w+)?/?(\w+)?/[^/]+$
Replace:$1.$2.$3.$4.$5
Given the constraints, this is the only thing you can do:
Input
/Users/Lawrence/MyProject/some/very/interesting/Code.scala
Regex
^/Users/Lawrence/MyProject/([^/]+)/([^/]+)/([^/]+)/Code\.scala$
or
^/[^/]+/[^/]+/[^/]+/([^/]+)/([^/]+)/([^/]+)/[^/]+$
Replace
\1.\2.\3
Update
This gets you closer, but not exactly it:
Regex
(^/Users/Lawrence/MyProject/|/Code\.scala$|/)
Replacement
.
Output would be:
.some.very.interesting.
Without multiple replacements in a single line and without recursive back references it's going to be hard.
You might have to do a second replacement, replacing something like this with an empty string (if you can):
(^\.|\.$)
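The two-step variant can be tried out in Python before porting it to a snippet. This is only a check of the two regexes above, not Sublime Text substitution syntax, and the function name path_to_package is made up for this example.

```python
import re

# Step 1: turn the fixed root, the file name and every remaining slash into a dot.
SEPARATORS = re.compile(r'(^/Users/Lawrence/MyProject/|/Code\.scala$|/)')
# Step 2: strip the leading and trailing dot left over from step 1.
EDGE_DOTS = re.compile(r'(^\.|\.$)')


def path_to_package(path):
    """Convert a source file path to a dotted package name in two passes."""
    dotted = SEPARATORS.sub('.', path)
    return EDGE_DOTS.sub('', dotted)
```

Unlike the fixed-depth patterns, this works for any number of nested folders, at the cost of needing the second replacement.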

Find Lines with N occurrences of a char

I have a txt file that I’m trying to import as flat file into SQL2008 that looks like this:
"123456","some text"
"543210","some more text"
"111223","other text"
etc…
The file has more than 300.000 rows and the text is large (usually 200-500 chars), so scanning the file by hand is very time consuming and prone to error. Other similar (and even more complex files) were successfully imported.
The problem with this one is that “some lines” contain quotes in the text… (this came from an export from an old SuperBase DB that didn’t let you specify a text qualifier, there’s nothing I can do with the file other than clean it and try to import it).
So the “offending” lines look like this:
"123456","this text "contains" a quote"
"543210","And the "above" text is bad"
etc…
You can see the problem here.
Now, 300.000 is not too much if I could perform a search using a text editor that can use regex, I’d manually remove the quotes from each line. The problem is not the number of offending lines, but the impossibility to find them with a simple search. I’m sure there are less than 500, but spread those in a 300.000 lines txt file and you know what I mean.
Based upon that, what would be the best regex I could use to identify these lines?
My first thought is: tell me which lines contain more than 4 quotes (").
But I couldn’t come up with anything (I’m not good at Regex beyond the basics).
This pattern ^("[^"]+){4,} will match "lines containing more than 4 quotes".
You can experiment with replacing 4 with 5 or more, depending on your data.
I think that you can be more direct with a Regex than you're planning to be. Depending on your dialect of Regex, something like this should do it:
^"\d+",".*".*"
You could also use a regex to remove the outside quotes and use a better delimiter instead. For example, search for ^"([0-9]+)","(.*)"$ and replace it with \1+++++DELIM+++++\2.
Of course, this doesn't directly answer your question, but it might solve the problem.
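If a text editor with regex search is not at hand, the same check can be scripted. The following Python sketch lists the offending lines with their line numbers. It assumes the file uses straight double quotes and the two-column layout shown in the question. The name offending_lines is made up for this example.

```python
import re

# From the answer above: a numeric first field, then a second field that
# contains at least one extra quote before its closing quote.
EXTRA_QUOTE = re.compile(r'^"\d+",".*".*"')


def offending_lines(lines):
    """Return (line number, line) pairs for rows with embedded quotes."""
    found = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # A clean two-column row has exactly 4 quotes; more means trouble.
        if EXTRA_QUOTE.match(line) or line.count('"') > 4:
            found.append((number, line))
    return found
```

Passing an open file object works too, since iterating over it yields one line at a time, so the 300.000 rows never have to be loaded into memory at once.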