Regular Expression Union - regex

I am trying to write a union of regular expressions that excludes *.log files and includes *.pl files:
^(.(?!(.log)))|^(.*\.pl)$
What would be the correct syntax for that?

You are close. Your lookahead would look like this.
^(?!.*\.log).*\.pl$
However, your regex is anchored so you could simply match the ones that end in .pl.
^.*\.pl$
To match all lines except the ones that end in .log, assuming that is a possibility for what you need, anchor the lookahead to the end of the string:
^(?!.*\.log$).*$
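As a rough check of the two .pl patterns above, here is a minimal Python sketch. The file names and the helper name `select` are made up for illustration; they are not from the question.

```python
import re

# Hypothetical file names, chosen only for illustration.
names = ["build.pl", "run.log", "notes.txt", "old.log.pl"]

# Matches any name that ends in .pl.
ends_in_pl = re.compile(r"^.*\.pl$")
# Same, but the lookahead rejects names containing ".log" anywhere.
pl_without_log = re.compile(r"^(?!.*\.log).*\.pl$")


def select(pattern, candidates):
    """Return the candidates that the compiled pattern matches."""
    return [name for name in candidates if pattern.match(name)]


print(select(ends_in_pl, names))      # ['build.pl', 'old.log.pl']
print(select(pl_without_log, names))  # ['build.pl']
```

The two patterns only differ for names such as old.log.pl, where ".log" appears before the final ".pl".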

You don't need to exclude the others; if a name doesn't match, it will not be in the output:
^.*?\.pl$

I was not sure about the exact scenario in the question, but if you actually need to list all files with extensions, excluding the *.log and *.pl ones, you could use:
^.*\.(?!(?:pl|log)$)[^.]+$


Regex - Globally replace a slash in between given XML tags only

I'm trying to replace backslashes with forward slashes, globally, over several lines, in an XML file, but only inside a given tag.
Example where I want to work on the content of <path>:
<name>file1</name><path>c:\folder\folder</path><test>just\the\lolz</test>
<name>file2</name><path>c:\folder\folder\folder</path><test>some more\lolz</test>
Should become:
<name>file1</name><path>c:/folder/folder</path><test>just\the\lolz</test>
<name>file2</name><path>c:/folder/folder/folder</path><test>some more\lolz</test>
I've been trying with look arounds and recursion but I'm getting nowhere...
Last useless try was:
(?<=path>)(\w*?(\x2F))+(?=.*<\/path>)
Thanks!
You can search for this (it needs an engine that allows variable-length lookbehind, such as .NET):
(?<=path>[^<]*)\\
And replace with this:
/
It's worth cautioning you that this will not work with any and every XML file. Truly parsing XML files and properly handling all possible legal XML is not possible (or at least not recommended) with regex, but as long as the data is consistent, that should suffice.
This is what you need: (?<=<path>.*)\\(?=.*<\/path>)
Note: This does not work in older JavaScript engines, because lookbehind was only added to JavaScript in ES2018. Many other engines also reject the unbounded .* inside a lookbehind.
Let me explain:
(?<=<path>.*) is a lookbehind: it requires <path>, followed by any characters, to appear before the character you put after it, in our case \\ (an escaped \).
(?=.*<\/path>) is a lookahead. It works like the lookbehind, but searches everything to the right of the character preceding it, in our case the \. The lookahead does work in JavaScript.
Hope it helped.
Here is a simple way to select the text between <path>...</path>:
(?<=path>).*(?=<\/path)

capturing a repeating pattern with regex

I'm trying to match a pattern like this CODE-UH87H-98HSH-HB383-JWWB2U and I have the following regex pattern CODE\-[A-Z0-9]+\-[A-Z0-9]+\-[A-Z0-9]+\-[A-Z0-9]+. Is there a better way of doing this? I tried CODE(\-[A-Z0-9]+\-){4} and it didn't work.
I tried CODE(\-[A-Z0-9]+\-){4} and it didn't work
That does require two dashes in succession. In full, it would be CODE\-[A-Z0-9]+\-\-[A-Z0-9]+\-\-[A-Z0-9]+\-\-[A-Z0-9]+\-. What you want is
CODE(\-[A-Z0-9]+){4}
You were almost there. CODE(\-[A-Z0-9]+){4} should work!
When the pattern between the dashes may contain any character, the following regex is even shorter:
CODE(-[^-]+){4}
Of course you may have to add \ for escaping before the dash depending on what regex engine you will use.
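To see the difference between the three variants, here is a small Python sketch. The variable names are made up for illustration, and `fullmatch` is used so the whole string has to fit the pattern.

```python
import re

sample = "CODE-UH87H-98HSH-HB383-JWWB2U"

strict = re.compile(r"CODE(-[A-Z0-9]+){4}")     # one dash per block
loose = re.compile(r"CODE(-[^-]+){4}")          # any non-dash characters per block
doubled = re.compile(r"CODE(-[A-Z0-9]+-){4}")   # the attempt from the question

print(bool(strict.fullmatch(sample)))    # True
print(bool(loose.fullmatch(sample)))     # True
print(bool(doubled.fullmatch(sample)))   # False: needs "--" between blocks

# A repeated group only keeps its last repetition.
print(strict.fullmatch(sample).group(1))  # -JWWB2U
```

If each block is needed separately, splitting the matched string on "-" afterwards is usually simpler than trying to capture every repetition.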

A regular expression to match the server root

I need a regular expression to match the following pattern: \server\root\
If the path is longer, such as \\server\root\subroot, it should not be matched.
Check that the required pattern occurs at the start of the string, and at the end of the string (i.e. nothing follows it).
^\\server\\root\\$
You will most likely have to escape the backslashes, hence the double \\.
If the requirement is to match any two-level path, the following might be useful:
^\\[\w]+\\[\w]+\\$
\\server\\root\\$
The pattern above should do it: the $ says the string must end right after root\, so nothing may follow it.
Probably
^\\[[:word:]]+\\[[:word:]]+\\$
will be OK. You can replace [[:word:]] with \w probably, depending on your regex engine.
^\\[\w]+\\[\w]+\\$ should match any two level paths.
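A quick way to try the anchored patterns is a short Python sketch like the one below. The sample paths are made up, and the test strings use doubled backslashes only because of Python string-literal escaping.

```python
import re

# Exactly \server\root\ and nothing else.
exact_root = re.compile(r"^\\server\\root\\$")
# Any two-level path of word characters, e.g. \host\share\.
two_level = re.compile(r"^\\\w+\\\w+\\$")

root = "\\server\\root\\"            # the text \server\root\
longer = "\\server\\root\\subroot"   # the text \server\root\subroot
other = "\\host\\share\\"            # the text \host\share\

print(bool(exact_root.match(root)))    # True
print(bool(exact_root.match(longer)))  # False
print(bool(two_level.match(other)))    # True
print(bool(two_level.match(longer)))   # False
```

A path written with two leading backslashes would need one more escaped backslash at the start of each pattern.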

Regex to include some files but with one exception

I would like a regex that includes all filenames with a certain ending, e.g. ".err", but not if the filename starts with e.g. "test". In other words, include "*.err" files but not "test-whatever.err" files.
I have found that
(?!test.*\.err$).*\.err
excludes the test*.err files and that
.*\.err
includes all the *.err files, but I need them both in the same expression.
Also, the fact that ".err" can be written as ".ERR" or ".Err" must be taken into consideration for this regex to work properly for me.
All thoughts and ideas are appreciated!
Regards
Rickard
Use this
^(?i)(?!test).*\.err$
The important parts that differ from yours:
Use anchors. ^ and $ anchor your pattern to the start and to the end of the string.
(?i) makes it "ignorecase", so that err will also match "ERR" or "ErR" and test will also match "Test" and "TEST".
You didn't give the language, but these features should work with most flavours.
How about this one:
^(?!test).*\.err$
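Here is a minimal Python sketch of the anchored, case-insensitive version. The file names are made up for illustration. It passes `re.IGNORECASE` as a flag rather than writing `(?i)` after the `^`, because recent Python versions reject an inline global flag that is not at the very start of the pattern.

```python
import re

# Anchored pattern: must not start with "test", must end in ".err",
# compared case-insensitively.
err_not_test = re.compile(r"^(?!test).*\.err$", re.IGNORECASE)

# Hypothetical file names.
names = [
    "disk.err",
    "DISK.ERR",
    "test-whatever.err",
    "Test1.Err",
    "latest.err",
    "disk.error",
]

kept = [name for name in names if err_not_test.match(name)]
print(kept)  # ['disk.err', 'DISK.ERR', 'latest.err']
```

Note that latest.err is kept: the lookahead only checks the start of the name, so "test" in the middle does not exclude it.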

How do I properly match Regular Expressions?

I have a list of objects output from ldapsearch as follows:
dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=HGRANGER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=RWEASLEY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=DMALFOY,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL
dn: cn=SSNAPE,ou=FACULTY,ou=HOGWARTS,o=SCHOOL
dn: cn=ADUMBLED,ou=FACULTY,ou=HOGWARTS,o=SCHOOL
So far, I have the following regex:
/\bcn=\w*,/g
Which returns results like this:
cn=HPOTTER,
cn=HGRANGER,
cn=RWEASLEY,
cn=DMALFOY,
cn=SSNAPE,
cn=ADUMBLED,
I need a regex that returns results like this:
HPOTTER
HGRANGER
RWEASLEY
DMALFOY
SSNAPE
ADUMBLED
What do I need to change in my regex so the pattern (the cn= and the comma) is not included in the results?
EDIT: I will be using sed to do the pattern matching, and piping the output to other command line utilities.
You will have to use a capturing group. This is done by modifying the regex to:
/\bcn=\(\w*\),/g
This will then capture your result in a group. How you extract the value depends on your language. (For you, with sed, it will be \1.)
Note that in most regex flavors you don't have to escape the parentheses (), but since you're using sed you will need to, as shown above.
For an excellent resource on Regular Expressions I suggest: Mastering Regular Expressions
OK, the place where you asked the more specific question was closed as "exact duplicate" of this, so I'm copying my answer from there to here:
If you want to use sed, you can use something like the following:
sed -e 's/dn: cn=\([^,]*\),.*$/\1/'
You have to use [^,]* because in sed, .* is "greedy" meaning it will match everything it can before looking at any following character. That means if you use \(.*\), in your pattern it will match up to the last comma, not up to the first comma.
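The greedy behaviour is not specific to sed. As an illustration only, the following Python sketch runs the same two substitutions with `re.sub` on one line from the question; the variable names are made up.

```python
import re

line = "dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL"

# Greedy .* inside the group runs up to the LAST comma.
greedy = re.sub(r"dn: cn=(.*),.*$", r"\1", line)
# [^,]* cannot cross a comma, so the group stops at the FIRST one.
negated = re.sub(r"dn: cn=([^,]*),.*$", r"\1", line)

print(greedy)   # HPOTTER,ou=STUDENTS,ou=HOGWARTS
print(negated)  # HPOTTER
```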
Check out Expresso. I have used it in the past to build my regexes, and it helps with learning too.
The quick and dirty method is to use submatches assuming your engine supports it:
/\bcn=(\w*),/g
Then you would want to get the first submatch.
Without knowing what language you're using, we can't tell for sure, but in most regular expression parsers, if you use parentheses, such as
/\bcn=(\w*),/g
then you'll be able to get the first captured group (often \1), which is exactly what you are searching for. To be more specific, we need to know what language you are using.
If your regex supports Lookaheads and Lookbehinds then you can use
/(?<=\bcn=)\w*(?=,)/g
That will match
HPOTTER
HGRANGER
RWEASLEY
DMALFOY
SSNAPE
ADUMBLED
But not the cn= or the , on either side. The comma and cn= still have to be there for the match; they just aren't included in the result.
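Both approaches, the capturing group and the lookarounds, can be compared in a short Python sketch. It uses two of the lines from the question as input; the variable names are made up for illustration.

```python
import re

ldap_output = (
    "dn: cn=HPOTTER,ou=STUDENTS,ou=HOGWARTS,o=SCHOOL\n"
    "dn: cn=SSNAPE,ou=FACULTY,ou=HOGWARTS,o=SCHOOL\n"
)

# With one capturing group, findall returns the group, not the whole match.
with_group = re.findall(r"\bcn=(\w*),", ldap_output)

# With lookarounds, the match itself is already just the name.
with_lookaround = re.findall(r"(?<=\bcn=)\w*(?=,)", ldap_output)

print(with_group)       # ['HPOTTER', 'SSNAPE']
print(with_lookaround)  # ['HPOTTER', 'SSNAPE']
```

The capturing-group form is the more portable of the two, since not every engine supports lookbehind.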
Sounds more like a simple parsing problem and not regex. An ANTLR grammar would sort this out in no time.