I am doing some replacements in some huge SSIS packages to reflect changes in table and column names.
Some of the tables have column names which are identical to the table names, and I need to match the column name without matching the table name.
So what I need is a way to match MyName in [MyName] but not in [dbo].[MyName].
(?<=\[)(MyName)(?=\]) matches both, and I thought that (?<!\[dbo\]\.)(?<=\[)(MyName)(?=\]) would do the trick, but it does not seem to work.
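You need to include the opening square bracket in the first lookbehind. Both lookbehinds are checked at the same position, just after the [, so the text immediately before that position always ends in [ and can never end in [dbo]. as your version requires: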
(?<!\[dbo\]\.\[)(?<=\[)(MyName)(?=\])
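To see the difference between the two patterns, here is a minimal sketch in Python. The sample string and the replacement name NewName are made up for illustration; Python's re module accepts both lookbehinds because they are fixed-width.

```python
import re

# Pattern from the answer: the negative lookbehind includes the opening bracket.
COLUMN_ONLY = re.compile(r"(?<!\[dbo\]\.\[)(?<=\[)(MyName)(?=\])")
# The original attempt from the question, kept for comparison.
ORIGINAL = re.compile(r"(?<!\[dbo\]\.)(?<=\[)(MyName)(?=\])")

# Hypothetical fragment of package text (an assumption, not from the question).
sample = "SELECT [MyName] FROM [dbo].[MyName]"

# Only the bare column reference is renamed; the table reference is left alone.
renamed = COLUMN_ONLY.sub("NewName", sample)

# The original attempt still matches both occurrences.
original_hits = len(ORIGINAL.findall(sample))

print(renamed)
print(original_hits)
```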
Is there a way in regular expressions to match a subset of words against a set of words separated by a separator, without having to create a new pattern for every new word added to the set?
Right now I cannot think of anything other than adding a (?:{item1, item2, ...}) pattern for every extra item in the set (see the example below).
Example matching a single word of the set:
Set: foo,bar,baz
Match: foo
RegExp: /^(foo|bar|baz)$/ <- MATCH
Example that will match a subset of words:
Set: foo,bar,baz
Match: foo,bar
RegExp: /^(foo|bar|baz)(?:,(foo|bar|baz)(?:,(foo|bar|baz))?)?$/ <- MATCH
The pattern grows rapidly when adding new items to the set. Is there some (magical) way to do this in a shorter version?
One general approach which looks slightly better than your current attempt would be to use lookaheads:
^(?=.*\bfoo\b)(?=.*\bbar\b).*$
You may add one lookahead assertion for each CSV term which needs to be matched in the input CSV list.
Edit: If you want OR behavior here, then we can use an alternation of lookaheads. To match either foo or bar as a CSV term we can try:
^(?:(?=.*\bfoo\b)|(?=.*\bbar\b)).*$
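A small sketch of both lookahead patterns in Python, with made-up input strings. The helper names has_all and has_any are hypothetical. Note that these patterns only check that the listed terms are present; they do not reject input that also contains words outside the set.

```python
import re

# AND behavior: every listed term must appear somewhere in the input.
ALL_OF = re.compile(r"^(?=.*\bfoo\b)(?=.*\bbar\b).*$")
# OR behavior: at least one of the listed terms must appear.
ANY_OF = re.compile(r"^(?:(?=.*\bfoo\b)|(?=.*\bbar\b)).*$")


def has_all(csv: str) -> bool:
    """True if the CSV string contains both foo and bar as whole words."""
    return ALL_OF.match(csv) is not None


def has_any(csv: str) -> bool:
    """True if the CSV string contains foo or bar as a whole word."""
    return ANY_OF.match(csv) is not None


for text in ["foo,bar", "bar,baz,foo", "foo,baz", "baz"]:
    print(text, has_all(text), has_any(text))
```

The order of the terms in the input does not matter, since each lookahead scans the whole string from the start.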
I have a couple of regexes that work on online regex testers but not in Pentaho. Could you please help?
Here's the string:
:6585d0f0ba88767ac3b590f719596d864d73e9c1:
harmonicbalance/src/harmonicbalance/HarmonicBalanceFlowModel.cpp
harmonicbalance/src/harmonicbalance/HbFlutterModel.cpp
:8302994b565553c83a048b8905ae597349d99627:
emp/src/emp/PhasePairSingleParticleReynoldsNumber.h
emp/src/emp/TomiyamaDragCoefficientMethod.cpp
:9da194f17ec08bb20ad1be8df68b78ca137ab18a:
combustion/src/combustion/ReactingSpeciesTransportBasedModel.cpp
combustion/src/complexchemistry/TurbulentFlameClosure.cpp
:6a59f0be1e347a65e525e58742bb304639ea9bc4:
meshing/src/meshing/SurfaceMeshManipulation.cpp
physics/src/discretization/FvIndirectRegionInterfaceManager.cpp
physics/src/discretization/FvIndirectRegionInterfaceManager.h
physics/src/discretization/FvRepresentation.cpp
physics/src/discretization/FvRepresentation.h
:64b7f6d36b11b6cd94c20cad53463b7deef8c85a:
resourceclient/src/resourceclient/ResourcePool.cpp
resourceclient/src/resourceclient/ResourcePool.h
resourceclient/src/resourceclient/RestClient.cpp
resourceclient/src/resourceclient/RestClient.h
resourceclient/src/resourceclient/test/ResourcePoolTest.cpp
I would like to capture two groups. The first group should extract each commit SHA1 and the second group should extract the file names.
Below are the expressions I tried:
(?:^:([A-Za-z0-9]+):|(?!^)\G)\n+([A-Za-z/.-]+)
https://regex101.com/r/3IBkPz/1
^:(\w+):\s+((?:\s*(?!:)[^\s]+)+)
https://regex101.com/r/oIoDvM/1
Thoughts?
AFAIK (as of PDI-8.0), the Regex Evaluation step does NOT support the regex 'g' modifier; your regex pattern must cover the entire text in order to match.
For example: the following pattern will not match anything in Regex Evaluation step:
:([0-9a-f]+):\s+([^:]+)
but if I prepend .* to this pattern and pick "Enable dotall mode":
.*:([0-9a-f]+):\s+([^:]+)
it will match the last commit (sha1 + filenames). You can try moving .* to the end of the original pattern, which will get you the first commit. So if you want to retrieve the full list of commits (sha1 + filenames), as you would with the g modifier, this step is probably not a solution for you.
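The whole-input behavior described above can be reproduced outside Pentaho. The sketch below uses Python's re.fullmatch as a stand-in for the step's matching (an assumption about how the step behaves, based on the description above) and re.DOTALL for "Enable dotall mode". The shortened SHA values and file names are made up.

```python
import re

# Shortened, hypothetical sample in the same layout as the question's input.
sample = (
    ":aaa111:\n"
    "dir/File1.cpp\n"
    "dir/File2.cpp\n"
    ":bbb222:\n"
    "dir/File3.h\n"
)

# Without .* the pattern cannot cover the whole text, so there is no match.
no_match = re.fullmatch(r":([0-9a-f]+):\s+([^:]+)", sample)

# Leading .* (greedy, dotall) swallows everything up to the last commit.
last = re.fullmatch(r".*:([0-9a-f]+):\s+([^:]+)", sample, re.DOTALL)

# Trailing .* lets the pattern stop after the first commit.
first = re.fullmatch(r":([0-9a-f]+):\s+([^:]+).*", sample, re.DOTALL)

print(no_match)
print(last.groups())
print(first.groups())
```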
As the fields are basically split by colons ':' and new lines, you can probably try the following approach:
Use the Split field to rows step with Delimiter=':' and include rownum in the output; this rownum can be used to filter rows, where even numbers are sha1 values and odd numbers are filenames
Use the Analytic Query step to create a new field with LEAD = 1, so that you get sha1 and filenames in the same row
Use the Calculator and Filter steps to calculate the remainder of rownum/2 and keep only the rows with an odd rownum
Use Split field to rows again to split filenames into individual filename values using "\n" (with "Delimiter is a Regular Expression" checked). You might want to filter out the EMPTY filename values, since the delimiter only supports one char
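The logic of the steps above (split on ':', pair each sha1 with the block that follows it, then split that block on newlines and drop empty entries) can be sketched in Python. This only illustrates the idea and is not Pentaho configuration; the function name parse_commits and the sample data are hypothetical, and it assumes file names never contain a colon.

```python
def parse_commits(text: str) -> list[tuple[str, list[str]]]:
    """Pair each SHA-1 with the list of file names that follows it."""
    parts = text.split(":")
    commits = []
    # parts[0] is whatever precedes the first colon (empty here); after that,
    # SHA-1 values and file-name blocks alternate.
    for sha, block in zip(parts[1::2], parts[2::2]):
        # Splitting on newlines leaves empty strings at the edges; drop them.
        files = [line for line in block.split("\n") if line.strip()]
        commits.append((sha, files))
    return commits


# Shortened, hypothetical sample in the same layout as the question's input.
sample = ":aaa111:\ndir/File1.cpp\ndir/File2.cpp\n:bbb222:\ndir/File3.h\n"

for sha, files in parse_commits(sample):
    print(sha, files)
```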
So I have the following table in PostgreSQL.
This is a test table with only one column, route, which holds route names like:
I-95
US-95N
I-95 S
I want to remove the trailing direction literals from all the route names.
UPDATE <schema>.<table>
SET route= regexp_replace(route, '%[:digit:](S|N|E|W)', '%[:digit:]', 'ig');
No records are changed. Does anyone have any idea what I am doing wrong here?
To remove any single letter signifying a cardinal direction following immediately after a digit:
UPDATE tbl
SET route = regexp_replace(route, '(\d)[SNEW]', '\1', 'ig')
A positive lookbehind would be even more elegant, but sadly only lookahead is implemented in PostgreSQL versions before 9.6. So I use a back-reference to re-insert the first (captured) part of the match.
The bracket expression [SNEW] is simpler here than multiple branches (S|N|E|W), which would need non-capturing parentheses in this case: (?:S|N|E|W).
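The same replacement can be tried outside the database. Below is a minimal Python sketch of the back-reference form next to the lookbehind form; the function names are hypothetical, and re.IGNORECASE plays the role of the 'i' flag. Note that a value like I-95 S is left unchanged by either pattern, because the letter does not follow the digit immediately.

```python
import re


def strip_direction(route: str) -> str:
    """Back-reference form: keep the digit (group 1), drop the letter after it."""
    return re.sub(r"(\d)[SNEW]", r"\1", route, flags=re.IGNORECASE)


def strip_direction_lookbehind(route: str) -> str:
    """Lookbehind form: match only the letter, so no back-reference is needed."""
    return re.sub(r"(?<=\d)[SNEW]", "", route, flags=re.IGNORECASE)


for route in ["I-95", "US-95N", "I-95 S", "i-95s"]:
    print(repr(route), repr(strip_direction(route)), repr(strip_direction_lookbehind(route)))
```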
I am writing a regex to match a list of items that follow a specific complex format, so the regex for each item is very long. The items in the list have to be separated by a comma, which can optionally be padded with either one space on the right or one space on both sides, so the regex for matching the delimiter is ( , )|(, ?). Also, I want the list to be enclosed in square brackets.
For example, it should match the following:
[]
[validItem]
[validItem,validItem, validItem]
But not the following:
[validItem,invalidItem]
[validItemvalidItem]
[validItem, validItem ]
The regex I currently have is: \[verylongregex(?:(?: , )|(?:, ?)verylongregex)*\], but I'd like to simplify this to include the regex pattern that matches the element format only once.
Does regex have a method to match X groups separated by another group?
Here is an answer. I don't know if it is what you are looking for, but here it is nonetheless.
1/ Assuming you want to capture the list in one group:
(\[(?:complexRegex(?: , |, ?|\]))+)
Demo: http://regex101.com/r/pW2oZ1/1
2/ Assuming you want all elements of the list matched separately, this is a much more complex thing (at least for my knowledge...). Here is a working (complex) solution:
(?:\[|(?!\[)\G(?: , |, ?))(complexRegex)(?=(?:(?: , |, ?)complexRegex)*\])
Demo: http://regex101.com/r/iB3jD1/2
I don't have the time to write an explanation right now if it's needed. Ask for it in the comments if you want one, I'll write it later today. Sorry...
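A different route, not taken from the answer above: if the regex is assembled in a host language, the long item pattern can be written once in the source and interpolated wherever it is needed. Here is a minimal Python sketch, assuming validItem as a stand-in for the very long item regex; the names ITEM, SEP, LIST_RE and is_valid_list are hypothetical. The compiled pattern still contains the item regex twice; only the source code avoids the repetition.

```python
import re

# Hypothetical stand-in for the very long item regex from the question.
ITEM = r"validItem"
# Delimiter from the question: " , " or "," with an optional space after it.
SEP = r"(?: , |, ?)"

# Brackets around an optional list: one item, then any number of (SEP item).
# ITEM is wrapped in a group so an alternation inside it cannot leak out.
LIST_RE = re.compile(rf"\[(?:(?:{ITEM})(?:{SEP}(?:{ITEM}))*)?\]")


def is_valid_list(text: str) -> bool:
    """True if the whole text is a bracketed list of valid items."""
    return LIST_RE.fullmatch(text) is not None


for text in ["[]", "[validItem]", "[validItem,validItem, validItem]",
             "[validItem,invalidItem]", "[validItemvalidItem]", "[validItem, validItem ]"]:
    print(text, is_valid_list(text))
```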
I am using positive lookbehind and lookahead to match a word between certain parts (FROM and TO strings).
.*(?<=FROM)\s+(.*?)\s+(?=TO).*
EDIT: That approach cannot be changed, so please take it as given rather than suggesting a workaround for the approach itself, thank you! It's more of a theoretical question about how to deal with lookarounds when matching in between them.
I'd like to input a string like
FROM table a, table2 b TO
and obtain table and table2 as \1. The a and b labels are optional.
My problem is that if I place something like (?:(\w+)\s*,?)+? for matching every table part, it seems like the matching is done backwards
http://regex101.com/r/mV4rD8
If I'm understanding what you want correctly, you don't need lookahead/behind. You can do:
FROM (?:(\w+)(?: \w)*(?:,)? )+TO
Of the three parts inside the outermost parentheses, the second and third need to be treated separately because they are optional for different reasons. The second is present if the a and b labels are present. The third is present if the table is not the last one in the list.
This will capture the table names as you described. So e.g.:
FROM table1 a, table2, table3 c TO
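Will capture "table1", "table2" and "table3" in flavors that keep every capture of a repeated group (such as .NET). Most other flavors keep only the last iteration, "table3", in \1.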
I used literal spaces, but you can replace them with \s if you prefer.
EDIT: With the lookahead and lookbehind still present, as per your requirement:
.*(?<=FROM)\s+(?:(\w+)(?:\s+\w)*(?:\s*,)?\s+)+(?=TO).*
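For flavors that keep only the last iteration of a repeated group, here is a hedged Python sketch of one way to still collect every table name: validate the statement with the first pattern above, then re-scan the part between FROM and TO with the item sub-pattern. The function name table_names and the two-pass approach are assumptions for illustration, not part of the answer.

```python
import re

# First pattern from the answer, using literal spaces.
FULL = re.compile(r"FROM (?:(\w+)(?: \w)*(?:,)? )+TO")
# The repeated part on its own, used to re-scan the matched text.
ITEM = re.compile(r"(\w+)(?: \w)*,? ")


def table_names(sql: str) -> list[str]:
    """Return every table name between FROM and TO, or [] if nothing matches."""
    m = FULL.search(sql)
    if m is None:
        return []
    # Strip the leading "FROM " and the trailing "TO" from the matched text.
    body = m.group(0)[len("FROM "):-len("TO")]
    return ITEM.findall(body)


sample = "FROM table1 a, table2, table3 c TO"
last_only = FULL.search(sample).group(1)  # Python keeps only the last iteration

print(last_only)
print(table_names(sample))
```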