Replace string using regular expression in KETTLE - regex

I would like to use a regular expression to replace a certain pattern in Kettle. For example, given AAAA >5< BBBB, I want to replace this with AAAA 555 BBBB. I know how to find the pattern, but I am not sure how to replace it with a new string. One constraint: I have to match > and < together as a pair, not separately as > or <, because there is another pattern, <5>.

You can use the "Replace in String" step in a transformation.
Set use RegEx to "Y", type your regex in the Search box, with capturing groups if necessary, and put the replacement string in the replacement box, referring to capture groups as $1, $2, ...
It'll replace all occurrences of the regex in the original string.
If the Out Stream field is omitted, it'll overwrite the In stream field.

If you want the pattern >\d< replaced by a triple of the found digit, you can use Replace-In-String in regex mode:
Search: (.*)(>(\d)<)(.*)
Replace: $1$3$3$3$4
If you want all such patterns treated the same:
Search: (>(\d)<)
Replace: $2$2$2
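To check the second search/replace pair outside Kettle, here is a minimal sketch in Python. It is not Kettle's API: Python's re module writes group references as \1 rather than $1, and the function name triple_digit is made up for illustration.

```python
import re

def triple_digit(text: str) -> str:
    """Replace every >d< (one digit between > and <) with ddd."""
    # The literal > and < are part of the pattern, so <5> is left untouched.
    return re.sub(r">(\d)<", r"\1\1\1", text)

print(triple_digit("AAAA >5< BBBB"))  # AAAA 555 BBBB
print(triple_digit("AAAA <5> BBBB"))  # AAAA <5> BBBB
```

Because the group only wraps the digit, the surrounding > and < are consumed by the match and dropped from the output.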
EDIT following your updated requirement
Since you intend to convert your "simple" markup to a more HTML-like markup, you had better use a User-Defined-Java-Expression. Also, you must avoid reintroducing simple markup when replacing repeatedly.

Related

VSCode - find and replace with regexp, but keep word

I have multiple occurrences of src={icons.ICON_NAME_HERE} in my code that I would like to change to name="ICON_NAME_HERE".
Is it possible to do it with regular expressions, so I can keep whatever is in code as ICON_NAME_HERE?
To clarify:
I have, for example, src={icons.upload} and src={icons.download}. I want to replace them all with one regexp, so that they get converted to name="upload" and name="download".
Try searching on the following pattern:
src=\{icons\.([^}]+)\}
And then replace with your replacement:
name="$1"
In case you are wondering, the quantity in parentheses in the search pattern is captured during the regex search. Then, we can access that captured group using $1 in the replacement. In this case, the captured group should just be the name of the icon.
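The same capture-and-reuse idea can be tried in a script. This is a sketch in Python, not what VSCode runs internally; the helper name src_to_name and the sample JSX line are assumptions for illustration. Python writes the group reference as \1 instead of $1.

```python
import re

# Same pattern as in the answer: capture everything up to the closing brace.
ICON_SRC = re.compile(r"src=\{icons\.([^}]+)\}")

def src_to_name(code: str) -> str:
    """Rewrite src={icons.X} as name="X", keeping X from the capture group."""
    return ICON_SRC.sub(r'name="\1"', code)

print(src_to_name("<Icon src={icons.upload} />"))    # <Icon name="upload" />
print(src_to_name("<Icon src={icons.download} />"))  # <Icon name="download" />
```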

VIM - Replace based on a search regex

I've got a file with several (1000+) records like :
lbc3.*'
ssa2.*'
lie1.*'
sld0.*'
ssdasd.*'
I can find them all by :
/s[w|l].*[0-9].*$
What I want to do is replace the final part of each pattern found with \.*'
I can't do :%s//s[w|l].*[0-9].*$/\\\\\.\*' because it'll replace all the string, and what i need is only replace the end of it from
.'
to
\.'
So the file output is like:
lbc3\\.*'
ssa2\\.*'
lie1\\.*'
sld0\\.*'
ssdasd\\.*'
Thanks.
In general, the solution is to use a capture. Put \(...\) around the part of the regex that matches what you want to keep, and use \1 to include whatever matched that part of the regex in the replacement string:
s/\(s[w|l].*[0-9].*\)\.\*'$/\1\\.*'/
Since you're really just inserting a backslash between two strings that you aren't changing, you could use a second set of parens and \2 for the second one:
s/\(s[w|l].*[0-9].*\)\(\.\*'\)$/\1\\\2/
Alternatively, you could use \zs and \ze to delimit just the part of the string you want to replace:
s/s[w|l].*[0-9].*\zs\ze\.\*'$/\\/
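For comparison, both ideas (re-emitting a captured tail, and matching a zero-width position) can be sketched in Python. This is an illustration rather than a translation of the Vim commands: the pattern is simplified to look only at the trailing .*' and drops the s[w|l].*[0-9] prefix, and both function names are made up.

```python
import re

def escape_with_capture(line: str) -> str:
    """Capture the trailing .*' and write it back after a literal backslash."""
    # In the replacement template, \\ is one backslash and \1 is the group.
    return re.sub(r"(\.\*')$", r"\\\1", line)

def escape_with_lookahead(line: str) -> str:
    """Match the empty position just before the trailing .*' and insert a backslash."""
    # The lookahead consumes nothing, similar in spirit to Vim's \zs\ze.
    return re.sub(r"(?=\.\*'$)", lambda m: "\\", line)

for line in ["lbc3.*'", "ssa2.*'", "ssdasd.*'"]:
    print(escape_with_capture(line), escape_with_lookahead(line))
```

Both functions leave a line unchanged when it does not end in .*'.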

How to store regex match in a variable to use with Search/Replace in Notepad++

I need to capture a word using a pattern and store somewhere to use in replace operation:
For instance I have this text
ab iiids
as sasas
md aisjaij
as asijasija
I am able to capture everything before the first space using the following pattern:
.*?(?=[" "])
Let's say I could use '$' to represent the match in the search and replace, so I would like to have the following in the replace field:
INSERT INTO table (value) VALUES ("$")
Could not find a solution.
Is this even possible?
Yes, $& or $0 represents the entire match in the replacement string:
Find what: .*?(?=[ ])
Replace with: INSERT INTO table \(value\) VALUES \("$0"\)
Note that you don't need the quotes inside the character class; with them, your match would stop at a quote as well. There is another catch: you have to escape parentheses in the replacement string (thanks to acdcjunior for noticing that). That's because Notepad++ uses Boost, which supports a parenthesis-delimited conditional construct.
Using your input, this will result in
INSERT INTO table (value) VALUES ("ab") iiids
INSERT INTO table (value) VALUES ("as") sasas
INSERT INTO table (value) VALUES ("md") aisjaij
INSERT INTO table (value) VALUES ("as") asijasija
For more advanced use cases, you can wrap parts of the regex in parentheses, and reference the substrings they matched with $1, $2, and so on.
By the way, a more explicit pattern would be:
^[^ ]*(?=[ ])
This makes sure that you don't get additional matches where you don't want them.
Finally, here is a reference of all the things you can do in the replacement string.
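To experiment with the whole-match reference outside Notepad++, here is a sketch in Python. The syntax differs from Boost: Python's re module spells the whole match as \g<0>, and parentheses in the replacement need no escaping. The helper name to_insert is made up for illustration.

```python
import re

# First run of non-space characters at the start of the line, followed by a space.
FIRST_WORD = re.compile(r"^[^ ]*(?= )")

def to_insert(line: str) -> str:
    """Wrap the first word of a line in an INSERT statement."""
    # \g<0> stands for the entire match, like $0 or $& in Notepad++.
    return FIRST_WORD.sub(r'INSERT INTO table (value) VALUES ("\g<0>")', line)

for line in ["ab iiids", "as sasas", "md aisjaij", "as asijasija"]:
    print(to_insert(line))
```

As in the Notepad++ result above, the rest of each line stays after the statement because the lookahead does not consume the space.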

Using parameters in regular expressions

I am trying to use NotePad++ to do a search and replace using the regex function that replaces a string of characters but maintains one part of the string. My description isn't very good so perhaps it will be better if I just give you the example.
Throughout an XML doc I have the following elements...
<AddressLine3>addressLine3>
<AddressLine2>addressLine2>
I want to replace these with
<addressLine3> <addressLine2>
So I need to maintain the address line number.
I know that
AddressLine([0-9]{1})>addressLine([0-9]{1})
is a valid reg ex but I'm not sure what to put in the replace with section to tell it to maintain whatever value was found by ([0-9]{1}).
Thanks.
It's \{number of the group}, so \1, \2, ...
Edit following your clarification (I changed your regex a bit for simpler groups):
(AddressLine[0-9]{1}>)(addressLine[0-9]{1}) is replaced by \2
You can capture the parts in groups and reference them in the replacement:
Find:(AddressLine[0-9])>(addressLine[0-9])
Replace:$1 <$2
Find what : (<AddressLine\d>)AddressLine\d
Replace by: $1
You have to select the "Regular expression" search mode.

How to cycle through delimited tokens with a Regular Expression?

How can I create a regular expression that will grab delimited text from a string? For example, given a string like
text ###token1### text text ###token2### text text
I want a regex that will pull out ###token1###. Yes, I do want the delimiter as well. By adding another group, I can get both:
(###(.+?)###)
/###(.+?)###/
if you want the ###'s then you need
/(###.+?###)/
the ? means non greedy, if you didn't have the ?, then it would grab too much.
e.g. '###token1### text text ###token2###' would all get grabbed.
My initial answer had a * instead of a +. * means 0 or more. + means 1 or more. * was wrong because that would allow ###### as a valid thing to find.
For playing around with regular expressions, I highly recommend http://www.weitz.de/regex-coach/ for Windows. You can type in the string you want and your regular expression and see what it's actually doing.
Your selected text will be stored in \1 or $1 depending on where you are using your regular expression.
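The greedy versus non-greedy behaviour, and the difference between * and +, can be checked with a short Python sketch. The variable names are only for illustration.

```python
import re

text = "text ###token1### text text ###token2### text text"

# Non-greedy: stops at the first closing ###, so each token is found separately.
lazy = re.findall(r"###.+?###", text)

# Greedy: runs to the last ### in the string and swallows everything in between.
greedy = re.findall(r"###.+###", text)

# With *, six # in a row count as an empty token; with +, they do not match.
empty_allowed = re.findall(r"###(.*?)###", "######")
empty_rejected = re.findall(r"###(.+?)###", "######")

print(lazy)            # ['###token1###', '###token2###']
print(greedy)          # ['###token1### text text ###token2###']
print(empty_allowed)   # ['']
print(empty_rejected)  # []
```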
In Perl, you actually want something like this:
$text = 'text ###token1### text text ###token2### text text';
while($text =~ m/###(.+?)###/g) {
print $1, "\n";
}
Which will give you each token in turn within the while loop. The (.+?) ensures that you get the shortest bit between the delimiters, preventing it from thinking the token is 'token1### text text ###token2'.
Or, if you just want to save them, not loop immediately:
@tokens = $text =~ m/###(.+?)###/g;
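For readers not using Perl, the two forms above (looping over matches, or collecting them all at once) might look like this in Python. This is a rough equivalent under that assumption, not a line-by-line port.

```python
import re

text = "text ###token1### text text ###token2### text text"

# Loop form: finditer yields one match object per token, in order.
for match in re.finditer(r"###(.+?)###", text):
    print(match.group(1))

# List form: with one capture group, findall returns just the captured text.
tokens = re.findall(r"###(.+?)###", text)
print(tokens)  # ['token1', 'token2']
```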
Assuming you want to match ###token2### as well...
/###.+###/
Use () and \x. A naive example that assumes the text within the tokens is always delimited by #:
text (#+.+#+) text text (#+.+#+) text text
The stuff in the () can then be grabbed by using \1 and \2 in the replacement expression (\1 for the first set, \2 for the second), assuming you're doing a search/replace in an editor. For example, the replacement expression could be:
token1: \1, token2: \2
For the above example, that should produce:
token1: ###token1###, token2: ###token2###
If you're using a regexp library in a program, you'd presumably call a function to get at the contents of the first and second tokens, which you've indicated with the ()s around them.
With delimiters like these, you match the opening delimiter, then anything that is not the closing delimiter, then the closing delimiter. One caution: in cases like the example above, [^#] alone would not work as the check that the closing delimiter is absent, since a single # inside the token would cause the regex to fail (e.g. "###foo#bar###"). The regex to parse it is the following, assuming empty tokens are allowed (if not, change * to +):
###([^#]|#[^#]|##[^#])*###
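A quick way to see why the longer alternation is needed is to compare it against the plain [^#] version. This is a sketch in Python; the capture group is moved to wrap the whole repetition so that findall returns the token text, and the sample string is invented for illustration.

```python
import re

sample = "###foo#bar### x ###t2###"

# Naive version: the token may not contain any # at all.
naive = re.findall(r"###([^#]*)###", "###foo#bar###")

# Version from the answer: one or two # inside the token are allowed,
# as long as they are followed by something other than #.
robust = re.findall(r"###((?:[^#]|#[^#]|##[^#])*)###", sample)

print(naive)   # []
print(robust)  # ['foo#bar', 't2']
```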