Regular expression question regarding search and replace

I'm trying to use a regular expression to search for and replace text that matches a given pattern. I can match the pattern, but I'd like to keep some of the matched literals when replacing.
For example, from the string "abcd123," I'd like to keep abcd but remove 123. I can match the pattern using a simple regular expression like [a-zA-Z0-9]+, but when I want to replace it, I don't know what to use for the replacement. Is this even possible with just regular expressions?
Thanks a lot.

The answer depends on what language/regex engine you are using. You typically use parentheses to save sections matched and either $1, $2, ... or \1, \2, ... in the replacement string to refer to those sections.
For example, from JavaScript:
var x = "Hello World";
x.replace( /([A-Z])\w+/g, '$1xx' );
// "Hxx Wxx"
What language or text editor are you using?
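As a hedged illustration of the capture-group idea for the "abcd123" case in the question, here is a minimal Python sketch. Python is only an assumption here (the asker never named a language), and the function name is made up for the example. Python's re module writes group references in the replacement as \1 (or \g<1>) rather than $1.

```python
import re

# Hypothetical helper: keep the leading letters, drop the digits that follow.
# Group 1 captures the letters; the digits are matched but not captured,
# so they disappear when the match is replaced by group 1 alone.
def strip_trailing_digits(text: str) -> str:
    return re.sub(r"([a-zA-Z]+)[0-9]+", r"\1", text)

print(strip_trailing_digits("abcd123"))
```

The same pattern works in most engines; only the replacement syntax ($1 versus \1) changes.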

Related

Replace string using regular expression in KETTLE

I would like to use a regular expression to replace a certain pattern in Kettle. For example, given AAAA >5< BBBB, I want to produce AAAA 555 BBBB. I know how to find the pattern, but I am not sure how to replace it with a new string. One constraint: I have to match > and < together as a pair, not separately, because there is another pattern, <5>.
You can use the "Replace in String" step in a transformation.
Set "use RegEx" to "Y", type your regex in the Search box (with capturing groups if necessary), and put the replacement string in the replacement box, referring to capture groups as $1, $2, ...
It will replace all occurrences of the regex in the original string.
If the Out stream field is omitted, it will overwrite the In stream field.
If you want the pattern >\d< replaced by a triple of the found digit, you can use Replace-In-String in regex mode:
Search: (.*)(>(\d)<)(.*)
Replace: $1$3$3$3$4
If you want all such patterns treated the same:
Search: (>(\d)<)
Replace: $2$2$2
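To check the second search/replace pair outside Kettle, the following is a rough Python sketch of the same substitution. It is not Kettle code: Kettle uses Java regex syntax with $n references, while Python uses \n, and the function name is invented for this example. The outer group is dropped because only the digit needs capturing.

```python
import re

# Matches a single digit wrapped as >d<; group 1 is the digit itself.
MARKER = re.compile(r">(\d)<")

# Hypothetical helper: replace every >d< with the digit repeated three times.
def triple_digit(text: str) -> str:
    return MARKER.sub(r"\1\1\1", text)

print(triple_digit("AAAA >5< BBBB"))
```

Because the pattern requires > before the digit and < after it, the other markup form <5> is left untouched.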
EDIT due to your improved requirement
Since you intend to convert your "simple" markup to a more HTML-like markup, you had better use a User-Defined-Java-Expression. Also, you must avoid reintroducing simple markup when replacing repeatedly.

get a substring with regular expression not left

I have a text like this:
a = CreateObject("1-SI")
foo bar 'blah blah CreateObject("2-No")
'CreateObject("3-No")
With a regular expression, I want to select all CreateObject("...") substrings that don't have the ' character to their left.
How can I do this?
You can do it like this (example at RegExr)
^(?:[^']*?)(CreateObject\(".*?"\))
Not sure about VB6's regex - but this doesn't require lookahead or lookbehind.
The first capturing group is the CreateObject(..) part. You will need to use multiline mode (if possible in VB6).
Why don't you just try [^']*CreateObject(...)?
Another solution would be to use negative lookbehinds. Note that this kind of construct is not supported by all programming languages, not to speak of regexp engines in text editors.
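The anchored, lookbehind-free approach can be sketched in Python as follows. This is an illustration only, not VB6 code. Two details differ from the pattern quoted above and are assumptions of this sketch: \n is added to the negated class so the prefix cannot run across lines in multiline mode, and the quoted argument is matched with [^"]* instead of .*?. Like the original pattern, it finds at most the first uncommented call on each line.

```python
import re

SAMPLE = '''a = CreateObject("1-SI")
foo bar 'blah blah CreateObject("2-No")
'CreateObject("3-No")'''

# ^ anchors at each line start (MULTILINE); the prefix may not contain
# an apostrophe or a newline, so anything after a ' is never reached.
UNCOMMENTED_CALL = re.compile(
    r"""^[^'\n]*?(CreateObject\("[^"]*"\))""", re.MULTILINE
)

# Hypothetical helper: list CreateObject(...) calls with no ' to their left.
def uncommented_calls(source: str) -> list[str]:
    return UNCOMMENTED_CALL.findall(source)

print(uncommented_calls(SAMPLE))
```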

regular expression for find and replace

I've got strings like:
('Michael Herold','Michael Herold'),
but I need to remove the last parts so I end up with:
('Michael Herold'),
I'm still new to Regular Expressions so they confuse me. I'm using Notepad++.
find: \('([^']*)','\1'\)
Replace: ('\1')
So the actual function you use will depend on the language. Notepad++ is a text editor, not a language.
The regular expression that you will want will be ",'Michael Herold'" and you'll replace any matches with "", the empty string.
So in PHP for example, you'll have
$source = "('Michael Herold','Michael Herold')";
$pattern = "/(,'Michael Herold')+/";
$newString = preg_replace($pattern, "", $source);
Do the equivalent in whatever language you use.
I'm not sure what flavor of regular expressions Notepad++ uses, but try replacing this expression:
\('([^']*)','\1'\)
with this one:
('$1')
The \1 matches whatever was found in the first set of single quotes (Michael Herold in your example), and $1 is replaced with that same string. (Try \1 if $1 doesn't work in Notepad++.)
See it in action here.
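For readers who want to try the backreference pattern outside Notepad++, here is a minimal Python sketch of the same find/replace. The helper name is hypothetical; the pattern is the one given above, and Python uses \1 in both the pattern and the replacement.

```python
import re

# \1 inside the pattern requires the second quoted value to be identical
# to the first one, so only true duplicates are collapsed.
DUPLICATE_PAIR = re.compile(r"\('([^']*)','\1'\)")

# Hypothetical helper: turn ('X','X') into ('X') and leave other text alone.
def collapse_duplicate(line: str) -> str:
    return DUPLICATE_PAIR.sub(r"('\1')", line)

print(collapse_duplicate("('Michael Herold','Michael Herold'),"))
```

A pair whose two values differ, such as ('A','B'), does not match and is returned unchanged.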

Matching single or double quoted strings in Vim

I am having a hard time trying to match single or double quoted strings with Vim's
regular expression engine.
The problem is that I am assigning the regular expression to a variable and then using that
to play with matchlist.
For example, let's assume I know I am in a line that contains a quoted string and I want to match it:
let regex = '\v"(.*)"'
That would work to match anything that is double-quoted. Similarly, this would match single quoted strings:
let regex = "\v'(.*)'"
But If I try to use them both, like:
let regex = '\v['|"](.*)['|"]'
or
let regex = '\v[\'|\"](.*)[\'|\"]'
Then Vim doesn't know how to deal with it because it thinks that some quotes are not being closed in the actual variable definition and it messes up the regular expression.
What would be the best way to catch single or double quoted strings with a regular expression?
Maybe (probably!) I am missing something really simple to be able to use both quotes and not worry about the surrounding quotes for the actual regular expression.
Note that I prefer single quotes for regular expression because that way I do not need to double-backslash for escaping.
You need to use back references. Like so:
let regex = '\([''"]\)\(.\{-}\)\1'
Or with very-magic
let regex = '\v([''"])(.{-})\1'
Alternatively you could use (as it will not mess with your sub-matches):
let regex = '\%("\([^"]*\)"\|''\([^'']*\)''\)'
or with very magic:
let regex = '\v%("([^"]*)"|''([^'']*)'')'
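The backreference technique is not specific to Vim. As a rough cross-check, the sketch below applies the same idea in Python: capture the opening quote, match lazily, then require the same quote character again with \1. The function name is an assumption of this example, and Python's .*? plays the role of Vim's .{-}.

```python
import re

# Group 1 is the opening quote (' or "), group 2 the quoted content,
# and \1 forces the closing quote to be the same character as the opener.
QUOTED = re.compile(r"""(['"])(.*?)\1""")

# Hypothetical helper: return the content of the first quoted string, or None.
def first_quoted(line: str):
    match = QUOTED.search(line)
    return match.group(2) if match else None

print(first_quoted('say "it\'s fine" now'))
```

Because the closing quote must equal the opening one, an apostrophe inside a double-quoted string does not end the match early.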
Look at this post, "Replacing quote marks around strings in Vim?"; it might help in some way.
This is a working script I wrote for syntax highlighting of quoted strings.
syntax region myString start=/\v"/ skip=/\v(\\[\\"]){-1}/ end=/\v"/
syntax region myString start=/\v'/ end=/\v'/
You may use \v(\\[\\"]){-1} to skip something.

How to cycle through delimited tokens with a Regular Expression?

How can I create a regular expression that will grab delimited text from a string? For example, given a string like
text ###token1### text text ###token2### text text
I want a regex that will pull out ###token1###. Yes, I do want the delimiter as well. By adding another group, I can get both:
(###(.+?)###)
/###(.+?)###/
if you want the ###'s then you need
/(###.+?###)/
the ? means non greedy, if you didn't have the ?, then it would grab too much.
e.g. '###token1### text text ###token2###' would all get grabbed.
My initial answer had a * instead of a +. * means 0 or more. + means 1 or more. * was wrong because that would allow ###### as a valid thing to find.
For playing around with regular expressions, I highly recommend http://www.weitz.de/regex-coach/ for Windows. You can type in the string you want and your regular expression and see what it's actually doing.
Your selected text will be stored in \1 or $1 depending on where you are using your regular expression.
In Perl, you actually want something like this:
$text = 'text ###token1### text text ###token2### text text';
while($text =~ m/###(.+?)###/g) {
print $1, "\n";
}
Which will give you each token in turn within the while loop. The (.+?) ensures that you get the shortest bit between the delimiters, preventing it from thinking the token is 'token1### text text ###token2'.
Or, if you just want to save them, not loop immediately:
@tokens = $text =~ m/###(.+?)###/g;
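For comparison, a minimal Python sketch of the same non-greedy extraction is shown below. It is an illustration rather than a translation of the Perl loop, and the variable names are chosen for this example. findall returns the captured group (the token text), while group(0) of each match gives the token with its delimiters.

```python
import re

TOKEN = re.compile(r"###(.+?)###")

text = "text ###token1### text text ###token2### text text"

# With one capturing group, findall returns just the group contents.
inner = TOKEN.findall(text)

# group(0) is the whole match, delimiters included.
with_delimiters = [match.group(0) for match in TOKEN.finditer(text)]

print(inner)
print(with_delimiters)
```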
Assuming you want to match ###token2### as well...
/###.+###/
Use () and \x. A naive example that assumes the text within the tokens is always delimited by #:
text (#+.+#+) text text (#+.+#+) text text
The stuff in the () can then be grabbed by using \1 and \2 in the replacement expression (\1 for the first set, \2 for the second), assuming you're doing a search/replace in an editor. For example, the replacement expression could be:
token1: \1, token2: \2
For the above example, that should produce:
token1: ###token1###, token2: ###token2###
If you're using a regexp library in a program, you'd presumably call a function to get at the contents of the first and second tokens, which you've indicated with the ()s around them.
When you are using delimiters such as these, you basically grab the first one, then anything that does not match the ending delimiter, followed by the ending delimiter. A special caution: in cases like the example above, [^#] would not work as a check that the end delimiter is not there, since a single # inside a token would cause the regex to fail (i.e. "###foo#bar###"). For the case above, the regex to parse it would be the following, assuming empty tokens are allowed (if not, change * to +):
###([^#]|#[^#]|##[^#])*###
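The behaviour of this last pattern can be checked with a short Python sketch. The only change, made for this example, is that the group is non-capturing, (?:...), so that findall returns whole matches; the helper name is hypothetical.

```python
import re

# Inside a token, allow a non-# character, or one or two # characters
# as long as they are followed by a non-#. Three # in a row end the token.
TOKEN_WITH_HASH = re.compile(r"###(?:[^#]|#[^#]|##[^#])*###")

# Hypothetical helper: return every delimited token, delimiters included.
def delimited_tokens(text: str) -> list[str]:
    return TOKEN_WITH_HASH.findall(text)

print(delimited_tokens("a ###foo#bar### b ###token2### c"))
```

With * the pattern also accepts the empty token ######; changing * to + rejects it, as noted above.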