Regular Expression: remove text before and after a statement - regex

I want to remove text from a log file using a regular expression:
everything before: logger=
everything after: ):
backup.log (logger=org.brother.powerlab.database.backup): Log database backup
upgrade.log (logger=org.brother.powerlab.database.upgrade): Log database upgrade
clean.log (logger=org.brother.powerlab.database.clean): Log database cleanup
speedtest.log (logger=org.brother.powerlab.database.speedtest): Log database speedtest
statistics.log (logger=org.brother.powerlab.database.statistics): Log database statistics
This can be done in Notepad++ with 2 regular expressions.
How do I do this with 2 regular expressions? Thanks!

This can be done with a single regex find and replace.
Find what: .*(logger=.*\):).*
Replace with: $1
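
To check the pattern outside Notepad++, the same find and replace can be sketched in Python. This is only an illustration: Python's re module stands in for the Notepad++ regex engine, and the helper name keep_logger is made up for this example.

```python
import re

# Same pattern as in "Find what"; group 1 is the part to keep.
PATTERN = re.compile(r'.*(logger=.*\):).*')

def keep_logger(line: str) -> str:
    """Drop everything before 'logger=' and after '):' (hypothetical helper)."""
    # \1 in Python plays the role of $1 in the Notepad++ replace field.
    return PATTERN.sub(r'\1', line)

line = "backup.log (logger=org.brother.powerlab.database.backup): Log database backup"
print(keep_logger(line))
```

Lines that contain no logger=...): part are left unchanged, because the pattern does not match them.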

Related

How to simplify g-mail addresses using regular expressions in Hive

I would like to simplify a gmail address in Hive by removing anything unnecessary. I can already remove "." using "translate()", however gmail also allows anything placed between a "+" and the "#" to be ignored. The following regular expression works in Teradata:
select REGEXP_REPLACE('test+friends#gmail.com', '\+.+\\#' ,'\\#');
gives: 'test#gmail.com', but in Hive, I get:
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments
''\#'': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
execute method public org.apache.hadoop.io.Text
org.apache.hadoop.hive.ql.udf.UDFRegExpReplace.evaluate(org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
on object org.apache.hadoop.hive.ql.udf.UDFRegExpReplace#131b58d4 of
class org.apache.hadoop.hive.ql.udf.UDFRegExpReplace with arguments
{test+friends#gmail.com:org.apache.hadoop.io.Text,
+.+#:org.apache.hadoop.io.Text, #:org.apache.hadoop.io.Text} of size 3
How do I get this regular expression to work in Hive?
You don't need to escape # in regular expressions. Try:
select REGEXP_REPLACE('test+friends#gmail.com', '\+[^#]+#' ,'#');
You should also use [^#]+ rather than .+ so the match stops at the first #. Otherwise if there are multiple addresses in the input, the match will span all of them.
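The difference between .+ and [^#]+ can be seen in a small Python sketch. Python's re module is used here only as a stand-in for Hive's regex engine, and the second address (foo+bar) is an invented input, not something from the question.

```python
import re

# '#' stands in for the address separator, exactly as in the question.
# The second address is a made-up example to show the multi-address case.
text = "test+friends#gmail.com, foo+bar#gmail.com"

# .+ is greedy: it runs from the first '+' all the way to the LAST '#'.
greedy = re.sub(r'\+.+#', '#', text)

# [^#]+ cannot cross a '#', so each match stops at the first '#' after '+'.
bounded = re.sub(r'\+[^#]+#', '#', text)

print(greedy)
print(bounded)
```

With the greedy version the second address is swallowed entirely; with the bounded version each address is simplified separately.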
I found the answer:
select REGEXP_REPLACE('test+friends#gmail.com', '[+].+#' ,'#');
or
select REGEXP_REPLACE('test+friends#gmail.com', '\+.+#' ,'#');
Either one does the trick. Teradata and Hive seem to differ significantly in how they process regular expressions.

Replacing string using regex in notepad++

I have a .txt file like this:
The Catalog entry "33102490" - Catalog group "1293"
Stack trace:
com.ibm.commerce.catalog.dataload.exception.CatalogDataLoadApplicationException: The Catalog entry "33102490" - Catalog group "1293"
$1l.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:267)
I want only "33102490" and "1293" to remain in the file. Everything else should be removed.
Ctrl+H
Find what: ^.*Catalog entry ("\d+").*Catalog group ("\d+").*$
Replace with: $1\n$2
Replace
Make sure you have checked "Regular expression" and ". matches newline".
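
As a rough cross-check, the same replacement can be tried in Python, where the re.DOTALL flag plays the role of ". matches newline". This is a sketch under assumptions: the sample text is a shortened version of the file from the question, and Python's re module is not the engine Notepad++ uses.

```python
import re

# Shortened sample of the file from the question.
text = (
    'The Catalog entry "33102490" - Catalog group "1293"\n'
    'Stack trace:\n'
    'at java.lang.reflect.Method.invoke(Method.java:611)\n'
    'at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:267)\n'
)

# re.DOTALL lets '.' match newlines, so one match covers the whole text.
pattern = re.compile(
    r'^.*Catalog entry ("\d+").*Catalog group ("\d+").*$', re.DOTALL
)

# \1 and \2 correspond to $1 and $2 in the Notepad++ replace field.
result = pattern.sub(r'\1\n\2', text)
print(result)
```

Because the dot also matches line breaks, the single match consumes the entire text and only the two quoted numbers remain.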

Notepad++ Regular Expression Condition Replacement

I have a set of SQL scripts in which I want to change the schema.
create table Service.Table1 (col1 varchar(100));
create table Operation.Table2 (col1 varchar(100));
create table Support.Table3 (col1 varchar(100));
However, the schema names are going to change:
Service -> Sev
Operation -> Opn
Support -> Spt
The search regular expression is easy: ([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)
But how can I do the conditional replacement in Notepad++, or in other tools if they can?
Thanks!
If you have a predefined set of the schemas, you may use the conditional replacement in Notepad++ like this:
Find: (?:(?<a>Service)|(?<b>Operation)|(?<c>Support))\.(?<n>[A-Z0-9_]+)
Replace: (?{a}Sev:(?{b}Opn:Spt)).$+{n}
Match case must be unchecked (the class [A-Z0-9_] only matches a name like Table1 when matching is case-insensitive), and Regular expression must be selected.
I would run the replacement 3 times, once for each schema name:
Find:
create table Service\.
Replace with:
create table Sev.
Find:
create table Support\.
Replace with:
create table Spt.
Find:
create table Operation\.
Replace with:
create table Opn.
Or here is one that uses group references:
Find:
Service(\.[^\s]+)(.*)
Replace with:
Sev\1\2
Here \1 will hold the dot operator and the table name and \2 holds the rest of the line.
Notepad++'s regex implementation is not really powerful, so, regarding
"other tools if they can?":
Here is a way to do it with Perl:
perl -pi.back -e '%tr=(Service=>"Sev",Operation=>"Opn",Support=>"Spt");s/(?<=create table )(\w+)/$tr{$1}/e;' TheFile
You can add as many Original => "Modified" pairs as you want to the hash %tr.
TheFile will be backed up to TheFile.back before processing.
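The same lookup-table idea can be sketched in Python with a replacement callback. This is not a port of the Perl one-liner: it keys on the schema name followed by a dot rather than on "create table", and it leaves unknown names untouched. The names SCHEMA_MAP and rename_schemas are invented for this example.

```python
import re

# Old schema name -> new schema name, taken from the question.
SCHEMA_MAP = {"Service": "Sev", "Operation": "Opn", "Support": "Spt"}

# Build one alternation from the dictionary keys. \b keeps 'Service' from
# matching inside a longer identifier; (?=\.) requires a dot right after it.
PATTERN = re.compile(r'\b(' + '|'.join(map(re.escape, SCHEMA_MAP)) + r')(?=\.)')

def rename_schemas(sql: str) -> str:
    """Replace each known schema prefix via a callback (hypothetical helper)."""
    return PATTERN.sub(lambda m: SCHEMA_MAP[m.group(1)], sql)

sql = "create table Service.Table1 (col1 varchar(100));"
print(rename_schemas(sql))
```

Adding another schema only requires a new entry in the dictionary; the pattern is rebuilt from its keys.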

Find/Replace regex to remove html tags

Using find and replace, what regex would remove the tags surrounding something like this:
<option value="863">Viticulture and Enology</option>
Note: the option value changes to different numbers, but using a regular expression to remove numbers is acceptable
I am still trying to learn but I can't get it to work.
I'm not using it to parse HTML. I have data from one of our company websites that we need in Excel, but our designer deleted the original data file and we need it back. I have a list of the options and need to remove the HTML tags, using Notepad++ find and replace.
This works for me in Notepad++ 5.8.6 (UNICODE):
search : <option value="\d+">(.*?)</option>
replace : $1
Be sure to select "Regular expression" and ". matches newline"
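For comparison, here is a minimal Python sketch of the same search and replace. Python's re module is an assumption standing in for the Notepad++ engine, and re.DOTALL corresponds to ". matches newline".

```python
import re

html = '<option value="863">Viticulture and Enology</option>'

# (.*?) is lazy, so it stops at the first closing </option> tag.
pattern = re.compile(r'<option value="\d+">(.*?)</option>', re.DOTALL)

# \1 corresponds to $1 in the Notepad++ replace field.
text = pattern.sub(r'\1', html)
print(text)
```

The lazy quantifier matters when several options sit on one line: each option is matched on its own instead of one match spanning all of them.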
I did this using the following regular expression:
Find what: <.*?>|</.*?>
Replace with: \r\n (this inserts a new line)
With this regular expression (<.*?>|</.*?>) we can easily extract the values between HTML tags, as below.
Input:
<option value="123">1</option><option value="1234">2</option><option value="1235">3</option><option value="1236">4</option><option value="1237">5</option>
I need the values between the option tags: 1, 2, 3, 4, 5.
Replacing every tag with a line break leaves only those values, separated by line breaks.
This works perfectly for me:
Select "Regular Expression" in "Find" Mode.
Enter [<].*?> in "Find What" field and leave the "Replace With" field empty.
Note that you need to have version 5.9 of Notepad++ for the ? operator to work.
as found here:
digoCOdigo - strip html tags in notepad++
Something like this would work (as long as you know the format of the HTML won't change):
<option value="(\d+)">(.+)</option>
String s = "<option value=\"863\">Viticulture and Enology</option>";
s.replaceAll ("(<option value=\"[0-9]+\">)([^<]+)</option>", "$2")
res1: java.lang.String = Viticulture and Enology
(Tested with scala, therefore the res1:)
With sed, you would use a slightly different syntax:
echo '<option value="863">Viticulture and Enology</option>'|sed -re 's|(<option value="[0-9]+">)([^<]+)</option>|\2|'
For Notepad++, I don't know the details, but "[0-9]+" should mean "at least one digit", and "[^<]+" anything but an opening less-than sign, one or more times. Escaping and backreference syntax may differ.
Regexes are problematic: if tags span multiple lines or are hidden inside a comment, a regex will not recognize it.
However, a lot of HTML is generated in a regex-friendly way, always fitting on one line and never commented out. Or you use the regex in throwaway code and can check your input beforehand.
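The comment caveat can be illustrated with a short Python sketch. The input is invented for this example; it shows that the pattern has no notion of HTML comments and happily picks up a commented-out option.

```python
import re

# Same pattern as in the replaceAll example, capturing only the label.
PATTERN = re.compile(r'<option value="[0-9]+">([^<]+)</option>')

# Made-up input: the first option is commented out, the second is live.
html = (
    '<!-- <option value="1">Old</option> -->\n'
    '<option value="2">New</option>'
)

# The regex matches inside the comment as well as outside it.
labels = PATTERN.findall(html)
print(labels)
```

This is why checking the input first is worthwhile before relying on a regex for throwaway extraction.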

Regular expression for changing links in Dreamweaver

I'm in the process of moving my Dreamweaver-based website to a CMS, and I would like to replace site-wide the following kind of links:
a href="http://www.domain.com/category/item ### title.html" (where ### is a number)
to
a href="http://www.domain.com/category/item###"
What is the correct regular expression I should use in the find and replace built-in engine?
I propose
'(http://www.domain.com/category/item) *(\d+).+?\.html'
as the regex,
and replacing the entire match with $1$2 (group 1 followed by group 2).
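
The proposed pattern can be tried out in Python before running it site-wide. This is a sketch under assumptions: Python's re module is not the Dreamweaver engine, the number 123 is a made-up stand-in for ###, and the dots in the domain are escaped here so they match only literal dots.

```python
import re

# Group 1: the fixed URL prefix; group 2: the item number.
# ' *' skips the space before the number, '.+?' lazily skips the title.
PATTERN = re.compile(r'(http://www\.domain\.com/category/item) *(\d+).+?\.html')

# 123 is a hypothetical value for the ### placeholder in the question.
link = 'a href="http://www.domain.com/category/item 123 title.html"'

# \g<1>\g<2> corresponds to $1$2 in the find and replace dialog.
fixed = PATTERN.sub(r'\g<1>\g<2>', link)
print(fixed)
```

The closing quote is not part of the match, so it is preserved in the result.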