I want to remove text from a log file using a regular expression:
everything before: logger=
everything after: ):
backup.log (logger=org.brother.powerlab.database.backup): Log database backup
upgrade.log (logger=org.brother.powerlab.database.upgrade): Log database upgrade
clean.log (logger=org.brother.powerlab.database.clean): Log database cleanup
speedtest.log (logger=org.brother.powerlab.database.speedtest): Log database speedtest
statistics.log (logger=org.brother.powerlab.database.statistics): Log database statistics
This can be done in Notepad++ with 2 regular expressions.
How do I do this with 2 regular expressions? Thanks!
This can be done with a single regex find and replace.
Find what: .*(logger=.*\):).*
Replace with: $1
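If you would rather script this than use Notepad++, here is a minimal Python sketch of the same find and replace. The helper name keep_logger is made up for illustration, and note that Python writes the backreference as \1 rather than $1.

```python
import re

# Same pattern as in the answer: capture from "logger=" through the closing "):".
PATTERN = re.compile(r".*(logger=.*\):).*")

def keep_logger(line: str) -> str:
    """Return only the 'logger=...):' part of a log line (hypothetical helper)."""
    return PATTERN.sub(r"\1", line)

if __name__ == "__main__":
    sample = "backup.log (logger=org.brother.powerlab.database.backup): Log database backup"
    print(keep_logger(sample))
```

Lines that do not contain "logger=" are returned unchanged, because the pattern does not match them.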
I would like to simplify a Gmail address in Hive by removing anything unnecessary. I can already remove "." using "translate()"; however, Gmail also ignores anything placed between a "+" and the "#". The following regular expression works in Teradata:
select REGEXP_REPLACE('test+friends#gmail.com', '\+.+\\#' ,'\\#');
gives: 'test#gmail.com', but in Hive, I get:
FAILED: SemanticException [Error 10014]: Line 1:7 Wrong arguments
''\#'': org.apache.hadoop.hive.ql.metadata.HiveException: Unable to
execute method public org.apache.hadoop.io.Text
org.apache.hadoop.hive.ql.udf.UDFRegExpReplace.evaluate(org.apache.hadoop.io.Text,org.apache.hadoop.io.Text,org.apache.hadoop.io.Text)
on object org.apache.hadoop.hive.ql.udf.UDFRegExpReplace#131b58d4 of
class org.apache.hadoop.hive.ql.udf.UDFRegExpReplace with arguments
{test+friends#gmail.com:org.apache.hadoop.io.Text,
+.+#:org.apache.hadoop.io.Text, #:org.apache.hadoop.io.Text} of size 3
How do I get this regular expression to work in Hive?
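You don't need to escape # in regular expressions. Hive also processes backslash escapes in the string literal before the regex engine sees the pattern (the error above shows the pattern arriving as +.+#), so the backslash in front of + has to be doubled. Try:
select REGEXP_REPLACE('test+friends#gmail.com', '\\+[^#]+#' ,'#');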
You should also use [^#]+ rather than .+ so the match stops at the first #. Otherwise if there are multiple addresses in the input, the match will span all of them.
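To see the difference between .+ and [^#]+ on input with more than one address, here is a small sketch. It uses Python's re module rather than Hive, so the escaping differs (a raw string with a single backslash), and the function names are made up for illustration.

```python
import re

def strip_tag_greedy(text: str) -> str:
    # .+ is greedy: it runs all the way to the LAST '#' in the input.
    return re.sub(r"\+.+#", "#", text)

def strip_tag_bounded(text: str) -> str:
    # [^#]+ cannot cross a '#', so each address is handled on its own.
    return re.sub(r"\+[^#]+#", "#", text)

if __name__ == "__main__":
    two = "a+x#gmail.com, b+y#gmail.com"
    print(strip_tag_greedy(two))   # a#gmail.com
    print(strip_tag_bounded(two))  # a#gmail.com, b#gmail.com
```

With a single address both versions give the same result; the greedy one only goes wrong once a second # appears later in the string.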
I found the answer:
select REGEXP_REPLACE('test+friends#gmail.com', '[+].+#' ,'#');
or
select REGEXP_REPLACE('test+friends#gmail.com', '\\+.+#' ,'#');
Either one does the trick (in Hive the backslash has to be doubled inside the string literal). Teradata and Hive seem to differ significantly in how they process regular expressions.
I have a .txt file like this:
The Catalog entry "33102490" - Catalog group "1293"
Stack trace:
com.ibm.commerce.catalog.dataload.exception.CatalogDataLoadApplicationException: The Catalog entry "33102490" - Catalog group "1293"
$1l.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at com.ibm.ws.bootstrap.WSLauncher.main(WSLauncher.java:267)
I want only "33102490" and "1293" in the file. Everything else should be removed.
Ctrl+H
Find what: ^.*Catalog entry ("\d+").*Catalog group ("\d+").*$
Replace with: $1\n$2
Replace
Make sure you have checked "Regular expression" and ". matches newline".
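For reference, the same replacement can be sketched in Python. re.DOTALL plays the role of ". matches newline", and the function name extract_ids and the shortened sample text are assumptions for illustration only.

```python
import re

# Same pattern as above; DOTALL lets "." cross line breaks,
# so one match swallows the whole file.
PATTERN = re.compile(
    r'^.*Catalog entry ("\d+").*Catalog group ("\d+").*$',
    re.DOTALL,
)

def extract_ids(text: str) -> str:
    """Replace the whole text with the two quoted numbers, one per line."""
    return PATTERN.sub(r"\1\n\2", text)

if __name__ == "__main__":
    sample = (
        'The Catalog entry "33102490" - Catalog group "1293"\n'
        "Stack trace:\n"
        "at java.lang.reflect.Method.invoke(Method.java:611)"
    )
    print(extract_ids(sample))
```

Because the leading .* is greedy, the numbers are taken from the last "Catalog entry ... Catalog group" pair in the text, which in the sample is the same pair as the first.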
I have a set of SQL scripts in which the schema names need to change.
create table Service.Table1 (col1 varchar(100));
create table Operation.Table2 (col1 varchar(100));
create table Support.Table3 (col1 varchar(100));
However, the schema is going to change
Service -> Sev
Operation -> Opn
Support -> Spt
The search regular expression is easy: ([A-Za-z0-9_]+)\.([A-Za-z0-9_]+)
But how do I do the conditional replacement in Notepad++, or in other tools if they support it?
Thanks!
If you have a predefined set of the schemas, you may use the conditional replacement in Notepad++ like this:
Find: (?:(?<a>Service)|(?<b>Operation)|(?<c>Support))\.(?<n>[A-Z0-9_]+)
Replace: (?{a}Sev:(?{b}Opn:Spt)).$+{n}
Match case must be unchecked (the table-name group only lists A-Z), and Regular expression must be selected.
I would run replace 3 times, once for each schema name:
Find:
create table Service\.
Replace with:
create table Sev.
Find:
create table Support\.
Replace with:
create table Spt.
Find:
create table Operation\.
Replace with:
create table Opn.
Or here is one that uses group references:
Find:
Service(\.[^\s]+)(.*)
Replace with:
Sev\1\2
Here \1 holds the dot and the table name, and \2 holds the rest of the line.
The Notepad++ regex implementation is not really powerful, so, since you asked about "other tools if they can", here is a way to do it with Perl:
perl -pi.back -e '%tr=(Service=>"Sev",Operation=>"Opn",Support=>"Spt");s/(?<=create table )(\w+)/$tr{$1}/e;' TheFile
You can add as many Original => 'Modified' pairs as you want to the hash %tr.
TheFile will be backed up to TheFile.back before processing.
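The same lookup-table idea can be sketched in Python with a replacement function. The names SCHEMA_MAP and rename_schema are made up for illustration. Unlike the one-liner above, this version leaves schema names that are not in the table untouched instead of replacing them with an empty string.

```python
import re

# Old schema name -> new schema name (taken from the question).
SCHEMA_MAP = {"Service": "Sev", "Operation": "Opn", "Support": "Spt"}

# A word that follows "create table " and is followed by a dot.
PATTERN = re.compile(r"(?<=create table )(\w+)(?=\.)")

def rename_schema(sql: str) -> str:
    """Swap known schema names; unknown names are kept as they are."""
    return PATTERN.sub(lambda m: SCHEMA_MAP.get(m.group(1), m.group(1)), sql)

if __name__ == "__main__":
    for stmt in (
        "create table Service.Table1 (col1 varchar(100));",
        "create table Operation.Table2 (col1 varchar(100));",
        "create table Support.Table3 (col1 varchar(100));",
    ):
        print(rename_schema(stmt))
```

Passing a function to re.sub is the usual Python way to do a "conditional" replacement, since the replacement string syntax itself has no conditionals.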
Using find and replace, what regex would remove the tags surrounding something like this:
<option value="863">Viticulture and Enology</option>
Note: the option value changes to different numbers, but using a regular expression to remove numbers is acceptable
I am still trying to learn but I can't get it to work.
I'm not using it to parse HTML. I have data from one of our company websites that we need in Excel, but our designer deleted the original data file and we need it back. I have a list of the options and need to remove the HTML tags, using find and replace in Notepad++.
This works for me in Notepad++ 5.8.6 (UNICODE):
search : <option value="\d+">(.*?)</option>
replace : $1
Be sure to select "Regular expression" and ". matches newline"
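A minimal Python sketch of the same search and replace, in case the list has to be processed outside Notepad++. The function name strip_option_tags is an assumption for illustration; re.DOTALL corresponds to ". matches newline".

```python
import re

# Same pattern as above: keep only the text between the option tags.
OPTION = re.compile(r'<option value="\d+">(.*?)</option>', re.DOTALL)

def strip_option_tags(html: str) -> str:
    """Replace every <option value="N">text</option> with just the text."""
    return OPTION.sub(r"\1", html)

if __name__ == "__main__":
    print(strip_option_tags('<option value="863">Viticulture and Enology</option>'))
```

The non-greedy (.*?) matters when several options sit on one line: it stops at the first closing tag instead of the last one.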
I did this using the following regular expression:
Find this : <.*?>|</.*?>
and
replace with : \r\n (this inserts a new line)
With this regular expression (<.*?>|</.*?>) we can easily get at the values between HTML tags, as below. (The second alternative is redundant, since <.*?> already matches closing tags.)
I have this input:
<option value="123">1</option><option value="1234">2</option><option value="1235">3</option><option value="1236">4</option><option value="1237">5</option>
I need to find the values between the options, like 1,2,3,4,5,
and got the output below:
This works perfectly for me:
Select "Regular Expression" in "Find" Mode.
Enter [<].*?> in "Find What" field and leave the "Replace With" field empty.
Note that you need to have version 5.9 of Notepad++ for the ? operator to work.
as found here:
digoCOdigo - strip html tags in notepad++
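The reason the ? matters can be shown with a short sketch. It uses Python's re module rather than Notepad++, and the function names are made up for illustration.

```python
import re

def strip_tags_lazy(html: str) -> str:
    # <.*?> stops at the first '>', so each tag is removed on its own.
    return re.sub(r"<.*?>", "", html)

def strip_tags_greedy(html: str) -> str:
    # <.*> runs from the first '<' to the LAST '>' on the line,
    # taking the text between the tags with it.
    return re.sub(r"<.*>", "", html)

if __name__ == "__main__":
    row = '<option value="123">1</option><option value="1234">2</option>'
    print(repr(strip_tags_lazy(row)))    # '12'
    print(repr(strip_tags_greedy(row)))  # ''
```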
Something like this would work (as long as you know the format of the HTML won't change):
<option value="(\d+)">(.+)</option>
val s = "<option value=\"863\">Viticulture and Enology</option>"
s.replaceAll ("(<option value=\"[0-9]+\">)([^<]+)</option>", "$2")
res1: java.lang.String = Viticulture and Enology
(Tested with scala, therefore the res1:)
With sed, you would use a slightly different syntax:
echo '<option value="863">Viticulture and Enology</option>'|sed -re 's|(<option value="[0-9]+">)([^<]+)</option>|\2|'
For Notepad++, I don't know the details, but "[0-9]+" should mean 'at least one digit', and "[^<]+" means 'anything but an opening less-than sign, one or more times'. Escaping and backreference syntax may differ.
Regexes are problematic: if the markup spans multiple lines or is commented out, a regex will not take that into account.
However, a lot of HTML is generated in a regex-friendly way, always fitting on one line and never commented out. Or you use the regex in throwaway code and can check your input beforehand.
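A small sketch of those two failure modes, using Python's re module and made-up sample strings:

```python
import re

# Pattern from the answer above, reduced to one capture group.
OPTION = re.compile(r'<option value="\d+">([^<]+)</option>')

def option_texts(html: str) -> list:
    """Return the text of every option the pattern recognizes."""
    return OPTION.findall(html)

single_line = '<option value="863">Viticulture and Enology</option>'
split_tag = '<option\n value="863">Viticulture and Enology</option>'
commented = '<!-- <option value="1">Old entry</option> -->'

if __name__ == "__main__":
    print(option_texts(single_line))  # found
    print(option_texts(split_tag))    # missed: the tag is broken across lines
    print(option_texts(commented))    # found, although it is commented out
```

If the input can look like the second or third sample, checking it by eye first (or using a real HTML parser) is the safer route.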
I'm in the process of moving my Dreamweaver-based website to a CMS, and I would like to replace site-wide the following kind of links:
a href="http://www.domain.com/category/item ### title.html" (where ### is a number)
to
a href="http://www.domain.com/category/item###"
What is the correct regular expression I should use in the find and replace built-in engine?
I propose
'(http://www.domain.com/category/item) *(\d+).+?\.html'
as the regular expression,
replacing the entire match with $1$2 (the two captured groups, concatenated).
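To check the pattern before running it site-wide, it can be tried in a few lines of Python. This is only a sketch: the function name shorten_link and the sample link are made up, the dots in the domain are escaped here so they match literally, and Python writes the groups as \1\2 instead of $1$2.

```python
import re

# Group 1: the fixed URL prefix. Group 2: the item number.
# Everything from the number up to ".html" is dropped.
LINK = re.compile(r"(http://www\.domain\.com/category/item) *(\d+).+?\.html")

def shorten_link(html: str) -> str:
    """Rewrite '.../item 123 title.html' to '.../item123'."""
    return LINK.sub(r"\1\2", html)

if __name__ == "__main__":
    old = 'a href="http://www.domain.com/category/item 123 some title.html"'
    print(shorten_link(old))
```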