Regular expression for finding an alphanumeric-and-hyphen pattern in a text file

From the text below I want to extract the pattern LETTERS-HYPHEN-NUMBERS, for example "MGRAPP-713" in this file. (The value changes for each file, but the pattern stays the same.)
123.txt
REPORTS:
restrict CBD [Jawad Hameed] [2018-01-31 16:31:00 -0500]
debug [Jawad Hameed] [2018-01-31 16:09:08 -0500]
debug [Jawad Hameed] [2018-01-31 15:59:52 -0500]
Merge pull request #65 from HotelKey/MGRAPP-713 [GitHub] [2018-01-31 11:35:30 0100]
MGRAPP-713 [sabrio] [2018-01-30 15:30:56 +0100]
I'm using: grep '[A-Z0-9-]' 123.txt

Your question isn't very clear. Are you always looking for that particular string? Do you want the whole line, or just that field? Is the string you're looking for always at the beginning of the line?
Based on my guess about what you meant, I'd suggest:
$ awk '/^[A-Z]+-[0-9]/ {print $1}' mgrapp
MGRAPP-713
Whenever you want to print part of line matching a pattern, awk is your friend.
Edit
In your comment, you clarify your objective somewhat. Here's a slightly more elaborate solution:
$ awk '/^[A-Z]+-[0-9]/ {
match($1, /^[A-Z]+-[0-9]+/)
printf "%.*s\n", RLENGTH, $1 }' mgrapp
MGRAPP-713
But I can't write your program for you. I'm just demonstrating that awk lets you write simple programs to grab strings out of text files. Like any powerful tool, it takes time to learn. It's time well spent because, you know, "Luck favors the prepared mind."

Use this:
[A-Z]+-[0-9]+
[A-Z]+ -- for matching all caps word,
- -- for hyphen,
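[0-9]+ -- for numbers
With grep, add -E (so that + works as a quantifier) and -o (to print only the matched text): grep -oE '[A-Z]+-[0-9]+' 123.txt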
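If the file is processed in Python instead of grep, the same pattern works with the standard `re` module. This is a minimal sketch assuming the whole file fits in memory; `find_ticket_ids` is a hypothetical helper, and the de-duplication step is an addition not asked for in the question:

```python
import re

# One or more capital letters, a hyphen, then one or more digits (e.g. "MGRAPP-713")
TICKET_RE = re.compile(r"[A-Z]+-[0-9]+")

def find_ticket_ids(text):
    """Return each distinct match, in order of first appearance."""
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(dict.fromkeys(TICKET_RE.findall(text)))

sample = """\
restrict CBD [Jawad Hameed] [2018-01-31 16:31:00 -0500]
Merge pull request #65 from HotelKey/MGRAPP-713 [GitHub] [2018-01-31 11:35:30 0100]
MGRAPP-713 [sabrio] [2018-01-30 15:30:56 +0100]
"""
print(find_ticket_ids(sample))  # ['MGRAPP-713']
```

Without the de-duplication step, findall returns MGRAPP-713 twice: once from the merge line and once from the commit line.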

Related

Regex replace in lines starting with {\s, from the first space to ;}

I have some corrupt RTF files with lines like this:
{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fußzeile Zchn;}
^----------------------------^
I want to replace all characters matching [^a-zA-Z0-9_\{}; ],
but only in lines beginning with "{\s" and ending with ";}", and only from the first space up to the ";}".
The first space and the ";}" themselves should not be replaced.
You didn't specify a language, so here is a Regex101 example:
({\\s.+?\s)(.*)(})
So, I'm unsure what language/technology you'd like to use here, but if using C# is an option, you can check out this previous question. The answer gets you almost all the way there.
For your example:
var text = @"{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fußzeile Zchn;}";
var pattern = @"^({\\s\S*\s[a-zA-Z0-9_\{}; ]*)([^a-zA-Z0-9_\{}; ]*)([^}]*})";
var replaced = System.Text.RegularExpressions.Regex.Replace(text, pattern, "$1$3");
This will get you to replace one contiguous blob of bad characters, which addresses your example, but unfortunately, not your question. There is probably a more elegant solution, but I think you'll have to iteratively run that expression until the input and output of Regex.Replace() are equal.
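One way to avoid re-running the replacement until nothing changes is to capture everything between the first space and the closing ";}" and strip every disallowed character from that capture in a single pass. A minimal Python sketch, assuming (as in the example line) that target lines end with ";}" and that the allowed characters are letters, digits, "_", "{", "}", ";" and space; `clean_line` is a hypothetical helper name:

```python
import re

# "{\s", a run of control words with no spaces, the first space (group 1),
# the text to clean (group 2), and the closing ";}" at end of line (group 3)
LINE_RE = re.compile(r"^(\{\\s\S*\s)(.*)(;\})$")
# Anything outside the whitelist from the question
BAD_CHARS = re.compile(r"[^a-zA-Z0-9_{}; ]")

def clean_line(line):
    m = LINE_RE.match(line)
    if not m:
        return line  # lines that don't have the {\s ... ;} shape are left untouched
    head, body, tail = m.groups()
    # Remove every disallowed character in the middle part, not just one contiguous run
    return head + BAD_CHARS.sub("", body) + tail

line = r"{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fußzeile Zchn;}"
print(clean_line(line))  # {\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fuzeile Zchn;}
```

Because the prefix before the first space is captured separately, backslashes in the RTF control words are never touched.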
If you can use sed in a terminal, you could do something like this.
sed -i 's/^\({\\s[^ ]*\s\).*\(\;}\)\(}\)\?$/\1\2/' filename
Turned my file containing:
{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 Fußzeile Zchn;}
To:
{\s39\li0\fi0\ri0\sb0\sa0\ql\vertalt\fs22 ;}

match regex to remove password from file

I have a file which contains the following lines:
app.mail.host = 10.1.1.1
app.mail.debug = true
app.db.username = spate
app.db.password = 1#4FnL&#7!
I want to use a regex to replace the actual password with XXXXXXX for security purposes.
I am trying the following regex, but it doesn't work:
sed 's/app\.db\.password="[^"]\+/app\.db\.password="XXXXXXXX"/g' foo.txt
As Hbcdev noted, you aren't matching due to whitespace. As this appears to be "security" code (in which case -- why are you storing that password in plaintext at all?), it's probably better to be whitespace-tolerant than match the input byte-for-byte. Something like:
sed 's/app\.db\.password[ \t]*=.*/app.db.password="xxxxx"/' foo.txt
(untested) is probably going to work a little more robustly. Note that it will strip your password field even if it doesn't begin with a quote.
Still, doing this kind of hackery with a shell script sounds dangerous. What are you trying to accomplish?
You haven't left spaces around the = sign in your sed expression. Once you do that I think it'll work.
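The same whitespace-tolerant substitution, sketched with Python's `re` module rather than sed. It assumes the key is exactly app.db.password at the start of a line; `mask_password` is a hypothetical helper name:

```python
import re

# Key, optional spaces/tabs, "=", optional spaces/tabs (group 1), then the value to hide
PASSWORD_RE = re.compile(r"^(app\.db\.password[ \t]*=[ \t]*).*$", re.MULTILINE)

def mask_password(config_text, mask="XXXXXXXX"):
    # A function replacement avoids treating characters in the mask as backreferences
    return PASSWORD_RE.sub(lambda m: m.group(1) + mask, config_text)

config = """\
app.mail.host = 10.1.1.1
app.mail.debug = true
app.db.username = spate
app.db.password = 1#4FnL&#7!
"""
print(mask_password(config))
```

Like the sed version, this replaces the whole value whether or not it is quoted.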

use regex to modify srt file?

One format of SRT file looks like this:
0:00:04 --> 00:00:10
and another format looks like this:
0:00:04,000 --> 00:00:10,000
For compatibility, I want to process files in the first format and append ,000 to each timestamp, so that they match the second example.
I was thinking of using string functions like mid(), right() and instring(), but wondered whether a regex might do the job better. Any suggestions on how to do this?
You can use this regex to match the first format:
^([0-9]{1,2}:[0-9]{2}:[0-9]{2}) --> ([0-9]{1,2}:[0-9]{2}:[0-9]{2})$
And then replace $1 by $1 + ",000" and $2 by $2 + ",000"
Since you don't say which language you are using, here is a simple example in PHP:
<?php
// Append ",000" to both timestamps of a short-format SRT time line
echo preg_replace("/^([0-9]{1,2}:[0-9]{2}:[0-9]{2}) --> ([0-9]{1,2}:[0-9]{2}:[0-9]{2})$/", "$1,000 --> $2,000", "0:00:04 --> 00:00:10");
// output : 0:00:04,000 --> 00:00:10,000
?>
With GNU sed (it's available on Windows too):
sed -i '/[0-9]\+:[0-9]\+:[0-9]\+ --> [0-9]\+:[0-9]\+:[0-9]\+/ s_\([0-9]\+:[0-9]\+:[0-9]\+\)\s*-->\s*\([0-9]\+:[0-9]\+:[0-9]\+\)_\1,000 --> \2,000_' INPUT.srt
The file is edited in place.
(And I know it's not the correct regex to capture time definitions... but it works for this job.)
Sure, that sounds like a good idea. A simple approach would be to match for (\d?\d:\d\d:\d\d) and replace it with the match itself plus ,000 (for "the match itself" use a back reference, which might be something like \1 or $1, depending on your language).
Try implementing this, and if you need further help, start a new question where you mention what you have tried, where you are stuck and which language you are using.
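The back-reference approach described above looks like this with Python's `re` module, where the back reference is written \1. A minimal sketch; the (?![\d,]) lookahead is an extra assumption, added so that lines already in the longer format are left unchanged, and `add_millis` is a hypothetical helper name:

```python
import re

# h:mm:ss or hh:mm:ss, not already followed by a digit or a ",mmm" suffix
TIME_RE = re.compile(r"\b(\d?\d:\d\d:\d\d)(?![\d,])")

def add_millis(line):
    # \1 re-inserts the matched time; ",000" is appended right after it
    return TIME_RE.sub(r"\1,000", line)

print(add_millis("0:00:04 --> 00:00:10"))  # 0:00:04,000 --> 00:00:10,000
```

Running it over a file that mixes both formats is safe, because already-converted timestamps no longer match.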
Why not simply
sed -e 's/ -->\|$/,000&/g' old.srt >new.srt
provided that old.srt consistently contains the shorter format only.

How to analyze log files with regexp? alternatives?

I want to analyze some logs to gather usage statistics.
Basically, I want to use regular expressions to make the analysis less painful.
I have a text file with log lines like this:
2011-09-17 09:16:33,531 INFO [someJava.class.special] sendRequest: fromGevoName=null, ctrlPageId=fooBar, actionId=search,
2011-09-17 09:16:33,976 INFO [someJavaB.class] fooBar
2011-09-17 09:16:33,982 DEBUG [someOtherJava.class] abc blabala
2011-09-17 09:16:33,987 INFO [someJava.class.special] sendRequest completed: fromGevoName=XYZ, toPageId=fooBar, userId=someUser
....
I want to count the occurrences of every word at the position
[someJava.class.special] ctrlPageId=....
(in this case fooBar), and only those occurrences. There are many different fooBar values, and I want to count how often each one occurs.
My idea was to replace with a matching group and repeat it, something like this:
((?s).*\[someJava.class.special\] sendRequest: fromGevoname=.* ctrlPageId=([^,]*)(?-s).*)*
and replace it with the matching group \2,
then analyze the list in Excel.
But my grep tool does not repeat the regexp; it only matches once. I use grepWin. Is there a different tool or regexp for this?
Well, it was basically a problem with WinGrep/grepWin: the modifier (?s), which lets the dot match line breaks, and (?-s), which disables that again, do not work if you use them repeatedly.
So I replaced the regexp with something like this:
([\n-\[\(\]\.,:0-9a-zA-Z]).*\[someJava.class.special\] sendRequest: fromGevoname=.* ctrlPageId ([^,]*)(?-s).*
So basically I replaced the first line-break-matching dot with a character class of all the symbols that might occur in the string, including line breaks. It works... I'm sure there is a better solution, so I'm always open to one.
I'm not sure I understand, but if the output you are looking for is:
someJava fooBar
Something like this should work (php script):
<?php
$log = file_get_contents('file.log');
preg_match_all("#\[(?<className>\w+)\.class(.special)?\](.*?)ctrlPageId=(?<controllerName>\w+)#i", $log, $m);
for ($i=0; $i < count($m[0]); $i++) {
echo $m['className'][$i] . ' ' . $m['controllerName'][$i] . "\n";
}
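Because a find-all call returns every match in the text, there is no need to repeat the pattern with (?s)/(?-s) tricks. A minimal Python sketch that counts ctrlPageId values directly; it assumes a value ends at a comma or whitespace and that only "sendRequest:" lines from [someJava.class.special] should be counted. `count_ctrl_page_ids` is a hypothetical helper name:

```python
import re
from collections import Counter

# Only "sendRequest:" lines of the special class; capture the ctrlPageId value
CTRL_RE = re.compile(r"\[someJava\.class\.special\] sendRequest: .*?\bctrlPageId=([^,\s]+)")

def count_ctrl_page_ids(log_text):
    # Without re.DOTALL, ".*?" never crosses a line break, so each match stays on one line
    return Counter(CTRL_RE.findall(log_text))

log = """\
2011-09-17 09:16:33,531 INFO [someJava.class.special] sendRequest: fromGevoName=null, ctrlPageId=fooBar, actionId=search,
2011-09-17 09:16:33,976 INFO [someJavaB.class] fooBar
2011-09-17 09:16:33,982 DEBUG [someOtherJava.class] abc blabala
2011-09-17 09:16:33,987 INFO [someJava.class.special] sendRequest completed: fromGevoName=XYZ, toPageId=fooBar, userId=someUser
"""
print(count_ctrl_page_ids(log))  # Counter({'fooBar': 1})
```

The resulting counts can be written out as CSV for Excel instead of post-processing a list of raw matches.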

Regex Negation : Matching patterns other than specific strings

I am using a voice-to-text application that produces transcription files. The transcribed text contains a few tags such as (s) (sentence beginning), (/s) (sentence end) and (VOCAL_NOISE) (unrecognized words), but it also contains unwanted tags like (VOCAL_N), (VOCAL_NOISED), (VOCAL_SOUND) and (UNKNOWN). I am using sed to process the text, but I cannot write a regex that replaces every tag except (s), (/s) and (VOCAL_NOISE) with the tag ~NS. I would appreciate any help.
Example text:
(s) Hi Stacey , this is Stanley (/s) (s) I would (VOCAL_N) appreciate if you could call (UNKNOWN) and let him know I want an appointment (VOCAL_NOISE) with him (/s)
Output should be:
(s) Hi Stacey , this is Stanley (/s) (s) I would ~NS appreciate if you could call ~NS and let him know I want an appointment (VOCAL_NOISE) with him (/s)
This should take care of it:
sed 's|([^)]*)|\n&\n|g;s#\n\((/\?s)\|(VOCAL_NOISE)\)\n#\1#g;s|\n\(([^)]*)\)\n|~NS|g' inputfile
Explanation:
s|([^)]*)|\n&\n|g - divide the line by putting every parenthesized string between two newlines
s#\n\((/\?s)\|(VOCAL_NOISE)\)\n#\1#g - remove the newlines around "(s)", "(/s)" and "(VOCAL_NOISE)" (keepers)
s|\n\(([^)]*)\)\n|~NS|g - replace anything else between newlines that is within parentheses with "~NS"
This works since newlines are guaranteed not to appear within a newly read line of text.
Edit: Shortened the command by using alternation \(foo\|bar\)
Previous version:
sed 's|([^)]*)|\n&\n|g;s|\n\((/\?s)\)\n|\1|g; s|\n\((VOCAL_NOISE)\)\n|\1|g;s|\n\(([^)]*)\)\n|~NS|g' inputfile
This is a dirty trick that is far from being optimal but it should work for you:
sed '
s|(\(/\?\)s)|[\1AAA]|g;
s|(VOCAL_NOISE)|[BBB]|g;
s/([^)]*)/~NS/g;
s|\[\(/\?\)AAA\]|(\1s)|g;
s|\[BBB\]|(VOCAL_NOISE)|g'
The trick is to replace (s), (/s) and (VOCAL_NOISE) with patterns that are not present in the input text (in this case [AAA], [/AAA] and [BBB]); then we replace every remaining parenthesized string with ~NS; finally, we restore the placeholder patterns to their original values.
I could suggest this using vim:
:%s/\((s)\|(VOCAL_NOISE)\)\@!(\w\+)/\~NS/g
Using a shell (bash) you can do the following:
vim file -c '%s/\((s)\|(VOCAL_NOISE)\)\@!(\w\+)/\~NS/g' -c "wq"
Make a backup first; I am not responsible for any damage if this is wrong.
Simply this?
sed -E 's/\((VOCAL_N|UNKNOWN)\)/~NS/g'
In this case, you'd have a blacklist (you know what to filter out). Or do you absolutely need a whitelist (you know what NOT to filter out)?
awk -vRS=")" -vFS="(" '$2!~/s|\\s|VOCAL_NOISE/{$2="~NS"}RT' ORS=")" file |sed 's/~NS)/~NS/g'
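The whitelist can also be expressed without negative lookahead by letting a replacement function decide for each tag. A minimal Python sketch, assuming tags never nest and the whitelist is exactly the three tags named in the question; `normalize_tags` is a hypothetical helper name:

```python
import re

KEEP = {"(s)", "(/s)", "(VOCAL_NOISE)"}  # tags to leave untouched
TAG_RE = re.compile(r"\([^()]*\)")       # any parenthesized tag

def normalize_tags(text):
    # Keep whitelisted tags as they are; every other tag becomes ~NS
    return TAG_RE.sub(lambda m: m.group(0) if m.group(0) in KEEP else "~NS", text)

text = ("(s) Hi Stacey , this is Stanley (/s) (s) I would (VOCAL_N) appreciate if you could "
        "call (UNKNOWN) and let him know I want an appointment (VOCAL_NOISE) with him (/s)")
print(normalize_tags(text))
```

Because each tag is compared as a whole string, near-misses such as (VOCAL_NOISED) are replaced while (VOCAL_NOISE) is kept.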