Notepad++ Find/Replace Wildcard problems - regex

Hopefully this is simple because I can't seem to figure it out.
I have a game that outputs a log with information I'd like to review, but it's bogged with tags.
<color=#9B9B9BFF>abndnd_b9o66v</color>.<color=#1EFF00FF>out_ys0a67</color>
<color=#9B9B9BFF>uknown_ospiw8</color>.<color=#1EFF00FF>p_vyuxzb</color>
<color=#9B9B9BFF>anonymous_yzgoqq</color>.<color=#1EFF00FF>pub_info_o1rotu</color>
<color=#9B9B9BFF>unidentified_t7stef</color>.<color=#1EFF00FF>out_gems04</color>
<color=#9B9B9BFF>abndnd_5vs06o</color>.<color=#1EFF00FF>public_7gshh2</color>
<color=#9B9B9BFF>anon_7kq2k4</color>.<color=#1EFF00FF>pub_wxn46t</color>
<color=#9B9B9BFF>anon_i83kkg</color>.<color=#1EFF00FF>info_ev39gs</color>
I can simply filter it by hand, but I know a regex may be able to help, I just can't seem to figure out the syntax correctly and how to trim the tags without tampering with the needed text
and my end result I'm trying to get is this:
abndnd_b9o66v.out_ys0a67
uknown_ospiw8.p_vyuxzb
anonymous_yzgoqq.pub_info_o1rotu
unidentified_t7stef.out_gems04
abndnd_5vs06o.public_7gshh2
anon_7kq2k4.pub_wxn46t
anon_i83kkg.info_ev39gs

Try this:
<color=.*?>(.*?)</color>\.<color=.*?>(.*?)</color>
Replace by this:
\1\.\2

Related

Regex: How to extract dialogue tags from fiction, with speaker information

Totally stumped on this. I need help extracting dialogue from a story so I can hand it off for narration.
Basically, this is a problem where I have a big chunk of text (a novel), and I want to extract all the dialogue from the text in a format I can pipe into a spreadsheet.
But, I also want, if it exists, the speaker information as well. So, given a string like:
'"I'm really hungry," she said.'
I would like the values returned as:
[ "I'm really hungry", "she said" ]
If there is no dialogue, as in this example:
"I'm not hungry."
the result would just be:
["I'm really hungry."]
Is this madness? Is it even possible? I have fooled around with this regex (am not a regex guru, knowing only enough to be dangerous):
"([^"]*)"
Which seems to get the dialogue tags, but doesn't get the speaker info. Any advice in how to get the speaker info as well would be greatly appreciated. I've been wrestling with this for awhile now.
Maybe a better approach would be to get the dialogue in one field, and the entire paragraph it is found in as the second field. That could also work, but I have no idea where to start with this.
Basically I want to put these all into a spreadsheet so I can hand them off to a narrator with enough context that they know whose dialogue is who's in the story.
Any help is greatly appreciated!
It definitely is possible
Look at this regex: ^.*?'?(?P<line>\".*\")(?P<actor>[^'\n]*)'?.*?$
demo here: https://regex101.com/r/UCRZwY/5
It basically marks the outer quotes as optional, but if it does find them, stores whatever provided as '$actor' (and the line as '$line') these are of course just names i've given them, feel free to change
Note updated to include such text as part of regular sentence, see example in demo

How do I Build a Regex Expression to Find String

I've been studying content on the regex topic, but am having trouble understanding how to make it work! I need to build a regex to locate a particular string, potentially in multiple places throughout numerous log files. If I were keying the search expression into a text editor, it would look like this...
*Failed to Install*
Following is a typical example of a line containing the string I would like to search for (exit code # will vary)
!!! Failed to install, with exit code 1603
I would really appreciate any help on how to build the regex for this. I suspect I might need the end of line character too?
I plan on using it in a variation of the script that was provided by https://stackoverflow.com/users/3142139/m-hassan in the following thread
Use PowerShell to Quickly Search Files for Regex and Output to CSV
I'm a newbie to powershell scripts, but I'd rather spend the time to figure this out, than pour over hundreds of log files!
Thanks,
Jim
You're in luck - You only require very simple regex for this. Assuming you want to capture the error code, this will work fine:
^.*Failed to install.*(exit code \d+)$
Try it online!
If you don't care about the error code, and just want to know if it failed or not, you can honestly get away with something as simple as:
^.*Failed to install.*$
Hope this helps.

How to remove query string ? from image url vb.net

Whats the best way to remove a query string (the question mark variables) from a image url.
Say I got a good image such as
http://i.ebayimg.com/00/s/MTYwMFgxNjAw/z/zoMAAOSwMpZUniWv/$_12.JPG?set_id=880000500F
But I can't really save it properly without adding a bunch of useless checking code because of the query string crap after it.
I just need
http://i.ebayimg.com/00/s/MTYwMFgxNjAw/z/zoMAAOSwMpZUniWv/$_12.JPG
Looking for the proper regular expression that handles this so I could replace it with blank.
It might be simple enough not to worry about regex.
This would work:
Dim cleaned = url.Substring(0, url.IndexOf("?"c))

Are my regex just wrong or is there a buggy behaviour in td-agent's format behaviour?

I am using fluentd, elasticsearch and kibana to organize logs. Unfortunately, these logs are not written using any standard like apache, so I had to come up with the regex for the format myself. I used this site here to verify that they are working: http://fluentular.herokuapp.com/ .
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
the format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key Value
pri DEBUG
date 24.04.2014
subject SingleActivityStrategy
msg Start Activitiy 'barbecue' zu verabeiten.
Instead though, I have this ?bug? that pri is always shortened to DEBU. Same for ERROR which becomes ERRO, only INFO stays INFO. I am not very experienced with regular expressions and I find it hard to believe that this is a bug, still it confuses me and any help is greatly appreciated.
I'm not sure I can link the complete config file because I dont personally own these log files and I am trying to keep it on a level that my boss won't get mad at me for posting sensitive information, but should it definately be needed, I will post them later on after having asked him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO, next the date , next what we call the subject which is always written in [ ] and finally just a message.
Here is a link to fluentular with the format I am using and a teststring that produces the right result in fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
my td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?
How the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... for where your spaces and colon should be. I'm not to sure on why this works in Fluentular, but you should have matched the \: explicitly and each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
I would also take a look into comparing Logstash vs. Fluentd. I like Logstash far more because you create Grok filters to match the type of data you want, and it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially will get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like Regexr which gives immediate feedback and you can set global and multiline matching as well.

How to use the matched text in the replacement text in Vim

I have a block of codes with timestamp in front of each line like this:
12/02/2010 12:20:12 function myFun()
12/02/2010 12:20:13 {....
The first column is a date time value. I would like to comment them out by using Vim, thus:
/*12/02/2010 12:20:12*/ function myFun()
/*12/02/2010 12:20:13*/ {....
I tried to search for date first:
/\d\d\/\d\d\/\d\d\d\d \d\d:\d\d:\d\d
I got all the timestamps marked correctly. However When I tried to replace them by the command:
%s/\d\d\/\d\d\/\d\d\d\d \d\d:\d\d:\d\d/\/*\d\d\/\d\d\/\d\d\d\d \d\d:\d\d:\d\d*\//
I got the following result:
/*dd/dd/dddd dd:dd:dd*/ function myFun()
/*dd/dd/dddd dd:dd:dd*/ {....
I think I need to name the search part and put them back in the replace part. How I can do it?
I suppose I would just do something like:
:%s-^../../.... ..:..:..-/* & */-
I would actually not us a regex to do this. It takes too long to enter the correct formatting. I would instead use a Visual Block. The sequence works out to be something like this.
<C-V>}I/* <ESC>
3f\s
<C-V>I */
I love regex, and don't want to knock the regex solutions, but find when doing things with pre-formatted blocks, that this is easier, and requires less of a diversion from the real task, which isn't figuring out how to write a regex.
%s/\d\d\/\d\d\/\d\d\d\d \d\d:\d\d:\d\d/\/*&*\//
:%s/^\([0-9/]* [0-9:]* \)\(.*\)/\/*\1*\/ \2/