Issue on parsing logs using regex - regex

I am trying to split Wowza logs into fields with a regex for data analysis, but I couldn't separate the section below.
I need a SINGLE regex pattern that matches both of the log formats below.
Format 1:
live wowz://test1.example.com:443/live/_definst_/demo01|wowz://test2.example.com:443/live/_definst_/demo01 test
Format 2:
live demo01 test
I am trying to split the line into its 3 parameters and capture them in the groups app, streamname and id, but streamname should capture only the text after the last /.
This is what I've tried:
(?<stream_name>[^/]+)$ --> Using this pattern I could only separate the format 1 "wowz" section. Not entire Format 1 example mentioned above.
Expected Output
{
"app": [
[
"live"
]
],
"streamname": [
[
"demo01"
]
],
"id": [
[
"test"
]
]
}

You can achieve what you specified using the following regex:
^(?<app>\S+) (?:\S*/)?(?<streamname>\S+) (?<id>\S+)$
regex101 demo
\S+ matches one or more non-whitespace characters.
(?:\S*/)? optionally consumes the characters in the second parameter up to and including the last /. It is a non-capturing group, so that prefix is not captured.
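As a rough check of the pattern above, here is a minimal Python sketch. It assumes Python's re module, which spells named groups as (?P<name>...) rather than (?<name>...); the helper name parse_line is hypothetical and only for illustration.

```python
import re

# Python spells named groups (?P<name>...); the pattern is otherwise unchanged.
LOG_LINE = re.compile(r"^(?P<app>\S+) (?:\S*/)?(?P<streamname>\S+) (?P<id>\S+)$")


def parse_line(line):
    """Return the three fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


format_1 = ("live wowz://test1.example.com:443/live/_definst_/demo01"
            "|wowz://test2.example.com:443/live/_definst_/demo01 test")
format_2 = "live demo01 test"

for line in (format_1, format_2):
    print(parse_line(line))
```

Both sample lines should yield app "live", streamname "demo01" and id "test", because the optional prefix group greedily backtracks to the last / in the second parameter.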

Related

How to add character after a line

I'm trying to perform a few regex steps, and I'd like to add a quotation mark and a comma (",) at the end of these lines without altering any of the rest of the characters in the line.
How would I keep things intact but add the ", after the words: device1, device2, device3 ?
Example of lines I'm working with:
object network device1
host 192.168.1.11
object network device2
host 192.168.1.12
object network device 3
host 192.168.1.13
After my first step of regex, I have modified my first line to include the curly bracket and some formatting with the words "category" and "name" as shown below. However, I don't want to change the word device1, but want to include a quotation and comma after the word device1
{
"category": "network",
"name": "device1
host 192.168.1.11
{
"category": "network",
"name": "device2
host 192.168.1.11
{
"category": "network",
"name": "device3
host 192.168.1.13
I can't figure out how to include the ", with my first step in my regex replace sequence?
I'm using both regexr.com and Notepad++.
You can use this regex to match each entity in your input data:
object\s+(\w+)\s+([^\r\n]+)[\r\n]+host\s+([\d.]+)
This matches:
object\s+ : the word "object" followed by a number of spaces
(\w+) : some number of word (alphanumeric plus _) characters, captured in group 1
\s+ : a number of spaces
([^\r\n]+) : some number of non-end-of-line characters, captured in group 2
[\r\n]+ : some number of end-of-line characters
host\s+ : the word "host" followed by a number of spaces
([\d.]+) : some number of digit and period characters, captured in group 3
This can then be replaced by:
{\n "category": "$1",\n "name": "$2",\n "host": "$3"\n},
To give output (for your sample data) of:
{
"category": "network",
"name": "device1",
"host": "192.168.1.11"
},
{
"category": "network",
"name": "device2",
"host": "192.168.1.12"
},
{
"category": "network",
"name": "device 3",
"host": "192.168.1.13"
},
Regex demo on regex101
Now you can simply add [ at the beginning of the file and replace the last , with a ] to make a valid JSON file.
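If a scripting language is available, the same pattern can feed a JSON serializer directly, which avoids fixing up the brackets and the trailing comma by hand. The following is only a sketch in Python, assuming the input looks exactly like the sample (no indentation before host); the function name to_records is hypothetical.

```python
import json
import re

# Same pattern as above: category, name (rest of the line), host address.
ENTITY = re.compile(r"object\s+(\w+)\s+([^\r\n]+)[\r\n]+host\s+([\d.]+)")


def to_records(text):
    """Turn each object/host pair into a dict."""
    return [
        {"category": category, "name": name, "host": host}
        for category, name, host in ENTITY.findall(text)
    ]


sample = (
    "object network device1\n"
    "host 192.168.1.11\n"
    "object network device2\n"
    "host 192.168.1.12\n"
    "object network device 3\n"
    "host 192.168.1.13\n"
)

records = to_records(sample)
print(json.dumps(records, indent=2))  # a valid JSON array
```

Letting json.dumps produce the output also takes care of quoting if a name ever contains a character that needs escaping.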
To patch only the name line produced by your first step, you can match it with the regex "name": "(device\d+). You have not mentioned a programming language, so the pattern may need extra escaping depending on where it is written. In a Java string literal, for example, both the double quotes and the backslash must be escaped:
\"name\": \"(device\\d+)
Then capture the group (device\d+) and append ", after it in the replacement.
In Java, for example, you can do this with String.replaceAll.
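The same idea can be sketched in Python rather than Java, using re.sub with a backreference to the captured name. The sample text is copied from the intermediate output in the question; the variable names are illustrative only.

```python
import re

# Intermediate text after the first replace step, missing the closing ",
text = (
    '{\n'
    '"category": "network",\n'
    '"name": "device1\n'
    'host 192.168.1.11\n'
)

# \1 re-inserts the captured device name; the closing quote and comma follow it.
fixed = re.sub(r'"name": "(device\d+)', r'"name": "\1",', text)
print(fixed)
```

Note that this pattern expects names of the form device followed by digits, so a name with a space such as "device 3" would not be matched.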

Extract text starting from negated set up until (but not including) first occurrence of #

Good day, community.
Say I have the following line:
[ ] This is a sentence about apples. #fruit #tag
I wish to create a regex that can generically extract the portion:
"This is a sentence about apples." only.
That is, ignore the [ ] before the sentence, and ignore #fruit #tag after.
What I have so far is: ([^\s*\[\s\]\s])(.*#)
Which is creating the following match:
This is a sentence about apples. #fruit #
How would I match up to, but not including the first occurrence of # symbol, while still negating [ ] pattern with ([^\s*\[\s\]\s]) group?
EDIT: Thanks to Wiktor Stribiżew for the critical piece to help:
RegExMatch(str, "O)\[\s*]\s*([^#]*[^#\s])", output)
Final code:
; Zim Inbox txt file
FileEncoding, UTF-8
File := "C:\Users\dragoon\Desktop\anki_cards.txt"
; sleep is necessary
;;Highlight line and copy
#IfWinActive ahk_exe zim.exe
{
clipboard=
sleep, 500
Send ^+c
ClipWait
Send ^{Down}
clipboardQuestion := clipboard
FoundQuestion := RegExMatch(clipboardQuestion,"O)\[\s*]\s*([^#]*[^#\s])",outputquestion)
clipboard=
sleep, 500
Send ^+c
ClipWait
clipboardAnswer := clipboard
FoundAnswer := RegExMatch(clipboardAnswer,"O)\[\s*]\s*([^#]*[^#\s])",outputanswer)
quotedQuestionAnswer := outputquestion[1] """" outputanswer[1] """"
Fileappend, %quotedQuestionAnswer%, %File%
}
What it does:
In Zim Wiki notebook, on Windows, press Win+V hotkey over Question? in the following structure:
[ ] Question Header
[ ] Question?
[ ] Answer about dogs #cat #dog
This will result in the text being formatted as such in an external file:
Question?"Answer about dogs"
This is an acceptable format for Anki card importing, and can be used to quickly make cards from a review structure. Thanks again for all the help on my first SO question.
You can use
\[\s*]\s*\K[^#]*[^#\s]
See the regex demo. Details:
\[\s*]\s* - [, zero or more whitespaces, ], zero or more whitespaces
\K - "forget" what has just been matched
[^#]* - zero or more chars other than #
[^#\s] - a char other than # and whitespace.
Note that in AutoHotkey, you can also capture part of a match if you use Object mode:
RegExMatch(str, "O)\[\s*]\s*([^#]*[^#\s])", output)
The string you want to use is captured with Group 1 pattern (defined with a pair of unescaped parentheses) and you can access it via output[1]. See documentation:
Object mode. [v1.1.05+]: This causes RegExMatch() to yield all information of the match and its subpatterns to a match object in OutputVar. For details, see OutputVar.
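For readers outside AutoHotkey, the capture-group variant ports to other engines more easily than \K, which Python's built-in re module, for one, does not support. A minimal Python sketch, with the hypothetical helper name extract_text:

```python
import re

# Capture-group form of the pattern; group 1 holds the text between "[ ]" and "#".
CHECKBOX_TEXT = re.compile(r"\[\s*]\s*([^#]*[^#\s])")


def extract_text(line):
    """Return the text after the checkbox and before the first #, or None."""
    match = CHECKBOX_TEXT.search(line)
    return match.group(1) if match else None


print(extract_text("[ ] This is a sentence about apples. #fruit #tag"))
print(extract_text("[ ] Question?"))
```

The final [^#\s] forces the capture to end on a non-whitespace character, which is what trims the space before the first #.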

Using Regex to delete contents between repeating brackets

I'm trying to remove the bracketed blocks that contain a certain modifier ('DeleteMe') while keeping the bracketed blocks that contain other words ('DontDeleteMe').
I thought it was simple, but it proved difficult because of the repeating brackets; see below.
[
aljdsfjfldsa DeleteMe aldsjflajdf
]
[
aldskjfal DontDeleteMe asdlkjflasdj
]
[
aljdsfjfldsa DeleteMe aldsjflajdf
]
[
aldskjfal DontDeleteMe asdlkjflasdj
]
Desired output
[
aldskjfal DontDeleteMe asdlkjflasdj
]
[
aldskjfal DontDeleteMe asdlkjflasdj
]
I tried the following but the problem is the second line will be deleted with the third line.
(?s)\[.*?'DeleteMe'.*?\]
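You can use word boundaries in combination with a negated character class, [^][], which matches any character other than [ and ] and so keeps each match inside a single pair of brackets. This pattern matches the blocks that contain DontDeleteMe: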
\[[^][]*\bDontDeleteMe\b[^][]*\]
Regex demo
If the word is DeleteMe, you can match it using word boundaries and replace the match with an empty string.
\[[^][]*\bDeleteMe\b[^][]*\]
Regex demo
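A short Python sketch of the deletion step, under two assumptions that are not part of the answer: the character class is written in its escaped form [^\[\]], which is equivalent to [^][], and an optional \n? is appended so that the line break after each removed block goes away too.

```python
import re

# [^\[\]] is the escaped spelling of [^][]: any character except [ and ].
# The trailing \n? (an addition for this sketch) also removes the block's line break.
DELETE_BLOCK = re.compile(r"\[[^\[\]]*\bDeleteMe\b[^\[\]]*\]\n?")

text = (
    "[\naljdsfjfldsa DeleteMe aldsjflajdf\n]\n"
    "[\naldskjfal DontDeleteMe asdlkjflasdj\n]\n"
    "[\naljdsfjfldsa DeleteMe aldsjflajdf\n]\n"
    "[\naldskjfal DontDeleteMe asdlkjflasdj\n]\n"
)

cleaned = DELETE_BLOCK.sub("", text)
print(cleaned)
```

The word boundary before DeleteMe is what protects DontDeleteMe: there is no boundary between the t and the D, so those blocks never match.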

vscode snippet - transform and replace filename

my filename is
some-fancy-ui.component.html
I want to use a vscode snippet to transform it to
SOME_FANCY_UI
So basically
apply upcase to each character
Replace all - with _
Remove .component.html
Currently I have
'${TM_FILENAME/(.)(-)(.)/${1:/upcase}${2:/_}${3:/upcase}/g}'
which gives me this
'SETUP-PRINTER-SERVER-LIST.COMPONENT.HTML'
The docs don't explain how to apply a replacement in combination with their transforms on regex groups.
If the chunks you need to uppercase are separated with - or ., you may use
"Filename to UPPER_SNAKE_CASE": {
"prefix": "usc_",
"body": [
"${TM_FILENAME/\\.component\\.html$|(^|[-.])([^-.]+)/${1:+_}${2:/upcase}/g}"
],
"description": "Convert filename to UPPER_SNAKE_CASE dropping .component.html at the end"
}
You may check the regex workings here.
\.component\.html$ - matches .component.html at the end of the string
| - or
(^|[-.]) capture start of string or - / . into Group 1
([^-.]+) capture any 1+ chars other than - and . into Group 2.
The ${1:+_}${2:/upcase} replacement means:
${1:+ - if Group 1 is not empty,
_ - replace with _
} - end of the first group handling
${2:/upcase} - put the uppered Group 2 value back.
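To see what the transform does outside the editor, its logic can be emulated in Python. This is only an approximation: VS Code evaluates the pattern with JavaScript regex semantics, and the function name to_upper_snake is hypothetical.

```python
import re

# Same alternation as in the snippet, with single backslashes (no JSON escaping here).
PATTERN = re.compile(r"\.component\.html$|(^|[-.])([^-.]+)")


def to_upper_snake(filename):
    def replace(match):
        # ${1:+_}: emit "_" only when group 1 matched a non-empty separator.
        separator = "_" if match.group(1) else ""
        # ${2:/upcase}: uppercase group 2 (empty when the suffix branch matched).
        return separator + (match.group(2) or "").upper()

    return PATTERN.sub(replace, filename)


print(to_upper_snake("some-fancy-ui.component.html"))  # SOME_FANCY_UI
```

When the first alternative matches the .component.html suffix, neither group participates, so the whole suffix is replaced with nothing.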
Here is a pretty simple alternation regex:
"upcaseSnake": {
"prefix": "rf1",
"body": [
"${TM_FILENAME_BASE/(\\..*)|(-)|(.)/${2:+_}${3:/upcase}/g}",
"${TM_FILENAME/(\\..*)|(-)|(.)/${2:+_}${3:/upcase}/g}"
],
"description": "upcase and snake the filename"
},
Either version works.
(\\..*)|(-)|(.) alternation of three capture groups is conceptually simple. The order of the groups is important, and it is also what makes the regex so simple.
(\\..*) everything after and including the first dot . in the filename goes into group 1 which will not be used in the transform.
(-) group 2, if there is a group 2, replace it with an underscore ${2:+_}.
(.) group 3, all other characters go into group 3 which will be upcased ${3:/upcase}.
See regex101 demo.
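The claim that either body line works can be checked with a small Python emulation of this second transform. It is a sketch, not VS Code's implementation; the name upcase_snake is hypothetical, and the two inputs stand in for TM_FILENAME and TM_FILENAME_BASE.

```python
import re

# Group 1: first dot and everything after it; group 2: a hyphen; group 3: any other char.
PATTERN = re.compile(r"(\..*)|(-)|(.)")


def upcase_snake(name):
    def replace(match):
        underscore = "_" if match.group(2) else ""      # ${2:+_}
        upper = (match.group(3) or "").upper()          # ${3:/upcase}
        return underscore + upper                       # group 1 is dropped

    return PATTERN.sub(replace, name)


print(upcase_snake("some-fancy-ui.component.html"))  # like TM_FILENAME
print(upcase_snake("some-fancy-ui.component"))       # like TM_FILENAME_BASE
```

Both calls should print SOME_FANCY_UI, since group 1 swallows everything from the first dot onward in either input.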

grok parsing issue

I have an input line that looks like this:
localhost_9999.kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec.OneMinuteRate
and I can use this pattern to parse it:
%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{JAVACLASS:kafka_metric_name}
which gives me this:
{
"kafka_node": [
[
"localhost_9999.kafka.server"
]
],
"kafka_metric_type": [
[
"SessionExpireListener"
]
],
"kafka_metric_name": [
[
"ZooKeeperSyncConnectsPerSec.OneMinuteRate"
]
]
}
I want to split the OneMinuteRate into a separate field but can't seem to get it to work. I've tried this:
%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{WORD:kafka_metric_name}.%{WORD:attr_type}"
but get nothing back then.
I'm also using https://grokdebug.herokuapp.com/ to test these out...
You can either use your last regex with an escaped . (note that a . matches any char but newline and a \. will match a literal dot char), or use DATA type for the last but one field and a GREEDYDATA for the last field:
%{DATA:kafka_node}:type=%{DATA:kafka_metric_type},name=%{DATA:kafka_metric_name}\.%{GREEDYDATA:attr_type}
Since %{DATA:name} translates to (?<name>.*?) and %{GREEDYDATA:name} translates to (?<name>.*), the kafka_metric_name part will match any characters, 0 or more occurrences, as few as possible, up to the first literal . after name=, and the .* pattern of attr_type will greedily "eat up" the rest of the line up to its end.
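The expansion described above can be tried as a plain regex. Here is a minimal Python sketch that substitutes .*? for each DATA field and .* for GREEDYDATA by hand; Python writes named groups as (?P<name>...), and this is not how grok itself is invoked.

```python
import re

# DATA -> .*? (lazy), GREEDYDATA -> .* (greedy), as described above.
KAFKA_METRIC = re.compile(
    r"(?P<kafka_node>.*?):type=(?P<kafka_metric_type>.*?),"
    r"name=(?P<kafka_metric_name>.*?)\.(?P<attr_type>.*)"
)

line = ("localhost_9999.kafka.server:type=SessionExpireListener,"
        "name=ZooKeeperSyncConnectsPerSec.OneMinuteRate")

match = KAFKA_METRIC.match(line)
fields = match.groupdict() if match else {}
print(fields)
```

The dots inside kafka_node are not a problem, because that lazy group is only allowed to stop where the literal :type= follows.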