Multiple names match the pattern - regex

Input file:File name='sample1.txt'
lion is a good friend (Host=lion) (Port=animal) and tiger is (Host=Tiger)(Port=an)
burger is a food (Host=Burger)(Port=Food)
My data is shown in the text file above. I want to collect the host and port from each line of the file and write them to a new text file.
Required output file:
lion:animal
Tiger:an
Burger:Food
Code used so far:
cat sample1.txt | perl -ne 'print "$1=$2\n" if(/Host=([\w.]*.'-'*[\w.]*.).*Port=(\d+)/)' > sample2.txt
sed 's|[()]||g' sample2.txt > sample3.txt
Obtained output:
lion:animal
Burger:Food
I am not getting Tiger and an.
Problem: I cannot extract the host and port when they occur more than once on the same line. Some lines have only one host and port value; other lines have more than one. Please help me solve this. Thank you. :)

If you want all entries in order of appearance, you can use something like this:
perl -nle 's{Host=([^)]*).*?Port=([^)]*)}{print "$1:$2"}ge' < sample1.txt > sample3.txt
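If Perl is not available, roughly the same result can be had with grep and sed. This is only a sketch: it assumes a grep that supports -o (GNU and BSD grep do), that each Host and Port value is wrapped in parentheses as in the sample, and the function name extract_pairs is made up for illustration.

```shell
# Hypothetical helper: read lines on stdin and print one "host:port" line
# for every (Host=...)(Port=...) pair, in order of appearance.
extract_pairs() {
    # Step 1: grep -o prints every Host...Port chunk on its own line,
    #         so several pairs on one input line are all reported.
    # Step 2: sed keeps only the two values and joins them with a colon.
    grep -oE 'Host=[^)]*\)[^(]*\(Port=[^)]*' |
        sed -E 's/^Host=([^)]*)\).*Port=(.*)$/\1:\2/'
}
```

It would be called as: extract_pairs < sample1.txt > sample3.txt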

Related

Grepping two patterns from event logs

I am seeking to extract timestamps and IP addresses from log entries containing a varying amount of information. The basic structure of a log entry is:
<timestamp>, <token_1>, <token_2>, ... ,<token_n>, <ip_address> <token_n+2>, <token_n+3>, ... ,<token_n+m>,-
The number of tokens n between the timestamp and ip address varies considerably.
I have been studying regular expressions and am able to grep timestamps as follows:
grep -o "[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}T[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}"
And IP addresses:
grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
But I have not been able to grep both patterns out of log entries which contain both. Every log entry contains a timestamp, but not every entry contains an IP address.
Input:
2021-04-02T09:06:44.248878+00:00,Creation Time,EVT,WinEVTX,[4624 / 0x1210] Source Name: Microsoft-Windows-Security-Auditing Message string: An account was successfully logged on.\n\nSubject:\n\tSecurity ID:\t\tS-1-5-18\n\tAccount Name:\t\tREDACTED$\n\tAccount Domain:\t\tREDACTED\n\tLogon ID:\t\tREDACTED\n\nLogon Type:\t\t\t10\n\nNew Logon:\n\tSecurity ID:\t\tREDACTED\n\tAccount Name:\t\tREDACTED\n\tAccount Domain:\t\tREDACTED\n\tLogon ID:\t\REDACTED\n\tLogon GUID:\t\tREDACTED\n\nProcess Information:\n\tProcess ID:\t\tREDACTED\n\tProcess Name:\t\tC:\Windows\System32\winlogon.exe\n\nNetwork Information:\n\tWorkstation:\tREDACTED\n\tSource Network Address:\t255.255.255.255\n\tSource Port:\t\t0\n\nDetailed Authentication Information:\n\tLogon Process:\t\tUser32 \n\tAuthentication Package:\tNegotiate\n\tTransited Services:\t-\n\tPackage Name (NTLM only):\t-\n\tKey Length:\t\t0\n\nThis event is generated when a logon session is created. It is generated on the computer that was accessed.\n\nThe subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service or a local process such as Winlogon.exe or Services.exe.\n\nThe logon type field indicates the kind of logon that occurred. The most common types are 2 (interactive) and 3 (network).\n\nThe New Logon fields indicate the account for whom the new logon was created i.e. the account that was logged on.\n\nThe network fields indicate where a remote logon request originated. 
Workstation name is not always available and may be left blank in some cases.\n\nThe authentication information fields provide detailed information about this specific logon request.\n\t- Logon GUID is a unique identifier that can be used to correlate this event with a KDC event.\n\t- Transited services indicate which intermediate services have participated in this logon request.\n\t- Package name indicates which sub-protocol was used among the NTLM protocols.\n\t- Key length indicates the length of the generated session key. This will be 0 if no session key was requested. Strings: ['S-1-5-18' 'DEVICE_NAME$' 'NETWORK' 'REDACTED' 'REDACTED' 'USERNAME' 'WORKSTATION' 'REDACTED' '10' 'User32 ' 'Negotiate' 'REDACTED' '{REDACTED}' '-' '-' '0' 'REDACTED' 'C:\\Windows\\System32\\winlogon.exe' '255.255.255.255' '0' '%%1833'] Computer Name: REDACTED Record Number: 1068355 Event Level: 0,winevtx,OS:REDACTED,-
Desired Output:
2021-04-02T09:06:44, 255.255.255.255
$ sed -En 's/.*([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}).*[^0-9]([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}).*/\1, \2/p' file
2021-04-02T09:06:44, 255.255.255.255
Your regexps can be reduced by removing some of the explicit repetition though:
$ sed -En 's/.*([0-9]{4}(-[0-9]{2}){2}T([0-9]{2}:){2}[0-9]{2}).*[^0-9](([0-9]{1,3}\.){3}[0-9]{1,3}).*/\1, \4/p' file
2021-04-02T09:06:44, 255.255.255.255
It could be simpler still if all of the lines in your log file start with a timestamp:
$ sed -En 's/([^,.]+).*[^0-9](([0-9]{1,3}\.){3}[0-9]{1,3}).*/\1, \2/p' file
2021-04-02T09:06:44, 255.255.255.255
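Since the question says that not every entry contains an IP address, here is a sketch that still prints the timestamp for such lines. It assumes every line starts with a timestamp and that, when a line holds several addresses, the last one is the one wanted; the function name extract_ts_ip is hypothetical.

```shell
# Hypothetical helper: read log lines on stdin.
# Prints "timestamp, ip" when the line holds an IPv4-looking address,
# and the timestamp alone when it does not.
extract_ts_ip() {
    sed -En '
        s/^([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}).*[^0-9](([0-9]{1,3}\.){3}[0-9]{1,3}).*/\1, \2/p
        t
        s/^([0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}).*/\1/p
    '
}
```

The bare t command skips the fallback substitution whenever the first one succeeded, so each line is printed at most once.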
If you are looking for lines that contain both patterns, it may be easiest to do it in two separate searches.
If you're searching your log file for lines that contain both "dog" and "cat", it's usually easiest to do this:
grep dog filename.txt | grep cat
The grep dog will find all lines in the file that match "dog", and then the grep cat will search all those lines for "cat".
You seem not to know the meaning of the "-o" switch.
Regular "grep" (without "-o") means: give the entire line where the pattern can be found. Adding "-o" means: only show the pattern.
Combining two "grep" in a logical AND-clause can be done using a pipe "|", so you can do this:
grep <pattern1> <filename> | grep <pattern2>
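The same logical AND can also be written as a single awk process. This is a minimal sketch; match_both is a made-up name, and because awk -v interprets backslash escapes in the value, it is best kept to simple patterns (or the backslashes doubled).

```shell
# Hypothetical helper: print the stdin lines that match BOTH regexes.
# $1 and $2 are awk (extended) regular expressions.
match_both() {
    awk -v first="$1" -v second="$2" '$0 ~ first && $0 ~ second'
}
```

For example, match_both dog cat < filename.txt prints the same lines as grep dog filename.txt | grep cat.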

sed regex to clean input domain list of www. www1. ftp. etc

I have tried so many combinations and suggestions now to get this to work but I am just not succeeding.
UPDATE: to Provide more info
I have an input list like this
0------------0-------------0.0n-line.info
0-0--0-000.com
0-3.us
aw.dermalmask.com
idolstudio.free.fr
idolstudio2.free.fr
something.blogspot.com
anything.blogspot.com
xxx.blogspot.ca
www.hola.org
www10.a8.net
www11.alsto.com
www148.myquicksearch.com
ftp.thaitattoo.nl
ftp01.pornocrawler.ws
ftp04.pornocrawler.ws
g.blogads.com
wvw.tielecreidito-pe.com
I was given a sed command by someone that almost gets this right, but it fails to escape some characters and strips off some periods.
I am using
sed -r 's:(^.?(aw|www|ftp|ww|wvw)[[:alnum:]]?.|^..?)::g' input.txt > output.txt
But it gives me this output
-----------0-------------0.0n-line.info
.a8.net
.alsto.com
.pornocrawler.ws
0--0-000.com
3.us
8.myquicksearch.com
blogads.com
dermalmask.com
hola.org
mething.blogspot.com
olstudio.free.fr
olstudio2.free.fr
thaitattoo.nl
tielecreidito-pe.com
x.blogspot.ca
ything.blogspot.com
Instead of this
0-----------0-------------0.0n-line.info
0-0--0-000.com
0-3.us
dermalmask.com
idolstudio.free.fr
idolstudio2.free.fr
something.blogspot.com
anything.blogspot.com
xxx.blogspot.ca
hola.org
a8.net
alsto.com
myquicksearch.com
thaitattoo.nl
pornocrawler.ws
pornocrawler.ws
blogads.com
tielecreidito-pe.com
And Ultimately I would actually like this kind of output.
0n-line.info
0-0--0-000.com
0-3.us
dermalmask.com
idolstudio.free.fr
idolstudio2.free.fr
something.blogspot.com
anything.blogspot.com
xxx.blogspot.ca
hola.org
a8.net
alsto.com
myquicksearch.com
thaitattoo.nl
pornocrawler.ws
pornocrawler.ws
blogads.com
tielecreidito-pe.com
I think your last line is www.tielecreidito-pe.com and not wvw...
So you can try this one
sed '1s/[^.]*\.//;s/^[wa]w[^.]*\.\|^[fg][^.]*\.//' infile
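A stricter variant is sketched below. It assumes the only prefixes to strip are the ones seen in the sample list (aw, ww, www, wvw, ftp and g, optionally followed by digits), and it anchors on a literal dot so that names such as idolstudio.free.fr are left alone. It does not handle the first sample line, which needs the separate rule shown above. The function name strip_host_prefix is hypothetical.

```shell
# Hypothetical helper: read one domain per line on stdin and strip a
# leading "www.", "www10.", "ftp01.", "aw.", "wvw." or "g." label.
strip_host_prefix() {
    sed -E 's/^(aw|ww|www|wvw|ftp|g)[0-9]*\.//'
}
```

It would be called as: strip_host_prefix < input.txt > output.txt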

cat lines into file after regex

I want to add the info below to the file usr/local/nagios/etc/hosts.cfg, but I want it placed just below ##company in the hosts.cfg file. My setup script will contain the info that needs to be added.
I have spent hours trying to get sed to add a line to a file after a marker, but to no avail.
define host{
use linux-box
host_name $host_name
alias $alias
address $ip
parents $parent
notification_period 24x7
notification_interval 5
}
Previously I used
cat <<EOT >> /path/filename
EOT
but now I need to do it in specific places in the file.
Given the following file:
# some content
###company
If I run the following command:
sed -i 's/###company/&\ndefine host {\nuse host\nhost_name HOSTNAME/' file
Now, the contents of file are:
# some content
###company
define host {
use host
host_name HOSTNAME
Is this what you're looking for?
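For a longer block, sed's r command may be easier to maintain than embedding \n in the replacement, and unlike \n in a replacement it is not GNU-specific. The sketch below assumes the block has first been written to a separate file (for example with a here-document in the setup script); insert_after_marker is a made-up name.

```shell
# Hypothetical helper: copy stdin to stdout and insert the contents of
# the file named in $2 after every line that matches the sed regex in $1.
insert_after_marker() {
    # "r FILE" queues FILE to be printed right after the matching line.
    sed -e "/$1/r $2"
}
```

It could be used like this, writing to a new file rather than editing in place: insert_after_marker '^###company$' block.txt < hosts.cfg > hosts.cfg.new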

SED: Replacing String After Single Quote

I have a file named file.txt that contains the following:
CREATE LARGE TABLESPACE LONGSPCE1 IN DATABASE PARTITION GROUP IBMDEFAULTGROUP PAGESIZE 4096 MANAGED BY DATABASE
USING (FILE '/db2data1/TGT_INST/TGT_DB/LONGSPCE1.c1' 1588368,
FILE '/db2data2/TGT_INST/TGT_DB/LONGSPCE1.c2' 1588368,
FILE '/db2data3/TGT_INST/TGT_DB/LONGSPCE1.c3' 1588368,
FILE '/db2data4/TGT_INST/TGT_DB/LONGSPCE1.c4' 1588368)
I am trying to change the numerics after the c[0-9]' to a value of 100.
I have tried the following with no luck.
cat file.txt |sed 's/(c1'' )\([0-9]*\)/$1 100/g'
You can use:
sed "s/\(c[0-9]\+'\) [0-9]\+/\1 100/" file.txt
USING (FILE '/db2data1/TGT_INST/TGT_DB/LONGSPCE1.c1' 100,
FILE '/db2data2/TGT_INST/TGT_DB/LONGSPCE1.c2' 100,
FILE '/db2data3/TGT_INST/TGT_DB/LONGSPCE1.c3' 100,
FILE '/db2data4/TGT_INST/TGT_DB/LONGSPCE1.c4' 100)
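The \+ operator used above is a GNU sed extension. A more portable sketch with extended regular expressions (-E) is given below; it also takes the new size as an argument. The function name set_container_size is hypothetical, and the pattern assumes the number follows the closing quote after exactly one space, as in the sample.

```shell
# Hypothetical helper: read the DDL on stdin and replace the number that
# follows each quoted container name ending in .c<digits> with $1.
set_container_size() {
    sed -E "s/(\.c[0-9]+') [0-9]+/\1 $1/"
}
```

It would be called as: set_container_size 100 < file.txt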

Bash Script: sed/awk/regex to match an IP address and replace

I have a string in a bash script that contains a line of a log entry such as this:
Oct 24 12:37:45 10.224.0.2/10.224.0.2 14671: Oct 24 2012 12:37:44.583 BST: %SEC_LOGIN-4-LOGIN_FAILED: Login failed [user: root] [Source: 10.224.0.58] [localport: 22] [Reason: Login Authentication Failed] at 12:37:44 BST Wed Oct 24 2012
To clarify: the first IP listed there, "10.224.0.2", is the machine that submitted this log entry of a failed login attempt. Someone tried to log in, and failed, from the machine at the second IP address in the log entry, "10.224.0.58".
I wish to replace the first occurrence of the IP address "10.224.0.2" with the host name of that machine; as you can see, it presently reads "IPADDRESS/IPADDRESS", which is useless because it carries the same information twice. So here, I would like to grep (or similar) out the first IP and then pass it to something like the host command to get the reverse host and replace it in the log output.
I would like to repeat this for the second IP, "10.224.0.58". I would like to find this IP and also replace it with the host name.
It's not just those two specific IP addresses, though, but any IP address. So I want to search for four integers of one to three digits each, separated by three full stops '.'.
Is regex the way forward here, or is that over complicating the issue?
Many thanks.
Replace a fixed IP address with a host name:
$ cat log | sed -r 's/10\.224\.0\.2/example.com/g'
Replace all IP addresses with a host name:
$ cat log | sed -r 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/example.com/g'
If you want to call an external program, it's easy to do that using Perl (just replace host with your lookup tool):
$ cat log | perl -pe 's/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/`host $1`/ge'
Hopefully this is enough to get you started.
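The lookup-and-replace idea can also be sketched in plain shell. Everything named here is an assumption for illustration: replace_ips and resolve_ip are made-up functions, and resolve_ip is only a placeholder that fabricates a name from the address so the sketch runs without DNS. A real version would call a lookup tool and print a single host name.

```shell
# Hypothetical resolver (placeholder): swap in a real lookup that prints
# exactly one name for the IP passed in $1.
resolve_ip() {
    printf 'host-%s' "$1" | tr . -
}

# Hypothetical helper: print the log line in $1 with every distinct
# IPv4-looking address replaced by the output of resolve_ip.
replace_ips() {
    line=$1
    for ip in $(printf '%s\n' "$line" |
                grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u); do
        name=$(resolve_ip "$ip")
        # Skip failed lookups, and avoid looping forever if nothing changes.
        if [ -z "$name" ] || [ "$name" = "$ip" ]; then
            continue
        fi
        # Escape the dots so they only match literal dots.
        escaped=$(printf '%s' "$ip" | sed 's/\./\\./g')
        # Replace whole addresses only (10.224.0.2 must not match inside
        # 10.224.0.25); loop so adjacent copies such as a/b are both hit.
        line=$(printf '%s\n' "$line" | sed -E -e ':a' \
            -e "s/(^|[^0-9.])$escaped([^0-9.]|\$)/\1$name\2/" -e 'ta')
    done
    printf '%s\n' "$line"
}
```

With the placeholder resolver, the two copies of 10.224.0.2 and the source address 10.224.0.58 in the sample entry would each be swapped for a made-up name.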
There are various ways to find the IP addresses; here's one. Just replace "printf '<<<%s>>>' " with "host" or whatever your command name is in this GNU awk script:
$ cat tst.awk
{
    subIp = gensub(/\/.*$/,"","",$4)
    srcIp = gensub(/.*\[Source: ([^]]+)\].*/,"\\1","")
    "printf '<<<%s>>>' " subIp | getline subName
    "printf '<<<%s>>>' " srcIp | getline srcName
    gsub(subIp,subName)
    gsub(srcIp,srcName)
    print
}
$
$ gawk -f tst.awk file
Oct 24 12:37:45 <<<10.224.0.2>>>/<<<10.224.0.2>>> 14671: Oct 24 2012 12:37:44.583 BST: %SEC_LOGIN-4-LOGIN_FAILED: Login failed [user: root] [Source: <<<10.224.0.58>>>] [localport: 22] [Reason: Login Authentication Failed] at 12:37:44 BST Wed Oct 24 2012
I googled this one-line command together, but I was unable to pass the IP addresses it finds to the ssh command:
sed -n 's/\([0-9]\{1,3\}\.\)\{3\}[0-9]\{1,3\}/\nip&\n/gp' test | grep ip | sed 's/ip//' | sort | uniq
Here "test" is the file that the sed command searches for the pattern.
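One way to hand each address to another command is to list the unique addresses first and then loop over them. This is a sketch under a few assumptions: grep supports -o, the function name list_ips is made up, and the ssh call in the usage note is only an illustration of where the address would go.

```shell
# Hypothetical helper: read text on stdin and print each distinct
# IPv4-looking address once, one per line.
list_ips() {
    grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u
}
```

A loop such as list_ips < test | while read -r ip; do ssh -n "$ip" uptime; done would then run one command per address; the -n option stops ssh from swallowing the rest of the list on stdin.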