Regexp pattern matching IP and UserAgent in a Huge File - regex

I have a huge log file that has a structure like this:
ip=X.X.X.X
userAgent=Firefox
-----
Referer=hxxp://www.bla.org
I want to create a custom output like this:
ip:userAgent
for ex:
X.X.X.X:Firefox
The pattern should ignore lines which don't start with ip= or userAgent=. (These two must form a pair, as I mentioned above.)
I am a newbie administrator and our client needs a sorted file immediately.
Any help will be wonderful.
Thanks.

^ip=(\d+(?:\.\d+){3})[\r\n]+userAgent=(.+)$
Apply in global + multiline mode.
Group 1 will contain the IP, group 2 will contain the user agent string.
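Edit: The above expression can be simplified a bit by dropping the IP address format check, assuming that there will be nothing but real IP addresses in the log file. The repetition is wrapped in a non-capturing group so that group 1 still holds the whole address rather than only its last part:
^ip=((?:\d+\.?)+)[\r\n]+userAgent=(.+)$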
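For illustration, here is a minimal Python sketch of how the first, stricter expression could be applied in global + multiline mode. The function name extract_pairs and the sample text are made up for this example, and the sketch assumes the text fits in memory, which may not hold for a truly huge file.

```python
import re

# Pattern from the answer: group 1 is the IP, group 2 is the user agent.
PAIR = re.compile(r"^ip=(\d+(?:\.\d+){3})[\r\n]+userAgent=(.+)$", re.MULTILINE)

def extract_pairs(text):
    """Return 'ip:userAgent' strings for every ip= line followed by userAgent=."""
    # findall plays the role of "global" mode; strip() drops a trailing \r
    # left over from Windows line endings.
    return [f"{ip}:{agent.strip()}" for ip, agent in PAIR.findall(text)]

sample = "ip=10.0.0.1\nuserAgent=Firefox\n-----\nReferer=hxxp://www.bla.org\n"
print(extract_pairs(sample))
```

Lines such as ----- and Referer= are skipped simply because the pattern never matches them.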

You can use:
^ip=((?:[0-9]{1,3}\.){3}[0-9]{1,3})$
And
^userAgent=(.*)$
Get the group 1 for both and you will have the desired data.
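Since the two patterns are matched separately, something still has to pair them up. A rough Python sketch of that step, reading line by line so the whole file never has to be in memory, could look like this. The name pair_lines is hypothetical, and the sketch assumes a userAgent= line only counts when it directly follows an ip= line.

```python
import re

IP_RE = re.compile(r"^ip=((?:[0-9]{1,3}\.){3}[0-9]{1,3})$")
UA_RE = re.compile(r"^userAgent=(.*)$")

def pair_lines(lines):
    """Yield 'ip:userAgent' for each ip= line directly followed by userAgent=."""
    pending_ip = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        ip_match = IP_RE.match(line)
        if ip_match:
            # Remember the IP and wait for the next line.
            pending_ip = ip_match.group(1)
            continue
        ua_match = UA_RE.match(line)
        if ua_match and pending_ip is not None:
            yield f"{pending_ip}:{ua_match.group(1)}"
        # Any non-ip line ends the current pair attempt.
        pending_ip = None
```

An open file object can be passed directly as lines, since iterating over it yields one line at a time.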

Give this a try (it is in no way robust if your log file contains lines that differ from the example snippet above):
sed -n -e '/^ip=/ {s///
N
s/\nuserAgent=/:/
p
}' HugeFile > customoutput

Related

TCL Regex Skipping Over a Set of Characters and Matching to a New line

I'm working with expect scripting in order to ssh into a device and pull information off of it. However, I'm facing issues parsing the expect_out(buffer) for the data from the commands I send.
This is the contents of my expect_out(buffer):
"mca-cli-op info\r\n\r\nModel: UAP-AC-Lite\r\nVersion: 6.0.21.13673\r\nMAC Address: 10:9f:5r:20:c5:7e\r\nIP Address: 123.123.1.123\r\nHostname: UAP-AC-Lite\r\nUptime: 152662 seconds\r\n\r\nStatus: Connected (http://base_controller<url;>/inform)\r\nUAP-AC-Lite-BZ.6.0.21# "
Right now I'm trying to get the Model (UAP-AC-Lite) without the Model tag.
So the regex I'm using is:
expect -re {(?=(Model: ))+[.*\$]}
set model "$expect_out(0,string)"
puts $model
The command doesn't work, but my thought process was that I would perform a look ahead for the Model tag, then match only the subsequent characters after it to the new line. I've tried replacing the "$" with \r\n but that doesn't work either. Can anyone explain what I'm doing wrong? Thanks for the help!
Note: If possible, I wouldn't want to include the newline either, as it might mess up commands that I run which use these variables.
You're close, but the regex is incorrect. Try
expect -re {Model:\s+([^\r]+)}
set model $expect_out(1,string)
The 1 in $expect_out(1,string) means the first set of capturing parentheses.
Regexes are documented at http://www.tcl-lang.org/man/tcl8.6/TclCmd/re_syntax.htm
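To see what the capture group picks up, the same pattern can be tried outside of expect. The following is only a Python sketch run against a shortened copy of the buffer from the question; parse_model is a made-up helper name, and Python's re module is not Tcl's regex engine, although this particular pattern is written the same way in both.

```python
import re

def parse_model(buffer):
    """Return the text after 'Model:' up to (not including) the carriage return."""
    match = re.search(r"Model:\s+([^\r]+)", buffer)
    return match.group(1) if match else None

buffer = (
    "mca-cli-op info\r\n\r\nModel: UAP-AC-Lite\r\n"
    "Version: 6.0.21.13673\r\nHostname: UAP-AC-Lite\r\n"
)
print(parse_model(buffer))
```

Because [^\r]+ stops at the first carriage return, the captured value carries no line ending.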

Bash replace substring after first colon

I am trying to build a connection string that requires pulling 3 IP addresses from another config file. When I get those values, I need to replace the port on each. I plan to replace each port using simple Bash find and replace ${string/pattern/replacement} but my problem is I'm stuck on the best way to parse the pattern out of the IP.
Here is what I have so far:
myFile.config:
ip.1=ip-ip-1-address:1234:5678
ip.2=ip-ip-2-address:1234:5678
ip.3=ip-ip-3-address:1234:5678
Copying some other simple process, I found I can pull the value of each IP like this:
IP1=`grep "ip.1=" /path/to/conf/myFile.config | awk -F "=" '{print $2}'`
which gives me ip-ip-1-address:1234:5678. However, I need to replace 1234:5678 with 6543, for example. I've been looking around and I found this awesome answer that detailed using Bash prefix substitution, but that relies on knowing the parameter. For example, I would have to do it this way:
test=${ip1##ip-ip-1-address:}
which results in $test being 1234:5678. That's fine but maybe I don't know the IP address as the parameter, so I'm back to considering regex unless there's a way for me to use * as the parameter or something, but I have been unsuccessful so far. For regex, I have tried a bunch such as test=${ip1/(?<=:).*/}.
Note that the ${ip1/(?<=:).*/} you tried uses Bash string manipulation syntax, which does not support regex, only glob-style patterns.
You seem to want
x='ip.1=ip-ip-1-address:1234:5678'
echo "${x%%:*}:6543" # => ip.1=ip-ip-1-address:6543
${x%%:*} takes the value of x and removes the longest suffix matching :*, that is, everything from the first : to the end of the string, including the colon itself. :6543 is then appended to the result, which gives "${x%%:*}:6543".
To extract that value, you may also use
awk '/^ip\.1=/{sub("^[^:]+:", "");print}' myFile.config
The awk command finds lines starting with ip.1= and then removes all text from the start till the first colon including the colon and only prints these values.
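For comparison, the "cut at the first colon, then append the new port" idea can be sketched in Python as well. The function name replace_after_first_colon is invented for this example. One deliberate difference from the Bash one-liner is that an entry without any colon is returned unchanged, whereas "${x%%:*}:6543" would still append :6543.

```python
def replace_after_first_colon(entry, new_port):
    """Replace everything after the first ':' in entry with new_port."""
    head, sep, _ = entry.partition(":")
    # sep is empty when entry contains no colon; leave such entries alone.
    return f"{head}:{new_port}" if sep else entry

print(replace_after_first_colon("ip.1=ip-ip-1-address:1234:5678", "6543"))
```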

Searching for an unknown IP using FINDSTR

I have text files with hundreds of entries like those below. They mostly come in pairs of 2 IPs. Sometimes they come as 3 IPs. I am trying to find that third IP that is always in the middle of the stack (syntax below). There are maximum 3 different IPs in each file at all times. It is possible that some text files won’t have that middle IP (its occurrence is quite rare). How do I write the search command to find the middle IP from mentioned stacks if there is one in the text file? OS: Win7.
Text file sample syntax:
- saving IP addresses
* 192.168.1.1
* 111.111.222.222
- over
- saving IP addresses
* 192.168.1.1
* 11.123.11.123
* 111.111.222.222
- over
- saving IP addresses
* 192.168.1.1
* 111.111.222.222
- over
I have tried findstr \-.*\*.*\*.*\- pathtofile.txt This should return the block of 3 IPs if there is such block in the file but it didn't work.
Assuming your real file isn't double-spaced like your sample, the following will output the first line (saving...) and line number of matching blocks. Your real problem is findstr will only output one line even if you are matching across lines, so you will never get the whole block output. You need a better tool.
Note: I am using the JPSoft Take Command escape character to put in CR and LF, but you can create them in real batch files as well, though it isn't easy.
findstr /n /R saving.*^r^n.*\..*\..*\..*^r^n.*\..*\..*\..*^r^n.*\..*\..*\..*^r^n sampleIPinput.txt
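As one example of a "better tool", the blocks can be walked line by line in a short script. The Python sketch below assumes the markers look exactly like the sample ("- saving IP addresses", "* <ip>", "- over"); the function name middle_ips is made up. It ignores blank lines, so it should not matter whether the real file is double-spaced.

```python
def middle_ips(lines):
    """Return the middle IP of every block that lists exactly three IPs."""
    found = []
    block = None  # None means "not inside a block"
    for raw in lines:
        line = raw.strip()
        if line.startswith("- saving IP addresses"):
            block = []
        elif line.startswith("- over"):
            if block is not None and len(block) == 3:
                found.append(block[1])
            block = None
        elif line.startswith("*") and block is not None:
            # Drop the leading '* ' bullet and keep the address.
            block.append(line.lstrip("* ").strip())
    return found
```

Passing an open file object as lines works, and an empty result means the file has no three-IP block.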

Change WiFi WPA2 passkey from a script

I'm using Raspbian Wheezy, but this is not a Raspberry Pi specific question.
I am developing a C application, which allows the user to change their WiFi Password.
I did not find a ready script/command for this, so I'm trying to use sed.
I pass the SSID name and new key to a bash script, and the key is replaced for that SSID's block within /etc/wpa_supplicant/wpa_supplicant.conf.
My application runs as root.
A sample block is shown below.
network={
ssid="MY_SSID"
scan_ssid=1
psk="my_ssid_psk"
}
So far I've tried the following (I've copied wpa_supplicant.conf to wpa.txt for testing):
(1) This tries to do the replacement within a range that starts when my SSID is detected and ends at the closing brace followed by a newline.
SSID="TRIMURTI"
PSK="12345678"
sed -n "1 !H;1 h;$ {x;/ssid=\"${SSID}\"/,/}\n/ s/[[:space:]]*psk=.*\n/\n psk=\"${PSK}\"\n/p;}" wpa.txt
and
(2) This tries to 'remember' the matched pattern, and reproduce it in the output, but with the new key.
SSID="TRIMURTI"
PSK="12345678"
sed -n "1 !H; 1 h;$ {x;s/\(ssid=\"${SSID}\".*psk=\).*\n/\1\"${PSK}\"/p;}" wpa.txt
I have used hold & pattern buffers as the pattern can span multiple lines.
Above, the first example seems to ignore the range & replaces the 1st instance, and then truncates the rest of the file.
The second example replaces the last found psk value & truncates the file thereafter.
So I need help in correcting the above code, or trying a different solution.
If we can assume the fields will always be in a strict order where the ssid= goes before psk=, all you really need is
sed "/^[[:space:]]*ssid=\"$SSID\"[[:space:]]*$/,/}/s/^\([[:space:]]*psk=\"\)[^\"]*/\1$PSK/" wpa.txt
This is fairly brittle, though. If the input is malformed, or if the ssid goes after the psk in your block, it will break. The proper solution (which however is severe overkill in this case) is to have a proper parser for the input format; while that is in theory possible in sed, it would be much simpler if you were to switch to a higher-level language like Python or Perl, or even Awk.
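As a rough idea of what the higher-level route might look like, here is a Python sketch that treats each network={...} block as a unit, so the order of ssid= and psk= inside the block no longer matters. It is not a full parser: it assumes no } appears inside a block (for instance inside a passphrase), and the function name set_psk is made up for this example.

```python
import re

def set_psk(conf_text, ssid, new_psk):
    """Return conf_text with psk replaced in the network block whose ssid matches."""
    ssid_line = re.compile(r'^[ \t]*ssid="' + re.escape(ssid) + r'"[ \t]*$', re.M)
    psk_line = re.compile(r'^([ \t]*psk=).*$', re.M)

    def fix_block(match):
        block = match.group(0)
        if not ssid_line.search(block):
            return block  # a different network: leave untouched
        # A function replacement avoids backslash escapes in new_psk being interpreted.
        return psk_line.sub(lambda m: m.group(1) + '"' + new_psk + '"', block)

    # Non-greedy match from 'network={' to the first closing brace.
    return re.sub(r"network=\{.*?\}", fix_block, conf_text, flags=re.S)
```

The function only transforms text; reading wpa.txt and writing the result back is left to the caller.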
The most practical way to update a password or other configuration value is to use wpa_cli, e.g.:
wpa_cli -i "wlan0" set_network "0" psk '"Some5Strong1Pass"'
wpa_cli -i "wlan0" save_config
The save_config command is required to write the change back to the configuration file, /etc/wpa_supplicant/wpa_supplicant.conf (this only works if update_config=1 is set in that file).

Use regex with grep to filter data from the output of a verbose command

I am working with a cloud environment and there is a command that will display all available information about the running VMs. Here is an example of some of the lines that pertain to one VM.
RESERVATION r-6D0F464B 170506678332 GroupD
INSTANCE i-E9B444A9 emi-376642D8 999.99.999.999 88.888.88.888 running lock_key 0 c1.xlarge 2013-06-17T18:40:56.270Z cluster01 eki-E7E242A3 monitoring-disabled 999.99.999.999 88.888.88.888 ebs
I need to be able to pull the i-********, emi-********, both IP address, its status, the lock_key, the c1.xlarge, and the monitoring-disabled/enabled.
I have been able to pull the whole line with some super simple regex but all of this is well beyond me. If there is another easier method of grabbing this data any suggestions are welcome.
Let's take it step by step. The best way I can think of is redirecting the output to a file; in Unix-like environments you do it like this:
your-command > filename.txt
Second, you need to read the file line by line, I would use a python script or a perl script if you know any of those, or whatever language fits you.
Third, you can get values two different ways:
Read columns by position; a regex such as [^\s]+ matches one column at a time.
Write a regular expression for each specific column: for the IP you could use something like ([0-9]{1,3}\.){3}[0-9]{1,3}, for monitoring monitoring-([^\s]+), and so on.
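The per-column regex idea can be sketched in Python as follows. This is only an illustration, tried against the sample INSTANCE line from the question: the name extract_fields is made up, and taking the first two matches as "both IP addresses" is an assumption based on that one sample line.

```python
import re

IP_RE = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")
MONITORING_RE = re.compile(r"monitoring-(\S+)")

def extract_fields(line):
    """Pull the first two IP-like values and the monitoring state from one line."""
    ips = IP_RE.findall(line)
    monitoring = MONITORING_RE.search(line)
    return {
        "ips": ips[:2],  # the sample line repeats the same two addresses later on
        "monitoring": monitoring.group(1) if monitoring else None,
    }

line = ("INSTANCE i-E9B444A9 emi-376642D8 999.99.999.999 88.888.88.888 running "
        "lock_key 0 c1.xlarge 2013-06-17T18:40:56.270Z cluster01 eki-E7E242A3 "
        "monitoring-disabled 999.99.999.999 88.888.88.888 ebs")
print(extract_fields(line))
```

The pattern only checks the shape of the value (four dot-separated groups of digits), not whether each number is a valid octet.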
As long as the fields will always be in the same order, all you need to do is split on whitespace.
Pseudocode (well, it's ruby, but hopefully you get the idea):
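vms = {}
# Field indices below are read off the sample INSTANCE line in the question;
# adjust them if your real output has a different layout.
File.foreach('vm-info') do |line|
  fields = line.split                    # no argument: split on runs of whitespace
  next unless fields[0] == 'INSTANCE'    # skip RESERVATION and other lines
  field_map = {}
  vm_name = fields[1]                    # i-********
  field_map['emi'] = fields[2]
  field_map['ip_addresses'] = fields[3, 2]
  field_map['status'] = fields[5]
  field_map['key'] = fields[6]
  field_map['type'] = fields[8]
  field_map['monitoring'] = fields[12]
  vms[vm_name] = field_map
end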
After this, vms will be initialized to contain information about each vm. You can simply print them all out at this point, or continue running data manipulation on them.