My database log file looks like this...
vi test.txt
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076898 ]' LOG: SELECT nspname FROM pg_namespace ORDER BY nspname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname
FROM pg_type
WHERE typnamespace = (SELECT oid FROM pg_namespace WHERE nspname = current_schema())
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname
Because of line breaks such as '\n' and '\r', I cannot see the complete query. For example:
# grep '2020' test.txt
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076898 ]' LOG: SELECT nspname FROM pg_namespace ORDER BY nspname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname
As you can see, the line "FROM pg_type" is missing from the above output. How do I remove the line breaks in this text file? I need to keep the line break before '2020', since that starts another query.
How do I write a regular expression that will remove all breaks between "LOG:" and "'2020-"?
A bit of a dirty solution, but you could do something like:
cat my_log_file.log | tr '\n' ' ' | sed "s/\('[0-9]\{4\}\)/\n\1/g"
# OR, simpler version:
tr '\n' ' ' < my_log_file.log | sed "s/\('[0-9]\{4\}\)/\n\1/g"
Basically, you delete every '\n' and then add them back where they belong. Note that the pattern matches a quote followed by four digits anywhere in the text, so a query containing such a literal would also be split.
$ awk '{printf "%s%s", (/^\047/ ? ors : ofs), $0; ors=ORS; ofs=OFS} END{printf "%s", ors}' file
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076898 ]' LOG: SELECT nspname FROM pg_namespace ORDER BY nspname
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076899 ]' LOG: SET search_path TO "public"
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076900 ]' LOG: SELECT typname FROM pg_type WHERE typnamespace = (SELECT oid FROM pg_namespace WHERE nspname = current_schema())
'2020-03-27T08:00:24Z UTC [ db=xdb user=root pid=9037 userid=100 xid=36076897 ]' LOG: SELECT datname FROM pg_database ORDER BY datname
awk 'match($0, r) && NR>1 {print ""}
{printf "%s", $0} END {print ""}
' r="^'2020" test.txt
This might work for you (GNU sed):
sed '/^'\''2020/{:a;N;/^\('\''2020\).*\n\1/!s/\n/ /;ta;P;D}' file
If a line begins '2020, append the next line and if that line does not begin '2020, replace the newline between the lines with a space, append the next line and repeat. Otherwise print/delete the first line and repeat.
The OP asked: "How do I write a regular expression that will remove all breaks between "LOG:" and "'2020-"?" To handle any year, use:
sed '/^'\''[1-9][0-9][0-9][0-9]/{:a;N;/^'\''[1-9][0-9][0-9][0-9].*\n'\''[1-9][0-9][0-9][0-9]/!s/\n/ /;ta;P;D}' file
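For comparison, the same joining logic can be sketched in Python. This is only an illustration of the idea, not a replacement for the sed and awk answers above; the function name `join_continuations` is hypothetical, and the record-start pattern (a quote, a four-digit year and a hyphen) is an assumption taken from the sample log.

```python
import re

# Assumption: every new log record starts with a quote followed by a
# four-digit year and a hyphen, as in the sample ('2020-03-27T...).
RECORD_START = re.compile(r"^'\d{4}-")


def join_continuations(lines):
    """Fold continuation lines into the record that precedes them."""
    records = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # drop both \n and \r line endings
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


sample = [
    "'2020-03-27T08:00:24Z UTC [ xid=36076900 ]' LOG: SELECT typname\n",
    "FROM pg_type\r\n",
    "WHERE typnamespace = 1\n",
    "'2020-03-27T08:00:24Z UTC [ xid=36076897 ]' LOG: SELECT datname FROM pg_database\n",
]
for record in join_continuations(sample):
    print(record)
```

Because each record ends up on one line, a plain `grep '2020'` on the joined output would then show the whole query.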
I have been using the below query to create a table within Athena,
CREATE EXTERNAL TABLE IF NOT EXISTS test.test_table (
`converteddate` string,
`userid` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
'serialization.format' = ',',
'field.delim' = ','
) LOCATION 's3:XXXX'
TBLPROPERTIES ('has_encrypted_data'='false',"skip.header.line.count"="1")
This returns me:
converteddate | userid
-------------------------------------
2017-11-29T05:00:00 | 00001
2017-11-27T04:00:00 | 00002
2017-11-26T03:00:00 | 00003
2017-11-25T02:00:00 | 00004
2017-11-24T01:00:00 | 00005
I would like to return:
converteddate | userid
-------------------------------------
2017-11-29 05:00:00 | 00001
2017-11-27 04:00:00 | 00002
2017-11-26 03:00:00 | 00003
2017-11-25 02:00:00 | 00004
2017-11-24 01:00:00 | 00005
and have converteddate as a datetime and not a string.
It is not possible to convert the data when the table is created, but you can convert it at query time.
You can use the date_parse(string, format) -> timestamp function. More details are mentioned here.
For your use case you can do something like the following:
select date_parse(converteddate, '%Y-%m-%dT%H:%i:%s') as converted_timestamp, userid
from test_table
Note: depending on the format of your string, you have to choose the proper specifiers for the year (%Y for four digits, %y for two), month (always two digits or not), day, hour (12- or 24-hour format), etc.
(My answer has one premise: you are using OpenCSVSerDe. It doesn't apply to LazySimpleSerDe, for instance.)
If you have the option of changing the format of your input CSV file, you should convert your timestamp to UNIX Epoch Time. That's the format that OpenCSVSerDe is expecting.
For instance, your sample CSV looks like this:
"converteddate","userid"
"2017-11-29T05:00:00","00001"
"2017-11-27T04:00:00","00002"
"2017-11-26T03:00:00","00003"
"2017-11-25T02:00:00","00004"
"2017-11-24T01:00:00","00005"
It should be:
"converteddate","userid"
"1511931600000","00001"
"1511755200000","00002"
"1511665200000","00003"
"1511575200000","00004"
"1511485200000","00005"
Those integers are the number of milliseconds since midnight UTC on January 1, 1970 for each one of your original dates.
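If you need to produce those integers yourself, a minimal Python sketch could look like the following. The helper name `to_epoch_millis` is hypothetical, and the code assumes the original timestamps are in UTC, which is what the sample values above correspond to.

```python
from datetime import datetime, timezone


def to_epoch_millis(iso_text):
    """Convert 'YYYY-MM-DDTHH:MM:SS' (assumed to be UTC) to epoch milliseconds."""
    parsed = datetime.strptime(iso_text, "%Y-%m-%dT%H:%M:%S")
    parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: source times are UTC
    return int(parsed.timestamp()) * 1000


for text in ["2017-11-29T05:00:00", "2017-11-24T01:00:00"]:
    print(text, "->", to_epoch_millis(text))
```

If your source timestamps are in another time zone, attach that zone instead of UTC before converting.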
Then you can run a slightly modified version of your CREATE TABLE statement:
CREATE EXTERNAL TABLE IF NOT EXISTS test.test_table (
converteddate timestamp,
userid string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3:XXXX'
TBLPROPERTIES ("skip.header.line.count"="1");
If you query your Athena table with select * from test_table, this will be the result:
converteddate userid
------------------------- --------
2017-11-29 05:00:00.000 00001
2017-11-27 04:00:00.000 00002
2017-11-26 03:00:00.000 00003
2017-11-25 02:00:00.000 00004
2017-11-24 01:00:00.000 00005
As you can see, type TIMESTAMP on Athena includes milliseconds.
I wrote a more comprehensive explanation on using types TIMESTAMP and DATE with OpenCSVSerDe. You can read it here.
I need to take the timestamp printed on the "After ftp connection" line and check whether it is from today.
I have a log file which contains the following:
---------------------------------------------------------------------
Opening connection for file1.dat
---------------------------------------------------------------------
---------------------------------------------------------------------
Before ftp connection -- time is -- Mon Oct 21 04:01:52 CEST 2013
---------------------------------------------------------------------
---------------------------------------------------------------------
After ftp connection -- time is Mon Oct 21 04:02:03 CEST 2013 .
---------------------------------------------------------------------
---------------------------------------------------------------------
Opening connection for file2.dat
---------------------------------------------------------------------
---------------------------------------------------------------------
Before ftp connection -- time is -- Wed Oct 23 04:02:03 CEST 2013
---------------------------------------------------------------------
---------------------------------------------------------------------
After ftp connection -- time is Wed Oct 23 04:02:04 CEST 2013 .
---------------------------------------------------------------------
Desired Output:
INPUT:file1.dat --> FAIL # since it is Oct 21st considering today is Oct 23.
INPUT:file2.dat --> PASS # since it is Oct 23rd.
INPUT:file3.dat --> FAIL # File information does not exist
What I tried so far:
grep "file1.dat\\|Before ftp connection\\|After ftp connection" logfilename
But this returns all the info that matches either file1.dat OR Before ftp connection OR After ftp connection. Considering the above sample, I get 5 lines out of which last 2 lines are from file2.dat:
Opening connection for file1.dat
Before ftp connection -- time is -- Mon Oct 21 04:01:52 CEST 2013
After ftp connection -- time is Mon Oct 21 04:02:03 CEST 2013 .
Before ftp connection -- time is -- Wed Oct 23 04:02:03 CEST 2013
After ftp connection -- time is Wed Oct 23 04:02:04 CEST 2013 .
I am stuck here. Ideally, I need to take Mon Oct 21 04:02:03 CEST 2013, compare it with today's date, and print the result FAIL.
Defining the records correctly makes things a lot easier:
$ awk '{print $5,($0~"After.*"d?"PASS":"FAIL")}' d="$(date +'%a %b %d')" RS= file
file1.dat FAIL
file2.dat PASS
Use awk:
# read dates in shell variables
read x m d x x y < <(date)
awk -v f='file2.dat' -v m=$m -v d=$d -v y=$y '$0 ~ f {s=1; next}
s && /After ftp connection/ {
res = ($8==m && $9==d && $12==y) ? "PASS" : "FAIL";
print f, res; exit
}' file.log
file2.dat PASS
FOLLOW UP by OP:
I achieved the intended results by this:
check_success ()
{
CHK_DIR=/Archive
if [[ ! -d ${CHK_DIR} ]]; then
exit 1
elif [[ ! -d ${LOG_FOLDER} ]]; then
exit 1
fi
count_of_files=$(ls -al --time-style=+%D $CHK_DIR/*.dat | grep $(date +%D) | cut -f1 | awk '{ print $7}' | wc -l)
if [[ $count_of_files -lt 1 ]]; then
exit 2
fi
list_of_files=$(basename $(ls -al --time-style=+%D $CHK_DIR/*.dat | grep $(date +%D) | cut -f1 | awk '{ print $7}'))
for filename in $list_of_files
do
filename=$(basename "$filename")
lg_name=$(grep -El "Opening.*$filename" $LOG_FOLDER/* | head -1 )
m=$(date +%b)
d=$(date +%d)
y=$(date +%Y)
output=$(awk -v f=$filename -v m=$m -v d=$d -v y=$y '$0 ~ f {s=1; next} s && /After ftp connection/ { res = ($8==m && $9==d && $12==y) ? "0" : "1"; print res; exit }' $lg_name)
if [[ ${output} != 0 ]]; then
exit 2
fi
done
exit 0
}
I used Anubhava's snippet, nevertheless Thanks to all the three champs.
It was tricky!
$ awk -vtoday=$(date "+%Y%m%d") '
/^Opening/ {file=$4}
/^After ftp connection/ {
$1=$2=$3=$4=$5=$6=$NF="";
r="date -d \"" $0 "\" \"+%Y%m%d\""; r | getline dat;
if (today==dat) {print file, "PASS"}
else {print file, "FAIL"}}
' file
file1.dat FAIL
file2.dat PASS
Explanation
-vtoday=$(date "+%Y%m%d") gives today's date in "20131023" format.
/^Opening/ {file=$4} matches lines starting with Opening and stores the filename, which happens to be in the 4th field.
/^After ftp connection/ on lines starting with "After ftp connection...", do:
{$1=$2=$3=$4=$5=$6=$NF=""; blanks the first six fields and the last one, so that only the date info is left.
r="date -d \"" $0 "\" \"+%Y%m%d\""; r | getline dat; converts the date on that line to YYYYMMDD format.
if (today==dat) {print file, "PASS"} compares the dates.
else {print file, "FAIL"} idem.
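The same check can also be sketched in Python, which additionally reports FAIL for a file that never appears in the log. This is a sketch under stated assumptions: the names `check_log`, `AFTER_RE` and `MONTHS` are hypothetical, the timestamp is assumed to look exactly like "Mon Oct 21 04:02:03 CEST 2013", and only month, day and year are compared so the time zone name is never parsed.

```python
import re
from datetime import date

# Assumption: timestamps look like "Mon Oct 21 04:02:03 CEST 2013"; only the
# month, day and year are used, so the time zone name is never parsed.
AFTER_RE = re.compile(
    r"^After ftp connection -- time is \w{3} (\w{3}) +(\d{1,2}) [\d:]+ \w+ (\d{4})"
)
MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


def check_log(lines, today):
    """Return {filename: "PASS" or "FAIL"} for each file named in the log."""
    results = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("Opening connection for "):
            current = line.split()[-1]
            results[current] = "FAIL"  # stays FAIL unless a matching date follows
            continue
        found = AFTER_RE.match(line)
        if found and current is not None:
            month, day, year = found.groups()
            stamp = date(int(year), MONTHS.index(month) + 1, int(day))
            results[current] = "PASS" if stamp == today else "FAIL"
    return results


log = """\
Opening connection for file1.dat
Before ftp connection -- time is -- Mon Oct 21 04:01:52 CEST 2013
After ftp connection -- time is Mon Oct 21 04:02:03 CEST 2013 .
Opening connection for file2.dat
Before ftp connection -- time is -- Wed Oct 23 04:02:03 CEST 2013
After ftp connection -- time is Wed Oct 23 04:02:04 CEST 2013 .
""".splitlines()

# "today" is pinned to the date used in the question; use date.today() for real runs.
results = check_log(log, today=date(2013, 10, 23))
for name in ["file1.dat", "file2.dat", "file3.dat"]:
    print(f"INPUT:{name} --> {results.get(name, 'FAIL')}")
```

Looking a file up with `results.get(name, 'FAIL')` is what covers the file3.dat case from the desired output, where no information exists at all.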
I have a huge log file with SQL debug output. I want to convert it to proper SQL.
For example in logs:
**** debug Fri Aug 09 13:05:13 PDT 2013 1376078713845 /atg/web/viewmapping/ViewMappingRepository executeUpdate2: [++SQLInsert++]
**** debug Fri Aug 09 13:05:13 PDT 2013 1376078713845 /atg/web/viewmapping/ViewMappingRepository INSERT INTO vmap_pv2pvad_rel(view_id,name,attr_id)
**** debug Fri Aug 09 13:05:13 PDT 2013 1376078713845 /atg/web/viewmapping/ViewMappingRepository VALUES(?,?,?)
**** debug Fri Aug 09 13:05:13 PDT 2013 1376078713845 /atg/web/viewmapping/ViewMappingRepository -- Parameters --
**** debug Fri Aug 09 13:05:13 PDT 2013 1376078713845 /atg/web/viewmapping/ViewMappingRepository p[1] = {pd} AmPvPuRoleAllMembers (java.lang.String)
**** debug Fri Aug 09 13:05:13 PDT 2013 1376078713845 /atg/web/viewmapping/ViewMappingRepository p[2] = {pd} synthCategoryDefinition (java.lang.String)
**** debug Fri Aug 09 13:05:13 PDT 2013 1376078713845 /atg/web/viewmapping/ViewMappingRepository p[3] = {pd: attributes} AmPvAttPuRolesMembersCatDef (atg.adapter.gsa.SingleValueGSAId)
**** debug Fri Aug 09 13:05:13 PDT 2013 1376078713845 /atg/web/viewmapping/ViewMappingRepository [--SQLInsert--]
With below perl code
#!/usr/bin/perl
use strict;
use warnings;
open(FILE1, 'dps_ui_preview_view.log');
while (<FILE1>) {
if (/\+\+SQL[Insert,Update]/../\-\-SQL[Insert,Update]/) {
$_ =~ s/^.*\t//g;
print $_;
}
}
close(FILE1);
I am able to grab below sql format
[++SQLUpdate++]
UPDATE vmap_im
SET item_path=?
WHERE id=?
-- Parameters --
p[1] = {pd: itemPath} /atg/userprofiling/ExternalProfileRepository (java.lang.String)
p[2] = {pd} AmImEuUsers (java.lang.String)
[--SQLUpdate--]
[++SQLInsert++]
INSERT INTO vmap_fh(id,name,component_path)
VALUES(?,?,?)
-- Parameters --
p[1] = {pd} AmFhPuFH (java.lang.String)
p[2] = {pd: name} External Preview Users formHandler (java.lang.String)
p[3] = {pd: path} /atg/remote/userprofiling/assetmanager/editor/service/PreviewUserAssetService (java.lang.String)
[--SQLInsert--]
I need to change above statements like below
UPDATE vmap_im SET item_path='/atg/userprofiling/ExternalProfileRepository' WHERE id='AmImEuUsers';
INSERT INTO vmap_fh(id,name,component_path) VALUES('AmFhPuFH','External Preview Users formHandler','/atg/remote/userprofiling/assetmanager/editor/service/PreviewUserAssetService')
Can anyone please let me know how I can achieve that? Any guidelines are much appreciated.
With the help of the regex grouping mentioned by user2676655, here is the full code:
#!/usr/bin/perl
use strict;
use warnings;
my $log = "";
open(my $fh, '<', 'dps_ui_preview_view.log') or die "Cannot open log: $!";
while (<$fh>) {
    # Flip-flop: true from the [++SQL...++] line through the [--SQL...--] line
    if (/\+\+SQL(?:Insert|Update)/../--SQL(?:Insert|Update)/) {
        s/^.*\t//;    # drop the debug prefix (everything up to the last tab)
        $log .= $_;
        if (/--\]/) {
            # End of block: split it into the query text and the parameter list
            my ($q, $p) = ($log =~ /\+\+\](.*)-- Parameters --(.*)\[--/s);
            $p =~ s/^\s+//;
            my @params = split(/\n/, $p);
            foreach my $i (@params) {
                # The value sits between "}" and the trailing "(type)"
                my ($val) = ($i =~ /\}(.*)\(/);
                $val =~ s/^\s+//;
                $val =~ s/\s+$//;
                $q =~ s/\?/'$val'/;
            }
            $q =~ s/\s+/ /g;    # collapse newlines and repeated spaces
            $q =~ s/^ //;
            $q =~ s/ $/;/;
            print $q, "\n";
            $log = "";
        }
    }
}
close($fh);
Try this:
$log = "[++SQLUpdate++]
UPDATE vmap_im
SET item_path=?
WHERE id=?
-- Parameters --
p[1] = {pd: itemPath} /atg/userprofiling/ExternalProfileRepository (java.lang.String)
p[2] = {pd} AmImEuUsers (java.lang.String)
[--SQLUpdate--]";
($q,$p) = ($log =~/\+\+\](.*)-- Parameters --(.*)\[--/s);
$p =~ s/^\s+//;
@params = split(/\n/, $p);
foreach (@params) {
my ($val) = ($_ =~/\}(.*)\(/) ;
$val =~ s/^\s+//;
$val =~ s/\s+$//;
$q =~s/\?/'$val'/;
}
$q =~s/\n/ /g;
$q =~s/\s+/ /g;
print $q;
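For readers who prefer Python, the substitution step can be sketched as follows. It is an illustration under assumptions rather than a port of the Perl above: the names `fill_parameters` and `PARAM_RE` are hypothetical, each parameter line is assumed to have the shape "p[N] = {...} value (type)", and single quotes inside a value are doubled so the result stays a valid SQL literal.

```python
import re

# Assumption: each parameter line looks like "p[N] = {...} value (java.type)".
PARAM_RE = re.compile(r"\}\s*(.*?)\s*\([^()]*\)\s*$")


def fill_parameters(block):
    """Turn one [++SQL...++] ... [--SQL...--] block into a single statement."""
    parts = re.search(r"\+\+\](.*)-- Parameters --(.*)\[--", block, re.S)
    if parts is None:
        return None
    query = " ".join(parts.group(1).split())  # collapse newlines and spaces
    found = (PARAM_RE.search(line) for line in parts.group(2).splitlines())
    values = iter([m.group(1) for m in found if m])

    def substitute(_match):
        value = next(values, None)
        if value is None:  # more placeholders than parameters: leave as-is
            return "?"
        return "'" + value.replace("'", "''") + "'"  # escape embedded quotes

    return re.sub(r"\?", substitute, query) + ";"


block = """[++SQLUpdate++]
UPDATE vmap_im
SET item_path=?
WHERE id=?
-- Parameters --
p[1] = {pd: itemPath} /atg/userprofiling/ExternalProfileRepository (java.lang.String)
p[2] = {pd} AmImEuUsers (java.lang.String)
[--SQLUpdate--]"""
print(fill_parameters(block))
```

Replacing all placeholders in one `re.sub` pass means a value that itself contains a question mark is never substituted a second time.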
I'm trying to get all "CP" values from a log file like below:
2013-06-27 17:00:00,017 INFO - [AlertSchedulerThread18] [2013-06-27 16:59:59, 813] -- SN: 989333333333 ||DN: 989333333333 ||CategoryId: 4687 ||CGID: null||Processing started ||Billing started||Billing Process: 97 msec ||Response code: 2001 ||Package id: 4387 ||TransactionId: 66651372336199820989389553437483742||CDR:26 msec||CDR insertion: 135 msec||Successfully inserted in CDR Table||CP:53 msec||PROC - 9 msec||Successfully executed procedure call.||Billing Ended||197 msec ||Processing ended
2013-06-27 17:00:00,018 INFO - [AlertSchedulerThread62] [2013-06-27 16:59:59, 824] -- SN: 989333333333 ||DN: 989333333333 ||CategoryId: 3241 ||CGID: null||Processing started ||Billing started||Billing Process: 61 msec ||Response code: 2001 ||Package id: 2861 ||TransactionId: 666513723361998319893580191324005184||CDR:25 msec||CDR insertion: 103 msec||Successfully inserted in CDR Table||CP:59 msec||PROC - 24 msec||Successfully executed procedure call.||Billing Ended||187 msec ||Processing ended
2013-06-27 17:00:00,028 INFO - [AlertSchedulerThread29] [2013-06-27 16:59:59, 903] -- SN: 989333333333 ||DN: 989333333333 ||CategoryId: 4527 ||CGID: null||Processing started ||Billing started||Billing Process: 47 msec ||Response code: 2001 ||Package id: 4227 ||TransactionId: 666513723361999169893616006323701572||CDR:22 msec||CDR insertion: 83 msec||Successfully inserted in CDR Table||CP:21 msec||PROC - 7 msec||Successfully executed procedure call.||Billing Ended||112 msec ||Processing ended
...getting output like this:
CP:53 msec
CP:59 msec
CP:21 msec
How can I do this using awk?
cut is always good and fast for these things:
$ cut -d"*" -f3 file
CP:53 msec
CP:59 msec
CP:21 msec
Anyway, these awk ways can make it:
$ awk -F"|" '{print $27}' file | sed 's/*//g'
CP:53 msec
CP:59 msec
CP:21 msec
or
$ awk -F"\|\|" '{print $14}' file | sed 's/*//g'
CP:53 msec
CP:59 msec
CP:21 msec
Or also
$ awk -F"*" '{print $3}' file
CP:53 msec
CP:59 msec
CP:21 msec
In each case, we set the field delimiter to a specific character (| or *) so that the line is split on it, and then we print the wanted field of the split text.
How about a hilarious sed command?
sed -n 's/.*\*\*\(.*\)\*\*.*/\1/p'
$ awk -F'[|][|]' '{print $14}' file
**CP:53 msec**
**CP:59 msec**
**CP:21 msec**
If you REALLY have '*'s in the input, just tweak to remove them:
$ awk -F'[|][|]' '{gsub(/\*/,""); print $14}' file
CP:53 msec
CP:59 msec
CP:21 msec
There's always grep:
grep -o 'CP:[[:digit:]]* msec' log.txt
If it's not necessarily going to be msec every time, you can just take everything up to the pipe:
grep -o 'CP:[^|]*' log.txt
With awk:
awk -F"[|*]+" '{ print $14 }' file
Code for GNU sed
$ sed -r 's/.*(CP:[0-9]+\smsec).*/\1/' file
CP:53 msec
CP:59 msec
CP:21 msec
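The pattern-based approach used by the grep and sed answers translates directly to Python. The sketch below is illustrative only: `extract_cp` and `CP_RE` are hypothetical names, the sample lines are shortened stand-ins for the real log lines, and the field is assumed to always have the form "CP:<digits> msec".

```python
import re

# Assumption: the wanted field always has the form "CP:<digits> msec".
CP_RE = re.compile(r"CP:\s*\d+\s*msec")


def extract_cp(lines):
    """Return the first "CP:<n> msec" field of every line that has one."""
    found = []
    for line in lines:
        match = CP_RE.search(line)
        if match:  # skip lines without a CP field
            found.append(match.group(0))
    return found


# Shortened stand-ins for the log lines in the question.
sample = [
    "... ||CDR insertion: 135 msec||Successfully inserted in CDR Table||CP:53 msec||PROC - 9 msec||...",
    "... ||CDR insertion: 103 msec||Successfully inserted in CDR Table||CP:59 msec||PROC - 24 msec||...",
    "... no timing field on this line",
]
print("\n".join(extract_cp(sample)))
```

Matching on the field's content rather than its position keeps working if fields are added or removed before CP.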
I am writing a program that somewhat mimics the last command in UNIX, and I am trying to use backreferencing in my solution. My program does exactly what it is supposed to do but I get a run time error/warning. My question is why is this error/warning coming up and how can I fix an issue like this?
If you need more information I can provide.
Program Execution
./last dodoherty
OUTPUT
Here is a listing of the logins for dodoherty:
1. dodohert pts/1 pc-618-012.omhq. Wed Feb 8 09:19 still logged in
2. dodohert pts/6 ip98-168-203-118 Tue Feb 7 19:19 - 20:50 (01:31)
3. dodohert pts/3 137.48.207.178 Tue Feb 7 14:00 - 15:06 (01:05)
4. dodohert pts/1 137.48.219.250 Tue Feb 7 12:32 - 12:36 (00:04)
5. dodohert pts/21 137.48.207.237 Tue Feb 7 12:07 - 12:23 (00:16)
6. dodohert pts/11 ip98-168-203-118 Mon Feb 6 20:50 - 23:29 (02:39)
7. dodohert pts/9 ip98-168-203-118 Mon Feb 6 20:31 - 22:57 (02:26)
8. dodohert pts/5 pc-618-012.omhq. Fri Feb 3 10:24 - 10:30 (00:05)
Use of uninitialized value $1 in addition (+) at ./odoherty_last.pl line 43.
Use of uninitialized value $2 in addition (+) at ./odoherty_last.pl line 44.
Here is a summary of the time spent on the system for dodoherty:
dodoherty
8
8:6
The Code (a snippet of where the error is coming from; this is also the only place $1 and $2 are used)
foreach my $line2 (@user)
{
$line2 =~ /\S*\((\d{2,2})\:(\d{2,2})\)\s*/;
$hours = $hours + $1;
$mins = $mins + $2;
if( $mins >= 60 )
{
$hours = $hours + 1;
$mins = $mins - 60;
}
}
I think the problem might be in the following line.
1. dodohert pts/1 pc-618-012.omhq. Wed Feb 8 09:19 still logged in
That is because nothing matches the pattern so $1 and $2 are undefined.
As has been noted in other answers, your regex does not match, and therefore $1 and $2 are undefined. It is necessary to always check to make sure the appropriate regex matches before using these variables.
Below I have upgraded your script with some proper Perl code. += and %= are handy operators in this case. You can read about them in perlop.
Your regex uses \S* and \s*, both of which are completely unnecessary here, since your regex is not anchored to anything else. In other words, \S*foo\s* will match any string that contains foo, since it can match the empty string around foo. Also, {2,2} means "match at least 2 times, max 2", which in effect is the same as {2} "match 2 times".
You will see that I changed your math around. That is because your version assumes that the accumulated $mins never reaches 120. Technically that is a safe assumption here, but done as below, the code can handle any number of minutes and correctly turn them into hours.
The script below is for demonstration. If you remove DATA and leave <>, you can use this script as-is like so:
last user | perl script.pl
Code:
use strict;
use warnings;
use v5.10; # required for say()
my ($hours, $mins) = (0, 0);    # start at 0 so the totals are defined even if no line matches
while (<DATA>) { # replace with while (<>) for live usage
if (/\((\d{2})\:(\d{2})\)/) {
$hours += $1;
$mins += $2;
if( $mins >= 60 ) {
$hours += int ($mins / 60); # take integer part of division
$mins %= 60; # remove excess minutes
}
}
}
say "Hours: $hours";
say "Mins : $mins";
__DATA__
1. dodohert pts/1 pc-618-012.omhq. Wed Feb 8 09:19 still logged in
2. dodohert pts/6 ip98-168-203-118 Tue Feb 7 19:19 - 20:50 (01:31)
3. dodohert pts/3 137.48.207.178 Tue Feb 7 14:00 - 15:06 (01:05)
4. dodohert pts/1 137.48.219.250 Tue Feb 7 12:32 - 12:36 (00:04)
5. dodohert pts/21 137.48.207.237 Tue Feb 7 12:07 - 12:23 (00:16)
6. dodohert pts/11 ip98-168-203-118 Mon Feb 6 20:50 - 23:29 (02:39)
7. dodohert pts/9 ip98-168-203-118 Mon Feb 6 20:31 - 22:57 (02:26)
8. dodohert pts/5 pc-618-012.omhq. Fri Feb 3 10:24 - 10:30 (00:05)
#!/usr/bin/perl
use strict;
my $hours = 0;
my $mins = 0;
my $loggedIn = 0;
while (<STDIN>)
{
chomp;
if (/\S*\((\d{2,2})\:(\d{2,2})\)\s*/)
{
$hours = $hours + $1;
$mins = $mins + $2;
if($mins >= 60 )
{
$hours = $hours + 1;
$mins = $mins - 60;
}
}
elsif (/still logged in$/)
{
$loggedIn = 1;
}
}
print "Summary: $hours:$mins ", ($loggedIn) ? " (Currently logged in)" : "", "\n";
Whenever your RE fails to match, $1 and $2 are not set by it: they are undefined, or still hold the values from an earlier successful match.
For this reason, it's considered best practice only ever to use $1, $2 etc. inside a conditional which tests the success of the RE.
So don't do:
$string =~ m/(somepattern)/sx;
my $var = $1;
Instead, do something like:
my $var = 'some_default_value';
if($string =~ m/(somepattern)/sx){
$var = $1;
}
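The same rule applies in other languages. As a rough Python illustration of the pattern (not a port of the OP's script), the sketch below only reads the captured groups after checking that the match succeeded, and totals everything in minutes so that any overflow is handled by a single `divmod`. The names `total_time` and `DURATION_RE` are hypothetical.

```python
import re

DURATION_RE = re.compile(r"\((\d{2}):(\d{2})\)")


def total_time(lines):
    """Sum every "(HH:MM)" duration; lines without one are skipped."""
    minutes = 0
    for line in lines:
        match = DURATION_RE.search(line)
        if match is None:  # e.g. "still logged in": no groups to read
            continue
        minutes += int(match.group(1)) * 60 + int(match.group(2))
    return divmod(minutes, 60)  # (hours, minutes)


sample = [
    "dodohert pts/1 pc-618-012.omhq. Wed Feb 8 09:19 still logged in",
    "dodohert pts/6 ip98-168-203-118 Tue Feb 7 19:19 - 20:50 (01:31)",
    "dodohert pts/3 137.48.207.178 Tue Feb 7 14:00 - 15:06 (01:05)",
]
hours, mins = total_time(sample)
print(f"{hours}:{mins:02d}")
```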