Regex match string UNTIL string in a comma separated line - regex

All words start with "Passed", but I only want to match those that also end with "Unique".
Input:
PassedShownWeekUnique,PassedShownDayUnique,PassedFailedWeek,PassedFailedDayUnique,Passed1Week,Passed1WeekUnique
Desired output:
PassedShownWeekUnique,PassedShownDayUnique,PassedFailedDayUnique,Passed1WeekUnique
I tried the regex Passed.* and it matches everything. Passed.*Unique isn't working either. Can anyone help?

Just use the following: match from Passed, then anything, up to Unique.
Passed.*Unique
if [[ $line =~ Passed.*Unique ]]; then echo "line matched: $line"; fi
EDIT: The OP has since revised the question to use a comma-separated line, so split the line into words first:
line=PassedShownWeekUnique,PassedShownDayUnique,PassedFailedWeek,PassedFailedDayUnique,Passed1Week,Passed1WeekUnique
REGEX=Passed.*Unique
IFS=',';
for word in $line; do
    if [[ $word =~ $REGEX ]]; then
        echo "matched $word"
    fi
done
Output:
matched PassedShownWeekUnique
matched PassedShownDayUnique
matched PassedFailedDayUnique
matched Passed1WeekUnique
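The question asks for a comma-separated result rather than one line per match. A minimal sketch of that, assuming bash; the `words`, `kept` and `result` names and the `^`/`$` anchors are additions for illustration, not part of the answer above:

```shell
#!/usr/bin/env bash
# Sketch: keep only the comma-separated fields that start with "Passed"
# and end with "Unique", then join them back together with commas.
line='PassedShownWeekUnique,PassedShownDayUnique,PassedFailedWeek,PassedFailedDayUnique,Passed1Week,Passed1WeekUnique'
regex='^Passed.*Unique$'   # anchored, so the whole field has to match

# Split on commas into an array; IFS is changed for this one command only.
IFS=',' read -r -a words <<< "$line"

kept=()
for word in "${words[@]}"; do
    if [[ $word =~ $regex ]]; then
        kept+=("$word")
    fi
done

# Join the surviving fields with commas (the subshell keeps IFS local).
result=$(IFS=','; printf '%s' "${kept[*]}")
echo "$result"
```

Scoping IFS to the `read` command and to the subshell avoids changing word splitting for the rest of the script, which a bare `IFS=','` would do.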

You can either use the regex:
Unique$
to get lines that end with the word "Unique", or:
^Passed.+?Unique$
to get lines that start with "Passed" and end with "Unique". Depending on your specific implementation, you may want to choose one or the other.
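And if you have comma-separated input, as you described:
(?:^|,)(Passed[^,]*Unique)(?=,|$)
This captures each comma-delimited word that starts with "Passed" and ends with "Unique". Using [^,]* rather than .+? keeps a match from running across a comma; otherwise PassedFailedWeek,PassedFailedDayUnique would be matched as a single item. Check capture group 1 of each match to print the item that it matched.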

How about you try to use ^ and $
^ Matches the empty string at the beginning of a line; also represents the characters not in the range of a list.
$ Matches the empty string at the end of a line.
So something like this
^Passed.*?Unique$
You can read more about it here.
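Because ^ and $ anchor to the whole line, they only help with comma-separated input once each field sits on its own line. A rough sketch of that idea, assuming the standard tr, grep and paste utilities are available; the `result` variable is just for illustration:

```shell
# Sketch: put each field on its own line, filter with an anchored regex,
# then join the surviving lines back together with commas.
line='PassedShownWeekUnique,PassedShownDayUnique,PassedFailedWeek,PassedFailedDayUnique,Passed1Week,Passed1WeekUnique'

result=$(printf '%s\n' "$line" \
    | tr ',' '\n' \
    | grep -E '^Passed.*Unique$' \
    | paste -sd, -)

echo "$result"
```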

Related

Search and replace a special character in perl

I want to search for a character and replace it with a string. First, I search for ':' and replace it with 'to'. Next I want to search for '$' and replace it with 'END'. This is the code that I've tried. In the code below, it works for the first character but not for the second. I tried using a backslash to escape the special character '$', but it still did not work. What else can I do?
$string = "[9:8],
if ($string =~ /^.*:+/){
$stringreplaced =~ s/:/to/g;
}
elsif ($string =~ /^.*\$+/){
$stringreplaced =~ s/\$/END/g;
}
First of all, the code you posted doesn't even compile, yet you say it actually ran. Only post code that you've run.
Second, you're matching against the wrong string. You check whether $string contains the character, but you replace the characters in $stringreplaced. ALWAYS use use strict; use warnings; as that would have caught this error.
Third, you only check if the character (: or $) is on the first line. This is because . doesn't match line feeds without /s.
Finally, you only check whether the string contains $ when it doesn't contain :, because you used elsif.
The following is all you need:
$string =~ s/:/to/g;
$string =~ s/\$/END/g;

Regex doesn't match with the lines in txt file

I'm reading lines from a text file and checking whether each one matches the regex that I've created.
But the script always reports that the regex didn't match, even though a regex tool shows that the lines do match my regular expression.
while read line
do
    name=$line
    BRANCH_REGEX="\d{10}\-[^_]*\_\d{13}"
    if [[ $name =~ $BRANCH_REGEX ]];
    then
        echo "BRANCH '$name' matches BRANCH_REGEX '$BRANCH_REGEX'"
    else
        echo "BRANCH '$name' DOES NOT MATCH BRANCH_REGEX '$BRANCH_REGEX'"
    fi
done < names.txt
names.txt includes lines for example :
9000999484-suchocka_1416578464908
9000989944-schubertk_1416582641605
9001026342-extbeerfelde_1416586904787
9000687045-sturmjo_1416573131629
9001059401-extburghartswieser_1416405627982
9000806302-PDPUPDATE_1357830207068
9000658783-PDPUPDATE_1360445087963
BRANCH_REGEX="/\d{10}\-[^_]*\_\d{13}"
              ↑
Remove the leading /, none of your lines begin with it.
Also note that _ doesn't need to be escaped, you can write _ instead of \_.
Change your regex to:
BRANCH_REGEX="[0-9]{10}-[^_]*_[0-9]{13}"
Or else:
BRANCH_REGEX="[[:digit:]]{10}-[^_]*_[[:digit:]]{13}"
Bash's regex matching uses POSIX extended regular expressions, which don't support the \d shorthand. There is also no need to escape the hyphen.
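A small sketch of the corrected pattern in use, assuming bash. The `matches_branch` helper and the added ^ and $ anchors are illustrative choices, not part of the answers above; the anchors make the whole name match rather than a substring of it:

```shell
#!/usr/bin/env bash
# Sketch: test names against the POSIX-style pattern from the answer.
branch_regex='^[0-9]{10}-[^_]*_[0-9]{13}$'

# Hypothetical helper: succeeds when its first argument matches the pattern.
matches_branch() {
    [[ $1 =~ $branch_regex ]]
}

for name in '9000999484-suchocka_1416578464908' 'not-a-branch'; do
    if matches_branch "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```

Keeping the pattern in a variable and leaving it unquoted on the right-hand side of =~ matters: quoting it there makes bash treat it as a literal string.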

Regex not working, at least in command line

I have a regex:
($value) = $line =~ /\ABC(.+?)\#/;
For input, e.g.:
(32321213321) ABC 24432.232 #Junk
Which is meant to catch the number between ABC and #.
When I run it through the command line, it returns a space. Through Padre, it returns a space + the number before #.
Is there something wrong with the regex?
In your regex, you have escaped the A. This turns it into an escape sequence: the assertion \A, which matches only at the beginning of the string. The ^ anchor does the same thing when the /m modifier is not in use. Your string does not start with BC, so the regex cannot match. You also have a redundant escape before #. The regex you need is
/ABC(.+?)#/
You can use:
$line =~ /ABC *([0-9. ]+?) *#/;
OR better:
$line =~ /ABC *(\d+(?:[. ]\d+)*) *#/;

Substitute first character before match

For each line I need to add a semicolon immediately before the first alphanumeric character that comes after the first semicolon.
Example:
Input:
00000001;Root;;
00000002; Documents;;
00000003; oracle-advanced_plsql.zip;file;
00000004; Public;;
00000005; backup;;
00000006; 20110323-JM-F.7z.001;file;
00000007; 20110426-JM-F.7z.001;file;
00000008; 20110603-JM-F.7z.001;file;
00000009; 20110701-JM-F-via-summer_school;;
00000010; 20110701-JM-F-via-summer_school.7z.001;file;
Desired output:
00000001;;Root;;
00000002; ;Documents;;
00000003; ;oracle-advanced_plsql.zip;file;
00000004; ;Public;;
00000005; ;backup;;
00000006; ;20110323-JM-F.7z.001;file;
00000007; ;20110426-JM-F.7z.001;file;
00000008; ;20110603-JM-F.7z.001;file;
00000009; ;20110701-JM-F-via-summer_school;;
00000010; ;20110701-JM-F-via-summer_school.7z.001;file;
Could someone please help me create a Perl regex for that? I need it in a program, not as a one-liner.
This is a way to insert a semi-colon after the first semi-colon and whitespace, but before the first non-whitespace.
s/;\s*\K(?=\S)/;/
If you feel the need, you can use \w instead of \S, but I felt with this input it was an unnecessary specification.
The \K (keep) escape is similar to a lookbehind assertion in that it does not remove what it matches. The same goes for the lookahead assertion, so all this substitution does is insert a semi-colon in the designated spot.
First of all, here is a program that seems to match your requirements:
#!/usr/bin/perl -w
while(<>) {
    s/^(.*?;.*?)(\w)/$1;$2/;
    print $_;
}
Store it in a file 'program.pl', make it executable with 'chmod u+x program.pl' and run it on your input data like this:
./program.pl input-data.txt
Here is an explanation of the regular expression:
s/          # start search-and-replace regexp
  ^         # start at the beginning of this line
  (         # save the matched characters until ')' in $1
    .*?;    # go forward until finding the first semicolon
    .*?     # go forward until finding... (to be continued below)
  )
  (         # save the matched characters until ')' in $2
    \w      # ... the next alphanumeric character.
  )
/           # continue with the replace part
  $1;$2     # write all characters found above, but insert a ; before $2
/           # finish the search-and-replace regexp.
Based on your sample input, I would use a more specific regular expression:
s/^(\d*; *)(\w)/$1;$2/;
This expression starts at the beginning of the line and skips over the digits (\d*), the first semicolon and any spaces after it. It then inserts a semicolon before the word character that follows.
Take what fits best to your needs!
First of all thank you for your really great answers!
Actually my code snippet looks like this:
our $seperator=";"; # at the beginning of the file
#...
sub insert {
    my ( $seperator, $line, @all_lines, $count, @all_out );
    $count = 0;
    @all_lines = read_file($filename);
    foreach $line (@all_lines) {
        $count = sprintf( "%08d", $count );
        chomp $line;
        $line =~ s/\:/$seperator/; # works
        $line =~ s/\ file/file/; # works
        #$line=~s/;\s*\K(?=\S)/;/; # doesn't work
        $line =~ s/^(.*?$seperator.*?)(\w)/$1$seperator$2/; # doesn't work
        say $count . $seperator . $line . $seperator;
        $count++; # btw, is there maybe a hidden index variable in a foreach-loop I could use instead of a new variable??
        push( @all_out, $count . $seperator . $line . $seperator . "\n" );
    }
    write_file( $csvfile, @all_out ); # using File::Slurp
}
To get the input that I showed you, I had already made some small substitutions, as you can see at the beginning of the foreach loop.
I am curious, why the regular expressions presented by TLP and Yaakov do not work in my code. In general they work, but only when written like in the example which Yaakov gave:
while(<>) {
s/^(.*?;.*?)(\w)/$1;$2/;
print $_;
}

Perl - remove first word in a string with regexps

I'm new to both Perl and regexes, and I'm trying to remove the first word in a string (or the first word in a line of a text file), along with any whitespace that follows it.
For example, if my string is 'one two abd123words', I want to remove 'one '.
The code I was trying is: $line =~/(\S)$/i;
but this only gives me the last word.
If it makes any difference, the word I'm trying to remove is an input, and is stored as $arg.
To remove the first word of each line use:
$line =~ s/^\S+\s*//;
EDIT: an explanation:
s/.../.../   # Substitute command.
^            # (Zero-width) beginning of line.
\S+          # Non-space characters.
\s*          # Blank-space characters.
//           # Substitute with nothing, so remove them.
You mean, like this? :
my $line = 'one two abd123words';
$line =~ s/^\s*\S+\s*//;
# now $line is 'two abd123words'
(That removes any initial whitespace, followed by one or more non-whitespace characters, followed by any whitespace that is now at the start.)
In one-liner form:
$ perl -pi.bak -e 's{^\s*\S+\s*}//' file.txt