RegExp to match everything up to first blank line


I'm writing a bash script that will show me which TV programs to watch today. It will get this information from a text file.
The text is in the following format:
Monday:
Family Guy (2nd May)
Tuesday:
House
The Big Bang Theory (3rd May)
Wednesday:
The Bill
NCIS
NCIS LA (27th April)
Thursday:
South Park
Friday:
FlashForward
Saturday:
Sunday:
HIGNFY
Underbelly
I'm planning to use 'date +%A' to work out the day of the week and use the output in a grep regex to return the appropriate lines from my text file.
If someone can help me with the regex I should be using, I would be eternally grateful.
Incidentally, this bash script will be used in a Conky dock, so if anyone knows of a better way to achieve this, I'd like to hear about it.

Perl solution:
#!/usr/bin/perl
my $today=`date +%A`;
$today=~s/^\s*(\w*)\s*(?:$|\Z)/$1/gsm;
my $tv=join('',(<DATA>));
for my $t (qw(Monday Tuesday Wednesday Thursday Friday Saturday Sunday)) {
print "$1\n" if $tv=~/($t:.*?)(?:^$|\Z)/sm;
}
print "Today, $1\n" if $tv=~/($today:.*?)(?:^$|\Z)/sm;
__DATA__
Monday:
Family Guy (2nd May)
Tuesday:
House
The Big Bang Theory (3rd May)
Wednesday:
The Bill
NCIS
NCIS LA (27th April)
Thursday:
South Park
Friday:
FlashForward
Saturday:
Sunday:
HIGNFY
Underbelly

sed -n '/^Tuesday:/,/^$/p' list.txt

grep -B10000 -m1 ^$ list.txt
-B10000: print 10000 lines before the match
-m1: match at most once
^$: match an empty line

Alternatively, you can use this:
awk '/^'`date +%A`':$/,/^$/ {if ($0~/[^:]$/) print $0}' guide.txt
This awk script matches a consecutive group of lines which starts with /^Day:$/ and ends with a blank line. It only prints a line if the line ends with a character that is not a colon. So it won't print "Sunday:" or the blank line.
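For comparison, here is a minimal Python sketch of the same range idea: start at the header line for the day and stop at the first blank line. It assumes the guide separates days with a blank line, as the sed and awk answers do. The function name `shows_for` and the shortened sample text are made up for illustration.

```python
import datetime

def shows_for(day, guide_text):
    """Return the programme lines listed under `day:` up to the first blank line."""
    shows = []
    in_block = False
    for line in guide_text.splitlines():
        if not in_block:
            # The block starts at a line that is exactly "Day:".
            in_block = line.strip() == day + ":"
        elif not line.strip():
            # The first blank line after the header ends the block.
            break
        else:
            shows.append(line.strip())
    return shows

# Hypothetical sample in the question's format, with blank lines between days.
GUIDE = """Monday:
Family Guy (2nd May)

Tuesday:
House
The Big Bang Theory (3rd May)

Saturday:

Sunday:
HIGNFY
Underbelly
"""

if __name__ == "__main__":
    # Like `date +%A`; Python stays in the C locale unless told otherwise,
    # so the day name is in English.
    today = datetime.date.today().strftime("%A")
    print("\n".join(shows_for(today, GUIDE)))
```

Because the header line itself is never appended, the result contains only the programme names, which matches what the awk version prints.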

Related

Regex list/column to comma delimited

I have a column of data, in this case captured from a website. I would like to convert this list into a comma separated list using regex either in the terminal or gedit, etc..
My list:
Liam
Noah
William
James
Oliver
Benjamin
What I want is:
Liam, Noah, William, James, Oliver, Benjamin
or
(Liam, Noah, William, James, Oliver, Benjamin)
or similar.
What I have tried is ^([A-Za-z]+)$("$1",) . I think it finds each name but it is not replacing anything.
It would also be great if something like this worked with numbers as well. Like,
10
20
30
pie
to
10,20,30,pie

Like this:
perl -i -pe 's/\n/, /' file
Output:
Liam, Noah, William, James, Oliver, Benjamin,
Or better:
perl -0ne 'my @a = (split /\n/, $_); print join (", ", @a) . "\n"' file
Output:
Liam, Noah, William, James, Oliver, Benjamin
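If a regex is not strictly required, the same split-and-join can be sketched in Python. This is only an illustration of the idea in the second Perl one-liner; the helper name `to_comma_list` is made up, and it treats every non-empty line as one item, so it works for numbers and words alike.

```python
def to_comma_list(text, sep=", "):
    """Join the non-empty lines of `text` with `sep`, so no trailing separator appears."""
    items = [line.strip() for line in text.splitlines() if line.strip()]
    return sep.join(items)

names = "Liam\nNoah\nWilliam\nJames\nOliver\nBenjamin\n"
print(to_comma_list(names))
print(to_comma_list("10\n20\n30\npie\n", sep=","))
```

Joining a list, rather than replacing each newline, is what avoids the trailing comma seen in the first attempt.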

Replace delimiter in a file without changing the value between quotes

I have a csv file containing:
# Director, Movie Title, Year, Comment
Ethan Coen, No Country for Old Men, 2007, none
Ethan Coen, "O Brother, Where Art Thou?", 2000, none
Ethan Coen, The Big Lebowski, 1998, "uncredited (with his brother, Joel)"
I want to change the field separator from "," to "|", but I don't want to change a comma if it's inside a quoted string.
The result should look like this:
# Director| Movie Title| Year| Comment
Ethan Coen| No Country for Old Men| 2007| none
Ethan Coen| "O Brother, Where Art Thou?"| 2000| none
Ethan Coen| The Big Lebowski| 1998| "uncredited (with his brother, Joel)"
I tried this:
sed -e 's/(".)(.")/|\1 \2/g'
This is the result I am getting so far:
Ethan Coen, |"O Brother, Where Art Thou? ", 2000, none
Ethan Coen, The Big Lebowski, 1998, |"uncredited (with his brother, Joel) "

Approach: change the quoted commas into \r, replace the remaining commas, and change \r back.
The first attempt works with the given input, but is still wrong:
# Wrong
sed -E 's/("[^,]*),([\"]*)/\1\r\2/g; s/,/|/g;s/\r/,/g' file
It fails on lines with 2 commas in one field.
The first replacement should be repeated until all quoted commas are replaced:
sed -E ':a;s/("[^,"]*),([^"]*)"/\1\r\2"/g; ta; s/,/|/g;s/\r/,/g' file

This might work for you (GNU sed):
sed -E 's#"[^"]*"#$(echo &|sed "y/,/\\n/;s/.*/\\\"\&\\\"/")#g;s/.*/echo "&"/e;y/,\n/|,/' file
The substitution translates ,'s between double quotes into newlines, then translates ,'s to |'s and \n's to ,'s.
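Another way to look at the problem, sketched here in Python rather than sed: match either a whole quoted string or a single comma, and only rewrite the match when it is a bare comma. The name `swap_delimiter` is made up for this example, and the sketch assumes double quotes are balanced on each line.

```python
import re

# Match either a whole double-quoted string or a single comma.
TOKEN = re.compile(r'"[^"]*"|,')

def swap_delimiter(line, new="|"):
    """Replace commas outside double quotes with `new`; quoted text is copied unchanged."""
    return TOKEN.sub(lambda m: new if m.group() == "," else m.group(), line)

rows = [
    'Ethan Coen, No Country for Old Men, 2007, none',
    'Ethan Coen, "O Brother, Where Art Thou?", 2000, none',
    'Ethan Coen, The Big Lebowski, 1998, "uncredited (with his brother, Joel)"',
]
for row in rows:
    print(swap_delimiter(row))
```

Because a quoted string is consumed as one match, commas inside it are never seen on their own, and no placeholder character such as \r is needed.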

How to use sed to match Month day year pattern?

I am working with large documents and I have two tasks. The first is to substitute all dates written like "August 12 2014" or "January 31 1999", so I am using the following line in sed:
s/\(jan.\|feb.\|mar.\|apr.\|may\|jun.\|jul.\|aug.\|sep.\|oct.\|nov.\|dec.\) \([0-9]\|[0-9][0-9]\) [1-9][0-9][0-9][0-9]/tokendate/g
However, it does not match, for example, "august". I know I could change aug. to august, but I'd like sed to match any string that begins with aug___ or sep____.
Thanks in advance for any help

sed -r 's/(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]* [0-9]{1,2} [1-9][0-9]{3}/tokendate/gi' file
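The same pattern can be sketched in Python to show how the pieces fit together: a three-letter month prefix followed by any further letters, a one- or two-digit day, and a four-digit year, matched case-insensitively. The word boundaries (`\b`) are an addition not present in the sed command above, and the function name `tokenize_dates` is made up for this example.

```python
import re

DATE = re.compile(
    r"\b(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*"  # month name or abbreviation
    r" [0-9]{1,2}"                                                  # one- or two-digit day
    r" [1-9][0-9]{3}\b",                                            # four-digit year
    re.IGNORECASE,
)

def tokenize_dates(text):
    """Replace every 'Month day year' occurrence with the literal token 'tokendate'."""
    return DATE.sub("tokendate", text)

print(tokenize_dates("Signed August 12 2014, revised Jan 31 1999."))
```

Like the sed version, this accepts any word that merely starts with a month prefix (for example "decimal 12 2014"), so a stricter list of full month names may be preferable for noisy text.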

Regex code for address separated by commas

How can I extract the state text which is before third comma only using the regex code?
54 West 21st Street Suite 603, New York,New York,United States, 10010
I've managed to extract the rest how I wanted but this one is a problem.
Also, how can I extract the "United States" please?

It looks like you want to use capturing groups:
.*,.*,(.*),(.*),.*
The first capturing group will be "New York" and the second will be "United States" (try it on Rubular).
Or you can split by commas (which will probably be even simpler) as @Jerry points out, assuming the language/tool you're using supports that.

You can use this regex:
(?:[^,]*,){2}([^,]*)
And use captured group # 1 for your desired String.

TL;DR
A lot depends on your regular expression engine, and whether you really need a regular expression or field-splitting. You can do field-splitting in Ruby and Awk (among others), but sed and grep only do regular expressions. See some examples below to get you started.
Ruby
str = '54 West 21st Street Suite 603, New York,New York,United States, 10010'
str.match /(?:.*?,){2}([^,]+)/
$1
#=> "New York"
GNU sed
$ echo '54 West 21st Street Suite 603, New York,New York,United States, 10010' |
sed -rn 's/([^,]+,){2}([^,]+).*/\2/p'
GNU awk
$ echo '54 West 21st Street Suite 603, New York,New York,United States, 10010' |
awk -F, '{print $3}'
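For completeness, a Python sketch of both approaches on the same address, under the assumption that fields never contain commas themselves. The variable names are chosen just for this example.

```python
import re

address = "54 West 21st Street Suite 603, New York,New York,United States, 10010"

# Field splitting: split on commas and trim surrounding whitespace.
fields = [f.strip() for f in address.split(",")]
state, country = fields[2], fields[3]

# Regex alternative: skip two comma-terminated fields, then capture the next two.
m = re.match(r"(?:[^,]*,){2}([^,]*),([^,]*)", address)

print(state, "|", country)
print(m.group(1), "|", m.group(2))
```

Splitting is usually easier to read and extend; the regex form is mainly useful when the tool at hand only offers pattern matching.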

Perl Pattern Matching Question

I am trying to match patterns in perl and need some help.
I need to delete from a string anything that matches [xxxx], i.e. an opening bracket, whatever is inside it, and the first closing bracket that follows.
So I am trying to replace the opening bracket, its contents, and the first closing bracket with a space, using the following code:
if($_ =~ /[/)
{
print "In here!\n";
$_ =~ s/[(.*?)]/ /ig;
}
Similarly, I need to match an opening angle bracket, whatever is inside it, and the first closing angle bracket.
I am doing that using the following code:
if($_ =~ /</)
{
print "In here!\n";
$_ =~ s/<(.*?)>/ /ig;
}
This somehow does not seem to work. My sample data is below:
'Joanne' <!--Her name does NOT contain "Kathleen"; see the section "Name"--> "'Jo'" 'Rowling', OBE [http://news bbc co uk/1/hi/uk/793844 stm Caine heads birthday honours list] BBC News 17 June 2000 Retrieved 25 October 2000 , [http://content scholastic com/browse/contributor jsp?id=3578 JK Rowling Biography] Scholastic com Retrieved 20 October 2007 better known as 'J K Rowling' ,<ref name=telegraph>[http://www telegraph co uk/news/uknews/1531779/BBCs-secret-guide-to-avoid-tripping-over-your-tongue html Daily Telegraph, BBC's secret guide to avoid tripping over your tongue, 19 October 2006] is a British <!--do not change to "English" or "Scottish" until issue is resolved --> author best known as the creator of the [[Harry Potter]] fantasy series, the idea for which was conceived whilst on a train trip from Manchester to London in 1990 The Potter books have gained worldwide attention, won multiple awards, sold more than 400 million copies and been the basis for a popular series of films, in which Rowling had creative control serving as a producer in two of the seven installments [http://www businesswire com/news/home/20100920005538/en/Warner-Bros -Pictures-Worldwide-Satellite-Trailer-Debut%C2%A0Harry Business Wire - Warner Bros Pictures mentions J K Rowling as producer ]
Any help would be appreciated. Thanks!

You need to use this:
1 while s/\[[^\[\]]*\]//g;
Demo:
% echo "i have [some [square] brackets] in [here] and [here] today."| perl -pe '1 while s/\[[^\[\]]*\]/NADA/g'
i have NADA in NADA and NADA today.
Versus the failing:
% echo "i have [some [square] brackets] in [here] and [here] today." | perl -pe 's/\[.*?\]/NADA/g'
i have NADA brackets] in NADA and NADA today.
The recursive regular expression I leave as an exercise for the reader. :)
EDIT: Eric Strom kindly provided a recursive solution, so you don’t have to use 1 while:
% echo "i have [some [square] brackets] in [here] and [here] today." | perl -pe 's/\[(?:[^\[\]]*|(?R))*\]/NADA/g'
i have NADA in NADA and NADA today.
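The `1 while` idiom repeats the substitution until nothing is left to replace. A minimal Python sketch of that loop is shown below; Python's built-in `re` module has no recursion construct like `(?R)`, so the loop is the natural fit there. The function name `strip_brackets` is made up, and the sketch assumes the replacement text contains no square brackets (otherwise the loop would never finish).

```python
import re

# Matches an innermost bracket pair: "[" ... "]" with no brackets in between.
INNERMOST = re.compile(r"\[[^\[\]]*\]")

def strip_brackets(text, replacement=" "):
    """Remove bracketed spans, including nested ones, by repeated substitution."""
    while True:
        # subn returns the new string and how many replacements were made.
        text, count = INNERMOST.subn(replacement, text)
        if count == 0:
            return text

sample = "i have [some [square] brackets] in [here] and [here] today."
print(strip_brackets(sample, "NADA"))
```

Each pass removes the innermost pairs, so nested brackets disappear from the inside out, just as in the Perl demo.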

$_ =~ /someregex/ will not modify $_
Just a note, $_ =~ /someregex/ and /someregex/ do the same thing.
Also, you don't need to check for the existence of [ or < or the grouping parenthesis:
s/\[.*?\]/ /g;
s/<.*?>/ /g;
will do the job you want.
Edit: changed code to match the fact you're modifying $_

Square brackets have special meaning in the regex syntax, so escape them: /\[.*?\]/. (You also don't need the parentheses here, and doing case-insensitive matching is pointless.)
It's been a long time since I had to wrestle with Perl, but a plain match such as $_ =~ /\[/ only tests $_ and does not modify it; only s/// changes the string. You don't need the test anyway; just run the replacement, and if the pattern doesn't match anywhere, it won't do anything.
