Regex list/column to comma delimited - regex

I have a column of data, in this case captured from a website. I would like to convert this column into a comma-separated list using a regex, either in the terminal or in an editor such as gedit.
My list:
Liam
Noah
William
James
Oliver
Benjamin
What I want is:
Liam, Noah, William, James, Oliver, Benjamin
or
(Liam, Noah, William, James, Oliver, Benjamin)
or similar.
What I have tried is ^([A-Za-z]+)$("$1",) . I think it finds each name but it is not replacing anything.
It would also be great if something like this worked with numbers as well. Like,
10
20
30
pie
to
10,20,30,pie

Like this:
perl -i -pe 's/\n/, /' file
Output:
Liam, Noah, William, James, Oliver, Benjamin,
Or better:
perl -0ne 'my @a = (split /\n/, $_); print join (", ", @a) . "\n"' file
Output:
Liam, Noah, William, James, Oliver, Benjamin
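For comparison, here is a minimal Python sketch of the same split-and-join idea as the second perl command. The function name column_to_csv and its sep parameter are made up for this illustration; blank lines are skipped, which is an assumption about what you want.

```python
def column_to_csv(text, sep=", "):
    """Join the non-empty lines of text into one delimited string."""
    # Splitting first and joining afterwards avoids the trailing separator
    # that a plain "replace every newline" substitution leaves behind.
    items = [line.strip() for line in text.splitlines() if line.strip()]
    return sep.join(items)


names = "Liam\nNoah\nWilliam\nJames\nOliver\nBenjamin\n"
print(column_to_csv(names))                           # Liam, Noah, William, James, Oliver, Benjamin
print(column_to_csv("10\n20\n30\npie\n", sep=","))    # 10,20,30,pie
```

Because the items are never inspected, this works the same for names, numbers, or a mix of both.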

Related

Substitute second occurrence of a pattern in a line using awk

I need to replace second occurrence of a pattern (that matches the last field) with another and also keep a count of all such changes done in a file.
Example: try.txt
Hi
Change apple orange guava mango banana orange
It's hot outside
Change tom greg fred harry steve fred
George is a cool guy
Change mary lucy becky karly jill karly
thank you
In all the lines that contain the pattern "Change", I want to replace the last word (for example, "orange" in the second line) with, say, pear. Note that the first orange should not be changed. I also want to append a suffix that shows the number of changes made in the file so far.
I tried the following, but it changed both occurrences (1st orange and 2nd orange, 1st fred and 2nd fred, 1st karly and 2nd karly), whereas I wanted to change only the second occurrence.
awk 'BEGIN {cntr=0} {if (/Change/) {gsub($NF,"pear"); OFS=""; print $0,cntr; cntr++} else {print}}' try.txt
The output is:
Hi
Change apple pear guava mango banana pear0
It's hot outside
Change tom greg pear harry steve pear1
George is a cool guy
Change mary lucy becky pear jill pear2
thank you
Desired output is:
Hi
Change apple orange guava mango banana pear0
It's hot outside
Change tom greg fred harry steve pear1
George is a cool guy
Change mary lucy becky karly jill pear2
thank you
When gsub is replaced with sub, it's changing only first occurrence. Any help is appreciated.
This one-liner works for your input:
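awk '/Change/{$NF="pear"(i++)}7' file
Assigning to $NF makes awk rebuild the line using OFS, so any runs of spaces between fields collapse to a single space. If you want to keep the original spacing untouched, you can do this instead (\S is a GNU awk extension):
awk '/Change/{sub(/\S+$/,"pear"(i++))}7' file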
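The "anchor the substitution at the end of the line" idea can also be sketched in Python. This is only an illustration: the function name replace_last_word and its parameters are hypothetical, and it assumes the lines carry no trailing whitespace, as in try.txt.

```python
import re


def replace_last_word(lines, marker="Change", replacement="pear"):
    """Replace the last word of every line containing marker.

    A running counter is appended to each replacement. Returns the new
    lines and the total number of changes.
    """
    out = []
    count = 0
    for line in lines:
        if marker in line:
            # \S+$ can only match the final word, so an earlier, identical
            # word on the same line is left alone.
            line = re.sub(r"\S+$", f"{replacement}{count}", line, count=1)
            count += 1
        out.append(line)
    return out, count


sample = [
    "Hi",
    "Change apple orange guava mango banana orange",
    "It's hot outside",
    "Change tom greg fred harry steve fred",
]
changed, total = replace_last_word(sample)
print("\n".join(changed))
print("changes:", total)
```

Because only the tail of the line is rewritten, the spacing between the other fields is preserved, just as with the sub() variant.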
I think I found a work-around:
awk 'BEGIN {cntr=0} {if (/Change/) {$NF=$NF_cntr; sub($NF,"pear"); OFS=""; print $0,cntr; OFS=" "; cntr++} else {print}}' try.txt
The output was as I desired.
But I still would like to hear from community for better ways of achieving it.
Thanks

How to remove unique values from an HTML select list with linux program like sed, awk, or grep?

I copy the HTML from select boxes and am trying to figure out a quick way to remove the markup so that I am left with a list of names. Generally that's not a problem, but these options all have unique values. I would prefer using a program like grep, sed, awk or vi. Right now I have to go through manually and edit each line. Any help would be great, thank you!
<option value="DL_54292">(DL)finance</option>
<option value="DL_54274">(DL)sales</option>
<option value="510496">Ben Smith</option>
<option value="510507">Christopher Jones</option>
<option value="510513">Dawn James</option>
<option value="510533">Joe Wilson</option>
<option value="551825">Mark Jackson</option>
<option value="510562">Ronnie Libby</option>
Edit: Output format suggested by Fede.
Trying to get a simple text list, with line feed or carriage return.
finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
Use grep to get the texts between the tags,
$ grep -oP '(?<=>)[^<>]+' file
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
Since you mentioned vi, you can use this line
:%s_^<option value=".*">\(.*\)</option>$_\1_gi
%s -> substitute in all the file
^ -> start of line
.* -> any characters
\(.*\) -> any characters, remember those.
$ -> end of line
\1 -> first remembered match
gi -> ignore case and take all matches in line
_ -> substitution separator
:s is search and replace, s_foo_bar replaces foo by bar in current line
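The same capture-and-keep idea can be sketched in Python. The names OPTION_RE, option_labels and strip_prefix are invented for this example, and dropping a leading "(DL)"-style prefix is an assumption based on the requested output.

```python
import re

# Capture the text that follows each opening <option ...> tag.
OPTION_RE = re.compile(r"<option\b[^>]*>([^<]*)", re.IGNORECASE)


def option_labels(html, strip_prefix=False):
    """Return the visible label of every <option> element in html."""
    labels = [m.group(1).strip() for m in OPTION_RE.finditer(html)]
    if strip_prefix:
        # Drop a leading parenthesized tag such as "(DL)".
        labels = [re.sub(r"^\([^)]*\)", "", label) for label in labels]
    return labels


html = """<option value="DL_54292">(DL)finance</option>
<option value="DL_54274">(DL)sales</option>
<option value="510496">Ben Smith</option>
<option value="510507">Christopher Jones</option>"""

print("\n".join(option_labels(html, strip_prefix=True)))
```

Since the pattern stops at the next "<", it does not depend on the closing tag being complete. For anything more irregular than a pasted option list, a real HTML parser is the safer choice.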
awk can do this:
awk -F"<|>" '{print $3}'
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
To match your requested output exactly, the parenthesized prefix should be removed too:
awk -F"<|>" '{sub(/^[^)]*\)/,"",$3);print $3}'
finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
If you don't mind using Notepad++, then you can use this regex:
.*>(.*)<.*
And replace with \1

Exchanging words with regex in multiple lines

I am using RegexBuddy software to change this:
Adam Sandler
Into this:
Sandler, Adam
Having very little knowledge about regex, I searched and found the command to solve this
([^_]+) (.+)
and to replace: $2, $1
It works, but there is a problem with multiple lines. How can I make it work when the input is like this?
Adam Sandler
Rob Schneider
Ben Stiller
Now, output is like this:
Stiller, Adam Sandler
Rob Schneider
Ben
Use the following settings:
^$ Match at line breaks
Line by line
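Outside RegexBuddy, the "^$ match at line breaks" setting corresponds to what most engines call multiline mode. Here is a small Python sketch of it; the function name swap_names is hypothetical, and the pattern uses \S+ for the first name instead of the original [^_]+ so that a match cannot run across line breaks.

```python
import re


def swap_names(text):
    """Turn each 'First Last' line into 'Last, First'."""
    # re.MULTILINE makes ^ and $ match at every line break, so each line
    # is rewritten independently of its neighbours.
    return re.sub(r"^(\S+) (.+)$", r"\2, \1", text, flags=re.MULTILINE)


people = "Adam Sandler\nRob Schneider\nBen Stiller"
print(swap_names(people))
# Sandler, Adam
# Schneider, Rob
# Stiller, Ben
```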
A perl equivalent,
sub revName{
my $fullname = "@_";
my ($lastname, $firstname);
if($fullname =~ /(\w+)\s+(\w+)/){
$firstname = $1;
$lastname = $2;
}
my $revname = "$lastname, $firstname";
return $revname;
}

Regex code for address separated by commas

How can I extract the state text which is before third comma only using the regex code?
54 West 21st Street Suite 603, New York,New York,United States, 10010
I've managed to extract the rest how I wanted but this one is a problem.
Also, how can I extract the "United States" please?
It looks like you want to use capturing groups:
.*,.*,(.*),(.*),.*
The first capturing group will be "New York" and the second will be "United States" (try it on Rubular).
Or you can split by commas (which will probably be even simpler) as @Jerry points out, assuming the language/tool you're using supports that.
You can use this regex:
(?:[^,]*,){2}([^,]*)
And use captured group # 1 for your desired String.
TL;DR
A lot depends on your regular expression engine, and whether you really need a regular expression or field-splitting. You can do field-splitting in Ruby and Awk (among others), but sed and grep only do regular expressions. See some examples below to get you started.
Ruby
str = '54 West 21st Street Suite 603, New York,New York,United States, 10010'
str.match /(?:.*?,){2}([^,]+)/
$1
#=> "New York"
GNU sed
$ echo '54 West 21st Street Suite 603, New York,New York,United States, 10010' |
sed -rn 's/([^,]+,){2}([^,]+).*/\2/p'
GNU awk
$ echo '54 West 21st Street Suite 603, New York,New York,United States, 10010' |
awk -F, '{print $3}'
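Both approaches (the regex with a skipped-fields prefix, and plain field-splitting) can be sketched in Python as well. The helper names third_field_regex and fields are made up for this illustration, and stripping surrounding spaces is an assumption about the desired result.

```python
import re

address = "54 West 21st Street Suite 603, New York,New York,United States, 10010"


def third_field_regex(s):
    """Skip two comma-terminated fields, then capture the third."""
    m = re.match(r"(?:[^,]*,){2}([^,]*)", s)
    return m.group(1).strip() if m else None


def fields(s):
    """Split on commas and trim the spaces around each field."""
    return [part.strip() for part in s.split(",")]


print(third_field_regex(address))   # New York
print(fields(address)[2])           # New York
print(fields(address)[3])           # United States
```

Splitting is usually easier to extend: once you have the list, picking "United States" is just a different index.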

RegExp to match everything up to first blank line

I'm writing a bash script that will show me what TV programs to watch today, it will get this information from a text file.
The text is in the following format:
Monday:
Family Guy (2nd May)

Tuesday:
House
The Big Bang Theory (3rd May)

Wednesday:
The Bill
NCIS
NCIS LA (27th April)

Thursday:
South Park

Friday:
FlashForward

Saturday:

Sunday:
HIGNFY
Underbelly
I'm planning to use 'date +%A' to work out the day of the week and use the output in a grep regex to return the appropriate lines from my text file.
If someone can help me with the regex I should be using, I would be eternally grateful.
Incidentally, this bash script will be used in a Conky dock, so if anyone knows of a better way to achieve this then I'd like to hear about it.
Perl solution:
#!/usr/bin/perl
my $today=`date +%A`;
$today=~s/^\s*(\w*)\s*(?:$|\Z)/$1/gsm;
my $tv=join('',(<DATA>));
for my $t (qw(Monday Tuesday Wednesday Thursday Friday Saturday Sunday)) {
print "$1\n" if $tv=~/($t:.*?)(?:^$|\Z)/sm;
}
print "Today, $1\n" if $tv=~/($today:.*?)(?:^$|\Z)/sm;
__DATA__
Monday:
Family Guy (2nd May)

Tuesday:
House
The Big Bang Theory (3rd May)

Wednesday:
The Bill
NCIS
NCIS LA (27th April)

Thursday:
South Park

Friday:
FlashForward

Saturday:

Sunday:
HIGNFY
Underbelly
sed -n '/^Tuesday:/,/^$/p' list.txt
grep -B10000 -m1 ^$ list.txt
-B10000: print 10000 lines before the match
-m1: match at most once
^$: match an empty line
Alternatively, you can use this:
awk '/^'`date +%A`':$/,/^$/ {if ($0~/[^:]$/) print $0}' guide.txt
This awk script matches a consecutive group of lines which starts with /^Day:$/ and ends with a blank line. It only prints a line if the line ends with a character that is not a colon. So it won't print "Sunday:" or the blank line.
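If a small script is acceptable in place of a one-liner, the same "from the day heading to the next blank line" logic can be sketched in Python. The function name programs_for is hypothetical, and the sketch assumes the guide separates days with blank lines, as the question title implies.

```python
import datetime


def programs_for(day, guide_text):
    """Return the lines listed under 'day:' up to the next blank line."""
    programs = []
    in_block = False
    for line in guide_text.splitlines():
        if line.strip() == f"{day}:":
            in_block = True           # heading found, start collecting
        elif in_block:
            if not line.strip():      # a blank line ends the day's block
                break
            programs.append(line.strip())
    return programs


guide = """Monday:
Family Guy (2nd May)

Tuesday:
House
The Big Bang Theory (3rd May)

Saturday:

Sunday:
HIGNFY
Underbelly
"""

# Equivalent of `date +%A`, assuming an English locale.
today = datetime.date.today().strftime("%A")
print("\n".join(programs_for(today, guide)))
```

Like the awk version, this prints neither the heading nor the blank line, and a day with no programs simply yields nothing.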