Swap two words with regex in a text file using bash

I have a file adr.txt with the following content:
Franklin Avenue, US, 33123
Laurel Drive, US, 59121
Street King, UK, 00939
Street Williams, US, 19123
Warren Avenue, UK, 93891
Street Court, UK, 89730
Country Club Road, US, 10865
Madison Avenue, US, 36975
Street Front, US, 41911
Cedar Lane, UK, 21563
Garfield Avenue, UK, 00842
Street Cottage, US, 33205
Arlington Avenue, US, 94008
Cedar Avenue, US, 72635
Windsor Drive, US, 34384
Devon Court, UK, 13789
Garfield Avenue, US, 86115
Street Olive, US, 63007
Street Williams, US, 54675
Franklin Avenue, US, 82479
I need to swap the word "Street" and the street name so that the name comes first, followed by the word "Street":
Franklin Avenue, US, 33123
Laurel Drive, US, 59121
King Street, UK, 00939
Williams Street, US, 19123
Warren Avenue, UK, 93891
Court Street, UK, 89730
Country Club Road, US, 10865
Madison Avenue, US, 36975
Front Street, US, 41911
Cedar Lane, UK, 21563
Garfield Avenue, UK, 00842
Cottage Street, US, 33205
Arlington Avenue, US, 94008
Cedar Avenue, US, 72635
Windsor Drive, US, 34384
Devon Court, UK, 13789
Garfield Avenue, US, 86115
Olive Street, US, 63007
Williams Street, US, 54675
Franklin Avenue, US, 82479
As an example, this sed command works for me:
sed -i 's/\(Street\) \(King\)/\2 \1/' adr.txt
What regular expression can be used to automatically catch every street name?
I tried:
sed -i 's/\(Street\) \([WO]{1}[a-z]*[es]{1}\,\)/\2 \1/' adr.txt
I checked the regular expression [WO]{1}[a-z]*[se]{1}\, on regex101.com. It finds the names "Williams," and "Olive,", but it does not work in the sed command.

Your regex attempt was very weird. The simple sed solution would be
sed -i 's/^\(Street\) \([^ ,]*\)/\2 \1/' adr.txt
Though perhaps better to use Awk for this.
awk '/^Street [^ ,]+,/ {
two=$2; $2=$1 ",";
sub(/,$/, "", two);
$1=two }1' adr.txt >newfile
mv newfile adr.txt
As an aside, https://regex101.com/ supports a number of regex dialects, but none of them is exactly the one understood by sed.
Also, {1} in a regex is never useful; if you want to repeat something once, the expression before {1} already does exactly that.
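One likely reason the original attempt does nothing in sed: without -E, sed uses basic regular expressions, where an unescaped {1} is matched as the literal characters "{1}" rather than as a repeat count. The sketch below contrasts the two dialects on a single sample line; the function names are made up for illustration.

```shell
# Hypothetical helper names; both read address lines on stdin.

# BRE (sed's default): "{1}" is literal text, so the pattern never matches
# and the line passes through unchanged.
swap_bre_attempt() {
  sed 's/\(Street\) \([WO]{1}[a-z]*[es]{1}\),/\2 \1,/'
}

# ERE (-E): no interval is needed; [WO] already means "exactly one character".
swap_ere() {
  sed -E 's/^(Street) ([WO][a-z]*[es]),/\2 \1,/'
}

printf 'Street Williams, US, 19123\n' | swap_bre_attempt
# prints: Street Williams, US, 19123
printf 'Street Williams, US, 19123\n' | swap_ere
# prints: Williams Street, US, 19123
```

This only covers names starting with W or O, as in the question; the [^ ,]* pattern above is the more general fix.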

$ cat file
Warren Avenue, UK, 93891
Street Court, UK, 89730
Street Front, US, 41911
Cedar Lane, UK, 21563
Street Cottage, US, 33205
Arlington Avenue, US, 94008
Windsor Drive, US, 34384
Garfield Avenue, US, 86115
Franklin Avenue, US, 82479
Street Muhammad Ali, US, 82479
Street Albert Einstein Jr., US, 82479
awk -F',' -v OFS="," '/^Street /{$1=gensub(/^(Street) (.*)$/,"\\2 \\1",1,$1)}1' file
Warren Avenue, UK, 93891
Court Street, UK, 89730
Front Street, US, 41911
Cedar Lane, UK, 21563
Cottage Street, US, 33205
Arlington Avenue, US, 94008
Windsor Drive, US, 34384
Garfield Avenue, US, 86115
Franklin Avenue, US, 82479
Muhammad Ali Street, US, 82479
Albert Einstein Jr. Street, US, 82479
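Note that gensub is a GNU awk extension, so the command above may fail where awk is mawk or a BSD awk. A minimal sketch of the same idea using only POSIX awk functions follows; it assumes the street is the first comma-separated field and that the prefix is exactly "Street " (7 characters). The function name swap_street is hypothetical.

```shell
# Hypothetical helper; reads from the named files, or stdin if none are given.
swap_street() {
  awk -F',' -v OFS=',' '
    /^Street / {
      # Drop the leading "Street " (7 characters) and append " Street".
      $1 = substr($1, 8) " Street"
    }
    { print }
  ' "$@"
}

printf 'Street Muhammad Ali, US, 82479\nCedar Lane, UK, 21563\n' | swap_street
# prints:
# Muhammad Ali Street, US, 82479
# Cedar Lane, UK, 21563
```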

Simple sed solution:
sed -E 's/^(Street) ([^,]+)/\2 \1/' adr.txt

Related

Detect empty line (contain only space) in perl

My program reads a file, and I don't want it to process empty lines:
while (<FICC>) {
my $ligne=$_;
if ($ligne =~ /^\s*$/){}else{
print " $ligne\n";}
}
But this code also prints empty lines.
The file that I test with contains:
Ms. Ruth Dreifuss Dreifuss Federal Councillor Federal ruth
     
sir christopher warren US Secretary of state secretary of state
     
external economic case federal economic affair conference the Federal Office case
     
US bill clinton bill clinton Mr. Bush
     
Nestle food cs holding swiss Swiss Performance Index Performance

I think it is because of using the \n within your code. Just remove that \n from your code and it should be fine.
Usually people do chomp after reading a line from a file to remove the end of line character.

An easier way to write that is probably to invert the logic and only print lines that contain non-whitespace characters.
while (<FICC>) {
my $ligne = $_;
if ($ligne =~ /\S/) {
print " $ligne"; # No need for linefeed here as $ligne already has one
}
}
Update: Demo using your sample data:
#!/usr/bin/perl
use strict;
use warnings;
while (<DATA>) {
my $ligne = $_;
if ($ligne =~ /\S/) {
print " $ligne";
}
}
__END__
Ms. Ruth Dreifuss Dreifuss Federal Councillor Federal ruth
sir christopher warren US Secretary of state secretary of state
external economic case federal economic affair conference the Federal Office case
US bill clinton bill clinton Mr. Bush
Nestle food cs holding swiss Swiss Performance Index Performance
Output:
Ms. Ruth Dreifuss Dreifuss Federal Councillor Federal ruth
sir christopher warren US Secretary of state secretary of state
external economic case federal economic affair conference the Federal Office case
US bill clinton bill clinton Mr. Bush
Nestle food cs holding swiss Swiss Performance Index Performance
Which seems correct to me.

The reason is that you're adding a newline ("$ligne\n") to the end of a string that already ends in one, so use chomp as below.
I think the nicer way of doing this is with next (skip to the next loop iteration), as it removes some brackets from your code:
while (<FICC>) {
chomp(my $ligne = $_); # chomp returns a count, not the string, so chomp the copy in place
next if $ligne =~ /^\s*$/;
print " $ligne\n";
}

Trouble with a sed regex

I'm trying to handle a file containing currencies with sed but can't figure out where my error is.
This is an extract from the file:
AED: United Arab Emirates DirhamAFN: Afghan AfghaniALL: Albanian LekAMD: Armenian DramANG: Netherlands Antillean GuldenAOA: Angolan KwanzaARS: Argentine PesoAUD: Australian DollarAWG: Aruban FlorinAZN: Azerbaijani ManatBAM: Bosnia & Herzegovina Convertible MarkBBD: Barbadian DollarBDT: Bangladeshi TakaBGN: Bulgarian LevBIF: Burundian FrancBMD: Bermudian DollarBND: Brunei DollarBOB: Bolivian BolivianoBRL: Brazilian Real*BSD: Bahamian DollarBWP: Botswana PulaBZD: Belize DollarCAD: Canadian Dollar[...]
I want to add a newline before each group of three uppercase letters followed by the character ":".
What I tried was sed -e 's/\([A-Z]{3}:)/\n\1/g list1.txt > list2.txt, but nothing is changed. In fact, when I just try /[A-Z]{3}/blabla/ nothing happens.
I am puzzled.

sed -r 's/([A-Z]{3}:)/\n\1/g' list1.txt
# or
# sed -e 's/\([A-Z]\{3\}:\)/\n\1/g' list1.txt
returns:
AED: United Arab Emirates Dirham
AFN: Afghan Afghani
ALL: Albanian Lek
AMD: Armenian Dram
ANG: Netherlands Antillean Gulden
AOA: Angolan Kwanza
ARS: Argentine Peso
AUD: Australian Dollar
AWG: Aruban Florin
AZN: Azerbaijani Manat
BAM: Bosnia & Herzegovina Convertible Mark
BBD: Barbadian Dollar
BDT: Bangladeshi Taka
BGN: Bulgarian Lev
BIF: Burundian Franc
BMD: Bermudian Dollar
BND: Brunei Dollar
BOB: Bolivian Boliviano
BRL: Brazilian Real*
BSD: Bahamian Dollar
BWP: Botswana Pula
BZD: Belize Dollar
CAD: Canadian Dollar
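Both sed commands above rely on \n in the replacement producing a newline, which works in GNU sed but, as far as I know, is not guaranteed in other sed implementations. A sketch of the same substitution in awk, which should behave the same across implementations, is below; it also drops the empty first line that a newline in front of the very first code would produce. The function name split_currencies is hypothetical.

```shell
# Hypothetical helper; reads from the named files, or stdin if none are given.
split_currencies() {
  awk '{
    # "&" stands for the matched text, so each code gets a newline in front.
    gsub(/[A-Z][A-Z][A-Z]:/, "\n&")
    # The first code on the line would otherwise leave an empty first line.
    sub(/^\n/, "")
    print
  }' "$@"
}

printf 'AED: United Arab Emirates DirhamAFN: Afghan AfghaniALL: Albanian Lek\n' | split_currencies
# prints:
# AED: United Arab Emirates Dirham
# AFN: Afghan Afghani
# ALL: Albanian Lek
```

The three-letter group is spelled out as [A-Z][A-Z][A-Z] because some older awk versions do not support the {3} interval syntax.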

How to remove unique values from an HTML select list with linux program like sed, awk, or grep?

I copied the HTML from a select box and am trying to figure out a quick way to remove the markup so that I am left with a list of names. Generally that's not a problem, but these options have unique values. I would prefer using a program like grep, sed, awk or vi. Right now I have to go through manually and edit each line. Any help would be great, thank you!
<option value="DL_54292">(DL)finance</option>
<option value="DL_54274">(DL)sales</option>
<option value="510496">Ben Smith</option>
<option value="510507">Christopher Jones</option>
<option value="510513">Dawn James</option>
<option value="510533">Joe Wilson</option>
<option value="551825">Mark Jackson</option>
<option value="510562">Ronnie Libby</option>
Edit: Output format suggested by Fede.
Trying to get a simple text list, with line feed or carriage return.
finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby

Use grep to get the texts between the tags,
$ grep -oP '(?<=>)[^<>]+' file
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
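The -P option needs a grep built with PCRE support, and the output above still carries the "(DL)" prefix that the requested list does not have. A minimal sketch with sed -E that handles both is shown below; it assumes every line has the shape `<option value="...">name</option>`, and the function name option_names is hypothetical.

```shell
# Hypothetical helper; reads from the named files, or stdin if none are given.
option_names() {
  # Group 1 optionally matches a "(...)" prefix; group 2 is the visible name.
  sed -nE 's/^<option value="[^"]*">(\([^)]*\))?([^<]*)<.*$/\2/p' "$@"
}

printf '<option value="DL_54292">(DL)finance</option>\n<option value="510496">Ben Smith</option>\n' | option_names
# prints:
# finance
# Ben Smith
```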

Since you mentioned vi, you can use this line
:%s_^<option value=".*">\(.*\)</option>$_\1_gi
%s -> substitute in all the file
^ -> start of line
.* -> any characters
\(.*\) -> any characters, remember those.
$ -> end of line
\1 -> first remembered match
gi -> ignore case and replace all matches in the line
_ -> substitution separator
:s is search and replace, s_foo_bar replaces foo by bar in current line

awk can do this:
awk -F"<|>" '{print $3}' file
(DL)finance
(DL)sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby
To match your requested output exactly, the text in parentheses should be removed too:
awk -F"<|>" '{sub(/[^)]*\)/,"",$3);print $3}' file
finance
sales
Ben Smith
Christopher Jones
Dawn James
Joe Wilson
Mark Jackson
Ronnie Libby

If you don't mind using Notepad++, then you can use this regex:
.*>(.*)<.*
And replace with \1

Regex code for address separated by commas

How can I extract the state text which is before third comma only using the regex code?
54 West 21st Street Suite 603, New York,New York,United States, 10010
I've managed to extract the rest how I wanted but this one is a problem.
Also, how can I extract the "United States" please?

It looks like you want to use capturing groups:
.*,.*,(.*),(.*),.*
The first capturing group will be "New York" and the second will be "United States" (try it on Rubular).
Or you can split by commas (which will probably be even simpler), as @Jerry points out, assuming the language/tool you're using supports that.

You can use this regex:
(?:[^,]*,){2}([^,]*)
And use captured group # 1 for your desired String.

TL;DR
A lot depends on your regular expression engine, and whether you really need a regular expression or field-splitting. You can do field-splitting in Ruby and Awk (among others), but sed and grep only do regular expressions. See some examples below to get you started.
Ruby
str = '54 West 21st Street Suite 603, New York,New York,United States, 10010'
str.match /(?:.*?,){2}([^,]+)/
$1
#=> "New York"
GNU sed
$ echo '54 West 21st Street Suite 603, New York,New York,United States, 10010' |
sed -rn 's/([^,]+,){2}([^,]+).*/\2/p'
GNU awk
$ echo '54 West 21st Street Suite 603, New York,New York,United States, 10010' |
awk -F, '{print $3}'
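Since the question asks for both the state and "United States", the field-splitting approach can be extended to print the third and fourth fields with surrounding whitespace trimmed. This is a sketch under the assumption that the address always has the same five comma-separated fields; the function name state_and_country is hypothetical.

```shell
# Hypothetical helper; reads from the named files, or stdin if none are given.
state_and_country() {
  awk -F',' '{
    state = $3
    country = $4
    # Strip leading and trailing blanks left over from ", " separators.
    gsub(/^[ \t]+|[ \t]+$/, "", state)
    gsub(/^[ \t]+|[ \t]+$/, "", country)
    print state
    print country
  }' "$@"
}

echo '54 West 21st Street Suite 603, New York,New York,United States, 10010' | state_and_country
# prints:
# New York
# United States
```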

GNU Sed REGEX find and alter string (no replace)

I have the following text:
11 Cherrywood Rise Ashford Kent TN25 4QA United Kingdom N B BONE 02/12 387
Bisham village Bisham Buckinghamshire SL7 1RR United Kingdom Neil Noakes 06/13 488
6 Kynaston Road London London N16 0EX United Kingdom MR N P SALTMARSH 04/13 907
116 Long Acre London London WC2E 9SU United Kingdom Lorna J Gradden 11/14 415
How can I use sed to match the dates in "mm/yy" format and change them to "|mm/yy|"?
Like: 11 Cherrywood Rise Ashford Kent TN25 4QA United Kingdom N B BONE|02/12|387
Thanks!

does this work for you?
sed -r 's# ([0-9]{2}/[0-9]{2}) #|\1|#' file

Example 1
cat t.txt | sed -E 's/([0-9]{2}\/[0-9]{2})/|\1|/g'
11 Cherrywood Rise Ashford Kent TN25 4QA United Kingdom N B BONE |02/12| 387
Bisham village Bisham Buckinghamshire SL7 1RR United Kingdom Neil Noakes |06/13| 488
6 Kynaston Road London London N16 0EX United Kingdom MR N P SALTMARSH |04/13| 907
116 Long Acre London London WC2E 9SU United Kingdom Lorna J Gradden |11/14| 415
or
sed -E 's/([0-9]{2}\/[0-9]{2})/|\1|/g' t.txt
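A pattern with the g flag wraps every two-digit/two-digit occurrence on the line and leaves the surrounding spaces in place, whereas the requested output has no spaces around the pipes. A sketch that anchors the match to the end of the line and consumes the spaces is below; it assumes the date is always the second-to-last field and is followed by a number. The function name mark_dates is hypothetical.

```shell
# Hypothetical helper; reads from the named files, or stdin if none are given.
mark_dates() {
  # Assumes the mm/yy date is always the second-to-last field on the line.
  sed -E 's#[[:space:]]+([0-9]{2}/[0-9]{2})[[:space:]]+([0-9]+)[[:space:]]*$#|\1|\2#' "$@"
}

echo '11 Cherrywood Rise Ashford Kent TN25 4QA United Kingdom N B BONE 02/12 387' | mark_dates
# prints: 11 Cherrywood Rise Ashford Kent TN25 4QA United Kingdom N B BONE|02/12|387
```

Using # as the delimiter avoids having to escape the slash inside the date.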
