How can I read certain elements from a text file? - c++

I have a text file that has multiple sets of book information (title, author, etc.). I need to use a loop to read from the file and assign each piece of info to a corresponding string. I have it working to the point where it goes through the entire file, but the fields get mixed up along the way.
Book readOne(ifstream &fin) {
    string titleOne;
    getline(fin, titleOne, ',');
    string firstOne;
    getline(fin, firstOne, ',');
    string lastOne;
    getline(fin, lastOne, ',');
    string formatOne;
    getline(fin, formatOne, ',');
    string pubDateOne;
    getline(fin, pubDateOne, ',');
    string priceOne;
    getline(fin, priceOne);
Here is the text file:
Gone With the Wind, Margaret Mitchell, Hardcover, 1936, 17.49
The Adventures of Sherlock Holmes, Arthur Doyle, Paperback, 1892, 6.85
The Illustrated A Brief History of Time, Stephen Hawking, Hardcover, 1996, 9.59
Frankenstein, Mary Shelley, Paperback, 1818, 7.99
Command Authority, Tom Clancy, Paperback, 2013, 15.99
Origin, Dan Brown, Ebook, 2017, 14.99
The Lost Order, Steve Berry, Audiobook, 2017, 5.95
The Hunt for Red October, Tom Clacy, Audiobook, 1984, 7.00
Patriot Games, Tom Clancy, Audiobook, 1987, 22.50
The 14th Colony, Steve Berry, Paperback, 2016, 9.99
The Bishop's Pawn, Steve Berry, Ebook, 2018, 14.99
Pride and Prejudice, Jane Austen, Ebook, 1813, 8.99
Sense and Sensibility, Jane Austen, Hardcover, 1811, 19.99
Wuthering Heights, Emily Bronte, Paperback, 1847, 6.99
Jane Eyre, Charlotte Bronte, Hardcover, 1847, 10.95
Anna Karenina, Leo Tolstoy, Paperback, 1877, 5.99
Sahara, Clive Cussler, Ebook, 1992, 5.99
The Notebook, Nicholas Sparks, Hardcover, 1996, 12.59
A Walk to Remember, Nicholas Sparks, Ebook, 1999, 7.99
See Me, Nicholas Sparks, Ebook, 2015, 7.99
The Last Song, Nicholas Sparks, Paperback, 2009, 5.99
The Wedding, Nicholas Sparks, Ebook, 2003, 7.99
My thinking was that it would read until a comma, assign that piece of info to the string, then continue. Instead, it outputs as if it didn't see certain commas.
Gone With the Wind by Margaret Mitchell Hardcover on 1936. Published on 17.49
The Adventures of Sherlock Holmes. It costs $0
The Illustrated A Brief History of Time by Stephen Hawking Hardcover on 1996. Published on 9.59
Frankenstein. It costs $0
Command Authority by Tom Clancy Paperback on 2013. Published on 15.99
Origin. It costs $0
The Lost Order by Steve Berry Audiobook on 2017. Published on 5.95
The Hunt for Red October. It costs $0
Patriot Games by Tom Clancy Audiobook on 1987. Published on 22.50
The 14th Colony. It costs $0
The Bishop's Pawn by Steve Berry Ebook on 2018. Published on 14.99
Pride and Prejudice. It costs $0
Sense and Sensibility by Jane Austen Hardcover on 1811. Published on 19.99
Wuthering Heights. It costs $0
Jane Eyre by Charlotte Bronte Hardcover on 1847. Published on 10.95
Anna Karenina. It costs $0
Sahara by Clive Cussler Ebook on 1992. Published on 5.99
The Notebook. It costs $0
A Walk to Remember by Nicholas Sparks Ebook on 1999. Published on 7.99
See Me. It costs $0
The Last Song by Nicholas Sparks Paperback on 2009. Published on 5.99
The Wedding. It costs $0

This is just a CSV (comma-separated values) file, and there are ample code samples for reading one. Refer to this sample.
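It may also help to count the fields. Each line of the file has five comma-separated fields (title, author, format, year, price), while readOne performs six reads, five of which stop at a comma. The fifth comma-delimited read therefore runs past the newline and swallows the next line's title, which matches the output shown in the question. Below is a minimal sketch of the line-by-line, five-field approach. It is written in Python for brevity, and the names `read_books` and `SAMPLE` are illustrative. In C++ the analogous fix would be to read the author as a single field, or to read each line with getline and split it with a std::istringstream.

```python
import csv
import io

# Two lines copied from the question's file.
SAMPLE = (
    "Gone With the Wind, Margaret Mitchell, Hardcover, 1936, 17.49\n"
    "The Adventures of Sherlock Holmes, Arthur Doyle, Paperback, 1892, 6.85\n"
)

def read_books(stream):
    """Parse one book per line, assuming exactly five fields per line."""
    books = []
    # skipinitialspace drops the blank that follows each comma.
    for row in csv.reader(stream, skipinitialspace=True):
        if not row:
            continue  # ignore empty lines
        title, author, fmt, year, price = row
        books.append({"title": title, "author": author, "format": fmt,
                      "year": int(year), "price": float(price)})
    return books

books = read_books(io.StringIO(SAMPLE))
for book in books:
    print("{title} by {author} ({format}, {year}): ${price:.2f}".format(**book))
```

Because every record is parsed from its own line, a miscounted field raises an error on that line instead of silently shifting all later records.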

Related

Perl one liner using substitute with a regex

I have a file that looks like this :
7th Aug 2020 10:18:35 am Bill Smith:
NW: RE: Matt Reid - EUC23284 - INC1020721599
7th Aug 2020 10:22:02 am Bill Smith:
VK: RE: don't think we send the price, pls help check what happened - INC1020721668
7th Aug 2020 11:00:06 am Bill Smith:
*mailbox handover*
7th Aug 2020 11:06:04 am Tom Jones:
BJ - RE: Megan Holleran Unmatched Trader Trades 08/06/2020 17:35 [Restricted - External] INC1020722335
7th Aug 2020 11:07:37 am Tom Jones:
DS - RE: All summit books missing from multiple reports in ICE INC1020722348
7th Aug 2020 12:36:10 pm Tom Jones:
NW - confirm trade receipt for Jon Lett from GFI ID: 1922979 INC1020723352
And I want it to look like this :
7th Aug 2020 10:18:35 am Bill Smith: NW: RE: Matt Reid - EUC23284 - INC1020721599
7th Aug 2020 10:22:02 am Bill Smith: VK: RE: don't think we send the price, pls help check what happened - INC1020721668
7th Aug 2020 11:00:06 am Bill Smith: *mailbox handover*
7th Aug 2020 11:06:04 am Tom Jones: BJ - RE: Megan Holleran Unmatched Trader Trades 08/06/2020 17:35 [Restricted - External] INC1020722335
7th Aug 2020 11:07:37 am Tom Jones: DS - RE: All summit books missing from multiple reports in ICE INC1020722348
7th Aug 2020 12:36:10 pm Tom Jones: NW - confirm trade receipt for Jon Lett from GFI ID: 1922979 INC1020723352
So I run this over the file. The goal is to remove the newline from each line that ends with a person's name followed by a colon. In this case I want to change "Bill Smith:\n" and "Tom Jones:\n" to "Bill Smith: " and "Tom Jones: ". As the output of the one-liner shows, the replacement does not work.
cat incfile | perl -p -e 's/\w+\s\w+\:\n/\w+\s\w+\:/g'
7th Aug 2020 10:18:35 am w+sw+:NW: RE: Matt Reid - EUC23284 - INC1020721599
7th Aug 2020 10:22:02 am w+sw+:VK: RE: don't think we send the price, pls help check what happened - INC1020721668
7th Aug 2020 11:00:06 am w+sw+:*mailbox handover*
7th Aug 2020 11:06:04 am w+sw+:BJ - RE: Megan Holleran Unmatched Trader Trades 08/06/2020 17:35 [Restricted - External] INC1020722335
7th Aug 2020 11:07:37 am w+sw+:DS - RE: All summit books missing from multiple reports in ICE INC1020722348
7th Aug 2020 12:36:10 pm w+sw+:NW - confirm trade receipt for Jon Lett from GFI ID: 1922979 INC1020723352
You were going for
perl -pe's/(\w+\s\w+:)\n/$1 /'
The substring matched by the first capture (()) is assigned to $1, which you can use in the replacement expression.
The above can be simplified/optimized to
perl -pe's/\w+\s\w+:\K\n/ /'
What matched before \K is "kept" (not replaced), so only the line feed is replaced (with a space).
Alternatively, you could simply replace the line feeds of odd-numbered lines.
perl -pe's/\n/ / if $. % 2'
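The reason the original attempt printed "w+sw+:" is that the replacement side of s/// is a string, not a pattern, so \w+\s\w+ is emitted as literal text. The capture-group fix carries over to other regex engines. Here is a minimal Python sketch for comparison, assuming the input is held in one string rather than processed line by line as perl -p does. Python's built-in re module has no \K, so the capture-group form is used.

```python
import re

# First four lines of the question's input, joined as one string.
text = (
    "7th Aug 2020 10:18:35 am Bill Smith:\n"
    "NW: RE: Matt Reid - EUC23284 - INC1020721599\n"
    "7th Aug 2020 10:22:02 am Bill Smith:\n"
    "VK: RE: don't think we send the price, pls help check what happened - INC1020721668\n"
)

# Capture "First Last:" at the end of a line and put it back via \1,
# so that only the newline following it is replaced with a space.
joined = re.sub(r"(\w+\s\w+:)\n", r"\1 ", text)
print(joined, end="")
```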

Python regex re.sub() is not matching and replacing as expected

The following regex isn't replacing substrings as expected.
I've tried running the code with the following modifications (one at a time, of course) all with no luck:
Utilizing list comprehensions (current)
Using a traditional for loop
Adding the regex result back to the iterator itself
Appending the regex result to a new list
Checked the type of 'name' (it's a string)
Utilized (copied) code format from another regex in my notebook that is currently working
Put the regex into regex101.com to verify that it's functioning (you can see the regex and data I'm using here)
Adding/removing the raw string indicators preceding the regex and substitution patterns
names is a list of strings
reg_pattern = r"(?!\\s)(\\W[^\\W,]+)(?!,) and\\s([^ ]+ )([^ ]+)"
sub_pattern = r"\\1 \\3 \\2\\3"
cleaned_names = []
cleaned_names = [re.sub(reg_pattern, sub_pattern, name) for name in names]
The goal can be seen in the link above (particularly in the 'substitution' section at the bottom of that page), but ultimately, I need to append group3 of the regex to the end of group1.
I'm guessing that you're trying to re.sub the couples' names, for which you can likely write an expression similar to:
([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)
This works if there are no edge cases; if there are, you'd probably want to modify the character classes such as [A-Z] and add the other characters that can occur.
Demo
Test
import re
l = ['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie and George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.',
'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly and Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']
e = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'
l_out = []
for names in l:
    if re.match(e, names):
        l_out.append(re.sub(e, r'\1 \3 and \2\3', names))
    else:
        l_out.append(names)
print(l_out)
Output
['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan
Adelman and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan
Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and
Emily High', 'Cassie Sorenson and George Sorenson', 'Tom Scott and
Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual
Gala.', 'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish
Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali
Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay
Macbeth', 'Kelly Murro and Tom Murro', 'Udo Spreitzenbarth', 'Ron
Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton',
'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne
Michelle']
Or you can try
import re
l = ['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan and Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and Emily High', 'Cassie and George Sorenson', 'Tom Scott and Mark Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.',
'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly and Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr. Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']
e = r'([A-Z][a-z]+)\s+and\s+(.*)([A-Z]\S*)'
l_out = []
for names in l:
    if re.match(e, names):
        l_out.append(re.sub(e, r'\1 \3', names))
        l_out.append(re.sub(e, r'\2\3', names))
    else:
        l_out.append(names)
print(l_out)
Output
['George Rosario, Ali Jones, Barbara Boll, and Lindsay McKelvoy', 'Jan
Adelman', 'Edgar Adelman', 'Bill Mack and Les Lieberman', 'Dr. Susan
Muehle-Bussel, Ray Morales, and Dr. Samuel Barker', 'Dan Barroso and
Emily High', 'Cassie Sorenson', 'George Sorenson', 'Tom Scott and Mark
Smith', 'The scene at IDEAL School & Academy’s 10th\xa0Annual Gala.',
'Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and
Ron Iervolino', 'Chuck Grodin', 'Diana Rosario, Ali Sussman, Sarah
Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth', 'Kelly
Murro', 'Tom Murro', 'Udo Spreitzenbarth', 'Ron Iervolino, Trish
Iervolino, Russ Middleton, and Lisa Middleton', 'Barbara Loughlin, Dr.
Gerald Loughlin, and Debbie Gelston', 'Julianne Michelle']
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also follow this link to see how it would match against some sample inputs.
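One further thing worth checking in the pattern as posted in the question: inside a raw string, `\\s` reaches the regex engine as an escaped backslash followed by `s`, so it matches a literal backslash and the letter s rather than whitespace. The same applies to `\\1` in the replacement, which produces the text `\1` instead of the first group. This may be a side effect of how the code was pasted, so treat it as a possibility rather than a diagnosis. A small sketch of the difference, using one name from the list above:

```python
import re

name = "Jan and Edgar Adelman"

# In a raw string, "\\s" is a literal backslash followed by "s",
# which this name does not contain, so nothing matches.
doubled = re.search(r"and\\s", name)

# A single backslash in a raw string gives the whitespace class \s.
single = re.search(r"and\s", name)

# The replacement template behaves the same way: r"\\1" is the literal
# text \1, while r"\1" is a backreference to group 1.
literal = re.sub(r"(Jan)", r"\\1", name)
backref = re.sub(r"(Jan)", r"\1!", name)

print(doubled)
print(repr(single.group(0)))
print(literal)
print(backref)
```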

Turning my web scrape data into an array

The code below pulls all the information I want; however, I want it sorted into an array so that each phone number is paired with the corresponding name, address, and description. I can't figure out a way to indent it to make it pull all 38 entries. Any help would be appreciated!
#import libraries
from selenium import webdriver
import csv
#driver path
driver = webdriver.Chrome('C:\Python27\Chromedriver\chromedriver.exe')
#fetch top Amsterdam restaurants
driver.get('http://www.eater.com/maps/best-amsterdam-restaurants')
for elem in driver.find_elements_by_xpath('.//h2[span[@class = "c-mapstack__card-index"]]'):
restname = elem.text
for address in driver.find_elements_by_class_name('c-mapstack__address'):
restaddress = address.text
for content in driver.find_elements_by_class_name('c-entry-content'):
restdescrip = content.text
eaterarray = [restname, restaddress, restdescrip]
print eaterarray
I am aware the indenting isn't right, and I've tried several configurations but I can't seem to get it to loop right in any configuration.
First of all, if you do not want to provide the path to chromedriver in each script, just put "chromedriver.exe" in Python's Scripts folder, i.e. "C:\Python27\Scripts".
Try this code; it should solve your problem:
from selenium import webdriver
driver = webdriver.Chrome()
driver.maximize_window()
#fetch top Amsterdam restaurants
driver.get('http://www.eater.com/maps/best-amsterdam-restaurants')
a=[]
b=[]
c=[]
for elem in driver.find_elements_by_xpath('.//h2[span[@class = "c-mapstack__card-index"]]'):
    restname = elem.text.encode('ascii', 'ignore')
    a.append(restname)
for address in driver.find_elements_by_class_name('c-mapstack__address'):
    restaddress = address.text.encode('ascii', 'ignore').strip()
    b.append(restaddress)
for content in driver.find_elements_by_class_name('c-entry-content'):
    restdescrip = content.text.encode('ascii', 'ignore').strip()
    c.append(restdescrip)
#pair each address with the phone number that follows it
q=[(x,y) for x,y in zip(b, b[1:]) if '+31' in y]
#entries 22 and 26 have no phone number, so their addresses are inserted by hand
q.insert(21,'Raadhuisstraat Amsterdam, Netherlands')
q.insert(25,'Leidsestraat 94 Amsterdam, North Holland 1017 PE, Netherlands')
#drop the first content block so the descriptions line up with the names
d=c[1:]
new_dict= dict((a[i], (d[i],q[i])) for i in range(len(a)))
for k, v in new_dict.iteritems():
    print k , v
it will print the output as:
22 Haringstal Ab Kromhout ("Contrary to popular belief, Dutch herring is not raw but salt-cured although the complex curing process does give it a raw finish on the tongue. First of the season herring are called Hollandse nieuwe and are usually available starting in early June. You can find herring stalls all over the city, but Haringstal Ab Kromhout come highly recommended. Order one au naturel or go for the traditional raw chopped onion and pickle accompaniment. Can't make it to Ab Kromhout? Kras Haring on Wittenburgergracht is also an excellent option. [$]", 'Raadhuisstraat Amsterdam, Netherlands')
25 Foodhallen ('Formerly a tram depot, De Foodhallen is now the place to get a taste of the Dutch street food scene. Theres something for everyone here: grilled cheese sandwiches (at Caulils), a bitterbal tasting (at De Ballenbar), burgers (at the Butcher), hotdogs (at Bulls & Dogs), Vietnamese street food (at Viet View), BBQ pork (at the Rough Kitchen), sweet tartlets (at Le Petit Gateau), Mediterranean snacks (at Maza), and lots more. [$$]', ('Bellamyplein 51\n1053 AT Amsterdam, Netherlands', '+31 6 29265037'))
3 Rotisserie Rijsel ('Rijsel serves Flemish and French classics like boeuf la mode, huzarensalade (Russian salad), presskop (head cheese), and rotisserie poussin, all prepared with the finest ingredients. This combined with a well-chosen and well-priced wine selection has put Rijsel on everybodys favourite list since its opening in 2012. Booking ahead is essential and (if on offer) dont think twice about ordering the Cte de Boeuf. [$$$]', ('Marcusstraat 52b\nAmsterdam, NoordHolland 1091TK, Netherlands', '+31 20 463 2142'))
9 Bord'eau - Restaurant Gastronomique ('If you can afford it, head to the two-Michelin-starred Bordeau for the ultimate fine-dining experience. Here, chef Richard van Oostenbrugge wows his guests with his incredibly skilled, classic technique-based cooking. Expect the finest produce, maximum flavors, exquisite sauces, and picture-perfect plates. In fact, Bordeaus signature apple dessert is the most photographed/Instagrammed dessert in Amsterdam. [$$$$]', ('Nieuwe Doelenstraat 2\nAmsterdam, North Holland 1012 CP, Netherlands', '+31 20 531 1705'))
10 Oriental City ('Located in Amsterdams Wallen area, Oriental City is a firm favorite with locals and Amsterdams Chinese community alike. Youll be tempted by much of the extensive menu, and Oriental Citys dim sum is among the best in Amsterdam. The restaurant has many tables divided over two floors, but still be prepared to stand in line on Saturdays. [$$$]', ('Oudezijds Voorburgwal 177-179\nAmsterdam, North Holland 1012 EV, Netherlands', '+31 20 626 8352'))
4 La Rive ('An Amsterdam fine-dining institution since the early 90s (and once the home of renowned Dutch chef Robert Kranenborg), since 2008, Rogr Rassin has been at the helm of La Rives kitchen. Dont be fooled by the traditional, ever-so-slightly formal dining room, because, au contraire, Rassins cooking is deliciously modern and seasonal. The dinner-only restaurant has a unique riverside location, so try to book a window table. [$$$$]', ('Professor Tulpplein 1\nAmsterdam, North Holland 1018 GX, Netherlands', '+31 20 520 3264'))
11 Restaurant Gebr. Hartering ('Part of Amsterdams new wave of casual and unpretentious restaurants, Gebr Hartering has helped shape the citys lively dining scene. The eatery is run by brothers Paul and Niek Hartering and the concept is very simple: hearty food cooked with great ingredients, to be enjoyed with a glass of fine wine. Theres a daily-changing menu, which includes the big-hitter Fleckvieh beef, grilled on charcoal. [$$$]', ('Peperstraat 10I\n1011 TL Amsterdam, Netherlands', '+31 20 421 0699'))
12 Nam Kee ('One of Amsterdams longest-running Chinese restaurants, Nam Kee is best known for its Peking duck window display and famous for its steamed oysters in black bean sauce fantastic oysters that owe their fame to the Dutch film (and novel) Oysters at Nam Kees. [$$$]', ('Zeedijk 111-113\nAmsterdam, North Holland 1012 AV, Netherlands', '+31 20 624 3470'))
15 Gebroeders Niemeijer ('Start your day with a cup of coffee (featuring Costadora beans) and a freshly-baked croissant at Gebr. Niemeijer bakery, or order one of its French-style breakfasts, with petit pains, croissants, marmalade, and jam. At lunchtime, Gebr. Niemeijer serves simple sandwiches and salads. Theres also a great selection of baked goods, and dont miss out on their baguettes (you know, for that Vondelpark picnic). [$]', ('Nieuwendijk 35\nAmsterdam, North Holland, Netherlands', '+31 20 707 6752'))
17 Toscanini ('Toscanini is the most-loved Italian restaurant in Amsterdams Jordaan, and with its 30-year history, probably one of the oldest, too. No pizzas here: Instead, expect a proper (seasonal) Italian menu with a choice of antipasti, primi, secondi, and dolci. Toscanini offers non-fussy food with great ingredients and maximum flavor, served in a wonderfully bustling setting. Its a great dinner spot and theres an excellent wine list, too. [$$$]', ('Lindengracht 75\nAmsterdam, North Holland 1015 KD, Netherlands', '+31 20 623 2813'))
26 FEBO ('Amsterdam is famous for its deep-fried snacks like kroket and bitterballen (both similar to croquettes) and frikandel, a type of sausage. At 75-year-old fast food chain Febo you can buy these snacks from an automat. There are branches scattered all over the city, so it shouldnt be too difficult to get your teeth into a frikandel or a kaassouffl, a pocket of deep-fried cheese. On Fridays and Saturdays some branches are open until 4 a.m., perfect for your wee-hour drunken munchies. [$]', 'Leidsestraat 94 Amsterdam, North Holland 1017 PE, Netherlands')
14 Restaurant Stork ('Hop on the IJplein ferry (near Central Station) for lunch or dinner at Stork, housed in a former Stork engines factory building on the north banks of the river IJ. Order sole or lobster with fries or tuck into a delicious plateau fruit de mer and enjoy the great views of the river. Storks riverside terrace offers a wonderful al fresco dining experience. [$$]', ('Gedempt Hamerkanaal 201\nAmsterdam, North Holland 1021 KP, Netherlands', '+31 20 634 4000'))
19 Proeflokaal Arendsnest ('For a taste of the burgeoning Dutch craft beer scene, get yourself a seat at the bar at Arendsnest. At this canal-side beer bar on the Herengracht, you can try over 30 Dutch beers on tap and no fewer than 100 bottled beers. Youll be spoiled with choices, but do try one of Jopen Brewerys award-winning beers, particularly the Extra Stout, which won a gold medal in the 2015 World Beer Awards. [$]', ('Herengracht 90\nAmsterdam, North Holland 1015 BS, Netherlands', '+31 20 421 2057'))
30 Thrill Grill ('With its first-rate burgers, Thrill Grill has rapidly become a household name for real burger lovers. Thrill Grill is the brainchild of veteran chef Robert Kranenborg, a local legend. The meat is from old Dutch dairy cows and cooked medium-rare. Get your teeth into a classic beef thriller or go for the salmon or veggie falafel burger. The branch on the Gerard Doustraat provides particularly lovely ambiance. [$$]', ('Gerard Doustraat 98\nAmsterdam, North Holland 1072VX, Netherlands', '+31 20 760 6750'))
27 Patisserie Holtkamp ('A family-owned pastry shop where Amsterdam locals go for their sweet treats, expect Patisserie Holtkamp to offer a small but superb range of French and Dutch patisserie, cakes, chocolates, and biscuits (no cupcakes here!). Holtkamp is also famous for its veal, shrimp, and cheese kroketten (croquettes), which are deep-fried to order in the shop. [$]', ('Vijzelgracht 15\nAmsterdam, North Holland 1017, Netherlands', '+31 20 624 8757'))
2 Brouwerij 't IJ ('This Amsterdam brewery has a unique canal-side location, right next to an old windmill, and the outdoor terrace is a popular hangout on sunny days. Around seven beers are available on tap, including the classic Zatte and Natte and often a special seasonal brew, too. A small selection of bar snacks is on offer, including the traditional Dutch Ossenworst, a raw and smoked beef sausage. [$]', ('Funenkade 7\nAmsterdam, North Holland 1018 AL, Netherlands', '+31 20 320 1786'))
24 Fromagerie Kef ('Cheesemonger Fromagerie Abraham Kef supplies many Michelin-starred restaurants in Amsterdam with cheese. The original shop (est. 1953) is on the Marnixstraat, but since 2014, a second branch also operates on the Czaar Peterstraat. On Sundays the Marnixstraat branch regularly organizes cheese and wine tastings. Kefs fantastic cheese selection (mainly made from raw milk) includes some magnificent aged Dutch cheeses. Dont leave without some Remeker. [$]', ('Marnixstraat\nAmsterdam, North Holland 1016 TJ, Netherlands', '+31 20 420 0097'))
18 Cafe De Klepel ('Quality wines and bistro food take the spotlight at Caf De Klepel, part of the recent Dutch bistronomie movement. This friendly and popular place is run by young sommelier duo Margot Los and Job Seuren (formerly of De Librije). Pop in for a glass of wine (at the bar) with some charcuterie or cheese. For the full experience, book a table and order De Klepels three-or four-course menu. [$$]', ('Prinsenstraat 22\nAmsterdam, North Holland 1015 DD, Netherlands', '+31 20 623 8244'))
36 Yamazato ('Yamazato provides an unexpected slice of Japan in the Dutch capital, including dining room views of a Japanese garden with a koi pond. In the evenings, the Michelin-starred Yamazato which is also in the Hotel Okura offers authentic kaiseki tasting menus, but you can also step in for lunch and order a bento box or the great value lunch menu (five courses for 50). An la carte menu including sushi and sashimi is available, too. [$$$-$$$$]', ('Ferdinand Bolstraat 333\nAmsterdam, North Holland 1072 LH, Netherlands', '+31 20 678 8351'))
38 Ron Gastrobar ('Amsterdams thriving dining scene owes a lot to Ron Blaauw. Three years ago, he relaunched his two-star restaurant into the more casual and wallet-friendly Ron Gastrobar, leaving his fine-dining years behind and at the same time launching a new trend. In fact, Michelin Netherlands is even talking about the Ron Blaauw effect. All dishes are priced at 15 (desserts 9) and the restaurant is lauded for its dry-aged barbecue steaks. [$$$]', ('Sophialaan 55 hs\nAmsterdam, North Holland 1075 BP, Netherlands', '+31 20 496 1943'))
13 Choux ('Relatively new on the Amsterdam dining scene but booked solid for dinner every night Choux serves natural wines and light, fresh cuisine, the latter always with a touch of comfort. Order three, four, or seven courses from the monthly-changing menu by chef Merijn van Berlo (including an excellent vegetarian option). For those who fail to snag a seat at dinner, theres also a three- or four- course menu available at lunchtime. [$$$]', ('De Ruijterkade 128\nAmsterdam, North Holland, Netherlands', '+31 6 16512364'))
20 La Perla ('If youre in the mood for a pizza, La Perla in the Jordaan is the place to go. The restaurant is split in two, with the pizzeria on one side of the street, and the huge wood-fired oven on the other. Try the classic Margherita (with buffalo mozzarella) or order the special porchetta di Ariccia, made with oven-roasted pork. [$$]', ('Tweede Tuindwarsstraat 14 & 53\nAmsterdam, North Holland 1015 RZ, Netherlands', '+31 20 624 8828'))
6 Slagerij de Leeuw ('Head to this butchers shop/deli if youre planning to cook a meal in your rental apartment. De Leeuw, the only gourmet butch shop in Amsterdam, offers a wide range of top-quality fresh meat and poultry, such as Wagyu and Rubia Gallega beef, Iberico pork, and Bresse chicken. But for your gourmet picnic, theres also a great selection of charcuterie, cold meats, pats, and other ready-made delicacies. [$$]', ('Utrechtsestraat 92\nAmsterdam, North Holland 1017 VS, Netherlands', '+31 20 623 0235'))
7 Librije's Zusje Amsterdam ('Literally the young sibling (zusje means little sister) of Jonnie Boers three-star restaurant De Librije in Zwolle, Librijes Zusje is located in the stunning Waldorf Astoria Hotel. Executive chef and De Librije alumnus Sidney Schutte has a modern and cutting-edge style of cooking, which shines through in all the dishes. The tasting menu has a hefty price tag, but its worth every cent. [$$$$]', ('Herengracht 542-556\nAmsterdam, North Holland 1017 CG, Netherlands', '+31 20 718 4643'))
34 Twenty Third Bar ('The Dutch cocktail scene is small but growing. The best option, if only for the amazing views, is Twenty Third Bar, situated on the 23rd floor of the Hotel Okura. The extensive cocktail list primarily features classics priced at 15 (champagne cocktails 19.50), and theres a small bar snack menu. Okuras notoriously expensive two-Michelin-starred restaurant Ciel Bleu is located on the same floor. [$$]', ('Ferdinand Bolstraat 333\nAmsterdam, North Holland, Netherlands', '+31 20 678 7450'))
16 Gs ('This funky place is an ideal spot for an American-style brunch which has grown increasingly popular here in recent years and provides respite for those in desperate need of a hangover Bloody Mary. Gs serves a full range of egg dishes; its chicken waffle burger is somewhat famous among locals. The Bloody Mary menu offers no fewer than 13 different versions. Gs has two branches. Consider booking a seat online in advance.', ('Goudsbloemstraat 91\nAmsterdam, North Holland 1015 JK, Netherlands', '+31 20 362 0030'))
8 Guts and Glory ('Guts & Glory a lively, stripped-down place just off Rembrandt Square opened by the super-talented chefs Guillaume de Beer and Freek van Noortwijk and their partner Johanneke van Iwaarden is one of the hottest places to eat in Amsterdam. Its signature is the single-ingredient menu called chapter, which changes every two to three months. After Chicken, Fish, Beef, Pork, and Vegetarian, de Beer and van Noortwijk will soon embark on chapter six: Italian. [$$-$$$]', ('Utrechtsestraat 6\nAmsterdam, North Holland, Netherlands', '+31 20 362 0030'))
31 Le Garage ('This iconic restaurant was founded by restaurateur Joop Braakhekke in 1990 in a former garage. Its famous for being a celebrity haunt, but perhaps equally famous for its dramatic red and black decor that hasnt changed since opening. Le Garage has a heavily French influenced menu (steak tartare, canard la presse, le flottante), but theres also room for modern dishes (squid carbonara, tuna pizza). [$$$]', ('Ruysdaelstraat 54-56\n1071 XE Amsterdam, Netherlands', '+31 20 679 7176'))
23 Broodjeszaak t Kuyltje ('Leisurely lunches are not part of everyday life in the Netherlands, but the Dutch do like a good sandwich, preferably on the go. The best place to get a taste of a Dutch-style sandwich is t Kuyltje. People queue up for its pastrami sandwich, but equally delicious is the Tartaar Speciaal (minced raw beef, onion, hardboiled egg) or the Halfom sandwich (half corned beef, half liver). [$]', ('Gasthuismolensteeg 9\nAmsterdam, North Holland 1016 AM, Netherlands', '+31 20 620 1045'))
35 BAK restaurant ('Bak is a pop-up turned brick-and-mortar restaurant located on the banks of the river IJ in Amsterdams recently re-developed Westelijk Havengebied area. In short: Expect serious food and serious wine, served in a laid-back setting. The menu reflects chef Benny Blistos love for seasonal and local ingredients, and on the wine list you can expect natural wines and quirky grape varieties. For lunch, BAK offers a very affordable three-course menu for 27. [$$]', ('Van Diemenstraat 410\nAmsterdam, North Holland 1013 CR, Netherlands', '+31 20 737 2553'))
21 Restaurant Breda ('A more upscale restaurant by game-changing chefs Guillaume de Beer and Freek van Noortwijk (compared to their other restaurant Guts & Glory, anyway), Bredas opening was greeted by widespread critical acclaim. Dishes are modern with creative flavor combinations, and you can taste the ambition of these young chefs. Sit down for dinner and order the Basic, Extra, or Full Monty tasting menu, and enjoy fine wines selected by sommelier Johanneke van Iwaarden. Its open daily for lunch and dinner. [$$$]', ('Singel 210\nAmsterdam, North Holland, Netherlands', '+31 20 622 5233'))
37 Restaurant Blauw ('Restaurant Blauw is an Indonesian spot renowned for its rijsttafel, which is the thing to order. Rijsttafel is a table-filling feast of small dishes, rice, and condiments, a hybrid Dutch-Indonesian tradition that originated during the Dutch colonial era. Theres a vegetarian rijsttafel option, and you can also order more traditional Indonesian dishes from the a la carte menu. Arrive hungry! [$$]', ('Amstelveenseweg 158-160\nAmsterdam, North Holland 1075 XN, Netherlands', '+31 20 675 5000'))
32 Par Hasard ('Herring and cheese aside, most people also think of fries when thinking of Amsterdam. The Belgian-style double-baked fries at Par Hasard (meaning: by accident) are regarded by many as the best fries in town. Grab an order with a traditional topping of mayonnaise, satay sauce, or zoervleis (a type of beef stew). [$]', ('Ceintuurbaan 113-115\nAmsterdam, North Holland 1072 EZ, Netherlands', '+31 20 471 4052'))
29 Conservatorium Brasserie & Lounge ('With its immense floor-to-ceiling windows and glass ceiling, this is hands-down the most impressive lobby-cum-all-day-dining-room in Amsterdam an essential part of the total experience at this cosmopolitan hotel. Enjoy drinks and snacks in the lounge area or go to the brasserie for lunch or dinner. Standout dishes include veal cheeks with mac and cheese, lobster au gratin, and apple crumble. Theres also a selection of sandwiches and steaks. [$$$]', ('Van Baerlestraat 27\nAmsterdam, North Holland 1070 LP, Netherlands', '+31 20 570 0000'))
1 Merkelbach ("Located in a former 18th-century coach house, Merkelbach's spectacular garden is hands-down the best outdoor dining experience in Amsterdam. The restaurant prides itself on following the principles of the Slow Food movement, so expect a seasonal menu with local ingredients. During the day you can walk in for coffee and apple pie, and theres a compact lunch menu. [$$]", ('Middenweg 72\nAmsterdam, North Holland 1097 BS, Netherlands', '+31 20 423 3930'))
28 Rijks at the Rijksmuseum ('Rijks brings a fresh approach to museum dining (its housed in the Rijksmuseum of Dutch art and history). On the menu designed by chef Joris Bijdendijk (formerly of the three-Michelin-starred Le Jardin de Sens and the one-starred Bridges) and his team find inventive small plates. Also featured are dishes by guest-chefs who cooked at Rijks, like Andr Chiang and Tim Raue. Definitely order the spit-roasted celeriac. [$$$]', ('Museumstraat 1\nAmsterdam, North Holland 1071 XX, Netherlands', '+31 20 674 7000'))
33 The Fat Dog ('For hot dogs, look no further than the Fat Dog, Amsterdams first-ever hot dog joint, opened by acclaimed chef/restaurateur Ron Blaauw in 2014. Order an all-pork frank with sauerkraut, mustard, and onion marmalade (called Gangs of New York) or go for the chicken Gado Gado hot dog with satay sauce, cabbage, and serundeng (spiced coconut flakes). Innovation doesnt stop there: The lamb dog comes with baba ganoush, and theres also a veggie dog. [$]', ('Ruysdaelkade 251\nAmsterdam, North Holland 1072 AX, Netherlands', '+31 20 221 6249'))
5 Patisserie Kuyt ('Follow locals and food obsessives from near and far to this fabulous patisserie for the finest pies, cakes, chocolates, biscuits, and eclairs. Kuyt also has a good selection of delicate savory pastries, quiches, and biscuits. The choice is overwhelming, but dont leave without a Appelschnitt, or better yet, enjoy any of the beautiful and delicious baked goods in the tea room. [$]', ('Utrechtsestraat 109\nAmsterdam, North Holland 1017 VL, Netherlands', '+31 20 623 4833'))
Hope this is what you want.

Formatting space-separated data to CSV

For quite some time I have been trying to format space separated data to a CSV structure.
Initial position
The initial data table is given by:
Dr. Arun Raykar MBBS, MS - ENT 9 years experience Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE Malleswaram, Bangalore INR 250 MON-SAT7:00PM-9:00PM Book Appointment
Dr. Hema Sanath C BHMS, CFN 0 years experience Homeopath Sankirana Homeopathic Clinic Kalyan Nagar, Bangalore INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM Book Appointment
Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment
It contains many extra spaces and a lot of unnecessary information. The fields appear in roughly this order:
Doctor's name | Degree | Years of experience | Specialization | Hospital name | Address | Fees | Schedule | and an unnecessary Book Appointment field.
I want to convert it to the following format:
Doctor's name,Specialization,Hospital name,Address,Fees,Schedule
So the data above should look like this:
Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist,SHAKTHI E.N.T CARE,Malleswaram,INR 250,MON-SAT7:00PM-9:00PM
Dr. Hema Sanath,Homeopath,Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250,MON-SAT10:00AM-2:00PM6:30PM-8:00PM
Dr. Hema Ahuja,Dentist,V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM
So far I have succeeded in removing the Book Appointment field.
Problem
However, I am having difficulty identifying the hospital's name, because the spacing in it varies a lot. Is this problem solvable?
EDIT
The output of cat -A file is the following:
Dr. Arun Raykar MBBS, MS - ENT 9 years experience Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE ^I Malleswaram, Bangalore INR 250 MON-SAT7:00PM-9:00PM Book Appointment $
Dr. Hema Sanath C BHMS, CFN 0 years experience Homeopath Sankirana Homeopathic Clinic ^I Kalyan Nagar, Bangalore INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM Book Appointment $
Dr. Hema Ahuja BDS,M Phil 33 years experience Dentist V2 E City Family Dental Center ^I Electronics City, Bangalore INR 200 MON-SUN10:00AM-8:00PM Book Appointment
There's no straightforward way to separate the specialization from the hospital name, but with some assumptions, you could perhaps use perl to do this:
perl -pe 's/^(\S+\s+\S+\s+\S+).+experience\s([^\t]+?)\s+(\b[A-Z0-9]{2}[^\t]+?|(?:(?!\b[A-Z0-9]{2})[^\t])*)\s+\t\s+([^,]+,).+?(INR.+?PM)\s+.*/\1,\2,\3,\4\5/' file
Gives:
Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist,SHAKTHI E.N.T CARE,Malleswaram,INR 250 MON-SAT7:00PM-9:00PM
Dr. Hema Sanath,Homeopath,Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250 MON-SAT10:00AM-2:00PM6:30PM-8:00PM
Dr. Hema Ahuja,Dentist,V2 E City Family Dental Center,Electronics City,INR 200 MON-SUN10:00AM-8:00PM
Since it's a Perl regex, you can use the regex debugger on regex101 to get a glimpse of how it works. The regex is fairly straightforward, but its many parts can make it look daunting.
Warning: The above is able to separate the specialization based on two things:
It tries to find the first occurrence of space followed by two uppercase characters or digits and starts matching as the hospital name when it finds it; or
If there are no consecutive uppercase characters or digits, it takes only the first word as the specialization and the rest as the hospital name.
I know this might not solve the whole problem, as there will always be lines that don't fit the above rules, but it can get you started on cleaning the data up. If a line is separated incorrectly (i.e. when the specialization consists of more than one word and the hospital name doesn't contain two consecutive uppercase letters or digits), the first word of the specialization will be placed correctly and the rest will end up in the hospital name.
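The two rules in the warning above can be sketched on their own in Python. This is only an illustration of the heuristic, not a port of the full perl command: the function name split_specialization is made up for this example, and the input is assumed to be the text that sits between "experience" and the tab.

```python
import re

def split_specialization(text):
    """Split 'specialization + hospital' text using the two heuristic rules."""
    text = text.strip()
    # Rule 1: the hospital name starts at the first whitespace that is
    # followed by two consecutive uppercase letters or digits.
    m = re.search(r"\s+(?=[A-Z0-9]{2})", text)
    if m:
        return text[:m.start()], text[m.end():]
    # Rule 2 (fallback): the first word is the specialization,
    # everything after it is treated as the hospital name.
    first, _, rest = text.partition(" ")
    return first, rest.strip()

for sample in (
    "Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE",
    "Homeopath Sankirana Homeopathic Clinic",
    "Dentist V2 E City Family Dental Center",
):
    print(",".join(split_specialization(sample)))
```

Note that "(ENT)" does not trigger rule 1, because the whitespace before it is followed by a parenthesis rather than by two uppercase characters.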
Unfortunately, based on your input, there's no way to separate the specialisation from the hospital name. The other fields can be captured, albeit inelegantly, with gawk (probably >= 4.0, but I think 3.x should work):
$ awk -F" \t " -v OFS="," -v S=" " '
{
    sub(/\s+$/, "");
    split($2, Data, /[ ,]{2,}/);
    Address = Data[1];
    split($2, Data, / +/);
    nData = length(Data);
    Schedule = Data[nData - 2];
    Fees = Data[nData - 4] S Data[nData - 3];
    split($1, Data, / +/);
    Name = Data[1] S Data[2] S Data[3]; # assume all names are Dr. Xxx Xxx only
    match($1, /[0-9]+ years experience /);
    SpecializationHospital = substr($1, RSTART + RLENGTH);
    print Name, SpecializationHospital, Address, Fees, Schedule;
} ' data.txt
Output:
Dr. Arun Raykar,Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE,Malleswaram,INR 250,MON-SAT7:00PM-9:00PM
Dr. Hema Sanath,Homeopath Sankirana Homeopathic Clinic,Kalyan Nagar,INR 250,MON-SAT10:00AM-2:00PM6:30PM-8:00PM
Dr. Hema Ahuja,Dentist V2 E City Family Dental Center,Electronics City,INR 200,MON-SUN10:00AM-8:00PM
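For readers who prefer Python, the same field extraction might look roughly like the sketch below. It makes the same assumptions as the awk script (a tab separates the hospital from the address, and names are always "Dr. Xxx Xxx"), plus one of its own: it locates the fee by the literal "INR" token instead of counting fields from the end. The function name parse_line is hypothetical.

```python
import re

def parse_line(line):
    """Return [name, specialization + hospital, address, fees, schedule]."""
    # The tab splits the line into "name ... hospital" and "address ... schedule".
    left, right = line.rstrip().split("\t", 1)
    # Assumption (same as the awk script): the name is the first three words.
    name = " ".join(left.split()[:3])
    # Everything after "N years experience" is specialization plus hospital.
    m = re.search(r"\d+ years experience\s+", left)
    spec_hospital = left[m.end():].strip()
    # The address is the text before the first comma on the right-hand side.
    address = right.split(",", 1)[0].strip()
    # Assumption: the fee is the token "INR" followed by the amount,
    # and the schedule is the token right after the amount.
    tokens = right.split()
    i = tokens.index("INR")
    fees = " ".join(tokens[i:i + 2])
    schedule = tokens[i + 2]
    return [name, spec_hospital, address, fees, schedule]

sample = ("Dr. Arun Raykar MBBS, MS - ENT 9 years experience "
          "Ear-Nose-Throat (ENT) Specialist SHAKTHI E.N.T CARE \t "
          "Malleswaram, Bangalore INR 250 MON-SAT7:00PM-9:00PM Book Appointment ")
print(",".join(parse_line(sample)))
```

As with the awk version, the specialization and the hospital name stay together in a single field.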

GNU Sed format USA address to Street, City, State, Zip

I have the following data for some of my customers:
719 13th Street East, Glencoe MN, 55336
626 Valley Road, Montclair NJ, 07043
666 EAST DYER ROAD, SANTA ANA CA, 92705
20800 N. 135th Ave, Sun City West AZ, 85375
9775 Herring Gull Drive, Indianapolis IN, 46280
712 21st Street, Vero Beach FL, 32960
PO BOX 324, PORT SALERNO FL, 34992
207 Middleton Road, Lafayette LA, 70503
5091 nw fiddle leaf ct, port saint lucie FL, 34986
347 Mayberry Lane, Dover DE, 19904
2648 SW 137th Ave, Miramar FL, 33027
4410 Williams Dr SUITE 104, Georgetown TX, 78628
17020 Windsor Court, Homer Glen IL, 60491
11 Technology Drive North, Warren NJ, 07059
655 Boylston St, Boston MA, 02116
1375 bishops terrace, wixom MI, 48393
4705 Center Blvd Apt. 808, Long Island City NY, 11109
5340 CORNELIA HWY, ALTO GA, 30510
1541 Paces Ferry North, Smyrna GA, 30080
603 west pacific coast hwy, wilmington CA, 90744
2503Paddock CT, Louisville KY, 40216
9421 Dunbar dr, Oakland CA, 94603
1804 Third Avenue Apt #8, New York NY, 10029
2504 bellaire st, wantagh NY, 11793
1380 avon lane apt 21, north lauderdale FL, 33068
How can I use a sed regex to format it like this:
Street Address|City|State|Zip
e.g.
719 13th Street East|Glencoe|MN|55336
626 Valley Road|Montclair|NJ|07043
666 EAST DYER ROAD|SANTA ANA|CA|92705
Thanks!
sed 's/^\(.*\), *\(.*\) \(..\), \([0-9][0-9][0-9][0-9][0-9]\)/\1|\2|\3|\4/'
or:
sed -r 's/^(.*), *(.*) (..), ([0-9]{5})/\1|\2|\3|\4/'
Output:
719 13th Street East|Glencoe|MN|55336
626 Valley Road|Montclair|NJ|07043
666 EAST DYER ROAD|SANTA ANA|CA|92705
20800 N. 135th Ave|Sun City West|AZ|85375
9775 Herring Gull Drive|Indianapolis|IN|46280
712 21st Street|Vero Beach|FL|32960
PO BOX 324|PORT SALERNO|FL|34992
207 Middleton Road|Lafayette|LA|70503
5091 nw fiddle leaf ct|port saint lucie|FL|34986
347 Mayberry Lane|Dover|DE|19904
2648 SW 137th Ave|Miramar|FL|33027
4410 Williams Dr SUITE 104|Georgetown|TX|78628
17020 Windsor Court|Homer Glen|IL|60491
11 Technology Drive North|Warren|NJ|07059
655 Boylston St|Boston|MA|02116
1375 bishops terrace|wixom|MI|48393
4705 Center Blvd Apt. 808|Long Island City|NY|11109
5340 CORNELIA HWY|ALTO|GA|30510
1541 Paces Ferry North|Smyrna|GA|30080
603 west pacific coast hwy|wilmington|CA|90744
2503Paddock CT|Louisville|KY|40216
9421 Dunbar dr|Oakland|CA|94603
1804 Third Avenue Apt #8|New York|NY|10029
2504 bellaire st|wantagh|NY|11793
1380 avon lane apt 21|north lauderdale |FL|33068
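To show how the four capture groups line up, here is a rough Python equivalent of the sed expressions above. It is a sketch under a few assumptions of its own: the state is restricted to two letters (the sed version accepts any two characters), whitespace around the commas is optional, and each captured field is stripped, which would also drop a stray trailing space like the one after "north lauderdale" in the last output line. The name format_address is made up for this example.

```python
import re

# street, city, two-letter state, five-digit ZIP
ADDRESS = re.compile(r"^(.*),\s*(.*)\s([A-Za-z]{2}),\s*([0-9]{5})\s*$")

def format_address(line):
    """Rewrite 'Street, City ST, 12345' as 'Street|City|ST|12345'."""
    m = ADDRESS.match(line)
    if m is None:
        return line  # leave lines that do not look like an address untouched
    return "|".join(part.strip() for part in m.groups())

print(format_address("719 13th Street East, Glencoe MN, 55336"))
print(format_address("4705 Center Blvd Apt. 808, Long Island City NY, 11109"))
```

The first group is greedy, so the regex engine initially tries to put everything up to the last comma into the street and then backtracks until the remaining text still fits "City ST, ZIP".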
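Try this:
sed -e 's/\([A-Z]*\) \([A-Z][A-Z]\),/\1\|\2,/g' -e 's/, /\|/g'
The second expression replaces every ", " with "|". Before that, the first expression searches for a pattern like AAAA AA, and changes it to AAAA|AA, which handles the City|State part. Be aware that the first expression also matches a street that ends in two capital letters before the comma, so 2503Paddock CT, Louisville KY, 40216 becomes 2503Paddock|CT|Louisville|KY|40216.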
Test
$ sed -e 's/\([A-Z]*\) \([A-Z][A-Z]\),/\1\|\2,/g' -e 's/, /\|/g' your_file
719 13th Street East|Glencoe|MN|55336
626 Valley Road|Montclair|NJ|07043
666 EAST DYER ROAD|SANTA ANA|CA|92705
20800 N. 135th Ave|Sun City West|AZ|85375
9775 Herring Gull Drive|Indianapolis|IN|46280
712 21st Street|Vero Beach|FL|32960
PO BOX 324|PORT SALERNO|FL|34992
207 Middleton Road|Lafayette|LA|70503
5091 nw fiddle leaf ct|port saint lucie|FL|34986
347 Mayberry Lane|Dover|DE|19904
2648 SW 137th Ave|Miramar|FL|33027
4410 Williams Dr SUITE 104|Georgetown|TX|78628
17020 Windsor Court|Homer Glen|IL|60491
11 Technology Drive North|Warren|NJ|07059
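The same two-pass idea can be sketched in Python. One deliberate difference from the sed command, labeled here as an assumption of this sketch: the first pass only fires on the two capital letters that sit directly before the ZIP code, so a street such as "2503Paddock CT" is not split. The function name to_pipes is hypothetical.

```python
import re

def to_pipes(line):
    """Two passes: split the state off the city, then turn ', ' into '|'."""
    # Pass 1: replace the space before a two-letter uppercase state with a pipe,
    # but only when the state is followed by the five-digit ZIP at end of line.
    line = re.sub(r" ([A-Z]{2}),(?= *[0-9]{5}\s*$)", r"|\1,", line)
    # Pass 2: every remaining ", " becomes a pipe.
    return line.replace(", ", "|")

print(to_pipes("666 EAST DYER ROAD, SANTA ANA CA, 92705"))
print(to_pipes("2503Paddock CT, Louisville KY, 40216"))
```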
sed -e 's/, /|/g' -e 's/ \([^ ]\+\)$/|\1/' file