Turning my web scrape data into an array - python-2.7
The code below pulls all the information I want; however, I want it sorted into an array so that each phone number is paired with the corresponding name, address, and description. I can't figure out how to indent it so that it pulls all 38 entries. Any help would be appreciated!
# import libraries
from selenium import webdriver
import csv
# driver path (raw string so the backslashes are taken literally)
driver = webdriver.Chrome(r'C:\Python27\Chromedriver\chromedriver.exe')
# fetch top Amsterdam restaurants
driver.get('http://www.eater.com/maps/best-amsterdam-restaurants')
for elem in driver.find_elements_by_xpath('.//h2[span[@class = "c-mapstack__card-index"]]'):
    restname = elem.text
for address in driver.find_elements_by_class_name('c-mapstack__address'):
    restaddress = address.text
for content in driver.find_elements_by_class_name('c-entry-content'):
    restdescrip = content.text
eaterarray = [restname, restaddress, restdescrip]
print eaterarray
I am aware the indenting isn't right, and I've tried several configurations but I can't seem to get it to loop right in any configuration.
First of all, if you do not want to provide the path to chromedriver in every script, just copy "chromedriver.exe" into Python's Scripts folder, i.e. "C:\Python27\Scripts".
Try this code; it should solve your problem:
from selenium import webdriver
driver = webdriver.Chrome()
driver.maximize_window()
# fetch top Amsterdam restaurants
driver.get('http://www.eater.com/maps/best-amsterdam-restaurants')
a = []  # restaurant names
b = []  # addresses and phone numbers, in page order
c = []  # descriptions
for elem in driver.find_elements_by_xpath('.//h2[span[@class = "c-mapstack__card-index"]]'):
    restname = elem.text.encode('ascii', 'ignore')
    a.append(restname)
for address in driver.find_elements_by_class_name('c-mapstack__address'):
    restaddress = address.text.encode('ascii', 'ignore').strip()
    b.append(restaddress)
for content in driver.find_elements_by_class_name('c-entry-content'):
    restdescrip = content.text.encode('ascii', 'ignore').strip()
    c.append(restdescrip)
# pair each address with the phone number that follows it
q = [(x, y) for x, y in zip(b, b[1:]) if '+31' in y]
# two entries have no phone number, so their addresses are inserted by hand
q.insert(21, 'Raadhuisstraat Amsterdam, Netherlands')
q.insert(25, 'Leidsestraat 94 Amsterdam, North Holland 1017 PE, Netherlands')
# drop the first description element so that d lines up with a
d = c[1:]
new_dict = dict((a[i], (d[i], q[i])) for i in range(len(a)))
for k, v in new_dict.iteritems():
    print k, v
It will print output like this:
22 Haringstal Ab Kromhout ("Contrary to popular belief, Dutch herring is not raw but salt-cured although the complex curing process does give it a raw finish on the tongue. First of the season herring are called Hollandse nieuwe and are usually available starting in early June. You can find herring stalls all over the city, but Haringstal Ab Kromhout come highly recommended. Order one au naturel or go for the traditional raw chopped onion and pickle accompaniment. Can't make it to Ab Kromhout? Kras Haring on Wittenburgergracht is also an excellent option. [$]", 'Raadhuisstraat Amsterdam, Netherlands')
25 Foodhallen ('Formerly a tram depot, De Foodhallen is now the place to get a taste of the Dutch street food scene. Theres something for everyone here: grilled cheese sandwiches (at Caulils), a bitterbal tasting (at De Ballenbar), burgers (at the Butcher), hotdogs (at Bulls & Dogs), Vietnamese street food (at Viet View), BBQ pork (at the Rough Kitchen), sweet tartlets (at Le Petit Gateau), Mediterranean snacks (at Maza), and lots more. [$$]', ('Bellamyplein 51\n1053 AT Amsterdam, Netherlands', '+31 6 29265037'))
3 Rotisserie Rijsel ('Rijsel serves Flemish and French classics like boeuf la mode, huzarensalade (Russian salad), presskop (head cheese), and rotisserie poussin, all prepared with the finest ingredients. This combined with a well-chosen and well-priced wine selection has put Rijsel on everybodys favourite list since its opening in 2012. Booking ahead is essential and (if on offer) dont think twice about ordering the Cte de Boeuf. [$$$]', ('Marcusstraat 52b\nAmsterdam, NoordHolland 1091TK, Netherlands', '+31 20 463 2142'))
9 Bord'eau - Restaurant Gastronomique ('If you can afford it, head to the two-Michelin-starred Bordeau for the ultimate fine-dining experience. Here, chef Richard van Oostenbrugge wows his guests with his incredibly skilled, classic technique-based cooking. Expect the finest produce, maximum flavors, exquisite sauces, and picture-perfect plates. In fact, Bordeaus signature apple dessert is the most photographed/Instagrammed dessert in Amsterdam. [$$$$]', ('Nieuwe Doelenstraat 2\nAmsterdam, North Holland 1012 CP, Netherlands', '+31 20 531 1705'))
10 Oriental City ('Located in Amsterdams Wallen area, Oriental City is a firm favorite with locals and Amsterdams Chinese community alike. Youll be tempted by much of the extensive menu, and Oriental Citys dim sum is among the best in Amsterdam. The restaurant has many tables divided over two floors, but still be prepared to stand in line on Saturdays. [$$$]', ('Oudezijds Voorburgwal 177-179\nAmsterdam, North Holland 1012 EV, Netherlands', '+31 20 626 8352'))
4 La Rive ('An Amsterdam fine-dining institution since the early 90s (and once the home of renowned Dutch chef Robert Kranenborg), since 2008, Rogr Rassin has been at the helm of La Rives kitchen. Dont be fooled by the traditional, ever-so-slightly formal dining room, because, au contraire, Rassins cooking is deliciously modern and seasonal. The dinner-only restaurant has a unique riverside location, so try to book a window table. [$$$$]', ('Professor Tulpplein 1\nAmsterdam, North Holland 1018 GX, Netherlands', '+31 20 520 3264'))
11 Restaurant Gebr. Hartering ('Part of Amsterdams new wave of casual and unpretentious restaurants, Gebr Hartering has helped shape the citys lively dining scene. The eatery is run by brothers Paul and Niek Hartering and the concept is very simple: hearty food cooked with great ingredients, to be enjoyed with a glass of fine wine. Theres a daily-changing menu, which includes the big-hitter Fleckvieh beef, grilled on charcoal. [$$$]', ('Peperstraat 10I\n1011 TL Amsterdam, Netherlands', '+31 20 421 0699'))
12 Nam Kee ('One of Amsterdams longest-running Chinese restaurants, Nam Kee is best known for its Peking duck window display and famous for its steamed oysters in black bean sauce fantastic oysters that owe their fame to the Dutch film (and novel) Oysters at Nam Kees. [$$$]', ('Zeedijk 111-113\nAmsterdam, North Holland 1012 AV, Netherlands', '+31 20 624 3470'))
15 Gebroeders Niemeijer ('Start your day with a cup of coffee (featuring Costadora beans) and a freshly-baked croissant at Gebr. Niemeijer bakery, or order one of its French-style breakfasts, with petit pains, croissants, marmalade, and jam. At lunchtime, Gebr. Niemeijer serves simple sandwiches and salads. Theres also a great selection of baked goods, and dont miss out on their baguettes (you know, for that Vondelpark picnic). [$]', ('Nieuwendijk 35\nAmsterdam, North Holland, Netherlands', '+31 20 707 6752'))
17 Toscanini ('Toscanini is the most-loved Italian restaurant in Amsterdams Jordaan, and with its 30-year history, probably one of the oldest, too. No pizzas here: Instead, expect a proper (seasonal) Italian menu with a choice of antipasti, primi, secondi, and dolci. Toscanini offers non-fussy food with great ingredients and maximum flavor, served in a wonderfully bustling setting. Its a great dinner spot and theres an excellent wine list, too. [$$$]', ('Lindengracht 75\nAmsterdam, North Holland 1015 KD, Netherlands', '+31 20 623 2813'))
26 FEBO ('Amsterdam is famous for its deep-fried snacks like kroket and bitterballen (both similar to croquettes) and frikandel, a type of sausage. At 75-year-old fast food chain Febo you can buy these snacks from an automat. There are branches scattered all over the city, so it shouldnt be too difficult to get your teeth into a frikandel or a kaassouffl, a pocket of deep-fried cheese. On Fridays and Saturdays some branches are open until 4 a.m., perfect for your wee-hour drunken munchies. [$]', 'Leidsestraat 94 Amsterdam, North Holland 1017 PE, Netherlands')
14 Restaurant Stork ('Hop on the IJplein ferry (near Central Station) for lunch or dinner at Stork, housed in a former Stork engines factory building on the north banks of the river IJ. Order sole or lobster with fries or tuck into a delicious plateau fruit de mer and enjoy the great views of the river. Storks riverside terrace offers a wonderful al fresco dining experience. [$$]', ('Gedempt Hamerkanaal 201\nAmsterdam, North Holland 1021 KP, Netherlands', '+31 20 634 4000'))
19 Proeflokaal Arendsnest ('For a taste of the burgeoning Dutch craft beer scene, get yourself a seat at the bar at Arendsnest. At this canal-side beer bar on the Herengracht, you can try over 30 Dutch beers on tap and no fewer than 100 bottled beers. Youll be spoiled with choices, but do try one of Jopen Brewerys award-winning beers, particularly the Extra Stout, which won a gold medal in the 2015 World Beer Awards. [$]', ('Herengracht 90\nAmsterdam, North Holland 1015 BS, Netherlands', '+31 20 421 2057'))
30 Thrill Grill ('With its first-rate burgers, Thrill Grill has rapidly become a household name for real burger lovers. Thrill Grill is the brainchild of veteran chef Robert Kranenborg, a local legend. The meat is from old Dutch dairy cows and cooked medium-rare. Get your teeth into a classic beef thriller or go for the salmon or veggie falafel burger. The branch on the Gerard Doustraat provides particularly lovely ambiance. [$$]', ('Gerard Doustraat 98\nAmsterdam, North Holland 1072VX, Netherlands', '+31 20 760 6750'))
27 Patisserie Holtkamp ('A family-owned pastry shop where Amsterdam locals go for their sweet treats, expect Patisserie Holtkamp to offer a small but superb range of French and Dutch patisserie, cakes, chocolates, and biscuits (no cupcakes here!). Holtkamp is also famous for its veal, shrimp, and cheese kroketten (croquettes), which are deep-fried to order in the shop. [$]', ('Vijzelgracht 15\nAmsterdam, North Holland 1017, Netherlands', '+31 20 624 8757'))
2 Brouwerij 't IJ ('This Amsterdam brewery has a unique canal-side location, right next to an old windmill, and the outdoor terrace is a popular hangout on sunny days. Around seven beers are available on tap, including the classic Zatte and Natte and often a special seasonal brew, too. A small selection of bar snacks is on offer, including the traditional Dutch Ossenworst, a raw and smoked beef sausage. [$]', ('Funenkade 7\nAmsterdam, North Holland 1018 AL, Netherlands', '+31 20 320 1786'))
24 Fromagerie Kef ('Cheesemonger Fromagerie Abraham Kef supplies many Michelin-starred restaurants in Amsterdam with cheese. The original shop (est. 1953) is on the Marnixstraat, but since 2014, a second branch also operates on the Czaar Peterstraat. On Sundays the Marnixstraat branch regularly organizes cheese and wine tastings. Kefs fantastic cheese selection (mainly made from raw milk) includes some magnificent aged Dutch cheeses. Dont leave without some Remeker. [$]', ('Marnixstraat\nAmsterdam, North Holland 1016 TJ, Netherlands', '+31 20 420 0097'))
18 Cafe De Klepel ('Quality wines and bistro food take the spotlight at Caf De Klepel, part of the recent Dutch bistronomie movement. This friendly and popular place is run by young sommelier duo Margot Los and Job Seuren (formerly of De Librije). Pop in for a glass of wine (at the bar) with some charcuterie or cheese. For the full experience, book a table and order De Klepels three-or four-course menu. [$$]', ('Prinsenstraat 22\nAmsterdam, North Holland 1015 DD, Netherlands', '+31 20 623 8244'))
36 Yamazato ('Yamazato provides an unexpected slice of Japan in the Dutch capital, including dining room views of a Japanese garden with a koi pond. In the evenings, the Michelin-starred Yamazato which is also in the Hotel Okura offers authentic kaiseki tasting menus, but you can also step in for lunch and order a bento box or the great value lunch menu (five courses for 50). An la carte menu including sushi and sashimi is available, too. [$$$-$$$$]', ('Ferdinand Bolstraat 333\nAmsterdam, North Holland 1072 LH, Netherlands', '+31 20 678 8351'))
38 Ron Gastrobar ('Amsterdams thriving dining scene owes a lot to Ron Blaauw. Three years ago, he relaunched his two-star restaurant into the more casual and wallet-friendly Ron Gastrobar, leaving his fine-dining years behind and at the same time launching a new trend. In fact, Michelin Netherlands is even talking about the Ron Blaauw effect. All dishes are priced at 15 (desserts 9) and the restaurant is lauded for its dry-aged barbecue steaks. [$$$]', ('Sophialaan 55 hs\nAmsterdam, North Holland 1075 BP, Netherlands', '+31 20 496 1943'))
13 Choux ('Relatively new on the Amsterdam dining scene but booked solid for dinner every night Choux serves natural wines and light, fresh cuisine, the latter always with a touch of comfort. Order three, four, or seven courses from the monthly-changing menu by chef Merijn van Berlo (including an excellent vegetarian option). For those who fail to snag a seat at dinner, theres also a three- or four- course menu available at lunchtime. [$$$]', ('De Ruijterkade 128\nAmsterdam, North Holland, Netherlands', '+31 6 16512364'))
20 La Perla ('If youre in the mood for a pizza, La Perla in the Jordaan is the place to go. The restaurant is split in two, with the pizzeria on one side of the street, and the huge wood-fired oven on the other. Try the classic Margherita (with buffalo mozzarella) or order the special porchetta di Ariccia, made with oven-roasted pork. [$$]', ('Tweede Tuindwarsstraat 14 & 53\nAmsterdam, North Holland 1015 RZ, Netherlands', '+31 20 624 8828'))
6 Slagerij de Leeuw ('Head to this butchers shop/deli if youre planning to cook a meal in your rental apartment. De Leeuw, the only gourmet butch shop in Amsterdam, offers a wide range of top-quality fresh meat and poultry, such as Wagyu and Rubia Gallega beef, Iberico pork, and Bresse chicken. But for your gourmet picnic, theres also a great selection of charcuterie, cold meats, pats, and other ready-made delicacies. [$$]', ('Utrechtsestraat 92\nAmsterdam, North Holland 1017 VS, Netherlands', '+31 20 623 0235'))
7 Librije's Zusje Amsterdam ('Literally the young sibling (zusje means little sister) of Jonnie Boers three-star restaurant De Librije in Zwolle, Librijes Zusje is located in the stunning Waldorf Astoria Hotel. Executive chef and De Librije alumnus Sidney Schutte has a modern and cutting-edge style of cooking, which shines through in all the dishes. The tasting menu has a hefty price tag, but its worth every cent. [$$$$]', ('Herengracht 542-556\nAmsterdam, North Holland 1017 CG, Netherlands', '+31 20 718 4643'))
34 Twenty Third Bar ('The Dutch cocktail scene is small but growing. The best option, if only for the amazing views, is Twenty Third Bar, situated on the 23rd floor of the Hotel Okura. The extensive cocktail list primarily features classics priced at 15 (champagne cocktails 19.50), and theres a small bar snack menu. Okuras notoriously expensive two-Michelin-starred restaurant Ciel Bleu is located on the same floor. [$$]', ('Ferdinand Bolstraat 333\nAmsterdam, North Holland, Netherlands', '+31 20 678 7450'))
16 Gs ('This funky place is an ideal spot for an American-style brunch which has grown increasingly popular here in recent years and provides respite for those in desperate need of a hangover Bloody Mary. Gs serves a full range of egg dishes; its chicken waffle burger is somewhat famous among locals. The Bloody Mary menu offers no fewer than 13 different versions. Gs has two branches. Consider booking a seat online in advance.', ('Goudsbloemstraat 91\nAmsterdam, North Holland 1015 JK, Netherlands', '+31 20 362 0030'))
8 Guts and Glory ('Guts & Glory a lively, stripped-down place just off Rembrandt Square opened by the super-talented chefs Guillaume de Beer and Freek van Noortwijk and their partner Johanneke van Iwaarden is one of the hottest places to eat in Amsterdam. Its signature is the single-ingredient menu called chapter, which changes every two to three months. After Chicken, Fish, Beef, Pork, and Vegetarian, de Beer and van Noortwijk will soon embark on chapter six: Italian. [$$-$$$]', ('Utrechtsestraat 6\nAmsterdam, North Holland, Netherlands', '+31 20 362 0030'))
31 Le Garage ('This iconic restaurant was founded by restaurateur Joop Braakhekke in 1990 in a former garage. Its famous for being a celebrity haunt, but perhaps equally famous for its dramatic red and black decor that hasnt changed since opening. Le Garage has a heavily French influenced menu (steak tartare, canard la presse, le flottante), but theres also room for modern dishes (squid carbonara, tuna pizza). [$$$]', ('Ruysdaelstraat 54-56\n1071 XE Amsterdam, Netherlands', '+31 20 679 7176'))
23 Broodjeszaak t Kuyltje ('Leisurely lunches are not part of everyday life in the Netherlands, but the Dutch do like a good sandwich, preferably on the go. The best place to get a taste of a Dutch-style sandwich is t Kuyltje. People queue up for its pastrami sandwich, but equally delicious is the Tartaar Speciaal (minced raw beef, onion, hardboiled egg) or the Halfom sandwich (half corned beef, half liver). [$]', ('Gasthuismolensteeg 9\nAmsterdam, North Holland 1016 AM, Netherlands', '+31 20 620 1045'))
35 BAK restaurant ('Bak is a pop-up turned brick-and-mortar restaurant located on the banks of the river IJ in Amsterdams recently re-developed Westelijk Havengebied area. In short: Expect serious food and serious wine, served in a laid-back setting. The menu reflects chef Benny Blistos love for seasonal and local ingredients, and on the wine list you can expect natural wines and quirky grape varieties. For lunch, BAK offers a very affordable three-course menu for 27. [$$]', ('Van Diemenstraat 410\nAmsterdam, North Holland 1013 CR, Netherlands', '+31 20 737 2553'))
21 Restaurant Breda ('A more upscale restaurant by game-changing chefs Guillaume de Beer and Freek van Noortwijk (compared to their other restaurant Guts & Glory, anyway), Bredas opening was greeted by widespread critical acclaim. Dishes are modern with creative flavor combinations, and you can taste the ambition of these young chefs. Sit down for dinner and order the Basic, Extra, or Full Monty tasting menu, and enjoy fine wines selected by sommelier Johanneke van Iwaarden. Its open daily for lunch and dinner. [$$$]', ('Singel 210\nAmsterdam, North Holland, Netherlands', '+31 20 622 5233'))
37 Restaurant Blauw ('Restaurant Blauw is an Indonesian spot renowned for its rijsttafel, which is the thing to order. Rijsttafel is a table-filling feast of small dishes, rice, and condiments, a hybrid Dutch-Indonesian tradition that originated during the Dutch colonial era. Theres a vegetarian rijsttafel option, and you can also order more traditional Indonesian dishes from the a la carte menu. Arrive hungry! [$$]', ('Amstelveenseweg 158-160\nAmsterdam, North Holland 1075 XN, Netherlands', '+31 20 675 5000'))
32 Par Hasard ('Herring and cheese aside, most people also think of fries when thinking of Amsterdam. The Belgian-style double-baked fries at Par Hasard (meaning: by accident) are regarded by many as the best fries in town. Grab an order with a traditional topping of mayonnaise, satay sauce, or zoervleis (a type of beef stew). [$]', ('Ceintuurbaan 113-115\nAmsterdam, North Holland 1072 EZ, Netherlands', '+31 20 471 4052'))
29 Conservatorium Brasserie & Lounge ('With its immense floor-to-ceiling windows and glass ceiling, this is hands-down the most impressive lobby-cum-all-day-dining-room in Amsterdam an essential part of the total experience at this cosmopolitan hotel. Enjoy drinks and snacks in the lounge area or go to the brasserie for lunch or dinner. Standout dishes include veal cheeks with mac and cheese, lobster au gratin, and apple crumble. Theres also a selection of sandwiches and steaks. [$$$]', ('Van Baerlestraat 27\nAmsterdam, North Holland 1070 LP, Netherlands', '+31 20 570 0000'))
1 Merkelbach ("Located in a former 18th-century coach house, Merkelbach's spectacular garden is hands-down the best outdoor dining experience in Amsterdam. The restaurant prides itself on following the principles of the Slow Food movement, so expect a seasonal menu with local ingredients. During the day you can walk in for coffee and apple pie, and theres a compact lunch menu. [$$]", ('Middenweg 72\nAmsterdam, North Holland 1097 BS, Netherlands', '+31 20 423 3930'))
28 Rijks at the Rijksmuseum ('Rijks brings a fresh approach to museum dining (its housed in the Rijksmuseum of Dutch art and history). On the menu designed by chef Joris Bijdendijk (formerly of the three-Michelin-starred Le Jardin de Sens and the one-starred Bridges) and his team find inventive small plates. Also featured are dishes by guest-chefs who cooked at Rijks, like Andr Chiang and Tim Raue. Definitely order the spit-roasted celeriac. [$$$]', ('Museumstraat 1\nAmsterdam, North Holland 1071 XX, Netherlands', '+31 20 674 7000'))
33 The Fat Dog ('For hot dogs, look no further than the Fat Dog, Amsterdams first-ever hot dog joint, opened by acclaimed chef/restaurateur Ron Blaauw in 2014. Order an all-pork frank with sauerkraut, mustard, and onion marmalade (called Gangs of New York) or go for the chicken Gado Gado hot dog with satay sauce, cabbage, and serundeng (spiced coconut flakes). Innovation doesnt stop there: The lamb dog comes with baba ganoush, and theres also a veggie dog. [$]', ('Ruysdaelkade 251\nAmsterdam, North Holland 1072 AX, Netherlands', '+31 20 221 6249'))
5 Patisserie Kuyt ('Follow locals and food obsessives from near and far to this fabulous patisserie for the finest pies, cakes, chocolates, biscuits, and eclairs. Kuyt also has a good selection of delicate savory pastries, quiches, and biscuits. The choice is overwhelming, but dont leave without a Appelschnitt, or better yet, enjoy any of the beautiful and delicious baked goods in the tea room. [$]', ('Utrechtsestraat 109\nAmsterdam, North Holland 1017 VL, Netherlands', '+31 20 623 4833'))
Hope this is what you want.
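The answer above inserts the two phone-less addresses by hand at fixed positions, which breaks as soon as the page changes. A minimal sketch of a more general pairing step follows, assuming (as the answer's zip(b, b[1:]) trick suggests) that the scraped address elements arrive as one flat list in which a phone number, when present, directly follows its address. The helper name pair_addresses_with_phones and the short description strings are hypothetical; the names, addresses and phone numbers are taken from the output above.

```python
def pair_addresses_with_phones(items, phone_prefix="+31"):
    """Group a flat [address, phone, address, ...] list into (address, phone) tuples.

    An address that is not followed by a phone number is paired with None.
    """
    pairs = []
    i = 0
    while i < len(items):
        address = items[i]
        following = items[i + 1] if i + 1 < len(items) else None
        if following is not None and following.startswith(phone_prefix):
            pairs.append((address, following))
            i += 2  # consumed the address and its phone number
        else:
            pairs.append((address, None))
            i += 1  # consumed the address only
    return pairs


names = ["Brouwerij 't IJ", "Haringstal Ab Kromhout", "Rotisserie Rijsel"]
# Flat list as scraped: the second restaurant has no phone number.
scraped = [
    "Funenkade 7", "+31 20 320 1786",
    "Raadhuisstraat",
    "Marcusstraat 52b", "+31 20 463 2142",
]
descriptions = ["brewery (placeholder)", "herring stall (placeholder)", "rotisserie (placeholder)"]

# One record per restaurant; zip keeps the three lists aligned by position.
records = [
    {"name": name, "address": address, "phone": phone, "description": description}
    for name, (address, phone), description
    in zip(names, pair_addresses_with_phones(scraped), descriptions)
]

for record in records:
    print(record)
```

Each record keeps name, address, phone and description together, so the list can be handed to csv.DictWriter later without relying on dictionary ordering.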
Related
How to use regex capturing-group in custom US Address information in Office365
I'm trying to create a custom U.S. address classification label in Azure Information Protection to match possible U.S. addresses.
Regex (it works in Java 8, e.g. on https://regex101.com/):
^(\d+) ?(\w)? (.*?) ?((?<= )avenue|ave|court|ct|street|st|drive|dr|lane|ln|road|rd|blvd|plaza|parkway|pkwy)? ?((?<= )\d*)?$
But when I try to set this pattern in Azure Information Protection I receive the error message below:
You cannot configure a pattern with groups or multiple match conditions like (.*, .+, .{0,n} or .{1,n}). Remove the group or the multiple match condition from the pattern to continue.
Is there a way to circumvent this situation? Is it possible to reach the same result in another way?
Sample data to test:
66-4 Parkhurst Rd, Chelmsford MA 1824
591 Memorial Dr, Chicopee MA 1020
55 Brooksby Village Way, Danvers MA 1923
137 Teaticket Hwy, East Falmouth MA 2536
42 Fairhaven Commons Way, Fairhaven MA 2719
374 William S Canning Blvd, Fall River MA 2721
121 Worcester Rd, Framingham MA 1701
677 Timpany Blvd, Gardner MA 1440
337 Russell St, Hadley MA 1035
295 Plymouth Street, Halifax MA 2338
1775 Washington St, Hanover MA 2339
Reading text into table format in pandas
I have a table in text form that I want to read into pandas. I can use \n to separate the rows, but how can I separate the columns? They are in the format (2 x text fields, then 6 x numeric). Is there a method using regex or similar?
import pandas as pd
table_text = '''Name AIC sector Price (last close) Price (bid) Price (offer) NAV Total assets (£m) Market cap (£m)
3i Infrastructure Plc Infrastructure GBX 296.00 2.96 2.96 254.50 2,268.700 2,638.645
Aberdeen Asian Income Fund Limited Asia Pacific Income GBX 227.50 2.26 2.29 252.51 479.110 399.796
Aberdeen Diversified Income & Growth Ord Flexible Investment GBX 95.20 0.95 0.96 115.34 379.030 294.985
Aberdeen Emerging Markets Investment Company Limited Global Emerging Markets GBX 704.00 6.98 7.10 829.47 391.268 323.595
Aberdeen Japan Investment Trust Plc Japan GBX 712.50 7.00 7.25 784.79 114.957 94.198
Aberdeen Latin American Income Latin America GBX 57.00 0.54 0.57 62.13 40.985 32.555
Aberdeen New Dawn Asia Pacific GBX 322.00 3.22 3.26 365.56 431.544 350.752
Aberdeen New India Investment Trust Plc India GBX 516.00 5.16 5.18 601.47 375.170 301.268
Aberdeen New Thai Investment Trust Plc Country Specialist GBX 445.00 4.40 4.50 516.30 92.585 71.180
Aberdeen Smaller Companies Income Trust UK Smaller Companies GBX 358.00 3.56 3.60 397.45 95.028 79.153
Aberdeen Standard Asia Focus 2025 CULS Asia Pacific Smaller Companies GBX 100.95 1.01 1.01 97.25 391.484 37.026
Aberdeen Standard Asia Focus PLC Asia Pacific Smaller Companies GBX 1,280.00 12.75 13.00 1,440.65 483.841 402.730
Aberdeen Standard Equity Inc Trust plc UK Equity Income GBX 353.00 3.50 3.56 379.60 203.368 170.598
Aberdeen Standard European Logistics Income PLC Property - Europe GBX 116.00 1.15 1.16 117.82 309.808 305.022
Aberforth Smaller Companies Trust Plc UK Smaller Companies GBX 1,496.00 14.94 15.00 1,613.41 1,513.467 1,327.297
Aberforth Split Level Income Trust Plc UK Smaller Companies GBX 80.10 0.80 0.81 91.46 228.143 152.390
Aberforth Split Level Income ZDP 2024 UK Smaller Companies GBX 111.50 1.10 1.13 113.83 227.713 53.032
Acorn Income Fund Ltd UK Equity & Bond Income GBX 351.00 3.46 3.56 415.97 100.206 55.517
Acorn Income Fund ZDP 2022 UK Equity & Bond Income GBX 161.00 1.61 1.61 162.09 34.413 34.182
AEW UK REIT Ord Property - UK Commercial GBX 92.40 0.92 0.92 97.85 194.107 146.384'''
df = pd.DataFrame([x.split(';') for x in table_text.split('\n')])
print(df)
Outputs:
                                                    0
0   Name AIC sector Price (last close) Price (bid)...
1   3i Infrastructure Plc Infrastructure GBX 296...
2   Aberdeen Asian Income Fund Limited Asia Paci...
3   Aberdeen Diversified Income & Growth Ord Fle...
4   Aberdeen Emerging Markets Investment Company...
5   Aberdeen Japan Investment Trust Plc Japan GB...
6   Aberdeen Latin American Income Latin America...
7   Aberdeen New Dawn Asia Pacific GBX 322.00 3....
8   Aberdeen New India Investment Trust Plc Indi...
9   Aberdeen New Thai Investment Trust Plc Count...
10  Aberdeen Smaller Companies Income Trust UK S...
11  Aberdeen Standard Asia Focus 2025 CULS Asia ...
12  Aberdeen Standard Asia Focus PLC Asia Pacifi...
13  Aberdeen Standard Equity Inc Trust plc UK Eq...
14  Aberdeen Standard European Logistics Income ...
15  Aberforth Smaller Companies Trust Plc UK Sma...
16  Aberforth Split Level Income Trust Plc UK Sm...
17  Aberforth Split Level Income ZDP 2024 UK Sma...
18  Acorn Income Fund Ltd UK Equity & Bond Incom...
19  Acorn Income Fund ZDP 2022 UK Equity & Bond ...
20  AEW UK REIT Ord Property - UK Commercial GBX...
EDIT: This is my hacky way of doing it. It relies on there being a currency column populated with "GBX", though. I would welcome any ideas on better ways of doing this. Is there a regex way of finding three capital letters preceded by a space and followed by a space and then a number? That would find the currency without hardcoding "GBX".
def convert_rows(df):
    sector_name = "GBX"
    for index, row in df.iterrows():
        if sector_name in row[0]:
            name = row[0].split(sector_name)[0]
            numbers = row[0].split(sector_name)[1]
            df.at[index, ['Name']] = name
            df.at[index, ['AIC sector']] = sector_name
            df.at[index,['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
    return df
df = convert_rows(df)
You could try this:
import re
def convert_rows(df):
    for index, row in df.iterrows():
        # Search for the pattern
        sector_name = re.match(r".+\s([A-Z]{3})\s\d+.+", row[0])
        if sector_name:
            sector_name = sector_name.group(1)  # GBX for instance
            name = row[0].split(sector_name)[0]
            numbers = row[0].split(sector_name)[1]
            df.at[index, ['Name']] = name
            df.at[index, ['AIC sector']] = sector_name
            df.at[index,['Price (last close)', 'Price (bid)', 'Price (offer)', 'NAV', 'Total assets (£m)', 'Market cap (£m)']] = numbers.split()
    return df
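As a variation on the same idea, here is a small standalone sketch that does the split with one anchored regex and named groups, without touching the DataFrame. It assumes every data row has the shape "free text, three-letter currency code, then only numeric columns to the end of the line"; ROW_PATTERN and split_row are hypothetical names, not part of pandas. Note that it still cannot separate the company name from the sector, because nothing in the text marks that boundary.

```python
import re

# Free text (name + sector), a three-letter currency code, then numeric columns only.
# The lazy text group plus the end anchor skips codes such as "ZDP" that are not
# followed exclusively by numbers.
ROW_PATTERN = re.compile(
    r"^(?P<text>.+?)\s(?P<currency>[A-Z]{3})\s(?P<numbers>[\d.,]+(?:\s[\d.,]+)*)$"
)


def split_row(line):
    """Return (text, currency, numbers) for one row, or None if the row does not match."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        return None
    # Drop thousands separators before converting, e.g. "2,268.700" -> 2268.7
    numbers = [float(token.replace(",", "")) for token in match.group("numbers").split()]
    return match.group("text"), match.group("currency"), numbers


sample_rows = [
    "3i Infrastructure Plc Infrastructure GBX 296.00 2.96 2.96 254.50 2,268.700 2,638.645",
    "Aberforth Split Level Income ZDP 2024 UK Smaller Companies GBX 111.50 1.10 1.13 113.83 227.713 53.032",
]
header = "Name AIC sector Price (last close) Price (bid) Price (offer) NAV Total assets (£m) Market cap (£m)"

parsed = [split_row(row) for row in sample_rows]
for text, currency, numbers in parsed:
    print(text, currency, numbers)
print(split_row(header))  # the header has no numeric tail, so it is rejected
```

The resulting tuples could then be passed to pd.DataFrame with explicit column names, which avoids writing cell by cell inside iterrows.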
Remove Unicode literals in a Dataframe string
I have a DataFrame with resumes in it, but they contain Unicode literals such as "\xe2\x80\x93". I want to remove all of these values to prepare the text for processing. My issue is that I have tried many recommended ways to remove these, and none seem to work when applied to the data in my df.
Text example:
"COMPETENCIES\nBenefits Administration \xe2\x80\x93 Customer Service \xe2\x80\x93 Cost Control \xe2\x80\x93 Recruiting \xe2\x80\x93 Acquisition Management"
The part I am finding difficult is this: if I take this text and put it in a string variable such as y = <text>, then use one of the following methods to deal with Unicode literals:
print(re.sub(r'[^\x00-\x7F+]',' ', y))
print(y.encode('ascii',errors='ignore').decode('ascii'))
it will output, as expected:
"CORE COMPETENCIES Benefits Administration Customer Service Cost Control Recruiting Acquisition Management"
When I try this on the values in my DataFrame it simply does not seem to work. I have tried the following (df is called resume):
resume = resume.apply(lambda x : re.sub(r'[^\x00-\x7F+]',' ',x))
resume = resume.apply(x.encode('ascii',errors='ignore').decode('ascii'))
resume = resume.replace(re.sub(r'[^\x00-\x7F+]',' ',x))
I have even tried:
for x in resume:
    x = str(x)
    x = (re.sub(r'[^\x00-\x7F+]',' ', x))
    print(x)
and:
print(re.sub(r'[^\x00-\x7F+]',' ', resume[0]))
just to see if I could replicate the change when I apply these to a string variable, but still no luck.
The dataframe has shape (368,0). The dtype is object, which I have tried converting to string, but I believe it always stays as object.
Can you try this:

df['text_clean'] = df['text'].apply(lambda x: x.decode('unicode_escape').\
                                    encode('ascii', 'ignore').\
                                    strip())

Assumptions: The dataframe (df) has a 'text' column containing the resume strings with unicode literals.

This is how I tested it:

import pandas as pd

# created sample data - same example row inserted 5 times. Not ideal but just was trying to test
df = pd.DataFrame({"text": [
    b"COMPETENCIES\nBenefits Administration \xe2\x80\x93 Customer Service \xe2\x80\x93 Cost Control \xe2\x80\x93 Recruiting \xe2\x80\x93 Acquisition Management",
    b"COMPETENCIES\nBenefits Administration \xe2\x80\x93 Customer Service \xe2\x80\x93 Cost Control \xe2\x80\x93 Recruiting \xe2\x80\x93 Acquisition Management",
    b"COMPETENCIES\nBenefits Administration \xe2\x80\x93 Customer Service \xe2\x80\x93 Cost Control \xe2\x80\x93 Recruiting \xe2\x80\x93 Acquisition Management",
    b"COMPETENCIES\nBenefits Administration \xe2\x80\x93 Customer Service \xe2\x80\x93 Cost Control \xe2\x80\x93 Recruiting \xe2\x80\x93 Acquisition Management",
    b"COMPETENCIES\nBenefits Administration \xe2\x80\x93 Customer Service \xe2\x80\x93 Cost Control \xe2\x80\x93 Recruiting \xe2\x80\x93 Acquisition Management"]})

df['text_clean'] = df['text'].apply(lambda x: x.decode('unicode_escape').\
                                    encode('ascii', 'ignore').\
                                    strip())

Testing the code with kaggle source file:

# path stores the location of the data file downloaded from kaggle
df = pd.read_csv(path)

# remove the binary 'b' prefix reinstated even though the data is
# read as string during df creation
df['Resume'] = [val[1:].encode('utf-8') for val in df['Resume']]

# create a separate column with multiple decode and encode steps to
# retrieve the final clean version
df['text_clean'] = df['Resume'].apply(lambda x: x.decode('unicode_escape').\
                                      encode('ascii', 'ignore').decode('utf-8').strip())
print(df['text_clean'])

Output Sample:

0 'John H. Smith, P.H.R.\n800-991-5187 | PO Box ...
1 'Name Surname\nAddress\nMobile No/Email\nPERSO...
2 'Anthony Brown\nHR Assistant\nAREAS OF EXPERTI...
3 'www.downloadmela.com\nSatheesh\nEMAIL ID:\nCa...
4 "HUMAN RESOURCES DIRECTOR\nExpert in organizat...
5 'John H. Smith, P.H.R.\n800-991-5187 | PO Box ...
6 'Resume of Satheesh\n\nwww.downlo\nSatheesh\n\...
7 "GM HR & ADMINISTRATION Resume Sample www.time...
8 "www.uaehrzone.com\n\nRobert Wales\nDubai\nUni...
9 "Human Resources Coordinator Resume\nExample\n...
10 'RESUME WORLD INC.\n1200 Markham Road, Suite 1...
11 'XXXXX XXXXX\nXXXX, Renton, WA 98059\nHome: XX...
12 'SATHEESH\n\nwww.downloadmela.com\n\nObjective...
13 'Alan Bloggs BE\n1Main Street, Irish Town, Co....
14 'www.downloadmela.com\nSatheesh\nSummary\n4+ y...
15 'Anthony Brown\nHR Assistant\nAREAS OF EXPERTI...
16 'T\n\nAYLOR J ONES\n15 Jinglewood Street Melbo...
17 'Human Resources Manager\nCurriculum Vitae Exa...
18 'EDMONDBRADY\n1900SummersDriveMontello,AZ55996...
19 'Jonathan Burns\n1414 Marcy Drive\n\n\n\nSomet...
20 'Jo Sample\n123 Ocean Drive\nSampleville, FL 1...
21 'Jonathan Burns\n1414 Marcy Drive\n\n\n\nExamp...
22 'Shweta XXX\nMobile: +91-98********\n\nE-mail:...
23 'www.downloadmela.com\n\nSATHEESH\nMobile :\nE...
24 "Steven B. Manning\n3249 Oral Lake Road\nMinne...
25 "www.downloadmela.com\nSatheesh\n\nE-mail:\nHa...
26 "Resume for HR Assistant\nTX\n3 Avenue,\nSale,...
27 'RESUME WORLD INC.\n1200 Markham Road, Suite 1...
28 'HOW TO WRITE A PROFESSIONAL RESUME\n\n RESUM...
29 'RESUME WORLD INC.\n1200 Markham Road, Suite 1...
...
1189 'Joseph Andrade\nACTOR\nEmail: Jfandrade192#ou...
1190 'MARIAH FORD\nHeight: 5 4\nStars Talent Studio...
1191 'Your Name\nPhone number\nEmail address\nHeigh...
1192 'Jarien Sky-Stutts Senior 3D Artist\n\ncontac...
1193 "Gary White\nMake up artist\nAREAS OF EXPERTIS...
1194 'RESUME\nDan Platt\n5134 Oakdale Ave.\nWoodlan...
1195 "Jeff Wolverton, M.S., B.S.Cis.\nVisual FX Art...
1196 'LETA LOU GRAY\t\n\r\n\n\t\n\r\n\n\t\n\r\n\n\t...
1197 'Curriculum Vitae\nPersonal Details\n\nDarren ...
1198 'Your Name\n\nSchool Address\n123 Main Street\...
1199 'Stacy Adams\nSAG/AFTRA\nHeight:\nWeight:\nHai...
1200 'PERFORMING ARTS RESUME\nContent\nA performers...
1201 'ED WEISS\nTeaching Artist Resume\n\ne-mail: W...
1202 '8/23/2016 sample resume for painter. accounti...
1203 'KELSEY PAINTER\nBlonde Hair/Brown Eyes | Alto...
1204 'Wendy Robin\nProfessional Make-up Artist\n(70...
1205 'Chet Bailey\n100 Desert Street\nDrytown, CA 9...
1206 'Chris Flight Attendant\n11223 East South Aven...
1207 'Bilingual Flight Attendant Resume\n\nANGELICA...
1208 'Flight Attendant Resumes\nFlight-Attendant-Ca...
1209 'Emirates Flight Attendant Resume Sample\n\nAn...
1210 'Entry Level Flight Attendant Resume No Exper...
1211 'CURRICULUM VITAE\nMay 11, 2004\n\nNAME\n\nRob...
1212 'JED REDD\n\n_\n003 Boudry Lane\nFriend, TX 77...
1213 'Lauren B. Pires\nMiami, Florida & New York Ci...
1214 "Free Flight Attendant Resume\nDarlene Flint\n...
1215 'Corporate Flight Attendant Resume\nCAITLIN FL...
1216 'MAJOR CONRAD A. PREEDOM\n2354 Fairchild Dr., ...
1217 'STACY SAMPLE\n\n702 800-0000 cell\n\n0000#ema...
1218 'Entry Level Resume Guide\n\nThis packet is in...
Name: text_clean, Length: 1219, dtype: object
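To see what the decode/encode chain does to a single value, here is a minimal sketch using only the standard library. The helper names `strip_non_ascii` and `parse_bytes_repr` are hypothetical, and the sample string is a shortened version of the test row. The second helper is a swapped-in alternative rather than part of the answer: it assumes each cell is the text of a valid Python bytes literal and parses it with `ast.literal_eval`, which keeps characters such as the en dash instead of dropping them.

```python
import ast


def strip_non_ascii(raw):
    """Mirror the chain used in the answer: bytes -> text -> ASCII-only text."""
    # unicode_escape maps each non-ASCII byte to a Latin-1 character,
    # and encode('ascii', 'ignore') then silently drops those characters.
    return raw.decode("unicode_escape").encode("ascii", "ignore").decode("ascii").strip()


def parse_bytes_repr(cell):
    """Alternative (assumption): parse text that looks like b'...' and decode it as UTF-8."""
    value = ast.literal_eval(cell)  # assumes the cell is a valid bytes literal
    return value.decode("utf-8").strip()


sample = b"Cost Control \xe2\x80\x93 Recruiting"
print(repr(strip_non_ascii(sample)))
print(repr(parse_bytes_repr(repr(sample))))
```

The first call leaves a double space where the dash was removed; the second keeps the dash, which may be preferable if the text is later shown to people rather than only tokenized.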
Calculating the amount of Board Turnover
I have been trying to calculate the amount of turnover in executive boards in the financial sector between 2006 and 2009. For this I have data that looks like the following:

Year  Bank   Director      DirectorID  (ISIN, RoA, Size etc)
2005  Bank1  John Smith    120
2005  Bank1  Barry Pooter  160
2005  Bank1  Jack Sparrow  2070
2006  Bank1  John Smith    120
2006  Bank1  Barry Pooter  160
2006  Bank1  Jack Sparrow  2070
2007  Bank1  John Smith    120
2007  Bank1  Barry Pooter  160
2007  Bank1  Jack Sparrow  2070
2008  Bank1  John Smith    120
2008  Bank1  Carla Jansen  250
2008  Bank1  Jack Sparrow  2070
2009  Bank1  John Smith    160
2009  Bank1  Carla Jansen  250
2009  Bank1  Mike Stata    875

This data repeats for each bank from 2005 to 2015. I have already made a turnover dummy variable (0 = no change, 1 = change) by using:

collapse(sum) DirectorID, by (ISIN, Year, Bank)
gen interest = inrange(Year, 2006,2009)
bysort ID interest (DirectorID) : gen temp = DirectorID[1] != DirectorID[_N]
replace temp = . if interest==0
bysort ID : egen changed = max(temp)

However, I would like turnover to be an actual variable counting how many changes were made. For example, assume Bank2 made no change (Turnover = 0), Bank3 made 6 changes (6 new managers came in, Turnover = 6) and Bank4 made 4 changes (4 new managers came in, Turnover = 4):

Bank   Turnover  (ISIN, RoA, Size, etc)
Bank1  2
Bank2  0
Bank3  6
Bank4  4

Is this possible with Stata (or SPSS if that happens to be the case)? ISIN codes are my ID variable as they are linked to each specific bank.

Two new people entered the board of Bank1, so it would show Turnover = 2. Had three people joined in the previous example, Turnover would be 3: each person who joins the board counts as "+1" turnover, regardless of the people leaving. Only people who join (whether they replace someone or are simply an addition to the board) are of interest in my thesis.

However, this could also be calculated differently if that makes it easier; it depends on how I write my methodology. It would be fine if the turnover variable says how many changes were made per year, i.e.:

Turnover2005: 2005 - 2006
Turnover2006: 2006 - 2007
Turnover2007: 2007 - 2008
Turnover2008: 2008 - 2009

Finally, it's possible that TMTs grow. For example, in 2005 Bank1 has 14 managers on the board, and in 2006 they hire 3 new managers but only let 1 go. Now the board has 16 managers and made 3 changes (3 new managers).
This might help. The following code builds a dataset with four banks and five years. It is panel data. The xtset command lets you use time series operators, which are well documented here (https://www.youtube.com/watch?v=ik8r4WvrPkc). (Note: for the sake of clear exposition, in this example Bank 1 had no changes, Bank 2 had two changes, Bank 3 had three, etc.)

// Clear the session and other memory.
set more off
clear all

// Input reproducible data.
input year bank_num ceo_num
2005 1 200
2006 1 200
2007 1 200
2008 1 200
2009 1 200
2005 2 222
2006 2 222
2007 2 222
2008 2 333
2009 2 444
2005 3 300
2006 3 301
2007 3 302
2008 3 302
2009 3 303
2005 4 999
2006 4 888
2007 4 777
2008 4 666
2009 4 555
end

// Declare the panel structure.
xtset bank_num year

// Gen variable indicating if ceo_num stayed same.
// Resulting variable is 0 when there was no change.
gen no_turn = (ceo_num - f1.ceo_num)

// Gen dummy to indicate if ceo_num changed.
gen is_turn = (no_turn != 0 & no_turn < .)

// Gen a variable that counts changes.
egen turn_nums = sum(is_turn), by(bank_num)

// List data to inspect results.
list

Edit: Re-characterized comment for no_turn variable.
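The Stata example tracks one ID per bank and year. When a board has several directors per year, as in the question, counting joiners amounts to a set difference between consecutive years. A minimal Python sketch of that idea follows; the function name `count_joiners` and the sample rows are hypothetical, and it assumes each director keeps the same ID across years.

```python
from collections import defaultdict

# Hypothetical rows in the question's layout: (year, bank, director_id).
ROWS = [
    (2006, "Bank1", 120), (2006, "Bank1", 160), (2006, "Bank1", 2070),
    (2007, "Bank1", 120), (2007, "Bank1", 160), (2007, "Bank1", 2070),
    (2008, "Bank1", 120), (2008, "Bank1", 250), (2008, "Bank1", 2070),
    (2009, "Bank1", 120), (2009, "Bank1", 250), (2009, "Bank1", 875),
    (2006, "Bank2", 31), (2007, "Bank2", 31),
    (2008, "Bank2", 31), (2009, "Bank2", 31),
]


def count_joiners(rows, first_year, last_year):
    """Count directors who sit on a board in year t+1 but not in year t."""
    # Collect the set of director IDs for every (bank, year) in the window.
    boards = defaultdict(set)
    for year, bank, director_id in rows:
        if first_year <= year <= last_year:
            boards[(bank, year)].add(director_id)

    turnover = {bank: 0 for bank, _ in boards}
    for bank in turnover:
        for year in range(first_year, last_year):
            before = boards.get((bank, year))
            after = boards.get((bank, year + 1))
            # Skip year pairs where the bank has no data in one of the years.
            if before is None or after is None:
                continue
            # Joiners are IDs present after but not before; leavers are ignored.
            turnover[bank] += len(after - before)
    return turnover


print(count_joiners(ROWS, 2006, 2009))
```

Because only `after - before` is counted, a board that grows (hires three, releases one) contributes three, which matches the definition in the question. Keeping the per-year counts instead of summing them would give the Turnover2005 to Turnover2008 style variables.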
Effective way to store list of list of dict to csv
I've got a dataframe like this:

Name    Nationality  Tall  Age
John    USA          190   24
Thomas  French       194   25
Anton   Malaysia     180   23
Chris   Argentina    190   26

Let's say I get an incoming data structure like this, where each element represents the data for one row:

data = [{
    'food':{'lunch':'Apple', 'breakfast':'Milk', 'dinner':'Meatball'},
    'drink':{'favourite':'coke', 'dislike':'juice'}
    },
    ..//and 3 other records
]

'data' is a variable that holds the food and drink predicted by my machine learning model. There are more records (about 400k rows), but I process them in batches (right now 2k rows per iteration).

The expected result looks like:

Name    Nationality  Tall  Age  Lunch  Breakfast  Dinner    Favourite  Dislike
John    USA          190   24   Apple  Milk       Meatball  Coke       Juice
Thomas  French       194   25   ....
Anton   Malaysia     180   23   ....
Chris   Argentina    190   26   ....

Is there an efficient way to achieve that dataframe? So far I have tried iterating over the data variable and getting the value of each predicted label, but that process feels like it takes a long time.
You need to flatten the dictionaries first, then create a DataFrame and join it to the original:

import pandas as pd

data = [{
    'a':{'lunch':'Apple', 'breakfast':'Milk', 'dinner':'Meatball'},
    'b':{'favourite':'coke', 'dislike':'juice'}
    },
    {
    'a':{'lunch':'Apple1', 'breakfast':'Milk1', 'dinner':'Meatball2'},
    'b':{'favourite':'coke2', 'dislike':'juice3'}
    },
    {
    'a':{'lunch':'Apple4', 'breakfast':'Milk5', 'dinner':'Meatball4'},
    'b':{'favourite':'coke2', 'dislike':'juice4'}
    },
    {
    'a':{'lunch':'Apple3', 'breakfast':'Milk8', 'dinner':'Meatball7'},
    'b':{'favourite':'coke4', 'dislike':'juice1'}
    }
]

# or use one of the other solutions, both are nice
# merge the inner dicts of each record into one flat dict per row
L = [{k: v for x in d.values() for k, v in x.items()} for d in data]

df1 = pd.DataFrame(L)
print (df1)
  breakfast     dinner dislike favourite   lunch
0      Milk   Meatball   juice      coke   Apple
1     Milk1  Meatball2  juice3     coke2  Apple1
2     Milk5  Meatball4  juice4     coke2  Apple4
3     Milk8  Meatball7  juice1     coke4  Apple3

# join aligns on the index, so df and df1 must have the same row order
df2 = df.join(df1)
print (df2)
     Name Nationality  Tall  Age breakfast     dinner dislike favourite  \
0    John         USA   190   24      Milk   Meatball   juice      coke
1  Thomas      French   194   25     Milk1  Meatball2  juice3     coke2
2   Anton    Malaysia   180   23     Milk5  Meatball4  juice4     coke2
3   Chris   Argentina   190   26     Milk8  Meatball7  juice1     coke4

    lunch
0   Apple
1  Apple1
2  Apple4
3  Apple3
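Since the question mentions processing about 2k rows per iteration and the title asks about CSV, the same flattening step can also be streamed to a file batch by batch without building one large DataFrame. This is a minimal sketch using only the standard library; `flatten`, `write_batch`, and the sample rows are hypothetical names and data, and it assumes the inner dicts never share a key (a duplicate key would silently overwrite the earlier value).

```python
import csv
import io


def flatten(record):
    """Merge the inner dicts of one record into a single flat dict."""
    return {key: value for inner in record.values() for key, value in inner.items()}


def write_batch(handle, base_rows, predictions, fieldnames, write_header):
    """Write one batch of rows, pairing each base row with its prediction."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    if write_header:
        writer.writeheader()
    for base, prediction in zip(base_rows, predictions):
        row = dict(base)                 # copy so the caller's row is untouched
        row.update(flatten(prediction))  # add the flattened prediction columns
        writer.writerow(row)


FIELDS = ["Name", "Nationality", "Tall", "Age",
          "lunch", "breakfast", "dinner", "favourite", "dislike"]
base_rows = [{"Name": "John", "Nationality": "USA", "Tall": 190, "Age": 24}]
predictions = [{
    "food": {"lunch": "Apple", "breakfast": "Milk", "dinner": "Meatball"},
    "drink": {"favourite": "coke", "dislike": "juice"},
}]

buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_batch(buffer, base_rows, predictions, FIELDS, write_header=True)
print(buffer.getvalue())
```

For later batches, call `write_batch` again on the same handle with `write_header=False`, so the header is written only once.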