Validate Title Case Full Name with Regex - regex
To learn regex, I have been solving practice problems. This is one of them. I know regex might not be the best tool for this, and my regex is a mess, but I liked the challenge.
Problem:
Names need to be in Title Case;
There are exceptions for some lowercase words inside the name;
And for some names, e.g.: McDonald, MacDuff, D'Estoile
Names with ' and - are accepted, and sometimes they appear as o'Brien, O'brien, O'Brien, O' Brien or 'Ehu Kali.
No whitespace at the beginning or end of the name;
No more than one space between the names of a full name;
A . is accepted if not alone, e.g.: Dan . Ferdnand (not accepted) and Dan G. Ferdnand (accepted)
Numbers and symbols are not accepted
However, Roman numerals are accepted and aren't Title Case, e.g.: Elizabeth II
Some names can stand alone, e.g.: Akihito (Prince of Japan)
Some special characters common in some countries are accepted, e.g.: Valeh ßlÿsgÿroğlu, Lażżru Role, Alaksiej Taraškievič
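The simpler rules in the list above (whitespace, digits, symbols, a lone ".") can be sketched outside of regex. This is only an illustration in Python, not the asker's or the answerer's method: the function name `basic_checks` and the set `ALLOWED_PUNCT` are made up for this example, and the Title Case rules are deliberately not checked.

```python
import unicodedata

# Assumption for this sketch: only these punctuation marks may appear in a name.
ALLOWED_PUNCT = {"'", "-", "."}


def basic_checks(full_name: str) -> bool:
    """Check the whitespace, digit, symbol and lone-punctuation rules only."""
    # No whitespace at the beginning or end, and the name is not empty.
    if not full_name or full_name != full_name.strip():
        return False
    # No more than one space between names.
    if "  " in full_name:
        return False
    for part in full_name.split(" "):
        # Punctuation such as "." may not stand alone: each part needs a letter.
        if not any(unicodedata.category(ch).startswith("L") for ch in part):
            return False
        # Digits and any other symbols are rejected.
        for ch in part:
            is_letter = unicodedata.category(ch).startswith("L")
            if not is_letter and ch not in ALLOWED_PUNCT:
                return False
    return True


for candidate in ["Maria G. Silva", "Maria . Silva", "John W8", "Urxan Əbűlhəsənzadə"]:
    print(candidate, basic_checks(candidate))
```

Because casing is not checked here, a string such as CAPITAL LETTER still passes this sketch.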
Regex
The code is
^(?![ ])(?!.*(?:\d|[ ]{2}|[!$%^&*()_+|~=`\{\}\[\]:";<>?,\/]))(?:(?:e|da|do|das|dos|de|d'|la|las|el|los|l'|al|of|the|el-|al-|di|van|der|op|den|ter|te|ten|ben|ibn)\s*?|(?:[A-ZàáâäãåąčćęèéêëėįìíîïłńòóôöõøùúûüųūÿýżźñçčšžÀÁÂÄÃÅĄĆČĖĘÈÉÊËÌÍÎÏĮŁŃÒÓÔÖÕØÙÚÛÜŲŪŸÝŻŹÑßÇŒÆČŠŽ∂ð'][^\s]*\s*?)(?!.*[ ]$))+$
And the Regex101 with a validation list
References
What I tried so far was based on these:
regular expression for first and last name
Regular Expression to disallow two consecutive white spaces in the middle of a string
A regex to test if all words are title-case
How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
Use Regex to Split Numbered List array into Numbered List Multiline
Not working
I wrote this regex and don't know how to stop it from matching the cases below, which currently match:
CAPITAL LETTER
AlTeRnAtE LeTtEr
And these don't match but should:
Urxan Əbűlhəsənzadə
İsmət Jafarov
Şükür Hagverdiyev
Űmid Abdurrahimov
Ġerardo Seralta
Ċikku Paris
Question
Is there a way to optimize this regex (monster)?
And how do I fix the problems listed under Not working?
P.S.: The list of example names for validation can be found at the Regex101 link.
Brief
Since you're learning regex and haven't specified a regex flavour, I've chosen PCRE, as it is widely supported in the regex world.
Code
See this regex in use here
(?(DEFINE)
(?# Definitions )
(?<valid_nameChars>[\p{L}\p{Nl}])
(?<valid_nonNameChars>[^\p{L}\p{Nl}\p{Zs}])
(?<valid_startFirstName>(?![a-z])[\p{L}'])
(?<valid_upperChar>(?![a-z])\p{L})
(?<valid_nameSeparatorsSoft>[\p{Pd}'])
(?<valid_nameSeparatorsHard>\p{Zs})
(?<valid_nameSeparators>(?&valid_nameSeparatorsSoft)|(?&valid_nameSeparatorsHard))
(?# Invalid combinations )
(?<invalid_startChar>^[\p{Zs}a-z])
(?<invalid_endChar>.*[^\p{L}\p{Nl}.\p{C}]$)
(?<invalid_unaccompaniedSymbol>.*(?&valid_nameSeparatorsHard)(?&valid_nonNameChars)(?&valid_nameSeparatorsHard))
(?<invalid_overTwoUpper>(?:(?&valid_nameChars)*\p{Lu}){3})
(?<invalid>(?&invalid_startChar)|(?&invalid_endChar)|(?&invalid_unaccompaniedSymbol)|(?&invalid_overTwoUpper))
(?# Valid combinations )
(?<valid_name>(?:(?:(?&valid_nameChars)|(?&valid_nameSeparatorsSoft))*(?&valid_nameChars)+(?:(?&valid_nameChars)|(?&valid_nameSeparatorsSoft))*)+\.?)
(?<valid_firstName>(?&valid_startFirstName)(?:\.|(?&valid_name)*))
(?<valid_multipleName>(?&valid_firstName)(?=.*(?&valid_nameSeparators)(?&valid_upperChar))(?:(?&valid_nameSeparatorsHard)(?&valid_name))+)
(?<valid>(?&valid_multipleName)|(?&valid_firstName))
)
^(?!(?&invalid))(?&valid)$
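As I read it, the invalid_overTwoUpper definition is what rejects CAPITAL LETTER and AlTeRnAtE LeTtEr: because the negative lookahead is anchored at the start of the string, it looks for three uppercase letters in the leading run of name characters. A rough Python sketch of that one check, using `unicodedata` categories in place of \p{L}, \p{Nl} and \p{Lu} (the function name is hypothetical):

```python
import unicodedata


def leading_run_has_three_upper(name: str) -> bool:
    """Mimic (?:(?&valid_nameChars)*\\p{Lu}){3} tested at the start of the string."""
    upper_count = 0
    for ch in name:
        category = unicodedata.category(ch)
        # valid_nameChars: any letter (L*) or letter number (Nl).
        if not (category.startswith("L") or category == "Nl"):
            break  # a space, apostrophe, hyphen, etc. ends the leading run
        if category == "Lu":
            upper_count += 1
            if upper_count == 3:
                return True
    return False


for candidate in ["CAPITAL LETTER", "AlTeRnAtE LeTtEr", "Eanruig MacGilleBhreac"]:
    print(candidate, leading_run_has_three_upper(candidate))
```

Note that only the first word is inspected, so Eanruig MacGilleBhreac and Pope Benedict XVI are not flagged, while the same words in reverse order would be.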
Results
Input
== 1NcOrrect N4M3S ==
CAPITAL LETTER
AlTeRnAtE LeTtEr
Natalia maria
Natalia aria
Natalia orea
Maria dornelas
Samuel eto'
Miguel lasagna
Antony1 de Home Ap*ril
Ap*ril Willians
Antony_ de Home Apr+il
Ant_ony de Home Apr#il
Antony# de Ho#me Apr^il
Maria Silva
Maria silva
maria Silva
Maria Silva
Maria Silva
Maria / Silva
Maria . Silva
John W8
==Correct Names==
Urxan Əbűlhəsənzadə
İsmət Jafarov
Şükür Hagverdiyev
Űmid Abdurrahimov
Ġerardo Seralta
Ċikku Paris
Hind ibn Sheik
Colop-U-Uichikin
Lażżru Role
Alaksiej Taraškievič
Petruso Husoǔski
Sumu-la-El
Valeh ßlÿsgÿroğlu
'Arab al-Rashayida
Tariq al-Hashimi
Nabeeh el-Mady
Tariq Al-Hashimi
Brian O'Conner
Maria da Silva
Maria Silva
Maria G. Silva
Maria McDuffy
Getúlio Dornelles Vargas
Maria das Flores
John Smith
John D'Largy
John Doe-Smith
John Doe Smith
Hector Sausage-Hausen
Mathias d'Arras
Martin Luther King Jr.
Ai Wong
Chao Chang
Alzbeta Bara
Marcos Assunção
Maria da Silva e Silva
Juscelino Kubitschek de Oliveira
Maria da Costa e Silva
Samuel Eto'o
María Antonieta de las Nieves
Eugène
Antòny de Homé April
àntony de Home ùpril
Antony de Home Aprìl
Pierre de l'Estache
Pierre de L'Estoile
Akihito
Nadine Schröder
Anna A. Møller
D. Pedro I
Pope Benedict XVI
Marsibil Ragnarsdóttir
Natanaël Morel
Isaac De la Croix
Jean-Michel Bozonnet
Qutaibah Mu'tazz Abadi
Rushd Jawna' Kassab
Khaldun Abdul-Qahhar Sabbag
'Awad Bashshar Asker
Al B. Zellweger
Gunnleif Snæ-Ulfsson
Käre Toresson
Sorli Ærnmundsson
Arnkel Øystæinsson
Ástríður Dórey
Åsmund Kåresson
Yahatti-Il
Ipqu-Annunitum
Nabu-zar-adan
Eskopas Cañaverri
Botolph of Langchester
Aelfhun the Cantrell
Fraco di Natale
Fraco Di Natale
Iván de Luca
Iván De Luca
Man'nah
Atabala Aüamusalü
Ramiz Ağasəfalu
Dadaş Aghakhanov
Fÿrxad Mübarizlı
Vaclaǔ Šupa
Yakiv Volacič
Flor Van Vaerenbergh
Flor van Vaerenbergh
Edwin van der Sar
Husein Ekmečić
Álvaro Guimarães Alencar
Phone U Yaza Arkar
Seocan MacGhille
X'wat'e Tlekadugovy
Albert-Jan Bootsveld
Maurits-jan Kuipers op den Kollenstaart
Elco ter Hoek
Robbert te Poele
Aad ten Have
'Ehu Kali
Ho'opa'a Loni
Aukanai'i Mahi'ai
Kalman ben Tal El
Żytomir Roszkowski
K'awai
==EXTRA== only if possible, strange ones
Maol-Moire Mac'IlleBhuidh
Tòmas MacIlleChruim
Aindreas MacIllEathain
Eanruig MacGilleBhreac
Peadar MacGilleDhonaghart
Maolmhuire MacGill-Eain
Eanruig MacGilleBhreac
Wim van 't Plasman
Output
Note: Shown below are only the strings that matched from the above Input
Urxan Əbűlhəsənzadə
İsmət Jafarov
Şükür Hagverdiyev
Űmid Abdurrahimov
Ġerardo Seralta
Ċikku Paris
Hind ibn Sheik
Colop-U-Uichikin
Lażżru Role
Alaksiej Taraškievič
Petruso Husoǔski
Sumu-la-El
Valeh ßlÿsgÿroğlu
'Arab al-Rashayida
Tariq al-Hashimi
Nabeeh el-Mady
Tariq Al-Hashimi
Brian O'Conner
Maria da Silva
Maria Silva
Maria G. Silva
Maria McDuffy
Getúlio Dornelles Vargas
Maria das Flores
John Smith
John D'Largy
John Doe-Smith
John Doe Smith
Hector Sausage-Hausen
Mathias d'Arras
Martin Luther King Jr.
Ai Wong
Chao Chang
Alzbeta Bara
Marcos Assunção
Maria da Silva e Silva
Juscelino Kubitschek de Oliveira
Maria da Costa e Silva
Samuel Eto'o
María Antonieta de las Nieves
Eugène
Antòny de Homé April
àntony de Home ùpril
Antony de Home Aprìl
Pierre de l'Estache
Pierre de L'Estoile
Akihito
Nadine Schröder
Anna A. Møller
D. Pedro I
Pope Benedict XVI
Marsibil Ragnarsdóttir
Natanaël Morel
Isaac De la Croix
Jean-Michel Bozonnet
Qutaibah Mu'tazz Abadi
Rushd Jawna' Kassab
Khaldun Abdul-Qahhar Sabbag
'Awad Bashshar Asker
Al B. Zellweger
Gunnleif Snæ-Ulfsson
Käre Toresson
Sorli Ærnmundsson
Arnkel Øystæinsson
Ástríður Dórey
Åsmund Kåresson
Yahatti-Il
Ipqu-Annunitum
Nabu-zar-adan
Eskopas Cañaverri
Botolph of Langchester
Aelfhun the Cantrell
Fraco di Natale
Fraco Di Natale
Iván de Luca
Iván De Luca
Man'nah
Atabala Aüamusalü
Ramiz Ağasəfalu
Dadaş Aghakhanov
Fÿrxad Mübarizlı
Vaclaǔ Šupa
Yakiv Volacič
Flor Van Vaerenbergh
Flor van Vaerenbergh
Edwin van der Sar
Husein Ekmečić
Álvaro Guimarães Alencar
Phone U Yaza Arkar
Seocan MacGhille
X'wat'e Tlekadugovy
Albert-Jan Bootsveld
Maurits-jan Kuipers op den Kollenstaart
Elco ter Hoek
Robbert te Poele
Aad ten Have
'Ehu Kali
Ho'opa'a Loni
Aukanai'i Mahi'ai
Kalman ben Tal El
Żytomir Roszkowski
K'awai
Maol-Moire Mac'IlleBhuidh
Tòmas MacIlleChruim
Aindreas MacIllEathain
Eanruig MacGilleBhreac
Peadar MacGilleDhonaghart
Maolmhuire MacGill-Eain
Eanruig MacGilleBhreac
Wim van 't Plasman
Explanation
I used a DEFINE block to create named definitions. You can look at each definition to see how it works. In general, I use \p{…}, where … names a Unicode character class (e.g. \p{L} is any letter from any language). Unicode property classes and DEFINE blocks are not available in every regex flavour, but where they are, they allow the regex to be much simpler, which is why I used them.
If you need anything else explained, don't hesitate to ask me and I'll do my best, but regex101 should be able to explain anything you're wondering about regex.
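For flavours without \p{L}, the same idea can be approximated in code. The sketch below uses Python's standard `unicodedata` module as a stand-in for \p{L} (Python's built-in `re` module has no \p{…} support); the helper names are made up for this example. It shows that the names the original character class missed contain only letters and spaces.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # Rough analogue of \p{L}: any Unicode letter category (Lu, Ll, Lt, Lm, Lo).
    return unicodedata.category(ch).startswith("L")


def non_letters(name: str) -> list:
    # Every character in the name that is not a Unicode letter.
    return [ch for ch in name if not is_letter(ch)]


for name in ["Urxan Əbűlhəsənzadə", "İsmət Jafarov", "Ċikku Paris", "John W8"]:
    print(name, non_letters(name))
```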
Related
How do I extract surnames with spaces in them as 'one' name/'en bloc'?
Could anyone please advise on a way to extract surnames that have spaces in them, as a single block of names? I have names in a dataset that look like this:
clear
input str40 name
"R. P. de la Espriella Guerrero"
"J. de Carvalho Ponce"
"E. De Freitas Drumond"
"R. de la Fuente and M. E. Medina-Mora"
"C. Van Heyningen and I. D. Watson"
"A. Z. van de Wiel and D. W. de Lange"
end
I only want the first surname (so only the first author, excluding other authors), but I want those names that have spaces to be extracted 'en bloc'. So, ultimately resulting in a dataset as follows, for instance:
clear
input str40 name
"de la Espriella Guerrero"
"de Carvalho Ponce"
"De Freitas Drumond"
"de la Fuente"
"Van Heyningen"
"van de Wiel"
end
I'd be grateful for any help.
Here is code that implements the two rules given in my comment above. It assumes the version of Stata used supports the unicode character string functions.
clear
input str40 name
"R. P. de la Espriella Guerrero"
"J. de Carvalho Ponce"
"E. De Freitas Drumond"
"R. de la Fuente and M. E. Medina-Mora"
"C. Van Heyningen and I. D. Watson"
"A. Z. van de Wiel and D. W. de Lange"
end
generate surname = name
replace surname = usubstr(surname,1,ustrpos(surname+" and "," and ")-1)
list, clean noobs
replace surname = usubstr(surname,ustrrpos(surname,". ")+1,.)
list, clean noobs
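For readers without Stata, the two steps in the code above (keep the text before the first " and ", then keep what follows the last ". ") can be sketched in Python. The function name `first_surname` is made up for this illustration, and the final `strip()` is an addition of this sketch.

```python
def first_surname(name: str) -> str:
    # Step 1: keep only the first author (text before the first " and ").
    first_author = name.split(" and ", 1)[0]
    # Step 2: keep what follows the last ". " (the end of the initials).
    # rpartition returns the whole string as the last element if ". " is absent.
    _, _, surname = first_author.rpartition(". ")
    return surname.strip()


names = [
    "R. P. de la Espriella Guerrero",
    "R. de la Fuente and M. E. Medina-Mora",
    "A. Z. van de Wiel and D. W. de Lange",
]
for name in names:
    print(first_surname(name))
```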
Regular expression in distinct texts
I have two text patterns that I want to match with a regular expression. In the first one, I can capture the name with this regex:
referente[,;]*\s\S\s(.+)\.\sOnde
O COORDENADOR-GERAL DE GESTÃO DE PESSOAS DO MINISTÉRIO DOS TRANSPORTES, PORTOS E AVIAÇÃO CIVIL, no uso das atribuições que lhe foram subdelegadas pela Portaria/SAAD nº. 202, art. 1°, inciso VII, de 08 de outubro de 2010, publicada no Diário Oficial da União, de 11 de outubro de 2010, resolve: Retificar a Portaria COGEP-MT nº 3394, de 30 de novembro de 2016, publicada no Diário Oficial da União, Seção 2, página 55, de 13 de dezembro de 2016, referente à MARIA ALIXANDRINA COSTA REIS. Onde se lê "MARIA AUXILIADORA COSTA REIS"; Leia-se "MARIA ALIXANDRINA COSTA REIS.(Processo SEI: 50000.124582/2016-62) BA.
I need to capture the name in another pattern:
O COORDENADOR-GERAL DE GESTÃO DE PESSOAS DO MINISTÉRIO DOS TRANSPORTES, PORTOS E AVIAÇÃO CIVIL, no uso das atribuições que lhe foram subdelegadas pela Portaria/SAAD nº. 202, art.1º, inciso VII, de 08 de outubro de 2010, publicada no Diário Oficial da União, de 11 de outubro de 2010, resolve: Conceder Pensão Temporária, nos termos do artigo 215 e 217, inciso II, alínea "a" da Lei nº 8.112/1990, à ELIANE RIBEIRO MENESES, filha inválida do ex-servidor ASTOLFO MENEZES, matrícula SIAPE nº. 0783182, do Quadro Permanente deste Ministério, falecido na inatividade em 05 de agosto de 1997, cuja cota parte equivale a 100% (cem por cento) do valor correspondente à remuneração decorrente do cargo de Artífice de Mecânica (NI), Classe "A", Padrão "III", com vigência partir do momento da Publicação da Portaria de Concessão e efeitos financeiros a partir de 30 de maio de 2015, data do falecimento da viúva. (Processo SEI nº 50000.019342/2016-47) - MG.
I need to capture the bold word too, with the same regex. How can I modify this regex?
I am not sure why you want to match/embolden the trailing punctuation and Onde/Where substring. I would recommend this pattern to optionally match referente then the à then the all-caps words to follow. There are no capture groups, just replace the fullstring with the emboldened fullstring. (I don't use nsregularexpression, so let me know if something is simply not right.)
/(?:referente )?à [A-Z]+(?: [A-Z]+)*/u
The unicode flag is to accommodate the accented letters that will be encountered.
Pattern Demo
p.s. In your "solution" you are incorporating [,;]* but that doesn't get represented in your sample strings, so I left it out. Reducing the total number of parenthetical groups delivers improved pattern efficiency -- that is why I use just two non-capturing groups.
You can use the following regex to match the 2 bold parts of your examples:
(à\sELIANE\s\w+\s\w+SES)|(referente[,;]*\s\S\s.+\.\sOnde)
Good luck!
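To try the (?:referente )?à [A-Z]+(?: [A-Z]+)* pattern outside of NSRegularExpression, here is a small Python sketch run against shortened excerpts of the two sample texts. The helper `find_name` is hypothetical and strips the leading "à " after matching. One caveat worth checking on real data: [A-Z] does not include accented capitals, so a name such as JOSÉ would be cut short.

```python
import re

# The pattern from the answer, unchanged; Python 3 strings are Unicode by default.
NAME_PATTERN = re.compile(r"(?:referente )?à [A-Z]+(?: [A-Z]+)*")


def find_name(text: str):
    """Return the all-caps name that follows "à ", or None if nothing matches."""
    match = NAME_PATTERN.search(text)
    if match is None:
        return None
    # Drop the optional "referente " and the "à " prefix from the full match.
    return match.group(0).split("à ", 1)[1]


first = "de 13 de dezembro de 2016, referente à MARIA ALIXANDRINA COSTA REIS. Onde se lê"
second = "da Lei nº 8.112/1990, à ELIANE RIBEIRO MENESES, filha inválida do ex-servidor"
print(find_name(first))
print(find_name(second))
```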
My solution is ([,;]*)*\sà\s((\w+\s)+\w+)[\.,]
Silver Searcher: how to return filename without path
I am using Silver Searcher to find information in my Calibre library, which by default uses long directory names and filenames that are a bit redundant. Example search:
chris#ODYSSEUS:~/db/ebooks/paper-art$ ag --markdown angel
Christophe Boudias (Editor)/Origami Bogota 2014 (Paginas de Origami) (2)/Origami Bogota 2014 (Paginas de Origami) - Christophe Boudias (Editor).md
8:* [16] Angel (???)
9:* [22] Christmas Angel (Uniya Filonova)
Juan Fernando Aguilera (Editor)/Origami Bogota 2013 (Paginas de Origami) (1)/Origami Bogota 2013 (Paginas de Origami) - Juan Fernando Aguilera (Editor).md
29:* [96] Inspired Origami Angel (K. Dianne Stephens)
31:* [100] Angel for Eric Joisel (Kay Kraschewski)
I would like to return just the filename where the whole path is shown in the example. How can I do that?
The -l (lowercase L) flag returns only the names of the files with matches instead of the matching lines, e.g.
$ ag -l "angel"
You can pipe the result into sed to remove everything up to and including the final /, which leaves the filename:
ag -l angel | sed 's=.*/=='
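If the file list is post-processed in a script rather than with sed, the same "drop everything up to the last /" step can be sketched in Python. The function name is made up for this example, and it assumes one path per line, as printed by ag -l.

```python
def strip_directories(path_line: str) -> str:
    # Same effect as sed 's=.*/==': drop everything up to and including the last "/".
    return path_line.rsplit("/", 1)[-1]


path = (
    "Christophe Boudias (Editor)/Origami Bogota 2014 (Paginas de Origami) (2)/"
    "Origami Bogota 2014 (Paginas de Origami) - Christophe Boudias (Editor).md"
)
print(strip_directories(path))
```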
How to camelcase only a specific part of a string?
I have a CSV file like this:
"","LESCHELLES","","LESCHELLES"
"","SAINTE CROIX DE VERDON","","SAINTE CROIX DE VERDON"
"","SERRE CHEVALIER","","SERRE CHEVALIER"
"","SAINT JUST D'ARDECHE","","SAINT JUST D'ARDECHE"
"","NEUVILLE SUR VANNES","","NEUVILLE SUR VANNES"
"","ESCUEILLENS ET SAINT JUST","","ESCUEILLENS ET SAINT JUST"
"","PAS DES LANCIERS","","PAS DES LANCIERS"
"","PLAN DE CAMPAGNE","","PLAN DE CAMPAGNE"
And I'd like to convert it this way:
"","Leschelles","","LESCHELLES"
"","Sainte Croix De Verdon","","SAINTE CROIX DE VERDON","STE CROIX DE VERDON","93"
"","Serre Chevalier","","SERRE CHEVALIER","SERRE CHEVALIER","93"
"","Saint Just D'Ardeche","","SAINT JUST D'ARDECHE"
"","Neuville Sur Vannes","","NEUVILLE SUR VANNES"
"","Escueillens Et Saint Just","","ESCUEILLENS ET SAINT JUST","ESCUEILLENS ET ST JUST","91"
"","Luc","","LUC"
"","Pas Des Lanciers","","PAS DES LANCIERS","PAS DES LANCIERS","93"
"","Plan De Campagne","","PLAN DE CAMPAGNE","PLAN DE CAMPAGNE","93"
This would be nice. And better: lowercase all "whole" words like de, d', et, sur and des. This would give:
"","Leschelles","","LESCHELLES"
"","Sainte Croix de Verdon","","SAINTE CROIX DE VERDON","STE CROIX DE VERDON","93"
"","Serre Chevalier","","SERRE CHEVALIER","SERRE CHEVALIER","93"
"","Saint Just d'Ardeche","","SAINT JUST D'ARDECHE"
"","Neuville sur Vannes","","NEUVILLE SUR VANNES"
"","Escueillens et Saint Just","","ESCUEILLENS ET SAINT JUST","ESCUEILLENS ET ST JUST","91"
"","Luc","","LUC"
"","Pas des Lanciers","","PAS DES LANCIERS","PAS DES LANCIERS","93"
"","Plan de Campagne","","PLAN DE CAMPAGNE","PLAN DE CAMPAGNE","93"
Python has title():
Return a titlecased version of the string where words start with an uppercase character and the remaining characters are lowercase. The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result:
"they're bill's friends from the UK".title()
"They'Re Bill'S Friends From The Uk"
A workaround for apostrophes can be constructed using regular expressions:
import re

def titlecase(s):
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(),
                  s)

titlecase("they're bill's friends.")
"They're Bill's Friends."
Update: here's a solution for the French problem (written here with Python 3 syntax):
import re, sys

def titlecase(s):
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0)[0].upper() + mo.group(0)[1:].lower(),
                  s)

def french_parse(s):
    p = re.compile(
        r"( de la | sur | sous | la | de | les | du | le | au | aux | en | des | et )|(( d'| l')([a-z]+))",
        re.IGNORECASE)

    def fix(mo):
        word = mo.group()
        i = word.find("'")
        if i > 0:
            # Lowercase the elided article (d', l') and title-case what follows.
            return word[:i+1].lower() + titlecase(word[i+1:])
        # Lowercase the linking word (its first character is the leading space).
        return word[0].upper() + word[1:].lower()

    return p.sub(fix, s)

for line in sys.stdin:
    s = line[20:len(line)-1]
    p = s.find('"')
    t = s[:p]
    # Only print the names that were modified:
    if french_parse(titlecase(t)) != titlecase(t):
        print('"' + french_parse(titlecase(t)) + '"')
Just launch it like this:
python thepythonscript.py < file.csv
Then the output will be:
"Grenand les Sombernon"
"Touville sur Montfort"
"Fontenay en Vexin"
"Durfort Saint Martin de Sossenac"
"Monclar d'Armagnac"
"Ports sur Vienne"
"Saint Barthelemy de Beaurepaire"
"Saint Bernard du Touvet"
"Rosoy le Vieil"
While you may be able to pull this off with some vim regex magic, I think it'll be easier if you solve the problem in your favorite scripting language, and pipe selected text through that from vim using the ! command. Here's an (untested) example in PHP:
#!/usr/bin/env php
<?php
$specialWords = array('de', 'd\'', 'et', 'du', /* etc. */ );
foreach (file('php://stdin') as $line) {
    // Lowercase first so that ucwords() also changes all-caps input.
    $line = ucwords(strtolower($line));
    foreach ($specialWords as $w) {
        // Put the special words back in lowercase.
        $line = preg_replace("/\\b$w\\b/i", $w, $line);
    }
    echo $line;
}
Make that script executable and store it somewhere on your PATH; then from vim, select some text and use :'<,'>! yourscript.php to convert (or just :%! yourscript.php for the whole buffer).
The csv.vim ftplugin helps with working in CSV files. Though it does not offer a "substitute in column N" function directly, it may get you near that. At least you can arrange the columns into neat blocks, and then apply a simple regexp or visual blockwise selection to it. But I second that using a different toolchain that is more suited to manipulating CSV files may be preferable over doing this completely in Vim. It also depends on whether it's a one-off task or something you do frequently.
Here is a one-liner vim command:
%s/"[^"]*",\zs\("[^"]*"\)/\=substitute(substitute(submatch(0), '\<\(\a\)\(\a*\)\>', '\u\1\L\2', 'g'), '\c\<\(de\|d\|l\|sur\|le\|la\|en\|et\)\>', '\L&', 'g')
I expect there to be no double quotes inside the first two fields. The idea behind this solution is to rely on :h :s\= to execute a series of functions on the second field once it is found. The series of functions is: first change each word to TitleCase, then put all linking words (liants) in lowercase.
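The same two-step idea (Title Case every word, then lowercase the linking words, on the second field only) can also be sketched in Python with the standard `csv` module. The function names and the exact list of linking words are assumptions for this example. Like the vim command, it would also lowercase a linking word at the start of a name, and it only handles unaccented A-Z letters, as in the sample data.

```python
import csv
import io
import re

# Linking words to keep in lowercase (an assumed list, modelled on the vim command).
LINKING = re.compile(r"\b(de|des|du|d|l|sur|le|la|en|et)\b", re.IGNORECASE)


def title_with_linking_words(text: str) -> str:
    # Step 1: Title Case every run of ASCII letters.
    titled = re.sub(r"[A-Za-z]+", lambda m: m.group(0).capitalize(), text)
    # Step 2: put the linking words back in lowercase.
    return LINKING.sub(lambda m: m.group(0).lower(), titled)


def convert_line(line: str) -> str:
    # Parse one CSV line, rewrite only the second field, and quote every field again.
    row = next(csv.reader([line]))
    row[1] = title_with_linking_words(row[1])
    out = io.StringIO()
    csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(row)
    return out.getvalue().rstrip("\n")


print(convert_line('"","SAINT JUST D\'ARDECHE","","SAINT JUST D\'ARDECHE"'))
print(convert_line('"","PAS DES LANCIERS","","PAS DES LANCIERS"'))
```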
Csv list of country codes/names? [closed]
Where can I get a list of easily parseable country codes and names? Note: 2 letter codes wanted. Examples:
au,Australia
uk,United Kingdom
us,United States
etc
The ISO 3166-1 list of country codes is what I was looking for. For more information you can also find general information about country codes on Wikipedia, and a possibly more up-to-date ISO 3166-1 list as well.
Edit: Posting it here:
Country Name;ISO 3166-1-alpha-2 code
AFGHANISTAN;AF
ÅLAND ISLANDS;AX
ALBANIA;AL
ALGERIA;DZ
AMERICAN SAMOA;AS
ANDORRA;AD
ANGOLA;AO
ANGUILLA;AI
ANTARCTICA;AQ
ANTIGUA AND BARBUDA;AG
ARGENTINA;AR
ARMENIA;AM
ARUBA;AW
AUSTRALIA;AU
AUSTRIA;AT
AZERBAIJAN;AZ
BAHAMAS;BS
BAHRAIN;BH
BANGLADESH;BD
BARBADOS;BB
BELARUS;BY
BELGIUM;BE
BELIZE;BZ
BENIN;BJ
BERMUDA;BM
BHUTAN;BT
BOLIVIA, PLURINATIONAL STATE OF;BO
BONAIRE, SINT EUSTATIUS AND SABA;BQ
BOSNIA AND HERZEGOVINA;BA
BOTSWANA;BW
BOUVET ISLAND;BV
BRAZIL;BR
BRITISH INDIAN OCEAN TERRITORY;IO
BRUNEI DARUSSALAM;BN
BULGARIA;BG
BURKINA FASO;BF
BURUNDI;BI
CAMBODIA;KH
CAMEROON;CM
CANADA;CA
CAPE VERDE;CV
CAYMAN ISLANDS;KY
CENTRAL AFRICAN REPUBLIC;CF
CHAD;TD
CHILE;CL
CHINA;CN
CHRISTMAS ISLAND;CX
COCOS (KEELING) ISLANDS;CC
COLOMBIA;CO
COMOROS;KM
CONGO;CG
CONGO, THE DEMOCRATIC REPUBLIC OF THE;CD
COOK ISLANDS;CK
COSTA RICA;CR
CÔTE D'IVOIRE;CI
CROATIA;HR
CUBA;CU
CURAÇAO;CW
CYPRUS;CY
CZECH REPUBLIC;CZ
DENMARK;DK
DJIBOUTI;DJ
DOMINICA;DM
DOMINICAN REPUBLIC;DO
ECUADOR;EC
EGYPT;EG
EL SALVADOR;SV
EQUATORIAL GUINEA;GQ
ERITREA;ER
ESTONIA;EE
ETHIOPIA;ET
FALKLAND ISLANDS (MALVINAS);FK
FAROE ISLANDS;FO
FIJI;FJ
FINLAND;FI
FRANCE;FR
FRENCH GUIANA;GF
FRENCH POLYNESIA;PF
FRENCH SOUTHERN TERRITORIES;TF
GABON;GA
GAMBIA;GM
GEORGIA;GE
GERMANY;DE
GHANA;GH
GIBRALTAR;GI
GREECE;GR
GREENLAND;GL
GRENADA;GD
GUADELOUPE;GP
GUAM;GU
GUATEMALA;GT
GUERNSEY;GG
GUINEA;GN
GUINEA-BISSAU;GW
GUYANA;GY
HAITI;HT
HEARD ISLAND AND MCDONALD ISLANDS;HM
HOLY SEE (VATICAN CITY STATE);VA
HONDURAS;HN
HONG KONG;HK
HUNGARY;HU
ICELAND;IS
INDIA;IN
INDONESIA;ID
IRAN, ISLAMIC REPUBLIC OF;IR
IRAQ;IQ
IRELAND;IE
ISLE OF MAN;IM
ISRAEL;IL
ITALY;IT
JAMAICA;JM
JAPAN;JP
JERSEY;JE
JORDAN;JO
KAZAKHSTAN;KZ
KENYA;KE
KIRIBATI;KI
KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF;KP
KOREA, REPUBLIC OF;KR
KUWAIT;KW
KYRGYZSTAN;KG
LAO PEOPLE'S DEMOCRATIC REPUBLIC;LA
LATVIA;LV
LEBANON;LB
LESOTHO;LS
LIBERIA;LR
LIBYA;LY
LIECHTENSTEIN;LI
LITHUANIA;LT
LUXEMBOURG;LU
MACAO;MO
MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF;MK
MADAGASCAR;MG
MALAWI;MW
MALAYSIA;MY
MALDIVES;MV
MALI;ML
MALTA;MT
MARSHALL ISLANDS;MH
MARTINIQUE;MQ
MAURITANIA;MR
MAURITIUS;MU
MAYOTTE;YT
MEXICO;MX
MICRONESIA, FEDERATED STATES OF;FM
MOLDOVA, REPUBLIC OF;MD
MONACO;MC
MONGOLIA;MN
MONTENEGRO;ME
MONTSERRAT;MS
MOROCCO;MA
MOZAMBIQUE;MZ
MYANMAR;MM
NAMIBIA;NA
NAURU;NR
NEPAL;NP
NETHERLANDS;NL
NEW CALEDONIA;NC
NEW ZEALAND;NZ
NICARAGUA;NI
NIGER;NE
NIGERIA;NG
NIUE;NU
NORFOLK ISLAND;NF
NORTHERN MARIANA ISLANDS;MP
NORWAY;NO
OMAN;OM
PAKISTAN;PK
PALAU;PW
PALESTINE, STATE OF;PS
PANAMA;PA
PAPUA NEW GUINEA;PG
PARAGUAY;PY
PERU;PE
PHILIPPINES;PH
PITCAIRN;PN
POLAND;PL
PORTUGAL;PT
PUERTO RICO;PR
QATAR;QA
RÉUNION;RE
ROMANIA;RO
RUSSIAN FEDERATION;RU
RWANDA;RW
SAINT BARTHÉLEMY;BL
SAINT HELENA, ASCENSION AND TRISTAN DA CUNHA;SH
SAINT KITTS AND NEVIS;KN
SAINT LUCIA;LC
SAINT MARTIN (FRENCH PART);MF
SAINT PIERRE AND MIQUELON;PM
SAINT VINCENT AND THE GRENADINES;VC
SAMOA;WS
SAN MARINO;SM
SAO TOME AND PRINCIPE;ST
SAUDI ARABIA;SA
SENEGAL;SN
SERBIA;RS
SEYCHELLES;SC
SIERRA LEONE;SL
SINGAPORE;SG
SINT MAARTEN (DUTCH PART);SX
SLOVAKIA;SK
SLOVENIA;SI
SOLOMON ISLANDS;SB
SOMALIA;SO
SOUTH AFRICA;ZA
SOUTH GEORGIA AND THE SOUTH SANDWICH ISLANDS;GS
SOUTH SUDAN;SS
SPAIN;ES
SRI LANKA;LK
SUDAN;SD
SURINAME;SR
SVALBARD AND JAN MAYEN;SJ
SWAZILAND;SZ
SWEDEN;SE
SWITZERLAND;CH
SYRIAN ARAB REPUBLIC;SY
TAIWAN, PROVINCE OF CHINA;TW
TAJIKISTAN;TJ
TANZANIA, UNITED REPUBLIC OF;TZ
THAILAND;TH
TIMOR-LESTE;TL
TOGO;TG
TOKELAU;TK
TONGA;TO
TRINIDAD AND TOBAGO;TT
TUNISIA;TN
TURKEY;TR
TURKMENISTAN;TM
TURKS AND CAICOS ISLANDS;TC
TUVALU;TV
UGANDA;UG
UKRAINE;UA
UNITED ARAB EMIRATES;AE
UNITED KINGDOM;GB
UNITED STATES;US
UNITED STATES MINOR OUTLYING ISLANDS;UM
URUGUAY;UY
UZBEKISTAN;UZ
VANUATU;VU
VENEZUELA, BOLIVARIAN REPUBLIC OF;VE
VIETNAM;VN
VIRGIN ISLANDS, BRITISH;VG
VIRGIN ISLANDS, U.S.;VI
WALLIS AND FUTUNA;WF
WESTERN SAHARA;EH
YEMEN;YE
ZAMBIA;ZM
ZIMBABWE;ZW
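Since the list above uses a semicolon as the separator (several country names contain commas), a minimal Python sketch for loading it into a lookup table could look like this. The function name `load_codes` and the short `SAMPLE` excerpt are made up for the example, and the codes are lowercased only to match the au/uk/us style in the question.

```python
import csv
import io

# A short excerpt in the same "name;code" layout, with the header line first.
SAMPLE = """Country Name;ISO 3166-1-alpha-2 code
AFGHANISTAN;AF
ÅLAND ISLANDS;AX
BOLIVIA, PLURINATIONAL STATE OF;BO
"""


def load_codes(text: str) -> dict:
    """Map lowercase two-letter codes to country names."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader)  # skip the header line
    codes = {}
    for row in reader:
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        name, code = row
        codes[code.strip().lower()] = name.strip()
    return codes


codes = load_codes(SAMPLE)
print(codes["bo"])
```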
Here's a maintained repo of country codes with friendly country names in multiple languages and in many formats, including text, CSV, JSON, YAML, XML, SQL, and others: https://github.com/umpirsky/country-list The English CSV dump from the above repo: https://github.com/umpirsky/country-list/blob/master/data/en_US/country.csv
In case it can help anyone, I just took the list of ISO 3166 country codes and names in English and French and put them together in CSV and SQL format: http://blog.plsoucy.com/2012/04/iso-3166-country-code-list-csv-sql/ While doing that, I fixed the capitalization so names aren't all uppercase, and renamed a few countries to the name that is used on Wikipedia, so "MACEDONIA, THE FORMER YUGOSLAV REPUBLIC OF" became "Macedonia" for example. That should allow for shorter lists. Feel free to use these if you want.
Take a look at ISO: http://www.iso.org/iso/home/standards/country_codes/country_names_and_code_elements.htm It is not CSV, but you can easily copy and paste it into a file, or you can buy the file from ISO.