Where to get an easily parseable list of country names and their calling codes - country-codes

Where can I find an easily parseable list of the names of all countries and their calling codes (i.e. their mobile country codes)?
For example:
United States of America,1
United Kingdom,44
Sultanate of Oman,968 etc.

Related

What is the regular expression that only extracts the URL address?

The sentence below contains both URLs and email addresses. I want a regular expression that extracts only the URLs. The expected results are as follows:
www.united.com
https://www.bbc.com/sport/football/64698988
https://linuxpip.org
www.gggggg.ac.us
github.com
What should I do?
Example sentence:
"Wembley, Wembley, we're the famous Man United and we're off to Wembley," was the chant from the home supporters against Leicester.
United rode their luck, needing David de Gea two make two world-class saves to keep them in the contest, but two goals from Marcus rash@icloud.co.kr Rashford and one from Jadon Sancho helped them to a comfortable victory. gsgad@gmail.com England international Rashford is in the form of his life, taking his tally to 24 goals for the campaign, but Bruno Fernandes' impressive www.united.com performances have gone under the radar, https://www.bbc.com/sport/football/64698988 with the Portuguese playmaker providing two more assists on Sunday.
Free-flowing up front but solid in defence, https://linuxpip.org United's clean sheet against Leicester was their 10th in the league this season, two more than the entirety of the last campaign.
Ten Hag's men were www.gggggg.ac.us without midfield maestro report@abcdefcaf.net Casemiro, and it showed for large parts of the first half when they failed to gain control github.com in the middle of the park, but the Brazil international's return from suspension will provide a boost against the Magpies.
The regular expression below matches both the URLs and the email addresses:
(https?:\/\/)?(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)
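One possible way to keep the URLs and drop the email addresses is to test each whitespace-separated token as a whole instead of searching inside the text. The sketch below is only an illustration, not a complete URL validator: the names `URL_RE` and `extract_urls` are made up for this example, and the pattern assumes a URL is an optional `http(s)://` prefix, a dotted host name, and an optional path. Because the host part allows only letters, digits, hyphens, and dots, a token that contains an email separator cannot match in full.

```python
import re

# Hypothetical pattern: optional scheme, dotted host name, optional path.
URL_RE = re.compile(r"(?:https?://)?(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,6}(?:/\S*)?")

def extract_urls(text):
    """Return the tokens of `text` that look like URLs, in order of appearance."""
    urls = []
    for token in text.split():
        # Drop quotes and sentence punctuation stuck to the token.
        candidate = token.strip("\"'.,;:!?()")
        # fullmatch rejects tokens with characters outside the pattern,
        # which is what excludes email addresses.
        if URL_RE.fullmatch(candidate):
            urls.append(candidate)
    return urls

sample = (
    "goals from Marcus rash@icloud.co.kr Rashford, impressive www.united.com "
    "performances, https://www.bbc.com/sport/football/64698988 with two assists. "
    "Control github.com in the middle of the park."
)
print(extract_urls(sample))
```

Note that this will also accept anything shaped like a host name (for example a file name such as `notes.txt`), so the pattern may need tightening for other input.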

Algorithm to rank the simplicity of a random name

I have been looking for a name for a new project. I want the name to have available domains and social media handles. For months, all those I can think of are taken.
So I generated a list of names with at least a consonant and a vowel and checked if the domains are available (which is very fast). I have about a million possible names.
I would like to sort them by some rank of simplicity. "Aaazq" would be close to the bottom, "Cawel" would be close to the top. I thought of the CVC structure (Consonant-Vowel-Consonant) and wonder if some more sophisticated algorithm exists. I searched for "sonority" but it has a different meaning in linguistics.
How can I automatically rank the simplicity of a random name?
I assume you would judge simplicity as compared to a target language, say English. Something that is 'simple' in English might not be 'simple' in German or Korean, as these languages have very different phonological structures.
I would recommend the following:
1) Collect some data for the language you are targeting. Just get some novels from Project Gutenberg, for example, or newspaper articles. Whatever you can easily get hold of.
2) Generate n-grams from this: all sequences of two (bigrams) or three (trigrams) letters. Turn this into a frequency list, so that common n-grams are at the top of the list with a high frequency.
3) Turn your suggested name into n-grams. Look up how many times each n-gram occurs in your frequency list, and take the average or median of the result.
Your examples would score as follows:
aa aa az zq: "aa" is rare ("aardvark"), "az" is a bit more common ("glaze", "raze"), and "zq" would not exist. So, not a very high score.
ca aw we el: all of these are fairly common in English words, so a reasonably high score.
You could also add a dummy # at the beginning and the end, so in your first example you'd get #a, which is fine, as many English words start with "a", but the final q# bombs out, as there are only a few words such as "Iraq" which end in a "q".
You can obviously do the same for other languages.
Also, you can reverse the process in a way, and pick random n-grams from your frequency list to generate names: by picking higher-frequency n-grams you will make sure the name is a good match with the phonological structure of your target language.
Note for pedants: I use phonological structure, but it's really its representation in the spelling system that we're dealing with here.
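The n-gram scoring described above can be sketched roughly as follows. This is a minimal illustration under some assumptions: the function names are made up, the tiny `corpus` list stands in for real text (novels, newspaper articles), and the score is the plain average of bigram counts with `#` as the boundary marker.

```python
from collections import Counter
from statistics import mean

def ngrams(word, n=2, pad="#"):
    """Split a word into letter n-grams, with a dummy marker at both ends."""
    padded = f"{pad}{word.lower()}{pad}"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def build_frequencies(words, n=2):
    """Count how often each n-gram occurs across the corpus words."""
    counts = Counter()
    for word in words:
        counts.update(ngrams(word, n))
    return counts

def simplicity(name, freqs, n=2):
    """Average corpus frequency of the name's n-grams (unseen n-grams count as 0)."""
    return mean(freqs[gram] for gram in ngrams(name, n))

# Placeholder corpus; in practice use a large amount of real text.
corpus = ["call", "camel", "law", "awe", "well", "wet",
          "towel", "bell", "glaze", "cat", "saw", "jewel"]
freqs = build_frequencies(corpus)

names = ["Aaazq", "Cawel"]
ranked = sorted(names, key=lambda name: simplicity(name, freqs), reverse=True)
print(ranked)
```

With a realistic corpus it may be worth using relative frequencies or log counts, or the median instead of the mean, so that one very common n-gram does not dominate the score.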

How can we set chart settings for number abbreviations of Data Studio Report?

Date and number abbreviations in a Data Studio report look different for different users. In English, the abbreviation of 1,000 is K and the abbreviation of 1,000,000,000 is B. But in Turkish, the abbreviation of 1,000 is B.
This causes confusion between a thousand and a billion. When we compared users who saw different report views, we found that their Google Account language settings were different. Is it possible to set these abbreviations so that they do not change with the Google Account language settings?
The number format is based on the language of the Google Account that the viewer is logged in with. To display the report in a specific language (e.g. English), add the following suffix to the report URL:
?hl=en
More Language Codes
1) Standard URL
https://datastudio.google.com/reporting/48900f64-2068-4331-b707-f82df114542a
2) English (US)
https://datastudio.google.com/reporting/48900f64-2068-4331-b707-f82df114542a?hl=en
3) Spanish
https://datastudio.google.com/reporting/48900f64-2068-4331-b707-f82df114542a?hl=es
Source: Google Forum Post (Nimantha 13 Jan 2020)
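If the links are generated programmatically, the suffix can be added with a small helper. This is only a sketch: `with_language` is a hypothetical name, and it assumes nothing beyond the `hl` query parameter shown above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def with_language(report_url, lang):
    """Return report_url with its hl query parameter set to lang."""
    parts = urlsplit(report_url)
    # Keep any other query parameters, but drop an existing hl value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "hl"]
    query.append(("hl", lang))
    return urlunsplit(parts._replace(query=urlencode(query)))

base = "https://datastudio.google.com/reporting/48900f64-2068-4331-b707-f82df114542a"
print(with_language(base, "en"))
```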

Text Analysis Tools

I am currently building a data table in Base SAS and using the INDEX function to flag certain company names embedded in a paragraph of text in a column. If the company name exists, I flag it with a one. When I looked into the paragraphs in more detail, I found that this simple approach doesn't work. Take the example below:
"John Smith advised Coca-Cola on its merger with Pepsi". I'm searching for both Coca-Cola and Pepsi but only want to flag Coca-Cola in this example, as John Smith "advised" them. I don't want both Coca-Cola and Pepsi flagged with a "1". I understand that I can write code that takes the words after certain anchor words such as "advised" or "represented", which does work. But what happens if a record simply lists all the companies that someone has advised, without using any anchor words to identify them? Are there any tools out there that can do this automatically with AI?
Thanks
Chris
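For reference, the anchor-word approach mentioned in the question might look roughly like the sketch below, written in Python rather than SAS. The names `ANCHORS` and `flag_companies` are made up for this illustration, and it assumes the company name directly follows the anchor word. It does not address the harder case of records with no anchor words.

```python
import re

# Anchor words taken from the question; extend as needed.
ANCHORS = ("advised", "represented")

def flag_companies(text, companies):
    """Flag a company with 1 only if it directly follows an anchor word."""
    anchor_group = "|".join(ANCHORS)
    flags = {}
    for company in companies:
        pattern = rf"\b(?:{anchor_group})\s+{re.escape(company)}\b"
        found = re.search(pattern, text, flags=re.IGNORECASE)
        flags[company] = int(found is not None)
    return flags

sentence = "John Smith advised Coca-Cola on its merger with Pepsi"
print(flag_companies(sentence, ["Coca-Cola", "Pepsi"]))
```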

Extract substring starting with the comma, moving right, until you hit a space

I have a series of addresses in one column. I am trying to extract each component (Street Address, City, State, and Zip Code) into separate columns.
I was able to extract the zip codes rather easily with `=RIGHT(A1, 5)`. However, I am having a hard time extracting the city. All of the rows follow the same format below. My idea is to find the comma and extract the substring from right to left until reaching a space. How do I do this?
Here is an example of what the data looks like:
2209 Fake Street Arlington, TX 76015
3100 Fake Street Bedford, TX 76021
3558 Fake Street Flower Mound, TX 75028
4230 Fake Street Fort Worth, TX 76119
2662 Fake Street Bedford, TX 76021
That will only work for cities whose names are a single word. Looking for the street type (Road, Street, etc.) to mark the start of the city name won't work either when the address has no street type. If your layout has no unique separator between street and city, you'll probably need a zip code lookup table to get the city.
In addition, you will need code to resolve issues where two different cities have the same zip code. For example, in Texas, 76119 could refer to FORT WORTH, FOREST HILL, or FT WORTH. And you may need code to handle misspellings.
It might be that these are few enough to allow manual correction.
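A lookup-table approach along these lines could be sketched as follows (in Python, not as a spreadsheet formula). Everything here is illustrative: the `ZIP_TO_CITIES` table is a hand-made placeholder built from the sample rows and the 76119 example above, not real postal data, and `split_address` is a hypothetical helper. It tries each candidate city for the zip code, longest name first, and checks whether the text before the comma ends with it.

```python
import re

# Placeholder lookup table; a real one would come from a postal data source.
ZIP_TO_CITIES = {
    "76015": ["Arlington"],
    "76021": ["Bedford"],
    "75028": ["Flower Mound"],
    "76119": ["Fort Worth", "Forest Hill", "Ft Worth"],
}

# "<street and city>, <2-letter state> <5-digit zip>"
ADDRESS_RE = re.compile(r"^(?P<before>.+),\s*(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})$")

def split_address(address, zip_to_cities):
    """Return (street, city, state, zip), or None if the city cannot be resolved."""
    match = ADDRESS_RE.match(address.strip())
    if match is None:
        return None
    before = match["before"]
    candidates = zip_to_cities.get(match["zip"], [])
    # Try longer city names first so a shorter name cannot shadow a longer one.
    for city in sorted(candidates, key=len, reverse=True):
        if before.lower().endswith(" " + city.lower()):
            street = before[:-len(city)].rstrip()
            return street, city, match["state"], match["zip"]
    return None  # unknown zip or misspelled city: leave for manual correction

print(split_address("4230 Fake Street Fort Worth, TX 76119", ZIP_TO_CITIES))
```

Rows that return `None` are the ones that would need a fuzzy match or manual correction.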