The following program works exactly as intended:
destinations = {}
while True:
    query = input("Tell me where you went: ").strip()
    if not query:
        break
    if query.count(',') != 1:
        print("That's not a legal city, state combination. Please try again.\n")
        continue
    city, country = query.split(',')
    city = city.strip()
    country = country.strip()
    if country not in destinations:
        destinations[country] = [city]
    else:
        destinations[country].append(city)

temp1 = {key: sorted(destinations[key]) for key in destinations.keys()}
temp2 = {key: value for key, value in sorted(temp1.items(), key=lambda x: x[0])}
# temp2 = {}
# for key in sorted(temp1.keys()):
#     temp2[key] = temp1[key]
for key, value in temp2.items():
    print(key)
    for elem in value:
        print('\t' + elem)
Sample:
Tell me where you went: Shanghai, China
Tell me where you went: Boston, USA
Tell me where you went: Beijing, China
Tell me where you went: London, England
Tell me where you went: Phoenix, USA
Tell me where you went: Hunan, China
Tell me where you went: Denver, USA
Tell me where you went: Moscow, USSR
Tell me where you went: Leningrad, USSR
Tell me where you went: San Francisco, USA
Tell me where you went: Indianapolis, USA
Tell me where you went: Jakarta, Phillipines
Tell me where you went:
China
    Beijing
    Hunan
    Shanghai
England
    London
Phillipines
    Jakarta
USA
    Boston
    Denver
    Indianapolis
    Phoenix
    San Francisco
USSR
    Leningrad
    Moscow
There is one aspect I do not like, and that is how it handles duplicate city, country entries.
I am trying to modify my code so that if a city, country pair is entered that already exists then the output would look like this:
China
    Beijing (2)
    Shanghai
England
    London
USA
    Boston
    Chicago (2)
    New York
I thought that a good idea was to change my list of values to a list of lists
like this:
if country not in destinations:
    destinations[country] = [[city]]
else:
    destinations[country].append([city])
Then I thought I would check to see if the city already existed and if it did
I would append to the embedded list a value for that city starting with 2:
destinations = {'England': [['Birmingham', 2], ['London'], ['Wiltshire', 3]]}
I can't get it to work. Maybe there is a better way to represent multiple occurrences of a given city, country pair?
This is a good effort. You can simply replace your last for loop with this:
Code
import collections as ct

for country, cities in temp2.items():
    counter = ct.Counter(cities)
    seen = set()
    print(country)
    for city in cities:
        count = counter[city]
        if city not in seen and count > 1:
            print("\t {:<20} ({})".format(city, count))  # new format
        elif city not in seen:
            print("\t {}".format(city))  # old format
        seen.add(city)
Here we use a collections.Counter to count how often each city occurs. As we iterate over the cities for a given country, any city that occurs more than once is printed in your desired format; otherwise the old format is used. To avoid printing a city again, every city is added to a set, which is checked before printing.
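As a variation on the loop above, iterating over the Counter's keys instead of the original list makes the `seen` set unnecessary, because each city appears as a key only once. This is a minimal sketch, not the code from the answer; the function name `format_destinations` and the sample data are made up for illustration, and it returns lines instead of printing them so the result is easy to check.

```python
from collections import Counter


def format_destinations(destinations):
    """Return output lines, annotating repeated cities with a count.

    `destinations` maps a country to a list of cities, possibly with repeats.
    (Hypothetical helper, not part of the original program.)
    """
    lines = []
    for country in sorted(destinations):
        lines.append(country)
        counts = Counter(destinations[country])
        # Each distinct city is a key exactly once, so no "seen" set is needed.
        for city in sorted(counts):
            count = counts[city]
            suffix = " ({})".format(count) if count > 1 else ""
            lines.append("\t" + city + suffix)
    return lines


sample = {
    "USA": ["Boston", "Chicago", "Chicago"],
    "China": ["Beijing", "Shanghai", "Beijing"],
}
for line in format_destinations(sample):
    print(line)
```

The original list of cities is left untouched, so the counts can be recomputed at any time.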
Suggestions
To clean up your code, consider refactoring it by making two functions.
def get_destinations():
    """Return a dictionary of destinations from the user."""
    destinations = ct.defaultdict(list)
    while True:
        # INSERT INPUT LOGIC HERE
        destinations[country].append(city)
    return destinations

def print_destinations(destinations):
    """Print city and countries."""
    temp1 = {k: sorted(v) for k, v in destinations.items()}
    temp2 = {k: v for k, v in sorted(temp1.items(), key=lambda x: x[0])}
    # Python < 3.6, optional
    # temp2 = ct.OrderedDict()
    # for k, v in sorted(temp1.items()):
    #     temp2[k] = v
    # INSERT POSTED CODE HERE
Demo
>>> dest = get_destinations()
Tell me where you went: Beijing, China
Tell me where you went: Atlanta, USA
Tell me where you went: Beijing, China
Tell me where you went: New York, USA
Tell me where you went:
>>> dest
defaultdict(list,
            {'China': ['Beijing', 'Beijing'], 'USA': ['Atlanta', 'New York']})
>>> print_destinations(dest)
China
	 Beijing              (2)
USA
	 Atlanta
	 New York
Details
In get_destinations(), we refactor by:
using a defaultdict, which sets an empty list as a default value.
returning the default dictionary; note repeated cities are retained.
In print_destinations(), we refactor by:
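iterating over the dictionary items directly (no need for .keys()) and sorting each list of cities into temp1
leveraging preserved key insertion order in Python 3.6+ to keep the dictionary alphabetized; note that this ordering is a CPython implementation detail in 3.6 and a language guarantee only from 3.7, so for any earlier version of Python the OrderedDict should be used (commented code)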
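The `# INSERT INPUT LOGIC HERE` placeholder can be filled in several ways. The sketch below is one possibility, assuming the same "City, Country" validation as the original program; the names `parse_entry` and `collect_destinations` are hypothetical, and the entries are passed in as a list instead of being read with `input()` so the logic can be tested without a terminal.

```python
from collections import defaultdict


def parse_entry(query):
    """Split 'City, Country' into (city, country); return None if malformed."""
    if query.count(",") != 1:
        return None
    city, country = (part.strip() for part in query.split(","))
    return city, country


def collect_destinations(entries):
    """Group cities by country, keeping repeated cities (hypothetical helper)."""
    destinations = defaultdict(list)  # a missing country starts as an empty list
    for entry in entries:
        parsed = parse_entry(entry)
        if parsed is None:
            continue  # the interactive version would print a warning here
        city, country = parsed
        destinations[country].append(city)
    return destinations


dest = collect_destinations(
    ["Beijing, China", "Atlanta, USA", "Beijing, China", "no comma here", "New York, USA"]
)
print(dict(dest))
```

Because the defaultdict creates the empty list on first access, the `if country not in destinations` branch from the original program disappears.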
Related
I have a table which has one column with values separated by commas
Table_A
State       City
Colorado    Denver
Texas       Dallas, Houston, Austin
Arizona     Phoenix, Flagstaff
Expected_Result
Table_A
State       City
Colorado    Denver
Texas       Dallas
Texas       Houston
Texas       Austin
Arizona     Phoenix
Arizona     Flagstaff
There are easier ways to do this in other SQL dialects, but I can't find anything similar in Redshift. Please help.
You can use the split_to_array function to turn the comma-separated column into an array, and then unnest that array with a join.
with array_data as (
select State, split_to_array(City , ',') as city
from Table_A
)
SELECT t.state, cities as city
FROM array_data AS t
LEFT JOIN t.city AS cities ON TRUE;
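To make the logic of the query easier to follow, here is the same two-step idea (split each value into an array, then emit one row per element) written as a small Python sketch. It is only an illustration of the transformation, not Redshift code; the function name `explode` is made up. Note that splitting on a bare comma leaves a leading space on every element after the first, which the sketch strips explicitly; if the same happens in the SQL result, a trim step may be needed there too.

```python
rows = [
    ("Colorado", "Denver"),
    ("Texas", "Dallas, Houston, Austin"),
    ("Arizona", "Phoenix, Flagstaff"),
]


def explode(rows):
    """Return one (state, city) pair per comma-separated city."""
    result = []
    for state, cities in rows:
        # Step 1: split the delimited string into a list (the "array").
        for city in cities.split(","):
            # Step 2: emit one output row per element, without the padding.
            result.append((state, city.strip()))
    return result


for state, city in explode(rows):
    print(state, city)
```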
I have a table with street and city fields; however, most street fields also have the city listed at the end. I have identified those records and would now like to delete the city from the street. Here is the query that I used to identify the bad records. I am just learning SQL and would appreciate any help.
SELECT *
FROM mytable
WHERE Mailing_Street like CONCAT('%',SUBSTRING(mailing_city,1,Len(mailing_city)))
I am looking for
street
123 Main St Anytown
to be updated to
street
123 Main St
where city = Anytown
I am trying to match merge two data sets by the variable "country". Both data sets contain the variable country (one has it named as "name" but was changed to country) and other variables, one data set (data1) contains continent information. However, I run into the issue of SAS just concatenating the data sets, that is, stacking them on top of one another.
I have tried the basics, sorting the data sets by the same by variables and making sure to use the by statement when merging the data sets.
proc sort data=data1;
by name;
run;
proc sort data=data2;
by country;
run;
data merged_data;
length continent $ 20 country $ 200;
merge data1(rename=(name=country)) data2;
by country;
run;
The result of this code is the data sets just being stacked on top of one another. My goal is to attach the continent to the country, i.e., to identify the continent of each country.
data1:
Continent    Name
Asia         China
Australia    New Zealand
Europe       France
data2:
Country        Var    City
China          1.2    Beijing, China
New Zealand    3.5    Auckland, New Zealand
France         2.8    Paris, France
data I want:
Country        Var    City                     Continent
China          1.2    Beijing, China           Asia
New Zealand    3.5    Auckland, New Zealand    Australia
France         2.8    Paris, France            Europe
data I get:
Country        Var    City                     Continent
China          1.2    Beijing, China
New Zealand    3.5    Auckland, New Zealand
France         2.8    Paris, France
China                                          Asia
New Zealand                                    Australia
France                                         Europe
With my example data, your logic works for me. Maybe your error has to do with your length statement.
Data Df1;
INPUT Country $1-18 @19 Temp;
datalines;
United States     87
Canada            68
Mexico            88
Russia            77
China             55
;
Run;
Data Df2;
INPUT name $1-18 @19 season $;
datalines;
United States     Summer
Canada            Summer
Mexico            Summer
Russia            Winter
China             Winter
;
Run;
Proc sort data=Df1;
by Country;
Run;
Proc sort data=Df2;
by Name;
Run;
Data Merged_data;
merge Df1 Df2(rename=(name=country));
by country;
Run;
Make sure the values of the variables are what you think they are. Print the values using the $QUOTE. format, or look at the results in a fixed-width font.
Perhaps one data set has the actual values you see and the other has a code that is decoded by a format into the values you see.
If it is not an issue of formatted value versus actual value, then perhaps the records in DATA2 have leading spaces.
The program below produces the result you are showing. If you remove the leading spaces from COUNTRY in DATA2, the merge works as expected.
data data1 ;
  input Continent $13. Name $15.;
cards;
Asia         China
Australia    New Zealand
Europe       France
;
data data2;
  input Country $15. Var City $25.;
  country=' '||country;
cards;
China          1.2 Beijing, China
New Zealand    3.5 Auckland, New Zealand
France         2.8 Paris, France
;
proc sort data=data1; by name; run;
proc sort data=data2; by country; run;
data want ;
  merge data2 data1(rename=(name=country)) ;
  by country;
run;
Results:
Obs    Country         Var    City                     Continent
 1      China          1.2    Beijing, China
 2      France         2.8    Paris, France
 3      New Zealand    3.5    Auckland, New Zealand
 4     China            .                              Asia
 5     France           .                              Europe
 6     New Zealand      .                              Australia
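The effect of the leading space can be reproduced outside SAS with a plain key lookup. The Python sketch below is only an analogy for the merge, with made-up names (`attach_continent`, `strip_keys`): a key of ' China' is a different string from 'China', so nothing matches until the keys are stripped.

```python
continents = {"China": "Asia", "New Zealand": "Australia", "France": "Europe"}

# Country values as they would look with a leading space in data2.
records = [(" China", 1.2), (" New Zealand", 3.5), (" France", 2.8)]


def attach_continent(records, continents, strip_keys):
    """Look up each country's continent, optionally stripping the key first."""
    merged = []
    for country, var in records:
        key = country.strip() if strip_keys else country
        # .get() returns None when the key has no match, like a missing value.
        merged.append((key, var, continents.get(key)))
    return merged


print(attach_continent(records, continents, strip_keys=False))  # no matches
print(attach_continent(records, continents, strip_keys=True))   # all matched
```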
I want to extract text before the last space from column A and add it to column B.
Example of input:
Chicago A12
New York GE8
United States of America AB8
Wanted output:
Chicago
New York
United States of America
ColumnB =
VAR string_length = LEN('Data'[ColumnA])
RETURN
    TRIM(
        LEFT(
            SUBSTITUTE('Data'[ColumnA]; " "; REPT(" "; string_length));
            string_length
        )
    )
This only works if there is a single word before the space.
Output:
Chicago
New
United
This assumes the code at the end of your input is always exactly three characters long, such as A12, GED, and so on.
Create a new field and add the expression below to it. TRIM removes the space that is left between the name and the code.
Note: this assumes the City column is never null or empty; otherwise you will have to add an IF condition to handle that case.
Expected Result = TRIM(LEFT('Table'[City]; LEN('Table'[City]) - 3))
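If the trailing code is not guaranteed to be three characters long, the requirement as stated in the question ("text before the last space") can be expressed directly. The Python sketch below only illustrates that logic and is not a DAX solution; the function name `text_before_last_space` is made up.

```python
def text_before_last_space(value):
    """Return everything before the last space, or the value itself if none."""
    value = value.strip()
    # rpartition splits on the LAST occurrence of the separator.
    head, sep, _tail = value.rpartition(" ")
    return head if sep else value


for item in ["Chicago A12", "New York GE8", "United States of America AB8"]:
    print(text_before_last_space(item))
```

Splitting on the last space works for codes of any length, at the cost of returning the input unchanged when there is no space at all.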
The code I am running uses svy: mean, and NO subpop option is used.
My issue is that, for certain variables, it renames some of the values of the variable to _subpop_1, etc., while others stay in their original format. For example, I have a county variable. After using the svy: mean command, some counties show up by name (Alameda, Alpine, etc.) while some show up as _subpop_7, _subpop_8, etc.
Does anyone know why this is?
When using a tab command on the same variable, none of the formats are affected and every county shows up.
An example of my code and output (I hid the numbers) would be:
foreach var of varlist county {
    svy: mean deport, over(`var')
}
Survey: Mean estimation
Number of strata = . Number of obs = .
Number of PSUs = . Population size = .
Design df = .
ALAMEDA: county = ALAMEDA
ALPINE: county = ALPINE
AMADOR: county = AMADOR
BUTTE: county = BUTTE
CALAVERAS: county = CALAVERAS
COLUSA: county = COLUSA
_subpop_7: county = CONTRA COSTA
_subpop_8: county = DEL NORTE
_subpop_9: county = EL DORADO
FRESNO: county = FRESNO
GLENN: county = GLENN
HUMBOLDT: county = HUMBOLDT
IMPERIAL: county = IMPERIAL
More than a programming problem, this is simply a case of Stata doing what it states it'll do. From help mean:
Noninteger values, negative values, and labels that are not valid Stata names are substituted with a default identifier.
An example reproducing the "problem" is:
webuse hbp
// some value labels with spaces
label define lblcity 1 "contra costa" 2 "el dorado" 3 "alameda" 5 "alpine"
label values city lblcity
mean hbp, over(city)
More on valid Stata names in [U] 11 Language syntax.
(Note that the svy: prefix plays no role here.)
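The substitution rule can be mimicked in a few lines of Python to show why only the counties with a space in their label are renamed. This is a rough sketch under simplifying assumptions: the pattern covers only the basic naming rule (up to 32 letters, digits, and underscores, not starting with a digit) and ignores reserved words and other details covered in [U] 11; the numbering by position is inferred from the output in the question, and the function name `over_identifiers` is made up.

```python
import re

# Simplified stand-in for "valid Stata name" (an assumption, see the text above).
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def over_identifiers(labels):
    """Return the label if it looks like a valid name, else a default identifier."""
    identifiers = []
    for position, label in enumerate(labels, start=1):
        if _NAME.fullmatch(label):
            identifiers.append(label)
        else:
            identifiers.append("_subpop_{}".format(position))
    return identifiers


counties = [
    "ALAMEDA", "ALPINE", "AMADOR", "BUTTE", "CALAVERAS", "COLUSA",
    "CONTRA COSTA", "DEL NORTE", "EL DORADO", "FRESNO",
]
for county, identifier in zip(counties, over_identifiers(counties)):
    print("{}: county = {}".format(identifier, county))
```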