ML program to find the acronyms of a given list

I am researching and learning about the ML language. I have come across a question and am having difficulty solving it. I'm sure I will need the Traverse, Size and Substring functions, but I can't work out how to put them together. Could you help?
Question:
val x = [ ["National", "Aeronautics", "and", "Space", "Administration"]
, ["The", "North", "Atlantic", "Treaty", "Organization"]
]
Sample run:
val it = [ {acronym="NASA", name="National Aeronautics and Space Administration"}
, {acronym="NATO", name="The North Atlantic Treaty Organization"}
]
: nm list

Looking at the information in your question, I'm guessing that the goal of the problem is to write a function acronyms which meets the following specification. I've taken the liberty of renaming the types to make it clearer:
type words = string list
type summary = {acronym : string, name : string}
val acronyms : words list -> summary list
This function takes a list of organization names (which have been split into words) and produces a list of summaries. Each summary in the output describes the corresponding organization from the input.
The tricky part is writing a function acronym : words -> summary which computes a single summary. For example,
- acronym ["National", "Aeronautics", "and", "Space", "Administration"];
val it = {acronym="NASA",name="National Aeronautics and Space Administration"}
: summary
Once you have this function, you can apply it to each organization name of the input with List.map:
fun acronyms orgs = List.map acronym orgs
I'll leave the acronym function to you. As a hint to get started, consider filtering the list of words to remove words such as "and" and "the".

Related

Pandas: Grouping rows by list in CSV file?

In an effort to make our budgeting life a bit easier and to help myself learn, I am creating a small program in Python that takes data from our exported bank CSV.
Here is an example of what I want to do with this data. Say I want to group all of my fast food expenses together. There are many different names with different totals in the Description column, but I want to see it all tabulated as one "Fast Food" expense.
For instance, the CSV is set up like this:
Date      Description              Debit   Credit
1/20/20   POS PIN BLAH BLAH ###    1.75    NaN
I figured out how to group them with an or statement:
contains = df.loc[df['Description'].str.contains('food court|whataburger', flags = re.I, regex = True)]
Ultimately, I would like it to read the keywords from a list. I would like to group all my expenses into categories, each with its own list of keywords, and have the output include only the rows that match that list.
I tried something like:
fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
That obviously didn't work.
If there is a better way of doing this, I am wide open to suggestions.
I have also looked through quite a few posts here on Stack Overflow and have yet to find the answer (although I am sure I overlooked it).
Any help would be greatly appreciated. I am still learning.
Thanks

You can assign a new column using str.extract and then groupby:
import re
import pandas as pd

df = pd.DataFrame({"description": ['Macdonald something', 'Whataburger something', 'pizza hut something',
                                   'Whataburger something', 'Macdonald something', 'Macdonald otherthing'],
                   "debit": [1.75, 2.0, 3.5, 4.5, 1.5, 2.0]})
fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
# Label each row with the keyword it matches (case-insensitive), then total the debits per keyword
df["found"] = df["description"].str.extract(f'({"|".join(fast_food)})', flags=re.I)
print(df.groupby("found")[["debit"]].sum())
#              debit
# found
# Macdonald     5.25
# Whataburger   6.50
# pizza hut     3.50

Use dynamic pattern building:
fast_food = ['Macdonald', 'Whataburger', 'pizza hut']
pattern = r"\b(?:{})\b".format("|".join(map(re.escape, fast_food)))
contains = df.loc[df['Description'].str.contains(pattern, flags = re.I, regex = True)]
The \b word boundaries find whole words, not partial words.
The re.escape will protect special characters and they will be parsed as literal characters.
If \b does not work for you, check other approaches at Match a whole word in a string using dynamic regex
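To extend the dynamic pattern from one list to several categories, one option is to build one compiled pattern per category and return the first category that matches. The following is only a minimal sketch using the standard re module; the category names, the "Groceries" keywords, the "Other" fallback and the helper names build_pattern and categorize are assumptions made for illustration, not part of the original question.

```python
import re

# Hypothetical categories and keywords, for illustration only
categories = {
    "Fast Food": ["Macdonald", "Whataburger", "pizza hut"],
    "Groceries": ["Kroger", "Safeway"],
}

def build_pattern(keywords):
    # Escape each keyword and require whole-word matches, ignoring case
    alternation = "|".join(map(re.escape, keywords))
    return re.compile(r"\b(?:{})\b".format(alternation), re.I)

patterns = {name: build_pattern(words) for name, words in categories.items()}

def categorize(description):
    # Return the first category whose pattern matches, else a fallback label
    for name, pattern in patterns.items():
        if pattern.search(description):
            return name
    return "Other"

print(categorize("POS PIN WHATABURGER 123"))
```

With pandas, the same function could be applied to a column with df["Description"].map(categorize), after which grouping by the resulting labels would give one total per category.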

keyword inspection based on words present in multiple lists

I have a dictionary similar to this:
countries = ["usa", "france", "japan", "china", "germany"]
fruits = ["mango", "apple", "passion-fruit", "durion", "bananna"]
cf_dict = {k:v for k,v in zip(["countries", "fruits"], [countries, fruits])}
and I also have a list of strings similar to this:
docs = ["mango is a fruit that is very different from Apple","I like to travel, last year I was in Germany but I like France.it was lovely"]
I would like to inspect docs and check whether each string contains any of the keywords in any of the lists in cf_dict (the values of cf_dict are lists). If keywords are present, I want to return the corresponding key for that string as output.
So, for instance, if I inspect the list docs, the output will be [fruits, countries].
This is similar to this answer, but that one checks only a single list, whereas I would like to check multiple lists.

The following returns a dict of sets in case a string matches values in more than one list (e.g. 'apple grows in USA' should be mapped to {'fruits', 'countries'}).
print({s: {k for k, l in cf_dict.items() for w in l if w in s.lower()} for s in docs})
This outputs:
{'mango is a fruit that is very different from Apple': {'fruits'}, 'I like to travel, last year I was in Germany but I like France.it was lovely': {'countries'}}
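The comprehension above uses substring tests, so a keyword such as "apple" would also match inside "pineapple". If whole-word matching is wanted instead, a variation along the following lines might work. It is a sketch only: the function name keys_for and the choice to return a sorted list per string are assumptions for illustration.

```python
import re

countries = ["usa", "france", "japan", "china", "germany"]
fruits = ["mango", "apple", "passion-fruit", "durion", "bananna"]
cf_dict = {"countries": countries, "fruits": fruits}

# One case-insensitive, whole-word pattern per key
patterns = {
    key: re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, words))), re.I)
    for key, words in cf_dict.items()
}

def keys_for(text):
    # Sorted list of every key with at least one keyword found in the text
    return sorted(key for key, pattern in patterns.items() if pattern.search(text))

docs = [
    "mango is a fruit that is very different from Apple",
    "I like to travel, last year I was in Germany but I like France.it was lovely",
]
print([keys_for(doc) for doc in docs])
```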

Applying regexp and finding the highest number in a list

I have got a list of different names. I have a script that prints out the names from the list.
req=urllib2.Request('http://some.api.com/')
req.add_header('AUTHORIZATION', 'Token token=hash')
response = urllib2.urlopen(req).read()
json_content = json.loads(response)
for name in json_content:
    print name['name']
Output:
Thomas001
Thomas002
Alice001
Ben001
Thomas120
I need to find the max number that comes with the name Thomas. Is there a simple way to apply a regexp to all the elements that contain "Thomas" and then apply max(list) to them? The only way that I have come up with is to go through each element in the list, match the regexp for Thomas, then strip the letters and put the remaining numbers into a new list, but this seems pretty bulky.

You don't need regular expressions, and you don't need sorting. As you said, max() is fine. To be safe in case the list contains names like "Thomasson123", you can use:
names = ((x['name'][:6], x['name'][6:]) for x in json_content)
max(int(b) for a, b in names if a == 'Thomas' and b.isdigit())
The first assignment creates a generator expression, so there will be only one pass over the sequence to find the maximum.
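For a version that can be run without the API, the same idea can be wrapped in a small function and tried on sample data. This is a sketch under assumptions: the helper name max_suffix is hypothetical, the records are made up to mirror the output in the question (plus a "Thomasson123" entry), and max is given a default so that an unmatched prefix returns None instead of raising.

```python
# Sample records shaped like the API response in the question (made-up data)
json_content = [
    {"name": n}
    for n in ["Thomas001", "Thomas002", "Alice001", "Ben001", "Thomas120", "Thomasson123"]
]

def max_suffix(records, prefix):
    # Keep only names that are exactly prefix + digits, then take the largest number
    n = len(prefix)
    suffixes = (r["name"][n:] for r in records if r["name"].startswith(prefix))
    return max((int(s) for s in suffixes if s.isdigit()), default=None)

print(max_suffix(json_content, "Thomas"))
```

Because the suffix must consist only of digits, "Thomasson123" is skipped rather than being counted as a Thomas entry.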

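You don't need regex. Just store the results in a list and then apply the sorted function to it. This relies on the numbers being zero-padded to the same width, so that string order matches numeric order.
>>> l = ['Thomas001',
...      'Thomas002',
...      'Alice001',
...      'Ben001',
...      'Thomas120']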
>>> [i for i in sorted(l) if i.startswith('Thomas')][-1]
'Thomas120'

How to search and replace keywords in strings in python from multiple categories?

Assume I have a string which reads:
'''
Looking closely we find Pepsi and Coca-cola have been the two biggest brands of soda in the world for the past four years.
'''
and I want to represent the mapping of words to their classes as:
classes={"NAME":["pepsi","coca-cola", "james","jill"...],"CATEGORY":["soda","food","automobile"....],"NUMBER":["one","two","three","four"....]}
So that at the end I want to have the original string as :
Looking closely we find NAME and NAME have been the two biggest brands of CATEGORY in the world for the past NUMBER years
for a simple dict like :
rep = {"NAME": "pepsi", "CATEGORY": "soda"....}
I can replace the words for the above dict, but how do I do that if there is more than one word per key?
This is what I have so far:
stringh=sentence.lower()
for i,j in rep.items():
    stringh = stringh.replace(j, i)
print stringh

Iterate through the list of words for each key (here using the classes dict from the question, whose values are lists):
stringh=sentence.lower()
for i,j in classes.items():
    for k in j:
        stringh = stringh.replace(k, i)
print stringh
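Plain str.replace works on substrings, so a word such as "someone" would become "someNUMBER", and lowercasing the sentence loses the original capitalisation. A possible alternative is a single regex pass with whole-word matching. The sketch below is written for Python 3 and rests on assumptions: the word lists are the ones from the question with the elided entries left out, and the names word_to_class and label are hypothetical.

```python
import re

# Word lists from the question, without the elided entries
classes = {
    "NAME": ["pepsi", "coca-cola", "james", "jill"],
    "CATEGORY": ["soda", "food", "automobile"],
    "NUMBER": ["one", "two", "three", "four"],
}

# Invert the mapping: each keyword points at its class label
word_to_class = {word: label for label, words in classes.items() for word in words}

# Longest keywords first, so a longer keyword wins over a shorter one it contains
alternation = "|".join(re.escape(w) for w in sorted(word_to_class, key=len, reverse=True))
pattern = re.compile(r"\b(?:{})\b".format(alternation), re.I)

def label(sentence):
    # Replace each whole-word keyword with its class label in one pass
    return pattern.sub(lambda m: word_to_class[m.group(0).lower()], sentence)

print(label("Pepsi and Coca-cola are brands of soda for four years"))
```

Note that this replaces every keyword it finds, including "two", so the result for the sentence in the question would contain NUMBER in both places.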

Defining lists in Haskell

I'd like to know how to define this as a list within Haskell so that I could perform operations such as tail, reverse and length:
cars = "lamborghinis are my favourite types of car"
I've tried:
let cars = [lamborghinis,are,my,favourite,types,of,car]
let cars = ["lamborghinis","are","my","favourite","types","of","car"]
let cars = ['lamborghinis','are','my','favourite','types','of','car']
I have been using http://learnyouahaskell.com/starting-out as a tutorial, as I am new to Haskell, and I can't see where I'm going wrong. I thought my first attempt above would be correct, as that is how the tutorial does it, but with numbers instead of words.
The error I am getting is: parse error on input 'of.
Any ideas where I'm going wrong? Thanks.

let cars = "lamborghinis are my favourite types of car"
makes cars a list of characters ([Char], which is the same type as String), so head cars gives you the first letter. This is Haskell's special syntax for strings.
let cars2 = ["lamborghinis","are","my","favourite","types","of","car"]
This is the normal form for defining lists. It gives you a list of strings, [String], and head cars2 gives you "lamborghinis".
You can also split a sentence into words using the words function.
let cars3 = words cars

The type String is only a type synonym for [Char].
This means that for example "Hello, you" is the same as ['H','e','l','l','o',',',' ','y','o','u']
