Text grouping

I need help grouping texts. I have a list of merchants like the one below, where the first few belong to CENTURYLINK and the next few to SMART ATT. Is there a way to group or label these texts with a single label, or to categorize them according to the pool they fall into?
Thanks in advance.
001 CENTURYLINK IREP
003 CENTURYLINK MY ACCOUNT
003-ClearTalk Wireless
004 CENTURYLINK IVR
005 CENTURYLINK RECURRING
006 CENTURYLINK WIFI
007 CENTURYLINK CABLE
111 SMART ATT
112 SMART ATT
113 - SMART - ATT
114 SMART ATT
120 - SMART - ATT
131 - SMART - ATT
137 - SMART - ATT
A WIRELESS AMERY
A WIRELESS ANNA
A WIRELESS APTOS
A WIRELESS ARCADIA
A WIRELESS ARNOLDS PAR
A WIRELESS ASHLAND
A WIRELESS ATHENS

You have a few options. Among the simplest would be to match vendor substrings, as follows:
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
public class GroupVendors {
    public static void main(final String[] args) {
        // Known vendor categories; each is matched as a substring.
        final List<String> vendors = Arrays.asList(
            "CENTURYLINK",
            "SMART",
            "ATT",
            "A WIRELESS");
        final List<String> uncategorizedVendors = Arrays.asList(
            "001 CENTURYLINK IREP",
            "003 CENTURYLINK MY ACCOUNT",
            "003-ClearTalk Wireless",
            "004 CENTURYLINK IVR",
            "005 CENTURYLINK RECURRING",
            "006 CENTURYLINK WIFI",
            "007 CENTURYLINK CABLE",
            "111 SMART ATT",
            "112 SMART ATT",
            "113 - SMART - ATT",
            "114 SMART ATT",
            "120 - SMART - ATT",
            "131 - SMART - ATT",
            "137 - SMART - ATT",
            "A WIRELESS AMERY",
            "A WIRELESS ANNA",
            "A WIRELESS APTOS",
            "A WIRELESS ARCADIA",
            "A WIRELESS ARNOLDS PAR",
            "A WIRELESS ASHLAND",
            "A WIRELESS ATHENS");
        // One (initially empty) bin per category, sorted by category name.
        final Map<String, List<String>> categorizedVendors = new TreeMap<>();
        for (final String vendor : vendors) {
            categorizedVendors.put(vendor, new LinkedList<String>());
        }
        // Add each entry to every bin whose category name it contains.
        for (final String vendor : uncategorizedVendors) {
            for (final Map.Entry<String, List<String>> entry : categorizedVendors.entrySet()) {
                final String category = entry.getKey();
                if (vendor.contains(category)) {
                    final List<String> bin = entry.getValue();
                    bin.add(vendor);
                }
            }
        }
        // Print each bin.
        for (final Map.Entry<String, List<String>> entry : categorizedVendors.entrySet()) {
            final String category = entry.getKey();
            final List<String> bin = entry.getValue();
            System.out.printf("vendors(\"%s\") = {%n", category);
            if (!bin.isEmpty()) {
                System.out.printf(" %s%n",
                    bin.stream()
                        .map((vendor) -> String.format("\"%s\"", vendor))
                        .collect(Collectors.joining(",\n ")));
            }
            System.out.println("}");
        }
    }
}
Sample run:
% java GroupVendors
vendors("A WIRELESS") = {
"A WIRELESS AMERY",
"A WIRELESS ANNA",
"A WIRELESS APTOS",
"A WIRELESS ARCADIA",
"A WIRELESS ARNOLDS PAR",
"A WIRELESS ASHLAND",
"A WIRELESS ATHENS"
}
vendors("ATT") = {
"111 SMART ATT",
"112 SMART ATT",
"113 - SMART - ATT",
"114 SMART ATT",
"120 - SMART - ATT",
"131 - SMART - ATT",
"137 - SMART - ATT"
}
vendors("CENTURYLINK") = {
"001 CENTURYLINK IREP",
"003 CENTURYLINK MY ACCOUNT",
"004 CENTURYLINK IVR",
"005 CENTURYLINK RECURRING",
"006 CENTURYLINK WIFI",
"007 CENTURYLINK CABLE"
}
vendors("SMART") = {
"111 SMART ATT",
"112 SMART ATT",
"113 - SMART - ATT",
"114 SMART ATT",
"120 - SMART - ATT",
"131 - SMART - ATT",
"137 - SMART - ATT"
}
I've made the assumption that the list of vendor categories you are interested in is "CENTURYLINK", "SMART", "ATT", and "A WIRELESS". This has the effect of categorizing all entries containing both "SMART" and "ATT" in both their bins. If you want each vendor to be categorized in exactly one bin, then you will need to resolve which vendor you prefer when the categories are redundant.
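If each entry should land in exactly one bin, one way to resolve the overlap is to give the categories a priority order and take the first match. The sketch below is only an illustration in Python, not part of the Java program above: the priority order, the function name categorize, and the "UNCATEGORIZED" fallback label are all assumptions. Like the Java code, it matches case-sensitive substrings.

```python
from collections import defaultdict

# Categories in priority order: the first one found in a name wins.
# This ordering is an assumption; pick whatever precedence suits your data.
CATEGORIES = ["CENTURYLINK", "SMART", "ATT", "A WIRELESS"]


def categorize(names, categories=CATEGORIES, fallback="UNCATEGORIZED"):
    """Assign each name to exactly one bin (first matching category)."""
    bins = defaultdict(list)
    for name in names:
        # Case-sensitive substring test, like String.contains in the Java code.
        label = next((c for c in categories if c in name), fallback)
        bins[label].append(name)
    return dict(bins)


if __name__ == "__main__":
    sample = ["001 CENTURYLINK IREP", "111 SMART ATT", "003-ClearTalk Wireless"]
    for label, members in categorize(sample).items():
        print(label, members)
```

With this ordering, entries containing both "SMART" and "ATT" go to the "SMART" bin only, and entries that match nothing (such as "003-ClearTalk Wireless") are collected under the fallback label instead of being dropped.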

Select value in column that matches value in list (UPDATED FOR CLARITY)

If I have a column of street addresses and want to select only the address's directional, what syntax would I use to accomplish that in Excel Power Query?
For instance, how do I get "NE" from "357 Pyrite Dr NE" even if the address is incorrectly formatted as "357 NE Pyrite Dr" or "357 Pyrite NE Dr"? Likewise, how would I get "NW" from "506 Mark NW St"?
As far as I can figure out, I would hit add column > custom column and enter a syntax similar to the following...
= if List.ContainsAny([Address], {"NE", "NW", "SE", "SW"}) = TRUE then Text.Select([Address], {"NE", "NW", "SE", "SW"} else null
...except I know that's not the correct syntax since it always produces an error. The same thing happens when I replace "Text.Select" with "List.Select" in the above formula.
For greater clarification, I'm posting the query as it stands now, whittled down to one column from a table with 100 columns and 4000 rows:
let
Source = q_NMAACC,
#"Removed Other Columns" = Table.SelectColumns(Source,{"Address - Street 1", "Address - Street 2"}),
#"Merged Columns" = Table.CombineColumns(#"Removed Other Columns",{"Address - Street 1", "Address - Street 2"},Combiner.CombineTextByDelimiter(" ", QuoteStyle.None),"Street Address"),
#"Trimmed Text" = Table.TransformColumns(#"Merged Columns",{{"Street Address", Text.Trim, type text}}),
#"Filtered Rows" = Table.SelectRows(#"Trimmed Text", each [Street Address] <> null and [Street Address] <> "")
in
#"Filtered Rows"
Here are the first 25 rows to give you some data to work off.
Street Address
PO Box 3416 Nr57 #165a
1016 Copper NE Ave Apt C
217 Garcia St NE
232 17th St SE
560 60th St NW
2935 Madeira Dr NE
9677 Eagle Ranch Rd NW Apt 415
5320 Roanoke Ave NW
17 Hwy 304
HCR 79 Box 46
6524 Camino Rojo
3518 Vail Ave SE
6412 Torreon Dr NE
6136 Flor de Rio Ct NW
1712 36th Street SE
734 Columbia Street
716 Morning Meadows Dr NE
6601 Tennyson St NE Apt 10207
Alamo - Rio Salado PO Box 804
206 Aragon Rd
6901 Verano Ct NW
6709 Siesta Pl NE
10 Meadow Hills Loop
98 Avenida Jardin
6903 Prairie Rd NE Apt 216
Try
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
List={"NE","NW","SW","SE"},
LocateTable = Table.FromList(List, null, {"Locate"}),
Find = Table.AddColumn(Source, "Found", (x) => Text.Combine(Table.SelectRows(LocateTable, each Text.Contains(x[Address],[Locate], Comparer.OrdinalIgnoreCase))[Locate],", "))
in Find
You could also keep the search criteria in a separate table (named LocateTable here, with a Locate column):
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
Find = Table.AddColumn(Source, "Found", (x) => Text.Combine(Table.SelectRows(LocateTable, each Text.Contains(x[Address],[Locate], Comparer.OrdinalIgnoreCase))[Locate],", "))
in Find
The Comparer.OrdinalIgnoreCase argument makes the comparison case-insensitive; remove it if you want a case-sensitive match.
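One caveat with a plain "contains" test is that it matches substrings, so a street name such as "Stone" would also count as containing "NE" when case is ignored. As a rough illustration of the whole-word alternative (written in Python rather than Power Query M, with a hypothetical helper name find_directional), a word-boundary regular expression only accepts the directional when it stands alone:

```python
import re

DIRECTIONALS = ("NE", "NW", "SE", "SW")

# \b requires a word boundary on both sides, so "NE" inside "Stone" is ignored.
_PATTERN = re.compile(r"\b(" + "|".join(DIRECTIONALS) + r")\b", re.IGNORECASE)


def find_directional(address):
    """Return the first standalone directional in the address, or None."""
    match = _PATTERN.search(address or "")
    return match.group(1).upper() if match else None


if __name__ == "__main__":
    for address in ("357 Pyrite Dr NE", "506 Mark NW St", "734 Columbia Street"):
        print(address, "->", find_directional(address))
```

This returns only the first directional found; addresses with no directional yield None, which corresponds to null in the query.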

PySpark using Regexp_extract and Col to Create Dataset

I need help creating a dataset that shows both the first name and last name of people who live in Texas and the area code of their phone numbers (phone1). This is the coding that I tried to use and this is the dataset that I was given.
from pyspark.sql.functions import regexp_extract, col
regexp_extract(col('first_name + last_name'), '.by\s+(\w+)', 1))
first_name last_name company_name address city county state zip phone1
Billy Thornton Qdoba 8142 Yougla Road Dallas Fort Worth TX 34218 689-956-0765
Joe Swanson Beachfront 9243 Trace Street Miami Dade FL 56432 890-780-9674
Kevin Knox MSG 7683 Brooklyn Ave New York New York NY 56987 850-342-1123
Bill Lamb AFT 6394 W Beast Dr Houston Galveston TX 32804 407-413-4842
Raylene Kampa Hermar Inc 2046 SW Nylin Rd Elkhart Elkhart IN 46514 574-499-1454
Now I see. Your phone numbers have a consistent format that is easy to split on the hyphen, so use split.
df.show()
+----------+---------+------------+-----------------+--------+----------+-----+-----+------------+
|first_name|last_name|company_name|          address|    city|    county|state|  zip|      phone1|
+----------+---------+------------+-----------------+--------+----------+-----+-----+------------+
|     Billy| Thornton|       Qdoba| 8142 Yougla Road|  Dallas|Fort Worth|   TX|34218|689-956-0765|
|       Joe|  Swanson|  Beachfront|9243 Trace Street|   Miami|      Dade|   FL|56432|890-780-9674|
|     Kevin|     Knox|         MSG|7683 Brooklyn Ave|New York|  New York|   NY|56987|850-342-1123|
|      Bill|     Lamb|         AFT|  6394 W Beast Dr| Houston| Galveston|   TX|32804|407-413-4842|
|   Raylene|    Kampa|  Hermar Inc| 2046 SW Nylin Rd| Elkhart|   Elkhart|   IN|46514|574-499-1454|
+----------+---------+------------+-----------------+--------+----------+-----+-----+------------+
from pyspark.sql.functions import split
# keep only Texas rows, then take the part of phone1 before the first hyphen
df.filter("state = 'TX'") \
    .withColumn('area_code', split('phone1', "-")[0]) \
    .select('first_name', 'last_name', 'state', 'area_code') \
    .show()
+----------+---------+-----+---------+
|first_name|last_name|state|area_code|
+----------+---------+-----+---------+
|     Billy| Thornton|   TX|      689|
|      Bill|     Lamb|   TX|      407|
+----------+---------+-----+---------+
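Since the question asks about regexp_extract, the area code can also be captured with a regular expression. The sketch below only checks a candidate pattern with Python's re module; the pattern and the helper name area_code are assumptions, and the idea is that the same pattern string could be passed to regexp_extract(col('phone1'), pattern, 1) in place of the split call.

```python
import re

# Three digits captured as group 1, followed by the rest of a NNN-NNN-NNNN number.
AREA_CODE_PATTERN = r"^(\d{3})-\d{3}-\d{4}$"


def area_code(phone):
    """Return the area code, or an empty string when the phone does not match."""
    match = re.match(AREA_CODE_PATTERN, phone)
    return match.group(1) if match else ""


if __name__ == "__main__":
    for phone in ("689-956-0765", "407-413-4842", "not a phone"):
        print(repr(phone), "->", repr(area_code(phone)))
```

Returning an empty string for a non-match mirrors what regexp_extract does in Spark when the pattern does not match.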

Enum is defined but not found in the class

The solution consists of three classes: SongGenre, Song and Library (plus Program). I am just following the instructions, so most of the code comes from my lectures and the book rather than from experience. It is what you see, and I am not really proud of it. Pointers are really appreciated. The main question is: why can't the enum values be seen in other classes?
This code has been fixed (see comments).
namespace SongLibrary
{
[Flags]
enum SongGenre
{
Unclassified = 0,
Pop = 0b1,
Rock = 0b10,
Blues = 0b100,
Country = 0b1_000,
Metal = 0b10_000,
Soul = 0b100_000
}
}
namespace SongLibrary
{
/*This acts like a record for the song. The setter is missing for all the properties.
* There are no fields.
* This class comprise of four auto-implemented properties with public getters and
* setters absent. */
public class Song
{
public string Artist { get; }
public string Title { get; }
public double Length { get; }
public SongGenre Genre { get; }
/*This constructor that takes four arguments and assigns them to the appropriate properties.*/
public Song(string title, string artist, double length, SongGenre genre)
{
Artist = artist;
Title = title;
Length = length;
SongGenre Genre = SongGenre.genre;/*<-ERROR 'SongGenre does not contain a definition for 'genre'*/
}
public override string ToString()
{
return string.Format("[{0} by ,{1} ,({2}) ,{3}min]", Title, Artist, Genre, Length);
}
}
}
namespace SongLibrary
{
public static class Library
{
/*This is a static class therefore all the members also have to be static. Class members
* are accessed using the type instead of object reference.
* There are no properties.
* There is no constructor for this class.
* There are four over-loaded methods. */
/*This private field is a list of song object is a class variable.*/
private static List<string> songs = new List<string> { "title", "artist", "length", "genre" };
/*This is a public class method that does not take any argument and displays all the songs in
* the collection.*/
public static void DisplaySongs()
{
for (int i = 0; i < songs.Count; i++)
Console.WriteLine(songs[i]);
}
/*This is a public class method that takes a double argument and displays only songs that are
* longer than the argument.*/
public static void DisplaySongs(double longerThan)
{
foreach (string songs in songs)
{
if (songs.Length > longerThan)
{
Console.WriteLine("\n" + songs);
}
}
}
/*This is a public class method that takes a SongGenre argument and displays only songs that
* are of this genre.*/
public static void DisplaySongs(SongGenre genre)
{
foreach (string songs in songs)
{
if (songs.Genre == genre)/*<-ERROR 'string' does not contain a definition for 'Genre'
* and no accessable extension method 'Genre' accepting a first
* argument of type 'string' could be found*/
{
Console.WriteLine("\n" + songs);
}
}
}
/*This is a public class method that takes a string argument and displays only songs by this artist.*/
public static void DisplaySongs(string artist)
{
foreach (string songs in songs)
{
if (songs.Artist == artist) /*< -ERROR 'string' does not contain a definition for 'Artist'
* and no accessable extension method 'Artist' accepting a first
* argument of type 'string' could be found */
{
Console.WriteLine("\n" + songs);
}
}
}
/*This a class method that is public. It takes a single string argument that represents a text file containing
* a collection of songs. You will read all the data and create songs and add it to the songs collection.You
* will have to read four lines to create one Song. Your loop body should have four ReadLine(). */
public static void LoadSongs(string fileName)
{
/*Initialize the songs field to a new List of Song*/
List<string> songs = new List<string> { "title", "artist", "length", "genre" };
/*Declare four string variable (title, artist, length, genre) to store the results of four in reader.ReadLine().*/
string title;
string artist;
double length;
SongGenre genre;
/*The first ReadLine() is a string representing the title of the song. This can and should be used as a check
* for termination condition. If this is empty then there are no more songs to read i.e. it is the end of
* the file. The next ReadLine() will get the Artist. The next ReadLine() will be a string that represents
* the weight. Use the Convert.ToDouble to get the required type. The next ReadLine() will be a string that
* represents the genre. Use the Enum.Parse() to get the required type. Use the above four variables to create
* a Song object. Add the newly created object to the collection.And finally do one more read for the title
* to re-enter the loop.*/
TextReader reader = new StreamReader(filename);//<-ERROR The name 'filename' does not exist in the current context
string line = reader.ReadLine();
while (line != null)
{
string[] data = line.Split();
title.Add(data[0]);//<-ERROR Use of unassigned local variable 'title'| 'string' does not contain definition for 'Add'
artist.Add(data[1]);//<-ERROR Use of unassigned local variable 'artist'| 'string' does not contain definition for 'Add'
length.Add(Convert.ToDouble(data[2]));/*<-ERROR Use of unassigned local variable 'length'| 'string' does not contain
* definition for 'Add'*/
genre.Add(Enum.Parse(data[3]));/*<-ERROR Use of unassigned local variable 'genre' |ERROR 'string' does not contain
* definition for 'Add' | ERROR The type arguments for method Enum.Parse cannot be
inferred from the usage*/
line = reader.ReadLine();
}
reader.Close();
}
}
}
class Program
{
static void Main(string[] args)
{
List<string> songs = new List<string>();
string filename = @"D:\songs4.txt";//<-Warning The variable 'filename' is assigned but it's value never used.
//To test the constructor and the ToString method
Console.WriteLine(new Song("Baby", "Justin Bebier", 3.35, SongGenre.Pop));//<-ERROR 'Pop'
//This is first time to use the bitwise or. It is used to specify a combination of genres
Console.WriteLine(new Song("The Promise", "Chris Cornell", 4.26, SongGenre.Country | SongGenre.Rock));//<-ERROR 'Country' and 'Rock'
Library.LoadSongs("songs4.txt"); //Class methods are invoke with the class name
Console.WriteLine("\n\nAll songs");
Library.DisplaySongs();
SongGenre genre = SongGenre.Rock;//<-ERROR 'SongGenre' does no contain a definition for 'Rock'
Console.WriteLine($"\n\n{genre} songs");
Library.DisplaySongs(genre);
string artist = "Bob Dylan";
Console.WriteLine($"\n\nSongs by {artist}");
Library.DisplaySongs(artist);
double length = 5.0;
Console.WriteLine($"\n\nSongs more than {length}mins");
Library.DisplaySongs(length);
Console.ReadKey();
}
}
}
The songs4.txt file used to test the solution:
Baby
Justin Bebier
3.35
Pop
Fearless
Taylor Swift
4.03
Pop
Runaway Love
Ludacris
4.41
Pop
My Heart Will Go On
Celine Dion
4.41
Pop
Jesus Take The Wheel
Carrie Underwood
3.31
Country
If Tomorrow Never Comes
Garth Brooks
3.40
Country
Set Fire To Rain
Adele
4.01
Soul
Don't You Remember
Adele
3.03
Soul
Signed Sealed Deliverd I'm Yours
Stevie Wonder
2.39
Soul
Just Another Night
Mick Jagger
5.15
Rock
Brown Sugar
Mick Jagger
3.50
Rock
All I Want Is You
Bono
6.30
Metal
Beautiful Day
Bono
4.08
Metal
Like A Rolling Stone
Bob Dylan
6.08
Rock
Just Like a Woman
Bob Dylan
4.51
Rock
Hurricane
Bob Dylan
8.33
Rock
Subterranean Homesick Blues
Bob Dylan
2.24
Rock
Tangled Up In Blue
Bob Dylan
5.40
Rock
Love Me
Elvis Presley
2.42
Rock
In The Getto
Elvis Presley
2.31
Rock
All Shook Up
Elvis Presley
1.54
Rock
The output should look like this:
Baby by Justin Bebier (Pop) 3.35min
The Promise by Chris Cornell (Rock, Country) 4.26min
All songs
Baby by Justin Bebier (Pop) 3.35min
Fearless by Taylor Swift (Pop) 4.03min
Runaway Love by Ludacris (Pop) 4.41min
My Heart Will Go On by Celine Dion (Pop) 4.41min
Jesus Take The Wheel by Carrie Underwood (Country) 3.31min
If Tomorrow Never Comes by Garth Brooks (Country) 3.40min
Set Fire To Rain by Adele (Soul) 4.01min
Don't You Remember by Adele (Soul) 3.03min
Signed Sealed Deliverd I'm Yours by Stevie Wonder (Soul) 2.39min
Just Another Night by Mick Jagger (Rock) 5.15min
Brown Sugar by Mick Jagger (Rock) 3.50min
All I Want Is You by Bono (Metal) 6.30min
Beautiful Day by Bono (Metal) 4.08min
Like A Rolling Stone by Bob Dylan (Rock) 6.08min
Just Like a Woman by Bob Dylan (Rock) 4.51min
Hurricane by Bob Dylan (Rock) 8.33min
Subterranean Homesick Blues by Bob Dylan (Rock) 2.24min
Tangled Up In Blue by Bob Dylan (Rock) 5.40min
Love Me by Elvis Presley (Rock) 2.42min
In The Getto by Elvis Presley (Rock) 2.31min
All Shook Up by Elvis Presley (Rock) 1.54min
Rock songs
Just Another Night by Mick Jagger (Rock) 5.15min
Brown Sugar by Mick Jagger (Rock) 3.50min
Like A Rolling Stone by Bob Dylan (Rock) 6.08min
Just Like a Woman by Bob Dylan (Rock) 4.51min
Hurricane by Bob Dylan (Rock) 8.33min
Subterranean Homesick Blues by Bob Dylan (Rock) 2.24min
Tangled Up In Blue by Bob Dylan (Rock) 5.40min
Love Me by Elvis Presley (Rock) 2.42min
In The Getto by Elvis Presley (Rock) 2.31min
All Shook Up by Elvis Presley (Rock) 1.54min
Songs by Bob Dylan
Like A Rolling Stone by Bob Dylan (Rock) 6.08min
Just Like a Woman by Bob Dylan (Rock) 4.51min
Hurricane by Bob Dylan (Rock) 8.33min
Subterranean Homesick Blues by Bob Dylan (Rock) 2.24min
Tangled Up In Blue by Bob Dylan (Rock) 5.40min
Songs more than 5mins
Just Another Night by Mick Jagger (Rock) 5.15min
All I Want Is You by Bono (Metal) 6.30min
Like A Rolling Stone by Bob Dylan (Rock) 6.08min
Hurricane by Bob Dylan (Rock) 8.33min
Tangled Up In Blue by Bob Dylan (Rock) 5.40min
There are a few different things wrong with it, and it will take a little while to work through them with some explanation. The basic problem you pointed me to from your question, "Genre can't be seen in other classes", arises because the Genre enum is declared inside a class called SongGenre rather than directly in the namespace. You are therefore not referring to it properly: its type is SongGenre.Genre, not Genre. In the Song class, for example, you'd declare the property like this:
public SongGenre.Genre Genre { get; }
       ^^^^^^^^^^^^^^^ ^^^^^
       the type        the name
Consequently, this line in the Song constructor is an error:
SongGenre Genre = SongGenre.genre;/*<-ERROR 'SongGenre does not contain a definition for 'genre'*/
It should be like:
Genre = SongGenre.Genre.Blues;
Or like:
Genre = genre;
But then you have to adjust your constructor not to take a SongGenre class but to take a SongGenre.Genre enum:
public Song(string title, string artist, double length, SongGenre.Genre genre)
Having that enum inside the SongGenre class is causing you a lot of headaches. Consider throwing the SongGenre class away, moving the enum directly into the namespace, and renaming the enum to SongGenre:
namespace whatever{
enum SongGenre{ Blues...
This means you don't have to refer to it with the class name prefix all the time, and your existing code will work more like you expect.
You have another type confusion here:
if (songs.Genre == genre)/*<-ERROR 'string' does not contain a definition for 'Genre'
* and no accessable extension method 'Genre' accepting a first
* argument of type 'string' could be found*/
{
Console.WriteLine("\n" + songs);
}
songs is a list of strings, not a list of Songs, and strings don't have a Genre property. Try List<Song> instead.
= new List<string> { "title", "artist", "length", "genre" };
This doesn't make sense to me; are you expecting these to be column headers for something? It just declares a list of 4 strings that have nothing to do with songs. You could perhaps load these strings into a combo box so the user can "choose a thing to search by", but they aren't songs.
title.Add(data[0]);//<-ERROR Use of unassigned local variable 'title'| 'string' does not contain definition for 'Add'
title is a string, not a list or other container, so nothing can be added to it.
TextReader reader = new StreamReader(filename);//<-ERROR The name 'filename' does not exist in the current context
string line = reader.ReadLine();
while (line != null)
{
string[] data = line.Split();
title.Add(data[0]);//<-ERROR Use of unassigned local variable 'title'| 'string' does not contain definition for 'Add'
artist.Add(data[1]);//<-ERROR Use of unassigned local variable 'artist'| 'string' does not contain definition for 'Add'
length.Add(Convert.ToDouble(data[2]));/*<-ERROR Use of unassigned local variable 'length'| 'string' does not contain
* definition for 'Add'*/
genre.Add(Enum.Parse(data[3]));/*<-ERROR Use of unassigned local variable 'genre' |ERROR 'string' does not contain
* definition for 'Add' | ERROR The type arguments for method Enum.Parse cannot be
inferred from the usage*/
line = reader.ReadLine();
}
reader.Close();
If I were reading that file, I'd do it like this:
//read all lines into an array
var songFile = File.ReadAllLines("...path to file...");
List<Song> library = new List<Song>();
//iterate 4 lines at a time; each song is title, artist, length, genre
for (int i = 0; i + 3 < songFile.Length; i += 4)
{
    string title = songFile[i];
    string artist = songFile[i + 1];
    double durn = double.Parse(songFile[i + 2]);
    //assumes the enum now lives in the namespace as SongGenre, as suggested above
    SongGenre gen = (SongGenre)Enum.Parse(typeof(SongGenre), songFile[i + 3]);
    Song s = new Song(title, artist, durn, gen);
    library.Add(s);
}
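The four-lines-per-song reading logic is not specific to C#. As a rough, language-neutral illustration (in Python, with hypothetical names Song, SongGenre and parse_songs that merely mirror the C# types), the same idea looks like this, including a flags-style enum so that genres can be combined with the bitwise or:

```python
from dataclasses import dataclass
from enum import Flag


class SongGenre(Flag):
    # Explicit powers of two, mirroring the [Flags] enum in the question.
    UNCLASSIFIED = 0
    POP = 1
    ROCK = 2
    BLUES = 4
    COUNTRY = 8
    METAL = 16
    SOUL = 32


@dataclass(frozen=True)
class Song:
    title: str
    artist: str
    length: float
    genre: SongGenre


def parse_songs(lines):
    """Build one Song from every complete group of four lines."""
    songs = []
    for i in range(0, len(lines) - 3, 4):
        title, artist, length, genre = (s.strip() for s in lines[i:i + 4])
        # Look the genre up by name, like Enum.Parse does in C#.
        songs.append(Song(title, artist, float(length), SongGenre[genre.upper()]))
    return songs


if __name__ == "__main__":
    data = ["Baby", "Justin Bebier", "3.35", "Pop",
            "Hurricane", "Bob Dylan", "8.33", "Rock"]
    for song in parse_songs(data):
        print(song)
```

Because the list holds Song objects rather than strings, filtering by genre, artist, or length becomes a simple comparison on the corresponding attribute.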

python: Splitting Main address into primary and secondary addresses

I need help creating a Python function that keeps the main street address (usually house number and street name) in the Address field and moves additional address information (suite, unit, space, PO Box, and other details) to Address2.
Here are a few examples of address formats that need to be split.
780 Main Street, P.O. Box 4109 -> 780 Main Street / PO Box 4109
438 University Ave. P.O. Box 5 -> 438 University Ave. / PO Box 5
HIGHWAY 10 BOX 39 -> HIGHWAY 10 / PO Box 39
98 LATHROP ROAD - BOX 147 -> 98 LATHROP ROAD / PO Box 147
396 S MAIN/P.O. BOX 820 -> 396 S MAIN / PO Box 820
HWY 18 AND HWY 128 (BOX 1305) -> HWY 18 AND HWY 128 / PO Box 1305
808 Innisfil Beach Rd Box 2 -> 808 Innisfil Beach Rd / PO Box 2
100 St 101 Ave, P.o. Box 1620 -> 100 St 101 Ave / P.O. Box 1620
201 Del Rio (p.O. Box 309 -> 201 Del Rio / PO Box 309
BOX 487 2054 HWY 1 EAST -> 2054 HWY 1 EAST / PO Box 487
P O BOX 2820 41340 BIG BEAR BL -> 41340 BIG BEAR BL / PO Box 2820
2813 HWY 15 - P O BOX 1083 -> 2813 HWY 15 / PO Box 1083
P.o. Box 838 2540 Hwy 43 West -> 2540 Hwy 43 West / POBox 838
I have tried the code below, but it can remove important information from the address, and it can leave PO Box data in the address instead of moving all of it into address2.
input_array = [
'780 Main Street, P.O. Box 410',
'438 University Ave. P.O. Box 5 ',
'HIGHWAY 10 BOX 39',
'98 LATHROP ROAD - BOX 147',
'396 S MAIN/P.O. BOX 820 ',
'HWY 18 AND HWY 128 (BOX 1305)',
'808 Innisfil Beach Rd Box 2',
'100 St 101 Ave, P.o. Box 1620',
'201 Del Rio (p.O. Box 309 ',
'BOX 487 2054 HWY 1 EAST ',
'P O BOX 2820 41340 BIG BEAR BL',
'2813 HWY 15 - P O BOX 1083 ',
'P.o. Box 838 2540 Hwy 43 West'
]
import re
for inputs in input_array:
inputs = (inputs).lower()
for a in (inputs.split(' ')):
if 'box' in a:
box_index = (inputs.split(' ').index(a))
box_num = ((inputs.split(' ')[(inputs.split(' ').index(a)) + 1]))
if (((inputs.split(' ')[(inputs.split(' ').index(a)) + 1])).isdigit()):
if 'p' in ((inputs.split(' ')[(inputs.split(' ').index(a)) - 1])) or 'o' in ((inputs.split(' ')[(inputs.split(' ').index(a)) - 1])):
inputs = inputs.replace(((inputs.split(' ')[(inputs.split(' ').index(a)) - 1])), '')
else:
inputs = inputs.replace(((inputs.split(' ')[(inputs.split(' ').index(a)) + 1])), '')
inputs = inputs.replace(a, '')
inputs = inputs.replace('-', '')
inputs = inputs.replace('/', '')
inputs = inputs.replace(',', '')
print ('address => ',inputs,' address2 => ', 'PO Box ', box_num)
break
I need to improve the function above so that it produces the desired result.
Interesting question. Here's a regex that works for all of your examples, though I can't say for sure that it will cover every address in your project.
It is worth reading more of the regex documentation and experimenting with the pattern in a regex tester.
Here's the code:
import re
streets = [
'780 Main Street, P.O. Box 410',
'438 University Ave. P.O. Box 5 ',
'HIGHWAY 10 BOX 39',
'98 LATHROP ROAD - BOX 147',
'396 S MAIN/P.O. BOX 820 ',
'HWY 18 AND HWY 128 (BOX 1305)',
'808 Innisfil Beach Rd Box 2',
'100 St 101 Ave, P.o. Box 1620',
'201 Del Rio (p.O. Box 309 ',
'BOX 487 2054 HWY 1 EAST ',
'P O BOX 2820 41340 BIG BEAR BL',
'2813 HWY 15 - P O BOX 1083 ',
'P.o. Box 838 2540 Hwy 43 West'
]
regex = r'([^a-z0-9]*(p[\s.]?o)?[\s.]*?box (\d+)[^a-z0-9]*)'
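for street in streets:
    match = re.search(regex, street, flags=re.IGNORECASE)
    if match is None:
        # no PO box found; leave the address untouched
        print(street.strip())
        continue
    po_box_number = match.group(3)
    # cut out the matched chunk by position; str.strip() would treat the chunk
    # as a set of characters and could eat digits from the street address
    cleaned_address = (street[:match.start()] + ' ' + street[match.end():]).strip()
    result = '{} / PO Box {}'.format(cleaned_address, po_box_number)
    print(result)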
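Since the question asks for a function, the same regex idea can be wrapped so that it returns the two fields instead of printing them. This is a minimal sketch under the same assumptions as the answer (one PO box per address, the box number directly after the word "box"); the name split_address and the empty-string default for Address2 are hypothetical choices.

```python
import re

# Optional punctuation, optional "P.O." in its various spellings, then "box <digits>".
PO_BOX = re.compile(
    r"[^a-z0-9]*(?:p[\s.]?o)?[\s.]*?box\s+(\d+)[^a-z0-9]*",
    re.IGNORECASE,
)


def split_address(raw):
    """Return (address, address2); address2 is '' when no PO box is found."""
    match = PO_BOX.search(raw)
    if match is None:
        return raw.strip(), ""
    # Remove the matched chunk by position and rejoin what is left around it.
    primary = (raw[:match.start()] + " " + raw[match.end():]).strip()
    return primary, "PO Box " + match.group(1)


if __name__ == "__main__":
    for raw in ("780 Main Street, P.O. Box 410",
                "BOX 487 2054 HWY 1 EAST ",
                "100 St 101 Ave, P.o. Box 1620"):
        print(split_address(raw))
```

The pattern is still a heuristic: a word that merely ends in "box" followed by a number (for example "Mailbox 12") would also be treated as a PO box, so real data may need a stricter rule.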

Why do I get a strcpy runtime error in my code?

I've been trying to make my code work on Windows (moved from the Mac) and for some reason I get a runtime error related to my strcpy call.
Please help!!
Cust.h
/*
* Cust.h
* Project 3
*
* Created by Anthony Glyadchenko on 11/17/09.
* Copyright 2009 __MyCompanyName__. All rights reserved.
*
*/
#include <iostream>
#include <string>
using namespace std;
#ifndef CUST_H
#define CUST_H
class Cust{
public:
char * getAcctNum();
void setAcctNum(char num[]);
double getCurrBalance();
void setCurrBalance(double balance);
void addToCurrBalance(double amount);
void subFromCurrBalance(double amount);
void setAcctFN(char firstName[]);
void setAcctLN(char lastName[]);
char * getAcctFN();
char * getAcctLN();
void setPIN(int pin);
int getPIN();
private:
char acctNum[255];
char acctFN[255];
char acctLN[255];
double currBalance;
int pin;
char fileName[255];
};
#endif
Cust.cpp
/*
* Cust.cpp
* Project 3
*
* Created by Anthony Glyadchenko on 11/17/09.
* Copyright 2009 __MyCompanyName__. All rights reserved.
*
*/
#include <fstream>
#include <string>
#include <sstream>
#include "Cust.h"
using namespace std;
char * Cust::getAcctNum(){
return acctNum;
}
void Cust::setAcctNum(char num[]){
strcpy(acctNum,num);
}
double Cust::getCurrBalance(){
return currBalance;
}
void Cust::setCurrBalance(double balance){
currBalance = balance;
}
void Cust::addToCurrBalance(double amount){
currBalance += amount;
}
void Cust::subFromCurrBalance(double amount){
currBalance -= amount;
}
void Cust::setAcctFN(char firstName[]){
strcpy(acctFN,firstName);
}
void Cust::setAcctLN(char lastName[]){
strcpy(acctLN,lastName);
}
char * Cust::getAcctFN(){
return acctFN;
}
char * Cust::getAcctLN(){
return acctLN;
}
void Cust::setPIN(int pin){
Cust::pin = pin;
}
int Cust::getPIN(){
return pin;
}
main.cpp
#include <iostream>
#include <string>
#include <fstream>
#include "Cust.h"
using namespace std;
int findNumLines(char file[]){
ifstream tempInput(file);
char ch;
int lineCount = 0;
while (!tempInput.eof()){
tempInput.get(ch);
if (ch == '\n') lineCount++;
}
tempInput.close();
return lineCount;
}
int main (int argc, char * const argv[]) {
Cust customers[500];
char tmpString[70] = " ";
char pch[255];
string tmpAcctFN = " ";
string tmpAcctLN = " ";
ifstream input("P3_custData.txt");
for (int idx = 0; idx < 130; idx++){
input.getline(tmpString, 70, '\n');
strcpy(pch,strtok(tmpString," "));
customers[idx].setAcctNum(pch);
cout << pch << endl;
strcpy(pch, strtok(NULL," "));;
customers[idx].setAcctFN(pch);
cout << pch << endl;
strcpy(pch, strtok(NULL," "));;
customers[idx].setAcctLN(pch);
cout << pch << endl;
strcpy(pch, strtok(NULL," "));;
customers[idx].setCurrBalance(atol(pch));
cout << pch << endl;
strcpy(pch, strtok(NULL," "));;
customers[idx].setPIN(atoi(pch));
cout << pch << endl;
}
input.close();
return 0;
}
P3_custData.txt
10000 Alicia Jones 1005.00 1234
10010 Mary Gonzalez 2040.55 8472
10020 Bill Henry 5340.20 7840
10030 Alex Brown 10010.50 8202
10040 Becca Kingman 983.00 9201
10050 Oliver Stone 12001.74 2382
10060 Robert Reich 3010.30 8137
10070 Judith Johnson 540.98 8203
10080 Jeremy Brice 672.10 8472
10090 Andrew Aziz 4041.50 2456
10100 Alicia Jones 10010.00 8264
10110 Mary Gonzalez 2050.51 6252
10120 Bill Henry 5340.20 3658
10130 Audrey Samuels 536.78 7462
10140 Marion Sams 9788.19 3266
10150 Richard Rubens 3265.90 6237
10160 Russell Townsend 123.00 5324
10170 Carolyn Tanner 4210.60 3256
10180 Corey Brill 77.40 4356
10190 Randall North 44.50 6346
10200 James Jackson 10020.00 2457
10210 Martin Gallagher 2041.50 2345
10220 William Walker 7340.20 2345
10230 Ellen Jacobson 433.99 1234
10240 Angela Bryer 15010.10 4321
10250 Steven Bond 960.00 9876
10260 Sally Stevens 23.10 2834
10270 Alan Fuller 7858.00 7294
10280 Peter Prentice 697.00 7618
10290 Paula Smith 1020.00 7349
10300 Alice Johnson 10030.00 7364
10310 Gail Green 3040.55 6717
10320 Gene Harold 8340.20 5162
10330 Lois Lane 100.00 7234
10340 Debby Dewhurst 8765.34 1382
10350 Louise Talent 350.00 8193
10360 Louis Bragg 10091.22 6738
10370 Alexander Gibson 540.70 7392
10380 Gertrude Ring 9030.00 7390
10390 John Johnson 3299.99 6329
10400 Alice Johannsen 2009.80 8273
10410 Marty Gordon 2040.55 6712
10420 William Hurst 540.20 1273
10430 Barry True 278.50 3247
10440 Maxwell Smart 800.66 2119
10450 Owen Burton 5261.00 3749
10460 Diane Walters 6004.44 3794
10470 Georgina Trump 7083.00 9283
10480 Erica Applegate 12007.00 3649
10490 Walter Wonkers 15789.40 1639
10500 Alicia Rogers 1009.00 6392
10510 Emmanuel Evans 220.50 2803
10520 Robert Bachman 760.25 9999
10530 Richard Rogers 2345.10 8888
10540 Roberta Maxwell 6666.66 6238
10550 Gregory Ichan 521.30 1111
10560 Lars Jensen 497.80 7239
10570 Roberta Peters 20004.10 3333
10580 Ali Masterson 3980.00 8304
10590 Laurence Leonard 6732.12 3684
10600 Tracy Jones 500.00 6382
10610 Michael Gonzalez 2040.57 3649
10620 Alexander Henry 5368.10 7389
10630 Leo Palmer 21900.00 6283
10640 Esther Richman 300.00 3684
10650 Harold Pinter 6783.10 3648
10660 Eva Burton 5355.55 7639
10670 William Shakespeare 0.00 6384
10680 Russell Carlson 4455.77 1384
10690 Janice Klein 3965.15 2738
10700 Henry Adams 4050.00 2374
10710 George Gonzalez 2040.55 2739
10720 Jose Enrique 5340.20 2376
10730 Jane Eakins 657.90 8209
10740 Justin Prince 8000.12 7394
10750 Ed True 5978.00 4798
10760 Emily Prentice 34.00 8220
10770 Olivia Callahan 231.21 5374
10780 Peter Cabot 5478.20 8293
10790 Andrew Austin 1110.10 3792
10800 Oliver Owens 100.00 8201
10810 Monty Wood 200.55 3748
10820 Terrance Thomas 340.20 6239
10830 Barry Brown 105.00 6387
10840 Harrison Huston 299.78 6384
10850 Robin Young 8655.30 9734
10860 Ishmael Green 10101.10 9246
10870 Fiona Fein 257.20 2836
10880 Florence Gregson 5699.60 6374
10890 Wilma Flinstone 78.00 5478
10900 Nancy Drew 2001.00 2536
10910 Captain Kirk 2444.44 7364
10920 Allie McGraw 540.20 6483
10930 Frederick Campbell 1050.00 6492
10940 Paula Prescott 5134.44 7483
10950 Ursula Unger 789.00 6482
10960 Betty Banker 4500.34 3567
10970 Elizabeth Young 1022.00 6489
10980 Maria Manners 510.00 5463
10990 Tracy Austin 674.10 6834
11000 Alex Andrews 300.00 1245
11010 Mike Matire 4040.55 7234
11020 Oscar Grouch 5340.20 9326
11030 Jennifer Young 823.33 6593
11040 Walter True 444.00 3485
11050 Hudson Haliburton 953.10 8465
11060 Ursula Angel 321.00 6583
11070 Zackery Brown 7666.60 9123
11080 Carole King 10000.00 6382
11090 Tracy Burton 955.00 6654
11100 Arthur Jones 100.00 7893
11110 Andrew Jackson 4040.55 9173
11120 Samuel Barber 50.20 2874
11130 George Gregrory 643.00 7392
11140 Quentin Larson 21.00 9277
11150 Dorothy Pace 777.23 4270
11160 Frieda Flowers 9000.99 6483
11170 Howard Alexander 78.00 2743
11180 Henry Aldritch 55.00 2084
11190 Beatrice Snow 99.99 2987
11200 Kelly Klark 200.00 3874
11210 Mary Gonzalez 440.51 2480
11220 Elly Hand 555.20 2479
11230 Gregory George 431.44 4756
11240 Nancy Alexander 6220.90 9274
11250 Sargent Pepper 16870.50 7777
11260 Linda Gale 20000.70 2974
11270 Charles Reilly 544.45 5973
11280 Chuck Mangers 10.00 5555
11290 Wilson Beckett 6010.10 6666
It would help if you posted the error message, but since you're moving from Mac to Windows, this points to a line-ending issue. Convert your P3_custData.txt file to Windows end-of-lines (CR + LF) and retry.
It could be that your line
input.getline(tmpString, 70, '\n');
is looking only for \n when it should be looking for \r\n or something similar, because Windows line endings differ from Mac line endings.
The most likely reason is that one of the five strtok() sequence calls is returning NULL because there aren't enough fields on the line.
I don't get a runtime error, but I get a compile time error (on Linux) because you didn't
#include <cstring>
#include <cstdlib>
in main.cpp
or
#include <cstring>
in Cust.cpp
After adding those it compiled and ran fine for me...
What precisely is the error you get?
You didn't specify at which line of the output the crash happens. Could it be that the last line of the file is missing the newline (\n) at the end?
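The suggestions above (a strtok() call returning NULL, odd line endings, a blank or short last line) can be checked independently of the C++ program. As a rough sketch in Python, assuming every record should have the five whitespace-separated fields seen in P3_custData.txt, the hypothetical helper find_bad_lines reports which lines would leave one of the five token reads without a value:

```python
EXPECTED_FIELDS = 5  # account number, first name, last name, balance, PIN


def find_bad_lines(lines, expected=EXPECTED_FIELDS):
    """Return (line_number, field_count) for lines without `expected` fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()  # split() also discards a trailing '\r' or '\n'
        if len(fields) != expected:
            bad.append((number, len(fields)))
    return bad


if __name__ == "__main__":
    sample = ["10000 Alicia Jones 1005.00 1234\r\n",
              "\n",
              "10010 Mary Gonzalez 2040.55\n"]
    print(find_bad_lines(sample))
```

Passing the lines of the real data file to this function shows whether any line, including a blank one at the end, has fewer than five fields; such a line is where the C++ loop would copy from a NULL pointer.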