how to get the series out of a data frame? - python-2.7

fake = {'EmployeeID' : [0,1,2,3,4,5,6,7,8,9],
'State' : ['a','b','c','d','e','f','g','h','i','j'],
'Email' : ['a','b','c','d','e','f','g','h','i','j']
}
fake_df = pd.DataFrame(fake)
I am trying to define a function that returns a Series of strings of all email addresses of employees in states. The email addresses should be separated by a given delimiter. I think I will use ";".
Arguments:
- DataFrame
- delimiter (;)
Do I have to use for loop?? to be honest, I don't even know how to start on this..
====EDITION
After done with coding, I should run
emails = getEmailListByState(fake_df, ", ")
for state in sorted(emails.index):
print "%15s: %s" % (state, emails[state])
and should get something like
a: a
b: b
c: c,d
d: e
e: f,g
as my output

If I understand the problem properly you are looking for groupby state,get the emails and apply join i.e joining the emails based on the state i.e
fake = {'EmployeeID' : [0,1,2,3,4,5,6,7,8,9],
'State' : ['NZ','NZ','NY','NY','ST','ST','YK','YK','YK','YK'],
'Email' : ['ab#h.com','bab#h.com','cab#h.com','dab#h.com','eab#h.com','fab#h.com','gab#h.com','hab#h.com','iab#h.com','jab#h.com']
}
fake_df = pd.DataFrame(fake)
ndf = fake_df.groupby('State')['Email'].apply(', '.join)
Output:
State
NY cab#h.com, dab#h.com
NZ ab#h.com, bab#h.com
ST eab#h.com, fab#h.com
YK gab#h.com, hab#h.com, iab#h.com, jab#h.com
Name: Email, dtype: object
If you want that in a method then
def getEmailListByState(df,delim):
return df.groupby('State')['Email'].apply(delim.join)
emails = getEmailListByState(fake_df, ", ")
for state in sorted(emails.index):
print( "%15s: %s" % (state, emails[state])

Related

How do I do a where OR on two columns in the same table?

I have a membership model, and I want to search OR on two columns....not sure how to do it.
I tried this:
u1.all_memberships.where(inviter: u2, invited: u2)
Membership Load (6.8ms) SELECT "memberships".* FROM "memberships" WHERE (memberships.user_id = 1 OR memberships.invited_id = 1) AND "memberships"."user_id" = 3 AND "memberships"."invited_id" = 3
=> []
But note the 2 AND on both queries, when ideally what I would like to do is literally just replace all the ANDs with ORs.
So I would love if the query looked something like this:
Membership Load (6.8ms) SELECT "memberships".* FROM "memberships" WHERE (memberships.user_id = 1 OR memberships.invited_id = 1) OR (memberships.user_id = 3 OR memberships.invited_id = 3)
Assuming that my modification is syntactically correct SQL ofcourse.
How do I do that with a where clause? Is that possible?
I expect It can help
where("inviter_id = ? OR invited_id = ? ", u2.id, u2.id)
DRYer solution
where("inviter_id = :uid OR invited_id = :uid ", uid: u2.id)

How to extract messages using regex in Scala?

My version of RegEx is being greedy and now working as it suppose to. I need extract each message with timestamp and user who created it. Also if user has two or more consecutive messages it should go inside one match / block / group. How to solve it?
https://regex101.com/r/zD5bR6/1
val pattern = "((a\.b|c\.d)\n(.+\n)+)+?".r
for(m <- pattern.findAllIn(str).matchData; e <- m.subgroups) println(e)
UPDATE
ndn solution throws StackOverflowError when executed:
Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4708)
.......
Code:
val pattern = "(?:.+(?:\\Z|\\n))+?(?=\\Z|\\w\\.\\w)".r
val array = (pattern findAllIn str).toArray.reverse foreach{println _}
for(m <- pattern.findAllIn(str).matchData; e <- m.subgroups) println(e)
I don't think a regular expression is the right tool for this job. My solution below uses a (tail) recursive function to loop over the lines, keep the current username and create a Message for every timestamp / message pair.
import java.time.LocalTime
case class Message(user: String, timestamp: LocalTime, message: String)
val Timestamp = """\[(\d{2})\:(\d{2})\:(\d{2})\]""".r
def parseMessages(lines: List[String], usernames: Set[String]) = {
#scala.annotation.tailrec
def go(
lines: List[String], currentUser: Option[String], messages: List[Message]
): List[Message] = lines match {
// no more lines -> return parsed messages
case Nil => messages.reverse
// found a user -> keep as currentUser
case user :: tail if usernames.contains(user) =>
go(tail, Some(user), messages)
// timestamp and message on next line -> create a Message
case Timestamp(h, m, s) :: msg :: tail if currentUser.isDefined =>
val time = LocalTime.of(h.toInt, m.toInt, s.toInt)
val newMsg = Message(currentUser.get, time, msg)
go(tail, currentUser, newMsg :: messages)
// invalid line -> ignore
case _ =>
go(lines.tail, currentUser, messages)
}
go(lines, None, Nil)
}
Which we can use as :
val input = """
a.b
[10:12:03]
you can also get commands
[10:11:26]
from the console
[10:11:21]
can you check if has been resolved
[10:10:47]
ah, okay
c.d
[10:10:39]
anyways startsLevel is still 4
a.b
[10:09:25]
might be a dead end
[10:08:56]
that need to be started early as well
"""
val lines = input.split('\n').toList
val users = Set("a.b", "c.d")
parseMessages(lines, users).foreach(println)
// Message(a.b,10:12:03,you can also get commands)
// Message(a.b,10:11:26,from the console)
// Message(a.b,10:11:21,can you check if has been resolved)
// Message(a.b,10:10:47,ah, okay)
// Message(c.d,10:10:39,anyways startsLevel is still 4)
// Message(a.b,10:09:25,might be a dead end)
// Message(a.b,10:08:56,that need to be started early as well)
The idea is to take as little characters as possible that will be followed by a username or the end of the string:
(?:.+(?:\Z|\n))+?(?=\Z|\w\.\w)
See it in action

Split Pandas Column by values that are in a list

I have three lists that look like this:
age = ['51+', '21-30', '41-50', '31-40', '<21']
cluster = ['notarget', 'cluster3', 'allclusters', 'cluster1', 'cluster2']
device = ['htc_one_2gb','iphone_6/6+_at&t','iphone_6/6+_vzn','iphone_6/6+_all_other_devices','htc_one_2gb_limited_time_offer','nokia_lumia_v3','iphone5s','htc_one_1gb','nokia_lumia_v3_more_everything']
I also have column in a df that looks like this:
campaign_name
0 notarget_<21_nokia_lumia_v3
1 htc_one_1gb_21-30_notarget
2 41-50_htc_one_2gb_cluster3
3 <21_htc_one_2gb_limited_time_offer_notarget
4 51+_cluster3_iphone_6/6+_all_other_devices
I want to split the column into three separate columns based on the values in the above lists. Like so:
age cluster device
0 <21 notarget nokia_lumia_v3
1 21-30 notarget htc_one_1gb
2 41-50 cluster3 htc_one_2gb
3 <21 notarget htc_one_2gb_limited_time_offer
4 51+ cluster3 iphone_6/6+_all_other_devices
First thought was to do a simple test like this:
ages_list = []
for i in ages:
if i in df['campaign_name'][0]:
ages_list.append(i)
print ages_list
>>> ['<21']
I was then going to convert ages_list to a series and combine it with the remaining two to get the end result above but i assume there is a more pythonic way of doing it?
the idea behind this is that you'll create a regular expression based on the values you already have , for example if you want to build a regular expressions that capture any value from your age list you may do something like this '|'.join(age) and so on for all the values you already have cluster & device.
a special case for device list becuase it contains + sign that will conflict with the regex ( because + means one or more when it comes to regex ) so we can fix this issue by replacing any value of + with \+ , so this mean I want to capture literally +
df = pd.DataFrame({'campaign_name' : ['notarget_<21_nokia_lumia_v3' , 'htc_one_1gb_21-30_notarget' , '41-50_htc_one_2gb_cluster3' , '<21_htc_one_2gb_limited_time_offer_notarget' , '51+_cluster3_iphone_6/6+_all_other_devices'] })
def split_df(df):
campaign_name = df['campaign_name']
df['age'] = re.findall('|'.join(age) , campaign_name)[0]
df['cluster'] = re.findall('|'.join(cluster) , campaign_name)[0]
df['device'] = re.findall('|'.join([x.replace('+' , '\+') for x in device ]) , campaign_name)[0]
return df
df.apply(split_df, axis = 1 )
if you want to drop the original column you can do this
df.apply(split_df, axis = 1 ).drop( 'campaign_name', axis = 1)
Here I'm assuming that a value must be matched by regex but if this is not the case you can do your checks , you got the idea

Multiple inputs to produce wanted output

I am trying to create a code that takes the user input, compares it to a list of tuples (shares.py) and then prints the values in a the list. for example if user input was aia, this code would return:
Please list portfolio: aia
Code Name Price
AIA Auckair 1.50
this works fine for one input, but what I want to do is make it work for multiple inputs.
For example if user input was aia, air, amp - this input would return:
Please list portfolio: aia, air, amp
Code Name Price
AIA Auckair 1.50
AIR AirNZ 5.60
AMP Amp 3.22
This is what I have so far. Any help would be appreciated!
import shares
a=input("Please input")
s1 = a.replace(' ' , "")
print ('Please list portfolio: ' + a)
print (" ")
n=["Code", "Name", "Price"]
print ('{0: <6}'.format(n[0]) + '{0:<20}'.format(n[1]) + '{0:>8}'.format(n[2]))
z = shares.EXCHANGE_DATA[0:][0]
b=s1.upper()
c=b.split()
f=shares.EXCHANGE_DATA
def find(f, a):
return [s for s in f if a.upper() in s]
x= (find(f, str(a)))
print ('{0: <6}'.format(x[0][0]) + '{0:<20}'.format(x[0][1]) + ("{0:>8.2f}".format(x[0][2])))
shares.py contains this
EXCHANGE_DATA = [('AIA', 'Auckair', 1.5),
('AIR', 'Airnz', 5.60),
('AMP', 'Amp',3.22),
('ANZ', 'Anzbankgrp', 26.25),
('ARG', 'Argosy', 12.22),
('CEN', 'Contact', 11.22)]
I am assuming a to contain values in the following format 'aia air amp'
raw = a # just in case you want the original string at a later point
toDisplay = []
a = a.split() # a now looks like ['aia','air','amp']
for i in a:
temp = find(f, i)
if(temp):
toDisplay.append(temp)
for i in toDisplay:
print ('{0: <6}'.format(i[0][0]) + '{0:<20}'.format(i[0][1]) + ("{0:>8.2f}".format(i[0][2])))
Essentially what I'm trying to do is
Split the input into a list
Do exactly what you were doing for a single input for each item in that list
Hope this helps!

Scala objects not changing their internal state

I am seeing a problem with some Scala 2.7.7 code I'm working on, that should not happen if it the equivalent was written in Java. Loosely, the code goes creates a bunch of card players and assigns them to tables.
class Player(val playerNumber : Int)
class Table (val tableNumber : Int) {
var players : List[Player] = List()
def registerPlayer(player : Player) {
println("Registering player " + player.playerNumber + " on table " + tableNumber)
players = player :: players
}
}
object PlayerRegistrar {
def assignPlayersToTables(playSamplesToExecute : Int, playersPerTable:Int) = {
val numTables = playSamplesToExecute / playersPerTable
val tables = (1 to numTables).map(new Table(_))
assert(tables.size == numTables)
(0 until playSamplesToExecute).foreach {playSample =>
val tableNumber : Int = playSample % numTables
tables(tableNumber).registerPlayer(new Player(playSample))
}
tables
}
}
The PlayerRegistrar assigns a number of players between tables. First, it works out how many tables it will need to break up the players between and creates a List of them.
Then in the second part of the code, it works out which table a player should be assigned to, pulls that table from the list and registers a new player on that table.
The list of players on a table is a var, and is overwritten each time registerPlayer() is called. I have checked that this works correctly through a simple TestNG test:
#Test def testRegisterPlayer_multiplePlayers() {
val table = new Table(1)
(1 to 10).foreach { playerNumber =>
val player = new Player(playerNumber)
table.registerPlayer(player)
assert(table.players.contains(player))
assert(table.players.length == playerNumber)
}
}
I then test the table assignment:
#Test def testAssignPlayerToTables_1table() = {
val tables = PlayerRegistrar.assignPlayersToTables(10, 10)
assertEquals(tables.length, 1)
assertEquals(tables(0).players.length, 10)
}
The test fails with "expected:<10> but was:<0>". I've been scratching my head, but can't work out why registerPlayer() isn't mutating the table in the list. Any help would be appreciated.
The reason is that in the assignPlayersToTables method, you are creating a new Table object. You can confirm this by adding some debugging into the loop:
val tableNumber : Int = playSample % numTables
println(tables(tableNumber))
tables(tableNumber).registerPlayer(new Player(playSample))
Yielding something like:
Main$$anon$1$Table#5c73a7ab
Registering player 0 on table 1
Main$$anon$1$Table#21f8c6df
Registering player 1 on table 1
Main$$anon$1$Table#53c86be5
Registering player 2 on table 1
Note how the memory address of the table is different for each call.
The reason for this behaviour is that a Range is non-strict in Scala (until Scala 2.8, anyway). This means that the call to the range is not evaluated until it's needed. So you think you're getting back a list of Table objects, but actually you're getting back a range which is evaluated (instantiating a new Table object) each time you call it. Again, you can confirm this by adding some debugging:
val tables = (1 to numTables).map(new Table(_))
println(tables)
Which gives you:
RangeM(Main$$anon$1$Table#5492bbba)
To do what you want, add a toList to the end:
val tables = (1 to numTables).map(new Table(_)).toList
val tables = (1 to numTables).map(new Table(_))
This line seems to be causing all the trouble - mapping over 1 to n gives you a RandomAccessSeq.Projection, and to be honest, I don't know how exactly they work, but a bit less clever initialising technique does the job.
var tables: Array[Table] = new Array(numTables)
for (i <- 0 to numTables) tables(i) = new Table(i)
Using the first initialisation method I wasn't able to change the objects (just like you), but using a simple array everything seems to be working.