Python for loop to concatenate dataframes - list

I have 600 dataframes that need to be concatenated (rows appended). dt1, dt2, dt3 ... dt600.
Is there an easy way to do that with a for loop? I know I can create a list and use pd.concat, but how do I make a list of the dts?
Thanks!
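One option, if the frames really do exist as 600 separate variables, is to look them up by name and pass the resulting list to pd.concat. A minimal sketch, assuming dt1 … dt600 are module-level variables (keeping the frames in a list from the start would avoid the lookup entirely):
import pandas as pd

# assumes dt1 ... dt600 exist as module-level variables
frames = [globals()[f"dt{i}"] for i in range(1, 601)]
combined = pd.concat(frames, ignore_index=True)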

Related

Any ideas on Iterating over dataframe and applying regex?

This may be a rudimentary problem but I am new to pandas.
I have a dataframe read from a csv, and I want to iterate over each row to extract all the string information in a specific column with regex. (The reason I am using regex is that I eventually want to make a separate dataframe out of that column.)
I tried iterating with a for loop but got a ton of errors. So far, it looks like the loop reads each input row as a list or Series rather than a string (correct me if I'm wrong). My main functions are iteritems() and findall(), but no good results so far. How can I approach this problem?
My dataframe looks like this:
df = pd.read_csv('foobar.csv')
df[['column1', 'column2', 'TEXT']]
My approach looks like this:
for _, Individual_row in df['TEXT'].iteritems():  # iteritems() yields (index, value) pairs
    parsed = re.findall(r'(.*?):\s*?\[(.*?)\]', Individual_row)
    res = {g[0].strip(): g[1].strip() for g in parsed}
Many thanks in advance
You can try the following instead of a loop (map with na_action='ignore' skips missing values):
df['new_TEXT'] = df['TEXT'].map(
    lambda x: [[g[0].strip(), g[1].strip()] for g in re.findall(r'(.*?):\s*?\[(.*?)\]', x)],
    na_action='ignore'
)
This will create a new column with your resultant data.
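If the eventual goal is a separate dataframe built from that column, a minimal sketch using the same regex (assuming each TEXT value looks like "key: [value], other: [more]"):
import re
import pandas as pd

pattern = r'(.*?):\s*?\[(.*?)\]'
pairs = df['TEXT'].str.findall(pattern)  # one list of (key, value) tuples per row
parsed_df = pd.DataFrame([{k.strip(): v.strip() for k, v in row} for row in pairs])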

Make new rows in Pandas dataframe based on df.str.findall matches?

I have a dataframe current_df, and I want to create a new row for each regex match that occurs in each entry of column_1. I currently have this:
current_df['new_column']=current_df['column_1'].str.findall('(?<=ABC).*?(?=XYZ)')
This appends a list of the matches for the regex in each row. How do I create a new row for each match? I'm guessing something with list comprehension, but I'm not sure what it'd be exactly.
The output df would be something like:
column_1 column2 new_column
ABC_stuff_to_match_XYZ_ABC_more_stuff_to_match_XYZ... data _stuff_to_match_
ABC_stuff_to_match_XYZ_ABC_more_stuff_to_match_XYZ... data _more_stuff_to_match_
ABC_a_different_but_important_piece_of_data_XYZ_ABC_find_me_too_XYZ... different_stuff _a_different_but_important_piece_of_data_
ABC_a_different_but_important_piece_of_data_XYZ_ABC_find_me_too_XYZ... different_stuff _find_me_too_
You can use extractall and merge the result back on the index:
df.merge(df.column_1.str.extractall('(?<=ABC)(.*?)(?=XYZ)')
           .reset_index(level=-1, drop=True),
         left_index=True,
         right_index=True)
Use the extract function (note that extract only returns the first match per row, and the pattern needs a capture group):
df['new_column'] = df['column_1'].str.extract('(?<=ABC)(.*?)(?=XYZ)', expand=True)
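If you are on pandas 0.25 or newer, another option is to keep findall and then explode the list column so that each match gets its own row. A minimal sketch, reusing the column names from the question:
current_df['new_column'] = current_df['column_1'].str.findall('(?<=ABC).*?(?=XYZ)')
exploded = current_df.explode('new_column')  # one row per match, other columns repeated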

How to combine dataset with mxnet?

I have two separate folders containing 3D arrays (data); each folder contains files of the same classification. I used mxnet.gluon.data.ArrayDataset() to create a dataset for each label respectively. Is there a way to combine these two datasets into a final training dataset that contains both classifications? The two datasets are different sizes.
e.g.
A_data = mx.gluon.data.ArrayDataset(list2, label_A)
noA_data = mx.gluon.data.ArrayDataset(list, label_noA)
^ I want to combine A_data and noA_data for a complete dataset.
Additionally, is there an easier way to combine the two folders with their classifications into an mxnet dataset from the get-go? That would also solve my problem.
You could create a single ArrayDataset that contains both. If list and list2 are both Python lists, you could do something like
full_data = mx.gluon.data.dataset.ArrayDataset(list + list2, label_noA + label_A)
where len(label_noA) == len(list) and len(label_A) == len(list2).
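If you want to build the combined dataset straight from the two folders, one approach is to load each folder into a list of arrays and generate the matching label list at the same time. A rough sketch with hypothetical names (arrays_A / arrays_noA are the lists loaded from the two folders; 0 and 1 are arbitrary class ids):
import mxnet as mx

data = arrays_noA + arrays_A
labels = [0] * len(arrays_noA) + [1] * len(arrays_A)
full_data = mx.gluon.data.ArrayDataset(data, labels)

# wrap it in a DataLoader so that batches mix both classes
loader = mx.gluon.data.DataLoader(full_data, batch_size=32, shuffle=True)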

merging dataframes in pandas python

I have 10 dataframes and I'm trying to merge the data in them on the variable names. The purpose is to get one file which contains all the data from the relevant variables.
I'm using the code below:
pd.merge(df,df1,df2,df3,df4,df5,df6,df7,df8,df9,df10, on = ['RSSD9999', 'RCFD0010','RCFD0071','RCFD0081','RCFD1400','RCFD1773','RCFD2123','RCFD2145','RCFD2160','RCFD3123','RCFD3210','RCFD3300','RCFD3360','RCFD3368','RCFD3792','RCFD6631','RCFD6636','RCFD8274','RCFD8275','RCFDB530','RIAD4000','RIAD4073','RIAD4074','RIAD4079','RCFD1403','RCON3646','RIAD4230','RIAD4300','RIAD4301','RIAD4302','RIAD4340','RIAD4475','RCFD1406','RCFD3230','RCFD2950','RCFD3632','RCFD3839','RCFDB529','RCFDB556','RCON0071','RCON0081','RCON0426','RCON2145','RCON2148','RCON2168','RCON2938','RCON3210','RCON3230','RCON3300','RCON3839','RCONB528','RCONB529','RCONB530','RCONB556','RCONB696','RCONB697','RCONB698','RCONB699','RCONB700','RCONB701','RCONB702','RCONB703','RCONB704','RCON1410','RCON6835','RCFD2210','RCONA223','RCONA224','RCON5311','RCON5320','RCON5327','RCON5334','RCON5339','RCON5340','RCON7204','RCON7205','RCON7206','RCON3360','RCON3368','RCON3385','RIAD3217','RCFDA222','RCFDA223','RCFDA224','RCON3792','RCON0391','RCFD7204','RCFD7206','RCFD7205','RCONB639','RIADG104','RCFDG105','RSSD9017','RSSD9010','RSSD9042','RSSD9050'],how='outer')
But I'm getting the error "merge() got multiple values for keyword argument 'on'". I think the code is correct; can anyone help me understand what's wrong here?
First of all, you are trying to merge 10 dataframes. That is possible, but all of the dataframes must share at least one common column.
import pandas as pd
df = pd.DataFrame(data, columns=[your columns], index=[index names])
df = df.set_index(common_column)
# do this for all ten dataframes
answer = pd.merge(df, df1........, df10, on=column_names, how='outer')
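Note, though, that pd.merge only accepts two dataframes at a time; everything after the second positional argument is interpreted as how, on, and so on, which is what triggers the "multiple values for keyword argument" error. One way to merge all ten is to chain the merges, for example with functools.reduce. A minimal sketch, with key_cols standing in for the shared column list from the question:
from functools import reduce
import pandas as pd

dfs = [df, df1, df2, df3, df4, df5, df6, df7, df8, df9, df10]
merged = reduce(lambda left, right: pd.merge(left, right, on=key_cols, how='outer'), dfs)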
pd.merge(pd.merge(Ray, Bec, on='Key', how='outer'), Dan, on='Key', how='outer')

Adding a new row to a list

I have:
row1 = [1,'a']
row2 = [2,'b']
I want to create 'allrows' to look like these two rows concatenated together. In fact, I want to start with an empty list and add rows.
append does not do the job; it just creates one long flat list.
How do I create a list or other structure that holds each row as a ROW?
For two rows, I want the result to be:
[[1,'a']
[2,'b']]
I am not sure I need the outer brackets, but I put them in there assuming the final structure is itself a list. I suppose any other structure that holds these lists, like an "array" of lists, would be fine, as long as I can write out specific rows using:
for line in allrows:
    print(line)
Thanks!
I'm guessing that you are coding in Python.
Lists can hold other lists, so you can do this:
allrows = [row1, row2]
for row in allrows:
    print(row)
Output would be
[1, 'a']
[2, 'b']
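If you would rather start from an empty list and add the rows one at a time, append keeps each row as its own nested list (extend or += is what flattens everything into one long list):
allrows = []
allrows.append(row1)  # [[1, 'a']]
allrows.append(row2)  # [[1, 'a'], [2, 'b']]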