I am using the code below to get the first row where "value" is found, but I am getting the last row of the file. What am I doing wrong? Is there a way to get the first row?
Suppose my dataframe looks like this:
Summary no
This is an analysis
of some data
Phone: 452-354-4456
col1 Value col2 col3
bac15 job $16.00 $0.00
khs bank $19.25 $0.00
jsg foot $0.00 $70,000.00
eyhf water $15.00 $0.00
edf drink $15.00 $0.00
import os
import pandas as pd

for fname in os.listdir(root_dir):
    file_path = os.path.join(root_dir, fname)
    if fname.endswith('.csv'):
        df = pd.read_csv(file_path)
        for row in df.itertuples():
            if row == "value":
                print(row)
You are comparing the entire row to the string "value", which will never be true, since row is a namedtuple of the values in each row of the dataframe.
Rows containing "value" can be found using if "value" in row:
rather than if row == "value":
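For example, here is a minimal sketch of the fix, using a small hypothetical dataframe in place of one of the CSV files; it stops at the first matching row:

```python
import pandas as pd

# Hypothetical data standing in for one of the CSV files.
df = pd.DataFrame({
    "col1": ["bac15", "khs", "jsg"],
    "Value": ["job", "value", "foot"],
})

first_match = None
for row in df.itertuples():
    if "value" in row:  # membership test against the row's cell values
        first_match = row
        break           # stop at the first match instead of scanning on

print(first_match)
```

The break is what guarantees only the first matching row is reported; without it, the last match printed wins.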
I am trying to group a table by a column, so that the resulting table has unique values in that column, and also returns all the unique values from another column that belong to each group:
Source:
Country = USA
Cities =
New York
Boston
Chicago
Houston
Transform: group by the [Country] column, and return unique values from [Cities], comma separated:
Country = USA
Cities = New York,Boston,Chicago,Houston
thanks a lot
You can simply use CONCATENATEX in a measure:
Measure = CONCATENATEX(VALUES('Table'[Cities]),'Table'[Cities],",")
I have in my model a table that contains data from monthly employee reports, with a "ReportDate" column and employee numbers.
I want to check with DAX that there are no gaps between the monthly dates for each employee.
For example:
EmpNum | ReportDate | CheckColumn
111    | 30.08.2019 |
111    | 30.09.2019 |
111    | 31.10.2019 |
222    | 30.08.2019 |
222    | 31.10.2019 | ----------> Here I want an alert in my CheckColumn
Can someone find me a solution?
First you need to create an index column: go to Edit Queries > Add Column > Index Column, starting with 1 for example.
Next, add a DAX column that shifts the original column by one row, using this expression (make sure this column has the same date format as your original column; Modelling > Format):
ShiftColumn = DATEVALUE(CALCULATE(MAX('Table'[Report Date]);FILTER('Table';'Table'[Index]=EARLIER('Table'[Index])-1)))
Next add the column with the check:
Column 2 = IF(DATEADD('Table'[Report Date].[Date];-1;MONTH) = 'Table'[ShiftColumn]; TRUE(); FALSE())
The result:
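For comparison, the same month-gap check can be sketched in pandas, using the sample rows from the question; only the row where employee 222 skips a month is flagged:

```python
import pandas as pd

# Sample rows from the question (day-first dates).
df = pd.DataFrame({
    "EmpNum": [111, 111, 111, 222, 222],
    "ReportDate": pd.to_datetime(
        ["30.08.2019", "30.09.2019", "31.10.2019", "30.08.2019", "31.10.2019"],
        dayfirst=True,
    ),
})

df = df.sort_values(["EmpNum", "ReportDate"])
# Previous report date within each employee's own history.
prev = df.groupby("EmpNum")["ReportDate"].shift(1)
# Whole months between consecutive reports; more than 1 means a gap.
months_apart = (
    (df["ReportDate"].dt.year - prev.dt.year) * 12
    + (df["ReportDate"].dt.month - prev.dt.month)
)
df["CheckColumn"] = months_apart.gt(1)
print(df)
```

Grouping before the shift ensures one employee's last date is never compared with the next employee's first date.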
I have a .csv file like below where all the contents are text
col1 Col2
My name Arghya
The Big Apple Fruit
I am able to read this csv using pd.read_csv(index_col=False, header=None).
How do I combine all three values in Col1 into one string, separated by full stops?
If you need to convert the column values to a list:
print (df.Col1.tolist())
#alternative solution
#print (list(df.Col1))
['This is Apple', 'I am in Mumbai', 'I like rainy day']
Then join the values in the list; the output is a string:
a = '.'.join(df.Col1.tolist())
print (a)
This is Apple.I am in Mumbai.I like rainy day
With header=None the header line is read as data and the columns are labelled 0 and 1:
print (df)
0 1
0 Col1 Col2
1 This is Apple Fruit
2 I am in Mumbai Great
3 I like rainy day Flood
print (list(df.loc[:, 0]))
#alternative
#print (list(df[0]))
['Col1', 'This is Apple', 'I am in Mumbai', 'I like rainy day']
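Putting the header=None case together, here is a self-contained sketch (the file contents are inlined via io.StringIO purely for illustration); row 0 holds the original header, so it is skipped before joining:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the file described above.
csv_text = """Col1,Col2
This is Apple,Fruit
I am in Mumbai,Great
I like rainy day,Flood
"""

# With header=None the header line is read as data, so skip row 0 before joining.
df = pd.read_csv(io.StringIO(csv_text), index_col=False, header=None)
joined = ".".join(df.loc[1:, 0])
print(joined)  # This is Apple.I am in Mumbai.I like rainy day
```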
I'm new to Python. I have two .csv files with identical columns, but my oldlist.csv has been edited in row[9] with employee names, while the generated newlist.csv defaults to certain domains for the names. I want to take oldlist.csv, compare it to newlist.csv, and override the corresponding column in newlist.csv with the data in row[9] from oldlist.csv. Thanks for your help.
Example: (oldlist) col1, col2 (newlist) col1, col2
1234, Bob 1234, Jane
I want to read oldlist; if col1 == col1 in newlist, override col2, and continue to write.write(row) for everything matching in col1 from oldlist.
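A minimal sketch of that override using Python's csv module, with the two example files inlined as strings (real code would open oldlist.csv and newlist.csv instead, and index the name column as row[9]):

```python
import csv
import io

# Hypothetical file contents matching the (col1, col2) example.
old_text = "1234,Bob\n"
new_text = "1234,Jane\n5678,Sam\n"

# Build a lookup of col1 -> col2 from the old list.
overrides = {row[0]: row[1] for row in csv.reader(io.StringIO(old_text))}

out = io.StringIO()
writer = csv.writer(out)
for row in csv.reader(io.StringIO(new_text)):
    if row[0] in overrides:
        row[1] = overrides[row[0]]  # override col2 with the old value
    writer.writerow(row)

print(out.getvalue())
```

Rows without a match in the old list are written through unchanged.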
I have a data file in this format:
I want the columns to be grouped by month in a pivot table. When I pivot the data a column for each day is being created.
df = pd.read_excel("C:\\ExportReport.xlsx", "ExportReport")
table = pd.pivot_table(df, values='Forecast Qty', rows='Part', cols='Due Date', aggfunc=np.sum, fill_value=0)
Is there a way to tell pandas to group the columns by month?
You need a field that calculates the month. If the data spans multiple years, you will need to combine year and month into one field.
df['YYYY-MM'] = df['Due Date'].apply(lambda x: x.strftime("%Y-%m"))
Then try your pivot again, but using the monthly field:
table = pd.pivot_table(df, values='Forecast Qty', rows='Part', cols='YYYY-MM', aggfunc=np.sum, fill_value=0)
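Note that on recent pandas versions the rows=/cols= keywords have been renamed to index=/columns=. A self-contained sketch with hypothetical data standing in for ExportReport.xlsx:

```python
import pandas as pd

# Hypothetical daily rows standing in for ExportReport.xlsx.
df = pd.DataFrame({
    "Part": ["A", "A", "B"],
    "Due Date": pd.to_datetime(["2019-01-05", "2019-01-20", "2019-02-11"]),
    "Forecast Qty": [10, 5, 7],
})

# Derive the monthly key, then pivot on it instead of the raw dates.
df["YYYY-MM"] = df["Due Date"].dt.strftime("%Y-%m")
table = pd.pivot_table(df, values="Forecast Qty", index="Part",
                       columns="YYYY-MM", aggfunc="sum", fill_value=0)
print(table)
```

Each part now gets one column per month, with missing months filled with 0.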