I have a text file in which a line with a category name is followed by lines listing all the items in that category. Two empty lines then separate it from the next category title and its items. I want to know how I could use regular expressions (specifically in Notepad++) to put the category at the start of each item's line, so I can save the file as a CSV or TAB file.
I started by isolating one of the categories as such:
Городищенский поссовет 1541
Арабовщина 535
Болтичи 11
Бриксичи 59
Великое Село 160
Гарановичи 34
Грибовщина 3
Душковцы 5
Зеленая 182
Кисели 97
Колдычево 145
Конюшовщина 16
Микуличи 31
Мостытычи 18
Насейки 5
Новоселки 45
Омневичи 53
Поручин 43
Пруды 24
Станкевичи 42
Ясенец 33
I then got as far as searching for
(.+)(поссовет)(\t\d{4}\r\n)(^.*$\r\n)
and replacing with
$1$2\t$4
which makes the first line
Арабовщина 535
turn into
Городищенский поссовет Арабовщина 535
which is what I want to happen to the rest of the lines, but I couldn't get any farther.
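For reference, the same transformation can be sketched outside Notepad++ with a few lines of Python. This is a sketch under the structure described above (category line first, its items below, blocks separated by blank lines); the function name and the tab-separated layout are assumptions, not part of the original question.

```python
# Sketch: prepend each block's category name to its item lines, assuming
# blocks are separated by empty lines and the first line of each block is
# the category followed by a tab and a count (function name is hypothetical).
def flatten_categories(text):
    rows = []
    for block in text.split("\n\n"):        # blank lines separate categories
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        category = lines[0].rsplit("\t", 1)[0]   # drop the trailing count
        for item in lines[1:]:
            rows.append(category + "\t" + item)  # category<TAB>item<TAB>count
    return "\n".join(rows)
```

The result can then be saved directly as a TAB-delimited file.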
I'm trying to write a program to pull information from a file, but I only want the rows that do not contain 0. I know how to print the file to the screen, but I can't seem to get just the desired rows printed without printing each individual line I want. I've tried putting the information I want into a list and writing it to a new CSV file, but it jumbles everything up into 2 rows instead of the 300+ I'm trying to get. Any help would be appreciated.
Months A B C
Feb-49 0 0 0
Jan-50 95 378 3767
Feb-50 67 17 233
Jan-51 0 0 0
Feb-51 0 0 0
#This file has about 400 rows that look something like this
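One way to approach this is to keep a row only when none of its data fields is 0. The sketch below assumes whitespace-separated columns with a one-line header, as in the sample above; the function name is hypothetical.

```python
# Sketch: keep only the rows in which no data column is 0, assuming
# whitespace-separated columns with a one-line header as shown above.
sample = """\
Months A B C
Feb-49 0 0 0
Jan-50 95 378 3767
Feb-50 67 17 233
Jan-51 0 0 0
"""

def nonzero_rows(lines):
    header, *rows = lines
    kept = [header]
    for row in rows:
        # keep the row only if every data field (after the Months label) is non-zero
        if all(field != "0" for field in row.split()[1:]):
            kept.append(row)
    return kept

print("\n".join(nonzero_rows(sample.splitlines())))
```

Writing `kept` out line by line (rather than as one flat list) avoids the "everything jumbled into 2 rows" problem.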
I have a Calc sheet listing a cut-list for plywood in two columns with a quantity in a third column. I would like to remove duplicate matching pairs of dimensions and total the quantity. Starting with:
A B C
25 35 2
25 40 1
25 45 3
25 45 2
35 45 1
35 50 3
40 25 1
40 25 1
Ending with:
A B C
25 35 2
25 40 1
25 45 5
35 45 1
35 50 3
40 25 2
I'm trying to automate this. Currently I have multiple lists occupying the same page, which need to be totaled independently of each other.
Give each list a unique ListId, ListCode or ListNumber, and let all rows belonging to the same list share the same value for this field.
Concatenate A & B and form a new column, say, PairAB.
If the list is small and manageable, filter on PairAB and collect the totals.
Otherwise, use Grouping and subtotals to get totals for each list and each pair, grouping on ListId and PairAB.
If the list is very large, you are better off exporting it to CSV and loading it into a database; such totals are child's play in SQL.
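If the data does end up in CSV, the grouping-and-subtotals step can also be sketched in pandas. This is an illustration only, with the data hard-coded from the example above and the column names A, B, C assumed:

```python
import pandas as pd

# Sketch: total quantity C for each (A, B) dimension pair, using the
# cut-list from the example above (values hard-coded for illustration).
df = pd.DataFrame({
    "A": [25, 25, 25, 25, 35, 35, 40, 40],
    "B": [35, 40, 45, 45, 45, 50, 25, 25],
    "C": [2, 1, 3, 2, 1, 3, 1, 1],
})
totals = df.groupby(["A", "B"], as_index=False)["C"].sum()
print(totals)
```

Grouping on the (A, B) pair plays the same role as the concatenated PairAB column; a ListId column could simply be added to the grouping keys.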
I have a CSV file which I need to parse using Python.
triggerid,timestamp,hw0,hw1,hw2,hw3
1,234,343,434,78,56
2,454,22,90,44,76
I need to read the file line by line and slice out the triggerid, timestamp and hw3 columns. But the column sequence may change from run to run, so I need to match the field name, count the column, and then print the output file as:
triggerid,timestamp,hw3
1,234,56
2,454,76
Also, is there a way to generate a hash table (like we have in Perl) so that I can store the entire hw0 column (hw0 as the key and the column's values as the values) for other modifications?
I'm unsure what you mean by "count the column".
An easy way to read the data in is to use pandas, which was designed for just this sort of manipulation. This creates a pandas DataFrame from your data, using the first row as column titles.
In [374]: import pandas as pd
In [375]: d = pd.read_csv("30735293.csv")
In [376]: d
Out[376]:
triggerid timestamp hw0 hw1 hw2 hw3
0 1 234 343 434 78 56
1 2 454 22 90 44 76
You can select one of the columns using a single column name, and multiple columns using a list of names:
In [377]: d[["triggerid", "timestamp", "hw3"]]
Out[377]:
triggerid timestamp hw3
0 1 234 56
1 2 454 76
You can also adjust the indexing so that one or more of the data columns are used as index values:
In [378]: d1 = d.set_index("hw0"); d1
Out[378]:
triggerid timestamp hw1 hw2 hw3
hw0
343 1 234 434 78 56
22 2 454 90 44 76
Using the .loc attribute you can retrieve a series for any indexed row:
In [390]: d1.loc[343]
Out[390]:
triggerid 1
timestamp 234
hw1 434
hw2 78
hw3 56
Name: 343, dtype: int64
You can use the column names to retrieve the individual column values from that one-row series:
In [393]: d1.loc[343]["triggerid"]
Out[393]: 1
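To save the selected columns rather than just display them, `DataFrame.to_csv` writes them to a file. A sketch (the output path is hypothetical, and the data is recreated inline to keep the example self-contained):

```python
import pandas as pd

# Sketch: select the wanted columns and write them to a new CSV file
# ("out.csv" is a hypothetical path; data recreated from the question).
d = pd.DataFrame({"triggerid": [1, 2], "timestamp": [234, 454],
                  "hw0": [343, 22], "hw1": [434, 90],
                  "hw2": [78, 44], "hw3": [56, 76]})
d[["triggerid", "timestamp", "hw3"]].to_csv("out.csv", index=False)
```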
Since you already have a solution for the slices, here's something for the hash-table part of the question:
import csv
with open('/path/to/file.csv', 'rb') as fin:
    ht = {}
    cr = csv.reader(fin)
    k = cr.next()[2]           # name of the third column in the header (hw0)
    ht[k] = list()
    for line in cr:
        ht[k].append(line[2])  # collect that column's values under the key
I used a different approach (using the .index function):
bpt_mode = ["bpt_mode_64", "bpt_mode_128"]
with open('StripValues.csv') as file:
    stats = next(file).strip().split(",")   # header row: the field names
    for line in file:
        stat_values = line.split(",")
        print stat_values[stats.index('trigger_id')], ',',
        for j in range(len(bpt_mode)):
            print stat_values[stats.index('hw.gpu.s0.ss0.dg.' + bpt_mode[j])], ',',
@holdenweb Though I am unable to figure out how to print the output to a file. Currently I am redirecting while running the script. Can you provide a solution for writing to a file? There will be multiple writes to a single file.
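Instead of redirecting stdout, the output can be written with `csv.writer`, which also avoids hand-placing commas. A sketch, assuming the first row holds the field names; the file names are hypothetical, and the sample input is written here only to keep the example self-contained:

```python
import csv

# Sketch: select columns by header name and write them to an output file.
# File names are hypothetical; the sample input below exists only so the
# example runs on its own.
with open("StripValues.csv", "w", newline="") as f:
    f.write("triggerid,timestamp,hw0,hw1,hw2,hw3\n"
            "1,234,343,434,78,56\n"
            "2,454,22,90,44,76\n")

wanted = ["triggerid", "timestamp", "hw3"]
with open("StripValues.csv", newline="") as fin, \
     open("out.csv", "w", newline="") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    header = next(reader)
    idx = [header.index(name) for name in wanted]  # look up each field by name
    writer.writerow(wanted)
    for row in reader:
        writer.writerow([row[i] for i in idx])
```

Because the columns are looked up by name in the header, this still works when the column order changes between runs.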
This is my input dataset:
Ref Col_A0 Col_01 Col_02 Col_aa Col_03 Col_04 Col_bb
NYC 10 0 44 55 66 34 44
CHG 90 55 4 33 22 34 23
TAR 10 8 0 25 65 88 22
I need to calculate the % of Col_A0 for a specific reference.
For example % col_A0 would be calculated as
10/(10+0+44+55+66+34+44)=.0395 i.e. 3.95%
So my output should be
Ref %Col_A0 %Rest
NYC 3.95% 96.05%
CHG 34.48% 65.52%
TAR 4.58% 95.42%
I can do this part, but the issue is the column variables. Col_A0 and Ref are fixed columns, so they will be in the input every time, but the other columns may not be. There can also be additional columns, like Col_10 and Col_11 up to Col_30, and Col_cc up to Col_zz.
For example the input data set in some scenarios can be just:
Ref Col_A0 Col_01 Col_02 Col_aa Col_03
NYC 10 0 44 55 66
CHG 90 55 4 33 22
TAR 10 8 0 25 65
So is there a way I can write SAS code which checks whether a column exists? Or is there a better way to do this?
This is my current SAS code written in Enterprise Guide.
PROC SQL;
    CREATE TABLE output123 AS
    SELECT
        ref,
        (col_A0 / Sum(Col_A0,Col_01,Col_02,Col_aa,Col_03,Col_04,Col_bb)) FORMAT=PERCENT8.2 AS PERCNT_ColA0,
        (1 - (col_A0 / Sum(Col_A0,Col_01,Col_02,Col_aa,Col_03,Col_04,Col_bb))) FORMAT=PERCENT8.2 AS PERCNT_Rest
    FROM Input123;
QUIT;
In scenarios where not all the columns are there, I get an error; and if there are additional columns, then I miss those. Please advise.
Thanks
I would not use SQL here, but a regular data step.
data want;
    set have;
    a0_prop = col_a0 / sum(of _numeric_);
run;
If you wanted to do this in SQL, the easiest way is to keep (or transform) the dataset in vertical format, i.e., each variable as a separate row per ID. Then you don't need to know how many variables there are.
If you always want to sum all the numeric columns, then just do:
col_A0 / sum(of _numeric_)
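For comparison outside SAS, the same "divide by whatever numeric columns happen to be present" idea can be sketched in pandas. This is an illustration only, with the example data hard-coded and the column names taken from the question:

```python
import pandas as pd

# Sketch: divide Col_A0 by the sum of whichever numeric columns are present,
# mirroring sum(of _numeric_); data hard-coded from the example above.
df = pd.DataFrame({"Ref": ["NYC", "CHG", "TAR"],
                   "Col_A0": [10, 90, 10],
                   "Col_01": [0, 55, 8],
                   "Col_02": [44, 4, 0],
                   "Col_aa": [55, 33, 25],
                   "Col_03": [66, 22, 65],
                   "Col_04": [34, 34, 88],
                   "Col_bb": [44, 23, 22]})
row_total = df.select_dtypes("number").sum(axis=1)  # all numeric columns, whatever they are
df["PCT_ColA0"] = df["Col_A0"] / row_total
df["PCT_Rest"] = 1 - df["PCT_ColA0"]
```

Because the numeric columns are discovered at run time, extra columns such as Col_10 or Col_cc would be picked up automatically, just as `_numeric_` does in the data step.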