Change FS definition past the header of a CSV file - regex

I have a CSV file like this:
name,x-extension,value,extra
"Roger","9890","",""
"Nicole","9811","president, ceo",""
...
Now I want to find the maximum length of each field in the file, so I wrote this awk script (updated):
NR==1 {
    gsub(/\r/,"",$0) # remove carriage return from last field name
    for (n = 1; n <= NF; n++) {
        colname[n]=$n;
        maxlen[n]=-1;
    }
    nbrField = NF; # NF will be bumped by 2 under the new FS
    FS="\",\"|^\"|\"$";
}
NR>1 {
    for (n = 2; n <= nbrField+1; n++) {
        if (length($n)>maxlen[n-1]) {
            maxlen[n-1]=length($n);
        }
    }
}
END {
    for(i = 1; i <= nbrField; i++) {
        printf "%s : %s\n", colname[i], maxlen[i]
    }
}
The problem is that I need to change the field separator AFTER reading the first line because, as you can see, the header doesn't use double quotes around its fields, and some fields contain a comma.
I tried playing with the -F option on my awk command line, but I can't find the right regex to do the trick:
> awk -F'", "|^"|"$' -f myprog mydata ==> (doesn't work)
Help! :-)

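Change FS in the block that processes the first line. In POSIX awk, an assignment to FS takes effect from the next record, so the header has already been split with the original separator:
NR==1 {
    for(n = 1; n <= NF; n++) {
        colname[n]=$n
    }
    FS="\",\"|^\"|\"$"
}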

I prefer to use a real CSV parser for CSV data. For example, Perl:
perl -MText::CSV -MList::Util=max -nE '
    BEGIN {$csv = Text::CSV->new({binary=>1})}
    $csv->parse($_);
    @row = $csv->fields();
    if ($. == 1) {
        @h = @row; $n = $#row;
    }
    else {
        $max{$h[$_]} = max($max{$h[$_]}, length $row[$_]) for (0..$n)
    }
    END {
        while (($k,$v) = each %max) {say join ":", $k, $v}
    }
' << DATA
name,x-extension,value,extra
"Roger","9890","",""
"Nicole","9811","president, ceo",""
DATA
value:14
name:6
extra:0
x-extension:4
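
The same "use a real CSV parser" idea can be sketched with Python's standard csv module. This is only an illustration on the sample rows from the question, not a replacement for the Perl one-liner; the variable names (data, maxlen) are made up for the example.

```python
import csv
import io

# Sample data copied from the question: unquoted header, quoted rows.
data = '''name,x-extension,value,extra
"Roger","9890","",""
"Nicole","9811","president, ceo",""
'''

reader = csv.reader(io.StringIO(data))
header = next(reader)                 # first row holds the column names
maxlen = {name: 0 for name in header}

for row in reader:
    # The parser handles quotes and embedded commas, so no FS tricks are needed.
    for name, field in zip(header, row):
        maxlen[name] = max(maxlen[name], len(field))

for name in header:
    print(f"{name}:{maxlen[name]}")
```

Unlike the Perl hash, the dictionary here keeps the header order when printing.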

Replace non-blank cells with 0

I looked for solutions online but couldn't find one that fits my need: I want to create a script that replaces the non-blank cells in a given column with 0.
Is there a simple solution for this?
Thanks.
Try:
function nonBlankTo0() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var s = ss.getActiveSheet();
  var rng = s.getRange("A:A"); // change to the column you want
  var data = rng.getValues();
  for (var i = 0; i < data.length; i++) {
    // overwrite only cells that contain something; blank cells stay blank
    if (data[i][0] !== "") {
      data[i][0] = 0;
    }
  }
  rng.setValues(data); // replace old data with new
}
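
The per-cell rule the question asks for can be checked outside Apps Script. The following is a minimal Python sketch that treats the column as a list of one-element rows, which is the shape getValues() returns; the function name non_blank_to_zero is hypothetical and exists only for this illustration.

```python
def non_blank_to_zero(values):
    """Return a copy of a 2D list where every non-blank cell becomes 0."""
    # A blank cell is represented by the empty string, as in the sheet data.
    return [[cell if cell == "" else 0 for cell in row] for row in values]

column = [["apple"], [""], [42], [""]]
print(non_blank_to_zero(column))
```

Blank cells are left untouched, so the result can be written back over the same range.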

Parsing custom structured file with PyParsing

I need help parsing a custom structured file. The structure is shown below. The issue is that I can't seem to parse it correctly: for example, myOriginalFormula and myBonusType end up in the same group when I want them separated.
AttributeDictionary SomeDictName
{
myAttributeDefinitionCategories
{
AttributeDefinitionList SomeList
{
AttributeDefinition SomeDefinitioName < uid=8972789HHDUAI7 >
{
myOriginalFormula "(A_Variable) * 10"
myBonusType FlatValue
}
AttributeDefinition UIBlankAttribute < uid=JIAIODJ7899 >
{
}
AttributeDefinition SomeOtherDefinitionName < uid=17837HHJAJ7788 >
{
myOriginalFormula 1
mySpecializations
{
Specialization "Some_Specialization 1"
{
myCondition "Something.MustEqual = 1"
myFormula 0
}
Specialization "SomeSpecialization 2"
{
myCondition "Something.MustEqual = 2"
myFormula 0.026
}
}
myBonusType FlatValue
}
AttributeDefinition SomeReal_Other_definition < uid=6768GYAG//() >
{
myOriginalFormula "(Some_Formula / Other_Variable) - 1"
myBonusType Percentage
myUINumDecimals 1
myHasAddSignUI FALSE
}
}
}
}
My attempt is below. Could someone help me parse this structure correctly?
from pyparsing import (CharsNotIn, Forward, Group, Keyword, Literal,
                       Optional, White, Word, ZeroOrMore, alphanums)

def syntaxParser():
    # constants
    left_bracket = Literal("{").suppress()
    right_bracket = Literal("}").suppress()
    semicolon = Literal(";").suppress()
    space = White().suppress()
    key = Word(alphanums + '_/"')
    value = CharsNotIn("{};,")
    # rules
    assignment = Group(key + Optional(space + value))
    block = Forward()
    specialblock = Forward()
    subblock = ZeroOrMore(Group(assignment) | specialblock)
    specialblock = (
        Keyword("Specialization")
        + key
        + left_bracket
        + subblock
        + right_bracket)
    block << Group(
        Optional(Group(key + Optional(space + value)))
        + left_bracket
        + Group(ZeroOrMore(assignment | Group(block) | Group(specialblock)))
        + right_bracket)
    script = block
    return script
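Your definition of value is too greedy. It does not exclude newlines, so it keeps consuming until the next brace, and the myBonusType line ends up inside the value of myOriginalFormula:
value = CharsNotIn("{};,")
I don't know how well this fits the rest of the format you are parsing, but I get better results with this:
value = quotedString | CharsNotIn("{};,\n")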
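
The effect of leaving the newline out of the excluded set can be reproduced with the standard re module. This is only a rough regex approximation of the two CharsNotIn definitions, not pyparsing itself, and the sample string is a shortened copy of one block from the question.

```python
import re

sample = 'myOriginalFormula "(A_Variable) * 10"\nmyBonusType FlatValue\n}'

# Approximation of CharsNotIn("{};,"): newlines are allowed, so it runs on.
greedy_value = re.compile(r'[^{};,]+')
# Approximation of quotedString | CharsNotIn("{};,\n"): stops at end of line.
line_value = re.compile(r'"[^"]*"|[^{};,\n]+')

start = len("myOriginalFormula ")          # position just after the key
greedy = greedy_value.match(sample, start).group()
bounded = line_value.match(sample, start).group()

print(repr(greedy))    # includes the myBonusType line
print(repr(bounded))   # only the quoted formula
```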

Exporting CSS Rules with awk

I'm trying to export a css rule from a css file with awk but I am not able to do it. I need only the rules containing the "background-image" line.
#rule{
...
background-image: url(path);
}
Here's what I have tried so far:
awk '/^[#].*{.*background-image.*/','/}/' css/file.css
What am I doing wrong?
So far, I get the best result using:
/^[#A-Za-z.]/ { accum = 1; }
accum == 1 { css = css $0 "\n"; }
accum == 1 && /background-image/ { found = 1; }
/\}/ { accum = 0; if (found == 1) print css; found = 0; css = ""; }
and it allows me to get a full block with all the selectors
Turn accumulation on after matching an open brace, accumulate every line while the flag is on, and turn it off once the closing brace is seen. Print only if background-image was found during accumulation. If you want to include lines before the match, you can do something like this:
{ line4 = line3; line3 = line2; line2 = line1; line1 = $0 "\n"; }
/\{/ { accum = 1; head = line4 line3 line2 line1; }
accum == 1 { css = css $0 "\n"; }
accum == 1 && /background-image/ { found = 1; }
/\}/ {
    accum = 0;
    if (found == 1) print head css;
    found = 0; css = "";
}
You had said in comments "I need the full block from # (or . ) to }" but I'm getting the impression that you really just want this.
/\{/ { selector = $0 }
/background-image/ { print selector "\n" $0 "\n}\n" }
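
For comparison, the "keep the whole block only if it mentions the property" logic can be sketched in Python. This is a simplified illustration that assumes flat rules: no nested braces such as @media blocks, and no braces inside strings or comments. The function name rules_with_property is hypothetical.

```python
import re

def rules_with_property(css, prop="background-image"):
    """Return each flat `selector { ... }` block whose body mentions prop."""
    # One match per rule: everything up to "{", then the body up to "}".
    blocks = re.findall(r'[^{}]+\{[^{}]*\}', css)
    # Look for the property only in the body, not in the selector.
    return [b.strip() for b in blocks if prop in b[b.index("{"):]]

css = """
#rule {
    color: red;
    background-image: url(path);
}
.other { color: blue; }
"""

for block in rules_with_property(css):
    print(block)
```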

Parsing digits from command line argv

I want to change a Perl script that runs a loop a number of times so that the number of iterations is passed as a command-line option. The program already accepts some options and I need it to accept a new parameter, but this is the first time I have seen a Perl script, so I don't know how to change it.
The start of program (to parse command line options) is:
if ($#ARGV >= 1) {
    for ($i = 1; $i <= $#ARGV; $i++) {
        if ($ARGV[$i] =~ /^\-/) {
            if ($ARGV[$i] =~ /\-test/) {
                # do something
            }
        } else {
            # do something other
        }
    }
}
I think I need something like:
if ($ARGV[$i] =~ /^\-L40/)
But that only matches 40; I don't know how to parse the number attached to the -L parameter so I can use it as the loop limit.
Thanks in advance, and sorry if a similar question already exists; I couldn't find one.
use Getopt::Long qw( GetOptions );
sub usage {
    print(STDERR "usage: prog [--test] [-L NUM]\n");
    exit(1);
}
GetOptions(
    'test' => \my $opt_test,
    'L=i' => \my $opt_L,
)
    or usage();
die("-L must be followed by a positive integer\n")
    if defined($opt_L) && $opt_L < 1;
Something like:
my $loopLimit = 1; # default
if ($#ARGV >= 1)
{
    for ($i = 1; $i <= $#ARGV; $i++)
    {
        if ($ARGV[$i] =~ /^\-/)
        {
            if ($ARGV[$i] =~ /\-test/)
            {
                # do something
            }
            elsif ($ARGV[$i] =~ /\-L(\d+)/) # -L followed by digits
            {
                $loopLimit = $1;
            }
        }
        else
        {
            # do something other
        }
    }
}
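
The capture-group idea (match -L, then capture the digits that follow) looks like this as a small Python sketch. The function name parse_loop_limit and the default of 1 are assumptions for the illustration; unlike the unanchored Perl pattern, this version requires the whole argument to be -L followed by digits.

```python
import re

def parse_loop_limit(args, default=1):
    """Return the number attached to the first -L<digits> argument."""
    for arg in args:
        # The parentheses capture just the digits after -L.
        m = re.fullmatch(r'-L(\d+)', arg)
        if m:
            return int(m.group(1))
    return default

print(parse_loop_limit(["-test", "-L40"]))
print(parse_loop_limit(["-test"]))
```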

Groovy template - code generation

I am asking for advice and opinions on the code to use with Groovy templates.
All the template examples on the web use very limited logic, but I cannot get past that barrier: the code in my template is substantial.
Is this acceptable? What could be a better way to do this?
Thanks
Peter
The task is to generate Tcl code, specifically if/elseif/else type constraints:
if { [streq $pm_enrichment('a1') "'aaaa2'"] && [strlen $pm_enrichment('aaaa3')] &&\
[strlen $pm_enrichment('aaaa4') ] } {
set pm_enrichment('ResultAAA') 0
}
elseif { [streq $pm_enrichment('b1') "'bb2'"] && [strlen $pm_enrichment('bbb3')] &&\
[strlen $pm_enrichment('bbbb4') ] } {
set pm_enrichment('ResultBBB') 1
}
else { [streq $pm_enrichment('c1') "'cc2'"] && [strlen $pm_enrichment('ccc3')] &&\
[strlen $pm_enrichment('cccc4') ] } {
set pm_enrichment('ResultCCC') 2G
}
//////////////////////////////////////
def dataCasesClosure= {->
    pos=0
    arrSorted = []
    mapStmt.each{arrSorted.add(it.key) }
    arrSorted = arrSorted.sort()
    outStr=''''''
    arrSorted.each { i ->
        tmpStatement = statement
        tmpResultStmt = resultStmt
        list=mapStmt[i]
        resultList=mapResultStmt[i]
        pos=0
        int index = tmpStatement.indexOf(keyword);
        while (index >=0){
            val = list[pos].replaceAll(~'"','')
            pos +=1
            tmpStatement=tmpStatement.replaceFirst( ~/#/,/${val}/)
            index = tmpStatement.indexOf(keyword, index+keyword.length()) ;
        }
        if (tmpStatement ==~/\W+$/) {
            tmpStatement=tmpStatement[0..-2]
        }
        pos=0
        index = tmpResultStmt.indexOf(keyword);
        while (index >=0){
            val = resultList[pos]
            pos +=1
            tmpResultStmt=tmpResultStmt.replaceFirst( ~/#/,/${val}/)
            index = tmpResultStmt.indexOf(keyword, index+keyword.length()) ;
        }
        if (i==0) {
            outStr= "if {${tmpStatement} } { \n\t\t ${tmpResultStmt} \n }"
        } else if (i < arrSorted.size()-1 ){
            outStr += "elseif {${tmpStatement} } { \n\t\t ${tmpResultStmt} \n }"
        } else {
            outStr += "else {${tmpStatement} } { \n\t\t ${tmpResultStmt} \n }"
        }
    }
    outStr
} // ### dataCasesClosure
def valuesIfThenStmt= [
"statement":dataCasesClosure
]
tplIfThenStmt = '''
##############################
${statement}
'''
def engine = new SimpleTemplateEngine()
templateResult = engine.createTemplate(tplIfThenStmt).make(valuesIfThenStmt)
println templateResult.toString()
If this is all you have to generate, the template is overkill. You could just call dataCasesClosure directly to get its output.
Assuming it is part of a larger template, I think it is very reasonable to use closures to produce the output for particularly complex parts, just as you have done. I have personally done this on an extreme scale with good results.
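
To illustrate the first point, here is a minimal sketch of producing such a chain from a plain function with no template engine involved. It is written in Python as a stand-in for the Groovy closure; the function names and the shape of the cases list are assumptions made for the example, and the output only mirrors the layout shown in the question.

```python
def tcl_condition(checks):
    """Join (name, expected) pairs; expected=None means a plain strlen test."""
    parts = []
    for name, expected in checks:
        if expected is None:
            parts.append(f"[strlen $pm_enrichment('{name}')]")
        else:
            parts.append(f"[streq $pm_enrichment('{name}') \"'{expected}'\"]")
    return " && ".join(parts)

def tcl_chain(cases):
    """cases: list of (checks, result_name, result_value) tuples."""
    out = []
    for i, (checks, name, value) in enumerate(cases):
        keyword = "if" if i == 0 else "elseif"   # first case opens the chain
        out.append(
            f"{keyword} {{ {tcl_condition(checks)} }} {{\n"
            f"    set pm_enrichment('{name}') {value}\n"
            f"}}"
        )
    return "\n".join(out)

cases = [
    ([("a1", "aaaa2"), ("aaaa3", None)], "ResultAAA", 0),
    ([("b1", "bb2"), ("bbb3", None)], "ResultBBB", 1),
]
print(tcl_chain(cases))
```

Keeping the string building in ordinary functions like these leaves a larger template free to contain nothing more than a placeholder for the result.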