I'm using ANTLR with the Presto grammar in order to parse SQL queries.
This is the original STRING definition I used to parse queries:
STRING
: '\'' ( '\\' .
| ~[\\'] // match anything other than \ and '
| '\'\'' // match ''
)*
'\''
;
This worked OK for most queries until I came across queries with different escaping rules. For example:
select
table1(replace(replace(some_col,'\\'',''),'\"' ,'')) as features
from table1
So I modified my STRING definition, and now it looks like this:
STRING
: '\'' ( '\\' .
| '\\\\' . {HelperUtils.isNeedSpecialEscaping(this)}? // match \ followed by any char
| ~[\\'] // match anything other than \ and '
| '\'\'' // match ''
)*
'\''
;
However, this doesn't work for the query above: I get
'\\'',''),'
as a single string.
The predicate returns true for this query.
Any idea how I can handle this query as well?
Thanks,
Nir.
In the end I was able to solve it. This is the expression I ended up using:
STRING
: '\'' ( '\\\\' . {HelperUtils.isNeedSpecialEscaping(this)}?
| '\\' (~[\\] | . {!HelperUtils.isNeedSpecialEscaping(this)}?)
| ~[\\'] // match anything other than \ and '
| '\'\'' // match ''
)*
'\''
;
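To see why the special-escaping mode changes where the literal ends, here is a minimal Python sketch that scans a single-quoted literal greedily, roughly following the alternatives of the final rule above. It is only an approximation (ANTLR's lexer simulates all alternatives and keeps the longest match, it does not scan like this); scan_string_literal is a hypothetical helper, and its special_escaping flag stands in for HelperUtils.isNeedSpecialEscaping.

```python
def scan_string_literal(text: str, start: int, special_escaping: bool) -> str:
    """Return the quoted literal that begins at text[start].

    Hypothetical helper: a greedy approximation of the final STRING rule.
    special_escaping plays the role of HelperUtils.isNeedSpecialEscaping.
    """
    if text[start] != "'":
        raise ValueError("literal must start with a single quote")
    n = len(text)
    i = start + 1
    while i < n:
        ch = text[i]
        if special_escaping and text.startswith("\\\\", i) and i + 2 < n:
            # '\\\\' . : two backslashes plus any character (e.g. \\')
            i += 3
        elif ch == "\\" and i + 1 < n and (text[i + 1] != "\\" or not special_escaping):
            # '\\' (~[\\] | . {!special}?) : a backslash escapes the next char
            i += 2
        elif ch == "'":
            if i + 1 < n and text[i + 1] == "'":
                i += 2  # '\'\'' : a doubled quote stays inside the literal
            else:
                return text[start:i + 1]  # closing quote
        elif ch != "\\":
            i += 1  # ~[\\'] : any other character
        else:
            raise ValueError("backslash without a valid escape")
    raise ValueError("unterminated string literal")


# The literals from the problem query, starting at the first quote.
query_part = r"'\\'',''),'\"' ,'')"
print(scan_string_literal(query_part, 0, special_escaping=False))  # '\\'',''),'
print(scan_string_literal(query_part, 0, special_escaping=True))   # '\\''
```

Without the special mode, \\ is read as an escaped backslash and the following '' as a doubled quote, so the literal runs on to '\\'',''),' exactly as in the question; with it, \\' is consumed as one unit and the next quote closes the literal.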
grammar Question;
sql
@init {System.out.println("Question last update 2352");}
: replace+ EOF
;
replace
: REPLACE '(' expr ')'
;
expr
: ( replace | ID ) ',' STRING ',' STRING
;
REPLACE : 'replace' DIGIT? ;
ID : [a-zA-Z0-9_]+ ;
DIGIT : [0-9] ;
STRING : '\'' '\\\\\'' '\'' // '\\''
| '\'' '\'\'' '\'' // ''''
| '\'' ~[\\']* '\'\'' ~[\\']* '\'' // 'it is 8 o''clock'
| '\'' .*? '\'' ;
NL : '\r'? '\n' -> channel(HIDDEN) ;
WS : [ \t]+ -> channel(HIDDEN) ;
File input.txt (without more examples, I can only guess what the input looks like):
replace1(replace(some_col,'\\'',''),'\"' ,'')
replace2(some_col,'''','')
replace3(some_col,'abc\tdef\tghi','xyz')
replace4(some_col,'abc\ndef','xyz')
replace5(some_col,'it is 8 o''clock','8')
Execution:
$ alias a4='java -jar /usr/local/lib/antlr-4.9-complete.jar'
$ alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4
$ javac Question*.java
$ grun Question sql -tokens input.txt
[#0,0:7='replace1',<REPLACE>,1:0]
[#1,8:8='(',<'('>,1:8]
[#2,9:15='replace',<REPLACE>,1:9]
[#3,16:16='(',<'('>,1:16]
[#4,17:24='some_col',<ID>,1:17]
[#5,25:25=',',<','>,1:25]
[#6,26:30=''\\''',<STRING>,1:26]
[#7,31:31=',',<','>,1:31]
[#8,32:33='''',<STRING>,1:32]
[#9,34:34=')',<')'>,1:34]
[#10,35:35=',',<','>,1:35]
[#11,36:39=''\"'',<STRING>,1:36]
[#12,40:40=' ',<WS>,channel=1,1:40]
[#13,41:41=',',<','>,1:41]
[#14,42:43='''',<STRING>,1:42]
[#15,44:44=')',<')'>,1:44]
[#16,45:45='\n',<NL>,channel=1,1:45]
[#17,46:53='replace2',<REPLACE>,2:0]
[#18,54:54='(',<'('>,2:8]
[#19,55:62='some_col',<ID>,2:9]
[#20,63:63=',',<','>,2:17]
[#21,64:67='''''',<STRING>,2:18]
[#22,68:68=',',<','>,2:22]
[#23,69:70='''',<STRING>,2:23]
[#24,71:71=')',<')'>,2:25]
[#25,72:72='\n',<NL>,channel=1,2:26]
[#26,73:80='replace3',<REPLACE>,3:0]
[#27,81:81='(',<'('>,3:8]
[#28,82:89='some_col',<ID>,3:9]
[#29,90:90=',',<','>,3:17]
[#30,91:105=''abc\tdef\tghi'',<STRING>,3:18]
[#31,106:106=',',<','>,3:33]
[#32,107:111=''xyz'',<STRING>,3:34]
[#33,112:112=')',<')'>,3:39]
[#34,113:113='\n',<NL>,channel=1,3:40]
[#35,114:121='replace4',<REPLACE>,4:0]
[#36,122:122='(',<'('>,4:8]
[#37,123:130='some_col',<ID>,4:9]
[#38,131:131=',',<','>,4:17]
[#39,132:141=''abc\ndef'',<STRING>,4:18]
[#40,142:142=',',<','>,4:28]
[#41,143:147=''xyz'',<STRING>,4:29]
[#42,148:148=')',<')'>,4:34]
[#43,149:149='\n',<NL>,channel=1,4:35]
[#44,150:157='replace5',<REPLACE>,5:0]
[#45,158:158='(',<'('>,5:8]
[#46,159:166='some_col',<ID>,5:9]
[#47,167:167=',',<','>,5:17]
[#48,168:185=''it is 8 o''clock'',<STRING>,5:18]
[#49,186:186=',',<','>,5:36]
[#50,187:189=''8'',<STRING>,5:37]
[#51,190:190=')',<')'>,5:40]
[#52,191:191='\n',<NL>,channel=1,5:41]
[#53,192:191='<EOF>',<EOF>,6:0]
Question last update 2352
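The token lengths in that dump appear to follow from longest match: for each literal, the token is the longest text any STRING alternative can match. As a rough Python approximation (plain regexes, not ANTLR's ATN simulation), the sketch below translates the four alternatives and keeps the longest match; STRING_ALTERNATIVES and longest_string_token are illustrative names, not part of ANTLR.

```python
import re
from typing import Optional

# Regex translations of the four STRING alternatives in Question.g4.
STRING_ALTERNATIVES = [
    re.compile(r"'\\\\''"),             # '\\''
    re.compile(r"''''"),                # ''''
    re.compile(r"'[^\\']*''[^\\']*'"),  # 'it is 8 o''clock'
    re.compile(r"'.*?'", re.DOTALL),    # '...' (non-greedy)
]


def longest_string_token(text: str, pos: int) -> Optional[str]:
    """Try every alternative at pos and keep the longest match."""
    best = None
    for pattern in STRING_ALTERNATIVES:
        m = pattern.match(text, pos)
        if m and (best is None or len(m.group()) > len(best)):
            best = m.group()
    return best


print(longest_string_token(r"'\\'',''),'", 0))              # '\\''
print(longest_string_token("'it is 8 o''clock','8')", 0))   # 'it is 8 o''clock'
```

For '\\'', the first alternative (5 characters) beats the non-greedy one ('\\', 4 characters), which matches token #6 above; for 'it is 8 o''clock', the third alternative wins over the shorter 'it is 8 o'.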
I have this list that contains blank elements:
list = ['Caramanico Terme', ' ', 'Castellafiume', ' ', 'Castelvecchio Subequo', ' ', 'Falesia di ovindoli', ' ', 'Fara San Martino', ' ', "L'Aquila - Madonna d'Appari", ' ', 'La Palma Pazza (Bisegna AQ)', ' ', 'Liscia Palazzo', ' ', 'Luco dei marsi', ' ', 'Montebello di Bertona', ' ', 'Monticchio', ' ', 'Palena', ' ', 'Pennadomo', ' ', 'Pennapiedimonte', ' ', 'Pescomarrino', ' ', 'Petrella', ' ', 'Pianezza', ' ', 'Pietrasecca', ' ', ' ', 'PietrePiane', ' ', 'Pizzi di Lettopalena (loc. Fonte della Noce)', ' ', 'Placche di Bini', ' ', 'Roccamorice', ' ', 'Sasso di Lucoli', ' ', 'Villetta Barrea', ' ']
How can I remove these ' ' elements?
I tried this:
[x for x in list if all(x)]
but the elements are not deleted.
Any help?
Thanks
First of all, make sure not to call your list list: that name shadows the built-in type and will cause problems later. I renamed it to lst. Then you can filter the list like this:
lst = ['Caramanico Terme', ' ', 'Castellafiume', ' ', 'Castelvecchio Subequo', ' ', 'Falesia di ovindoli', ' ', 'Fara San Martino', ' ', "L'Aquila - Madonna d'Appari", ' ', 'La Palma Pazza (Bisegna AQ)', ' ', 'Liscia Palazzo', ' ', 'Luco dei marsi', ' ', 'Montebello di Bertona', ' ', 'Monticchio', ' ', 'Palena', ' ', 'Pennadomo', ' ', 'Pennapiedimonte', ' ', 'Pescomarrino', ' ', 'Petrella', ' ', 'Pianezza', ' ', 'Pietrasecca', ' ', ' ', 'PietrePiane', ' ', 'Pizzi di Lettopalena (loc. Fonte della Noce)', ' ', 'Placche di Bini', ' ', 'Roccamorice', ' ', 'Sasso di Lucoli', ' ', 'Villetta Barrea', ' ']
filtered = [x for x in lst if len(x.strip()) > 0]
This removes every whitespace-only element, such as ' ' or '   '.
EDIT:
As corn3lius pointed out, this would work too:
filtered = [x for x in lst if x.strip()]
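As for why the attempt in the question keeps everything: all(x) iterates over the characters of the string x and returns True unless one of them is falsy. A single space is a non-empty, truthy string, so all(' ') is True (and all('') is True as well, since there is nothing to check). A small sketch contrasting that with the strip() test; drop_blank is just an illustrative name and places is a shortened copy of the list.

```python
def drop_blank(items):
    """Keep only strings that contain at least one non-whitespace character."""
    return [x for x in items if x.strip()]


places = ['Caramanico Terme', ' ', 'Castellafiume', ' ', 'Pietrasecca', ' ', ' ', 'PietrePiane', ' ']

# all(x) checks the characters of x; ' ' is truthy, so nothing is filtered.
print(all(' '))            # True
print(drop_blank(places))  # ['Caramanico Terme', 'Castellafiume', 'Pietrasecca', 'PietrePiane']
```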
You can add a condition to a list comprehension:
l = ['Caramanico Terme', ' ', 'Castellafiume', ' ', 'Castelvecchio Subequo', ' ', 'Falesia di ovindoli', ' ', 'Fara San Martino', ' ', "L'Aquila - Madonna d'Appari", ' ', 'La Palma Pazza (Bisegna AQ)', ' ', 'Liscia Palazzo', ' ', 'Luco dei marsi', ' ', 'Montebello di Bertona', ' ', 'Monticchio', ' ', 'Palena', ' ', 'Pennadomo', ' ', 'Pennapiedimonte', ' ', 'Pescomarrino', ' ', 'Petrella', ' ', 'Pianezza', ' ', 'Pietrasecca', ' ', ' ', 'PietrePiane', ' ', 'Pizzi di Lettopalena (loc. Fonte della Noce)', ' ', 'Placche di Bini', ' ', 'Roccamorice', ' ', 'Sasso di Lucoli', ' ', 'Villetta Barrea', ' ']
print([x for x in l if x != ' '])
Removing every empty string '' is the same as building a new list that contains only the elements of the original list with length > 0. This one-liner takes care of that:
a = ['', 'apple', '', 'peach']
b = [i for i in a if i != '']
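A swapped-in alternative for exactly this case is the built-in filter with None as the function, which keeps only truthy items. It behaves like i != '' rather than like strip(): whitespace-only strings such as ' ' are truthy and survive.

```python
a = ['', 'apple', '', 'peach']

# filter(None, ...) drops falsy items; '' is falsy, ' ' is not.
b = list(filter(None, a))
print(b)  # ['apple', 'peach']
```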
To remove empty items from a list when the "empty" items may contain a single space or several spaces between the quotes, use strip() inside the list comprehension.
Example:
temp_str = ' || 0X0C || 0X00000 || 0X00094 || 0X00E8C || IN_OPER || 000000e8cff7e000 || '
temp_str.split('||')
# result: [' ', ' 0X0C ', ' 0X00000 ', ' 0X00094 ', ' 0X00E8C ', ' IN_OPER ', ' 000000e8cff7e000 ', ' ']
temp_list = [ x for x in temp_str.split('||') if x]
temp_list
# result: [' ', ' 0X0C ', ' 0X00000 ', ' 0X00094 ', ' 0X00E8C ', ' IN_OPER ', ' 000000e8cff7e000 ', ' ']
temp_list = [ x for x in temp_str.split('||') if x.strip()]
temp_list
# result: [' 0X0C ', ' 0X00000 ', ' 0X00094 ', ' 0X00E8C ', ' IN_OPER ', ' 000000e8cff7e000 ']
temp_list = [ x.strip() for x in temp_str.split('||') if x.strip()]
temp_list
# result: ['0X0C', '0X00000', '0X00094', '0X00E8C', 'IN_OPER', '000000e8cff7e000']
This is my code; all I did to the existing working code was add the orderBy:
$queryBuilder->select('pa1')
->from('\SeeThroughWeb\Shop\Domain\Model\ProductArticle', 'pa1')
->join('pa1.productPrices', 'pp1')
->join('pa1.product', 'p')
->where('pp1.salePrice IN (' . $subQueryBuilder . ') AND pa1.status = ' . \SeeThroughWeb\Shop\Domain\Model\ProductArticle::STATUS_ACTIVE . ' AND (pa1.stock > 0 OR pa1.displayOutOfStock = 1) AND p.status = ' . \SeeThroughWeb\Shop\Domain\Model\Product::STATUS_ACTIVE . ' AND p.isFeatured = 1 AND p.deleted = 0')
->groupBy('p')
->orderBy('p.isgiftcard', 'ASC');
$result = $query->execute();
It doesn't seem to work; it gives me this exception:
MetaDataController.php line 176
What am I doing wrong?
This is what I tried using Split:
string[] req_info_texts = Regex.Matches(model_file_string_qts_corrected,
"RequirementInfo.*\"")
.OfType<Match>()
.Select(m=> m.Groups[0].Value.Split('\'').ToString())
.ToArray();
The lines matching RequirementInfo.*\" in the string model_file_string_qts_corrected look like this:
RequirementInfo "{'1' 2' 3'4 '5' 6'7' 8'syed_syed' 'SRDD_PFC_047602' } %GIDa_033022bd_8058_4216_8b9d_71454ba5f896"
There are n lines like the one above in the string.
I need syed_syed in the array req_info_texts.
But what I get is an index out of range exception.
Can you tell me what the mistake is?
string[] req_info_texts = Regex.Matches(input, "RequirementInfo.*\"")
.Cast<Match>()
.Select(m => m.Value
.Split('\'')
.Where(x => x.Contains("syed_syed"))
.Single()
).ToArray();
Given your input string is
RequirementInfo "{'other' ' ' '' 'true' 'syed_syed_GRP001' 'klajdskfjadklsjfklsa' } %GIDa_ed66dae7_2d68_4d07_9c67_a1cf1cb614cc" RequirementInfo "{'other' ' ' '' 'true' 'syed_syed_GRP001' 'klajdskfjadklsjfklsa' } %GIDa_b9a766f9_2b2b_4ca8_98f4_f693055b4792" RequirementInfo "{'other' ' ' '' 'true' 'syed_syed_GRP004' 'klajdskfjadklsjfklsa' } %GIDa_271d5326_cb57_4d87_8cd9_66687c0a1d32" RequirementInfo "{'other' ' ' '' 'true' 'syed_syed_GRP03' 'klajdskfjadklsjfklsa' } %GIDa_07ed6119_91d2_41f9_94dc_69d518503d64"
separated by newlines (as you said in a comment on another question), you just need two splits:
var infosString = "RequirementInfo \"{'other' ' ' '' 'true' 'syed_syed_GRP001' 'klajdskfjadklsjfklsa' } %GIDa_ed66dae7_2d68_4d07_9c67_a1cf1cb614cc\"\nRequirementInfo \"{'other' ' ' '' 'true' 'syed_syed_GRP001' 'klajdskfjadklsjfklsa' } %GIDa_b9a766f9_2b2b_4ca8_98f4_f693055b4792\"\n RequirementInfo \"{'other' ' ' '' 'true' 'syed_syed_GRP004' 'klajdskfjadklsjfklsa' } %GIDa_271d5326_cb57_4d87_8cd9_66687c0a1d32\"\n RequirementInfo \"{'other' ' ' '' 'true' 'syed_syed_GRP03' 'klajdskfjadklsjfklsa' } %GIDa_07ed6119_91d2_41f9_94dc_69d518503d64";
var result = infosString.Split('\n').Select(line => line.Split('\'')[9]).ToArray();
result is now ["syed_syed_GRP001", "syed_syed_GRP001", "syed_syed_GRP004", "syed_syed_GRP03"].
The first Split creates an array of the lines starting with RequirementInfo, and the Select splits each line again on single quotes and takes the 10th item (index 9, the one starting with syed_syed).
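For comparison, here is the same two-split idea as a Python sketch, plus a variant that finds the field by its prefix instead of by position, which should be less fragile if the number of quoted fields before syed_syed varies from line to line. Both function names are hypothetical, and INFO holds two of the lines quoted above.

```python
import re
from typing import List

INFO = (
    "RequirementInfo \"{'other' ' ' '' 'true' 'syed_syed_GRP001' 'klajdskfjadklsjfklsa' } "
    "%GIDa_ed66dae7_2d68_4d07_9c67_a1cf1cb614cc\"\n"
    " RequirementInfo \"{'other' ' ' '' 'true' 'syed_syed_GRP004' 'klajdskfjadklsjfklsa' } "
    "%GIDa_271d5326_cb57_4d87_8cd9_66687c0a1d32\""
)


def extract_by_index(text: str, index: int = 9) -> List[str]:
    """Two splits: lines first, then single quotes; take the field at index."""
    return [line.split("'")[index] for line in text.splitlines() if line.strip()]


def extract_by_prefix(text: str, prefix: str = "syed_syed") -> List[str]:
    """Find every quoted field that starts with prefix, wherever it sits."""
    return re.findall("'(" + re.escape(prefix) + "[^']*)'", text)


print(extract_by_index(INFO))   # ['syed_syed_GRP001', 'syed_syed_GRP004']
print(extract_by_prefix(INFO))  # ['syed_syed_GRP001', 'syed_syed_GRP004']
```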
My CSV data file looks like this:
title,name,gender
MRS.,MADHU,Female
MRS.,RAJ KUMAR,male
MR.,N,Male
MRS.,SHASHI,Female
MRS.,ALKA,Female
As you can see, I want to filter out rows like lines 2 and 3 (i.e. the name must contain no whitespace and be at least 3 characters long):
MRS.,RAJ KUMAR,male
MR.,N,Male
and put them in a file called rejected_list.csv, while all the rest go in a file called clean_list.csv.
Here is my gawk script for it:
gawk -F ',' '{
if( $2 ~ /\S/ &&
$1 ~ /MRS.|MR.|MS.|MISS.|MASTER.|SMT.|DR.|BABY.|PROF./ &&
$3 ~ /M|F|Male|Female/)
print $1","$2","$3 > "clean_list.csv";
else
print $1","$2","$3 > "rejected_list.csv" } ' \
< DATA_file.csv
My problem is that this script does not recognise the \S character class (every character except whitespace): it selects all words that start with or contain an S and rejects the rest.
A simple regex like /([A-Z])/ in place of /\S/ works perfectly, but as soon as I add a limit of {3,}, the script fails.
gawk -F ',' '{
if( $2 ~ /([A-Z]){3,}/ &&
$1 ~ /MRS.|MR.|MS.|MISS.|MASTER.|SMT.|DR.|BABY.|PROF./ &&
$3 ~ /M|F|Male|Female/)
print $1","$2","$3 > "clean_list.csv";
else
print $1","$2","$3 > "rejected_list.csv" } ' \
< DATA_file.csv
I have tried all sorts of combinations of the regex with '*', '+', etc., but I can't get what I want.
Can anyone tell me what the problem is?
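Use [[:graph:]] (the [:graph:] class inside a bracket expression) instead of \S to match printable, visible characters. Older versions of gawk (before 4.0) do not recognise \S, so it will not work there.
Additionally, in those versions the {3,} interval expression only works when gawk is run with --posix or --re-interval.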
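Even where {3,} is supported, an unanchored match can accept names it should not: a pattern like [A-Z]{3,} succeeds as soon as any three capitals appear in a row, so "RAJ KUMAR" passes because of "RAJ". That is presumably why the script below tests for whitespace and length separately. The same effect, shown with Python's re module (which does understand \S); re.fullmatch anchors the pattern to the whole field:

```python
import re

names = ["MADHU", "RAJ KUMAR", "N", "SHASHI", "ALKA"]

# Unanchored: any run of 3+ capitals anywhere in the field is enough.
unanchored = [n for n in names if re.search(r"[A-Z]{3,}", n)]

# Anchored to the whole field: 3+ characters, none of them whitespace.
anchored = [n for n in names if re.fullmatch(r"\S{3,}", n)]

print(unanchored)  # ['MADHU', 'RAJ KUMAR', 'SHASHI', 'ALKA']
print(anchored)    # ['MADHU', 'SHASHI', 'ALKA']
```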
I also added a rejection condition for rows that do not have exactly 3 fields:
gawk -F, '
BEGIN {
titles = "MRS.|MR.|MS.|MISS.|MASTER.|SMT.|DR.|BABY.|PROF."
genders = "M|F|Male|Female"
}
$1 !~ titles || $2 ~ /[[:space:]]/ || length($2) < 3 || $3 !~ genders || NF != 3 {
print > "rejected_list.csv"
next
}
{ print > "clean_list.csv" }
' < DATA_file.csv
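For comparison, here is a Python sketch of the same clean/rejected split using the csv module. It applies the same four rules as the gawk script, but with exact set membership instead of unanchored regexes (an unanchored MR. also matches strings like MRX, for instance). It assumes the first line is a header to be copied into both output files; that choice and the function names are illustrative.

```python
import csv
from typing import Iterable, List, Tuple

TITLES = {"MRS.", "MR.", "MS.", "MISS.", "MASTER.", "SMT.", "DR.", "BABY.", "PROF."}
GENDERS = {"M", "F", "Male", "Female"}


def is_clean(row: List[str]) -> bool:
    """Exactly 3 fields, a known title, a name of 3+ characters with no
    whitespace, and a known gender."""
    if len(row) != 3:
        return False
    title, name, gender = row
    return (
        title in TITLES
        and len(name) >= 3
        and not any(ch.isspace() for ch in name)
        and gender in GENDERS
    )


def split_rows(rows: Iterable[List[str]]) -> Tuple[List[List[str]], List[List[str]]]:
    """Partition rows into (clean, rejected)."""
    clean, rejected = [], []
    for row in rows:
        (clean if is_clean(row) else rejected).append(row)
    return clean, rejected


def split_file(src: str = "DATA_file.csv",
               clean_path: str = "clean_list.csv",
               rejected_path: str = "rejected_list.csv") -> None:
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)  # assumption: the first line is a header
        clean, rejected = split_rows(reader)
    for path, rows in ((clean_path, clean), (rejected_path, rejected)):
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)  # assumption: keep the header in both files
            writer.writerows(rows)
```

Calling split_file() in the directory that holds DATA_file.csv writes both output files; with the sample data, MADHU, SHASHI and ALKA should end up in clean_list.csv and RAJ KUMAR and N in rejected_list.csv.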