Lex: Avoid a quoted expression


I'm trying to write a lex program that matches anything except a quoted sequence of characters, something like not(":="). This is what I've written so far, but it doesn't produce the desired output:
/* Definitions */
assgn ":="
symbol [^{assgn}]
%%
.
":=" {printf("Found - %s\n",yytext);}
{symbol} {printf("Error: Unmatched symbol \'%s\'\n", yytext);}
%
Input:
:= & | #
Output:
Found - :=
Error: Unmatched symbol ':'
Error: Unmatched symbol '='
Error: Unmatched symbol '&'
Error: Unmatched symbol '|'
Error: Unmatched symbol '#'
Desired output:
Found - :=
Error: Unmatched symbol '&'
Error: Unmatched symbol '|'
Error: Unmatched symbol '#'
I want the program to treat ":=" as a whole, but it still reads the single characters ":" and "=" afterwards. How do I correct this?

The following little complete (f)lex program does precisely what is desired. flex will not rescan a matched token unless you ask it to (or, of course, there is some undefined behaviour in your code which corrupts flex's internal data structures.)
%{
#include <stdio.h>
%}
%option nodefault noyywrap noinput nounput
%%
[[:space:]]+ /* Ignore spaces */
":=" {printf("Found - %s\n",yytext);}
. {printf("Error: Unmatched symbol \'%s\'\n", yytext);}
With the above file stored in only:=.l, I do the following:
$ lex -o only:=.c only:=.l
$ gcc -o only:= -Wall only:=.c -lfl
$ ./only:=
:= & | #
Found - :=
Error: Unmatched symbol '&'
Error: Unmatched symbol '|'
Error: Unmatched symbol '#'
(-lfl links against libfl, which supplies a minimal definition of main().)
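To illustrate why a matched token is not rescanned, here is a minimal Python sketch of the same three rules. It is not how flex is implemented: Python's alternation takes the first alternative that matches rather than the longest one, so the rule order here is an assumption standing in for flex's longest-match policy. The names TOKEN_RE and scan are made up for this example.

```python
import re

# The three rules of the flex program, tried in this order at each position:
# whitespace (ignored), the ":=" token, then any other single character.
TOKEN_RE = re.compile(r"(?P<space>\s+)|(?P<assign>:=)|(?P<other>.)", re.DOTALL)


def scan(text):
    """Return the messages the scanner would print for `text`."""
    messages = []
    # finditer resumes right after the end of the previous match, so the
    # characters of a matched ":=" are never looked at again.
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup == "space":
            continue
        if m.lastgroup == "assign":
            messages.append(f"Found - {m.group()}")
        else:
            messages.append(f"Error: Unmatched symbol '{m.group()}'")
    return messages


if __name__ == "__main__":
    for message in scan(":= & | #"):
        print(message)
```

A lone ":" or "=" still falls through to the catch-all rule, which is the behaviour the question asks for.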

Related

Why am I getting the error: Unmatched ( in regex; marked by <-- HERE?

I am having trouble figuring out why I am getting the error in the title.
This is the line of code I'm inputting into the command line:
perl -pi -e 's/(\/(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)' myfilepath
Basically, what I'm trying to do is go through a body of text, find all the URLS and append something to the end of the domain. For example:
https://thisisalink.com/navigate/page <-- I want to ignore the ]
I keep getting this error when I run that code though:
Unmatched [ in regex; marked by <-- HERE in *)|[ <-- HERE A-Z0-9+&##%=~_|5.030003)/gxi)/ at -e line 1, <> line 1.
How to fix this issue?

$] is a special variable that contains the version of the Perl interpreter currently running. Hence, [A-Z0-9+&##%=~_|$] is interpolated as [A-Z0-9+&##%=~_|5.032001 (on my Perl 5.32.1), and the opening [ is thus unmatched. To fix this, escape the $ by writing \$:
[A-Z0-9+&##%=~_|\$]
Similarly, earlier in the regex you use [...$?...], but $? is also a special variable; it contains "the status returned by the last pipe close, backtick (``) command, successful call to wait() or waitpid(), or from the system() operator". This does not cause an error, since its value is an integer, but the class will not match either $ or ? as you'd like. Once again, escape the $ and write \$? instead.
In general, when you want to match a literal $, you should probably escape it.

Why I cannot match certain string with ( | ) in regex

I have a question about matching a string pattern.
I want to copy certain files whose names contain particular identifying characters.
For example:
20190108JPYUSDabced.csv
20190108CHNUSDabced.csv
20190108IJKUSDabcde.csv
So I want to use a single command to copy just the first 2 files:
cp 20190108(JPY|CHN)USDabced.csv
It does not work.
I received this error:
-bash: syntax error near unexpected token `('

Bash brace expansion is meant for this (append the destination as the last argument, as with any cp command):
$ cp 20190108{JPY,CHN}USDabced.csv

Find files by given pattern and grep for whole word count matching the search string pattern and ignoring some pattern and print along with file name

Requirement: I need to search for whole words matching 'xceptions', count the occurrences of each individual word, and print the name of the file where it is found. Lines containing the string 'throws' should be ignored, and the date pattern used to select the lines should not be printed.
I can only use ' find, grep, zgrep, awk, cut ' commands
Example: File name "server.log" containing below text for different dates.
2017-12-08 00:39:44,453 Some lengthier string.with.javaExceptionString with more data here
2017-12-08 00:39:44,453 Some lengthier string.with.javaExceptionString with more data here
2017-12-09 00:39:44,453 Some more string.with.ContextServiceException some thing here
2017-12-09 00:40:44,453 Some more string.with.ContextServiceException junk values
2017-12-09 00:39:44,453 Some more java.net.MalformedURLException with more data here
2017-12-09 00:39:44,453 function () throws genericException which should not be grepped
2017-12-10 11:11:12,123 function () throws MalformedURLException which should not be grepped
2017-12-10 09:09:12,123 function () throws ContextServiceException which should not be grepped
2017-12-10 09:09:12,123 some oracle error ORA-10001 not grepped
2017-12-09 09:09:12,123 some oracle error ORA-99999 should be counted
2017-12-09 09:09:12,123 another oracle error ORA-20002 with java error ArrayOutOfBoundException and more...
2017-12-09 09:09:12,123 java error ArrayOutOfBoundException and another oracle error ORA-20002 and more...
2017-12-09 09:09:12,123 multiple errors line IOException and NullPointerException, RunTimeException and many more
Sample command, which prints all the word counts but does not filter by date:
find /tmp/ -name "*log*" -exec zgrep -HPo "(\b\w*xception|ORA-\w*\b)" {} + 2>/dev/null | sort | uniq -c
Current Output:
2 /tmp/server.log:ArrayOutOfBoundException
3 /tmp/server.log:ContextServiceException
1 /tmp/server.log:IOException
2 /tmp/server.log:MalformedURLException
1 /tmp/server.log:NullPointerException
1 /tmp/server.log:ORA-10001
2 /tmp/server.log:ORA-20002
1 /tmp/server.log:ORA-99999
1 /tmp/server.log:RunTimeException
1 /tmp/server.log:genericException
2 /tmp/server.log:javaExceptionString
I need help writing a single-line Unix command using 'find, grep, awk, cut' (but not perl or sed).
Expected output, filtered to the 2017-12-09 log lines only and ignoring lines with 'throws':
2 /tmp/server.log:ArrayOutOfBoundException
2 /tmp/server.log:ContextServiceException
1 /tmp/server.log:IOException
1 /tmp/server.log:MalformedURLException
1 /tmp/server.log:NullPointerException
2 /tmp/server.log:ORA-20002
1 /tmp/server.log:ORA-99999
1 /tmp/server.log:RunTimeException
zgrep is used so that any gzipped files are searched as well.

If I understand correctly,
you are looking to make two changes:
Exclude lines that contain "throws"
Exclude lines that don't start with "2017-12-09 "
You could use zgrep to select the lines that you want (those that start with "2017-12-09 " and contain "xception" or "ORA-"),
then awk to drop the lines that you don't want (those containing "throws")
and to strip the parts of each line that you don't want (the text between the filename and the exception name):
find ... -exec zgrep -HE '^2017-12-09.*(xception|ORA-)' {} + \
| awk '!/throws/ { print gensub(/^([^:]+:).*\<(\w+xception\w*|ORA-\w+).*/, "\\1\\2", 1) }' \
| sort | uniq -c
This relies on gensub and \w, which are extensions of the GNU implementation of awk (gawk).
After your further clarification,
you want to turn a line like this:
/tmp/server.log:2017-12-09 09:09:12,123 another oracle error ORA-20002 with java error ArrayOutOfBoundException and more...
To lines like this:
/tmp/server.log:ORA-20002
/tmp/server.log:ArrayOutOfBoundException
That's possible using a different awk tool, the match function,
which returns the starting position of a match (and sets RSTART and RLENGTH), or 0 when there is no match. We can call it repeatedly on the remainder of the line for as long as it finds matches, and print the output as we go:
find ... -exec zgrep -HE '^2017-12-09.*(xception|ORA-)' {} + \
| awk -F: '!/throws/ { s = $0; while (match(s, /\w+xception\w*|ORA-\w+/)) { print $1 ":" substr(s, RSTART, RLENGTH); s = substr(s, RSTART + RLENGTH + 1); }}' \
| sort | uniq -c
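For readers who want to check the extraction logic outside a shell, the repeated-match loop can be sketched in Python. This is only an illustration of the idea, not a replacement for the awk command (the question restricts the tools to find, grep, zgrep, awk and cut). The names PATTERN and extract are hypothetical, and the input is assumed to be one "filename:line" string as printed by zgrep -H.

```python
import re

# Same alternatives as the awk regex: a word containing "xception", or an ORA- code.
PATTERN = re.compile(r"\w+xception\w*|ORA-\w+")


def extract(grep_line):
    """Return one 'filename:token' string per match in a 'filename:text' line."""
    if "throws" in grep_line:
        return []  # mirrors the !/throws/ condition
    filename, _, text = grep_line.partition(":")
    # finditer walks the line left to right, like calling match() repeatedly
    # on the remainder of the string.
    return [f"{filename}:{m.group()}" for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = ("/tmp/server.log:2017-12-09 09:09:12,123 another oracle error "
              "ORA-20002 with java error ArrayOutOfBoundException and more...")
    for item in extract(sample):
        print(item)
```

Each returned string corresponds to one line that would be fed to sort | uniq -c.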

Lex match an angle bracket literally

I can't seem to get this lex regex working:
%{
#include"y.tab.h"
%}
%option yylineno
/* regular definitions */
angle_bracket_start "<"
%%
angle_bracket_start /*swallow it, do nothing!*/{}
%%
But when I test it with
lex lex.l
gcc lex.yy.c -lfl
I got:
$ ./a.out
<
< <--- If it prints out the "<", it means lex can't parse it, right?
^C
I'm asking because I need a regex that matches this verbatim:
<script type="text/JavaScript">
But I always get a syntax error, because lex decides it can't parse the <, throws it out, and parses script as an id.

To refer to a definition, use curly braces ({}) around the name:
angle_bracket_start "<"
%%
{angle_bracket_start} /*swallow it, do nothing!*/
Without the curly braces, lex looks for the literal string angle_bracket_start in the input.

how to exactly repeat the n matched pattern in result string

How to exactly repeat the n matched pattern in result string?
Example: if I have the following text:
++ '[' -f /etc/bashrc ']'
++ . /etc/bashrc
+++ '[' '[\u@\h \W]\$ ' ']'
+++ '[' -z 'printf "\033]0;%s@%s:%s\007" "${USER}" "${HOSTNAME%%.*}" "${PWD/#$HOME/~}"' ']'
+++ shopt -s checkwinsize
+++ '[' '[\u@\h \W]\$ ' = '\s-\v\$ ' ']'
+++ shopt -q login_shell
+++ '[' 506 -gt 199 ']'
++++ id -gn
Now I want to replace every '+' with 3 spaces, but only in the run of '+' at the beginning of each line. I would use something like :<range>s/^<pattern> or :%s/+/   /g, but if there were a '+' in the rest of the text I would simply mess it up.
The question:
How do I match every + at the beginning of a line and repeat the replacement once for each + found?
expected:
^++$ -> ^      $
^+++$ -> ^         $
^+$ -> ^   $
Thanks

Try this:
:%s/^+*/\=repeat('   ',strlen(submatch(0)))/
submatch(0) contains all the matched + at the start of the line, strlen counts them. So for every plus sign at the start of the line three spaces are inserted using repeat.
For more information:
:help sub-replace-expression
:help repeat()
:help submatch()
:help strlen()
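The same idea (measure the leading run of plus signs, then emit three spaces per sign) can be sketched outside Vim, for example in Python. This is an illustration under assumptions, not part of the Vim answer: the function name indent_pluses and its width parameter are made up for this example.

```python
import re


def indent_pluses(line, width=3):
    """Replace each '+' in the leading run of plus signs with `width` spaces."""
    # The callback receives the whole leading run and returns as many
    # spaces as width * (number of plus signs); later '+' are untouched.
    return re.sub(r"^\++", lambda m: " " * (width * len(m.group(0))), line)


if __name__ == "__main__":
    for text in ["++ . /etc/bashrc", "+++ shopt -s checkwinsize", "++++ id -gn"]:
        print(repr(indent_pluses(text)))
```

Because the pattern is anchored with ^, a '+' that appears later in the line is left alone.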

An elegant substitution command for this case is the following:
:%s/\%(^+*\)\@<=+/   /g

I think you'll have to run an expression several times, if that is acceptable.
You'll want to search for something like this (minus the single quotes, which are only there to show whitespace):
'^(\s*)\+'
and replace it with something like this (again minus the single quotes):
'$1   '
Not every problem that can be solved with regular expressions can be solved with a single regular expression, and I'm pretty sure this is one of those cases.
This expression/replacement pair needs to be run once for each plus sign at the beginning of the line with the most plus signs (in your example above, that would be four times).
N.B.: as written, this will mess up any lines that are supposed to begin with whitespace followed by plus signs, so I hope that doesn't happen anywhere.
