Go validator.v2 gives error "unknown tag" for regexp - regex

I have been trying to understand why some regular expressions give me an error "Unknown tag" while using the validator.v2 package in golang. It works for some regular expressions but does not work with some which have "{}" inside them, and when I use the validator.Validate() it gives me an error at runtime "unknown tag".
Here's the code:
type Company struct {
Name string `validate:"regexp=^[a-zA-Z .]{1,100}$"`
}
which gives me the following error at runtime:
Name: unknown tag
however this regex works perfectly fine
type Company struct {
Name string `validate:"regexp=^[a-zA-Z .]*$"`
}
I am using the braces because of length restrictions that I want to put on the string. There could be other ways to do it, but I feel the regex is the way to go and is easier to have it along with other rules right there in the expression.

The problem appears to be the , char in your first regex. You can see in the validator source code that the tag is split on ,. By UTSLing, I see no support for escaped commas in the tags; this is probably an oversight on the part of the project author. I suggest filing a bug/feature request.

The problem pointed out by #Flimzy is correct, but it's not a bug.
Probably it was fixed since then, so at the moment validator supports escape sequence \\ for such case, and you can do it like this:
type Company struct {
Name string `validate:"regexp=^[a-zA-Z .]{1\\,100}$"`
}

Related

Vim syntax highlight two specific words with a space (regex)

I have created this file called "~/.vim/syntax/proc.vim" and have been populating regex expressions (I think that's what they call it). I'm having to write some test scripts in this test language developed back in the 90's called STOL for a spacecraft payload. I'm working in a secured environment so the only thing I have access to is vim.
STOL lets you write several different types of print statements and I would like some syntax highlighting difference between each type of message (error message, info message, etc).
My vim color profile called "~/.vim/color/molokai.vim" lets me link various regex expressions to a specific syntax class which will invoke a specific color. For example, to highlight keywords I specify two lines like...
" This is the regex where two keywords have been defined
syntax keyword procKeywords IF ELSE
" This is how I link the above regex to a molokai color class
" which is called Keyword
highlight link procKeywords Keyword
I would essentially like to do the same for some error and info messages that STOL defines like the following...
EVENT ERROR "This is a error message"
EVENT INFO "This is a info message"
How can I match on two specific words with a space between them? I need something like the following in my "~/.vim/syntax/proc.vim". The following is wrong but I'm just writing it out showing you what I'm thinking...
syntax match procInfo "EVENT INFO"
syntax match procError "EVENT ERROR"
highlight link procInfo ModeMsg
highlight link procError ErrorMsg
Your attempt already mostly works. However, :syntax match does not automatically enforce whole-keyword match like :syntax keyword, so you'd get false highlighting on e.g. MYEVENT INFORMATION, too. You can easily fix that by adding the keyword boundary atoms :help /\< to the regular expressions:
syntax match procInfo "\<EVENT INFO\>"
syntax match procError "\<EVENT ERROR\>"

Regex to locate a specific sentence fragment but there may be an underscore some where in the string

I am trying to find a string that may exist in a Windows menu item. As such a simple text search is complicated by the potential presence of an underscore character, which may be anywhere in the string. For example, I may be looking for "Import File" but the resulting string may be any of the following strings.
These are easy enough:
Import File
_Import File
But these elude simple grepping:
I_mport _File
Im_port File
Impor_t File
This works but it's clunky and prone to error, and it means that every time I need to look for a new menu item, I have to completely reconstruct the pattern. Is there an easier way?
I_{0,1}m_{0,1}p_{0,1}o_{0,1}r_{0,1}t _{0,1}F_{0,1}i_{0,1}l_{0,1}e
This is the regex tester URL: https://regex101.com/r/7ptBHG/2
UPDATED regex tester URL: https://regex101.com/r/7ptBHG/3
I'm using the VS2015 editor's "Use Regular Expressions" feature.
UPDATE -> RESPONSE TO QUESTIONS:
I hadn't considered using the ? instead of {0,1} That does make it a bit less cumbersome.
I_?m_?p_?o_?r_?t _?F_?i_?l_?e
If you proved an answer, I'll accept that. I did ask, "Is there an easier way?".

Trouble with C++ Regex POSIX character class

I'm trying to create a regex that is capable of analysing something like this:
002561-1415179671591i.jpg
The second part is a unix timestamp (before the i), and I need to extract that. I came up with the following syntax, but std::regex keeps throwing a regex_error before I even check for a match and I'm not too sure what's wrong.
Here's what I've got so far:([:d:])*-[[:d:]]*([:alpha:])\.jpg
The C++ code line that throws the error is the constructor to regex
std::regex reg(regex_expr);
where regex_expr is a string that has been read in from a file.
Really appreciate any help you can give.
Edit: I wrote a try catch section, and it seems I'm getting the following error code
std::regex_constants::error_brack
Edit 2: Ok... seems I'm still getting the error even with a cut-down test:
([:alpha:])*
Edit 3:
Seems I can't get any expression to work. Here's a bit more info. I'm using clang++ 3.5.0 on Kubuntu 14.04.
Not sure that [:d:] can stand for [:digit:]. [EDIT] (It seems it's possible)
When you use a POSIX character class, it must be enclosed in a character class like that:
[[:digit:]]
(This syntax allows to compose other classes [[:digit:]ab])
so:
std::string regex_expr = "([[:digit:]]*)-([[:digit:]]*)([[:alpha:]])\\.jpg";
or if you use the basic mode:
std::string regex_expr = "\\([[:digit:]]*\\)-\\([[:digit:]]*\\)\\([[:alpha:]]\\)\\.jpg";
I would rather use the perl-compatible syntax instead of character classes:
\d+-(\d+)[a-z]*\.jpg
After many tests, for whatever reason, I couldn't get this to work. As I had boost anyway, I tried out the boost::regex and it worked straight away, so I presume that something to do with either clang, or the version of the standard on this pc just wasn't working.
So simply put, tried boost and it worked straight away. Not much of an answer, but I guess that's how things go sometimes.

Are my regex just wrong or is there a buggy behaviour in td-agent's format behaviour?

I am using fluentd, elasticsearch and kibana to organize logs. Unfortunately, these logs are not written using any standard like apache, so I had to come up with the regex for the format myself. I used this site here to verify that they are working: http://fluentular.herokuapp.com/ .
The logs have roughly this format here:
DEBUG: 24.04.2014 16:00:00 [SingleActivityStrategy] Start Activitiy 'barbecue' zu verabeiten.
the format regex I am using is as follows:
format /(?<pri>([INFO]|[DEBUG]|[ERROR])+)...(?<date>(\d{2}\.\d{2}\.\d{4})).(?<time>(\d{2}:\d{2}:\d{2})).\[(?<subject>(.*))\].(?<msg>(.*))/
Now, judging by that website that is supposed to test specifically fluentd's behaviour with regexes, the output SHOULD be this one:
Record
Key Value
pri DEBUG
date 24.04.2014
subject SingleActivityStrategy
msg Start Activitiy 'barbecue' zu verabeiten.
Instead though, I have this ?bug? that pri is always shortened to DEBU. Same for ERROR which becomes ERRO, only INFO stays INFO. I am not very experienced with regular expressions and I find it hard to believe that this is a bug, still it confuses me and any help is greatly appreciated.
I'm not sure I can link the complete config file because I dont personally own these log files and I am trying to keep it on a level that my boss won't get mad at me for posting sensitive information, but should it definately be needed, I will post them later on after having asked him how much I can reveal.
In general, the logs always look roughly like this:
First the priority, which is either DEBUG, ERROR or INFO, next the date , next what we call the subject which is always written in [ ] and finally just a message.
Here is a link to fluentular with the format I am using and a teststring that produces the right result in fluentular, but not in my config file:
Fluentular
Sorry I couldn't make it work like a regular link to just click on.
Another link to test out regex with my format and test string is this one:
http://rubular.com/r/dfXOkQYNXP
tl;dr version:
my td-agent format regex cuts off the last letter, although fluentular says it shouldn't. My fault or a bug?
How the regex would look if you're trying to match the data specifically:
(INFO|DEBUG|ERROR)\:\s+(\d{2}\.\d{2}\.\d{4})\s(\d{2}:\d{2}:\d{2})\s\[(.*)\](.*)
In your format string, you were using . and ... for where your spaces and colon should be. I'm not to sure on why this works in Fluentular, but you should have matched the \: explicitly and each space between the values.
So you'd be looking at the following regular expression with the Fluentd fields (which are grouping names):
(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))
Meaning your td-agent.conf should look like:
<source>
type tail
path /var/log/foo/bar.log
pos_file /var/log/td-agent/foo-bar.log.pos
tag foo.bar
format /(?<pri>(INFO|ERROR|DEBUG))\:\s+(?<date>(\d{2}\.\d{2}\.\d{4}))\s(?<time>(\d{2}:\d{2}:\d{2}))\s\[(?<subject>(.*))\]\s(?<msg>(.*))/
</source>
I would also take a look into comparing Logstash vs. Fluentd. I like Logstash far more because you create Grok filters to match the type of data you want, and it makes formatting your fields much easier because you are providing an abstraction layer, but you essentially will get the same data.
And I would watch out when you're using sites like Rubular, as they are fairly particular about multi-line matching and the like. I'd suggest something like Regexr which gives immediate feedback and you can set global and multiline matching as well.

Log parsing rules in Jenkins

I'm using Jenkins log parser plugin to extract and display the build log.
The rule file looks like,
# Compiler Error
error /(?i) error:/
# Compiler Warning
warning /(?i) warning:/
Everything works fine but for some reasons, at the end of "Parsed Output Console", I see this message,
NOTE: Some bad parsing rules have been found:
Bad parsing rule: , Error:1
Bad parsing rule: , Error:1
This, I'm sure is a trivial issue but not able to figure it out at this moment.
Please help :)
EDIT:
Based on Kobi's answer and having looked into the "Parsing rules files", I fixed it this way (a single space after colon). This worked perfectly as expected.
# Compiler Error
error /(?i)error: /
# Compiler Warning
warning /(?i)warning: /
The Log Parser Plugin does not support spaces in your pattern.
This can be clearly seen in their source code:
final String ruleParts[] = parsingRule.split("\\s");
String regexp = ruleParts[1];
They should probably have used .split("\\s", 2).
As an alternative, you can use \s, \b, or an escape sequence - \u0020.
I had tried no spaces in the pattern, but that did not work.
Turns out that the Parsing Rules files does not support
empty lines in it. Once I removed the empty lines, I did not
get this "Bad parsing rule: , Error:1".
I think since the line is empty - it doesn't echo any rule after
the first colon. Would have been nice it the line number was
reported where the problem is.