I want to be able to match and parse some parameters read from a file such as :
"type:int,register_id:15,value:123456"
"type:int,register_id:16,value:-456789"
"type:double,register_id:17,value:123.456"
"type:double,register_id:18,value:-456.789"
"type:bool,register_id:19,value:true"
"type:bool,register_id:20,value:false"
"type:string,register_id:17,value:Test Set Data Register"
I've come up with the following Regex expression :
(^(type:)\b(bool|int|double|string)\b,(\bregister_id:\b)([1-9][0-9]*),(\bvalue:\b)(.*)$)
but it fails on negative floats or ints: I can't get the hyphen handled properly.
Can someone point me in the right direction ?
https://regex101.com/r/WhXmBE/3
Thanks !
Tried [\s\S] but it reads everything, tried -? as well
Given your example, this seems to work:
(^(type:)(bool|int|double|string),(register_id:)([1-9][0-9]*),(value:)(.*)$)
At least from the example, I didn't see why the \b are necessary. Apologies if I missed something.
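For what it's worth, the hyphen problem seems to come from the \b right after value: in the question's pattern. A \b only matches between a word character and a non-word character, and both ":" and "-" are non-word characters, so no boundary exists in front of a negative number. Here is a small Python check of that; Python and the two constant names are just my choice for illustration.

```python
import re

# The question's pattern (written with its * quantifiers), keeping the \b anchors.
WITH_BOUNDARIES = re.compile(
    r"^(type:)\b(bool|int|double|string)\b,"
    r"(\bregister_id:\b)([1-9][0-9]*),(\bvalue:\b)(.*)$"
)

# The pattern from this answer, without the \b anchors.
WITHOUT_BOUNDARIES = re.compile(
    r"^(type:)(bool|int|double|string),(register_id:)([1-9][0-9]*),(value:)(.*)$"
)

positive = "type:int,register_id:15,value:123456"
negative = "type:int,register_id:16,value:-456789"

# ":" followed by "1" is a word boundary, ":" followed by "-" is not.
print(WITH_BOUNDARIES.match(positive) is not None)
print(WITH_BOUNDARIES.match(negative) is not None)

# Without \b the negative value is captured in group 6.
print(WITHOUT_BOUNDARIES.match(negative).group(6))
```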
Looking at what you try to achieve, I would actually consider moving away from regexes, as regexes by themselves add complexity. You will likely have an easier life if you approach it like this:
1. Split the line by "," to get the key value pairs
2. Split each key value pair by the first ":" to split key and value
3. Validate that all keys are present and that every value matches the format for the key (e.g. if the type is bool then the value should parse to a bool)
You can easily adjust every step to e.g. trim whitespaces.
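The steps above could look roughly like this in Python. This is only a sketch under a few assumptions of mine: the lines carry no surrounding quotes, the value is always the last field (so I split on at most two commas to keep any commas inside a string value), and the names parse_line, parse_bool, CONVERTERS and REQUIRED_KEYS are made up for the example.

```python
def parse_bool(text):
    """Accept only the literals used in the sample data."""
    if text == "true":
        return True
    if text == "false":
        return False
    raise ValueError(f"not a bool: {text!r}")


# One converter per declared type (hypothetical mapping).
CONVERTERS = {"int": int, "double": float, "bool": parse_bool, "string": str}
REQUIRED_KEYS = ("type", "register_id", "value")


def parse_line(line):
    fields = {}
    # Step 1: split into key/value pairs; at most 3 parts so the value keeps its commas.
    for pair in line.strip().split(",", 2):
        # Step 2: split each pair at the first ":".
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in {pair!r}")
        fields[key.strip()] = value
    # Step 3: validate keys, then convert the value according to its type.
    missing = [key for key in REQUIRED_KEYS if key not in fields]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if fields["type"] not in CONVERTERS:
        raise ValueError(f"unknown type: {fields['type']!r}")
    return {
        "type": fields["type"],
        "register_id": int(fields["register_id"]),
        "value": CONVERTERS[fields["type"]](fields["value"]),
    }
```

Negative numbers need no special handling here because int() and float() already accept a leading minus sign.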
I've written several syntax highlighting files for simple custom formats in the past (in some cases even adjusting the format a little so that I could write the syntax file with the skills I have).
But this time I am stuck and would appreciate some help.
The file format is (obviously) a text file where every line contains three distinct elements separated by spaces. An element can be a "symbol" (a name made of alphanumeric characters and hyphens) or a "string" (a series of any characters, spaces included, but not pipes).
Strings can appear only at the start or end of a line; the middle element is always a symbol. A string is delimited by a pipe: the pipe follows the string if it is the first element and precedes it if it is the last element.
So a line can be all symbols, a string first followed by symbols, symbols followed by a string last, or a string at both ends.
Examples:
All symbols
this-is-a-symbol another-one and-another
First string
This is a string potentially containing any char| symbol symbol
Last string
symbol symbol |A string at the end of the line
First and last as strings
This is a string| now-we-have-a-symbol |And here another string
These four examples are the only valid line formats.
All symbols need to be colored differently: one color for the first element, one for the second, and one for the third.
Strings, however, get a single distinct color regardless of position.
If the pipe characters can be "dimmed" with a color similar (not identical) to the background, that would be a big plus, but I think I can manage that myself.
A line that does not match one of the forms shown above has to be highlighted as an error (e.g. with a red background).
Some help?
PS: Stack Overflow applies a sort of syntax highlighting to my examples, which can be misleading.
I have found a simpler approach than the regular expressions I initially thought were necessary. In the end I only need to match the first element and the last; I don't know how I didn't think of that. So this is my solution, and it seems to work well for my needs. The only thing it doesn't do is highlight badly formatted lines. Good enough for now. Thanks for your patience and attention.
" Vim syntax file
" Language: ff .txt
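" Do nothing if a syntax is already loaded for this buffer
if exists("b:current_syntax")
  finish
endif
setlocal iskeyword+=:
" First and last element when they are symbols (letters, digits, hyphens)
syn match Asymbol /^[a-zA-Z0-9\-]* /
syn match Csymbol / [a-zA-Z0-9\-]*$/
" First and last element when they are pipe-delimited strings
syn match Astring /^.*| /
syn match Cstring / |.*$/
" Link each group to a standard highlight group
highlight link Asymbol Constant
highlight link Csymbol Statement
highlight link Astring Include
highlight link Cstring Comment
let b:current_syntax = "ff"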
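Since the syntax file does not flag badly formatted lines, a separate check outside Vim may help in the meantime. Below is a rough Python sketch of the four allowed line shapes as a single regular expression. It assumes exactly one space between elements and non-empty strings; classify and the group names are hypothetical, and this is not a replacement for a proper error group in the syntax file.

```python
import re

SYMBOL = r"[A-Za-z0-9-]+"

# first element: symbol, or string followed by a pipe
# middle element: always a symbol
# last element: symbol, or pipe followed by a string
LINE = re.compile(
    rf"(?:(?P<first_sym>{SYMBOL})|(?P<first_str>[^|]+)\|) "
    rf"(?P<middle>{SYMBOL}) "
    rf"(?:(?P<last_sym>{SYMBOL})|\|(?P<last_str>[^|]+))"
)


def classify(line):
    """Return the kinds of the three elements, or None for an invalid line."""
    match = LINE.fullmatch(line)
    if match is None:
        return None
    first = "symbol" if match.group("first_sym") is not None else "string"
    last = "symbol" if match.group("last_sym") is not None else "string"
    return (first, "symbol", last)


for example in (
    "this-is-a-symbol another-one and-another",
    "This is a string| now-we-have-a-symbol |And here another string",
    "only two",
):
    print(classify(example))
```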
I'm new to Regular Expressions, and I have been trying to figure out how to code this: I need to find numbers greater than 25000 where the same line also has the number " 19" somewhere on that line (that's a space then 19). The problem is that the numbers have commas in them. I tried a few options:
This finds lines with any numbers over 25000:
^.*(25,|26,|27,|28,|29,|30,|31,|32,|33,|34,|35,|36,|37,|38,|39,|40,|41,|42,|43,|44,|45,|46,|47,|48,|49,|50,|51,|52,|53,|54,|55,|56,|57,|58,|59,|60,|61,|62,|63,|64,|65,|66,|67,|68,|69,|70,|71,|72,|73,|74,|75,|76,|77,|78,|79,|80,|81,|82,|83,|84,|85,|86,|87,|88,|89,|90,|91,|92,|93,|94,|95,|96,|97,|98,|99,|100,|101,|102,|103,|104,|105,|106,|107,|108,|109,|110,|111,|112,|113,|114,|115,|116,|117,|118,|119,|120,|121,|122,|123,|124,).*$
This finds lines with both " 19" and 26, (but not with the comma after the 26)
^.*( 19.*26).*$
Any help is appreciated!
Numbers of 25000 and above can be represented as follows:
\d{6,}|2[5-9]\d{3}|[3-9]\d{4}
That is, in English:
- numbers of 6 digits or more
- 5-digit numbers starting with 2 whose second digit is 5 or greater
- 5-digit numbers starting with a digit greater than 2
So the complete regex would look like this:
.*(\d{6,}|2[5-9]\d{3,}|[3-9]\d{4,}).* 19.*
That is, such a number somewhere in the line, followed later in the line by " 19". Note that this pattern does not account for commas inside the numbers.
Here is a test run on regex101 for you to test with your data.
I also second the comment that this isn't a job for regular expressions, which as you can see work on characters rather than numbers.
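To illustrate what I mean by working on numbers rather than characters, here is a rough Python sketch. It assumes the numbers of interest are always written with thousands separators (so a plain 30000 is ignored), that " 19" may appear anywhere on the line, and that "greater than 25000" is meant strictly. The names line_matches and NUMBER_WITH_COMMAS are made up for the example.

```python
import re

# A number written with thousands separators, e.g. 26,500 or 1,250,000.
NUMBER_WITH_COMMAS = re.compile(r"\b\d{1,3}(?:,\d{3})+\b")


def line_matches(line, threshold=25000, marker=" 19"):
    """True if the line contains the marker and a comma-formatted number above threshold."""
    if marker not in line:
        return False
    for token in NUMBER_WITH_COMMAS.findall(line):
        # Compare as an integer instead of as a sequence of characters.
        if int(token.replace(",", "")) > threshold:
            return True
    return False


for sample in ("item 19 total 26,500", "item 19 total 24,999", "item 20 total 26,500"):
    print(sample, line_matches(sample))
```

The regex is only used to find candidate tokens; the actual comparison is numeric, so changing the threshold does not require rewriting the pattern.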
I would try something like this:
^(([0-9,]*([3-9][0-9]|2[5-9]),?[0-9]{3})\s?)$
That should handle the numeric part. You didn't really explain if the " 19" would come before or after that, and what would delimit that from the numeric part, but just insert (\s19) wherever that bit needs to go.
Thanks everyone. The following RegEx worked for me:
^.* 19.*(25,|26,|27,|28,|29,|30,|31,|32,|33,|34,|35,|36,|37,|38,|39,|40,|41,|42,|43,|44,|45,|46,|47,|48,|49,|50,|51,|52,|53,|54,|55,|56,|57,|58,|59,|60,|61,|62,|63,|64,|65,|66,|67,|68,|69,|70,|71,|72,|73,|74,|75,|76,|77,|78,|79,|80,|81,|82,|83,|84,|85,|86,|87,|88,|89,|90,|91,|92,|93,|94,|95,|96,|97,|98,|99,|100,|101,|102,|103,|104,|105,|106,|107,|108,|109,|110,|111,|112,|113,|114,|115,|116,|117,|118,|119,|120,|121,|122,|123,|124,).*$
This finds lines that have " 19" first in the line and then a number greater than 25K later in the line, when the numbers have commas in them. I couldn't use the shortcut "number ranges" that were suggested because there are other numbers on the lines, written without commas, that are over 25K and that I don't want to flag. Maybe there's an easier way than my brute-force method, but if not, at least this works. Thanks again!
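If the long list is the only concern, the prefixes from 25, to 124, can probably be written as a few digit ranges while still requiring the comma. The Python sketch below compares such a compact form against the generated brute-force list. COMPACT, BRUTE_FORCE and the sample lines are made up for the check, and both forms share the same limitation of looking only at the digits directly before a comma.

```python
import re

# 25-29, 30-99, 100-119, 120-124, each followed by a comma.
COMPACT = r"(?:2[5-9]|[3-9][0-9]|1[01][0-9]|12[0-4]),"

# The same alternatives spelled out one by one: 25,|26,|...|124,
BRUTE_FORCE = "(?:" + "|".join(f"{n}," for n in range(25, 125)) + ")"

compact_line = re.compile(rf"^.* 19.*{COMPACT}.*$")
brute_line = re.compile(rf"^.* 19.*{BRUTE_FORCE}.*$")

samples = [
    "acct 19 balance 26,500",   # " 19" first, then a comma number over 25K
    "acct 19 balance 24,999",   # below the range
    "acct 19 balance 30000",    # over 25K but without a comma
    "balance 26,500 acct 19",   # number comes before " 19"
]

for line in samples:
    print(line, bool(compact_line.match(line)), bool(brute_line.match(line)))
```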