gvim search: match multiple characters using regex

I am trying to search in gvim for the following pattern:
arrayA[*].entryx
hoping it would match the following:
arrayA[size].entryx
arrayA[i].entryx
arrayA[index].entryx
but gvim reports "Pattern not found", even though the above lines are present in the file.
arrayA[.].entryx
only matches arrayA[i].entryx
i.e. only with a single character between the [] brackets.
What should I do to match multiple characters between the [] brackets?

Here is the expression broken down (it is written for Vim's / search, and the same pattern is also valid PCRE):
/arrayA\[[^]]*]\.entryx/
[^]]*            zero or more characters that are not ']'
\[ and \.        escaped '[' and '.'
]                closing ']', which does not need to be escaped
arrayA, entryx   literal parts
If you want to match arrayA[X].entryx only when there is at least one character inside the [], replace \[[^]]* with \[[^]]\+.
PS: Note my edit: I changed \* to a plain *, which doesn't need escaping either.
But in Vim you do need to escape the + (as \+) :-)
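For experimenting outside Vim, here is a rough translation into Python's re module. This is an assumption for illustration: Python writes one-or-more as a plain + rather than Vim's \+, and the inner ] is escaped here for readability. The sketch compares the zero-or-more and one-or-more versions on the sample lines, plus a hypothetical line with an empty index:

```python
import re

# Python equivalents of Vim's /arrayA\[[^]]*]\.entryx and /arrayA\[[^]]\+]\.entryx
ZERO_OR_MORE = re.compile(r"arrayA\[[^\]]*\]\.entryx")  # [] may be empty
ONE_OR_MORE = re.compile(r"arrayA\[[^\]]+\]\.entryx")   # at least one character inside []

lines = [
    "arrayA[size].entryx",
    "arrayA[i].entryx",
    "arrayA[index].entryx",
    "arrayA[].entryx",  # hypothetical line with an empty index
]

for line in lines:
    print(f"{line:22} *: {bool(ZERO_OR_MORE.search(line))}  +: {bool(ONE_OR_MORE.search(line))}")
```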
Update, in response to your comment:
My comment answers your question about escaping ] broadly; for more detail, see the Perl documentation on character classes (perlrecharclass), specifically the section "Special Characters Inside a Bracketed Character Class".
The rules for what needs to be escaped change once a [ character starts a character class (CCL).
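A quick way to see that rule change is Python's re module, which behaves like Perl on this point (a sketch with a made-up test string): inside [...], characters such as . * + ? lose their special meaning and match themselves.

```python
import re

text = "a.b*c+d?e"

# Outside a class, . is a wildcard: it matches every character.
print(re.findall(r".", text))
# Inside a class, . * + ? are ordinary characters, so no escaping is needed.
print(re.findall(r"[.*+?]", text))  # ['.', '*', '+', '?']
```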

The * repeats the previous item, and [ starts a character class, so you need something more like:
/arrayA\[[^]]*]\.entryx/
That looks for a literal [, a series of zero or more characters other than ], a literal ], a literal ., and then entryx.

Remember that in Vim some characters, such as [ and ., are special and must be escaped with a backslash to be matched literally. (In Vim's default "magic" mode, { and } are literal, while \{ starts a repetition count.) As said before, the * repeats the previous item, so you could simply use /arrayA\[.*\]\.entryx. However, * is greedy and may match more than you expect; add the following line to your file and you'll see why:
arrayA[size].entryx = arrayB[].entryx
A "safer" regular expression would be:
/arrayA\[.\{-\}\]\.entryx
The .\{-\} matches any characters non-greedily (as few as possible), which is safer in cases like this.
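In Python's re module (used here as an assumption, since the answer is about Vim), the lazy quantifier is spelled .*? instead of Vim's .\{-\}. This sketch runs both versions against the tricky line from the answer:

```python
import re

line = "arrayA[size].entryx = arrayB[].entryx"

# Greedy: .* runs as far right as it can, up to the LAST "].entryx" on the line.
greedy = re.search(r"arrayA\[.*\]\.entryx", line).group()
# Lazy: .*? (Vim's .\{-\}) stops at the FIRST "].entryx".
lazy = re.search(r"arrayA\[.*?\]\.entryx", line).group()

print(greedy)  # arrayA[size].entryx = arrayB[].entryx
print(lazy)    # arrayA[size].entryx
```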

Related

Replace "advanced" pattern in sed

I can't figure out how to change this:
\usepackage{scrpage2}
\usepackage{pgf} \usepackage[latin1]{inputenc}\usepackage{times}\usepackage[T1]{fontenc}
\usepackage[colorlinks,citecolor=black,filecolor=black,linkcolor=black,urlcolor=black]{hyperref}
into this, using only sed:
REPLACED
REPLACED REPLACEDREPLACEDREPLACED
REPLACED
I'm trying things like sed 's!\\.*\([.*]\)\?{.\+}!REPLACED!g' FILE
but that gives me
REPLACED
REPLACED
REPLACED
I think the .* swallows everything and the rest of my pattern is effectively ignored, but I can't figure out how to go about this.
Once I've learned how to write a regex like that, my next step would be to change it to this:
\usepackage{scrpage2}
\usepackage{pgf}
\usepackage[latin1]{inputenc}
\usepackage{times}
\usepackage[T1]{fontenc}
\usepackage[colorlinks,citecolor=black,filecolor=black,linkcolor=black,urlcolor=black]{hyperref}
So I would appreciate any pointers in that direction too.
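The second step (one command per line) can be sketched in Python's re rather than sed. This is an assumption for illustration, and it relies on a lookbehind, which sed does not support: insert a line break wherever a closing } is followed, possibly after spaces or tabs, by another backslash command.

```python
import re

text = (
    "\\usepackage{scrpage2}\n"
    "\\usepackage{pgf} \\usepackage[latin1]{inputenc}\\usepackage{times}\\usepackage[T1]{fontenc}\n"
    "\\usepackage[colorlinks,citecolor=black,filecolor=black,linkcolor=black,urlcolor=black]{hyperref}"
)

# (?<=})  : position right after a closing brace
# [ \t]*  : any spaces/tabs there (removed by the substitution)
# (?=\\)  : only if the next character is a backslash (start of the next command)
split = re.sub(r"(?<=})[ \t]*(?=\\)", "\n", text)
print(split)
```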
Here's some code that happens to work for the example you gave:
sed 's/\\[^\\[:space:]]\+/REPLACED/g'
I.e. match a backslash followed by one or more characters that are not whitespace or another backslash.
To make things more specific, you can use
sed 's/\\[[:alnum:]]\+\(\[[^][]*\]\)\?{[^{}]*}/REPLACED/g'
I.e. match a backslash followed by one or more alphanumeric characters, followed by an optional [ ] group, followed by a { } group.
The [ ] group matches [, followed by zero or more non-bracket characters, followed by ].
The { } group matches {, followed by zero or more non-brace characters, followed by }.
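To test both sed expressions outside sed, here is a rough Python translation. Assumptions: [[:space:]] becomes \s and [[:alnum:]] becomes [A-Za-z0-9], which agree for this ASCII input, and Python writes +, ? and groups without backslashes. Both patterns produce the output the asker wanted:

```python
import re

text = (
    "\\usepackage{scrpage2}\n"
    "\\usepackage{pgf} \\usepackage[latin1]{inputenc}\\usepackage{times}\\usepackage[T1]{fontenc}\n"
    "\\usepackage[colorlinks,citecolor=black,filecolor=black,linkcolor=black,urlcolor=black]{hyperref}"
)

# sed 's/\\[^\\[:space:]]\+/REPLACED/g'
# A backslash, then one or more characters that are neither whitespace nor a backslash.
loose = re.sub(r"\\[^\\\s]+", "REPLACED", text)

# sed 's/\\[[:alnum:]]\+\(\[[^][]*\]\)\?{[^{}]*}/REPLACED/g'
# A backslash, a command name, an optional [...] group, then a {...} group.
strict = re.sub(r"\\[A-Za-z0-9]+(?:\[[^\[\]]*\])?\{[^{}]*\}", "REPLACED", text)

print(loose)
print(strict)
```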
Perl to the rescue! It features "frugal" (non-greedy) quantifiers:
perl -pe 's!\\.*?\.?{.+?}!REPLACED!g' FILE
Note that I removed the capturing group as you didn't use it anywhere. Also, [.*] matches either a dot or an asterisk, but you probably wanted to match a literal dot instead.
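Python's re module has the same non-greedy quantifiers, *? and +?. A sketch of the one-liner above in Python (an assumption for easy testing), with the braces escaped for clarity:

```python
import re

lines = [
    "\\usepackage{scrpage2}",
    "\\usepackage{pgf} \\usepackage[latin1]{inputenc}\\usepackage{times}\\usepackage[T1]{fontenc}",
    "\\usepackage[colorlinks,citecolor=black,filecolor=black,linkcolor=black,urlcolor=black]{hyperref}",
]

# Equivalent of perl -pe 's!\\.*?\.?{.+?}!REPLACED!g':
# .*? and .+? stop at the first point where the rest of the pattern can match,
# so each \command[...]{...} is replaced on its own instead of the whole line.
frugal = re.compile(r"\\.*?\.?\{.+?\}")
for line in lines:
    print(frugal.sub("REPLACED", line))
```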

Which characters must be escaped in a Perl regex pattern

I'm trying to find files that look like this:
access_log-20160101
access_log-20160304
...
With a Perl regex, I came up with something like this:
/^access_log-\d{8}$/
But I'm not sure about the "_" and the "-". Are these metacharacters?
What is the expression for this?
I read that "_" in a regex is something like \w, but how do I use that in my expression?
/^access\wlog-\d{8}$/ ?
Underscore (_) is not a metacharacter and does not need to be quoted (though it won't change anything if you quote it).
Hyphen (-) is a metacharacter only inside a bracketed character class, where it defines a range between two characters. In your pattern it is outside [], so it is interpreted literally and doesn't need quoting.
You can use your regexp as is; hyphens (-) might need quoting if your format changes in the future.
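To check this quickly, the same pattern can be tried with Python's re module (an assumption for illustration; the file names are made up, and re.ASCII keeps \d to the digits 0-9). The second pattern shows why \w is a poor stand-in for the literal underscore: it accepts any word character in that position.

```python
import re

names = ["access_log-20160101", "access_log-20160304", "accessXlog-20160101", "access_log-2016"]

exact = re.compile(r"^access_log-\d{8}$", re.ASCII)   # _ and - are literal here
loose = re.compile(r"^access\wlog-\d{8}$", re.ASCII)  # \w accepts ANY word character

print([n for n in names if exact.match(n)])  # ['access_log-20160101', 'access_log-20160304']
print([n for n in names if loose.match(n)])  # also accepts 'accessXlog-20160101'
```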
Your regex pattern is exactly right.
Neither underscore _ nor hyphen - needs to be escaped. Outside a square-bracketed character class, the twelve Perl regex metacharacters are:
Brackets ( ) [ {
Quantifiers * + ?
Anchors ^ $
Alternator |
Wildcard .
The escape itself \
and only these need to be escaped.
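As a sketch of that rule, using Python's re, which treats these twelve characters the same way outside a class (escape_perl_meta is a hypothetical helper, not a library function): escaping only those characters is enough to match an arbitrary string literally, while _ and - are left alone.

```python
import re

# The twelve metacharacters listed above (outside a bracketed character class).
PERL_META = set("()[{*+?^$|.\\")

def escape_perl_meta(s: str) -> str:
    """Hypothetical helper: backslash-escape only the twelve metacharacters."""
    return "".join("\\" + ch if ch in PERL_META else ch for ch in s)

sample = "access_log-[2016]{01}.(a|b)*+?^$\\"
pattern = escape_perl_meta(sample)
print(pattern)
print(bool(re.fullmatch(pattern, sample)))  # True
```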
If the pattern of your file names doesn't vary from what you have shown, then the pattern that you are using
^access_log-\d{8}$
is correct, unless you need to validate the date string.
Within a character class like [A-F], you must escape the hyphen if you want it to be interpreted literally. As it stands, that class is equivalent to [ABCDEF]. If you mean just the three characters A, - and F, then [A\-F] will do what you want, but it is usual to put the hyphen at the start or end of the class to make it unambiguous. [-AF] and [AF-] are the same as [A\-F] and rather more readable.
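The same class rules hold in Python's re module, so they can be checked directly (a sketch with a made-up test string):

```python
import re

s = "ABC-F"

print(re.findall(r"[A-F]", s))   # range A..F:     ['A', 'B', 'C', 'F']
print(re.findall(r"[A\-F]", s))  # escaped hyphen: ['A', '-', 'F']
print(re.findall(r"[-AF]", s))   # hyphen first:   ['A', '-', 'F']
print(re.findall(r"[AF-]", s))   # hyphen last:    ['A', '-', 'F']
```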

Regex: check whether a string contains characters other than those specified

How can I check whether a string contains any character other than:
Letters (lower-case/upper-case)
digits
Space
Comma(,)
Period (.)
Bracket ( )
&
'
$
Arithmetic operators + (plus), - (minus), * and =
/
using a regular expression in ColdFusion?
I want to make sure the string doesn't contain even a single character other than the ones specified.
You can find if there are any invalid characters like this:
<cfif refind( "[^a-zA-Z0-9 ,.&'$()\-+*=/]" , Input ) >
<!--- invalid character found --->
</cfif>
Where the [...] is a character class (match any single char from within), and the ^ at the start means "NOT" - i.e. if it finds anything that is not an accepted char, it returns true.
I wasn't sure what "Small Bracket (opening closing)" meant; the class above assumes ( ). If you meant {} or < > instead, just swap them in. For [] you need to escape them as \[\].
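Outside ColdFusion, the same negated-class check can be sketched in Python (an assumption for illustration; REFind returns a 1-based position or 0, whereas re.search returns a match object or None, and has_invalid_char is a hypothetical name):

```python
import re

# Any ONE character outside the allowed set makes the input invalid.
INVALID = re.compile(r"[^a-zA-Z0-9 ,.&'$()\-+*=/]")

def has_invalid_char(text: str) -> bool:
    return INVALID.search(text) is not None

print(has_invalid_char("Tom & Jerry's (1+2)*3=9/3, $5."))  # False
print(has_invalid_char("50% off!"))                         # True
```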
Character Class Escaping
Inside a character class, only a handful of characters need escaping with a backslash:
\ - if you want a literal backslash, escape it.
^ - a caret must be escaped if it's the first character; otherwise it negates the class.
- - a dash creates a range. It must be escaped unless it is first or last (though escaping it always is recommended).
[ and ] - both brackets should be escaped.
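A short Python check of those rules (a sketch; Python's re follows the same conventions here): with the backslash and both brackets escaped, a caret placed after the first position, and the dash placed last, the class matches each of those characters literally.

```python
import re

s = "a[b]c\\d^e-f"  # contains [ ] \ ^ - among letters

# \[ \] \\ are escaped; ^ is not first, so it is literal; - is last, so it is literal.
print(re.findall(r"[\[\]\\^-]", s))   # ['[', ']', '\\', '^', '-']
# A leading ^ negates the class instead: everything except those characters.
print(re.findall(r"[^\[\]\\^-]", s))  # ['a', 'b', 'c', 'd', 'e', 'f']
```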
ColdFusion uses Java's engine to parse regular expressions. In any case, to make sure a string contains nothing but allowed characters, try something like:
^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$']).*$
The above expression only works if you are processing the input line by line. If you want to apply it to text that contains multiple lines, you need to use the global and multi-line modifiers and change the expression slightly, like this:
^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$'\r\n]).*$
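A sketch of the multi-line version in Python (assumptions: re.MULTILINE plays the role of the multi-line modifier, re.findall returns every match like the global modifier, and the sample text is made up). It returns only the lines made up entirely of allowed characters:

```python
import re

VALID_LINE = re.compile(r"^(?![a-zA-Z0-9 ,.&$']*[^a-zA-Z0-9 ,.&$'\r\n]).*$", re.MULTILINE)

text = "good line, ok.\nbad line!\nA & B's $5"
print(VALID_LINE.findall(text))  # ['good line, ok.', "A & B's $5"]
```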
The regular expression:
[^][a-zA-Z0-9 ,.&'$]
will match if the string contains any characters other than the ones in your list.

Regular expression to match English words with some other characters

I use this regular expression: ^[a-zA-Z0-9]*$ to match English phrases. However, I want it to also match English phrases that contain some or all of these characters at the beginning, in the middle, or at the end:
? > < ; , { } [ ] - _ + = ! @ # $ % ^ & * | ' and also the space character.
How can I update this regular expression to satisfy this requirement?
Thank you so much in advance.
You could simply add all your desired characters to your character class.
^[a-zA-Z0-9 ?><;,{}[\]\-_+=!@#$%\^&*|']*$
You will need to escape the following characters with a backslash, since they are metacharacters inside character classes: ], -, ^.
Note that your regex will also match empty strings, since it uses the * quantifier. If you only want to match words having at least one character, replace it with the + quantifier.
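A quick test of that character class in Python's re (a sketch with made-up inputs; the [ is escaped as well, which is harmless and safer in Python, and the space the question asks for is included):

```python
import re

# Letters, digits, space and ? > < ; , { } [ ] - _ + = ! @ # $ % ^ & * | '
PHRASE = re.compile(r"^[a-zA-Z0-9 ?><;,{}\[\]\-_+=!@#$%\^&*|']*$")

samples = ["Hello, World! [ok] {1+1=2}", "a_b-c|d'e", "", "café"]
for s in samples:
    print(repr(s), bool(PHRASE.match(s)))  # the empty string matches because of *
```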
You are looking for this pattern.
^[\s\w\d\?><;,\{\}\[\]\-_\+=!@\#\$%^&\*\|\']*$
I gratefully borrow the \s\w\d classes from the previous answer and add the other delimiters and special characters as hexadecimal ASCII ranges (Unicode ranges work as well):
^[\s\w\d\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]*$
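Trying the hex-range version in Python (an assumption for illustration: in Python 3, \w and \s are Unicode-aware by default, so re.ASCII is passed to keep the class to ASCII as the answer intends; without it, a word such as "café" would also match). Note that these ranges also admit characters the question did not list, such as " and ~.

```python
import re

pattern = r"^[\s\w\d\x21-\x2f\x3a-\x40\x5b-\x60\x7b-\x7e]*$"
ascii_only = re.compile(pattern, re.ASCII)
unicode_aware = re.compile(pattern)  # default for str patterns in Python 3

for s in ["a+b=c ~`@", "café"]:
    print(repr(s), bool(ascii_only.match(s)), bool(unicode_aware.match(s)))
```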
For reference, see an ASCII code table and a Unicode character chart.

How do you regex-match some Unicode characters followed by brackets?

I am not too familiar with regex and hope someone could help.
Example:
This is a sentence with some_unicode[some other word] and other stuff.
After removing the characters and brackets, the result should be:
This is a sentence with and other stuff.
Thank you!!
Search for
some_unicode\[[^\]]*\]
and replace with nothing.
Explanation:
\[: Match a literal [.
[: Match a character class with the following properties (here [ is a metacharacter, starting a character class)...
^\]: "any character except a literal ]" (^ at the start of a character class negates its contents).
]*: ...zero or more times. Note again the unescaped ], ending the character class.
\]: Match a literal ].
Of course, this only works if brackets cannot be nested. How to actually write and use the regex depends heavily on the language or tool you're using, so if you add a tag to your question specifying the language, I can give you a code example.
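Since no language was given, here is a sketch in Python (an assumption; "some_unicode" is the question's placeholder and is matched literally here). Removing only the match leaves two spaces behind, so the second substitution, an addition not in the answer, also removes the whitespace right before the match:

```python
import re

sentence = "This is a sentence with some_unicode[some other word] and other stuff."

# The answer's pattern: the placeholder word, then [ ... ] with no ] inside.
plain = re.sub(r"some_unicode\[[^\]]*\]", "", sentence)
# Variant: also remove whitespace immediately before the match.
tidy = re.sub(r"\s*some_unicode\[[^\]]*\]", "", sentence)

print(repr(plain))  # 'This is a sentence with  and other stuff.'
print(repr(tidy))   # 'This is a sentence with and other stuff.'
```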
[ and ] are metacharacters in regular expressions and must be escaped by a backslash, e.g. \[.