Regular Expression in Vim that will count 1 or colon - regex

I am rather new to using Vim with regular expressions and I need to count specific entries in a .csv file. The entries are in this form:
9,1,8-Mar-11,high,A2,mid,500,1000,0.143494345,0.153521446,1121.386992,409.6833333,,
9,2,8-Mar-11,high,A2,mid,500,1000,0.180015537,0.256840072,1190.977918,420.8229933,1,
9,3,8-Mar-11,high,A2,mid,500,1000,0.250273568,0.16378268,1061.417761,419.1692065,,1
I need to count the number of 8-Mar-11, A2 conditions which have either ,, or ,1 (,|1) or (,|1),1 on the end of the lines.
Here is the regular expression that I use in Vim to get some count data:
:%s/.*8-Mar-11.*A2.*,,1$//gn
What I would like to know is whether there is a way to express "either/or" in Vim, like:
:%s/.*8-Mar-11.*A2.*,1\(,|1\)//gn
Any advice or help is greatly appreciated!

Vim regexes are weird and confusing. You have to escape the | as \| for "or", so the group becomes \(,\|1\).

When you're composing a regular expression which is searching for one character or another, alternation using pipes is overkill. Use a character class instead:
:%s/.*8-Mar-11.*A2.*,1[,1]//gn
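As a rough cross-check outside Vim, the equivalence between the alternation and the character class can be sketched in Python. This is only an illustration: the sample rows are the three from the question, the trailing $ anchor is an assumption added so that only the end of each line is tested, and Python writes the group and pipe unescaped, unlike Vim's default syntax.

```python
import re

# The three sample rows from the question.
rows = [
    "9,1,8-Mar-11,high,A2,mid,500,1000,0.143494345,0.153521446,1121.386992,409.6833333,,",
    "9,2,8-Mar-11,high,A2,mid,500,1000,0.180015537,0.256840072,1190.977918,420.8229933,1,",
    "9,3,8-Mar-11,high,A2,mid,500,1000,0.250273568,0.16378268,1061.417761,419.1692065,,1",
]

# Same condition written two ways: alternation vs. character class.
# The "$" anchor is an assumption so that only the line ending is tested.
ALTERNATION = re.compile(r"8-Mar-11.*A2.*,1(?:,|1)$")
CHAR_CLASS = re.compile(r"8-Mar-11.*A2.*,1[,1]$")


def count_matching(pattern, lines):
    """Count the lines in which the pattern is found."""
    return sum(1 for line in lines if pattern.search(line))


alt_count = count_matching(ALTERNATION, rows)
class_count = count_matching(CHAR_CLASS, rows)
print(alt_count, class_count)
```

Both patterns select the same lines; the character class is simply the shorter way to say "one of these single characters".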

Related

Postgres regex and special characters

I am using PostgreSQL, and I have encountered an issue with regular expressions and special characters.
select regexp_replace('asdf|asdf','|','.');
This function returns:
.asdf|asdf
Desired output:
asdf.asdf
How can I solve it? Please help :)
| is a special character in regex syntax; it denotes alternation and means "or".
Unescaped, your regex matches the empty string at the beginning of your string.
Try escaping it:
select regexp_replace('asdf|asdf','\|','.');
As @pozs pointed out, a simple replace is much better suited to this particular task:
select replace('asdf|asdf','|','.');
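The same effect can be reproduced outside the database. The sketch below uses Python's re module purely as an illustration (it is not PostgreSQL's regex engine); count=1 is used to mimic regexp_replace replacing only the first match when no 'g' flag is given.

```python
import re

text = "asdf|asdf"

# Unescaped pipe: an alternation of two empty branches, which matches the
# empty string at position 0, so the dot is inserted at the very start.
unescaped = re.sub(r"|", ".", text, count=1)

# Escaped pipe: matches the literal "|" character.
escaped = re.sub(r"\|", ".", text, count=1)

# Plain string replacement needs no regex at all.
plain = text.replace("|", ".")

print(unescaped)  # .asdf|asdf
print(escaped)    # asdf.asdf
print(plain)      # asdf.asdf
```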

Find and trim part of what is found using regular expression

I'm a newbie at writing regular expressions.
I have a file name like this: TST0101201304-123.txt, and my goal is to get the numbers between '-' and '.txt'.
So I wrote this pattern: -([0-9]*)\.txt. It finds the numbers that I want, but the match also includes the hyphen '-' and the trailing '.txt', so the result for the example above is '-123.txt'.
So my question is:
Is there a way in regular expressions to get only part of the matched string, like a submatch of the match without the need to trim it in my shell script code for unix?
I found this answer but it is getting the same result:
Regexp: Trim parts of a string and return what ever is left
Tip: To test my regular expressions I used this website
You can use lookbehind and lookahead
(?<=-)[0-9]*(?=[.]txt)
Don't know if it would work in unix
Regex engines differ. Since you're using expr match, you need to make two changes:
expr match anchors the regex at the start of the string; so, you need to add .* at the beginning of yours, to cover everything before the hyphen.
expr match uses POSIX Basic Regular Expressions (BREs), which use \( and \) for grouping (and capturing) rather than merely ( and ).
But, conveniently, when you give expr match a regex that contains a capture-group, its output is the content of that capture-group; you don't need to do anything else special. So:
$ expr match TST0101201304-123.txt '.*-\([0-9]*\)\.txt'
123
sed is your friend.
echo TST0101201304-123.txt | sed -e 's/.*-\([0-9]*\)\.txt$/\1/'
should get you what you want.
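The general idea in these answers, matching the surrounding text but keeping only a capture group, can be sketched in Python as well. This is an illustration only; the question itself is about shell tools, and the variable names here are made up for the example.

```python
import re

filename = "TST0101201304-123.txt"

# The parentheses capture only the digits; the hyphen and ".txt" are
# matched but stay outside the group.
match = re.search(r"-([0-9]+)\.txt$", filename)

whole = match.group(0) if match else None   # the entire match
number = match.group(1) if match else None  # only the captured digits

print(whole)   # -123.txt
print(number)  # 123
```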

. (dot) Regular expression

I would like to add delimiters to a .txt file.
Each line has the same number of characters, and I know where the splits should happen.
For example,
MyNameIsHarry
I would like to transform the file to look like this instead:
My|Name|Is|Harry
I am using Notepad++ with regular expressions, and I can do this:
(..)(....)(..)(.....)
Replace with
\1|\2|\3|\4
Is there a more efficient way I can write this regular expression? Would I have to use 100 "." (dots) if there was a split of 100 characters?
Many thanks for your help!
http://www.regular-expressions.info/reference.html at your service!
You can use (.{100}) if you expect exactly 100.
As stated in the reference:
{n} where n is an integer >= 1
Repeats the previous item exactly n times.
Example: a{3} matches aaa
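To illustrate the {n} quantifier with the example from the question, here is a minimal Python sketch that builds the pattern from a list of field widths. The function name split_fixed_width and its parameters are hypothetical, and Python is used only for illustration; in Notepad++ the equivalent pattern would be typed by hand.

```python
import re


def split_fixed_width(line, widths, sep="|"):
    """Insert sep between fixed-width fields using {n} quantifiers."""
    # One capture group per field, e.g. (.{2})(.{4})(.{2})(.{5})
    pattern = "".join(f"(.{{{w}}})" for w in widths)
    # \g<1>|\g<2>|... refers back to each captured field.
    replacement = sep.join(f"\\g<{i}>" for i in range(1, len(widths) + 1))
    return re.sub(pattern, replacement, line)


result = split_fixed_width("MyNameIsHarry", [2, 4, 2, 5])
print(result)  # My|Name|Is|Harry
```

A field of 100 characters is then just another width in the list, written in the pattern as (.{100}).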
If the text is all in the same format as your example you could just use:
Find what : ([a-z])([A-Z])
Replace with : \1|\2
Make sure Match case and Regular expression are checked
Replace All

Regular Expression Time Format?

I have a regular expression which accept time in a specific format like the following,
"10:00".
I want to change the regular expression to accept either this format or a single dash ("-") on its own.
Here is the expression:
/^((\d)|(0\d)|(1\d)|(2[0-3]))\:((\d)|([0-5]\d))$/
Key points to solving this:
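Square brackets ([ and ]) are used to enclose character classes.
The pipe | means or, but only outside a character class.
[\:|-] means check for a literal :, a literal | or a hyphen -; inside the brackets the pipe is an ordinary character, so [:-] is enough to match either a colon or a hyphen.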
The resulting pattern is:
^((\d)|(0\d)|(1\d)|(2[0-3]))[\:|-]((\d)|([0-5]\d))$
Just use an alternative:
^<your regex>$|^-$
This will match either a time in your format or a single hyphen-minus.
Does this regex need to have so many brackets?
/^(([01]?\d|2[0-3]):[0-5]\d|-)$/
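A quick way to check a pattern like this is to run it against a few inputs. The sketch below does so in Python, as an illustration only: the ^ and $ anchors are replaced by fullmatch, the sample values are made up, and the helper name is_time_or_dash is hypothetical.

```python
import re

# Same alternatives as the pattern above: a time of day, or a single dash.
TIME_OR_DASH = re.compile(r"(?:(?:[01]?\d|2[0-3]):[0-5]\d|-)")


def is_time_or_dash(value):
    """Return True if value is a time like 10:00 or exactly one dash."""
    return TIME_OR_DASH.fullmatch(value) is not None


for sample in ["10:00", "9:05", "23:59", "-", "24:00", "10-00", "--"]:
    print(sample, is_time_or_dash(sample))
```

The first four samples are accepted and the last three are rejected.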

Regular expression to extract all words starting with colon

I would like to use a regular expression to extract "bind variable" parameters from a string that contains a SQL statement. In Oracle, the parameters are prefixed with a colon.
For example, like this:
SELECT * FROM employee WHERE name = :variable1 OR empno = :variable2
Can I use a regular expression to extract "variable1" and "variable2" from the string? That is, get all words that start with colon and end with space, comma, or the end of the string.
(I don't care if I get the same name multiple times if the same variable has been used several times in the SQL statement; I can sort that out later.)
This might work:
:\w+
This just means "a colon, followed by one or more word-class characters".
This assumes your regular expression engine supports the \w word-class shorthand.
Of course, this only matches a single such reference. To get both, and skip the noise, something like this should work:
(:\w+).+(:\w+)
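Where the regex library offers a "find all matches" operation, the single pattern :\w+ is enough to collect every bind variable. A minimal Python sketch, using the SQL statement from the question and a capture group to drop the leading colon (Python is an assumption here; the question does not name a language):

```python
import re

sql = "SELECT * FROM employee WHERE name = :variable1 OR empno = :variable2"

# findall returns the captured group for every non-overlapping match,
# so the colon is matched but not included in the results.
names = re.findall(r":(\w+)", sql)

print(names)  # ['variable1', 'variable2']
```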
For being able to handle such an easy case by yourself you should have a look at regex quickstart.
For the meantime use:
:\w+
If your regex parser supports word boundaries,
:[a-zA-Z_0-9]+\b
Try the following:
sed -e 's/[ ,]/\n/g' yourFile.sql | grep '^:.*$' | sort | uniq
assuming your SQL is in a file called "yourFile.sql" and that your sed (for example GNU sed) accepts \n in the replacement.
This should give a list of variables with no duplicates.