using findstr with regex to search through CSV

I was wondering if it's possible to use findstr to search through a CSV for anything matching this regular expression
^([BPXT][0-9]{6})|([a-zA-Z][a-zA-z][0-9][0-9](adm)?)$

I don't know which language you're talking about, but there is one obvious problem with your regex: The ^ and $ anchors require that it matches the entire string, and you seem to be planning on matching individual entries in your CSV file.
Therefore, you should use word boundary anchors instead if your regex engine supports them:
\b(?:([BPXT][0-9]{6})|([a-zA-Z]{2}[0-9]{2}(adm)?))\b
I've also added another non-capturing group around the alternation. In your regex the anchors at the start and end of the string would have been part of the alternation, which is probably not intended. Whether you really need all the other parentheses depends on what you're going to do with the match.
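As a rough illustration of the word-boundary version, here is a minimal Python sketch. The sample line is invented for the example, and Python is only one possible engine; the asker never said which one they are using.

```python
import re

# Pattern from the answer above: word boundaries replace the ^ and $ anchors.
PATTERN = re.compile(r"\b(?:([BPXT][0-9]{6})|([a-zA-Z]{2}[0-9]{2}(adm)?))\b")

# Hypothetical CSV line, made up for illustration only.
line = "B123456,jdoe,ab12adm,xy99,notanid"

# Collect the full text of every match found anywhere in the line.
matches = [m.group(0) for m in PATTERN.finditer(line)]
print(matches)  # ['B123456', 'ab12adm', 'xy99']
```

The commas count as non-word characters, so \b lines up with the field edges here. A field containing spaces or punctuation could still produce partial matches.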

No, it is not possible to use findstr to search for matching substrings, especially those matching the complex expression you've provided.
findstr is a Windows built-in.
findstr /? shows the subset of regex that it can use:
Regular expression quick reference:
  .         Wildcard: any character
  *         Repeat: zero or more occurrences of previous character or class
  ^         Line position: beginning of line
  $         Line position: end of line
  [class]   Character class: any one character in set
  [^class]  Inverse class: any one character not in set
  [x-y]     Range: any characters within the specified range
  \x        Escape: literal use of metacharacter x
  \<xyz     Word position: beginning of word
  xyz\>     Word position: end of word
This means that most of your expression is out the window.
Also, findstr can't limit its output to just the matched expression; it only identifies lines containing matches.
It is entirely unsuitable for the task described.
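If a scripting language is an option, one possible alternative is to parse the CSV and test each field on its own. The sketch below uses Python's csv and re modules; the function name matching_fields and the sample data are assumptions made for this example, not part of the question.

```python
import csv
import io
import re

# The asker's two alternatives. fullmatch() requires the whole field to match,
# so the ^ and $ anchors are not needed.
FIELD = re.compile(r"[BPXT][0-9]{6}|[a-zA-Z]{2}[0-9]{2}(?:adm)?")

def matching_fields(csv_file):
    """Yield (row_number, field) for each field that matches FIELD entirely."""
    for row_number, row in enumerate(csv.reader(csv_file), start=1):
        for field in row:
            value = field.strip()
            if FIELD.fullmatch(value):
                yield row_number, value

# Hypothetical data standing in for the real file.
sample = io.StringIO("B123456,Alice,ab12\nfoo,T000001x,cd34adm\n")
hits = list(matching_fields(sample))
print(hits)  # [(1, 'B123456'), (1, 'ab12'), (2, 'cd34adm')]
```

For a real file, pass an open file object (opened with newline="") instead of the StringIO stand-in.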

Related

Regular expressions: inserting a word and NOT replacing the found key

I have a list of items, such as:
this_thing.ety
other-stuff.ety
34-pairings.ety
I want to do this:
At the beginning of every line, insert "images/"
so the result of search/replace with reg exp would yield:
images/this_thing.ety
images/other-stuff.ety
images/34-pairings.ety
I am using:
^.
as my anchor to find the beginning of each line, but everything I've tried to add "images/" has resulted in actually replacing that first character. I am using Notepad++, but can use anything.
I thought using ${foo} was on the right track but I'm missing something here.
In a regex, ^. matches the beginning of the line plus one character. If you replace that with 'images/', the first character, which was part of the match, gets replaced too. Empty lines won't get 'images/' and stay unchanged (they don't match ^.).
Just use ^ on its own as the regexp for the beginning of the line.
. is the any character symbol, but can only account for one character. You will want to use ^..*$ or ^.+$ if your version of regex allows so that every line that contains at least one character will be fully replaced. With replace, it would look like this
s/^(.+)$/images\/\1/
where the \1 re-inserts the part in parenthesis in the regex. In older versions of regex, try
s/^\(..*\)$/images\/\1/
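Both suggestions (a bare ^ anchor, or capturing the line and re-inserting it) can be tried outside an editor. This is a minimal Python sketch using the three file names from the question; Python's re.sub stands in for the editor's search and replace.

```python
import re

text = "this_thing.ety\nother-stuff.ety\n34-pairings.ety"

# Approach 1: ^ alone is a zero-width match at each line start,
# so nothing from the original line is consumed.
prefixed = re.sub(r"^", "images/", text, flags=re.MULTILINE)

# Approach 2: capture the whole line and put it back after the prefix.
captured = re.sub(r"^(.+)$", r"images/\1", text, flags=re.MULTILINE)

print(prefixed)
```

The two differ on empty lines: the bare ^ prefixes them as well, while ^(.+)$ skips them because .+ needs at least one character.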

Regex to check if first character is "."

How to write regex to match if only first character is . ?
I've been trying this:
hide_file={.*}
But unfortunately, it will find all files that have . in them.
For example:
/home/user
.bashrc
.bash_history
some_text.csv
foo.json
In this example I would like this regex to affect only first two files.
P.S
That's the requirement:
Supported regex syntax is any number of *, ? and unnested {,} operators. Regex matching is only supported on the last component of a path, e.g. a/b/? is supported but a/?/c is not. Example: deny_file={*.mp3,*.mov,.private}
Simply use
^\s*?\..*$
See http://regex101.com/r/oW1xP3 for a live demo
If you are sure there are no whitespaces in front of your input remove the \s*?
The trick is to anchor ^ the regex to the beginning of the string.
^\. will match any string that begins with a period. Note: you will need to escape this regex appropriately for your programming language.
hide_file={^\.}
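To see what the anchored pattern does on the example names, here is a small Python sketch. It only demonstrates ordinary regex behaviour; whether the hide_file option itself accepts ^ is not something this example verifies, since the quoted requirement describes a more limited wildcard syntax.

```python
import re

# File names taken from the question.
names = ["/home/user", ".bashrc", ".bash_history", "some_text.csv", "foo.json"]

# ^ anchors the match to the start of the string; \. is a literal period.
LEADING_DOT = re.compile(r"^\.")

hidden = [name for name in names if LEADING_DOT.search(name)]
print(hidden)  # ['.bashrc', '.bash_history']
```

In plain Python, name.startswith(".") gives the same result without a regex.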

Notepad++ Replace all with an exception

I am attempting to edit a csv file, below is a sample line from this file.
|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
The beginning of the line |MIGRATE| needs to be modified without changing the second MIGRATE so the line would read
|MIGRATE|;|MIG_IN|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
There are 7700 or so lines so if I am forced to do this manually I will probably cry a little.
Thanks in advance!
Temporarily replace the occurrences you don't want changed with another word, replace the rest with what you want, then change the temporary word back. I'm not sure exactly what you're asking, but from what I can guess this might help.
It seems like you could just search for:
^\|MIGRATE\|
And replace with:
|MIGRATE|;|MIG_IN|
Make sure you've checked 'Regular expression' in the 'Search Mode' options.
Explanation: The ^ is a begin anchor; it will match the beginning of the line, ensuring that it does not match the second |MIGRATE|. The \ characters are required to escape the | characters since they normally have special meaning in regular expressions, and you want to match a literal |.
You can use beginning of line anchors:
Find:
^(\|MIGRATE\|)
Replace with:
$1;|MIG_IN|
regex101 demo
Just make sure that you are using the regular expression mode of the Search&Replace.
If you want to be a bit fancier, you can use a positive lookbehind:
Find:
(?<=^\|MIGRATE\|)
Replace with:
;|MIG_IN|
^ Will match only at the beginning of a line.
( ... ) is called a capture group, and will save the contents of the match in a variable you can use (in the first regex, I accessed the variable using $1 in the replace. The first capture gets stored to $1, the second to $2, etc.)
| is a special character meaning 'or' in regex (it matches one character or group of characters or another, e.g. a|b matches a or b). As such, you need to escape it with a backslash to make a regex match a literal |.
In my second regex, I used (?<= ... ) which is called a positive lookbehind. It makes sure that the part to be matched has what's inside before it. For instance, (?<=a)b matches a b only if it has an a before it, so the b in ab matches but the second b in bb does not.
The website I linked also explains the details of the regex and you can try out some regex yourself!
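For anyone who would rather script the change than run it in Notepad++, both variants can be sketched in Python. The sample line is the one from the question; note that Python writes the group reference as \1 where Notepad++ uses $1.

```python
import re

line = "|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U"

# Variant 1: capture the leading |MIGRATE| and re-insert it before the new field.
with_group = re.sub(r"^(\|MIGRATE\|)", r"\1;|MIG_IN|", line, flags=re.MULTILINE)

# Variant 2: zero-width lookbehind; only the position right after the
# leading |MIGRATE| matches, so the replacement is a pure insertion.
with_lookbehind = re.sub(r"(?<=^\|MIGRATE\|)", ";|MIG_IN|", line, flags=re.MULTILINE)

print(with_group)
```

With re.MULTILINE the same call works on the whole file contents at once, since ^ then matches at the start of every line.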

Regular expression to replace a string, but not values assigned to a Scalar

Please do pardon me if my question sounds a bit awkward. I am looking for a regex which will replace line numbers in perl source file without affecting values assigned to scalars.
I think below will make my question a little bit clearer. Say I have a perl source which looks like this:
1. $foo = 2.4;
2. print $foo;
I would like a regular expression to replace those line numbers (1. 2. etc..) without affecting value assigned to scalars, and so in this case $foo.
Thanks
Anchor your regexp to the start of the line.
To remove the numbers:
perl -p -i.bak -e's{^\d+\. }{}' myperl
Within a perl regex you can use the caret symbol ^ to represent the start of a line. $ represents the end of a line. These are known as anchors.
So to find a number \d at the beginning of a line (only) you can search for
/^\d+/
If you wanted to remove those numbers you can "replace" them with nothing, as in
s/^\d+//g
You also want to include the dot after the number, so you might try
/^\d+./
But in regex a dot represents "any character" so you will need to escape the dot to have it interpreted literally
/^\d+\./
The caret symbol ^ also serves double duty in character sets, where it negates them. I only mention this because it is a common source of confusion when learning regex.
/[^\d]/ # Match characters that are not digits
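The anchored substitution can also be checked quickly in Python. This is a minimal sketch using the two sample lines from the question; it mirrors the Perl one-liner above but does not edit a file in place.

```python
import re

source = "1. $foo = 2.4;\n2. print $foo;"

# ^ limits the match to the start of each line, \d+ is the line number,
# and \. is a literal dot followed by one space.
stripped = re.sub(r"^\d+\. ", "", source, flags=re.MULTILINE)

print(stripped)
```

The value 2.4 survives because it does not sit at the start of a line, so the anchored pattern never reaches it.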

what can be the regex for the following string

I am doing this in groovy.
Input:
hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig
hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging
I want to get the last column. basically anything that starts with abc_.
I tried the following regex (works for the second line but not the first):
\abc_.*\
but that gives me everything after abc_batch
I am looking for a regex that will fetch me anything that starts with abc_
but I cannot use \^abc_.*\ since the whole string does not start with abc_
It sounds like you're looking for "words" (i.e., sequences that don't include spaces) that begin with abc_. You might try:
/\babc_.*\b/
The \b means (in some regular expression flavors) "word boundary."
Try this:
/\s(abc_.*)$/m
Here is a commented version so you can understand how it works:
\s # match one whitespace character
(abc_.*) # capture a string that starts with "abc_" and is followed
# by any character zero or more times
$ # match the end of the string
Since the regular expression has the "m" switch it will be a multi-line expression. This allows the $ to match the end of each line rather than the end of the entire string itself.
You don't need to trim the whitespace as the first (and only) capture group contains just the text. After a cursory scan of this tutorial I believe this is the way to grab the value of a capture group using Groovy:
// Groovy has no trailing /m switch; the inline (?m) flag enables multi-line mode
matcher = (yourString =~ /(?m)\s(abc_.*)$/)
// matcher[0] is the first match; index 1 within it is the first capture group
matcher[0][1]
I think you are looking for this: \s(abc_[a-zA-Z_]*)$
If you are using perl and you read all lines into one string, don't forget to set the m option on your regex (that stands for "Treat string as multiple lines").
Oh, and Regex Coach is your free friend.
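For comparison, the same idea can be sketched in Python with the two input lines from the question. One deliberate change from the answers above: \S* (non-whitespace) is used instead of .* so the capture cannot run across spaces. That swap is a choice made for this example, not something the answers specify.

```python
import re

text = (
    "hip_abc_batch hip_ndnh_4_abc_copy_from_stgig abc_copy_from_stgig\n"
    "hiv_daiv_batch hip_a_de_copy_from_staging abc_a_de_copy_from_staging"
)

# \s requires whitespace before abc_, so abc_ inside hip_abc_batch is skipped;
# with re.MULTILINE, $ matches at the end of every line.
last_columns = re.findall(r"\s(abc_\S*)$", text, flags=re.MULTILINE)

print(last_columns)  # ['abc_copy_from_stgig', 'abc_a_de_copy_from_staging']
```

Because the pattern has exactly one capture group, findall returns the group contents directly rather than the full match with its leading whitespace.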