Similar questions have been asked, but they miss one thing I need to do, and I can't figure it out.
I need to find all files that do NOT have a tif or tiff extension, but I DO need to find all others, including those that have no extension. I got the first part working with the regex below, but it doesn't match files with no extension.
^(.+)\.(?!tif$|tiff$).+$
That works great, but I need the following to work.
filename.ext MATCH
filename.abc MATCH
filename.tif FAIL
filename MATCH
Thanks :)
If you're not working with JS/ECMAscript regex, you can use:
^.*(?<!\.tif)(?<!\.tiff)$
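A quick way to check this pattern against the filenames from the question is a short Python sketch. Python's re module supports fixed-width lookbehind, so the pattern can be used unchanged; the helper name is_not_tiff is only illustrative.

```python
import re

# Two fixed-width negative lookbehinds evaluated at the end of the string:
# the name must not end in ".tif" and must not end in ".tiff".
NOT_TIFF = re.compile(r"^.*(?<!\.tif)(?<!\.tiff)$")


def is_not_tiff(name: str) -> bool:
    """Return True when the name does not end in .tif or .tiff."""
    return NOT_TIFF.match(name) is not None


for name in ["filename.ext", "filename.abc", "filename.tif", "filename.tiff", "filename"]:
    print(name, "MATCH" if is_not_tiff(name) else "FAIL")
```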
Rather than writing a negative regex, consider using the simpler, positive regex, but taking action when something does not match. This is often a superior approach.
It can't be used in every situation (e.g. if you are using a command line tool that requires you to specify what does match), but I would do this where possible.
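As a rough illustration of that approach, the sketch below uses a simple positive pattern for "ends in .tif or .tiff" and keeps whatever does not match. The list of names is made up from the question's examples.

```python
import re

# Positive pattern: the name DOES end in .tif or .tiff.
TIFF = re.compile(r"\.tiff?$")

# Sample names taken from the question (illustrative only).
names = ["filename.ext", "filename.abc", "filename.tif", "filename"]

# Act on the names the positive pattern does NOT match.
non_tiff = [n for n in names if not TIFF.search(n)]
print(non_tiff)
```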
This works for me:
^(?:(.+\.)((?!tif$|tiff$)[^.]*)|[^.]+)$
That regex is split in two different parts:
Part 1: (.+\.)((?!tif$|tiff$)[^.]*)
(.+\.) (1st capturing group) Match a filename (potentially containing dots), up to and including the last dot of the string (the dot preceding the extension).
((?!tif$|tiff$)[^.]*) (2nd capturing group) Then check if the dot is not followed by exactly "tif" or "tiff" and if so match the extension.
Part 2: [^.]+ If part 1 didn't match, check if you have just a filename containing no dot.
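To see what the two capturing groups hold, here is a small Python sketch that uses the full pattern exactly as written above. The function name split_name is hypothetical; when only part 2 matches, both groups are None.

```python
import re

PATTERN = re.compile(r"^(?:(.+\.)((?!tif$|tiff$)[^.]*)|[^.]+)$")


def split_name(name):
    """Return (group 1, group 2) on a match, or None when the name is rejected."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return m.group(1), m.group(2)


for name in ["filename.ext", "archive.tar.gz", "filename.tif", "filename"]:
    print(name, "->", split_name(name))
```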
If you have the names in a text file (one per line):
perl -lne '/\.tiff?$/ || print' file
If you have some files in a directory:
ls | perl -lne '/\.tiff?$/ || print'
Here's what I came up with:
^[^\.\s]+(\.|\s)(?!tiff?)
Explanation:
Beginning of line to dot or whitespace, put your matching group around this, ie:
^(?<result>[^\.\s]+)
It will then look for a dot or a whitespace, with a negative lookahead on the tiff (tiff? will match to both tif and tiff).
This makes the assumption that there will always be a dot or a whitespace after the filename. You can change this to be an end of line if that is what you need:
^[^\.\s]+(\.(?!tiff?)|\n) linux
^[^\.\s]+(\.(?!tiff?)|\r\n) windows
I'm trying to find and replace some function calls in a Python program. The idea is to add a boolean parameter to each call found in the project.
I looked for solutions on the internet because I don't know regex at all... It seems like a basic exercise for regex people, but still.
In my case I have this call in a lot of files :
myFunction("test")
My goal is to find this call and replace it with:
myFunction("test", false)
Could you help me write the regex ?
Try this command:
sed -re 's/(myFunction)[[:space:]]*\([[:space:]]*("test")[[:space:]]*\)/\1(\2, false)/' SOURCE_FILENAME
If you prefer to replace the existing source file with an updated one, then write -i SOURCE_FILENAME instead of SOURCE_FILENAME.
This works by defining a pattern to match the function call you would like to update:
myFunction (obviously) matches the text myFunction;
[[:space:]] matches any whitespace character, mainly spaces and tabs.
[[:space:]]* matches zero or more whitespace characters.
\( and \) match literal parentheses in your program text;
( and ) are regex metacharacters that match nothing, but ("test") matches "test" and captures the matched text for later use.
Note that this pattern captures two things using ( and ). The ("test") is the second of these.
Now let us examine the overall structure of the Sed command 's/.../.../'. The s means "substitute," so 's/.../.../' is Sed's substitution command.
Between the first and second slashes comes the pattern we have just discussed. Between the second and third slashes comes the replacement text Sed uses to replace the matched part of any line of your program text that matches the pattern. Within the replacement text, the \1 and \2 are backreferences that place the text earlier captured using ( and ).
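The same capture-and-backreference idea can be sketched in Python. This is only an approximation of the sed command: Python's re module has no [[:space:]] class, so \s stands in for it, and re.sub replaces every occurrence rather than only the first one on each line. The function name add_false_argument is hypothetical.

```python
import re

# Group 1 is the function name, group 2 is the quoted argument;
# optional whitespace is allowed around the literal parentheses.
CALL = re.compile(r'(myFunction)\s*\(\s*("test")\s*\)')


def add_false_argument(source: str) -> str:
    """Rewrite myFunction("test") as myFunction("test", false)."""
    # \1 and \2 put the captured text back into the replacement.
    return CALL.sub(r"\1(\2, false)", source)


print(add_false_argument('x = myFunction("test")'))
print(add_false_argument('y = myFunction ( "test" )'))
```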
So, there it is. Not only have I helped you to write the regex but have shown you how the regex works so that, next time, you can write your own.
Refer to this:
import re
# Replace all whitespace characters with the digit "9":
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)
You could use this regex to match and capture:
(myFunction\("test")(\))
Then use the replacement string below:
$1, false$2
I'm using PowerShell to query for a service path from which results should resemble C:\directory\sub-directory\service.exe
Some results however also include characters after the .exe file extension, for example output may resemble one of the following:
C:\directory\sub-directory\service.exe ThisTextNeedsRemoving
C:\directory\sub-directory\service.exe -ThisTextNeedsRemoving
C:\directory\sub-directory\service.exe /ThisTextNeedsRemoving
i.e. ThisTextNeedsRemoving may be preceded by a space, hyphen or forward slash.
I can use the regex -replace '($*.exe).*' to remove everything after and including the .exe file extension, but how do I keep the .exe in the results?
You can use a look-around:
$txt = 'C:\directory\sub-directory\service.exe /ThisTextNeedsRemoving'
$txt -replace '(?<=\.exe).+', ''
This uses a look-behind which is a zero-width match so it doesn't get replaced.
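For comparison, the same look-behind works in Python's re module. Note one difference this sketch does not paper over: PowerShell's -replace is case-insensitive by default, while Python's re.sub is case-sensitive unless re.IGNORECASE is passed.

```python
import re

txt = r"C:\directory\sub-directory\service.exe /ThisTextNeedsRemoving"

# The look-behind asserts that ".exe" sits just before the match
# without consuming it, so only the trailing text is removed.
cleaned = re.sub(r"(?<=\.exe).+", "", txt)
print(cleaned)
```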
Using a lookbehind is possible, but note that lookbehinds are only necessary when you need to specify a rather complex condition or to obtain overlapping matches. In most cases, when you can do without a lookbehind, you should consider a non-lookbehind solution, because a lookbehind is a rather costly operation. It is easier to check once whether the current character is not whitespace than to also check whether each of these characters is preceded by something else, be it a single character, a whole substring, or a more complex pattern.
Thus, I'd suggest using a solution based on capturing mechanism, with a backreference in the replacement part to restore the captured substring in the result:
$s -replace '^(\S+\.exe) .*','$1'
or - for paths containing spaces and not inside double quotes:
$s -replace '^(.*?\.exe) .*','$1'
Explanation:
^ - start of string
(\S+\.exe) - one or more characters other than whitespace (\S+) (or, with .*?, any characters other than a newline, as few as possible) followed by a literal . and exe
 .* - a space and then any number of characters other than a newline.
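A Python sketch of the two capture-based variants follows. Python writes the backreference as \1 where PowerShell uses $1. The function names and the "Program Files" path are made up for illustration.

```python
import re

# Variant for paths without spaces: \S+ cannot cross whitespace.
NO_SPACES = re.compile(r"^(\S+\.exe) .*")
# Variant for paths that may contain spaces: lazy .*? stops at the first ".exe ".
WITH_SPACES = re.compile(r"^(.*?\.exe) .*")


def strip_args(path: str) -> str:
    """Keep everything up to and including .exe (paths without spaces)."""
    return NO_SPACES.sub(r"\1", path)


def strip_args_spaced(path: str) -> str:
    """Same idea, but tolerates spaces inside the path."""
    return WITH_SPACES.sub(r"\1", path)


print(strip_args(r"C:\directory\sub-directory\service.exe -ThisTextNeedsRemoving"))
print(strip_args_spaced(r"C:\Program Files\app\service.exe /ThisTextNeedsRemoving"))
```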
How to write regex to match if only first character is . ?
I've been trying this:
hide_file={.*}
But unfortunately, it matches all files that have a . in them.
For example:
/home/user
.bashrc
.bash_history
some_text.csv
foo.json
In this example I would like this regex to affect only first two files.
P.S
That's the requirement:
Supported regex syntax is any number of *, ? and unnested {,} operators. Regex matching is only supported on the last component of a path, e.g. a/b/? is supported but a/?/c is not. Example: deny_file={*.mp3,*.mov,.private}
Simply use
^\s*?\..*$
See http://regex101.com/r/oW1xP3 for a live demo
If you are sure there are no whitespaces in front of your input remove the \s*?
The trick is to anchor ^ the regex to the beginning of the string.
^\. will match any string that begins with a period. Note: you will need to escape this regex appropriately for your programming language.
hide_file={^\.}
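The effect of the anchor can be checked with a short Python sketch over the file names from the question. This only illustrates the regex itself; whether the hide_file option accepts ^ anchors is a separate matter that this example does not confirm.

```python
import re

# File names from the question.
names = [".bashrc", ".bash_history", "some_text.csv", "foo.json"]

# ^ pins the match to the start, so only names whose FIRST character is a dot match.
leading_dot = re.compile(r"^\.")

hidden = [n for n in names if leading_dot.search(n)]
print(hidden)
```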
I am attempting to edit a csv file, below is a sample line from this file.
|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
The beginning of the line |MIGRATE| needs to be modified without changing the second MIGRATE so the line would read
|MIGRATE|;|MIG_IN|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U
There are 7700 or so lines so if I am forced to do this manually I will probably cry a little.
Thanks in advance!
Just temporarily replace all the ones you don't want changed with another word, then replace the rest with what you want. I'm not sure what you're asking here, but from what I can guess this might help.
It seems like you could just search for:
^\|MIGRATE\|
And replace with:
|MIGRATE|;|MIG_IN|
Make sure you've checked 'Regular expression' in the 'Search Mode' options.
Explanation: The ^ is a begin anchor; it will match the beginning of the line, ensuring that it does not match the second |MIGRATE|. The \ characters are required to escape the | characters since they normally have special meaning in regular expressions, and you want to match a literal |.
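Outside an editor, the same anchored replacement can be sketched in Python. The re.MULTILINE flag makes ^ match at the start of every line, which would matter when the whole file is processed as one string; here it is applied to the single sample line from the question.

```python
import re

line = "|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U"

# ^ restricts the match to the start of the line, so the second |MIGRATE| is untouched.
updated = re.sub(r"^\|MIGRATE\|", "|MIGRATE|;|MIG_IN|", line, flags=re.MULTILINE)
print(updated)
```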
You can use beginning of line anchors:
Find:
^(\|MIGRATE\|)
Replace with:
$1;|MIG_IN|
regex101 demo
Just make sure that you are using the regular expression mode of the Search&Replace.
If you want to be a bit fancier, you can use a positive lookbehind:
Find:
(?<=^\|MIGRATE\|)
Replace with:
;|MIG_IN|
^ Will match only at the beginning of a line.
( ... ) is called a capture group, and will save the contents of the match in a variable you can use (in the first regex, I accessed the variable using $1 in the replace. The first capture gets stored to $1, the second to $2, etc.)
| is a special character meaning 'or' in regex (to match one character or group of characters or another, e.g. a|b matches a or b). As such, you need to escape it with a backslash to make a regex match a literal |.
In my second regex, I used (?<= ... ) which is called a positive lookbehind. It makes sure that the part to be matched has what's inside before it. For instance, (?<=a)b matches a b only if it has an a before it, so the b in ab matches but neither b in bb does.
The website I linked also explains the details of the regex and you can try out some regex yourself!
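Both variants can also be tried in Python, as a sketch rather than a description of any particular editor: Python writes the backreference as \1 instead of $1, and its lookbehind must be fixed-width, which this one is.

```python
import re

line = "|MIGRATE|;|10000|;|2ACC0003|;|30/09/13|;|Positive Adjmt.|;||;|MIGRATE|;|95004U"

# Variant 1: capture the leading |MIGRATE| and put it back with \1.
with_group = re.sub(r"^(\|MIGRATE\|)", r"\1;|MIG_IN|", line)

# Variant 2: zero-width lookbehind; nothing is consumed, text is only inserted.
with_lookbehind = re.sub(r"(?<=^\|MIGRATE\|)", ";|MIG_IN|", line)

print(with_group == with_lookbehind)

# The small lookbehind example from the explanation above.
print(re.findall(r"(?<=a)b", "ab"), re.findall(r"(?<=a)b", "bb"))
```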
grep "http:\/\/.*\.jpg" index.html -o
Gives me text starting with http:// and ending with .jpg
So does: grep "http:\/\/.*\.\(jpg\)" index.html -o
What is the difference? And is there any condition where this might fail?
I got it to match either jpg,png or gif using this regex:
http:\/\/.*\.\(jpg\|png\|gif\)
Something to do with backreference or regex grouping that I read. Cannot understand this part \(\)
Grouping is used for two purposes in regular expressions.
One use is to delimit parts of the regexp when using alternatives. That's the case in your third regexp: it allows you to say that the extension can be any of jpg, png, or gif.
The other use is for backreferences. This allows you to refer to the text that matched an earlier part of the regexp later in the regexp. For instance, the following regexp matches any letter that appears twice in a row:
\([a-z]\)\1
The backreference \1 means "match whatever matched the first group in the regexp".
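The backreference can be tried out in Python, where groups are written with plain ( and ) rather than the backslashed form grep uses here. The sample sentence is made up.

```python
import re

# ([a-z]) captures one lowercase letter; \1 requires the same letter again.
doubled = re.compile(r"([a-z])\1")

pairs = [m.group(0) for m in doubled.finditer("a letter in a balloon")]
print(pairs)
```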
\( and \) are metacharacters here, i.e. they don't match themselves, but mean something to grep.
From here:
Grouping is performed with backslashes followed by parentheses ‘(’, ‘)’.
So in the above, \( and \) define a group of alternatives to match, separated by \| (i.e. your filename extensions).
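On the question of when the pattern might fail, one case worth checking is a line with more than one URL: .* is greedy, so a single match can run from the first http:// to the last extension on the line. The sketch below shows this in Python syntax (plain parentheses, with (?:...) for a non-capturing group); the example.com URLs are invented, and the lazy .*? shown is a Python feature, not something basic grep syntax provides.

```python
import re

# Made-up HTML line containing two image URLs.
html = '<img src="http://example.com/a.jpg"> <img src="http://example.com/b.png">'

# Greedy: .* runs as far as possible, swallowing both URLs in one match.
greedy = re.findall(r"http://.*\.(?:jpg|png|gif)", html)

# Lazy: .*? stops at the first extension it can, giving one match per URL.
lazy = re.findall(r"http://.*?\.(?:jpg|png|gif)", html)

print(len(greedy), greedy)
print(len(lazy), lazy)
```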