Match the first space on a line using Sublime Text and regular expressions

Regular expressions have always been tough for me. I'm getting frustrated trying to find a regular expression that will select the first whitespace on a line, so that I can use Sublime Text to replace it with a /.
If you could give a quick explanation, that would help too.

In the spirit of @edi's answer, but with some explanation of what's happening. Match the beginning of the line with ^, then look for a sequence of characters that are not whitespace with [^\s]* or \S* (the former may work in more editors, libraries, etc. than the latter), then find the first whitespace character with \s. Putting these together, you have
^[^\s]*\s
You may want to group the non-whitespace and whitespace parts, so you can do the replacement you're talking about:
^([^\s]*)(\s)
Then the replacement pattern is just \1/
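For anyone scripting the same replacement outside an editor, here is a minimal JavaScript sketch of the grouped pattern. The helper name replaceFirstSpace and the sample input are made up for illustration, and JavaScript writes the backreference as $1 where the answer above uses \1. The text is processed line by line, because \s also matches a line break and a multi-line pattern could otherwise replace the newline of a line that contains no space.

```javascript
// Replace the first whitespace character on each line with "/".
// `replaceFirstSpace` is a hypothetical helper name, not an editor API.
function replaceFirstSpace(text) {
  return text
    .split('\n')
    // ^        start of the line
    // ([^\s]*) group 1: the leading run of non-whitespace characters
    // \s       the first whitespace character on the line
    .map((line) => line.replace(/^([^\s]*)\s/, '$1/'))
    .join('\n');
}

const sample = 'foo bar baz\nqux quux';
console.log(replaceFirstSpace(sample));
```

Lines that contain no whitespace are left unchanged, because the pattern does not match them.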

You can use this regex.
^([^\s]*)\s

Related

What is the difference between `(\S.*\S)` and `^\s*(.*)\s*$` in regex?

I'm doing the RegexOne regex tutorial and it has a question about writing a regular expression to remove unnecessary whitespace.
The solution provided in the tutorial is
We can just skip all the starting and ending whitespace by not capturing it in a line. For example, the expression ^\s*(.*)\s*$ will catch only the content.
The setup for the question does indicate the use of the hat at the beginning and the dollar sign at the end, so it makes sense that this is the expression that they want:
We have previously seen how to match a full line of text using the hat ^ and the dollar sign $ respectively. When used in conjunction with the whitespace \s, you can easily skip all preceding and trailing spaces.
That said, using \S instead, I was able to come up with what seems like a simpler solution - (\S.*\S).
I've found a Stack Overflow solution that matches the one in the tutorial (Regex Email - Ignore leading and trailing spaces?), and I've seen other guides that recommend the same format, but I'm struggling to find an explanation for why the \S is bad.
Additionally, this validates as correct in their tool... so, are there cases where this would not work as well as the provided solution? Or is the recommended version just a standard format?
The tutorial's solution of ^\s*(.*)\s*$ is wrong. The capture group .* is greedy, so it will expand as much as it can, all the way to the end of the line - it will capture trailing spaces too. The .* will never backtrack, so the \s* that follows will never consume any characters.
https://regex101.com/r/584uVG/1
Your solution is much better at matching only the non-whitespace content of the line, but there are a couple of edge cases in which it fails. (\S.*\S) requires at least two characters, so it will not match a line whose content is a single non-whitespace character, whereas the tutorial's (.*) can capture a single character, or nothing at all if the input is entirely whitespace.
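The difference between the two patterns can be checked with a short JavaScript sketch. The helper name capture and the three sample strings are assumptions chosen to cover the cases described above: padded content, a single character, and an all-whitespace line.

```javascript
// Compare what each pattern captures in group 1.
const tutorial = /^\s*(.*)\s*$/; // the tutorial's pattern
const asker = /(\S.*\S)/;        // the asker's pattern

// Hypothetical helper: return group 1, or null when there is no match.
function capture(re, s) {
  const m = s.match(re);
  return m ? m[1] : null;
}

for (const s of ['  foo bar  ', 'x', '   ']) {
  console.log(
    JSON.stringify(s),
    JSON.stringify(capture(tutorial, s)),
    JSON.stringify(capture(asker, s)),
  );
}
```

For the padded input, the tutorial's pattern keeps the trailing spaces in the group while the asker's pattern does not. For the other two inputs, the asker's pattern does not match at all.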
But, given the problem description at your link:
Occasionally, you'll find yourself with a log file that has ill-formatted whitespace where lines are indented too much or not enough. One way to fix this is to use an editor's search a replace and a regular expression to extract the content of the lines without the extra whitespace.
From this, matching only the non-whitespace content (like you're doing) probably wouldn't remove the undesirable leading and trailing spaces. The tutorial is probably thinking to guide you towards a technique that can be used to match a whole line with a particular pattern, and then replace that line with only the captured group, like:
Match ^\s*(.*\S)\s*$, replace with $1: https://regex101.com/r/584uVG/2/
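The same match-and-replace step can be sketched in JavaScript. The helper name trimLines is hypothetical, and the input is split into lines first so that \s* cannot run across a line break and merge neighbouring lines.

```javascript
// Replace each whole line with only its captured content.
// `trimLines` is a hypothetical helper name used for illustration.
function trimLines(text) {
  return text
    .split('\n')
    // (.*\S) forces the group to end on a non-whitespace character,
    // so the trailing \s* is what consumes the trailing spaces.
    .map((line) => line.replace(/^\s*(.*\S)\s*$/, '$1'))
    .join('\n');
}

console.log(JSON.stringify(trimLines('  foo  \n\tbar baz \nqux')));
```

Lines made up entirely of whitespace do not match, since the group requires at least one non-whitespace character, so this sketch leaves them as they are.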
Your technique would work given the problem if you had a way to make a new text file containing only the captured groups (or all the full matches), eg:
const input = ` foo
bar
baz
qux `;
// Collect every match (the trimmed content of each line) and join them with newlines.
const newText = (input.match(/\S(?:$|.*\S)/gm) || [])
  .join('\n');
console.log(newText);
Using \S instead of . is not bad. If you know a particular position must be a non-space character, \S is more precise, makes the intent of the pattern clearer, lets a bad match fail faster, and can avoid catastrophic backtracking in some cases. These patterns don't have backtracking issues, but it's still a good habit to get into.

Problem with regular expression using look behind feature

I am trying to build a simple regular expression to remove some unwanted code, and I need to use a lookbehind.
It worked until I added \s+ to it to exclude the spaces from the match.
After stripping the expression down, I ended up with (?<=\s+)foo, which is still reported as an invalid expression.
That may look a little odd or unclear, so to expand on it:
(?<=foo\s+)bar is reported as an invalid expression, whereas (?<=foo)\s+bar works but also matches the spaces before bar.
I am using it in Notepad++.
As @Toto notes in a comment, Notepad++ does not support variable-length lookbehind. It uses the Boost regex library.
Notepad++ does support \K to reset the starting point of the reported match.
\bfoo\s+\Kbar\b
Another way is to capture bar in a capturing group.
\bfoo\s+(bar)\b
Note that \s also matches a newline, and perhaps you might also use \h+ to match 1+ horizontal whitespace characters.
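The capturing-group approach carries over to other engines. Below is a minimal JavaScript sketch; JavaScript has neither \K nor \h, so the character class [ \t] is used here as a rough stand-in for horizontal whitespace, and the helper name findBar is an assumption for illustration.

```javascript
// Locate "bar" only when it follows "foo" and whitespace,
// without reporting the whitespace as part of the result.
// `findBar` is a hypothetical helper name.
function findBar(text, re) {
  const m = text.match(re);
  if (!m) return null;
  // The captured word is the tail of the full match, so its offset is
  // the match start plus the length of everything before the group.
  return { word: m[1], start: m.index + m[0].length - m[1].length };
}

const anyWhitespace = /\bfoo\s+(bar)\b/;    // \s+ also crosses line breaks
const sameLineOnly = /\bfoo[ \t]+(bar)\b/;  // [ \t]+ approximates \h+

console.log(findBar('say foo   bar now', anyWhitespace));
console.log(findBar('foo\nbar', anyWhitespace));
console.log(findBar('foo\nbar', sameLineOnly));
```

The last two calls show the point of the note above: the \s+ version still matches when foo and bar sit on different lines, while the restricted class does not.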

Regular Expression to match two words near each other on a single line

Hi I am trying to construct a regular expression (PCRE) that is able to find two words near each other but which occur on the same line. The near examples generally provided are insufficient for my requirements as the "\W" obviously includes new lines. I have spent quite a bit of time trying to find an answer to this and have thus far been unsuccessful. To exemplify what I have so far, please see below:
(?i)(?:\b(tree)\b)\W+(?:\w+\W+){0,5}?\b(house)\b.*
I want this to match on:
here is a tree with a house
But not match on
here is a tree
with a house
Any help would be greatly appreciated!
How about
\btree\b[^\n]+\bhouse\b
Just add a negative lookahead so that \W matches non-word characters other than a newline.
(?i)(?:\b(tree)\b)(?:(?!\n)\W)+(?:\w+\W+){0,5}?\b(house)\b.*
Dot matches anything except newlines, so just:
(?i)\btree\b.{1,5}\bhouse\b
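Note that there can never be zero characters between the two words: they would then be a single word and the \b would not match. Also note that .{1,5} counts characters, not words, so the sample line, which has eight characters between tree and house, needs a larger upper bound.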
Just replace \W with [^\w\r\n] in your regex:
(?i)(?:\b(tree)\b)[^\w\r\n]+(?:\w+[^\w\r\n]+){0,5}?\b(house)\b.*
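A quick JavaScript check of the original and revised patterns against the two sample inputs from the question. This is a sketch under a few assumptions: the inline (?i) is written as the i flag, the trailing .* is dropped because it does not affect whether a match is found, and the variable names are made up.

```javascript
// Original pattern: \W also matches line breaks, so a match can span lines.
const acrossLines = /\b(tree)\b\W+(?:\w+\W+){0,5}?\b(house)\b/i;
// Revised pattern: [^\w\r\n] excludes word characters, \r and \n.
const sameLine = /\b(tree)\b[^\w\r\n]+(?:\w+[^\w\r\n]+){0,5}?\b(house)\b/i;

const oneLine = 'here is a tree with a house';
const twoLines = 'here is a tree\nwith a house';

console.log(acrossLines.test(oneLine), acrossLines.test(twoLines));
console.log(sameLine.test(oneLine), sameLine.test(twoLines));
```

Only the revised pattern rejects the input where the two words are separated by a line break.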
To get the closest matches of both words on the same line, an option is to use a negative lookahead:
(?i)(\btree\b)(?>(?!(?1)).)*?\bhouse\b
By default, the dot . does not match a newline (it does only with the s, DOTALL, modifier).
(?>(?!(?1)).)*? matches as few characters as possible, none of which starts another match of \btree\b.
(?1) reuses the first parenthesized subpattern.
Example at regex101.com; Regex FAQ
Maybe this helps, found here https://www.regular-expressions.info/near.html
\bword1\W+(?:\w+\W+){1,6}?word2\b

Matching an expression including arbitrary lines with regex in Vim

In a text file opened with Vim, I'm trying to match the occurrence of two strings, DRIVER_ACTIVITY and DriverGroup, with an arbitrary amount of lines in between:
2013-07-01 05:06:23,801 DRIVER_ACTIVITY
2013-07-01 05:06:23,804 text
2013-07-01 05:06:23,804 more text
2013-07-01 05:06:23,805 DriverGroup
using:
/DRIVER_ACTIVITY(.*)DriverGroup/s
/DRIVER_ACTIVITY((.|\n|\r)*)DriverGroup
/\vDRIVER_ACTIVITY((.|\n|\r)*)DriverGroup
/DRIVER_ACTIVITY\[\S\s\]*DriverGroup
Nothing matches. How do I match all the lines/new lines?
If you want to use the more common (...) for grouping, you need to include the \v atom to switch Vim's regular expression syntax to "very magic"; else, it's \(...\). But for your case, Vim has a special atom that matches arbitrary characters including newlines: \_., like this:
/DRIVER_ACTIVITY\_.*DriverGroup
There's no way around learning Vim's different regular expression dialect; see :help pattern.
The \_s atom matches whitespace including newlines:
/DRIVER_ACTIVITY\(\_s\|.\)*DriverGroup
Ok, I see the problem. In this sample file, the third try matches, as does Ingo Karkat's and Explosion Pills' suggestions. The reason I didn't succeed is because all these seem to be greedy. That's why none of these matches in "the big file", 'cause it's greedy and keeps on looking, not returning a match in several seconds, though the marker is located on the same line where the first match should appear. So it actually matches but my patience is the problem :)
I made it non greedy and it worked:
/DRIVER_ACTIVITY\_.\{-}DriverGroup
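The greedy versus non-greedy difference is not specific to Vim. As an analogue only (this is JavaScript syntax, not Vim's), the sketch below uses [\s\S] to match any character including newlines and *? as the non-greedy quantifier. The first three log lines follow the sample in the question; the last two are invented to give the greedy version something further to run to.

```javascript
// Sample log: the last two lines are made up for this illustration.
const log = [
  '2013-07-01 05:06:23,801 DRIVER_ACTIVITY',
  '2013-07-01 05:06:23,804 text',
  '2013-07-01 05:06:23,805 DriverGroup',
  '2013-07-01 05:06:24,000 more text',
  '2013-07-01 05:06:24,100 DriverGroup',
].join('\n');

// Greedy: runs to the LAST DriverGroup in the input.
const greedy = log.match(/DRIVER_ACTIVITY[\s\S]*DriverGroup/)[0];
// Non-greedy: stops at the FIRST DriverGroup after the start marker.
const lazy = log.match(/DRIVER_ACTIVITY[\s\S]*?DriverGroup/)[0];

console.log(greedy.split('\n').length, lazy.split('\n').length);
```

On a large file the greedy form first consumes everything to the end of the input and then backtracks, which is consistent with the long wait described above.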

Find whitespace in end of string using wildcards or regex

I have a Resoure.resx file that I need to search for strings ending with whitespace. I have noticed that in Visual Web Developer I can search using both regex and wildcards, but I cannot figure out how to find only strings with whitespace at the end. I tried this regex, but it didn't work:
\s$
Can you give me an example? Thanks!
I'd expect that to work, although since \s includes \n and \r, perhaps it's getting confused. Or I suppose it's possible (but really unlikely) that the flavor of regular expressions that Visual Web Developer uses (I don't have a copy) doesn't have the \s character class. Try this:
[ \f\t\v]$
...which searches for a space, formfeed, tab, or vertical tab at the end of a line.
If you're doing a search and replace and want to get rid of all of the whitespace at the end of the line, then as RageZ points out, you'll want to include a greedy quantifier (+ meaning "one or more") so that you grab as much as you can:
[ \f\t\v]+$
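To illustrate why excluding \n and \r from the class can matter, here is a small JavaScript sketch. It says nothing about how Visual Web Developer's own regex flavor behaves; the function names and the sample string are assumptions. With the m flag, $ matches before each line break, and \s+ is free to consume a line break itself.

```javascript
// Strip trailing whitespace using the explicit character class.
function stripTrailing(text) {
  return text.replace(/[ \f\t\v]+$/gm, '');
}

// Same idea with \s, which also matches \n and \r.
function stripTrailingWithS(text) {
  return text.replace(/\s+$/gm, '');
}

const sampleText = 'foo  \n\nbar\t';
console.log(JSON.stringify(stripTrailing(sampleText)));
console.log(JSON.stringify(stripTrailingWithS(sampleText)));
```

The explicit class keeps the blank line between foo and bar, while the \s+ version swallows it along with the trailing spaces.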
You were almost there. Adding the + sign means one or more characters.
This should do it:
\s+$
Perhaps this would work:
^.+\s$
Using this you'll be able to find nonempty lines that end with a whitespace character.