Regular expression matching space but not at the end of line - regex

I'm trying to replace multiple spaces with a single one, but not at the start or end of the line.
Example:
___abc___def__
___ghi___jkl__
should turn to
___abc_def__
___ghi_jkl__
Note that I've replaced space with underscore
A simple search using the following pattern:
([^\s])\s+
matches the space at the end of the first line up to the space at the beginning of the next one.
So, if I replace with \1_, I get the following:
___abc_def_ghi_jkl
That is absolutely not what I expect, and other regex engines, e.g., PowerGREP or the one in Visual Studio, don't behave that way.

If you want to match only horizontal spaces, use \h:
Find what: (?<=\S)\h+(?=\S)
Replace with: (a space)
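For anyone wanting to try this outside an editor, here is a minimal sketch in Python. Python's built-in re module has no \h, so [ \t] stands in for it here (an assumption: it covers only space and tab, not every horizontal whitespace character). The name collapse_inner_spaces is made up for illustration.

```python
import re

# [ \t] stands in for \h, which Python's re module does not support.
INNER_SPACES = re.compile(r"(?<=\S)[ \t]+(?=\S)")

def collapse_inner_spaces(text):
    """Collapse runs of spaces/tabs that sit between two non-whitespace characters."""
    return INNER_SPACES.sub(" ", text)

sample = "   abc   def  \n   ghi   jkl  "
result = collapse_inner_spaces(sample)
print(repr(result))
```

Because both lookarounds require a non-whitespace neighbour, leading and trailing blanks (and the newline) are left alone.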

There are several possible interpretations of the question. For each of them the replacement will be a single space character.
If "spaces" is plural and means space characters but not tabs, then use a find string of (^ {2,})|( {2,}$).
If "spaces" is plural and should include tabs, then use a find string of (^[ \t]{2,})|([ \t]{2,}$).
If any leading or trailing spaces and tabs (one or more) are to be replaced with a space, then use a find string of (^[ \t]+)|([ \t]+$).
The general form of each of these is (^...)|(...$). The | means an alternation so either the preceding or the following bracketed expression can match. Hence the find what text can match either at the beginning or the end of a line. The ... varies depending on exactly what needs to be matched. Specifying [ \t] means only the two characters space and tab, whereas \s includes the line-end characters.
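The (^...)|(...$) form can be tried out with a short Python sketch, using the third variant ([ \t]+) as the "..." part. The function name squeeze_edges is hypothetical, and re.MULTILINE is needed in Python so that ^ and $ match at every line rather than only at the ends of the whole string.

```python
import re

# General form (^...)|(...$) with [ \t]+ as the "..." part.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
EDGE_BLANKS = re.compile(r"(^[ \t]+)|([ \t]+$)", re.MULTILINE)

def squeeze_edges(text):
    """Replace each leading or trailing run of spaces/tabs with one space."""
    return EDGE_BLANKS.sub(" ", text)

edge_result = squeeze_edges("  abc  \n\tdef\t")
print(repr(edge_result))
```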

Ok, so the intention was to replace this:
Hey diddle diddle, \n<br/>
The Cat and the fiddle,\n
with this:
Hey diddle diddle,\n<br/>
The Cat and the fiddle,\n
A slightly modified version of Toto's answer did the trick:
(?<=\S)\h+(?=\S)|\s+$
This finds any horizontal whitespace between non-whitespace characters, as well as trailing whitespace at the end of the line.


Perl: How to substitute the content after pattern

So I can't use the $' variable.
But I need to find the pattern in a file that starts with the string “by: ” followed by any characters, then replace whatever characters come after “by: ” with an existing string $foo.
I'm using $^I and a while loop since I need to update multiple fields in a file.
I was thinking something along the lines of [s///]
s/(by\:[a-z]+)/$foo/i
I need help. Yes, this is an assignment question, but I'm 5 hours in and I've lost many brain cells in the process.
Some problems with your substitution:
You say you want to match by: (space after colon), but your regex will never match the space.
The pattern [a-z]+ means to match one or more occurrences of letters a to z. But you said you want to match "any characters". That might be zero characters, and it might contain non-letters.
You've replaced the match with $foo, but have lost by:. The entire matched string is replaced with the replacement.
No need to escape : in your pattern.
You're capturing the entire match in parentheses, but not using that anywhere.
I'm assuming you're processing the file line by line. You want "starts with the string by: followed by any characters". This is the regex:
/^by: .*/
^ matches the beginning of the line. Then by: matches exactly those characters. . matches any character except for a newline, and * means zero or more of the preceding item. So .* matches all the rest of the characters on the line.
"replace whatever characters that come after by: with an existing string $foo". I assume you mean the contents of the variable $foo and not the literal characters $foo. This is:
s/^by: .*/by: $foo/;
Since we matched by:, I repeated it in the replacement string because you want to preserve it. $foo will be interpolated in the replacement string.
Another way to write this would be:
s/^(by: ).*/$1$foo/
Here we've captured the text by: in the first set of parentheses. That text will be available in the $1 variable, so we can interpolate that into the replacement string.
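The capture-and-reinsert idea is not Perl-specific. As a rough sketch, the same substitution in Python could look like the following; replace_after_by and the sample strings are invented for illustration, and foo plays the role of $foo.

```python
import re

def replace_after_by(line, foo):
    """Keep a leading 'by: ' and replace the rest of the line with foo."""
    # A function is used as the replacement so that any backslashes
    # inside foo are inserted literally rather than treated as escapes.
    return re.sub(r"^(by: ).*", lambda m: m.group(1) + foo, line)

updated = replace_after_by("by: someone else", "Jane Doe")
untouched = replace_after_by("written by: nobody", "Jane Doe")
print(updated)
print(untouched)
```

The second call leaves the line alone because ^ anchors the match to the start of the line.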

Regex: How to match a part of the text within two characters e.g. quotes

I need to match a text within a text that is surrounded by two characters, in this case ‘ and ’. So assume that the whole string is:
Regarding the cat, I asked him ‘can you take care of my cat while I am away’ and he
said ‘yes’.
Now, if I use the following regex
(?<=‘)(.*?)(?=’)
It will match
can you take care of my cat while I am away
and
yes
What if I want to search for a single character e.g. "e" (matches in both quoted strings) or word e.g. "cat" within those two groups? How can I do that? I cannot figure out how to replace (.*?) in order to search for a substring/character within those special quotes.
You only need to replace the dot that is too permissive with a class that excludes the closing quote and the first character of your target:
(?<=‘)([^’e]*(e)[^’]*)(?=’)
or
(?<=‘)([^’c]*(?:(?:\Bc|c(?!at\b))[^’c]*)*\b(cat)\b[^’]*)(?=’)
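Both patterns can be checked against the sentence from the question. This is only a sketch using Python's re module (which accepts the fixed-width lookbehind used here); the variable names are made up.

```python
import re

text = ("Regarding the cat, I asked him ‘can you take care of my cat "
        "while I am away’ and he said ‘yes’.")

# Group 1 is the whole quoted passage, group 2 the target found inside it.
has_e = re.compile(r"(?<=‘)([^’e]*(e)[^’]*)(?=’)")
has_cat = re.compile(
    r"(?<=‘)([^’c]*(?:(?:\Bc|c(?!at\b))[^’c]*)*\b(cat)\b[^’]*)(?=’)"
)

e_passages = [m.group(1) for m in has_e.finditer(text)]
cat_passages = [m.group(1) for m in has_cat.finditer(text)]
print(e_passages)
print(cat_passages)
```

Note that the unquoted "cat" at the start of the sentence is not reported, since every match has to begin right after an opening ‘.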

Regular expressions: inserting a word and NOT replacing the found key

I have a list of items, such as:
this_thing.ety
other-stuff.ety
34-pairings.ety
I want to do this:
"At the beginning of every line, insert "images/"
so the result of search/replace with reg exp would yield:
images/this_thing.ety
images/other-stuff.ety
images/34-pairings.ety
I am using:
^.
as my anchor to find the beginning of each line but everything I've tried to add "images/" has resulted in actually replacing that first character. I am using Notepad ++, but can use anything.
I thought using ${foo} was on the right track but I'm missing something here.
In a regex, ^. matches the beginning of a line plus one character. If you replace that with 'images/', the first character, which was part of the match, is replaced too. Empty lines won't get 'images/' and stay as they are (they don't match ^.).
Just use ^ as the regexp for the beginning of a line.
. is the any character symbol, but can only account for one character. You will want to use ^..*$ or ^.+$ if your version of regex allows so that every line that contains at least one character will be fully replaced. With replace, it would look like this
s/^(.+)$/images\/\1/
where the \1 re-inserts the part in parenthesis in the regex. In older versions of regex, try
s/^\(..*\)$/images\/\1/
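To see both approaches side by side, here is a small Python sketch (variable names are mine, not from the question). The first substitution matches only the zero-width position at the start of each line; the second captures the line and re-inserts it.

```python
import re

names = "this_thing.ety\nother-stuff.ety\n34-pairings.ety"

# Zero-width match: ^ consumes nothing, so no character is lost.
prefixed = re.sub(r"^", "images/", names, flags=re.MULTILINE)

# Capture and re-insert: (.+) needs at least one character,
# so blank lines would be skipped.
prefixed_nonblank = re.sub(r"^(.+)$", r"images/\1", names, flags=re.MULTILINE)

print(prefixed)
```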

interpreting regular expression in perl

I am trying to reverse engineer a Perl script. One of the lines contains a matching operator that reads:
$line =~ /^\s*^>/
The input is just FASTA sequences with header information. The script is looking for a particular pattern in the header, I believe.
Here is an example of the files the script is applied to:
>mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131 5'pad=0 3'pad=0 strand=+
repeatMasking=none
ATGGCGAACGACTCTCCCGCGAAGAGCCTGGTGGACATTGACCTGTCGTC
CCTGCGG
>mm9_refGene_NM_001252200_1 range=chr1:39958354-39958419 5'pad=0 3'pad=0 strand=+
repeatMasking=none
GACCCTGCTGGGATTTTTGAGCTGGTGGAAGTGGTTGGAAATGGCACCTA
TGGACAAGTCTATAAG
This is a matching operator asking whether the line, from its beginning, contains white spaces of at least more than zero, but then I lose its meaning.
This is how I have parsed the regex so far:
from beginning [ (/^... ], contains white spaces [ ...\s... ] of at least more than zero [ ...*... ].
Using RegexBuddy (or, as r3mus said, regex101.com, which is free):
Assert position at the beginning of the string «^»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the beginning of the string «^»
Match the character “>” literally «>»
EDIT: Birei's answer is probably more correct if the regex in question is actually wrong.
You have to get rid of the second ^ character. It is a metacharacter that means the beginning of a line (without special flags like /m), but that meaning is already achieved by the first one.
The character > will match at the beginning of the line without the second ^ because the initial whitespace is optional (* quantifier). So, use:
$line =~ /^\s*>/
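A quick way to compare the two patterns is to run them over a few sample lines. The sketch below uses Python's re module rather than Perl, on the assumption that ^ without a multiline flag behaves the same way in both; the sample lines are shortened from the question.

```python
import re

original = re.compile(r"^\s*^>")    # pattern as found in the script
simplified = re.compile(r"^\s*>")   # second ^ removed

header = ">mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131"
indented_header = "  " + header
sequence = "ATGGCGAACGACTCTCCCGCGAAGAGCCTGGTGGACATTGACCTGTCGTC"

for label, line in [("header", header),
                    ("indented header", indented_header),
                    ("sequence", sequence)]:
    print(label, bool(original.search(line)), bool(simplified.search(line)))
```

In this sketch the two patterns agree on plain headers and on sequence lines; they differ only when whitespace precedes the >, where the second ^ prevents a match.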
It is much easier to reverse engineer a Perl script with a debugger:
"perl -d script.pl" or, if you have ddd on Linux, "ddd script.pl &".
With a multiline regex, this pattern matches an empty line containing only whitespace followed by the beginning of the next FASTA record.
http://www.rexfiddle.net/c6locQg

Regex to change the number of spaces in an indent level

Let's say you have some lines that look like this
int some_function() {
  int x = 3;  // Some silly comment
And so on. The indentation is done with spaces, and each indent is two spaces.
You want to change each indent to be three spaces. The simple regex
s/ {2}/   /g
Doesn't work for you, because that changes some non-indent spaces; in this case it changes the two spaces before // Some silly comment into three spaces, which is not desired. (This gets far worse if there are tables or comments aligned at the back end of the line.)
You can't simply use
/^( {2})+/
Because what would you replace it with? I don't know of an easy way to find out how many times a + was matched in a regex, so we have no idea how many altered indents to insert.
You could always go line-by-line and cut off the indents, measure them, build a new indent string, and tack it onto the line, but it would be oh so much simpler if there was a regex.
Is there a regular expression to replace indent levels as described above?
In some regex flavors, you can use a lookbehind:
s/(?<=^ *)  /   /g
In all other flavors, you can reverse the string, use a lookahead (which all flavors support) and reverse again:
s/  (?= *$)/   /g
Here's another one, instead utilizing \G, which has .NET, PCRE (C, PHP, R…), Java, Perl and Ruby support:
s/(^|\G) {2}/   /g
\G [...] can match at one of two positions:
✽ The beginning of the string,
✽ The position that immediately follows the end of the previous match.
Source: http://www.rexegg.com/regex-anchors.html#G
We utilize its ability to match at the position that immediately follows the end of the previous match, which in this case will be at the start of a line, followed by 2 whitespaces (OR a previous match following the aforementioned rule).
See example: https://regex101.com/r/qY6dS0/1
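Python's built-in re module supports neither \G nor variable-length lookbehind, so neither of the approaches above carries over directly. A different technique that works there is to match the whole leading indent once and compute the new indent in a replacement function, which also answers the "how many times did + match" problem from the question. This is only a sketch; reindent and its parameters are hypothetical names.

```python
import re

def reindent(text, old=2, new=3):
    """Rescale leading-space indentation from `old` to `new` spaces per level."""
    def rescale(match):
        # Number of complete indent levels in the matched leading run.
        levels = len(match.group(0)) // old
        return " " * (levels * new)
    # (?: {old})+ matches whole indent levels at the start of each line only.
    pattern = re.compile(r"^(?: {%d})+" % old, re.MULTILINE)
    return pattern.sub(rescale, text)

source = ("int some_function() {\n"
          "  int x = 3;  // Some silly comment\n"
          "    return x;\n"
          "}")
reindented = reindent(source)
print(reindented)
```

Spaces that are not part of the leading indent, such as the two before the comment, are never touched because the pattern is anchored with ^.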
I needed to halve the amount of spaces on indentation. That is, if indentation was 4 spaces, I needed to change it to 2 spaces.
I couldn't come up with a regex. But, thankfully, someone else did:
//search for
^( +)\1
//replace with (or \1, in some programs, like geany)
$1
From source: "^( +)\1 means "any nonzero-length sequence of spaces at the start of the line, followed by the same sequence of spaces. The \1 in the pattern, and the $1 in the replacement, are both back-references to the initial sequence of spaces. Result: indentation halved."
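The halving trick relies only on a backreference, so it can be tried in most engines. A minimal Python sketch (halve_indent is an invented name; re.MULTILINE is added so ^ applies to every line of a multi-line string):

```python
import re

def halve_indent(text):
    """Halve leading-space indentation on every line."""
    # ( +) captures a run of leading spaces and \1 requires the same run
    # again; keeping only group 1 drops half of the indentation.
    return re.sub(r"^( +)\1", r"\1", text, flags=re.MULTILINE)

halved = halve_indent("def f():\n    if x:\n        return 1")
print(halved)
```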
You can try this:
^(\s{2})|((?<=\n(\s)+))(\s{2})
Breakdown:
^(\s{2}) = Searches for two spaces at the beginning of the line
((?<=\n(\s)+))(\s{2}) = Searches for two spaces
but only if a new line followed by any number of spaces is in front of it.
(This prevents two spaces within the line being replaced)
I'm not completely familiar with perl, but I would try this to see if it work:
s/^(\s{2})|((?<=\n(\s)+))(\s{2})/   /g
As @Jan pointed out, there can be other non-space whitespace characters. If that is an issue, try this:
s/^( {2})|((?<=\n( )+))( {2})/   /g