I have hundreds of lines like the one below, and almost every line contains more than one opening double quote (“):
... “ ... “ ..... “ .....
Note: the dots (...) above stand for both words and spaces, for illustration only.
How can I find (via regex) every such occurrence in every line? I tried:
“.*“ or,
“.* “
but these also match lines that are already correct, i.e. where the opening quotes have matching closing quotes (which is how it should be), such as:
... “ ...” ..... “ .....” ...... “ .....
Also, for every second [space]“ in each line, how can I replace it (via regex) with ” [space]?
Use [^”]* instead of .*, so that the pattern finds two opening quotes with any sequence of characters between them except a closing quote.
EDIT:
“[^”]*?“: I missed that without the ?, “[^”]*“ finds the longest possible string between two opening quotes. In “some text “more text “text it would match “some text “more text “, so you need ? after *.
And judging by your screenshots you are using Sublime Text, so replace (“[^”]*?)\s“ with \1”
() captures a group, which you can refer to later as \n, where n is the group number.
*? is a lazy quantifier: it matches as few characters as possible, stopping at the first place where the rest of the pattern (\s“ here) matches.
\s matches any whitespace character (space, tab, newline, etc.).
\1 is the first captured group, here the opening quote and the text up to the whitespace.
You could use a lookbehind (?<=text), but its length must be known, and in your example it isn't (because of the *).
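To try that find/replace pair outside Sublime, here is a minimal Python sketch, assuming Python's re module handles this simple pattern the same way as Sublime's engine. The function name close_second_quotes and the sample lines are made up for illustration.

```python
import re

# (“[^”]*?) captures an opening quote plus the shortest run of non-” characters
# that is followed by whitespace and another opening quote (\s“).
STRAY_OPEN = re.compile(r"(“[^”]*?)\s“")


def close_second_quotes(line: str) -> str:
    """Replace each whitespace + “ that follows an unclosed “ with ”."""
    return STRAY_OPEN.sub(r"\1”", line)


print(close_second_quotes("He said “ stop “ and left, then “ go” and “ wait “ now."))
```

Because [^”] cannot cross a closing quote, a line whose quotes are already paired, such as “ a” b “ c” d, is left unchanged.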
If you search for (“.*?)(“), you can replace every second occurrence of “ with ” using s/(“.*?)(“)/$1”/g
.*? is a lazy quantifier, so each match stops right at the next “.
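A rough Python equivalent of that substitution (close_every_second is a hypothetical name). It also shows two differences from the [^”]*? version: .*? can cross an existing ”, so a lone “ after a correctly closed quote gets rewritten too, and the space in front of the replaced quote is kept.

```python
import re


def close_every_second(line: str) -> str:
    # (“.*?) captures an opening quote and as few characters as possible up to
    # the next “, which is replaced with ”. re.sub resumes after each match
    # (like /g), so on a line of opening quotes only, the 2nd, 4th, ... change.
    return re.sub(r"(“.*?)“", r"\1”", line)


print(close_every_second("“ one “ two “ three “ four"))
print(close_every_second("“ ok” then “ bad"))
```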
Related
I am new to the regex world. I would like to rename files that have a timestamp added to the end of the file name, basically removing the last 25 characters before the extension.
Examples of file names to rename:
IMG523314(2021-12-05-14-51-25_UTC).jpg > IMG523314.jpg
Test run1(2021-08-05-11-32-18_UTC).txt > Test run1.txt
That is, remove the 25 characters before the .extension, e.g. (2021-12-05-14-51-25_UTC),
or, if you like, remove the parentheses ( ), which are always there, and everything inside them.
The closing parenthesis is always followed by a dot (.).
Will the regex syntax shown in the title select the above? If so, how does it actually work?
Many Thanks,
Dan
Yes, \(.*\) will select the parentheses and everything inside them.
Assuming that by "how it works" you mean why each symbol does what it does, here's a breakdown:
\( and \): Parentheses are special characters in regex (they delimit groups), so to match them literally you need to escape them with backslashes.
.: The period is a wildcard that matches any single character (by default, any character except a newline).
*: The asterisk is a quantifier meaning "match the preceding item zero or more times".
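So, putting everything together, the pattern will:
Match exactly one opening parenthesis
Match any number of any characters
Match exactly one closing parenthesis
Because a closing parenthesis is required, .* cannot simply run to the end of the name; it backtracks to the last ) on the line. With only one pair of parentheses per name, as in your examples, the match is exactly the parentheses and the timestamp inside them. If a name contained two pairs, the match would run from the first ( to the last ).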
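To check what \(.*\) actually selects, here is a small Python sketch (matched_part is a hypothetical helper). The second sample name illustrates the greedy behaviour when a name has two pairs of parentheses.

```python
import re

# \( and \) match literal parentheses; .* (greedy) matches everything between.
PATTERN = re.compile(r"\(.*\)")


def matched_part(name: str) -> str:
    """Return the text the pattern selects in `name`, or "" if there is no match."""
    m = PATTERN.search(name)
    return m.group(0) if m else ""


for name in ["IMG523314(2021-12-05-14-51-25_UTC).jpg", "a(1)b(2).txt"]:
    print(name, "->", matched_part(name))
```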
Yes, it's possible:
a='IMG523314(2021-12-05-14-51-25_UTC).jpg'
echo "${a/\(*\)/}"
and
b='Test run1(2021-08-05-11-32-18_UTC).txt'
echo "${b/\(*\)/}"
Explanation:
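The first part (a or b) is the variable.
The second part, \(*\), is the pattern to replace. Note that this is a shell glob, not a regex: \( and \) are literal parentheses and * matches any string, so the (longest) match covers the parentheses and everything inside them.
The third part is the replacement string (an empty string in this case).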
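The commands above only print the new names. As a sketch of the actual renaming, assuming Python 3 and the regex \(.*\) instead of the shell glob, it could look like this. new_name and rename_all are hypothetical helpers; it skips any file whose new name already exists, and it is worth trying on a copy of the folder first.

```python
import re
from pathlib import Path

# A literal "(", anything, then a literal ")", as in the regex \(.*\).
TIMESTAMP = re.compile(r"\(.*\)")


def new_name(filename: str) -> str:
    """Return `filename` with the parenthesized part of its stem removed."""
    p = Path(filename)
    return TIMESTAMP.sub("", p.stem) + p.suffix


def rename_all(directory: str) -> None:
    """Rename every matching file directly inside `directory`."""
    # list() so we are not renaming entries while still iterating the directory.
    for path in list(Path(directory).iterdir()):
        if not path.is_file():
            continue
        target = path.with_name(new_name(path.name))
        # Skip files with nothing to strip, and never overwrite an existing file.
        if target != path and not target.exists():
            path.rename(target)


print(new_name("IMG523314(2021-12-05-14-51-25_UTC).jpg"))
```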
So I can't use the $' variable,
but I need to find a line in a file that starts with the string “by: ” followed by any characters, then replace whatever characters come after “by: ” with an existing string $foo.
I'm using $^I and a while loop, since I need to update multiple fields in a file.
I was thinking of something along the lines of s///:
s/(by\:[a-z]+)/$foo/i
I need help. Yes, this is an assignment question, but I'm 5 hours in and I've lost many brain cells in the process.
Some problems with your substitution:
You say you want to match by: (space after colon), but your regex will never match the space.
The pattern [a-z]+ means to match one or more occurrences of letters a to z. But you said you want to match "any characters". That might be zero characters, and it might contain non-letters.
You've replaced the match with $foo, but have lost by:. The entire matched string is replaced with the replacement.
No need to escape : in your pattern.
You're capturing the entire match in parentheses, but not using that anywhere.
I'm assuming you're processing the file line by line. You want "starts with the string by: followed by any characters". This is the regex:
/^by: .*/
^ matches the beginning of the line. Then by: matches exactly those characters. . matches any character except a newline, and * means zero or more of the preceding item. So .* matches all the rest of the characters on the line.
You want to "replace whatever characters come after by: with an existing string $foo". I assume you mean the contents of the variable $foo and not the literal characters $foo. This is:
s/^by: .*/by: $foo/;
Since we matched by:, I repeated it in the replacement string because you want to preserve it. $foo will be interpolated in the replacement string.
Another way to write this would be:
s/^(by: ).*/$1$foo/
Here we've captured the text by: in the first set of parentheses. That text will be available in the $1 variable, so we can interpolate that into the replacement string.
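For comparison, a minimal Python translation of s/^(by: ).*/$1$foo/, applied to every line of a string at once (replace_by_line is a hypothetical name; this is not a substitute for the Perl $^I loop).

```python
import re


def replace_by_line(text: str, foo: str) -> str:
    # re.MULTILINE makes ^ match at the start of every line; . does not match
    # a newline, so .* stops at the end of the line. Using a function as the
    # replacement inserts `foo` literally, even if it contains backslashes.
    return re.sub(r"^(by: ).*", lambda m: m.group(1) + foo, text, flags=re.MULTILINE)


print(replace_by_line("title: x\nby: someone\nyear: 2020\n", "Jane Doe"))
```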
I am using Notepad++ to find (".*)"(.*) and replace it with \1\"\2, but it doesn't seem to work, and I don't know why.
Example:
Someone said "My name is "sean""
I want it to be:
Someone said "My name is \"sean\""
Edit: In my case the closing quote is always at the end of the line, so will (".*)"(.*"$) work?
Edit 2: Also, the first quote is always preceded by a comma, so I will use (,".*)"(.*"$). It may not work in every case, but I think it will work for my file.
Now the problem is with the replacement: it doesn't insert \", it just adds some space.
It should work... you just need to do a little fixing...
The Find what regex should be ("[^"]*)("\w*)(")([^"]*")
The Replace with expression should be \1\\\2\\\3\4
Make sure you select the Search Mode to be "Regular expression"
Explanation...
This is quite tricky: I've assumed that the text quoted WITHIN the quotes is just a single word. If you assume anything else, it becomes very hard to pin down.
You need to find a
" followed by
[^"]* - any number of characters that are NOT a " and then
("\w*)(") - a quoted word, and then finally
([^"]*") - any additional number of non-quote characters + a final quote
This is important because regular expression matching is greedy by default: a .* would go on matching every character, including ", up to the end of the string, and then give back only as much as needed.
In the replacement string you need \\ to produce a single literal \.
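The same Find what / Replace with pair can be tried in Python's re, assuming its replacement template behaves as described above (\1 to \4 for groups, \\ for one literal backslash). escape_inner_quotes is a hypothetical name.

```python
import re

# Same pattern and replacement as in the answer.
FIND = r'("[^"]*)("\w*)(")([^"]*")'
REPLACE = r"\1\\\2\\\3\4"


def escape_inner_quotes(line: str) -> str:
    return re.sub(FIND, REPLACE, line)


print(escape_inner_quotes('Someone said "My name is "sean""'))
```

A line without a nested quoted word, such as He said "hi", does not match and is left unchanged.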
If I have a line of text that I want to remove from a text file in Notepad, and it is always formatted like this:
[text]:
except that the words inside the brackets change, what regular expression could I use to remove the whole section with Notepad's search and replace function?
To delete the entire line starting with [any text]: you can use: ^[\t ]*\[.*?\]:.*?\r\n
Explanation:
^ ... start search at beginning of a line (in this case).
[\t ]* ... find 0 or more tabs or spaces.
\[ ... find the opening square bracket as a literal character.
.*? ... find 0 or more characters other than the newline characters (carriage return and line feed), non-greedily, i.e. as few characters as possible to get a match, so matching stops at the first ] that is followed by a colon.
\]: ... find the closing square bracket as a literal character, followed by a colon.
.*?\r\n ... find 0 or more characters other than newline characters, followed by the carriage return and line feed that terminate the line (this assumes Windows CR+LF line endings).
The search string ^[\t ]*\[.*?\]:.*?$ would also find the complete line, but without matching the line terminator, so an empty line would remain.
For both search strings, the replace string is empty.
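A Python sketch of the first search string, for text with CR+LF line endings (remove_label_lines is a hypothetical name). re.MULTILINE is needed so that ^ matches at the start of every line, not just at the start of the text.

```python
import re

# Python's . matches \r but not \n, so each match still stays within one line,
# and the lazy .*? stops at the first \r\n.
LINE_WITH_LABEL = re.compile(r"^[\t ]*\[.*?\]:.*?\r\n", re.MULTILINE)


def remove_label_lines(text: str) -> str:
    return LINE_WITH_LABEL.sub("", text)


print(repr(remove_label_lines("keep\r\n  [foo bar]: some text\r\nalso keep\r\n")))
```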
If by "removing the entire section" you mean removing everything from [text]: up to the next [otherText]:, you can try this:
\[text\]:((?!\[[^\]]*\]:).)*
Remember to set the flag for ". matches newline".
This regex first matches your section title. Then, starting right after the title, it consumes one character at a time, and before each character it uses a negative lookahead to check whether the text at that position looks like a section title ([...]:). If it does, the match ends there.
Note: Replace All replaces every occurrence of the pattern. In other words, if that section appears more than once, all copies are removed.
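A Python sketch of the section-removal regex, where re.DOTALL stands in for the ". matches newline" option and the section name text is taken literally, as in the regex above (remove_section is a hypothetical name).

```python
import re

# \[text\]: matches the section title; ((?!\[[^\]]*\]:).)* then consumes one
# character at a time as long as no "[...]:" title starts at that position.
SECTION = re.compile(r"\[text\]:((?!\[[^\]]*\]:).)*", re.DOTALL)


def remove_section(text: str) -> str:
    return SECTION.sub("", text)


print(remove_section("[intro]:\nhello\n[text]:\nline a\nline b\n[other]:\nkeep\n"))
```

If the section is the last one in the file, the match simply runs to the end of the text.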
I want to delete all lines ending with |
I tried
.*[|;]
but it doesn't check that the | is at the end of the line.
Use the following regex:
.*\|$
This says "any character any number of times (.*), followed by a pipe (\| - you have to escape it), and then the end of a line ($)".
If you want to find lines ending with either ; or |, use:
.*[\|;]$
You don't have to escape the pipe inside a character class, but I prefer to do so anyway.
In either case, make sure you're in "Regular expression" search mode with ". matches newline" unchecked.
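A Python sketch of the pattern, assuming \n line endings (with \r\n, Python's $ would not match right after the |). Replacing .*\|$ with nothing leaves an empty line where each matching line was; delete_pipe_lines is an assumed variant, not from the answer, that also consumes the line break. Both function names are hypothetical.

```python
import re


def blank_pipe_lines(text: str) -> str:
    # The answer's pattern: with re.MULTILINE, $ matches at each line end,
    # so the line's content is removed but its newline stays.
    return re.sub(r".*\|$", "", text, flags=re.MULTILINE)


def delete_pipe_lines(text: str) -> str:
    # Assumed variant: also remove the trailing newline so the line disappears.
    return re.sub(r"^.*\|$\n?", "", text, flags=re.MULTILINE)


sample = "a|\nb\nc;|\nd|e\n"
print(repr(blank_pipe_lines(sample)))
print(repr(delete_pipe_lines(sample)))
```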