How to use a regular expression to edit the content? - regex

I'm editing an rst file for a document. It contains a lot of image links that I would otherwise have to edit one by one. Can anybody help me write a regular expression that converts them all in one pass?
The original text looks like:
*Figure 1.2: Where is the dog?* <dog.html#fig_dog>
I'd like to translate it to:
:ref:fig_dog
And there is another one:
*How are you* <how_are_you.html>
I'd like to translate it to:
:ref:how_are_you
I have tried some expressions in EditPlus and Notepad++, but I can't get them to match properly.

Search:
\*.*?\*\s*<(?:.*#)?([^.>]+)(\.[^>]*)?>
Replace:
:ref:\1
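To check the search/replace pair outside the editor, here is a minimal Python sketch. The pattern and replacement are copied from the answer, and the two sample lines come from the question; the names LINK and to_ref are just made up for the illustration. It assumes one link per line, because the greedy `.*#` could otherwise run across several links.

```python
import re

# Pattern and replacement copied from the answer above.
LINK = re.compile(r"\*.*?\*\s*<(?:.*#)?([^.>]+)(\.[^>]*)?>")

def to_ref(line: str) -> str:
    """Replace a '*text* <target>' link with ':ref:<name>' (hypothetical helper)."""
    return LINK.sub(r":ref:\1", line)

# Sample lines taken from the question.
samples = [
    "*Figure 1.2: Where is the dog?* <dog.html#fig_dog>",
    "*How are you* <how_are_you.html>",
]
converted = [to_ref(s) for s in samples]
print(converted)  # [':ref:fig_dog', ':ref:how_are_you']
```

When the target has an anchor, group 1 captures the anchor name; otherwise it captures the file name without its extension.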

Alternatively, split it into two regexes.
To match the part before .html:
<(.*)\.html.*?>
To match the anchor:
<.*\.html#(.*?)>
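A rough Python sketch of the two-pass idea, under a few assumptions: the dot before html is escaped so that it only matches a literal dot, the `\*[^*]*\*\s*` prefix for the link text is my own addition (the two patterns in this answer only cover the angle-bracket part), and the names TEXT, ANCHOR, PAGE and to_ref are made up. The anchor pattern has to run first, because the page pattern would also match links that carry an anchor.

```python
import re

# Assumed prefix for the emphasised link text, e.g. "*How are you* ".
TEXT = r"\*[^*]*\*\s*"
# The two patterns from the answer, with the literal dot escaped.
ANCHOR = re.compile(TEXT + r"<.*\.html#(.*?)>")
PAGE = re.compile(TEXT + r"<(.*)\.html.*?>")

def to_ref(line: str) -> str:
    """Two passes: links with an anchor first, then plain page links."""
    line = ANCHOR.sub(r":ref:\1", line)
    return PAGE.sub(r":ref:\1", line)

with_anchor = to_ref("*Figure 1.2: Where is the dog?* <dog.html#fig_dog>")
without_anchor = to_ref("*How are you* <how_are_you.html>")
print(with_anchor)     # :ref:fig_dog
print(without_anchor)  # :ref:how_are_you
```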

Related

How to exclude delimiters inside text qualifiers using Regex?

I am trying to exclude delimiters within text qualifiers. For this, I am trying to use Regex. However, I am new to Regex and have not been able to fully accomplish what I need. I would be very grateful if someone could help me out.
In Alteryx, I load a delimited flat text file as 'non-delimited' and say that it does not have text qualifiers. Thus, the input will look something like this:
"aabb"|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|gghh
"aa|bb"|ccdd|"ee|ff"|gghh
"aa|bb"|"cc|dd"|"ee|ff"|"gg|hh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gg|hh"
aabb|ccdd|eeff|gghh
"aa|bb"|ccdd|eeff|"gg|hh"
aabb|cc|dd|eeff|gghh
aabb|"cc||dd"|eeff|gghh
aabb|"c|c|dd"|eeff|gghh
"aa||bb"|ccdd|eeff|gghh
"a|a|b|b"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"g|g|hh"
"aabb"|ccdd|eeff|"gg||hh"
I want to exclude all delimiters that are in between text qualifiers.
I have tried to use Regex to replace the delimiters within text qualifiers with nothing.
So far, I have tried the following Regex code for my target:
(")(.*?[^"])\|+(.*?)(")
And I have used the following for my replace:
$1$2$3$4
However, this does not fix lines 11, 13, 14 and 15.
I wish to obtain the following results:
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|"eeff"|gghh
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
"aabb"|"ccdd"|"eeff"|"gghh"
aabb|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
aabb|cc|dd|eeff|gghh
aabb|"ccdd"|eeff|gghh
aabb|"ccdd"|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|gghh
"aabb"|ccdd|eeff|"gghh"
"aabb"|ccdd|eeff|"gghh"
Thank you in advance for helping me out!
With kind regards,
Robin
I can't think of a single regex that does this unless you spell out each pattern that could occur.
However, an easier way (though maybe not as performant) would be to use a Text to Columns tool with Ignore delimiters in quotes selected. If you need the data back together in one cell afterwards, you can transpose, then remove the delimiters, followed by a Summarize to concatenate each RecordID group.
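If the file can be pre-processed outside Alteryx, the same cleanup can be sketched in Python with a replacement callback: match each quoted field, then strip the delimiter inside it. This is not an Alteryx feature, only an illustration; it assumes the quotes are balanced and never escaped inside a field, and the names QUOTED and strip_inner_delimiters are made up.

```python
import re

# A quoted field: an opening quote, anything but a quote, a closing quote.
QUOTED = re.compile(r'"[^"]*"')

def strip_inner_delimiters(line: str, delimiter: str = "|") -> str:
    """Remove the delimiter only where it appears inside a quoted field."""
    return QUOTED.sub(lambda m: m.group(0).replace(delimiter, ""), line)

# A few of the input lines from the question, including 11 and 13 to 15.
rows = [
    'aabb|cc|dd|eeff|gghh',
    'aabb|"c|c|dd"|eeff|gghh',
    '"a|a|b|b"|ccdd|eeff|gghh',
    '"aabb"|ccdd|eeff|"g|g|hh"',
    '"aabb"|ccdd|eeff|"gg||hh"',
]
cleaned = [strip_inner_delimiters(r) for r in rows]
for row in cleaned:
    print(row)
```

Delimiters outside quotes are left alone, so the unquoted line `aabb|cc|dd|eeff|gghh` comes back unchanged.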

Regular expression: how to not select a given word in a regular expression

I have this scenario:
I need to find every URL that does not contain http.
E.g.:
I can select:
I can't select:
I know I can negate a word in a regular expression:
^((?!http).)*
But I don't know how to select the URLs without the http://
Like this?
<a href="(?!http://)([^"]*)">
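A small Python sketch of how the negative lookahead behaves. The HTML snippet is invented for the example, since the original sample links did not survive in the question. Note that the lookahead only rejects values starting with exactly `http://`, so an `https://` link would still be selected unless the pattern is widened.

```python
import re

# Pattern from the answer above: (?!http://) rejects hrefs that start with http://.
NON_HTTP_HREF = re.compile(r'<a href="(?!http://)([^"]*)">')

# Hypothetical markup, for illustration only.
html = (
    '<a href="http://example.com/page">absolute</a>\n'
    '<a href="about.html">relative</a>\n'
    '<a href="/img/logo.png">root-relative</a>\n'
)
matches = NON_HTTP_HREF.findall(html)
print(matches)  # ['about.html', '/img/logo.png']
```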

Regex in Notepad++ to move contents of an element to an attribute value

I'm trying to solve a regex riddle. Let's say I have rows of hrefs looking like this:
anchor1.in
an3.php
setup.exe
What I want the regex (or any other solution) to do is to take the href title and copy it over to the actual URL with a forward slash in front of it.
A successful result would be:
anchor1.in
an3.php
setup.exe
If you can solve this please explain how you did it.
You can use the following to match:
(<a\s+href=")(.*?)(">)(.*?)(<\/a>)
And replace with:
\1\2/\4\3\4\5
See DEMO and Explanation
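As for how it works: the pattern cuts each row into five groups, and the replacement puts them back together with the link text (group 4) appended to the URL (group 2) after a slash. Here is a Python sketch of the same substitution; the input row is hypothetical, since the real URLs are not visible in the question.

```python
import re

# Pattern and replacement from the answer above.
# Groups: 1 = '<a href="', 2 = URL, 3 = '">', 4 = link text, 5 = '</a>'.
PATTERN = re.compile(r'(<a\s+href=")(.*?)(">)(.*?)(<\/a>)')
REPLACEMENT = r"\1\2/\4\3\4\5"

# Hypothetical input row.
row = '<a href="http://example.com">setup.exe</a>'
result = PATTERN.sub(REPLACEMENT, row)
print(result)  # <a href="http://example.com/setup.exe">setup.exe</a>
```

Notepad++ accepts the same `\1` to `\5` backreferences in its Replace field when the search mode is set to Regular expression.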

Regular Expression filter for Meta Fields

I want to parse text content to extract some parameters with Regular Expression.
My text looks like below:
//_META_FIELD{Parameter: S}
I want to filter content that starts with "//_META_FIELD{" and ends with "}",
so that the filtered content is: Parameter: S
Can anyone help?
This Regex will find what you are looking for:
#^//_META_FIELD{(.+?)}$#m
^ makes sure the match starts at the beginning of the line, and $ makes sure nothing else comes after the closing }. You can remove them if you don't need that.
Also you can see an example of that RegExp here
The regex should look something like this:
^//_META_FIELD\{(.*?)\}$
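For a quick test, here is a Python sketch using the escaped-brace version of the pattern. The question does not say which language is in use, so Python is an assumption; the first input line is from the question and the other two are made up. The MULTILINE flag plays the role of the `m` modifier in the first answer, letting `^` and `$` match at every line boundary.

```python
import re

# Escaped-brace pattern from the answer above; MULTILINE makes ^ and $ work per line.
META_FIELD = re.compile(r"^//_META_FIELD\{(.*?)\}$", re.MULTILINE)

# First line is from the question; the remaining lines are invented.
text = "//_META_FIELD{Parameter: S}\nint x = 0;\n//_META_FIELD{Unit: mm}\n"
fields = META_FIELD.findall(text)
print(fields)  # ['Parameter: S', 'Unit: mm']
```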

Remove after .jpg

I'm getting a value like this:
myimage.jpg123456jpg
and I need to remove everything after .jpg
How can I write this in Razor?
I don't know anything about razor but this regex would match the part you'd like to save in the first result group:
(.+\.jpg)
You can see it in action here: http://regexr.com?2v7ki
Just match on .+\.jpg, which will give you the myimage.jpg section of the text.
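A minimal sketch of the match, written in Python rather than Razor, so treat it only as a check of the pattern itself. One caveat: `.+` is greedy, so if the value contained `.jpg` more than once the match would run to the last occurrence; `.+?\.jpg` would stop at the first one instead.

```python
import re

# Value from the question.
value = "myimage.jpg123456jpg"

# Pattern from the answer above; group 1 is the part to keep.
match = re.search(r"(.+\.jpg)", value)
cleaned = match.group(1) if match else value
print(cleaned)  # myimage.jpg
```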