Regexp to strip characters after URL in emacs

I have a .org file with lines of this sort:
*
http://en.wikipedia.org/wiki/Qibla Qibla - Wikipedia, the free
As you can see: an asterisk, followed by a newline, followed by a URL, followed by one space, and then some extraneous, useless text that I want to get rid of.
I would like to format this file to this structure:
*
http://en.wikipedia.org/wiki/Qibla
That is, strip all the characters after the end of the URL while maintaining the rest of the structure.
How can I do this in Emacs?

Assuming you're doing this interactively with query-replace-regexp, try this regex to strip the junk off the end of the URLs:
^\(http[^ ]+\).*$
Replacement:
\1
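For comparison outside Emacs, here is a minimal sketch of the same substitution using Python's re module (not part of the original answer). Python writes groups as (...) rather than Emacs's \(...\), and it needs re.MULTILINE so that ^ and $ match at each line. The sample string is taken from the question.

```python
import re

# Sample taken from the question's .org file.
text = "*\nhttp://en.wikipedia.org/wiki/Qibla Qibla - Wikipedia, the free\n"

# Emacs \(...\) becomes (...) in Python. re.MULTILINE makes ^ and $ match per line.
# Excluding \n from the class keeps each match on a single line.
cleaned = re.sub(r"^(http[^ \n]+).*$", r"\1", text, flags=re.MULTILINE)
print(cleaned)
```

The asterisk line does not start with http, so it is left untouched and the structure is preserved.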
If you also want to get rid of the asterisks, that is easy enough: use this regex and replace it with nothing:
*^J
Use control-Q followed by control-J to enter the newline.
Edit: Or, to do it in one pass, replace
*^J\(http[^ ]+\) .*^J
With
\1^J
Where ^J is a literal newline inserted by typing control-Q followed by control-J.
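The one-pass replacement translates to Python in the same way. This sketch differs from the Emacs version in three respects, all assumptions of the sketch and not part of the answer:

- Python needs the leading asterisk escaped, whereas Emacs treats a * at the very start of a regexp literally.
- ^J is written as \n.
- The URL class also excludes \n.

```python
import re

text = "*\nhttp://en.wikipedia.org/wiki/Qibla Qibla - Wikipedia, the free\n"

# Emacs:  *^J\(http[^ ]+\) .*^J   ->   \1^J
# In Python a bare leading * is an error ("nothing to repeat"), so it is escaped.
one_pass = re.sub(r"\*\n(http[^ \n]+) .*\n", r"\1\n", text)
print(one_pass)
```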

Related

Remove trailing whitespace at the end of aspx file

I am trying to remove trailing whitespace, including \r and \n, at the end of .aspx files using Find and Replace with the pattern
\s+(?!.)
i.e., trying to replace whitespace followed by nothing with nothing.
The result is that everything ends up on the same line.
Why?
I also tried \s+$ with the same result.

You may add a negative lookahead to the end of your current pattern:
(\s+\r?\n)+$(?!.)
This ensures that only the final lines consisting solely of whitespace are matched.
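As a point of comparison, a regex flavor with a true end-of-string anchor can target only the whitespace at the very end of the file. Python's \Z is used here; not every editor's Find and Replace offers an equivalent, so treat this as a sketch outside the editor. The file content below is hypothetical.

```python
import re

# Hypothetical .aspx content: a blank line inside, whitespace-only lines at the end.
text = "<%@ Page %>\n<div>\n\n</div>\n   \r\n\t\n"

# \Z matches only at the very end of the string, so \s+ can only consume
# the whitespace run that reaches the end of the file; interior blank lines survive.
stripped = re.sub(r"\s+\Z", "", text)
print(repr(stripped))
```

In plain Python, text.rstrip() gives the same result. Replace with "\n" instead of "" if you want to keep a single final newline.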

Is there a regex to remove spaces and newlines from an XML input file?

I would like to change XML in this format:
<input>My
Input</input>
<input2>My
input2</input2>
to
<input>My Input</input>
<input2>My input2</input2>
The input XML file has more than 10000 records in the above format, which prevents the software from working properly.
I need a regex to fix it in one go.
I tried ('//n','') but it does not work as expected.

If your regex flavor supports lookbehinds, you may use something like this:
(?<!>)(\s)*[\r\n]+
...and replace with \1.
This matches any number of newline characters that are preceded by zero or more other whitespace characters and not preceded by the > character. It then replaces them with the last captured whitespace character (if present) or with nothing.
If lookbehind is not supported, you may use:
([^>])(\s)*[\r\n]+
...and replace with \1\2.
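Here is a minimal Python sketch of a variant of the lookbehind idea. It is not the answer's exact pattern. Instead of reinserting a captured whitespace character, it always replaces the line break with a single space, which produces the "My Input" spacing the question asks for. It assumes that a newline preceded by text (not by > or other whitespace) is a break inside an element's content.

```python
import re

xml = "<input>My\nInput</input>\n<input2>My\ninput2</input2>\n"

# Join a line break (plus surrounding spaces/tabs) into one space, unless the
# character just before the match is '>' (end of a tag) or other whitespace.
pattern = re.compile(r"(?<![>\s])[ \t]*\r?\n[ \t]*")
fixed = pattern.sub(" ", xml)
print(fixed)
```

Because \r is excluded by the lookbehind too, the sketch handles both \n and \r\n line endings.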

regex match file with multiple extensions

I have several strings like this:
XYZ_TEST_2017.txt
ASD_TEST_2017.txt.tmp
I need to extract only those strings ending with .txt.
So I'm using this regex:
[A-Z]{3}_TEST_[0-9]{4}.txt
However, I still get the strings with multiple extensions, like the second one (.txt.tmp).
How can I handle it?

To have your regex match everything up to the end, append an "end-of-text marker" ($) to your pattern like this:
[A-Z]{3}_TEST_[0-9]{4}\.txt$
As you may have noticed, I also escaped the dot; otherwise, this filename would match as well:
SOM_TEST_1234Etxt
An unescaped dot (.) matches any character (depending on your flags, even newline and carriage return), in this case the E before txt.
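A quick Python check makes the difference between the two patterns concrete. It uses the sample names from the question and the answer. Python's re.search, like most "find" operations, looks for a match anywhere in the string.

```python
import re

names = ["XYZ_TEST_2017.txt", "ASD_TEST_2017.txt.tmp", "SOM_TEST_1234Etxt"]

anchored = re.compile(r"[A-Z]{3}_TEST_[0-9]{4}\.txt$")  # escaped dot + end anchor
unanchored = re.compile(r"[A-Z]{3}_TEST_[0-9]{4}.txt")  # the question's original pattern

matches = [n for n in names if anchored.search(n)]
loose = [n for n in names if unanchored.search(n)]
print(matches)  # ['XYZ_TEST_2017.txt']
print(loose)    # all three names
```

Note that Python's $ also matches just before a single trailing newline. The pattern is also not anchored at the start, so a name like AXYZ_TEST_2017.txt would still match. Use re.fullmatch (or ^...\Z) if the whole string must fit the pattern.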

Regular expression matching space but at the end of line

I'm trying to replace runs of multiple spaces with a single one, but not at the start or end of the line.
Example:
___abc___def__
___ghi___jkl__
should turn to
___abc_def__
___ghi_jkl__
Note that I've written the spaces as underscores.
A simple search using the following pattern:
([^\s])\s+
matches from the spaces at the end of the first line, across the line break, up to the spaces at the beginning of the next one, because \s also matches newline characters.
So, if I replace with \1_, I get the following:
___abc_def_ghi_jkl
That is absolutely not what I expect, and other regex engines, e.g., PowerGREP or the one in Visual Studio, don't behave that way.

If you want to match only horizontal whitespace (such as spaces and tabs, but not line breaks), use \h:
Find what: (?<=\S)\h+(?=\S)
Replace with: (a space)
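Python's re module has no \h, so this sketch substitutes [ \t] (space and tab only). \h in PCRE-style engines also covers other horizontal Unicode spaces. The sample lines mirror the question with the underscores turned back into spaces.

```python
import re

text = "   abc   def  \n   ghi   jkl  "

# [ \t] stands in for \h. The lookarounds require a non-whitespace character
# on both sides, so leading and trailing runs on each line are left alone.
collapsed = re.sub(r"(?<=\S)[ \t]+(?=\S)", " ", text)
print(repr(collapsed))
```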

There are several possible interpretations of the question. For each of them, the replacement is a single space character.
If "spaces" is plural and means space characters but not tabs, use a find string of (^ {2,})|( {2,}$).
If "spaces" is plural and should include tabs, use a find string of (^[ \t]{2,})|([ \t]{2,}$).
If any leading or trailing spaces and tabs (one or more) are to be replaced with a space, use a find string of (^[ \t]+)|([ \t]+$).
The general form of each of these is (^...)|(...$). The | is an alternation, so either the preceding or the following parenthesized expression can match; hence the "Find what" text can match either at the beginning or at the end of a line. The ... varies depending on exactly what needs to be matched. Specifying [ \t] means only the two characters space and tab, whereas \s also includes the line-end characters.
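Here is a minimal Python sketch of the third find string, with a single space as the replacement. The sample text is hypothetical. re.MULTILINE is assumed to mirror an editor's per-line ^ and $.

```python
import re

text = "  abc  \n\tdef\t\t"

# re.MULTILINE lets ^ and $ match at every line boundary, as they do
# in a typical editor's Find and Replace.
pattern = re.compile(r"(^[ \t]+)|([ \t]+$)", re.MULTILINE)
result = pattern.sub(" ", text)
print(repr(result))
```

Because the classes are [ \t] rather than \s, the newline between the two lines is never consumed.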

OK, so the intention was to replace this:
Hey diddle diddle, \n<br/>
The Cat and the fiddle,\n
with this:
Hey diddle diddle,\n<br/>
The Cat and the fiddle,\n
A slightly modified version of Toto's answer did the trick:
(?<=\S)\h+(?=\S)|\s+$
This finds runs of horizontal whitespace between non-whitespace characters, plus trailing whitespace at the end of the line.

regex_replace doesn't replace the hyphen/dash

I'm using regexp_replace in PostgreSQL and trying to strip out any character in a string that is not a letter or number. However, using this regex:
select * from regexp_replace('blink-182', '[^a-zA-Z0-9]*$', '')
returns 'blink-182'. The hyphen is not being removed and replaced with nothing ('') as I would expect.
How do I modify this regex to also replace the hyphen? I've tested with many other characters (!,.#) and they are all replaced correctly.
Any ideas?

You currently replace a run of non-alphanumeric characters at the end of the string only. I guess your tests were mainly strings of the form foobar!#, which worked because the characters to remove were at the end of the string.
To replace every occurrence of such a character in the string remove the $ from the regex:
[^a-zA-Z0-9]+
(I also changed the * into a + to prevent zero-length replacements between characters.)
If you want to retain whitespace as well you need to add it to the character class:
[^a-zA-Z0-9 ]+
or possibly
[^a-zA-Z0-9\s]+
If the original regex was in fact correct, in that you only want to remove non-alphanumeric characters from the end of the string, but you also want to remove hyphen-minus characters in the middle of the string (while retaining other non-alphanumeric characters there), then the following should work:
[^a-zA-Z0-9]+$|-
maniek points out that you need to pass the 'g' flag as an extra argument to regexp_replace so it replaces every match, not just the first one:
regexp_replace('blink-182', '[^a-zA-Z0-9]+$|-', '', 'g')
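PostgreSQL itself can't be run here, so this is a rough emulation with Python's re.sub, an assumption rather than PostgreSQL's own engine. re.sub replaces every match by default, much like the 'g' flag. Passing count=1 approximates regexp_replace without 'g', which replaces only the first match. For these simple character classes, both engines should agree. The foo-bar!# sample is hypothetical.

```python
import re

# Original pattern: only a run at the very end can match, so the hyphen survives.
anchored_only = re.sub(r"[^a-zA-Z0-9]*$", "", "blink-182")        # 'blink-182'

# Without the $ anchor, every run of non-alphanumerics is removed.
every_run = re.sub(r"[^a-zA-Z0-9]+", "", "blink-182")             # 'blink182'

# Trailing junk or any hyphen: count=1 mimics the default (no 'g') behavior.
first_only = re.sub(r"[^a-zA-Z0-9]+$|-", "", "foo-bar!#", count=1)  # 'foobar!#'
all_matches = re.sub(r"[^a-zA-Z0-9]+$|-", "", "foo-bar!#")          # 'foobar'

print(anchored_only, every_run, first_only, all_matches)
```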
