Regexp to strip characters after URL in emacs - regex

I have a .org file with lines of this sort:
*
http://en.wikipedia.org/wiki/Qibla Qibla - Wikipedia, the free
As you can see, each entry is an asterisk, followed by a newline, followed by a URL, followed by one space, and then some extraneous text that I want to get rid of.
I would like to reformat the file to this structure:
*
http://en.wikipedia.org/wiki/Qibla
That is, strip all the characters after the end of the URL while keeping the rest of the structure.
How can I do this in Emacs?

Assuming you're doing this interactively with query-replace-regexp, try using this regex to strip the junk off the end of the URLs:
^\(http[^ ]+\).*$
Replacement:
\1
You can get rid of the asterisks easily enough as well. Use this regex and replace with nothing:
*^J
Use control-Q followed by control-J to enter the newline.
Edit: Or, to do it all in one pass, replace
*^J\(http[^ ]+\) .*^J
With
\1^J
Where ^J is a literal newline inserted by typing control-Q followed by control-J.
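For readers who want to try the first replacement outside Emacs, here is a minimal sketch in Python. It is not the Emacs command itself: Python writes groups as ( ... ) instead of \( ... \), and the sample text is made up to mirror the question. The character class is also tightened to [^ \n]+, on the assumption that a URL should never run across a line break.

```python
import re

# Hypothetical sample mirroring the .org structure from the question.
text = (
    "*\n"
    "http://en.wikipedia.org/wiki/Qibla Qibla - Wikipedia, the free\n"
)

# Python groups are written ( ... ) rather than Emacs's \( ... \).
# [^ \n]+ stops at a space or a newline, so the URL stays on one line.
# re.MULTILINE makes ^ and $ match at every line, as they do in Emacs.
url_line = re.compile(r"^(http[^ \n]+).*$", re.MULTILINE)

# Keep only the captured URL and drop the rest of the line.
cleaned = url_line.sub(r"\1", text)
print(cleaned)
```

The asterisk lines are left untouched because they do not start with http.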

Related

Remove trailing whitespace at the end of aspx file

I am trying to remove trailing whitespace including \r and \n at the end of aspx files by using Find and Replace using the pattern
\s+(?!.)
trying to replace whitespace followed by nothing with nothing.
The result is that everything will come on the same line.
Why?
I also tried \s+$ with the same result.
You may add a negative lookahead to the end of your current pattern:
(\s+\r?\n)+$(?!.)
This ensures that only whitespace-only lines at the very end of the file are matched.
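As a rough illustration of the same goal in a different regex flavor, the sketch below uses Python's re module rather than the editor's Find and Replace. It swaps in \Z (end of string) for the $(?!.) combination above, so it is a related technique and not the exact pattern from the answer. The function name and sample page are hypothetical.

```python
import re

def strip_trailing_whitespace(text: str) -> str:
    # \Z matches only at the very end of the string, so line breaks
    # inside the file are left alone and only the tail is removed.
    return re.sub(r"\s+\Z", "", text)

# Hypothetical file content with CRLF line endings and blank lines at the end.
page = "<html>\r\n<body></body>\r\n</html>\r\n   \r\n\r\n"
print(repr(strip_trailing_whitespace(page)))
```

Anchoring to the end of the whole input, not to the end of each line, is what keeps the remaining lines from being joined together.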

Is there regex to remove space and newline from xml input file

I would like to change an xml which is in format
<input>My
Input</input>
<input2>My
input2</input2>
to
<input>My Input</input>
<input2>My input2</input2>
The input XML file has more than 10000 records in the above format, which prevents the software from working properly.
I need a regex to fix it in one stroke.
I tried ('//n','') but it is not functioning as expected.
If your regex flavor supports Lookbehinds, you may use something like this:
(?<!>)(\s)*[\r\n]+
..and replace with \1.
This will match any number of new-line characters, preceded by zero or more other whitespace characters and not preceded by the > character. Then, it will replace them with a whitespace character (if present) or nothing.
If Lookbehind is not supported, you may use:
([^>])(\s)*[\r\n]+
..and replace with \1\2.
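A minimal sketch of the lookbehind idea in Python, under a few assumptions: the sample data is the snippet from the question, line endings are plain LF, and the replacement is always a single space (a small variation on the \1 replacement above, chosen so that "My" and "Input" end up separated by a space even when no whitespace precedes the line break).

```python
import re

# Sample input taken from the question, assuming LF line endings.
xml = "<input>My\nInput</input>\n<input2>My\ninput2</input2>\n"

# A line break (plus any whitespace just before it) that does not directly
# follow '>' is treated as a break inside a value and collapsed to one space.
# Line breaks right after a closing tag are preceded by '>' and are kept.
inside_value_break = re.compile(r"(?<!>)\s*[\r\n]+")

fixed = inside_value_break.sub(" ", xml)
print(fixed)
```

With CRLF line endings the character before the \n is \r, not >, so the lookbehind would need adjusting; this sketch does not handle that case.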

regex match file with multiple extension

I have several strings like this
XYZ_TEST_2017.txt
ASD_TEST_2017.txt.tmp
I need to extract only those strings ending with .txt
So I'm using this regex:
[A-Z]{3}_TEST_[0-9]{4}.txt
However I still get the strings with multiple extensions like the second one (.txt.tmp)
How can I handle it?
To have your regex match everything up to the end, append an "end-of-text marker" ($) to your pattern like this:
[A-Z]{3}_TEST_[0-9]{4}\.txt$
As you may have noticed, I also escaped the dot, otherwise this filename would match as well:
SOM_TEST_1234Etxt
The dot (.) would match any character (depending on your flags, even newline and carriage return), in this case, the E before txt.
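The difference between the two patterns can be checked with a short Python sketch. The list of names combines the two strings from the question with the counterexample above; the variable names are only illustrative.

```python
import re

names = ["XYZ_TEST_2017.txt", "ASD_TEST_2017.txt.tmp", "SOM_TEST_1234Etxt"]

# Original pattern: unescaped dot, no end anchor.
unanchored = re.compile(r"[A-Z]{3}_TEST_[0-9]{4}.txt")
# Corrected pattern: literal dot and an end-of-text anchor.
anchored = re.compile(r"[A-Z]{3}_TEST_[0-9]{4}\.txt$")

loose = [n for n in names if unanchored.search(n)]
strict = [n for n in names if anchored.search(n)]

print(loose)   # every name matches the original pattern
print(strict)  # only the name that really ends in .txt
```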

Regular expression matching space but at the end of line

I'm trying to replace multiple spaces with a single one, but not at the start or end of the line.
Example:
___abc___def__
___ghi___jkl__
should turn to
___abc_def__
___ghi_jkl__
Note that I've replaced space with underscore
A simple search using the following pattern:
([^\s])\s+
matches the space at the end of the first line up to the space at the beginning of the next one.
So, if I replace with \1_, I get the following:
___abc_def_ghi_jkl
And that is absolutely not what I expect and regex engines, e.g., PowerGREP or the one in Visual Studio, don't behave that way.
If you want to match only horizontal spaces, use \h:
Find what: (?<=\S)\h+(?=\S)
Replace with: (a space)
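Python's re module does not support \h, so the sketch below substitutes the character class [ \t] for it; that substitution is an assumption about what "horizontal space" should cover here. The sample text is the example from the question with real spaces in place of the underscores.

```python
import re

# The question's example, with spaces where the question shows underscores.
text = "   abc   def  \n   ghi   jkl  "

# A run of spaces or tabs counts only when it has a non-whitespace character
# on both sides, so leading and trailing runs and line breaks are untouched.
inner_gap = re.compile(r"(?<=\S)[ \t]+(?=\S)")

collapsed = inner_gap.sub(" ", text)
print(repr(collapsed))
```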
There are several possible interpretations of the question. For each of them the replacement will be a single space character.
If "spaces" is plural and means space characters but not tabs, then use a find string of (^ {2,})|( {2,}$).
If "spaces" is plural and should include tabs, then use a find string of (^[ \t]{2,})|([ \t]{2,}$).
If any leading or trailing spaces and tabs (one or more) are to be replaced with a space, then use a find string of (^[ \t]+)|([ \t]+$).
The general form of each of these is (^...)|(...$). The | means an alternation so either the preceding or the following bracketed expression can match. Hence the find what text can match either at the beginning or the end of a line. The ... varies depending on exactly what needs to be matched. Specifying [ \t] means only the two characters space and tab, whereas \s includes the line-end characters.
Ok, so the intention was to replace this:
Hey diddle diddle, \n<br/>
The Cat and the fiddle,\n
with this:
Hey diddle diddle,\n<br/>
The Cat and the fiddle,\n
A slightly modified version of Toto's answer did the trick:
(?<=\S)\h+(?=\S)|\s+$
This finds any run of horizontal whitespace between non-whitespace characters, as well as trailing whitespace at the end of the line.

regex_replace doesn't replace the hyphen/dash

I'm using regexp_replace in PostgreSQL and trying to strip out any character in a string that is not a letter or number. However, using this regex:
select * from regexp_replace('blink-182', '[^a-zA-Z0-9]*$', '')
returns 'blink-182'. The hyphen is not being removed and replaced with nothing ('') as I would expect.
How do I modify this regex to also replace the hyphen? I've tested with many other characters (!,.#) and they are all replaced correctly.
Any ideas?
You currently replace a run of non-alphanumeric characters at the end of the string only. I guess your tests were mainly strings of the form foobar!# which worked because the characters to remove were at the end of the string.
To replace every occurrence of such a character in the string remove the $ from the regex:
[^a-zA-Z0-9]+
(I also changed the * into a + to prevent zero-length replacements between every character.)
If you want to retain whitespace as well you need to add it to the character class:
[^a-zA-Z0-9 ]+
or possibly
[^a-zA-Z0-9\s]+
If the regex in the beginning was in fact correct in that you only want to remove non-alphanumeric characters from the end of the string but you also want to remove hyphen-minus in the middle of a string (while retaining other non-alphanumeric characters in the middle of the string), then the following should work:
[^a-zA-Z0-9]+$|-
maniek points out that you need to add an argument to regexp_replace so it will replace more than one match:
regexp_replace('blink-182', '[^a-zA-Z0-9]+$|-', '', 'g')
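The behavior of the patterns can be explored without a database using Python's re.sub, as a rough stand-in for regexp_replace. One difference to keep in mind: re.sub replaces every match by default, so count=1 is used below to imitate a PostgreSQL call without the 'g' flag. The second sample string, 'blink-182!#', is made up for illustration.

```python
import re

name = "blink-182"

# Anchored pattern from the question: only a trailing run can match,
# so the hyphen in the middle is never touched.
trailing_only = re.sub(r"[^a-zA-Z0-9]*$", "", name)

# Anchor removed: every non-alphanumeric run is deleted.
everywhere = re.sub(r"[^a-zA-Z0-9]+", "", name)

# Combined pattern; count=1 imitates regexp_replace without the 'g' flag,
# which stops after the first match.
pattern = r"[^a-zA-Z0-9]+$|-"
first_only = re.sub(pattern, "", "blink-182!#", count=1)
all_matches = re.sub(pattern, "", "blink-182!#")

print(trailing_only, everywhere, first_only, all_matches)
```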