How do I replace the spaces between words in Emacs? - regex

I have a file consisting of words:
ndkjsank ndjksandknsakj
dnsjakndjskndjknsakjn dsajkndksnakjndsnajkndjkas
dsnjak a
And I would like to replace the (sometimes multiple) spaces between them with a single tab:
dsnak\tndsjka
njdkas\tndksjankda
njdsaibdusai\tnkdsnakjdnas
Is this possible using a regular expression in Emacs? I thought I might get away with using a rectangular selection area but the words are of varying length (and the file is far too long to do it manually).
EDIT:
This comes close but it also selects the spaces/newlines/tabs to the right of the second word:
\s-

The regexp you want is " +" (a space followed by a plus sign). Call M-x replace-regexp, then replace " +" with \t. Note the space before the + sign.
Also, to produce the tab you might have to type C-q C-i. I'm not sure it accepts the \t syntax when called interactively.
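To check what " +" does to the asker's data, here is a small sketch in Python's re module. Python is an assumption on my part and not part of the answer, but for a pattern this simple Emacs and Python regex syntax agree. The sample is the asker's data with some extra spaces added.

```python
import re

def spaces_to_tab(text: str) -> str:
    # " +" is a literal space followed by "+": one or more spaces in a row.
    # Unlike \s-, it never matches newlines or existing tabs.
    return re.sub(r" +", "\t", text)

sample = "ndkjsank ndjksandknsakj\ndnsjakndjskndjknsakjn    dsajkndksnakjndsnajkndjkas\ndsnjak a"
print(spaces_to_tab(sample))
```

Trailing spaces at the end of a line would also become a tab. If that matters, strip them first.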

Related

Regex NotePad++ or batch script to find and replace double bracketed text with CR LF -- would prefer NP++

I managed to do most of my conversion (Word > txt) with a VBA macro, but it also made some changes that I could not avoid or work around. Unfortunately, I had not been in the habit of using styles and precise formatting in my docs... (Which is why a Pandoc conversion did not "pan" out well, if you'll excuse the pun.)
In my docs, I used bold text/lines for in-text titles (not Heading 2, alas). So when I converted one- or two-word bold phrases in mid-sentence into phrases between double square brackets, the makeshift titles/headings were also changed to [[some title]] format in the process.
With Find and Replace (a batch script that goes through all files in a folder would also do), I would like to find any number of instances of CRLF [[some title CRLF]]CRLF and replace the brackets with ** (to make the title bold), or perhaps ## to recreate the headings I was missing in MS Word (I would of course need to keep the line breaks as well).
For better understanding, please see attached picture here:
I am fairly sure that all instances follow the same pattern. If not, I may be able to tailor your regex to the other cases later on.
As you can see, I was trying to do it in two steps, but that doesn't work. The second step (which I couldn't even get right) would probably have altered other text I need intact (there must be sentences that start with double brackets after a CRLF).
I need the two steps combined into one, so that only the targeted double-bracketed text is changed to bold or Heading 2.
Basically, what I could not do is find the proper regex for double-CRLF-ed, square-bracketed text that starts with a capital letter and may contain any number of words, possibly spanning more than one line. I need an empty line above and below the title, as indicated in the image (the VBA macro somehow produced two CRLFs and carried the closing brackets to a new line, which I do not like either).
EDIT.
In the meantime I managed to cook something up, but I couldn't insert the CRLF in front of the matched string. It isn't enough yet, because other instances are also changed, even lowercase in-line items, for some reason...
Regex:
\[\[([A-Z][\S\s]+?)\]\]
Substitution:
## $1\r\n
https://regex101.com/r/mH6B9N/1
Since then, I have made progress toward what I wanted (I had to test in Notepad++ rather than Regex101, because they gave different results). However, in several documents I have now found matches that spill over across lines, as described here:
Single line regex search in Notepad++
Is it possible that what I want cannot be done? The problem is that some non-title text strings also contain line breaks, double brackets and capital letters.
What it looks like in other documents:
See here.
I circled the problem areas in red in the image for clarification. See also:
https://regex101.com/r/8XsIGx/1
Is it possible to match a certain word, such as "címnél", and skip the replacement if that word is present in the line?
Thanks very much in advance,
F.
You can use
(?s)\R\K\[\[((?:(?!\[\[|]]).)*)\R*]](?=\R)
Replace with ## $1. See the regex demo.
Details:
(?s) - equivalent of the ". matches newline" option
\R - a line break sequence
\K - omit the text matched so far (the line break)
\[\[ - a [[ text
((?:(?!\[\[|]]).)*) - Group 1: any character, as many times as possible, that does not start a [[ or ]] sequence
\R* - zero or more line breaks
]] - a ]] text
(?=\R) - immediately to the right, there must be a line break.
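To experiment with this pattern outside Notepad++, here is a sketch that ports it to Python's re module. This is an assumption, not part of the answer. Python's re has no \R or \K, so this hypothetical port spells \R out as (?:\r\n|\n|\r). Instead of dropping the leading line break with \K, it captures the break and puts it back in the replacement.

```python
import re

# Hypothetical port: \R written out, \K replaced by capturing the line break.
TITLE = re.compile(
    r"""
    (\r\n|\n|\r)             # \R: line break before the title (group 1, put back below)
    \[\[                     # literal [[
    ((?:(?!\[\[|\]\]).)*)    # group 2: chars that never start a [[ or ]] sequence
    (?:\r\n|\n|\r)*          # \R*: optional line breaks before the closing ]]
    \]\]                     # literal ]]
    (?=\r\n|\n|\r)           # (?=\R): a line break must follow
    """,
    re.DOTALL | re.VERBOSE,  # re.DOTALL plays the role of (?s)
)

def titles_to_headings(text: str) -> str:
    # Same effect as replacing with "## $1" while keeping the line break \K skipped
    return TITLE.sub(r"\1## \2", text)

sample = "Intro text.\n[[Some Title]]\nBody with [[inline]] link.\n[[Starts a line]] but continues.\n"
print(titles_to_headings(sample))
```

Inline [[...]] text and bracketed text that is followed by more words on the same line are left alone. When the closing ]] sits on its own line, the greedy group 2 (under (?s)) also takes the line break before it. The heading is then followed by an empty line.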

Replace several lines

In Notepad++, I would like to (find in files and) replace several lines, but I can't paste several lines into the "Replace with:" text box.
I select the lines in an opened file, press Ctrl+Shift+F, and I get the lines copied into the "Find what:" text-box. "Extended" is pre-selected.
The problem is that when I try to copy the lines from the "Find what:" text box to the "Replace with:" text box, only the first line is copied. If the lines appeared with \r\n, that would solve the problem, but they don't. (Once I did get the lines to appear in "Find what:" with the \r\n, but I don't know what caused it.)
The Find what and Replace with fields take a single line each. They can match or generate, respectively, multiple lines if you type the four characters \r\n wherever a line break is needed. This requires the Extended or Regular expression search mode to be selected.
To generate four lines of text, the Replace with field could be set to these 31 characters: One\r\nTwo\r\nThree\r\nFour\r\n
Notepad++ has three search modes that control how the characters within the Find what and Replace with fields are interpreted. See the Searching ... Normal search part of the Notepad++ help pages for more details.
Normal means that every character, including \, is handled literally. That means a Find what string such as s\t looks for the three characters s, \ and t in that order.
Extended means that \ characters are used to indicate special characters such as newline and tab characters, etc. That means a Find what string such as s\t looks for the two characters s and TAB in that order. To look for the three characters s, \ and t in that order, the Find what string needs to be s\\t.
Regular expression means that several other characters are interpreted not as themselves but specially, as parts of a regular expression.
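To make the Extended-mode rules concrete, here is a toy model written in Python. The function expand_extended is hypothetical and deliberately simplified. It translates only \r, \n, \t and \\, while Notepad++'s real Extended mode supports more escapes. It shows how the 31 typed characters turn into four lines, and how s\t differs from s\\t.

```python
def expand_extended(field: str) -> str:
    # Hypothetical, simplified model of Extended mode: only \r, \n, \t and \\
    # are translated; any other backslash pair is left unchanged here.
    escapes = {"r": "\r", "n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(field):
        if field[i] == "\\" and i + 1 < len(field) and field[i + 1] in escapes:
            out.append(escapes[field[i + 1]])  # two typed characters -> one real one
            i += 2
        else:
            out.append(field[i])  # ordinary character, copied as-is
            i += 1
    return "".join(out)

replace_field = r"One\r\nTwo\r\nThree\r\nFour\r\n"
print(len(replace_field))                           # 31
print(expand_extended(replace_field).splitlines())  # ['One', 'Two', 'Three', 'Four']
```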
A cumbersome workaround:
open a file with the multiple lines you want to copy (but can't) into the Find/Replace box.
replace (using "Extended") all \r\n with \\r\\n.
you can now copy the multiple lines as one line (with the new \r\n as text) into the search box.
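The replace step of this workaround can be sketched in Python. This is an illustration only: block_to_extended is a hypothetical helper, and str.replace stands in for the Extended-mode replace. Every CRLF becomes the four literal characters \r\n, so the block fits on one line that can be pasted into the search box.

```python
def block_to_extended(block: str) -> str:
    # Same effect as an Extended-mode replace of \r\n with \\r\\n:
    # every CRLF becomes the four literal characters \ r \ n
    return block.replace("\r\n", "\\r\\n")

block = "first line\r\nsecond line\r\nthird line"
print(block_to_extended(block))  # first line\r\nsecond line\r\nthird line
```

This only helps if the file uses Windows (CRLF) line endings. In an LF-only file, there is no \r\n to find, so search for \n instead.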

NOTEPAD ++ List: How to put each word on new line

If this can be done with Notepad++, I'm sure it's something simple I'm overlooking. Or if there is another way, I'm all ears.
I have a list of 10,000 to 20,000 words. Each entry is a single word: no word contains a space, and there is a single space between each pair of words.
All the words are on one line that wraps around. I would like to put each word on its own line, all the way down my txt file, because I need to add something to the front and back of each word. That part I can do, but I do not have the 24 hours it's going to take to split out each word manually. Any ideas? Thanks!
Use the Replace function: search for a space and replace it with \n. Remember to select the Extended option.
I tried the Extended option but it didn't work for me, so I tried Regular expression, and that worked. Here is the step-by-step process:
First press Ctrl+H on Windows (Find & Replace).
In find section type: [ ]+ (there is single space between the brackets)
In the replace section type: \n
Select the Regular Expression option.
Finally, click Replace All.
It will put each word on its own line.
Hope it will work for you as well!
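Here is a small sketch of the same replace using Python's re module, as an assumption for illustration. The pattern "[ ]+" means the same thing there as in Notepad++'s Regular expression mode: one or more spaces.

```python
import re

def one_word_per_line(text: str) -> str:
    # "[ ]+" = one or more spaces (the class holds a single space);
    # the replacement "\n" is a line feed, like \n in the Replace field
    return re.sub(r"[ ]+", "\n", text)

print(one_word_per_line("alpha beta  gamma delta"))
```

The replacement \n inserts a bare line feed. If the rest of the file uses Windows line endings, \r\n in the Replace field keeps them consistent.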

Remove everything before and after variable=int

I'm terrible at regex. I need to remove everything from a large portion of text except a certain variable declaration that occurs numerous times: I'd like to keep only the instances of mc_gross=anyint.
Generally we'd need to use "negative lookarounds" to find everything but a specified string. But these are fairly inefficient (although that's probably of little concern to you in this instance), and lookaround is not supported by all regex engines (not sure about notepad++, and even then probably depends on the version you're using).
If you're interested in learning about that approach, refer to How to negate specific word in regex?
But regardless, since you are using notepad++, I'd recommend selecting your target, then inverting the selection.
The following pattern selects each instance, allowing optional whitespace on either side of the '=' sign:
mc_gross\s*=\s*\d+
The following answer over on super user explains how to use bookmarks in notepad++ to achieve the "inverse selection":
https://superuser.com/questions/290247/how-to-delete-all-line-except-lines-containing-a-word-i-need
Substitute the regex they're using over there, with the one above.
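To see what this pattern selects before you invert the selection, here is a quick sketch with Python's findall. Python is an assumption for illustration; the bookmark-based inversion itself happens in Notepad++.

```python
import re

MC_GROSS = re.compile(r"mc_gross\s*=\s*\d+")

text = "item=3&mc_gross=25&tax=2\nnothing here\ncost: mc_gross = 7 (approx)"
print(MC_GROSS.findall(text))  # ['mc_gross=25', 'mc_gross = 7']
```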
You could do a regular expression replace of ^.*\b(mc_gross\s*=\s*\d+)\b.*$ with \1. That will remove everything other than the wanted text on each line. Note that on lines where the wanted text occurs two or more times, only one occurrence will be retained (the last one, because the leading .* is greedy). In the search, the ^.*\b matches from start-of-line to a word boundary before the wanted text; the \b.*$ matches everything from a word boundary after the wanted text until end of line; the round brackets capture the wanted text for the replacement text. If text such as abcmc_gross=13def should be matched and retained as mc_gross=13, then delete both \b anchors from the search.
To remove unwanted lines do a regular expression search for ^mc_gross\s*=\s*\d+$ from the Mark tab, tick Bookmark line and click Mark all. Then use Menu => Search => Bookmark => Remove unmarked lines.
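The two steps above can be sketched in Python, assuming Python's re behaves like Notepad++ for these patterns. keep_mc_gross is a hypothetical helper, and the final line filter stands in for Mark all followed by Remove unmarked lines.

```python
import re

def keep_mc_gross(text: str) -> str:
    # Step 1: ^.*\b(mc_gross\s*=\s*\d+)\b.*$ -> \1, line by line (re.MULTILINE).
    # The greedy ^.* means the LAST occurrence on a line is the one kept.
    stripped = re.sub(r"^.*\b(mc_gross\s*=\s*\d+)\b.*$", r"\1", text, flags=re.MULTILINE)
    # Step 2: keep only lines that are now exactly the wanted text; this stands in
    # for Mark all (Bookmark line) followed by Remove unmarked lines.
    wanted = re.compile(r"mc_gross\s*=\s*\d+")
    return "\n".join(line for line in stripped.splitlines() if wanted.fullmatch(line))

print(keep_mc_gross("a=1 mc_gross=10 b=2\nnothing\nx mc_gross=3 y mc_gross=4 z\n"))
```

In the last input line, mc_gross=3 is dropped and mc_gross=4 survives. This matches the caveat about lines with several occurrences.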
Find what: [\s\S]*?(mc_gross=\d+|\Z)
Replace with: \1
Position the cursor at the start of the text then Replace All.
Add word boundaries \b around mc_gross=\d+ if you think it's necessary.
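Here is a sketch of this whole-file replace using Python's re module, as an assumption for illustration. Python's \Z matches only at the very end of the string. Because the replacement is just \1, the kept values end up run together with nothing between them.

```python
import re

def keep_only_values(text: str) -> str:
    # Lazily skip anything (including line breaks, via [\s\S]) up to the next
    # mc_gross=<digits>, or up to the end of the text (\Z), and keep only group 1.
    return re.sub(r"[\s\S]*?(mc_gross=\d+|\Z)", r"\1", text)

print(keep_only_values("foo mc_gross=12 bar\nbaz mc_gross=7 end"))  # mc_gross=12mc_gross=7
```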

How to replace line-breaks with commas using grep in TextWrangler?

I have a text file containing a number of lines, which I need to turn into a CSV. What is the easiest way to replace all the line breaks with ", "? I have TextWrangler and read that it can do this using grep and regular expressions, but I have very little experience with regular expressions and don't know how grep works. Can anyone help me get started?
Choose Find from the Search menu. TextWrangler opens the Find window.
Select the "Grep" checkbox
Type the string you are looking for ("\n" or "\r\n" or "\r") in the Find textfield.
Type the replace string (", ") in the Replace text field.
Click "Replace All"
See chapters 7 and 8 of the TextWrangler User Manual if you have problems.
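The same replace can be sketched with Python's re module, as an assumption for illustration; this is not how TextWrangler works internally. lines_to_csv_row is a hypothetical helper. Listing \r\n first in the alternation makes a Windows line break produce one ", " rather than two.

```python
import re

def lines_to_csv_row(text: str) -> str:
    # Drop trailing line break(s) first so the result doesn't end with ", "
    body = re.sub(r"(?:\r\n|\r|\n)+\Z", "", text)
    # \r\n is tried before \r and \n, so a CRLF pair counts as one break
    return re.sub(r"\r\n|\r|\n", ", ", body)

print(lines_to_csv_row("alpha\nbeta\r\ngamma\rdelta\n"))  # alpha, beta, gamma, delta
```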
Alternatively, you can do it with only two pieces of software (Excel and Notepad++, which is also free and AWESOME):
Take your list (I assume it's one per line, in a column, for example):
Remove any empty cells
Copy the addresses, and Paste Special > Transpose (this puts them into a single row, going across from column A).
Copy the row into Notepad++. You'll notice it pastes them as one long line, losing that irritating table structure.
Find the shortest clear space (but use the entire space) between two email addresses, and replace all with a ","
Then find all remaining spaces (I go with one space at a time), and replace them with nothing (i.e. the 'Replace with' box is empty).
Et voila!
and you know you can also save stuff as comma-separated values, right?