Why is this vim regex so expensive: s/\n/\\n/g - regex

Attempting this on a sufficiently large file (say 80,000+ lines and about 500k+) will crash things or stall eventually both on my server and on my local Mac.
I've tried this at the command line as well, with the same result:
vim -es -c '%s/\n/\\n/g' -c wq $file
Also, the problem appears to be with the selection (\n) and not the replacement (\\n).
For my larger files I can of course split them and cat them back when finished, but the split points cannot be arbitrary in my case and must be adjusted manually for each and every split.
I appreciate that there are other ways to do this -- sed, etc. -- but I have similar and additional problems there, and I would like to be able to do this with vim.

I'm adding my comment as an answer:
Text editors usually don't like 'gigantic' lines (which is what you'll get with that replacement).
To test that if this is is due because of the 'big line' and not the substitution itself I did this test:
I created a simple ~500KB file with a script. No new line characters, just a single line. Then I tried to load the file with vim. Result? I had to kill it :-).
However, if on the same script I write some new lines every now and then, I have no problems opening the file.
Also, one thing you could try is the following: on vim, replace \n by \n\n if it is fast, then this should also confirm the 'big line' issue.

Related

bulk file renaming in bash, to remove name with spaces, leaving trailing digits

Can a bash/shell expert help me in this? Each time I use PDF to split large pdf file (say its name is X.pdf) into separate pages, where each page is one pdf file, it creates files with this pattern
"X 1.pdf"
"X 2.pdf"
"X 3.pdf" etc...
The file name "X" above is the original file name, which can be anything. It then adds one space after the name, then the page number. Page numbers always start from 1 and up to how many pages. There is no option in adobe PDF to change this.
I need to run a shell command to simply remove/strip out all the "X " part, and just leave the digits, like this
1.pdf
2.pdf
3.pdf
....
100.pdf ...etc..
Not being good in pattern matching, not sure what regular expression I need.
I know I need something like
for i in *.pdf; do mv "$i$" ........; done
And it is the ....... part I do not know how to do.
This only needs to run on Linux/Unix system.
Use sed..
for i in *.pdf; do mv "$i" $(sed 's/.*[[:blank:]]//' <<< "$i"); done
And it would be simple through rename
rename 's/.*\s//' *.pdf
You can remove everything up to (including) the last space in the variable with this:
${i##* }
That's "star space" after the double hash, meaning "anything followed by space". ${i#* } would remove up to the first space.
So run this to check:
for i in *.pdf; do echo mv -i -- "$i" "${i##* }" ; done
and remove the echo if it looks good. The -i suggested by Gordon Davisson will prompt you before overwriting, and -- signifies end of options, which prevents things from blowing up if you ever have filenames starting with -.
If you just want to do bulk renaming of files (or directories) and don't mind using external tools, then here's mine: rnm
The command to do what you want would be:
rnm -rs '/.*\s//' *.pdf
.*\s selects the part before (and with) the last white space and replaces it with empty string.
Note:
It doesn't overwrite any existing files (throws warning if it finds an existing file with the target name).
And this operation is failsafe. You can get back the changes made by last rnm command with rnm -u.
Here's a list of documents for rnm.

Sed isn't replacing all occurrences of string in file

EDIT: I am using Cygwin. I am unsure whether this is of relevance and it was a detail I missed during writing this question.
EDIT2: Have tried replacing the "TAB" char people pointed out with the RegEx \s which covers spacing chars (spaces and tabs primarily) and this did not affect the expression at all, meaning that it is not the tabs causing the issue, especially since the expression runs once without errors anyway.
So far this script has been causing me a ton of trouble.
I DID have an issue before but I resolved that while I was writing a question here (lucky imo) but this one I've been stuck on for at least an hour now and I've tried varying solutions, none of which actually work or told me something I didn't already try.
I have a rather cool seeming FTP log fetcher script and part of this script replaces the 600MB of errors in this logfile to nothing, essentially removing them. Unfortunately this script also gets rid of parts of other errors too, so I've had to edit it. This is where I'm getting stuck.
Through base research I managed to find out that sed could do what I want, and through three hours of playing so far it does most of what I tell it to, minus one thing. One, and ONLY one, of the sed statements I have built only replaces the first instance of the string I've given it despite having the g modifier attached to the end.
I am working with a test script right now as to avoid potential permanent damage to my original FTP script, and the test script copies over an example file with a few of the errors I need replacing.
Walkthrough of the scripts INTENDED behaviour before showing:
1. Sets a prefix which happens on ALL lines in the file, pretty important part of the script.
2. Copies the example file to a file named test2.log
3. Replace all instances of the UNIX newline char \n with [loll] (first thing that came to my mind)
4. Remove all instances of battle error type 1 and 2.
5. Return all [loll] strings with the UNIX \n for newlines, therefore returning the logfile to its original state minus the errors.
Script:
#DTP="\[([0-9]+-[0-9]+-[0-9]+-[0-9]+|latest)\.log\] \[[0-9]+:[0-9]+:[0-9]+\] \[Server thread/(INFO|WARN)\]: "
echo "${DTP}"
DTP1="\[[0-9]*:[0-9]*:[0-9]*\]\s\[Server\sThread\/\(WARN\|INFO\)\]:\s"
DTP="\[loll\]\[[0-9]*:[0-9]*:[0-9]*\]\s\[Server\sThread\/\(WARN\|INFO\)\]:\s"
echo "${DTP}"
echo "1"
cp test.log test2.log
#cat test.log >test2.log
sed -i ':a;N;$!ba;s/\n/\[loll\]/g' test2.log #| egrep -i "" >test2.log
sed -i 's/'${DTP1}'Caught error in battle. Continuing...'${DTP}'java.lang.NullPointerException'${DTP}' at com.pixelmonmod.pixelmon.battles.controller.participants.PixelmonWrapper.useAttack(PixelmonWrapper.java:173)'${DTP}' at com.pixelmonmod.pixelmon.battles.controller.participants.PixelmonWrapper.takeTurn(PixelmonWrapper.java:330)'${DTP}' at com.pixelmonmod.pixelmon.battles.controller.BattleControllerBase.takeTurn(BattleControllerBase.java:276)'${DTP}' at com.pixelmonmod.pixelmon.battles.controller.BattleControllerBase.update(BattleControllerBase.java:157)'${DTP}' at com.pixelmonmod.pixelmon.battles.BattleRegistry.updateBattles(BattleRegistry.java:63)'${DTP}' at com.pixelmonmod.pixelmon.battles.BattleTickHandler.tickStart(BattleTickHandler.java:12)'${DTP}' at cpw.mods.fml.common.eventhandler.ASMEventHandler_20_BattleTickHandler_tickStart_WorldTickEvent.invoke(.dynamic)'${DTP}' at cpw.mods.fml.common.eventhandler.ASMEventHandler.invoke(ASMEventHandler.java:51)'${DTP}' at cpw.mods.fml.common.eventhandler.EventBus.post(EventBus.java:122)'${DTP}' at cpw.mods.fml.common.FMLCommonHandler.onPostWorldTick(FMLCommonHandler.java:255)'${DTP}' at net.minecraft.server.MinecraftServer.func_71190_q(MinecraftServer.java:929)'${DTP}' at net.minecraft.server.dedicated.DedicatedServer.func_71190_q(DedicatedServer.java:429)'${DTP}' at net.minecraft.server.MinecraftServer.func_71217_p(MinecraftServer.java:776)'${DTP}' at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:639)'${DTP}' at java.lang.Thread.run(Thread.java:745)//gI' test2.log
echo "2"
sed -i 's/'${DTP1}'Caught error in battle. Continuing...'${DTP}'java.lang.NullPointerException\[loll\]//gI' test2.log
echo "3"
sed -i 's/\[loll\]/\n/g' test2.log
I've set them to also run case insensitive checks on the provided strings as sometimes I write with all lower case, however for most of this I copied and pasted it directly.
Sample input:
http://pastebin.com/3KPB33X2
Outputs:
Expected:
meow
Test message
WOOF MEOWLOL
Actual: http://pastebin.com/pnvDwkxz
It's been killing my mind for a while now because I had this issue even before the other one, except I barely noticed it. I can't find any predictable behaviour in the script, and as far as I am aware it SHOULD be working perfectly fine and giving me the output I expect.
Any help would be appreciated, because as soon as I can get this bug sorted out I'll be able to enter in the rest of the script and replace this with the existing battle-error replacement script in my log-fetcher.
Knowing me it's something small and stupid but I've tried literally everything I came across, including adding the :a;N;$!ba; to the start of the bit which isn't working properly (and realising that failed horribly).
Thanks.
~BAI1
Are you looking for something like this:
sed -n ':a;s/\[.*Server thread\/\(INFO\|WARN\).*//i;/^$/!p;n;b a' battle.log

Windows Batch File - Find and return string inside matching pattern

I'm using a batch file to identify and load fonts temporarily. It looks for strings like /FontFamily(Rubber Dinghy Rapids)/ occurring inside .ai and .pdf files.
Now if I do findstr /r FontFamily\(.*\) MyFile.ai, this command returns a hugely interminable line of crap data with FontFamily(Rubber Dinghy Rapids) lost somewhere in there. I ACTUALLY need it to return the value of .* it found inside - in this case Rubber Dinghy Rapids.
Can I do this more elegantly? Or maybe I can switch to using VBScript if it's more elegant there?
My current solution is slow as hell... nested for loops, with one of them delimiting the crap data by the ( character, then finding the line that says FontFamily(Rubber Dinghy Rapids then stripping out the FontFamily( string, leaving me finally with Rubber Dinghy Rapids.
I wrote an hybrid Batch-JScript program called FindRepl.bat that use JScript's regular expressions to search for strings in a file. Using my program you may solve your problem this way:
FindRepl.bat "FontFamily\((.*)\)" /$:1 < input.txt
You may get my program from this site.

Can Notepad++ save out search results to a text file?

I need to do quite a few regular expression search/replaces throughout hundreds and hundreds of static files. I'm looking to build an audit trail so I at least know what files were touched by what searches/replaces.
I can do my regular expression searches in Notepad++ and it gives me file names/paths and number of hits in each file. It also gives me the line #s which I don't really care that much about.
What I really want is a separate text file of the file names/paths. The # of hits in each file would be a nice addition, but really it's just a list of file names/paths that I'm after.
In Notepad++'s search results pane, I can do a right click and copy, but that includes all the line #s and code which is just too much noise, especially when you're getting hundreds of matches.
Anyone know how I can get these results to just the file name/paths? I'm after something like:
/about/foo.html
/about/bar.html
/faq/2012/awesome.html
/faq/2013/awesomer.html
/foo/bar/baz/wee.html
etc.
Then I can name that file regex_whatever_search.txt and at the top of it include the regex used for the search and replace. Below that, I've got my list of files it touched.
UPDATE What looks like the easiest thing to do (at least that I've found) is to just copy all the search results into a new text file and run the following regex:
^\tLine.+$
And replace that with an empty string. That'll give you just the file path and hit counts with a lot of empty space between each entry. Then run the following regex:
\s+\n
And replace with:
\n
That'll strip out all the unwanted empty space and you'll be left with a nice list.
maybe you need power of unix tools
assume you have GNUWin32 installed in c:\tools\gnuwin32
than if you have replace.bat file with that content:
#echo off
set BIN=c:\tools\gnuwin32\bin
set WHAT=%1
set TOWHAT=%2
set MASK=%3
rem Removing quotes
SET WHAT=###%WHAT%###
SET WHAT=%WHAT:"###=%
SET WHAT=%WHAT:###"=%
SET WHAT=%WHAT:###=%
SET TOWHAT=###%TOWHAT%###
SET TOWHAT=%TOWHAT:"###=%
SET TOWHAT=%TOWHAT:###"=%
SET TOWHAT=%TOWHAT:###=%
SET MASK=###%MASK%###
SET MASK=%MASK:"###=%
SET MASK=%MASK:###"=%
SET MASK=%MASK:###=%
echo %WHAT% replaces to %TOWHAT%
rem printing matching files
%BIN%\grep -r -c "%WHAT%" %MASK%
rem actual replace
%BIN%\find %MASK% -type f -exec %BIN%\sed -i "s/%WHAT%/%TOWHAT%/g" {} +
you can do regex replace in masked files recursively with output you required
replace "using System.Windows" "using Nothing" *.cs
The regulat expression I use for this kind of problem is
^\tLine.[0-9]*:.
And it works for me
This works well if you have Excel available and want to avoid using regular expressions:
Ctrl+A to select all the results
drag & drop the selected results to Excel
Create a Filter on the 1st row
Filter out the lines that have "(Blank)" on the 1st column
Select the remaining lines (i.e. the lines with the filenames) and copy/paste them to another sheet or any wanted destination
You could also Ctrl+A, Ctrl+C the search results, then use the Paste Option "Use Text Import Wizard" in Excel, say that the data is "Fixed width" and put one single break line after the 2nd character (to remove the two leading spaces in the filename during import), and use a filter to filter out the unwanted rows.

Compounding switch regexes in Vim

I'm working on refactoring a bunch of PHP code for an instructor. The first thing I've decided to do is to update all the SQL files to be written in Drupal SQL coding conventions, i.e., to have all-uppercase keywords. I've written a few regular expressions:
:%s/create table/CREATE TABLE/gi
:%s/create database/CREATE DATABASE/gi
:%s/primary key/PRIMARY KEY/gi
:%s/auto_increment/AUTO_INCREMENT/gi
:%s/not null/NOT NULL/gi
Okay, that's a start. Now I just open every SQL file in Vim, run all five regular expressions, and save. This feels like five times the work it should be. Can they be compounded in to one obnoxiously long but easily copy-pastable regex?
why do you have to do it in vim? how about sed/awk?
e.g. with sed
sed -e 's/create table/\U&/g' -e's/not null/\U&/g' -e 's/.../\U&/' *.sql
btw, in vi you may do
:%s/create table/\U&/g
to change case, well save some typing.
update
if you really want a long command to execute in vi, maybe you could try:
:%s/create table\|create database\|foo\|bar\|blah/\U&/g
Open the file containing that substitution commands.
Copy its contents (to the unnamed register, by default):
:%y
If there is only one file where the substitutions should be
performed, open it as usual and run the contents of that register
as a Normal mode command:
:#"
If there are several files to edit automatically, open those
files as arguments:
:args *.sql
Execute the yanked substitutions for each file in the argument list:
:argdo #"|up
(The :update command running after the substitutions, writes
the buffer to file if it has been changed.)
While sed can handle what you want (hovewer it can be interactive as you requestred by flag 'i'), vim still much powerfull. Once I needed to change last argument in some function call in 1M SLOC code base. The arguments could be in one line or in several lines. In vim I achieved it pretty easy.
You can open all php files in vim at once:
vim *.php
After that run in ex mode:
:bufdo! %s/create table/CREATE TABLE/gi
Repeat the rest of commands. At the end save all the files and exit vim:
:xall