Find and replace pattern in large number of files - regex

I want to replace text in about 80.000 log files using a regex. I love the batch search and replace of VSCode. I was unable to do this with VSCode, because it did not seem to handle this amount of data well. Any suggestion how I could do this with VSCode? Are there suggestions for alternatives?

Instead of depending on a GUI-based tool, it might be easier to use a CLI tool for this.
If you're on Linux, or on Windows and willing to install tools like sed and find, it should be relatively simple.
You can use sed, a command-line tool available on all (or at least most) Linux distributions, which can also be installed on Windows.
Usage (for this use case):
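sed -i 's/{pattern}/{replacement}/g' {file}
This uses sed to replace every match of the pattern with the replacement (the g modifier makes the substitution global, so all matches on a line are replaced), and -i overwrites the file in place. Quoting the expression keeps the shell from interpreting characters in the pattern.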
To target all files in a directory you can do:
find . -type f -name "*.log" -exec sed -i 's/{pattern}/{replacement}/g' {} \;
This finds items recursively, starting from the current directory, whose type is file and whose name ends with .log, then runs sed on each matched file to replace the pattern with the replacement you want.
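As a minimal sketch of the find and sed approach above, the following script builds a throwaway directory tree and runs a regex replacement over every .log file in it. It assumes GNU sed (for -i without a backup suffix and for -E); the directory layout, file contents, and the ERROR/WARN pattern are made up for illustration. Ending -exec with + instead of \; passes many files to each sed invocation, which may help with tens of thousands of files.

```shell
#!/bin/sh
# Hypothetical sample tree; replace with your real log directory.
logs=$(mktemp -d)
mkdir -p "$logs/app/old"
printf 'ERROR 12: disk\nINFO ok\n' > "$logs/app/a.log"
printf 'ERROR 7: net\n' > "$logs/app/old/b.log"
printf 'ERROR 1: untouched\n' > "$logs/notes.txt"

# Dry run: list the .log files that contain at least one match.
find "$logs" -type f -name '*.log' -exec grep -lE 'ERROR [0-9]+' {} +

# Real run: -E enables extended regex, \1 reuses the captured number,
# and "+" batches many files into each sed invocation.
find "$logs" -type f -name '*.log' -exec sed -i -E 's/ERROR ([0-9]+)/WARN \1/g' {} +
```

Running the dry run first shows which files would be touched before anything is overwritten.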
You can find how to get tools like sed and find for Windows on the following question:
https://stackoverflow.com/a/127567/6277798

Related

Interactive find-and-replace in all files including those in sub-directories using Vim

I would like to use Vim to find a certain string and replace it with another. For every replacement, it should ask for confirmation, similar to what %s/foo/replace/gc does for a single file in Vim.
What have I tried?
sed: It doesn't do interactive replacements.
One of the comments at this link suggests vim -esnc '%s/foo/bar/g|:wq' file.txt. I tried vim -esnc '%s/foo/bar/gc|:wq' file.txt (using gc instead of g). Now the terminal gets stuck.
Emacs xah-find-replace package. Unfortunately it didn't do interactive replacements as promised in the link.
Combining :argdo with the substitute command would be the recommended way to do this.
You can populate the argument list either by opening all the files (vim *.txt) or manually after starting Vim, using the command:
:args `find . -type f -name '*.txt'`
Now set hidden using the command:
:set hidden
This is required so that you're not prompted to save the file when switching from one buffer to another. See :h hidden for more information.
Now use the substitute command as usual, prefixed with argdo so that it runs for every file in the argument list:
:silent argdo %s/pattern/replace/gec
The silent is optional and just mutes the reporting. The e flag suppresses the "no matches found" error in buffers that contain no match.
After the replacements, you can write the changes using the following command:
:argdo update
This will write only the buffers that were modified.
If you are looking for an interactive mode of replacement, it is easier to do it with vim.
vim -c '%s/PATTERN/REPLACEMENT/gc' -c 'wq' FILENAME
The terminal appears stuck in your case because the save command is chained onto the substitute command, which keeps the interactive prompt from being shown. It is not actually stuck: if you type "yes" and press Enter, it should still give you the expected result.
If multiple files spread across several subdirectories are involved, a for loop over the output of find will help, as shown below (this assumes file names without spaces):
for FILENAME in `find DIRECTORYPATH -type f -name '*.txt'`
do
vim -c '%s/PATTERN/REPLACEMENT/gc' -c 'wq' "$FILENAME"
done
In bash turn on double star to list all files in all subdirectories:
shopt -s globstar
Now start vim once with all files and run the substitute command for all files, then save and exit:
vim -c 'set nomore' -c 'argdo %s/foo/bar/gc' -c xa **/*.txt

How can I use regex with the locate command in Linux

I want to use the locate command with a regex, but I am not able to get it to work.
I want to find the pip file, which is in the /usr folder. I am trying this:
locate -r "/usr/*pip"
To use globbing characters in your query, you shouldn't ask for regex matching (which is what the -r option does), so just do:
locate "/usr/*pip"
From the man page:
If --regex is not specified, PATTERNs can contain globbing characters.
If any PATTERN contains no globbing characters, locate behaves as if
the pattern were *PATTERN*.
I would do it like this: locate -r '/usr/.*pip'
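To see why the original attempt finds nothing, it may help to compare how the same string behaves as a regex and as a glob. The sketch below does not call locate at all (its database may not exist on a given machine); it assumes grep's basic regex matching and the shell's case pattern matching are close enough stand-ins for locate -r and plain locate. The path /usr/bin/pip is just an example.

```shell
#!/bin/sh
path="/usr/bin/pip"   # example path, not taken from a real locate database

# As a regex, "/*" means "zero or more slashes", so "/usr/*pip" needs
# "pip" right after "/usr" and any slashes. It does not match here.
if printf '%s\n' "$path" | grep -q '/usr/*pip'; then
    regex_star=match
else
    regex_star=nomatch
fi

# ".*" is the regex for "any run of characters", so this one matches.
if printf '%s\n' "$path" | grep -q '/usr/.*pip'; then
    regex_dotstar=match
else
    regex_dotstar=nomatch
fi

# As a glob, "*" already means "any run of characters".
case $path in
    /usr/*pip) glob=match ;;
    *)         glob=nomatch ;;
esac

echo "regex /usr/*pip:  $regex_star"
echo "regex /usr/.*pip: $regex_dotstar"
echo "glob  /usr/*pip:  $glob"
```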

Automatically fix filename cases in C++ codebase?

I am porting a C++ codebase which was developed on a Windows platform to Linux/GCC. It seems that the author didn't care for the case of filenames, so he used
#include "somefile.h"
instead of
#include "SomeFile.h"
to include the file which is actually called "SomeFile.h". I was wondering if there is any tool out there to automatically fix these includes? The files are all in one directory, so it would be easy for the tool to find the correct names.
EDIT: Before doing anything, note that I'm assuming you either have copies of the files off to the side or, preferably, a baseline version in source control should you need to roll back for any reason.
You should be able to do this with sed. Something like sed -i 's/somefile\.h/SomeFile.h/I' *.[Ch]
This means: match somefile.h case-insensitively (the trailing /I) and do an in-place (same file) replacement (-i) with the other text, SomeFile.h.
You can even do it in a loop (totally untested):
for file in *.[Ch]
do
sed -i "s/$file/$file/I" *.[Ch]
done
I should note that although I don't believe this applies to you, Solaris sed doesn't support -i and you'd have to install GNU sed or redirect to a file and rename.
Forgive me, I'm away from my Linux environment right now so I can't test this myself, but I can tell you which utilities you would need to use to do it.
Open a terminal and use cd to navigate to the correct directory.
cd ~/project
Get a list of all of the .h files you need. You should be able to accomplish this with the shell's wildcard expansion without any effort.
ls include/*.h libs/include/*.h
Get a list of all of the files in the entire project (.c, .cpp, .h, .whatever), anything that can #include "header.h". Again, wildcard expansion.
ls include/*.h libs/include/*.h *.cpp libs/*.cpp
Iterate over each file in the project with a for loop
for f in ... # wildcard file list
do
echo "Looking in $f"
done
Iterate over each header file with a for loop
for h in ... # wildcard header list
do
echo "Looking for $h"
done
For each header in each project file, use sed to search for #include "headerfilename.h", and replace with #include "HeaderFileName.h" or whatever the correct case is.
Warning: Untested and probably dangerous: This stuff is a place to start and should be thoroughly tested before use.
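# assumes $h is a bare file name (no directory part) and $f is the file being edited
h_escaped=$(printf '%s\n' "$h" | sed -e 's/[][\/.*]/\\&/g') # escape regex metacharacters in the file name
argument="^(\s*#include\s*\")$h_escaped(\"\s*)\$" # a whole-line #include "..." match
sed -i -E -e "s/$argument/\1$h\2/I" "$f" # GNU sed: \1 and \2 keep the surrounding text, I ignores case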
Yes, I know it looks awful.
Things to consider:
Rather than going straight to running this on your production codebase, test it thoroughly first.
sed can eat files like a VCR can eat tapes.
Make a backup.
Make another backup.
This is an O(N^2) operation involving hard disk access, and if your project is large it will run slowly. If your project is not gigantic, don't bother, but if it is, consider doing something to pipe sed's output to other seds.
Your search should be case insensitive: it should match #include, #INCLUDE, #iNcLuDe, and any combination of case present in the existing header filename, as well as any amount of whitespace between the include and the header. Bonus points if you preserve whitespace.
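Putting the steps above together, one possible sketch looks like the following. It assumes GNU sed (for -i, -E and the I flag), that all files live in one directory as in the question, and that header names contain no & or newline characters. The files SomeFile.h and main.cpp are created only for the demonstration.

```shell
#!/bin/sh
# Throwaway demo project: one header and one source file with the wrong case.
work=$(mktemp -d)
: > "$work/SomeFile.h"
printf '#include "somefile.h"\n#include <vector>\n' > "$work/main.cpp"

(
    cd "$work" || exit 1
    for h in *.h; do
        # Escape regex metacharacters so the file name is matched literally.
        h_escaped=$(printf '%s\n' "$h" | sed 's/[][\.*^$/]/\\&/g')
        for f in *.cpp *.h; do
            [ -f "$f" ] || continue
            # Match #include "<name>" in any case and rewrite only the name,
            # keeping the original whitespace via the \1 and \2 groups.
            sed -i -E "s/^([[:space:]]*#[[:space:]]*include[[:space:]]*\")$h_escaped(\")/\1$h\2/I" "$f"
        done
    done
)

cat "$work/main.cpp"
```

Includes written with angle brackets are left alone, since the pattern only matches the quoted form.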
Use Notepad++ to do a 'Find in Files' and replace.
From toolbar:
Search - Find in Files.
Then complete the 'Find what' and 'Replace with'.

Howto: Searching for a string in a file from the Windows command line?

Is there a way to search a directory and its subdirectories' files for a string? The string is rather unique. I want to return the name of the file and hopefully the line that the string is on in the file. Is there anything built into Windows for doing this?
You're looking for the built-in findstr command.
The /S option performs a recursive search.
There is the find.exe command, but it's pretty limited in its capabilities. You could install Cygwin or Unxutils and use a pipeline including its Unix-style find and grep:
find . -type f | xargs grep unique-string
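For the Unix-style pipeline, a slightly more defensive sketch is shown below. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do), so file names with spaces survive, and it adds -H and -n so grep always prints the file name and the line number. The directory and file contents are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical sample tree with one file containing the search string.
tree=$(mktemp -d)
mkdir -p "$tree/sub dir"
printf 'nothing here\n' > "$tree/a.txt"
printf 'first line\nthe unique-string is here\n' > "$tree/sub dir/b.txt"

# -F treats the string literally, -H prints the file name, -n the line number.
hits=$(cd "$tree" && find . -type f -print0 | xargs -0 grep -HnF -- 'unique-string')
printf '%s\n' "$hits"
```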

Loading files that meet certain criteria into hidden buffers in vim

I'd like to do some code refactoring in vim. I have found the following gem to apply transformations to all buffers.
:bufdo %s/match/replace/gc
My code is laid out with the root directory containing a directory for the dependencies and a build directory. I want to load all .cc, .h and .proto files from ./src, ./include and ./tests, but not from the dependencies and build directories, into background/hidden buffers. I want to do this so that I can refactor using the command above.
If someone knows of a cleaner way to perform the use case, please show it.
Note: I know you can string together find and sed to do this from the shell; however, I prefer doing it in Vim, if at all possible. The gc flags in the command above confirm the replacement on each match. I need this because I often don't want to replace certain matches. The find and sed solution is too restrictive and finicky for my use case, and it is also easy to destroy files when doing in-place replacements.
For reference using sed and find:
List candidate replacements:
find src include tests -name '*.h' -or -name '*.cc' -or -name '*.proto' |
xargs sed -n 's/ListServices/list_services/p'
Perform replacements:
find src include tests -name '*.h' -or -name '*.cc' -or -name '*.proto' |
xargs sed -i 's/ListServices/list_services/'
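Since destroying files is the stated worry with in-place replacement, here is a hedged variant of the reference commands that keeps a .bak copy of every file sed touches. It assumes a find and xargs with -print0 and -0 support; the grouping with \( \) is needed once an explicit action such as -print0 is added. The project layout below is invented for the demonstration, and the g flag (replace every match on a line) is an addition to the original command.

```shell
#!/bin/sh
# Hypothetical project tree; build/ stands in for directories to leave alone.
proj=$(mktemp -d)
mkdir -p "$proj/src" "$proj/include" "$proj/tests" "$proj/build"
printf 'void ListServices();\n' > "$proj/include/api.h"
printf 'ListServices(); ListServices();\n' > "$proj/src/main.cc"
printf 'ListServices\n' > "$proj/build/generated.cc"

cd "$proj" || exit 1
# Only src, include and tests are searched; -i.bak keeps the original
# content next to each edited file as <name>.bak.
find src include tests -type f \( -name '*.h' -o -name '*.cc' -o -name '*.proto' \) -print0 |
    xargs -0 sed -i.bak 's/ListServices/list_services/g'
```

If a replacement turns out to be wrong, the .bak files can be moved back over the edited ones.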
You can use :argadd to add the files you need to Vim's argument list. This will load them as inactive buffers (you can see them afterwards with :ls). In your case, it might look like this:
argadd src/**/*.cc
argadd src/**/*.h
argadd src/**/*.proto
And so on, for the include and tests directories. You might want to make a command for that or experiment with glob patterns to make it a bit simpler. Afterwards, your command should work, although I'd recommend running it with :argdo instead:
argdo %s/match/replace/gc
This will only execute it for the buffers you explicitly specified, not for any of the other ones you might have opened at the time. Check :help argadd and :help argdo for more information.