sort list by keywords - regex

I have a list of keywords in a keywords.txt file. I have another file list.txt with the keywords in the beginning of each line. How can I sort the lines in list.txt to the same order they appear in keywords.txt?
keywords.txt
house
car
tree
woods
mailbox
list.txt
car bbdfbdfbdfbdf
tree gdfgvsgsgs
mailbox gsgsdfsdf
woods gsgsdgsdgsdgsdgsddsd
house gsdgfsdgsdgsdgsdg
final result in list.txt
house gsdgfsdgsdgsdgsdg
car bbdfbdfbdfbdf
tree gdfgvsgsgs
woods gsgsdgsdgsdgsdgsddsd
mailbox gsgsdfsdf

Here is an improved and simplified version of kiswa's answer.
@echo off
(
for /f "usebackq" %%A in ("keywords.txt") do findstr /bl "%%A" list.txt
)>sorted.txt
REM move /y sorted.txt list.txt
The FINDSTR command only matches lines that begin with the keyword, and it forces the search to be a literal search. (FINDSTR could give the wrong result if the /L option is not specified and the keyword happens to contain a regex meta-character.)
The code to replace the original file with the sorted file is commented out. Simply remove the REM statement to activate the MOVE statement.
As with kiswa's answer, the above will only output lines from list.txt that match a keyword in keywords.txt.
You might have lines in list.txt that do not match a keyword. If you want to preserve those lines at the bottom of the sorted output, then use:
@echo off
(
for /f "usebackq" %%A in ("keywords.txt") do findstr /bli "%%A" "list.txt"
findstr /vblig:"keywords.txt" "list.txt"
)>sorted.txt
::move /y sorted.txt list.txt
Note that the /I (case insensitive) option must be used because of a FINDSTR bug dealing with multiple literal search strings of different lengths. The /I option avoids the bug, but it would cause problems if your keywords are case sensitive. See "What are the undocumented features and limitations of the Windows FINDSTR command?"
You might have keywords that are missing from list.txt. If you want to include those keywords without any data following them, then use:
@echo off
(
for /f "usebackq" %%A in ("keywords.txt") do findstr /bl "%%A" "list.txt" || echo %%A
)>sorted.txt
::move /y sorted.txt list.txt
Obviously you can combine both techniques to make sure you preserve the union of both files:
@echo off
(
for /f "usebackq" %%A in ("keywords.txt") do findstr /bli "%%A" "list.txt" || echo %%A
findstr /vblig:"keywords.txt" "list.txt"
)>sorted.txt
::move /y sorted.txt list.txt
All of the above assume the keywords do not contain space or tab characters. If they do, then the FOR /F options and FINDSTR options must change:
@echo off
(
for /f "usebackq delims=" %%A in ("keywords.txt") do findstr /bic:"%%A" "list.txt" || echo %%A
findstr /vblig:"keywords.txt" "list.txt"
)>sorted.txt
::move /y sorted.txt list.txt
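For anyone who wants to check the expected output outside cmd, here is a minimal Python sketch of the "union" behaviour described above. It is not part of the batch solution; the function name sort_by_keywords is made up for illustration, and unlike the /I variants it compares case-sensitively. Like findstr /b, it uses a plain prefix match, so a keyword such as car would also pick up a line starting with carpet.

```python
def sort_by_keywords(keywords, lines):
    """Order lines by the position of their leading keyword in `keywords`.

    Keywords with no matching line are emitted on their own, and lines that
    match no keyword are kept at the end in their original order.
    (Hypothetical helper, assumed for illustration only.)
    """
    result = []
    used = set()
    for kw in keywords:
        # Prefix match, similar in spirit to `findstr /b /l`.
        matches = [i for i, line in enumerate(lines) if line.startswith(kw)]
        if matches:
            result.extend(lines[i] for i in matches)
            used.update(matches)
        else:
            # Keyword missing from the list: keep the bare keyword.
            result.append(kw)
    # Lines that matched no keyword go to the bottom.
    result.extend(line for i, line in enumerate(lines) if i not in used)
    return result


keywords = ["house", "car", "tree", "woods", "mailbox"]
lines = [
    "car bbdfbdfbdfbdf",
    "tree gdfgvsgsgs",
    "mailbox gsgsdfsdf",
    "woods gsgsdgsdgsdgsdgsddsd",
    "house gsdgfsdgsdgsdgsdg",
]
for line in sort_by_keywords(keywords, lines):
    print(line)
```

Reading keywords.txt and list.txt into the two lists and writing the result back is left out to keep the sketch short.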

$ join -1 2 -2 1 <(cat -n keywords.txt | sort -k2) <(sort list.txt) | sort -k2n | cut -d ' ' -f 1,3-
house gsdgfsdgsdgsdgsdg
car bbdfbdfbdfbdf
tree gdfgvsgsgs
woods gsgsdgsdgsdgsdgsddsd
mailbox gsgsdfsdf

Here's a Windows batch file. It's probably not the most efficient, but I think it's nicely readable.
@echo off
for /F "tokens=*" %%A in (keywords.txt) do (
for /F "tokens=*" %%B in ('findstr /i /C:"%%A" list.txt') do (
echo %%B >> sorted.txt
)
)
del list.txt
rename sorted.txt list.txt
This creates a sorted file, then removes the list file and renames the sorted file.

Remove string from file name, using windows cmd

EDIT:
Thank you all,
Magoo's tips fixed it. I am using the following code:
echo off
cls
setlocal enabledelayedexpansion
for %%f in (*.mov) do (
set name=%%~nf
set new_name=!name:*_= !
echo File renamed from: !name!.mov to:!new_name!.mov
rename %%f !new_name!.mov
)
This only works if the file name is something_file_name.mov. If the original file is just file_name.mov, I end up with only name.mov.
So using a regex is the best option.
I'm using a tweaked version of Ben Personick's and Squashman's suggestions.
@echo off
cls
SET "_Regex=^[0-9]*_"
FOR %%F IN (*.mov) DO (
ECHO.%%~nF | findstr /R "%_Regex%" >nul && (
FOR /F "Tokens=1* Delims=_" %%f IN ("%%~nxF") DO (
MOVE /Y "%%F" "%%g"
)
)
)
I have some files named 3424_file_name.mov and need to remove the numbers up to and including the first _ so that I get file_name.mov.
Even better would be to set a range of characters to remove, like [0-9].
I'd like to do it in cmd on Windows 7. This is what I have so far, but it is not working.
cls
setlocal enabledelayedexpansion
for %%f in (*.mov) do (
set name=%%~nf
set new_name=%name%:*_=x
echo %new_name%)
rename %%f %new_name%.mov
)
Thanks
Alex
This would do the needful.
Note how I am using MOVE /Y in case the file is read-only, as Rename cannot handle renaming Read-only files.
SETLOCAL
ECHO OFF
SET "_Regex=^[0-9][0-9]*_.*"
SET "_FileGlob=*_*.Mov"
SET "_FilePath=C:\Path"
FOR %%F IN ("%_FilePath%\%_FileGlob%") DO (
ECHO.%%~nF | FINDSTR /R "%_Regex%" >NUL && (
FOR /F "Tokens=1* Delims=_" %%f IN ("%%~nxF") DO (
MOVE /Y "%%~F" "%%~dpF%%~g"
)
)
)
ENDLOCAL
You can use the underscore to your advantage if you use a FOR /F command instead. This will allow you to break up the file name into two variables.
@echo off
for /F "tokens=1* delims=_" %%G in ('dir /a-d /b *.mov ^|findstr /RC:"^[0-9][0-9]*_"') do (
rename "%%~G_%%~H" "%%~H"
)
This is a simpler task in PowerShell. Example:
Get-ChildItem *.mov | ForEach-Object {
$fileName = $_.Name
if ( $fileName -match '^[0-9]+_' ) {
$newName = $fileName -replace '^[0-9]+_', ''
Rename-Item $fileName $newName
}
}
As others have commented, this can be redesigned to be shorter and more efficient. Example:
Get-ChildItem *.mov |
Where-Object { $_.Name -match '^\d+_' } |
Rename-Item -NewName { $_.Name -replace '^\d+_','' }
The purpose here is to illustrate how PowerShell is far more powerful and flexible than cmd.exe batch scripting.
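The renaming rule itself (drop leading digits and the first underscore, leave everything else alone) is easy to check in isolation. Below is a small Python sketch of just that rule, not of the file renaming; the helper name strip_numeric_prefix is an assumption made for this example.

```python
import re

# One or more digits at the start of the name, followed by an underscore.
_PREFIX = re.compile(r"^[0-9]+_")


def strip_numeric_prefix(filename):
    """Return filename without a leading '<digits>_' prefix, if it has one."""
    return _PREFIX.sub("", filename, count=1)


for name in ["3424_file_name.mov", "file_name.mov"]:
    print(name, "->", strip_numeric_prefix(name))
```

Because the pattern requires at least one digit, a name like file_name.mov is left untouched, which is the case the plain !name:*_=! substitution got wrong.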

Command Prompt: dir /s EXCLUDE full path but INCLUDE sub folders

I have a bit of a simple but annoying problem. I am making a batch file and I am using:
dir /B /S /A:-D *.wad *.mdl *.wav *.spr *.bmp *.tga *.pcx *.mp3 *.txt *.res > sample.res
to get:
C:\Downloads\Sample1.wad
C:\Downloads\Sample2.wav
C:\Downloads\Folder1\Sample3.mdl
C:\Downloads\Folder1\Folder2\Sample4.txt
But what I really want is:
Sample1.wad
Sample2.wav
Folder1/Sample3.mdl
Folder1/Folder2/Sample4.txt
I want the sub folders included but I don't want the full path included. How can I accomplish this? Thanks.
[EDIT: Realized for my purposes I apparently need a FORWARD slash for folders instead of a BACK slash]
Try it like this:
@echo off
setlocal enabledelayedexpansion
(for /f "delims=" %%a in ('dir /B /S /A:-D *.wad *.mdl *.wav *.spr *.bmp *.tga *.pcx *.mp3 *.txt *.res') do (
set "$Path=%%a"
set $path=!$path:%cd%=!
echo !$path:~1!)
)>sample.res
EDIT: To have the \ replaced with /:
@echo off
setlocal enabledelayedexpansion
(for /f "delims=" %%a in ('dir /B /S /A:-D *.wad *.mdl *.wav *.spr *.bmp *.tga *.pcx *.mp3 *.txt *.res') do (
set "$Path=%%a"
set $path=!$path:%cd%=!
set $path=!$path:\=/!
echo !$path:~1!)
)>sample.res
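The two string substitutions above (strip the current directory, then swap \ for /) can also be expressed as path operations. The following is only a Python sketch of that transformation on the sample paths from the question; to_relative_posix is a hypothetical name, and the base folder C:\Downloads is taken from the question's example output.

```python
from pathlib import PureWindowsPath


def to_relative_posix(full_paths, base):
    """Make Windows paths relative to `base` and use forward slashes."""
    base = PureWindowsPath(base)
    return [PureWindowsPath(p).relative_to(base).as_posix() for p in full_paths]


paths = [
    r"C:\Downloads\Sample1.wad",
    r"C:\Downloads\Sample2.wav",
    r"C:\Downloads\Folder1\Sample3.mdl",
    r"C:\Downloads\Folder1\Folder2\Sample4.txt",
]
for rel in to_relative_posix(paths, r"C:\Downloads"):
    print(rel)
```

PureWindowsPath only manipulates the strings, so the sketch runs on any operating system and never touches the disk.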
This works for me :
FOR /F "tokens=*" %G IN ('dir /B /S /A:-D *.wad *.mdl *.wav *.spr *.bmp *.tga *.pcx *.mp3 *.txt *.res') DO ECHO %~nG%~xG >> sample.res

Regular expression in batch file is not working

We used a regex in a batch file with the following command:
Dir "C:\Test\Res345_45664_1335" /s /b /a:-d | findstr /R "[(\d+)_(\d+)_(\d+)]" > filelist.txt
The "C:\Test\Res345_45664_1335" directory contains the following files:
Res345_45664_1335.txt
Output.txt
list.txt
We need file that’s in the format
But the above dir command with the regex displays all files present in the "C:\Test\Res345_45664_1335" directory, because the directory name itself has the same format ("Res345_45664_1335") and therefore every full path matches. We need only the matching files (with full path).
Thanks.
\d, () and + are not valid metacharacters in findstr. See findstr /? for more help. You should substitute [0-9][0-9]* for \d+, and escape the dot before the extension so that it matches a literal period.
Dir "C:\Test\Res345_45664_1335" /s /b /a:-d | findstr /ER "[0-9][0-9]*_[0-9][0-9]*_[0-9][0-9]*\.txt" > filelist.txt
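To see what the translated pattern is meant to select, here is a rough Python equivalent applied to the three files from the question. It is a sketch under assumptions: Python's regex syntax has \d and +, so [0-9]+ stands in for findstr's [0-9][0-9]*, the trailing $ plays the role of /E, and matching_files is a made-up helper that tests only the file name rather than the full path.

```python
import re
from pathlib import PureWindowsPath

# digits_digits_digits.txt at the end of the file name
PATTERN = re.compile(r"[0-9]+_[0-9]+_[0-9]+\.txt$", re.IGNORECASE)


def matching_files(paths):
    """Keep only the paths whose file name ends in digits_digits_digits.txt."""
    return [p for p in paths if PATTERN.search(PureWindowsPath(p).name)]


paths = [
    r"C:\Test\Res345_45664_1335\Res345_45664_1335.txt",
    r"C:\Test\Res345_45664_1335\Output.txt",
    r"C:\Test\Res345_45664_1335\list.txt",
]
for p in matching_files(paths):
    print(p)
```

Testing the file name instead of the whole path is what keeps the directory name Res345_45664_1335 from causing false matches.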
Do the files have extensions?
Dir "c:\test\Res345_45664_1335" /s /b /a:-d | findstr /R "[0-9]*_[0-9]*_[0-9]*\."
try
Dir /s /b /a:-d *_*_*
Not really sure what you mean by "the format"
Ah - filename in format string_string_string...
FOR /f "delims=" %%i IN ('dir /s /b /a-d *_*_*') DO ECHO "%%~ni"|FINDSTR /r "..*_..*_..*" >nul&IF NOT ERRORLEVEL 1 ECHO %%i
(that's as a batch line - reduce %% to % to run from the prompt)
Batch scripting does not have the regex support that programming languages offer. Simple regular expressions work with findstr, but it has limitations: for example, there is no + quantifier, so [0-9]+ has to be written as [0-9][0-9]*. But don't worry, I have a solution for you.
You can simply call PowerShell from inside a batch script, like this:
@powershell -command "if (($string -match $regex) -eq $true) {write-host matched!}"
That will work for you, and if you want to throw an exception instead, throw it from the same command:
@powershell -command "if (($string -match $regex) -eq $true) {throw [System.ArgumentException]'invalid input'}"
I hope this suffices for you and for others who may need it.

Batch Script to find Folder Length of 2 alpha-numerics only and delete those folders

I need a batch script that will search in one folder or root, not recursively, for folders whose name consists of only two letters or numbers. Examples: A1 B0 E2 22 52. I had a program that would dump folders on the C drive, and now I have hundreds of folders on many computers. I want to delete these folders. I do not have any folders with names as short as 2 characters that are needed. Can someone help?
This removes only folders with exactly two letters or digits in their name:
for /f "delims=" %%i in ('dir /b /ad ?? ^| findstr /r /i "^[a-z0-9][a-z0-9]$"') do echo rd /s /q "%%~i"
Look at the output and remove the word echo if it looks good. For more advanced use of regex, have a look at sed.
This will delete all empty folders with two characters or less:
for /f %%i in ('dir /b /ad ??') do rd %%i
If you want also not-empty folders to be deleted:
for /f %%i in ('dir /b /ad ??') do rd %%i /s /q
If you use it not within a batchfile but as a single command, replace every %%i with %i
EDIT (exclude a folder):
for /f %%i in ('dir /b /ad ??') do ( if "%%i" neq "FP" rd %%i /s /q )
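Before deleting anything, it can help to check which names the "exactly two letters or digits" rule would select. The following Python sketch only filters a list of names and deletes nothing; two_char_names and its exclude parameter are hypothetical, and the sample names are taken from the question plus a few made-up counterexamples.

```python
import re

# Exactly two ASCII letters or digits, nothing else.
TWO_ALNUM = re.compile(r"[A-Za-z0-9]{2}")


def two_char_names(names, exclude=()):
    """Return the names that are exactly two alphanumerics and not excluded."""
    keep_out = set(exclude)
    return [n for n in names if TWO_ALNUM.fullmatch(n) and n not in keep_out]


names = ["A1", "B0", "E2", "22", "52", "FP", "Windows", "a_", "abc", "x"]
print(two_char_names(names, exclude=["FP"]))
```

Note that the plain ?? wildcard in dir also matches one-character names and names with punctuation, which is why the regex filter is the safer selector.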

Batch File: List all files not beginning with "SP"

I have a DOS batch file that performs an action for files beginning with text "SP", as follows:
FOR /F "tokens=*" %%A IN ( 'DIR SP*.sql /s /b' ) DO ECHO .compile_file = "%%A" >> output.txt
The key bit here is obviously:
DIR SP*.sql /s /b
I need to do a similar thing before the "FOR" line above, but for all other files not starting with SP*.sql
Something like:
FOR /F "tokens=*" %%A IN ( 'DIR [^SP]*.sql /s /b' ) DO ECHO .run_file = "%%A" >> output.txt
FOR /F "tokens=*" %%A IN ( 'DIR SP*.sql /s /b' ) DO ECHO .compile_file = "%%A" >> output.txt
Does anyone know how I can do this?
Can I use regex for this sort of thing?
The batch file will run under CMD in Windows XP.
Cheers
In batch, you can usually use findstr with /V to invert a regex match, e.g.
.... | findstr /V "^SP"
try passing the filename to findstr and see how it goes
%~nI - expands %I to a file name only
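The reason for passing only the file name is that dir /s /b prints full paths, so a ^SP anchor would be tested against the drive letter rather than the name. A minimal Python sketch of the intended split is shown below; split_by_prefix and the sample paths are invented for illustration, and the comparison ignores case on the assumption that it should behave like the DIR wildcard.

```python
from pathlib import PureWindowsPath


def split_by_prefix(paths, prefix="SP"):
    """Split paths into (name starts with prefix, everything else)."""
    with_prefix, without_prefix = [], []
    for p in paths:
        # Look at the file name only, never at the folders in the path.
        name = PureWindowsPath(p).name
        if name.upper().startswith(prefix.upper()):
            with_prefix.append(p)
        else:
            without_prefix.append(p)
    return with_prefix, without_prefix


paths = [
    r"C:\db\SP_load.sql",
    r"C:\db\tables.sql",
    r"C:\SPROCS\init.sql",
    r"C:\db\sp_x.sql",
]
compile_files, run_files = split_by_prefix(paths)
print("compile:", compile_files)
print("run:", run_files)
```

The file under C:\SPROCS lands in the second group because only its name, init.sql, is examined.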