Trim extra pipes off end of text file in Windows - regex

I have a process that outputs a pipe-delimited text file that looks something like this:
|abc123|1*|004|**gobbligook|001|%|2014-01-01|||||||||||||
This is just an example, and I'm not sure whether the answer involves regular expressions. If it does, I will post the actual line.
ISSUE
In this example, the import process that accepts this file expects 8 pipes, but there are 20; if it sees any pipes beyond the 8 it expects, the import fails.
Question
Is there a way, in a Windows environment, to trim the extra trailing pipes off the end of every line in the file?
UPDATE
Magoo supplied me with a great answer that I am working with, but I keep getting this error: Delimiter was unexpected at this time
Here is the code:
@ECHO OFF
SETLOCAL
SET "sourcedir=C:\Users\Desktop\Pipe Delimiter Project"
SET "destdir=C:\Users\Desktop\Pipe Delimiter Project"
(
FOR /f "tokens=1-7delims=|" %%a IN ('TYPE "%sourcedir%\test.txt"') DO (
ECHO(^|%%a^|%%b^|%%c^|%%d^|%%e^|%%f^|%%g^|
)
)>%destdir%\newfile.txt
Does anyone know what's wrong? For testing, I just pasted the line from the question (|abc123|..|) into the file about 6 times. Thanks!

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=."
SET "destdir=U:\destdir"
(
FOR /f "delims=" %%a IN ('TYPE "%sourcedir%\q22863616.txt"') DO (
SET "line=%%a"
ECHO(!line:~0,-12!
)
)>"%destdir%\newfile.txt"
GOTO :EOF
I used a file named q22863616.txt containing your data for my testing.
This produces newfile.txt, assuming that the final 12 fields are all empty, for lack of information otherwise.
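To sanity-check the arithmetic outside cmd, here is a minimal Python sketch of what `!line:~0,-12!` does to one line. It makes the same assumption as above, that every line ends with exactly 12 surplus pipes; the name `trim_fixed` is purely illustrative.

```python
def trim_fixed(line: str, extra: int = 12) -> str:
    """Drop a fixed number of trailing characters, like cmd's !line:~0,-12!."""
    # Like the batch substring, this removes characters blindly: it does not
    # check that the removed characters are actually pipes.
    return line[:-extra] if extra > 0 else line


# Sample line: 8 wanted pipes plus 12 surplus ones, 20 pipes in total.
sample = "|abc123|1*|004|**gobbligook|001|%|2014-01-01|" + "|" * 12
trimmed = trim_fixed(sample)
print(trimmed)             # |abc123|1*|004|**gobbligook|001|%|2014-01-01|
print(trimmed.count("|"))  # 8
```

If a line has a different number of empty trailing fields, a fixed count cuts into real data (or leaves surplus pipes), which is why the later variants locate the 8th pipe instead.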
Another form, given the additional information:
@ECHO OFF
SETLOCAL
SET "sourcedir=."
SET "destdir=U:\destdir"
(
FOR /f "tokens=1-7delims=|" %%a IN ('TYPE "%sourcedir%\q22863616.txt"') DO (
ECHO(^|%%a^|%%b^|%%c^|%%d^|%%e^|%%f^|%%g^|
)
)>"%destdir%\newfile.txt"
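This FOR /F variant keeps tokens 1 to 7 and writes them back between pipes. A rough Python equivalent follows (the function name `keep_fields` is hypothetical). One difference is worth knowing: FOR /F treats a run of consecutive delimiters as a single delimiter, so an empty field among the first seven would shift later fields left in the batch version, whereas `str.split` keeps empty fields in place.

```python
def keep_fields(line: str, n_fields: int = 7) -> str:
    """Rebuild a line from its first n_fields pipe-delimited fields."""
    # The line starts with '|', so split() returns an empty string first; skip it.
    fields = line.split("|")[1 : 1 + n_fields]
    # Wrap the kept fields in pipes: n_fields fields need n_fields + 1 pipes.
    return "|" + "|".join(fields) + "|"


sample = "|abc123|1*|004|**gobbligook|001|%|2014-01-01|" + "|" * 12
print(keep_fields(sample))  # |abc123|1*|004|**gobbligook|001|%|2014-01-01|
```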
OK - third time's a charm.
@ECHO OFF
SETLOCAL enabledelayedexpansion
SET "sourcedir=."
SET "destdir=U:\destdir"
:: remove variables starting $
FOR /F "delims==" %%a In ('set $ 2^>Nul') DO SET "%%a="
(
FOR /f "delims=" %%a IN ('TYPE "%sourcedir%\q22863616.txt"') DO (
SET "$0=%%a"
SET "$1=%%a"
FOR /l %%c IN (1,1,8) DO SET "$1=!$1:*|=!"
SET "$2=%%a"
SET "$3="
SET /a tot=0
FOR /f "delims=:" %%e IN ('set $^|findstr /o /r "$"') DO SET /a tot=%%e - !tot! - 5
CALL :show !tot!
CALL ECHO %%$2:~0,-!tot!%%
)
)>"%destdir%\newfile.txt"
GOTO :EOF
:show
CALL SET "$3=%%$2:~0,-%1%%"
FOR /f "tokens=1*delims==" %%y IN ('set $3') DO ECHO(%%z
GOTO :eof
This seems immune to % in the data, but chokes on ! or &. You pays your money, you takes your choice...
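The third version finds the text after the 8th pipe (by stripping up to the next pipe eight times), measures its length and cuts that many characters off the end, so each line is kept up to and including its 8th pipe. The same idea in Python, as a sketch: the function name is hypothetical, and returning lines with fewer than 8 pipes unchanged is my own choice rather than something the batch code specifies. Unlike the batch version, it has no trouble with ! or & in the data.

```python
def keep_through_nth_pipe(line: str, n: int = 8) -> str:
    """Return line up to and including its n-th '|'."""
    pos = -1
    for _ in range(n):
        # Search for the next pipe after the previous one.
        pos = line.find("|", pos + 1)
        if pos == -1:
            return line  # fewer than n pipes: leave the line unchanged
    return line[: pos + 1]


print(keep_through_nth_pipe("|a!|b&|c|d|e|f|g|h|||"))  # |a!|b&|c|d|e|f|g|
```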

This should remove the surplus trailing pipes but leave 8 of them at the end of the line:
@echo off
type "file.txt"| repl "(.*?\|{8})\|*$" "$1" >"newfile.txt"
This uses a helper batch file called repl.bat, which you can download from: https://www.dropbox.com/s/qidqwztmetbvklt/repl.bat
Place repl.bat in the same folder as the batch file or in a folder that is on the path.
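Note that `(.*?\|{8})\|*$` keeps 8 *consecutive* trailing pipes, so on the sample line it leaves the 7 pipes inside the data plus 8 at the end. If the import actually needs 8 pipes in total, a pattern that counts pipes from the start of the line is needed instead. The Python sketch below compares the two; the second pattern is my own assumption, not part of the answer. Python's `re` should treat these constructs (lazy `*?`, `{8}`, `(?:...)`, `[^|]`) the same way as the JScript engine behind repl.bat on a single line.

```python
import re

# Pattern from the answer: the shortest prefix followed by 8 consecutive pipes,
# with any further pipes at the end of the line discarded.
KEEP_8_TRAILING = re.compile(r"(.*?\|{8})\|*$")
# Assumed alternative: keep everything up to and including the 8th pipe,
# counted from the start of the line.
KEEP_8_TOTAL = re.compile(r"^((?:[^|]*\|){8}).*$")


def trim(line: str, pattern: re.Pattern) -> str:
    # Replace the whole match with its first capture group, as repl's "$1" does.
    return pattern.sub(r"\1", line)


sample = "|abc123|1*|004|**gobbligook|001|%|2014-01-01|" + "|" * 12
print(trim(sample, KEEP_8_TRAILING).count("|"))  # 15
print(trim(sample, KEEP_8_TOTAL))  # |abc123|1*|004|**gobbligook|001|%|2014-01-01|
```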

Related

How to extract string and non-null value entries from a txt file?

I have a script that extracts lines such as:
THIS_IS_A_LINE:=
THIS_IS_A_LINE2:=
and outputs all of the same kind into another .txt file as:
THIS_IS_A_LINE
THIS_IS_A_LINE2
The script is the following:
set "file=%cd%/Config.mak"
set /a i=0
set "regexp=.*:=$"
setlocal enableDelayedExpansion
IF EXIST Source_List.txt del /F Source_List.txt
for /f "usebackq delims=" %%a in ("%file%") do (
set /a i+=1
call set Feature[!i!]=%%a
)
cd .. && cd ..
rem call echo.!Feature[%i%]!
for /L %%N in (1,1,%i%) do (
echo(!Feature[%%N]!|findstr /R /C:"%regexp%" >nul && (
call echo FOUND
call set /a j+=1
call set Feature_Disabled[%j%]=!Feature[%%N]:~0,-2!
call echo.!Feature_Disabled[%j%]!>>Source_List.txt
) || (
call echo NOT FOUND
)
)
endlocal
I also have another script that extracts lines such as:
THIS_IS_ANOTHER_LINE:=true
THIS_IS_ANOTHER_LINE2:=true
...
and outputs all of the same kind into another .txt file as:
THIS_IS_ANOTHER_LINE
THIS_IS_ANOTHER_LINE2
...
The script is the following:
set "file=%cd%/Config.mak"
set /a i=0
set "regexp=.*:=true$"
setlocal enableDelayedExpansion
IF EXIST Source_List2.txt del /F Source_List2.txt
for /f "usebackq delims=" %%a in ("%file%") do (
set /a i+=1
call set Feature[!i!]=%%a
)
cd .. && cd ..
rem call echo.!Feature[%i%]!
for /L %%N in (1,1,%i%) do (
echo(!Feature[%%N]!|findstr /R /C:"%regexp%" >nul && (
call echo FOUND
call set /a j+=1
call set Feature_Disabled[%j%]=!Feature[%%N]:~0,-6!
call echo.!Feature_Disabled[%j%]!>>Source_List2.txt
) || (
call echo NOT FOUND
)
)
endlocal
However, there is a third kind of line, containing numeric values (including some hexadecimal values) or other text, such as:
THIS_IS_AN_UNPROCESSED_LINE:=0xA303
THIS_IS_AN_UNPROCESSED_LINE2:=1943
THIS_IS_AN_UNPROCESSED_LINE3:=HELLO_DOOD_CAN_YOU_PARSE_ME?
So I also need a way to extract those lines into another .txt file, as:
THIS_IS_AN_UNPROCESSED_LINE:=0xA303
THIS_IS_AN_UNPROCESSED_LINE2:=1943
THIS_IS_AN_UNPROCESSED_LINE3:=HELLO_DOOD_CAN_YOU_PARSE_ME?
In other words, extract the lines that are not of the form:
THIS_IS_AN_UNPROCESSED_LINE:=
or
THIS_IS_AN_UNPROCESSED_LINE:=true
while keeping both sides of each line.
I know there must be some trick with the regular expression, but I just can't figure it out.
You have made your code much more complicated than it needs to be. There is no need to create an array of every line in the file.
If there are no other : or = before the first :=, then you can use FINDSTR to print out all lines that contain a string followed by :=. FOR /F can capture and parse each matching line into the parts before and after :=, and then IF statements can classify the three different types of lines.
I use n> to open all three output files outside the main code block for improved performance, and then I use the >&n syntax to direct each output to the appropriate, already-opened file. I use high-numbered file handles to avoid the problems described at "Why doesn't my stderr redirection end after command finishes? And how do I fix it?".
@echo off
setlocal
set "file=Config.mak"
set /a "empty=7, true=8, unprocessed=9"
%empty%>empty.txt %true%>true.txt %unprocessed%>unprocessed.txt (
for /f "delims=:= tokens=1*" %%A in ('findstr /r "^[^:=][^:=]*:=" "%file%"') do (
if "%%B" equ "" (
>&%empty% (echo %%A)
) else if "%%B" equ "true" (
>&%true% (echo %%A)
) else (
>&%unprocessed% (echo %%A:=%%B)
)
)
)
The above will ignore lines that contain : or = before :=, and it will not work properly if the first character after := is : or =. I'm assuming that should not be a problem.
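For comparison, here is the same three-way classification as a short Python sketch (the function name `classify` is illustrative, and it assumes line endings have already been stripped). It applies the same filter as the FINDSTR expression (a non-empty name with no : or = before the first :=), but because it splits at the first := with `str.partition`, it does not share the FOR /F limitation with a : or = directly after :=; such a line simply goes to the unprocessed list unchanged.

```python
def classify(lines):
    """Split NAME:=VALUE lines into (empty, true, unprocessed) lists."""
    empty, true, unprocessed = [], [], []
    for line in lines:
        name, sep, value = line.partition(":=")
        # Same filter as FINDSTR "^[^:=][^:=]*:=": a non-empty name before the
        # first := that contains neither ':' nor '='.
        if not sep or not name or ":" in name or "=" in name:
            continue
        if value == "":
            empty.append(name)        # NAME:=      -> NAME
        elif value == "true":
            true.append(name)         # NAME:=true  -> NAME
        else:
            unprocessed.append(line)  # keep both sides unchanged
    return empty, true, unprocessed


# Example use (assumes Config.mak is in the current directory):
# with open("Config.mak") as f:
#     empty, true, unprocessed = classify(f.read().splitlines())
```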
It should be relatively easy to write a very efficient solution using PowerShell, VBScript, or JScript that eliminates these limitations.
You could also use JREPL.BAT - a powerful and efficient regular expression text processing command line utility. JREPL.BAT is pure script (hybrid batch/JScript) that runs natively on any Windows machine from XP onward, no 3rd party exe required. And JREPL is much faster than any pure batch solution, especially if the files are large.
Here is one JREPL solution:
@echo off
setlocal
set repl=^
$txt=false;^
if ($2=='') stdout.WriteLine($1);^
else if ($2=='true') stderr.WriteLine($1);^
else $txt=$0;
call jrepl "^(.+):=(.*)$" "%repl%" /jmatchq^
/f Config.mak /o unprocessed.txt >empty.txt 2>true.txt
If all you have to do is classify the lines into three different files, without worrying about stripping off the :=true and := parts for the empty and true lines, then there is a very simple pure batch solution using nothing but FINDSTR.
@echo off
set "file=Config.mak"
findstr /r ".:=$" "%file%" >empty.txt
findstr /r ".:=true$" "%file%" >true.txt
findstr /r ".:=" "%file%" | findstr /r /v ":=$ :=true$" >unprocessed.txt
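The three FINDSTR filters translate directly into regular-expression tests. Here is a Python sketch that reproduces them on lines whose line endings have already been stripped (the function name is hypothetical); as with FINDSTR, whole lines are kept and nothing is removed from them.

```python
import re


def findstr_split(lines):
    """Mimic the three FINDSTR commands; returns (empty, true, unprocessed)."""
    empty = [s for s in lines if re.search(r".:=$", s)]
    true = [s for s in lines if re.search(r".:=true$", s)]
    # findstr /r ".:=" | findstr /r /v ":=$ :=true$"
    # (the space in the /v search string separates two alternative patterns)
    unprocessed = [
        s for s in lines
        if re.search(r".:=", s) and not re.search(r":=$|:=true$", s)
    ]
    return empty, true, unprocessed
```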

batch file regex to get specific number from filename

I have a file name:
pp-sssss-iiii-12.0.111.22_31-i-P.0.16.1.1
I want from the name only this (output desired):
12.0.111.22_31
Then I want to replace . with 0 and remove '_', so that I get:
1200011102231
I tried to start from something like this:
cd %cd%
for %%F in (*.txt) do echo %%~nxF >>1.txt
but I didn't know how to continue.
Edit: the code:
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=C:\Users\moudiz\Desktop\new folder\tttt"
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\*.pbd" '
) DO (
FOR /f "tokens=4delims=-" %%d IN ("%%a") DO (
SET "modname=%%d"
SET "modname=!modname:.=0!"
SET "modname=!modname:_=!"
ECHO %%a becomes !modname!
)
)
GOTO :EOF
pause
The name of the file:
P-Script-LogFiles-1.0.33.33_123-IB-P.0.16.357.1.pbd
The output: 100033033123
[note: OP belatedly asked for the first token of the name to be prepended to the name generated from the original post, hence use of %%c below]
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
FOR /f "delims=" %%a IN (
'dir /b /a-d "%sourcedir%\*.txt" '
) DO (
echo File "%%a"
FOR /f "tokens=1,4delims=-" %%c IN ("%%a") DO (
SET "modname=%%d"
SET "modname=!modname:.=0!"
SET "modname=!modname:_=!"
ECHO %%a becomes %%c!modname!
)
pause
)
pause
GOTO :EOF
You would need to change the setting of sourcedir to suit your circumstances.
Using delayed expansion, !var! refers to the modified value of the variable, and set "var=!var:string=gnirts!" substitutes "gnirts" for "string" in var and assigns the result back to var.
Quite what you want to do with this modified result, you don't reveal; but, guessing at renaming,
echo ren "%%a" "!modname!"
should be usable (remove the echo once the displayed commands look right).
To prepend the first token, simply change the tokens= to 1,4 (to select the first and fourth tokens) and change the metavariable in the for to %%c (so that the %%d processing remains the same), then use %%c, which will contain the portion of the original filename before the first -.
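The whole transformation is easy to check outside cmd. Here is a Python sketch of the same steps, assuming (as the answer does) that the wanted part is always the fourth hyphen-separated token; `mod_name` and its `prepend_first` flag are hypothetical names.

```python
def mod_name(filename: str, prepend_first: bool = False) -> str:
    """4th '-'-separated token with '.' -> '0' and '_' removed."""
    # FOR /F skips empty tokens, so drop them here too.
    tokens = [t for t in filename.split("-") if t]
    modname = tokens[3].replace(".", "0").replace("_", "")
    # Optionally prepend the first token, like %%c!modname! in the answer.
    return tokens[0] + modname if prepend_first else modname


print(mod_name("pp-sssss-iiii-12.0.111.22_31-i-P.0.16.1.1"))  # 1200011102231
```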

Batch script to find files that contains two consecutive lines

I'm trying to code a batch script to find txt files in a folder (and recursively in its subfolders) that contain these two consecutive lines (with nothing between them):
Code "5898"
Price "50"
I tried with this:
Findstr -m /S /C:"Code ""5898""\r\n Price ""50"" *.txt" >> output.txt
but I don't know how to handle the carriage return and the newline. If I search without the Price ""50"" part, it works fine, but I get no results when I look for the two lines I need.
Please do not alter this file; doing so would probably cause it not to work.
@Echo Off
SetLocal EnableDelayedExpansion
For /F %%A In ('Copy /Z "%~dpf0" Nul') Do Set "CR=%%A"
Rem Keep the two empty lines below: they are part of the LF definition.
Set LF=^


Dir/B/S/A-D|Findstr/RNF:"/" /C:"Code \"5898\"!CR!*!LF!Price \"50\"">"%tmp%\$_.tmp"
>Output.txt ( For /F "UseBackTokens=1-3 Delims=:" %%A In ("%tmp%\$_.tmp") Do (
Echo("%%A:%%B"))
Del "%tmp%\$_.tmp"
To output only the file name (odd when the search is recursive):
@Echo Off
SetLocal EnableDelayedExpansion
For /F %%A In ('Copy /Z "%~dpf0" Nul') Do Set "CR=%%A"
Rem Keep the two empty lines below: they are part of the LF definition.
Set LF=^


Dir/B/S/A-D|Findstr/RNF:"/" /C:"Code \"5898\"!CR!*!LF!Price \"50\"">"%tmp%\$_.tmp"
>Output.txt ( For /F "UseBackTokens=1-3 Delims=:" %%A In ("%tmp%\$_.tmp") Do (
Echo("%%~nxB"))
Del "%tmp%\$_.tmp"
You could technically change the tokens to 1-2 now, but I'll leave it as is so that it is easier to re-implement the line-number check (If %%C==1 before the Echo on the second-to-last line), if that is ever necessary.
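If you want to cross-check which files the FINDSTR search should report, here is a Python sketch of the same test: a line ending in Code "5898" directly followed by a line starting with Price "50", searched recursively. The function names are hypothetical, and the end/start matching is my reading of the FINDSTR pattern, which is not anchored at either end.

```python
from pathlib import Path

FIRST = 'Code "5898"'
SECOND = 'Price "50"'


def has_pair(lines) -> bool:
    """True if a line ending in FIRST is directly followed by one starting with SECOND."""
    return any(a.endswith(FIRST) and b.startswith(SECOND)
               for a, b in zip(lines, lines[1:]))


def files_with_pair(root: str = ".", pattern: str = "*.txt"):
    """Yield matching files under root, searching subfolders recursively."""
    for path in Path(root).rglob(pattern):
        # splitlines() copes with both CRLF and LF line endings.
        if path.is_file() and has_pair(path.read_text(errors="replace").splitlines()):
            yield path
```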

Batch: how to split string on uppercase letter

I have a directory structure containing home directories named after the users full name (ForenameSurname), like:
/user/JohnDoe
/user/JaneDoe
/user/MobyDick
Now I want to copy the whole structure, changing ForenameSurname to the first two letters of the forename followed by the surname, resulting in:
/user/JoDoe
/user/JaDoe
/user/MoDick
I know how to get substrings (~n), but how do I split a string at the capital letter that starts the surname? Is it possible at all using pure batch?
@echo off
setlocal enableextensions enabledelayedexpansion
set "root=%cd%\users"
for /d %%f in ( "%root%\*" ) do (
set "name=%%~nxf"
for /f %%a in ("!name:~0,2!"
) do for /f "tokens=* delims=abcdefghijklmnopqrstuvwxyz" %%b in ("!name:~2!"
) do if not "%%~nxf"=="%%~a%%~b" if not exist "%root%\%%~a%%~b" (
echo ren "%%~ff" "%%~a%%~b"
) else (
echo "%%~nxf" can not be renamed to "%%~a%%~b"
)
)
Rename operations are only echoed to console. If the output is correct, remove the echo that prefixes the ren command.
Try this:
@echo off
setlocal EnableDelayedExpansion
set "upcaseLetters=ABCDEFGHIJKLMNOPQRSTUVWXYZ"
cd \user
for /D %%a in (*) do (
call :convert name=%%a
echo New name: !name!
)
goto :EOF
:convert
set "var=%2"
:nextChar
set "char=%var:~2,1%"
if "!upcaseLetters:%char%=%char%!" equ "%upcaseLetters%" goto end
set "var=%var:~0,2%%var:~3%"
goto nextChar
:end
set "%1=%var%"
exit /B
I would use my JREN.BAT regular expression rename utility, a hybrid JScript/batch script that runs natively on any Windows machine from XP onward.
jren "^([A-Z][a-z])[a-z]*(?=[A-Z])" $1 /d /t /p c:\users
The /T option is test mode, meaning it only displays the proposed rename results. Remove the /T option to actually rename the folders.
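The JREN regular expression can be tried out in Python, whose `re` module supports the same lookahead syntax. A sketch follows (the name `short_name` is illustrative); note that the pattern relies on case-sensitive matching, since `[a-z]` must not match the capital that starts the surname.

```python
import re

# Capture the first capital and the letter after it, swallow the rest of the
# lowercase forename, and stop before the next capital. The lookahead keeps
# the surname out of the match, so it survives the substitution.
FORENAME = re.compile(r"^([A-Z][a-z])[a-z]*(?=[A-Z])")


def short_name(name: str) -> str:
    return FORENAME.sub(r"\1", name)


for n in ("JohnDoe", "JaneDoe", "MobyDick"):
    print(n, "->", short_name(n))  # e.g. JohnDoe -> JoDoe
```

An already-shortened name such as JoDoe comes back unchanged, which matches the first answer's check that skips folders whose name would not change.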

Split a file name at last occurrence 'by'

I am working on a batch script to move files from one master directory, which has 1000+ files, into subfolders according to the file name; the subfolders have to be created and the files moved accordingly. Below is the file name format:
title_or_work_done_by_user_name.xls
From this file name pattern, I have to pick out "user_name" and create a folder for that user_name. I found similar code, but I am not able to split the name exactly at the last 'by'.
@ECHO OFF
SETLOCAL
SET "sourcedir=E:\Source"
SET "destdir=E:\Destination"
FOR /f "tokens=2*delims='by_'" %%a IN ('dir /b /a-d "%sourcedir%\*by_*.xls" ') DO (
ECHO %%a
ECHO(MD "%destdir%\%%a" 2>nul
ECHO(MOVE "%sourcedir%\*by_%%a.xls" "%destdir%\%%a\")
pause
GOTO :EOF
Can someone please help me extract 'user_name' by splitting the name at the last occurrence of 'by_'?
Thanks in advance :)
The DELIMS option specifies a list of characters, not a string, so your FOR loop will split tokens at ' or _ or b or y. Also, you have no way of knowing the number of the last token. Your design is a dead end.
Option 1
Here is a pure batch solution that will do what you want. I use substitution to convert the file name into a pseudo path. It is then easy to pick off the desired name. Delayed expansion is used in order to access the value of a variable within the same loop (code block) that sets it. The only tricky part is toggling delayed expansion on and off as needed so as to preserve any !. A FOR variable containing the ! character will be corrupted if it is expanded while delayed expansion is enabled.
@echo off
setlocal disableDelayedExpansion
for %%F in (*_by_*.jpg) do (
%= Initialize name without extension =%
set "name=%%~nF"
%= Convert "Part1_by_Part2_by_Name" into "Part1\Part2\Name" =%
setlocal enableDelayedExpansion
for %%f in ("!name:_by_=\!") do (
%= Only execute endlocal on the first iteration =%
if "!!" equ "" endlocal
%= The name might contain a dot, so need name and extension =%
set "name=%%~nxf"
)
set "file=%%F"
setlocal enableDelayedExpansion
%= Hide error message if folder already exists =%
md "!name!" 2>nul
move "!file!" "!name!"
endlocal
)
Option 2
The logic is simpler if a subroutine is used, as it avoids delayed expansion issues. The CALL makes the code less efficient (slower), but that shouldn't be an issue for a task like this.
@echo off
setlocal disableDelayedExpansion
for %%F in (*_by_*.jpg) do call :moveFile "%%F"
exit /b
:moveFile
set "name=%~n1"
for %%F in ("%name:_by_=\%") do set "name=%%~nxF"
md "%name%" 2>nul
move %1 "%name%"
exit /b
Option 3
The simplest solution is to use my JREPL.BAT utility - a hybrid JScript/batch script that performs regex replacement. JREPL is pure script that runs natively on any Windows machine from XP onward.
@echo off
for /f "tokens=1,2 delims=: eol=:" %%A in (
'dir /b /a-d *_by_*.jpg ^| jrepl "^.*_by_(.*)\.jpg" "$&:$1" /i'
) do (
md "%%B" 2>nul
move "%%A" "%%B"
)
@ECHO OFF
SETLOCAL
FOR %%a IN (
title_or_work_done_by_user_name.xls
title_or_work_done_by_digby_hill.xls
title_or_work_done_by_hook_or_by_crook.xls
) DO CALL :process %%a
GOTO :eof
:process
SET "name=%~1"
:: This is the actual processing
ECHO processing "%name%"
SET "name=%name:_by_=.%"
:loop
FOR /f "tokens=1,2,3*delims=." %%p IN ("%name%") DO IF "%%s"=="" (SET "user_name=%%q") ELSE (
SET "name=%%q.%%r.%%s"&GOTO loop
)
ECHO extracted name is "%user_name%"
GOTO :EOF
I've chosen to use the string _by_ as the separator, since some names end in "by" (digby, for instance).
Simply replace the string _by_ with a string that won't occur (or has a restricted use) in the filename. I chose ., but with some modifications (such as removing the extension from the name using %~n), : could be used instead.
The result is [string.]*required_name.xls
By repeatedly removing the first token, using . as the separator, until there is no 4th token, the second token is left as the required string.
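For a quick check of the "last _by_" rule outside cmd, here is a Python sketch using the three sample names from this answer. It splits at the last occurrence of the whole string _by_ (not the individual characters b, y and _), drops the extension first as %~n would, and returns None when there is no _by_; the function name is hypothetical.

```python
from pathlib import PurePath


def user_from_filename(filename: str, sep: str = "_by_"):
    """Return the part of the file's stem after the LAST sep, or None."""
    stem = PurePath(filename).stem  # drop the extension, like %~n1
    _, found, user = stem.rpartition(sep)
    return user if found else None


for f in ("title_or_work_done_by_user_name.xls",
          "title_or_work_done_by_digby_hill.xls",
          "title_or_work_done_by_hook_or_by_crook.xls"):
    print(f, "->", user_from_filename(f))
```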