DOS script to extract X number of lines - regex

I am trying to make a script to:
- Ask the user for a customer number (max 8 digits)
- Search a very large text file (Source.txt) for that number
- Extract the 19 lines of text above the customer number (everything as is, including empty lines); the customer number's line would be line 20 in this case
- Extract line 20 itself
- Extract the next 30 lines below the customer number
- Save all extracted output in Output.txt

Basically like copying a block of text and pasting it into a new text file. In the source text file, the customer number does not sit at a fixed line number.

You can use standard Linux command-line utilities (available on Windows too) such as grep and output redirection, for example in a bash script, as follows:
# read and validate the customer number (stdin, parameter, ...)
grep -B 19 -A 30 '12345678' Source.txt > Output.txt
where 12345678 is the customer number, -B specifies the number of lines printed before the match and -A the number of lines printed after it.
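A minimal bash sketch of the whole task (the function name and validation details are my own; Source.txt and Output.txt are the names from the question):

```shell
#!/bin/bash
# validate the customer number, then grab the surrounding block of lines
extract_block() {
    local cust=$1
    # digits only, at most 8 of them
    if ! [[ $cust =~ ^[0-9]{1,8}$ ]]; then
        echo "invalid customer number: $cust" >&2
        return 1
    fi
    # -B 19: the 19 lines above the match; -A 30: the 30 lines below it
    grep -B 19 -A 30 -- "$cust" Source.txt > Output.txt
}
```

Asking the user is then just `read -rp 'Customer number: ' n && extract_block "$n"`. Note that grep prints a block for every occurrence; if the customer number can appear more than once and only the first block is wanted, add -m 1.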

Related

Using grep to search in a text file while ignoring certain characters

I want to search text files in a variable folder for a variable string using "grep". It should ignore the first 6 characters and the last 6 characters of each line while searching.
Example line in file:
xxxxxx TEXT TEXT TEXT TEXT xxxxxx
Something like this:
grep -PInr "[^......]TEXT" /var/local/data/textfiles/
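One sketch with GNU grep's -P (assumed available): demand at least 6 characters before and at least 6 after the match, so hits inside the fixed-width edges are skipped. The demo file name and contents are mine:

```shell
# TEXT in the middle of the line matches; TEXT inside the first 6 columns does not
printf 'xxxxxx TEXT TEXT TEXT TEXT xxxxxx\nTEXTzz nothing else here\n' > demo.txt
# ^.{6} and .{6}$ reserve the 6-character prefix and suffix for non-match text
grep -Pn '^.{6}.*TEXT.*.{6}$' demo.txt
```

Against the tree from the question, the same pattern would be `grep -Prn '^.{6}.*TEXT.*.{6}$' /var/local/data/textfiles/`.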

Phone Numbers in separate lines in UNIX

In UNIX:
I have a sample file and I want all the phone numbers starting with 987 listed in another file;
that means if a single row contains 2 phone numbers, they should be on separate lines.
Sample File Contents
ajfhvjfdhvjdfb jfbhfb fg 9871177454 9563214578 shgfsehfgvhb vhf 9877745212
sjdjfgsfhvg b 9874789645 sfjkvhbjfbg shgfhbfg 2563145278
9874561231
This should work,
echo "ajfhvjfdhvjdfb jfbhfb fg 9871177454 9563214578 shgfsehfgvhb vhf 9877745212 sjdjfgsfhvg b 9874789645 sfjkvhbjfbg shgfhbfg 2563145278 9874561231" > sample.txt
egrep -o '987([0-9]+)' sample.txt
returns,
9871177454
9877745212
9874789645
9874561231
or, to be specific to 10-digit phone numbers,
egrep -o '987([0-9]{7})' sample.txt
returns the same results, since all four matching numbers are 10 digits long.
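Since the asker wants the list in another file, redirecting the same command does it (the output file name is mine):

```shell
# recreate the sample line, then send every 987-prefixed number to its own line in a second file
echo "ajfhvjfdhvjdfb jfbhfb fg 9871177454 9563214578 shgfsehfgvhb vhf 9877745212 sjdjfgsfhvg b 9874789645 sfjkvhbjfbg shgfhbfg 2563145278 9874561231" > sample.txt
egrep -o '987[0-9]{7}' sample.txt > phones.txt
```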

How to extract values between 2 known strings

I have some huge files containing mixed binary and XML data. I want to extract all values between 2 XML tags that have multiple occurrences in the file. The pattern is as follows: <C99><F1>050</F1><F2>random value</F2></C99> . The XML portions are not formatted; everything is on a single line.
I need all values between <F1> and </F1> from <C99> where the value is in the range 050 to 999 (<F1> exists under other fields as well, but I need only the values of F1 from C99). I need to count them, to see how many C99 have F1 with values between 050 and 999.
I would like a hint on how I could easily reach and extract those values (using cat and grep? or sed?). Sorting and counting is easy to do once the values are exported to a file.
My temporary solution:
After removing all binary data from the file, I can run the following command:
grep -o "<C99><F1>..." filename > file.txt
This will export the first 12 characters of every string starting with <C99><F1>.
<C99><F1>001
<C99><F1>056
<C99><F1>123
<C99><F1>445
.....
Once they are exported in a text file, I replace <C99><F1> with nothing and then sort and count the remaining values.
Thank you!
Using XMLStarlet:
$ xml sel -t -v '//C99/F1[. >= 50 and . <= 999]' -nl data.xml | wc -l
Not much of a hint there, sorry.
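For the cat/grep/sed hint that was actually asked for, here is a sketch using GNU grep's -P (assumed available; \K discards everything matched before it) plus awk for the range check. The demo file name and contents are mine:

```shell
# demo: two C99/F1 values in range (050, 999), one below it (001),
# and an F1 under another field that must not be counted
printf '<C99><F1>050</F1><F2>x</F2></C99><A7><F1>500</F1></A7><C99><F1>001</F1></C99><C99><F1>999</F1></C99>' > demo.xml
# grep keeps only the 3 digits directly after <C99><F1>; awk keeps the 050-999 range
grep -oP '<C99><F1>\K[0-9]{3}' demo.xml | awk '$0 + 0 >= 50 && $0 + 0 <= 999' | wc -l
```

The `$0 + 0` in awk forces a numeric comparison, so the leading zero in values like 050 is handled.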

Bash: working with line numbers to use them in sed

Basically I just need to uncomment two lines containing a specific string.
Therefore I grep the string to get the line numbers and use sed to uncomment them
(sure, one might also use sed to get the line numbers, but the problem is the same).
You get the line numbers each on its own line. I don't know how to work with line numbers sitting on their own lines, so I am trying to get them onto one line and use bash variables to handle them:
$ cat configfile
some text
#a string foo
#b string bar
some other text
#more text
much more text
so my first try is:
linenr=$(grep -n string configfile | cut -d: -f1) # get line numbers (several lines)
linenr=(${linenr//\ / }) # put line numbers into one line
sed -i "${linenr[0]},${linenr[1]} s/##*//" configfile # uncomment lines
my second try is:
linenr=$(sed -n '/string/=' configfile) # get line numbers (several lines)
linenr=$(echo $linenr | sed -i 's/\n/ /' configfile) # put line numbers into one line
sed -i "${linenr[0]},${linenr[1]} s/##*//" configfile # uncomment lines
I need to do this twice, for two nearly identical configfiles, and for some reason I get different output for the line numbers although the code is the same for both configfiles. (It works for configfile4 but not for configfile6? I assume the content of those files is irrelevant to the output of the found line numbers? I also checked the line endings; they are the same in both files.)
configfile4 lines:
44 45
configfile6 lines:
54
55
How should one work with line numbers in such situations, or are there better ways to do this?
You can use a regexp match as the address in sed, instead of line numbers.
sed -i '/string/s/##*//' configfile
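If the line numbers themselves are really needed, mapfile (bash 4+) reads them straight into an array, one element per line, which avoids the one-line juggling entirely. A sketch against the configfile from the question:

```shell
# rebuild the sample configfile from the question
printf 'some text\n#a string foo\n#b string bar\nsome other text\n#more text\nmuch more text\n' > configfile
# one array element per matching line number
mapfile -t linenr < <(grep -n 'string' configfile | cut -d: -f1)
# uncomment exactly those two lines
sed -i "${linenr[0]}s/^#//;${linenr[1]}s/^#//" configfile
```

Using two single-line addresses instead of a range `${linenr[0]},${linenr[1]}` also avoids touching any unrelated lines that happen to sit between the two matches.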

Delete all lines without # - TextMate regex

I have a huge file (comma delimited) and I need to filter out all lines that do not contain an email address (determined by the # character).
Right now what I have is this to find all lines containing the # sign:
.*,.*,.*#.*,.*$
Basically you have 4 values per line and the 3rd value holds the email address.
The "replace with:" value would be empty.
You have about 10 different ways to do this in TextMate and even more from the command line. Here are some of the easier ways...
From TextMate:
Command-control-t, start typing some part of the command "Copy Non-Matching Lines into New Document", use # (nothing else) for the pattern.
Same as above, except the command you're looking for is "Distill Document / Selection"
Find and select a # symbol. Then do the same as above, but search for the command "Strip Lines Matching Selection/Clipboard". You may not have it, as I may have developed this one myself.
From the command line:
Type one of the following commands, replacing FILE with the filename, including the filepath if it's not in your current working directory. The filtered content can be found in FILE-new.
Using grep: grep '#' FILE > FILE-new
Using sed: sed '/#/!d' FILE > FILE-new
For both of the above, use diff to see what you accomplished: diff FILE{,-new}
That should probably do, I'm guessing...
Try replacing ^[^#]*$ with nothing. Alternatively, grep the file with your regex and redirect the result into a new file.