Using grep to search in a text file while ignoring characters - regex

I want to search text files in a variable folder for a variable string using "grep". I want it to ignore the first 6 characters and the last 6 characters of each line while searching the file.
Example line in file:
xxxxxx TEXT TEXT TEXT TEXT xxxxxx
Something like this:
grep -PInr "[^......]TEXT" /var/local/data/textfiles/
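A hedged sketch of one way to do this with grep's PCRE mode: require at least 6 characters before the match and at least 6 after it, so occurrences inside the first or last 6 characters of a line are skipped (TEXT and the folder path are the placeholders from the question):
grep -PIrn '^.{6}.*TEXT.*.{6}$' /var/local/data/textfiles/
The ^.{6} consumes the ignored prefix and .{6}$ reserves the ignored suffix; the same pattern also works with grep -E.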

Regular Expression to match against first character and file extension

I'm using Bash to try to write a command that gets every file where the first character is not 'a' and the file does not end with '.html', but I cannot seem to get both conditions to work properly.
So far my regex matches all the files that start with 'a' and end with '.html' and removes them, but the issue I cannot solve is when a file starts with 'a' and ends with a different file extension. My regex seems to ignore that second requirement and hides the file regardless.
cat inputfile.txt | sed -n '/^[^a].*[^html$]/p'
Input File Contents:
123
anapple.html
456
theapple.html
789
nottrue.html
apple.csv
12
Output:
123
456
theapple.html
789
nottrue.html
12
Instead of trying to write a pattern that matches the rows to keep, write a pattern that matches the rows to remove, and use grep -v to print all the lines that don't match it.
grep -v '^a.*\.html$' inputfile.txt
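Run against the sample input above, this removes only anapple.html; note that apple.csv is kept, because it does not end in '.html':
$ grep -v '^a.*\.html$' inputfile.txt
123
456
theapple.html
789
nottrue.html
apple.csv
12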

How to find a specific string and use it as input to a new search in the same file

I need to find a string in many files across multiple folders using a regular expression. The string can be in multiple files.
When I find it, I want to take the string and search the same file again, this time for that specific string,
and return the name of the file where the string was found along with the string itself.
I'm thinking of using the grep command to find the string and then looping over the output, but does anybody have an idea how to solve this better?
for example:
Look in file.js for the pattern regex: SearchMethod\(([a-zA-Z]*)\)
Once found, look for the previous capture in the same file with another regex, capture=('[a-zA-Z']'), which will find something like the following:
From capture='value';
get the string 'value',
and return the string 'value' and the name of the file it belongs to.
First, some sample data:
$ cat file
capture='foo'
capture='bar'
capture='baz'
capture='foobar'
SearchMethod(foo)
SearchMethod(bar)
SearchMethod(qux)
Then, get the "search strings", the SearchMethod parameters:
$ search_strings=$( grep -oP 'SearchMethod\(\K\w+' file | paste -s -d'|' )
$ echo "$search_strings"
foo|bar|qux
Then, search for the "capture" words, with the filename in the output:
$ grep -HoP "capture='\\K($search_strings)\\b" file
file:foo
file:bar
The \b gives you a word-boundary constraint, which is why foobar does not show up in the final output.
Requires GNU grep, which you get on Linux.
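Since the question mentions many files across multiple folders, here is a hedged sketch extending the same two-step approach, run per file (the directory name project/ is illustrative, file names are assumed to contain no whitespace, and GNU grep is still required):
for f in $(grep -rlP 'SearchMethod\(' project/); do
    # collect this file's SearchMethod parameters as an alternation
    strings=$( grep -oP 'SearchMethod\(\K\w+' "$f" | paste -s -d'|' )
    # then look for matching capture= values in the same file
    [ -n "$strings" ] && grep -HoP "capture='\K($strings)\b" "$f"
done
Running the second grep per file keeps each filename paired with the captures found in that same file.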

DOS script to extract X number of lines

I am trying to make a script to:
- Ask the user for a customer number (max 8 digits)
- Search a very large text file (Source.txt) for that number
- Extract the 19 lines of text above the customer number (everything as is, including empty lines)
- Extract the line with the customer number itself (line 20 in this case)
- Extract the next 30 lines below the customer number
- Save all extracted output in Output.txt
Basically like copying a block of text and pasting it into a new text file.
In the source text file, the customer number does not appear at a fixed line number.
You can use standard Linux command-line utilities (available on Windows too) like grep and output redirection (in a bash script, for example) as follows:
# read and validate customer number (stdin, parameter, ...)
grep -B 19 -A 30 '12345678' Source.txt > Output.txt
where 12345678 is the customer number, -B specifies the number of lines before, and -A the number of lines after the match with the customer number.
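A hedged sketch of the prompt-and-validate step in bash (the variable name num is made up; Source.txt and Output.txt are from the question):
read -rp "Customer number: " num          # ask the user
if [[ "$num" =~ ^[0-9]{1,8}$ ]]; then     # digits only, at most 8
    grep -B 19 -A 30 "$num" Source.txt > Output.txt
else
    echo "Invalid customer number" >&2
fi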

awk: how to include file names when concatenating files?

I am running GnuWin32 under Windows 7.
Have many files in a single directory with file names that look like this:
chem.001.txt
chem.002.b4.txt
chem.003.md6.txt
(more files.txt) ...
In their current form, none of the files includes its own file name.
I need to clean these files for further use.
I want to concatenate all the files into a single file,
but I also need to include each file name at the beginning of its concatenated content, to later associate the original file with the clean data.
For example, the single, concatenated file (new_file.txt) would look like this:
chem.001.txt delimiter (could be a tab or pipe) followed by text from chem.001.txt...
chem.002.b4.txt delimiter followed by text from chem.002.b4.txt ...
chem.003.md6.txt delimiter followed by text from chem.003.md6.txt ...
etc. ...
Will then clean the concatenated file and parse content as needed.
awk/gawk may have a means to associate the file name with $1, associate the text in the file with $2, and then, in sequence, print $1, $2 for each file into 'new_file.txt', but I've not been able to make it work.
How to do this?
Put this in foo.awk:
BEGIN{ RS="^$"; ORS=""; OFS="|" }
{ gsub(/\n[\r]?/," "); print FILENAME, $0 > "new_file.txt" }
and then execute it as
awk -f foo.awk <files>
where <files> is however you provide a list of file names in Windows. It uses GNU awk for multi-char RS to let you read a whole file as a single record.
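For example, with the three files above (their contents are illustrative), the invocation might look like:
$ awk -f foo.awk chem.001.txt chem.002.b4.txt chem.003.md6.txt
after which new_file.txt holds, for each input file, the file name, a pipe delimiter, and that file's text with its line breaks flattened to spaces, e.g. chem.001.txt|text from chem.001.txt ...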

Remove all lines that don't end with specific string

I'm working on a large text file and I'd like to remove all lines that don't contain the text "event":"click"}]
I've tried some regex within Sublime Text 3 but can't get it to stick.
I have not used Sublime, but you could select all lines not containing the text "event":"click"}] with the regex:
^((?!"event":"click"\}\]).)*$
(the lookahead comes before the consumed character so that an occurrence at the very start of a line is also rejected). You could then replace the matched lines with nothing (an empty string).
Use this one to print the result to stdout:
sed -n '/"event":"click"\}\]$/p' your_large_file
Use this one to keep only the lines that end with "event":"click"}]; a your_large_file.old backup will be generated:
sed -i.old -n '/"event":"click"\}\]$/p' your_large_file
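A quick hedged demonstration (the sample file name and contents are made up):
$ printf '%s\n' 'a {"event":"hover"}]' 'b {"event":"click"}]' > sample.txt
$ sed -n '/"event":"click"\}\]$/p' sample.txt
b {"event":"click"}]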