Mass phrase search/replace across a website domain

I need to replace a single text phrase across my entire website domain with another one. What is the best way to do a mass search/replace?

If you can do it file-by-file, then you could use a simple Perl one-liner:
perl -pi -e 's/search/replace/gi' filename.txt
If you are on a UNIX system with a shell, you can combine this with find to search and replace text in files across subdirectories:
find /dir/to/files -iname 'foo.*' -exec perl ... {} \;
where ... is the Perl command above, minus the filename; find substitutes each matching file for the {}.
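If Perl is not available, a rough Python equivalent of the same idea is sketched below; the directory, file pattern, and phrases are placeholders rather than anything from the question.
# Rough Python sketch of a recursive in-place search/replace.
# root, the 'foo.*' pattern, and the search/replace phrases are placeholders.
import fnmatch
import os
import re

root = '/dir/to/files'
pattern = re.compile('search', re.IGNORECASE)

for dirpath, dirnames, filenames in os.walk(root):
    for name in fnmatch.filter(filenames, 'foo.*'):
        filepath = os.path.join(dirpath, name)
        with open(filepath) as f:
            text = f.read()
        with open(filepath, 'w') as f:
            f.write(pattern.sub('replace', text))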

I use KEDIT to do this every day. I have a script I wrote called TooMany.kex which allows me to edit a list of files across computers and networks. The only other way I know how to do it is using a shell script - you already have that.
* TooMany.kex - perform commands on files using a directory
* (this is designed to issue edit commands to too many files for the ring buffer)
* Multiple commands are separated by a semicolon ";"
*
* eg. TooMany c\one\two\**;c\two\three\**;file
* commands:
* 1. kedit "dirfileid.1()" (nodefext noprof'
* 2. c\one\two\**
* 3. c\two\three\**
* 4. file
*
parse arg CmdStr
if ftype.1() \= 'DIR' then do
'alert /The File Type Must Be "DIR"./ title /TooMany/'
exit
end
'nomsg less /</|/>/'
if rc = 0 then do
if nbscope.1() = 0 then do
'alert /No files found/ title /TooMany/'
exit
end
end
'top'
* give user something to look at while macro is running
'extract /nbfile/fileid'
* the number of files can change depending on the setting SCOPE/DISPLAY or ALL
size = nbscope.1()
if scope.1() = "ALL" then size = size.1()
nfiles = size
'msg Processing' size 'files.'
'refresh'
* save the directory file name
dir_fileid = fileid.1()
do nfiles - 1
* if less than 3K ISA free, leave early so user has some to work with
if memory.3() < 3 then do
'alert $TooMany aborting. ISA nearly full. You Forgot To File.$ title $TooMany$'
'qquit'
exit
end
'down'
'refresh'
'kedit "'dirfileid.1()'" (nodefext noprof'
if rc \= 0 then do
'alert $TooMany aborting. KEDIT rc='rc'$ title $TooMany$'
exit
end
Call ExecuteCommands
* edit file # 1 in the ring
'kedit "'fileid.1'" (noprof'
*'refresh'
end
* quit out of dir.dir and edit the last file
'next'
fid = dirfileid.1()
** 'qquit'
'kedit "'fid'" (nodefext noprof'
Call ExecuteCommands
'msg TooMany:' nfiles 'file(s) processed'
exit
ExecuteCommands:
* special skip files - don't edit the directory file
if dir_fileid = fileid.1() then return
* Execute commands separated by ";"
istart = 1
do forever
if pos(";",CmdStr,istart) = 0 then do
command substr(CmdStr,istart,length(CmdStr))
return
end
else do
iend = pos(";",CmdStr,istart)
command substr(CmdStr,istart,iend - istart)
istart = iend + 1
if istart > length(CmdStr) then return
end
end
return

Related

RegEx query of text file doesn't find any matches

The following code is intended to open a text file and search for any matches from a list of strings then output how many results it finds. For some reason, it's always "finding" 0.
' fso and stdout are assumed to be set up earlier in the script (not shown in
' the question), roughly like this:
set fso = CreateObject("Scripting.FileSystemObject")
set stdout = WScript.StdOut

validcards = array("NVIDIA GRID K140Q","AMD FirePro S7150","VMware SVGA 3D")
textFile = fso.opentextfile("_cards.txt",1,0,1).readall
set fso = nothing

set query = new regexp
with query
    .global=true
    .multiline=true
    .ignorecase=true
    .pattern="^.*?" & join(validcards,".*?") & ".*?$"
end with

counter = 0
set results = query.execute(textFile)
for each result in results
    stdout.WriteLine escape(result)
    counter = counter + 1
next
When I output counter it is always zero. What am I missing? Here is what the text file looks like:
Name
VMware SVGA 3D
The text file is generated using wmic path win32_VideoController get name > _cards.txt
UPDATE
In desperation, I just printed out the file after it's loaded. It looks like this:
 ■N a m e
V M w a r e S V G A 3 D
I was able to fix this by changing the OpenTextFile line to textFile = fso.opentextfile("_cards.txt",1,0,-1).readall. (wmic writes its redirected output as UTF-16, and the -1 format argument tells OpenTextFile to read the file as Unicode.) However, the regex still wasn't working.
I changed the pattern to the following and now it seems to be working fine:
.pattern="^.*(" & join(validcards,"|") & ").*$"

Python Outputting Text in Hex

I'm working with a very large text file (58 GB) that I'm attempting to split into smaller chunks. The problem I'm running into is that the smaller chunks appear to be hex. I'm having my terminal print each line to stdout as well, but when I see it printed to stdout it looks like normal strings to me. Is this known behavior? I've never encountered an issue where Python keeps spitting stuff out in hex before. Even odder, when I tried using Ubuntu's split from the command line, it also generated everything in hex.
Code snippet below:
# os.path and datetime are assumed to be imported earlier (not shown in the
# original snippet):
from os import path
from datetime import datetime

working_dir = '/SECRET/'
output_dir = path.join(working_dir, 'output')
test_file = 'SECRET.txt'
report_file = 'SECRET_REPORT.txt'
output_chunks = 100000000
output_base = 'SECRET'

input = open(test_file, 'r')
report_output = open(report_file, 'w')

count = 0
at_line = 0
output_f = None

for line in input:
    if count % output_chunks == 0:
        if output_f:
            report_output.write('[{}] wrote {} lines to {}. Total count is {}'.format(
                datetime.now(), output_chunks, str(output_base + str(at_line) + '.txt'), count))
            output_f.close()
        output_f = open('{}{}.txt'.format(output_base, str(at_line)), 'wb')
        at_line += 1
    output_f.write(line.encode('ascii', 'ignore'))
    print line.encode('ascii', 'ignore')
    count += 1
Here's what was going on:
Each line started with a NUL character. When I opened parts of the file using head or PyCharm's terminal it looked normal, but Sublime Text was picking up on that NUL character and rendering the results in hex. I had to strip '\x00' from each line of the output, and then it started looking the way I would expect it to.
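A minimal sketch of that fix, assuming the same kind of Python 2 loop as above (the file names here are placeholders): strip the NUL bytes from each line before writing it to the chunk file.
# Minimal sketch (placeholder file names): drop the stray NUL bytes from each
# line of the source file before writing it to a chunk file.
with open('SECRET.txt', 'r') as source, open('SECRET_0.txt', 'w') as chunk:
    for line in source:
        chunk.write(line.replace('\x00', ''))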

Searching for text in log files with inputs from columns in excel

I am trying to use AutoIt to extract text from multiple log files, each larger than 500 MB; the text to be extracted comes from an Excel column. I'm having issues with FileRead, which throws a memory error. I've even tried FileReadToArray, which I thought would make it easier for the function to process the huge string. All the files together are around 7.8 GB; the largest single file is around 800 MB.
Global $aUserNames[] = _Excel_RangeRead($file,$Worksheet) ; Usernames need to be read from Excel
Global $sFolderPath = FileSelectFolder("Select Folder", "")
Global $aFileList = _FileListToArrayRec($sFolderPath, "*.*", $FLTAR_FILES, $FLTAR_RECUR, $FLTAR_SORT, $FLTAR_FULLPATH)
If @error = 1 Then Exit MsgBox(0, "", "No Folders Found.")
If @error = 4 Then Exit MsgBox(0, "", "No Files Found.")

Local $sRegEx = "(?i)"
For $i = 0 To UBound($aUserNames) - 1
    $sRegEx &= "\b" & $aUserNames[$i] & "\b|"
Next
$sRegEx = StringTrimRight($sRegEx, 1)

Global $Store
For $i = 1 To $aFileList[0]
    $sFileContent = _FileReadToArray($aFileList[$i], $Store)
    If StringRegExp($sFileContent, $sRegEx) Then MsgBox(0, "Info", "One or more users found in file " & $aFileList[$i])
Next
The code was assisted by jguinch in the AutoIt forum.
You can read the file one line at a time to avoid the memory problem.
For $i = 1 To $aFileList[0]
    $fileHandle = FileOpen($aFileList[$i])
    While True
        $fileLine = FileReadLine($fileHandle)
        If @error Then ExitLoop
        If StringRegExp($fileLine, $sRegEx) Then MsgBox(0, "Info", "One or more users found in file " & $aFileList[$i])
    WEnd
    FileClose($fileHandle) ; close each log before moving on to the next one
Next
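For what it's worth, the same streaming idea in Python is sketched below (the usernames, file names, and paths are placeholders, and reading the names from Excel is left out): build one alternation pattern from the usernames and test each log line against it, so no file is ever held in memory whole.
# Rough Python sketch of the same streaming approach (placeholder names;
# reading the usernames from Excel is out of scope here).
import re

usernames = ['alice', 'bob', 'carol']
log_files = ['server1.log', 'server2.log']
pattern = re.compile(r'\b(' + '|'.join(re.escape(u) for u in usernames) + r')\b',
                     re.IGNORECASE)

for log_path in log_files:
    with open(log_path) as log:
        for line in log:                      # one line at a time, never the whole file
            if pattern.search(line):
                print 'One or more users found in file ' + log_path
                break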

Save multiple lines of text in .txt

I am a Python newbie. I can print the Twitter search results, but when I save them to .txt, I only get one result. How do I add all the results to my .txt file?
t = Twython(app_key=api_key, app_secret=api_secret, oauth_token=acces_token, oauth_token_secret=ak_secret)

tweets = []
MAX_ATTEMPTS = 10
COUNT_OF_TWEETS_TO_BE_FETCHED = 500

for i in range(0, MAX_ATTEMPTS):
    if(COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
        break
    if(0 == i):
        results = t.search(q="#twitter", count='100')
    else:
        results = t.search(q="#twitter", include_entities='true', max_id=next_max_id)
    for result in results['statuses']:
        tweet_text = result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']
        tweets.append(tweet_text)
        print tweet_text
        text_file = open("Output.txt", "w")
        text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']))
        text_file.close()
You just need to rearrange your code to open the file BEFORE you do the loop:
t = Twython(app_key=api_key, app_secret=api_secret, oauth_token=acces_token, oauth_token_secret=ak_secret)

tweets = []
MAX_ATTEMPTS = 10
COUNT_OF_TWEETS_TO_BE_FETCHED = 500

with open("Output.txt", "w") as text_file:
    for i in range(0, MAX_ATTEMPTS):
        if(COUNT_OF_TWEETS_TO_BE_FETCHED < len(tweets)):
            break
        if(0 == i):
            results = t.search(q="#twitter", count='100')
        else:
            results = t.search(q="#twitter", include_entities='true', max_id=next_max_id)
        for result in results['statuses']:
            tweet_text = result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']
            tweets.append(tweet_text)
            print tweet_text
            text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'], result['user']['followers_count'], result['text'], result['created_at'], result['source']))
            text_file.write('\n')
I use Python's with statement here to open a context manager. The context manager will handle closing the file when you drop out of the loop. I also added another write call that writes out a newline ('\n') so that each line of data ends up on its own line.
You could also open the file in append mode ('a' instead of 'w'), which would allow you to remove the 2nd write command.
There are two general solutions to your issue. Which is best may depend on more details of your program.
The simplest solution is just to open the file once at the top of your program (before the loop) and then keep reusing the same file object over and over in the later code. Only when the whole loop is done should the file be closed.
with open("Output.txt", "w") as text_file:
for i in range(0,MAX_ATTEMPTS):
# ...
for result in results['statuses']:
# ...
text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'],
result['user']['followers_count'],
result['text'],
result['created_at'],
result['source']))
Another solution would be to open the file several times, but to use the "a" append mode when you do so. Append mode does not truncate the file like "w" write mode does, and it seeks to the end automatically, so you don't overwrite the file's existing contents. This approach would be most appropriate if you were writing to several different files. If you're just writing to the one, I'd stick with the first solution.
for i in range(0, MAX_ATTEMPTS):
    # ...
    for result in results['statuses']:
        # ...
        with open("Output.txt", "a") as text_file:
            text_file.write("#%s,%s,%s,%s,%s" % (result['user']['screen_name'],
                                                 result['user']['followers_count'],
                                                 result['text'],
                                                 result['created_at'],
                                                 result['source']))
One last point: It looks like you're writing out comma separated data. You may want to use the csv module, rather than writing your file manually. It can take care of things like quoting or escaping any commas that appear in the data for you.
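A rough sketch of that csv suggestion, reusing the structure of the snippets above (Python 2's csv module, same field order as before):
# Rough sketch of the csv suggestion: csv.writer takes care of quoting or
# escaping any commas that appear inside the tweet text itself.
import csv

with open("Output.txt", "wb") as text_file:           # binary mode for csv on Python 2
    writer = csv.writer(text_file)
    for i in range(0, MAX_ATTEMPTS):
        # ... same search calls as above ...
        for result in results['statuses']:
            # ...
            writer.writerow(["#" + result['user']['screen_name'],
                             result['user']['followers_count'],
                             result['text'].encode('utf-8'),   # tweet text is unicode on Python 2
                             result['created_at'],
                             result['source']])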

Python 2.7.3: Search/Count txt file for string, return full line with final occurrence of that string

I'm trying to create a WiFi Log Scanner. Currently we go through logs manually using CTRL+F and our keywords; I just want to automate that process, i.e. bang in a .txt file and receive an output.
I've got the bones of the code and can work on making it pretty later, but I'm running into a small issue. I want the scanner to search the file (done), count instances of that string (done) and output the number of occurrences (done), followed by the full line where that string occurred last, including the line number (the line number is not essential, it just makes it easier to guesstimate which is the more recent issue if there are multiple).
Currently I'm getting an output of every line with the string in it. I know why this is happening; I just can't think of a way to output only the last line.
Here is my code:
import os
from Tkinter import Tk
from tkFileDialog import askopenfilename

def file_len(filename):
    #Count Number of Lines in File and Output Result
    with open(filename) as f:
        for i, l in enumerate(f):
            pass
    print('There are ' + str(i+1) + ' lines in ' + os.path.basename(filename))

def file_scan(filename):
    #All Issues to Scan will go here
    print ("DHCP was found " + str(filename.count('No lease, failing')) + " time(s).")
    for line in filename:
        if 'No lease, failing' in line:
            print line.strip()
    DNS = (filename.count('Host name lookup failure:res_nquery failed') + filename.count('HTTP query failed'))/2
    print ("DNS Failure was found " + str(DNS) + " time(s).")
    for line in filename:
        if 'Host name lookup failure:res_nquery failed' or 'HTTP query failed' in line:
            print line.strip()
    print ("PSK= was found " + str(testr.count('psk=')) + " time(s).")
    for line in ln:
        if 'psk=' in line:
            print 'The length(s) of the PSK used is ' + str(line.count('*'))

Tk().withdraw()
filename = askopenfilename()
abspath = os.path.abspath(filename) #So that doesn't matter if File in Python Dir
dname = os.path.dirname(abspath) #So that doesn't matter if File in Python Dir
os.chdir(dname) #So that doesn't matter if File in Python Dir

print ('Report for ' + os.path.basename(filename))
file_len(filename)
file_scan(filename)
That's pretty much going to be my working code (I just have to add a few more issue searches). I have a version that searches a string instead of a text file here. This outputs the following:
Total Number of Lines: 38
DHCP was found 2 time(s).
dhcp
dhcp
PSK= was found 2 time(s).
The length(s) of the PSK used is 14
The length(s) of the PSK used is 8
I only have general stuff there, modified for it being a string rather than a txt file, but the string I'm scanning will be what's in the txt files.
Don't worry too much about PSK; I want all examples of that listed, and I'll see if I can tidy them up into one line at a later stage.
As a side note, a lot of this is jumbled together from doing previous searches, so I have a good idea that there are probably neater ways of doing this. This is not my current concern, but if you do have a suggestion on this side of things, please provide an explanation/link to explanation as to why your way is better. I'm fairly new to python, so I'm mainly dealing with stuff I currently understand. :)
Thanks in advance for any help, if you need any further info, please let me know.
Joe
To search for and count occurrences of the string, I solved it in the following way:
'''---------------------Function--------------------'''
#Counting the "string" occurrence in a file
def count_string_occurrence():
    string = "test"
    f = open("result_file.txt")
    contents = f.read()
    f.close()
    #we are searching for the value of string in file "result_file.txt"
    print "Number of '" + string + "' in file", contents.count(string)
I can't comment yet on questions, but I think I can answer more specifically with some more information. What line do you want only one of?
For example, you can do something like:
search_str = 'find me'
count = 0
for line in file:
    if search_str in line:
        last_line = line
        count += 1
print '{0} occurrences of this line:\n{1}'.format(count, last_line)
I notice that in file_scan you are iterating twice through file. You can surely condense it into one iteration :).
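A rough sketch of that single pass (reusing the filename chosen in the question's code; the keyword list and variable names are mine, not the question's full set of issues): for each keyword, keep a running count plus the last matching line and its line number.
# Rough sketch of a single pass over the log file: for each keyword, keep a
# running count plus the last line (and its line number) where it appeared.
# The keyword list is illustrative, not the question's full set of issues.
keywords = ['No lease, failing', 'HTTP query failed', 'psk=']
counts = dict((k, 0) for k in keywords)
last_seen = {}

with open(filename) as f:
    for line_no, line in enumerate(f, 1):
        for k in keywords:
            if k in line:
                counts[k] += 1
                last_seen[k] = (line_no, line.strip())

for k in keywords:
    print '%s was found %d time(s).' % (k, counts[k])
    if k in last_seen:
        print '  last seen on line %d: %s' % last_seen[k]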