Notepad++ or UltraEdit: regex remove special duplicates - regex

I need to remove duplicate lines that have the form
key = anything
but NOT lines of the form
key=anything
The key can be anything too.
E.g.
edit_home=home must stay in place
while
edit_home = home (or the same key with any other string) must be removed IF edit_home is a duplicate.
This applies to all the lines of the document.
Thank you.
P.S. A clearer example:
one=you are
two=we are
three_why=8908908
one = good
two = fine
three_4 = best
three_why = win
From that list I only need to keep:
one=you are
two=we are
three_why=8908908
three_4 = best // because three_4 doesn't have a duplicate
I found a method to do it, but it needs better support for a long search list, either from the regex engine or from a plugin, or else a direct regex (which I don't know how to write).
That is: I have two files to compare.
One has the full set of keys, the other an incomplete set.
I merge all the keys from the first file with those of the second into a new file, in groups (the keys come in groups, e.g. many keys titled one, many titled two, and so on). Then I regex-replace all the keys in the new file:
find (.*)(\s\=\s) replace with \1\=
So they all become key=anything.
Then I replace everything after = with empty to isolate the keys.
Then remove the duplicates.
At this point I have trouble doing something like
^.*(^keyone\b|^keytwo\b|^keythree\b).*$
to find all those keys in the document I need, so that I can select them all and replace them with the correct keys.
Why? Because in this example there are only 3 keys, BUT in reality there are many, and the find field breaks beyond a certain length.
How to do it right?
Update: I found the Toolbucket plugin, which allows searching for many strings, but another issue is that in addition to the duplicate, I also have to remove the original.
That is, if I find the same key "one" 2 times, I have to remove all the lines containing one.

Ctrl + F
Find tab
Find what: ^.*\S=\S.*$
Find All in Current Document
Copy result from result window to a new window (the list of Line 1: Line 2: Line 3: ...)
Ctrl + F
Replace tab
(the following will remove the leading "Line number:" from every line)
Find what: ^.*?\d:\s
Replace with: Empty

OK, after all that I wrote, one solution could be the following (once I have the merged keys):
(?m)^(.*)$(?=\r?\n^(?!\1).*(?s).*?\1)
With this I can mark/highlight all the duplicated keys :-) so then I can handle only those, removing them from the first list and adding what remains to the second file...
If someone has a solution with a direct regex, it will be really appreciated.
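For comparison, the "mark the duplicated keys" step can also be done outside the editor. The following is only a sketch in Python, not something from the thread: the helper name duplicated_keys and the key pattern (everything left of the first =, without surrounding whitespace) are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumption: a key is the run of non-space, non-'=' characters before the first '='.
KEY_RE = re.compile(r"^\s*([^\s=]+)\s*=")

def duplicated_keys(lines):
    """Return the keys that occur on more than one line, in first-seen order."""
    keys = [m.group(1) for m in map(KEY_RE.match, lines) if m]
    counts = Counter(keys)
    result = []
    for key in keys:
        if counts[key] > 1 and key not in result:
            result.append(key)
    return result

sample = [
    "one=you are",
    "two=we are",
    "three_why=8908908",
    "one = good",
    "two = fine",
    "three_4 = best",
    "three_why = win",
]
print(duplicated_keys(sample))  # ['one', 'two', 'three_why']
```

Because the keys are counted in a dictionary, the length of the key list does not matter, which sidesteps the limit of the find field.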

Here is a commented UltraEdit script for this task.
// Note: This script does not work for large files as it loads the
// entire file content into very limited scripting memory for fast
// processing even with multiple GB of RAM installed.
if (UltraEdit.document.length > 0) // Is any file opened?
{
// Define environment for this script and select entire file content.
UltraEdit.insertMode();
UltraEdit.columnModeOff();
UltraEdit.activeDocument.selectAll();
// Determine line termination used currently in active file.
var sLineTerm = "\r\n";
if (typeof(UltraEdit.activeDocument.lineTerminator) == "number")
{
// The two lines below require UE v16.00 or UES v10.00 or later.
if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
else if (UltraEdit.activeDocument.lineTerminator == 2) sLineTerm = "\r";
}
else // This version of UE/UES does not offer line terminator property.
{
if (UltraEdit.activeDocument.selection.indexOf(sLineTerm) < 0)
{
sLineTerm = "\n"; // Not DOS, perhaps UNIX.
if (UltraEdit.activeDocument.selection.indexOf(sLineTerm) < 0)
{
sLineTerm = "\r"; // Also not UNIX, perhaps MAC.
if (UltraEdit.activeDocument.selection.indexOf(sLineTerm) < 0)
{
sLineTerm = "\r\n"; // No line terminator, use DOS.
}
}
}
}
// Get all lines of active file into an array of strings
// with each string being one line from active file.
var asLines = UltraEdit.activeDocument.selection.split(sLineTerm);
var nTotalLines = asLines.length;
// Process each line in the array.
for(var nCurrentLine = 0; nCurrentLine < asLines.length; nCurrentLine++)
{
// Skip all lines not containing or starting with an equal sign.
if (asLines[nCurrentLine].indexOf('=') < 1) continue;
// Get string left to equal sign with tabs/spaces trimmed.
var sKey = asLines[nCurrentLine].replace(/^[\t ]*([^\t =]+).*$/,"$1");
// Skip lines beginning with just tabs/spaces left to equal sign.
if (sKey.length == asLines[nCurrentLine].length) continue;
// Escape regular expression metacharacters (for example a dot) in the key.
var sKeyEscaped = sKey.replace(/[.*+?^${}()|[\]\\]/g,"\\$&");
// Build the regular expression for the search in all other lines.
var rRegSearch = new RegExp("^[\\t ]*"+sKeyEscaped+"[\\t ]*=","g");
// Check all remaining lines for a line also starting with
// this key string case-sensitive left to an equal sign.
var nLineCompare = nCurrentLine + 1;
while(nLineCompare < asLines.length)
{
// Does this line also have this key left to equal
// sign with or without surrounding spaces/tabs?
if (asLines[nLineCompare].search(rRegSearch) < 0)
{
nLineCompare++; // No, continue on next line.
}
else // Yes, remove this line from array.
{
asLines.splice(nLineCompare,1);
}
}
}
// Was any line removed from the array?
if (nTotalLines == asLines.length)
{
UltraEdit.activeDocument.top(); // Cancel the selection.
UltraEdit.messageBox("Nothing found to remove!");
}
else
{
// If version of UE/UES supports direct write to clipboard, use
// user clipboard 9 to paste the lines into file with overwriting
// everything as this is much faster than using write command in
// older versions of UE/UES.
if (typeof(UltraEdit.clipboardContent) == "string")
{
var nActiveClipboard = UltraEdit.clipboardIdx;
UltraEdit.selectClipboard(9);
UltraEdit.clipboardContent = asLines.join(sLineTerm);
UltraEdit.activeDocument.paste();
UltraEdit.clearClipboard();
UltraEdit.selectClipboard(nActiveClipboard);
}
else UltraEdit.activeDocument.write(asLines.join(sLineTerm));
var nRemoved = nTotalLines - asLines.length;
UltraEdit.activeDocument.top();
UltraEdit.messageBox("Removed " + nRemoved + " line" + ((nRemoved != 1) ? "s" : "") + " on updated file.");
}
}
Copy this code and paste it into a new ASCII file using DOS line terminators in UltraEdit.
Next use command File - Save As to save the script file for example with name RemoveDuplicateKeys.js into %AppData%\IDMComp\UltraEdit\MyScripts or wherever you want to have saved your UltraEdit scripts.
Open Scripting - Scripts and add the just saved UltraEdit script to the list of scripts. You can enter a description for this script, too.
Open the file with the list, or make this file active if it is already opened in UltraEdit.
Run the script by clicking on it in menu Scripting, or by opening Views - Views/Lists - Script List and double clicking on the script.
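The core idea of the script (keep the first line for each key and drop every later line with the same key, whether or not there are spaces around the equal sign) can be sketched in Python as well. This is an illustrative sketch, not a port of the UltraEdit script: the function name remove_duplicate_keys is hypothetical, and it uses a set of seen keys instead of rescanning the remaining lines.

```python
import re

# Assumption: the key is the first run of characters that are not tab, space or '=',
# optionally surrounded by tabs/spaces, directly followed by '='.
KEY_RE = re.compile(r"^[\t ]*([^\t =]+)[\t ]*=")

def remove_duplicate_keys(lines):
    """Keep the first line for every key and drop later lines with the same key."""
    seen = set()
    kept = []
    for line in lines:
        match = KEY_RE.match(line)
        if match is None:
            kept.append(line)  # no key on this line, leave it untouched
            continue
        key = match.group(1)
        if key in seen:
            continue  # a line with this key was already kept
        seen.add(key)
        kept.append(line)
    return kept

sample = [
    "one=you are",
    "two=we are",
    "three_why=8908908",
    "one = good",
    "two = fine",
    "three_4 = best",
    "three_why = win",
]
print("\n".join(remove_duplicate_keys(sample)))
```

With the sample list from the question this keeps one=you are, two=we are, three_why=8908908 and three_4 = best. It relies on the key=anything lines coming before the key = anything lines, as they do in the question.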

Related

How to speed up regex searching for large quantity of potentially large files in C++?

I'm trying to make a program that reads user-supplied wildcard file names and wildcard search strings, using an Excel document as a configuration file. For example, the user may enter C:\Read*.txt, and any text file on the C drive whose name starts with Read, followed by any characters, will be included in the search.
They could search for Message: * and every string beginning with "Message: " followed by any sequence of characters would be matched.
So far the program works, but it is very slow, and I need it to be able to search very large files. I'm using a file stream and the regex class, and I'm not sure what is taking so much time.
The bulk of the time in my code is being spent in the following loop (I've only included the lines above the while loop so you can better understand what I'm trying to do):
smatch matches;
vector<regex> expressions;
// Compile every pattern once, before the file is read.
for (size_t i = 0; i < regex_patterns.size(); i++)
{
    expressions.emplace_back(regex_patterns.at(i));
}
auto startTimer = high_resolution_clock::now();
// Open file and begin reading
ifstream stream1(filePath);
if (stream1.is_open())
{
    int count = 0;
    while (getline(stream1, line))
    {
        // Skip empty lines, there is no point in searching them,
        // but still count them so the saved line numbers stay correct.
        if (line.empty())
        {
            ++count;
            continue;
        }
        // Loop through each search string; on a match, save line number and line text.
        for (size_t i = 0; i < expressions.size(); i++)
        {
            // regex_search returns a bool, so test it directly.
            if (regex_search(line, matches, expressions[i]))
            {
                lineNumb.push_back(count);
                lineTextToSave.push_back(line);
            }
        }
        ++count;
    }
}
auto stopTimer = high_resolution_clock::now();
auto duration2 = duration_cast<milliseconds>(stopTimer - startTimer);
cout << "Time to search file: " << duration2.count() << "\n";
Is there a better method of searching files than this? I tried looking up many things but haven't found a programmatic example that I've understood thus far.
Some ideas by order of priority:
You could join all the regex patterns together to form a single regex instead of matching r regexes on each line. Each line is then scanned once instead of r times, which can speed up the program considerably (in the best case by a factor of about r). Example: (R1)|(R2)|(...)|(Rr)
Ensure you compile each regex once, before the loop, rather than on every use.
Do not add the final .* to your regex pattern.
Some ideas but non-portable:
Memory map the file instead of reading through iostreams
Consider whether it is worth reimplementing grep instead of calling grep through popen().
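The first idea, one combined regex instead of r separate ones, is sketched below in Python rather than C++. It is only an illustration of the technique: the helper names compile_combined and matching_lines and the two sample patterns are assumptions, and no speed measurement is implied.

```python
import re

def compile_combined(patterns):
    """Join several patterns into one alternation, compiled once."""
    # Each pattern is wrapped in a non-capturing group so that a '|' inside
    # one pattern cannot merge with its neighbours.
    return re.compile("|".join(f"(?:{p})" for p in patterns))

def matching_lines(lines, patterns):
    """Return (line index, line text) for every non-empty line matching any pattern."""
    combined = compile_combined(patterns)
    return [(n, line) for n, line in enumerate(lines)
            if line and combined.search(line)]

lines = ["Message: disk full", "", "nothing here", "Error 42 occurred"]
patterns = [r"Message: ", r"Error \d+"]
print(matching_lines(lines, patterns))
# [(0, 'Message: disk full'), (3, 'Error 42 occurred')]
```

One trade-off of the combined pattern is that it no longer tells you directly which of the original patterns matched; if that matters, named groups or a second pass over only the matching lines would be needed.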

Gtk::TextView with constant string

I am using Gtkmm 3+, and what I am trying to do is make the text buffer always contain the constant string "> ", even if the user tries to delete it. In addition, when the user presses Return, the string should automatically appear again on the new line. Basically, I want a constant prompt string like a terminal has.
The only way I can think of to accomplish this would be to connect to the delete and backspace signals so the user cannot delete the string. But is there a better way?
So far this is the only way I can think of:
//in constructor
txt_view_i_.signal_event().connect(sigc::mem_fun(*this, &MainWindow::inputEvent));
//function
bool MainWindow::inputEvent(GdkEvent* event)
{
if((event->key.keyval == GDK_KEY_BackSpace || event->key.keyval == GDK_KEY_Delete) && buffer_input_->get_char_count() < 3)
return true;
return false;
}
But this doesn't work perfectly: if you type more than 3 characters and then go to the beginning of the line, you can delete the constant string.
Another way I just thought of was to add a label to the TextView widget. I did that, but the user could still delete it. Here is the code for that:
Gtk::TextBuffer::iterator it = buffer_input_->get_iter_at_line(1);
Glib::RefPtr<Gtk::TextChildAnchor> refAnchor = buffer_input_->create_child_anchor(it);
Gtk::Label* lbl = Gtk::manage(new Gtk::Label("> "));
txt_view_i_.add_child_at_anchor(*lbl, refAnchor);
This is very similar, but not quite identical, to the question I answered here: You can create a GtkTextTag that makes its contents uneditable, and apply it from the beginning of the buffer up to and including the "> " prompt.
Then when you receive input, append your output to the buffer and then append a new prompt on the next line, and re-apply the tag to make the whole thing uneditable.
The links in the linked answer show some C code where this is done, even including a prompt. It's not Gtkmm or C++, but it should serve as an illustration.
Here is the code I used to solve it:
Glib::RefPtr<Gtk::TextBuffer::Tag> tag = Gtk::TextBuffer::Tag::create();
tag->property_editable() = false;
Glib::RefPtr<Gtk::TextBuffer::TagTable> tag_table = Gtk::TextBuffer::TagTable::create();
tag_table->add(tag);
buffer_input_ = Gtk::TextBuffer::create(tag_table);
txt_view_i_.set_buffer(buffer_input_);
scroll_win_i_.add(txt_view_i_);
Gtk::TextBuffer::iterator buffer_it_ = buffer_input_->begin();
buffer_input_->insert_with_tag(buffer_it_, "> ", tag);
Here is how I made it so that the user cannot edit before the constant string:
//connect to the mark set signal
buffer_input_->signal_mark_set().connect(sigc::mem_fun(*this, &MainWindow::setMark));
//make the box uneditable
void MainWindow::setMark(const Gtk::TextBuffer::iterator& it, const Glib::RefPtr<Gtk::TextBuffer::Mark>& mark)
{
if(it.get_offset() < 2)
txt_view_i_.set_editable(false);
else
txt_view_i_.set_editable(true);
}
Hopefully someone will find this useful.

Match beginning of file to string literal

I'm working with a multi line text block where I need to divide everything into 3 groups
1: beginning of the file up to a string literal // don't keep
2: The next line //KEEP THE LINE FOLLOWING STRING LITERAL
3: Everything following that line to the end of file. // don't keep
<<
aFirstLing here
aSecondLine here
MyStringLiteral //marks the next line as the target to keep
What I want to Keep!
all kinds of crap that I don't
<<
I'm finding plenty of ways to pull from the beginning of a line but am unable to see how to include an unknown number of non-blank lines until I reach that string literal.
EDIT: I'm removing the .net-ness to focus on regex only. Perhaps this is a place for understanding backreferences?
Rather than read the entire file into memory, just read what you need:
List<string> TopLines = new List<string>();
string prevLine = string.Empty;
foreach (var line in File.ReadLines(filename))
{
    TopLines.Add(line);
    // Stop once the line following the literal has been added.
    if (prevLine == Literal)
    {
        break;
    }
    prevLine = line;
}
I suppose there's a LINQ solution, although I don't know what it is.
EDIT:
If you already have the text of the email in your application (as a string), you have to split it into lines first. You can do that with String.Split, splitting on newlines, or you can create a StringReader and read it line by line. The logic above still applies, but rather than File.ReadLines, just use foreach on the array of lines.
EDIT 2:
The following LINQ might do it:
TopLines = File.ReadLines(filename).TakeWhile(s => s != Literal).ToList();
TopLines.Add(Literal);
Or, if the strings are already in a list:
TopLines = lines.TakeWhile(s => s != Literal).ToList();
TopLines.Add(Literal);
.*(^MyStringLiteral\r?\n)([\w|\s][^\r\n]+)(.+) seems to work. The trick wasn't backreferences; it was the exclusion of \r\n.
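The same idea (anchor on the literal line, then capture everything up to the next line break) can be tried out with Python's re module. This is a sketch under assumptions: it uses Python regex syntax rather than .NET, it captures only the line to keep instead of three groups, and the name line_after_literal is made up for the example.

```python
import re

LITERAL = "MyStringLiteral"  # the marker line from the question

# re.MULTILINE lets ^ match at the start of every line;
# [^\r\n]+ stops at the end of the line that follows the literal.
PATTERN = re.compile(r"^" + re.escape(LITERAL) + r"\r?\n([^\r\n]+)", re.MULTILINE)

def line_after_literal(text):
    """Return the line that follows the literal line, or None if there is none."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

text = (
    "aFirstLine here\n"
    "aSecondLine here\n"
    "MyStringLiteral\n"
    "What I want to Keep!\n"
    "all kinds of crap that I don't\n"
)
print(line_after_literal(text))  # What I want to Keep!
```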
File.ReadAllLines() will give you an array you can iterate over until you find your literal, then take the next line:
string[] lines = File.ReadAllLines(filename);
// Stop one line early so that lines[i + 1] always exists.
for (int i = 0; i < lines.Length - 1; i++)
{
    if (lines[i] == Literal)
        return lines[i + 1];
}

awk: Either modify or append a line, based on its existence

I have a small awk script that does some in-place file modifications (to a Java .properties file, to give you an idea). This is part of a deployment script affecting a bunch of users.
I want to be able to set defaults, leaving the rest of the file at the user's preferences. This means appending a configuration line if it is missing, modifying it if it is there, leaving everything else as it is.
Currently I use something like this:
# initialize
BEGIN {
some_value_set = 0
other_value_set = 0
some_value_default = "some.value=SOME VALUE"
other_value_default = "other.value=OTHER VALUE"
}
# modify existing lines
{
if (/^some\.value=.*/)
{
gsub(/.*/, some_value_default)
some_value_set = 1
}
else if (/^other\.value=.*/)
{
gsub(/.*/, other_value_default)
other_value_set = 1
}
print $0
}
# append missing lines
END {
if (some_value_set == 0) print some_value_default
if (other_value_set == 0) print other_value_default
}
Especially when the number of lines I want to control gets larger, this is increasingly cumbersome. My awk knowledge is not all that great, and the above just feels wrong - how can I streamline this?
P.S.: If possible, I'd like to stay with awk. Please don't just recommend that using Perl/Python/whatever would be much easier. :-)
BEGIN {
defaults["some.value"] = "SOME VALUE"
defaults["other.value"] = "OTHER VALUE"
}
{
for (key in defaults) {
pattern = key
gsub(/\./, "\\.", pattern)
if (match($0, "^" pattern "=.*")) {
gsub(/=.*/, "=" defaults[key])
delete defaults[key]
}
}
print $0
}
END {
for (key in defaults) {
print key "=" defaults[key]
}
}
My AWK is rusty, so I won't provide actual code.
Initialize an array with the regular expressions and values.
For each line, iterate the array and do appropriate substitutions. Clean out used entries.
At end, iterate the array and append lines for remaining entries.
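The three steps above can be sketched as follows. The sketch is in Python only to illustrate the logic (the question asks to stay with awk); the function name apply_defaults and the sample lines are assumptions.

```python
def apply_defaults(lines, defaults):
    """Overwrite lines whose key has a default, then append defaults not yet present."""
    remaining = dict(defaults)  # copy, so the caller's dictionary is not modified
    result = []
    for line in lines:
        key, sep, _ = line.partition("=")
        if sep and key in remaining:
            # Key found: replace the value and mark the entry as used.
            result.append(f"{key}={remaining.pop(key)}")
        else:
            result.append(line)
    # Whatever is left was missing from the file, so append it.
    result.extend(f"{key}={value}" for key, value in remaining.items())
    return result

defaults = {"some.value": "SOME VALUE", "other.value": "OTHER VALUE"}
lines = ["user.pref=keep me", "some.value=old"]
print("\n".join(apply_defaults(lines, defaults)))
```

Because the key is compared as a plain string, there is no need to escape the dots as there would be when building a regular expression from the key.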

What is the easiest way to parse an INI File in C++? [closed]

I'm trying to parse an INI file using C++. Any tips on what is the best way to achieve this? Should I use the Windows API tools for INI file processing (with which I am totally unfamiliar), an open-source solution or attempt to parse it manually?
You can use the Windows API functions, such as GetPrivateProfileString() and GetPrivateProfileInt().
If you need a cross-platform solution, try Boost's Program Options library.
I have never parsed ini files, so I can't be too specific on this issue.
But I have one piece of advice:
Don't reinvent the wheel as long as an existing one meets your requirements
http://en.wikipedia.org/wiki/INI_file#Accessing_INI_files
http://sdl-cfg.sourceforge.net/
http://sourceforge.net/projects/libini/
http://www.codeproject.com/KB/files/config-file-parser.aspx
Good luck :)
If you are already using Qt
QSettings my_settings("filename.ini", QSettings::IniFormat);
Then read a value
my_settings.value("GroupName/ValueName", <<DEFAULT_VAL>>).toInt()
There are a number of other converters that turn your INI values into both standard types and Qt types. See the Qt documentation on QSettings for more information.
I use SimpleIni. It's cross-platform.
This question is a bit old, but I will post my answer. I have tested various INI classes (you can see them on my website), and I also use SimpleIni because I want to work with INI files on both Windows and WinCE.
Windows' GetPrivateProfileString() works only with the registry on WinCE.
It is very easy to read values with SimpleIni. Here is an example:
#include "SimpleIni\SimpleIni.h"
CSimpleIniA ini;
ini.SetUnicode();
ini.LoadFile(FileName);
const char * pVal = ini.GetValue(section, entry, DefaultStr);
inih is a simple ini parser written in C, it comes with a C++ wrapper too. Example usage:
#include "INIReader.h"
INIReader reader("test.ini");
std::cout << "version="
<< reader.GetInteger("protocol", "version", -1) << ", name="
<< reader.Get("user", "name", "UNKNOWN") << ", active="
<< reader.GetBoolean("user", "active", true) << "\n";
The author has also a list of existing libraries here.
Have you tried libconfig? It has a very JSON-like syntax. I prefer it over XML configuration files.
I ended up using inipp which is not mentioned in this thread.
https://github.com/mcmtroffaes/inipp
It is an MIT-licensed, header-only implementation that was simple enough to add to a project and takes about 4 lines to use.
If you are interested in platform portability, you can also try Boost.PropertyTree. It supports INI as a persistence format, though the property tree may be only 1 level deep.
Unless you plan on making the app cross-platform, using the Windows API calls would be the best way to go. Just ignore the note in the API documentation about being provided only for 16-bit app compatibility.
I know this question is very old, but I came upon it because I needed something cross-platform for Linux, Win32... I wrote the function below; it is a single function that can parse INI files. Hopefully others will find it useful.
Rules & caveats:
buf to parse must be a null-terminated string. Load your INI file into a char array and call this function to parse it.
Section names must have [] brackets around them, such as [MySection]. Values and sections must begin on a line without leading spaces.
Files with Windows \r\n or Linux \n line endings are both parsed.
Comments should use # or // and begin at the top of the file; no comments should be mixed with INI entry data.
Quotes and ticks are trimmed from both ends of the returned string. Spaces are only trimmed if they are outside of the quotes.
Strings are not required to have quotes; whitespace is trimmed if quotes are missing.
You can also extract numbers or other data; for example, if you have a float, just call atof(ret) on the ret buffer.
// -----note: no escape is necessary for inner quotes or ticks-----
// -----------------------------example----------------------------
// [Entry2]
// Alignment = 1
// LightLvl=128
// Library = 5555
// StrValA = Inner "quoted" or 'quoted' strings are ok to use
// StrValB = "This a "quoted" or 'quoted' String Value"
// StrValC = 'This a "tick" or 'tick' String Value'
// StrValD = "Missing quote at end will still work
// StrValE = This is another "quote" example
// StrValF = " Spaces inside the quote are preserved "
// StrValG = This works too and spaces are trimmed away
// StrValH =
// ----------------------------------------------------------------
//12oClocker super lean and mean INI file parser (with section support)
//set section to 0 to disable section support
//returns TRUE if we were able to extract a string into ret value
//NextSection is a char* pointer, will be set to zero if no next section is found
//will be set to pointer of next section if it was found.
//use it like this... char* NextSection = 0; GrabIniValue(X,X,X,X,X,&NextSection);
//buf is data to parse, ret is the user supplied return buffer
BOOL GrabIniValue(char* buf, const char* section, const char* valname, char* ret, int retbuflen, char** NextSection)
{
if(!buf){*ret=0; return FALSE;}
char* s = buf; //search starts at "s" pointer
char* e = 0; //end of section pointer
//find section
if(section)
{
int L = strlen(section);
SearchAgain1:
s = strstr(s,section); if(!s){*ret=0; return FALSE;} //find section
if(s > buf && (*(s-1))!='\n'){s+=L; goto SearchAgain1;} //section must be at beginning of a line!
s+=L; //found section, skip past section name
while(*s!='\n'){s++;} s++; //spin until next line, s is now beginning of section data
e = strstr(s,"\n["); //find beginning of next section or end of file
if(e){*e=0;} //if we found beginning of next section, null the \n so we don't search past section
if(NextSection) //user passed in a NextSection pointer
{ if(e){*NextSection=(e+1);}else{*NextSection=0;} } //set pointer to next section
}
//restore char at end of section, ret=empty_string, return FALSE
#define RESTORE_E if(e){*e='\n';}
#define SAFE_RETURN RESTORE_E; (*ret)=0; return FALSE
//find valname
int L = strlen(valname);
SearchAgain2:
s = strstr(s,valname); if(!s){SAFE_RETURN;} //find valname
if(s > buf && (*(s-1))!='\n'){s+=L; goto SearchAgain2;} //valname must be at beginning of a line!
s+=L; //found valname match, skip past it
while(*s==' ' || *s == '\t'){s++;} //skip spaces and tabs
if(!(*s)){SAFE_RETURN;} //if NULL encountered do safe return
if(*s != '='){goto SearchAgain2;} //no equal sign found after valname, search again
s++; //skip past the equal sign
while(*s==' ' || *s=='\t'){s++;} //skip spaces and tabs
while(*s=='\"' || *s=='\''){s++;} //skip past quotes and ticks
if(!(*s)){SAFE_RETURN;} //if NULL encountered do safe return
char* E = s; //s is now the beginning of the valname data
while(*E!='\r' && *E!='\n' && *E!=0){E++;} E--; //find end of line or end of string, then backup 1 char
while(E > s && (*E==' ' || *E=='\t')){E--;} //move backwards past spaces and tabs
while(E > s && (*E=='\"' || *E=='\'')){E--;} //move backwards past quotes and ticks
L = E-s+1; //length of string to extract NOT including NULL
if(L<1 || L+1 > retbuflen){SAFE_RETURN;} //empty string or buffer size too small
strncpy(ret,s,L); //copy the string
ret[L]=0; //null last char on return buffer
RESTORE_E;
return TRUE;
#undef RESTORE_E
#undef SAFE_RETURN
}
How to use... example....
char sFileData[] = "[MySection]\r\n"
"MyValue1 = 123\r\n"
"MyValue2 = 456\r\n"
"MyValue3 = 789\r\n"
"\r\n"
"[MySection]\r\n"
"MyValue1 = Hello1\r\n"
"MyValue2 = Hello2\r\n"
"MyValue3 = Hello3\r\n"
"\r\n";
char str[256];
char* sSec = sFileData;
char secName[] = "[MySection]"; //we support sections with same name
while(sSec)//while we have a valid sNextSec
{
//print values of the sections
char* next=0;//in case we don't have any successful grabs
if(GrabIniValue(sSec,secName,"MyValue1",str,sizeof(str),&next)) { printf("MyValue1 = [%s]\n",str); }
if(GrabIniValue(sSec,secName,"MyValue2",str,sizeof(str),0)) { printf("MyValue2 = [%s]\n",str); }
if(GrabIniValue(sSec,secName,"MyValue3",str,sizeof(str),0)) { printf("MyValue3 = [%s]\n",str); }
printf("\n");
sSec = next; //parse next section, next will be null if no more sections to parse
}
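To make the parsing rules listed above easier to follow (bracketed section headers, key = value lines, whitespace and surrounding quotes or ticks trimmed from the value), here is a compact sketch of the same rules in Python. It is an illustration only, not a translation of GrabIniValue: the function name parse_ini is hypothetical, it returns a dictionary instead of filling a buffer, it merges sections that share a name, and it skips comment lines wherever they appear.

```python
def parse_ini(text):
    """Parse INI text into {section: {key: value}} following the rules above."""
    result = {}
    section = None
    for raw in text.splitlines():  # handles both \r\n and \n line endings
        line = raw.rstrip()
        if not line or line.startswith("#") or line.startswith("//"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            result.setdefault(section, {})
            continue
        key, sep, value = line.partition("=")
        if not sep or section is None:
            continue  # not a key = value line, or no section seen yet
        # Trim whitespace outside the quotes first, then the quotes/ticks themselves;
        # spaces inside the quotes and inner quotes are preserved.
        value = value.strip(" \t").strip("\"'")
        result[section][key.strip()] = value
    return result

sample = (
    "[Entry2]\r\n"
    "Alignment = 1\r\n"
    "LightLvl=128\r\n"
    "StrValB = \"This a \"quoted\" or 'quoted' String Value\"\r\n"
    "StrValF = \" Spaces inside the quote are preserved \"\r\n"
)
print(parse_ini(sample)["Entry2"]["LightLvl"])  # 128
```

Values come back as strings, so a number still has to be converted by the caller, for example with int() or float(), in the same way the C version expects atof(ret).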
Maybe a late answer, but it is worth knowing the options. If you need a cross-platform solution, you can try GLib's key-value file parser (https://developer.gnome.org/glib/stable/glib-Key-value-file-parser.html).