JScript regexp: start of line doesn't work? - regex

I want to extract a path from a config file:
var out = '#Path to the database root';
out += '\ndatadir="C:/Program Files/MySQL/MySQL Server 5.0/Data/"';
out += '\nblah-blah-blah-blah-blah';
var re = new RegExp('^datadir="(.*)"', 'g');
var result = out.match(re);
if (result == null){
WScript.Echo("datadir not found");
}
WScript.Echo("datadir=" + RegExp.lastParen);
but my code doesn't find the required string. On the other hand, if I remove the caret symbol (^), it works. That's not a solution, because I want to make sure I grab data from a line that really starts with that word.
Update:
In fact, '\n' really is a newline for me despite the single quotes. For example,
WScript.Echo("out=" + out);
produces
out=#Path to the database root
datadir="C:/Program Files/MySQL/MySQL Server 5.0/Data/"
blah-blah-blah-blah-blah
What am I doing wrong?

A ^ boundary normally anchors to the beginning of the entire input string rather than the beginning of each individual line.
The m flag can be used to anchor at each line instead:
var re = new RegExp('^datadir="(.*)"', 'gm');
Example: http://jsfiddle.net/PjLd4/
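
The same anchoring behaviour can be observed outside JScript. The following is a minimal Python sketch, not the asker's environment: it assumes Python's re module, where re.MULTILINE plays the role of the m flag, and it reuses the sample text from the question.

```python
import re

# Sample text from the question: the datadir line is the second line.
config = (
    '#Path to the database root\n'
    'datadir="C:/Program Files/MySQL/MySQL Server 5.0/Data/"\n'
    'blah-blah-blah-blah-blah'
)

pattern = r'^datadir="(.*)"'

# Without the multiline flag, ^ only matches at the very start of the string.
without_flag = re.search(pattern, config)

# With re.MULTILINE, ^ also matches right after every newline.
with_flag = re.search(pattern, config, re.MULTILINE)

print(without_flag)
if with_flag:
    print(with_flag.group(1))
```

Without the flag the search fails because the string starts with the comment line, which matches the behaviour described in the question.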

Related

How to remove newlines inside csv cells using regex/terminal tools?

I have a csv file where some of the cells have newline character inside. For example:
id,name
01,"this is
with newline"
02,no newline
I want to remove all the newline characters inside cells.
How to do it with regex or with other terminal tools generically without knowing number of columns in advance?
This is actually a harder problem than it looks, and in my opinion, means that regex isn't the right solution. Because you're dealing with quoting/escaped strings, spanning multiple 'lines' you end up with a complicated and difficult to read regex. (It's not impossible, it's just messy).
I would suggest instead - use a parser. Perl has one in Text::CSV and it goes a bit like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new( { binary => 1, eol => "\n" } );
while ( my $row = $csv->getline( \*ARGV ) ) {
s/\n/ /g for @$row;
$csv->print( \*STDOUT, $row );
}
This will take files as piped in/specified on command line - that's what \*ARGV does - it's a special file handle that lets you do ... basically what sed does:
somecommand.sh | myscript.pl
myscript.pl filename_to_process
The ARGV filehandle does either automatically. (You could explicitly open a file or use \*STDIN if you prefer.)
I suspect that instead of removing the newline you actually want to replace it with a space. If your input file is as simple as it looks this should do it for you:
$ awk '{ORS=( (c+=gsub(/"/,"&"))%2 ? FS : RS )} 1' file
id,name
01,"this is with newline"
02,no newline
If you are using this xlsx2csv tool, it has this option:
-e, --escape Escape \r\n\t characters
Use it, and then replace \n as needed, like (if \n should be replaced by the empty string):
sed 's/\\n//g' filein.csv > fileout.csv
In one pass:
PATH/TO/xlsx2csv.py -e filein.xlsx | sed 's/\\n//g' > fileout.csv
How to do it with regex or with other terminal tools generically without knowing number of columns in advance?
I don't think a regex is the most appropriate approach and might end up being quite complicated. Instead, I think a separate program to process the files might be easier to maintain in the long-term.
Since you're OK with any terminal tools, I've chosen python, and the code's below:
#!/usr/bin/python3 -B
import csv
import sys
with open(sys.argv[1]) as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        stripped = [col.replace('\n', ' ') for col in row]
        print(','.join(stripped))
I think the code above is very straightforward and easy to understand, without a need for complicated regular expressions.
The input file here has the following contents:
id,name
01,"this is
with newline"
02,no newline
To prove it works, its output is reproduced below:
➜ ~ ./test.py input.csv
id,name
01,this is with newline
02,no newline
You could call the python script from some other program and feed filenames to it. You just need to add a minor update for the python program to write out files, if that's what you really need.
I've replaced the newlines with spaces to avoid a potentially unwanted concatenation (e.g. this iswith newline), but you can replace the newline with whatever you want, including the empty string ''.
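
One caveat with joining the fields by hand is that any cell containing a comma or a quote loses its quoting on output. A possible variant, sketched here under the assumption that the whole file fits in memory as a string, writes the rows back through csv.writer so quoting is restored where needed. The function name flatten_newlines is made up for this illustration.

```python
import csv
import io


def flatten_newlines(text, replacement=" "):
    """Return CSV text with newlines inside cells replaced.

    Hypothetical helper: parses `text` as CSV, replaces embedded
    newlines in every cell, and re-serialises with minimal quoting.
    """
    # newline="" keeps embedded line breaks exactly as they appear.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        cleaned = [
            cell.replace("\r\n", replacement).replace("\n", replacement)
            for cell in row
        ]
        writer.writerow(cleaned)
    return out.getvalue()


sample = 'id,name\n01,"this is\nwith newline"\n02,no newline\n03,"a,\nb"\n'
print(flatten_newlines(sample), end="")
```

The last sample row shows the difference: its cell contains a comma, so the writer keeps it quoted after the newline is replaced.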
I have written a method to remove the embedded new line inside the cell. The method below returns a java.util.List object that contains all rows in the CSV file
List<String> getAllRowsInCSVFileAsList(File selectedCSVFile){
FileReader fileReader = null;
BufferedReader reader = null;
List<String> values = new ArrayList<String>();
try{
fileReader = new FileReader(selectedCSVFile);
reader = new BufferedReader(fileReader);
String line = reader.readLine();
String previousLine = "";
//
boolean intendLineInCell = false;
while(line != null){
if(intendLineInCell){
if(line.indexOf("\"") != -1 && line.indexOf("\"") == line.lastIndexOf("\"")){
previousLine += line;
values.add(previousLine);
previousLine = "";
intendLineInCell = false;
} else if(line.indexOf("\"") != -1 && line.indexOf("\"") != line.lastIndexOf("\"")){
if(getTotalNumberOfCharacterSequenceOccurrenceInString("\"", line) % 2 == 0){
previousLine += line;
}else{
previousLine += line;
values.add(previousLine);
previousLine = "";
intendLineInCell = false;
}
} else{
previousLine += line;
}
}else{
if(line.indexOf("\"") == -1){
values.add(line);
}else if ((line.indexOf("\"") == line.lastIndexOf("\"")) && line.indexOf("\"") != -1){
intendLineInCell = true;
previousLine = line;
}else if(line.indexOf("\"") != line.lastIndexOf("\"") && line.indexOf("\"") != -1){
values.add(line);
}
}
line = reader.readLine();
}
}catch(IOException ie){
ie.printStackTrace();
}finally{
if(fileReader != null){
try {
fileReader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
if(reader != null){
try {
reader.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
return values;
}
int getTotalNumberOfCharacterSequenceOccurrenceInString(String characterSequence, String text){
int count = 0;
while(text.indexOf(characterSequence) != -1){
text = text.replaceFirst(characterSequence, "");
count++;
}
return count;
}
Imagine you are creating a CSV file with one row and five columns, and the 4th cell contains an embedded newline (Enter pressed inside the cell).
Your data will look like the lines below (there is actually only one row in the CSV, but opened in Notepad it looks like two rows).
dinesh,kumar,24,"23
tambaram india",green
A cell with an embedded newline looks like this:
"23
tambaram india"
That cell starts with a double quote (") and ends with a double quote (").
So, while reading a line, an unmatched double quote (") tells us that the cell contains an embedded newline.
The code concatenates the next line to that line and checks whether it contains the closing double quote ("). If it does, the combined row is added to the java.util.List; otherwise the code concatenates the following line, checks again, and so on. I have explained this for one cell, but the method also works when a row has many cells with embedded newlines.
Open the CSV file in Notepad++ and press Ctrl+H. On the Replace tab, enter the newline in the search box, then type the replacement text, or leave it empty to delete the newlines.

how to copy a part of a text file and paste it in the end of the same line after adding semicolon , but for 1000 lines

I have a text file with 1000 lines, and I want to copy a part of every line and paste it at the end of the same line after adding a semicolon. This must be done for all 1000 lines.
I have imported the text file into Excel to do it, but I could not find a way to do it in one go.
Here, for example, is what the first line looks like:
{"loginId":"gcdmtest_bp_pr_acc_po_20000@trash-mail.com","password":"test1234"};dc9b88ce-f26e-43fa-a2c1-6b67cc628404
I want to add a semicolon at the end of the line, and then copy the email pattern at the end of the same line like:
;gcdmtest_bp_pr_acc_po_20000@trash-mail.com
I'm just giving you a general guide to do this.
Read the text file (using C#, Java, whatever you are comfortable with)
Use a for loop to go through each line, extract the email portion, then add to the end of the line.
Save the new text file.
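
The three steps above could be sketched in Python as follows. This is only an illustration under stated assumptions: every line is assumed to start with a JSON object that has a loginId field, the sample address is made up, and the function name append_login_id is hypothetical.

```python
import json

# Decoder reused for every line; raw_decode parses the JSON object at the
# start of a string and ignores whatever follows it.
_decoder = json.JSONDecoder()


def append_login_id(line):
    """Return `line` with ';' and its loginId value appended."""
    record, _end = _decoder.raw_decode(line)
    return line + ";" + record["loginId"]


def process_lines(lines):
    """Apply append_login_id to every non-empty line."""
    return [append_login_id(line) for line in lines if line.strip()]


# Made-up sample in the same shape as the line shown in the question.
sample = '{"loginId":"user@example.com","password":"secret"};abc-123'
print(append_login_id(sample))
```

Parsing the JSON part avoids depending on fixed character positions, so lines with login IDs of different lengths are handled the same way.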
Thanks, guys. I found a solution with this code:
File mFile = new File(newfile);
BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(mFile)));
StringBuilder result = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
    // The email address sits at a fixed position (characters 12 to 53) in every line.
    String emailpattern = line.substring(12, 54);
    result.append(line).append(";").append(emailpattern).append(System.getProperty("line.separator"));
}
br.close();
// Opening the output stream truncates the file, so the old content is overwritten.
FileOutputStream fos = new FileOutputStream(mFile);
fos.write(result.toString().getBytes());
fos.flush();
fos.close();

Notepad++ or UltraEdit: regex remove special duplicates

I need to remove duplicates if
key = anything
but NOT
key=anything
the key can be anything too
e.g.
edit_home=home must be in place
while
edit_home = home or even other string must be removed IF edit_home is a duplicate
for all the lines of the document
thank you
p.s. clearer example:
one=you are
two=we are
three_why=8908908
one = good
two = fine
three_4 = best
three_why = win
from that list i only need to keep:
one=you are
two=we are
three_why=8908908
three_4 = best // because three_4 doesn't have a duplicate
I found a method to do it, but I would need a better search list support by regex or a plugin or a direct regex (which I don't know).
That is: I have two files to compare.
One has the full keys, the other has incomplete.
I merge in a new file all the keys from the first file with those ones of the second, in groups (because the keys are in groups e.g. many keys titled one, many titled two and so on...). Then I regex replace all the keys in the new file by
find (.*)(\s\=\s) replace with \1\=
So they all become key=anything
Then I replace everything after = with empty to isolate the keys.
Then remove the duplicates.
At this point I have trouble doing something like
^.*(^keyone\b|^keytwo\b|^keythree\b).*$
to find all the keys I need in the document, so that I can select them all and replace them with the correct keys.
Why? Because in this example there are only 3 keys, but in reality there are many, and the Find field breaks at a certain point.
How do I do it right?
Update: I found the Toolbucket plugin, which allows searching for many strings, but another issue is that, in addition to the duplicate, I also have to remove the original.
That is, if I find the same key "one" twice, I have to remove all the lines containing one.
Ctrl + F
Find tab
Find what: ^.*\S=\S.*$
Find All in Current Document
Copy result from result window to a new window (the list of Line 1: Line 2: Line 3: ...)
Ctrl + F
Replace tab
(the following will remove the leading "Line number:" from every line)
Find what: ^.*?\d:\s
Replace with: Empty
ok, after all that i wrote, one solution could be (therefore, once i have the merged keys)
(?m)^(.*)$(?=\r?\n^(?!\1).*(?s).*?\1)
with this i can mark/highlight all the duplicated keys :-) so then i can manage those only, removing them from the first list and adding what remains to the second file...
If someone has a solution with a direct regex will be really appreciated
Here is a commented UltraEdit script for this task.
// Note: This script does not work for large files as it loads the
// entire file content into very limited scripting memory for fast
// processing even with multiple GB of RAM installed.
if (UltraEdit.document.length > 0) // Is any file opened?
{
// Define environment for this script and select entire file content.
UltraEdit.insertMode();
UltraEdit.columnModeOff();
UltraEdit.activeDocument.selectAll();
// Determine line termination used currently in active file.
var sLineTerm = "\r\n";
if (typeof(UltraEdit.activeDocument.lineTerminator) == "number")
{
// The two lines below require UE v16.00 or UES v10.00 or later.
if (UltraEdit.activeDocument.lineTerminator == 1) sLineTerm = "\n";
else if (UltraEdit.activeDocument.lineTerminator == 2) sLineTerm = "\r";
}
else // This version of UE/UES does not offer line terminator property.
{
if (UltraEdit.activeDocument.selection.indexOf(sLineTerm) < 0)
{
sLineTerm = "\n"; // Not DOS, perhaps UNIX.
if (UltraEdit.activeDocument.selection.indexOf(sLineTerm) < 0)
{
sLineTerm = "\r"; // Also not UNIX, perhaps MAC.
if (UltraEdit.activeDocument.selection.indexOf(sLineTerm) < 0)
{
sLineTerm = "\r\n"; // No line terminator, use DOS.
}
}
}
}
// Get all lines of active file into an array of strings
// with each string being one line from active file.
var asLines = UltraEdit.activeDocument.selection.split(sLineTerm);
var nTotalLines = asLines.length;
// Process each line in the array.
for(var nCurrentLine = 0; nCurrentLine < asLines.length; nCurrentLine++)
{
// Skip all lines not containing or starting with an equal sign.
if (asLines[nCurrentLine].indexOf('=') < 1) continue;
// Get string left to equal sign with tabs/spaces trimmed.
var sKey = asLines[nCurrentLine].replace(/^[\t ]*([^\t =]+).*$/,"$1");
// Skip lines beginning with just tabs/spaces left to equal sign.
if (sKey.length == asLines[nCurrentLine].length) continue;
// Build the regular expression for the search in all other lines.
var rRegSearch = new RegExp("^[\\t ]*"+sKey+"[\\t ]*=","g");
// Check all remaining lines for a line also starting with
// this key string (case-sensitive) left of an equal sign.
var nLineCompare = nCurrentLine + 1;
while(nLineCompare < asLines.length)
{
// Does this line also have this key left of the equal
// sign, with or without surrounding spaces/tabs?
if (asLines[nLineCompare].search(rRegSearch) < 0)
{
nLineCompare++; // No, continue on next line.
}
else // Yes, remove this line from array.
{
asLines.splice(nLineCompare,1);
}
}
}
// Was any line removed from the array?
if (nTotalLines == asLines.length)
{
UltraEdit.activeDocument.top(); // Cancel the selection.
UltraEdit.messageBox("Nothing found to remove!");
}
else
{
// If this version of UE/UES supports direct write to clipboard, use
// user clipboard 9 to paste the lines into the file, overwriting
// everything, as this is much faster than using the write command in
// older versions of UE/UES.
if (typeof(UltraEdit.clipboardContent) == "string")
{
var nActiveClipboard = UltraEdit.clipboardIdx;
UltraEdit.selectClipboard(9);
UltraEdit.clipboardContent = asLines.join(sLineTerm);
UltraEdit.activeDocument.paste();
UltraEdit.clearClipboard();
UltraEdit.selectClipboard(nActiveClipboard);
}
else UltraEdit.activeDocument.write(asLines.join(sLineTerm));
var nRemoved = nTotalLines - asLines.length;
UltraEdit.activeDocument.top();
UltraEdit.messageBox("Removed " + nRemoved + " line" + ((nRemoved != 1) ? "s" : "") + " on updated file.");
}
}
Copy this code and paste it into a new ASCII file using DOS line terminators in UltraEdit.
Next use command File - Save As to save the script file for example with name RemoveDuplicateKeys.js into %AppData%\IDMComp\UltraEdit\MyScripts or wherever you want to have saved your UltraEdit scripts.
Open Scripting - Scripts and add the just saved UltraEdit script to the list of scripts. You can enter a description for this script, too.
Open the file with the list, or make this file active if it is already opened in UltraEdit.
Run the script by clicking on it in menu Scripting, or by opening Views - Views/Lists - Script List and double clicking on the script.
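
For comparison, the rule implied by the question's clearer example (keep every key=value line, and drop a spaced key = value line only when its key also occurs on another line) can be sketched outside the editor. This Python sketch is an interpretation of that example, not a port of the UltraEdit script above, and the function name remove_spaced_duplicates is made up.

```python
import re
from collections import Counter

# Group 1 is the key, group 2 is any whitespace between key and '='.
LINE_RE = re.compile(r"^\s*([^\s=]+)(\s*)=")


def remove_spaced_duplicates(lines):
    """Drop 'key = value' lines whose key appears on more than one line."""
    parsed = [LINE_RE.match(line) for line in lines]
    counts = Counter(m.group(1) for m in parsed if m)
    kept = []
    for line, m in zip(lines, parsed):
        is_spaced = m is not None and m.group(2) != ""
        if is_spaced and counts[m.group(1)] > 1:
            continue  # spaced form of a key that has a duplicate
        kept.append(line)
    return kept


sample = [
    "one=you are",
    "two=we are",
    "three_why=8908908",
    "one = good",
    "two = fine",
    "three_4 = best",
    "three_why = win",
]
for line in remove_spaced_duplicates(sample):
    print(line)
```

Lines without an equal sign are passed through unchanged, and three_4 = best survives because its key occurs only once.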

Regex expression to search all files and subdirectories for expression and return only the first line of match

I need a regular expression utility that will search through a specified directory and return only the first line of each file, or a special 13 digit number on the first line.
Is there a simple and effective way to do this in a regex in c# or vb.net or vb6?
The code I'm trying to search for is this:000999D5, but the 13 digit number I want to return is only on the first line.
Thanks,
Marc
I can't think of a readily available API call that does this, so you'll have to code it yourself. Nothing too fancy here.
public class ScanDirectory
{
public void WalkDirectory(string directory)
{
WalkDirectory(new DirectoryInfo(directory));
}
private void WalkDirectory(DirectoryInfo directory)
{
// Scan all files in the current path
foreach (FileInfo file in directory.GetFiles())
{
// Do something with each file.
}
DirectoryInfo [] subDirectories = directory.GetDirectories();
// Scan the directories in the current directory and call this method
// again to go one level into the directory tree
foreach (DirectoryInfo subDirectory in subDirectories)
{
WalkDirectory(subDirectory);
}
}
}
(Code is from here: http://www.codeproject.com/KB/cs/ScanDirectory.aspx)
For every file you'll then have to read the first line. You can do this with
// Read only the first line of the file.
using (System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt"))
{
    String firstLine = file.ReadLine();
    if (null != firstLine)
    {
        // do regexp comparison
    }
}
Regex comparison should look like this:
string input = "0123456789132";
// Here we call Regex.Match.
Match match = Regex.Match(input, @"([0-9]{13})",
    RegexOptions.IgnoreCase);
// Here we check the Match instance.
if (match.Success)
{
    // Do your stuff
}
You'll probably need to change the regexp to match your exact requirements, as they are not very clear at the moment.
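
Putting the pieces together (walk the directory tree, read only the first line of each file, apply the regex), a compact sketch could look like the following. It is written in Python rather than C# purely for illustration, assumes text files that can be decoded as UTF-8, and uses the made-up function name first_line_numbers.

```python
import re
from pathlib import Path

# A run of exactly 13 digits, not embedded in a longer digit run.
THIRTEEN_DIGITS = re.compile(r"(?<!\d)\d{13}(?!\d)")


def first_line_numbers(root):
    """Map each file under `root` to the 13-digit number on its first line.

    Files whose first line has no such number are left out.
    """
    results = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Only the first line is read; the rest of the file is never loaded.
        with path.open(encoding="utf-8", errors="replace") as handle:
            first_line = handle.readline()
        match = THIRTEEN_DIGITS.search(first_line)
        if match:
            results[str(path)] = match.group(0)
    return results
```

Calling first_line_numbers on a directory path returns a dictionary keyed by file path, which can then be filtered further if the files also need to contain a particular code.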

Match beginning of file to string literal

I'm working with a multi line text block where I need to divide everything into 3 groups
1: beginning of the file up to a string literal // don't keep
2: The next line //KEEP THE LINE FOLLOWING STRING LITERAL
3: Everything following that line to the end of file. // don't keep
<<
aFirstLing here
aSecondLine here
MyStringLiteral //marks the next line as the target to keep
What I want to Keep!
all kinds of crap that I don't
<<
I'm finding plenty of ways to pull from the beginning of a line but am unable to see how to include an unknown number of non-blank lines until I reach that string literal.
EDIT: I'm removing the .net-ness to focus on regex only. Perhaps this is a place for understanding backreferences?
Rather than read the entire file into memory, just read what you need:
List<string> TopLines = new List<string>();
string prevLine = string.Empty;
foreach (var line in File.ReadLines(filename))
{
TopLines.Add(line);
if (prevLine == Literal)
{
break;
}
prevLine = line;
}
I suppose there's a LINQ solution, although I don't know what it is.
EDIT:
If you already have the text of the email in your application (as a string), you have to split it into lines first. You can do that with String.Split, splitting on newlines, or you can create a StringReader and read it line by line. The logic above still applies, but rather than File.ReadLines, just use foreach on the array of lines.
EDIT 2:
The following LINQ might do it:
TopLines = File.ReadLines(filename).TakeWhile(s => s != Literal).ToList();
TopLines.Add(Literal);
Or, if the strings are already in a list:
TopLines = lines.TakeWhile(s => s != Literal).ToList();
TopLines.Add(Literal);
.*(^MyStringLiteral\r?\n)([\w|\s][^\r\n]+)(.+) seems to work. the trick wasn't back references - it was the exclusion of \r\n.
File.ReadAllLines() will give you an array you can iterate over until you find your literal, then take the next line:
string[] lines = File.ReadAllLines(filename);
// Stop one short of the end so that lines[i + 1] always exists.
for (int i = 0; i < lines.Length - 1; i++)
{
    if (lines[i] == Literal)
        return lines[i + 1];
}
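
The idea of scanning until the literal and keeping only the following line can also be written as a small streaming function. This is a sketch in Python for illustration only: the name line_after is made up, and it assumes the literal occupies a whole line by itself.

```python
def line_after(lines, literal):
    """Return the line that follows the first line equal to `literal`.

    `lines` can be any iterable of strings, such as an open file, so the
    input is consumed lazily. Returns None if no such line exists.
    """
    previous_was_literal = False
    for raw in lines:
        line = raw.rstrip("\r\n")  # tolerate lines read straight from a file
        if previous_was_literal:
            return line
        previous_was_literal = (line == literal)
    return None


sample = [
    "aFirstLine here",
    "aSecondLine here",
    "MyStringLiteral",
    "What I want to Keep!",
    "everything else",
]
print(line_after(sample, "MyStringLiteral"))
```

Because the function returns as soon as the target line is seen, nothing after it is read, which matches the intent of the loop-based answers above.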