How to parse multiline records in groovy?

I have a log file that contains a line of five asterisks (*****) in two places. The file can be big.
Log record
*****
Log record
Log record
*****
Log record
I would like to get everything between the two ***** lines. I could read the file line by line, but perhaps there is a better solution, such as parsing it with a regular expression in Groovy?
Thank you.

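You could also write a custom Reader, like this:
class DelimitedReader extends BufferedReader {
    String delimiterLine
    // Set once the closing delimiter has been read, so later reads keep reporting end of input
    private boolean finished = false

    DelimitedReader( String delimiterLine, Reader reader ) {
        super( reader )
        this.delimiterLine = delimiterLine
        scanUntilDelimiter()
    }

    // Skip everything up to and including the opening delimiter line
    private void scanUntilDelimiter() {
        String line = super.readLine()
        while( line != null && line != delimiterLine ) {
            line = super.readLine()
        }
    }

    // Return lines until the closing delimiter, then return null (end of input)
    String readLine() {
        if( finished ) return null
        String line = super.readLine()
        if( line == delimiterLine ) {
            finished = true
            line = null
        }
        line
    }
}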
Then you can iterate over the lines between the delimiters like this:
new File( '/tmp/test.txt' ).withReader { r ->
    new DelimitedReader( '*****', r ).eachLine { line ->
        println line
    }
}
This reads the file as a stream, so the whole (potentially huge) file is never loaded into a single string.
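For comparison, the same streaming idea can be sketched in plain Java, which Groovy code can call directly. This is a minimal sketch. It assumes the delimiter is a line exactly equal to ***** and that a file may hold several delimited records. The class DelimitedRecords and its method readRecords are hypothetical names. Only the record lines are kept in memory, never the whole file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the lines found between pairs of delimiter lines.
final class DelimitedRecords {

    static List<List<String>> readRecords(Reader source, String delimiter) throws IOException {
        List<List<String>> records = new ArrayList<>();
        List<String> current = null; // null while we are outside a record
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.equals(delimiter)) {
                if (current == null) {
                    current = new ArrayList<>(); // opening delimiter: start a new record
                } else {
                    records.add(current);        // closing delimiter: the record is complete
                    current = null;
                }
            } else if (current != null) {
                current.add(line);               // inside a record: keep the line
            }
        }
        // A record with no closing delimiter at end of input is ignored.
        return records;
    }
}
```

A call such as DelimitedRecords.readRecords(Files.newBufferedReader(path), "*****") would return one list of lines per record. Dropping an unterminated trailing record is a design choice you may want to change.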

Try this regex:
(?s)(?<=[*]{5}).+(?=[*]{5})
Demo: http://groovyconsole.appspot.com/script/2405001

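This regex matches everything between the first ***** and the last one:
(?<=\*{5})[\s\S]*(?=\*{5})
Because [\s\S]* is greedy, the match runs to the last ***** in the input. With exactly two delimiter lines, as in the question, that is the section you want. The lazy form [\s\S]*? stops at the next ***** instead.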
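If there can be more than one pair of delimiters, a lazy quantifier keeps each match from running on to the last *****. Below is a minimal Java sketch of that variant using java.util.regex, which is also the engine behind Groovy's =~ operator. The class StarSections and the method sections are hypothetical names. The pattern is an assumption adapted from the ones above. Unlike the lookaround versions, it consumes both delimiters, so they are treated as opening/closing pairs and text between two records is not returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: returns the text of each *****...***** section.
final class StarSections {

    // (?s) lets '.' match newlines; '.*?' is lazy, so a match ends at the next *****.
    // \R? (any line break) keeps the newlines next to the delimiters out of the group.
    private static final Pattern SECTION = Pattern.compile("(?s)\\*{5}\\R?(.*?)\\R?\\*{5}");

    static List<String> sections(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SECTION.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // group 1 is the text between the delimiters
        }
        return result;
    }
}
```

This approach still needs the whole file in one string. For a big file, a line-by-line reader like the one in the previous answer avoids that.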

Related

LINQ query to find elements from a list in a csv file

I am trying to find, for each line of a CSV file, whether any element of a first list appears in it.
First list:
XX01235756777
YY01215970799
Second list (the CSV file):
Column_1|Column_2|Column_3|Column_4|Column_5|Column_6|Column_7
VX|2022-06-09 11:50:55|Y|Y|N|TT56431254135|Microsoft
VX|2022-06-09 11:50:55|Y|Y|N|XX01235756777|Meta
VX|2022-06-09 11:50:55|Y|Y|N|YY18694654355|Nokia
VX|2022-06-09 11:50:55|Y|Y|N|OO01215970799|BlackStone
VX|2022-06-09 11:50:55|Y|Y|N|YY01215970799|Alphabet
My attempt:
List<string> filteredList = new List<string>();
using (StreamReader reader = new StreamReader(csvFilePath))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Any(x => isinList.Contains(x.ToString())))
        {
            filteredList.Add(line);
        }
    }
}
return filteredList;
filteredList should be:
VX|2022-06-09 11:50:55|Y|Y|N|XX01235756777|Meta
VX|2022-06-09 11:50:55|Y|Y|N|YY01215970799|Alphabet
I managed to find a single element in each line of the CSV file.
But I can't write the right LINQ query to check whether any element of the whole list is present in each line.
Any ideas, or anything you can point me to, would be a great help.

Your code treats each line as a sequence of characters (string implements IEnumerable<char>), so line.Any(...) checks every single character against isinList, and no one-character string matches an ISIN. Instead, check whether any string in isinList occurs in the line:
List<string> filteredList = new List<string>();
using (StreamReader reader = new StreamReader(csvFilePath))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        if (isinList.Any(x => line.Contains(x)))
        {
            filteredList.Add(line);
        }
    }
}

Using LINQ with File.ReadLines(), which reads the file lazily one line at a time, you can filter as follows:
var filteredLines = File.ReadLines(csvFilePath)
    .Where(line => isinList.Any(isin => line.Contains(isin)))
    .ToList();
On the other hand, suppose isinList is very large or the CSV file is very long, and you know Column_6 is the only column that can match. Then you might gain performance by putting the ISINs in a HashSet<string> and testing only Column_6 (index 5 after splitting on '|'):
var isinSet = isinList.ToHashSet();
var filteredLines = File.ReadLines(csvFilePath)
    .Where(line => isinSet.Contains(line.Split('|')[5]))
    .ToList();
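For comparison, here is a Java sketch of both filters. The class IsinFilter and its methods are hypothetical names. The lines are assumed to be already read, for example with Files.readAllLines. The column version also guards against lines with too few fields. In the C# version above, line.Split('|')[5] would throw an IndexOutOfRangeException on such a line.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper mirroring the two LINQ filters above.
final class IsinFilter {

    // Substring test, like isinList.Any(isin => line.Contains(isin)).
    static List<String> containingAny(List<String> lines, List<String> isins) {
        return lines.stream()
                .filter(line -> isins.stream().anyMatch(line::contains))
                .collect(Collectors.toList());
    }

    // Exact match on one '|'-separated column, with constant-time average set lookups.
    static List<String> matchingColumn(List<String> lines, Set<String> isins, int column) {
        return lines.stream()
                .filter(line -> {
                    String[] fields = line.split("\\|", -1); // '|' must be escaped in a regex
                    return fields.length > column && isins.contains(fields[column]);
                })
                .collect(Collectors.toList());
    }
}
```

The substring version can also match an ISIN that happens to appear in another column. The column version only matches the whole value of the chosen column.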

How to extract values from a text file using a script?

If I have a file with the contents as below:
field1=value1,field2=value2,field3=value3,field4=value4,field5=value5,..;(new line)
field1=value1.1,field2=value1.2,field3=value1.3,field4=value1.4,field5=value1.5,...; (new line)
.....
....
...
Each line ends with a semicolon and a newline character.
How can I extract the values and store (or display) them in the format below?

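string[] lines = System.IO.File.ReadAllLines(@"C:\Users\Public\TestFolder\WriteLines2.txt");
foreach (string line in lines)
{
    // Drop the trailing ';' before splitting the line into "field=value" pairs
    string[] pairs = line.TrimEnd().TrimEnd(';').Split(',');
    foreach (string pair in pairs)
    {
        string[] data = pair.Split('=');
        if (data.Length != 2)
            continue; // skip fragments that are not "field=value"
        var field = data[0];
        var value = data[1];
        // Write code to store the data
    }
}
The code above should get you started. It skips fragments without an '=' (such as the ... placeholders in the sample) instead of throwing an IndexOutOfRangeException.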
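The target format in the question is not shown, so here is a hedged Java sketch that simply turns each line into an ordered field-to-value map. FieldParser, parseLine and parseAll are hypothetical names. Each pair is split at the first '=', so a value that itself contains '=' stays intact.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for lines like "field1=value1,field2=value2;".
final class FieldParser {

    // Parses one line into a map that keeps the fields in their original order.
    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String body = line.trim();
        if (body.endsWith(";")) {
            body = body.substring(0, body.length() - 1); // drop the terminating ';'
        }
        for (String pair : body.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip fragments without '=', such as "..."
            }
            fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return fields;
    }

    // Parses every non-blank line into its own map.
    static List<Map<String, String>> parseAll(List<String> lines) {
        List<Map<String, String>> records = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                records.add(parseLine(line));
            }
        }
        return records;
    }
}
```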

Better way to skip comments in a file parsing

At the moment I use
<file>.eachLine { line ->
    if (line ==~ /^#.*$/) {
        return // skip comments
    }
}
Is there an easier way?

Are you trying to separate the test for comments from the rest of the code in your closure?
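If so, for some File f you could filter the lines first. Be aware that filterLine returns a Writable holding the filtered text, not a list of lines, so calling each on its result does not iterate line by line. Something like this does:
f.readLines().findAll { it ==~ /^[^#].*/ }.each { line -> /* process non-comments */ }
readLines() loads the whole file into memory, and the pattern ^[^#].* also drops empty lines. For very large files, the eachLine loop with an early return is still a reasonable choice.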
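Here is a Java sketch of the same idea that streams the lines and keeps only the non-comment ones. It assumes, as the ==~ /^#.*$/ test does, that a comment starts in the first column. CommentFilter and nonCommentLines are hypothetical names. Unlike ^[^#].*, this version keeps blank lines.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: returns every line that does not start with '#'.
final class CommentFilter {

    static List<String> nonCommentLines(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        // lines() is lazy; I/O errors surface as UncheckedIOException.
        // For large files, process the stream directly instead of collecting it.
        return reader.lines()
                .filter(line -> !line.startsWith("#"))
                .collect(Collectors.toList());
    }
}
```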

Index of string containing a part of string (one word)

I'm trying to read a large file. Instead of looping over an array I decided to use a list, but I'm having difficulty finding a line that contains a given word. Here is my code:
public List<string> AWfile = new List<string>();
private void button1_Click(object sender, EventArgs e)
{
    if (File.Exists(@"C:\DataFolder\file.txt"))
    {
        using (StreamReader r = new StreamReader(@"C:\DataFolder\file.txt"))
        {
            string line;
            while ((line = r.ReadLine()) != null)
            {
                AWfile.Add(line);
                label1.Text = "ListWritten!";
                label1.BackColor = Color.Green;
            }
        }
    }
}
private void button2_Click(object sender, EventArgs e)
{
    int linen = AWfile.IndexOf("A102");
    label2.Text = Convert.ToString(linen);
}
So my question is: is there a way to search the list for part of a string instead of the whole string? IndexOf only finds an element equal to the entire string I pass, so it returns -1 unless I search for a complete line.

You can try something like:
var result = list.Where(x => x.Contains("hello")).ToList();
This results in a list of all the elements of list that contain "hello".
And if you want to do something only with these elements:
list.Where(x => x.Contains("hello")).ToList().ForEach(x => DoSomething(x));
I hope this helps.

If I understand your question correctly, you are reading a file and adding each line to a list. Then you want to check whether any of those lines contains part of a word.
One way of doing this is a foreach loop over the lines in your list, checking whether each line contains the partial word.
Something like:
foreach (var line in AWfile)
{
    if (line.Contains("PartialWordWeWant"))
    {
        // Do something with the line that contains the word we are looking for
    }
}
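In C#, List<T>.FindIndex with a predicate returns the index of the first matching line, or -1. For example: AWfile.FindIndex(x => x.Contains("A102")). The same idea is sketched here in Java. LineSearch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: like List.indexOf, but matches lines that merely contain the fragment.
final class LineSearch {

    // Index of the first line containing the fragment, or -1 if none does.
    static int indexOfContaining(List<String> lines, String fragment) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(fragment)) {
                return i;
            }
        }
        return -1;
    }

    // Indices of every line that contains the fragment.
    static List<Integer> allIndicesContaining(List<String> lines, String fragment) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(fragment)) {
                indices.add(i);
            }
        }
        return indices;
    }
}
```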

Regular Expression for validating Windows-based file paths including UNC paths

I want to validate a file name along with its full path. I tried the regular expressions below, but none of them worked correctly.
^(?:[\w]\:|\\)(\\[a-z_\-\s0-9\.]+)+\.(txt|gif|pdf|doc|docx|xls|xlsx)$
and
^(([a-zA-Z]\:)|(\\))(\\{1}|((\\{1})[^\\]([^/:*?<>""|]*))+)$
etc...
My requirements are:
Let's say the file name is "c:\Demo.txt". The check should cover every case: no double backslashes (c:\\Demo\\demo.text) and no extra colon (c::\Demo\demo.text). It should accept UNC paths such as \\staging\servers and handle other validation as well. Please help, I am really stuck here.

Why not let the System.IO classes validate the path for you? File itself is a static class, so to create an object for a path use FileInfo:
FileInfo f = null;
string sPathToTest = @"C:\Test.txt";
try {
    f = new FileInfo(sPathToTest);
} catch (Exception e) {
    Console.WriteLine(string.Format("The file \"{0}\" is not a valid path, Error : {1}.", sPathToTest, e.Message));
}
MSDN: http://msdn.microsoft.com/en-gb/library/system.io.file%28v=vs.80%29.aspx
Maybe you're just looking for File.Exists (http://msdn.microsoft.com/en-gb/library/system.io.file.exists%28v=vs.80%29.aspx).
Also take a look at the Path class (http://msdn.microsoft.com/en-us/library/system.io.path.aspx).
Path.GetFullPath could be one way to get what you want (http://msdn.microsoft.com/en-us/library/system.io.path.getfullpath.aspx):
string sPathToTest = @"C:\Test.txt";
string sAbsolutePath = "";
try {
    sAbsolutePath = Path.GetFullPath(sPathToTest);
    if (!string.IsNullOrEmpty(sAbsolutePath)) {
        Console.WriteLine("Path valid");
    } else {
        Console.WriteLine("Bad path");
    }
} catch (Exception e) {
    Console.WriteLine(string.Format("The file \"{0}\" is not a valid path, Error : {1}.", sPathToTest, e.Message));
}
Be aware that .NET Core 2.1 and later perform far fewer path checks than .NET Framework. GetFullPath no longer throws for invalid path characters, for example, so it may accept paths that .NET Framework would reject.

If you are interested only in the file name part (and not the whole path, because you get the file via upload), you could try something like this:
string uploadedName = @"XX:\dem<<-***\demo.txt";
int pos = uploadedName.LastIndexOf("\\");
if (pos > -1)
    uploadedName = uploadedName.Substring(pos + 1);
var c = Path.GetInvalidFileNameChars();
if (uploadedName.IndexOfAny(c) != -1)
    Console.WriteLine("Invalid name");
else
    Console.WriteLine("Acceptable name");
This avoids using exceptions to drive the logic of your code.
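Here is a similar check sketched in Java, which has no direct equivalent of Path.GetInvalidFileNameChars. It assumes the Windows rules: the characters < > : " / \ | ? * and control characters 0-31 are not allowed in a file name. WindowsFileName and its methods are hypothetical names. It does not check reserved device names such as CON or NUL, or names ending in a space or period.

```java
// Hypothetical helper: validates only the last segment of an uploaded path.
final class WindowsFileName {

    // Characters Windows does not allow in a file name (control characters are checked separately).
    private static final String RESERVED = "<>:\"/\\|?*";

    // Returns the part after the last '\' or '/', similar to the LastIndexOf step above.
    static String lastSegment(String path) {
        int pos = Math.max(path.lastIndexOf('\\'), path.lastIndexOf('/'));
        return pos > -1 ? path.substring(pos + 1) : path;
    }

    static boolean isAcceptableName(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (char c : name.toCharArray()) {
            if (c < 32 || RESERVED.indexOf(c) >= 0) {
                return false; // control character or reserved character
            }
        }
        return true;
    }
}
```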

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js
