How to extract values from a text file using a script? - regex

If I have a file with the contents as below:
field1=value1,field2=value2,field3=value3,field4=value4,field5=value5,..;(new line)
field1=value1.1,field2=value1.2,field3=value1.3,field4=value1.4,field5=value1.5,...; (new line)
.....
....
...
Each line ends with semi-colon and a new line character.
How can I extract the values and store (or display) them in the format below?

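string[] lines = System.IO.File.ReadAllLines(@"C:\Users\Public\TestFolder\WriteLines2.txt");
foreach (string line in lines)
{
// Drop the trailing semicolon, then split the line into field=value pairs
string[] pairs = line.TrimEnd(';', ' ').Split(',');
foreach (string pair in pairs)
{
// Split on the first '=' only, so a value may itself contain '='
string[] data = pair.Split(new[] { '=' }, 2);
if (data.Length < 2) continue; // skip fragments that have no '='
string value = data[1];
// Store or display the value here
}
}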
The code above should get you started.
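For readers working in Java, here is a minimal sketch of the same split-on-comma, split-on-equals idea. It assumes every line looks like the sample in the question (comma-separated field=value pairs with a trailing semicolon). The names FieldLineParser and parseLine are made up for illustration and do not come from the original post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FieldLineParser {
    // Parses one line such as "field1=value1,field2=value2;" into an ordered map.
    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        String body = line.trim();
        if (body.endsWith(";")) {
            body = body.substring(0, body.length() - 1); // drop the trailing semicolon
        }
        for (String pair : body.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip fragments that have no '='
            }
            fields.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample line in the format described in the question
        Map<String, String> fields = parseLine("field1=value1,field2=value2,field3=value3;");
        fields.forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

A map keeps each value next to its field name, which is usually easier to work with afterwards than a bare array of values.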

Related

LINQ query to find elements from a list in a csv file

I am trying to find out whether any element of a first list appears in each line of a csv file.
First list :
XX01235756777
YY01215970799
Second list (that would be the csv file) :
Column_1|Column_2|Column_3|Column_4|Column_5|Column_6|Column_7
VX|2022-06-09 11:50:55|Y|Y|N|TT56431254135|Microsoft
VX|2022-06-09 11:50:55|Y|Y|N|XX01235756777|Meta
VX|2022-06-09 11:50:55|Y|Y|N|YY18694654355|Nokia
VX|2022-06-09 11:50:55|Y|Y|N|OO01215970799|BlackStone
VX|2022-06-09 11:50:55|Y|Y|N|YY01215970799|Alphabet
My code attempt :
List<string> filteredList = new List<string>();
using (StreamReader reader = new StreamReader(csvFilePath))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (line.Any(x => isinList.Contains(x.ToString())))
{
filteredList.Add(line);
}
}
}
return filteredList;
filteredList should be :
VX|2022-06-09 11:50:55|Y|Y|N|XX01235756777|Meta
VX|2022-06-09 11:50:55|Y|Y|N|YY01215970799|Alphabet
I succeeded at finding a single element in each line of the csv file.
But I can't write the right LINQ to check each line against the whole list.
Any ideas or anywhere you can point me to would be a great help.
What you have ends up splitting the line from the csv file into characters and each character is checked against the isinList. You should instead check if any string in the isinList exists in each line from the csv file.
List<string> filteredList = new List<string>();
using (StreamReader reader = new StreamReader(csvFilePath))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (isinList.Any(x => line.Contains(x)))
{
filteredList.Add(line);
}
}
}
Using LINQ, and the File.ReadLines() to read the lines in the file, you can filter as follows:
var filteredLines = File.ReadLines(csvFilePath)
.Where(line => isinList.Any(isin => line.Contains(isin)))
.ToList();
OTOH, if isinList is very large, or the CSV file is very long, and you know Column_6 is the only possible match, you might gain performance by using a HashSet<string> and testing only Column_6:
var isinSet = isinList.ToHashSet();
var filteredLines = File.ReadLines(csvFilePath)
.Where(line => isinSet.Contains(line.Split('|')[5]))
.ToList();
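The same column-based lookup can be sketched in Java. This is only an illustration under the assumptions stated above: the file is '|'-separated and the identifier sits in the sixth column (index 5). ColumnFilter and filterByColumn are hypothetical names, and the sample rows are copied from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ColumnFilter {
    // Keeps the lines whose column at columnIndex (0-based, '|'-separated) is in wanted.
    static List<String> filterByColumn(List<String> lines, Set<String> wanted, int columnIndex) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] columns = line.split("\\|", -1);
            // Guard against short lines instead of throwing on a missing column
            if (columns.length > columnIndex && wanted.contains(columns[columnIndex])) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> isins = new HashSet<>(Arrays.asList("XX01235756777", "YY01215970799"));
        List<String> lines = Arrays.asList(
            "VX|2022-06-09 11:50:55|Y|Y|N|TT56431254135|Microsoft",
            "VX|2022-06-09 11:50:55|Y|Y|N|XX01235756777|Meta",
            "VX|2022-06-09 11:50:55|Y|Y|N|OO01215970799|BlackStone",
            "VX|2022-06-09 11:50:55|Y|Y|N|YY01215970799|Alphabet");
        filterByColumn(lines, isins, 5).forEach(System.out::println);
    }
}
```

The length check means a malformed or short line is skipped rather than causing an index error.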

Regex to get the path of a file

I have code that displays the name of a file in a JTable. Here is the code:
StringBuilder nameOfComparedFile = new StringBuilder(); //
if (idLexerSelection != getIDLexer()) {
nameOfComparedFile.append(file.getCanonicalPath()); //
System.out.println(file.getCanonicalPath() + " )");
}
The JTable then shows the full path, for example: D:/Data/File.java
I don't want to change the getCanonicalPath call, because the table I created will be used in a later step. My question is: how can I get just the file name using a regex?
To get just the name:
file.getName()
If you absolutely must use regex:
String filename = file.getCanonicalPath().replaceAll(".*[\\\\/](.*)", "$1");
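A small sketch that applies the regex above to a path string, with java.nio.file.Paths as a non-regex alternative for comparison. FileNameDemo and its two method names are made up for this example. Note that the regex returns its input unchanged when the path has no separator, and Paths only treats backslashes as separators on Windows.

```java
import java.nio.file.Paths;

class FileNameDemo {
    // Strips everything up to and including the last '/' or '\' in the path.
    static String nameByRegex(String path) {
        return path.replaceAll(".*[\\\\/](.*)", "$1");
    }

    // Lets the platform's path parser find the last path element.
    static String nameByPath(String path) {
        return Paths.get(path).getFileName().toString();
    }

    public static void main(String[] args) {
        String path = "D:/Data/File.java"; // sample path from the question
        System.out.println(nameByRegex(path));
        System.out.println(nameByPath(path));
    }
}
```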

How to parse multiline records in groovy?

I have a log file that contains a line of five asterisks (*****) in two places. The file can be big.
Log record
*****
Log record
Log record
*****
Log record
I would like to get everything between the two lines of five asterisks. Sure, I can read the file line by line, but perhaps there is a better solution, such as parsing with regular expressions in Groovy?
Thank you.
You could also write a custom Reader like:
class DelimitedReader extends BufferedReader {
String delimiterLine
DelimitedReader( String delimiterLine, Reader reader ) {
super( reader )
this.delimiterLine = delimiterLine
scanUntilDelimiter()
}
private scanUntilDelimiter() {
String line = super.readLine()
while( line != null && line != delimiterLine ) {
line = super.readLine()
}
}
String readLine() {
String line = super.readLine()
if( line == delimiterLine ) {
line = null
}
line
}
}
Then you can do something like this to iterate over the lines between the delimiters:
new File( '/tmp/test.txt' ).withReader { r ->
new DelimitedReader( '*****', r ).eachLine { line ->
println line
}
}
This saves you from loading the whole file into a single (potentially huge) string.
Try this regex:
(?s)(?<=[*]{5}).+(?=[*]{5})
Demo
http://groovyconsole.appspot.com/script/2405001
This regex matches everything between the first ***** and the next one (the lazy quantifier *? stops at the nearest closing delimiter):
(?<=\*{5})[\s\S]*?(?=\*{5})
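If the file is large, a line-by-line pass avoids building one big string for the regex to scan. Below is a minimal Java sketch of that approach (Java classes can also be called from Groovy). It assumes the delimiter sits on a line by itself, as in the sample log. DelimitedSection and linesBetween are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class DelimitedSection {
    // Returns the lines between the first delimiter line and the next one.
    static List<String> linesBetween(BufferedReader reader, String delimiter) {
        List<String> section = new ArrayList<>();
        boolean inside = false;
        // lines() reads lazily, so the whole file is never loaded as one string
        Iterator<String> it = reader.lines().iterator();
        while (it.hasNext()) {
            String line = it.next();
            if (line.equals(delimiter)) {
                if (inside) {
                    break; // closing delimiter: stop reading
                }
                inside = true; // opening delimiter: start collecting
            } else if (inside) {
                section.add(line);
            }
        }
        return section;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the log file, shaped like the sample in the question
        String log = "Log record\n*****\nLog record A\nLog record B\n*****\nLog record\n";
        BufferedReader reader = new BufferedReader(new StringReader(log));
        linesBetween(reader, "*****").forEach(System.out::println);
    }
}
```

For a real file, the reader could come from java.nio.file.Files.newBufferedReader. The loop stops at the closing delimiter, so the rest of the file is never read.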

Index of string containing a part of string (one word)

I'm trying to read a large file, so instead of looping over an array I decided to use a list. But I'm having trouble finding the line that contains a given word. Here is my code:
public List<string> AWfile = new List<string>();
private void button1_Click(object sender, EventArgs e)
{
if (File.Exists(@"C:\DataFolder\file.txt"))
{
using (StreamReader r = new StreamReader(@"C:\DataFolder\file.txt"))
{
string line;
while ((line = r.ReadLine()) != null)
{
AWfile.Add(line); label1.Text = "ListWritten!"; label1.BackColor = Color.Green;
}
}
}
}
private void button2_Click(object sender, EventArgs e)
{
int linen = AWfile.IndexOf("A102");
label2.Text = Convert.ToString(linen);
}
So my question is: is there any way to search the list for part of a string instead of the whole string? At the moment .IndexOf only returns a match when I pass the entire line.
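You can try something like:
var result = list.Where(x => x.Contains("hello")).ToList();
This gives you a list of all the elements that contain "hello". (Note that it has to be Where, not Select: Select would return a list of true/false values instead of the matching lines.)
And if you want to do something only with these elements:
list.Where(x => x.Contains("hello")).ToList().ForEach(x => DoSomething(x));
If you need the index of the first matching line rather than the lines themselves, List<T>.FindIndex accepts the same kind of predicate.
I hope this helps.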
If I understand your question correctly... you are reading in a file and adding each line to a list. Then you want to check if any of those lines contain part of a word.
One way of doing this would be to do a foreach loop over each of the lines in your list and checking if the line contains the partial word.
Something like:
foreach(var line in AWfile)
{
if(line.Contains("PartialWordWeWant"))
{
// Do something with the line that contains the word we are looking for
}
}
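Since the question asks for the position of the line rather than the line itself, the loop can return an index instead. Here is a minimal Java sketch of that variation. LineSearch and indexOfLineContaining are illustrative names, and the sample lines are invented.

```java
import java.util.Arrays;
import java.util.List;

class LineSearch {
    // Returns the 0-based index of the first line containing fragment, or -1 if none does.
    static int indexOfLineContaining(List<String> lines, String fragment) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(fragment)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical file contents, one list element per line
        List<String> lines = Arrays.asList("header", "item A101 ok", "item A102 failed");
        System.out.println(indexOfLineContaining(lines, "A102"));
    }
}
```

Returning -1 when nothing matches follows the same convention as IndexOf.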

How can I create an .arff file from .txt?

Is there any simple way to do that? I don't know Java and I'm new to Python, so I would need some other way. Thanks in advance!
Do you perhaps mean a csv file that ends in .txt? If the data inside the file looks like this:
1,434,2236,5,569,some,value,other,value
4,347,2351,1,232,different,value,than,those
Then it has comma separated values (csv) and Weka has classes and functions which convert a csv file into an arff: http://weka.wikispaces.com/Converting+CSV+to+ARFF You can use these from the command line, like this:
java weka.core.converters.CSVLoader filename.csv > filename.arff
Otherwise, @D3mon-1stVFW's comment links to great documentation from Weka about turning text files (things like blog posts, books or essays) into the arff format: http://weka.wikispaces.com/ARFF+files+from+Text+Collections This can also be called from the command line, like this:
java weka.core.converters.TextDirectoryLoader /directory/with/your/text/files > output.arff
The TextDirectoryLoader command is missing the -dir argument specifier:
java weka.core.converters.TextDirectoryLoader -dir /directory/with/your/text/files > output.arff
This solution assumes you have your data in .csv format - see kaz's solution.
One simple way to do this in version 3.6.11 (I'm on a Mac) is to open the Explorer and, in the Preprocess tab, select "Open file...", just as you would to open a .arff file. Where the dialog box asks for the file format at the bottom, change it to .csv. You can now load CSV files straight into Weka. If the first line of your CSV file is a header line, its names will be used as the attribute names.
On the right-hand side of the Preprocess tab is a "Save..." button. Click it to save your data as a .arff file.
This is a bit long-winded to explain, but takes only a few moments to perform and is very intuitive.
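package WekaDemo;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
// Converts a tab-separated text file into ARFF. Input layout expected by the code below:
// row 1 = column names, row 2 = column types ("con" = numeric, anything else = nominal),
// remaining rows = data, with the class label in the last column.
public class Txt2Arff {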
static ArrayList inList=new ArrayList();
static String colNames[];
static String colTypes[];
static String indata[][];
static ArrayList clsList=new ArrayList();
static ArrayList disCls=new ArrayList();
static String res="";
public String genTrain() throws IOException
{File fe=new File("input2.txt");
FileInputStream fis=new FileInputStream(fe);
byte bt[]=new byte[fis.available()];
fis.read(bt);
fis.close();
String st=new String(bt);
String s1[]=st.trim().split("\n");
String col[]=s1[0].trim().split("\t");
colNames=col;
colTypes=s1[1].trim().split("\t");
for(int i=2;i<s1.length;i++)
{
inList.add(s1[i]);
}
ArrayList at1=new ArrayList();
for(int i=0;i<inList.size();i++)
{
String g1=inList.get(i).toString();
if(!g1.contains("?"))
{
at1.add(g1);
res=res+g1+"\n";
}
}
indata=new String[at1.size()][colNames.length-1]; // remove cls
for(int i=0;i<at1.size();i++)
{
String s2[]=at1.get(i).toString().trim().split("\t");
for(int j=0;j<s2.length-1;j++)
{
indata[i][j]=s2[j].trim();
}
if(!disCls.contains(s2[s2.length-1].trim()))
disCls.add(s2[s2.length-1].trim());
clsList.add(s2[s2.length-1]);
}
String ar="@relation tra\n";
try
{
for(int i=0;i<colNames.length-1;i++) // every column except the last one (the class column)
{
// "con" marks a continuous (numeric) column; anything else is treated as nominal
if(colTypes[i].equals("con"))
ar=ar+"@attribute "+colNames[i].trim().replace(" ","_")+" real\n";
else
{
ArrayList vals=new ArrayList(); // distinct values seen in this nominal column
for(int j=0;j<indata.length;j++)
{
if(!vals.contains(indata[j][i].trim()))
vals.add(indata[j][i].trim());
}
String sg1="{";
for(int j=0;j<vals.size();j++)
{
sg1=sg1+vals.get(j).toString().trim()+",";
}
sg1=sg1.substring(0,sg1.lastIndexOf(","));
sg1=sg1+"}";
ar=ar+"@attribute "+colNames[i].trim().replace(" ", "_")+" "+sg1+"\n";
}
}
//end of attribute
// now adding a class Attribute
ArrayList dis=new ArrayList();
String c1="";
for(int i=0;i<clsList.size();i++)
{
String g=clsList.get(i).toString().trim();
if(!dis.contains(g))
{
dis.add(g);
c1=c1+g+",";
}
}
c1=c1.substring(0, c1.lastIndexOf(","));
ar=ar+"@attribute class {"+c1+"}\n"; //attribute name
//adding class attribute is done
//now data
ar=ar+"@data\n";
for(int i=0;i<indata.length;i++)
{
String g1="";
for(int j=0;j<indata[0].length;j++)
{
g1=g1+indata[i][j]+",";
}
g1=g1+clsList.get(i);
ar=ar+g1+"\n";
}
}
catch(Exception e)
{
e.printStackTrace();
}
return ar;
}
public static void main(String[] args) throws IOException {
// TODO Auto-generated method stub
Txt2Arff T2A=new Txt2Arff();
String ar1=T2A.genTrain();
File fe1=new File("tr.arff");
FileOutputStream fos1=new FileOutputStream(fe1);
fos1.write(ar1.getBytes());
fos1.close();
}}