I am not good with regex and am struggling to write one that captures the lines following a header that ends in a colon. Take this example:
Regarding:
267 Covert St Rm 3.5
Contact Information:
(869) 869-5365 <8698695365>
abc#gmail.com
Comments:
I'd like to schedule a viewing for Covert St #3.5, Brooklyn, NY 11207, US.
Please contact me with more information! I am available at
abc#gmail.com
From the above text, I need to get the text under Regarding, Contact Information, and Comments.
I wrote this regex, but it returns the whole string:
regExp = new RegExp("(?<=Regarding:)(\n).*");
I am writing a script in Google Apps Script that parses data from an email.
Thanks.
I'm not sure a RegExp is needed in this case. The text can simply be split:
var s = `Regarding:
267 Covert St Rm 3.5
Contact Information:
(869) 869-5365 <8698695365>
abc#gmail.com
Comments:
I'd like to schedule a viewing for Covert St #3.5, Brooklyn, NY 11207, US.
Please contact me with more information! I am available at
abc#gmail.com`
var regarding = s.split("Regarding:")[1].split("Contact Information:")[0];
console.log(regarding);
var contacts = s.split("Contact Information:")[1].split("Comments:")[0];
console.log(contacts);
var comments = s.split("Comments:")[1];
console.log(comments);
If you need only the first line after each header, here you go:
var s = `Regarding:
267 Covert St Rm 3.5
Contact Information:
(869) 869-5365 <8698695365>
abc#gmail.com
Comments:
I'd like to schedule a viewing for Covert St #3.5, Brooklyn, NY 11207, US.
Please contact me with more information! I am available at
abc#gmail.com`
var regarding = s.split("Regarding:\n")[1].split('\n')[0];
console.log(regarding);
var contacts = s.split("Contact Information:\n")[1].split('\n')[0];
console.log(contacts);
var comments = s.split("Comments:\n")[1].split('\n')[0];
console.log(comments);
But your regExp gives about the same result:
var s = `Regarding:
267 Covert St Rm 3.5
Contact Information:
(869) 869-5365 <8698695365>
abc#gmail.com
Comments:
I'd like to schedule a viewing for Covert St #3.5, Brooklyn, NY 11207, US.
Please contact me with more information! I am available at
abc#gmail.com`
var regExp = new RegExp("(?<=Regarding:)(\n).*");
var r = s.match(regExp)[0];
console.log(r);
var regExp = new RegExp("(?<=Contact Information:)(\n).*");
var c = s.match(regExp)[0];
console.log(c);
var regExp = new RegExp("(?<=Comments:)(\n).*");
var cm = s.match(regExp)[0];
console.log(cm);
So what exactly is the problem?
Actually, regex is a good fit here. This snippet does the job, and the result array contains all three paragraphs, provided the sections are separated by a blank line (the \n\n in the lookahead relies on that):
const str =
`Regarding:
267 Covert St Rm 3.5
Contact Information:
(869) 869-5365 <8698695365>
abc#gmail.com
Comments:
I'd like to schedule a viewing for Covert St #3.5, Brooklyn, NY 11207, US.
Please contact me with more information! I am available at
abc#gmail.com`;
const re = /(?<=:(\n|^)).*?(?=\n\n|$)/gis;
const result = str.match(re);
console.log(JSON.stringify(result, null, 2));
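The same extraction can also be sketched without relying on blank lines between sections. The Python version below is only an illustration, assuming the three header names are fixed and each sits on its own line; the function name split_sections and the HEADERS list are hypothetical.

```python
import re

TEXT = """Regarding:
267 Covert St Rm 3.5
Contact Information:
(869) 869-5365 <8698695365>
abc#gmail.com
Comments:
I'd like to schedule a viewing for Covert St #3.5, Brooklyn, NY 11207, US.
Please contact me with more information! I am available at
abc#gmail.com"""

# Assumed: these are the only section headers that occur in the email.
HEADERS = ["Regarding", "Contact Information", "Comments"]


def split_sections(text, headers=HEADERS):
    """Return {header: body} for each known header line found in text."""
    # Match a whole line consisting of one known header followed by a colon.
    pattern = re.compile(
        r"^(%s):[ \t]*$" % "|".join(re.escape(h) for h in headers),
        re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    sections = {}
    for i, m in enumerate(matches):
        # The body runs from the end of this header line to the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[m.group(1)] = text[m.end():end].strip()
    return sections


sections = split_sections(TEXT)
for name, body in sections.items():
    print(name, "->", repr(body))
```

Anchoring on the header names rather than on any colon keeps a stray colon inside a comment from starting a new section.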
Hi, I have been trying hard to work this out and have referred to several posts, but I am still not getting the correct answer.
I have a number of providers of different provider types. I calculate an average cost change for each provider (from more granular payment data). I then want to find the standard deviation of these provider-level changes within each provider type.
This is how far I've got with the DAX. It gives the same standard deviation across all provider types rather than the required output.
group_test =
var tab1 = SUMMARIZECOLUMNS(ProvData[Provider Type],ProvData[Provider Code], "prov_avg",AVERAGEX(core_data, sum(PayData[Payment1])-sum(PayData[Payment2]))/SUM(PayData[Payment1]))
var sd_type = SELECTCOLUMNS(SUMMARIZE(tab1,[Provider Type],[Provider Code], "test", STDEVX.S(tab1,[prov_avg])), "sd_type", [test])
var tab2 = ADDCOLUMNS(tab1, "sd_type", sd_type)
return tab2
I want my final table to look like this:
Provider Code | Provider type | Prov_avg | sd_type
1             | a             | x        | sd for a
2             | a             | y        | sd for a
3             | b             | z        | sd for b
Thanks in advance for any help
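Add a calculated column to your table (this assumes prov_avg already exists as a column in ProvData, with one row per provider):
stdColumn =
var prov_type = ProvData[Provider Type]
var stdValue = CALCULATE (STDEV.S(ProvData[prov_avg]), FILTER(ProvData, ProvData[Provider Type] = prov_type))
return stdValue
This calculates the standard deviation over all rows that share the current row's Provider Type. FILTER takes the table as its first argument, and Provider Code is left out of the condition because filtering on it as well would leave a single row per group.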
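To make the intended result concrete, here is the per-type calculation sketched in Python rather than DAX. The provider rows and their prov_avg values are made up for illustration; statistics.stdev computes the sample standard deviation, the same statistic as STDEV.S.

```python
from collections import defaultdict
from statistics import stdev

# Hypothetical provider-level rows: (provider_code, provider_type, prov_avg).
rows = [
    (1, "a", 0.10),
    (2, "a", 0.30),
    (3, "b", 0.20),
    (4, "b", 0.60),
]

# Collect the provider-level averages per provider type.
by_type = defaultdict(list)
for _, ptype, avg in rows:
    by_type[ptype].append(avg)

# Sample standard deviation per type (needs at least two providers per type).
sd_by_type = {ptype: stdev(vals) for ptype, vals in by_type.items()}

# Attach the type-level value to every provider row.
table = [(code, ptype, avg, sd_by_type[ptype]) for code, ptype, avg in rows]
for row in table:
    print(row)
```

Every provider of the same type ends up with the same sd_type value, which is the shape of the table requested in the question.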
I have a big txt file that contains chat transcripts. My goal is to extract the different components and store them in a pandas DataFrame. A sample of the chat is below:
*****************************************************
Session:123456
Chat Date: 2017-05-01T08:01:45+00:00
Chat exec name: Sam
Member name: Sara
2017-05-01T08:01:45+00:00 Sara: I need help on element A
2017-05-01T08:01:47+00:00 Sam: Sure I can help you on this one
2017-05-01T08:01:48+00:00 Sara: Is there a better product
2017-05-01T08:01:48+10:00 Sam: Sure we have a lot of new products
2017-05-01T08:01:49+18:00 Sara: Can you let me know
2017-05-01T08:01:51+20:00 Sam: Here is the solution
2017-05-01T08:01:52+00:00 Sara: Thanks for this
2017-05-01T08:01:52+11:00 Sam: Have a Nive day Bye!!
*****************************************************
Session:234567
Chat Date: 2017-05-02T18:00:30+00:00
Chat exec name: PAUL
Member name:CHRIS
2017-05-02T18:00:30+00:00 CHRIS: I need help on element A
2017-05-02T18:02:30+00:00 PAUL: Sure I can help you on this one
2017-05-02T18:02:39+00:00 CHRIS: Is there a better product
2017-05-02T18:04:01+00:00 PAUL: Sure we have a lot of new products
2017-05-02T18:04:30+00:00 CHRIS: Can you let me know
2017-05-02T18:08:11+00:00 PAUL: Here is the solution
2017-05-02T18:08:59+00:00 CHRIS: Thanks for this
2017-05-02T18:09:11+00:00 PAUL: Have a Nice day Bye!!
*****************************************************
I would like to create a table with the columns:
Session, ChatDate, ChatExecName, Membername, Time, Person, Sentence
The first 4 columns should be repeated for every line of a chat block. The delimiters are fixed and never change.
I have tried the following, but it returns all blocks together. Can somebody please help?
import re
def GetTheSentences(infile):
    Delim1 = '*****************************************************'
    Delim2 = '*****************************************************'
    with open(infile) as fp:
        for result in re.findall('Delim1(.*?)Delim2', fp.read(), re.S):
            print (result)
and
import re
def GetTheSentences2(file):
    start_rx =re.compile('*****************************************************')
    end_rx = re.compile('*****************************************************')
    start = False
    output = []
    with open(file, encoding="latin-1") as datafile:
        for line in datafile.readlines():
            if re.match(start_rx, line):
                start = True
            elif re.match(end_rx, line):
                start = False
            if start:
                output.append(line)
    print (output)
I sure hope this is helpful:
data = '''*****************************************************
Session:123456
Chat Date: 2017-05-01T08:01:45+00:00
Chat exec name: Sam
Member name: Sara
2017-05-01T08:01:45+00:00 Sara: I need help on element A
2017-05-01T08:01:47+00:00 Sam: Sure I can help you on this one
2017-05-01T08:01:48+00:00 Sara: Is there a better product
2017-05-01T08:01:48+10:00 Sam: Sure we have a lot of new products
2017-05-01T08:01:49+18:00 Sara: Can you let me know
2017-05-01T08:01:51+20:00 Sam: Here is the solution
2017-05-01T08:01:52+00:00 Sara: Thanks for this
2017-05-01T08:01:52+11:00 Sam: Have a Nive day Bye!!
*****************************************************
Session:234567
Chat Date: 2017-05-02T18:00:30+00:00
Chat exec name: PAUL
Member name:CHRIS
2017-05-02T18:00:30+00:00 CHRIS: I need help on element A
2017-05-02T18:02:30+00:00 PAUL: Sure I can help you on this one
2017-05-02T18:02:39+00:00 CHRIS: Is there a better product
2017-05-02T18:04:01+00:00 PAUL: Sure we have a lot of new products
2017-05-02T18:04:30+00:00 CHRIS: Can you let me know
2017-05-02T18:08:11+00:00 PAUL: Here is the solution
2017-05-02T18:08:59+00:00 CHRIS: Thanks for this
2017-05-02T18:09:11+00:00 PAUL: Have a Nice day Bye!!
*****************************************************'''
data = data.split('*****************************************************')
data = [item.split('\n') for item in data if item]
result = []
for group in data:
    # Drop the empty strings left over from splitting on newlines.
    group = [item for item in group if item]
    times = []
    people = []
    lines = []
    for item in group:
        if item.startswith('Session'):
            session = item.split(':')[-1].strip()
        elif item.startswith('Chat Date'):
            # Split on the first colon only; the timestamp contains colons too.
            chatDate = item.split(':', 1)[-1].strip()
        elif item.startswith('Chat exec'):
            execName = item.split(':')[-1].strip()
        elif item.startswith('Member'):
            memberName = item.split(':')[-1].strip()
        else:
            # The timestamp occupies the first 25 characters of a chat line.
            times.append(item[:25])
            # Split once so a colon inside the sentence is preserved.
            person, sentence = item[26:].split(':', 1)
            people.append(person)
            lines.append(sentence.strip())
    # Repeat the four header values for every line of this chat block.
    for i in range(len(times)):
        result.append([session, chatDate, execName, memberName, times[i], people[i], lines[i]])
import pandas as pd
df = pd.DataFrame(result, columns=['Session', 'ChatDate', 'ChatExecName', 'Membername', 'Time', 'Person', 'Sentence'])
print(df)
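As for why the regex attempts in the question do not work: the pattern 'Delim1(.*?)Delim2' searches for the literal text Delim1 rather than the variable's value, and a bare run of asterisks is not a valid regular expression. A minimal sketch of the regex route follows, assuming every delimiter line consists only of five or more asterisks; split_blocks and LINE_RX are hypothetical names, and the sample is shortened.

```python
import re

DELIM = "*" * 20  # stand-in for the delimiter line; any run of 5+ asterisks works

sample = "\n".join([
    DELIM,
    "Session:123456",
    "Chat exec name: Sam",
    "2017-05-01T08:01:45+00:00 Sara: I need help on element A",
    "2017-05-01T08:01:47+00:00 Sam: Sure I can help you on this one",
    DELIM,
    "Session:234567",
    "Chat exec name: PAUL",
    "2017-05-02T18:00:30+00:00 CHRIS: I need help on element A",
    DELIM,
])


def split_blocks(text):
    """Split text on lines made only of asterisks and drop empty pieces."""
    # \* is a literal asterisk; an unescaped '*' would be read as a quantifier.
    pieces = re.split(r"^\*{5,}[ \t]*$", text, flags=re.MULTILINE)
    return [p.strip() for p in pieces if p.strip()]


# One chat line: ISO timestamp, speaker, then the sentence (may contain ':').
LINE_RX = re.compile(r"^(\d{4}-\d{2}-\d{2}T\S+)\s+([^:]+):\s*(.*)$")

blocks = split_blocks(sample)
rows = []
for block in blocks:
    for line in block.splitlines():
        m = LINE_RX.match(line)
        if m:  # header lines such as "Session:123456" do not match
            rows.append(m.groups())

for row in rows:
    print(row)
```

Each tuple in rows holds the time, person, and sentence for one chat line; the header fields can be added per block in the same way as in the answer above.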
public static void main(String[] args) throws Exception {
    Pattern cp1 = Pattern.compile("(\\(?\\+?\\d{1,3}\\)?[\\s-]+)?\\(?\\d{1,3}\\)?[\\s-]+\\d{3}[\\s-]?\\d{2}[\\s-]?\\d{2,}");
    Set<String> contacts = new HashSet<String>();
    Document doc = Jsoup.connect("http://www.ejrsearch.com/contact-us.html").ignoreHttpErrors(true).userAgent("Mozilla").timeout(0).get();
    Elements doc1 = doc.select("body");
    Matcher matcherc = cp1.matcher(doc1.text());
    while (matcherc.find()) {
        contacts.add(matcherc.group());
    }
    System.out.println("Contacts:" + contacts);
}
}
doc1.text() returns:
Menu Contact Us We Want to Hear from You! If you are looking for your
next position or for that “high performance” player that can help
deliver for you please contact us. EJR Search Partners 1440 Broadway
23rd floor NY NY 10018 212-410-4141 info#ejrsearch.com Copyright 2011,
EJR Search. All rights reserved.
Actually, the address is EJR Search Partners 1440 Broadway 23rd floor NY NY 10018212-410-4141
Output:
Contacts:[018 212-410-4141].
But I want only the contact number, without the PIN code (ZIP code) in front of it.
Please help me fix this by modifying the existing pattern only. Thanks in advance. :)
General regex for matching phone numbers in different formats:
(\(?\+?\d{1,3}\)?[\s-]+)?\(?\d{1,3}\)?[\s-]+\d{3}[\s-]?\d{2}[\s-]?\d{2,}
Regex for matching the phone number in the document provided in the question:
[0-9-]{8,}
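Another option that keeps the existing pattern is to forbid a match from starting in the middle of a digit run, so the optional prefix can no longer pick up the tail of the ZIP code. The sketch below shows the idea in Python with a negative lookbehind (?<!\d); Java's java.util.regex accepts the same construct. The text is a shortened copy of the page text from the question, and the names ORIGINAL and GUARDED are only for illustration.

```python
import re

text = (
    "EJR Search Partners 1440 Broadway 23rd floor NY NY 10018 "
    "212-410-4141 info#ejrsearch.com Copyright 2011, EJR Search."
)

# The pattern from the question, with the optional prefix made non-capturing.
ORIGINAL = (
    r"(?:\(?\+?\d{1,3}\)?[\s-]+)?"
    r"\(?\d{1,3}\)?[\s-]+\d{3}[\s-]?\d{2}[\s-]?\d{2,}"
)

# Same pattern, but a match may not begin immediately after another digit.
GUARDED = r"(?<!\d)" + ORIGINAL

before = [m.group(0) for m in re.finditer(ORIGINAL, text)]
after = [m.group(0) for m in re.finditer(GUARDED, text)]

print(before)
print(after)
```

With the guard in place the match can only start at the beginning of 10018 or at 212, and only the latter satisfies the rest of the pattern.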
I'm very new to Python.
I receive serial data on a COM port in a fixed format, as a string like this:
"21-12-2015 10:12:05 005 100 10.5 P"
The format is 'date time id count data data'
I don't need the count or the first data field. Instead, I want to add one more value and send the result out through another COM port.
I want to rearrange the string and produce this output:
21-12-2015 10:12:05
SI.NO: 1451
Result: 10.5 P
My attempt:
ip = '21-12-2015_10:12:05_005_100_10.5 P'
dt = ip[0]+ip[1]+ip[3]+..... #save date as dt
tm = ip[9]+ip[10]+ip[11]+.... etc
and at the end
Result = dt + tm +"\n" + " "+ "SI.NO"+.......
Please suggest a good approach for doing this in Python 2.7.11.
If you can mention some ideas, I will search for the code.
Thank you
You can split up your string on whitespace into fields with split and build a new string using Python's string formatting syntax:
ip = "21-12-2015 10:12:05 005 100 10.5 P"
fields = ip.split()
s = '{date} {time}\n SI.NO: {sino}\n Result: {x} {y}'.format(
date=fields[0],
time=fields[1],
sino=1451, # Provide your own counter here
x=fields[4],
y=fields[5])
print s
Output:
21-12-2015 10:12:05
SI.NO: 1451
Result: 10.5 P
It isn't clear from your question whether your fields are separated by spaces or underscores. In the latter case, use fields = ip.split('_').
I'm using the following code to search a user either by first or last name:
var re = new RegExp(req.params.search, 'i');
User.find()
.or([{ firstName: re }, { lastName: re }])
.exec(function(err, users) {
res.json(JSON.stringify(users));
});
It works well if search equals 'John' or 'Smith', but not for 'John Sm'.
Any clue how to accomplish this kind of query?
Thanks!
Disclaimer: This question originally appeared in the comments of this previous question 3 years ago and remains unanswered. I'm starting a new thread because 1) it wasn't the main question and 2) I consider it interesting enough to have its own thread.
EDIT:
Suppose the database contains two records: John Smith and John Kennedy.
Querying John should return both John Smith and John Kennedy
Querying John Sm should return only John Smith
Split the search term into words and join them with the alternation operator ('|').
var terms = req.params.search.split(' ');
var regexString = "";
for (var i = 0; i < terms.length; i++)
{
    regexString += terms[i];
    if (i < terms.length - 1) regexString += '|';
}
var re = new RegExp(regexString, 'ig');
For the input 'John Smith', this creates a regex that looks like /John|Smith/ig. It matches on any of the individual words, so it also finds results when the input is just 'John Sm'.
You can play around with this regex to get one more suited to your needs.
EDIT:
The problem here is that your name fields are separate. In this case, applying the same regex to each field on its own will not give the results you want. The regex needs to be applied to a single field that holds the complete name.
A possible solution is using aggregation:
User.aggregate()
.project({fullName: {$concat: ['$firstName', ' ', '$lastName']}})
.match({fullName: re})
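The difference between the two approaches can be checked on the two records from the question. The sketch below is plain Python rather than a Mongoose query; the users list stands in for the collection, and both function names are hypothetical.

```python
import re

# Hypothetical records standing in for the User collection.
users = [
    {"firstName": "John", "lastName": "Smith"},
    {"firstName": "John", "lastName": "Kennedy"},
]


def search_by_words(term):
    """Alternation: a user matches if any word hits either name field."""
    # re.escape keeps characters typed by the user from acting as regex syntax.
    rx = re.compile("|".join(re.escape(w) for w in term.split()), re.IGNORECASE)
    return [
        u for u in users
        if rx.search(u["firstName"]) or rx.search(u["lastName"])
    ]


def search_full_name(term):
    """Match the whole term against 'firstName lastName'."""
    rx = re.compile(re.escape(term), re.IGNORECASE)
    return [u for u in users if rx.search(u["firstName"] + " " + u["lastName"])]


print([u["lastName"] for u in search_by_words("John Sm")])
print([u["lastName"] for u in search_full_name("John Sm")])
print([u["lastName"] for u in search_full_name("John")])
```

The alternation returns both records for 'John Sm' because 'John' alone matches each first name, while matching against the concatenated name narrows the result to John Smith, which is what the aggregation does on the database side.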