How do you extract phone numbers from email bodies in Outlook? - regex

I have a pretty large account full of ~20k emails in Outlook and I need to extract phone numbers from those emails.
An example of an email would be:
From: Amy Schwartz <amy@blahdyblah.com>
Dear Anatoliy,
I want you to do blahdy blahdy blah.
Amy Schwartz
(347) 555-1212 <---- I want this
Blahdy Blah Company
The idea is to go through every email and match the last Phone number via regex and export a list in the following format:
Name: Name from the "From" field
Email: Email from the "From" field
Phone: The last phone number matched in the email text
Do you have any ideas on how to go about doing this?
UPDATE: I didn't find any prebuilt solutions, but I'm hacking together my own using codeTwo Outlook Express. You can export any email field (body, HTML body, from, from name) to CSV. It's a little slow (3 seconds per message on my i7 iMac running a Win7 VM), but it works :) From there I will probably just put the data in a database and do some regex magic. I will post my process once I'm done.

Figured it out. It's super easy if you know how to write a Node.js script (but I'm sure you could write one in Bash).
1) Use the Outlook Export plugin to export all your emails to a CSV. Make sure the email is the first column, the name is the second column, and the body (text) is the third column.
2) Write the following Node.js script in the same directory as your CSV of emails:
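var fs = require('fs');
var csv = require('csv'); // written against the older chained API of the csv module

// North American phone number pattern; the g flag collects every match in the body.
var phoneRe = /(?:\+?1\s*(?:[.-]\s*)?)?(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))\s*(?:[.-]\s*)?([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})\s*(?:[.-]\s*)?([0-9]{4})/g;

csv()
  .from.stream(fs.createReadStream(__dirname + '/data.csv'))
  .to.path(__dirname + '/out.csv')
  .transform(function (row) {
    // Columns: email, name, body. Keep the last phone number found in the body.
    var matches = (row[2] || '').match(phoneRe);
    return '"' + row[0] + '","' + row[1] + '","' + (matches ? matches[matches.length - 1] : '') + '"\n';
  })
  .on('error', function (error) {
    console.log(error.message);
  });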
and run it using node script.js.
And that's it! Runs super quickly (~20 secs for 20k emails).
Let me know if you have any suggestions (or if you package this into a downloadable executable).
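For comparison, here is a minimal Python sketch of the same post-processing step. It assumes the CSV layout described above (email, name, body) and reuses the phone pattern from the Node.js script; the function names `last_phone` and `export_contacts` are made up for illustration. It deliberately keeps the last match in each body, since that is what the question asks for.

```python
import csv
import re

# Same North American phone pattern as the Node.js script above.
PHONE_RE = re.compile(
    r"(?:\+?1\s*(?:[.-]\s*)?)?"
    r"(?:\(\s*([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9])\s*\)"
    r"|([2-9]1[02-9]|[2-9][02-8]1|[2-9][02-8][02-9]))"
    r"\s*(?:[.-]\s*)?"
    r"([2-9]1[02-9]|[2-9][02-9]1|[2-9][02-9]{2})"
    r"\s*(?:[.-]\s*)?([0-9]{4})"
)


def last_phone(body):
    """Return the last phone number found in body, or '' if there is none."""
    matches = [m.group(0) for m in PHONE_RE.finditer(body)]
    return matches[-1] if matches else ""


def export_contacts(in_path, out_path):
    """Read (email, name, body) rows and write (email, name, last phone) rows."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for row in csv.reader(src):
            if len(row) < 3:
                continue  # skip malformed rows
            writer.writerow([row[0], row[1], last_phone(row[2])])
```

Calling `export_contacts("data.csv", "out.csv")` would produce the same three-column output as the script above. Using `csv.writer` instead of string concatenation also takes care of quotes inside names.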

Related

Regex - Converting URLs to clickable links

We have some regex code that converts URLs to clickable links. It works, but we run into issues when a user submits an entry and forgets to put a space after a period: the text around the period is treated as a link as well.
Example: End of a sentence.This is a new sentence
It would create a hyperlink for sentence.This
Is there any way to validate the match against a proper domain such as .com, .ca, etc.?
Here is the code:
$url = '#(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])#';
$output = preg_replace($url, '$0', trim($val[0]));
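One possible approach, sketched here in Python rather than PHP, is to require that the last domain label belongs to an allow-list of TLDs. The list `ALLOWED_TLDS`, the pattern, and the `linkify` helper are all assumptions for illustration, not a complete URL grammar. A capitalised word that happens to equal a listed TLD (for example "sentence.Ca") would still be linked.

```python
import re
from html import escape

# Hypothetical allow-list; extend it with the TLDs you actually expect.
ALLOWED_TLDS = ("com", "ca", "org", "net")

URL_RE = re.compile(
    r'(?<![\w.@-])'                               # not inside a word or e-mail address
    r'(?:https?://)?'                             # optional scheme
    r'(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+'     # one or more domain labels
    r'(?:' + '|'.join(ALLOWED_TLDS) + r')'        # final label must be an allowed TLD
    r'(?![\w-])'                                  # and must end right there
    r'(?:/[^\s<>"]*[^\s<>".,;:!?)])?',            # optional path, no trailing punctuation
    re.IGNORECASE,
)


def linkify(text):
    """Wrap URL-like tokens whose TLD is in ALLOWED_TLDS in an anchor tag."""
    def repl(match):
        url = match.group(0)
        has_scheme = url.lower().startswith(("http://", "https://"))
        href = url if has_scheme else "http://" + url
        return '<a href="{}">{}</a>'.format(escape(href, quote=True), escape(url))

    return URL_RE.sub(repl, text)
```

With this sketch, "End of a sentence.This is a new sentence" is left untouched because "This" is not in the allow-list, while "example.com" is still linked.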
Thanks,
Aaron

Question about Regex to edit claim and remove part of email / username

I'm completely new to regex and have only read a few guides. My problem is as follows: a third-party solution is being connected to our ADFS 2016 environment. We have run into a problem because the solution cannot handle long usernames, and the UPN and email of our users are in the format of the user's initials (3 or 4 letters) followed by .department@ourcompany.com, so Dave Dibley Jr would be ddj.department@ourcompany.com.
What I would like to do is use regex to cut everything after the initials from the claim. Any suggestions on how to do this?
You can use RegEx for string manipulation in the Claims Rules Language. For example:
c:[type == "http://contoso.com/role"]
=> issue(Type = "http://contoso.com/role", Value = RegExReplace(c.Value, "(?i)director", "Manager"));
Pass through any role claims. If any of the claims contain the word “Director”, RegExReplace() will change it to “Manager”. For example, “Director of Finance” would pass through as “Manager of Finance”.
See https://social.technet.microsoft.com/wiki/contents/articles/16161.ad-fs-2-0-using-regex-in-the-claims-rule-language.aspx for more information.
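The answer does not spell out a pattern for the asker's own case. As a hedged sketch, the Python snippet below shows one pattern that keeps only the leading 3 or 4 letters before the first dot. The helper name `initials_only` is made up, and the assumption is that every value looks like initials.department@domain. The same pattern could presumably be passed to RegExReplace in a claim rule with "$1" as the replacement, but that is untested here.

```python
import re

# Capture 3 or 4 leading letters, then discard the first dot and everything after it.
INITIALS_RE = re.compile(r"^([A-Za-z]{3,4})\..*$")


def initials_only(upn):
    """Return just the initials from a UPN; leave non-matching values unchanged."""
    return INITIALS_RE.sub(r"\1", upn)


print(initials_only("ddj.department@ourcompany.com"))  # prints: ddj
```

Values that do not match the expected shape pass through unchanged, which may or may not be what you want for service accounts.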

Python bs4 and requests: how to print text that is scrambled with numbers and signs in between the letters

I am currently working on a script that pulls info from http://www.supremenewyork.com/shop/all/accessories as served by the UK server.
Before this, I had successfully written a chunk of code that gathers this info perfectly from the US website:
for item in soup1.find_all('div', class_='inner-article'):
    url = item.a['href']
    alt = item.find('img')['alt']
    req = requests.get('http://www.supremenewyork.com' + url)
    item_soup = BeautifulSoup(req.text, 'lxml')
    name = item_soup.find('h1', itemprop='name').text
    style = item_soup.find('p', itemprop='model').text
    print alt + ' --- ' + name + ' --- ' + style
But I want this script to pull info from the UK website, so I added proxy support so that it can access the UK site and pull the HTML.
This worked for me:
UK_Proxy1 = raw_input('UK http Proxy1: ')
UK_Proxy2 = raw_input('UK http Proxy2: ')
proxies = {
    'http': 'http://' + UK_Proxy1,
    'https': 'http://' + UK_Proxy2,
}
r1=requests.get('http://www.supremenewyork.com/shop/all/accessories', proxies=proxies)
soup1 = BeautifulSoup(r1.text, 'lxml')
But when I run the script I get this error
name.text
AttributeError: 'NoneType' object has no attribute 'text'
I know why this happens. On the US website the name of the item is perfectly fine and shows up in the HTML like this:
Supreme®/Hanes® Crew Socks (4 Pack)
But on the UK website the name and the colour are scrambled and show up like this:
supremehanes etc......
(you get the point)
On the UK website the name and colour are scrambled, but there should be a way around this: the script has to pick out the letters one by one.
Unfortunately, I am not sure how to do this.
I'm thinking of something like replacing '#' with '' via .replace, or some kind of remove.
If anyone can assist me, I would greatly appreciate it.
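A rough sketch of the idea in the last few lines: strip every character that is not a letter or whitespace, and guard against `find()` returning `None`, which is what raises the AttributeError above. The sample string and both helper names (`unscramble`, `safe_text`) are hypothetical. The real UK markup is not shown in the question, so the character class may need adjusting.

```python
import re


def unscramble(text):
    """Drop digits and symbols, keeping letters and single spaces."""
    letters_only = re.sub(r"[^A-Za-z\s]", "", text)
    return re.sub(r"\s+", " ", letters_only).strip()


def safe_text(tag):
    """Return cleaned text for a BeautifulSoup tag, or None if find() found nothing."""
    if tag is None:
        return None
    return unscramble(tag.get_text())


# Hypothetical scrambled name, made up for illustration.
print(unscramble("S1u#p2r%e3m4e H&a5n6e7s"))  # prints: Supreme Hanes
```

In the loop above, `safe_text(item_soup.find('h1', itemprop='name'))` would then return `None` instead of raising when the element is missing.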

Display name & picture knowing ID

I was looking to find an answer to my question, but so far I got this:
https://graph.facebook.com/the_user_id?fields=name,picture
I need to be able to display/print the first name, last name, and picture of a set list of users whose IDs I know. What code is required to get this data and then publish it on a PHP/HTML page? Of course, this means that if I want to show 10 users, I will input 10 different IDs (I read something about an array list?). Note that I do NOT need this to work for the current user.
Thanks in advance for your replies.
You need to use file_get_contents ( http://uk3.php.net/file_get_contents ) or curl in PHP and issue a request to a URL such as the following:
https://graph.facebook.com/?ids=id1,id2,id3&fields=name,picture
(replacing id1, id2, id3 with your IDs)
This will return a JSON object. You then need to decode it ( http://uk3.php.net/json_decode ), loop through the result, and access the information.
This should get you started:
// The $people array uses each user's id as the key and their favourite dessert as the value.
// The id returned by Facebook is then used to look up the corresponding value in this array.
$people = array("id1"=>"favourite dessert", "id2"=>"favourite dessert", "id3"=>"apple pie");
$json = file_get_contents('https://graph.facebook.com/?ids=id1,id2,id3&fields=id,name,picture');
$json = json_decode($json);
foreach($json as $key=>$person){
    echo '<p><img src="'.$person->picture.'" alt="'.$person->name.'" />';
    echo $person->name.'\'s favourite dessert is '.$people[$person->id];
    echo '</p>';
}
I've batched the requests here. Alternatively, you could perform 10 separate queries, one for each user, but that would be a bit pointless and inefficient.
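The batching idea can be sketched in Python as two small helpers: one that builds the single batched URL and one that walks the decoded response. The helper names are made up, and the response shape (an object keyed by user id, with `picture` as a plain URL string) is an assumption taken from the PHP snippet above. The live API may require an access token and may return a different structure.

```python
import json
from urllib.parse import urlencode


def build_batch_url(user_ids, fields=("id", "name", "picture")):
    """Build one Graph API URL that asks for several users at once."""
    query = urlencode(
        {"ids": ",".join(user_ids), "fields": ",".join(fields)},
        safe=",",  # keep the commas readable instead of %2C
    )
    return "https://graph.facebook.com/?" + query


def parse_people(raw_json):
    """Turn the id-keyed JSON object into a list of (id, name, picture) tuples."""
    data = json.loads(raw_json)
    return [(p["id"], p["name"], p["picture"]) for p in data.values()]
```

For example, `build_batch_url(["id1", "id2", "id3"])` yields the same URL that the PHP code requests.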
The easiest way is with an FQL query:
SELECT first_name, last_name, pic, uid FROM user WHERE uid IN
(Known_ID_1, Known_ID_2, ... Known_ID_n)
The easiest option, if you're using PHP, is to install the PHP SDK, though you can also make a call directly to https://graph.facebook.com/fql?q=URL_ENCODED_QUERY

Amazon Product Advertising API: Get Average Customer Rating

When using Amazon's web service to get any product's information, is there a direct way to get the Average Customer Rating (1-5 stars)? Here are the parameters I'm using:
Service=AWSECommerceService
Version=2011-08-01
Operation=ItemSearch
SearchIndex=Books
Title=A Game of Thrones
ResponseGroup=Large
I would expect it to have a customer rating of 4.5 and total reviews of 2177. But instead I get the following in the response.
<CustomerReviews><IFrameURL>http://www.amazon.com/reviews/iframe?...</IFrameURL></CustomerReviews>
Is there a way to get the overall customer rating, other than reading the <IFrameURL/> value, making another HTTP request for that page of reviews, and then screen scraping the HTML? That approach is fragile, since Amazon could easily change the structure of the reviews page, which would break my application.
You can scrape from here. Just replace the asin with what you need.
http://www.amazon.com/gp/customer-reviews/widgets/average-customer-review/popover/ref=dpx_acr_pop_?contextId=dpx&asin=B000P0ZSHK
As far as I know, Amazon changed its API, so it is no longer possible to get the review rank information. If you check this link, the note says:
As of November 8, 2010, only the iframe URL is returned in the request content.
However, testing with the params you used to get the iframe, it seems that now even the iframe doesn't work anymore. Also, in the latest API Reference, in the chapter "Motivating Customers to Buy", the "reviews" part is completely missing.
However, since I'm also very interested in whether it's still possible somehow to get the review rank information (maybe not even using the Amazon API but a competitor's API), I'll set up a bounty if anybody can provide something helpful on that. The bounty will be set on this topic in two days.
You can grab the iframe review url and then use css to position it so only the star rating shows. It's not ideal since you're not getting raw data, but it's an easy way to add the rating to your page.
Sample of this in action - http://spamtech.co.uk/positioning-content-inside-an-iframe/
Here is a VBS script that would scrape the rating. Paste the code below to a text file, rename it to Test.vbs and double click to run on Windows.
sAsin = InputBox("What is your ASIN?", "Amazon Standard Identification Number (ASIN)", "B000P0ZSHK")
If sAsin <> "" Then
    sHtml = SendData("http://www.amazon.com/gp/customer-reviews/widgets/average-customer-review/popover/ref=dpx_acr_pop_?contextId=dpx&asin=" & sAsin)
    sRating = ExtractHtml(sHtml, "<span class=""a-size-base a-color-secondary"">(.*?)<\/span>")
    sReviews = ExtractHtml(sHtml, "<a class=""a-size-small a-link-emphasis"".*?>.*?See all(.*?)<\/a>")
    MsgBox sRating & vbCrLf & sReviews
End If

' Return the first captured group of sPattern in sHtml, trimmed.
Function ExtractHtml(sHtml, sPattern)
    Set oRegExp = New RegExp
    oRegExp.Pattern = sPattern
    oRegExp.IgnoreCase = True
    Set oMatch = oRegExp.Execute(sHtml)
    If oMatch.Count = 1 Then
        ExtractHtml = Trim(oMatch.Item(0).SubMatches(0))
    End If
End Function

' Fetch sUrl with a synchronous GET and strip line feeds so patterns can span lines.
Function SendData(sUrl)
    Dim oHttp 'As XMLHTTP30
    Set oHttp = CreateObject("Msxml2.XMLHTTP")
    oHttp.open "GET", sUrl, False
    oHttp.send
    SendData = Replace(oHttp.responseText, vbLf, "")
End Function
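The extraction step of the VBS script can be sketched in Python as follows. The two patterns are copied from the script above and depend on Amazon's markup at the time, so they may no longer match. The sample HTML is invented for illustration, and `extract_first` is a hypothetical helper. Fetching the page is left out on purpose.

```python
import re

# Patterns taken from the VBS script; DOTALL stands in for its removal of line feeds.
RATING_RE = re.compile(
    r'<span class="a-size-base a-color-secondary">(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)
REVIEWS_RE = re.compile(
    r'<a class="a-size-small a-link-emphasis".*?>.*?See all(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_first(pattern, html):
    """Return the first captured group, stripped, or None if nothing matches."""
    match = pattern.search(html)
    return match.group(1).strip() if match else None


# Invented sample markup, shaped like what the patterns expect.
sample = (
    '<span class="a-size-base a-color-secondary">\n 4.5 out of 5 stars\n</span>'
    '<a class="a-size-small a-link-emphasis" href="#">See all 2,177 reviews</a>'
)
print(extract_first(RATING_RE, sample))   # prints: 4.5 out of 5 stars
print(extract_first(REVIEWS_RE, sample))  # prints: 2,177 reviews
```

As with any screen scraping, this breaks as soon as the class names or the page layout change.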
Amazon has completely removed support for accessing rating/review information from their API. The docs mention a Response Element in the form of customer rating, but that doesn't work either.
Google Shopping uses Viewpoints for some reviews, along with other sources.
This is not possible from PAPI. You either need to scrape it by yourself, or you can use other free/cheaper third-party alternatives for that.
We use the amazon-price API from RapidAPI for this, it supports price/rating/review count fetching for up to 1000 products in a single request.