Twitter queries using regexp? - regex

I have been seeing posts and questions about using regexp to search for tweets on Twitter using the Java Twitter API. I have been trying to do that for the last two hours, reading every resource I could find, with no success. For example: I am simply trying to query Twitter once but receive the tweets for both "guerillacafe" and "guerilla cafe".
I have tried (guerillacafe|guerilla cafe), ('guerillacafe'|'guerilla cafe'), (guerillacafe OR guerilla cafe), (guerilla?cafe) and many more but, like I said, I can't get it to work.
Please help with a simple regexp that would do the trick.

The Java Twitter API does not allow for regexp searches. The OR keyword is supported (your third example works for me).
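Since regexp is not available, the alternatives have to be spelled out and joined with OR. A minimal sketch of building such a query string follows, written in Python for brevity. The helper name build_or_query is hypothetical, and wrapping multi-word terms in double quotes assumes the search endpoint treats a quoted phrase as an exact phrase, which is worth checking against the search operator documentation.

```python
def build_or_query(terms):
    """Join search terms with the OR keyword, quoting multi-word terms."""
    parts = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip empty entries
        # Assumption: a double-quoted phrase is matched as an exact phrase.
        parts.append('"%s"' % term if " " in term else term)
    return " OR ".join(parts)


query = build_or_query(["guerillacafe", "guerilla cafe"])
print(query)
```

The resulting string can then be passed as the query text to whatever client library is in use.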

Related

Can aws comprehend be used in splitting documents to sentences?

I started to try AWS Comprehend. One thing I noticed is that the sentences in the document affect the sentiment analysis and entity extraction results, especially when sentences with mixed sentiment exist or some sentences are not capitalized. So correctly splitting the sentences is an important step. However, I can't find an API in Comprehend that splits the document into sentences. Is that because Comprehend doesn't have such a step? If it does, could someone point out how to obtain the splitting results?
BTW, I tried Stanford CoreNLP and Google Language Cloud. They both make mistakes in some cases.
Here is what I did: I added '>>>' as a separator between reviews when I was scraping them, then I used this code:
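import boto3

comprehend = boto3.client("comprehend")  # assumes AWS credentials and region are configured
reviews = all_reviews_as_text.split('>>>')
responses = []
for review in reviews:
    # one detect_sentiment call per review
    response = comprehend.detect_sentiment(Text=review, LanguageCode="en")
    responses.append(response)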
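A slightly more defensive version of the same idea might look like the sketch below. It drops empty chunks (a trailing separator would otherwise produce an empty review) and pairs each review with its sentiment label. The function names split_reviews and analyze_reviews are hypothetical; the client argument is assumed to be a boto3 Comprehend client, and the "Sentiment" key is the label field of the detect_sentiment response.

```python
def split_reviews(text, separator=">>>"):
    """Split scraped text into reviews, dropping empty or whitespace-only chunks."""
    return [chunk.strip() for chunk in text.split(separator) if chunk.strip()]


def analyze_reviews(client, text, language_code="en"):
    """Call detect_sentiment once per review and pair each review with its label."""
    results = []
    for review in split_reviews(text):
        response = client.detect_sentiment(Text=review, LanguageCode=language_code)
        results.append((review, response["Sentiment"]))
    return results


# Usage (assumes configured AWS credentials):
#   import boto3
#   pairs = analyze_reviews(boto3.client("comprehend"), all_reviews_as_text)
```

Passing the client in as an argument keeps the splitting logic testable without network access. Note that this splits on the scraper's separator only; it does not split a review into sentences.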

python 2.7 using financial times education api to generate plain text articles

At the bottom of:
https://developer.pearson.com/apis/ft-education-api/#!/ftarticles/listArticles_get_0
I am trying to get a list of articles using the set parameters.
I have registered for, and received, an API key.
The issue is that I am unable to generate a list of articles, say for '2014-05-01'.
Note: Financial Times set up the API to allow students to get plain-text articles 30 days after the original article was published.
I have done only a small amount of research and understand I may need to use Python to receive the actual content. I have Python installed (and am familiar with the absolute basics). I am trying to get nothing more than plain-text documents of the articles.
Thanks.

Using docx4j with ColdFusion

I am attempting to create Word documents with ColdFusion, but it does not seem there is any way to do it with only ColdFusion. The best solution seems to be docx4j. However, I can't seem to find any in-depth docx4j and ColdFusion examples (aside from this question). Where can I get some docx4j and ColdFusion examples?
pulling the data from a database.
https://stackoverflow.com/a/10845077/1031689 shows one approach to doing this. There are other ways, as to which see http://www.slideshare.net/plutext/document-generation-2012osdcsydney
The document needs page numbers and to
Typically you'd add these via a header or footer. You might find it easier to start with an almost empty docx structured appropriately, rather than creating the necessary structures via ColdFusion calling docx4j. You could still do it this way in conjunction with the final paragraph of this answer below.
create a table of contents.
Search the docx4j forums for how to do this.
In general, it looks like the easiest approach would be to create a Java class file which does everything you want (by invoking docx4j), and for your ColdFusion to just invoke that Java class. In other words, do a bit of Java programming first, get that working, then hook it up to your ColdFusion stuff.
I am not sure what exactly you mean by creating a Word document, which in my opinion is pretty simple. Manipulating one, yes, is a bit tricky, with docx4j or otherwise.
<cfsavecontent variable="variables.mydoc">
Your content here
</cfsavecontent>
<cffile action="write" file="#yourFile.doc#" output="#variables.mydoc#">
Also see this post
Creating a Word document in Coldfusion - how to have pagenumbering?

Extracting key words from HTML to C++ under linux

I am working on a simple client-server project. The client is written in Java; it sends keywords to a C++ server running under Linux and receives a list of the best-ranked URLs (ranked by the number of occurrences of the keywords). The server's job is to go through some URLs in search of the keywords and return the best-fitting URLs. The problem is that I have to parse HTML pages to find occurrences of the keywords, and I also need to extract links from each visited page so I can search those as well. My question is: what library can I use to do that? Only C++ libraries for Linux are suitable for me. There were some similar topics, so I went through most of them, but some of the libraries parse only HTML files, and I don't want to download every site I visit; I want to parse it on the fly and just store its rank and URL. Some of them look a bit complicated to me, for instance first converting HTML to XML or something else and only then working on the results in C++. Is there something simple and sufficient for what I need? Any advice will be appreciated.
I don't think regular expressions are appropriate for HTML parsing. I'm using libxml2, and I enjoy it very much - easy to use, portable and lightning fast.
To get URLs from the web using C/C++ you could use the libcurl library. To parse URLs and other not too easy stuff from the site you can use a regex library.
Separating the HTML tags from the real content can also be done without the use of a library.
For more advanced stuff one could use Qt, which offers classes such as QWebPage (which uses WebKit) that allow one to access the DOM model of the page and extract individual HTML objects (e.g. single cells of a table) rather easily.
You can try xerces-c. It's a powerful library for XML parsing. It supports reading XML on the fly, as well as DOM and SAX parsing.
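Whichever library is chosen, the per-page work described in the question has the same shape: feed the fetched HTML to an event-based (SAX-style) parser, collect href values from anchor tags, and count keyword occurrences in the text nodes. The sketch below illustrates that structure in Python with the standard library's html.parser, purely for brevity; it is not a C++ solution. The names PageScanner and scan_page are hypothetical, and counting plain substring matches is an assumption about how the rank is defined.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collect links and count keyword occurrences while the page is parsed."""

    def __init__(self, base_url, keywords):
        super().__init__()
        self.base_url = base_url
        self.keywords = [k.lower() for k in keywords]
        self.links = []
        self.rank = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style bodies
        text = data.lower()
        # Assumption: rank is the number of case-insensitive substring matches.
        self.rank += sum(text.count(k) for k in self.keywords)


def scan_page(html, base_url, keywords):
    """Return (rank, links) for one page without storing the page itself."""
    scanner = PageScanner(base_url, keywords)
    scanner.feed(html)
    scanner.close()
    return scanner.rank, scanner.links
```

Only the rank and the extracted links survive the call, which matches the requirement of not keeping downloaded pages around.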

Get a particular text from website

Suppose you know where on a page the text is located, for example under a particular category. How would you connect to a website, then search for and read that text?
What steps do I need to follow to learn about that?
You could use libcurl/cURL for your HTML retrieval.
You're probably looking for a web crawler.
Here's an example of a simple crawler written in C++.
Moreover, you might want to have a look at wget, a program that retrieves files via HTTP, HTTPS and FTP.
If you are looking at a specific web page, you could try retrieving the page and parsing it to get to the exact location you want, e.g. a specific div.
Since you are using C++, you could try reading up on using libcurl to retrieve the information you need from the URL.
You can download an HTML file with WinHTTP (working example) and then search the file. There are some find algorithms in the std::string class for searching if your needs are relatively basic.
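The "retrieve the page, then parse down to a specific div" step can be sketched as follows. This is shown in Python with the standard library for brevity rather than in C++; the names ElementText and text_by_id are hypothetical, and the sketch assumes the target element carries an id attribute and that non-void tags inside it are properly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ElementText(HTMLParser):
    """Collect the text found inside the element with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0  # 0 means we are outside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1  # entering the target element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, target_id):
    """Return the whitespace-normalized text inside the element with that id."""
    parser = ElementText(target_id)
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

The html argument would come from whatever retrieval step is used (libcurl, wget, WinHTTP, or in Python something like urllib.request.urlopen). Text nodes are joined with spaces, so a word split across inline tags would come out with a space in it.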