Reading data from an online text file in C++ - c++

Alright. What I'm after is, what seems to me, fairly simple.
I've got File I/O down to a fine art for basic text files.
But, what I need now, is a way to read a text file that's online.
Let's say, something like: http://www.iamawebsite.com.au/file.txt
I CAN download the file and store it locally, but that will cause me a lot more pain in the future, and even more so when redistributing the end program, so if I can avoid doing so, I will be forever grateful. (Also, if possible, I'd like to avoid any additional libraries. If I have to use one, I will, but if there's a way around that, I'm happy.)
I have looked around for a while on ways to do similar tasks, but they seem to be going for more than what I'm after, and skipping the small steps which are the ones I can't quite get.
(If it helps, using Windows 8, Visual Studio 2010 Ultimate, needs to work in Windows 7 and 8 if possible)

I tried many different things, and I couldn't get anything to work without overcomplicating things to a ridiculous level. I also tried libcurl, but I couldn't manage to link it properly; I'm not sure why.
I ended up using a combination of Batch and Powershell scripts, simple and powerful, and best of all it works.
If anybody is interested:
Batch script:
powershell -ExecutionPolicy Bypass -File "fileUrl\name.ps1"
Powershell script:
$webClient = New-Object System.Net.WebClient;
$url = "http://www.iamawebsite.com.au/iamafile.txt";
$file = "whereToSaveFile\desiredNameOfFile.txt";
$webClient.DownloadFile($url, $file);
I keep both my Batch and Powershell files in the same directory, just to make it a little easier on myself.
Thanks!

Maybe try the URLDownloadToCacheFile function.

One way would be to use InternetOpen, then InternetOpenUrl, and finally InternetReadFile.

What could be the simplest way to incorporate Windows WPP Software Tracing into SCons builds?

I ask my question in such a specific way because I am afraid that a more generic form could lead to excessively theoretical discussions of how things should best be done (like a question about pre- and post-process actions in SCons).
Incorporating WPP requires running one or more additional commands before a file is compiled, and only when the build process has already decided, independently of WPP, that the file needs compiling.
I would remark that this is easily achieved with a few lines of definitions in a shared Visual Studio property page file, which makes it work for multiple files in multiple projects, folders, etc. in a way that is completely transparent to developers.
So I am wondering whether this can be done in a similarly simple way with SCons. I do not have deep knowledge of either the SCons or MSBuild frameworks; I use them for simple practical work, so I would truly appreciate practical and useful advice.
Here's what I'd suggest.
SCons builds command lines from Environment() variables.
For example, the compile command line for building a shared object for C++ is stored in SHCXXCOM (the variable for what is displayed to the user when the command runs defaults to SHCXXCOM, but can be changed by modifying SHCXXCOMSTR).
Back to the problem at hand.
Assuming you have a limited number of build steps you want to wrap, you can do something like this:
env['SHCXXCOM'] = [ 'WPP PRE COMMAND LINE', env['SHCXXCOM'], 'WPP POST COMMAND LINE']
You'll have to figure out which variables you need to do this with, but take a look at the manpage to figure that out.
https://scons.org/doc/production/HTML/scons-man.html
p.s. I've not tried this, but in theory it should work. Let us know if not.
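Since SConstruct files are plain Python, the wrapping idea above can be sketched as a small helper. This is only a sketch: the helper name wrap_command and the echo placeholders are made up for illustration, a plain dict stands in for an SCons Environment(), and it assumes SCons runs a list-valued command variable as a sequence of commands, as the answer suggests.

```python
def wrap_command(env, var, pre, post):
    """Wrap env[var] so that `pre` comes before it and `post` after it.

    `env` can be any mapping; in an SConstruct it would be the object
    returned by Environment(). The helper name is hypothetical.
    """
    original = env[var]
    # Keep an already-wrapped value flat instead of nesting lists.
    steps = list(original) if isinstance(original, (list, tuple)) else [original]
    env[var] = [pre] + steps + [post]
    return env[var]


# A dict standing in for Environment(); the command strings are placeholders.
env = {"SHCXXCOM": "$SHCXX -o $TARGET -c $SHCXXFLAGS $SOURCES"}
wrap_command(env, "SHCXXCOM", "echo pre-step placeholder", "echo post-step placeholder")
for step in env["SHCXXCOM"]:
    print(step)
```

The same call would have to be repeated for each command variable you want to wrap (for instance CXXCOM for static objects), which is the "figure out which variables" part mentioned above.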

Netbeans - copy highlighted regex search results

I made a simple regex search in Netbeans 7.3 on Windows (using Ctrl+F):
\{\{.*?\}\}
The results get highlighted correctly, and the question is: how do I extract the highlighted search results? Copying to the clipboard, saving to a file, or anything else would do.
Is there any way to do this?
Maybe someone can suggest an alternative quick approach to this task in Netbeans (or another editor)?
What OS are you running? If OS X or Linux, read on!
I'm not sure about automatically copying the highlighted results to the clipboard, but I work around this quite a bit as well.
The easiest way for me to accomplish this without leaving NetBeans is to open a built-in terminal window through Window>Output>Terminal (in 7.2.1). I then navigate to my project and run the RegEx that I built in the Find feature with my tool of choice. In fact, I use the built-in terminal for this type of quick stuff in NetBeans quite a bit. If you are running Linux, using clipboard tools like xsel (http://linux.die.net/man/1/xsel) in combination with a built-in terminal emulator can allow for devising some nice workflow shortcuts within IDEs if you are more comfortable working/coding at a terminal. Note that built-in terminal emulators like the one in NetBeans are likely not going to play nicely with cut/copy/paste using the mouse, for various reasons that I won't get into here.
As far as a built in/extension based solution for something like this, it would be helpful! I am not aware of one.
Hope this workaround helps in the meantime.

RTF / doc / docx text extraction in program written in C++/Qt

I am writing some program in Qt/C++, and I need to read text from Microsoft Word/RTF/docx files.
And I am looking for some command-line program that can make that extraction. It may be several programs.
The closest thing I found is DocToText, but it has several bugs, so I can't use it.
I have also Microsoft Word installed on the PC. Maybe there is some way to read text using it (have no idea how to use COM)?
Now, this is pretty ugly and pretty hacky, but it seems to work for me for basic text extraction. Obviously to use this in a Qt program you'd have to spawn a process for it etc, but the command line I've hacked together is:
unzip -p file.docx | grep '<w:t' | sed 's/<[^<]*>//g' | grep -v '^[[:space:]]*$'
So that's:
unzip -p file.docx: -p == "unzip to stdout"
grep '<w:t': Grab just the lines containing '<w:t' (<w:t> is the Word 2007 XML element for "text", as far as I can tell)
sed 's/<[^<]*>//g': Remove everything inside tags
grep -v '^[[:space:]]*$': Remove blank lines
There is likely a more efficient way to do this, but it seems to work for me on the few docs I've tested it with.
As far as I'm aware, unzip, grep and sed all have ports for Windows and any of the Unixes, so it should be reasonably cross-platform, despite being a bit of an ugly hack ;)
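For comparison, the same extraction can be sketched in Python using only the standard library, which avoids depending on unzip, grep and sed. This is a rough sketch, not a full converter: it assumes the body text lives in word/document.xml inside the archive, the function names are made up for illustration, and it ignores headers, footnotes and anything else outside that one part. The demo builds a tiny stand-in archive rather than a complete Word file.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the WordprocessingML elements (<w:p>, <w:t>, ...).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the text of every non-blank paragraph in a .docx file.

    `source` is a path or a file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(W + "p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        text = "".join(node.text or "" for node in paragraph.iter(W + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


def make_demo_docx():
    """Build a minimal stand-in archive holding only word/document.xml."""
    body = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world</w:t></w:r></w:p>"
        "<w:p></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", body)
    buffer.seek(0)
    return buffer


for line in docx_paragraphs(make_demo_docx()):
    print(line)
```

From a Qt program this would still be run as a separate process, the same as the shell pipeline. It only covers .docx; .doc and RTF need a different tool.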
Try Apache Tika
I recommend against using COM, as that would defeat the purpose of using a portable library like Qt in the first place.
You might want to use the classic catdoc or a similar tool such as wvWare.
Note that although the catdoc author claims that catdoc doesn't work under Windows, there is a posting from 2001 which states the opposite.
To read .doc files you can use the structured storage API. A .doc is basically a structured storage repository with various streams corresponding to the various parts of the document.
Be warned that it is quite a hairy API and that even using this API, a .doc file can be quite messy to look at.
Of course this is still Windows-only, but at least it's not COM, just a plain old C API.
This might help. It is cross-platform and has an API http://www.winfield.demon.nl/
Otherwise the IFilter methods are the way to go if this is Windows-only. They will allow you to parse anything that has an IFilter on your system. Here are some examples: http://the-lazy-programmer.com/blog/?p=8 . I have used IFilter from the C# end of things quite a bit.

HD Regular Expression Search

I am working on a project for my computer security class and I have a couple questions. I had an idea to write a program that would search the whole hard drive looking for email addresses. I am just looking for addresses stored in plain text since it would be hard to find anything otherwise. I figured the best way to find addresses would be to use a regular expression.
I wrote an application in C# that works fairly well, but I would like to see if anyone has any better ideas. I am completely up for writing this in another language, since I'm assuming C# isn't the best for this type of thing. So far the application I created just starts at C:/ and recursively locates all files on the drive, skipping those that aren't accessible. It also skips all common image, video, audio, and compressed files, and files over 512 MB. This speeds it up quite a bit, but there is a small chance that a large file could contain something useful. It takes about 12 seconds to generate the list of files and, I'm guessing, about an hour to check them all. One downside is that it uses about 50% CPU while scanning.
I'm looking for ideas on how to improve the search. Is there a faster way, a more efficient way, a more thorough way, things like that? I was trying to think if there was any way that you could tell if the file would contain plain text strings or not. Just let me know if you have any cool ideas. Thanks.
To be honest, the easiest existing way to do this is to use grep. As you improve your program, compare your speeds to it, and when you get close, stop worrying about optimizing. Alternatively, take a look at its source for an example of an existing product that does what you're looking for.
As noted elsewhere, tools already exist for this if you install Win32 ports of UNIX tools. Alternatively, the Windows equivalent is:
for /r c:\ %i in (*.*) do findstr /i /r "regular expression" "%i"
You should just use grep + find. grep is optimized for searching files fast, and find is optimized for providing lists of appropriate files for things like this. People have spent a long time optimizing these tools, so there's no need to reinvent the wheel.
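If you do want your own program as a baseline to compare against grep, the scan described in the question can be sketched in a few lines of Python. Treat this as a rough sketch under some assumptions: the email pattern is a deliberately loose approximation, the extension list is only an example, and the names find_emails, SKIP_EXTENSIONS and MAX_BYTES are made up for illustration.

```python
import os
import re
import tempfile

# Loose approximation of an email address; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Example list only; extend it with whatever formats you want to skip.
SKIP_EXTENSIONS = {".jpg", ".png", ".gif", ".mp3", ".mp4", ".avi", ".zip", ".gz"}
MAX_BYTES = 512 * 1024 * 1024  # the 512 MB cut-off from the question


def find_emails(root, max_bytes=MAX_BYTES):
    """Walk `root` and return the set of email-like strings found."""
    found = set()
    # os.walk silently skips directories it cannot list.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in SKIP_EXTENSIONS:
                continue
            path = os.path.join(dirpath, name)
            try:
                if os.path.getsize(path) > max_bytes:
                    continue
                # Read as bytes so that non-text files do not raise decode errors.
                with open(path, "rb") as handle:
                    for line in handle:
                        for match in EMAIL_RE.findall(line):
                            found.add(match.decode("ascii"))
            except OSError:
                continue  # unreadable file: skip it, as the question describes
    return found


# Small demo on a temporary directory instead of a whole drive.
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "notes.txt"), "w") as handle:
        handle.write("contact alice@example.com or bob@example.org\n")
    with open(os.path.join(demo_root, "photo.jpg"), "w") as handle:
        handle.write("carol@example.com\n")  # skipped because of its extension
    print(sorted(find_emails(demo_root)))
```

Reading line by line keeps memory use low for ordinary text files, but a binary file with no newlines is still read as one long line, so a chunked read may be a better fit there.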

ctags best practices

I'm working on a 1M+ LOC C/C++ project on Solaris (remote, via VNC or SSH). I also have a daily updated copy of the source code on my local machine (Windows, just for browsing code).
I use a VIM and ctags combo (on both Solaris and Windows), but I'm not happy with the results or the speed. What settings for ctags would you recommend? There are a lot of options for what should be tagged and how. Should I use a single tag file per project, one per directory, or perhaps just one for everything?
Using anything less than one for everything doesn't really make sense to me. Being able to quickly jump around your project is what tags are for in the first place. For instance, our code is divided into 3 main sections, Include/, Processes/, Libraries/. Without being able to jump between these I would be incredibly unproductive.
Personally I use cscope (its C++ parsing isn't great, but it's OK, and its VIM integration is better than ctags alone), but when I do use ctags I usually just add --c++-kinds=+p.
I use etags:
find src1 src2 src3 | grep -v "\\.svn" | xargs etags --append
In emacs, position cursor on identifier and press M-. ([alt] + [period], or [esc] followed by [period]).
I don't know how it compares to your setup as far as speed goes, or if you're willing to use emacs. I'm just posting in case you want to try some alternatives.