Comparing streams - C++

I'm looking into generalizing my data sources in my C++ application by using streams. However, my code also uses a resource manager that functions in a manner similar to a factory, except its primary purpose is to ensure that the same resource doesn't get loaded twice into memory.
myown::ifstream data("image.jpg");
std::ifstream data2("image2.jpeg");
ResourcePtr<Image> img1 = manager.acquire(data);
ResourcePtr<Image> img2 = manager.acquire(data);
cout << img1 == img2; // True
ResourcePtr<Image> img3 = manager.acquire(data2);
cout << img1 == img3; // False
For it to do this, it obviously has to do some checks. Is there a reasonable way (readable and efficient) to implement this, if the resource manager has data streams as input?

You cannot "compare" data streams. Streams are not containers; they are flows of data.
BTW, cout << a == b is (cout << a) == b; I think you meant cout << (a==b).

The level of abstraction where the identity of the data matters is well above your streams. Think about what your stream would do with that information if it knew it. It could not act upon it; it is just a bunch of data. In terms of the interface, a stream doesn't necessarily even have an end. You would be violating least surprise, at least for me, if you tried to tie identity to the data at that level.
That sounds like a reasonable abstraction for your ResourcePtr, though. You could hash the data when you load it into ResourcePtr, but a key on the file path is probably just as good.

Like Tomalak said, you can't compare streams. You'll have to wrap them in some class that associates an ID with them, possibly based on the absolute path if they are all backed by files on the file system.
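As a minimal sketch of that idea, assuming resources are keyed by the path they were loaded from (the ResourceManager shape below is hypothetical and only illustrates the caching):

#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical sketch: key resources by the path they were loaded from,
// instead of trying to compare the streams themselves.
template <typename Resource>
class ResourceManager
{
public:
    std::shared_ptr<Resource> acquire(const std::string& path)
    {
        auto it = cache_.find(path);
        if (it != cache_.end())
        {
            if (auto existing = it->second.lock())
                return existing;              // same path -> same resource
        }
        auto loaded = std::make_shared<Resource>(); // load from 'path' here
        cache_[path] = loaded;
        return loaded;
    }

private:
    std::unordered_map<std::string, std::weak_ptr<Resource>> cache_;
};

Two acquire calls with the same path then hand back the same object, which is the img1 == img2 behaviour from the question.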

Related

How to write the content of an object into a file in C++

I have a code in this format:
srcSAXController control(input_filename.c_str());
std::string output_filename = input_filename;
output_filename = "c-" + output_filename.erase(input_filename.rfind(XML_STR));
std::ofstream myfile(output_filename.c_str());
coverage_handler handler(i == MAIN_POS ? true : false, output_filename);
control.parse(&handler);
myfile.write((char *)&control, sizeof(control));
myfile.close();
I want the content of the 'control' object to be written into my file. How can I fix the code above so that the content of the control object is written to the file?
In general you need much more than just writing the bytes of the object to be able to save and reload it.
The problem is named "serialization" and depending on a lot of factors there are several strategies.
For example it's important to know if you need to save and reload the object on the same system or if you may need to reload it on a different system; it's also fundamental to know if the object contains links to other objects, if the link graph is a simple tree or if there are possibly loops, if you need to support versioning etc. etc.
Writing the bytes to disk like the code is doing is not going to work even for something as simple as an object containing an std::string.
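As a rough illustration of the difference, here is a minimal sketch that serializes a hypothetical struct field by field instead of dumping its raw bytes (the coverage_data type and its members are made up for the example):

#include <fstream>
#include <string>

// Hypothetical stand-in for whatever state the handler really holds.
struct coverage_data
{
    int line_count;
    std::string file_name;
};

// Write each member explicitly instead of the raw object bytes.
void save(const coverage_data& d, std::ofstream& out)
{
    out << d.line_count << '\n'
        << d.file_name << '\n';
}

// Read the members back in the same order they were written.
void load(coverage_data& d, std::ifstream& in)
{
    in >> d.line_count;
    in.ignore();                       // consume the newline before getline
    std::getline(in, d.file_name);
}

For anything non-trivial (pointers, nested objects, versioning), a serialization library such as Boost.Serialization is usually a better route than hand-written code like this.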

Trying to find a way to LOG Graphical data in OpenCV/BOOST

To begin with: I am working on image processing using OpenCV in C++. After loading a Mat image in a C++ program, I plotted a graph of the image using GNUPLOT.
Now the requirement is to log the graphical data of the Mat image.
To do this, I created a Boost C++ logger by including the Boost libraries. Boost is an excellent library for testing and for logging data as well, but the problem with its log is that it can only log text messages. Correct me if I'm wrong.
Below is my code for plotting a graph using GNUPlot in OpenCV:
try
{
    Gnuplot g1("lines");

    std::vector<double> rowVector;
    std::vector<double> rowVectorExp;
    for (int i = 0; i < 50; i++)
    {
        rowVector.push_back((double)i);
        rowVectorExp.push_back((double)exp((float)i/10.0));
    }

    cout << "*** user-defined lists of doubles" << endl;
    g1 << "set term png";
    g1 << "set output \"test.png\"";
    // type of plot pattern
    g1.set_grid().set_style("lines");
    g1.plot_xy(rowVector, rowVectorExp, "user-defined points 2d");

    waitKey(0);
}
catch (GnuplotException ge)
{
    cout << ge.what() << endl;
}
cout << endl << "*** end of gnuplot example" << endl;
Here is my Boost.Log code:
#include <boost/log/trivial.hpp>
#include <boost/log/utility/setup/file.hpp>

namespace logging = boost::log;

void PlainGetEdgeVector::init()
{
    logging::add_file_log("sample%3N.log");
}

// inside some function, after init() has run:
BOOST_LOG_TRIVIAL(info) << "This is my first Log line";
The good news is that my Boost logger successfully logs the text message. It would be great if it could log my graphical data as well.
Any suggestions? If anyone knows how to implement this using Boost, I would be very grateful; if there are any alternatives, those would be good to know as well.
The solution to your problem greatly depends on the nature of the data and on how you want to use the logged data.
1. Consider converting the binary data to text
For debugging purposes it is often more convenient to convert your binary data to text. Even with large amounts of data this approach can be useful because there are generally many more tools for text processing than for working with arbitrary binary data. For instance, you could compare two logs from different runs of your application with conventional merge/compare tools to see the difference. Text logs are also easier to filter with tools like grep or awk, which are readily available, as opposed to binary data for which you will likely have to write a parser.
There are many ways to convert binary data to text. The most direct approach is to use the dump manipulator, which efficiently produces a textual view of raw binary data. It suits graphical data as well, because such data tends to be relatively large and is often easy enough to compare in text representation (e.g. when a color sample fits in a byte).
std::vector< std::uint8_t > image;
// Outputs hex dump of the image
BOOST_LOG_TRIVIAL(info) << logging::dump(image.data(), image.size());
A more structured way to output binary data is to use other libraries, such as iterator_range from Boost.Range. This can be useful if your graphical data is composed of something more complex than raw bytes.
std::vector< double > image;
// Outputs all elements of the image vector
BOOST_LOG_TRIVIAL(info) << boost::make_iterator_range(image);
You can also write your own manipulator that will format the data the way you want, e.g. split the output by rows.
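As a sketch of that last point, a simple row-splitting inserter can be an ordinary type with an operator<< into std::ostream; the rows_of name and the row width below are made up for the example, and the usage line assumes the record stream accepts standard ostream inserters:

#include <cstddef>
#include <ostream>
#include <vector>

// Holds a reference to the data plus the desired row width.
struct rows_of
{
    const std::vector<double>& data;
    std::size_t row_width;
};

// Print the samples row by row.
std::ostream& operator<<(std::ostream& os, const rows_of& r)
{
    for (std::size_t i = 0; i < r.data.size(); ++i)
    {
        os << r.data[i] << ' ';
        if ((i + 1) % r.row_width == 0)
            os << '\n';
    }
    return os;
}

// Usage: BOOST_LOG_TRIVIAL(info) << rows_of{ image, 640 };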
2. For binary data use attributes and a custom sink backend
If you intend to process the logged data with a more specialized piece of software, like an image viewer or editor, you might want to save the data in binary form. This can be done with Boost.Log, but it will require more effort because the sinks provided by the library are text-oriented and you cannot save binary data into a text file as is. You will have to write a sink backend that writes binary data in the format you want (e.g. if you plan to use an image editor you might want to write files in a format supported by that editor). The Boost.Log documentation has a tutorial on writing sinks, which shows the interface you have to implement and a sample implementation. The important bit is the consume function of the backend, which will receive a log record view with your data.
typedef boost::iterator_range< const double* > image_data;
BOOST_LOG_ATTRIBUTE_KEYWORD(a_image, "Image", image_data)

namespace sinks = boost::log::sinks;

class image_writer_backend :
    public sinks::basic_sink_backend< sinks::synchronized_feeding >
{
public:
    void consume(logging::record_view const& rec)
    {
        // Extract the image data from the log record
        if (auto image = rec[a_image])
        {
            image_data const& im = image.get();
            // Write the image data to a file
        }
    }
};
In order to pass your image binary data to your sink you will need to attach it to the log record as an attribute. There are multiple ways to do that, but assuming you don't intend to filter log records based on the image, the easiest way to do this is to use the add_value manipulator.
std::vector< double > image;
BOOST_LOG_TRIVIAL(info) << logging::add_value(a_image, image) << "Catch my image";
Caveat: In order to avoid copying the potentially large image data, we're passing a lightweight iterator_range as the attribute value. This will only work with synchronous logging because the image vector needs to stay alive while the log record is being processed. For async logging you will have to pass the image by value or use reference counting.
If you do want to apply filters to the image data then you can use scoped attributes or add the attribute to a logger.
Note that by adding your new sink for writing binary data you do not preclude also writing textual logs with other sinks, so the "Catch my image" message can still be processed by a text sink. By using other attributes, like log record counters, you can associate log records in different files produced by different sinks.
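For completeness, one way to hook such a backend into the logging core looks roughly like this (the init function name is arbitrary, and image_writer_backend is the class sketched above):

#include <boost/log/core.hpp>
#include <boost/log/sinks/sync_frontend.hpp>
#include <boost/make_shared.hpp>

namespace logging = boost::log;
namespace sinks = boost::log::sinks;

void init_image_sink()
{
    // Wrap the custom backend in a synchronous frontend and register it.
    typedef sinks::synchronous_sink< image_writer_backend > sink_t;
    boost::shared_ptr< sink_t > sink = boost::make_shared< sink_t >();
    logging::core::get()->add_sink(sink);
}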

Transferring data between executables

I have two executables written in C++ on Windows. I generate some data in one, and want to call the other executable to process this data. I could write the data out to a file then read it in the other executable, but that seems rather expensive in terms of disk I/O. What is a better way of doing this? It seems like a simple enough question but google just isn't helping!
Let's say the data is around 100MB, and is generated in its entirety before needing to be sent (i.e. no streaming is needed).
Answers that work when mixing 32 bit and 64 bit processes gain bonus points.
If your processes can easily write to and read from file, just go ahead. Create the file with CreateFile and mark it as temporary & shareable. Windows uses this hint to delay physical writes, but all file semantics are still obeyed. Since your file is only 100 MB and actively in use, Windows is almost certainly able to keep its contents fully in RAM.
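A rough sketch of that hint, assuming a hypothetical file name and that the reading process only needs read access:

#include <windows.h>

// Create the exchange file; FILE_ATTRIBUTE_TEMPORARY tells Windows to try to
// keep the contents in cache rather than flushing them to disk eagerly.
HANDLE h = CreateFileW(
    L"exchange.tmp",              // hypothetical file name
    GENERIC_WRITE,
    FILE_SHARE_READ,              // let the other process open it for reading
    NULL,
    CREATE_ALWAYS,
    FILE_ATTRIBUTE_TEMPORARY,
    NULL);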
You can use Boost.MPI. It is from Boost, which has a high quality standard, and the code sample is pretty explicit:
http://www.boost.org/doc/libs/1_53_0/doc/html/mpi/tutorial.html#mpi.point_to_point
// The following program uses two MPI processes to write "Hello, world!"
// to the screen (hello_world.cpp):
#include <boost/mpi.hpp>
#include <boost/serialization/string.hpp>
#include <iostream>
#include <string>
namespace mpi = boost::mpi;

int main(int argc, char* argv[])
{
    mpi::environment env(argc, argv);
    mpi::communicator world;

    if (world.rank() == 0) {
        world.send(1, 0, std::string("Hello"));
        std::string msg;
        world.recv(1, 1, msg);
        std::cout << msg << "!" << std::endl;
    } else {
        std::string msg;
        world.recv(0, 0, msg);
        std::cout << msg << ", ";
        std::cout.flush();
        world.send(0, 1, std::string("world"));
    }
    return 0;
}
Assuming you only want to go "one direction" (that is, you don't need to get data BACK from the child process), you could use _popen(). You write your data to the pipe and the child process reads the data from stdin.
If you need bidirectional flow of data, then you will need to use two pipes, one as input and one as output, and you will need to set up a scheme for how the child process connects to those pipes [you can still set up the stdin/stdout to be the data path, but you could also use a pair of named pipes].
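A minimal sketch of the one-direction case, assuming a hypothetical child.exe that reads the data from its stdin:

#include <cstdio>

int main()
{
    // Open a pipe to the child; "wb" writes binary data to its stdin.
    FILE* child = _popen("child.exe", "wb");
    if (!child)
        return 1;

    const char data[] = "generated data";   // placeholder for the real 100 MB
    fwrite(data, 1, sizeof(data), child);

    return _pclose(child);                  // waits for the child to exit
}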
A third option is a shared memory region. I've never done this in Windows, but the principle is pretty much the same as what I've used in Linux [and many years back in OS/2]; see the sketch after the steps below:
1. Create a memory region with a given name in your parent process.
2. The child process opens the same memory region.
3. Data is stored by parent process and read by child process.
4. If necessary, semaphores or similar can be used to signal completion/results ready/etc.
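A hedged sketch of those steps with the Win32 calls; the mapping name and size are assumptions for illustration:

#include <windows.h>

// Steps 1-2: the parent creates a named, page-file-backed region; the child opens it by name.
void parent_side()
{
    HANDLE mapping = CreateFileMappingW(
        INVALID_HANDLE_VALUE,            // back the region with the page file
        NULL,
        PAGE_READWRITE,
        0, 100 * 1024 * 1024,            // high/low parts of the region size
        L"Local\\MySharedData");         // hypothetical name the child also uses

    void* view = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    // Step 3: write the generated data through 'view'.
}

void child_side()
{
    HANDLE mapping = OpenFileMappingW(FILE_MAP_READ, FALSE, L"Local\\MySharedData");
    const void* view = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
    // Step 3: read the data through 'view'.
    // Step 4: signal completion via an event or semaphore if needed.
}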

What is the best way to return an image or video file from a function using C++?

I am writing a C++ library that fetches and returns either image data or video data from a cloud server using libcurl. I've started writing some test code but am still stuck on designing the API, because I'm not sure about the best way to handle these media files. Storing the data in a char/string variable as binary seems to work, but I wonder if that would take up too much RAM if the files are big. I'm new to this, so please suggest a solution.
You can use something like zlib to compress it in memory, and then uncompress it only when it needs to be used; however, most modern computers have quite a lot of memory, so you can handle a lot of images before you need to start compressing. With videos, which are effectively a LOT of images, it becomes more important -- you tend to decompress as you go, and possibly even stream from disk as you go.
The usual way to handle this, from an API point of view, is to have something like an Image object and a Video object (classes). These objects would have functions to "get" the uncompressed image/frame. The "get" function would check to see if the data is currently compressed; if it is, it would decompress it before returning it; if it's not compressed, it can return it immediately. The way the data is actually stored (compressed/uncompressed/on disk/in memory) and the details of how to work with it are thus hidden behind the "get" function. Most importantly, this model lets you change your mind later, adding additional types of compression, adding disk-streaming support, etc., without changing how the code that calls the get() function is written.
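A minimal sketch of that shape, where decompress() is a placeholder for whatever codec (zlib, an image decoder, ...) is actually used:

#include <cstdint>
#include <vector>

class Image
{
public:
    // Returns uncompressed pixels, decompressing on first use only.
    const std::vector<std::uint8_t>& get()
    {
        if (pixels_.empty() && !compressed_.empty())
            pixels_ = decompress(compressed_);
        return pixels_;
    }

private:
    static std::vector<std::uint8_t> decompress(const std::vector<std::uint8_t>& in)
    {
        // Placeholder: real code would call into zlib or an image decoder here.
        return in;
    }

    std::vector<std::uint8_t> compressed_;   // as fetched from the server
    std::vector<std::uint8_t> pixels_;       // filled in lazily
};

Callers only ever see get(), so the storage strategy can change later without touching them.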
The other challenge is how you return an Image or Video object from a function. You can do it like this:
Image getImageFromURL( const std::string &url );
But this has the interesting problem that the image may be "copied" during the return process (sometimes; it depends on how the compiler optimizes things). This way is more memory efficient:
void getImageFromURL( const std::string &url, Image &result );
This way, you pass in the image object into which you want your image loaded. No copies are made. You can also change the 'void' return value into some kind of error/status code, if you aren't using exceptions.
If you're worried about what to do, code for both returning the data in an array and writing the data to a file ... and pass the responsibility to choose to the caller. Make your function something like:
/* one of dst and outfile should be NULL */
/* if dst is not NULL, dstlen specifies the size of the array */
/* if outfile is not NULL, data is written to that file */
/* the return value indicates success (0) or reason for failure */
int getdata(unsigned char *dst, size_t dstlen,
            const char *outfile,
            const char *resource);

Checking for file existence in C++

Currently I use something like:
#include <sys/stat.h>
#include <fstream>
#include <iostream>
#include "My_Class.h"

void My_Class::my_function(void)
{
    std::ofstream my_file;
    struct stat file_info;

    if ( filename_str.compare("") != 0 &&
         stat(filename_str.c_str(), &file_info) == 0 )
    {
        my_file.open(filename_str.data(), std::ios::trunc);
        // do stuff
        my_file.close();
    }
    else if ( filename_str.compare("") == 0 )
    {
        std::cout << "ERROR! ... output filename not assigned!" << std::endl;
    }
    else
    {
        std::cout << "ERROR! File :" << std::endl
                  << filename_str << std::endl
                  << "does not exist!!" << std::endl;
    }
}
...is this a decent way to go, or is there a better alternative? It seems like I could run afoul of permissions if I don't have permission to read the file.
This is NOT a homework question; it is a question about best practice.
I'd use the boost::filesystem constructs. Not only are they cross platform, they're part of the next standard library.
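For instance, a minimal existence check with Boost.Filesystem (the file name is just an example):

#include <boost/filesystem.hpp>
#include <iostream>

int main()
{
    boost::filesystem::path p("output.txt");   // hypothetical file name
    if (boost::filesystem::exists(p))
        std::cout << p << " exists\n";
    else
        std::cout << p << " does not exist\n";
}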
Generally I think it is best to just try opening it and catch an error.
IMO, checking permissions is unwise because what if it's a Linux box and you check its attributes, decide you can't write to it, but the filesystem supports ACL's and they do grant you permission? (As a sysadmin I can't stand when apps do this. I like ACL's and if you're an app, don't tell me you can't write to a file unless you've tried first.)
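A sketch of the "just try it" approach, checking the stream state after the open instead of stat-ing the file first:

#include <fstream>
#include <iostream>
#include <string>

void write_report(const std::string& filename)
{
    std::ofstream out(filename.c_str(), std::ios::trunc);
    if (!out)   // open failed: bad path, permissions, ACLs, ...
    {
        std::cerr << "ERROR! Could not open " << filename << std::endl;
        return;
    }
    out << "data" << std::endl;   // do stuff
}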
Conceptually, I'd say it depends on what you're planning to do with that file.
If you need its contents, go ahead and try to open it, and be prepared to handle failure gracefully, for the reasons Ken detailed.
If you are not currently interested in its contents (for example, when enumerating directory contents, or only planning to access a file at some point in the future, etc.), you might be better off just checking attributes for now. Otherwise, nasty things like hierarchical storage management may trigger an expensive (=slow) recall of file contents from, say, a tape backup or network (whereas attributes may have been cached). You could try to avoid that by checking for respective file attributes, but that's additional complexity, too.
So as a best practice, I'd suggest opening files sparingly (i.e., if you're not immediately interested in the contents, content yourself with file attribute-based information), AND handle failure strictly in response to the actual call that opens the file when you need it.