Read the file using FileAdapter in SOA - readfile

I have a requirement in my project where I need to read a file from a location, this file in the input location will receive entries regularly.
How can I get the latest entries all the time when I read this file without any duplicates and write it to DB without deleting the file in the input location.
Thanks in Advance!

Related

How to read and process millions of XML file and process only those XML files which are modified

I have attended one interview where the interviewer asked me that, in a directory millions of XML files are there . I need to read those files and process it in efficient way.
Constraint 1: Everyday I have to read only modified XML files only.
Constraint 2: you will not get to know the last modified date and time.
I told him that we can read all XML files to some data structure as it will be faster to read data structure than file reading every-time. And compare everyday by reading XML file contents, if any modified file content files found then process only those files.
But he told me, reading all file contents into memory increases the memory .
So what should be the best approach for the above ?Can someone help me ?

Read Partial Parquet file

I have a Parquet file and I don't want to read the whole file into memory. I want to read the metadata and then read the rest of the file on demand. That is, for example, I want to read the second page of the first column in the third-row group. How would I do that using Apache Parquet cpp library? I have the offset of the part that I want to read from the metadata and can read it directly from the disk. Is there any way to pass that buffer to Apache Parquet library to uncompress, decode and iterate through the values? How about the same thing for column chunk or row groups? Basically, I want to read the file partially and then pass it to the parquet APIs to process it as opposes to give the file handler to the API and let it go through the file. Is it possible?
Behind the scences this is what the Apache Parquet C++ library actually does. When you pass in a file handle, it will only read the parts it needs to. As it requires the file footer (the main metadata) to know where to find the segments of data, this will always be read. The data segments will only be read once you request them.
No need to write special code for this, the library already has it built-in. Thus, if you want to know in fine detail on how this is working, you only need to read the source of the library: https://github.com/apache/arrow/tree/master/cpp/src/parquet

Possible to store data into text file permanently in c++

all I have a one small program in c++.I need to store some text in the .txt file it stored and file also created but again i run with some other data the previous data is deleted in .txt file, help anyone how to solve that problem i ask doubt is it possible in c++ yes/no.someone help!
The behavior you are describing is called overwriting a file.
You have two choices, if the file already exists:
Append to already existing file. Read more in How to append text to a text file in C++?
Write the data to a different file.

clusters the file is occupying [duplicate]

I need to get any information about where the file is physically located on the NTFS disk. Absolute offset, cluster ID..anything.
I need to scan the disk twice, once to get allocated files and one more time I'll need to open partition directly in RAW mode and try to find the rest of data (from deleted files). I need a way to understand that the data I found is the same as the data I've already handled previously as file. As I'm scanning disk in raw mode, the offset of the data I found can be somehow converted to the offset of the file (having information about disk geometry). Is there any way to do this? Other solutions are accepted as well.
Now I'm playing with FSCTL_GET_NTFS_FILE_RECORD, but can't make it work at the moment and I'm not really sure it will help.
UPDATE
I found the following function
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364952(v=vs.85).aspx
It returns structure that contains nFileIndexHigh and nFileIndexLow variables.
Documentation says
The identifier that is stored in the nFileIndexHigh and nFileIndexLow members is called the file ID. Support for file IDs is file system-specific. File IDs are not guaranteed to be unique over time, because file systems are free to reuse them. In some cases, the file ID for a file can change over time.
I don't really understand what is this. I can't connect it to the physical location of file. Is it possible later to extract this file ID from MFT?
UPDATE
Found this:
This identifier and the volume serial number uniquely identify a file. This number can change when the system is restarted or when the file is opened.
This doesn't satisfy my requirements, because I'm going to open the file and the fact that ID might change doesn't make me happy.
Any ideas?
Use the Defragmentation IOCTLs. For example, FSCTL_GET_RETRIEVAL_POINTERS will tell you the extents which contain file data.

Chunk download with OneDrive Rest API

this is the first time I write on StackOverflow. My question is the following.
I am trying to write a OneDrive C++ API based on the cpprest sdk CasaBlanca project:
https://casablanca.codeplex.com/
In particular, I am currently implementing read operations on OneDrive files.
Actually, I have been able to download a whole file with the following code:
http_client api(U("https://apis.live.net/v5.0/"), m_http_config);
api.request(methods::GET, file_id +L"/content" ).then([=](http_response response){
return response.body();
}).then([=]( istream is){
streambuf<uint8_t> rwbuf = file_buffer<uint8_t>::open(L"test.txt").get();
is.read_to_end(rwbuf).get();
rwbuf.close();
}).wait()
This code is basically downloading the whole file on the computer (file_id is the id of the file I am trying to download). Of course, I can extract an inputstream from the file and using it to read the file.
However, this could give me issues if the file is big. What I had in mind was to download a part of the file while the caller was reading it (and caching that part if he came back).
Then, my question would be:
Is it possible, using the OneDrive REST + cpprest downloading a part of a file stored on OneDrive. I have found that uploading files in chunks seems apparently not possible (Chunked upload (resumable upload) for OneDrive?). Is this true also for the download?
Thank you in advance for your time.
Best regards,
Giuseppe
OneDrive supports byte range reads. And so you should be able to request chunks of whatever size you want by adding a Range header.
For example,
GET /v5.0/<fileid>/content
Range: bytes=0-1023
This will fetch the first KB of the file.