hdfs: copying a folder to a network file system ends up in an error

One of my clients routinely saves the contents of some HDFS folders to an NFS volume. Sometimes during the copy I get the following error:
copying /folder/foo
copyToLocal: Input/output error
I thought that some of the files inside the foo folder might be corrupt, but a check with hdfs fsck didn't report anything strange. The foo folder contains a huge number of files, so manually hunting for something odd isn't feasible. I tried enabling debug mode for the copyToLocal command, but it gives no clue about any kind of error. How can I debug this issue?
I have a gut feeling that there are networking issues on the NFS side, but I don't know how to debug that kind of problem either.
The HDFS folders are also very large; I don't know whether that could cause additional problems.
We are in a Kerberos environment and we run the commands after a kinit as the hdfs superuser.
p.s. SO probably isn't the right place to ask this question; feel free to redirect me to the right website :)
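One thing I'm considering, to narrow it down, is to take HDFS out of the picture and hammer the NFS mount directly with large sequential writes, to see whether the Input/output error reproduces at the NFS layer on its own. A rough Python sketch of that idea (the mount point and file sizes below are just placeholders, not our real values):

import os

NFS_DIR = "/mnt/nfs/copy_test"      # hypothetical NFS mount point
CHUNK = b"\0" * (1024 * 1024)       # write in 1 MiB chunks
FILE_SIZE_MB = 2048                 # roughly the size of one large HDFS file
NUM_FILES = 5

os.makedirs(NFS_DIR, exist_ok=True)
for i in range(NUM_FILES):
    path = os.path.join(NFS_DIR, "testfile_%d.bin" % i)
    try:
        with open(path, "wb") as f:
            for _ in range(FILE_SIZE_MB):
                f.write(CHUNK)
            f.flush()
            os.fsync(f.fileno())    # force the data out to the NFS server
        print("wrote %s without errors" % path)
    except OSError as e:            # an EIO here points at NFS/network, not HDFS
        print("I/O failure on %s: %s" % (path, e))
        break

If this fails the same way, the problem is on the NFS/network side rather than in HDFS or the copyToLocal command.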

Related

AWS CodeDeploy - deploy using the incorrect revision files

I've been banging my head against the wall trying to get an unbelievably simple CodeDeploy run going. The behavior I'm seeing suggests either a configuration issue or an issue local to the running agent. Basically, deployments are not using the files explicitly supplied to them; they're stuck implicitly using a prior revision.
Having created an application and deployment group (and ensured all prerequisites are in place, such as the agent and roles being correctly assigned), I create a deployment by zipping up my code folder (from the root, not including the code's containing folder). There were a few issues to fix in a few of the hook steps, and I was able to fix a couple of them by changing the code, re-zipping, and re-uploading before things got particularly weird. There was a syntax error in my ApplicationStart hook script (when I finally got that far), so I fixed it and re-uploaded as before. However, the same syntax error occurred. I tried re-uploading, deleting all my S3 files and re-uploading, and downloading the listed revision files and checking their contents (my changes were reflected), but the same syntax error kept occurring. I even deleted the script and the hook step entirely from the yml file, and it still happened, so the deployment system is clearly "stuck" in some sense. I went as far as feeding it a completely empty text file, telling it it was a tar file, and it's still running my old revision. It's as though the agent's local files are stale and it's failing to clear its local contents.
What's the deal? I feel like I've missed something fundamental.
Edit: I created an entirely identical but new deployment group and re-tried the deployment with my new files, and it worked. So that deployment group itself is stuck.
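For anyone who ends up in the same place, the workaround amounts to creating a fresh deployment group and pointing a new deployment at the same S3 bundle, rather than reusing the stuck group. A rough boto3 sketch of that, where the application name, role ARN, tag filter, bucket, and key are all placeholders rather than my real values:

import boto3

codedeploy = boto3.client("codedeploy")

APP = "my-app"                      # placeholder names throughout
NEW_GROUP = "my-app-group-v2"       # fresh group, since the old one is "stuck"
ROLE_ARN = "arn:aws:iam::123456789012:role/CodeDeployServiceRole"

# Recreate the deployment group with the same targeting as before.
codedeploy.create_deployment_group(
    applicationName=APP,
    deploymentGroupName=NEW_GROUP,
    serviceRoleArn=ROLE_ARN,
    ec2TagFilters=[{"Key": "Name", "Value": "my-app-host", "Type": "KEY_AND_VALUE"}],
)

# Point a new deployment at the zipped revision in S3.
codedeploy.create_deployment(
    applicationName=APP,
    deploymentGroupName=NEW_GROUP,
    revision={
        "revisionType": "S3",
        "s3Location": {"bucket": "my-deploy-bucket", "key": "my-app.zip", "bundleType": "zip"},
    },
)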

Detect whether a folder is a cloud-synced folder

Is there a way to find, all at once, every cloud-synced folder on a computer without enumerating all the possibilities?
What I need is this: I give it a path and it tells me whether it's a cloud-synced folder.
Example: I use Google Drive, OneDrive, Dropbox, and others. I would like to know whether a path belongs to one of them without having to enumerate each service's particulars, like:
OneDrive can be checked in the registry and has "Drive" in its path;
Google Drive has a .ini file in its root folder and has "Drive" in its path;
Dropbox has a file in its root folder (I think);
I can't find a magic solution that checks every service that exists.
Is there a secret tag or hidden bit of info on a folder saying "I'm related to a cloud service"?
I couldn't find anything about it :(
I've checked GetDriveTypeW, but OneDrive and Google Drive are detected as fixed drives (which is normal), and FILE_REMOTE_PROTOCOL_INFO, but no sign of an answer there either.
Please help !
Thanks.
I think you're out of luck this time. I don't believe there is anything fundamental in the Windows operating system or filesystem that tells you if an application installed on the system is mirroring or otherwise syncing files and folders. There is no magic bullet, to the best of my knowledge.
You'll have to tackle these on an application-by-application basis, using awareness of each type of service and which folders it is syncing, as you are currently doing.
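To illustrate that application-by-application approach, here is a rough Python sketch. The markers it relies on (OneDrive's environment variable, Dropbox's info.json) are how those clients happen to behave, not a Windows-level contract, so treat every check as a per-service heuristic you have to maintain yourself:

import json
import os

def _is_under(path, root):
    # True if the already-normalized `path` sits inside `root`.
    root = os.path.normcase(os.path.abspath(root))
    return path == root or path.startswith(root + os.sep)

def _dropbox_roots():
    # Dropbox records its sync folder(s) in an info.json file under
    # %APPDATA% or %LOCALAPPDATA%; the layout may vary between versions.
    roots = []
    for base in (os.getenv("APPDATA"), os.getenv("LOCALAPPDATA")):
        info = os.path.join(base, "Dropbox", "info.json") if base else None
        if info and os.path.isfile(info):
            with open(info, encoding="utf-8") as f:
                roots += [acct["path"] for acct in json.load(f).values()
                          if isinstance(acct, dict) and "path" in acct]
    return roots

def cloud_service_for(path):
    # Returns the name of the sync service that owns `path`, or None.
    # Every check is service-specific; Windows itself exposes no generic flag.
    p = os.path.normcase(os.path.abspath(path))

    # OneDrive: the client exports its sync root as an environment variable.
    onedrive = os.getenv("OneDrive")
    if onedrive and _is_under(p, onedrive):
        return "OneDrive"

    # Dropbox: compare against the roots listed in info.json.
    for root in _dropbox_roots():
        if _is_under(p, root):
            return "Dropbox"

    # Google Drive and the rest: add similar per-service checks here
    # (e.g. walk up from `path` looking for the marker file the question
    # mentions) - there is no generic detection to fall back on.
    return None

print(cloud_service_for(r"C:\Users\me\OneDrive\Documents"))   # hypothetical path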

ColdFusion 9 cffile error Access is Denied

I am getting the following error:
The cause of this exception was:
java.io.FileNotFoundException:
//server/c$/folder1/folder2/folder3/folder4/folder5/login.cfm
(Access is denied).
When doing this:
<cffile action="copy"
destination="#copyto#\#apfold#\#applic#\#files#"
source="#path#\#apfold#\#applic#\#files#">
If I try to write to C:\folder1\folder2\folder3\folder4\folder5\login.cfm, it works fine. The problem with doing it this way is that this is a script for developers to manually sync files to their application folder. We have multiple servers for each instance, and BigIP picks one at random. So just writing to the C:\ drive would only copy the file to the server the developer is currently accessing. If the developer closes the browser and goes right back in to make sure their changes worked, and they happen to get sent to a different server, they won't see their change.
Since it works with writing to C:\, I know the permissions are correct. I've also copied the path out of the error message and put it in the address bar on the server and it got to the folder/file fine. What else could be stopping it from being able to access that server?
It seems that you want to access a file via UNC notation on a network folder (even if it incidentally refers to a directory on the local C:\ drive). To be able to do this, you have to change the user the ColdFusion 9 Application Server service runs as. By default, this service runs as the "Local System Account" user, which you need to change to an actual user. Have a look at the following link to find out how to do this: http://mlowell.hubpages.com/hub/Coldfusion-Programming-Accessing-a-shared-network-drive
Note that you might have to add a user with the same name as the one used for the CF 9 service to all of the file servers.
If you don't want to enable FTP on your servers, another option would be to use RoboCopy to keep the servers in sync. I have had very good luck using this tool. You will need access to the cfexecute ColdFusion tag, and you will need to create share(s) on your servers.
RoboCopy is an executable that comes with Windows; Microsoft's documentation covers its options in detail. It has some very powerful features and can be set to "mirror" the contents of directories from one server to the other. In this mode it will keep the folders identical (new files added, removed files deleted, updated files copied, etc.). This is how I have used it.
Basically, you will create a share on your destination servers and give access to a specific user (can be local or domain). On your source server you will run some ColdFusion code that:
Logically maps a drive to the destination server
Runs the RoboCopy utility to copy files to the destination server
Then disconnects the mapped drive
The ColdFusion service on your source server will need access to C:\WINDOWS\system32\net.exe and C:\WINDOWS\system32\robocopy.exe. If you are using ColdFusion sandbox security you will need to add entries for these executables (on the source server only). Here are some basic code examples.
First, map to the destination server:
<cfexecute name="C:\WINDOWS\system32\net.exe"
arguments="use {share_name} {password} /user:{username}"
variable="shareLog"
timeout="30">
</cfexecute>
The {share_name} here would be something like \\server\c$. {username} and {password} should be obvious. You can specify username as \\server\username. NOTE I would suggest using a share that you create rather than the administrative share c$ but that is what you had in your example.
Next, copy the files from the source server to the destination server:
<cfexecute name="C:\WINDOWS\system32\robocopy.exe"
arguments="{source_folder} {destination_folder} [files_to_copy] [options]"
variable="robocopyLog"
timeout="60">
</cfexecute>
The {source_folder} here would be something like C:\folder1\folder2\folder3\folder4\folder5\ and the {destination_folder} would be \\server\c$\folder1\folder2\folder3\folder4\folder5\. You must begin this argument with the {share_name} from the step above, followed by the desired directory path. The [files_to_copy] is a list of files or a wildcard (*.*), and the [options] are RoboCopy's options; see its documentation for the full list, which is extensive. To mirror a folder structure, see the /E and /PURGE options. I also typically include the /NDL and /NP options to limit the output generated, /XA:SH to exclude system and hidden files, and /XO to not bother copying older files. You can exclude other files/directories specifically or by using wildcards.
Then, disconnect the mapped drive:
<cfexecute name="C:\WINDOWS\system32\net.exe"
arguments="use {share_name} /d"
variable="shareLog"
timeout="30">
</cfexecute>
Works like a charm. If you go this route and have not used RoboCopy before I would highly recommend playing around with the options/functionality using the command line first. Then once you get it working to your liking just paste those options into the code above.
I ran into a similar issue with this and it had me scratching my head as well. We are using Active Directory along with a UNC path to \\SERVER\SHARE\webroot. The application was working fine with the exception of using CFFILE to create a directory. We were running our CF service as a domain account, and permissions were granted on the webroot folder (residing on the UNC server). This same domain account was also being used to connect to the UNC path within IIS. I even went so far as to grant Full Control on the webroot folder but still had no luck.
Ultimately, what I found was causing the problem was that the Inetpub folder (the parent folder of our webroot) had sharing turned on, but that sharing did not include 'Read/Write' access for our CF service's domain account.
So while we had sharing on Inetpub and more powerful user permissions on the Inetpub\webroot folder, the share permissions (or lack thereof) took precedence over the more granular webroot security permissions.
Hope this helps someone else.

How to delete files via FTP when directory has over 100,000 files?

I went to upload a new file to my web server only to get a message in return saying that my disk quota was full... I wasn't using up my allotted space but rather my allotted FILE QUANTITY. My host caps my total number of files at about 260,000.
Checking through my folders I believe I found the culprit...
I have a small DVD database application (Video dB By split Brain) that I have installed and hidden away on my web site for my own personal use. It apparently caches data from IMDB, and over the years has secretly amassed what is probably close to a MIRROR of IMDB at this point. I don't know for certain but I did have a 2nd (inactive) copy of the program on the host that I created a few years back that I was using for testing when I was modifying portions of it. The cache folder in this inactive copy had 40,000 files totalling 2.3GB in size. I was able to delete this folder over FTP but it took over an hour. Thankfully it also gave me some much needed breathing room.
...But now, as you can imagine, the cache folder for the active copy of this web app likely has closer to 150,000 files totalling about 7GB of data.
This is where my problem comes in... I use Flash FXP as my FTP client, and whenever I try to delete the cache folder, or even just view its contents, it will sit and try to load a file list for a good 5 minutes and then lose the connection to the server...
My host has a web-based file browser, and it crashes when trying to do this, as do free online FTP clients like net2ftp.com. I don't have SSH access on this server, so I can't log in directly to delete the files either.
Anyone have any idea how I can delete these files? Is there a different FTP program I can download that would have better success... or perhaps a small script I could run that would be able to take care of it?
Any help would be greatly appreciated.
Anyone have any idea how I can delete these files?
Submit a support request asking for them to delete it for you?
It sounds like it might be time for a command line FTP utility. One ships with just about every operating system. With that many files, I would write a script for my command-line FTP client that goes to the folder in question and performs a directory listing, redirecting the output to a file. Then, use magic (or perl or whatever) to process that file into a new FTP script that runs a delete command against all of the files. Yes, it will take a long time to run.
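If you'd rather not juggle an intermediate listing file, a small script in any language with an FTP library can do the list-then-delete loop in one go. Here is a minimal Python sketch; the host, credentials, and cache path are placeholders, the initial listing of ~150,000 names may itself take a while, and files are deleted one at a time, so let it run:

from ftplib import FTP, error_perm

HOST = "ftp.example.com"            # placeholders - use your own details
USER = "username"
PASSWORD = "password"
CACHE_DIR = "/public_html/videodb/cache"

ftp = FTP(HOST, timeout=120)
ftp.login(USER, PASSWORD)
ftp.cwd(CACHE_DIR)

deleted = 0
for name in ftp.nlst():             # plain name listing, cheaper than a full LIST
    if name in (".", ".."):
        continue
    try:
        ftp.delete(name)            # fails on subdirectories, which are skipped
        deleted += 1
        if deleted % 1000 == 0:
            print("%d files deleted so far" % deleted)
    except error_perm as e:
        print("skipping %s: %s" % (name, e))

ftp.quit()
print("done, %d files deleted" % deleted)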
If the server supports wildcards, do that instead and just delete *.*
If that all seems like too much work, open a support ticket with your hosting provider and ask them to clean it up on the server directly.
Having said all that, this isn't really a programming question and should probably be closed.
We had a question a while back where I ran an experiment to show that Firefox can browse a directory with 10,000 files no problem, via FTP. Presumably 150,000 will also be ok. Firefox won't help you delete, but it might be helpful in capturing the names of the files you need to delete.
But first I would just try the command-line client ncftp. It is well engineered and I have had good luck with it in the past. You can delete a large number of files at once using shell patterns. And it is available for Windows, MacOS, Linux, and many other platforms.
If that doesn't work: you sound like a long-term customer, so could you beg your ISP for the privilege of a shell account for a week, so you can log in remotely with PuTTY or ssh and blow away the entire directory with a single rm -r command?
If your ISP provides ssh access, you can use one rm command to remove the files.
If there is no command-line access, you can try a more capable FTP client such as CrossFTP. It works on Windows, Mac, and Linux. When you select the huge number of files on your server for deletion, it can queue the delete operations so that you don't need to reload the folder again. When you restart CrossFTP, the queue can also be restored and continued.

Volume Shadow Copy (VSS)

Can anyone clarify an issue? I'm using the VSS API (C++ using VSS2008 and the latest SDK running on XP SP3) in a home-brew backup utility*.
The VSS snapshot operations work fine for folders that have no subfolders, i.e. my email and SQL Server volumes. However, when I take a snapshot of a folder that does contain subfolders, the nested structure is 'flattened' in the snapshot: all sub-directories cease to exist.
So here is the question: I am aware that support for VSS on XP is a bit limited but is there a way to specify a snapshot be non-recursive? The docs are not very helpful ...
* I got really tired of buggy rubbish that costs boatloads and fails every few days, so I thought I'd roll my own. It'll get onto CodeProject at some point. If anyone is interested, let me know and you can have a (source) copy when it's ready ...
Thx++
Jerry
Your question is confusing...
VSS does not work at a "folder" level. It works at a "volume" level.
You "snap" a volume and you will have a device path which you can "open" using the filesystem api (which will automatically mount the device volume with a filesystem) on a file by file or you can access the device directly (sector by sector).
It should be easy to back up all the files on the snapped device volume (don't forget all of the file streams and ACLs for NTFS files); your problem will be restoring them... VSS will not help you with the restore. The main problem will be restoring a system volume, where you will need another OS to boot into, like WinPE or DOS or something else. If you're not worried about system volumes, then restore can be easy.
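To make the file-by-file route concrete, here is a rough Python sketch that simply walks an already-created snapshot through its device object path (the path your C++ code gets back from the snapshot properties). The path below is hypothetical, and I'm assuming the device path can be opened directly with the ordinary file APIs; some tools create a directory link to it first.

import os

# Hypothetical device object path of an existing shadow copy; a real one
# comes from the snapshot properties returned by the VSS API.
SNAP = r"\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy1" + "\\"

total = 0
for root, dirs, files in os.walk(SNAP):
    for name in files:
        # Each entry can be opened and copied out like a normal file;
        # NTFS alternate streams and ACLs need extra handling.
        total += 1
print("%d files visible in the snapshot" % total)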
If you back up the data in terms of sectors, then you get the added benefit that if you write a volume device driver for it (to make it look like a volume or HD), Windows will auto-load a filesystem driver for it. This gives you a free Explorer application; this is what most sector-based backup applications do. It also gives them VM possibilities.
Even if you are doing simple file backups, it helps to understand filesystems (NTFS, FAT, etc.) so that you know what you can/should back up and restore. Do you know what an NTFS reparse point is? How are you going to deal with it if you hit one during your backup? Do you know how Windows actually boots, and which files you need to back up, restore, and "patch" to have a chance at booting? On a restore, how do you best lay out the NTFS volume so as not to affect NTFS performance on the restored volume? Are you going to support restoring system volumes to new hardware, and what does that require you to do just to have a chance of it working? The questions are endless.
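To make the reparse-point question concrete: a file-level backup has to notice reparse points (junctions, symlinks, volume mount points) so it doesn't follow them into loops or copy them as ordinary directories. A minimal Python sketch of that check, using the file attributes Windows exposes (the C:\data path is just a placeholder):

import os
import stat

def is_reparse_point(path):
    # Windows-only: st_file_attributes carries the raw FILE_ATTRIBUTE_* bits.
    attrs = os.lstat(path).st_file_attributes
    return bool(attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT)

# Example: prune reparse points while walking a tree to be backed up.
for root, dirs, files in os.walk(r"C:\data"):
    dirs[:] = [d for d in dirs
               if not is_reparse_point(os.path.join(root, d))]
    # ... back up `files` here ...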
System backup/restore is not easy, there are lots of edge cases (see some of the questions above) that you don't know about until you hit them.
Good luck with your project. I hope I haven't put you off too much; I'm just saying there is a lot of work involved in delivering a backup application, work that most people have no idea about.
Comment on the above - if a 'writer' is playing the VSS game then it will ensure that the file system is in a happy state as part of the VSS setup.
In the case of MS SQL Server, check that it is a VSS writer. If it is, then your snapshot of the DB files should be OK. If not, then it's in what is called a 'crash state'. So, for example, if you are using MySQL or some other non-MS, non-VSS-aware SQL database, your backup may or may not be coherent ('a good one'). In that case it may be better than nothing, but it may also still be useless. Using VSS MAY result in better integrity from which to make your backup, but if the files are open, they are open, and if the app does not play in the VSS pig-pen then you may or may not be hosed.