SharePoint document library storing files on the filesystem - sharepoint-2013

I'm in a bit of trouble here. Here is the context:
One of our customers asked us to develop an alternative to storing the documents of a document library in the content database, because their content database is growing too fast. They provided us with network storage so that the documents can be stored on the filesystem instead. After googling a bit, I found a feature called Remote BLOB Storage (RBS), but as the references say, this is a per-content-database feature, which is not acceptable in this context.

The other option I've come up with is an SPItemEventReceiver: in the ItemAdded event I save the SPFile associated with the ListItem from SPItemEventProperties to the filesystem, and then delete or truncate the SPFile so that its content is no longer stored in the content database. This is the helper that does the truncation:
public static void DeleteAssociatedFile(SPWeb web, SPListItem item)
{
    try
    {
        if (item == null) { throw new ArgumentNullException("item"); }

        if (item.FileSystemObjectType == SPFileSystemObjectType.File)
        {
            web.AllowUnsafeUpdates = true;

            // Truncate the file's binary content so it no longer takes up space in the content database.
            using (var fileStream = item.File.OpenBinaryStream())
            {
                if (fileStream.CanWrite)
                {
                    fileStream.SetLength(0);
                }
            }
            item.File.Update();
        }
    }
    catch (Exception ex)
    {
        // log error message
        Logger.Unexpected("ListItemHelper.DeleteAssociatedFile", ex.Message);
        throw;
    }
    finally
    {
        web.AllowUnsafeUpdates = false;
    }
}
But it didn't work out. Every time I manage to delete or truncate the SPFile associated with the ListItem, either the ListItem itself gets deleted from the document library or the file isn't affected by the change at all. So my question is: is there a solution to this problem? Any other thoughts that could help me in this quest?
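For reference, the receiver wiring I have in mind looks roughly like the sketch below (simplified; the class name and the share path are placeholders):

using System.IO;
using Microsoft.SharePoint;

public class ItemAddedSaveToShare : SPItemEventReceiver
{
    // Placeholder UNC path; the real location would come from configuration.
    private const string SharePath = @"\\fileserver\SPDocuments";

    public override void ItemAdded(SPItemEventProperties properties)
    {
        base.ItemAdded(properties);

        SPListItem item = properties.ListItem;
        if (item == null || item.FileSystemObjectType != SPFileSystemObjectType.File)
        {
            return;
        }

        // Copy the uploaded binary to the network share...
        string target = Path.Combine(SharePath, item.File.Name);
        File.WriteAllBytes(target, item.File.OpenBinary());

        // ...then try to drop the copy kept in the content database.
        ListItemHelper.DeleteAssociatedFile(properties.Web, item);
    }
}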
Thanks in advance!

Since you have asked for other thoughts:
One thing that comes to mind is OneDrive for Business instead of the network storage.
Another is to develop a custom file upload: upload the file directly to the network storage and, once it is uploaded, add an entry to a SharePoint list (see the sketch below).
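A minimal sketch of that second idea, assuming a plain custom list with a text column that holds the file location (the list name "Documents on share" and the column "FileLocation" are placeholders):

using System.IO;
using Microsoft.SharePoint;

public static class ExternalDocumentStore
{
    public static void UploadToShareAndRegister(SPWeb web, string sharePath, string fileName, Stream content)
    {
        // 1. Write the binary directly to the network storage, bypassing the content database.
        string target = Path.Combine(sharePath, fileName);
        using (var fs = new FileStream(target, FileMode.Create, FileAccess.Write))
        {
            content.CopyTo(fs);
        }

        // 2. Register the document in a regular SharePoint list so users can still find it.
        SPList list = web.Lists["Documents on share"];
        SPListItem entry = list.Items.Add();
        entry["Title"] = fileName;
        entry["FileLocation"] = target;
        entry.Update();
    }
}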

Related

Joining a stream against a "table" in Dataflow

Let me use a slightly contrived example to explain what I'm trying to do. Imagine I have a stream of trades coming in, with the stock symbol, share count, and price: { symbol = "GOOG", count = 30, price = 200 }. I want to enrich these events with the name of the stock, in this case "Google".
For this purpose I want to, inside Dataflow, maintain a "table" of symbol->name mappings that is updated by a PCollection<KV<String, String>>, and join my stream of trades with this table, yielding e.g. a PCollection<KV<Trade, String>>.
This seems like a thoroughly fundamental use case for stream processing applications, yet I'm having a hard time figuring out how to accomplish this in Dataflow. I know it's possible in Kafka Streams.
Note that I do not want to use an external database for the lookups – I need to solve this problem inside Dataflow or switch to Kafka Streams.
I'm going to describe two options: one using side inputs, which should work with the current version of Dataflow (1.X), and one using state within a DoFn, which should be part of the upcoming Dataflow (2.X).
Solution for Dataflow 1.X, using side inputs
The general idea here is to use a map-valued side-input to make the symbol->name mapping available to all the workers.
This table will need to be in the global window (so nothing ever ages out), will need to be triggered for every element (or as often as you want new updates to be produced), and will need to accumulate elements across all firings. It will also need some logic to take the latest name for each symbol.
The downside to this solution is that the entire lookup table is regenerated every time a new entry comes in, and it is not immediately pushed to all workers. Rather, each worker will get the new mapping "at some point" in the future.
At a high level, this pipeline might look something like this (I haven't tested this code, so there may be some typos):
PCollection<KV<Symbol, Name>> symbolToNameInput = ...;

final PCollectionView<Map<Symbol, Iterable<Name>>> symbolToNames = symbolToNameInput
    .apply(Window.<KV<Symbol, Name>>into(new GlobalWindows())
        .triggering(Repeatedly.forever(AfterProcessingTime
            .pastFirstElementInPane()
            .plusDelayOf(Duration.standardMinutes(5))))
        .accumulatingFiredPanes())
    .apply(View.asMultimap());
Note that we had to use View.asMultimap here. This means that we actually build up all the names for every symbol. When we look things up we'll need to make sure to take the latest name in the iterable.
PCollection<Detail> symbolDetails = ...;

symbolDetails
    .apply(ParDo.withSideInputs(symbolToNames).of(new DoFn<Detail, AugmentedDetails>() {
        @Override
        public void processElement(ProcessContext c) {
            Iterable<Name> names = c.sideInput(symbolToNames).get(c.element().symbol());
            Name name = chooseName(names);
            c.output(augmentDetails(c.element(), name));
        }
    }));
Solution for Dataflow 2.X, using the State API
This solution uses a new feature that will be part of the upcoming Dataflow 2.0 release. It is not yet part of the preview releases (currently Dataflow 2.0-beta1) but you can watch the release notes to see when it is available.
The general idea is that keyed state allows us to store values associated with a specific key. In this case, we're going to remember the latest "name" value we've seen.
Before running the stateful DoFn we're going to wrap each element into a common element type (a NameOrDetails) object. This would look something like the following:
// Convert SymbolToName entries to KV<Symbol, NameOrDetails>
PCollection<KV<Symbol, NameOrDetails>> left = symbolToName
    .apply(ParDo.of(new DoFn<SymbolToName, KV<Symbol, NameOrDetails>>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            SymbolToName e = c.element();
            c.output(KV.of(e.getSymbol(), NameOrDetails.name(e.getName())));
        }
    }));
// Convert detailed entries to KV<Symbol, NameOrDetails>
PCollection<KV<Symbol, NameOrDetails>> right = details
    .apply(ParDo.of(new DoFn<Details, KV<Symbol, NameOrDetails>>() {
        @ProcessElement
        public void processElement(ProcessContext c) {
            Details e = c.element();
            c.output(KV.of(e.getSymbol(), NameOrDetails.details(e)));
        }
    }));
// Flatten the two streams together
PCollectionList.of(left).and(right)
    .apply(Flatten.pCollections())
    .apply(ParDo.of(new DoFn<KV<Symbol, NameOrDetails>, AugmentedDetails>() {
        @StateId("name")
        private final StateSpec<ValueState<String>> nameSpec =
            StateSpecs.value(StringUtf8Coder.of());

        @ProcessElement
        public void processElement(ProcessContext c,
                @StateId("name") ValueState<String> nameState) {
            NameOrDetails e = c.element().getValue();
            if (e.isName()) {
                nameState.write(e.getName());
            } else {
                String name = nameState.read();
                if (name == null) {
                    // Use the symbol if we haven't received a mapping yet.
                    name = c.element().getKey();
                }
                c.output(e.getDetails().withName(name));
            }
        }
    }));

Amazon Web Services batch file upload using a specific key

I would like to ask if there is any way to set a key for each uploaded file using the TransferManager (or any other class)? I am currently using the method uploadFileList for this and I noticed that I can define a callback for each file sent using the ObjectMetadataProvider interface, but I only have the ObjectMetadata at my disposal. I thought it would be possible to get the parent ObjectRequest and set the key value in there, but that does not seem to be possible.
What I am trying to achieve:
MultipleFileUpload fileUpload = tm.uploadFileList(bucketName, "", new File(directory), files, new ObjectMetadataProvider() {
    @Override
    public void provideObjectMetadata(File file, ObjectMetadata objectMetadata) {
        // This is what I would like to do, but ObjectMetadata exposes no such method.
        objectMetadata.getObjectRequest().setKey(myOwnKey);
    }
});
I am most likely missing something obvious, but I spent some time looking for the answer and cannot find it anywhere. My problem is that if I supply some files for this method, it takes their absolute path (or something like that) as a key name and that is not acceptable for me. Any help is appreciated.
I almost forgot about this post.
There was no elegant solution, so I had to resort to making my own transfer manager (MultiUpload) and checking the list of uploads manually.
I can then set the key for each object upon creating the Upload object.
List<Upload> uploads = new ArrayList<>();
MultiUpload mu = new MultiUpload(uploads);
for (File f : files) {
    // Only files can be uploaded, so skip anything else.
    if (f.isFile()) {
        String key = ((!directory.isEmpty() && !directory.equals("/")) ? directory + "/" : "") + f.getName();
        ObjectMetadata metadata = new ObjectMetadata();
        uploads.add(tm.upload(
                new PutObjectRequest(bucketName, key, f)
                        .withMetadata(metadata)));
    }
}

Users receiving multiple notifications (timeline cards) on Glass

I have an app that sends out notifications to users (timeline cards) and some of the users are reporting that they are receiving the same timeline card multiple times (up to 5 times in one instance). Has anyone encountered this? My app is utilizing the Mirror API.
I've reviewed my log files and only see the timeline card produced once. I'm at a loss. I'll provide any code or logs that are needed. My app is written in Python.
Thanks!
This shouldn't be happening. If you're seeing it persist, file a bug in the official issue tracker.
If you do file a bug, there's one thing that might help Google find the root cause. Do a timeline.list for a user who reports the multiple notifications. Does the API show multiple cards? If so, include the JSON representation of them (including the IDs).
The specific code to do this list depends on the language you're developing in. Here's an example of how to do it in Java:
public static List<TimelineItem> retrieveAllTimelineItems(Mirror service) {
    List<TimelineItem> result = new ArrayList<TimelineItem>();
    try {
        Mirror.Timeline.List request = service.timeline().list();
        do {
            TimelineListResponse timelineItems = request.execute();
            if (timelineItems.getItems() != null && timelineItems.getItems().size() > 0) {
                result.addAll(timelineItems.getItems());
                request.setPageToken(timelineItems.getNextPageToken());
            } else {
                break;
            }
        } while (request.getPageToken() != null && request.getPageToken().length() > 0);
    } catch (IOException e) {
        System.err.println("An error occurred: " + e);
        return null;
    }
    return result;
}

SharePoint 2013 query very slow

We set up a new SharePoint 2013 server to test how it would work as document storage.
The problem is that it is very slow and I don't know why.
I adapted this from MSDN:
ClientContext _ctx;

private void btnConnect_Click(object sender, RoutedEventArgs e)
{
    try
    {
        _ctx = new ClientContext("http://testSP1");
        Web web = _ctx.Web;

        Stopwatch w = new Stopwatch();
        w.Start();

        List list = _ctx.Web.Lists.GetByTitle("Test");
        Debug.WriteLine(w.ElapsedMilliseconds); // 24 first time, 0 second time
        w.Restart();

        CamlQuery q = CamlQuery.CreateAllItemsQuery(10);
        ListItemCollection items = list.GetItems(q);
        _ctx.Load(items);
        _ctx.ExecuteQuery();
        Debug.WriteLine(w.ElapsedMilliseconds); // 1800 first time, 900 second time
    }
    catch (Exception)
    {
        throw;
    }
}
There aren't many documents in the Test list: just 3 folders and 1 Word file.
Any suggestions/ideas why it is this slow?
Storing unstructured content (Word docs, PDFs, anything except metadata) in SharePoint's SQL content database is going to result in slower upload and retrieval than if the files are stored on the file system. That's why Microsoft created the Remote BLOB (Binary Large Object) Storage interface to enable files to be managed in SharePoint but live on the file system or in the cloud. The bigger the files, the greater the performance hit.
There are several third-party solutions that leverage this interface, including my company's offering, Metalogix StoragePoint. You can reach out to me at trossi@metalogix.com if you would like to learn more, or visit http://www.metalogix.com/Products/StoragePoint/StoragePoint-BLOB-Offloading.aspx

NHibernate Load vs. Get behavior for testing

In simple tests I can assert whether an object has been persisted by checking whether its Id is no longer at its default value. But when I delete an object and want to check that the object, and perhaps its children, are really no longer in the database, the object Ids will still be at their saved values.
So I need to go to the db, and I would like a helper assertion to make the tests more readable, which is where the question comes in. I like the idea of using Load to save the db call, but I'm wondering if the ensuing exceptions can corrupt the session.
Below is how the two assertions would look, I think. Which would you use?
Cheers,
Berryl
Get
public static void AssertIsTransient<T>(this T instance, ISession session)
    where T : Entity
{
    if (instance.IsTransient()) return;

    var found = session.Get<T>(instance.Id);
    if (found != null)
        Assert.Fail(string.Format("{0} has persistent id '{1}'", instance, instance.Id));
}
Load
public static void AssertIsTransient<T>(this T instance, ISession session)
    where T : Entity
{
    if (instance.IsTransient()) return;

    try
    {
        var found = session.Load<T>(instance.Id);
        if (found != null)
            Assert.Fail(string.Format("{0} has persistent id '{1}'", instance, instance.Id));
    }
    catch (GenericADOException)
    {
        // nothing
    }
    catch (ObjectNotFoundException)
    {
        // nothing
    }
}
edit
In either case I would be doing the fetch (Get or Load) in a new session, free of state from the session that did the save or delete.
I am trying to test cascade behavior, NOT to test NHibernate's ability to delete things, but maybe I am overthinking this one, or there is a simpler way I haven't thought of.
Your code in the 'Load' section will always hit Assert.Fail but never throw an exception, because Load<T> returns a proxy (with the Id property set, or populated from the first-level cache) without hitting the DB. That is, ISession.Load will only fail if you access a property other than the Id property on a deleted entity.
As for your 'Get' section: I might be mistaken, but I think that if you delete an entity in a session and later use .Get in the same session, you will get the one from the first-level cache, and again it will not return null.
See this post for the full explanation about .Load and .Get.
If you really need to see if it is in your DB, use an IStatelessSession, or launch a child ISession (which will have an empty first-level cache); see the sketch below.
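For example, a stateless-session variant of the assertion might look like this (a sketch along the lines of your Get version; it assumes the same Entity base class and that an ISessionFactory is available to the test):

public static void AssertIsTransientInDb<T>(this T instance, ISessionFactory sessionFactory)
    where T : Entity
{
    if (instance.IsTransient()) return;

    // A stateless session has no first-level cache, so a non-null result really comes from the DB.
    using (IStatelessSession check = sessionFactory.OpenStatelessSession())
    {
        var found = check.Get<T>(instance.Id);
        if (found != null)
            Assert.Fail(string.Format("{0} still exists in the database with id '{1}'", instance, instance.Id));
    }
}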
EDIT: I thought of a bigger problem: your entity will only actually be deleted when the session is flushed (which by default happens when the transaction is committed), so unless you manually flush your session (not recommended), you will still have it in your DB.
Hope this helps.