We are going to develop a correspondence transaction system on SharePoint 2013 as a product for our company. We expect a large amount of data and a large number of transactions on the correspondence list, so we would like to hear about your experience with this issue. We have two options:
development with a SQL database
development with SharePoint lists
Please share a detailed, helpful answer.
From my experience, SharePoint is not very good at storing huge numbers of records (hundreds of thousands or even millions) in a list. The reason is the way data is stored in the SharePoint content database: if you look inside it, you will find that all records from all lists in a particular site collection are stored in one table, AllUserData. The table itself is not normalized, so queries are not particularly fast, and you get some additional performance loss because SharePoint sits in the middle as a data access layer. Moreover, because all data from all lists is stored in one table, a single huge list can degrade the performance of the whole site collection.
If you really need SharePoint, I would suggest using a separate SQL database to store the data and using BCS (Business Connectivity Services) to access it from SharePoint. With this approach you can normalize the data in your own database and still surface it in SharePoint.
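BCS external content types are usually defined declaratively or in SharePoint Designer rather than in C# code, but as a rough sketch of the normalized storage side (the connection string, table, and column names below are hypothetical and only for illustration), the correspondence data could live in its own database and be queried directly:
using System;
using System.Data.SqlClient;

class CorrespondenceReader
{
    static void Main()
    {
        // Hypothetical connection string and schema; adjust to your environment.
        string connectionString = "Data Source=SQLSERVER;Initial Catalog=CorrespondenceDB;Integrated Security=True";

        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlCommand command = new SqlCommand(
            "SELECT TOP 100 CorrespondenceId, Subject, CreatedOn " +
            "FROM dbo.Correspondence WHERE Status = @status ORDER BY CreatedOn DESC",
            connection))
        {
            command.Parameters.AddWithValue("@status", "Open");
            connection.Open();

            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Columns: int id, nvarchar subject, datetime created
                    Console.WriteLine("{0}: {1} ({2:d})",
                        reader.GetInt32(0), reader.GetString(1), reader.GetDateTime(2));
                }
            }
        }
    }
}
The same table would then be exposed to SharePoint through an external content type and surfaced as an external list.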
To get around the 5,000-item list view threshold with the client object model, you can use the code snippet below. It uses ListItemCollectionPosition to page through more than 5,000 items; the original source of the snippet also has a VB.NET version and a more in-depth breakdown.
// Assumes an initialized ClientContext named clientContext for the target site.
List list = clientContext.Web.Lists.GetByTitle("Assets");
ListItemCollectionPosition itemPosition = null;
while (true)
{
    CamlQuery camlQuery = new CamlQuery();
    camlQuery.ListItemCollectionPosition = itemPosition;
    camlQuery.ViewXml =
        "<View><ViewFields>" +
        "<FieldRef Name='Id'/><FieldRef Name='Title'/><FieldRef Name='Serial_No'/><FieldRef Name='CRM_ID'/>" +
        "</ViewFields><RowLimit>2201</RowLimit></View>";
    ListItemCollection listItems = list.GetItems(camlQuery);
    clientContext.Load(listItems);
    clientContext.ExecuteQuery();
    // Remember where this page ended so the next iteration continues from there.
    itemPosition = listItems.ListItemCollectionPosition;
    foreach (ListItem listItem in listItems)
    {
        Console.WriteLine("Item Title: {0} Item Key: {1} CRM ID: {2}",
            listItem["Title"], listItem["Serial_x0020_No_x002e_"], listItem["CRM_ID"]);
    }
    // A null position means there are no more pages.
    if (itemPosition == null)
    {
        break;
    }
    Console.WriteLine(itemPosition.PagingInfo);
    Console.WriteLine();
}
I have a table which contains a list of user records, and I need to query all records based on some condition. The use case: I have about 30 million records in the user table, and my condition would match about 3 million of them.
I have gone through https://www.datastax.com/dev/blog/a-deep-look-to-the-cql-where-clause, but I couldn't find a real solution.
Is it even possible to query a Cassandra table based on a condition?
I need to query and paginate the records just like in a traditional RDBMS or document store.
Cassandra has a concept of paging where you can specify the fetch size and then iterate over the results page by page. The code below uses the DataStax Java Driver, but other language drivers have something similar.
final int RESULTS_PER_PAGE = 100;
Statement st = new SimpleStatement("your query");
st.setFetchSize(RESULTS_PER_PAGE);
String requestedPage = extractPagingStateStringFromURL();
// This will be absent for the first page
if (requestedPage != null) {
st.setPagingState(
PagingState.fromString(requestedPage));
}
ResultSet rs = session.execute(st);
PagingState nextPage = rs.getExecutionInfo().getPagingState();
// Note that we don't rely on RESULTS_PER_PAGE, since Cassandra might
// have not respected it, or we might be at the end of the result set
int remaining = rs.getAvailableWithoutFetching();
for (Row row : rs) {
renderInResponse(row);
if (--remaining == 0) {
break;
}
}
// This will be null if there are no more pages
if (nextPage != null) {
renderNextPageLink(nextPage.toString());
}
More details can be found in the DataStax Java driver documentation on paging.
My first goal is to get all items that have a status of 'Pending' in my SharePoint list. My code so far looks like this:
using (var clientContext = new ClientContext(spSite.Trim()))
{
clientContext.Credentials = GetNetworkCredential();
var approvalLists = clientContext.Web.Lists.GetByTitle(approvalLibraryName);
CamlQuery query = new CamlQuery();
query.ViewXml = "<View>" +
"<Query>" +
"<Where>" +
"<Eq>" +
"<FieldRef Name='IsApproved'/><Value Type='Choice'>Pending</Value>" +
"</Eq>" +
"</Where>" +
"</Query>" +
"</View>";
ListItemCollection approvalListItem = approvalLists.GetItems(query);
clientContext.Load(approvalListItem);
clientContext.ExecuteQuery();
}
It is working, but I then realized that a particular item can be inserted more than once in that list. For example, item request_100 can have one row with status Pending and another row with status Approved. I need to get only the non-duplicate items with status 'Pending'. Is it possible to group by and then fetch only those with a count of 1? I'm thinking I could just load all items and then manipulate them with LINQ. Or do you have another suggestion for this?
My first thought was indeed to use LINQ on the ListItemCollection, which should work just fine (see the sketch below).
But just a thought: if you select only the 'Pending' rows, you won't get duplicates, will you? Or can there be more than one 'request_100' item with status 'Pending'?
Check this question, which is similar to yours: https://sharepoint.stackexchange.com/questions/43262/how-to-use-groupby-in-caml-query
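For illustration, here is a minimal LINQ sketch of the group-by idea. It assumes you load the items without the status filter and that the request identifier (e.g. request_100) lives in the Title field; both are assumptions, so adjust the field names to your list.
// Requires "using System.Linq;" and that approvalListItem was loaded without the <Where> filter.
var pendingOnly = approvalListItem
    .Cast<ListItem>()
    .GroupBy(item => (string)item["Title"])                       // assumed request identifier field
    .Where(g => g.Count() == 1                                    // the request occurs exactly once...
                && (string)g.First()["IsApproved"] == "Pending")  // ...and that single row is Pending
    .Select(g => g.First())
    .ToList();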
I am retrieving all items from a list containing around 4,000 items.
It takes quite a long time to fetch all of them, roughly 15 to 22 seconds.
Is there a better way to fetch all items from the list much faster?
The following is the code I am using to fetch all items:
using (SPSite spSite = new SPSite(site))
{
using (SPWeb web = spSite.OpenWeb())
{
SPList list = web.Lists["ListName"];
SPQuery query1 = new SPQuery();
string query = "<View>";
query += "<ViewFields>";
query += "<FieldRef Name='ID' />";
query += "<FieldRef Name='Title' />";
query += "</ViewFields>";
query += "<Query>";
query += "<Where>";
query += "<Eq>";
query += "<FieldRef Name='ColName'></FieldRef>";
query += "<Value Type='Boolean'>1</Value>";
query += "</Eq>";
query += "</Where>";
query += "</Query>";
query += "</View>";
query1.Query = query;
SPListItemCollection listItems = list.GetItems(query1);
}
}
Normally, when it takes this long to retrieve items, you are hitting a boundary or limit.
First, test putting a row limit on your query so that you return fewer than 2,000 items, or lower the limit until you find where it starts becoming unacceptably slow.
Then see whether you can break your query up, or run multiple queries to get your items, based on that figure.
Cheers
Truez
Fetching that many items in one shot is certainly not a best practice or the suggested approach.
You should look into alternative options, such as:
Column indexes: whether these help depends on the version of SharePoint you're using; evaluate and test whether they really give a benefit in your case.
Splitting the fetch into multiple queries, by finding a grouping that suits your data together with the threshold. This way you can run the queries in parallel and will likely see performance benefits (see the sketch after this list).
Search: rely on the SharePoint Search engine. It differs quite a bit between SharePoint versions, but it will be considerably faster than SPQuery. The downside is that you depend on search crawl schedules for up-to-date data.
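As a rough sketch of the split/paged approach with the server object model (it reuses the list and field names from the question; the 500-row page size is an arbitrary choice to stay well under the list view threshold):
// Page through the list instead of fetching everything in one query.
SPQuery pagedQuery = new SPQuery();
pagedQuery.ViewFields = "<FieldRef Name='ID' /><FieldRef Name='Title' />";
pagedQuery.ViewFieldsOnly = true;          // only pull the fields we asked for
pagedQuery.Query = "<Where><Eq><FieldRef Name='ColName' /><Value Type='Boolean'>1</Value></Eq></Where>";
pagedQuery.RowLimit = 500;                 // arbitrary page size

do
{
    SPListItemCollection page = list.GetItems(pagedQuery);
    foreach (SPListItem item in page)
    {
        // process the item here
    }
    // Position of the last row returned; null when there are no further pages.
    pagedQuery.ListItemCollectionPosition = page.ListItemCollectionPosition;
} while (pagedQuery.ListItemCollectionPosition != null);
Each page comes back quickly, and the loop can stop early once enough items have been processed.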
I'm pretty new to SSIS and I'm trying to convert old database data into a new database schema. I've been playing around for a while, but I cannot get my head around how to keep the referential integrity between two or more destination tables. For example, I have a Projects table in the old database (projects for buildings), which contains the following information:
+------------------------+
+ TABLE: Projects +
+------------------------+
+ ProjectID (PK) + (primary key of project)
+ ProjectCode + (unique code of the project)
+ ProjectBuildingName + (name of the building for this project)
+ ProjectCompletionDate + (date when the project has been completed)
+ AddressLine1 + (AddressLine, Postalcode and City of the building)
+ Postalcode +
+ City +
+------------------------+
In my new database design, I want to split the data of [Projects] into the tables [Projects], [ProjectBuildings] and [Addresses].
In SSIS, I select the old [Projects] table as the source and map it to the corresponding destinations. Before that, I convert the data and do a multicast, see the figure below:
In this flow I migrate the data I want into the tables I want, but those tables won't have their FK integrity. For example, my new design looks like this:
+--------------------------+
+ TABLE: Projects +
+--------------------------+
+ ProjectID (PK) +
+ ProjectCode +
+ ProjectCompletionDate +
+ ProjectBuildingID (FK) +
+--------------------------+
+--------------------------+
+ TABLE: ProjectBuildings +
+--------------------------+
+ ProjectBuildingID (PK) +
+ ProjectBuildingName +
+ AddressID (FK) +
+--------------------------+
+--------------------------+
+ TABLE: Addresses +
+--------------------------+
+ AddressID (PK) +
+ Country +
+ City +
+ Postalcode +
+ AddressLine1 +
+--------------------------+
(P.S. Ignore the 4th column in the figure, which would be the [Contacts] column. This is a limited data example to help me illustrate my question.)
When I map 1-to-1, I do transfer the data into the correct tables and columns, but how am I going to ensure that all tables have their relations linked to each other correctly as well?
I have seen two other Stack Overflow posts about more or less the same question, but I just can't get my head around it. I was hoping for a clearer answer.
Note: I'm using SQL Server 2008 Data Center + Integration Services with Microsoft Visual Studio 2008. I'm trying to migrate from the old to the new database, both on the same SQL Server 2008 instance.
EDIT
I have found a very good explanation at How do I split flat file data and load into parent-child tables in database?.
I managed to split a table into two tables (a child and a parent): insert into the parent, then look up the inserted ID and use it for the insertion of the child. Problems occur when this child is itself the parent of a third table. I would think that running the same flow for the second child would work the same way: look up the ID that was inserted for child 1, then use it when inserting child 2. For some reason, it's not really working.
EDIT
Ok, here is the real example. I'm trying to migrate the following columns out of the old database (which is all in 1 table):
[OLDDB].Customersurname ----> [People].Surname
[OLDDB].Customerforename ----> [People].Forename
[OLDDB].Customergender ----> [People].bGender
[OLDDB].Customeraddressline ----> [Addresses].AddressLine1
[OLDDB].Customerpostalcode ----> [Addresses].Postalcode
[OLDDB].Customercity ----> [Addresses].City
[OLDDB].Customerphone ----> [AdditionalAddresses].Phone1
[OLDDB].Customeremail ----> [AdditionalAddresses].Email
Now I have managed to insert [AdditionalAddresses] and [Addresses] with their corresponding links (and left Countries and AddressTypes NULL). My problem is the [Contacts] table, which only contains the [Addresses].AddressID and a boolean telling whether the [Contact] is a person or a company. I think that if [People] and [Companies] both contained the [Addresses].AddressID as an FK, it would work.
So what I have done so far is:
Migrate to [AdditionalAddresses] (DONE)
Lookup parent [AdditionalAddresses] key (DONE)
Export to [Addresses] based on lookup key of [AdditionalAddresses] (DONE)
Next I would:
Export to [People]; this will create new unique IDs, and I will have to turn the FK constraint off
Lookup the parent [People] keys
Export to [Contacts] based on the lookup key of [People]
Then, as the last step, update the [Contacts] table with the [Addresses].AddressID that belongs to that person...
A few questions to clarify your needs:
Is it a one-shot operation or not?
Are the target tables empty?
Do you do a lot of data transformation/conversion?
Two options to investigate in the meantime:
One of the easiest solutions would be to split your data flow into four distinct data flows, ordered by precedence constraints. In the first one you extract and load the Projects. In the second one you extract the ProjectBuildings, do a lookup into your Projects table to get the corresponding ProjectID, and then insert the rows.
Another option would be to use staging tables, but that seems overkill for the case you are presenting.
Hi all, I have the following code:
tx = session.beginTransaction();
Query query = session.createQuery("UPDATE com.nisid.entities.Payment set amount=:amount,paymentMethod=:method,paymentExpire=:expireDate"
+ "Where paymentId=:payid,actionId=:actionid");
query.setParameter("amount", amount);
query.setParameter("method", method);
query.setParameter("expireDate", expireDate);
query.setParameter("payid", projectId);
query.setParameter("actionid", actionId);
int resutl=query.executeUpdate();
I am trying to do an update using HQL, but I am getting the error: IllegalArgumentException: node to traverse cannot be null!
My table in the DB is called Payment and it has a composite key (projectId, actionId).
Could you please help me further?
The concept is that I have a JSP page which retrieves and displays results from the DB, pulling info from the Project, Payment and Action tables. Project has a many-to-many relationship with Action, and I am using the Payment table as the intermediary table which holds the two FKs of the other tables.
You missed a space before where, and you need to replace the comma with and after where:
Query query = session.createQuery("UPDATE com.nisid.entities.Payment set amount=:amount,paymentMethod=:method,paymentExpire=:expireDate"
+ " Where paymentId=:payid and actionId=:actionid");