SharePoint: Efficient way to retrieve all list items

I am retrieving all items from a list containing around 4,000 items, but fetching them all takes ~15 to ~22 seconds.
Is there a better way to fetch all items from the list in as little time as possible?
Following is the code I am using to fetch all items:
using (SPSite spSite = new SPSite(site))
{
    using (SPWeb web = spSite.OpenWeb())
    {
        list = web.Lists["ListName"];
        SPQuery query1 = new SPQuery();
        string query = "<View>";
        query += "<ViewFields>";
        query += "<FieldRef Name='ID' />";
        query += "<FieldRef Name='Title' />";
        query += "</ViewFields>";
        query += "<Query>";
        query += "<Where>";
        query += "<Eq>";
        query += "<FieldRef Name='ColName'></FieldRef>";
        query += "<Value Type='Boolean'>1</Value>";
        query += "</Eq>";
        query += "</Where>";
        query += "</Query>";
        query += "</View>";
        query1.Query = query;
        SPListItemCollection listItems = list.GetItems(query1);
    }
}

Normally when it takes this long to retrieve items, you are hitting a boundary or limit.
First, try putting a row limit on your query so that you return fewer than 2,000 items, and keep lowering it until you find the point at which the query stops being unreasonably slow.
Then see whether you can break your query up, or run multiple queries to get your items, based on that figure; a paged-query sketch follows below.
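Here is a minimal sketch of that paging approach, using SPQuery.RowLimit together with SPListItemCollectionPosition; the list name and the 2,000-row page size are assumptions to adapt to your environment:
using System.Collections.Generic;
using System.Linq;
using Microsoft.SharePoint;

using (SPSite spSite = new SPSite(site))
using (SPWeb web = spSite.OpenWeb())
{
    SPList list = web.Lists["ListName"];
    SPQuery query = new SPQuery();
    query.ViewFields = "<FieldRef Name='ID' /><FieldRef Name='Title' />";
    query.RowLimit = 2000; // page size: keep it under the point where the query gets slow

    var allItems = new List<SPListItem>();
    do
    {
        // Each call returns one page plus a position marker for the next page
        SPListItemCollection page = list.GetItems(query);
        allItems.AddRange(page.Cast<SPListItem>());
        query.ListItemCollectionPosition = page.ListItemCollectionPosition;
    } while (query.ListItemCollectionPosition != null);
}
Each page comes back quickly, so you can also start processing items while later pages are still being fetched.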

Fetching that many items in one shot is definitely not a best practice or the suggested approach.
You should look into alternative options, such as:
Column indexes: dependent on the version of SharePoint you're using; evaluate and test whether they really give a benefit in your case.
Splitting the fetch into multiple queries, by finding a grouping that best suits your data together with the threshold. This way you can run the queries in parallel and will likely see performance benefits.
Using Search: rely on the SharePoint Search engine. It differs quite a bit between SharePoint versions, but it will certainly be blazing fast in comparison to SPQuery, with the downside of depending on search crawl schedules for up-to-date data (see the sketch after this list).
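As an illustration of the search option, here is a hedged sketch against the SharePoint 2013 server-side search API; the managed property name ColNameOWSBOOL is an assumption (SharePoint auto-generates managed properties for crawled list columns), as is the listId variable:
using System.Linq;
using Microsoft.Office.Server.Search.Query;
using Microsoft.SharePoint;

using (SPSite spSite = new SPSite(site))
{
    // KQL: filter on the boolean column and restrict to one list (both names assumed)
    KeywordQuery query = new KeywordQuery(spSite);
    query.QueryText = "ColNameOWSBOOL:true ListID:" + listId;
    query.SelectProperties.Add("Title");
    query.RowLimit = 500;

    SearchExecutor executor = new SearchExecutor();
    ResultTableCollection results = executor.ExecuteQuery(query);
    ResultTable rows = results.Filter("TableType", KnownTableTypes.RelevantResults).FirstOrDefault();
}
To fetch more than one page of results you would additionally page with query.StartRow, and remember the results are only as fresh as the last crawl.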

Related

DAX - Count one row for each group that meets filter requirement

I have a UserLog table that logs the actions "ADDED", "DELETED" or "UPDATED" along with the date/timestamp.
In Power BI, I would like to accurately show the amount of new users as well as removed users for a filtered time period. Since it's possible that a user has been added and then deleted, I need to make sure that I only get the last record (ADDED/DELETED) from the log for every user.
Firstly, I tried setting up a measure that gets the max date/timestamp:
LastUpdate = CALCULATE(MAX(UserLog[LogDate]), UserLog[Action] <> "UPDATED")
I then tried to create the measure that shows the amount of new users:
AddedCount = CALCULATE(COUNT(UserLog[userId]), FILTER(UserLog, [Action] = "ADDED" && [LogDate] = [LastUpdate]))
But the result is not accurate as it still counts all "ADDED" records regardless if it's the last record or not.
I find writing formulas against unstructured data much harder. If you put a little more structure into the data, it becomes much easier. There are lots of ways to do this; here's one.
Add 2 columns to your UserLog table to help count adds and deletes:
Add = IF([Action]="ADDED",1,0)
Delete = IF([Action]="DELETED",1,0)
Then create a summary table so you can tell if the same user was added and deleted in the same [LogDate]:
Counts = SUMMARIZE('UserLog',
    [LogDate],
    [userId],
    "Add", SUM('UserLog'[Add]),
    "Delete", SUM('UserLog'[Delete]))
Then the measures are simple to write. For AddedCount, just filter the rows where [Delete]=0, and for DeletedCount, just filter the rows where [Add]=0:
AddedCount = CALCULATE(SUM(Counts[Add]),
    Counts[Delete]=0,
    FILTER(Counts, Counts[LogDate]=[LastUpdate]))
DeletedCount = CALCULATE(SUM(Counts[Delete]),
    Counts[Add]=0,
    FILTER(Counts, Counts[LogDate]=[LastUpdate]))

DynamoDB - scan items where map contains a key

I have a table that contains a field (not a key field), called appsMap, and it looks like this:
appsMap = { "qa-app": "abc", "another-app": "xyz" }
I want to scan all rows whose appsMap contains the key "qa-app" (the value is not important, just the key). I tried something like this but it doesn't work in the way I need:
FilterExpression = '#appsMap.#app <> :v',
ExpressionAttributeNames = {
"#app": "qa-app",
"#appsMap": "appsMap"
},
ExpressionAttributeValues = {
":v": { "NULL": True }
},
ProjectionExpression = "deviceID"
What's the correct syntax?
Thanks.
There is a discussion on the subject here:
https://forums.aws.amazon.com/thread.jspa?threadID=164470
You might be missing this part from the example:
ExpressionAttributeValues: {":name":{"S":"Jeff"}}
However, I just wanted to echo what has already been said: scan is an expensive operation that goes through every item, which makes your database hard to scale.
Unlike with other databases, you have to do plenty of setup with DynamoDB to get it to perform at its best. Here is a suggestion:
1) Promote this into a root-level attribute, for example add qaExist to the root, with possible values of 0|1.
2) Create a secondary index for the newly created attribute (index keys must be string, number, or binary, so use a numeric 0|1 flag rather than a boolean).
3) Query the new index, specifying 1 as the key condition, so you only get items where qa-app is present.
This will make your system very fast and very scalable regardless of how many records you add later on; a query sketch against such an index follows below.
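A minimal sketch of step 3 with the AWS SDK for .NET; the table name devices, the index name qaExist-index, and the numeric flag encoding are all assumptions following the steps above:
using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

var client = new AmazonDynamoDBClient();
var request = new QueryRequest
{
    TableName = "devices",        // assumed table name
    IndexName = "qaExist-index",  // the GSI created in step 2
    KeyConditionExpression = "qaExist = :v",
    ExpressionAttributeValues = new Dictionary<string, AttributeValue>
    {
        [":v"] = new AttributeValue { N = "1" } // 1 = qa-app is present
    },
    ProjectionExpression = "deviceID"
};
QueryResponse response = await client.QueryAsync(request);
Unlike a scan, this only reads the items that actually carry the flag, so the cost stays proportional to the result set.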
If I understand the question correctly, you can do the following:
FilterExpression = 'attribute_exists(#0.#1)',
ExpressionAttributeNames = {
"#0": "appsMap",
"#1": "qa-app"
},
ProjectionExpression = "deviceID"
Since you're being a bit vague about your expectations and about what's happening ("I tried something like this but it doesn't work in the way I need"), I'd like to mention that a scan with a filter is very different from a query.
Filters are applied on the server, but only after the scan is executed: it still iterates over all the data in your table, and instead of returning each item it applies the filter to each page of results. That saves you some network bandwidth, but it can return empty pages as you page through your entire table.
You could look into creating a GSI on the table if this is a query you expect to run often. Either way, a scan has to follow LastEvaluatedKey to cover the whole table, as sketched below.
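Here is a hedged sketch of the attribute_exists scan with paging, again using the AWS SDK for .NET; the table name devices is an assumption:
using System.Collections.Generic;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.Model;

var client = new AmazonDynamoDBClient();
var request = new ScanRequest
{
    TableName = "devices", // assumed table name
    FilterExpression = "attribute_exists(#m.#k)",
    ExpressionAttributeNames = new Dictionary<string, string>
    {
        ["#m"] = "appsMap",
        ["#k"] = "qa-app"
    },
    ProjectionExpression = "deviceID"
};

var matches = new List<Dictionary<string, AttributeValue>>();
do
{
    // A page may come back empty even though later pages contain matches
    ScanResponse response = await client.ScanAsync(request);
    matches.AddRange(response.Items);
    request.ExclusiveStartKey = response.LastEvaluatedKey;
} while (request.ExclusiveStartKey != null && request.ExclusiveStartKey.Count > 0);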

SharePoint: Total count of filtered list via web services

In SharePoint 2013 I use a webservice to query a list:
System.Xml.XmlNode node = myservice.GetListItems(listName, viewName, query, viewFields, rowLimit, queryOptions, null);
I use CAML to apply filters to the default view and to limit the rows returned.
This works fine; now I want to get the total count of items with this particular filter applied. How do I get this?
I know I can use the GetList method, but that returns the total number of all items in the list without any filter applied.
Any idea how to get the total count of a filtered list via the web services?
SharePoint has no equivalent of a COUNT function within a query; if you want anything besides the total item count, your only option is to execute the filtered query and then count the number of items returned (sketched below).
Similar question from SP2010 (unfortunately the CAML spec hasn't changed in this regard in SP2013): Determine Total Count Of Items Returned By SPQuery
Reference code from SharePoint.stackexchange: https://sharepoint.stackexchange.com/questions/26150/is-it-possible-to-return-only-the-count-of-the-query
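As a minimal sketch, assuming the same GetListItems call as in the question: each matching item comes back as a z:row element in the #RowsetSchema namespace, so you can count those (keep rowLimit high enough, or page, since it caps the rows returned):
using System.Xml;

System.Xml.XmlNode node = myservice.GetListItems(listName, viewName, query, viewFields, rowLimit, queryOptions, null);

var nsmgr = new XmlNamespaceManager(node.OwnerDocument.NameTable);
nsmgr.AddNamespace("z", "#RowsetSchema");

// One z:row element per returned list item
int count = node.SelectNodes("//z:row", nsmgr).Count;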

SharePoint with large data

I would like to discuss SharePoint 2013. We are going to develop a correspondence transaction system as a product for our company on SharePoint 2013, and we expect a large amount of data and a large number of transactions on the correspondence list. We would like to hear about your experience with this issue. We have two options:
development with a SQL database
development with SharePoint lists
Please share your answer with details.
From my experience, SharePoint is not very good at storing huge numbers of records (hundreds of thousands or even millions) in a list. The reason is the way data is stored in the SharePoint database: if you check a SharePoint content database you will find that all records from all lists in a particular site collection are stored in one table, AllUserData. That table is not really normalized, so queries are not very fast, and you get some performance loss from using SharePoint as middleware. Moreover, because all data from all lists is stored in one table, a single huge list on a site collection can degrade the performance of the whole site collection.
If you really need SharePoint, I would suggest you use a separate SQL database to store the data and use BCS (Business Connectivity Services) to access it from SharePoint. With this approach you can normalize the data in your separate database while still surfacing it in SharePoint.
To get around the list view threshold with the client object model (CSOM), you can use this code snippet: it uses ListItemCollectionPosition to page past the 5,000-item limit.
List list = clientContext.Web.Lists.GetByTitle("Assets");
ListItemCollectionPosition itemPosition = null;
while (true)
{
    CamlQuery camlQuery = new CamlQuery();
    camlQuery.ListItemCollectionPosition = itemPosition;
    camlQuery.ViewXml = "<View>" +
        "<ViewFields>" +
        "<FieldRef Name='Id'/><FieldRef Name='Title'/><FieldRef Name='Serial_No'/><FieldRef Name='CRM_ID'/>" +
        "</ViewFields>" +
        "<RowLimit>2201</RowLimit>" +
        "</View>";
    ListItemCollection listItems = list.GetItems(camlQuery);
    clientContext.Load(listItems);
    clientContext.ExecuteQuery();
    itemPosition = listItems.ListItemCollectionPosition;
    foreach (ListItem listItem in listItems)
    {
        Console.WriteLine("Item Title: {0} Item Key: {1} CRM ID: {2}",
            listItem["Title"], listItem["Serial_x0020_No_x002e_"], listItem["CRM_ID"]);
    }
    if (itemPosition == null)
    {
        break; // no more pages
    }
    Console.WriteLine(itemPosition.PagingInfo);
    Console.WriteLine();
}

Slow Postgres JOIN Query

I'm trying to optimize a slow query that was generated by the Django ORM. It is a many-to-many query. It takes over 1 min to run.
The tables have a good amount of data, but they aren't huge (400k rows in sp_article and 300k rows in sp_article_categories)
#categories.article_set.filter(post_count__lte=50)
EXPLAIN ANALYZE SELECT *
FROM "sp_article"
INNER JOIN "sp_article_categories" ON ("sp_article"."id" = "sp_article_categories"."article_id")
WHERE ("sp_article_categories"."category_id" = 1081
AND "sp_article"."post_count" <= 50 )
Nested Loop (cost=0.00..6029.01 rows=656 width=741) (actual time=0.472..25.724 rows=1266 loops=1)
-> Index Scan using sp_article_categories_category_id on sp_article_categories (cost=0.00..848.82 rows=656 width=12) (actual time=0.015..1.305 rows=1408 loops=1)
Index Cond: (category_id = 1081)
-> Index Scan using sp_article_pkey on sp_article (cost=0.00..7.88 rows=1 width=729) (actual time=0.014..0.015 rows=1 loops=1408)
Index Cond: (sp_article.id = sp_article_categories.article_id)
Filter: (sp_article.post_count <= 50)
Total runtime: 26.536 ms
I have an index on:
sp_article_categories.article_id (type: btree)
sp_article_categories.category_id
sp_article.post_count (type: btree)
Any suggestions on how I can tune this to get the query speedy?
Thanks!
You've provided the vital information here: the EXPLAIN ANALYZE output. That isn't showing a one-minute runtime, though; it's showing about 26 milliseconds. So either that isn't the query actually being run, or the problem is elsewhere.
The only difference between EXPLAIN ANALYZE and a real application is that the results aren't actually returned to the client. You would need a lot of data to slow things down to over a minute, though.
The other suggestions are all off the mark, since they ignore the fact that the query isn't slow. You have the relevant indexes (both sides of the join are using an index scan), and the planner is perfectly capable of filtering on the category table first (that's the whole point of having a half-decent query planner).
So you first need to figure out what exactly is slow; a timing sketch follows below.
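One way to narrow it down is to time statement execution separately from consuming the rows in your application. Here is an illustrative sketch in C# with Npgsql (the connection string is a placeholder; the same idea applies on the Django/psycopg side):
using System;
using System.Diagnostics;
using Npgsql;

var sw = Stopwatch.StartNew();
using (var conn = new NpgsqlConnection("Host=localhost;Database=mydb;Username=me")) // placeholder
{
    conn.Open();
    var cmd = new NpgsqlCommand(
        "SELECT * FROM sp_article " +
        "INNER JOIN sp_article_categories ON sp_article.id = sp_article_categories.article_id " +
        "WHERE sp_article_categories.category_id = 1081 AND sp_article.post_count <= 50", conn);
    using (var reader = cmd.ExecuteReader())
    {
        Console.WriteLine("Statement executed after {0} ms", sw.ElapsedMilliseconds);
        int rows = 0;
        while (reader.Read()) rows++; // consuming the rows includes the transfer cost
        Console.WriteLine("{0} rows fetched after {1} ms", rows, sw.ElapsedMilliseconds);
    }
}
If the fetch step dominates, the time is going into transferring and materializing the wide rows (width=741), not into the query plan itself.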
Put an index on sp_article_categories.category_id
From a pure SQL perspective, your join is more efficient if your base table has fewer rows in it, and if the WHERE conditions are applied to that table before it joins to the other.
So see if you can get Django to select from the categories first and filter on category_id before joining to the article table.
Pseudo-code follows:
SELECT * FROM categories c
INNER JOIN articles a
ON c.category_id = 1081
AND c.category_id = a.category_id
And put an index on category_id like Steven suggests.
You can use explicit field names instead of * too:
select [fields] from ....
I assume you have run analyze on the database to get fresh statistics.
It seems that the join between sp_article.id and sp_article_categories.article_id is costly. What data type is the article id? If it isn't numeric, you should perhaps consider making it so (integer or bigint, whatever suits your needs); in my experience it can make a big difference in performance. Hope it helps.