Webtrends - utilization report

I want to find out how many files were downloaded on my website out of the total number of files. E.g., I have a million PDF files and people have downloaded only 100,000; this is 10% utilization.
I tried the downloaded files report, but it shows only the top 1,000 files. Is there a way to get the complete count, i.e. the number of files downloaded at least once?
Is it possible to get this count without re-analyzing the report?

First of all, no, it is not possible without re-analyzing the profile and the report. You have to adjust the so-called "Table Size Limit", which limits the number of elements to analyze and the number of elements to show in the report.
Example: you have 1 million pages on your website. The report analysis limit is set to 250,000 pages, so beyond that, new pages will not be recorded and counted by Webtrends. The final report you see will only show you the top 2,000 pages.
You need to increase the Table Size Limits and re-analyze. If you do not use Webtrends On Demand and you still have the logs, a re-analysis will not affect your page view license.
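If you still have the raw web server logs, you can also cross-check the utilization figure outside of Webtrends while the re-analysis runs. Below is a minimal sketch in Python, assuming combined-format access logs and placeholder paths (access.log, /var/www/site/pdfs); it counts the distinct PDF paths requested at least once and divides by the number of PDF files on disk. Logged URLs may not map one-to-one to file system paths, so treat the result as an approximation.

import re
from pathlib import Path

LOG_FILE = Path("access.log")          # web server access log (combined format assumed)
PDF_ROOT = Path("/var/www/site/pdfs")  # directory holding the PDF files (placeholder)

# Crude pattern for the request line of a combined-format log entry.
request_re = re.compile(r'"(?:GET|HEAD) (\S+\.pdf)', re.IGNORECASE)

downloaded = set()
with LOG_FILE.open(errors="replace") as fh:
    for line in fh:
        match = request_re.search(line)
        if match:
            downloaded.add(match.group(1))  # distinct PDF paths requested at least once

total = sum(1 for _ in PDF_ROOT.rglob("*.pdf"))
if total:
    share = len(downloaded) / total
    print(f"{len(downloaded)} of {total} PDFs downloaded at least once ({share:.1%} utilization)")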

Related

Querying table with >1000 columns fails

I can create a table with 1100 columns and ingest data into it, but when I try to run any kind of query on it, such as selecting all values:
select * from iot_agg;
it looks like I cannot read the table, and I get the following error:
io.questdb.cairo.CairoException: [24] Cannot open file: /root/.questdb/db/table/iot_agg.d
at io.questdb.std.ThreadLocal.initialValue(ThreadLocal.java:36)
at java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:180)
at java.lang.ThreadLocal.get(ThreadLocal.java:170)
at io.questdb.cairo.CairoException.instance(CairoException.java:38)
at io.questdb.cairo.ReadOnlyMemory.of(ReadOnlyMemory.java:135)
at io.questdb.cairo.ReadOnlyMemory.<init>(ReadOnlyMemory.java:44)
at io.questdb.cairo.TableReader.reloadColumnAt(TableReader.java:1031)
at io.questdb.cairo.TableReader.openPartitionColumns(TableReader.java:862)
at io.questdb.cairo.TableReader.openPartition0(TableReader.java:841)
at io.questdb.cairo.TableReader.openPartition(TableReader.java:806)
...
Ouroborus might be right in suggesting that the schema could be revisited, but regarding the actual error from Cairo:
24: OS error, too many open files
This is dependent on the OS that the instance is running on, and is tied to system-wide or user settings, which can be increased if necessary.
It is relatively common to hit limits like this with different DB engines that handle large numbers of files. The maximum number of open files is commonly configured with kernel variables or per-user limits. Checking the current limit for open files can be done on Linux and macOS with
ulimit -n
You can also use ulimit to set this to a value you need. If you need to set it to 10,000, for example, you can do this with:
ulimit -n 10000
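As a side note (not QuestDB-specific), the same per-process limit can be inspected and raised from Python via the standard resource module on Linux/macOS; the target value of 10,000 below is just the example figure from above.

import resource

# Current soft and hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit, staying within the hard limit; raising the hard
# limit itself requires root or a change to the system-wide settings.
desired = 10000
new_soft = desired if hard == resource.RLIM_INFINITY else min(desired, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
print(f"new soft limit: {resource.getrlimit(resource.RLIMIT_NOFILE)[0]}")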
Edit: there is official documentation for capacity planning when deploying QuestDB, which takes factors such as CPU, memory, and network capacity into consideration. For more information, see the capacity planning guide.

Google DataPrep - Apparently Limited Table Size

I'm trying to prepare SEO data from Screaming Frog, Majestic, and Ahrefs, joining it before importing said data into BigQuery for analysis.
The Majestic and Ahrefs CSV files import after some pruning down to the 100MB limit.
The Screaming Frog CSV file, however, doesn't fully load, only displaying approx. 37,000 rows of 193,000. By further pruning less important columns in Excel and reducing the file size (from 44MB to 39MB), the number of rows loaded increases slightly. This would indicate to me that it's not an errant character or cell.
I've made sure (by resaving via a text editor) that the CSV file is saved in UTF-8, and I've checked the limitations of Dataprep to see if there is a limit on the number of cells per Flow/Wrangle, but I can find nothing.
The Majestic and Ahrefs files are larger and load completely with no issue. There is no data corruption in the Screaming Frog file. Is there something common I'm missing?
Is the total limit for all files 100MB?
Any advice or insight would be appreciated.
To get the full transformation of your files, you should run the recipe.
What you see in the Dataprep Transformer Page is a head sample.
You can take a look at how the sampling works here.

How do I manage a rapidly growing postgres table in Django?

I have a website that is used to show live data from different machines in a crushing facility. Right now the "sensor data" obtained from all the machines in one facility is stored in a "sensordata" table, and this data is used by the user to create reports for a time period (up to the last three months). The site has been running for 3 months now, and the sensordata table is already at 113 million rows. My company is planning to add even more facilities (this will multiply the number of machines). What's a good solution to store this large an amount of data on which analysis is to be performed in the future (including creating reports and such), and that is also future-safe for, say, hundreds of facilities?

How to use Amazon MWS to indicate two different shipping times on items?

I have a bit of a unique problem here. I currently have two warehouses that I ship items out of for selling on Amazon: my primary warehouse and my secondary warehouse. Shipping out of the secondary warehouse takes significantly longer than shipping from the main warehouse, which is why it is referred to as the "secondary" warehouse.
Some of our inventory is split between the two warehouses. Usually this is not a problem, but we keep running into a particular issue. Allow me to explain:
Let's say that I have 10 red cups in the main warehouse, and an additional 300 in the secondary warehouse. Let's also say it's Christmas time, so I have all 310 listed. However, from what I've seen, Amazon only allows one shipping time to be listed for the inventory, so the entire 310 get listed under the primary warehouse's shipping time (2 days), which doesn't account for the secondary warehouse's ship time, rather than being split the way they should be: 10 at 2 days and 300 at 15 days.
The problem comes in when someone orders an amount that would have to be split across the two warehouses, such as if someone were to order 12 of said red cups. The first 10 would come out of the primary warehouse, and the remaining two would come out of the secondary warehouse. Due to the secondary warehouse's shipping time, the remaining two cups would have to ship on a significantly later date, but Amazon marks the entire order as needing to be shipped within those two days.
For a variety of reasons, it is not practical to keep all of one product in one warehouse, nor is it practical to increase the secondary warehouse's shipping time. Changing the overall shipping date for the product to the longest ship time causes us to lose the buy box for the listing, which really defeats the purpose of us trying to sell it.
So my question is this: is there some way in MWS to indicate that the inventory is split up in terms of shipping times? If so, how?
Any assistance in this matter would be appreciated.
Short answer: No.
There is no way to specify two values for FulfillmentLatency, in the same way as there is no way to specify two values for Quantity in stock. You can only ever have one inventory with them (plus FBA stock).
Longer answer: You could.
Sign up twice with Amazon:
"MySellerName" has an inventory of 10 and a fulfillment latency of 2 days
"MySellerName Overseas Warehouse" has an inventory of 300 and a fulfillment latency of 30 days
I haven't tried it, but I believe Amazon will then automatically direct the customer to the best seller for them, which should be "MySellerName" for small orders and "MySellerName Overseas Warehouse" for larger quantities.
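For reference, the reason a single account cannot express two shipping times is visible in the inventory feed itself: each SKU carries exactly one Quantity and one FulfillmentLatency. The sketch below builds a minimal inventory feed message in Python; the element names follow my reading of the MWS inventory availability feed schema and the merchant identifiers are placeholders, so verify everything against the current XSD before using it.

import xml.etree.ElementTree as ET

def build_inventory_feed(merchant_id, sku, quantity, latency_days):
    # Only one Quantity and one FulfillmentLatency can be given per SKU,
    # which is why one account cannot advertise two shipping times.
    envelope = ET.Element("AmazonEnvelope")
    header = ET.SubElement(envelope, "Header")
    ET.SubElement(header, "DocumentVersion").text = "1.01"
    ET.SubElement(header, "MerchantIdentifier").text = merchant_id
    ET.SubElement(envelope, "MessageType").text = "Inventory"
    message = ET.SubElement(envelope, "Message")
    ET.SubElement(message, "MessageID").text = "1"
    ET.SubElement(message, "OperationType").text = "Update"
    inventory = ET.SubElement(message, "Inventory")
    ET.SubElement(inventory, "SKU").text = sku
    ET.SubElement(inventory, "Quantity").text = str(quantity)
    ET.SubElement(inventory, "FulfillmentLatency").text = str(latency_days)
    return ET.tostring(envelope, encoding="unicode")

# With the two-account workaround, each account submits its own feed:
print(build_inventory_feed("MAIN_MERCHANT_ID", "RED-CUP", 10, 2))
print(build_inventory_feed("OVERSEAS_MERCHANT_ID", "RED-CUP", 300, 30))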

Coldfusion 8 - Problems with indexing large data using verity

I am currently running ColdFusion 8 with Verity running on a K2 server. I am using a query to index several different columns of my table using cfindex. One of the columns is a large varchar type.
It seems that when the data is being indexed, only the first 30KB is being stored, resulting in no results being brought back if I search for anything after that. I tried moving several different phrases and words further up in the data, within the 30KB, and the results then appear.
I then carried out more Verity tests using the browse command in the command prompt to see what's actually in the collection,
i.e. Coldfusion8\verity\collections\\parts browse 0000001.ddd
I found out that the body being indexed (CF_BODY) never exceeds the size of 32000.
Can anyone tell me if there is a fixed index size per document for verity?
Many thanks,
Richard
Punch line
Version 6 has operator limits:
up to 32,764 children in one "topic" for the ANY operator
up to 64 children for the NEAR operator
Exceeding these values doesn't necessarily give an error message. When you search, are you certain you don't exceed them?
Source
The Verity documentation, Appendix B: Query limits, says there are two kinds of limitations: search-time limits and operator limits. The quote below is the whole section covering the latter, straight from the book.
Verity Query Language and Topic Guide, Version 6.0:
Note the following limits on the use of operators:
There can be a maximum of 32,764 children for the ANY operator. If a topic exceeds this limit, the search engine does not always return an error message.
The NEAR operator can evaluate only 64 children. If a topic exceeds this limit, the search engine does not return an error message.
For example, assume you have created a large topic that uses the ACCRUE operator with 8365 children. This topic exceeds the 1024 limit for any ACCRUE-class topic and the 16000/3 limit for the total number of nodes. In this case, you cannot substitute ANY for ACCRUE, because that would cause the topic to exceed the 8,000 limit for the maximum number of children for the ANY operator. Instead, you can build a deeper tree structure by grouping topics and creating some named subnodes.
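To illustrate the "deeper tree structure" workaround mentioned in the last sentence, here is a small Python sketch (not Verity code) that chunks a long list of child terms into nested groups so that no single operator node exceeds a chosen child limit. The angle-bracket operator syntax in the output is only an approximation of the Verity Query Language and should be checked against your Verity version.

def group_children(terms, operator="ANY", max_children=64):
    # Wrap chunks of terms in nested operator nodes until no node has
    # more than max_children direct children.
    nodes = [f'"{t}"' for t in terms]
    while len(nodes) > max_children:
        nodes = [
            f"<{operator}>({', '.join(nodes[i:i + max_children])})"
            for i in range(0, len(nodes), max_children)
        ]
    return f"<{operator}>({', '.join(nodes)})"

# Example: 200 terms grouped so that no node has more than 64 children.
terms = [f"term{i}" for i in range(200)]
print(group_children(terms)[:120], "...")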