Get goal conversion rate by Time on Page by Page group? - regex

I want to answer this question:
Does the average time on page A (or more accurately page group A) affect the conversion rate of goal B?
So far in the GUI I have:
A) Created an advanced segment of Time on Page >= 120 ("per hit" option):
http://grab.by/tKOA
B) Modified the segment to also add a filter for Page = regex matching my group:
http://grab.by/tKOU
...But I don't know if this gives me the results I'm after; that is, if they are accurate
I have some other ideas, including assigning the page group as a funnel step and then segmenting by the Time on Page; still waiting on data to come in for that one
Wanting to know if there was a better solution or if I'm on the right track

Drewdavid,
Your approach is quite smart and correct, I would say, however keep in mind that in this context, you are mixing different scopes:
Time on Page is page-level metric
Page seen is visit-level dimension
What you would get in your report is the average time on page calculated from all the pages there were seen during visits which met the regex condition set in the filter (that's what segment does, it included all the pages, not just those that you want to filter). I know this can be confusing, but see this article that gives more examples and goes into greater detail.
To achieve what you are after, remove the segment filter and simply use the advanced filtering above the report table (and choose exactly the same regex you mentioned in your question).
Hope this helps!

Related

How to automatically feed a cell value from a range of values, based on its matching condition with other cell value

I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (which I also have to increase from time to time), and I have about 8 significance values that I have decided to relate to these 28 types of work, predefined.
I want that, as soon as I enter a type of work in cell 1 - I want the adjacent cell 2 to get automatically populated with a significance value (from a range of 8 values) that is pre-definitely set by me.
Every time I input a new or old occurrence of a type of work, the adjacent cell should automatically get matched with its relevant significance value & automatically get populated in real-time.
I know how to do it using IF, IFS, and IF_OR conditions, but I feel that based on the ever-expanding types of work & significance values, the above formulas will be very big, complicated, and repetitive in the future. I feel there's a more efficient way to achieve it. Also, I don't want it to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TUIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try-
=XLOOKUP(D2,A2:A,B2:B)
Or FILTER() function like-
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables in order to include in the second argument all the potential values and their significances
This is the formula, that worked for me (for anybody's reference):
I created another reference sheet, stating the types of work & their significance. From that sheet, I'm using either vlookup, filter, xlookup.Using gforms for inputting my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))

AWS GroundTruth text labeling - hide columns in the data, and checking quality of answers

I am new to SageMaker. I have a large csv dataset which I would like labelled:
sentence_id
sentence
pre_agreed_label
148392
A sentence
0
383294
Another sentence
1
For each sentence, I would like a) a yes/no binary classification in response to a question, and b) on a scale of 1-3, how obvious the classification was. I need the sentence id to map to other parts of the dataset, and will use the pre-agreed labels to assess accuracy.
I have identified SageMaker GroundTruth labelling jobs as a possible way to do this. Is this the best way? In trying to set it up I have run into a few problems.
The first problem is I can't find a way to display only the sentence column to the labellers, hiding the sentence_id and pre_agreed_labels.
The second is that there is either single labelling or multi labelling, but I would like a way to have two sets of single-selection labels:
Select one for binary classification:
Yes
No
Select one for difficulty of classification:
Easy
Medium
Hard
It seems as though this can be done using custom HTML, but I don't know how to do this - the template it gives you doesn't even render
Finally, having not used mechanical turk before, are there ways of ensuring people take the work seriously and don't just select random answers? I can see there's an option to have x number of people answer the same question, but is there also a way to put in an obvious question to which we already have a 'pre_agreed_label' every nth question, and kick people off the task if they get it wrong? There also appears to be a maximum of $1.20 per task which seems odd.

Optimizing / speeding up calculation time in Google Sheets

I have asked a few questions related to this personal project of mine already on this platform, and this should be the last one since I am so close to finishing. Below is the link to a mock example spreadsheet I've created, which mimics what my actual project does but it contains less sensitive information and is also smaller in size.
Mock Spreadsheet
Basic rundown of the spreadsheet:
Pulls data from a master schedule which is controlled/edited by another party into the Master Schedule tab.
In the columns adjacent to the imported data, an array formula expands the master schedule by classroom in case some of the time slots designate multiple rooms. Additional formulas adjust the date, start time, and end time to be capped within the current day's 24-hour period. The start time of each class is also made to be an hour earlier.
In the Room Schedule tab, an hourly calendar is created based on the room number in the first column, and only corresponds to the current day.
I have tested the spreadsheet extensively with multiple scenarios, and I'm happy with how everything works except for the calculation time. I figured the two volatile functions I use would take some processing time just by themselves, and I certainly didn't expect this to be lightning-fast especially without using a script, but the project that I am actually implementing this method for is much larger and takes a very long time to update. The purpose of this spreadsheet is to allow users to find an open room and "reserve" it by clicking the checkbox next to it (which will consequently color the entire row red) allowing everyone else to know that it is now taken.
I'd like to know if there is any way to optimize / speed up my spreadsheet, or to not update it every time a checkbox is clicked and instead update it "manually", similar to what OP is asking here. I am not familiar with Apps Script nor am I well-versed in writing code overall, but I am willing to learn - I just need a push in the right direction since I am going into this blind. I know the number of formulas in the Room Schedule tab is probably working against me yet I am so close to what I wanted the final product to be, so any help or insight is greatly appreciated!
Feel free to ask any questions if I didn't explain this well enough.
to speed up things you should avoid usage of the same formulae per each row and make use of arrayformulas. for example:
=IF(AND(TEXT(K3,"m/d")<>$A$1,(M3-L3)<0),K3+1,K3+0)
=ARRAYFORMULA(IF(K3:K<>"",
IF((TEXT(K3:K, "m/d")<>$A$1)*((M3:M-L3:L)<0), K3:K+1, K3:K+0), ))
=IF(AND(TEXT(K3,"m/d")=$A$1,(M3-L3)<0),TIMEVALUE("11:59:59 PM"),M3+0)
=ARRAYFORMULA(IF(K3:K<>"",
IF((TEXT(K3,"m/d")=$A$1)*((M3-L3)<0), TIMEVALUE("11:59:59 PM"), M3:M+0), ))

SAP JCo 3 RFC RSAQ_REMOTE_QUERY_CALL - unexpected results

We’re using JCo 3.0 to connect to RFCs and read data from SAP R/3. We use one RFC RFC_READ_TABLE often and use a second custom RFC to read employee information. My questions revolve around a third RFC RSAQ_REMOTE_QUERY_CALL. I'm calling an ad-hoc query I built in SAP using this RFC but I’m not getting the expected results. The main problem is that it appears that SAP is ignoring one of my selection criteria and using what was saved in SAP when I originally built it. The date criterion stored in my ad-hoc is 6/23/2013. If I pass in 6/28/2013 from JCo, I get the same results as if I had passed 6/23/2013 from JCo.
We have built several ad-hoc queries whose only criteria is a personnel number and call them successfully using RFC RSAQ_REMOTE_QUERY_CALL.
Background on my ad-hoc query: reporting period of today, joining together four aspects of an employee’s information: their latest action (hire, rehire, etc.), organization (e.g. company), pay (e.g. pay scale level) and communication (e.g. email). The query will run every workday.
Here are my questions:
My ad-hoc has three selection criteria. The first two are simple strings. The third is a date. The date will vary each time the query runs. We are referencing the first criteria using SP$00001, the second with SP$00002 and the third with SP$00003. The order of the criteria changes from the ad-hoc to SQ01 (what was SP$00001 in the ad-hoc is now SP$00003). Shouldn’t we reference them in the order defined in the ad-hoc (e.g. SP$00001)?
The two simple string selections are using OPTION “EQ”. The date criteria is using OPTION GT (greater than). Is “GT” correct?
We have some limited accessibility to SAP. Is there a way to see which SP$ parameters are mapped to which criteria?
If my ad-hoc was saved with five criteria but four of them never change when I call the ad-hoc from JCo, do I just need to set the value of the one or do I need to set the other four as well?
Do I have to call this ad-hoc using a variant (function.getImportParameterList().setValue(“VARIANT”, “VARIANT_NAME”))?
Does the Reporting Period have an impact on the date criteria? I have tried changing the Reporting Period to be PNPBEGDA = today and PNPENDDA = today and noticed no change.
Is there a way in SAP to get a “declaration” of your ad-hoc (name, inputs, outputs, criteria)? I have looked at JCoFunction.toXml() and JCoFunctionTemplate. These are good if you want to see something at runtime before it goes to SAP, but I’m looking for something I can use on the JCo end to help me write Java code that matches the ad-hoc.
I have looked at length on the web for answers to my questions and have not found anything that is useful. If there is anything which would help me, please let me know.
Thanks,
LM
Since I don't know much about SQnn, I won't be able to answer all of your questions...
I don't know, sorry.
It should be, at least it's the usual operator for greater than.
Yes - set an external breakpoint right inside the function module and trace its execution while performing the RFC call. Warning: At least basic ABAP knowledge required.
I don't know, sorry.
I don't know either, sorry.
That would depend on the query, I suspect...
JCo won't be able to help you out there - it doesn't know about queries, it only knows function modules. There might be other RSAQ_* function modules to get that information though.
I played with setting up a variant in SQ01 for my query. I added some settings in the variant that solved my problem and answered several of my questions in my post. The main thing I did was add a dynamically calculated date as part of my criteria. Here's how:
1. In SQ01, access menu "Go To" -> "Maintain Variants".
2. Choose your variant and in subobjects, choose "Attributes" and click "Change".
3. In the displayed list, find your date criterion.
4. Choose "D" in Selection Variable, choose a comparison option (mine was GT for greater than), and a "Name of a Variable" (really, this is the type of dynamic date calculation you need).
5. Go back to the Subobjects panel, choose "Values" and click "Change".
6. Enter any other criteria you need in the "Program selections" section.
7. Save the variant.
By doing this, I don't need to pass anything into the query from JCo. Also, SAP will automatically update the date criteria you entered in step #4 above.
So to to answer my questions from my original post:
1 and 4. It doesn't matter because I'm no longer passing anything in from JCo.
2. "GT" is Greater Than.
3 and 7. If anyone knows, I'd really like to find out.
5. Use the name you as it is in SAP (step #2 above).
6. I still don't know, but it's not holding me up.
I'm posting this in case anyone out there needs this type of information. Thanks to Esti and vwegert for helping me out.

Collaborative Filtering: Ways to determine implicit scores for products for each user?

Having implemented an algorithm to recommend products with some success, I'm now looking at ways to calculate the initial input data for this algorithm.
My objective is to calculate a score for each product that a user has some sort of history with.
The data I am currently collecting:
User order history
Product pageview history for both anonymous and registered users
All of this data is timestamped.
What I'm looking for
There are a couple of things I'm looking for suggestions on, and ideally this question should be treated more for discussion rather than aiming for a single 'right' answer.
Any additional data I can collect for a user that can directly imply an interest in a product
Algorithms/equations for turning this data into scores for each product
What I'm NOT looking for
Just to avoid this question being derailed with the wrong kind of answers, here is what I'm doing once I have this data for each user:
Generating a number of user clusters (21 at the moment) using the k-means clustering algorithm, using the pearsons coefficient for the distance score
For each user (on demand) calculating their a graph of similar users by looking for their most and least similar users within their cluster, and repeating for an arbitrary depth.
Calculating a score for each product based on the preferences of other users within the user's graph
Sorting the scores to return a list of recommendations
Basically, I'm not looking for ideas on what to do once I have the input data (I may need further help with that later, but it's not the point of this question), just for ideas on how to generate this input data in the first place
Here's a haymaker of a response:
time spent looking at a product
semantic interpretation of comments left about the product
make a discussion page about a product, brand, or product category and semantically interpret the comments
if they Shared a product page (email, del.icio.us, etc.)
browser (mobile might make them spend less time on the page vis-à-vis laptop while indicating great interest) and connection speed (affects amt. of time spent on the page)
facebook profile similarity
heatmap data (e.g. à la kissmetrics)
What kind of products are you selling? That might help us answer you better. (Since this is an old question, I am addressing both #Andrew Ingram and anyone else who has the same question and found this thread through search.)
You can allow users to explicitly state their preferences, the way netflix allows users to assign stars.
You can assign a positive numeric value for all the stuff they bought, since you say you do have their purchase history. Assign zero for stuff they didn't buy
You could do some sort of weighted value for stuff they bought, adjusted for what's popular. (if nearly everybody bought a product, it doesn't tell you much about a person that they also bought it) See "term frequency–inverse document frequency"
You could also assign some lesser numeric value for items that users looked at but did not buy.