Can Adobe Target be trusted?
First off, who am I and why am I asking this question?
I’m Dave, and I’ve been involved in experimentation (online A/B tests) for the best part of 15 years. The company I work for wants to explore using Adobe Target for A/B testing.
A wee bit of backstory:
Chance put me into a company you might have heard of, booking.com.
Little did I know back then what a revolutionary business experience I’d landed in, in many ways, with the culture of experimentation being at the core of it all.
Being a bit naive and still fresh-faced in the NL, I moved on to other companies with this new-found knowledge. I set out to spread this way of working in each and every role.
Pivotal down-the-rabbit-hole moment.
One, two, skip a few (jobs), and fast forward to the present day, where the love for the subject has grown exponentially. If you’re just getting started on this topic, prepare yourself: it goes VERY deep.
When it comes to A/B testing tools, getting the choice right is essential. Many businesses don’t have enough knowledge to ask the right questions, so they never find out whether the tool that is meant to help them make critical decisions is actually reliable.
Having used VWO, Optimizely, Maxymiser, Google Optimise, SiteSpect and Webtrends Optimise, I’ve racked up quite some experience and learned along the way what questions to ask and where the pitfalls of these individual platforms lie. From caps on traffic (go over and you’ll have to pay), to winners being called way too early, to delays in cron jobs of up to 12 hours before you can see results (what if the test is killing conversion?), to bloated applications claiming to be useful UX insight generators (but with no means of exporting the data).
Of this list, I’d say SiteSpect is the tool of choice; my only qualm with it would be the intuitiveness of the interface, though I’m told a lot of work has been done on that in recent years. (They are super good at customer support, though.)
Enter Adobe Target.
Am I biased? Of course. But opinions need to be challenged, even our own, so I’m aiming to enter this with as little bias as possible. So why are we at this juncture?
This is the platform our parent company is using (via an agency, which is another bugbear of mine: a lot of CRO agencies don’t know what they are doing). As the resident ‘subject-matter expert’, I’ve been asked to provide some questions for the process of evaluating whether the platform is suitable for our needs.
Having just witnessed Google Optimise fall into its grave (hurrah!), here we are, faced with a platform that is rumoured to use the same algorithm GA uses to calculate sessions: HyperLogLog++ (thanks to Georgi Georgiev for flagging this).
Rumoured.
We have to know this as fact before deciding to use a tool whose traffic distribution fluctuates by +/-2%. As explained in the article above, this has a major impact on statistical significance (the p-value). It can quite literally be the difference between true and false. But who’d know, if we can’t get a definitive answer on whether the tool uses this method of traffic distribution?
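To make that concrete, here’s a rough sketch in Python. The numbers are entirely made up (they are not Adobe’s or Google’s figures), but they show how a mere 2% wobble in a reported visitor count can drag the same result from “significant winner” to “nothing there”, or the other way:

```python
# Hypothetical illustration: how a +/-2% error in a visitor count moves a p-value.
from math import sqrt, erfc

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a standard two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

# As reported: 100,000 visitors per arm, a small uplift in B.
print(two_proportion_p_value(10_000, 100_000, 10_270, 100_000))  # ~0.045 -> "winner"

# Same conversions, but A's visitor count is off by just 2% in either direction.
print(two_proportion_p_value(10_000, 98_000, 10_270, 100_000))   # ~0.63   -> "no effect"
print(two_proportion_p_value(10_000, 102_000, 10_270, 100_000))  # ~0.0005 -> "clear winner"
```

Same conversions, same test, three different verdicts, purely because one denominator moved by 2%.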
So we asked.
Does AT use the HyperLogLog++ algorithm for traffic distribution?
“Adobe Target leverages different methods for traffic distribution for different activity types. For an A/B test, Target will distribute traffic randomly based on the users identified traffic split in the A/B test setup. For example, if a customer has an A/B/C/D test by default Target will allocate 25% of the traffic to each experience, however the user has the ability to change that allocation if they so choose. If an Auto-Allocate test is chosen then Target will leverage the Thompson Sampling method to place more visitors in the better performing experiences. If an Auto-Target or Automated Personalization activity is chosen, then Target will leverage an ensemble of algorithms (Thompson Sampling and Random Forest) to match each visitor to the experience best suited for them. For more details on how Target works, see: https://experienceleague.adobe.com/docs/target/using/introduction/how-target-works.html?lang=en”
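(A quick aside for anyone unfamiliar with the term: Thompson Sampling is a bandit-style method that deliberately shifts more traffic towards the experiences that appear to be winning. Below is a generic, textbook Beta-Bernoulli sketch I’ve written purely for illustration, with made-up conversion rates; it is not Adobe’s implementation.)

```python
# Generic Beta-Bernoulli Thompson Sampling allocator (illustrative only).
import random

class ThompsonSampler:
    def __init__(self, n_arms):
        # One Beta(successes + 1, failures + 1) posterior per experience.
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def choose(self):
        # Sample a plausible conversion rate for each arm; serve the highest.
        draws = [random.betavariate(s + 1, f + 1)
                 for s, f in zip(self.successes, self.failures)]
        return draws.index(max(draws))

    def update(self, arm, converted):
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

# Simulate 10,000 visitors against two hypothetical true conversion rates.
true_rates = [0.10, 0.12]
sampler = ThompsonSampler(len(true_rates))
assignments = [0, 0]
for _ in range(10_000):
    arm = sampler.choose()
    assignments[arm] += 1
    sampler.update(arm, random.random() < true_rates[arm])
print(assignments)  # the better arm ends up with the lion's share of traffic
```

The point of the aside: this kind of allocation skews the split on purpose, which is presumably why, as we’ll see below, SRM monitoring is only offered for the fixed-split activity types.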
The reply is, at best, an effort to not answer the question.
Information was given on how traffic can be split into percentages for A/B testing.
As far as we can tell, there are areas of AT that do work as expected, BUT when it comes to the A/B testing feature, alarm bells ring.
How did we arrive at this conclusion?
In a separate question about SRM (sample ratio mismatch), we wanted to know if there were any notifications in place to warn us of experiments falling foul of this peril.
The response included a link to a 3rd-party tool that can be added on:
https://www.miaprova.com/blog/sample-ratio-mismatch-srm/
Quote from the website:
Automated Personalization, Experience Targeting, and Auto-Allocate Activity types have logic to promote differences in sample distribution across Activity variants. MiaProva provides this service for all A/B Activities, Multivariate Activities, and Adobe Recommendations.
So, from what can be deduced from the above, the traffic distribution in A/B activities, MVT and Recommendations has an issue: Target itself won’t warn you about SRM there, and a third-party tool exists to fill that gap.
Now, we don’t know if this is attributable to HLL++, because that hasn’t been answered, but we do know the distribution can’t simply be trusted and requires external software before the tool could maybe be considered trustworthy.
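For what it’s worth, an SRM check doesn’t strictly need a vendor add-on if you can export the raw visitor counts per variant: a chi-square goodness-of-fit test against the intended split does the job. A minimal sketch, with hypothetical counts and a 50/50 intended split:

```python
# Minimal SRM check: chi-square goodness-of-fit of observed visitor counts
# against the intended allocation. Counts and split here are hypothetical.
from scipy.stats import chisquare

observed = [50_600, 49_400]          # visitors actually bucketed per arm
intended_split = [0.5, 0.5]          # allocation configured in the tool
expected = [sum(observed) * p for p in intended_split]

stat, p_value = chisquare(observed, f_exp=expected)
if p_value < 0.001:                  # deliberately strict threshold for SRM alarms
    print(f"Possible sample ratio mismatch (p = {p_value:.2e})")
else:
    print(f"No SRM detected (p = {p_value:.3f})")
```

Run it on the bucketed visitor counts, not on conversions: if the alarm goes off, the assignment itself is suspect, and so is any p-value computed on top of it. That a check this simple has to come from a third party is rather the point.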
Does this sound like the basis of a solid RCT (randomised controlled trial)?
Put it this way: I wouldn’t risk my business decision-making on it.