Would you buy insights from this guy? (How to assess and manage a Data Science vendor)

If the vendor evades detail by citing proprietary IP, there is no ‘There’ there.

Intellectual Property that is so basic that it can be stolen in a 15-minute dialogue is not very impressive.

Also, the specific algorithms proposed may or may not have relevance or value to your business problems.

A “neural network” is a real thing, quite well-defined over decades of research.

“Natural Language Processing” simply means “we extract information from text” and can refer to technology as simple as keyword matches.

“Cognitive” is an adjective.

A sound technical approach is a necessary, but of course not a sufficient, condition for success.

For example, commercially successful fraud detection solutions have employed a wide variety of advanced algorithms, including anomaly detection, network analysis, graph theory, cluster analysis, number theory, decision trees, neural networks, linear programming, and Kalman filtering.

Figure 1 compares the performance of two real-time fraud solutions.

The incumbent solution (blue) combines expert rules with optimized decision trees.

The challenger solution (yellow) combines temporal signal processing, NLP, and neural networks.

(A third approach, using a cognitive linear programming solution, failed to improve on the legacy solution and was dropped from the competition.) The challenger solution has nearly double the statistical performance of the incumbent, which directly translates into a 100% reduction in either fraud losses or operating costs.

All three approaches have credible analytical and theoretical foundations; the only way to resolve which approach is superior is an empirical test.
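
Such an empirical test can be quite simple. The sketch below assumes you hold a labelled historical holdout scored by both solutions, and compares the share of all fraud each one catches when only a fixed fraction of transactions can be sent for review. The function name, the toy scores, and the choice of "detection rate at a fixed alert rate" as the yardstick are illustrative assumptions. They are not the metric behind Figure 1.

```python
from typing import Sequence


def detection_rate_at_alert_rate(
    scores: Sequence[float], is_fraud: Sequence[bool], alert_rate: float
) -> float:
    """Share of all fraud cases caught when only the top `alert_rate`
    fraction of transactions (highest scores first) is sent for review."""
    if not 0 < alert_rate <= 1:
        raise ValueError("alert_rate must be in (0, 1]")
    total_fraud = sum(1 for f in is_fraud if f)
    if total_fraud == 0:
        return 0.0
    n_alerts = max(1, round(alert_rate * len(scores)))
    # Rank transaction indices by descending score; ties keep input order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    caught = sum(1 for i in ranked[:n_alerts] if is_fraud[i])
    return caught / total_fraud


# Toy labelled holdout: the same ten transactions scored by both solutions (made-up numbers).
labels = [True, False, False, True, False, False, False, True, False, False]
incumbent = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1, 0.3, 0.4, 0.5, 0.05]
challenger = [0.95, 0.3, 0.2, 0.85, 0.4, 0.1, 0.2, 0.7, 0.35, 0.05]

for name, s in [("incumbent", incumbent), ("challenger", challenger)]:
    print(name, detection_rate_at_alert_rate(s, labels, alert_rate=0.3))
```

On this toy data, the incumbent's top 30% of alerts contains one of the three frauds and the challenger's contains all three. Fixing the alert budget keeps the comparison fair, because both solutions are judged at the same operating cost.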

Figure 1: Performance Comparisons: don’t leave home without it.

Do they have an ROI or statistical performance comparison in their case studies? There really is no excuse for not being able to quote performance.

Data science implies a disciplined, empirical approach to business problems.

Performance and business benefit over business-as-usual (BAU) practice can be calculated directly on the data or, if necessary, tested in a champion/challenger live rollout.
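
The sketch below shows one way to read out a champion/challenger rollout. It assumes each arm's outcome is a simple rate, such as conversions or frauds caught per account, and that the samples are large enough for a normal approximation. It uses a standard two-proportion z-test. The function name and the counts are hypothetical.

```python
import math


def two_proportion_z_test(
    hits_champion: int, n_champion: int, hits_challenger: int, n_challenger: int
) -> tuple[float, float]:
    """Two-sided z-test for a difference between two rates.

    Returns (z, p_value); positive z means the challenger's rate is higher.
    """
    p_champ = hits_champion / n_champion
    p_chall = hits_challenger / n_challenger
    # Pooled rate under the null hypothesis that both arms perform the same.
    p_pool = (hits_champion + hits_challenger) / (n_champion + n_challenger)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_champion + 1 / n_challenger))
    if se == 0:
        raise ValueError("rates of exactly 0 or 1 in both arms: test is undefined")
    z = (p_chall - p_champ) / se
    # Standard normal CDF via the error function: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value


# Hypothetical rollout: 10,000 accounts per arm.
z, p = two_proportion_z_test(500, 10_000, 600, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

In this made-up rollout, a lift from 5.0% to 6.0% on 10,000 accounts per arm gives z of roughly 3.1, well past conventional significance thresholds. The business benefit then follows from multiplying the rate difference by the value per outcome.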

Were the case studies performed on real client data? A “no” here should rightly raise alarm, yet a surprising number of solutions on the market have never actually been tested on live data, or were developed on ancillary or unrelated datasets.

To a data scientist, this is literally inconceivable, but solutions built on “synthetic data” are common in legacy software companies, as the focus historically has been on establishing a standardized API, rather than extracting value from the data.

Many firms do not even secure access to their clients’ data, so they literally cannot validate whether their solution can deliver value.

Such systems are often essentially rules engines, and can severely limit the sophistication and value of downstream decisioning technologies.

Can they provide a reference site? A frustrating fact of life is that customers are not always willing to serve as a reference site.

Given the sensitivity of some projects, this is understandable.

However, if a promising vendor does not have an “Alpha” deployment and its technical approach and team seem credible, you have a unique opportunity to negotiate price.

Serving as a public reference site and granting data research rights are both assets that can be traded for services.

Entering into a co-development agreement allows you to build out bespoke new functionality at a discount.

At many of the large consultancies, the Data Science teams have been running at a loss for many years, and they will be eager to publicly prove their bona fides, as well as to gain access to a real dataset for research and product improvements.

Ideally, a data science engagement should generate three deliverables: a diagnostic, a proof of value, and an implementation plan.

A steering committee should be convened to review each deliverable in the course of the engagement.

Establish a steering/review committee.

From the outset, the project should be overseen by a standing committee of the key stakeholders (typically the P&L owner, the line-of-business or product owner, the operations lead, and an analytics lead).

Internal IT team leaders can be used to conduct due diligence, but IT departments typically do not have data science skills. If they don’t understand the mission or the technologies being used, they can ‘cost’ a project out of existence by inflating the implementation cost estimates, no matter how trivial the implementation.

Analytics teams can actively or passively sabotage an objective test through non-cooperation.

Another route is to engage a third-party advisor to conduct vendor due diligence.

Define business objectives and performance metrics.

As much as practicable, the engineering objective should be defined by the business metrics (profit, revenue, costs, losses, incidence rates, conversion rates, and so on).

Clear metrics also simplify due diligence, establishing concrete client expectations and clear success criteria for the vendor.

Two examples of poor choices for proof-of-concept goals are predicting customer attrition or creating a customer segmentation.

Neither of these efforts has any direct business benefit.

(An attrition model just predicts that you are going to lose certain customers, not what to do about it.) Both use cases can be delivered quickly by moderately skilled analysts; hence, neither use case serves as a strong test of the vendor’s competence or technology.

Any business outcome or KPI can be used as a target, and if it can be measured, it can be predicted.

In a full data diagnostic, the information value of current and potential data sources can be measured against these metrics.
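
The source does not say how information value is measured. One common convention, borrowed from credit scoring, is the weight-of-evidence Information Value (IV) of a binned variable against a binary outcome. The sketch below assumes that convention. The function name and the 0.5 smoothing constant are illustrative choices.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def information_value(
    feature_bins: Sequence[Hashable], target: Sequence[int], smoothing: float = 0.5
) -> float:
    """Weight-of-evidence Information Value of a binned feature against a
    binary target (1 = event, e.g. fraud or churn; 0 = non-event)."""
    events = Counter(b for b, y in zip(feature_bins, target) if y == 1)
    non_events = Counter(b for b, y in zip(feature_bins, target) if y == 0)
    bins = set(events) | set(non_events)
    # Additive smoothing avoids log(0) for bins with no events or no non-events.
    total_events = sum(events.values()) + smoothing * len(bins)
    total_non = sum(non_events.values()) + smoothing * len(bins)
    iv = 0.0
    for b in bins:
        pct_event = (events[b] + smoothing) / total_events
        pct_non = (non_events[b] + smoothing) / total_non
        woe = math.log(pct_non / pct_event)  # weight of evidence for this bin
        iv += (pct_non - pct_event) * woe
    return iv


# Hypothetical candidate source: a bin that separates events well, and one that does not.
informative = information_value(["a"] * 10 + ["b"] * 10, [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8)
uninformative = information_value(["a", "a", "b", "b"], [1, 0, 1, 0])
print(informative, uninformative)
```

Each bin's term is non-negative, so IV is zero only when every bin has the same event mix as the population. That property lets current and candidate data sources be ranked on a common scale against the chosen business outcome.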

Even the value of an “Art of the Possible” POC can be simply and clearly stated in terms of cost reduction or revenue opportunities identified.

Schedule a “Go/No-Go” review early in the project.

“No plan survives contact with the data,” an aphorism adapted from Helmuth von Moltke the Elder, is rarely proven wrong.

Within 2–3 weeks of providing access to the data, an interim review should be scheduled to assess the preliminary results.

By this time, the vendor should have been able to verify whether the objective is supported by the data and to provide a minimum performance guarantee.
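
A vendor can back a minimum performance commitment at this review with a conservative confidence bound rather than a point estimate. The sketch below assumes per-case outcomes from the diagnostic holdout, for example 1 if a flagged case was confirmed as fraud and 0 otherwise. The percentile bootstrap, the function name, and the default settings are illustrative assumptions, not a method the article prescribes.

```python
import random
import statistics
from typing import Sequence


def bootstrap_lower_bound(
    values: Sequence[float], confidence: float = 0.95, n_resamples: int = 2000, seed: int = 0
) -> float:
    """One-sided percentile-bootstrap lower bound for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the bound is reproducible
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples))
    # The (1 - confidence) quantile of the resampled means is the lower bound.
    return means[int((1 - confidence) * n_resamples)]


# Hypothetical holdout: 1 = flagged case confirmed as fraud, 0 = false alarm.
outcomes = [1] * 62 + [0] * 38
print(bootstrap_lower_bound(outcomes))
```

Quoting the lower bound instead of the observed 62% hit rate gives the steering committee a figure that the vendor is unlikely to miss once the solution goes live.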

On the other hand, in the course of analysis, the vendor may have discovered and recommended alternative objectives and priorities.

This initial report, sometimes called a ‘Diagnostic’ or a ‘Sizing and Opportunity Analysis’, should be viewed as a deliverable in and of itself.

Often, deep empirical analyses of efficiencies, performance drivers, and root causes produce value-added recommendations of policies and processes that do not require a predictive analytic solution.

In this sense, such ‘actionable insights’ are a bonus: collateral benefits of a data science engagement.

While there is no guarantee that such ‘quick fixes’ exist, typically the benefits of implementing these recommendations can exceed the entire project cost.

After reviewing the results, the steering committee can then decide to abort the project, re-commit or reprioritize the objectives, and create a plan to exploit and test the learnings to date.

From this point on, there should be no doubt whether the approach will deliver value; there should only be questions regarding the magnitude of the benefits and implementation costs.

Last, beware of expensive implementation plans.

Up front, sometimes only indicative implementation costs can be estimated, as many of the constraints and the data and infrastructure requirements are discovered or established in the process of building a prototype.

Whether the vendor is selling SaaS, a platform, or an on-premises installation, reasonable provisions should be established early for the cost of expanded functionality, including the ability to install third-party and in-house solutions.

Some firms will try to recoup their costs at this stage, by overselling platforms and infrastructure.

Several industries rely heavily on decades-old decision engines.

Replacing these systems is an expensive proposition and often not necessary for data science delivery.

An infrastructure-agnostic scoring engine can be used to create customer decisions, which in turn can be ‘pushed’ into legacy decision engines, loaded as a table into a database, or fed into existing BI tools.
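
The sketch below illustrates the "load decisions as a table" route. A standalone scoring step writes one decision row per customer into a plain table that a legacy engine or BI tool could read. SQLite stands in for whatever database the legacy systems actually use. The table name, the scoring function, its feature names, and the threshold are all hypothetical.

```python
import sqlite3
from typing import Callable, Iterable


def publish_decisions(
    conn: sqlite3.Connection,
    customers: Iterable[dict],
    score_fn: Callable[[dict], float],
    threshold: float,
) -> int:
    """Score customers and write (customer_id, score, decision) rows to a plain
    table that a legacy decision engine or BI tool could read."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_decisions ("
        "customer_id TEXT PRIMARY KEY, score REAL, decision TEXT)"
    )
    rows = []
    for c in customers:
        s = score_fn(c)
        rows.append((str(c["customer_id"]), s, "REVIEW" if s >= threshold else "APPROVE"))
    # INSERT OR REPLACE lets the engine be re-run without touching upstream systems of record.
    conn.executemany("INSERT OR REPLACE INTO customer_decisions VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def toy_score(c: dict) -> float:
    # Hypothetical linear score, for illustration only.
    return 0.7 * c["txn_velocity"] + 0.3 * c["new_payee"]


conn = sqlite3.connect(":memory:")
publish_decisions(
    conn,
    [
        {"customer_id": "C1", "txn_velocity": 0.9, "new_payee": 1.0},
        {"customer_id": "C2", "txn_velocity": 0.1, "new_payee": 0.0},
    ],
    toy_score,
    threshold=0.5,
)
print(conn.execute("SELECT * FROM customer_decisions ORDER BY customer_id").fetchall())
```

The legacy side only ever reads a table. The scoring logic can therefore be retrained or replaced without changing the downstream systems.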

This minimally invasive approach, working in parallel with production data flows or systems of record, is both the fastest course to value and the lowest cost.

Enhancements and added functionality are relatively painless, as the Data Science delivery team has ongoing ownership of the engine.

The potential of data science continues to be diluted by ill-conceived initiatives and pretender practitioners.

The keys to success are to conduct rigorous due diligence, define the business problem, establish clear metrics, and run a proof of value.

There is gold in these hills, but be careful of whom you choose to prospect with.


Reposted with permission.

 Bio: Russell Anderson (Director, Transaction Analytics Advisory) has over 25 years’ experience developing data science solutions in the financial services, retail, e-commerce, and biomedical industries.

He has served as Scientific Advisor for several prominent analytics firms, including IBM, KPMG, Opera Solutions, NICE/Actimize, HCL, HNC Software, Mastercard Europe, JP Morgan Chase, and Halifax Bank of Scotland.

He has a Ph.D. in Bioengineering from the University of California, has authored over 30 scientific publications, and holds several patents for commercial predictive solutions.
