While all of this may seem complex and data science-led, we cannot overstate the role of the domain expert.
While all errors are mathematically equal, some errors can be more damaging to the company’s finances and reputation than others.
Domain experts play a critical role in understanding the impact of these errors.
Domain experts also help lay out industry best practices, understand customer expectations, and ensure adherence to regulatory requirements.
For example, even if the chatbot is 100% confident that the user has asked for a renewal of a relatively inexpensive service, the call may need to be routed to a human for regulatory compliance purposes depending on the nature of the service.
Repeatability and Reproducibility: Consistency in Labeled Data for Accurate AI Systems

One of the final steps is to have a relevant subset of data labeled by human experts in a consistent manner.
At the vast scale of Big Data, we are talking about obtaining labels for hundreds of thousands of samples.
This would require a huge team of human experts to provide the labels. A more efficient approach is to sample the data in such a manner that only the most diverse set of samples is sent for labeling.
One of the best ways to do this is to use stratified sampling.
Domain experts will need to analyze which data dimensions get used for the stratification.
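As a minimal sketch of stratified sampling, the snippet below draws the same fraction from every stratum so that rare segments still make it into the labeling set. The `stratified_sample` function, the `service` field, and the call records are all hypothetical illustrations, not part of any specific library.

```python
import random
from collections import defaultdict

def stratified_sample(records, strata_key, frac, seed=42):
    """Sample the same fraction from every stratum so that rare
    segments are still represented in the labeling set."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[strata_key]].append(rec)
    sample = []
    for items in by_stratum.values():
        k = max(1, round(len(items) * frac))  # at least one per stratum
        sample.extend(rng.sample(items, k))
    return sample

# Hypothetical call records, stratified by service type: cancellations
# are rare, but stratification guarantees they get labeled too.
calls = (
    [{"id": i, "service": "renewal"} for i in range(90)]
    + [{"id": i, "service": "cancellation"} for i in range(90, 100)]
)
picked = stratified_sample(calls, "service", frac=0.1)
```

With plain random sampling at 10%, the rare "cancellation" stratum could easily be missed entirely; here it is guaranteed at least one slot.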
Consistency in human labels is trickier than it may seem at first.
If the existing automated techniques for label generation were 100% accurate, there would be no need to train newer machine learning algorithms, and hence no need for human-labeled training samples (e.g., we would not need manual transcription of speech if speech-to-text systems were 100% accurate).
At the same time, if there is no subjectivity in human labeling, then it is just a matter of tabulating the list of steps that the human expert has followed and automating those steps.
Almost all practical machine learning systems need training because they are not able to adequately capture the various nuances that humans apply in coming to a particular decision.
Thus, there will be a certain level of inherent subjectivity in the human labels that can’t be done away with.
The goal, however, should be to design label-capturing systems that minimize avenues for ‘extraneous’ subjectivity.
For example, if we are training a machine learning system to predict emotion from speech, the human labels will be generated by playing the speech signals and asking the human labeler to provide the predominant emotion.
One way to minimize extraneous subjectivity is to provide a drop-down of the possible emotion labels instead of letting the human labeler enter inputs as free-form text.
Similarly, even before the first sample gets labeled, there should be a normalization exercise among the human experts where they agree on the interpretation of each label (e.g., what is the difference between 'sad' and 'angry'?).
An objective way to measure this subjectivity is a 'repeatability and reproducibility (R&R)' analysis.
Repeatability measures the impact of temporal context on human decisions.
It is computed as follows:

- The same human expert is asked to label the same data sample at two different times.
- The proportion of the times the expert agrees with themselves is called repeatability.

Reproducibility measures how consistently the labels can be replicated across experts.
It is computed as follows:

- Two or more human experts are asked to label the same data samples in the same setting.
- The proportion of the times the experts agree among themselves is called reproducibility.

Conducting R&R evaluations on even a small scale of data can help identify process improvements as well as help gauge the complexity of the problem.
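Both metrics reduce to the same agreement computation; only the setup differs (one expert across time vs. multiple experts in one session). A minimal sketch, with made-up emotion labels for illustration:

```python
def agreement(labels_a, labels_b):
    """Proportion of samples on which two label lists agree."""
    assert len(labels_a) == len(labels_b)
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Repeatability: the SAME expert labels the same 5 clips in two sessions.
session_1 = ["sad", "angry", "neutral", "sad", "happy"]
session_2 = ["sad", "angry", "sad",     "sad", "happy"]
repeatability = agreement(session_1, session_2)  # 4/5 = 0.8

# Reproducibility: TWO experts label the same 5 clips in the same session.
expert_a = ["sad", "angry", "neutral", "sad", "happy"]
expert_b = ["sad", "sad",   "neutral", "sad", "happy"]
reproducibility = agreement(expert_a, expert_b)  # 4/5 = 0.8
```

In practice you would compute these over many samples and, for more than two experts, average the pairwise agreements (or use a chance-corrected statistic such as Cohen's kappa).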
Active Learning for Efficient and More Accurate AI Systems

Machine learning is typically 'passive'.
This means that the machine doesn't proactively ask for human labels on the samples about which it is most confused.
Instead, the machines are trained on labeled samples that are fed to the training algorithms.
A relatively new branch of machine learning called Active Learning tries to address this.
It does so by:

- First training a relatively simple model with limited human labels, and then
- Proactively highlighting only those samples where the model's prediction confidence is below a certain threshold.

Human labels are then sought on priority for such 'confusing' samples.
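The selection step above can be sketched in a few lines: given any model that returns class probabilities, route the low-confidence samples to human labelers. The `select_confusing` function and the toy two-class model are hypothetical, standing in for whatever simple model was trained in the first step.

```python
def select_confusing(samples, predict_proba, threshold=0.7):
    """Return samples whose top predicted-class probability falls
    below the threshold -- these go to human labelers first."""
    confusing = []
    for s in samples:
        probs = predict_proba(s)
        if max(probs) < threshold:
            confusing.append(s)
    return confusing

# Toy stand-in for a trained model's probability output on two classes:
# a sample is "confusing" when its two class scores are close.
def toy_predict_proba(x):
    return [x, 1.0 - x]

samples = [0.95, 0.60, 0.10, 0.55]
to_label = select_confusing(samples, toy_predict_proba, threshold=0.7)
# 0.95 and 0.10 are confident predictions; 0.60 and 0.55 are not
```

Only the uncertain samples (here, 0.60 and 0.55) are sent for labeling, which is exactly how active learning concentrates scarce human effort where it improves the model most.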
Diverse Data Science Team Composition is Critical for Success

For all the pieces to come together, we need an "all-rounder" data science team:

- It is absolutely critical that the team has a healthy mix of data scientists who are trained to think in a data-driven manner and who can connect the problem at hand with established machine learning frameworks.
- The team needs Big Data engineers with expertise in data pipelining and automation, who also understand, among other things, the various design factors that contribute to latency.
- The team also needs domain experts, who can truly guide the rest of the members, and the machine, to interpret the data in ways consistent with the end customer's needs.

End Notes

We covered quite a lot of ground here.
We discussed the nuances of translating a qualitative business requirement into tangible quantitative business requirements.
Reach out to me in the comments section below if you have any questions.
I would love to hear your experience on this topic.
In the third article of this series, we will discuss various deployment aspects as the data-driven product gets ready for real-world deployment.
So watch this space!