Robert Lee: The importance of data governance and provenance has never been as front and center as it is today.
Regulations around data privacy, data sharing, and 3rd-party use of data have been thrust into the limelight with recent events, and there is a real demand from just a data management perspective for better solutions to tracking data access and usage.
These issues are magnified considerably with machine learning because the correctness, accuracy, and biases of a model are derived from the data used to train it, not from an explicit algorithm.
The only way to explain and defend a model for completeness and against bias is to be able to maintain provenance for the data used to train it – to show that a self-driving car was in fact trained on a sufficient set of data from low-light conditions; to show that a facial recognition model was trained on a sufficiently diverse set of races and ethnicities.
Solving this problem is rooted in maintaining and archiving test data alongside rich enough metadata and data management catalogs to be able to recall and access those data as needed.
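As a rough illustration of the cataloging idea in this answer, the Python sketch below records a content hash and descriptive tags for each training sample, then reports what fraction of the catalog carries each tag (for example, low-light frames). The schema, the tag names, and the functions `register` and `tag_coverage` are hypothetical and chosen only for this example; a real catalog would likely live in a database or a dedicated metadata system.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SampleRecord:
    """One catalog entry: a content hash plus metadata (hypothetical schema)."""
    sha256: str
    source: str
    tags: frozenset = field(default_factory=frozenset)


def register(catalog, payload, source, tags):
    """Hash the raw bytes so the exact training input can be identified later."""
    record = SampleRecord(hashlib.sha256(payload).hexdigest(), source, frozenset(tags))
    catalog[record.sha256] = record
    return record


def tag_coverage(catalog):
    """Return the fraction of cataloged samples that carry each tag."""
    total = len(catalog)
    if total == 0:
        return {}
    counts = Counter(tag for rec in catalog.values() for tag in rec.tags)
    return {tag: n / total for tag, n in counts.items()}


# Made-up samples standing in for camera frames.
catalog = {}
register(catalog, b"frame-0001", "camera_a", {"daylight"})
register(catalog, b"frame-0002", "camera_a", {"low_light"})
register(catalog, b"frame-0003", "camera_b", {"low_light", "rain"})
register(catalog, b"frame-0004", "camera_b", {"daylight"})

coverage = tag_coverage(catalog)
print(coverage["low_light"])  # 0.5
```

A report like this only supports a claim of sufficient coverage if the tags themselves are accurate, so in practice the metadata needs the same archival care as the data.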
insideBIGDATA: What industries do you feel will make the best competitive use of AI, machine learning, and deep learning in the next year? Pick one industry and describe how it will benefit from embracing or extending its embrace of these technologies.
Robert Lee: Financials, industrial/manufacturing, transportation and retail are among the industries most poised to effectively compete with AI/ML/DL in the next year.
As an example, fraud or anomaly detection is an application commonly found in the financial services industry, but it is also applicable in many security and network use cases.
Fraud/anomaly detection is a great application for both supervised learning (where you can provide pre-labelled training data of fraudulent and non-fraudulent activity) and unsupervised learning (where you may not be able to predict or supply examples ahead of time of all types of fraud to be detected).
Because most activities are (by definition) not anomalous, unsupervised learning (essentially self-organizing or self-categorizing approaches) is an effective approach.
And once identified, an anomaly can be added to a continuous retraining process to further refine the model.
This combination of supervised and unsupervised learning and continuous retraining will help applications in the fraud/anomaly detection space stay one step ahead – and give a fighting chance to identify new types of fraud or attacks before they’ve been fully understood by humans.
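The loop described in this answer can be sketched in a few lines of Python. This is a toy under stated assumptions, not a production detector: the unsupervised step is a median/MAD outlier score, the "supervised" step is a simple comparison against previously labeled fraud amounts, and every flagged transaction is assumed to be confirmed by an analyst before being fed back. All function names, thresholds, and amounts are made up for illustration.

```python
import statistics


def fit_baseline(amounts):
    """Unsupervised step: summarize normal behavior without any labels."""
    median = statistics.median(amounts)
    mad = statistics.median([abs(a - median) for a in amounts])
    return median, mad or 1.0  # guard against a zero spread


def is_outlier(amount, baseline, threshold=5.0):
    """Flag amounts far from the median, measured in MAD units."""
    median, mad = baseline
    return abs(amount - median) / mad > threshold


def matches_known_fraud(amount, known_fraud, tolerance=0.05):
    """Supervised step (toy): compare against previously labeled fraud amounts."""
    return any(abs(amount - f) <= tolerance * f for f in known_fraud)


history = [20.0, 22.5, 19.0, 21.0, 23.0, 20.5, 18.5, 22.0]  # unlabeled activity
known_fraud = [999.0]                                       # labeled examples
baseline = fit_baseline(history)

flagged = []
for amount in [21.5, 1000.0, 640.0, 19.5]:
    if matches_known_fraud(amount, known_fraud) or is_outlier(amount, baseline):
        flagged.append(amount)
        # Continuous retraining: the new anomaly becomes a labeled example
        # (assumes a human has confirmed it).
        known_fraud.append(amount)
    else:
        # Normal activity refines the unsupervised baseline.
        history.append(amount)
        baseline = fit_baseline(history)

print(flagged)  # [1000.0, 640.0]
```

Here 1000.0 is caught because it resembles a labeled example, while 640.0 resembles nothing labeled and is caught only by the unsupervised score, after which it joins the labeled set.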
insideBIGDATA: As deep learning drives businesses to innovate and improve their AI and machine learning offerings, more specialized tooling and infrastructure will need to be hosted in the cloud. What’s your view of enterprises seeking to improve their technological infrastructure and cloud hosting processes for supporting their AI, machine learning, and deep learning efforts?
Robert Lee: Deep learning continues to be a data-intensive process and that shows no sign of slowing.
Data at scale has gravity and is hard to move, so in general we see that you’ll train where your data is, and you’ll need fast access to that data wherever that is.
Software tooling and GPU hardware are available and fairly portable both on-premise and in the public cloud – the third piece of the puzzle (fast, abundant storage to feed the data) is often not given enough thought and planning, to the detriment of the speed of development.
We see customers at larger scales tending to have large pools of data on-premise, and ultimately finding better performance and economics training next to that data.
insideBIGDATA: How will AI-optimized hardware solve important compute and storage requirements for AI, machine learning, and deep learning?
Robert Lee: AI is highly performance-oriented and any performance system is only as fast as its slowest link.
For AI, more performance = more iterations = ability to train and refine on more data = better results faster.
Deep learning, by its nature, benefits greatly from specialized compute hardware (GPUs) that can drive incredible parallelism for these specific types of calculations.
Modern GPUs (led by NVIDIA) have broken through previous CPU performance limitations, making it possible to process data faster.
This creates a need for optimized storage (and networking) to be able to feed data quickly to those GPUs and keep them busy and extract all of the performance they are capable of.
Without optimized solutions that address each of these pillars (compute, storage and networking), you are potentially left with an unbalanced and underperforming system – like putting an F1 engine in a Toyota without changing the gearbox, tires, etc.
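The "slowest link" point can be made concrete with a little arithmetic. The Python sketch below treats a training pipeline as a chain of stages and takes the minimum stage rate as the end-to-end throughput. The stage names and the samples-per-second figures are invented for illustration and are not measurements of any real system.

```python
def find_bottleneck(stage_rates):
    """Return the slowest stage and its rate; the chain cannot run faster than this."""
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Hypothetical sustained rates for each stage, in samples per second.
rates = {"storage": 2000.0, "network": 5000.0, "gpu": 8000.0}

bottleneck, rate = find_bottleneck(rates)
gpu_utilization = rate / rates["gpu"]  # share of GPU capacity actually used

print(bottleneck, rate, gpu_utilization)  # storage 2000.0 0.25
```

With these made-up numbers the GPUs sit idle three quarters of the time, so upgrading compute alone would not shorten training; only raising the storage rate would.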
insideBIGDATA: What’s the most important role AI plays for your company’s mission statement? How will you cultivate that role in 2019?
Robert Lee: AI aligns squarely with our core mission, which is to help innovators create a better world with data.
In 2018, we delivered AIRI, the first AI-ready converged compute/storage/networking solution, in partnership with NVIDIA.
We’ve helped many customers (from research institutions to government, financial services, social media, retail, and many others) deploy AI into production and we plan to continue evolving and building out solutions on this platform in 2019.
Additionally, as we talk to customers about how we can help them with productionizing AI projects, we also find customers who see the promise of AI but maybe aren’t prepared to go all-in on their first deep learning project.
We can help these customers prepare for that journey today and begin with the end in mind by building out better data management and cataloging practices, evolving their infrastructure to be prepared, and implementing and accelerating other parts of their analytics practice (such as data warehousing or scale-out analytics).