Multimodal Deep Learning

The most common method in practice is to combine high-level embeddings from the different inputs by concatenating them and then applying a softmax.Example of Multimodal deep learning where different types of NN are used to extract featuresThe problem with this approach is that it would give an equal importance to all the sub-networks / modalities which is highly unlikely in real-life situations.All Modalities have an equal contribution towards predictionWeighted Combination of NetworksWe take a weighted combination of the subnetworks so that each input modality can have a learned contribution(Theta) towards the output prediction.Our optimization problem becomes -Loss Function after Theta weight is given to each sub-network.The output is predicted after attaching weights to the subnetworks.But the use of all this!!Let's get to the point where I start bragging about the results.Accuracy and InterpretabilityWe achieve state-of-the-art results in two real-life multimodal datasets -Multimodal Corpus of Sentiment Intensity(MOSI) dataset —Annotated dataset 417 of videos per-millisecond annotated audio features..There is a total of 2199 annotated data points where sentiment intensity is defined from strongly negative to strongly positive with a linear scale from −3 to +3.The modalities are -Text2..Audio3..SpeechAmount of contribution of each modality on sentiment predictionTranscription Start Site Prediction(TSS) dataset — Transcription is the first step of gene expression, in which a particular segment of DNA is copied into RNA (mRNA)..The transcription start site is the location where transcription starts..The different parts of the DNA fragment have different properties which affect its presence..We divided the TSS into three parts -Upstream DNADownstream DNATSS regionWe achieved an unprecedented improvement of 3% over the previous state-of-the-art results..The downstream DNA region with the TATA box has the most influence on the process.We also performed experiments on synthetically generated data to verify our theory.Now we are in the process of drafting a paper to be submitted in a ML journal..For checking state-of-the-art results on single modality follow you are interested to know about the mathematical details or scope of multimodal learning, in general, ping me on on the work are welcome.. More details

Leave a Reply