Embeddings-free Deep Learning NLP model

Loading large pre-trained embeddings may not always be feasible, so Ravi S. and Kozareva Z. introduced an embeddings-free deep learning model.

On-Device

We can easily allocate 1 GB of memory and 16 CPUs in the cloud or on-premise to deploy our shiny model.

In that setting we do not need to sacrifice model accuracy to shrink the model footprint, which is why mega-sized models are common in most enterprise-grade infrastructure.

Sometimes, however, we have no choice but to deploy the model to a device rather than leveraging cloud infrastructure.

Reasons can be:

Sensitive data: data may not be allowed to leave the device for the cloud.

Network: high-speed network coverage may not be available.

Therefore, a model with a very small footprint is needed if we deploy it to devices such as a smartwatch or an IoT device.

No one wants to load a 1 GB model on an Android Wear OS watch.

The challenges when deploying a model to a device:

Small memory footprint

Limited storage

Low computational capacity

Self-Governing Neural Networks

Since the target is deploying the model to a small device, it cannot be resource demanding.

Therefore, the objectives of SGNN are:

Tiny memory footprint: no initialisation from loading pre-trained embeddings.

On-the-fly: transform incoming text into low-dimensional features in real time.

Projection Neural Network

Instead of using an original neural network with a high footprint, Ravi S. and Kozareva Z. leverage a projection neural network architecture to reduce memory and computation consumption.

The idea is to train two neural networks (i.e. a Trainer Network and a Projection Network) in the training phase. The whole network optimises the trainer network loss, the projection network loss and the projection loss.
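As a rough sketch, the joint objective can be written as a weighted sum of the three losses. This is my own minimal NumPy illustration, not the paper's implementation: the lambda weights, the use of plain cross-entropy for each term, and the exact form of the loss tying the two networks together are all assumptions here.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(targets, preds):
    # mean cross-entropy between a target distribution and predictions
    return float(-(targets * np.log(preds + 1e-12)).sum(axis=-1).mean())

def combined_loss(trainer_logits, projection_logits, labels,
                  lam_t=1.0, lam_p=1.0, lam_tie=1.0):
    """Joint objective over both networks (assumed weighting):
    - trainer network loss vs. the true labels,
    - projection network loss vs. the true labels,
    - a projection loss tying the projection network's predictions
      to the trainer network's predictions."""
    n_classes = trainer_logits.shape[-1]
    y = np.eye(n_classes)[labels]          # one-hot true labels
    p_trainer = softmax(trainer_logits)
    p_proj = softmax(projection_logits)
    return (lam_t * cross_entropy(y, p_trainer)
            + lam_p * cross_entropy(y, p_proj)
            + lam_tie * cross_entropy(p_trainer, p_proj))
```

After training, only the small projection network is shipped to the device; the large trainer network is discarded.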

Projection Neural Network Architecture (Ravi S., 2017)

Model Architecture

SGNN Model Architecture (Ravi S. and Kozareva Z., 2018)

On-the-fly Computation: transforming text (functions F and P) into intermediate results and the projection layer in real time, without looking up pre-defined vectors.

Hash Function Projection: reducing high-dimensional features to low-dimensional features via a modified version of Locality Sensitive Hashing (LSH), which allows projecting similar inputs into the same bucket.

Model Optimization: using binary features (either 0 or 1) in the projection layer to achieve a very low memory footprint.
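The ideas above can be sketched in a few lines of NumPy. This is a minimal illustration of LSH-style random-hyperplane projection to binary features, not the paper's implementation: the whitespace tokeniser, MD5-based feature hashing, dimensions and seed are all my own assumptions (SGNN actually computes skip-gram features and the projections on the fly with repeatable hash functions).

```python
import hashlib
import numpy as np

def text_to_features(text, n_dims=1024):
    """Hash word unigrams into a high-dimensional feature vector
    (assumed featurisation; no embedding table is needed)."""
    x = np.zeros(n_dims)
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % n_dims
        x[idx] += 1.0
    return x

def lsh_projection(x, n_bits=64, seed=0):
    """Project to binary features via random hyperplanes: each bit is
    the sign of a dot product, so similar inputs share most bits."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_bits, x.shape[0]))
    return (planes @ x >= 0).astype(np.int8)
```

Each text becomes a fixed number of 0/1 bits regardless of vocabulary size, which is why no embedding matrix has to be stored on the device.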

Experiment

Ravi S. and Kozareva Z. evaluated SGNN on the Switchboard Dialog Act Corpus (SwDA) and the ICSI Meeting Recorder Dialog Act Corpus (MRDA), with very good results.

SwDA Data Result (Ravi S. and Kozareva Z., 2018)

MRDA Data Result (Ravi S. and Kozareva Z., 2018)

Take Away

More and more complex model architectures are being released to achieve state-of-the-art results in different disciplines.

However, they may not fit on small devices such as IoT devices or mobile phones due to the limited resources of those devices.

Accuracy is not the only concern when delivering an amazing model. Speed and model complexity also have to be considered in some scenarios. We may need to sacrifice some accuracy to get a lightweight model.

About Me

I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics. You can reach me via my Medium Blog, LinkedIn or Github.

Reference

Ravi S., 2017. ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections.

Ravi S. and Kozareva Z., 2018. Self-Governing Neural Networks for On-Device Short Text Classification.
