Introducing PyText

A high-level view of the PyText architecture reveals how those stages are encapsulated by native components of the framework.

As illustrated in the previous figure, the architecture of PyText includes the following building blocks:

· Task: Combines the various components required for a training or inference task into a pipeline.
· Data Handler: Processes raw input data and prepares batches of tensors to feed to the model.
· Model: Defines the neural network architecture.
· Optimizer: Encapsulates model parameter optimization using the loss from the forward pass of the model.
· Metric Reporter: Implements the relevant metric computation and reporting for the models.
· Trainer: Uses the data handler, model, loss, and optimizer to train a model and perform model selection by validating against a holdout set.
· Predictor: Uses the data handler and model for inference given a test dataset.
· Exporter: Exports a trained PyTorch model to a Caffe2 graph using ONNX.

As you can see, PyText leverages the Open Neural Network Exchange (ONNX) format to transition models from experimentation-friendly PyTorch to production-robust Caffe2 runtimes.

PyText includes a large portfolio of NLP tasks, such as text classification, word tagging, semantic parsing, and language modeling, which streamline the implementation of NLP workflows. Similarly, PyText ventures into language understanding with contextual models, such as a SeqNN model for intent labeling tasks and a Contextual Intent Slot model for joint training on multiple tasks.

From an NLP workflow standpoint, PyText streamlines the process of transitioning an idea from experimentation to production. The typical workflow of a PyText application includes the following steps:

1. Implement the model in PyText, and make sure offline metrics on the test set look good.
2. Publish the model to the bundled PyTorch-based inference service, and do a real-time, small-scale evaluation on a live traffic sample.
3. Export it automatically to a Caffe2 net. In some cases, e.g. when using complex control flow logic and custom data structures, this might not yet be supported via PyTorch 1.0.
4. If the procedure in step 3 isn't supported, use the PyTorch C++ API to rewrite the model (only the torch.nn.Module subclass) and wrap it in a Caffe2 operator.
5. Publish the model to the production-grade Caffe2 prediction service and start serving live traffic.

Using PyText

Getting started with PyText is relatively simple. The framework can be installed as a typical Python package.

    $ pip install pytext-nlp

After that, we can train an NLP model using a task configuration.

    (pytext) $ cat demo/configs/docnn.json
    {
      "task": {
        "DocClassificationTask": {
          "data_handler": {
            "train_path": "tests/data/train_data_tiny.tsv",
            "eval_path": "tests/data/test_data_tiny.tsv",
            "test_path": "tests/data/test_data_tiny.tsv"
          }
        }
      }
    }
    $ pytext train < demo/configs/docnn.json

A Task is the central artifact for defining a model in a PyText application. Every task has an embedded config that defines the relationships between the different components, as shown in the following code.

    from typing import Optional

    from word_tagging import ModelInputConfig, TargetConfig

    # Task, Trainer, WordTaggingDataHandler and the other names used below
    # come from PyText; their imports are not shown in this snippet.

    class WordTaggingTask(Task):
        class Config(Task.Config):
            features: ModelInputConfig = ModelInputConfig()
            targets: TargetConfig = TargetConfig()
            data_handler: WordTaggingDataHandler.Config = WordTaggingDataHandler.Config()
            model: WordTaggingModel.Config = WordTaggingModel.Config()
            trainer: Trainer.Config = Trainer.Config()
            optimizer: OptimizerParams = OptimizerParams()
            scheduler: Optional[SchedulerParams] = SchedulerParams()
            metric_reporter: WordTaggingMetricReporter.Config = WordTaggingMetricReporter.Config()
            exporter: Optional[TextModelExporter.Config] = TextModelExporter.Config()

After a model has been trained, we can evaluate it and export it to Caffe2.

    (pytext) $ pytext test < "$CONFIG"
    (pytext) $ pytext export --output-path exported_model.c2 < "$CONFIG"

It is important to note that PyText provides a very extensible architecture: each of its key building blocks can be customized and extended.

PyText represents an important milestone in NLP development as one of the first frameworks to address the often-conflicting tradeoff between experimentation and production.
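Because the task configuration passed to `pytext train` is plain JSON, it can also be generated programmatically, which may be convenient when only the data paths change between experiments. The following is a minimal sketch that uses only Python's standard library. The helper name `build_docnn_config` is hypothetical and not part of PyText; the structure and paths simply mirror the demo config shown in the Using PyText section.

```python
import json


def build_docnn_config(train_path: str, eval_path: str, test_path: str) -> dict:
    """Return a dict with the same structure as demo/configs/docnn.json.

    Hypothetical helper for illustration only; it is not part of PyText.
    """
    return {
        "task": {
            "DocClassificationTask": {
                "data_handler": {
                    "train_path": train_path,
                    "eval_path": eval_path,
                    "test_path": test_path,
                }
            }
        }
    }


# Reuse the tiny sample files referenced by the demo config.
config = build_docnn_config(
    train_path="tests/data/train_data_tiny.tsv",
    eval_path="tests/data/test_data_tiny.tsv",
    test_path="tests/data/test_data_tiny.tsv",
)

# Serialize to JSON text; saving this text to a file yields a config
# that can be redirected into the command-line tool.
config_json = json.dumps(config, indent=2)
print(config_json)
```

If the printed text is saved to a file, it can be fed to the trainer in the same way as the demo config, by redirecting the file into `pytext train`.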

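The division of labor between the building blocks listed earlier (Task, Data Handler, Model, Trainer, Metric Reporter) can be illustrated with a toy pipeline. Everything in this sketch is hypothetical pure-Python code that only mirrors the roles of the components. None of these classes belong to PyText's actual API, and the "model" is a trivial token-count classifier standing in for a neural network.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Example = Tuple[List[str], str]  # (tokens, label)


class ToyDataHandler:
    """Turns raw 'label<TAB>text' rows into (tokens, label) pairs."""

    def read(self, rows: List[str]) -> List[Example]:
        examples = []
        for row in rows:
            label, text = row.split("\t", 1)
            examples.append((text.lower().split(), label))
        return examples


class ToyModel:
    """Stand-in for a neural network: keeps per-label token counts."""

    def __init__(self) -> None:
        self.counts: Dict[str, Dict[str, int]] = {}

    def update(self, tokens: List[str], label: str) -> None:
        bucket = self.counts.setdefault(label, {})
        for tok in tokens:
            bucket[tok] = bucket.get(tok, 0) + 1

    def predict(self, tokens: List[str]) -> str:
        # Score each label by how often its training tokens appear in the input.
        scores = {
            label: sum(bucket.get(tok, 0) for tok in tokens)
            for label, bucket in self.counts.items()
        }
        return max(scores, key=scores.get)


class ToyTrainer:
    """Feeds every training example to the model once."""

    def train(self, model: ToyModel, examples: List[Example]) -> ToyModel:
        for tokens, label in examples:
            model.update(tokens, label)
        return model


class ToyMetricReporter:
    """Computes plain accuracy over predictions."""

    def accuracy(self, predictions: List[str], labels: List[str]) -> float:
        correct = sum(p == y for p, y in zip(predictions, labels))
        return correct / len(labels)


@dataclass
class ToyTask:
    """Wires the components into one pipeline, mirroring the Task role."""

    data_handler: ToyDataHandler = field(default_factory=ToyDataHandler)
    model: ToyModel = field(default_factory=ToyModel)
    trainer: ToyTrainer = field(default_factory=ToyTrainer)
    metric_reporter: ToyMetricReporter = field(default_factory=ToyMetricReporter)

    def train(self, rows: List[str]) -> None:
        self.trainer.train(self.model, self.data_handler.read(rows))

    def test(self, rows: List[str]) -> float:
        examples = self.data_handler.read(rows)
        predictions = [self.model.predict(tokens) for tokens, _ in examples]
        labels = [label for _, label in examples]
        return self.metric_reporter.accuracy(predictions, labels)


task = ToyTask()
task.train(["weather\twill it rain today", "music\tplay some jazz"])
accuracy = task.test(["weather\tis rain expected", "music\tplay jazz please"])
print(accuracy)
```

Because each component sits behind a small interface, any one of them can be replaced without touching the others, which is the property the article attributes to PyText's extensible design.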