Time Series Feature Extraction for industrial big data (IIoT) applications

Sharmistha Chatterjee · Apr 9

Motivation — Why is feature extraction necessary?

Feature extraction is one of the preliminary steps in machine learning algorithms, used to identify strongly and weakly relevant attributes.

While many feature extraction algorithms are used during feature engineering for standard classification and regression problems, the task becomes increasingly difficult for time series classification and regression, where each label or regression target is associated with several time series and meta-information simultaneously.

Trust me, such scenarios are quite common with the huge datasets obtained from heavy industrial manufacturing equipment, machinery and IoT devices, which regularly undergo maintenance or production line optimization and exhibit different success and failure metrics in different time series.

The main objective of this blog is to understand the procedure for extracting relevant features from multiple time series and to model a real dataset.

The blog is structured as follows:
- Evaluate the importance of time series feature extraction for classification and regression problems.

- Model a practical use case of robot execution failure rates for an industrial application.

- Highlight the merits of the model in a distributed real-time environment.

What is feature extraction from time series data and why is it important?

Feature extraction involves selecting the important and useful features, and eliminating redundant features and noise from the system, to yield the best predicted output.

In the context of time series data it aims to:
- Extract characteristic features from a time series, such as the min, max, average, percentiles or other mathematical derivations (a small pandas sketch follows this list).

- Consolidate the feature extraction and selection process from distributed, heterogeneous sources of information lying on different time series scales for predicting the target output.

- Allow time series clustering (unsupervised learning) from the extracted features based on their relevance.
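To make the first point concrete, here is a minimal sketch (plain pandas, not tsfresh) of computing such characteristic features per series; the DataFrame, column names and values are made up for illustration:

import pandas as pd

# Hypothetical long-format data: one row per (series id, observation).
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2, 2],
    "value": [0.2, 0.5, 0.9, 1.1, 0.7, 0.3],
})

# A few characteristic features per time series id.
features = df.groupby("id")["value"].agg(["min", "max", "mean", "std"])
features["quantile_90"] = df.groupby("id")["value"].quantile(0.9)
print(features)

Tsfresh automates exactly this kind of per-series aggregation, but over hundreds of such feature mappings.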

The extracted relevant and non-relevant features can help us to gain new insights into time series properties and dimensions in both classification and regression modeling.

Industrial big data, IoT (IIoT), robotics and other information sources are known to generate large volumes of varied information at high velocity, with variability (inconsistency) and veracity (imprecision).

Such data needs to be carefully merged and integrated for predicting the system state by analyzing and extracting the most meaningful features from different time intervals.

Feature Extraction and Selection Process

The following figures illustrate the steps involved in the feature extraction and selection process.

While feature extraction is used to combine existing features to produce a more useful one, feature selection helps in choosing the most useful features to train on from among the existing ones.

Fig 1 gives a detailed understanding of creating feature sets using mathematical operations from n different time series, followed by feature aggregation and feature significance/relevance tests to rank them and arrive at the final selected feature list.

Fig 2 gives a high level representation of the full process covering the feature engineering, ranking and testing process that can be used in a scalable distributed environment.

Fig 1 Source: https://arxiv.org/pdf/1610.07717.pdf

Fig 2 Source: https://arxiv.org/pdf/1610.07717.pdf

Tsfresh and its usage

I have used Tsfresh to model time series feature extraction and relevancy tests.

Tsfresh is built as an efficient, scalable feature extraction algorithm for time series classification or regression problems.

The algorithm includes a feature importance filter at the beginning of the ML pipeline that extracts relevant features according to their importance scores.

Tsfresh’s algorithm can be summarized as:
- Establish feature mappings by considering additional features and meta-information.

- Evaluate each individual feature vector independently through p-values quantifying its significance for predicting the target output.

- Apply it in the context of industrial processes, which may include prediction of the life span of machines, prediction of the quality of steel billets during a continuous casting process, and prediction of success and failure rates of robots and IIoT sensors.

Modeling a practical use-case

Here we model robot execution failure rates obtained through sensor readings.

The execution data, collected from 5 different data sources and merged, captures failures due to:
- Failures in approach to grasp position.

- Failures in transfer of a part.

- Position of part after a transfer failure.

- Failures in approach to ungrasp position.

- Failures in motion with a part.

Further, this dataset is multivariate, composed of 6 different time series.

Each feature is numeric, representing a force or a torque measured after failure detection. Each failure instance is characterized by 15 force/torque samples collected at regular time intervals starting immediately after failure detection, so the total observation window for each failure instance is 315 ms.
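For orientation, the long ("stacked") format that tsfresh expects for such multivariate data looks roughly like the sketch below: one row per (id, time) sample with the six force/torque channels as columns. The values here are invented; the real frame is loaded in the next step.

import pandas as pd

# Hypothetical illustration of the expected long format (values made up).
example = pd.DataFrame({
    "id":   [1, 1, 1],          # failure instance id
    "time": [0, 1, 2],          # 15 samples per instance in the real data
    "F_x":  [-1, 0, -1], "F_y": [-1, 0, 0], "F_z": [63, 62, 61],
    "T_x":  [-3, -3, -3], "T_y": [-1, -1, 0], "T_z": [0, 0, 0],
})
print(example)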

Installation: pip install tsfresh

import tsfresh.examples.robot_execution_failures as robot

robot.download_robot_execution_failures()
df_ts, y = robot.load_robot_execution_failures()
print(df_ts.head())
print(y.tail())

In order to isolate and separately view the successful actions from the unsuccessful ones, two different plots are used with two different ids.

import matplotlib.pyplot as plt

normal = df_ts[df_ts.id == 3][['time', 'F_x', 'F_y', 'F_z', 'T_x', 'T_y', 'T_z']]
normal[['time', 'F_x', 'F_y', 'F_z', 'T_x', 'T_y', 'T_z']].plot(x="time", kind="line")
plt.title('Success example (id 3)')
plt.savefig('normal.png')

Successful robot action

failure = df_ts[df_ts.id == 20][['time', 'F_x', 'F_y', 'F_z', 'T_x', 'T_y', 'T_z']]
failure[['time', 'F_x', 'F_y', 'F_z', 'T_x', 'T_y', 'T_z']].plot(x="time", kind="line")
plt.title('Failure example (id 20)')
plt.savefig('failure.png')

Failure robot action

In the next step, the relevant feature set is extracted from the 6 different time series and shown.

Some of the features include range counts, standard deviation, variance, auto-correlation, linear-trends, quantiles and change in quantiles.

from tsfresh import extract_relevant_features

X = extract_relevant_features(df_ts, y, column_id='id', column_sort='time')
print(X.info())
print(X.head())

Feature Significance and Relevance

Feature significance is determined by significance tests that help us to decide if the null hypothesis for a given feature can be rejected or accepted.

In the context of time series feature extraction, a wrongly added feature is a feature X_φ for which the null hypothesis H_0^φ is rejected by the respective feature significance test, even though H_0^φ is true.

The risk of such a false positive result is controlled by the hypothesis test tuned for that individual feature.

However, when comparing multiple hypotheses and features simultaneously, errors in the inference tend to accumulate.

In multiple testing, the expected proportion of erroneous rejections among all rejections is called the false discovery rate (FDR), which acts as the last component in filtered feature extraction.

The p-values (known as feature selectors) help to decide which hypotheses to reject while controlling the FDR.

For every feature, a univariate feature selection test is conducted which generates the p-values; these are then evaluated by the Benjamini-Hochberg procedure to decide which features to keep and which to delete.
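As a standalone illustration of the Benjamini-Hochberg idea (tsfresh performs this internally; the p-values below are made up): sort the p-values, compare each to a threshold that grows with its rank, and reject the null hypothesis for every feature up to the largest p-value that stays below the line.

import numpy as np

# Made-up, already sorted p-values for six hypothetical features.
p_values = np.array([0.001, 0.008, 0.039, 0.041, 0.20, 0.74])
fdr_level = 0.05
m = len(p_values)

# Benjamini-Hochberg thresholds: fdr_level * rank / m.
thresholds = fdr_level * np.arange(1, m + 1) / m
below = np.nonzero(p_values <= thresholds)[0]
n_relevant = below[-1] + 1 if below.size else 0   # keep everything up to the last crossing
print("features kept as relevant:", n_relevant)

The rejection-line computation further down additionally divides by a harmonic-sum factor, which is the variant of this procedure used for dependent tests.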

from tsfresh import extract_features, select_features
from tsfresh.feature_extraction import ComprehensiveFCParameters
from tsfresh.utilities.dataframe_functions import impute
from tsfresh.feature_selection.relevance import calculate_relevance_table
import pandas as pd

FDR_LEVEL = 0.05

X = extract_features(df_ts, column_id='id', column_sort='time',
                     default_fc_parameters=ComprehensiveFCParameters(),
                     impute_function=impute)
X_selected = select_features(X, y)
print(X.shape)
print(X_selected.shape)

X = X.loc[:, X.apply(pd.Series.nunique) != 1]
df_pvalues_mann = calculate_relevance_table(X, y, fdr_level=FDR_LEVEL,
                                            test_for_binary_target_real_feature='mann')
print("Total ", len(df_pvalues_mann))
print("Relevant ", (df_pvalues_mann["relevant"] == True).sum())
print("Irrelevant ", (df_pvalues_mann["relevant"] == False).sum(),
      "( # constant", (df_pvalues_mann["type"] == "const").sum(), ")")

(88, 1968)
(88, 631)  # 631 features selected from 1968
Total 1968
Relevant 631
Irrelevant 1337

The relevant set of features with their p-values (using Mann-Whitney) is given by:

p-values of relevant features

The non-relevant set of features with their p-values (using Mann-Whitney) is given by:

p-values of non-relevant features

The Mann-Whitney U test is the non-parametric alternative to the independent-samples t-test.

It is used to compare two samples that come from the same population, and to test whether the two sample means are equal or not.

The Mann-Whitney U test is used in tsfresh to calculate the significance of a real-valued feature with respect to a binary target, expressed as a p-value.
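To see what that means for a single feature, here is a sketch using scipy directly (tsfresh wraps this internally); it reuses the X and y obtained above and simply picks the first extracted feature:

from scipy.stats import mannwhitneyu

# Split one real-valued feature column by the binary target and test it.
feature = X.iloc[:, 0]
stat, p_value = mannwhitneyu(feature[y == True], feature[y == False],
                             alternative="two-sided")
print(p_value)   # a small p-value suggests the feature separates the two classes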

m = len(df_pvalues_mann.loc[~(df_pvalues_mann.type == "const")])
K = list(range(1, m + 1))
C = [sum([1.0 / i for i in range(1, k + 1)]) for k in K]
rejection_line_mann = [FDR_LEVEL * k / m * 1.0 / c for k, c in zip(K, C)]

df_pvalues_mann.index = pd.Series(range(0, len(df_pvalues_mann.index)))

df_pvalues_mann.p_value.where(df_pvalues_mann.relevant) \
    .plot(style=".", label="relevant features")
df_pvalues_mann.p_value.where(~df_pvalues_mann.relevant & (df_pvalues_mann.type != "const")) \
    .plot(style=".", label="irrelevant features")
df_pvalues_mann.p_value.fillna(1).where(df_pvalues_mann.type == "const") \
    .plot(style=".", label="irrelevant (constant) features")

plt.plot(rejection_line_mann, label="rejection line (FDR = " + str(FDR_LEVEL) + ")")
plt.xlabel("Feature #")
plt.ylabel("p-value")
plt.title("Mann-Whitney-U")
plt.legend()
plt.show()

When selecting features, fdr_level is a hyper-parameter to tune.

It is the theoretical expected percentage of irrelevant features among all created features.

By default, it is set at 5%.

However, it might need to be increased to 0.9, depending on how well the chosen classifier can deal with non-informative features.
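For example, the effect of loosening the level can be checked directly, since select_features accepts fdr_level as an argument (a sketch reusing X and y from above):

from tsfresh import select_features

# Compare how many features survive at a strict vs. a loose FDR level.
X_strict = select_features(X, y, fdr_level=0.05)
X_loose = select_features(X, y, fdr_level=0.9)
print(X_strict.shape[1], "features kept at fdr_level=0.05")
print(X_loose.shape[1], "features kept at fdr_level=0.9")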

A feature importance filter can also be applied with standard classification or regression algorithms.

Here I used RandomForestClassifier to evaluate feature importances.

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier()
clf.fit(X, y)
importance = pd.Series(clf.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance)

Extracted feature coefficients representing their importances

Plotting the relative feature importances as a bar plot gives the following feature importance plot.

imp_frame = importance.to_frame()
imp_frame.plot(kind="bar")
plt.xticks([])
plt.xlabel('Features')
plt.ylabel('Importances')
plt.title('Feature Importance Plot')
plt.savefig('importance.png')

Feature importance plot showing the relevance of each feature

Classification

The standard classifiers Logistic Regression, Boosting and Bagging can be applied to the extracted feature set (X).

I have used RandomForestClassifier to classify the robot execution failure and success classes, after splitting the selected feature set (X) into train and test subsets.

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

testtrain_xy = train_test_split(X, y, test_size=.4, random_state=50)
X_train, X_test, y_train, y_test = testtrain_xy
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

(52, 631) (36, 631) (52,) (36,)

The trained classifier is evaluated on the test data set to assess its performance through precision, recall and F1-score.

import numpy as np

clf = RandomForestClassifier()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))
print(np.sum(y_test == y_pred), y_test.shape)

Classification Report with RandomForestClassifier

from sklearn.model_selection import cross_val_predict

predicted = cross_val_predict(clf, X, y, cv=5)
print(classification_report(y, predicted))

Classification Report with 5-Fold Cross-validation

Further feature extraction and model classification can also be performed using Pipeline from scikit-learn, with RelevantFeatureAugmenter at the start followed by any classifier like RandomForest.

from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from tsfresh.transformers import RelevantFeatureAugmenter

In the fit phase, all possible time series features are calculated on the time series container set by the set_params function (unless the feature set is manually restricted by handing in a feature_extraction_settings object).

Then, their significance and relevance to the target is computed using statistical methods, and only the relevant ones are selected by applying a suitable procedure (such as Benjamini-Hochberg).

In the transform step (fit_transform), the information from the fit step about which features are relevant is used, and only those features are extracted from the time series.

These extracted features are then added to the input data sample that is fed to the classifier.

pipeline = Pipeline([('augmenter', RelevantFeatureAugmenter(column_id='id', column_sort='time')),
                     ('classifier', RandomForestClassifier())])

X = pd.DataFrame(index=y.index)
# re-split the empty design matrix; the augmenter adds the relevant features during fit
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=50)

pipeline.set_params(augmenter__timeseries_container=df_ts)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print(classification_report(y_test, y_pred))

Precision, Recall, F-Score with time series feature extraction in a Pipeline (RelevantFeatureAugmenter and RandomForestClassifier)

Conclusion

The model can be used in distributed industrial big data applications for the following purposes:
- It is designed to allow consideration of several different time series types per label, in addition to meta-information.

- The parallel, distributed functionality makes feature extraction and filtering scalable (see the sketch after this list).

- Data can be processed in a distributed architecture for applications where it is fragmented over widespread infrastructure, limiting aggregation and processing on a centralized system.

- It allows easy combination with domain-specific and possibly stateful feature mappings from more specialized machine learning algorithms.

- The algorithm helps to model different industrial big data use-cases with limited domain knowledge and low computational complexity.

- The model scales linearly with the number of samples and the length of the time series.
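As an example of the parallel functionality mentioned above, the extraction step can be spread over several worker processes via tsfresh's n_jobs argument (a sketch; how far this scales in practice depends on the backend and on how the data is partitioned):

from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute

# Run feature extraction on 4 parallel worker processes.
X_parallel = extract_features(df_ts, column_id="id", column_sort="time",
                              n_jobs=4, impute_function=impute)
print(X_parallel.shape)

tsfresh also exposes a distributor mechanism (e.g. for Dask) for spreading the same computation over a cluster, which is what makes the approach attractive for the fragmented-data scenario described above.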

References
- Christ, Kempa-Liehr, Feindt: Distributed and parallel time series feature extraction for industrial big data applications, arXiv:1610.07717
- Balancing Small Samples and Big Data: Andreas W. Kempa-Liehr
- https://archive.ics.uci.edu/ml/datasets/Robot+Execution+Failures
