Unsupervised Feature Selection for Time-Series Sensor Data with the MSDA Package

What is MSDA?

Why MSDA?

Basics Revisited

  1. Time-series analysis.
  2. Identifies how each sensor column varies with respect to time (increasing, decreasing, or equal).
  3. Identifies how the values of each column vary with respect to every other column, and the maximum variation ratio between each pair of columns.
  4. Establishes relationships through the trend array to identify the most appropriate sensor.
  5. Lets the user select a window length and then check the average and standard deviation across each window for each sensor column.
  6. Provides a count of growth/decay values above or below a threshold for each sensor column.
  7. Feature engineering.

EXAMPLE USE CASE: Unsupervised Feature Selection

PCA Evaluation

  1. Import libraries
  • Get the eigenvectors with pca.components_
  • Get the eigenvalues with pca.explained_variance_
  • Get the fraction of variance explained by each selected component with pca.explained_variance_ratio_
['net_in', 'cpu_util_percent', 'mem_util_percent', 'cpu_util_percent']
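
One common way to turn the PCA attributes above into a list of "most important" features is to pick, for every principal component, the feature with the largest absolute loading. The sketch below assumes that approach; it is not necessarily the exact code behind the list shown above. The helper name `top_features_per_component` and the random data are purely illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def top_features_per_component(X, feature_names, n_components):
    """For each principal component, return the feature with the largest |loading|."""
    X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale
    pca = PCA(n_components=n_components).fit(X_scaled)
    # pca.components_ has shape (n_components, n_features): one eigenvector per row.
    top_idx = np.abs(pca.components_).argmax(axis=1)
    return [feature_names[i] for i in top_idx], pca

# Synthetic stand-in data; the column names only mirror the ones in this article.
rng = np.random.default_rng(0)
names = ["cpu_util_percent", "mem_util_percent", "net_in", "net_out"]
X = rng.normal(size=(500, len(names)))

top, pca = top_features_per_component(X, names, n_components=3)
print(top)                            # one feature name per component
print(pca.explained_variance_)        # eigenvalues
print(pca.explained_variance_ratio_)  # fraction of variance per component
```

With this selection rule the same feature can dominate more than one component, so a name may legitimately appear twice in the resulting list.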

IPCA Evaluation

  1. Import libraries
['net_in', 'cpu_util_percent', 'mem_util_percent', 'cpu_util_percent']
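
Incremental PCA fits the same kind of model in mini-batches, which helps when the sensor log does not fit in memory. A minimal sketch, assuming scikit-learn's `IncrementalPCA` and a hypothetical helper `fit_ipca_in_batches`; the batch size and data are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA
from sklearn.preprocessing import StandardScaler

def fit_ipca_in_batches(X, n_components, batch_size):
    """Fit a scaler and an IncrementalPCA by streaming over X in fixed-size batches."""
    scaler = StandardScaler()
    for start in range(0, len(X), batch_size):
        scaler.partial_fit(X[start:start + batch_size])   # pass 1: running mean/variance
    ipca = IncrementalPCA(n_components=n_components)
    for start in range(0, len(X), batch_size):
        batch = scaler.transform(X[start:start + batch_size])
        ipca.partial_fit(batch)                            # pass 2: update the components
    return ipca

# Synthetic stand-in data with four sensor-like columns.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
ipca = fit_ipca_in_batches(X, n_components=3, batch_size=100)
print(ipca.components_.shape)
print(ipca.explained_variance_ratio_)
```

Each batch needs at least `n_components` rows, so the batch size should be chosen with that in mind.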

MSDA Evaluation

  1. Import libraries
Index(['timestamp', 'machine_id', 'cpu_util_percent', 'mem_util_percent','mem_gps', 'mkpi', 'net_in', 'net_out', 'disk_io_percent', 'Date', 'Time'],
dtype='object')
Max. Variation Involved in each Sensor Column values are:
Note: Inc-Increasing ; Dec-Decreasing ; Eq-Equal
For CPU UTIL PERCENT Column: Dec
For MEM UTILPERCENT Column: Eql
For NET IN Column: Eql
For NET OUT Column: Eql
For DISK IO Column: Eql
[['Eq' 'Inc' 'Dec' ... 'Eq' 'Eq' 'Eq']
['Eq' 'Inc' 'Inc' ... 'Eq' 'Eq' 'Eq']
['Eq' 'Inc' 'Inc' ... 'Eq' 'Eq' 'Eq']
...
['Eq' 'Inc' 'Dec' ... 'Eq' 'Eq' 'Eq']
['Eq' 'Inc' 'Dec' ... 'Eq' 'Eq' 'Inc']
['Eq' 'Inc' 'Inc' ... 'Eq' 'Eq' 'Eq']]
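
A trend array like the one above can be approximated with plain pandas by comparing every reading with the previous one. This is a sketch of the idea, not the MSDA implementation; `trend_labels` is a hypothetical helper and the tiny frame is made-up data.

```python
import numpy as np
import pandas as pd

def trend_labels(df):
    """Label each step of every column as 'Inc', 'Dec' or 'Eq' versus the previous row."""
    diffs = df.diff().iloc[1:]          # the first row has no predecessor
    d = diffs.to_numpy()
    labels = np.where(d > 0, "Inc", np.where(d < 0, "Dec", "Eq"))
    return pd.DataFrame(labels, index=diffs.index, columns=df.columns)

df = pd.DataFrame({
    "cpu_util_percent": [10, 12, 12, 9, 9, 9],
    "mem_util_percent": [50, 50, 50, 51, 51, 51],
})
labels = trend_labels(df)
# Most frequent label per column, comparable to the per-column summary printed earlier.
dominant = {col: labels[col].value_counts().idxmax() for col in labels.columns}
print(labels)
print(dominant)
```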
** Ratios of Variations Of Values of Each Sensor Column wrt other Sensor Column **
Note: Inc-Increasing ; Dec-Decreasing ; Eq-Equal
For Sensor Column:- cpu_util_percent
Ratio is: 0.6909658204509999
When Sensor Column 'cpu_util_percent' values are Eq , Sensor Column 'mem_util_percent' values are Eq
------------------------
For Sensor Column:- mem_util_percent
Ratio is: 1.0
When Sensor Column 'mem_util_percent' values are Eq , Sensor Column 'cpu_util_percent' values are Eq
------------------------
For Sensor Column:- net_in
Ratio is: 0.5092423319114361
When Sensor Column 'net_in' values are Inc , Sensor Column 'cpu_util_percent' values are Eq
------------------------
--------------------------------------------------------------------
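
The exact formula MSDA uses for these ratios is not shown here. One plausible reading, assumed in the sketch below, is: for a given column, find the other column and the pair of trend labels that co-occur most often, and report the share of time steps on which that pair occurs. The helper `strongest_cotrend` and the label data are illustrative only.

```python
import pandas as pd

def strongest_cotrend(labels, column):
    """Find the (own label, other label) pair that co-occurs most often with any other column."""
    best = None
    for other in labels.columns:
        if other == column:
            continue
        counts = labels.groupby([column, other]).size()  # joint frequency of label pairs
        own_label, other_label = counts.idxmax()
        ratio = counts.max() / len(labels)               # assumed definition of the ratio
        if best is None or ratio > best["ratio"]:
            best = {"other": other, "own_label": own_label,
                    "other_label": other_label, "ratio": float(ratio)}
    return best

labels = pd.DataFrame({
    "cpu_util_percent": ["Inc", "Eq", "Dec", "Eq", "Eq"],
    "mem_util_percent": ["Eq", "Eq", "Inc", "Eq", "Eq"],
    "net_in":           ["Inc", "Inc", "Dec", "Eq", "Inc"],
})
result = strongest_cotrend(labels, "cpu_util_percent")
print(result)
```

A conditional version (dividing by the number of steps on which the column itself carries `own_label`) would be an equally reasonable definition; the choice changes the numbers but not the idea.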
** Avg. and Standard deviations for each Sensor Column **
Enter Time in Seconds for the Window: (Must be a Multiple of 2):20
Rate of Change of AVG Across Window For Sensor Column cpu_util_percent: 34.61231884057971
Rate of Change of AVG Across Window For Sensor Column mem_util_percent: 89.74639837819186
Rate of Change of AVG Across Window For Sensor Column net_in: 37.58424969806764
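
Window statistics of this kind can be sketched with a pandas group-by over fixed-size blocks of rows. The definition of "rate of change of the average" used below (mean absolute difference between consecutive window averages) is an assumption made for illustration, and `window_stats` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def window_stats(df, window):
    """Mean and standard deviation of every column over non-overlapping windows of `window` rows."""
    window_id = np.arange(len(df)) // window   # 0,0,...,1,1,... one id per block of rows
    grouped = df.groupby(window_id)
    return grouped.mean(), grouped.std()

df = pd.DataFrame({
    "cpu_util_percent": [10, 12, 12, 9, 9, 9],
    "mem_util_percent": [50, 50, 50, 51, 51, 51],
})
means, stds = window_stats(df, window=2)
# Assumed summary: average absolute change between consecutive window means.
rate = means.diff().abs().mean()
print(means)
print(stds)
print(rate)
```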
Count of Growth/Decay value for each Sensor Column Values above or below a threshold value:
{'cpu_util_percent': 19224, 'mem_util_percent': 8965, 'net_in': 864}
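
A count like the one above can be approximated by tallying, per column, the step-to-step changes (growth or decay) whose magnitude exceeds a threshold. Whether MSDA thresholds the change or the raw value is not clear from the output, so treat the definition in this sketch, and the helper name `count_changes_beyond`, as assumptions.

```python
import pandas as pd

def count_changes_beyond(df, threshold):
    """Count, per column, the steps whose absolute change exceeds `threshold`."""
    steps = df.diff().iloc[1:]                 # growth (+) or decay (-) between rows
    return (steps.abs() > threshold).sum().to_dict()

df = pd.DataFrame({
    "cpu_util_percent": [10, 12, 12, 9, 9, 9],
    "mem_util_percent": [50, 50, 50, 51, 51, 51],
})
counts = count_changes_beyond(df, threshold=1)
print(counts)
```

Columns with many large swings end up with high counts, which is one way to rank sensors by how much they move.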

MSDA Conclusion

Most Important Features — Comparison of PCA, IPCA, MSDA

CONTACT

Data Science Manager; AWS Certified ML Specialist; AWS Certified Cloud Solution Architect; https://www.linkedin.com/in/ajay-arunachalam-4744581a/

Ajay Arunachalam
