# What is MSDA?

MSDA is an open-source multidimensional multi-sensor data analysis framework, written in Python.

# Why MSDA?

MSDA is a simple and intuitive Python package that makes it easier to explore, plot, and visualize multidimensional, multi-sensor time-series data, with the aim of supporting feature/sensor selection in both unsupervised and supervised settings.

# Basics Revisited

Before we delve into how the package is used, let's go over a few basic concepts in plain terms. Also, before going deeper into PCA, let's first discuss a few important concepts of linear algebra.

1. Time series analysis.
2. Identifying the variation of each sensor column with respect to time (increasing, decreasing, equal).
3. Identifying how the values of each column vary with respect to every other column, and the maximum variation ratio between each pair of columns.
4. Establishing a relationship with the trend array to identify the most appropriate sensor.
5. Letting the user select a window length and then check the average value and standard deviation across each window for each sensor column.
6. Counting the growth/decay values of each sensor column that lie above or below a threshold value.
7. Feature Engineering

# EXAMPLE USECASE — Unsupervised Feature Selection

High-dimensional data is very hard to process and visualize. Reducing the dimensionality by extracting a smaller set of important features that still covers most of the variation in the data therefore shrinks the data and, in turn, the processing cost.

# PCA Evaluation

Steps:

1. Import libraries

- Eigenvectors: `pca.components_`
- Eigenvalues: `pca.explained_variance_`
- Percentage of variance explained by each of the selected components: `pca.explained_variance_ratio_`

`['net_in', 'cpu_util_percent', 'mem_util_percent', 'cpu_util_percent']`
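
To make the PCA attributes above concrete, here is a minimal sketch using scikit-learn. It is not the original notebook code: the data is random stand-in data, the choice of three components is arbitrary, and picking the feature with the largest absolute loading per component is just one common way to turn components into a feature list (I am assuming that is roughly what produced the list above).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; only the column names come from the article.
rng = np.random.default_rng(0)
cols = ["cpu_util_percent", "mem_util_percent", "net_in", "net_out"]
df = pd.DataFrame(rng.normal(size=(500, len(cols))), columns=cols)

# Standardize so that no column dominates purely because of its units.
X = StandardScaler().fit_transform(df)

pca = PCA(n_components=3)  # arbitrary choice for this sketch
pca.fit(X)

eigenvectors = pca.components_          # shape (n_components, n_features)
eigenvalues = pca.explained_variance_   # variance along each component
ratios = pca.explained_variance_ratio_  # fraction of total variance per component

# For each component, name the feature with the largest absolute loading.
top_features = [cols[i] for i in np.abs(eigenvectors).argmax(axis=1)]
print(top_features)
```

With this approach the same feature can top more than one component, which would be one explanation for a name appearing twice in such a list.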

# IPCA Evaluation

Steps:

1. Import libraries
`['net_in', 'cpu_util_percent', 'mem_util_percent', 'cpu_util_percent']`
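
For comparison, the incremental variant can be sketched with scikit-learn's `IncrementalPCA`, which fits the same kind of model batch by batch instead of loading everything at once. Again this is only a sketch under assumptions: random stand-in data, an arbitrary batch count, and the same "largest absolute loading" rule as a hypothetical way to name the top features.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Hypothetical stand-in data; only the column names come from the article.
rng = np.random.default_rng(1)
cols = ["cpu_util_percent", "mem_util_percent", "net_in", "net_out"]
X = rng.normal(size=(1000, len(cols)))

ipca = IncrementalPCA(n_components=3)

# Feed the data in chunks; each chunk must hold at least n_components rows.
for batch in np.array_split(X, 10):
    ipca.partial_fit(batch)

# Same ranking rule as in a plain PCA: largest absolute loading per component.
top_features_ipca = [cols[i] for i in np.abs(ipca.components_).argmax(axis=1)]
print(top_features_ipca)
```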

# MSDA Evaluation

Note: Here, I explicitly walk through each of the algorithms available in the module rather than calling them directly from the package. To use MSDA as a package, follow the package's demo tutorial.

1. Import libraries
`Index(['timestamp', 'machine_id', 'cpu_util_percent', 'mem_util_percent','mem_gps', 'mkpi', 'net_in', 'net_out', 'disk_io_percent', 'Date', 'Time'],      dtype='object')`
`Max. Variation Involved in each Sensor Column values are:`
`Note: Inc-Increasing ; Dec-Decreasing ; Eq-Equal`
`For CPU UTIL PERCENT Column: Dec`
`For MEM UTILPERCENT Column: Eql`
`For NET IN Column: Eql`
`For NET OUT Column: Eql`
`For DISK IO Column: Eql`
`[['Eq' 'Inc' 'Dec' ... 'Eq' 'Eq' 'Eq'] ['Eq' 'Inc' 'Inc' ... 'Eq' 'Eq' 'Eq'] ['Eq' 'Inc' 'Inc' ... 'Eq' 'Eq' 'Eq'] ... ['Eq' 'Inc' 'Dec' ... 'Eq' 'Eq' 'Eq'] ['Eq' 'Inc' 'Dec' ... 'Eq' 'Eq' 'Inc'] ['Eq' 'Inc' 'Inc' ... 'Eq' 'Eq' 'Eq']]`
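
A trend array like the one above can be built by comparing each reading with the previous one. The sketch below is my own minimal version with pandas and NumPy, not the MSDA source; the helper name `trend_labels` and the tiny example frame are hypothetical.

```python
import numpy as np
import pandas as pd

def trend_labels(df):
    """Label each step of each column as 'Inc', 'Dec', or 'Eq'."""
    diffs = df.diff().iloc[1:]            # the first row has no predecessor
    signs = np.sign(diffs.to_numpy())
    labels = np.full(signs.shape, "Eq", dtype=object)
    labels[signs > 0] = "Inc"
    labels[signs < 0] = "Dec"
    return pd.DataFrame(labels, columns=df.columns, index=diffs.index)

# Tiny hypothetical example with two sensor columns.
df = pd.DataFrame({
    "cpu_util_percent": [10, 12, 12, 9],
    "net_in": [1, 1, 2, 3],
})
trends = trend_labels(df)
print(trends)
```

Note that this sketch assumes there are no missing readings; a `NaN` difference would fall through to `'Eq'`.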
`** Ratios of Variations Of Values of Each Sensor Column wrt other Sensor Column **`
`Note: Inc-Increasing ; Dec-Decreasing ; Eq-Equal`
`For Sensor Column:- cpu_util_percent`
`Ratio is: 0.6909658204509999`
`When Sensor Column 'cpu_util_percent' values are Eq , Sensor Column 'mem_util_percent' values are Eq`
`------------------------`
`For Sensor Column:- mem_util_percent`
`Ratio is: 1.0`
`When Sensor Column 'mem_util_percent' values are Eq , Sensor Column 'cpu_util_percent' values are Eq`
`------------------------`
`For Sensor Column:- net_in`
`Ratio is: 0.5092423319114361`
`When Sensor Column 'net_in' values are Inc , Sensor Column 'cpu_util_percent' values are Eq`
`------------------------`
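
How exactly these ratios are computed is not spelled out here, so treat the following as one plausible reading rather than the package's algorithm: for a given sensor column, look at every other column and find the pair of trends that co-occur most often, measured as a conditional frequency. The helper name `strongest_co_variation` and the example trend frame are hypothetical.

```python
import numpy as np
import pandas as pd

def strongest_co_variation(trends, col):
    """Return (ratio, own_trend, other_col, other_trend) with the highest
    conditional frequency of other_trend given own_trend."""
    best = (0.0, None, None, None)
    for other in trends.columns:
        if other == col:
            continue
        # Rows: trend of `col`; columns: trend of `other`; each row sums to 1.
        table = pd.crosstab(trends[col], trends[other], normalize="index")
        values = table.to_numpy()
        i, j = np.unravel_index(values.argmax(), values.shape)
        ratio = float(values[i, j])
        if ratio > best[0]:
            best = (ratio, table.index[i], other, table.columns[j])
    return best

# Hypothetical trend labels for two sensor columns.
trends = pd.DataFrame({
    "cpu_util_percent": ["Eq", "Eq", "Inc", "Eq"],
    "mem_util_percent": ["Eq", "Eq", "Inc", "Inc"],
})
result = strongest_co_variation(trends, "cpu_util_percent")
print(result)
```

A conditional frequency is asymmetric, which would at least be consistent with the two columns reporting different ratios for the same pair. One caveat of this sketch: a trend that occurs only once trivially gets a ratio of 1.0, so a minimum-count filter may be worth adding in practice.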
`--------------------------------------------------------------------`
`** Avg. and Standard deviations for each Sensor Column **`
`Enter Time in Seconds for the Window: (Must be a Multiple of 2):20`
`Rate of Change of AVG Across Window For Sensor Column cpu_util_percent: 34.61231884057971`
`Rate of Change of AVG Across Window For Sensor Column mem_util_percent: 89.74639837819186`
`Rate of Change of AVG Across Window For Sensor Column net_in: 37.58424969806764`
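
The per-window average and standard deviation can be sketched with a pandas time-based resample. This is an illustration only: the one-reading-per-second data is made up, and I am not reproducing the "rate of change of AVG" figure printed above, since its exact definition is not given here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one reading per second for one minute.
idx = pd.date_range("2021-01-01", periods=60, freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"cpu_util_percent": np.arange(60, dtype=float)}, index=idx)

window_seconds = 20  # the window length chosen by the user

# Mean and sample standard deviation of each column within each window.
stats = df.resample(pd.Timedelta(seconds=window_seconds)).agg(["mean", "std"])
print(stats)
```

The result has one row per window and a (column, statistic) pair per output column, so `stats[("cpu_util_percent", "mean")]` gives the per-window averages for that sensor.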
`Count of Growth/Decay value for each Sensor Column Values above or below a threshold value: {'cpu_util_percent': 19224, 'mem_util_percent': 8965, 'net_in': 864}`
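
Counting values beyond a threshold is a one-liner in pandas. The sketch below assumes a single threshold applied to the raw values of every column; the helper name `count_beyond_threshold`, the threshold of 50, and the sample readings are all hypothetical, and the package may define "growth/decay value" differently.

```python
import pandas as pd

def count_beyond_threshold(df, threshold, above=True):
    """Count, per column, the values strictly above (or below) a threshold."""
    mask = df > threshold if above else df < threshold
    return mask.sum().to_dict()

# Hypothetical readings for two sensor columns.
df = pd.DataFrame({
    "cpu_util_percent": [10.0, 55.0, 80.0, 30.0],
    "net_in": [5.0, 60.0, 20.0, 40.0],
})
counts_above = count_beyond_threshold(df, threshold=50)
counts_below = count_beyond_threshold(df, threshold=50, above=False)
print(counts_above, counts_below)
```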

# MSDA Conclusion

The plots show the values of each sensor, and the features are reported together with their correlation (slope).

# Most Important Features — Comparison of PCA, IPCA, MSDA

The top-n variables in the order of importance using the different approaches are given below.

# CONTACT

You can reach me at ajay.arunachalam08@gmail.com
