Panel Data Analysis: A Survey On Model-Based Clustering Of Time Series - Statswork

Clustering techniques in statistical analysis are used to identify subsets (clusters) in the data using a specified distance measure. However, they cannot easily be applied to longitudinal or time-series data. In this post, I discuss some of the methods for modeling longitudinal or panel data with cluster analysis, as explained in Frühwirth-Schnatter (2011).

Longitudinal data are observations measured repeatedly over time on the same units. Nowadays, longitudinal (repeated-measures) or panel data arise in all areas of applied statistics, such as finance, psychology, economics, and the social sciences. Most studies deal with analyzing homogeneity in such time-series data (Diggle et al. 2002); however, a few researchers have shown interest in analyzing the heterogeneity in such data and have proposed different modeling techniques for it.

Let us now discuss the applicability of the model-based clustering technique by means of an example discussed in Frühwirth-Schnatter (2011). The data cover 237 teenagers and their marijuana use over the years 1976–1980. Marijuana use is recorded in three categories: never, not more than once a month, and more than once a month. The response in this study is therefore a categorical variable, observed once per year for each teenager. The following figure shows a sample of 10 observed response sequences among the 237 teenagers.

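
To make the data structure concrete, here is a minimal sketch of how such a categorical panel could be held in code: one sequence of category codes per teenager, one entry per year. The sequences below are made up for illustration and are not the observed survey responses. The coding 0/1/2 and the helper name `transition_counts` are my own assumptions. Counting year-to-year transitions is one common summary for categorical time series; this post does not say which summary the survey itself uses.

```python
# Assumed coding of the three usage levels described above.
LEVELS = {0: "never", 1: "not more than once a month", 2: "more than once a month"}

# Made-up example panel: one row per teenager, one column per year 1976-1980.
# These values are illustrative only, NOT the observed survey responses.
panel = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 2],
    [0, 1, 1, 2, 2],
    [1, 1, 0, 0, 0],
]


def transition_counts(seq, n_levels=3):
    """Count moves from level j to level k within one subject's sequence."""
    counts = [[0] * n_levels for _ in range(n_levels)]
    for prev, curr in zip(seq, seq[1:]):
        counts[prev][curr] += 1
    return counts


for subject, seq in enumerate(panel):
    # Each subject contributes len(seq) - 1 = 4 transitions.
    print(subject, transition_counts(seq))
```

Each teenager is reduced to a 3 x 3 table of transition counts, which is a convenient input for a model-based clustering step that groups subjects with similar transition behavior.
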
To sum up, model-based clustering with a Bayesian flavor yields better results, since it provides answers to the most troublesome problems in cluster analysis. In longitudinal or panel data studies, Euclidean distance may not be a valid choice, so a kernel-based clustering for time-series data is considered, and the best method is selected using different information criteria. In addition to the illustration explained in the paper, an MCMC simulation is carried out to find the optimal clustering. However, these conclusions should not be taken for granted in every application: analyzing time-series panel data requires an appropriate choice of prior distribution and clustering kernel.
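
The idea of choosing the number of clusters with an information criterion can be sketched in a few lines. The example below is a simplified stand-in, not the survey's method: it assumes a first-order Markov chain as the clustering kernel, fits the mixture with plain EM instead of the Bayesian MCMC estimation mentioned above, and uses BIC as the only criterion. The data are simulated, and all function names, the smoothing constant `eps`, and the two transition matrices are hypothetical choices made for illustration.

```python
import math
import random


def transition_counts(seq, n_levels=3):
    """Count moves from level j to level k within one subject's sequence."""
    counts = [[0] * n_levels for _ in range(n_levels)]
    for prev, curr in zip(seq, seq[1:]):
        counts[prev][curr] += 1
    return counts


def fit_markov_mixture(panel, n_clusters, n_levels=3, n_iter=100, seed=0):
    """EM for a mixture of first-order Markov chains (first state treated as given).

    Returns (log_likelihood, weights, transition_matrices, responsibilities).
    """
    rng = random.Random(seed)
    counts = [transition_counts(seq, n_levels) for seq in panel]
    n, eps = len(panel), 1e-3  # eps: small smoothing constant (an assumption)
    # Random soft initial assignment of subjects to clusters.
    resp = []
    for _ in range(n):
        row = [rng.random() + 0.1 for _ in range(n_clusters)]
        resp.append([r / sum(row) for r in row])
    for _ in range(n_iter):
        # M-step: smoothed cluster weights and row-normalised transition matrices.
        weights = [(sum(resp[i][h] for i in range(n)) + eps) / (n + n_clusters * eps)
                   for h in range(n_clusters)]
        trans = []
        for h in range(n_clusters):
            matrix = []
            for j in range(n_levels):
                row = [eps + sum(resp[i][h] * counts[i][j][k] for i in range(n))
                       for k in range(n_levels)]
                matrix.append([v / sum(row) for v in row])
            trans.append(matrix)
        # E-step: posterior cluster probabilities, using log-sum-exp for stability.
        loglik = 0.0
        for i in range(n):
            logp = [math.log(weights[h])
                    + sum(counts[i][j][k] * math.log(trans[h][j][k])
                          for j in range(n_levels) for k in range(n_levels))
                    for h in range(n_clusters)]
            top = max(logp)
            norm = top + math.log(sum(math.exp(lp - top) for lp in logp))
            resp[i] = [math.exp(lp - norm) for lp in logp]
            loglik += norm
    return loglik, weights, trans, resp


def bic(loglik, n_clusters, n_subjects, n_levels=3):
    """BIC = -2 log L + (number of free parameters) * log(number of subjects)."""
    n_params = (n_clusters - 1) + n_clusters * n_levels * (n_levels - 1)
    return -2.0 * loglik + n_params * math.log(n_subjects)


def simulate(trans, length, rng):
    """Draw one sequence starting at level 0 from a 3 x 3 transition matrix."""
    seq = [0]
    for _ in range(length - 1):
        seq.append(rng.choices([0, 1, 2], weights=trans[seq[-1]])[0])
    return seq


# Simulated panel (NOT the survey data): two hypothetical groups, five yearly waves.
rng = random.Random(1)
stay_low = [[0.9, 0.08, 0.02], [0.5, 0.4, 0.1], [0.3, 0.3, 0.4]]
drift_up = [[0.4, 0.4, 0.2], [0.1, 0.5, 0.4], [0.05, 0.15, 0.8]]
panel = [simulate(stay_low, 5, rng) for _ in range(60)]
panel += [simulate(drift_up, 5, rng) for _ in range(60)]

bic_by_k = {}
for k in (1, 2, 3):
    loglik, _, _, _ = fit_markov_mixture(panel, k)
    bic_by_k[k] = bic(loglik, k, len(panel))
    print(k, round(bic_by_k[k], 1))
print("number of clusters with lowest BIC:", min(bic_by_k, key=bic_by_k.get))
```

A lower BIC indicates a better trade-off between fit and complexity. With sequences this short the criterion will not always recover the simulated two groups, and EM can stop at a local optimum, so several random starts are advisable. A Bayesian treatment along the lines of the survey would replace the EM step with MCMC sampling and place priors on the cluster weights and transition matrices.
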
