Photo: Benjamin O. Tayo.
The scatter pairplot is a visualization of the pairwise relationships in a dataset and is the first step in effective feature selection. It provides a qualitative analysis of the pairwise correlation between features and is a powerful tool for feature selection and dimensionality reduction. For an introduction to the pairplot in the seaborn package, see: https://seaborn.pydata.org/generated/seaborn.pairplot.html
In this article, we analyze a portfolio of stocks to find those that correlate strongly with the overall market. The portfolio contains 22 stocks (see Table 1) from sectors such as healthcare, real estate, consumer discretionary, energy, industrials, telecommunication services, information technology, consumer staples, and financials.
| Ticker | Company | Ticker | Company | Ticker | Company |
|---|---|---|---|---|---|
| AAL | American Airlines | EDIT | Editas Medicine | UAL | United Airlines |
| AAPL | Apple | HPP | Hudson Pacific Properties | WEN | Wendy's |
| ABT | Abbott Laboratories | JNJ | Johnson & Johnson | WFC | Wells Fargo |
| BXP | Boston Properties | MRO | Marathon Oil Corporation | XOM | Exxon Mobil |
| CCL | Carnival Corporation | PFE | Pfizer | SP500 | S&P 500 stock index |
| DAL | Delta Air Lines | SLG | SL Green Realty | | |
Table 1. A portfolio of 22 stocks from various sectors.
Our goal is to answer the question: which stocks in the portfolio are strongly correlated with the stock market? We use the S&P 500 index as a measure of the stock market, and we assume a threshold correlation coefficient of 70%: a stock whose correlation coefficient with the S&P 500 reaches this threshold is considered strongly correlated with the market.
Data collection and processing
Raw data was obtained from Yahoo Finance: https://finance.yahoo.com/
The historical data for each stock includes the daily open, high, low, and closing prices. A CSV file was downloaded for each stock, and the "close" column was then extracted from each file and merged to create the dataset, which can be found here: portfolio.csv
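The merge step described above might look like the following minimal sketch. The file layout (one `<TICKER>.csv` per stock), the column names `Date` and `Close`, and the helper name `build_close_price_table` are assumptions made for illustration; the article does not show the original processing code.

```python
from pathlib import Path

import pandas as pd


def build_close_price_table(csv_dir, tickers, date_col="Date", close_col="Close"):
    """Merge the closing-price column of one CSV per ticker into a single table."""
    columns = []
    for ticker in tickers:
        # Assumed layout: <csv_dir>/<TICKER>.csv with a date column and a close column.
        frame = pd.read_csv(
            Path(csv_dir) / f"{ticker}.csv",
            parse_dates=[date_col],
            index_col=date_col,
        )
        # Keep only the closing price and label it with the ticker symbol.
        columns.append(frame[close_col].rename(ticker))
    # Align on date and keep only the dates present for every ticker.
    return pd.concat(columns, axis=1, join="inner").sort_index()
```

The inner join drops any date on which one of the stocks has no record, so every row of the merged table is complete.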
Generating the scatter pairplot
Calculate the covariance matrix
The scatter pairplot is the first step, providing a qualitative analysis of the pairwise correlation between features. To quantify the degree of correlation, the covariance matrix must be calculated.
Reduced output showing pairplots and correlation coefficients
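Because the results in this article are quoted as correlation coefficients, it helps to note that the covariance matrix of standardized features is the same as the Pearson correlation matrix. The sketch below illustrates that relationship with NumPy; whether the original analysis standardized the data in exactly this way is an assumption, and the function name `standardized_covariance` is hypothetical.

```python
import numpy as np


def standardized_covariance(X):
    """Covariance matrix of column-standardized data.

    With the sample standard deviation (ddof=1) this equals the
    Pearson correlation matrix of the original columns.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column to zero mean and unit sample standard deviation.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # rowvar=False: rows are observations, columns are features.
    return np.cov(X_std, rowvar=False)
```

Each diagonal entry of the result is 1, and each off-diagonal entry lies between -1 and 1, so multiplying by 100 gives the coefficients as percentages.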
Since we are only interested in the correlation between the 22 stocks in the portfolio and the S&P 500, Figure 1 below shows the final output of our analysis.
Figure 1. Scatter pairplots and correlation coefficients between the stocks in the portfolio and the S&P 500.
Figure 1 shows that eight of the 22 stocks have a correlation coefficient of less than 70%. Interestingly, with the exception of WEN, all stocks have a positive correlation with the S&P 500 index.
The full covariance matrix is shown in Figure 2.
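Selecting the stocks that meet the threshold can be sketched as follows. The market column name `SP500` (taken from Table 1), the use of `>=` at the threshold, and the helper name `strongly_correlated` are assumptions made for illustration.

```python
import pandas as pd


def strongly_correlated(prices, market_col="SP500", threshold=0.70):
    """Return stocks whose correlation with the market column meets the threshold."""
    # Pearson correlation of every column with the market column,
    # excluding the market's trivial correlation with itself.
    corr_with_market = prices.corr()[market_col].drop(market_col)
    selected = corr_with_market[corr_with_market >= threshold]
    # Strongest correlations first.
    return selected.sort_values(ascending=False)
```

Called on the merged price table, this returns a pandas Series indexed by ticker, so the stocks below the threshold are simply those missing from the result.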
Figure 2. Visualization of the covariance matrix.
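A matrix visualization like Figure 2 can be produced as an annotated heatmap. This is a minimal sketch using seaborn's `heatmap`, under the assumption that the matrix is held in a pandas DataFrame labeled by ticker; the helper name `plot_matrix_heatmap` is hypothetical and the styling of the original figure may differ.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_matrix_heatmap(matrix, figsize=(12, 10)):
    """Draw an annotated heatmap of a square matrix held in a pandas DataFrame."""
    fig, ax = plt.subplots(figsize=figsize)
    # annot=True writes each coefficient in its cell; the row and column
    # labels are taken from the DataFrame index and columns.
    sns.heatmap(matrix, annot=True, fmt=".2f", square=True, cbar=True, ax=ax)
    return ax


# Hypothetical usage, assuming `prices` is the merged price table:
#   ax = plot_matrix_heatmap(prices.corr())
#   ax.figure.savefig("matrix_heatmap.png")
```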
In summary, we have shown how the scatter pairplot can be used as a first step in feature selection. Other advanced methods for feature selection and dimensionality reduction include Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).