Photo: Benjamin O. Tayo.

The scatter pairplot is a visualization of the pairwise relationships in a dataset and is a useful first step in efficient feature selection. It provides a qualitative analysis of the pairwise correlation between features and is a powerful tool for feature selection and dimensionality reduction. For details on creating pairplots with seaborn, see: https://seaborn.pydata.org/generated/seaborn.pairplot.html

In this article, we analyze a stock portfolio to identify the stocks that correlate strongly with the overall market. The portfolio contains 22 stocks (see Table 1) from sectors such as healthcare, real estate, consumer discretionary, energy, industrials, telecommunication services, information technology, consumer staples, and financials.

Symbol  Name                       Symbol  Name                       Symbol  Name
AAL     American Airlines          EDIT    Editas Medicine            UAL     United Airlines
AAPL    Apple                      HPP     Hudson Pacific Properties  WEN     Wendy's
ABT     Abbott Laboratories        JNJ     Johnson & Johnson          WFC     Wells Fargo
BNTX    BioNTech                   MRNA    Moderna                    WMT     Walmart
BXP     Boston Properties          MRO     Marathon Oil Corporation   XOM     Exxon Mobil
CCL     Carnival Corporation       PFE     Pfizer                     SP500   S&P 500 index
DAL     Delta Airlines             SLG     SL Green Realty
DVN     Devon Energy               TSLA    Tesla

Table 1. A portfolio of 22 stocks from various sectors.

Our goal is to answer the question: which stocks in the portfolio are strongly correlated with the stock market? We use the S&P 500 index as a measure of the stock market, and we consider a stock to be strongly correlated with the S&P 500 if its correlation coefficient with the index is 70% (0.70) or higher.
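The threshold rule can be expressed as a small filter. The sketch below is a minimal illustration, assuming the correlation coefficients with the S&P 500 are already available as a pandas Series indexed by ticker; the function name `strongly_correlated` and the sample tickers and values are hypothetical and are not taken from the article's data.

```python
import pandas as pd


def strongly_correlated(corr_with_index: pd.Series, threshold: float = 0.70) -> pd.Series:
    """Return the tickers whose correlation with the index meets the threshold.

    Hypothetical helper: `corr_with_index` maps ticker -> Pearson r with the S&P 500.
    """
    # Keep coefficients at or above the threshold, strongest first
    return corr_with_index[corr_with_index >= threshold].sort_values(ascending=False)


# Made-up coefficients for illustration only
sample = pd.Series({"AAA": 0.91, "BBB": 0.65, "CCC": 0.70, "DDD": -0.20})
result = strongly_correlated(sample)
print(result)
```

The comparison uses `>=` so that the excluded stocks are exactly those with a coefficient "less than 70%", matching how the results are described later.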

Data collection and processing

Raw data was obtained from Yahoo Finance: https://finance.yahoo.com/

The historical data for each stock includes the daily open, high, low, and close prices. A CSV file was downloaded for each stock, and the "Close" column was then extracted from each file and merged to create the dataset, which can be found here: portfolio.csv
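The merge step can be sketched with pandas. This is a minimal illustration rather than the author's script: it assumes each downloaded CSV has already been loaded into a DataFrame with `Date` and `Close` columns, and the helper name `merge_close_prices` is hypothetical. Using an inner join on `Date` is also an assumption; it keeps only the trading days present in every file.

```python
import pandas as pd


def merge_close_prices(frames: dict) -> pd.DataFrame:
    """Combine per-ticker price tables into one table of closing prices.

    `frames` maps a ticker symbol to a DataFrame with at least
    'Date' and 'Close' columns (hypothetical input layout).
    """
    closes = []
    for symbol, df in frames.items():
        # Keep only the closing price, indexed by date, named after the ticker
        closes.append(df.set_index("Date")["Close"].rename(symbol))
    # Inner join: keep only dates that appear for every ticker
    merged = pd.concat(closes, axis=1, join="inner")
    merged.index.name = "Date"
    return merged.sort_index().reset_index()


# Two tiny synthetic price tables for illustration
a = pd.DataFrame({"Date": ["2020-01-02", "2020-01-03"], "Open": [1, 2], "Close": [10.0, 11.0]})
b = pd.DataFrame({"Date": ["2020-01-03", "2020-01-06"], "Open": [3, 4], "Close": [20.0, 21.0]})
merged = merge_close_prices({"AAA": a, "BBB": b})
print(merged)
```

An outer join (`join="outer"`) would instead keep every date and leave missing prices as NaN, which would then need handling before the standardization step.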

Create a scatter pairplot

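import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the merged closing prices (one column per ticker)
url = "https://raw.githubusercontent.com/bot13956/datasets/master/portfolio.csv"
data = pd.read_csv(url)
data.head()

# Columns 1 to 23: the 22 stocks plus the S&P 500 (column 0 is skipped, presumably the date)
cols = data.columns[1:24]

# Scatter plot for every pair of columns, with the distribution of each column on the diagonal
sns.pairplot(data[cols], height=2.0)
plt.show()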



Calculate the covariance matrix

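The scatter pairplot gives only a qualitative picture of the pairwise correlation between features. To quantify the degree of correlation, we compute the covariance matrix of the standardized features. Because each feature is scaled to zero mean and unit variance, this covariance matrix is identical to the matrix of Pearson correlation coefficients, so the heatmap can be read directly as correlations.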
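The covariance matrix of standardized features equals the correlation matrix, and the identity can be written out in one line. Here $x_{ki}$ denotes observation $k$ of feature $i$, $\bar{x}_i$ and $\sigma_i$ are that feature's mean and population standard deviation (the $1/N$ form, which is what `StandardScaler` uses by default), and $N$ is the number of observations. The $1/N$ factor in the covariance corresponds to `bias=True` in `np.cov`:

$$
z_{ki} = \frac{x_{ki} - \bar{x}_i}{\sigma_i}, \qquad
\operatorname{Cov}(z_i, z_j) = \frac{1}{N}\sum_{k=1}^{N} z_{ki}\, z_{kj}
= \frac{\frac{1}{N}\sum_{k=1}^{N} (x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j)}{\sigma_i\, \sigma_j}
= r_{ij}
$$

No mean is subtracted in the middle expression because every standardized column $z_i$ already has zero mean, and the diagonal entries are $r_{ii} = 1$.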

from sklearn.preprocessing import StandardScaler

# Standardize each column to zero mean and unit variance
stdsc = StandardScaler()
X_std = stdsc.fit_transform(data[cols].values)

# Covariance of the standardized data with 1/N normalization (bias=True),
# which equals the matrix of Pearson correlation coefficients
cov_mat = np.cov(X_std.T, bias=True)

# Annotated heatmap of the full matrix, labelled with the ticker symbols
plt.figure(figsize=(13, 13))
sns.set(font_scale=1.2)
hm = sns.heatmap(cov_mat,
                 cbar=True,
                 annot=True,
                 square=True,
                 fmt=".2f",
                 annot_kws={'size': 12},
                 yticklabels=cols,
                 xticklabels=cols)
plt.title('Covariance matrix showing correlation coefficients')
plt.tight_layout()
plt.show()
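As a quick sanity check that the standardized covariance matrix really contains correlation coefficients, the sketch below standardizes a small synthetic matrix with NumPy, computes `np.cov(..., bias=True)` as in the code above, and compares the result with `np.corrcoef` on the raw data. The random data is made up, and the manual standardization is assumed to mirror `StandardScaler`'s default behavior (subtract the mean, divide by the population standard deviation).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 250 observations of 4 features, with features 0 and 1 correlated
X = rng.normal(size=(250, 4))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * X[:, 1]

# Standardize each column: zero mean, unit population variance (ddof=0)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Covariance of the standardized data with 1/N normalization
cov_mat = np.cov(X_std.T, bias=True)

# Pearson correlation matrix of the raw data
corr_mat = np.corrcoef(X.T)

print(np.allclose(cov_mat, corr_mat))  # True
```

For data without missing values, pandas' `data[cols].corr()` would return the same Pearson matrix directly, without the separate scaling step.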



Reduced output showing pairs and correlation coefficients

Since we are only interested in the correlation between each of the 22 stocks in the portfolio and the S&P 500, Figure 1 below shows the final result of our analysis.

Figure 1. Reduced output showing the pairs and correlation coefficients between the stocks in the portfolio and the S&P 500.

Figure 1 shows that eight of the 22 stocks have a correlation coefficient below 70% with the S&P 500, so the remaining 14 meet our threshold for strong correlation. Interestingly, with the exception of WEN, all stocks have a positive correlation with the S&P 500 index.
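One way to produce a reduced output like Figure 1 is to take the S&P 500 column of the correlation matrix, drop the index's correlation with itself, and sort. The sketch below assumes the matrix is held in a pandas DataFrame labelled by ticker (for example, `pd.DataFrame(cov_mat, index=cols, columns=cols)` built from the code above) and that the index column is named `SP500` as in Table 1. The helper name `correlations_with_index` and the small matrix are hypothetical, for illustration only.

```python
import pandas as pd


def correlations_with_index(corr: pd.DataFrame, index_col: str = "SP500") -> pd.Series:
    """Return each stock's correlation with the market index, highest first."""
    # Select the index column and drop its self-correlation (always 1.0)
    s = corr[index_col].drop(index_col)
    return s.sort_values(ascending=False)


# Made-up 3-stock correlation matrix for illustration only
tickers = ["AAA", "BBB", "CCC", "SP500"]
corr = pd.DataFrame(
    [[1.00, 0.40, 0.10, 0.85],
     [0.40, 1.00, 0.30, 0.55],
     [0.10, 0.30, 1.00, -0.15],
     [0.85, 0.55, -0.15, 1.00]],
    index=tickers, columns=tickers,
)
ranked = correlations_with_index(corr)
print(ranked)
```

With the result in hand, `(ranked < 0.70).sum()` counts the stocks that fall below the 70% threshold, and `ranked[ranked < 0]` picks out any negatively correlated ones.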

The full covariance matrix is shown in Figure 2.

Figure 2. Visualization of the covariance matrix.

In summary, we have shown how the scatter pairplot can be used as a first step in feature selection. Other, more advanced methods for feature selection and dimensionality reduction include PCA (principal component analysis) and LDA (linear discriminant analysis).
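PCA operates on the covariance matrix of standardized features, so a matrix like `cov_mat` above could serve as its starting point. The sketch below is a minimal NumPy illustration, not part of the original analysis: it eigendecomposes a covariance matrix and reports the fraction of variance explained by each principal component. The helper name `explained_variance_ratio` is hypothetical, and the synthetic data (five features driven by one common factor) only stands in for the portfolio. With scikit-learn, `sklearn.decomposition.PCA` performs the equivalent computation.

```python
import numpy as np


def explained_variance_ratio(cov_mat: np.ndarray) -> np.ndarray:
    """Fraction of total variance captured by each principal component, largest first."""
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order
    eigvals = np.linalg.eigh(cov_mat)[0]
    eigvals = eigvals[::-1]  # reorder: largest first
    return eigvals / eigvals.sum()


rng = np.random.default_rng(1)
# Synthetic data: 5 features that share one common factor plus independent noise
common = rng.normal(size=(500, 1))
X = common + 0.5 * rng.normal(size=(500, 5))
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

ratios = explained_variance_ratio(np.cov(X_std.T, bias=True))
print(ratios.round(3))
```

When one factor drives most features, as the market does for strongly correlated stocks, the first component captures most of the variance.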
