Correlation in Python

Correlation values range between -1 and 1.

There are two key components of a correlation value:

  • magnitude – The larger the magnitude (the closer to 1 or -1), the stronger the correlation.
  • sign – If negative, the correlation is inverse: one variable tends to decrease as the other increases. If positive, the two variables tend to increase together.
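To make magnitude and sign concrete, here is a minimal sketch that computes the coefficient from its definition. It assumes the Pearson (product-moment) coefficient, which is what NumPy's corrcoef() returns; the helper name pearson_r and the toy array are made up for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed directly from its definition (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    # Co-variation of x and y, normalised by the spread of each
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # toy data, not from the article
r_up = pearson_r(a, 2 * a + 1)     # y rises with x along a perfect line
r_down = pearson_r(a, -3 * a + 7)  # y falls with x along a perfect line
print(r_up, r_down)
```

A perfectly linear increasing relationship sits at the +1 end of the range and a perfectly linear decreasing one at the -1 end; noise pulls the value toward 0.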

Positive Correlation

Let’s take a look at a positive correlation. NumPy provides a corrcoef() function that returns a matrix of the correlations of x with x, x with y, y with x and y with y. We’re interested in the correlation of x with y, which sits at position (0, 1) or, equivalently, (1, 0).
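As a small sketch of that indexing, the off-diagonal entry can be pulled out as a single number. The five-point arrays below are toy values chosen for illustration, not the article's data.

```python
import numpy as np

# Toy data for illustration only
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

m = np.corrcoef(x, y)  # 2x2 matrix; rows and columns are ordered (x, y)
r_xy = m[0, 1]         # correlation of x with y
r_yx = m[1, 0]         # correlation of y with x, the same number
print(m.shape, r_xy)
```

The diagonal entries are the correlation of each variable with itself, so they carry no information; only one off-diagonal entry is needed.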

In [1]:

import numpy as np

np.random.seed(1)

# 1000 random integers between 0 and 50
x = np.random.randint(0, 50, 1000)

# Positive Correlation with some noise
y = x + np.random.normal(0, 10, 1000)

np.corrcoef(x, y)

Out[1]:

array([[ 1.        ,  0.81543901],
       [ 0.81543901,  1.        ]])

This correlation is 0.815, a strong positive correlation. Let’s take a look at a scatter chart.

In [2]:

import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
matplotlib.style.use('ggplot')

plt.scatter(x, y)
plt.show()


Negative Correlation

What happens to our correlation figure if we invert the correlation such that an increase in x results in a decrease in y?

In [3]:

# 1000 random integers between 0 and 50
x = np.random.randint(0, 50, 1000)

# Negative Correlation with some noise
y = 100 - x + np.random.normal(0, 5, 1000)

np.corrcoef(x, y)

Out[3]:

array([[ 1.        , -0.94957116],
       [-0.94957116,  1.        ]])

Our correlation is now negative and close to -1, a strong inverse correlation. Let’s take a look at what this looks like graphically:

In [4]:

plt.scatter(x, y)
plt.show()
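One detail worth checking in the construction of y is which parts actually affect the coefficient. The sketch below, using an arbitrarily chosen seed and NumPy's default_rng generator rather than the article's seeded state, suggests that the offset of 100 and any positive rescaling leave the value unchanged, while negating y only flips the sign.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
x = rng.integers(0, 50, 1000)
noise = rng.normal(0, 5, 1000)

y_neg = 100 - x + noise   # same construction as the negative example
y_shift = -x + noise      # the offset of 100 dropped
y_scaled = 3 * y_neg      # rescaled by a positive constant
y_flip = -y_neg           # negated

r_neg = np.corrcoef(x, y_neg)[0, 1]
r_shift = np.corrcoef(x, y_shift)[0, 1]
r_scaled = np.corrcoef(x, y_scaled)[0, 1]
r_flip = np.corrcoef(x, y_flip)[0, 1]
print(r_neg, r_shift, r_scaled, r_flip)
```

The strength of the correlation here comes from the size of the noise relative to the spread of x, not from the constants in the formula.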


No/Weak Correlation

What if there is no correlation between x and y?

In [5]:

x = np.random.randint(0, 50, 1000)
y = np.random.randint(0, 50, 1000)

np.corrcoef(x, y)

Out[5]:

array([[ 1.        , -0.00554681],
       [-0.00554681,  1.        ]])

Here we see a very small value for the correlation between x and y, indicating no meaningful linear correlation.

Again, let’s plot this and take a look. The points show no visible trend between x and y:

In [6]:

plt.scatter(x, y)
plt.show()
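How small is "very small"? A rough way to get a feel for it is to repeat the experiment many times with independent x and y and look at how far the coefficient strays from 0 by chance. This is only a sketch: the seed and the number of trials are arbitrary choices, and the comparison value 1/sqrt(n) is an approximation for independent samples.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
n, trials = 1000, 200           # sample size as in the article; trial count is arbitrary

# Correlation of two independent samples, repeated `trials` times
rs = np.array([
    np.corrcoef(rng.integers(0, 50, n), rng.integers(0, 50, n))[0, 1]
    for _ in range(trials)
])

print("spread of r across trials:", rs.std())
print("rough expectation 1/sqrt(n):", 1 / np.sqrt(n))
print("largest |r| seen:", np.abs(rs).max())
```

With 1000 points, values within a few hundredths of 0 are what unrelated data typically produce, so a figure like -0.0055 gives no evidence of a linear relationship.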


Correlation Matrix

If we’re using pandas we can create a correlation matrix to view the correlations between different variables in a dataframe:

In [7]:

import pandas as pd

df = pd.DataFrame({'a': np.random.randint(0, 50, 1000)})
df['b'] = df['a'] + np.random.normal(0, 10, 1000) # positively correlated with 'a'
df['c'] = 100 - df['a'] + np.random.normal(0, 5, 1000) # negatively correlated with 'a'
df['d'] = np.random.randint(0, 50, 1000) # not correlated with 'a'

df.corr()

Out[7]:

          a         b         c         d
a  1.000000  0.825361 -0.948845  0.009802
b  0.825361  1.000000 -0.789391  0.011852
c -0.948845 -0.789391  1.000000 -0.003228
d  0.009802  0.011852 -0.003228  1.000000
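The result of df.corr() is itself a DataFrame, so individual values can be looked up by label. A minimal sketch follows, rebuilding a frame of the same shape with an arbitrary seed (so the numbers will differ slightly from Out[7]):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed; values differ from the article's
df = pd.DataFrame({'a': rng.integers(0, 50, 1000)})
df['b'] = df['a'] + rng.normal(0, 10, 1000)       # positively correlated with 'a'
df['c'] = 100 - df['a'] + rng.normal(0, 5, 1000)  # negatively correlated with 'a'
df['d'] = rng.integers(0, 50, 1000)               # not correlated with 'a'

corr = df.corr()

r_ab = corr.loc['a', 'b']                   # a single pair, looked up by label
with_a = corr['a'].drop('a').sort_values()  # every other column against 'a', most negative first
print(r_ab)
print(with_a)
```

Sorting one column of the matrix is a quick way to see which variables move most strongly with or against a variable of interest.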

We can also view these correlations graphically as a scatter matrix:

In [8]:

# scatter_matrix lives in pd.plotting; the top-level pd.scatter_matrix alias was removed
pd.plotting.scatter_matrix(df, figsize=(6, 6))
plt.show()


Or we can plot the correlation matrix directly:

In [9]:

plt.matshow(df.corr())
plt.xticks(range(len(df.columns)), df.columns)
plt.yticks(range(len(df.columns)), df.columns)
plt.colorbar()
plt.show()
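A possible refinement of this plot, offered as a sketch rather than the article's method: pin the colour scale to the full range from -1 to 1 so colours are comparable between plots, and write each value into its cell. The 'coolwarm' colormap, the seed, and the figure size are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
df = pd.DataFrame({'a': rng.integers(0, 50, 1000)})
df['b'] = df['a'] + rng.normal(0, 10, 1000)
df['c'] = 100 - df['a'] + rng.normal(0, 5, 1000)
df['d'] = rng.integers(0, 50, 1000)

corr = df.corr()
n = len(corr.columns)

fig, ax = plt.subplots(figsize=(5, 5))
# Fix the colour limits to the full correlation range
im = ax.matshow(corr.to_numpy(), vmin=-1, vmax=1, cmap='coolwarm')
ax.set_xticks(range(n))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(n))
ax.set_yticklabels(corr.columns)

# Write each coefficient into its cell (row i, column j)
for i in range(n):
    for j in range(n):
        ax.text(j, i, f"{corr.iat[i, j]:.2f}", ha='center', va='center')

fig.colorbar(im)
```

In a notebook the figure renders at the end of the cell; in a script, call plt.show() afterwards.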

