[02] Bar Plot

Bar Plot¶

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Coloring bar and barh plots¶

# fig = plt.figure(figsize=(12, 7))
# axes = fig.subplots(1, 2)
fig, axes = plt.subplots(1,2, figsize=(12,7))

x = list('ABCDE')
y = list(range(1,6))

clist = ['tomato', 'g', 'r', 'm', 'b']

axes[0].bar(x,y, color = clist) # a list gives each bar its own color
axes[1].barh(x,y, color = 'k')  # a single color applies to every bar

plt.show()

Trying out some data analysis¶

In [3]:

data = pd.read_csv('./StudentsPerformance.csv')
data.info()
print(data.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   gender                       1000 non-null   object
 1   race/ethnicity               1000 non-null   object
 2   parental level of education  1000 non-null   object
 3   lunch                        1000 non-null   object
 4   test preparation course      1000 non-null   object
 5   math score                   1000 non-null   int64 
 6   reading score                1000 non-null   int64 
 7   writing score                1000 non-null   int64 
dtypes: int64(3), object(5)
memory usage: 62.6+ KB
(1000, 8)

In [4]:

data.head(5)

Out[4]:

	gender	race/ethnicity	parental level of education	lunch	test preparation course	math score	reading score	writing score
0	female	group B	bachelor's degree	standard	none	72	72	74
1	female	group C	some college	standard	completed	69	90	88
2	female	group B	master's degree	standard	none	90	95	93
3	male	group A	associate's degree	free/reduced	none	47	57	44
4	male	group C	some college	standard	none	76	78	75

Counting race/ethnicity groups by gender¶

In [5]:

group = data.groupby('gender')['race/ethnicity'].value_counts().sort_index()
# The result is a Series, so sort_index is used here (my attempt with sort_values went wrong)
group

Out[5]:

gender  race/ethnicity
female  group A            36
        group B           104
        group C           180
        group D           129
        group E            69
male    group A            53
        group B            86
        group C           139
        group D           133
        group E            71
Name: race/ethnicity, dtype: int64
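The difference between sort_index and sort_values on a grouped value_counts result can be sketched on a tiny made-up data set. The frame `toy` and its `team` column are illustrative assumptions, not part of StudentsPerformance.csv.

```python
import pandas as pd

# Tiny made-up data set (not the StudentsPerformance data).
toy = pd.DataFrame({
    'gender': ['female', 'female', 'female', 'male', 'male', 'male', 'male'],
    'team':   ['B', 'A', 'B', 'A', 'A', 'A', 'B'],
})

counts = toy.groupby('gender')['team'].value_counts()

# sort_index orders by the (gender, team) index labels,
# so each gender's groups come out as A, B, ...
by_label = counts.sort_index()

# sort_values orders by the counts themselves, which interleaves the genders.
by_count = counts.sort_values()

print(by_label)
print(by_count)

# unstack moves the inner index level into columns: one row per gender.
table = by_label.unstack()
print(table)
```

Sorting by index keeps the x order identical for both genders, which is what the side-by-side bar plots rely on.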

In [6]:

fig, axes = plt.subplots(1, 2, figsize=(15,7))
axes[0].bar(group['male'].index, group['male'], color = 'royalblue')
axes[1].bar(group['female'].index, group['female'], color = 'tomato')

# group['male'] is a pandas Series, yet it can be passed directly
# I used to pass only the array via group.male.values, but a Series works as well

plt.show()

The two plots have different y-axis ranges, which makes them hard to compare.

In [7]:

# Method 1: use the sharey parameter
fig, axes = plt.subplots(1, 4, figsize=(15,7), sharey=True)
axes[0].bar(group['male'].index, group['male'], color = 'royalblue')
axes[1].bar(group['female'].index, group['female'], color = 'tomato')

# Method 2: call set_ylim in a loop
axes[2].bar(group['male'].index, group['male'], color = 'royalblue')
axes[3].bar(group['female'].index, group['female'], color = 'tomato')

for ax in axes:
    ax.set_ylim(0,200)
plt.show()
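Because the cell above creates all four axes with sharey=True, the two methods are hard to tell apart there. A minimal sketch that keeps them in separate figures, using the first three counts per gender from Out[5] as hard-coded lists, might look like this. The `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

labels = ['group A', 'group B', 'group C']
male = [53, 86, 139]      # first three male counts from Out[5]
female = [36, 104, 180]   # first three female counts from Out[5]

# Method 1: sharey=True links the y-axis of every subplot in the figure.
fig1, axes1 = plt.subplots(1, 2, figsize=(10, 5), sharey=True)
axes1[0].bar(labels, male, color='royalblue')
axes1[1].bar(labels, female, color='tomato')

# Method 2: independent axes, with the same limits set by hand.
fig2, axes2 = plt.subplots(1, 2, figsize=(10, 5))
axes2[0].bar(labels, male, color='royalblue')
axes2[1].bar(labels, female, color='tomato')
for ax in axes2:
    ax.set_ylim(0, 200)

plt.show()
```

With sharey the limits adapt to the data automatically, while set_ylim requires picking a range that fits every panel.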

Stacked Bar Plot¶

To show the group counts with male and female combined in one bar plot, either stack the bars with the bottom parameter or overlay them and adjust the transparency with the alpha parameter.

In [8]:

data.head(3)
data['race/ethnicity'].value_counts().sort_index()

Out[8]:

group A     89
group B    190
group C    319
group D    262
group E    140
Name: race/ethnicity, dtype: int64

Use the bottom parameter. It leaves the space underneath empty, so the second series starts where the first one ends.

In [9]:

group_cnt = data['race/ethnicity'].value_counts().sort_index()
fig, axes = plt.subplots(1,2, figsize= (12,7), sharey=True)
axes[0].bar(group_cnt.index,group_cnt,color='darkgray')
axes[1].bar(group.male.index, group.male, color='blue')
axes[1].bar(group['female'].index, group['female'], bottom=group['male'], color='red')

plt.show()
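The alpha approach mentioned earlier is not shown in the cell above. A minimal sketch, with the counts from Out[5] typed in as plain lists, could be:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

labels = ['group A', 'group B', 'group C', 'group D', 'group E']
male = [53, 86, 139, 133, 71]      # male counts from Out[5]
female = [36, 104, 180, 129, 69]   # female counts from Out[5]

fig, ax = plt.subplots(figsize=(8, 5))
# Both series start from the baseline at the same x positions;
# alpha makes them translucent so the shorter bar stays visible.
ax.bar(labels, male, color='royalblue', alpha=0.5, label='Male')
ax.bar(labels, female, color='tomato', alpha=0.5, label='Female')
ax.legend()
plt.show()
```

Overlaying keeps each series readable against the same baseline, whereas stacking makes the total easier to read.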

In [13]:

fig, ax = plt.subplots(1, 1, figsize=(12,7))

ax.barh(group.male.index, group.male.values / group_cnt.values)
ax.barh(group.female.index, group.female.values / group_cnt.values, left=group.male.values / group_cnt.values)

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)

plt.show()

Understanding ax.spines¶

spines are used to customize the axis borders. They are stored in a dict-like container whose keys are top, bottom, left, and right, and they are usually used together with tick_params.

Methods

set_visible(False)
set_position('center') or set_position(('data', 1))
set_linewidth(2)
set_alpha(0.5) # transparency
set_color('navy')
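The methods listed above can be tried together on a small figure. This is only a sketch on made-up values; the data and styling choices are assumptions for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(['A', 'B', 'C'], [3, -2, 5], color='gray')  # made-up values

# Hide the spines that carry no information.
for s in ['top', 'right']:
    ax.spines[s].set_visible(False)

# Move the bottom spine to y = 0 in data coordinates.
# Note that set_position takes a single (type, amount) tuple.
ax.spines['bottom'].set_position(('data', 0))

# Style the left spine.
ax.spines['left'].set_linewidth(2)
ax.spines['left'].set_color('navy')
ax.spines['left'].set_alpha(0.5)

# tick_params adjusts the ticks that sit on the spines.
ax.tick_params(axis='x', length=0)
plt.show()
```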

Grouped Bar Plot¶

x-axis -> width -> xticks, xticklabels¶

How to shift the x positions¶

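2 series: -1/2, +1/2
3 series: -1, 0, +1 (-2/2, 0, +2/2)
4 series: -3/2, -1/2, +1/2, +3/2
General formula for the position of the bar with index i (i = 0, ..., N-1) among N series, centered on the tick position x:
$x+\frac{-N+1+2\times i}{2}\times width$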
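The formula can be checked numerically against the three cases listed above. The helper name `bar_offsets` is hypothetical and introduced only for this sketch.

```python
import numpy as np

def bar_offsets(n_series, width):
    """Offset of each series from the tick position x, following the formula above."""
    i = np.arange(n_series)
    return (-n_series + 1 + 2 * i) / 2 * width

# With width = 1 the offsets reproduce the listed multiples.
print(bar_offsets(2, 1.0))  # offsets -1/2, +1/2
print(bar_offsets(3, 1.0))  # offsets -1, 0, +1
print(bar_offsets(4, 1.0))  # offsets -3/2, -1/2, +1/2, +3/2

# Usage: x positions of series i around the ticks 0..4 with width 0.35
idx = np.arange(5)
positions = [idx + off for off in bar_offsets(2, 0.35)]
```

The offsets are symmetric around zero, so the group of bars stays centered on its tick.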

In [17]:

fig, ax = plt.subplots(1, 1, figsize=(12, 7))

idx = np.arange(len(group['male'].index))
width=0.35

ax.bar(idx-width/2, group['male'], 
       color='royalblue',
       width=width,
       label='Male')

ax.bar(idx+width/2, group['female'], 
       color='tomato',
       width=width,
       label='Female')

ax.set_xticks(idx)
ax.set_xticklabels(group['male'].index)
ax.legend()

plt.show()

Adding value labels as text¶

In [19]:

fig, axes = plt.subplots(1, 2, figsize=(15, 7))

for ax in axes:
    ax.bar(group_cnt.index, group_cnt,
           width=0.7,
           edgecolor='black',
           linewidth=2,
           color='royalblue',
           zorder=10
          )

    ax.margins(0.1, 0.1)

    for s in ['top', 'right']:
        ax.spines[s].set_visible(False)

axes[1].grid(zorder=0)

for idx, value in zip(group_cnt.index, group_cnt):
    axes[1].text(idx, value+5, s=str(value),
                 ha='center', 
                 fontweight='bold'
                )

plt.show()
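As an alternative to placing each text by hand, matplotlib 3.4 and later provide Axes.bar_label, which labels every bar of a container in one call. A minimal sketch, with the totals from Out[8] typed in as a list:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

labels = ['group A', 'group B', 'group C', 'group D', 'group E']
counts = [89, 190, 319, 262, 140]   # totals from Out[8]

fig, ax = plt.subplots(figsize=(8, 5))
bars = ax.bar(labels, counts, color='royalblue', edgecolor='black')

# bar_label (matplotlib >= 3.4) adds one text per bar; padding is in points.
texts = ax.bar_label(bars, padding=3, fontweight='bold')

ax.margins(0.1, 0.1)
plt.show()
```

The manual ax.text loop remains useful when each label needs its own position or formatting.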

Using error bars¶

First, a review of pandas aggregation functions¶

mean(): Compute mean of groups

sum(): Compute sum of group values

size(): Compute group sizes

count(): Compute count of group

std(): Standard deviation of groups

var(): Compute variance of groups

sem(): Standard error of the mean of groups

describe(): Generates descriptive statistics

first(): Compute first of group values

last(): Compute last of group values

nth() : Take nth value, or a subset if n is a list

min(): Compute min of group values

max(): Compute max of group values
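Several of these aggregations can be computed at once with agg. The sketch below uses a made-up five-row frame (`toy`, an assumption for illustration) so the numbers are easy to verify by hand.

```python
import math
import pandas as pd

# Made-up scores, small enough to check by hand.
toy = pd.DataFrame({
    'gender': ['female', 'female', 'male', 'male', 'male'],
    'score':  [80, 90, 60, 70, 80],
})
g = toy.groupby('gender')['score']

# One column per aggregation, one row per group.
summary = g.agg(['mean', 'std', 'sem', 'count'])
print(summary)

# sem is the standard deviation divided by sqrt(group size).
male = summary.loc['male']
print(math.isclose(male['sem'], male['std'] / math.sqrt(male['count'])))
```

std describes the spread of the scores themselves, while sem describes the uncertainty of the group mean, so the two lead to error bars of very different lengths.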

Using the yerr parameter

In [36]:

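# Restrict to the score columns: newer pandas raises on std() over object columns.
score_var = data.groupby('gender')[['math score', 'reading score', 'writing score']].std().T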
score_var

Out[36]:

gender	female	male
math score	15.491453	14.356277
reading score	14.378245	13.931832
writing score	14.844842	14.113832

In [37]:

fig, ax = plt.subplots(1, 1, figsize=(10, 10))

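# `score` was not defined in the cells shown; it is assumed here to be the per-gender mean scores.
score = data.groupby('gender')[['math score', 'reading score', 'writing score']].mean().T

idx = np.arange(len(score_var.index))
width=0.3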


ax.bar(idx-width/2, score['male'], 
       color='royalblue',
       width=width,
       label='Male',
       yerr=score_var['male'],
       capsize=10
      )

ax.bar(idx+width/2, score['female'], 
       color='tomato',
       width=width,
       label='Female',
       yerr=score_var['female'],
       capsize=10
      )

ax.set_xticks(idx)
ax.set_xticklabels(score.index)
ax.set_ylim(0, 100)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

ax.legend()
ax.set_title('Gender / Score', fontsize=20)
ax.set_xlabel('Subject', fontweight='bold')
ax.set_ylabel('Score', fontweight='bold')

plt.show()
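The plot above passes the standard deviation to yerr. If the error bars should instead express the uncertainty of the mean, sem() from the aggregation list can be passed in the same way. The sketch below uses randomly generated scores (an assumption, not the StudentsPerformance data) so it runs without the CSV file.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in data: 50 scores per gender.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'gender': np.repeat(['female', 'male'], 50),
    'math score': rng.integers(40, 100, size=100),
})

g = toy.groupby('gender')['math score']
mean, sem = g.mean(), g.sem()

fig, ax = plt.subplots(figsize=(5, 5))
# yerr accepts one error value per bar; capsize draws the end caps.
bars = ax.bar(mean.index, mean, yerr=sem, capsize=10,
              color=['tomato', 'royalblue'])
ax.set_ylabel('Score')
plt.show()
```

Since sem shrinks with the square root of the group size, these bars are much shorter than the std-based ones for the same data.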


태호의 공부노트
