Bar Plot¶
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
barh and bar colors¶
# fig = plt.figure(12,7)
# axes = fig.subplots(1,2)
fig, axes = plt.subplots(1,2, figsize=(12,7))
x = list('ABCDE')
y = list(range(1,6))
clist = ['tomato', 'g', 'r', 'm', 'b']
axes[0].bar(x,y, color = clist) # a list gives each bar its own color
axes[1].barh(x,y, color = 'k') # a single value colors every bar
plt.show()
Trying out some data analysis¶
In [3]:
data = pd.read_csv('./StudentsPerformance.csv')
data.info()
print(data.shape)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 1000 non-null object
1 race/ethnicity 1000 non-null object
2 parental level of education 1000 non-null object
3 lunch 1000 non-null object
4 test preparation course 1000 non-null object
5 math score 1000 non-null int64
6 reading score 1000 non-null int64
7 writing score 1000 non-null int64
dtypes: int64(3), object(5)
memory usage: 62.6+ KB
(1000, 8)
In [4]:
data.head(5)
Out[4]:
| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | female | group B | bachelor's degree | standard | none | 72 | 72 | 74 |
| 1 | female | group C | some college | standard | completed | 69 | 90 | 88 |
| 2 | female | group B | master's degree | standard | none | 90 | 95 | 93 |
| 3 | male | group A | associate's degree | free/reduced | none | 47 | 57 | 44 |
| 4 | male | group C | some college | standard | none | 76 | 78 | 75 |
Counting race/ethnicity groups by gender¶
In [5]:
group = data.groupby('gender')['race/ethnicity'].value_counts().sort_index()
# The result is a Series, so sort_index() is used here. My attempt with sort_values raised an error..
group
Out[5]:
gender race/ethnicity
female group A 36
group B 104
group C 180
group D 129
group E 69
male group A 53
group B 86
group C 139
group D 133
group E 71
Name: race/ethnicity, dtype: int64
In [6]:
fig, axes = plt.subplots(1, 2, figsize=(15,7))
axes[0].bar(group['male'].index, group['male'], color = 'royalblue')
axes[1].bar(group['female'].index, group['female'], color = 'tomato')
# group['male'] is a pandas Series, and passing it directly works
# I used to pass only arrays via group.male.values, but a Series works as well
plt.show()
The two plots have different y-axis ranges, which makes them hard to compare.
In [7]:
# Method 1: use the sharey parameter
fig, axes = plt.subplots(1, 4, figsize=(15,7), sharey=True)
axes[0].bar(group['male'].index, group['male'], color = 'royalblue')
axes[1].bar(group['female'].index, group['female'], color = 'tomato')
# Method 2: call set_ylim in a loop
# (sharey=True above already links all four axes, so this limit applies to every panel)
axes[2].bar(group['male'].index, group['male'], color = 'royalblue')
axes[3].bar(group['female'].index, group['female'], color = 'tomato')
for ax in axes:
    ax.set_ylim(0,200)
plt.show()
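Because the cell above combines both methods in one figure, their separate effects are hard to see. A minimal sketch that keeps them apart, assuming made-up toy counts rather than the dataset values (the variable names are illustrative only):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt

# Toy counts, invented for illustration
labels = ["A", "B", "C"]
small = [10, 20, 30]
large = [50, 120, 180]

# Method 1: sharey=True links the y-axes when the figure is created
fig1, axes1 = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
axes1[0].bar(labels, small)
axes1[1].bar(labels, large)

# Method 2: independent axes, with the same limits set explicitly
fig2, axes2 = plt.subplots(1, 2, figsize=(8, 4))
axes2[0].bar(labels, small)
axes2[1].bar(labels, large)
for ax in axes2:
    ax.set_ylim(0, 200)

shared_limits = [tuple(map(float, ax.get_ylim())) for ax in axes1]
manual_limits = [tuple(map(float, ax.get_ylim())) for ax in axes2]
```

With sharey the limits follow the data automatically, while set_ylim needs a fixed range chosen in advance.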
Stacked Bar Plot¶
To show the male and female group counts together in one bar plot, either stack the bars with the bottom parameter or overlay them and adjust transparency with the alpha parameter.
In [8]:
data.head(3)
data['race/ethnicity'].value_counts().sort_index()
Out[8]:
group A 89
group B 190
group C 319
group D 262
group E 140
Name: race/ethnicity, dtype: int64
Use the bottom parameter. It leaves the space below each bar empty, so the second series starts where the first one ends.
In [9]:
group_cnt = data['race/ethnicity'].value_counts().sort_index()
fig, axes = plt.subplots(1,2, figsize= (12,7), sharey=True)
axes[0].bar(group_cnt.index,group_cnt,color='darkgray')
axes[1].bar(group.male.index, group.male, color='blue')
axes[1].bar(group['female'].index, group['female'], bottom=group['male'], color='red')
plt.show()
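As a quick check on how bottom works, the following sketch rebuilds the stacked plot from the counts printed in Out[5] and confirms that the top of each stacked bar equals the totals in Out[8]. The arrays are typed in by hand here so the block runs without the CSV file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

labels = ["group A", "group B", "group C", "group D", "group E"]
male = np.array([53, 86, 139, 133, 71])     # counts copied from Out[5]
female = np.array([36, 104, 180, 129, 69])  # counts copied from Out[5]

fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(labels, male, color="royalblue", label="male")
# bottom=male shifts every female bar up by the matching male count
upper = ax.bar(labels, female, bottom=male, color="tomato", label="female")
ax.legend()

# Top edge of each stacked bar = bottom offset + bar height
stack_tops = [float(r.get_y() + r.get_height()) for r in upper]
```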
In [13]:
fig, ax = plt.subplots(1, 1, figsize=(12,7))
ax.barh(group.male.index, group.male.values / group_cnt.values)
ax.barh(group.female.index, group.female.values / group_cnt.values, left=group.male.values / group_cnt.values)
for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)
plt.show()
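Understanding ax.spines¶
spines are used to customize the border lines of the axes. They are stored in a dict-like container whose keys are top, bottom, left, and right, and they are usually used together with tick_params.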
Method
- set_visible(False)
- set_position('center') or set_position(('data', 1))
- set_linewidth(2)
- set_alpha(0.5) # transparency
- set_color('navy')
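The methods listed above can be tried together in one small sketch. The plotted curve is arbitrary toy data, and which spine receives which setting is just an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(5, 4))
ax.plot([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # toy data

# Hide the top and right border lines
for side in ["top", "right"]:
    ax.spines[side].set_visible(False)

# Move the left spine to the middle of the axes
ax.spines["left"].set_position("center")
# Place the bottom spine at y = 1 in data coordinates (note the tuple)
ax.spines["bottom"].set_position(("data", 1))

# Style the bottom spine
ax.spines["bottom"].set_linewidth(2)
ax.spines["bottom"].set_alpha(0.5)
ax.spines["bottom"].set_color("navy")
```

Note that set_position takes the position type and amount as a single tuple, not as two separate arguments.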
Grouped Bar Plot¶
x-axis -> width -> xticks, xticklabels¶
Adjusting the x positions¶
- 2 bars : -1/2, +1/2
- 3 bars : -1, 0, +1 (-2/2, 0, +2/2)
- 4 bars : -3/2, -1/2, +1/2, +3/2
- Formula: position of the bar at index i (0-based) among N bars centered on x
$x+\frac{-N+1+2\times i}{2}\times width$
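The formula can be checked with a few lines of plain Python. `bar_offsets` is a hypothetical helper name introduced here for illustration; it returns each bar's offset from the group center x.

```python
def bar_offsets(n_bars, width):
    """Offset of each bar from the group center, for i = 0 .. n_bars - 1."""
    return [(-n_bars + 1 + 2 * i) / 2 * width for i in range(n_bars)]

# With width = 1.0 the offsets reproduce the list above
offsets_2 = bar_offsets(2, 1.0)  # [-0.5, 0.5]
offsets_3 = bar_offsets(3, 1.0)  # [-1.0, 0.0, 1.0]
offsets_4 = bar_offsets(4, 1.0)  # [-1.5, -0.5, 0.5, 1.5]
```

Adding these offsets to np.arange(number_of_groups) gives the x positions to pass to ax.bar for each series.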
In [17]:
fig, ax = plt.subplots(1, 1, figsize=(12, 7))
idx = np.arange(len(group['male'].index))
width=0.35
ax.bar(idx-width/2, group['male'],
       color='royalblue',
       width=width,
       label='Male')
ax.bar(idx+width/2, group['female'],
       color='tomato',
       width=width,
       label='Female')
ax.set_xticks(idx)
ax.set_xticklabels(group['male'].index)
ax.legend()
plt.show()
Adding value labels as text¶
In [19]:
fig, axes = plt.subplots(1, 2, figsize=(15, 7))
for ax in axes:
    ax.bar(group_cnt.index, group_cnt,
           width=0.7,
           edgecolor='black',
           linewidth=2,
           color='royalblue',
           zorder=10
          )
    ax.margins(0.1, 0.1)
    for s in ['top', 'right']:
        ax.spines[s].set_visible(False)
axes[1].grid(zorder=0)
for idx, value in zip(group_cnt.index, group_cnt):
    axes[1].text(idx, value+5, s=value,
                 ha='center',
                 fontweight='bold'
                )
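As an alternative to placing each label by hand with ax.text, matplotlib 3.4 and later provide Axes.bar_label, which annotates every bar in a container. A minimal sketch, assuming matplotlib >= 3.4 and using the totals from Out[8] typed in by hand:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt

labels = ["group A", "group B", "group C", "group D", "group E"]
counts = [89, 190, 319, 262, 140]  # totals copied from Out[8]

fig, ax = plt.subplots(figsize=(7, 5))
bars = ax.bar(labels, counts, color="royalblue", edgecolor="black")

# padding is the gap in points between the bar top and the label;
# extra keyword arguments such as fontweight are forwarded to the text
annotations = ax.bar_label(bars, padding=3, fontweight="bold")
label_texts = [a.get_text() for a in annotations]
```

The manual ax.text loop remains useful when labels need custom positions or formatting per bar.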
Using error bars¶
First, a review of pandas aggregation functions¶
- mean(): Compute mean of groups
- sum(): Compute sum of group values
- size(): Compute group sizes
- count(): Compute count of group
- std(): Standard deviation of groups
- var(): Compute variance of groups
- sem(): Standard error of the mean of groups
- describe(): Generates descriptive statistics
- first(): Compute first of group values
- last(): Compute last of group values
- nth() : Take nth value, or a subset if n is a list
- min(): Compute min of group values
- max(): Compute max of group values
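Several of these aggregations can be computed in one call with agg. The sketch below uses a tiny made-up DataFrame (the `team` and `score` columns are invented for illustration) and also shows how sem relates to std and count.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration
df = pd.DataFrame({
    "team": ["a", "a", "a", "b", "b"],
    "score": [1.0, 2.0, 3.0, 4.0, 8.0],
})

# One row per group, one column per aggregation
summary = df.groupby("team")["score"].agg(["mean", "std", "sem", "count"])

# sem is the sample standard deviation divided by sqrt(group size)
sem_by_hand = summary["std"] / np.sqrt(summary["count"])
```

Either std or sem can be passed to yerr; sem shrinks as the group grows, so it describes uncertainty in the mean rather than spread in the data.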
Use the yerr parameter
In [36]:
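score_var = data.groupby('gender').std(numeric_only=True).T  # std of each score by gender; numeric_only skips the text columns (needed on pandas >= 2.0)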
score_var
Out[36]:
gender | female | male |
---|---|---|
math score | 15.491453 | 14.356277 |
reading score | 14.378245 | 13.931832 |
writing score | 14.844842 | 14.113832 |
In [37]:
fig, ax = plt.subplots(1, 1, figsize=(10, 10))
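idx = np.arange(len(score_var.index))
width=0.3
# `score` is not defined in the cells shown; it is assumed here to be the mean score per subject and gender
score = data.groupby('gender').mean(numeric_only=True).T
ax.bar(idx-width/2, score['male'],
       color='royalblue',
       width=width,
       label='Male',
       yerr=score_var['male'],
       capsize=10
      )
ax.bar(idx+width/2, score['female'],
       color='tomato',
       width=width,
       label='Female',
       yerr=score_var['female'],
       capsize=10
      )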
ax.set_xticks(idx)
ax.set_xticklabels(score.index)
ax.set_ylim(0, 100)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend()
ax.set_title('Gender / Score', fontsize=20)
ax.set_xlabel('Subject', fontweight='bold')
ax.set_ylabel('Score', fontweight='bold')
plt.show()