[17] Troubleshooting

Out Of Memory(OOM)

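I ran into this often when using large models; reducing batch_size and rerunning usually fixed it.
Why OOM is hard to resolve
- It is hard to tell why it occurred
- It is hard to tell where it occurred
- The error backtrace points to an unrelated place
- It is hard to know the state of memory before the error
Fix: reduce the batch size → clear the GPU → run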
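The "reduce the batch size, clear the GPU, run again" loop above can be sketched as a small retry helper. This is only an illustration under stated assumptions: `run_with_batch_backoff`, `step_fn`, `cleanup`, and `fake_step` are hypothetical names, not part of PyTorch. It relies on PyTorch reporting CUDA OOM as a `RuntimeError` whose message contains "out of memory"; in a real script `cleanup` would typically be `torch.cuda.empty_cache`.

```python
def run_with_batch_backoff(step_fn, batch_size, min_batch_size=1, cleanup=None):
    """Call step_fn(batch_size), halving batch_size after each OOM error.

    step_fn and cleanup are hypothetical callables supplied by the caller.
    Returns (batch_size_that_worked, result_of_step_fn).
    """
    while True:
        try:
            return batch_size, step_fn(batch_size)
        except RuntimeError as err:
            # Only treat "out of memory" errors as recoverable.
            if "out of memory" not in str(err).lower():
                raise
        # We are outside the except block here, so the exception (and the
        # frames it references) is no longer held when cleanup runs.
        if batch_size // 2 < min_batch_size:
            raise RuntimeError(f"still out of memory at batch_size={batch_size}")
        batch_size //= 2
        if cleanup is not None:
            cleanup()  # the "clear the GPU" step


# Simulated training step: pretends that any batch larger than 16 does not fit.
def fake_step(bs):
    if bs > 16:
        raise RuntimeError("CUDA out of memory (simulated)")
    return f"trained with batch_size={bs}"


final_bs, result = run_with_batch_backoff(fake_step, batch_size=128)
print(final_bs, result)
```

Halving is just one possible backoff policy; a fixed decrement would work the same way.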

Using GPUtil

A module that shows GPU status, like nvidia-smi
Convenient for displaying GPU status in a Colab environment
Check whether memory usage grows with every iteration!
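One way to act on the "does memory grow every iteration?" check is to record a reading per iteration and test whether the series keeps rising. The sketch below is a rough illustration: `memory_keeps_growing` and the sample numbers are hypothetical. In practice the readings could come from `GPUtil.getGPUs()[0].memoryUsed` (reported in MB) or from `torch.cuda.memory_allocated() / 2**20`.

```python
def memory_keeps_growing(readings_mb, min_increase_mb=1.0):
    """Return True if every reading exceeds the previous one by at least
    min_increase_mb, which suggests something is accumulating on the GPU."""
    if len(readings_mb) < 2:
        return False
    pairs = zip(readings_mb, readings_mb[1:])
    return all(later - earlier >= min_increase_mb for earlier, later in pairs)


# Made-up readings (MB), one per training iteration.
leaking = [1200.0, 1350.0, 1500.0, 1650.0]   # grows on every iteration
steady = [1200.0, 1350.0, 1350.0, 1349.0]    # grows once, then levels off

print(memory_keeps_growing(leaking))
print(memory_keeps_growing(steady))
```

A rise over the first iteration or two is normal while the allocator warms up; it is the growth that never levels off that deserves attention.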

torch.cuda.empty_cache()

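Releases cached GPU memory that is no longer in use
Frees up available memory
Not the same as del: del removes the Python reference to a tensor, while empty_cache() returns the cached, unused blocks to the GPU
A handy function to use instead of a reset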

import torch
from GPUtil import showUtilization as gpu_usage

print("Initial GPU Usage")
gpu_usage()

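tensorList = []
for x in range(10):
    # each tensor holds 10,000,000 x 10 float32 values (about 400 MB) on the GPU
    tensorList.append(torch.randn(10000000,10).cuda())

print("GPU Usage after allocating a bunch of Tensors")
gpu_usage()

# drop the Python references; the caching allocator still holds the memory
del tensorList

print("GPU Usage after deleting the Tensors")
gpu_usage()

# return the cached, now-unused blocks to the GPU
print("GPU Usage after emptying the cache")
torch.cuda.empty_cache()
gpu_usage()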

Reference

Memory Management, Optimisation and Debugging with PyTorch (paperspace.com)

torch.no_grad()

At inference time, wrap the code in a torch.no_grad() block
This avoids the memory that would otherwise build up for the backward pass

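import torch
import torch.nn.functional as F

# network and test_loader are assumed to be defined earlier
test_loss, correct = 0.0, 0
with torch.no_grad():  # no autograd graph is built inside this block
  for data, target in test_loader:
    output = network(data)
    # sum (rather than average) the per-sample losses of this batch
    test_loss += F.nll_loss(output, target, reduction='sum').item()
    pred = output.argmax(dim=1, keepdim=True)  # index of the largest log-probability
    correct += pred.eq(target.view_as(pred)).sum().item()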
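To see what torch.no_grad() changes, the following minimal sketch compares the same forward pass with and without it. The tiny `nn.Linear` model and the random input are stand-ins chosen for illustration, not the network from the evaluation loop, and everything runs on the CPU.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)    # tiny stand-in model (hypothetical)
x = torch.randn(3, 4)      # random stand-in batch

out_train = model(x)       # autograd records the graph for a backward pass

with torch.no_grad():
    out_eval = model(x)    # nothing is recorded inside this block

# The first output is attached to a graph; the second is not.
print(out_train.requires_grad, out_train.grad_fn is not None)
print(out_eval.requires_grad, out_eval.grad_fn is None)
```

Since the second output carries no graph, the intermediate values needed for backward() are not kept alive, and that is where the memory saving during inference comes from.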


태호의 공부노트
