9. Temporal Difference Learning(1/2)

공부 정리 블로그 (Study Notes Blog)


따옹 2022. 10. 12. 23:08

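Dynamic Programming

- Model-based: it needs the dynamics of the environment.

- Advantage: it uses bootstrapping.

The value of the current state can be updated from the estimated value of the next state.

This helps when the sequence of states (the episode) is long.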
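As a rough illustration of the bootstrapped update described above, here is a minimal sketch of iterative policy evaluation. The three-state chain, the transition table `P`, and the discount factor `GAMMA` are hypothetical and not part of the lecture notes; the point is only that each state's value is rewritten from the current estimate of its successor, which requires knowing the model.

```python
# Hypothetical 3-state chain, for illustration only: 0 -> 1 -> 2 (terminal).
# P[s] lists (probability, next_state, reward) under a fixed policy.
# Having this table at all is what "model-based" means here.
P = {
    0: [(1.0, 1, 0.0)],
    1: [(1.0, 2, 1.0)],
    2: [],  # terminal state: no outgoing transitions
}
GAMMA = 0.9  # assumed discount factor


def policy_evaluation(P, gamma, sweeps=50):
    """Repeatedly back up each state's value from its successors' estimates."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        for s, transitions in P.items():
            if not transitions:
                continue  # the terminal state keeps value 0
            # Bootstrapping: V[s] is computed from the current estimate V[s2].
            V[s] = sum(p * (r + gamma * V[s2]) for p, s2, r in transitions)
    return V


V = policy_evaluation(P, GAMMA)
print(V)
```

Each update looks only one step ahead, so no episode ever has to be run to its end.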

Monte-Carlo method

- Model-free method

Used when the dynamics of the environment are not available.

Temporal Difference Learning

Combines the benefits of the DP and MC methods

Monte Carlo

Observe how the Q-values are predicted, then keep updating the policy so that it moves toward the optimal policy.

on-policy

off-policy: Q-value

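V(s) = R(s): the value of a state is defined as the sum of rewards (the return) collected from that state.

Monte Carlo averages these returns, but computing R(s) requires running the episode all the way to its end.

When episodes are long, as in games, obtaining the return takes too much time.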
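The cost described above can be seen in a small sketch of Monte Carlo value estimation. It uses every-visit averaging; the notes do not say whether the course uses first-visit or every-visit. The episode format (a list of `(state, reward)` pairs), the two sample episodes, and `GAMMA = 1.0` are assumptions for illustration. The return for the first state cannot be computed until the last reward of the episode is known.

```python
from collections import defaultdict

GAMMA = 1.0  # assumed: undiscounted returns


def returns_from_episode(episode, gamma):
    """episode: list of (state, reward received after leaving that state).

    Walks the finished episode backwards, so the whole episode is needed first.
    """
    G = 0.0
    out = []
    for state, reward in reversed(episode):
        G = reward + gamma * G
        out.append((state, G))
    out.reverse()
    return out


def mc_value_estimate(episodes, gamma):
    """Every-visit Monte Carlo: V(s) is the average of the returns seen from s."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        for state, G in returns_from_episode(episode, gamma):
            totals[state] += G
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Two hypothetical episodes, both visiting A and then B.
episodes = [
    [("A", 0.0), ("B", 1.0)],
    [("A", 0.0), ("B", 3.0)],
]
V = mc_value_estimate(episodes, GAMMA)
print(V)
```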

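What would be a better approach?

Would the following work better?

We do not know in advance which action leads to which state,

so we simply take a step and see what happens.

R (the actual value) requires reaching the end of the episode, whereas a bootstrap estimate (an estimated value) is available after a single step.

In TD learning, the TD error is (target value - current value).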
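Written in the usual TD(0) notation, the error and the update it drives look roughly as follows. The step size \alpha, the discount factor \gamma, and the lowercase r for the one-step reward are assumptions borrowed from the standard formulation; the notes do not define them. The numbers in the second half are hypothetical and are there only to show the arithmetic.

```latex
\begin{align*}
\delta_t &= \underbrace{r_{t+1} + \gamma V(s_{t+1})}_{\text{TD target}}
            - \underbrace{V(s_t)}_{\text{current value}} \\
V(s_t) &\leftarrow V(s_t) + \alpha\,\delta_t
\end{align*}

% Hypothetical numbers: V(s_t) = 0.5,\ V(s_{t+1}) = 0.8,\ r_{t+1} = 1,\ \gamma = 0.9,\ \alpha = 0.1
\begin{align*}
\delta_t &= 1 + 0.9 \times 0.8 - 0.5 = 1.22 \\
V(s_t) &\leftarrow 0.5 + 0.1 \times 1.22 = 0.622
\end{align*}
```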

TODO: add a hand-worked example here.

TD Prediction Algorithm
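The original post does not include the algorithm itself, so what follows is a minimal sketch of tabular TD(0) prediction, not the version from the lecture. The `step` function signature, the five-state random walk environment, and the values of `alpha` and `gamma` are all assumptions made for illustration. The policy being evaluated (move left or right with equal probability) is folded into the hypothetical `random_walk_step` function.

```python
import random


def td0_prediction(step, start_state, num_episodes, alpha=0.1, gamma=1.0, seed=0):
    """Tabular TD(0): update V(s) after every single step, not at episode end.

    step(state, rng) -> (next_state, reward, done) is a hypothetical interface.
    """
    rng = random.Random(seed)
    V = {}
    for _ in range(num_episodes):
        s = start_state
        done = False
        while not done:
            s_next, r, done = step(s, rng)
            # Terminal states have value 0 by convention.
            v_next = 0.0 if done else V.get(s_next, 0.0)
            # TD error = (target value) - (current value)
            td_error = r + gamma * v_next - V.get(s, 0.0)
            V[s] = V.get(s, 0.0) + alpha * td_error
            s = s_next
    return V


def random_walk_step(s, rng):
    """Hypothetical environment: states 1..5, exits at 0 (reward 0) and 6 (reward 1)."""
    s_next = s + rng.choice([-1, 1])
    if s_next == 0:
        return s_next, 0.0, True
    if s_next == 6:
        return s_next, 1.0, True
    return s_next, 0.0, False


V = td0_prediction(random_walk_step, start_state=3, num_episodes=5000, alpha=0.05)
for s in sorted(V):
    print(s, round(V[s], 3))
```

With a constant step size the estimates keep fluctuating around the true values instead of converging exactly; a decaying `alpha` is a common alternative.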
