🤖

Kernel Method

์ƒ์„ฑ์ผ
2025/05/22 16:19
ํƒœ๊ทธ
๋จธ์‹ ๋Ÿฌ๋‹
์ž‘์„ฑ์ž

1. Basis function

• A function that expands the input into multiple feature dimensions
• For data on which linear regression/classification is impossible in the original dimensions, applying a basis function to expand the dimensionality makes a linear relationship applicable in the new dimensions
• Even data that is separated nonlinearly in the original space can be separated by a linear hyperplane once mapped into a high-dimensional feature space
• Applying basis functions sharply increases the amount of computation in the weight update (see the sketch after this list)
◦ e.g., when SSE is used as the objective function:
\frac{\partial E(w)}{\partial w} = -\sum_i \left(t^{(i)} - w^\top \phi(x^{(i)})\right)\phi(x^{(i)})
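To make the two points above concrete, here is a minimal illustrative sketch (not from the original note): a hand-rolled quadratic basis expansion phi(x) = (x1, x2, x1^2, x2^2, x1*x2) turns a circular class boundary into a linear one in the expanded space, and the SSE gradient above is evaluated against the expanded features, so the cost of every weight update scales with the expanded dimensionality.

import numpy as np

# Hypothetical quadratic basis expansion: phi(x) = (x1, x2, x1^2, x2^2, x1*x2)
def phi(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2], axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
t = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(float)  # circular boundary: nonlinear in X

Phi = phi(X)                    # N x 5 expanded design matrix
w = np.zeros(Phi.shape[1])
alpha = 0.001
for _ in range(1000):
    # dE/dw = -sum_i (t_i - w^T phi_i) phi_i, computed over the expanded features
    grad = -Phi.T @ (t - Phi @ w)
    w -= alpha * grad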

2. Kernel Method

• A mathematical technique that simplifies Gradient Descent when a high-dimensional basis function is applied
• By mathematical induction, the weight w can be shown to be a linear combination of the mapped samples with scalar coefficients
• w = \sum_{i=1}^{N} u_i \phi(x^{(i)})
• Updating w then reduces to a scalar update
• u_i \leftarrow u_i + \alpha\left(t^{(i)} - \sum_{j=1}^{N} u_j\, \phi(x^{(j)})^\top \phi(x^{(i)})\right) ; scalar update
• \phi(x^{(i)})^\top \phi(x^{(j)}) : defined as the entries of the kernel matrix, precomputed once and then reused
⇒ The kernel method avoids explicitly updating the weight vector and instead trains the model by updating one scalar coefficient per training sample. By precomputing the kernel (Gram) matrix, the high-dimensional inner products are handled efficiently, reducing the amount of computation (a minimal per-sample sketch follows).
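Before the batch implementation in the next section, here is a minimal sketch of the per-sample scalar update above, under the simplifying assumption of an identity feature map phi(x) = x (so K = X Xᵀ); all names here are illustrative:

import numpy as np

rng = np.random.default_rng(0)
n_obs, n_feature = 10, 5
X = rng.normal(size=(n_obs, n_feature))
t = X @ rng.normal(size=n_feature)

K = X @ X.T            # kernel (Gram) matrix, precomputed once: K[i, j] = phi(x_i)^T phi(x_j)
u = np.zeros(n_obs)
alpha = 0.01
for _ in range(200):
    for i in range(n_obs):
        # scalar update: u_i <- u_i + alpha * (t_i - sum_j u_j K[j, i])
        u[i] += alpha * (t[i] - K[i] @ u)

w = X.T @ u            # the weight vector is recovered as a linear combination of the samples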

3. Code

import numpy as np

# Generate data
n_obs, n_feature = 10, 5
X = np.random.randn(n_obs, n_feature)
true_w = np.random.randn(n_feature)
y = X @ true_w

# Initialize
w = np.zeros(n_feature)
lr = 0.001
n_iters = 200

# Gradient Descent (primal): dE/dw = -X^T (y - X w)
for i in range(n_iters):
    y_hat = X @ w
    grad = -X.T @ (y - y_hat)
    w -= lr * grad            # step against the gradient

# Kernel Method (dual)
K = X @ X.T                   # Gram matrix, precomputed once
u = np.zeros(n_obs)
for i in range(n_iters):
    dual = K @ u              # current predictions K u
    grad_ = -K @ (y - dual)   # dE/du for E = 0.5 * ||y - K u||^2
    u -= lr * grad_           # step against the gradient
w_ = X.T @ u                  # recover the primal weights from the dual coefficients
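As a quick sanity check on the code above (using its own variable names), the primal and dual solutions should approach the same least-squares weights, and the gap shrinks as n_iters grows:

print(np.linalg.norm(w - w_))        # primal vs. dual solution
print(np.linalg.norm(w - true_w))    # both approach the generating weights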