[Data Analysis][AI] Boston Housing Prices - Checking the Accuracy of a Regression Model
Predicting Boston Housing Prices - via Regression Analysis
part1. Data Preparation
Import Libraries
In [1]:
import mglearn
import sklearn
Multiple Linear Regression
- y (label): house price
- x (features): factors that affect the house price -> school location, station...
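A multiple linear regression model predicts y as a weighted sum of the features plus an intercept, one weight per feature. A minimal sketch with made-up toy numbers (the weights and sample below are purely illustrative):

```python
import numpy as np

# y_hat = w1*x1 + w2*x2 + w3*x3 + b
w = np.array([0.5, -1.0, 2.0])   # one weight per feature (made up)
b = 10.0                          # intercept (made up)
x = np.array([1.0, 2.0, 3.0])    # one sample's feature values

y_hat = np.dot(w, x) + b          # 0.5 - 2.0 + 6.0 + 10.0 = 14.5
```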
In [5]:
X, y = mglearn.datasets.load_extended_boston()
X
len(X)
Out[5]:
506
In [3]:
y
# numeric (continuous) target values
Out[3]:
array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
35.4, 24.7, 31.6, 23.3, 19.6, 18.7, 16. , 22.2, 25. , 33. , 23.5,
19.4, 22. , 17.4, 20.9, 24.2, 21.7, 22.8, 23.4, 24.1, 21.4, 20. ,
20.8, 21.2, 20.3, 28. , 23.9, 24.8, 22.9, 23.9, 26.6, 22.5, 22.2,
23.6, 28.7, 22.6, 22. , 22.9, 25. , 20.6, 28.4, 21.4, 38.7, 43.8,
33.2, 27.5, 26.5, 18.6, 19.3, 20.1, 19.5, 19.5, 20.4, 19.8, 19.4,
21.7, 22.8, 18.8, 18.7, 18.5, 18.3, 21.2, 19.2, 20.4, 19.3, 22. ,
20.3, 20.5, 17.3, 18.8, 21.4, 15.7, 16.2, 18. , 14.3, 19.2, 19.6,
23. , 18.4, 15.6, 18.1, 17.4, 17.1, 13.3, 17.8, 14. , 14.4, 13.4,
15.6, 11.8, 13.8, 15.6, 14.6, 17.8, 15.4, 21.5, 19.6, 15.3, 19.4,
17. , 15.6, 13.1, 41.3, 24.3, 23.3, 27. , 50. , 50. , 50. , 22.7,
25. , 50. , 23.8, 23.8, 22.3, 17.4, 19.1, 23.1, 23.6, 22.6, 29.4,
23.2, 24.6, 29.9, 37.2, 39.8, 36.2, 37.9, 32.5, 26.4, 29.6, 50. ,
32. , 29.8, 34.9, 37. , 30.5, 36.4, 31.1, 29.1, 50. , 33.3, 30.3,
34.6, 34.9, 32.9, 24.1, 42.3, 48.5, 50. , 22.6, 24.4, 22.5, 24.4,
20. , 21.7, 19.3, 22.4, 28.1, 23.7, 25. , 23.3, 28.7, 21.5, 23. ,
26.7, 21.7, 27.5, 30.1, 44.8, 50. , 37.6, 31.6, 46.7, 31.5, 24.3,
31.7, 41.7, 48.3, 29. , 24. , 25.1, 31.5, 23.7, 23.3, 22. , 20.1,
22.2, 23.7, 17.6, 18.5, 24.3, 20.5, 24.5, 26.2, 24.4, 24.8, 29.6,
42.8, 21.9, 20.9, 44. , 50. , 36. , 30.1, 33.8, 43.1, 48.8, 31. ,
36.5, 22.8, 30.7, 50. , 43.5, 20.7, 21.1, 25.2, 24.4, 35.2, 32.4,
32. , 33.2, 33.1, 29.1, 35.1, 45.4, 35.4, 46. , 50. , 32.2, 22. ,
20.1, 23.2, 22.3, 24.8, 28.5, 37.3, 27.9, 23.9, 21.7, 28.6, 27.1,
20.3, 22.5, 29. , 24.8, 22. , 26.4, 33.1, 36.1, 28.4, 33.4, 28.2,
22.8, 20.3, 16.1, 22.1, 19.4, 21.6, 23.8, 16.2, 17.8, 19.8, 23.1,
21. , 23.8, 23.1, 20.4, 18.5, 25. , 24.6, 23. , 22.2, 19.3, 22.6,
19.8, 17.1, 19.4, 22.2, 20.7, 21.1, 19.5, 18.5, 20.6, 19. , 18.7,
32.7, 16.5, 23.9, 31.2, 17.5, 17.2, 23.1, 24.5, 26.6, 22.9, 24.1,
18.6, 30.1, 18.2, 20.6, 17.8, 21.7, 22.7, 22.6, 25. , 19.9, 20.8,
16.8, 21.9, 27.5, 21.9, 23.1, 50. , 50. , 50. , 50. , 50. , 13.8,
13.8, 15. , 13.9, 13.3, 13.1, 10.2, 10.4, 10.9, 11.3, 12.3, 8.8,
7.2, 10.5, 7.4, 10.2, 11.5, 15.1, 23.2, 9.7, 13.8, 12.7, 13.1,
12.5, 8.5, 5. , 6.3, 5.6, 7.2, 12.1, 8.3, 8.5, 5. , 11.9,
27.9, 17.2, 27.5, 15. , 17.2, 17.9, 16.3, 7. , 7.2, 7.5, 10.4,
8.8, 8.4, 16.7, 14.2, 20.8, 13.4, 11.7, 8.3, 10.2, 10.9, 11. ,
9.5, 14.5, 14.1, 16.1, 14.3, 11.7, 13.4, 9.6, 8.7, 8.4, 12.8,
10.5, 17.1, 18.4, 15.4, 10.8, 11.8, 14.9, 12.6, 14.1, 13. , 13.4,
15.2, 16.1, 17.8, 14.9, 14.1, 12.7, 13.5, 14.9, 20. , 16.4, 17.7,
19.5, 20.2, 21.4, 19.9, 19. , 19.1, 19.1, 20.1, 19.9, 19.6, 23.2,
29.8, 13.8, 13.3, 16.7, 12. , 14.6, 21.4, 23. , 23.7, 25. , 21.8,
20.6, 21.2, 19.1, 20.6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9])
part2. ML - LinearRegression
In [16]:
from sklearn.model_selection import train_test_split
# machine learning / analysis --> regression equation
# sklearn uses an object-oriented API
from sklearn.linear_model import LinearRegression
# split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# return value: a tuple of arrays
X_train
# create the estimator and fit (= train) it
lr = LinearRegression().fit(X_train,y_train)
# print the weights (w) and the intercept (b)
print("lr.coef_:", lr.coef_)
print("lr.intercept_", lr.intercept_)
lr.coef_: [-4.12710947e+02 -5.22432068e+01 -1.31898815e+02 -1.20041365e+01
-1.55107129e+01 2.87163342e+01 5.47040992e+01 -4.95346659e+01
2.65823927e+01 3.70620316e+01 -1.18281674e+01 -1.80581965e+01
-1.95246830e+01 1.22025403e+01 2.98078144e+03 1.50084257e+03
1.14187325e+02 -1.69700520e+01 4.09613691e+01 -2.42636646e+01
5.76157466e+01 1.27812142e+03 -2.23986944e+03 2.22825472e+02
-2.18201083e+00 4.29960320e+01 -1.33981515e+01 -1.93893485e+01
-2.57541277e+00 -8.10130128e+01 9.66019367e+00 4.91423718e+00
-8.12114800e-01 -7.64694179e+00 3.37837099e+01 -1.14464390e+01
6.85083979e+01 -1.73753604e+01 4.28128204e+01 1.13988209e+00
-7.72696840e-01 5.68255921e+01 1.42875996e+01 5.39551110e+01
-3.21709644e+01 1.92709675e+01 -1.38852338e+01 6.06343266e+01
-1.23153942e+01 -1.20041365e+01 -1.77243899e+01 -3.39868183e+01
7.08999816e+00 -9.22538241e+00 1.71980268e+01 -1.27718431e+01
-1.19727581e+01 5.73871915e+01 -1.75331865e+01 4.10103194e+00
2.93666477e+01 -1.76611772e+01 7.84049424e+01 -3.19098015e+01
4.81752461e+01 -3.95344813e+01 5.22959055e+00 2.19982410e+01
2.56483934e+01 -4.99982035e+01 2.91457545e+01 8.94267456e+00
-7.16599297e+01 -2.28147862e+01 8.40660981e+00 -5.37905422e+00
1.20137322e+00 -5.20877186e+00 4.11452351e+01 -3.78250760e+01
-2.67163851e+00 -2.55217108e+01 -3.33982030e+01 4.62272693e+01
-2.41509169e+01 -1.77532970e+01 -1.39723701e+01 -2.35522208e+01
3.68353800e+01 -9.46890859e+01 1.44302810e+02 -1.51158659e+01
-1.49513436e+01 -2.87729579e+01 -3.17673192e+01 2.49551594e+01
-1.84384534e+01 3.65073948e+00 1.73101122e+00 3.53617137e+01
1.19553429e+01 6.77025947e-01 2.73452009e+00 3.03720012e+01]
lr.intercept_ 30.934563673638145
In [15]:
# goodness of fit on the training data
fit_rate = lr.score(X_train, y_train)
fit_rate
Out[15]:
0.952051960903273
In [13]:
# prediction score (R^2) on the test data
accuracy = lr.score(X_test, y_test)
accuracy
# about 0.61: the model explains ~61% of the variance in the test targets
Out[13]:
0.6074721959665708
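Strictly speaking, `score` returns the coefficient of determination R², not a percentage of correct predictions. A minimal sketch of how R² is computed by hand, using made-up toy values:

```python
import numpy as np

# R^2 = 1 - SS_res / SS_tot
y_true = np.array([3.0, 2.0, 4.0, 5.0])   # toy targets (made up)
y_pred = np.array([2.5, 2.0, 4.5, 4.0])   # toy predictions (made up)

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                        # 1 - 1.5/5.0 = 0.7
```

An R² of 0.7 means the model accounts for 70% of the target's variance; it says nothing about a fraction of "correct" answers.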
- Causes of overfitting
  - problems in the sampling process
  - repeatedly feeding in the same data
part3. Techniques to Strengthen Regression
Ridge
- y = w * x + b
- when overfitting occurs, push w toward 0 (its minimum magnitude)
- the closer w is to 0, the less x influences y (y becomes nearly independent of x)
- L2 regularization
Squared-error (least-squares) method
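Ridge combines the squared-error loss with an L2 penalty on the weights, alpha * sum(w²). A minimal numpy sketch of that objective (the helper name `ridge_loss` and the toy data are made up for illustration):

```python
import numpy as np

def ridge_loss(w, b, X, y, alpha):
    # squared-error term plus the L2 penalty on the weights
    residuals = X @ w + b - y
    return np.sum(residuals ** 2) + alpha * np.sum(w ** 2)

X_toy = np.array([[1.0], [2.0], [3.0]])
y_toy = np.array([2.0, 4.0, 6.0])

# the perfect fit (w=2, b=0) has zero error but still pays alpha * 2^2 = 4
loss = ridge_loss(np.array([2.0]), 0.0, X_toy, y_toy, alpha=1.0)
```

Because even a perfect fit is penalized for large weights, the minimizer trades a little training error for smaller w, which is exactly the shrinkage described above.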
In [22]:
from sklearn.linear_model import Ridge
ridge = Ridge().fit(X_train, y_train)
print("fitting rate : {:.2f}".format(ridge.score(X_train,y_train)))
print("accuracy : {:.2f}".format(ridge.score(X_test,y_test)))
fitting rate : 0.89
accuracy : 0.75
Adjusting alpha
- L2 regularization -> the larger alpha is, the stronger the regularization
- as alpha grows, the w values are pushed closer to 0
In [35]:
# the default alpha is 1
ridge10 = Ridge(alpha=10).fit(X_train,y_train)
ridge01 = Ridge(alpha=0.1).fit(X_train,y_train)
# scores when alpha is 10 vs. 0.1
print("fitting rate : {:.2f}".format(ridge10.score(X_train,y_train)),"/ {:.2f}".format(ridge01.score(X_train,y_train)))
print("accuracy : {:.2f}".format(ridge10.score(X_test,y_test)), "/ {:.2f}".format(ridge01.score(X_test,y_test)))
fitting rate : 0.79 / 0.93
accuracy : 0.64 / 0.77
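The shrinking effect of a larger alpha can also be checked directly on synthetic data (the random toy data below is made up; only the relative size of the coefficients matters):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X_syn = rng.randn(100, 20)                      # 100 samples, 20 features
y_syn = X_syn @ rng.randn(20) + rng.randn(100) * 0.1

weak = Ridge(alpha=0.1).fit(X_syn, y_syn)       # mild regularization
strong = Ridge(alpha=100).fit(X_syn, y_syn)     # strong regularization

# a larger alpha pulls the weight vector closer to 0
mean_weak = np.abs(weak.coef_).mean()
mean_strong = np.abs(strong.coef_).mean()
```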
In [40]:
import matplotlib.pyplot as plt
plt.plot(ridge10.coef_,'^',label="alpha=10")
plt.plot(ridge01.coef_,'s',label="alpha=0.1")
plt.plot(ridge.coef_,'v',label="alpha=1")
plt.legend()  # show which marker corresponds to which alpha
plt.show()
Lasso
- L1 regularization
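Unlike the L2 penalty, the L1 penalty can drive weights to exactly 0, so Lasso also acts as a form of feature selection. A minimal sketch on synthetic data (made up for illustration) counting how many features keep a nonzero weight:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X_syn = rng.randn(200, 30)                      # 200 samples, 30 features
# only the first three features actually drive the target
y_syn = 3 * X_syn[:, 0] + 2 * X_syn[:, 1] + X_syn[:, 2] + rng.randn(200) * 0.1

lasso_syn = Lasso(alpha=0.5).fit(X_syn, y_syn)
n_used = int(np.sum(lasso_syn.coef_ != 0))      # features with nonzero weight
```

Most of the 27 irrelevant features get an exactly-zero coefficient, leaving only a handful of features in the model.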
In [20]:
from sklearn.linear_model import Lasso
lasso = Lasso().fit(X_train, y_train)
print("fitting rate : {:.2f}".format(lasso.score(X_train,y_train)))
print("accuracy : {:.2f}".format(lasso.score(X_test,y_test)))