[Data Analysis] Preparing Shopping-Mall Sales Data
In [124]:
from IPython.display import display, HTML  # IPython.core.display is deprecated
display(HTML("<style>.container {width:90% !important;}</style>"))
In [125]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
Data preparation:
- customer_master : customer information
- item_master : item information
- transaction_1 & 2 : transaction records
- transaction_detail_1 & 2 : transaction detail records
In [126]:
import pandas as pd
customer_master = pd.read_csv('dataset/customer_master.csv')
customer_master
Out[126]:
customer_id | customer_name | registration_date | email | gender | age | birth | pref | |
---|---|---|---|---|---|---|---|---|
0 | IK152942 | 김서준 | 2019-01-01 0:25 | hirata_yuujirou@example.com | M | 29 | 1990-06-10 | 대전광역시 |
1 | TS808488 | 김예준 | 2019-01-01 1:13 | tamura_shiori@example.com | F | 33 | 1986-05-20 | 인천광역시 |
2 | AS834628 | 김도윤 | 2019-01-01 2:00 | hisano_yuki@example.com | F | 63 | 1956-01-02 | 광주광역시 |
3 | AS345469 | 김시우 | 2019-01-01 4:48 | tsuruoka_kaoru@example.com | M | 74 | 1945-03-25 | 인천광역시 |
4 | GD892565 | 김주원 | 2019-01-01 4:54 | oouchi_takashi@example.com | M | 54 | 1965-08-05 | 울산광역시 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
4995 | AS677229 | 정우석 | 2019-07-31 16:52 | hirayama_risa@example.com | F | 77 | 1941-10-17 | 대전광역시 |
4996 | HD758694 | 정영훈 | 2019-07-31 19:09 | nakahara_mahiru@example.com | F | 27 | 1991-11-13 | 광주광역시 |
4997 | PL538517 | 정준기 | 2019-07-31 19:30 | tabata_yuu1@example.com | F | 73 | 1945-12-28 | 대전광역시 |
4998 | OA955088 | 정도형 | 2019-07-31 22:32 | setouchi_hikaru@example.com | F | 75 | 1944-04-09 | 부산광역시 |
4999 | HI349563 | 정지석 | 2019-07-31 22:49 | horii_kanji@example.com | M | 21 | 1998-02-06 | 서울특별시 |
5000 rows × 8 columns
In [127]:
item_master = pd.read_csv('dataset/item_master.csv')
item_master
Out[127]:
item_id | item_name | item_price | |
---|---|---|---|
0 | S001 | PC-A | 50000 |
1 | S002 | PC-B | 85000 |
2 | S003 | PC-C | 120000 |
3 | S004 | PC-D | 180000 |
4 | S005 | PC-E | 210000 |
In [128]:
transaction_detail_1 = pd.read_csv('dataset/transaction_detail_1.csv')
transaction_detail_1
Out[128]:
detail_id | transaction_id | item_id | quantity | |
---|---|---|---|---|
0 | 0 | T0000000113 | S005 | 1 |
1 | 1 | T0000000114 | S001 | 1 |
2 | 2 | T0000000115 | S003 | 1 |
3 | 3 | T0000000116 | S005 | 1 |
4 | 4 | T0000000117 | S002 | 2 |
... | ... | ... | ... | ... |
4995 | 4995 | T0000004865 | S003 | 1 |
4996 | 4996 | T0000004866 | S001 | 3 |
4997 | 4997 | T0000004867 | S001 | 3 |
4998 | 4998 | T0000004868 | S005 | 1 |
4999 | 4999 | T0000004869 | S003 | 1 |
5000 rows × 4 columns
In [129]:
transaction_detail_2 = pd.read_csv('dataset/transaction_detail_2.csv')
transaction_detail_2
Out[129]:
detail_id | transaction_id | item_id | quantity | |
---|---|---|---|---|
0 | 5000 | T0000004870 | S002 | 3 |
1 | 5001 | T0000004871 | S003 | 1 |
2 | 5002 | T0000004872 | S001 | 2 |
3 | 5003 | T0000004873 | S004 | 1 |
4 | 5004 | T0000004874 | S003 | 2 |
... | ... | ... | ... | ... |
2139 | 7139 | T0000006894 | S004 | 1 |
2140 | 7140 | T0000006895 | S002 | 1 |
2141 | 7141 | T0000006896 | S001 | 2 |
2142 | 7142 | T0000006897 | S002 | 1 |
2143 | 7143 | T0000006898 | S002 | 1 |
2144 rows × 4 columns
step2 : Combine the data (union)
Data preparation:
- transaction_1 & 2 : concatenate transaction records 1 and 2
- transaction_detail_1 & 2 : concatenate transaction detail records 1 and 2
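The union step above can be sketched with small hypothetical frames (toy data, not the actual transaction CSVs), to show what `ignore_index=True` does:

```python
import pandas as pd

# Two frames with the same columns, like transaction_1 and transaction_2
a = pd.DataFrame({"transaction_id": ["T1", "T2"], "price": [100, 200]})
b = pd.DataFrame({"transaction_id": ["T3"], "price": [300]})

# ignore_index=True renumbers rows 0..n-1 instead of keeping each
# frame's own index (which would otherwise produce duplicates 0, 1, 0)
combined = pd.concat([a, b], ignore_index=True)
print(len(combined))  # 3
```

Without `ignore_index=True` the result keeps each source frame's index, so positional lookups like `.loc[0]` would match two rows.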
In [130]:
transaction_1 = pd.read_csv('dataset/transaction_1.csv')
transaction_2 = pd.read_csv('dataset/transaction_2.csv')
transaction_1
Out[130]:
transaction_id | price | payment_date | customer_id | |
---|---|---|---|---|
0 | T0000000113 | 210000 | 2019-02-01 01:36:57 | PL563502 |
1 | T0000000114 | 50000 | 2019-02-01 01:37:23 | HD678019 |
2 | T0000000115 | 120000 | 2019-02-01 02:34:19 | HD298120 |
3 | T0000000116 | 210000 | 2019-02-01 02:47:23 | IK452215 |
4 | T0000000117 | 170000 | 2019-02-01 04:33:46 | PL542865 |
... | ... | ... | ... | ... |
4995 | T0000005108 | 210000 | 2019-06-15 02:42:41 | HD315748 |
4996 | T0000005109 | 150000 | 2019-06-15 03:36:16 | HI215420 |
4997 | T0000005110 | 50000 | 2019-06-15 03:44:06 | IK880102 |
4998 | T0000005111 | 210000 | 2019-06-15 04:14:06 | IK074758 |
4999 | T0000005112 | 50000 | 2019-06-15 04:42:38 | HD444151 |
5000 rows × 4 columns
In [131]:
transaction_2
Out[131]:
transaction_id | price | payment_date | customer_id | |
---|---|---|---|---|
0 | T0000005113 | 295000 | 2019-06-15 07:20:27 | TS169261 |
1 | T0000005114 | 50000 | 2019-06-15 07:35:47 | HI599892 |
2 | T0000005115 | 85000 | 2019-06-15 07:56:36 | HI421757 |
3 | T0000005116 | 50000 | 2019-06-15 08:40:55 | OA386378 |
4 | T0000005117 | 120000 | 2019-06-15 08:44:23 | TS506913 |
... | ... | ... | ... | ... |
1781 | T0000006894 | 180000 | 2019-07-31 21:20:44 | HI400734 |
1782 | T0000006895 | 85000 | 2019-07-31 21:52:48 | AS339451 |
1783 | T0000006896 | 100000 | 2019-07-31 23:35:25 | OA027325 |
1784 | T0000006897 | 85000 | 2019-07-31 23:39:35 | TS624738 |
1785 | T0000006898 | 85000 | 2019-07-31 23:41:38 | AS834214 |
1786 rows × 4 columns
Union operation: .concat
In [132]:
transaction = pd.concat([transaction_1, transaction_2], ignore_index=True)
transaction.head()
Out[132]:
transaction_id | price | payment_date | customer_id | |
---|---|---|---|---|
0 | T0000000113 | 210000 | 2019-02-01 01:36:57 | PL563502 |
1 | T0000000114 | 50000 | 2019-02-01 01:37:23 | HD678019 |
2 | T0000000115 | 120000 | 2019-02-01 02:34:19 | HD298120 |
3 | T0000000116 | 210000 | 2019-02-01 02:47:23 | IK452215 |
4 | T0000000117 | 170000 | 2019-02-01 04:33:46 | PL542865 |
In [133]:
print(len(transaction_1))
print(len(transaction_2))
print(len(transaction))
5000
1786
6786
In [134]:
transaction_detail_2 = pd.read_csv('dataset/transaction_detail_2.csv')
transaction_detail = pd.concat([transaction_detail_1, transaction_detail_2], ignore_index=True)
transaction_detail.head()
Out[134]:
detail_id | transaction_id | item_id | quantity | |
---|---|---|---|---|
0 | 0 | T0000000113 | S005 | 1 |
1 | 1 | T0000000114 | S001 | 1 |
2 | 2 | T0000000115 | S003 | 1 |
3 | 3 | T0000000116 | S005 | 1 |
4 | 4 | T0000000117 | S002 | 2 |
In [135]:
print(len(transaction_detail_1))
print(len(transaction_detail_2))
print(len(transaction_detail))
5000
2144
7144
step3 : Join the sales data
Recap of steps through step2:
- Data preparation
- transaction_1 & transaction_2 : concatenate transaction records
- union : transaction
- transaction_detail_1 & transaction_detail_2 : concatenate transaction detail records
- union : transaction_detail
Join the transaction records with the transaction details
- left join : join using the left table as the base
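The left-join semantics can be illustrated with toy frames (hypothetical data): every row of the left frame survives, and keys with no match on the right come back as NaN.

```python
import pandas as pd

detail = pd.DataFrame({"transaction_id": ["T1", "T2", "T3"], "quantity": [1, 2, 1]})
trans = pd.DataFrame({"transaction_id": ["T1", "T2"], "price": [100, 200]})

# how="left": all rows of `detail` are kept; "T3" has no match in
# `trans`, so its right-hand columns are filled with NaN
joined = pd.merge(detail, trans, on="transaction_id", how="left")
print(len(joined))                      # 3 (same as the left frame)
print(joined["price"].isnull().sum())   # 1
```

This is why the row count of the joined data below matches `transaction_detail`, the left table, rather than `transaction`.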
In [136]:
transaction_detail
Out[136]:
detail_id | transaction_id | item_id | quantity | |
---|---|---|---|---|
0 | 0 | T0000000113 | S005 | 1 |
1 | 1 | T0000000114 | S001 | 1 |
2 | 2 | T0000000115 | S003 | 1 |
3 | 3 | T0000000116 | S005 | 1 |
4 | 4 | T0000000117 | S002 | 2 |
... | ... | ... | ... | ... |
7139 | 7139 | T0000006894 | S004 | 1 |
7140 | 7140 | T0000006895 | S002 | 1 |
7141 | 7141 | T0000006896 | S001 | 2 |
7142 | 7142 | T0000006897 | S002 | 1 |
7143 | 7143 | T0000006898 | S002 | 1 |
7144 rows × 4 columns
In [137]:
transaction
Out[137]:
transaction_id | price | payment_date | customer_id | |
---|---|---|---|---|
0 | T0000000113 | 210000 | 2019-02-01 01:36:57 | PL563502 |
1 | T0000000114 | 50000 | 2019-02-01 01:37:23 | HD678019 |
2 | T0000000115 | 120000 | 2019-02-01 02:34:19 | HD298120 |
3 | T0000000116 | 210000 | 2019-02-01 02:47:23 | IK452215 |
4 | T0000000117 | 170000 | 2019-02-01 04:33:46 | PL542865 |
... | ... | ... | ... | ... |
6781 | T0000006894 | 180000 | 2019-07-31 21:20:44 | HI400734 |
6782 | T0000006895 | 85000 | 2019-07-31 21:52:48 | AS339451 |
6783 | T0000006896 | 100000 | 2019-07-31 23:35:25 | OA027325 |
6784 | T0000006897 | 85000 | 2019-07-31 23:39:35 | TS624738 |
6785 | T0000006898 | 85000 | 2019-07-31 23:41:38 | AS834214 |
6786 rows × 4 columns
Join operation: .merge
In [138]:
join_data = pd.merge(transaction_detail,
transaction[["transaction_id", "payment_date", "customer_id"]],
on="transaction_id",
how="left")
join_data
Out[138]:
detail_id | transaction_id | item_id | quantity | payment_date | customer_id | |
---|---|---|---|---|---|---|
0 | 0 | T0000000113 | S005 | 1 | 2019-02-01 01:36:57 | PL563502 |
1 | 1 | T0000000114 | S001 | 1 | 2019-02-01 01:37:23 | HD678019 |
2 | 2 | T0000000115 | S003 | 1 | 2019-02-01 02:34:19 | HD298120 |
3 | 3 | T0000000116 | S005 | 1 | 2019-02-01 02:47:23 | IK452215 |
4 | 4 | T0000000117 | S002 | 2 | 2019-02-01 04:33:46 | PL542865 |
... | ... | ... | ... | ... | ... | ... |
7139 | 7139 | T0000006894 | S004 | 1 | 2019-07-31 21:20:44 | HI400734 |
7140 | 7140 | T0000006895 | S002 | 1 | 2019-07-31 21:52:48 | AS339451 |
7141 | 7141 | T0000006896 | S001 | 2 | 2019-07-31 23:35:25 | OA027325 |
7142 | 7142 | T0000006897 | S002 | 1 | 2019-07-31 23:39:35 | TS624738 |
7143 | 7143 | T0000006898 | S002 | 1 | 2019-07-31 23:41:38 | AS834214 |
7144 rows × 6 columns
In [139]:
# Compare row counts
print(len(transaction_detail))
print(len(transaction))
print(len(join_data))
7144
6786
7144
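Comparing lengths catches accidental row duplication, but not keys that failed to match. pandas' `indicator=True` and `validate=` arguments to `merge` make that check explicit; a sketch with hypothetical toy frames:

```python
import pandas as pd

detail = pd.DataFrame({"transaction_id": ["T1", "T2"], "quantity": [1, 2]})
trans = pd.DataFrame({"transaction_id": ["T1"], "price": [100]})

# indicator=True adds a "_merge" column marking each row as 'both' or
# 'left_only'; validate="many_to_one" raises if keys on the right repeat
checked = pd.merge(detail, trans, on="transaction_id", how="left",
                   indicator=True, validate="many_to_one")
unmatched = (checked["_merge"] == "left_only").sum()
print(unmatched)  # 1 row whose key is missing from the right frame
```

On the real data, zero `left_only` rows would confirm that every detail row found its transaction.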
step4 : Join the master data
Transaction records (join_data) + customer information --> join
In [140]:
customer_master
Out[140]:
customer_id | customer_name | registration_date | email | gender | age | birth | pref | |
---|---|---|---|---|---|---|---|---|
0 | IK152942 | 김서준 | 2019-01-01 0:25 | hirata_yuujirou@example.com | M | 29 | 1990-06-10 | 대전광역시 |
1 | TS808488 | 김예준 | 2019-01-01 1:13 | tamura_shiori@example.com | F | 33 | 1986-05-20 | 인천광역시 |
2 | AS834628 | 김도윤 | 2019-01-01 2:00 | hisano_yuki@example.com | F | 63 | 1956-01-02 | 광주광역시 |
3 | AS345469 | 김시우 | 2019-01-01 4:48 | tsuruoka_kaoru@example.com | M | 74 | 1945-03-25 | 인천광역시 |
4 | GD892565 | 김주원 | 2019-01-01 4:54 | oouchi_takashi@example.com | M | 54 | 1965-08-05 | 울산광역시 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
4995 | AS677229 | 정우석 | 2019-07-31 16:52 | hirayama_risa@example.com | F | 77 | 1941-10-17 | 대전광역시 |
4996 | HD758694 | 정영훈 | 2019-07-31 19:09 | nakahara_mahiru@example.com | F | 27 | 1991-11-13 | 광주광역시 |
4997 | PL538517 | 정준기 | 2019-07-31 19:30 | tabata_yuu1@example.com | F | 73 | 1945-12-28 | 대전광역시 |
4998 | OA955088 | 정도형 | 2019-07-31 22:32 | setouchi_hikaru@example.com | F | 75 | 1944-04-09 | 부산광역시 |
4999 | HI349563 | 정지석 | 2019-07-31 22:49 | horii_kanji@example.com | M | 21 | 1998-02-06 | 서울특별시 |
5000 rows × 8 columns
In [141]:
join_data = pd.merge(join_data, customer_master,
on="customer_id", how="left")
join_data
Out[141]:
detail_id | transaction_id | item_id | quantity | payment_date | customer_id | customer_name | registration_date | email | gender | age | birth | pref | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | T0000000113 | S005 | 1 | 2019-02-01 01:36:57 | PL563502 | 김태경 | 2019-01-07 14:34 | imoto_yoshimasa@example.com | M | 30 | 1989-07-15 | 대전광역시 |
1 | 1 | T0000000114 | S001 | 1 | 2019-02-01 01:37:23 | HD678019 | 김영웅 | 2019-01-27 18:00 | mifune_rokurou@example.com | M | 73 | 1945-11-29 | 서울특별시 |
2 | 2 | T0000000115 | S003 | 1 | 2019-02-01 02:34:19 | HD298120 | 김강현 | 2019-01-11 8:16 | yamane_kogan@example.com | M | 42 | 1977-05-17 | 광주광역시 |
3 | 3 | T0000000116 | S005 | 1 | 2019-02-01 02:47:23 | IK452215 | 김주한 | 2019-01-10 5:07 | ikeda_natsumi@example.com | F | 47 | 1972-03-17 | 인천광역시 |
4 | 4 | T0000000117 | S002 | 2 | 2019-02-01 04:33:46 | PL542865 | 김영빈 | 2019-01-25 6:46 | kurita_kenichi@example.com | M | 74 | 1944-12-17 | 광주광역시 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7139 | 7139 | T0000006894 | S004 | 1 | 2019-07-31 21:20:44 | HI400734 | 김윤성 | 2019-01-04 13:24 | shishido_akira@example.com | M | 64 | 1955-01-13 | 대구광역시 |
7140 | 7140 | T0000006895 | S002 | 1 | 2019-07-31 21:52:48 | AS339451 | 김무경 | 2019-02-11 19:34 | aihara_miki@example.com | F | 74 | 1945-02-03 | 대구광역시 |
7141 | 7141 | T0000006896 | S001 | 2 | 2019-07-31 23:35:25 | OA027325 | 박준석 | 2019-04-17 9:23 | matsuda_saki@example.com | F | 40 | 1979-05-25 | 서울특별시 |
7142 | 7142 | T0000006897 | S002 | 1 | 2019-07-31 23:39:35 | TS624738 | 이가빈 | 2019-02-20 18:15 | shinndou_masatoshi@example.com | M | 56 | 1963-02-21 | 인천광역시 |
7143 | 7143 | T0000006898 | S002 | 1 | 2019-07-31 23:41:38 | AS834214 | 이승채 | 2019-04-07 3:20 | tahara_yuuko@example.com | F | 74 | 1944-12-18 | 대전광역시 |
7144 rows × 13 columns
(Transaction records + customer information) + item information --> join
In [142]:
item_master
Out[142]:
item_id | item_name | item_price | |
---|---|---|---|
0 | S001 | PC-A | 50000 |
1 | S002 | PC-B | 85000 |
2 | S003 | PC-C | 120000 |
3 | S004 | PC-D | 180000 |
4 | S005 | PC-E | 210000 |
In [143]:
join_data = pd.merge(join_data, item_master, on="item_id", how="left")
In [144]:
join_data.head()
Out[144]:
detail_id | transaction_id | item_id | quantity | payment_date | customer_id | customer_name | registration_date | email | gender | age | birth | pref | item_name | item_price | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | T0000000113 | S005 | 1 | 2019-02-01 01:36:57 | PL563502 | 김태경 | 2019-01-07 14:34 | imoto_yoshimasa@example.com | M | 30 | 1989-07-15 | 대전광역시 | PC-E | 210000 |
1 | 1 | T0000000114 | S001 | 1 | 2019-02-01 01:37:23 | HD678019 | 김영웅 | 2019-01-27 18:00 | mifune_rokurou@example.com | M | 73 | 1945-11-29 | 서울특별시 | PC-A | 50000 |
2 | 2 | T0000000115 | S003 | 1 | 2019-02-01 02:34:19 | HD298120 | 김강현 | 2019-01-11 8:16 | yamane_kogan@example.com | M | 42 | 1977-05-17 | 광주광역시 | PC-C | 120000 |
3 | 3 | T0000000116 | S005 | 1 | 2019-02-01 02:47:23 | IK452215 | 김주한 | 2019-01-10 5:07 | ikeda_natsumi@example.com | F | 47 | 1972-03-17 | 인천광역시 | PC-E | 210000 |
4 | 4 | T0000000117 | S002 | 2 | 2019-02-01 04:33:46 | PL542865 | 김영빈 | 2019-01-25 6:46 | kurita_kenichi@example.com | M | 74 | 1944-12-17 | 광주광역시 | PC-B | 85000 |
In [167]:
# Relational databases store data split into normalized tables;
# a join recombines them into a single table
In [146]:
# Create a derived column: sales amount per line item
join_data["price"] = join_data["quantity"] * join_data["item_price"]
In [147]:
join_data[["quantity", "item_price","price"]].head()
Out[147]:
quantity | item_price | price | |
---|---|---|---|
0 | 1 | 210000 | 210000 |
1 | 1 | 50000 | 50000 |
2 | 1 | 120000 | 120000 |
3 | 1 | 210000 | 210000 |
4 | 2 | 85000 | 170000 |
step6 : Validate the data
In [148]:
print(join_data["price"].sum())
print(transaction["price"].sum())
971135000
971135000
In [149]:
join_data["price"].sum() == transaction["price"].sum()
Out[149]:
True
step7 : Examine summary statistics
In [168]:
### Missing values = NA
In [150]:
join_data.isnull()
Out[150]:
detail_id | transaction_id | item_id | quantity | payment_date | customer_id | customer_name | registration_date | email | gender | age | birth | pref | item_name | item_price | price | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
1 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
2 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
3 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
4 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
7139 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7140 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7141 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7142 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7143 | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
7144 rows × 16 columns
In [151]:
join_data.isnull().sum()
Out[151]:
detail_id 0
transaction_id 0
item_id 0
quantity 0
payment_date 0
customer_id 0
customer_name 0
registration_date 0
email 0
gender 0
age 0
birth 0
pref 0
item_name 0
item_price 0
price 0
dtype: int64
.describe()
In [152]:
# Numeric columns
join_data.describe()
Out[152]:
detail_id | quantity | age | item_price | price | |
---|---|---|---|---|---|
count | 7144.000000 | 7144.000000 | 7144.000000 | 7144.000000 | 7144.000000 |
mean | 3571.500000 | 1.199888 | 50.265677 | 121698.628219 | 135937.150056 |
std | 2062.439494 | 0.513647 | 17.190314 | 64571.311830 | 68511.453297 |
min | 0.000000 | 1.000000 | 20.000000 | 50000.000000 | 50000.000000 |
25% | 1785.750000 | 1.000000 | 36.000000 | 50000.000000 | 85000.000000 |
50% | 3571.500000 | 1.000000 | 50.000000 | 102500.000000 | 120000.000000 |
75% | 5357.250000 | 1.000000 | 65.000000 | 187500.000000 | 210000.000000 |
max | 7143.000000 | 4.000000 | 80.000000 | 210000.000000 | 420000.000000 |
In [153]:
# Non-numeric (object-dtype) columns: check the date range
join_data["payment_date"].min()
Out[153]:
'2019-02-01 01:36:57'
In [154]:
join_data["payment_date"].max()
Out[154]:
'2019-07-31 23:41:38'
In [155]:
join_data.dtypes
Out[155]:
detail_id int64
transaction_id object
item_id object
quantity int64
payment_date object
customer_id object
customer_name object
registration_date object
email object
gender object
age int64
birth object
pref object
item_name object
item_price int64
price int64
dtype: object
object --> datetime
In [156]:
join_data["payment_date"] = pd.to_datetime(join_data["payment_date"])
join_data["payment_date"]
Out[156]:
0 2019-02-01 01:36:57
1 2019-02-01 01:37:23
2 2019-02-01 02:34:19
3 2019-02-01 02:47:23
4 2019-02-01 04:33:46
...
7139 2019-07-31 21:20:44
7140 2019-07-31 21:52:48
7141 2019-07-31 23:35:25
7142 2019-07-31 23:39:35
7143 2019-07-31 23:41:38
Name: payment_date, Length: 7144, dtype: datetime64[ns]
In [157]:
join_data["payment_date"].dtypes
Out[157]:
dtype('<M8[ns]')
In [158]:
join_data["payment_month"] = join_data["payment_date"].dt.strftime("%Y%m")
join_data["payment_month"].head()
Out[158]:
0 201902
1 201902
2 201902
3 201902
4 201902
Name: payment_month, dtype: object
In [159]:
join_data[["payment_date", "payment_month"]].head()
Out[159]:
payment_date | payment_month | |
---|---|---|
0 | 2019-02-01 01:36:57 | 201902 |
1 | 2019-02-01 01:37:23 | 201902 |
2 | 2019-02-01 02:34:19 | 201902 |
3 | 2019-02-01 02:47:23 | 201902 |
4 | 2019-02-01 04:33:46 | 201902 |
group by
In [160]:
join_data.groupby("payment_month").sum()
Out[160]:
detail_id | quantity | age | item_price | price | |
---|---|---|---|---|---|
payment_month | |||||
201902 | 676866 | 1403 | 59279 | 142805000 | 160185000 |
201903 | 2071474 | 1427 | 58996 | 142980000 | 160370000 |
201904 | 3476816 | 1421 | 59246 | 143670000 | 160510000 |
201905 | 4812795 | 1390 | 58195 | 139655000 | 155420000 |
201906 | 6369999 | 1446 | 61070 | 147090000 | 164030000 |
201907 | 8106846 | 1485 | 62312 | 153215000 | 170620000 |
In [161]:
join_data.groupby("payment_month").sum()["price"]
Out[161]:
payment_month
201902 160185000
201903 160370000
201904 160510000
201905 155420000
201906 164030000
201907 170620000
Name: price, dtype: int64
In [162]:
join_data.groupby(["payment_month","item_name"]).sum()[["price", "quantity"]]
Out[162]:
price | quantity | ||
---|---|---|---|
payment_month | item_name | ||
201902 | PC-A | 24150000 | 483 |
PC-B | 25245000 | 297 | |
PC-C | 19800000 | 165 | |
PC-D | 31140000 | 173 | |
PC-E | 59850000 | 285 | |
201903 | PC-A | 26000000 | 520 |
PC-B | 25500000 | 300 | |
PC-C | 19080000 | 159 | |
PC-D | 25740000 | 143 | |
PC-E | 64050000 | 305 | |
201904 | PC-A | 25900000 | 518 |
PC-B | 23460000 | 276 | |
PC-C | 21960000 | 183 | |
PC-D | 24300000 | 135 | |
PC-E | 64890000 | 309 | |
201905 | PC-A | 24850000 | 497 |
PC-B | 25330000 | 298 | |
PC-C | 20520000 | 171 | |
PC-D | 25920000 | 144 | |
PC-E | 58800000 | 280 | |
201906 | PC-A | 26000000 | 520 |
PC-B | 23970000 | 282 | |
PC-C | 21840000 | 182 | |
PC-D | 28800000 | 160 | |
PC-E | 63420000 | 302 | |
201907 | PC-A | 25250000 | 505 |
PC-B | 28220000 | 332 | |
PC-C | 19440000 | 162 | |
PC-D | 26100000 | 145 | |
PC-E | 71610000 | 341 |
In [163]:
pd.pivot_table(join_data,
index='item_name',
columns='payment_month',
values=['price','quantity'], aggfunc='sum' )
Out[163]:
price | quantity | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
payment_month | 201902 | 201903 | 201904 | 201905 | 201906 | 201907 | 201902 | 201903 | 201904 | 201905 | 201906 | 201907 |
item_name | ||||||||||||
PC-A | 24150000 | 26000000 | 25900000 | 24850000 | 26000000 | 25250000 | 483 | 520 | 518 | 497 | 520 | 505 |
PC-B | 25245000 | 25500000 | 23460000 | 25330000 | 23970000 | 28220000 | 297 | 300 | 276 | 298 | 282 | 332 |
PC-C | 19800000 | 19080000 | 21960000 | 20520000 | 21840000 | 19440000 | 165 | 159 | 183 | 171 | 182 | 162 |
PC-D | 31140000 | 25740000 | 24300000 | 25920000 | 28800000 | 26100000 | 173 | 143 | 135 | 144 | 160 | 145 |
PC-E | 59850000 | 64050000 | 64890000 | 58800000 | 63420000 | 71610000 | 285 | 305 | 309 | 280 | 302 | 341 |
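The `groupby` and `pivot_table` results above contain the same aggregates; `pivot_table` just spreads one key out as columns. A minimal sketch with hypothetical toy data:

```python
import pandas as pd

df = pd.DataFrame({"month": ["01", "01", "02"],
                   "item":  ["A", "B", "A"],
                   "price": [10, 20, 30]})

# Long form: one row per (month, item) pair
by_group = df.groupby(["month", "item"]).sum()["price"]

# Wide form: same sums, with months laid out as columns
by_pivot = pd.pivot_table(df, index="item", columns="month",
                          values="price", aggfunc="sum")
print(int(by_pivot.loc["A", "02"]))  # 30, same value as by_group[("02", "A")]
```

Pairs that never occur (here item B in month "02") show up as NaN in the pivoted form, whereas `groupby` simply omits them.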
step10 : Visualize monthly sales per item
In [188]:
graph_data=pd.pivot_table(join_data,
index='payment_month',
columns='item_name',
values='price', aggfunc='sum')
In [165]:
graph_data.head()
Out[165]:
item_name | PC-A | PC-B | PC-C | PC-D | PC-E |
---|---|---|---|---|---|
payment_month | |||||
201902 | 24150000 | 25245000 | 19800000 | 31140000 | 59850000 |
201903 | 26000000 | 25500000 | 19080000 | 25740000 | 64050000 |
201904 | 25900000 | 23460000 | 21960000 | 24300000 | 64890000 |
201905 | 24850000 | 25330000 | 20520000 | 25920000 | 58800000 |
201906 | 26000000 | 23970000 | 21840000 | 28800000 | 63420000 |
The matplotlib library
In [180]:
list(graph_data.index)
Out[180]:
['201902', '201903', '201904', '201905', '201906', '201907']
In [173]:
graph_data["PC-A"]
Out[173]:
payment_month
201902 24150000
201903 26000000
201904 25900000
201905 24850000
201906 26000000
201907 25250000
Name: PC-A, dtype: int64
In [198]:
import matplotlib.pyplot as plt
# Render plots inline in the notebook
%matplotlib inline
# plt.plot(x,y)
plt.plot(graph_data.index, graph_data["PC-A"], label='PC-A')
plt.plot(graph_data.index, graph_data["PC-B"], label='PC-B')
plt.plot(graph_data.index, graph_data["PC-C"], label='PC-C')
plt.plot(graph_data.index, graph_data["PC-D"], label='PC-D')
plt.plot(graph_data.index, graph_data["PC-E"], label='PC-E')
plt.legend()
Out[198]:
<matplotlib.legend.Legend at 0x228b56e08b0>