Posts in the 'DataScience/Python' category


List: DataScience/Python (11)

develop myself

matplotlib usage guidelines

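Why you should use the object-oriented approach
Talk by Jehyun Lee (이제현) at PyCon 2022: https://youtu.be/ZTRKojTLE8M

Usage example

    import matplotlib.pyplot as plt
    import seaborn as sns

    # rooms is assumed to be a DataFrame defined earlier in the post (not shown in this excerpt)
    fig, axes = plt.subplots(ncols=5, figsize=(8, 4))
    for i, col in enumerate(['Temperature', 'Humidity', 'Light', 'CO2', 'HumidityRatio']):
        sns.boxplot(data=rooms[col], ax=axes[i])
        axes[i].set_title(col)
    fig.tight_layout()
    fig.subplots_adjust(top=0.8)
    fig.suptitle("Room Occupancy")
    fig.set_facecolor("lightgray")
    plt.show()
    fig..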
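To make the object-oriented style easier to try out, here is a minimal sketch that runs without the post's dataset. The random `data` dictionary is a hypothetical stand-in for `rooms`, and plain `ax.boxplot` is swapped in for `sns.boxplot` so that seaborn is not required. The point is only that every call goes through the `fig` and `ax` objects rather than through the implicit pyplot state.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data (the post's `rooms` DataFrame is not available here)
rng = np.random.default_rng(0)
data = {
    "Temperature": rng.normal(21, 1, 100),
    "Humidity": rng.normal(27, 3, 100),
    "Light": rng.normal(400, 50, 100),
}

# One Figure object and one Axes object per column
fig, axes = plt.subplots(ncols=len(data), figsize=(8, 4))
for ax, (name, values) in zip(axes, data.items()):
    ax.boxplot(values)   # draw on this specific Axes
    ax.set_title(name)   # title belongs to this Axes only

# Figure-level settings are applied through the Figure object
fig.tight_layout()
fig.subplots_adjust(top=0.8)
fig.suptitle("Room Occupancy")
fig.set_facecolor("lightgray")
```

Because each panel is addressed through its own `ax`, adding or removing a column only changes the `data` dictionary, not the plotting code.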

DataScience/Python 2023. 1. 27. 14:56

Data handling: strings (str)

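String: str
Strings can be sliced: str[i:j]
split()
startswith()
endswith()
contains() (not a built-in str method; use the in operator on a plain str, or Series.str.contains() in pandas)

Function                      Description
capitalize()                  Returns a string with the first character uppercased and the rest lowercased
casefold()                    Removes all case distinctions
count(sub[, start[, end]])    Returns the number of non-overlapping occurrences of the substring sub in the range [start, end]
find(sub[, start[, end]])     Returns the lowest index in [start, end] at which the substring sub is found; returns -1 if sub is not found
rfind(sub[, start[, end]])    The highest index in [start, end] at which the substring sub is found ..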
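The methods listed above can be tried on a small example. This is a minimal sketch; the sample string is made up for illustration, and the `in` operator stands in for `contains()`, which plain `str` does not provide.

```python
s = "hello World, hello Python"

head = s[0:5]                      # slicing: 'hello'
parts = s.split(", ")              # ['hello World', 'hello Python']
starts = s.startswith("hello")     # True
ends = s.endswith("Python")        # True
has_world = "World" in s           # True (membership test instead of contains())

capitalized = s.capitalize()       # 'Hello world, hello python'
folded = "Straße".casefold()       # 'strasse' (more aggressive than lower())
n_hello = s.count("hello")         # 2 non-overlapping occurrences
first = s.find("hello")            # 0, the lowest matching index
last = s.rfind("hello")            # 13, the highest matching index
missing = s.find("Java")           # -1 because the substring is absent

print(head, parts, starts, ends, has_world)
print(capitalized, folded, n_hello, first, last, missing)
```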

DataScience/Python 2023. 1. 27. 12:10

Data handling: restructuring data for viewing

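df.groupby().func()

Aggregation       Description
count             Number of items (non-NA values)
head, tail        Return the first few items, return the last few items
describe          Summary statistics for a Series or for each column of a DataFrame
min, max          Minimum, maximum
cummin, cummax    Cumulative minimum, cumulative maximum
argmin, argmax    Integer index positions of the minimum and maximum
idxmin, idxmax    Index labels of the minimum and maximum
mean, median      Mean, median
std, var          Standard deviation, variance
skew              Computes the skewness
kurt              Computes the kurtosis
mad               Mean absolute deviation (removed in pandas 2.0)
sum, cumsum       Sum of all items ..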
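A few of the aggregations in the table can be sketched on a toy DataFrame. The column names `room` and `temperature` and the values are hypothetical, chosen only so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data: two rooms with a handful of temperature readings
df = pd.DataFrame({
    "room": ["A", "A", "B", "B", "B"],
    "temperature": [20.0, 22.0, 19.0, 21.0, 23.0],
})
grouped = df.groupby("room")["temperature"]

counts = grouped.count()                          # non-NA values per group
means = grouped.mean()                            # mean per group
summary = grouped.agg(["min", "max", "median"])   # several aggregations at once
hottest = grouped.idxmax()                        # index label of each group's maximum
running = grouped.cumsum()                        # cumulative sum within each group

print(summary)
```

Note that `count`, `mean`, and `agg` return one row per group, while `cumsum` returns a result aligned with the original rows.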

DataScience/Python 2023. 1. 27. 12:04

Data handling: basics

Creating a DataFrame
- From a list, array, or ndarray: the data is organized row by row: [['row 1, item 1','row 1, item 2'],['row 2, item 1','row 2, item 2']]
- From a dictionary: {'column name':['elements of the column'],'column name':['elements of the column']}

DataFrame properties, methods

Properties of a DataFrame

    # df's properties
    [ elem for elem in dir(pd.DataFrame) if isinstance(getattr(pd.DataFrame,elem),property) and not elem.startswith('_') ]

    ['T', 'at', 'attrs', 'axes', 'dtypes', 'empty', 'flags', '..
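The two construction styles described above can be compared directly. This is a minimal sketch; the column names `col1` and `col2` and the cell values are placeholders made up for illustration.

```python
import pandas as pd

# Row-oriented input: each inner list is one row
rows = [["r1c1", "r1c2"], ["r2c1", "r2c2"]]
df_rows = pd.DataFrame(rows, columns=["col1", "col2"])

# Column-oriented input: each dictionary key is one column
df_cols = pd.DataFrame({
    "col1": ["r1c1", "r2c1"],
    "col2": ["r1c2", "r2c2"],
})

same = df_rows.equals(df_cols)  # both routes describe the same table

# Public properties of the DataFrame class, as in the excerpt above
props = [
    name for name in dir(pd.DataFrame)
    if isinstance(getattr(pd.DataFrame, name), property) and not name.startswith("_")
]
print(same, props[:5])
```

The row-oriented form needs `columns=` to name the columns, whereas the dictionary form carries the names in its keys.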

DataScience/Python 2023. 1. 27. 10:51

matplotlib cheatsheets, handout

Official documentation
https://matplotlib.org/cheatsheets/

cheatsheets
https://matplotlib.org/cheatsheets/cheatsheets.pdf

handout
https://matplotlib.org/cheatsheets/handout-beginner.pdf
https://matplotlib.org/cheatsheets/handout-intermediate.pdf
https://matplotlib.org/cheatsheets/handout-tips.pdf

DataScience/Python 2023. 1. 26. 17:37

matplotlib basic tips

Font settings

    import matplotlib.font_manager as fm

    # List the installed fonts whose names start with 'Nanum'
    font_list = [font.name for font in fm.fontManager.ttflist if font.name.startswith('Nanum')]
    font_list

    import matplotlib.pyplot as plt

    # Use one of these; a later assignment overrides an earlier one
    plt.rcParams['font.family'] = 'NanumGothic'
    plt.rcParams['font.family'] = 'Malgun Gothic'

Saving an image

    plt.savefig("filename")

Image output

    %matplotlib inline
    plt.show()

minus error

    plt.rcParams['axes.unicode_minus']..
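Since the available Korean font differs by machine, one option is to set `font.family` only when a candidate is actually installed. The sketch below assumes this approach; `pick_font` is a hypothetical helper written for illustration, not part of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.font_manager as fm
import matplotlib.pyplot as plt


def pick_font(candidates):
    """Return the first candidate font name known to matplotlib, or None."""
    installed = {font.name for font in fm.fontManager.ttflist}
    for name in candidates:
        if name in installed:
            return name
    return None


# Try the two fonts named in the post, in order of preference
chosen = pick_font(["NanumGothic", "Malgun Gothic"])
if chosen is not None:
    plt.rcParams["font.family"] = chosen  # only changed when the font exists
print("Selected font:", chosen)
```

If neither font is installed, `chosen` is `None` and the default font family is left untouched.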

DataScience/Python 2023. 1. 26. 17:08

pandas basic tips

pandas.options.display
Lets you check and change the output settings of a Jupyter notebook.

Listing the available display settings

    import pandas as pd

    # dir shows the list of settings you can change
    dir(pd.options.display)

Checking the current values

    # To check the current settings
    pd.get_option('display.max_rows')
    pd.get_option('display.max_columns')

Changing settings

    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_rows', None)

    # Pandas number output formatting
    # https://financedata.gith..
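`pd.set_option` changes a setting for the rest of the session. When the change is only needed for a single output, `pd.option_context` and `pd.reset_option` are alternatives worth knowing; the sketch below shows both, with the value 5 chosen arbitrarily.

```python
import pandas as pd

default_rows = pd.get_option("display.max_rows")

# Temporary change: the settings revert when the with-block exits
with pd.option_context("display.max_rows", 5, "display.max_columns", None):
    inside = pd.get_option("display.max_rows")   # 5 inside the block

after = pd.get_option("display.max_rows")        # back to the previous value

# Persistent change, then an explicit return to the library default
pd.set_option("display.max_columns", None)
pd.reset_option("display.max_columns")

print(default_rows, inside, after)
```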

DataScience/Python 2023. 1. 26. 17:04

numpy basic tips

ndarray
The basic data type of numpy. It is a class.

Functions for creating an ndarray

    import numpy as np

    # Create an ndarray
    a = np.array([1,2,3,4])

    # Other ways to create an ndarray
    b = np.zeros((3,3))
    c = np.ones((3,4))
    d = np.full((2,2),7)  # fill with the specified value
    e = np.eye(3,3)
    f = np.random.random((2,2))
    g = np.arange(5)

Math functions

    a = np.array([np.e, 2, 3])
    np.abs(a)
    np.sqrt(a)
    np.log(a)
    np.floor(a)
    np.ceil(a)
    np.rint(a)  # returns, as float values, the integer nearest to each element of a

Comparison..
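To see what the creation and math functions above return, the sketch below stores a few results and prints them. It reuses the arrays from the excerpt; only the variable names for the results are added.

```python
import numpy as np

# Creation functions
b = np.zeros((3, 3))      # 3x3 array of 0.0
d = np.full((2, 2), 7)    # 2x2 array filled with 7
e = np.eye(3)             # 3x3 identity matrix
g = np.arange(5)          # [0 1 2 3 4]

# Element-wise math functions
a = np.array([np.e, 2, 3])
logs = np.log(a)          # natural log, so the first element is 1.0
floors = np.floor(a)      # [2. 2. 3.]
ceils = np.ceil(a)        # [3. 2. 3.]
rounded = np.rint(a)      # [3. 2. 3.], still a float array

print(d, g, logs, floors, ceils, rounded, rounded.dtype)
```

`np.rint` rounds to the nearest integer but keeps the float dtype; convert with `rounded.astype(int)` when integers are needed.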

DataScience/Python 2023. 1. 26. 16:38
