[WEEK1] Introduction to Statistics in R

by gojw 2022. 5. 19.

- I started the DataCamp career track "R Statistician".

 

1. Summary Statistics

- Measures of center = mean, median:

- Left-skewed, right-skewed = the more skewed the data, the more the median is preferred over the mean (the mean is pulled toward the tail)

- Measures of spread = quantiles (quartiles and quintiles are special cases); a box plot uses quartiles

- Variance and standard deviation

- Interquartile range (IQR) = height of the box in a box plot = quantile(x, 0.75) - quantile(x, 0.25)

- The IQR can be used to flag outliers: values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR
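
The summary statistics above can be tried out with a few base R calls. This is only a minimal sketch: the vector `x` is made-up data (not from the course), chosen so that one large value skews it to the right.

```r
# Hypothetical sample, right-skewed because of one very large value
x <- c(2, 3, 3, 4, 5, 5, 6, 7, 8, 40)

# Measures of center
x_mean   <- mean(x)    # pulled toward the large value
x_median <- median(x)  # barely affected by it

# Measures of spread
x_var <- var(x)  # sample variance
x_sd  <- sd(x)   # sample standard deviation, sqrt(var(x))

# Quartiles: the values a box plot is drawn from
q <- quantile(x, probs = c(0.25, 0.5, 0.75))

# IQR = Q3 - Q1 (base R also provides IQR(x))
iqr <- q[["75%"]] - q[["25%"]]

# Flag outliers with the 1.5 * IQR rule
lower <- q[["25%"]] - 1.5 * iqr
upper <- q[["75%"]] + 1.5 * iqr
outliers <- x[x < lower | x > upper]

print(c(mean = x_mean, median = x_median))
print(outliers)
```

Note that `quantile()` uses its default interpolation method here; other `type` settings can give slightly different quartiles on small samples.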

 

2. Random Numbers and Probability

- Sampling with / without replacement

- Discrete distributions / Continuous distributions

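- The binomial distribution (mean = n*p, std = sqrt(n*p*(1-p))); the 68-95-99.7 rule is a property of the normal distribution and holds for a binomial only approximately, when n is large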

- Calculating binomial probabilities using R
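
A minimal sketch of sampling and binomial probabilities in base R. The population `names_pool`, the seed, and the values of `n` and `p` are arbitrary choices for illustration, not values from the course.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

names_pool <- c("A", "B", "C", "D", "E")  # hypothetical population

# Without replacement (the default): each item can be drawn at most once
draw_without <- sample(names_pool, size = 3)

# With replacement: the same item can be drawn more than once
draw_with <- sample(names_pool, size = 10, replace = TRUE)

# Binomial: number of successes in n independent trials,
# each with success probability p (here: 10 fair coin flips)
n <- 10
p <- 0.5

p_exactly_7   <- dbinom(7, size = n, prob = p)                      # P(X = 7)
p_at_most_7   <- pbinom(7, size = n, prob = p)                      # P(X <= 7)
p_more_than_7 <- pbinom(7, size = n, prob = p, lower.tail = FALSE)  # P(X > 7)

# Mean and standard deviation of the binomial distribution
expected_value <- n * p
std_dev        <- sqrt(n * p * (1 - p))

print(c(p_exactly_7, p_at_most_7, p_more_than_7))
```

`dbinom()` gives the probability of an exact count, while `pbinom()` gives a cumulative probability, so `p_at_most_7 + p_more_than_7` adds up to 1.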

 

3. More Distributions and the Central Limit Theorem

- The normal distribution

- Calculating probabilities from the normal distribution

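- The central limit theorem = as the number of trials increases, the sampling distribution of a statistic (e.g. the sample mean) gets closer to a normal distribution. The samples must be random and independent.

- The Poisson distribution (lambda) = events occur at a certain average rate, but at random times.

- lambda = the average number of events within a given time period

- Exponential distribution = the time between two consecutive events in a Poisson process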

- The t-distribution (degrees of freedom) 

- As the degrees of freedom (df) increase, the curve gets closer to the normal distribution.

- Log-normal distribution = the logarithm of the variable is normally distributed
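
The distributions in this chapter can be explored with R's `p*`, `q*`, `d*` and `r*` functions. The sketch below is illustrative only: the mean of 161 and sd of 7, the die-rolling simulation, `lambda = 8` and the seed are all assumed values, not taken from the course.

```r
set.seed(1)  # arbitrary seed, only so the simulations are reproducible

# Normal distribution: hypothetical heights with mean 161 cm and sd 7 cm
p_below_154 <- pnorm(154, mean = 161, sd = 7)                      # P(X < 154)
p_above_154 <- pnorm(154, mean = 161, sd = 7, lower.tail = FALSE)  # P(X > 154)
height_90th <- qnorm(0.90, mean = 161, sd = 7)                     # 90th percentile

# Share of a normal distribution within 1, 2 and 3 sd of the mean
within_sd <- pnorm(1:3) - pnorm(-(1:3))  # roughly 0.68, 0.95, 0.997

# Central limit theorem: the mean of 5 die rolls, repeated 1000 times
die <- 1:6
sample_means <- replicate(1000, mean(sample(die, size = 5, replace = TRUE)))
# hist(sample_means)  # uncomment to see the roughly bell-shaped histogram

# Poisson: lambda = average number of events per time period
lambda <- 8
p_exactly_5 <- dpois(5, lambda = lambda)  # P(X = 5)
p_at_most_5 <- ppois(5, lambda = lambda)  # P(X <= 5)

# Exponential: waiting time between Poisson events (mean wait = 1 / lambda)
p_short_wait <- pexp(0.1, rate = lambda)  # P(wait < 0.1 time periods)

# t-distribution: the 97.5% quantile approaches the normal one as df grows
t_quantiles <- qt(0.975, df = c(2, 30, 1000))
z_quantile  <- qnorm(0.975)

# Log-normal: taking the log of the draws gives normally distributed values
x     <- rlnorm(1000, meanlog = 0, sdlog = 1)
log_x <- log(x)

print(within_sd)
print(rbind(t_quantiles, z_quantile))
```

One detail worth noting: `pexp()` takes a rate, so an average waiting time has to be converted with `rate = 1 / mean_wait` before it is passed in.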

 

4. Correlation and Experimental Design

- Linear relationship between two variables = Correlation

- Correlation = "is associated with" (yes), "causes" (no): correlation does not imply causation
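
A small sketch of computing a correlation coefficient with `cor()`. All vectors below are made-up data. The second example is an added illustration that the coefficient only measures linear relationships.

```r
# Hypothetical data with a strong, roughly linear relationship
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)

r_linear <- cor(x, y)  # Pearson correlation (the default method), close to 1

# A perfect but non-linear relationship: y is fully determined by x,
# yet the linear correlation is essentially zero
x_sym   <- -3:3
y_curve <- x_sym^2
r_curve <- cor(x_sym, y_curve)

print(c(linear = r_linear, curve = r_curve))
```

A coefficient near 1 or -1 only says the two variables move together. It says nothing about which one, if either, causes the other.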
