- Started the DataCamp Career Track "R Statistician".
1. Summary Statistics
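- Measures of center = mean, median
- Left-skewed / right-skewed data = the more skewed the data, the more the median is preferred over the mean, because the mean is pulled toward the long tail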
- Measures of spread = quantiles, including quartiles (4 equal parts) and quintiles (5 equal parts); a box plot is drawn from the quartiles
- Variance and standard deviation
- Interquartile range (IQR) = height of the box in a box plot = quantile(x, 0.75) - quantile(x, 0.25)
- The IQR can be used to flag outliers: values < Q1 - 1.5*IQR or > Q3 + 1.5*IQR
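The course does these calculations in R (mean(), median(), var(), sd(), quantile()). Below is a rough Python translation using only the standard library `statistics` module. It assumes Python 3.8+, where `quantiles(..., method="inclusive")` interpolates the same way as R's default `quantile()`. The data values and the `summarize` helper are made up for illustration.

```python
import statistics

def summarize(values):
    """Center, spread and 1.5*IQR outliers for a list of numbers (hypothetical helper)."""
    # Inclusive method: same interpolation as R's default quantile() (type 7)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1), like R's var()
        "sd": statistics.stdev(values),           # sample standard deviation, like R's sd()
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }

data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]  # made-up, right-skewed sample
s = summarize(data)
print(s["mean"], s["median"])      # 8.3 5.0  (mean > median: skewed right)
print(s["q1"], s["q3"], s["iqr"])  # 3.25 6.75 3.5
print(s["outliers"])               # [40]
```

The single large value drags the mean well above the median, which is why the median is the safer center for skewed data.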
2. Random Numbers and Probability
- Sampling with / without replacement
- Discrete distributions / continuous distributions
- The binomial distribution: mean = n*p, standard deviation = sqrt(n*p*(1-p))
- Calculating binomial probabilities using R
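In R these come from dbinom(), pbinom() and sample(). As a sketch of the same ideas in Python, the helpers below are hypothetical functions named after the R ones and built on `math.comb` (Python 3.8+). The coin-flip numbers and the 10-card deck are assumptions for illustration.

```python
import math
import random

def dbinom(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def pbinom(k, n, p):
    """P(X <= k): cumulative sum of the point probabilities."""
    return sum(dbinom(i, n, p) for i in range(k + 1))

n, p = 10, 0.5                                # e.g. 10 fair coin flips
print(dbinom(7, n, p))                        # 0.1171875  (exactly 7 heads)
print(1 - pbinom(6, n, p))                    # 0.171875   (7 or more heads)
print(n * p, math.sqrt(n * p * (1 - p)))      # mean and standard deviation

# Sampling with / without replacement (compare R's sample(x, size, replace = ...))
random.seed(42)
cards = list(range(1, 11))                    # hypothetical deck of 10 numbered cards
without_repl = random.sample(cards, 5)        # without replacement: no repeats
with_repl = random.choices(cards, k=5)        # with replacement: repeats possible
print(without_repl, with_repl)
```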
3. More Distributions and the Central Limit Theorem
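- The normal distribution: 68-95-99.7 rule (about 68%, 95% and 99.7% of values lie within 1, 2 and 3 standard deviations of the mean)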
- Calculating probabilities from the normal distribution
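- The central limit theorem = the sampling distribution of a statistic such as the sample mean gets closer to a normal distribution as the sample size grows; in a simulation, drawing more samples (trials) makes that bell shape show up more clearly. Samples must be random and independent.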
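- The Poisson distribution (lambda) = events happen at a known average rate, but at random times
- lambda = the average number of events per time period (also the mean of the distribution)
- Exponential distribution = the time between two consecutive events in a Poisson process; with rate lambda, the mean waiting time is 1/lambda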
- The t-distribution (degrees of freedom)
- The larger the degrees of freedom (df), the closer the t-distribution gets to the normal distribution; lower df means thicker tails
- Log-normal distribution = a variable whose logarithm is normally distributed
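A rough Python sketch of this section using only the standard library. `NormalDist.cdf` / `inv_cdf` (Python 3.8+) play the role of R's pnorm() / qnorm(). `dpois` and `pexp` are hypothetical helpers named after the R functions. The rate of 8 events per hour and the sample size of 30 are made-up values. The t-distribution is left out because the standard library has no t-distribution.

```python
import math
import random
import statistics
from statistics import NormalDist

# --- Normal distribution: 68-95-99.7 rule ---
z = NormalDist(mu=0, sigma=1)
for k in (1, 2, 3):
    # share of values within k standard deviations: about 0.683, 0.954, 0.997
    print(k, round(z.cdf(k) - z.cdf(-k), 3))
print(z.inv_cdf(0.975))  # about 1.96

# --- Central limit theorem ---
# The population is exponential (right-skewed), yet the means of
# samples of size 30 pile up symmetrically around the population mean.
def sample_means(sample_size, trials, rate=1.0):
    return [
        statistics.mean(random.expovariate(rate) for _ in range(sample_size))
        for _ in range(trials)
    ]

random.seed(0)
means = sample_means(sample_size=30, trials=2000)
print(statistics.mean(means))   # close to 1 / rate = 1
print(statistics.stdev(means))  # close to (1 / rate) / sqrt(30), about 0.18

# --- Poisson and exponential ---
def dpois(k, lam):
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!"""
    return lam**k * math.exp(-lam) / math.factorial(k)

def pexp(t, rate):
    """P(waiting time <= t) when events occur at `rate` per unit time."""
    return 1 - math.exp(-rate * t)

lam = 8                  # hypothetical: on average 8 events per hour
print(dpois(5, lam))     # P(exactly 5 events in an hour)
print(pexp(0.25, lam))   # P(next event within 15 minutes) = 1 - e^(-2)

# --- Log-normal: the log of each draw is normally distributed ---
logs = [math.log(random.lognormvariate(2.0, 0.5)) for _ in range(5000)]
print(statistics.mean(logs), statistics.stdev(logs))  # close to 2.0 and 0.5
```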
4. Correlation and Experimental Design
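- Correlation = the strength and direction of the linear relationship between two variables (the correlation coefficient ranges from -1 to 1)
- Correlation means two variables are associated with each other; it does not mean one causes the other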
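To see why correlation captures only *linear* relationships, here is a small Python sketch of the Pearson coefficient. R's cor() computes this by default. `pearson_r` is a hypothetical helper written out by hand so it runs on any Python 3 version. The x values are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
print(pearson_r(xs, [2 * x + 1 for x in xs]))  # 1.0: perfect increasing linear relationship
print(pearson_r(xs, [-x for x in xs]))         # -1.0: perfect decreasing linear relationship
print(pearson_r(xs, [x * x for x in xs]))      # 0.0: y depends fully on x, but not linearly
```

A coefficient near 0 therefore does not mean "no relationship". A coefficient near 1 also says nothing about which variable, if either, causes the other.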