[WEEK13] Feature Engineering for NLP in Python

Following WEEK 12, this was the second NLP-related course.

1. Basic features and readability scores

- Readability test = how easy is a text to read?

=> Flesch reading ease, Gunning fog index, Simple Measure of Gobbledygook (SMOG), Dale-Chall score

- Flesch reading ease

=> Assumptions: a text is harder to read 1. the longer its sentences are, and 2. the higher its average number of syllables per word is
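
To make the two assumptions concrete, here is a minimal sketch of the standard Flesch reading ease formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words). The helper names `count_syllables` and `flesch_reading_ease` are my own, and the syllable counter is only a rough vowel-group heuristic (an assumption, not a dictionary lookup), so scores will differ slightly from those of a dedicated readability library.

```python
import re


def count_syllables(word):
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Higher score = easier to read."""
    # Naive sentence split on ., ! and ? (good enough for a sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)   # assumption 1
    avg_syllables_per_word = syllables / len(words)     # assumption 2
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


easy = "The cat sat on the mat. The dog ran."
hard = (
    "Notwithstanding considerable methodological heterogeneity, "
    "interdisciplinary investigations consistently demonstrate "
    "statistically significant correlations."
)

easy_score = flesch_reading_ease(easy)
hard_score = flesch_reading_ease(hard)
print(easy_score > hard_score)
```

Both penalty terms are subtracted, so long sentences and long words each push the score down, which matches the two assumptions above.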

2. Text preprocessing, POS tagging and NER

- Tokenization and Lemmatization

- Part-of-speech tagging (POS tagging)

- Named entity recognition

3. N-gram models

- Building a bag of words model

BoW = represents a document by its word frequencies (word order is ignored)

- Building a BoW Naive Bayes classifier

- Building n-gram models
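
The bag-of-words and n-gram ideas above can be sketched with the standard library alone. The function names (`tokenize`, `ngrams`, `bag_of_ngrams`) are hypothetical and the regex tokenizer is a simplifying assumption; in practice a library vectorizer such as scikit-learn's `CountVectorizer` with its `ngram_range` parameter plays this role.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """All contiguous sequences of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, n_min=1, n_max=1):
    """Count every n-gram with n_min <= n <= n_max; n = 1 is plain BoW."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(n_min, n_max + 1):
        counts.update(ngrams(tokens, n))
    return counts


sentence = "The lion is the king of the jungle"
bow = bag_of_ngrams(sentence)            # unigram counts only
bigrams = bag_of_ngrams(sentence, 2, 2)  # bigram counts only

print(bow.most_common(3))
print(sorted(bigrams))
```

Unigram counts throw away word order, while bigrams keep a little local context (for example "the king" versus "king of"), at the cost of a much larger vocabulary.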

4. TF-IDF and similarity scores

- Building tf-idf document vectors

- How to calculate tf-idf value?

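- Cosine similarity = ranges from 0 to 1 for non-negative vectors such as tf-idf (from -1 to 1 in general); 1 = the vectors point in the same direction, so identical documents score 1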

- Building a plot line based recommender
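
The items in this section fit together as tf-idf vectors, then cosine similarity, then a recommender. Below is a minimal sketch assuming the common textbook weighting tf * log(N / df); library implementations usually apply smoothing and normalization, so their numbers will differ. The function names, movie titles, and plot lines are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """One sparse vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, plots):
    """Return the other title whose plot line is most similar to `title`."""
    titles = list(plots)
    vectors = dict(zip(titles, tfidf_vectors([plots[t] for t in titles])))
    others = [t for t in titles if t != title]
    return max(others, key=lambda t: cosine_similarity(vectors[title], vectors[t]))


# Hypothetical toy data, not taken from the course.
plots = {
    "Space Quest": "astronauts travel through space to save a dying planet",
    "Star Voyage": "a crew of astronauts explore deep space and a distant planet",
    "Love in Paris": "two strangers fall in love during a summer in paris",
}

print(recommend("Space Quest", plots))
```

A term that appears in every document gets weight log(1) = 0, so common words such as "a" contribute nothing to the similarity; only the rarer shared terms decide the recommendation.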
