For my Programming A to Z final, I wanted to build an NLP model that generates text summaries.
My goal was to make a demo of text analysis for summarization.
It is a Natural Language Processing tool that summarizes large bodies of text down to a small subset of their sentences. The algorithm is heavily based on PyTeaser.
First, a tokenizer and regular expressions split the text into words and clean them up, excluding a list of unnecessary words (stop words). The remaining words are then ranked to find the article's keywords; PyTeaser ranks them by how often they appear.
Ranking the words
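A minimal sketch of this keyword step in Python. It assumes a simple regex tokenizer, a short hypothetical stop-word list (`STOP_WORDS`), and frequency-based ranking where each keyword's score is its share of the cleaned words. PyTeaser ships its own, longer stop-word list and its own scoring, so this only approximates it.

```python
import re
from collections import Counter

# Hypothetical, abbreviated stop-word list; PyTeaser uses its own, much longer list.
STOP_WORDS = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", "that",
    "for", "on", "with", "as", "was", "are", "be", "by", "this",
}

def tokenize(text):
    """Lowercase the text and pull out words with a regex, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def keywords(text, top_n=10):
    """Return up to top_n non-stop-words mapped to their share of the cleaned words."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    if not words:
        return {}
    counts = Counter(words)
    total = len(words)
    # Rank by frequency (ties keep first-seen order) and normalize each count.
    return {word: count / total for word, count in counts.most_common(top_n)}
```

For example, `keywords("The cat sat on the mat. The cat is happy.", top_n=2)` drops "the", "on", and "is", then ranks "cat" first because it appears twice.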
Next, each sentence is scored on four criteria:
- Relevance to the title
- Relevance to keywords in the article
- The position of the sentence
- Length of the sentence
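The four criteria above can be combined into one score per sentence. This is an illustrative Python sketch, not PyTeaser's code: the equal weights, the linear position score, and the ideal length of 20 words (`IDEAL_LENGTH`) are assumptions, and `keywords` is expected to be a dict mapping each keyword to a score.

```python
import re

IDEAL_LENGTH = 20  # assumed ideal sentence length in words (hypothetical value)

def words(text):
    """Lowercase the text and extract words with a regex."""
    return re.findall(r"[a-z0-9']+", text.lower())

def title_score(sentence_words, title_words):
    """Relevance to the title: fraction of distinct title words found in the sentence."""
    if not title_words:
        return 0.0
    return len(set(sentence_words) & set(title_words)) / len(set(title_words))

def keyword_score(sentence_words, keywords):
    """Relevance to keywords: summed keyword scores, normalized by sentence length."""
    if not sentence_words:
        return 0.0
    return sum(keywords.get(w, 0.0) for w in sentence_words) / len(sentence_words)

def position_score(index, total):
    """Position: linearly favors early sentences (1.0 for the first sentence)."""
    return 1.0 - index / total

def length_score(sentence_words):
    """Length: 1.0 at the ideal length, falling off linearly and clamped at 0."""
    return max(0.0, 1.0 - abs(IDEAL_LENGTH - len(sentence_words)) / IDEAL_LENGTH)

def summarize(title, text, keywords, num_sentences=2):
    """Score every sentence and return the best ones in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    title_words = words(title)
    scored = []
    for i, sentence in enumerate(sentences):
        sw = words(sentence)
        # Equal weights are an assumption; PyTeaser weights the criteria differently.
        score = (title_score(sw, title_words) + keyword_score(sw, keywords)
                 + position_score(i, len(sentences)) + length_score(sw)) / 4
        scored.append((score, i, sentence))
    top = sorted(scored, reverse=True)[:num_sentences]
    return [sentence for _, _, sentence in sorted(top, key=lambda t: t[1])]
```

A sentence that shares words with the title, mentions keywords, appears early, and is close to the ideal length rises to the top. Returning the selected sentences in their original order keeps the summary readable.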
I wanted to build the tool into a Chrome extension, but I ran into some issues with the Chrome browser, so that part is not working yet. In the process, I found out that the NLP extraction step is very hard to get right. Hopefully, in the near future, I will have a more complete version of this tool and a deeper understanding of sequence-to-sequence (seq2seq) models and NLP.
Thank you so much to Dan for teaching such a complicated topic and translating it into simpler terms for us to understand.