Privacy Policy Summarizer

For my final project in Programming A to Z, I wanted to build an NLP model that generates text summaries.

I chose to summarize privacy policies to help users understand how companies and governments use their personal information, and to let them read a privacy policy faster and more easily. 96% of users do not read privacy policies on the internet. The iTunes policy is as long as the script of Shakespeare’s The Tempest, which runs to more than 2,000 words. Long, boring privacy policies discourage users from learning the different ways their information could be used.

My goal was to build a demo of text summarization.

Privacy reading statistics

The tool is a natural language processing (NLP) program that condenses a large body of text into a small subset of its sentences. The algorithm is heavily based on PyTeaser.
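
Extractive summarization of this kind splits the text into sentences, gives each sentence a score, and keeps the top few in their original order. Here is a minimal sketch of that outer loop, assuming a naive regex sentence splitter and a pluggable scoring function. The names `split_sentences` and `summarize` are illustrative only, not PyTeaser's API, and the sentence-length score in the example is a placeholder chosen purely for demonstration.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text: str, score: Callable[[str, int], float], k: int = 3) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Rank sentence indices by score, highest first (ties keep document order).
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i], i), reverse=True)
    # Restore document order so the summary still reads naturally.
    keep = sorted(ranked[:k])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    policy = ("We collect your email address. We may share data with partners. "
              "Cookies are used to track visits. You can opt out at any time.")
    # Placeholder score: longer sentences win (illustration only).
    print(summarize(policy, lambda s, i: len(s.split()), k=2))
```

Keeping the chosen sentences in document order, rather than score order, is what makes the extract read like a short version of the policy instead of a shuffled list.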



First, the text is split into words with a tokenizer and regular expressions, and a list of unnecessary words is excluded to clean up the result. The remaining words are then ranked by how frequently they appear.
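
A minimal sketch of this cleanup-and-ranking step, assuming a simple regex tokenizer and a tiny hypothetical stop-word list (a real project would use a much fuller one). Ranking by raw frequency is an assumption in the spirit of PyTeaser's keyword step, not a copy of its code; `tokenize`, `keywords`, and `STOP_WORDS` are illustrative names.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical, deliberately small stop-word list (an assumption).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "we",
              "you", "your", "our", "may", "with", "for", "or", "be", "by"}


def tokenize(text: str) -> List[str]:
    # Lowercase, then keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def keywords(text: str, top_n: int = 10) -> List[Tuple[str, int]]:
    """Drop stop words, then rank the remaining words by frequency."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return Counter(words).most_common(top_n)


if __name__ == "__main__":
    text = ("We share your data with partners. Partners may use data to show ads. "
            "You can delete your data.")
    print(keywords(text, top_n=2))
```

On that sample text, "data" and "partners" come out on top, which is the kind of signal the sentence scorer can then look for.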

Ranking the words

Next, each sentence is scored in four categories: how closely it relates to the keywords entered, how frequently its words appear, its length, and its position. The score calculation is borrowed heavily from PyTeaser:

  • Relevance to the title 
  • Relevance to keywords in the article
  • The position of the sentence
  • Length of the sentence
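
A simplified sketch of how the four features can be combined into one score per sentence. Every detail here is an assumption for illustration: the ideal length of 20 words, the linear position decay (PyTeaser uses a more detailed position rule), and the 1.5 / 2.0 / 1.0 / 1.0 weights are hypothetical values loosely modelled on a PyTeaser-style weighted average, not its exact implementation. All function names are illustrative.

```python
import re
from typing import List, Sequence

IDEAL_LENGTH = 20  # assumed "ideal" sentence length in words


def words(sentence: str) -> List[str]:
    # Same simple lowercase regex tokenizer as the keyword step.
    return re.findall(r"[a-z']+", sentence.lower())


def title_score(title_words: Sequence[str], sentence: List[str]) -> float:
    # Fraction of title words that also appear in the sentence.
    if not title_words:
        return 0.0
    return sum(1 for w in title_words if w in sentence) / len(title_words)


def keyword_score(keywords: Sequence[str], sentence: List[str]) -> float:
    # Share of the sentence's words that are keywords.
    if not sentence:
        return 0.0
    return sum(1 for w in sentence if w in keywords) / len(sentence)


def length_score(n_words: int) -> float:
    # 1.0 at the ideal length, falling off linearly, clamped at 0.
    return max(0.0, 1.0 - abs(IDEAL_LENGTH - n_words) / IDEAL_LENGTH)


def position_score(index: int, total: int) -> float:
    # Assumption: earlier sentences matter more; linear decay from 1.0.
    return 1.0 - index / total if total else 0.0


def score_sentence(sentence: str, index: int, total: int,
                   title_words: Sequence[str], keywords: Sequence[str]) -> float:
    w = words(sentence)
    # Hypothetical weights for a weighted average of the four features.
    return (1.5 * title_score(title_words, w)
            + 2.0 * keyword_score(keywords, w)
            + 1.0 * length_score(len(w))
            + 1.0 * position_score(index, total)) / 4.0
```

Sorting all sentences by `score_sentence` and keeping the top few, in their original order, produces the summary.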

Chrome Extension

I wanted to build the tool into a Chrome extension, but I ran into some issues with the Chrome browser, so that part has not worked yet. In the process, I found that the NLP extraction step is very hard to get right. Hopefully, in the near future, I can build a more complete version of this tool and gain a better understanding of sequence-to-sequence models and NLP.

Thank you so much to Dan for teaching such a complicated topic and translating it into simpler terms for us to understand.
