Chunking means extracting a chunk, that is, a meaningful piece of text, from the full text.
One of the main goals of chunking is to group words into what are known as “noun phrases.” These are phrases of one or more words that contain a noun, possibly along with some descriptive words, a verb, or something like an adverb. The idea is to group nouns with the words that relate to them.
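To make the grouping idea concrete, here is a minimal sketch in plain Python. It is a deliberately simplified illustration, not NLTK's chunker: it only groups consecutive determiners, adjectives, and nouns, and it keeps a group only if it contains a noun. The tagged sentence and the names `NP_TAGS`, `NOUN_TAGS`, and `group_noun_phrases` are assumptions made up for this example.

```python
# Hand-tagged (word, POS tag) pairs; the tags are assumed for illustration,
# not produced by a real tagger.
tagged = [
    ("The", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
    ("at", "IN"), ("the", "DT"), ("big", "JJ"), ("cat", "NN"),
]

# Tags allowed inside a (simplified) noun phrase: determiners, adjectives, nouns.
NP_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def group_noun_phrases(tagged_tokens):
    """Group consecutive determiner/adjective/noun tokens into phrases."""
    phrases, current = [], []
    for word, tag in tagged_tokens:
        if tag in NP_TAGS:
            current.append((word, tag))
            continue
        # Any other tag ends the current run; keep the run only if it holds a noun.
        if any(t in NOUN_TAGS for _, t in current):
            phrases.append(" ".join(w for w, _ in current))
        current = []
    # Flush the final run at the end of the sentence.
    if any(t in NOUN_TAGS for _, t in current):
        phrases.append(" ".join(w for w, _ in current))
    return phrases


noun_phrases = group_noun_phrases(tagged)
print(noun_phrases)  # ['The little dog', 'the big cat']
```

Each noun ends up grouped with the determiner and adjective that describe it, while the verb and preposition between the two phrases are left out.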
The following example tags each word with its part of speech, pulls out the nouns and the adjectives, and scores the adjectives with TextBlob:

```python
# importing libraries
# Requires NLTK's tokenizer and POS tagger data (fetched once with nltk.download)
import nltk
from textblob import TextBlob

sampleSentence = '''This is a very good book, felt really nice reading it.'''
#sampleSentence = '''Ravi is a bad leader, he misguides the team too often'''

# Split the sentence into words, then tag each word with its part of speech
sampleSentence = nltk.word_tokenize(sampleSentence)
sampleSentencePOS = nltk.pos_tag(sampleSentence)

# Find nouns or proper nouns (singular forms only: NN, NNP)
OnlyNouns = " ".join([POStags[0] for POStags in sampleSentencePOS if POStags[1] in ['NN', 'NNP']])

# Find only adjectives (base, comparative, superlative)
OnlyAdjectives = " ".join([POStags[0] for POStags in sampleSentencePOS if POStags[1] in ['JJ', 'JJR', 'JJS']])

print('Nouns: ', OnlyNouns)
print('Adjectives: ', OnlyAdjectives)
print('Overall sentiment score of adjectives: ', TextBlob(OnlyAdjectives).sentiment)
```
Sample Output

This gives us an idea of which adjectives are being used with the nouns, and whether they are positive or negative keywords.
A word cloud can be plotted for all the adjectives to understand the overall sentiment, or for all the nouns to understand what is being talked about.
A sentiment score can be generated by checking whether the adjectives used are positive or negative.
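The last two points can be sketched without any extra libraries. The snippet below counts adjective frequencies (the numbers a word cloud would visualise) and computes a simple count-based score. This is not how TextBlob computes sentiment. The tagged tokens, the tiny `POSITIVE` and `NEGATIVE` word lists, and the function `adjective_sentiment` are all hypothetical and exist only for this illustration.

```python
from collections import Counter

# Hand-tagged (word, POS tag) pairs, assumed for illustration.
tagged_reviews = [
    ("very", "RB"), ("good", "JJ"), ("book", "NN"),
    ("nice", "JJ"), ("story", "NN"),
    ("bad", "JJ"), ("ending", "NN"),
    ("good", "JJ"), ("characters", "NNS"),
]

# Hypothetical mini-lexicon; a real project would use a full sentiment lexicon.
POSITIVE = {"good", "nice", "great"}
NEGATIVE = {"bad", "poor", "boring"}

# Keep only the adjectives (base, comparative, superlative).
adjectives = [w.lower() for w, tag in tagged_reviews if tag in {"JJ", "JJR", "JJS"}]

# Frequencies like these are what a word cloud visualises.
adjective_counts = Counter(adjectives)


def adjective_sentiment(words):
    """Return (positive - negative) / matched words, or 0.0 if none match."""
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


score = adjective_sentiment(adjectives)
print(adjective_counts.most_common())
print(score)  # 0.5: three positive adjectives, one negative
```

A score near 1 means mostly positive adjectives and a score near -1 means mostly negative ones. Words missing from the lexicon are ignored, so the result is only as good as the word lists.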
Chunking using multiple grammar rules
Instead of just extracting nouns or adjectives, you can also extract specific combinations of Parts of Speech, e.g. an Adverb-Adjective-Noun combination. You can specify the grammar rules for chunking.
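To show what such a rule matches, here is a rough plain-Python sketch that slides over a tagged sentence and reports every run of words whose tags equal a fixed pattern. It is a simplified stand-in for a chunk grammar, not NLTK's implementation. The hand-assigned tags and the names `PATTERNS` and `find_tag_patterns` are assumptions for this example.

```python
# Hand-tagged (word, POS tag) pairs, assumed for illustration.
tagged = [
    ("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("very", "RB"),
    ("good", "JJ"), ("book", "NN"), (".", "."),
]

# Each rule is a fixed sequence of POS tags to look for.
PATTERNS = [
    ("JJ", "VBG"),       # adjective followed by a gerund
    ("RB", "JJ", "NN"),  # adverb, adjective, noun
]


def find_tag_patterns(tagged_tokens, patterns):
    """Return the word sequences whose tags match any pattern exactly."""
    tags = [tag for _, tag in tagged_tokens]
    matches = []
    for pattern in patterns:
        size = len(pattern)
        # Compare every window of `size` consecutive tags with the pattern.
        for start in range(len(tags) - size + 1):
            if tuple(tags[start:start + size]) == pattern:
                words = [w for w, _ in tagged_tokens[start:start + size]]
                matches.append(" ".join(words))
    return matches


matches = find_tag_patterns(tagged, PATTERNS)
print(matches)  # ['very good book']
```

NLTK's tag patterns additionally accept regex-style operators such as `?`, `*`, and `+`, which this fixed-length sketch does not handle.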
With NLTK, the rules are written as tag patterns and passed to nltk.RegexpParser:

```python
import nltk

# Grammar rules describing the tag patterns to find in the data
PatternsToFind = '''NP: {<JJ><VBG>}
                    NP: {<RB><JJ><NN>}
'''

sampleSentence = '''This is a very good book, felt really nice reading it.'''
#sampleSentence = '''Ravi is a bad leader, he misguides the team too often'''

# Creating Parts of Speech (POS) tags
sampleSentence = nltk.word_tokenize(sampleSentence)
sampleSentencePOS = nltk.pos_tag(sampleSentence)

# Chunking the listed patterns
PatternParser = nltk.RegexpParser(PatternsToFind)
ParsedResults = PatternParser.parse(sampleSentencePOS)

# Getting the pattern results from the text data
for results in ParsedResults:
    #print(results, '--', type(results))
    # Matched chunks are subtrees; unmatched words stay as (word, tag) tuples
    if isinstance(results, nltk.Tree):
        print(results)
```
Sample Output:

