How to do chunking in Python

Chunking means extracting a chunk of text: a meaningful piece of text taken from the full text.

One of the main goals of chunking is to group words into what are known as “noun phrases.” These are phrases of one or more words that contain a noun, maybe some descriptive words, maybe a verb, and maybe something like an adverb. The idea is to group nouns with the words that relate to them.
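The grouping idea can be illustrated without any library. The sketch below is a simplified, hypothetical chunker (the function name `chunk_noun_phrases` and its rules are assumptions for illustration, not NLTK's API). It takes words that already carry Penn Treebank POS tags and collects a run of determiners and adjectives followed by one or more nouns, roughly what an NLTK grammar such as `NP: {<DT>?<JJ.*>*<NN.*>+}` would match. It deliberately ignores verbs and adverbs to keep the logic short.

```python
# Tags treated as noun modifiers in this simplified sketch (an assumption):
# determiners and adjectives (base, comparative, superlative).
MODIFIER_TAGS = {"DT", "JJ", "JJR", "JJS"}


def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into noun phrases.

    A phrase is any run of modifiers followed by one or more nouns
    (any tag starting with "NN"). Returns a list of word lists.
    """
    chunks = []
    buffer = []       # words collected for the current candidate phrase
    has_noun = False  # a candidate only counts once it contains a noun

    for word, tag in tagged:
        if tag.startswith("NN"):
            buffer.append(word)
            has_noun = True
        elif tag in MODIFIER_TAGS:
            # A modifier after a noun starts a new phrase.
            if has_noun:
                chunks.append(buffer)
                buffer, has_noun = [], False
            buffer.append(word)
        else:
            # Any other tag ends the current candidate.
            if has_noun:
                chunks.append(buffer)
            buffer, has_noun = [], False

    if has_noun:
        chunks.append(buffer)
    return chunks


# Hand-tagged example sentence: "the little yellow dog barked at the cat"
tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
          ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
print(chunk_noun_phrases(tagged))
```

Modifiers that are never followed by a noun are simply dropped, which is why "barked" and "at" never end up in a chunk.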

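# importing libraries
import nltk
from textblob import TextBlob

# One-time download of the tokenizer and POS-tagger models
# (newer NLTK releases may ask for 'punkt_tab' and 'averaged_perceptron_tagger_eng' instead)
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

sampleSentence = '''This is a very good book, felt really nice reading it.'''
#sampleSentence = '''Ravi is a bad leader, he misguides the team too often'''

# Split the sentence into words and tag each word with its part of speech
sampleSentence = nltk.word_tokenize(sampleSentence)
sampleSentencePOS = nltk.pos_tag(sampleSentence)

# Find nouns and proper nouns (singular and plural)
OnlyNouns = " ".join([POStags[0] for POStags in sampleSentencePOS if POStags[1] in ['NN', 'NNS', 'NNP', 'NNPS']])

# Find only adjectives (base, comparative and superlative)
OnlyAdjectives = " ".join([POStags[0] for POStags in sampleSentencePOS if POStags[1] in ['JJ', 'JJR', 'JJS']])

print('Nouns: ', OnlyNouns)
print('Adjectives: ', OnlyAdjectives)
# TextBlob returns (polarity, subjectivity); polarity runs from -1 (negative) to 1 (positive)
print('Overall sentiment score of adjectives: ', TextBlob(OnlyAdjectives).sentiment)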
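Since Penn Treebank tags share prefixes (NN, NNS, NNP and NNPS all start with "NN"; JJ, JJR and JJS all start with "JJ"), matching on a tag prefix is a compact alternative to listing every tag. A minimal sketch, assuming Penn Treebank tags (the tag set `nltk.pos_tag` uses by default). The helper `words_with_tag_prefix` is a hypothetical name, and the example sentence is tagged by hand so the sketch runs without a tagger model; `nltk.pos_tag` could tag it differently.

```python
def words_with_tag_prefix(tagged, prefixes):
    """Return the words whose POS tag starts with any of the given prefixes."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    prefixes = tuple(prefixes)
    return [word for word, tag in tagged if tag.startswith(prefixes)]


# Hand-tagged (assumed) tags for "The new books have nicer covers"
tagged = [("The", "DT"), ("new", "JJ"), ("books", "NNS"),
          ("have", "VBP"), ("nicer", "JJR"), ("covers", "NNS")]

nouns = words_with_tag_prefix(tagged, ["NN"])       # every noun variant
adjectives = words_with_tag_prefix(tagged, ["JJ"])  # every adjective variant

print("Nouns:", " ".join(nouns))
print("Adjectives:", " ".join(adjectives))
```

Both lists here consist of plural or comparative forms, which a filter limited to `NN`/`NNP` and `JJ` alone would miss.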


Sample Output

This gives an idea of what type of adjectives are being used for the nouns, and whether they are positive or negative keywords.

A word cloud can be plotted for all the adjectives to understand the overall sentiment, or for all the nouns to see what is being talked about.

A sentiment score can be generated by checking whether the adjectives used are positive or negative.
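Both ideas can be sketched in plain Python. The word lists below are small, hypothetical examples chosen only for illustration; a real application would use a proper sentiment lexicon or a library such as TextBlob. `word_frequencies` produces the word counts a word-cloud tool would typically size words by, and `adjective_sentiment` scores adjectives by counting lexicon hits. Both function names are assumptions, not part of any library.

```python
from collections import Counter

# Hypothetical, hand-picked word lists for illustration only.
POSITIVE = {"good", "nice", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible", "awful"}


def word_frequencies(words):
    """Count words case-insensitively (the input a word cloud is built from)."""
    return Counter(word.lower() for word in words)


def adjective_sentiment(adjectives):
    """Score adjectives as (positive - negative) / matched, in [-1, 1].

    Adjectives not found in either list are ignored; if none match,
    the score is 0.0 (neutral).
    """
    positive = sum(1 for adj in adjectives if adj.lower() in POSITIVE)
    negative = sum(1 for adj in adjectives if adj.lower() in NEGATIVE)
    matched = positive + negative
    if matched == 0:
        return 0.0
    return (positive - negative) / matched


print(word_frequencies(["Book", "book", "team"]).most_common())
print(adjective_sentiment(["good", "nice"]))
print(adjective_sentiment(["bad"]))
```

Dividing by the number of matched adjectives keeps the score independent of sentence length, so a long review and a short one can be compared on the same scale.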

Chunking using multiple grammar rules

Instead of extracting only nouns or adjectives, you can also extract specific combinations of parts of speech, e.g. an adverb-adjective-noun combination such as "very good book". You specify these combinations as grammar rules for chunking.
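Grammar rules for `nltk.RegexpParser` are regular expressions over POS tags: each tag sits in angle brackets, `?`, `*` and `+` work as in ordinary regular expressions, and `.*` inside a tag matches tag variants. A minimal sketch, assuming hand-assigned tags for the article's sample sentence (so no tagger model download is needed; `nltk.pos_tag` may tag some words differently). The grammar itself is an illustrative variant, not the one used in the article.

```python
import nltk

# Hand-assigned (assumed) POS tags for the article's sample sentence.
tagged = [
    ("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("very", "RB"),
    ("good", "JJ"), ("book", "NN"), (",", ","), ("felt", "VBD"),
    ("really", "RB"), ("nice", "JJ"), ("reading", "VBG"), ("it", "PRP"),
    (".", "."),
]

# One NP stage with two chunk rules, applied in order:
#   <RB>?   an optional adverb
#   <JJ>    an adjective
#   <NN.*>  any noun tag (NN, NNS, NNP, NNPS)
#   <VBG>   a gerund / present participle
grammar = r"""
NP: {<RB>?<JJ><NN.*>}
    {<RB>?<JJ><VBG>}
"""

parser = nltk.RegexpParser(grammar)
tree = parser.parse(tagged)

# Keep only the NP subtrees and turn each one into a list of words.
phrases = [
    [word for word, tag in subtree.leaves()]
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(phrases)
```

The optional `<RB>?` lets a single rule cover both "good book" and "very good book", instead of writing one rule per combination.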

# importing libraries
import nltk

# Grammar rules describing the POS patterns to extract as chunks.
# Each "NP:" line is a separate stage; stages run in order, so words chunked
# by the first rule cannot be reused by the second.
PatternsToFind = '''NP: {<JJ><VBG>}
                    NP: {<RB><JJ><NN>} '''

sampleSentence = '''This is a very good book, felt really nice reading it.'''
#sampleSentence = '''Ravi is a bad leader, he misguides the team too often'''

# Creating Parts of Speech (POS) tags
sampleSentence = nltk.word_tokenize(sampleSentence)
sampleSentencePOS = nltk.pos_tag(sampleSentence)

# Chunking the listed patterns
PatternParser = nltk.RegexpParser(PatternsToFind)
ParsedResults = PatternParser.parse(sampleSentencePOS)

# Getting the pattern results from the text data
for results in ParsedResults:
    # Chunks come back as nltk.Tree objects; all other items are plain (word, tag) tuples
    if isinstance(results, nltk.Tree):
        print(results)
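To use the chunks downstream (for example, to score each phrase separately) rather than just printing them, one option is to collect them as (label, phrase) pairs. A minimal sketch using the article's two-stage grammar, assuming hand-assigned tags for the sample sentence so it runs without a tagger model; the variable names are illustrative.

```python
import nltk

# Hand-assigned (assumed) POS tags for the article's sample sentence.
tagged = [
    ("This", "DT"), ("is", "VBZ"), ("a", "DT"), ("very", "RB"),
    ("good", "JJ"), ("book", "NN"), (",", ","), ("felt", "VBD"),
    ("really", "RB"), ("nice", "JJ"), ("reading", "VBG"), ("it", "PRP"),
    (".", "."),
]

# The article's grammar: two NP stages applied one after the other.
patterns = """NP: {<JJ><VBG>}
              NP: {<RB><JJ><NN>}"""

parsed = nltk.RegexpParser(patterns).parse(tagged)

# Each Tree child is a chunk; plain (word, tag) tuples are unchunked tokens.
chunks = [
    (child.label(), " ".join(word for word, tag in child.leaves()))
    for child in parsed
    if isinstance(child, nltk.Tree)
]
print(chunks)
```

Because the first stage already claims "nice reading", the second stage cannot build "really nice ..." from the same words, so stage order matters when rules overlap.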


Sample Output: chunking multiple grammar rules in Python

Author Details

Farukh Hashmi

Lead Data Scientist

Farukh is an innovator in solving industry problems using artificial intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains like telecom, insurance, and logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion for teaching inspired him to create this website!

https://thinkingneuron.com/

thinkingneuron@gmail.com

