A token is a piece of text produced by splitting a larger text according to some rule. For example, each word is a token when a sentence is "tokenized" into words, and each sentence is a token when a paragraph is tokenized into sentences.
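To make the splitting rules concrete, here is a minimal sketch of rule-based tokenization using only Python's standard re module (the patterns and sample text below are simplified illustrations, not what a production tokenizer like NLTK's uses):

```python
import re

text = "Hello there. How are you? I am fine."

# Rule 1: a sentence token ends at ., ! or ? followed by whitespace.
sentences = re.split(r'(?<=[.!?])\s+', text)
print(sentences)   # ['Hello there.', 'How are you?', 'I am fine.']

# Rule 2: a word token is a maximal run of word characters.
words = re.findall(r"\w+", sentences[0])
print(words)       # ['Hello', 'there']
```

Rules this simple break quickly (e.g. the period in "Mr." would end a sentence), which is why a dedicated library is usually preferred.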
The nltk library in Python is a popular choice for tokenization, as well as for many other common NLP tasks.
# importing the library
from nltk.tokenize import sent_tokenize, word_tokenize

# The tokenizers need the 'punkt' model; download it once with:
# import nltk; nltk.download('punkt')

EXAMPLE_TEXT = '''Hello Mr. Smith, how are you doing today? The weather is great, and Python is awesome. The sky is pinkish-blue. You shouldn't eat cardboard.'''

# Sentence tokenization
print(sent_tokenize(EXAMPLE_TEXT))

# Word tokenization
print(word_tokenize(EXAMPLE_TEXT))
Sample Output (note how the sentence tokenizer correctly handles the period in "Mr.", and the word tokenizer splits "shouldn't" into "should" and "n't"):

['Hello Mr. Smith, how are you doing today?', 'The weather is great, and Python is awesome.', 'The sky is pinkish-blue.', "You shouldn't eat cardboard."]
['Hello', 'Mr.', 'Smith', ',', 'how', 'are', 'you', 'doing', 'today', '?', 'The', 'weather', 'is', 'great', ',', 'and', 'Python', 'is', 'awesome', '.', 'The', 'sky', 'is', 'pinkish-blue', '.', 'You', 'should', "n't", 'eat', 'cardboard', '.']
Author Details
Lead Data Scientist
Farukh is an innovator in solving industry problems using Artificial Intelligence. His expertise is backed by 10 years of industry experience. As a senior data scientist, he is responsible for designing AI/ML solutions that provide maximum gains for clients. As a thought leader, his focus is on solving the key business problems of the CPG industry. He has worked across different domains such as Telecom, Insurance, and Logistics, and with global tech leaders including Infosys, IBM, and Persistent Systems. His passion to teach inspired him to create this website!
