Introduction to Flair for NLP: A Simple yet Powerful State-of-the-Art NLP Library

    ### Check ###
    for i in range(10):
        print(corpus[i])
        print(POS[i])

    ### Removing blanks from sentences and PoS ###
    corpus = [x for x in corpus if x != ""]
    POS = [x for x in POS if x != ""]

    ### Check ###
    for i in range(10):
        print(corpus[i])
        print(POS[i])

We have extracted the essential aspects we require from the dataset.

Let’s move on to step 3.

Step 3: Tagging the text using NLTK and Flair

Tagging using NLTK:

First, import the required libraries:

    import nltk
    nltk.download('tagsets')
    nltk.download('punkt')
    nltk.download('averaged_perceptron_tagger')
    from nltk import word_tokenize

This will download all the necessary files to tag the text using NLTK.

    from tqdm import tqdm  # progress bar; harmless if already imported earlier

    ### Tagging the corpus with NLTK ###
    # for storing results
    nltk_pos = []
    # for every sentence
    for i in tqdm(corpus):
        # tokenize sentence
        text = word_tokenize(i)
        # tag words
        z = nltk.pos_tag(text)
        # store
        nltk_pos.append(z)

The PoS tags are in this format:

    [('token_1', 'tag_1'), ..., ('token_n', 'tag_n')]

Let's extract the PoS tags from this:

    ### Extracting final PoS tags by NLTK in a list ###
    nltk_result = []
    # every tagged sentence
    for i in tqdm(nltk_pos):
        tmp = []
        # every word
        for j in i:
            # append tag (from index 1)
            tmp.append(j[1])
        # join the tags of every sentence
        nltk_result.append(' '.join(tmp))

    ### Check ###
    for i in range(10):
        print(nltk_result[i])
        print(corpus[i])

The NLTK tags are ready for business.
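The tag-extraction step can also be written as a small helper. This is a minimal sketch, not part of the original tutorial: the name tags_only and the sample data are made up for illustration, and the sample is hand-written in the same (token, tag) shape that nltk.pos_tag returns, so it runs without NLTK installed.

```python
def tags_only(tagged_sentence):
    """Return the space-joined tags from a list of (token, tag) pairs."""
    return " ".join(tag for _token, tag in tagged_sentence)


# Hypothetical sample in the shape produced by nltk.pos_tag.
sample = [("Nadim", "NNP"), ("Ladki", "NNP")]
print(tags_only(sample))
```

With this helper, the whole list could be built as nltk_result = [tags_only(s) for s in nltk_pos].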

Turning our attention to Flair now.

Importing the libraries first:

    !pip install flair
    from flair.data import Sentence
    from flair.models import SequenceTagger

Tagging using Flair:

    # initiating object
    pos = SequenceTagger.load('pos-fast')
    # for storing PoS-tagged strings
    f_pos = []
    # for every sentence
    for i in tqdm(corpus):
        sentence = Sentence(i)
        pos.predict(sentence)
        # append tagged sentence
        f_pos.append(sentence.to_tagged_string())

    ### Check ###
    for i in range(10):
        print(f_pos[i])
        print(corpus[i])

The result is in the below format:

    token_1 <tag_1> token_2 <tag_2> ... token_n <tag_n>

Note: We can use different taggers available within the Flair library.

Feel free to tinker around and experiment.

You can find the list here.

Extract the sentence-wise tags as we did in NLTK:

    import re

    ### Extracting PoS tags ###
    # in every sentence by index
    for i in tqdm(range(len(f_pos))):
        # for every word in the ith sentence
        for j in corpus[i].split():
            # remove the first occurrence of that word from the ith tagged sentence
            f_pos[i] = str(f_pos[i]).replace(j, "", 1)
        # removing < > symbols
        for j in ['<', '>']:
            f_pos[i] = str(f_pos[i]).replace(j, "")
        # removing redundant spaces
        f_pos[i] = re.sub(' +', ' ', str(f_pos[i]))
        f_pos[i] = str(f_pos[i]).lstrip()

    ### Check ###
    for i in range(10):
        print(f_pos[i])
        print(corpus[i])

Aha! We have finally tagged the corpus and extracted the tags sentence-wise.
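An alternative to deleting the words one by one is to pull the tags straight out of the angle brackets with a regular expression. The sketch below assumes the tagged string has exactly the "token <tag>" format shown above (the output format may differ between Flair versions); the names TAG_PATTERN and tags_from_tagged_string are hypothetical, and the sample string is hand-written rather than real Flair output.

```python
import re

# Matches the content of each <...> group, e.g. "NNP" in "<NNP>".
TAG_PATTERN = re.compile(r"<([^<>\s]+)>")


def tags_from_tagged_string(tagged):
    """Return the space-joined tags found inside angle brackets."""
    return " ".join(TAG_PATTERN.findall(tagged))


# Hypothetical sample in the "token <tag>" format.
sample = "Nadim <NNP> Ladki <NNP>"
print(tags_from_tagged_string(sample))
```

This avoids accidental matches when a word of the sentence also appears inside a tag name, but it would misfire if a token itself contained text in angle brackets.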

We are free to remove all the punctuation and special symbols.

    ### Removing symbols and redundant spaces ###
    # in every sentence by index
    for i in tqdm(range(len(corpus))):
        # removing symbols: replace every non-letter with a space
        corpus[i] = re.sub('[^a-zA-Z]', ' ', str(corpus[i]))
        POS[i] = re.sub('[^a-zA-Z]', ' ', str(POS[i]))
        f_pos[i] = re.sub('[^a-zA-Z]', ' ', str(f_pos[i]))
        nltk_result[i] = re.sub('[^a-zA-Z]', ' ', str(nltk_result[i]))

        # removing HYPH and SYM (they are tags for symbols)
        f_pos[i] = str(f_pos[i]).replace('HYPH', '')
        f_pos[i] = str(f_pos[i]).replace('SYM', '')
        POS[i] = str(POS[i]).replace('SYM', '')
        POS[i] = str(POS[i]).replace('HYPH', '')
        nltk_result[i] = str(nltk_result[i]).replace('HYPH', '')
        nltk_result[i] = str(nltk_result[i]).replace('SYM', '')

        # removing redundant spaces
        POS[i] = re.sub(' +', ' ', str(POS[i]))
        f_pos[i] = re.sub(' +', ' ', str(f_pos[i]))
        corpus[i] = re.sub(' +', ' ', str(corpus[i]))
        nltk_result[i] = re.sub(' +', ' ', str(nltk_result[i]))

We have tagged the corpus using NLTK and Flair, extracted the tags, and removed all the unnecessary elements.
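The same three cleaning passes (strip non-letters, drop the symbol tags, collapse spaces) can be folded into one helper and applied to each list. This is a sketch under assumptions: clean_tags is a hypothetical name, and unlike the str.replace calls in the tutorial it drops HYPH and SYM only when they appear as whole tokens, which is a deliberate variation rather than the original behaviour.

```python
import re


def clean_tags(text, drop=("HYPH", "SYM")):
    """Strip non-letters, drop symbol tags, and collapse whitespace."""
    # Replace anything that is not an ASCII letter with a space.
    text = re.sub(r"[^a-zA-Z]", " ", text)
    # Splitting on whitespace removes redundant spaces; filter out symbol tags.
    tokens = [t for t in text.split() if t not in drop]
    return " ".join(tokens)


# Hand-written sample: punctuation tags vanish and "PRP$" becomes "PRP".
print(clean_tags("NNP HYPH NNP , PRP$ ."))
```

Note that stripping non-letters also shortens tags such as PRP$ to PRP, in the tutorial's code as well as here, so the same cleaning has to be applied to every tag list being compared.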

Let's see it for ourselves:

    for i in range(1000):
        print('corpus   ' + corpus[i])
        print('actual   ' + POS[i])
        print('nltk     ' + nltk_result[i])
        print('flair    ' + f_pos[i])
        print('-' * 50)

OUTPUT:

    corpus   SOCCER JAPAN GET LUCKY WIN CHINA IN SURPRISE DEFEAT
    actual   NN NNP VB NNP NNP NNP IN DT NN
    nltk     NNP NNP NNP NNP NNP NNP NNP NNP NNP
    flair    NNP NNP VBP JJ NN NNP IN NNP NNP
    --------------------------------------------------
    corpus   Nadim Ladki
    actual   NNP NNP
    nltk     NNP NNP
    flair    NNP NNP
    --------------------------------------------------
    corpus   AL AIN United Arab Emirates
    actual   NNP NNP NNP NNPS CD
    nltk     NNP NNP NNP VBZ JJ
    flair    NNP NNP NNP NNP CD

That looks convincing!

Step 4: Evaluating the PoS tags from NLTK and Flair against the tagged dataset

Here, we are doing word-wise evaluation of the tags with the help of a custom-made evaluator.

    corpus   Japan coach Shu Kamo said The Syrian own goal proved lucky for us
    actual   NNP NN NNP NNP VBD POS DT JJ JJ NN VBD JJ IN PRP
    nltk     NNP VBP NNP NNP VBD DT JJ JJ NN VBD JJ IN PRP
    flair    NNP NN NNP NNP VBD DT JJ JJ NN VBD JJ IN PRP

Note that in the example above, the actual PoS tags contain one tag more than the NLTK and Flair tags: the extra POS tag after VBD. The tag sequences therefore no longer line up word for word.

Therefore we will not be considering sentences whose PoS tag sequences are of unequal length.
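Before scoring, it can be useful to check how many sentences this rule actually discards. The following is a small sketch with hand-written tag strings, not the dataset; comparable_indices is a hypothetical helper name.

```python
def comparable_indices(reference, predicted):
    """Return the indices of sentences whose tag sequences have equal length."""
    return [
        i
        for i, (ref, pred) in enumerate(zip(reference, predicted))
        if len(ref.split()) == len(pred.split())
    ]


# Hypothetical data: the first pair differs in length, the second does not.
reference = ["NNP NN NNP NNP VBD POS DT", "NNP NNP"]
predicted = ["NNP NN NNP NNP VBD DT", "NNP NNP"]

kept = comparable_indices(reference, predicted)
print(f"{len(kept)} of {len(reference)} sentences kept for evaluation")
```

If a large share of sentences is dropped, the final score describes only the remaining subset, which is worth keeping in mind when comparing taggers.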

    ### EVALUATION FUNCTION ###
    def eval2(x, y):
        # correct matches
        count = 0
        # total comparisons made
        comp = 0
        # for every sentence index in dataset
        for i in range(len(x)):
            # split each tag string into a list of tags
            x_tags = x[i].split()
            y_tags = y[i].split()
            # if the sentence lengths match
            if len(x_tags) == len(y_tags):
                # compare the tag of each word
                for j in range(len(x_tags)):
                    if x_tags[j] == y_tags[j]:
                        # match
                        count = count + 1
                    comp = comp + 1
        # percentage of matching tags among all compared tags
        return (count / comp) * 100

Finally, we evaluate the PoS tags of NLTK and Flair against the PoS tags provided by the dataset.

    print("nltk Score ", eval2(POS, nltk_result))
    print("Flair Score ", eval2(POS, f_pos))

Our Result:

    NLTK Score: 85.38654023442645
    Flair Score: 90.96172124773179

Well, well, well.

I can see why Flair has been getting so much attention in the NLP community.
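To make the word-wise score concrete, here is a worked example on a single sentence, using the tags printed earlier for "AL AIN United Arab Emirates". It is a minimal sketch of tag-level accuracy; token_accuracy is a hypothetical name, and returning 0.0 when nothing is comparable is a choice made here, not something the tutorial specifies.

```python
def token_accuracy(reference, predicted):
    """Percentage of matching tags over sentences with equal tag counts."""
    correct = 0
    compared = 0
    for ref, pred in zip(reference, predicted):
        ref_tags, pred_tags = ref.split(), pred.split()
        # Skip sentences whose tag sequences do not line up.
        if len(ref_tags) != len(pred_tags):
            continue
        compared += len(ref_tags)
        correct += sum(r == p for r, p in zip(ref_tags, pred_tags))
    # Assumption: report 0.0 when no sentence could be compared.
    return 100.0 * correct / compared if compared else 0.0


actual = ["NNP NNP NNP NNPS CD"]
nltk_tags = ["NNP NNP NNP VBZ JJ"]
flair_tags = ["NNP NNP NNP NNP CD"]

print(token_accuracy(actual, nltk_tags))   # 3 of 5 tags match -> 60.0
print(token_accuracy(actual, flair_tags))  # 4 of 5 tags match -> 80.0
```

On this one sentence NLTK matches three of the five reference tags and Flair matches four; the scores reported above are the same calculation taken over every comparable sentence in the corpus.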

End Notes

Flair clearly provides an edge in word embeddings and stacked word embeddings.

These can be implemented without much hassle thanks to its high-level API.

The Flair embedding is something to keep an eye on in the near future.

I love that the Flair library supports multiple languages.

The developers are also currently working on “Frame Detection” using Flair.

The future looks really bright for this library.

I personally enjoyed working with this library and learning its ins and outs.

I hope you found the tutorial useful and will be using Flair to your advantage next time you take up an NLP challenge.
