Four Mistakes You Make When Labeling Data

A checklist of things that can go wrong and how to fix them

Tal Perry · May 27

[Image: It’s better to anticipate and fix errors before they reach production]

Labeling data for NLP, like flying a plane, is something that looks easy at first glance but can go subtly wrong in strange and wonderful ways.

Knowing what can go wrong and why is a good first step toward detecting and fixing the errors.


At LightTag, we make text annotation tools and work closely with our customers to understand what is happening in their annotation efforts and how our product can help them get labeled data faster and with higher quality.

In this post we’ll share four common problems that come up in entity annotations for NLP, discuss their root cause and possible solutions.

White Space

[Image: White space is hard to see and can cause confusion]

Perhaps the most common source of annotator disagreement is inconsistent labeling of trailing and leading whitespace and punctuation.

That is, one annotator might label “Tal Perry” while another labels “Tal Perry “, “ Tal Perry”, or “ Tal Perry “.

This issue also appears with trailing punctuation, such as “Tal Perry.”

When measuring annotator agreement or deciding on a golden source of annotations, these conflicts lead to lower agreement scores and ambiguity in the golden set.

These errors are particularly frustrating because the annotation is conceptually correct, and a human wouldn’t really notice or care about the difference.
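To make the effect concrete, here is a minimal sketch of how such a conflict shows up in an agreement score. The helper `exact_match_agreement` is a hypothetical function (not from any particular library) that scores two annotators by exact span match; a single trailing-space difference drops agreement on that span to zero:

```python
def exact_match_agreement(spans_a, spans_b):
    """Fraction of spans two annotators agree on exactly.

    Each span is a (start, end, tag) tuple; a span counts as agreed
    only if offsets AND tag match character-for-character.
    """
    a, b = set(spans_a), set(spans_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Annotator A labels "Tal Perry" (offsets 0-9); annotator B labels
# "Tal Perry " with a trailing space (offsets 0-10). Conceptually the
# same annotation, but exact-match agreement is 0.0.
exact_match_agreement([(0, 9, "PERSON")], [(0, 10, "PERSON")])  # 0.0
```

Real agreement metrics (e.g. token-level or relaxed-overlap scoring) are more forgiving, but exact-match scoring of character spans is common, and that is where whitespace conflicts hurt the most.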

In fact, that subtlety is the root cause of these kinds of errors.

Typically, your annotators are not concerned with how your algorithms calculate agreement and won’t notice or care about the difference between “Tal Perry” and “Tal Perry “ unless explicitly told to do so.

In that regard, the solution is simple: your annotation tool should visually indicate to annotators when they have captured trailing or leading whitespace and let them decide whether that is correct according to the guidelines you have set.
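If conflicting spans have already been collected, a normalization pass can also repair them downstream. The sketch below (an illustrative helper, not any tool’s built-in API) trims leading whitespace and trailing whitespace/punctuation from a labeled span while keeping the character offsets consistent with the source text:

```python
def strip_span(text, start, end):
    """Trim leading whitespace and trailing whitespace/punctuation
    from a labeled span, returning adjusted character offsets."""
    while start < end and text[start] in " \t\n":
        start += 1  # drop leading whitespace
    while end > start and text[end - 1] in " \t\n.,;:":
        end -= 1    # drop trailing whitespace and punctuation
    return start, end

doc = "  Tal Perry. "
start, end = strip_span(doc, 0, len(doc))
doc[start:end]  # "Tal Perry"
```

Applying the same normalization to every annotator’s spans before scoring makes “Tal Perry” and “ Tal Perry. ” count as the same annotation.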

Nesting Annotations

[Image: A nest is great for complex things like life. For NLP you might want something else]

Another common source of disagreement is “nested annotations”.

For example, the phrase “The President Of the United States Donald Trump” could be labeled a number of different ways.

[Image: A naive annotation of the phrase says the whole thing is a person]

[Image: A more pedantic approach breaks it down into title and person]

[Image: The most pedantic annotation separates the title and country]

The cause of this kind of error is fundamental: language is hierarchical in nature, not linear, so linear annotations such as highlighted spans don’t always fit perfectly.

[Image: Annotating nested entities in Brat]

[Image: Annotating a tree relationship in LightTag]

From a UX perspective, a simple solution is to let the annotator create nested annotations, as in Brat, or annotate tree structures.

While these solutions work from a UX perspective, they require downstream models that can handle these complex, non-linear structures both in the input and output of the model.

Among our customer base, we haven’t seen mass adoption of structured annotations outside of the linguistic community.

This is mostly due to the additional model and engineering complexity required to work with them.

What we commonly see are annotation projects that guide their teams to annotate at the finest possible resolution and apply post-processing to capture the inherent structure at a later stage.

Your annotation tool should show you conflicts among annotators and let you resolve them.
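As a sketch of that “finest resolution” convention, the hypothetical helper below takes a set of possibly nested spans and keeps only the innermost ones, i.e. spans that contain no other span. Offsets refer to the example phrase “The President Of the United States Donald Trump”:

```python
def innermost(spans):
    """Keep only spans that contain no other span — the finest
    annotation resolution. Each span is a (start, end, tag) tuple."""
    def contains(outer, inner):
        return (outer[0] <= inner[0] and inner[1] <= outer[1]
                and (outer[0], outer[1]) != (inner[0], inner[1]))
    return [s for s in spans
            if not any(contains(s, o) for o in spans if o is not s)]

# Nested labeling of "The President Of the United States Donald Trump"
spans = [
    (0, 47, "PERSON"),    # the whole phrase
    (4, 34, "TITLE"),     # "President Of the United States"
    (21, 34, "COUNTRY"),  # "United States"
    (35, 47, "PERSON"),   # "Donald Trump"
]
innermost(spans)  # [(21, 34, "COUNTRY"), (35, 47, "PERSON")]
```

The coarser structure (the title, the full phrase) can then be reconstructed in post-processing rather than asked of the annotator directly.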

Adding New Entity Types Midway

[Image: Always take extra precautions when adding new things together]

In the early stages of an annotation project, you’ll often find you need entity types that were not anticipated.

For example, the set of tags for a pizza chatbot might start with “Size”, “Topping”, and “Drink” before someone realizes that you also need a “Side Dish” tag to capture garlic bread and chicken wings.

Simply adding these tags and continuing work on the documents that haven’t been labeled yet poses a danger to the project.

The new tags will be missing from all of the documents annotated before they were added. This means that your test set will be wrong for those tags, and your training data won’t contain the new tags, leading to a model that won’t capture them.

The pedantic solution is to start over and ensure that all tags are captured.

However, this is highly wasteful; starting over every time you need a new tag is a less-than-desirable use of resources.

A good middle ground is to start over, but use the existing annotations as “pre-annotations”, which are displayed to the annotator.

For example, LightTag’s text annotation tool lets you do exactly that, showing the annotator pre-annotations which they can accept with the click of a button.

From there they can focus on adding the new tags.
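As an illustration of the idea (not LightTag’s actual API), a pre-annotation pass might merge the existing gold spans with dictionary hits for the newly added tag, so the annotator only reviews suggestions instead of relabeling from scratch. The `preannotate` helper and the `SideDish` lexicon below are hypothetical:

```python
import re

def preannotate(text, existing_spans, lexicon, new_tag):
    """Combine prior gold annotations with simple dictionary matches
    for a newly added tag. Spans are (start, end, tag) tuples."""
    suggestions = list(existing_spans)
    for term in lexicon:
        for m in re.finditer(re.escape(term), text, re.IGNORECASE):
            suggestions.append((m.start(), m.end(), new_tag))
    return sorted(suggestions)

text = "One large pizza with garlic bread and cola"
gold = [(4, 9, "Size"), (38, 42, "Drink")]   # from the first pass
preannotate(text, gold, ["garlic bread", "chicken wings"], "SideDish")
# [(4, 9, 'Size'), (21, 33, 'SideDish'), (38, 42, 'Drink')]
```

The annotator then accepts or rejects each suggestion rather than re-reading every document for every tag.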

Long Lists of Tags

[Image: Too much choice can be dangerous]

One sure way to increase project costs and lower data quality is to force your annotators to work through very long lists of tags.

Famously, ImageNet has 20,000 distinct categories such as Strawberry, Hot Air Balloon and Dog.

In text, the SeeDev 2019 shared task defines “only” 16 entity types, shown here, but you can see how they quickly become overwhelming.

[Image: The collection of tags for the SeeDev 2019 shared task]

In an annotation process, increasing the number of choices the annotator needs to make slows them down and leads to poor data quality.

Of note, the distribution of annotations will be influenced by how the tags are ordered in the annotation UX.

This is due to availability bias: it is much easier for us to recognize concepts that are top of mind (available in our heads).

ImageNet, with its 20,000 categories, is an extreme example of this problem, and it is worth reading the paper on how the annotations were collected.

Their methodology consisted of breaking down the annotation task into smaller tasks, where in each subtask an annotator would annotate one instance of some class (and other workers would perform separate validation tasks).

This significantly reduces the cognitive load on an annotator, helping them work faster with fewer errors.
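The decomposition can be sketched in a few lines. Here, a hypothetical `binary_subtasks` helper turns one many-way labeling task over a large tag set into per-tag yes/no subtasks, so each annotator holds only a single concept in mind at a time:

```python
from itertools import product

def binary_subtasks(documents, tags):
    """Turn one many-way labeling task into per-tag yes/no subtasks,
    reducing the cognitive load on each annotator."""
    return [{"doc": doc, "question": f"Does this text mention a {tag}?"}
            for tag, doc in product(tags, documents)]

tasks = binary_subtasks(["doc1", "doc2"], ["Strawberry", "Dog"])
len(tasks)  # 4 — one yes/no question per (tag, document) pair
```

The trade-off is more total tasks (tags × documents), which is why this pattern is usually paired with cheap crowd work and separate validation passes, as in the ImageNet collection process described above.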

Conclusion

Data labeling needs to be done fast, at scale, and with high accuracy, without any one of those compromising the others.

The first step in creating a quality annotation pipeline is anticipating common problems and accommodating them.

This post showed four of the most common errors that come up in text annotation projects and how text annotation tools like LightTag can help solve them.

About The Author

Tal Perry is founder and CEO of LightTag, The Text Annotation Tool for Teams.

He’s also a Google Developer Expert in machine learning and, most importantly, father of David and partner to Maria.

[Image: Tal and David discussing labeled data for NLP]
