The Qualitative Data Scientist

Structured data refers to any data that resides in a fixed field within a record or file.

This includes data contained in relational databases and spreadsheets.

Google’s search engine, for example, prefers your data to be structured in a specific way.

Unstructured data is information that either does not have a pre-defined data model or is not organized in a pre-defined manner.

Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well.

Semi-structured data is a form of structured data that does not obey the formal structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data.
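As a rough illustration of the three definitions above, here is a minimal Python sketch. The survey fields, the respondent records and the field note are all invented for the example; they are assumptions, not data from any real study.

```python
import csv
import io
import json

# Structured: every record has the same fixed fields (hypothetical survey table).
structured_csv = "respondent_id,age,city\n1,34,Oslo\n2,51,Bergen\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: keys act as tags, and records may nest or omit fields.
semi_structured_json = """
[
  {"respondent_id": 1, "answers": {"mood": "curious", "tools": ["R", "Python"]}},
  {"respondent_id": 2, "answers": {"mood": "sceptical"}}
]
"""
records = json.loads(semi_structured_json)

# Unstructured: free text with no predefined data model (hypothetical field note).
unstructured_text = "I met the second respondent in Bergen on 3 May; she was sceptical at first."

# The table has one fixed set of fields; the JSON records differ from each other.
field_names = sorted(rows[0].keys())
keys_per_record = [sorted(record["answers"].keys()) for record in records]

print(field_names)
print(keys_per_record)
print(len(unstructured_text.split()), "words of free text")
```

The table can be queried by field name straight away, the JSON needs a check for which keys each record actually has, and the field note needs interpretation before anything can be counted at all.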

An easy and simplified way to conceive of this is that structured data works in the specific context of its intended use, for example big or small data sets with defined data types.

Usage may of course have unintended consequences.

A data type, or simply type, in computer science and computer programming is an attribute of data which tells the compiler or interpreter how the programmer intends to use the data.
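A small sketch of what a type "tells" the interpreter, using Python as an assumed example language. The variable names are invented for illustration. Python attaches the type to the value and checks it at run time, so other languages will behave differently in the details.

```python
# The same characters, "28", held as two different types.
as_text = "28"
as_number = 28

# For str, + means concatenation; for int, + means arithmetic addition.
text_result = as_text + as_text
number_result = as_number + as_number

print(type(as_text).__name__, text_result)
print(type(as_number).__name__, number_result)

# Mixing the two without saying how to convert is rejected by the interpreter.
try:
    as_text + as_number
except TypeError as error:
    print("TypeError:", error)
```

The type decides which operation the plus sign stands for, which is one concrete sense in which it records how the programmer intends the data to be used.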

In computer science a compiler is a program that converts instructions into a machine-code or lower-level form so that they can be read and executed by a computer.

An interpreter is a program that directly executes instructions written in a scripting or programming language without requiring them to have been compiled into a machine language program beforehand. (Machine code is a strictly numerical language, ultimately 1s and 0s.)

Data science is a "concept to unify statistics, data analysis, machine learning and their related methods" in order to "understand and analyze actual phenomena" with data.

The quality of the qualitative and definitions of qualitative

Now that we have a baseline for a discussion, let us look at quality and qualitative.

Quality is the standard of something as measured against other things of a similar kind; the degree of excellence of something.

There are likely other definitions as well.

Qualitative data in statistics is also known as categorical data.

This is data that approximates or characterises but does not measure the attributes, characteristics, properties, etc., of a thing or phenomenon.

Qualitative data describes whereas quantitative data defines.

Qualitative data is distinguished by attributes that are not numeric and are used to categorise groups of objects according to shared features.
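To make the statistical sense of qualitative (categorical) data a little more concrete, here is a minimal Python sketch. The theme labels are hypothetical interview codes made up for this example.

```python
from collections import Counter

# Hypothetical interview codes: categorical labels, not measurements.
interview_themes = ["trust", "privacy", "trust", "convenience", "privacy", "trust"]

# Counting how many items fall in each category is a meaningful operation.
theme_counts = Counter(interview_themes)

# Integer codes are storage labels only; their order and arithmetic carry no meaning.
categories = sorted(set(interview_themes))
code_for = {theme: index for index, theme in enumerate(categories)}
encoded = [code_for[theme] for theme in interview_themes]

print(theme_counts.most_common(1))
print(code_for)
print(encoded)
```

Note that the categories already pick up numbers as soon as they are stored, yet averaging those codes would say nothing about trust or privacy. The numbers group the data; they do not measure it.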

Qualitative research is a scientific method of observation to gather non-numerical data.[1]

This type of research "refers to the meanings, concepts definitions, characteristics, metaphors, symbols, and description of things" and not to their "counts or measures".

This methodology is often associated with the social sciences, perhaps particularly sociology and anthropology, although it is also used in the natural sciences and other fields.

As such, you can see that the notions of qualitative data and qualitative research share commonalities, yet diverge.

As an example, an interview or a picture thought of as qualitative research data, i.e. non-numerical data, will likely be represented numerically in a computer system.

Digital words, sounds and photographs will be read by a central processing unit (CPU) as numbers.

This raises some immediate thoughts or concerns:

Certainly qualitative research data is fluid. It may very well turn quantitative when processed in this digital manner. Is this a transfer from methods or description?

Perhaps the qualitative distinction as non-numerical is wrong to some extent given the current developments?

Even an image tag consists of numerical data. How far would you have to go to keep your data non-numerical?

It is questionable whether this pursuit is conducive to collaboration.
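The claim that digital words, and even an image tag, end up as numbers can be sketched in a few lines of Python. The quote and the tag below are invented for the example, and UTF-8 is an assumed encoding; other encodings would give different numbers.

```python
# A hypothetical interview quote and a hypothetical image tag.
quote = "I felt heard."
image_tag = '<img src="portrait.jpg" alt="Portrait">'

# Unicode code points: one integer per character.
code_points = [ord(character) for character in quote]

# UTF-8 bytes: the integers (0 to 255) that are actually stored or transmitted.
quote_bytes = list(quote.encode("utf-8"))
tag_bytes = list(image_tag.encode("utf-8"))

# The first byte written out as the 1s and 0s the hardware works with.
first_byte_bits = format(quote_bytes[0], "08b")

print(code_points[:4])
print(tag_bytes[:4])
print(first_byte_bits)

# The mapping is reversible: the numbers decode back to the original words.
decoded_quote = bytes(quote_bytes).decode("utf-8")
print(decoded_quote == quote)
```

The numbers here are an encoding of the words rather than a measurement of anything the respondent said, which is perhaps where the line between "numerical" and "quantitative" starts to matter.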

This distinction between qualitative and quantitative, whether in research or data, is an oppositional pair that creates discussions that could be fruitful for developments in data science.

The Benefits of Qualitative Methods in Data Science

Last year, in April 2018, Robyn Rap and Vicky Zhang wrote a piece on Medium titled "Qualitative + Quantitative: How Qualitative Methods Support Better Data Science".

They start by describing the potential embarrassment in machine learning projects where you did not think of the obvious and important features.

Thus they argue that data scientists without qualitative research can make assumptions about user behaviour that could lead to:

Neglecting critical parameters

Missing a vital opportunity to empathize with those using our products, or

Misinterpreting data

In this context they posted a cartoon that emphasised their point.

On that note, I would recommend you check out cmx.io, a way to create your own xkcd-style comics using HTML markup.

Cartoon created by Indeed UX Research Manager Dave Yeats using cmx.io

Another example, which I have written about earlier in my article Social Scientists and AI – For Safety, Society and Science, is from OpenAI.

Their journal article in Distill called AI Safety Needs Social Scientists, published on the 19th of February, discusses how a specific qualitative method could be used.

Their proposed solution is to replace machine learning with people, at least until ML systems can participate in the complexity of debates we are interested in.

The idea is to try with human participants first, before considering replacing them with ML.

An example debate with two human debaters and a human judge.

Only the debaters can see the image.

Red is arguing that the image is a dog, Blue is arguing for cat.

Image and caption are taken from the aforementioned article published in Distill.

Of course qualitative data is far more than categorical data.

Qualitative research and quantitative research each have massive tomes written about them; indeed, methodology in science is highly respected.

In saying The Qualitative Data Scientist I do not claim there is such a thing, yet the discussion is what interests me.

It is not for me to say with certainty what is right or wrong in this respect, simply to think relatively freely about these concepts.

I am diving into the deep waters of data science without being able to swim, gasping for air, while seemingly critiquing the professional swimmers.

It is not my intention to be rude.

Rather in this ‘ocean’ of data, we have to check the water quality or examine our relationship to these activities in the first place.

I certainly do enjoy sprawling in this knee-deep water, seeing the techniques of those out there and watching with great trepidation what the future may hold!

This is day 28 of #500daysofAI, I sincerely hope you enjoyed it.

What is #500daysofAI?

I am challenging myself to write and think about the topic of artificial intelligence for the next 500 days with the #500daysofAI.

It is a challenge I invented to keep myself thinking of this topic and share my thoughts.

This is inspired by the film 500 Days of Summer where the main character tries to figure out where a love affair went sour, and in doing so, rediscovers his true passions in life.

