ChatAible uses Aible Sense to analyse your data and assess whether it is ready for querying
The primary output from Aible Sense is the data readiness score.
This score measures how effectively the patterns and features in the data can be used to model the chosen outcome field, and we use it as an indication of whether the data is ready for further analysis with ChatAible.
It is presented as a value between 0 and 10.
A mid-range score is good, indicating that Sense has trained a model that can use information in the other columns to explain the outcome column with reasonable confidence.
What does a low score mean?
A very low score around or below 1 indicates that there is little or no information in the data that Sense can use to explain the selected outcome.
A low score is typically an indication that we need to add to the data:
- we can add more columns: perhaps there are other fields we can add that may help explain the outcome
- or we can add more rows: if the record count is too low, there may not be enough examples for Aible to find any patterns
Note that a low score is not a terminal problem: we can proceed and query the data with ChatAible as it is, but we should be mindful that the insights it finds may not have much statistical support.
What if the Data Readiness is too high?
A very high score of 9 or above is typically too good to be true: there is information in another field that explains the outcome too accurately. This is often caused by a column whose values only become known at the same time as, or after, the outcome itself.
To diagnose a high data readiness score we can click on the data readiness panel to expand it:
The expanded panel has three additional sections: problematic predictors, useful variables, and recommended derived fields.
The problematic predictors section highlights fields that are highly correlated with the outcome, so we can apply our judgement as to whether they are appropriate to include.
Highly correlated fields are not necessarily a problem; it is quite common to have correlated information in a dataset. They become a problem when a correlated field holds information that the AI can use to cheat.
For instance, in this example, we can see that StageName is very highly correlated with our IsWon outcome:
In fact, StageName is the same information encoded differently. An AI model could simply look up the StageName value to predict the IsWon outcome perfectly, and since StageName only becomes known at the same time that IsWon becomes known, it is of no help in understanding the drivers of a successful sale in descriptive analysis with ChatAible.
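This kind of leakage is easy to see in a small sketch. The data below is hypothetical (real StageName values will differ), but it shows how a column that merely re-encodes the outcome lets a model "predict" perfectly without learning anything useful:

```python
import pandas as pd

# Hypothetical opportunity records: StageName encodes the same fact as IsWon.
df = pd.DataFrame({
    "StageName": ["Closed Won", "Closed Lost", "Closed Won", "Closed Lost"],
    "IsWon":     [True,         False,         True,         False],
})

# A model -- or even a simple dictionary lookup -- can recover IsWon
# perfectly from StageName alone.
lookup = df.groupby("StageName")["IsWon"].first().to_dict()
predictions = df["StageName"].map(lookup)
accuracy = (predictions == df["IsWon"]).mean()
print(accuracy)  # 1.0: a perfect score, but no real insight into the drivers
```

Perfect accuracy from a lookup table is exactly the "too good to be true" signal the readiness score is flagging.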
We can easily remove the column and rerun the analysis: check the box next to the field to exclude it, then select Apply and Refresh Analysis:
This doesn’t alter the original dataset card; it creates a new one. We can always go back to the original dataset card if we want to work on the untransformed data.
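Conceptually, excluding a field this way is like dropping the column on a copy of the data before modelling, leaving the original intact. A pandas sketch with hypothetical field names:

```python
import pandas as pd

# Hypothetical dataset containing a leaky column.
df = pd.DataFrame({
    "Amount":    [1000, 5000, 250],
    "StageName": ["Closed Won", "Closed Lost", "Closed Won"],
    "IsWon":     [True, False, True],
})

# drop() returns a new frame without StageName; the original df is untouched,
# much like Aible Sense creating a new dataset card rather than editing the old one.
model_df = df.drop(columns=["StageName"])
print(list(model_df.columns))  # ['Amount', 'IsWon']
print("StageName" in df.columns)  # True: the original still has the column
```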
Problematic predictors are not always as obvious as this. More often, a field implies the outcome simply by whether or not it contains a value.
For example, we could have a product delivery date column that is only ever populated for our closed-won opportunities, and not even for all of them, since it remains blank when no delivery was required. This would be a problematic predictor: the delivery date is only ever populated after the sale outcome is known, so it is of no use in predicting whether a deal will be won. An AI might not be able to use it to determine the outcome absolutely, but it is certainly a shortcut it should not be allowed to use.
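This presence-implies-outcome pattern can be sketched as follows (the DeliveryDate column and its values are hypothetical):

```python
import pandas as pd

# Hypothetical opportunities: DeliveryDate is filled in only after a win
# (and not even for every win), and never for a loss.
df = pd.DataFrame({
    "IsWon":        [True, True, False, False],
    "DeliveryDate": ["2024-03-01", None, None, None],
})

# Merely knowing that a delivery date exists guarantees the deal was won --
# a shortcut a model should not be allowed to take.
has_date = df["DeliveryDate"].notna()
print(df.loc[has_date, "IsWon"].all())  # True: presence alone leaks the outcome
```

Note the leak is one-directional: a blank DeliveryDate tells the model nothing, which is why this kind of predictor can look harmless at first glance.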
We can continue to review and adjust the data in Aible Sense or proceed with augmented analysis using ChatAible.