Data Requirements

ChatAible uses Aible Sense to prepare and analyse your data.

Currently Aible Sense supports structured, tabular data with regular rows and columns.

Row granularity

The first consideration for your data is the granularity of the data and what each row represents.

For best results, Aible Sense needs transaction-level data at the level of granularity you wish to analyse, such as one row per customer, one row per transaction or one row per employee.

Usually this is quite straightforward, but when you have a hierarchy of information you may need to adjust the row granularity so the analysis will meet your goals.
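One quick way to confirm what each row represents is to count how many rows share the same identifier. The sketch below is illustrative only and is not part of Aible Sense; the column names and sample values are made up for the example.

```python
import csv
import io
from collections import Counter

def repeated_keys(csv_text, key_column):
    """Return the key values that appear on more than one row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[key_column] for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)

# Hypothetical sample: intended as one row per customer.
sample = """customer_id,region,order_value
C001,North,120
C002,South,80
C001,North,45
"""

print(repeated_keys(sample, "customer_id"))  # ['C001']
```

Here customer C001 appears twice, so the data is really one row per order. That is fine if orders are what you wish to analyse, but not if you wish to analyse customers.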

It’s also worth noting that the machine learning models Aible uses assume that each row of the data is independent of the others. In other words, one row gives no information about another, except insofar as the two rows share common feature values.

You can load time series data, with metrics and values for a sequence of timestamps, and Aible will find seasonality patterns, but it may not be able to discover relationships that depend upon the precise sequence of the records.

Aggregated data - summarising results in a small table - is generally unsuitable for use in AI.

Outcome

The second consideration is the outcome column, also known as the objective or the dependent variable. Aible Sense analyses your data with respect to the outcome column you select.

The most common scenario is to provide a categorical column. In our sales example, below, the column could indicate whether the opportunity was won or lost. Currently, for categorical outcome columns with more than two distinct values, you specify which is the positive value and the other values are grouped together as the corresponding negative.
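To illustrate the grouping described above, here is a minimal sketch of how a multi-valued outcome collapses to positive versus everything else. It shows the idea only and is not Aible's implementation; the function name and the "Abandoned" value are assumptions made for the example.

```python
def to_binary_outcome(values, positive_value):
    """Map the chosen positive value to 1 and every other value to 0."""
    return [1 if value == positive_value else 0 for value in values]

# Hypothetical outcome column with three distinct values.
stages = ["Won", "Lost", "Abandoned", "Won"]

print(to_binary_outcome(stages, "Won"))  # [1, 0, 0, 1]
```

Note that "Lost" and "Abandoned" become indistinguishable once grouped, so choose the positive value with the question you want answered in mind.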

As well as categorical outcomes you can also use numeric or continuous outcome columns for regression analysis.

You can have more than one outcome field in the data but be careful if they are related to each other as this may affect the analysis - see the Understanding Data Readiness topic.

It’s perhaps stating the obvious, but you must have a mixture of values in the outcome column. For instance, in our sales opportunities example, if we only provide records representing successful sales, Aible Sense won’t be able to find patterns related to whether a sale was successful or not. Regardless of the other data, every sale is successful, so nothing appears to make a difference!

We need to provide examples of both the positive and negative case.

Similarly, if the outcome is a continuous value (for a regression analysis), the column can’t contain only one value. Aible Sense won’t let you select a single-valued column as the outcome.
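If you want to check a candidate outcome column before uploading, a sketch along these lines may help. It is a pre-upload check of our own devising, not a description of how Aible Sense performs its validation, and the sample values are hypothetical.

```python
from collections import Counter

def outcome_counts(values):
    """Count how often each distinct outcome value occurs."""
    return Counter(values)

def has_mixed_outcomes(values):
    """True when the column holds at least two distinct values."""
    return len(outcome_counts(values)) >= 2

only_wins = ["Won", "Won", "Won"]
mixed = ["Won", "Lost", "Won"]

print(has_mixed_outcomes(only_wins))       # False
print(has_mixed_outcomes(mixed))           # True
print(has_mixed_outcomes([250.0, 250.0]))  # False (single-valued numeric column)
print(outcome_counts(mixed))
```

The counts are also worth a glance: a column with both values present but only a handful of negative cases gives the analysis very little to learn from.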

Independent variables

In addition to the outcome column, you should provide several numeric or categorical independent variables representing attributes of the record.

These should include real-world attributes of the record and not just technical fields like IDs and timestamps.

Consider the case with only transaction IDs and the outcome. Are there likely to be patterns in the ID values that could predict whether the sale will be successful? It’s possible but unlikely.

Much more useful are numeric and categorical attributes and dimensions that describe the nature of the transaction.
Dates can be useful too - sometimes patterns emerge in the day of the week or the month in which a particular activity occurs (e.g., are customers more likely to return an item in January? Are people more likely to buy insurance on a Monday?).
Relative dates - expressed as elapsed time since (or prior to) another event - can also be very useful.
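As an example of the date-derived attributes mentioned above, the following sketch computes the day of the week, the month and the elapsed days since another event. The function and field names are illustrative assumptions; you would add columns like these to your data yourself before loading it.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def date_features(event_date, reference_date):
    """Derive calendar and relative-date attributes for one record."""
    return {
        "day_of_week": DAY_NAMES[event_date.weekday()],
        "month": event_date.month,
        # Positive when the event falls after the reference date.
        "days_since_reference": (event_date - reference_date).days,
    }

# Hypothetical record: an item purchased and later returned.
purchased = date(2024, 1, 15)
returned = date(2024, 1, 29)

print(date_features(returned, purchased))
# {'day_of_week': 'Monday', 'month': 1, 'days_since_reference': 14}
```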

With fewer than 10 descriptive columns, Aible may not be able to find many patterns in the data to explain your target outcome.

Aible can handle many hundreds of columns, but we recommend you limit your data to 50 or 60 columns, at least to begin with.
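A simple way to see where a dataset sits against these guidelines is to count the columns that remain once the outcome and purely technical fields are set aside. This is a sketch under our own assumptions: the column names are hypothetical, and deciding which fields count as technical is a judgment you make about your own data.

```python
def descriptive_columns(header, outcome, technical_fields=()):
    """Return the columns left after removing the outcome and technical fields."""
    excluded = {outcome, *technical_fields}
    return [name for name in header if name not in excluded]

# Hypothetical header row from a sales opportunities file.
header = ["opportunity_id", "last_modified_timestamp",
          "region", "deal_size", "industry", "won"]

usable = descriptive_columns(
    header,
    outcome="won",
    technical_fields=["opportunity_id", "last_modified_timestamp"],
)

print(len(usable), usable)  # 3 ['region', 'deal_size', 'industry']
if len(usable) < 10:
    print("Fewer than 10 descriptive columns: consider adding more attributes.")
```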

Row count

The next consideration is how much data to provide.

Aible works best with datasets of at least 10,000 records. You can certainly analyse smaller datasets with 1,000 rows or fewer, but the insights will have less statistical support.

At the other end of the scale, you can load large datasets with millions of records. In this case we recommend you start with a smaller sample to rapidly assess the Data Readiness (see the Understanding Data Readiness topic) before moving on to the full dataset.
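One way to draw a smaller sample from a large CSV file without reading it all into memory is reservoir sampling, sketched below. This is our own suggestion rather than an Aible feature; the function name, sample size and seed are assumptions made for the example.

```python
import csv
import io
import random

def sample_rows(lines, sample_size, seed=0):
    """Return the header and a uniform random sample of the data rows.

    Uses reservoir sampling, so the input is read once and only
    sample_size rows are held in memory at any time.
    """
    rng = random.Random(seed)
    reader = csv.reader(lines)
    header = next(reader)
    reservoir = []
    for index, row in enumerate(reader):
        if index < sample_size:
            reservoir.append(row)
        else:
            # Keep the new row with probability sample_size / (index + 1).
            slot = rng.randint(0, index)
            if slot < sample_size:
                reservoir[slot] = row
    return header, reservoir

# Hypothetical stand-in for a large file: 1,000 data rows.
big_file = io.StringIO("id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(1000)))

header, rows = sample_rows(big_file, sample_size=100)
print(header, len(rows))  # ['id', 'value'] 100
```

With a real file you would pass an open file object instead of the `io.StringIO` stand-in, and write the header and sampled rows back out with `csv.writer`.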

File format

Your uploaded file must be in CSV or gzipped CSV format, and the first row must contain a unique heading label for every column.
CSV files with duplicate column headings, missing column headings or mismatched field counts in the rows may be rejected by Aible Sense.
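The three problems listed above can be checked locally before you upload. The following sketch uses Python's standard `csv` module; it is a pre-upload check of our own and may not match exactly the validation that Aible Sense applies. The function name and sample data are hypothetical.

```python
import csv
import io

def check_csv(lines):
    """Return a list of problems found in the headings and field counts."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]

    problems = []
    labels = [label.strip() for label in header]
    named = [label for label in labels if label]
    if len(named) != len(labels):
        problems.append("missing column heading")
    if len(set(named)) != len(named):
        problems.append("duplicate column heading")

    # Record 1 is the heading row, so data records start at 2.
    for record_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(
                f"record {record_number}: expected {len(header)} fields, found {len(row)}"
            )
    return problems

good = "id,region,won\n1,North,yes\n2,South,no\n"
bad = "id,region,region\n1,North\n"

print(check_csv(io.StringIO(good)))  # []
print(check_csv(io.StringIO(bad)))
# ['duplicate column heading', 'record 2: expected 3 fields, found 2']
```

For a gzipped file, `gzip.open(path, "rt", newline="")` from the standard library yields text lines that can be passed to the same function.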

Data is limited to 500,000 rows during the Beta release.