
PHENOMENAL DATA MINING: FROM DATA TO PHENOMENA

John McCarthy
Computer Science Department
Stanford University
Stanford, CA 94305
jmc@cs.stanford.edu
http://www-formal.stanford.edu/jmc/


Abstract:

Phenomenal data mining finds relations between the data and the phenomena that give rise to data rather than just relations among the data.

For example, suppose supermarket cash register data does not identify cash customers. Nevertheless, there really are customers, and these customers are characterized by sex, age, ethnicity, tastes, income, and sensitivity to price changes. A data mining program might be able to identify which baskets of purchases are likely to have been made by the same customers. In this example, the receipts are the data, and the customers are phenomena not directly represented in the data. Once the "baskets" of purchases are grouped by customer, the way is open to infer further phenomena about the customers, such as their sex and age.
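As a rough illustration of the grouping step, the sketch below assigns baskets to hypothesised customers by item overlap. It is only one possible heuristic, not the method of this article: the Jaccard similarity measure, the greedy assignment, the `threshold` value, and the sample baskets are all assumptions made for the example.

```python
def jaccard(a, b):
    """Overlap between two item sets: |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_baskets(baskets, threshold=0.5):
    """Greedily group baskets that resemble a group's first basket.

    Returns a list of groups; each group is a list of basket indices
    hypothesised (not known) to come from the same customer.
    The threshold is an arbitrary illustrative choice.
    """
    groups = []
    for i, basket in enumerate(baskets):
        for group in groups:
            # Compare against the first basket placed in the group.
            if jaccard(basket, baskets[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            # No sufficiently similar group: hypothesise a new customer.
            groups.append([i])
    return groups


# Made-up receipts; real data would also carry times, prices, and so on.
baskets = [
    {"milk", "bread", "cat food"},
    {"beer", "chips", "salsa"},
    {"milk", "bread", "cat food", "eggs"},
    {"beer", "chips"},
]
groups = group_baskets(baskets)
print(groups)
```

Item overlap alone is a weak signal; a serious program would also need facts relating customers to baskets, such as shopping times and how often particular goods are repurchased.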

This article concerns what programs can infer about phenomena from data and what facts are relevant to making such inferences. We work mainly with the supermarket example, but the idea is general.

To infer phenomena from data, facts about how the phenomena and the data are related must be supplied. Sometimes these facts can be implicit in the programs that look for the phenomena, but more generality is achieved if the facts are represented as sentences of logic in a knowledge base used by the programs.

The result of phenomenal data mining might include an extended database, with additional fields on existing relations as well as entirely new relations. Thus the relations describing supermarket baskets might be extended with a customer field, and new relations about customers and their properties might be introduced.
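The kind of database extension described here can be sketched with Python's standard `sqlite3` module. The table names, column names, and the `inferred` mapping from baskets to customers are hypothetical, chosen only to illustrate the idea; the article does not prescribe a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Existing relation: one row per item on a receipt (made-up data).
cur.execute("CREATE TABLE basket (basket_id INTEGER, item TEXT)")
cur.executemany(
    "INSERT INTO basket VALUES (?, ?)",
    [(1, "milk"), (1, "bread"), (2, "beer"), (3, "milk")],
)

# Extend the existing relation with a customer field.
cur.execute("ALTER TABLE basket ADD COLUMN customer_id INTEGER")

# Introduce a new relation about customers and their properties.
cur.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, inferred_age_band TEXT)"
)

# Hypothetical output of an inference step: basket_id -> customer_id.
inferred = {1: 100, 2: 101, 3: 100}
cur.executemany(
    "UPDATE basket SET customer_id = ? WHERE basket_id = ?",
    [(customer, basket) for basket, customer in inferred.items()],
)
# Properties not yet inferred are left as NULL.
cur.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(100, None), (101, None)],
)

# Baskets per hypothesised customer.
rows = cur.execute(
    "SELECT customer_id, COUNT(DISTINCT basket_id) "
    "FROM basket GROUP BY customer_id ORDER BY customer_id"
).fetchall()
print(rows)
```

Leaving the inferred properties nullable reflects that the customer relation is filled in gradually as further phenomena are inferred.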

John McCarthy
Thu Apr 6 16:23:28 PDT 2000