Next: The Logic of Phenomenal Up: PHENOMENAL DATA MINING: FROM Previous: Mail Order Bookstores

Proposed Experiments

 

Grouping supermarket purchases by customer, as proposed in Section 3, can be tested with the aid of a supermarket database that does contain customer identification. We discard the customer identification, run our grouping algorithm, and compare the results with the genuine customer data.
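
One way the comparison step might be scored is sketched below. It is only an illustration: the function name `pairwise_scores` and the choice of pairwise precision and recall are assumptions, not something the experiment prescribes. The idea is that the labels produced by the grouping algorithm need not match the real customer numbers; what matters is whether two baskets that really belong to the same customer were put in the same group.

```python
from itertools import combinations


def pairwise_scores(true_ids, predicted_ids):
    """Compare a predicted grouping of baskets with the genuine customer IDs.

    Both arguments are sequences indexed by basket. Labels are arbitrary;
    only the question "are these two baskets grouped together?" is compared.
    Returns (precision, recall) over all pairs of baskets.
    """
    if len(true_ids) != len(predicted_ids):
        raise ValueError("one label per basket is required in both lists")

    same_true = same_pred = same_both = 0
    for i, j in combinations(range(len(true_ids)), 2):
        t = true_ids[i] == true_ids[j]            # really the same customer
        p = predicted_ids[i] == predicted_ids[j]  # grouped together by us
        same_true += t
        same_pred += p
        same_both += t and p

    # With no pairs in the denominator there is nothing to get wrong.
    precision = same_both / same_pred if same_pred else 1.0
    recall = same_both / same_true if same_true else 1.0
    return precision, recall


# Four baskets, two real customers; the grouping merges one basket wrongly.
precision, recall = pairwise_scores(["a", "a", "b", "b"], [1, 1, 1, 2])
```

A pairwise measure is used here because imperfect identification of purchasers is expected; it degrades gradually rather than counting a whole customer as wrong when a single basket is misassigned.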

My present opinion is that grouping baskets by customers is likely to be a difficult but feasible task. As will be seen, it will involve taking advantage of special features of the behavior of supermarket customers. In this respect, it may resemble cryptanalysis, which often takes advantage of special features of the behavior of senders of messages. Moreover, the results cannot be perfect in terms of identifying the purchasers, but the uncertainties about who bought what may not affect the interesting statistics of customer behavior.

Here are some ideas about how to proceed.

  1. It may be best to start the experiments with a relatively small store. That way there will be fewer assignments to try and fewer similar signatures.
  2. Very likely we should start with a date in the middle of the operation of the system and try to extend identifications both forward and backward in time.
  3. At any time in the computation, there will be a certain collection of putative customers and a set of possible assignments of some of the baskets to customers. Maybe the computational resources will be adequate to deal with hundreds to thousands of possible assignments. Each of these assignments will have an anomaly computed on the basis of what has been assigned so far.
  4. Since many people shop on a weekly basis, it may be worthwhile to try to find some putative customers who buy on a particular day of the week.
  5. It may be possible to find some signatures for some customers that are repeated every week. For example, a shopper may buy both whole milk and skim milk every time, because of the needs of different family members.
  6. The algorithm may grow assignments forward and backward in time. As it goes, it will eliminate certain assignments.
  7. When it cannot decide among the assignments over some lengthy period, say two months, it should probably just pick one in order to keep down the number of open choices.
  8. Perhaps there will be a compact way of keeping certain choices open in order to use long term aspects of the signature.
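
The ideas above can be sketched as a small beam search. Everything specific in it is an assumption made for illustration: the anomaly measure (a penalty for visits that are not a whole number of weeks apart, plus the Jaccard distance between successive baskets), the constant `NEW_CUSTOMER_COST`, and the names `step_cost` and `group_baskets`. It also grows assignments forward in time only, and its way of limiting open choices is simply to keep the `beam_width` least anomalous assignments after each basket.

```python
NEW_CUSTOMER_COST = 1.0  # assumed penalty for positing a new putative customer


def jaccard_distance(a, b):
    """1 minus the fraction of items shared by two baskets (0.0 if both are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def step_cost(prev, basket):
    """Toy anomaly of giving `basket` to the customer whose last basket was `prev`."""
    (prev_day, prev_items), (day, items) = prev, basket
    off_week = 0.0 if (day - prev_day) % 7 == 0 else 0.5  # weekly shoppers are cheap
    return off_week + jaccard_distance(prev_items, items)


def group_baskets(baskets, beam_width=100):
    """Assign baskets to putative customers.

    baskets: list of (day_number, frozenset_of_items), sorted by day.
    Returns (anomaly, assignment), where assignment[i] is the index of the
    putative customer given basket i in the least anomalous assignment kept.
    """
    # Each open assignment: (anomaly so far, customer index per basket,
    #                        most recent basket of each putative customer)
    beam = [(0.0, (), ())]
    for basket in baskets:
        candidates = []
        for anomaly, assignment, last in beam:
            # Option 1: extend an existing putative customer.
            for c, prev in enumerate(last):
                candidates.append((anomaly + step_cost(prev, basket),
                                   assignment + (c,),
                                   last[:c] + (basket,) + last[c + 1:]))
            # Option 2: posit a new customer.
            candidates.append((anomaly + NEW_CUSTOMER_COST,
                               assignment + (len(last),),
                               last + (basket,)))
        # Eliminate all but the least anomalous assignments.
        candidates.sort(key=lambda h: h[0])
        beam = candidates[:beam_width]
    best_anomaly, best_assignment, _ = beam[0]
    return best_anomaly, list(best_assignment)


# Two weekly shoppers with recognizable signatures, one day apart.
baskets = [
    (0, frozenset({"whole milk", "skim milk", "bread"})),
    (1, frozenset({"beer", "chips"})),
    (7, frozenset({"whole milk", "skim milk", "eggs"})),
    (8, frozenset({"beer", "chips", "salsa"})),
]
anomaly, assignment = group_baskets(baskets, beam_width=10)
```

On this toy data the milk baskets end up with one putative customer and the beer baskets with another. A real experiment would need a far better anomaly measure and some way to extend identifications backward in time as well.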



John McCarthy
Wed Feb 23 17:08:25 PST 2000