Patent 2403249 Summary

Third-party information liability

Some of the information on this Web page has been provided by external sources. The Government of Canada is not responsible for the accuracy, reliability or currency of the information supplied by external sources. Users wishing to rely upon this information should consult directly with the source of the information. Content provided by external sources is not subject to official languages, privacy and accessibility requirements.

Claims and Abstract availability

Any discrepancies in the text and image of the Claims and Abstract are due to differing posting times. The text of the Claims and Abstract is posted:

  • At the time the application is open to public inspection;
  • At the time of issue of the patent (grant).
(12) Patent Application: (11) CA 2403249
(54) English Title: GRADIENT CRITERION METHOD FOR NEURAL NETWORKS AND APPLICATION TO TARGETED MARKETING
(54) French Title: METHODE DU GRADIENT POUR RESEAUX NEURONAUX, ET APPLICATION DANS LE CADRE D'UN MARKETING CIBLE
Status: Dead
Bibliographic Data
(51) International Patent Classification (IPC):
  • G06N 3/08 (2006.01)
  • G06Q 30/00 (2006.01)
(72) Inventors :
  • GALPERIN, YURI (United States of America)
  • FISHMAN, VLADIMIR (United States of America)
(73) Owners :
  • GALPERIN, YURI (Not Available)
  • FISHMAN, VLADIMIR (Not Available)
(71) Applicants :
  • MARKETSWITCH CORPORATION (United States of America)
(74) Agent: FETHERSTONHAUGH & CO.
(74) Associate agent:
(45) Issued:
(86) PCT Filing Date: 2000-03-15
(87) Open to Public Inspection: 2000-09-21
Availability of licence: N/A
(25) Language of filing: English

Patent Cooperation Treaty (PCT): Yes
(86) PCT Filing Number: PCT/US2000/006735
(87) International Publication Number: WO2000/055790
(85) National Entry: 2002-09-13

(30) Application Priority Data:
Application No. Country/Territory Date
60/124,217 United States of America 1999-03-15

Abstracts

English Abstract




The present invention is drawn to a unique application of the Maximum Likelihood statistical method to commercial neural network technologies. The present invention utilizes the specific nature of the output in target marketing problems and makes it possible to produce more accurate and predictive results by minimizing a gradient criterion to produce model weights to get the maximum likelihood result. It is best used on "noisy" data and when one is interested in determining a distribution's overall accuracy, or best general description of reality.


French Abstract

La présente invention concerne une application unique de la méthode statistique du maximum de vraisemblance aux techniques des réseaux neuronaux commerciaux. La présente invention utilise la nature spécifique du résultat de problèmes de marketing ciblé, et permet la production de résultats prévisionnels plus précis par une minimisation d'un gradient visant à produire des pondérations de modèles permettant d'obtenir le résultat assorti du maximum de vraisemblance. Ce procédé s'utilise, de préférence, pour les données bruitées et lorsque l'on cherche à déterminer la précision générale d'une distribution, ou la meilleure description générale de la réalité.

Claims

Note: Claims are shown in the official language in which they were submitted.



CLAIMS

7. A system for training neural networks with a maximum likelihood utility function, comprising:

a central application server;

a modeling database connected to said central application server;

at least one workstation networked to said central application server;

at least one multithreaded calculation engine networked to said central application server; and

software instructions on said central application server, at least one workstation and at least one multithreaded calculation engine so as to provide for:

said at least one workstation to select an initial model function for a propensity score g(X,W), where W is a set of weights of the neural network and X is a vector of customer attributes from a modeling database; and
said at least one multithreaded calculation engine to
calculate propensity scores for the customers in the modeling database;

calculate a training error Err, where

$$\mathrm{Err} = -\ln L = \sum_{i=1}^{N} \ln(1+g_i) - \sum_{i \in \text{resp}} \ln(g_i) - \sum_{i \in \text{non-resp}} \ln(1-g_i)$$

measure the error to check for convergence below a desired value;
obtain a new model and apply it to new data when convergence occurs;
minimize the error to solve for new weights W by minimizing the gradient criterion defined by the formula:

$$\mathrm{Err}'_W = \sum_{i=1}^{N} \frac{g'_i}{1+g_i} - \sum_{i \in \text{resp}} \frac{g'_i}{g_i} + \sum_{i \in \text{non-resp}} \frac{g'_i}{1-g_i}$$



begin a new iteration of the process by calculating new propensity scores
for the customers in the modeling database.

8. The system for training neural networks with a maximum likelihood utility
function of claim 7, further comprising:

a customer database connected to said central application server and said
at least one multithreaded calculation engine; and

software instructions to apply the new model to customer data from said
customer database upon being selected by said at least one workstation.

9. The system for training neural networks with a maximum likelihood utility
function of claim 7, further comprising:

software instructions on said at least one multithreaded calculation engine
to:

define f as a normalized propensity score related to g(X,W) by the formula:

$$g(X,W) = f^{1/\tau}(X,W)$$

where f is the output of the neural network; and

choose the parameter τ in such a way that f may be of the order of 0.5;

wherein R is an average response rate in the sample and the above condition is satisfied if:

$$\tau = 1 \,/\, \ln\frac{1-R}{R}$$

wherein:

$$\mathrm{Err} = -\ln L = \sum_{i=1}^{N} \ln\left(1+f_i^{1/\tau}\right) - \frac{1}{\tau}\sum_{i \in \text{resp}} \ln(f_i) - \sum_{i \in \text{non-resp}} \ln\left(1-f_i^{1/\tau}\right)$$

and the gradient criterion is computed as follows:

$$\mathrm{Err}'_W = \frac{1}{\tau}\left[\sum_{i=1}^{N} \frac{f_i^{1/\tau-1} f'_i}{1+f_i^{1/\tau}} - \sum_{i \in \text{resp}} \frac{f'_i}{f_i} + \sum_{i \in \text{non-resp}} \frac{f_i^{1/\tau-1} f'_i}{1-f_i^{1/\tau}}\right]$$

10. The system for training neural networks with a maximum likelihood utility
function of claim 7, further comprising:

a customer database connected to said central application server and said
at least one multithreaded calculation engine; and

software instructions to apply the new model to a top 20% of a targeted marketing sample customer pool selected from said customer database by said at least one workstation.

11. The system for training neural networks with a maximum likelihood utility
function of claim 9, further comprising:

a customer database connected to said central application server and said
at least one multithreaded calculation engine; and

software instructions to apply the new model to a top 20% of a targeted marketing sample customer pool selected from said customer database by said at least one workstation.



Claims 7-12 added to define apparatus of invention.

All the remaining claims are unchanged.


Description

Note: Descriptions are shown in the official language in which they were submitted.



TITLE OF THE INVENTION: Gradient Criterion Method for Neural Networks and Application to Targeted Marketing
FIELD OF THE INVENTION:
This invention relates generally to the development of neural network models to optimize the effects of targeted marketing programs. More specifically, this invention is an improvement on the Maximum Likelihood method of training neural networks using a gradient criterion, and is specially designed for binary output having strongly uneven proportion, which is typical for direct marketing problems.
BACKGROUND OF THE INVENTION:
The goal of most modeling procedures is to minimize the discrepancy between real results and model outputs. If the discrepancy, or error, can be accumulated on a record by record basis, it is suitable for gradient algorithms like Maximum Likelihood.
The goal of target marketing modeling is typically to find a method to calculate the probability of any prospect in the list to respond to an offer. The neural network model is built based on the experimental data (test mailing), and the traditional approach to this problem is to choose a model and compute model parameters with a model fitting procedure.
The topology of the model (for example, number of nodes, input and transfer functions) defines the formula that expresses the probability of response as a function of attributes.
In a special model fitting procedure, the output of the model is tested against actual output (from the results of a test mailing) and discrepancy is accumulated in a special error function. Different types of error functions can be used (e.g., mean square, absolute error); model parameters are determined to minimize the error function. The best fitting of model parameters is an implicit indication that the model is good (not necessarily the best) in terms of its original objective.
Thus the model building process is defined by two entities: the type of model and the error (or utility) function. The type of model defines the ability of the model to discern various patterns in the data. For example, increasing the number of nodes results in more complicated formulae, so a model can more accurately discern complicated patterns.
The "goodness" of the model is ultimately defined by the choice of an error function, since it is the error function that is minimized during the model training process.
To reach the goal of modeling, one wants to use a utility function that assigns probabilities that are most in compliance with the results of the experiment (the test mailing). The Maximum Likelihood criterion is the explicit measure of this compliance. However, the modeling process as it exists today has a significant drawback: it uses conventional utility functions (least mean square, cross entropy) only because there is a mathematical apparatus developed for these utility functions.
What would really be useful is a process that builds a response model that directly maximizes Maximum Likelihood.
For example, a random variable X exists with the distribution p(X, A), where A is an unknown vector of parameters to be estimated based on the independent observations of X: (x1, x2, ..., xN). The goal is to find such a vector A that makes the probability of the output p(x1,A)*p(x2,A)* ... *p(xN,A) as large as possible. Note that the function p(X, A) should be a known function of two variables. The Maximum Likelihood technique provides the mathematical apparatus to solve this optimization problem.
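As a concrete illustration of this setup (the sketch below is not part of the original text; the normal model and the crude grid search are assumptions made only for the example), a single parameter A of a known distribution p(X, A) can be estimated by minimizing the negative logarithm of p(x1,A)*...*p(xN,A):

```python
# Illustrative sketch (not from the patent): maximum likelihood estimation of a single
# parameter A for a known distribution p(X, A).  Here p is a unit-variance normal with
# unknown mean A, and maximizing p(x1,A)*...*p(xN,A) is done by minimizing the
# negative log-likelihood over a grid of candidate values.
import numpy as np

def neg_log_likelihood(A, x):
    # -ln prod_i p(x_i | A) for a Normal(A, 1) model
    return 0.5 * np.sum((x - A) ** 2) + 0.5 * len(x) * np.log(2.0 * np.pi)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=1000)        # observations x1, ..., xN

candidates = np.linspace(0.0, 4.0, 401)              # crude one-dimensional search
A_hat = candidates[np.argmin([neg_log_likelihood(A, x) for A in candidates])]
print(A_hat, x.mean())                               # A_hat is close to the sample mean
```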
In general, the Maximum Likelihood method can be applied to neural networks as follows. Let the neural network calculate a value of the output variable y based on the input vector X. The observed values (y1, y2, ..., yN) represent the actual output with some error e. Assuming that this error has, for example, a normal distribution, the method can find weights W of the neural network that make the probability of the output p(y1,W)*p(y2,W)* ... *p(yN,W) as large as possible. In the case of a normal probability function, the Maximum Likelihood criterion is equivalent to the Least Mean Square criterion, which is, in fact, most widely used for neural network training.
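To make this equivalence explicit (the following derivation is added here and is not in the original text), write the negative log-likelihood under a normal error model with network predictions ŷ_i(W) and fixed standard deviation σ:

$$-\ln \prod_{i=1}^{N} p(y_i \mid W) = -\sum_{i=1}^{N} \ln\!\left[\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y_i-\hat{y}_i(W))^2}{2\sigma^2}\right)\right] = \frac{1}{2\sigma^2}\sum_{i=1}^{N}\left(y_i-\hat{y}_i(W)\right)^2 + N\ln\!\left(\sqrt{2\pi}\,\sigma\right)$$

The additive term does not depend on W, so minimizing the negative log-likelihood is the same as minimizing the sum of squared errors.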
In the case of target marketing, the observed output X is a binary variable that is equal to 1 if a customer responded to the offer, and is 0 otherwise. The normality assumption is too rough, and leads to a sub-optimal set of neural network weights if used in neural network training. This is a typical direct marketing scenario.
SUMMARY OF THE INVENTION:
The present invention represents a unique application of the Maximum Likelihood statistical method to commercial neural network technologies. The present invention utilizes the specific nature of the output in target marketing problems and makes it possible to produce more accurate and predictive results. It is best used on "noisy" data and when one is interested in determining a distribution's overall accuracy, or best general description of reality.
The present invention provides a competitive advantage over off-the-shelf modeling packages in that it greatly enhances the application of Maximum Likelihood to quantitative marketing applications such as customer acquisition, cross-selling/up-selling, predictive customer profitability modeling, and channel optimization. Specifically, the superior predictive modeling capability provided by using the present invention means that marketing analysts will be better able to:
  • Predict the propensity of individual prospects to respond to an offer, thus enabling marketers to better identify target markets.
  • Identify customers and prospects who are most likely to default on loans, so that remedial action can be taken, or so that those prospects can be excluded from certain offers.
  • Identify customers or prospects who are most likely to prepay loans, so a better estimate can be made of revenues.
  • Identify customers who are most amenable to cross-sell and up-sell opportunities.
  • Predict claims experience, so that insurers can better establish risk and set premiums appropriately.
  • Identify instances of credit-card fraud.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows the dataflow of the method of training the model of the present invention.
Figure 2 illustrates a preferred system architecture for employing the present invention.

DETAILED DESCRIPTION OF THE INVENTION


The present invention uses the neural network to calculate a propensity score g(X, W), where W is a set of weights of the neural network, X is a vector of customer attributes (input vector). The probability to respond to an offer for a customer with attributes X can be calculated by the formula:

$$p = \frac{g(X,W)}{1+g(X,W)}$$

If there are N independent samples and among them n are responders, the probability of such output is:

$$L = \frac{\prod_{i \in \text{resp}} g(X_i,W) \cdot \prod_{i \in \text{non-resp}} \left(1 - g(X_i,W)\right)}{\prod_{i=1}^{N} \left(1 + g(X_i,W)\right)}$$

Using the logarithm of L as a training criterion (training error) in the form of:

$$\mathrm{Err} = -\ln L = \sum_{i=1}^{N} \ln(1+g_i) - \sum_{i \in \text{resp}} \ln(g_i) - \sum_{i \in \text{non-resp}} \ln(1-g_i)$$
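As an illustrative sketch only (the array names and sample values are assumptions, not the patent's implementation), the training error above can be computed directly from the propensity scores and the observed binary responses:

```python
# Illustrative sketch (array names and values are assumed, not from the patent):
# computing the training error Err = -ln L from propensity scores g_i and the
# observed binary responses.
import numpy as np

def training_error(g, responded):
    """g: propensity scores g(X_i, W); responded: 1 for responders, 0 otherwise."""
    g = np.asarray(g, dtype=float)
    r = np.asarray(responded, dtype=bool)
    return (np.sum(np.log(1.0 + g))
            - np.sum(np.log(g[r]))          # responder term
            - np.sum(np.log(1.0 - g[~r])))  # non-responder term

print(training_error([0.10, 0.02, 0.30], [1, 0, 0]))
```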
The neural network training procedure finds the optimal weights W that minimize Err and thus maximize the likelihood of the observed output L. One can use back propagation or a similar method to perform training. The gradient criterion that is required by a training procedure is computed as follows:

$$\mathrm{Err}'_W = \sum_{i=1}^{N} \frac{g'_i}{1+g_i} - \sum_{i \in \text{resp}} \frac{g'_i}{g_i} + \sum_{i \in \text{non-resp}} \frac{g'_i}{1-g_i}$$

where g'_i denotes the derivative of g(X_i, W) with respect to the weights W.
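The sketch below (an illustration under the formulas as reconstructed here, not the patent's code) computes the per-record derivative of Err with respect to each propensity score g_i and checks it against a finite-difference approximation; multiplying these terms by ∂g_i/∂W, for example via back propagation, gives the weight gradient used in training:

```python
# Illustrative check (not the patent's code): per-record derivative of Err with respect
# to each propensity score g_i, verified against a finite-difference approximation.
import numpy as np

def err(g, r):
    g, r = np.asarray(g, float), np.asarray(r, bool)
    return np.sum(np.log(1 + g)) - np.sum(np.log(g[r])) - np.sum(np.log(1 - g[~r]))

def d_err_d_g(g, r):
    g, r = np.asarray(g, float), np.asarray(r, bool)
    grad = 1.0 / (1.0 + g)               # from the ln(1 + g_i) term
    grad[r] -= 1.0 / g[r]                # responders: -ln(g_i)
    grad[~r] += 1.0 / (1.0 - g[~r])      # non-responders: -ln(1 - g_i)
    return grad

g = np.array([0.10, 0.02, 0.30])
y = np.array([1, 0, 0])
eps = 1e-6
numeric = [(err(g + eps * e, y) - err(g - eps * e, y)) / (2 * eps) for e in np.eye(3)]
print(np.allclose(d_err_d_g(g, y), numeric))   # expected: True
```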
In order for the training procedure to be robust and stable, the output of the neural network should be in the middle of the working interval [0, 1]. To ensure that, the present invention introduces the normalized propensity score f, which is related to g as:

$$g(X,W) = f^{1/\tau}(X,W)$$


Now, let f be the output of the neural network and choose the parameter τ in such a way that f may be of the order of 0.5.
Let R be an average response rate in the sample. The above condition is satisfied if:

$$\tau = 1 \,/\, \ln\frac{1-R}{R}$$
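A small numeric example, assuming the formula for τ as reconstructed above, shows that for a typical direct-marketing response rate the normalized score lands near the middle of [0, 1]:

```python
# Small numeric example (assumes the reconstructed formula tau = 1 / ln((1 - R) / R)):
# for a typical direct-marketing response rate R, the normalized score f = g**tau sits
# near the middle of [0, 1] when g is near the odds R / (1 - R).
import math

R = 0.02                                   # 2% response rate (illustrative)
tau = 1.0 / math.log((1.0 - R) / R)        # about 0.257
g = R / (1.0 - R)                          # odds corresponding to p = R
f = g ** tau                               # about 0.37, i.e. of the order of 0.5
print(tau, f)
```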
While training the model, the criterion is optimized so the calculation is based on the output of the neural network using the formula:

$$\mathrm{Err} = -\ln L = \sum_{i=1}^{N} \ln\left(1+f_i^{1/\tau}\right) - \frac{1}{\tau}\sum_{i \in \text{resp}} \ln(f_i) - \sum_{i \in \text{non-resp}} \ln\left(1-f_i^{1/\tau}\right)$$

The gradient criterion is computed as follows:

$$\mathrm{Err}'_W = \frac{1}{\tau}\left[\sum_{i=1}^{N} \frac{f_i^{1/\tau-1} f'_i}{1+f_i^{1/\tau}} - \sum_{i \in \text{resp}} \frac{f'_i}{f_i} + \sum_{i \in \text{non-resp}} \frac{f_i^{1/\tau-1} f'_i}{1-f_i^{1/\tau}}\right]$$
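As a quick consistency check (illustrative values, based on the formulas as reconstructed here), the error written in terms of the normalized score f coincides with the earlier error evaluated at g = f^{1/τ}, since ln(g_i) = (1/τ)·ln(f_i):

```python
# Quick consistency check (illustrative values, reconstructed formulas): the error in
# terms of the normalized score f equals the original error evaluated at g = f**(1/tau).
import numpy as np

tau = 0.257
f = np.array([0.45, 0.30, 0.55])
r = np.array([True, False, False])          # first record is a responder

def err_g(g):
    return np.sum(np.log(1 + g)) - np.sum(np.log(g[r])) - np.sum(np.log(1 - g[~r]))

def err_f(f):
    g = f ** (1.0 / tau)
    return (np.sum(np.log(1 + g))
            - (1.0 / tau) * np.sum(np.log(f[r]))
            - np.sum(np.log(1 - g[~r])))

print(np.isclose(err_g(f ** (1.0 / tau)), err_f(f)))   # expected: True
```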
The method was tested on a variety of business cases against both Least Mean Square and Cross-Entropy criteria. In all cases the method gave 20% - 50% improvement in the lift on the top 20% of the target marketing sample customer pools.
As shown in figure 1, the method inputs data from modeling database 11 into a selected model 12 to calculate scores 13. The error 14 is calculated from comparison with the known responses from modeling database 11 and checked for convergence 15 below a desired level. When convergence occurs, a new model 16 is the result to be used for targeted marketing 17. Otherwise, the process minimizes the error and solves for a new set of weights at 18 and begins a new iteration.
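A minimal sketch of this dataflow is given below; the one-layer exponential model, the synthetic data, the learning rate, and the convergence threshold are all illustrative assumptions and not the patent's implementation:

```python
# Minimal sketch of the figure 1 dataflow (the one-layer exponential model, learning
# rate, and convergence threshold are illustrative assumptions, not the patent's
# implementation): score the modeling database, compute Err, check convergence,
# otherwise solve for new weights and start a new iteration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                  # customer attribute vectors
y = (rng.random(500) < 0.1).astype(bool)       # observed binary responses

def scores(W):
    # propensity scores g(X, W) > 0; clipped so ln(1 - g) stays defined for
    # non-responders (a crude stand-in for the patent's normalized score f)
    return np.clip(np.exp(X @ W - 3.0), 1e-9, 0.999)

def err_and_grad(W):
    g = scores(W)
    err = np.sum(np.log(1 + g)) - np.sum(np.log(g[y])) - np.sum(np.log(1 - g[~y]))
    dg = 1 / (1 + g)
    dg[y] -= 1 / g[y]
    dg[~y] += 1 / (1 - g[~y])
    return err, X.T @ (dg * g)                 # chain rule: dg/dW = g * X

W = np.zeros(4)
prev = np.inf
for step in range(200):
    err, grad = err_and_grad(W)
    if abs(prev - err) < 1e-6:                 # convergence below a desired level
        break
    prev = err
    W -= 0.001 * grad                          # minimize Err, solve for new weights
print(step, err)
```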
The present invention operates on a computer system and is used for targeted marketing purposes. In a preferred embodiment as shown in figure 2, the system runs on a three-tier architecture that supports CORBA as an intercommunications protocol. The desktop client software on targeted marketing workstations 20 supports JAVA. The central application server 22 and multithreaded calculation engines 24, 25 run on Windows NT or UNIX. Modeling database 26 is used for training new models to be applied for targeted marketing related to customer database 28. The recommended minimum system requirements for application server 22 and multithreaded calculation engines 24, 25 are as follows:
HP Platform
Processor: HP
Memory: 256 MB
Disk Space: 10 MB*
*Approximately 100 MB per 1 million records in customer database. The above assumes the user client is installed on a PC with the recommended configuration found below.


Permissions: Read/Write permissions in area of server installation (no root permissions)
Operating System: HP/UX 11 (32 Bit)
Protocol: TCP/IP
Daemons: Telnet and FTP (Optional)
The recommended minimum requirements for the targeted marketing workstations 20 are as follows:
Using the present invention in conjunction with a neural network, the present invention provides a user with data indicating the individuals or classes of individuals who are most likely to respond to direct marketing.

Representative Drawing
A single figure which represents the drawing illustrating the invention.

Administrative Status

Title Date
Forecasted Issue Date Unavailable
(86) PCT Filing Date 2000-03-15
(87) PCT Publication Date 2000-09-21
(85) National Entry 2002-09-13
Dead Application 2004-12-16

Abandonment History

Abandonment Date Reason Reinstatement Date
2003-12-16 FAILURE TO RESPOND TO OFFICE LETTER
2004-03-15 FAILURE TO PAY APPLICATION MAINTENANCE FEE

Payment History

Fee Type Anniversary Year Due Date Amount Paid Paid Date
Reinstatement of rights $200.00 2002-09-13
Application Fee $300.00 2002-09-13
Maintenance Fee - Application - New Act 2 2002-03-15 $100.00 2002-09-13
Maintenance Fee - Application - New Act 3 2003-03-17 $100.00 2002-10-22
Owners on Record

Note: Records showing the ownership history in alphabetical order.

Current Owners on Record
GALPERIN, YURI
FISHMAN, VLADIMIR
Past Owners on Record
None
Past Owners that do not appear in the "Owners on Record" listing will appear in other documentation within the application.
Documents



Document Description   Date (yyyy-mm-dd)   Number of pages   Size of Image (KB)
Abstract 2002-09-13 2 65
Claims 2002-09-13 4 85
Drawings 2002-09-13 2 53
Representative Drawing 2003-01-13 1 6
Cover Page 2003-01-14 1 38
Description 2002-09-13 8 345
PCT 2002-09-13 10 331
Assignment 2002-09-13 3 100
Correspondence 2003-01-10 1 25