The multilayer perceptron (MLP) is a type of artificial neural network (ANN) widely used in computer science and engineering for object recognition, discrimination and classification, and process monitoring and control. "Training" the network (fitting the model to data) is not a straightforward optimization problem.
This talk examines the features of the implicit network model that make it hard to optimize. We study the likelihood surface of the model and describe the singularities that cause difficulties for optimization routines. The form of the model allows a simple iterative weighted least squares algorithm to be used for maximum likelihood (ML) estimation. However, the likelihood has multiple singularities, so the algorithm needs a large number of random starting points to reach the global maximum even in simple problems. We then reformulate the model as an explicit latent variable model, which has the same parameters and mean structure as the MLP but a different variance structure. The likelihood of this model does not have the singularities of the MLP likelihood, though it may have local maxima. The reformulated model is a finite mixture model, equivalent to a special form of the "mixture of experts" model, so ML estimates can be computed with an EM algorithm.
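To make the EM idea concrete, here is a minimal sketch that rests on assumptions not taken from the talk. It uses a single hidden unit, a scalar input `x`, and Gaussian errors with a common variance. The hidden unit is treated as a binary latent variable whose "on" probability is the logistic gate `sigmoid(a0 + a1*x)`, and the response has mean `mu0` or `mu1` depending on its state. The result is a two-component mixture of experts with constant-mean experts. Its mean, `mu0 + (mu1 - mu0) * sigmoid(a0 + a1*x)`, matches a one-hidden-unit MLP, but its variance does not. The function names (`em_latent_mlp`, `sigmoid`, `log_normal`) are hypothetical, and this may not be the exact formulation or algorithm used in the talk.

```python
import numpy as np


def sigmoid(t):
    # Logistic function, with the argument clipped so exp() never overflows.
    return 1.0 / (1.0 + np.exp(-np.clip(t, -30.0, 30.0)))


def log_normal(y, mu, s2):
    # Log density of N(mu, s2) evaluated at y.
    return -0.5 * (np.log(2.0 * np.pi * s2) + (y - mu) ** 2 / s2)


def em_latent_mlp(x, y, n_iter=200, newton_steps=5):
    """EM for a hypothetical one-hidden-unit latent variable model:
        h ~ Bernoulli(sigmoid(a0 + a1*x)),  y | h ~ N(mu0 + (mu1 - mu0)*h, s2).
    Returns the gate coefficients, the intercept mu0, the hidden-unit
    weight mu1 - mu0, the variance, and the log-likelihood history.
    """
    X = np.column_stack([np.ones_like(x), x])  # design matrix for the gate
    a = np.zeros(2)
    mu0, mu1 = np.percentile(y, 25), np.percentile(y, 75)
    s2 = np.var(y)
    history = []
    for _ in range(n_iter):
        pi = sigmoid(X @ a)
        # E-step: posterior probability w that the hidden unit is "on".
        l1 = np.log(pi) + log_normal(y, mu1, s2)
        l0 = np.log(1.0 - pi) + log_normal(y, mu0, s2)
        m = np.maximum(l0, l1)
        log_mix = m + np.log(np.exp(l0 - m) + np.exp(l1 - m))
        history.append(log_mix.sum())  # observed-data log-likelihood
        w = np.exp(l1 - log_mix)
        # M-step for the means and variance: weighted least squares, closed form.
        mu1 = np.sum(w * y) / np.sum(w)
        mu0 = np.sum((1 - w) * y) / np.sum(1 - w)
        s2 = np.sum(w * (y - mu1) ** 2 + (1 - w) * (y - mu0) ** 2) / len(y)
        # M-step for the gate: logistic regression on the fractional
        # responses w, fitted by a few Newton (IRLS) steps.
        for _ in range(newton_steps):
            pi = sigmoid(X @ a)
            H = X.T @ (X * (pi * (1 - pi))[:, None]) + 1e-8 * np.eye(2)
            a = a + np.linalg.solve(H, X.T @ (w - pi))
    return a, mu0, mu1 - mu0, s2, history
```

The M-step splits into two familiar fits: a weighted least squares update for the expert means and variance, and a weighted logistic regression for the gate. Because the latent model can still have local maxima, running the sketch from several starting values is a reasonable precaution.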