An approach to machine learning in which feedback on the desirability of an outcome is obtained through interaction with a problem environment. The feedback (or reward) signal indicates how successful past actions were, e.g. a win/lose signal at the end of a game. This differs from supervised learning in that the learner is not told which action would have been correct; the reward signal only evaluates what was done and is often delayed, so a form of trial-and-error search is involved. The aim is to discover which actions are most suitable in different situations and thereby to improve future interactions.
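The trial-and-error search with a delayed win/lose signal can be illustrated with a minimal sketch. It uses tabular Q-learning, which is only one of several reinforcement learning methods and is chosen here for brevity. The five-cell corridor "game", the function names `train` and `greedy_policy`, and all parameter values are hypothetical, introduced purely for illustration.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4
ACTIONS = (-1, +1)    # step left or step right
LOSE, WIN = 0, N_STATES - 1


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values by trial and error (tabular Q-learning)."""
    rng = random.Random(seed)
    # q[(state, action)] estimates how desirable an action is in a state.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    for _ in range(episodes):
        state = 2  # every episode starts in the middle cell
        while state not in (LOSE, WIN):
            # Explore occasionally, otherwise exploit current estimates
            # (ties resolve to the first action, i.e. stepping left).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])

            next_state = state + action
            # The reward is delayed: it is non-zero only when the game ends.
            if next_state == WIN:
                reward = 1.0
            elif next_state == LOSE:
                reward = -1.0
            else:
                reward = 0.0

            if next_state in (LOSE, WIN):
                target = reward
            else:
                best_next = max(q[(next_state, a)] for a in ACTIONS)
                target = reward + gamma * best_next

            # Move the estimate part of the way toward the target.
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for each non-terminal cell."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)])
            for s in range(1, N_STATES - 1)}


if __name__ == "__main__":
    print(greedy_policy(train()))
```

No cell is ever labelled with its correct move. The learner only observes a win or a loss at the end of an episode, and the update rule passes that signal back to earlier situations. Under these assumed settings the learned policy should favour stepping right in every interior cell.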