The use of variable-length codes in order to reduce the number of symbols in a message to the minimum necessary to represent the information in the message, or at least to go some way toward this, for a given size of alphabet. In source coding the particular code to be used is chosen to match the source (i.e. the relative probabilities of the symbols in the source alphabet) rather than any channel through which the message may ultimately be passed.
The main problem in source coding is to ensure that the most probable source symbols are represented by the shortest codewords, and the less probable by longer codewords as necessary, the weighted average codeword length being minimized within the bounds of Kraft's inequality. The most widely used methods for ensuring this are Huffman coding and Shannon-Fano coding; the former is more efficient for a given extension of the source but the latter is computationally simpler. In either case, a large extension of the source may be necessary to approach the limiting compression factor given by the source coding theorem.
See also Shannon's model. Compare channel coding.