Simplified molecular input line entry system. A type of line notation for representing molecules and reactions. It uses standard atom symbols, but hydrogen atoms are generally omitted and inferred from the normal valences of the atoms. Single bonds between adjacent atoms are implied; a double bond is represented by an equals sign and a triple bond by a hash sign. So ethane is CC, ethene is C=C, and ethyne is C#C. Branched structures are given using brackets (parentheses). For example, triethylamine ((C2H5)3N) is represented by CCN(CC)CC; ethanoic acid (acetic acid, CH3COOH) is CC(=O)O. Ring structures use numbers to show connections. For instance, cyclohexane is C1CCCCC1, where the two atoms labelled 1 are the ones joined to close the ring. SMILES notation for aromatic atoms uses lower-case letters. Benzene is c1ccccc1; chlorobenzene could be c1c(Cl)cccc1. Rules exist for expressing isotope and stereochemistry information, and there is a notation for reactions. Rules also exist for choosing a unique (canonical) SMILES, which is a single definitive representation selected from the many valid strings that may be written for a given compound. An extension of SMILES to describe molecular patterns (for substructure searches) is called SMARTS.
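
The rules for bonds, branches, and ring closures can be illustrated with a short Python sketch. It is not a full SMILES parser: the function name parse_smiles and the restricted token set are assumptions made for this illustration, and it ignores bracket atoms, charges, isotopes, stereochemistry, and reactions. Bonds between aromatic (lower-case) atoms are simply recorded with order 1.

```python
import re

# Tokens for the small subset covered here (an assumption of this sketch):
# two-letter halogens, one-letter atoms, aromatic lower-case atoms,
# double/triple bond symbols, branch parentheses, single-digit ring labels.
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[=#]|[()]|\d")


def parse_smiles(smiles):
    """Return (atoms, bonds) for a simple SMILES string.

    atoms is a list of atom symbols as written; bonds is a list of
    (index_a, index_b, order) tuples. Hydrogens stay implicit.
    """
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES syntax in {smiles!r}")

    atoms, bonds = [], []
    prev = None       # index of the atom the next atom will bond to
    order = 1         # bond order pending for the next bond
    stack = []        # atoms at which a branch was opened
    open_rings = {}   # ring label -> (atom index, bond order)

    for tok in tokens:
        if tok == "(":
            stack.append(prev)            # remember the branch point
        elif tok == ")":
            if not stack:
                raise ValueError("unmatched ')'")
            prev = stack.pop()            # return to the branch point
        elif tok == "=":
            order = 2
        elif tok == "#":
            order = 3
        elif tok.isdigit():
            if prev is None:
                raise ValueError("ring label before any atom")
            if tok in open_rings:         # second occurrence closes the ring
                start, ring_order = open_rings.pop(tok)
                bonds.append((start, prev, max(order, ring_order)))
            else:                         # first occurrence opens the ring
                open_rings[tok] = (prev, order)
            order = 1
        else:                             # an atom symbol
            atoms.append(tok)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, order))
            prev = idx
            order = 1

    if stack or open_rings:
        raise ValueError("unclosed branch or ring")
    return atoms, bonds


if __name__ == "__main__":
    for s in ["CC", "C=C", "C#C", "CCN(CC)CC", "CC(=O)O", "C1CCCCC1"]:
        print(s, parse_smiles(s))
```

For cyclohexane the sketch yields six atoms and six bonds, the last of which joins the first and final carbon atoms; that bond is the ring closure indicated by the repeated label 1.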

