ChemFormula is a Python library for working with chemical formulas. It supports parsing formulas, generating formatted output strings, calculating molecular weights and weight distributions, and performing arithmetic operations on ChemFormula objects.

molshape/ChemFormula

ChemFormula


Table of Contents

  1. Description
  2. How to install and uninstall
  3. Dependencies
  4. How to use
  5. Examples and Formula Formatting
  6. Arithmetics with Chemical Formulas (Addition, Subtraction, Multiplication)
  7. Comparing and Sorting of Chemical Formulas
  8. Using Isotopes like Deuterium or Tritium
  9. Atomic Weight Data

Description

ChemFormula is a Python library for working with chemical formulas. It allows parsing chemical formulas and generating predefined (LaTeX, HTML) or customized formatted output strings, e.g. [Cu(NH3)4]SO4⋅H2O. ChemFormula also calculates formula weights and weight distributions and enables stoichiometric calculations with chemical formula objects. Arithmetic operations (+, -, *) between formula objects are supported for combining and modifying chemical compositions. Atomic weights are based on IUPAC recommendations (see Atomic Weight Data).


How to install and uninstall

ChemFormula can be installed from the Python Package Index (PyPI) by running pip install chemformula or uv add chemformula.

To uninstall ChemFormula from your local environment, use pip uninstall chemformula or uv remove chemformula.


Dependencies

ChemFormula uses the casregnum package to manage CAS Registry Numbers®. The corresponding properties of the CAS class are therefore available through the .cas attribute of a ChemFormula object.


How to use

ChemFormula provides the ChemFormula class for creating a chemical formula object:

from chemformula import ChemFormula

chemical_formula = ChemFormula(formula,
                               charge = 0,
                               name = None,
                               cas = None)

Examples:

ethylcinnamate = ChemFormula("(C6H5)CHCHCOOC2H5")
tetraamminecoppersulfate = ChemFormula("[Cu(NH3)4]SO4.H2O")
uranophane = ChemFormula("Ca(UO2)2(SiO3OH)2.(H2O)5")

muscarine = ChemFormula("((CH3)3N)(C6H11O2)", charge = 1, name = "L-(+)-Muscarine")
pyrophosphate = ChemFormula("P2O7", -4)

caffeine = ChemFormula("C8H10N4O2", name = "caffeine", cas = 58_08_2)
theine = ChemFormula("(C5N4H)O2(CH3)3", name = "theine", cas = "58-08-2")
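How nested brackets and hydrate separators in formulas like these resolve into element counts can be illustrated with a small stack-based parser. This is a minimal standard-library sketch, not ChemFormula's own parser: the name parse_formula is hypothetical, element symbols are not validated, bracket types are not matched against each other, and leading multipliers such as ".5H2O" are deliberately not handled.

```python
import re
from collections import Counter

# One token per match: element symbol with optional count, opening bracket,
# closing bracket with optional multiplier, or a separator ('.' or '*').
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)|([(\[{])|([)\]}])(\d*)|([.*])")


def parse_formula(formula: str) -> Counter:
    """Hypothetical helper: resolve a bracketed formula into element counts."""
    stack = [Counter()]  # one Counter per open bracket level
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"unexpected character {formula[pos]!r} at position {pos}")
        symbol, count, opening, closing, multiplier, _separator = match.groups()
        if symbol:
            stack[-1][symbol] += int(count or 1)
        elif opening:
            stack.append(Counter())
        elif closing:
            if len(stack) == 1:
                raise ValueError("unbalanced closing bracket")
            group = stack.pop()
            factor = int(multiplier or 1)
            for element, n in group.items():
                stack[-1][element] += n * factor
        # A separator only joins two parts of the formula, so nothing to do.
        pos = match.end()
    if len(stack) != 1:
        raise ValueError("unbalanced opening bracket")
    return stack[0]


counts = parse_formula("[Cu(NH3)4]SO4.H2O")
print(dict(counts))
```

Each opening bracket pushes a fresh counter, and each closing bracket multiplies the finished group into the level below, so arbitrarily deep nesting needs no recursion.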

The ChemFormula class offers the following attributes/functions:

.formula           # original chemical formula used to create the chemical formula object

.text_formula      # formula including charge as text output

.latex             # formats a formula as a string that can be used in LaTeX

.html              # formats a formula as a string that can be used in HTML

.unicode           # formats a formula with unicode subscript and superscript numbers

.format_formula(   # custom formatting of the formula; .format_formula accepts the following optional keyword arguments
                formula_prefix = "",                      # precedes the complete formula string
                element_prefix = "", element_suffix = "", # encloses every chemical symbol (Prefix + Symbol + Suffix)
                freq_prefix = "", freq_suffix = "",       # encloses every element frequency (Prefix + Frequency + Suffix)
                formula_suffix = "",                      # closes the complete formula string
                bracket_prefix = "", bracket_suffix = "", # encloses all brackets: {[()]} (Prefix + Bracket + Suffix)
                multiply_symbol = "",                     # replacement for '.' or '*'
                charge_prefix = "", charge_suffix = "",   # encloses every charge information (Prefix + Charge + Suffix)
                charge_positive = "+",                    # symbol for a positive charge
                charge_negative = "-",                    # symbol for a negative charge
               )

.sum_formula       # collapsed sum formula of .formula with all bracketed units resolved, as a ChemFormula object,
                   # e.g. use .sum_formula.html to retrieve an HTML representation of the sum formula

.hill_formula      # sum formula in Hill notation as a ChemFormula object, e.g. use .hill_formula.unicode to retrieve
                   # a Unicode representation of the Hill formula (first carbon, then hydrogen (if carbon is present),
                   # followed by all other elements in alphabetical order of their chemical symbol)
                   # Source: Edwin A. Hill, J. Am. Chem. Soc., 1900 (22), 8, 478-494 (https://doi.org/10.1021/ja02046a005)

.formula_weight    # formula weight of the chemical formula in g/mol

.mass_fractions    # mass fraction of each element for the chemical formula in the form of
                   # key, value = chemical symbol, mass fraction

.name              # name of the chemical formula object

.is_radioactive    # boolean value whether the formula is radioactive (True) or not (False)

.contains_isotopes # boolean value whether the formula contains specific isotopes (e.g. D or Tc)

.charged           # boolean value whether the formula is charged (True) or not (False)

.charge            # integer value carrying the charge of the chemical formula object

.text_charge       # formatted string of the charge of the chemical formula object (e.g. 3+, 4-, +, ...)

.element           # is a dictionary representation of the formula composition in the form of
                   # key, value = chemical symbol, frequency of this element
                   # e.g.: .element["C"] gives the number of carbon atoms in the corresponding formula object

.cas               # CAS Registry Number® in a formatted way ('_____00-00-0')
                   # .cas is a CAS number object from the casregnum package
.cas.cas_string    # CAS number as a formatted string, inherited property from casregnum.CAS
.cas.cas_integer   # CAS number as an integer value, inherited property from casregnum.CAS
.cas.check_digit   # CAS number check digit, inherited property from casregnum.CAS
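The check digit exposed by .cas.check_digit follows the standard CAS Registry Number checksum: the digits before the check digit are weighted 1, 2, 3, ... from right to left, and the weighted sum modulo 10 must equal the check digit. The sketch below reproduces that rule with the standard library only. The function names are hypothetical and are not part of casregnum or ChemFormula.

```python
def cas_check_digit(cas: str) -> int:
    """Compute the expected check digit for a CAS RN such as '58-08-2'."""
    digits = [int(ch) for ch in cas if ch.isdigit()]
    body = digits[:-1]  # every digit except the trailing check digit
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    return sum(weight * digit for weight, digit in enumerate(reversed(body), start=1)) % 10


def is_valid_cas(cas: str) -> bool:
    """Return True if the trailing digit matches the computed check digit."""
    digits = [int(ch) for ch in cas if ch.isdigit()]
    return len(digits) >= 2 and digits[-1] == cas_check_digit(cas)


for number in ("58-08-2", "79-33-4", "10326-41-7", "58-08-3"):
    print(number, is_valid_cas(number))
```

Note that an integer literal such as 58_08_2 is simply the Python integer 58082; the underscores are digit separators and only mimic the hyphenated form visually.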

Examples and Formula Formatting

The following Python sample script

from chemformula import ChemFormula

tetraamminecoppersulfate = ChemFormula("[Cu(NH3)4]SO4.H2O")
ethylcinnamate = ChemFormula("(C6H5)CHCHCOOC2H5", name = "ethyl cinnamate")

uranophane = ChemFormula("Ca(UO2)2(SiO3OH)2.(H2O)5", name = "Uranophane")
muscarine = ChemFormula("((CH3)3N)(C6H11O2)", charge = 1, name = "L-(+)-Muscarine")

caffeine = ChemFormula("C8H10N4O2", name = "caffeine", cas = 58_08_2)

print(f"\n--- Formula Depictions of {muscarine.name} ---")
print(f" Print instance: {muscarine}")
print(f" Original:       {muscarine.formula}")
print(f" Text formula:   {muscarine.text_formula}")
print(f" HTML:           {muscarine.html}")
print(f" LaTeX:          {muscarine.latex}")
print(f" Unicode:        {muscarine.unicode}")
print(f" Charge (int):   {muscarine.charge}")
print(f" Charge (str):   {muscarine.text_charge}")
print(f" Sum formula:    {muscarine.sum_formula}")
print(f" Sum (HTML):     {muscarine.sum_formula.html}")
print(f" Sum (Unicode):  {muscarine.sum_formula.unicode}")
print(f" Hill formula:   {muscarine.hill_formula}")
print(f" Hill formula:   {muscarine.hill_formula.latex}")

print(f"\n--- Formula Weights Calculations with {ethylcinnamate.name.title()} ---")
print(f" The formula weight of {ethylcinnamate.name} ({ethylcinnamate.sum_formula.unicode}) is {ethylcinnamate.formula_weight:.2f} g/mol.")
mole = 1.4
print(f" {mole:.1f} mol of {ethylcinnamate.name} weighs {mole * ethylcinnamate.formula_weight:.1f} g.")
mass = 24
print(f" {mass:.1f} g of {ethylcinnamate.name} corresponds to {mass/ethylcinnamate.formula_weight * 1000:.1f} mmol.")
print(f" The elemental composition of {ethylcinnamate.name} is as follows:")
for stringElementSymbol, floatElementFraction in ethylcinnamate.mass_fractions.items():
    print(f"   {stringElementSymbol:<2}: {floatElementFraction * 100:>5.2f} %")

print(f"\n--- {uranophane.name} and {muscarine.name} ---")
print(f" Yes, {uranophane.name} is radioactive." if uranophane.is_radioactive else f" No, {uranophane.name} is not radioactive.")
print(f" Yes, {uranophane.name} is charged." if uranophane.charged else f" No, {uranophane.name} is not charged.")
print(f" Yes, {muscarine.name} is radioactive." if muscarine.is_radioactive else f" No, {muscarine.name} is not radioactive.")
print(f" Yes, {muscarine.name} is charged." if muscarine.charged else f" No, {muscarine.name} is not charged.")

print("\n--- Accessing Single Elements through FormulaObject.element[\"Element_Symbol\"] ---")
print(f" Tetraamminecopper(II)-sulfate contains {tetraamminecoppersulfate.element['N']} nitrogen atoms.")

print("\n--- CAS Registry Number ---")
print(f" {caffeine.name.capitalize()} has the CAS RN {caffeine.cas} (or as an integer: {caffeine.cas.cas_integer}).\n")

generates the following output:

--- Formula Depictions of L-(+)-Muscarine ---
 Print instance: ((CH3)3N)(C6H11O2)
 Original:       ((CH3)3N)(C6H11O2)
 Text formula:   ((CH3)3N)(C6H11O2) +
 HTML:           <span class='ChemFormula'>((CH<sub>3</sub>)<sub>3</sub>N)(C<sub>6</sub>H<sub>11</sub>O<sub>2</sub>)<sup>+</sup></span>
 LaTeX:          \(\(\textnormal{C}\textnormal{H}_{3}\)_{3}\textnormal{N}\)\(\textnormal{C}_{6}\textnormal{H}_{11}\textnormal{O}_{2}\)^{+}
 Unicode:        ((CH₃)₃N)(C₆H₁₁O₂)⁺
 Charge (int):   1
 Charge (str):   +
 Sum formula:    C9H20NO2
 Sum (HTML):     <span class='ChemFormula'>C<sub>9</sub>H<sub>20</sub>NO<sub>2</sub><sup>+</sup></span>
 Sum (Unicode):  C₉H₂₀NO₂⁺
 Hill formula:   C9H20NO2
 Hill formula:   \textnormal{C}_{9}\textnormal{H}_{20}\textnormal{N}\textnormal{O}_{2}^{+}

--- Formula Weights Calculations with Ethyl Cinnamate ---
 The formula weight of ethyl cinnamate (C₁₁H₁₂O₂) is 176.21 g/mol.
 1.4 mol of ethyl cinnamate weighs 246.7 g.
 24.0 g of ethyl cinnamate corresponds to 136.2 mmol.
 The elemental composition of ethyl cinnamate is as follows:
   C : 74.98 %
   H :  6.86 %
   O : 18.16 %

--- Uranophane and L-(+)-Muscarine ---
 Yes, Uranophane is radioactive.
 No, Uranophane is not charged.
 No, L-(+)-Muscarine is not radioactive.
 Yes, L-(+)-Muscarine is charged.

--- Accessing Single Elements through FormulaObject.element["Element_Symbol"] ---
 Tetraamminecopper(II)-sulfate contains 4 nitrogen atoms.

--- CAS Registry Number ---
 Caffeine has the CAS RN 58-08-2 (or as an integer: 58082).

More examples can be found in the folder examples/.
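The formula weight and mass fraction figures in the output above follow from simple sums over the element counts. The sketch below redoes the ethyl cinnamate calculation with the standard library only. It is an illustration, not the library's implementation: the three abridged atomic weights are hard-coded assumptions, whereas ChemFormula ships its own IUPAC-based table, and the function names are hypothetical.

```python
# Abridged standard atomic weights in g/mol, hard-coded for this sketch only.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "O": 15.999}


def formula_weight(counts: dict) -> float:
    """Sum of atomic weight times element count over all elements."""
    return sum(ATOMIC_WEIGHTS[symbol] * n for symbol, n in counts.items())


def mass_fractions(counts: dict) -> dict:
    """Share of the total formula weight contributed by each element."""
    total = formula_weight(counts)
    return {symbol: ATOMIC_WEIGHTS[symbol] * n / total for symbol, n in counts.items()}


ethyl_cinnamate = {"C": 11, "H": 12, "O": 2}  # sum formula C11H12O2
weight = formula_weight(ethyl_cinnamate)

print(f"Formula weight: {weight:.3f} g/mol")
print(f"1.4 mol weighs {1.4 * weight:.1f} g")
print(f"24 g corresponds to {24 / weight * 1000:.1f} mmol")
for symbol, fraction in mass_fractions(ethyl_cinnamate).items():
    print(f"{symbol:<2}: {fraction * 100:>5.2f} %")
```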


Arithmetics with Chemical Formulas (Addition, Subtraction, Multiplication)

ChemFormula instances can be added to and subtracted from each other, and multiplied by a positive integer factor. Each operation creates a new ChemFormula instance by summing, subtracting or multiplying the element counts and charges:

ATP = ChemFormula("C10H12N5O13P3", -4)  # Adenosine triphosphate
water = ChemFormula("H2O")
dihydrogen_phosphate = ChemFormula("H2PO4", -1)

AMP = ATP + 2 * water - 2 * dihydrogen_phosphate  # Adenosine monophosphate

print("\n--- Arithmetics with ChemFormula Objects ---")
print(f" ATP ({ATP.hill_formula.unicode}) hydrolyzes with two water molecules"
      f" to AMP ({AMP.hill_formula.unicode}) and two inorganic phosphates ({dihydrogen_phosphate.unicode})\n"
      f" releasing energy for cellular processes.\n")

creates the following output:

--- Arithmetics with ChemFormula Objects ---
 ATP (C₁₀H₁₂N₅O₁₃P₃⁴⁻) hydrolyzes with two water molecules to AMP (C₁₀H₁₂N₅O₇P²⁻) and two inorganic phosphates (H₂PO₄⁻)
 releasing energy for cellular processes.

example6.py shows more examples for formula arithmetics.
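The bookkeeping behind the ATP example above, element counts and charges combined term by term, can be sketched with a small standard-library class. This is an illustration only: the Composition class is hypothetical and is not how ChemFormula is implemented, and raising an error for negative element counts is an assumption of this sketch.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Composition:
    """Hypothetical stand-in for a formula: element counts plus a charge."""
    counts: dict
    charge: int = 0

    def __add__(self, other):
        merged = Counter(self.counts)
        merged.update(other.counts)  # adds counts element by element
        return Composition(dict(merged), self.charge + other.charge)

    def __sub__(self, other):
        result = Counter(self.counts)
        result.subtract(other.counts)  # may leave zero or negative counts
        if any(n < 0 for n in result.values()):
            raise ValueError("subtraction would give a negative element count")
        remaining = {symbol: n for symbol, n in result.items() if n > 0}
        return Composition(remaining, self.charge - other.charge)

    def __mul__(self, factor):
        if not isinstance(factor, int) or factor <= 0:
            raise ValueError("factor must be a positive integer")
        scaled = {symbol: n * factor for symbol, n in self.counts.items()}
        return Composition(scaled, self.charge * factor)

    __rmul__ = __mul__  # allows 2 * water as well as water * 2


atp = Composition({"C": 10, "H": 12, "N": 5, "O": 13, "P": 3}, -4)
water = Composition({"H": 2, "O": 1})
dihydrogen_phosphate = Composition({"H": 2, "P": 1, "O": 4}, -1)

amp = atp + 2 * water - 2 * dihydrogen_phosphate
print(amp)
```

Subtracting two singly negative phosphates removes a charge of -2 from the total of -4, which is why the result carries a charge of -2.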


Comparing and Sorting of Chemical Formulas

ChemFormula allows comparing and sorting of chemical formula objects. Chemical formula objects can be compared with the == operator. Two chemical formula objects are considered equal if they have the same chemical composition (i.e. the same sum formula) and the same charge. If a CAS number is specified, the CAS number of both objects must also be identical.

Formulas will be sorted into lexicographical order with reference to the Hill notation (Edwin A. Hill, J. Am. Chem. Soc., 1900, 22(8), 478-494). All chemical symbols are sorted alphabetically, with carbon and hydrogen moved to the top position, if carbon atoms are present. Elements with different element frequencies are sorted numerically in ascending order.

from chemformula import ChemFormula

caffeine = ChemFormula("C8H10N4O2", name = "caffeine", cas = 58_08_2)
theine = ChemFormula("(C5N4H)O2(CH3)3", name = "theine", cas = "58-08-2")

l_lacticacid = ChemFormula("CH3(CHOH)COOH", 0, "L-lactic acid", cas = 79_33_4)
d_lacticacid = ChemFormula("CH3(CHOH)COOH", 0, "D-lactic acid", cas = 10326_41_7)

hydrocarbons = [ChemFormula("C3H5"), ChemFormula("C6H12O6"), ChemFormula("C6H12O5S"), ChemFormula("C3H5O"),
                ChemFormula("C4H5"), ChemFormula("C6H12S6"), ChemFormula("C6H12S2O3")]

print(f"\n--- Comparing {caffeine.name.capitalize()} with {theine.name.capitalize()} and Lactic Acid Isomers ---")
print(f" {caffeine.name.capitalize()} and {theine.name} are", end=" ")
print("identical." if caffeine == theine else "not identical.")
print(f" {l_lacticacid.name} and {d_lacticacid.name} are", end=" ")
print("identical." if l_lacticacid == d_lacticacid else "not identical.")

print("\n--- Lexical Sorting of Chemical Formulas via Hill Notation ---")
for position, item in enumerate(sorted(hydrocarbons), start = 1):
    print(f"{position:>3}. {item.hill_formula.unicode}")

generates the following output:

--- Comparing Caffeine with Theine and Lactic Acid Isomers ---
 Caffeine and theine are identical.
 L-lactic acid and D-lactic acid are not identical.

--- Lexical Sorting of Chemical Formulas via Hill Notation ---
  1. C₃H₅
  2. C₃H₅O
  3. C₄H₅
  4. C₆H₁₂O₃S₂
  5. C₆H₁₂O₅S
  6. C₆H₁₂O₆
  7. C₆H₁₂S₆

example4.py provides detailed examples for sorting and comparing ChemFormula instances.
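The ordering rule described above can be expressed as a sort key: carbon first, hydrogen second when carbon is present, all other symbols alphabetically, with element counts breaking ties. The following standard-library sketch reproduces the order of the seven formulas in the output above. The names hill_key and hill_string are hypothetical, and this is not ChemFormula's own comparison code.

```python
def hill_key(counts: dict) -> list:
    """Sort key: (symbol rank, count) pairs in Hill order."""
    has_carbon = "C" in counts

    def rank(symbol: str) -> tuple:
        if has_carbon and symbol == "C":
            return (0, symbol)
        if has_carbon and symbol == "H":
            return (1, symbol)
        return (2, symbol)  # every other element, alphabetically

    return sorted((rank(symbol), n) for symbol, n in counts.items())


def hill_string(counts: dict) -> str:
    """Plain-text Hill formula; a count of 1 is not written out."""
    return "".join(
        symbol + (str(n) if n > 1 else "") for (_, symbol), n in hill_key(counts)
    )


formulas = [
    {"C": 3, "H": 5},
    {"C": 6, "H": 12, "O": 6},
    {"C": 6, "H": 12, "O": 5, "S": 1},
    {"C": 3, "H": 5, "O": 1},
    {"C": 4, "H": 5},
    {"C": 6, "H": 12, "S": 6},
    {"C": 6, "H": 12, "S": 2, "O": 3},
]

for position, item in enumerate(sorted(formulas, key=hill_key), start=1):
    print(f"{position:>3}. {hill_string(item)}")
```

Because Python compares lists element by element, a formula that is a prefix of another (C3H5 versus C3H5O) sorts first, and equal symbols fall back to comparing their counts numerically.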


Using Isotopes like Deuterium or Tritium

To use hydrogen isotopes, set the global flag AllowHydrogenIsotopes to True:

import chemformula.config
from chemformula import ChemFormula
chemformula.config.AllowHydrogenIsotopes = True  # Enable usage of hydrogen isotopes like Deuterium ("D") and Tritium ("T")

The following example

import chemformula.config
from chemformula import ChemFormula

chemformula.config.AllowHydrogenIsotopes = True

water = ChemFormula("H2O")
heavy_water = ChemFormula("D2O")

print("\n--- Isotopes in ChemFormula Objects ---")
print(f" Yes, {water.unicode} contains specific isotopes." if water.contains_isotopes else f" No, {water.unicode} contains no specific isotopes.")
print(f" Yes, {heavy_water.unicode} contains specific isotopes.\n" if heavy_water.contains_isotopes else f" No, {heavy_water.unicode} contains no specific isotopes.\n")

creates the following output:

--- Isotopes in ChemFormula Objects ---
 No, H₂O contains no specific isotopes.
 Yes, D₂O contains specific isotopes.

Atomic Weight Data

All atomic weights are taken from the IUPAC Commission on Isotopic Abundances and Atomic Weights and are based on its reports and publications.

The current data has been downloaded from https://iupac.qmul.ac.uk/AtWt/ as of August 2nd, 2025. The original data has been mirrored to AtWt23.html.

Quoted atomic weights are those suggested for materials where the origin of the sample is unknown. For most radioactive elements the isotope with the longest half-life is quoted as an integer.

Data for hydrogen isotopes are taken from the AME2020 Atomic Mass Evaluation by Meng Wang et al.

