Embedded Chemistry Metadata

When you export to SVG, you can write machine-readable chemistry data into the file alongside the drawing. The visible graphics stay the same, but the SVG also carries data-mk-* attributes on atoms and bonds plus a molecule-level metadata block. This makes the file readable by scripts and search indexes that understand the format, and it records identifiers like SMILES and (when you are online) the IUPAC name from PubChem.

Turning it on

In the SVG export dialog, check Embed chemistry metadata. A related option, Omit per-atom implicit H counts (metadata), writes 0 into each atom’s data-mk-implicit-h attribute instead of the computed count. Molecule-level fields are not affected: the formula, molecular weight, atom count, and other identity fields always include implicit hydrogens so they agree with the SMILES and PubChem records.

Metadata embedding applies only to SVG output. It has no effect on PDF or other formats.

What gets embedded

Per-atom attributes on each atom group include data-mk-element, data-mk-x, data-mk-y, data-mk-index, data-mk-formal-charge, data-mk-lone-pairs, data-mk-domains-bonding, data-mk-domains-nonbonding, data-mk-implicit-h, data-mk-octet, data-mk-valence-electrons, and data-mk-warnings. When the relevant data exists, atoms also carry data-mk-abbreviation, data-mk-hybridization, data-mk-geometry, data-mk-vsepr-angles, data-mk-chiral, data-mk-isotope, data-mk-aromatic, data-mk-ring-sizes, and data-mk-oxidation-state.
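As a rough illustration of how a script might consume the per-atom attributes, here is a minimal Python sketch using only the standard library. The sample SVG fragment is hypothetical: the `<g>` wrappers and the attribute values are assumptions for illustration, not actual Molkit output, and the Developer Guide remains the reference for the real schema. The sketch avoids depending on the tag name by treating any element that carries data-mk-element as an atom.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <g> wrappers and the values are assumptions
# made for this example, not copied from a real Molkit export.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g data-mk-element="C" data-mk-index="0" data-mk-x="10" data-mk-y="20"
     data-mk-formal-charge="0" data-mk-implicit-h="3"/>
  <g data-mk-element="O" data-mk-index="1" data-mk-x="40" data-mk-y="20"
     data-mk-formal-charge="0" data-mk-implicit-h="1"/>
  <line x1="10" y1="20" x2="40" y2="20"/>
</svg>"""

PREFIX = "data-mk-"


def read_atoms(svg_text):
    """Return one dict per element that carries a data-mk-element attribute."""
    root = ET.fromstring(svg_text)
    atoms = []
    for node in root.iter():
        if PREFIX + "element" not in node.attrib:
            continue  # not an atom; skip plain graphics such as <line>
        # Keep only the data-mk-* attributes, with the prefix stripped.
        atoms.append({
            name[len(PREFIX):]: value
            for name, value in node.attrib.items()
            if name.startswith(PREFIX)
        })
    return atoms


atoms = read_atoms(SAMPLE_SVG)
for atom in atoms:
    print(atom["index"], atom["element"], atom.get("implicit-h"))
```

All values are kept as strings here, since the exact value formats are not described on this page; convert them only once you have checked the schema.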

Per-bond attributes include data-mk-order, data-mk-atom1, data-mk-atom2, endpoint coordinates data-mk-x1 through data-mk-y2, and, when applicable, data-mk-stereo, data-mk-polarity, data-mk-polar-toward, data-mk-conjugated, and data-mk-resonance-order.
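The bond attributes are enough to rebuild connectivity without looking at the drawing. The sketch below assumes that data-mk-atom1 and data-mk-atom2 hold integer values matching the atoms' data-mk-index; that pairing is a plausible reading of the attribute names, not something this page confirms. The sample markup is again hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical fragment for illustration only (not real Molkit output).
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <g data-mk-element="C" data-mk-index="0"/>
  <g data-mk-element="C" data-mk-index="1"/>
  <g data-mk-element="O" data-mk-index="2"/>
  <g data-mk-order="1" data-mk-atom1="0" data-mk-atom2="1"/>
  <g data-mk-order="2" data-mk-atom1="1" data-mk-atom2="2"/>
</svg>"""


def read_bonds(svg_text):
    """Return (atom1, atom2, order) for every element with both endpoints."""
    root = ET.fromstring(svg_text)
    bonds = []
    for node in root.iter():
        a1 = node.get("data-mk-atom1")
        a2 = node.get("data-mk-atom2")
        if a1 is None or a2 is None:
            continue
        # Assumption: endpoints are integer atom indices. The order is left
        # as a string because its value format is not specified here.
        bonds.append((int(a1), int(a2), node.get("data-mk-order")))
    return bonds


def neighbors(bonds):
    """Build an undirected adjacency map from the bond list."""
    adjacency = defaultdict(set)
    for a1, a2, _order in bonds:
        adjacency[a1].add(a2)
        adjacency[a2].add(a1)
    return dict(adjacency)


bonds = read_bonds(SAMPLE_SVG)
graph = neighbors(bonds)
```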

For a single-molecule canvas, the file also gets molecule-level fields in a metadata block and as root attributes: formula, molecular weight, exact mass, SMILES (and canonical SMILES when available), atom count, bond count, total valence electrons, degree of unsaturation, element composition, point group, functional groups, and a V2000 MOL block. A Schema.org MolecularEntity JSON-LD record is included as well. Multi-molecule canvases get the per-atom and per-bond attributes but skip the molecule-level fields.
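To read the molecule-level record, one option is to pull out the JSON-LD. Where exactly the record sits inside the SVG is not stated on this page, so the sketch below does not rely on a location: it scans every element's text and returns the first JSON object whose @type is MolecularEntity. The `<script type="application/ld+json">` wrapper in the sample is an assumption; the property names molecularFormula and smiles come from the Schema.org MolecularEntity type, but which properties Molkit actually writes is also an assumption.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical fragment: placement and contents of the JSON-LD record are
# assumptions made for this example.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <metadata>
    <script type="application/ld+json">
      {"@context": "https://schema.org", "@type": "MolecularEntity",
       "molecularFormula": "C2H6O", "smiles": "CCO"}
    </script>
  </metadata>
</svg>"""


def find_molecular_entity(svg_text):
    """Return the first MolecularEntity JSON-LD record, or None if absent."""
    root = ET.fromstring(svg_text)
    for node in root.iter():
        text = (node.text or "").strip()
        if not text.startswith("{"):
            continue
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            continue  # text that looks like JSON but is not
        if isinstance(record, dict) and record.get("@type") == "MolecularEntity":
            return record
    return None


record = find_molecular_entity(SAMPLE_SVG)
```

A None result is the expected case for multi-molecule canvases, which skip the molecule-level fields, so callers should handle it rather than treat it as an error.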

Online enrichment

If the Embed chemistry metadata box is checked and you have an internet connection, Molkit sends the molecule’s SMILES to PubChem to look up extra identifiers. When the lookup succeeds it adds the IUPAC name, PubChem CID, InChI, InChIKey, and a list of synonyms, and it updates the title and the JSON-LD record with the resolved name. See /chemistry/pubchem/ for how the lookup works.

This step is optional and best-effort. If you are offline or PubChem does not return a match, the export still completes with the locally computed fields above. Only the PubChem-derived fields are skipped.
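The best-effort pattern described above can be sketched in Python against PubChem's public PUG REST interface. This is not Molkit's implementation: the function names, the chosen property list, and the timeout are all hypothetical, and how Molkit actually issues its request is covered in /chemistry/pubchem/. The point is only the shape of the logic: start from the local fields, try the lookup, and fall back silently when it fails.

```python
import json
import urllib.parse
import urllib.request

PUG_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"
PROPERTIES = "IUPACName,InChI,InChIKey"  # illustrative subset


def lookup_url(smiles):
    """Build a PUG REST property URL for a SMILES string."""
    quoted = urllib.parse.quote(smiles, safe="")
    return f"{PUG_BASE}/compound/smiles/{quoted}/property/{PROPERTIES}/JSON"


def parse_properties(payload):
    """Extract the first property row from a PUG REST JSON response."""
    rows = payload.get("PropertyTable", {}).get("Properties", [])
    return dict(rows[0]) if rows else {}


def enrich(local_fields, smiles, timeout=5.0):
    """Return local fields plus PubChem fields when the lookup succeeds."""
    merged = dict(local_fields)
    try:
        with urllib.request.urlopen(lookup_url(smiles), timeout=timeout) as resp:
            merged.update(parse_properties(json.load(resp)))
    except (OSError, ValueError):
        # Offline, timed out, HTTP error, or unparsable body:
        # keep only the locally computed fields.
        pass
    return merged
```

Calling `enrich({"formula": "C2H6O"}, "CCO")` would return the local formula alone when offline and the formula plus any PubChem properties when the request succeeds. SMILES strings that contain characters such as `/` may not work reliably in the URL path; sending the SMILES as a request parameter or POST body is the safer route for those.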

What it is good for

The embedded data lets other tools index the structure by formula, SMILES, or InChIKey without re-parsing the drawing. The Schema.org JSON-LD and the title text help search engines and accessibility tools describe the file. The full MOL block and SMILES preserve a structural record next to the picture.

Note that Molkit does not yet read this metadata back in. Re-importing a metadata-rich SVG to reconstruct the structure is on the backlog, so treat the embedded data as an export record for now.

Cost

Enrichment adds one or two network requests and a short wait while PubChem responds. The added attributes and metadata increase file size, with the MOL block and JSON-LD being the largest contributors. If you do not need the data, leave the box unchecked to keep the SVG small.

For developers: the full attribute schema, reading patterns, and worked examples (tooltips, inspectors, RDKit.js integration) are in the Developer Guide metadata reference.

See also