Chemistry Metadata

When you enable the Embed chemistry metadata export option, Molkit stamps a machine-readable description of the molecule into the SVG itself: scalar attributes on the root element, a full set of data-mk-* attributes on every atom group and bond group, and a structured mk:chemistry block inside the file’s metadata element. Your page script can read the formula, walk the bond graph, or hand the embedded MOL block to a cheminformatics library, all without a server round trip. This page is the schema reference.

Root attributes

Scalar attributes on the svg root give you the headline identity of the molecule without touching the DOM tree below it. They are deliberately duplicated outside the metadata block so they survive HTML sanitizers that strip metadata elements.

| Attribute | Type | When present | Description |
| --- | --- | --- | --- |
| data-mk-version | string | always | Schema version, currently 1.0 |
| data-mk-formula | string | single molecule on canvas | Molecular formula, e.g. C6H6 |
| data-mk-mw | number string | single molecule | Molecular weight in g/mol, rounded to three decimals |
| data-mk-smiles | string | single molecule, SMILES generation succeeded | SMILES string |
| data-mk-canonical-smiles | string | when RDKit canonicalization ran | Canonical SMILES |
| data-mk-inchi | string | PubChem enrichment succeeded | InChI identifier |
| data-mk-inchi-key | string | PubChem enrichment succeeded | 27-character InChIKey |
| data-mk-cid | number string | PubChem enrichment succeeded | PubChem compound ID |

The last three come from an online lookup at export time (see PubChem integration). If the export happened offline, or PubChem did not recognize the structure, those attributes are simply absent and getAttribute returns null for them. Guard for null before using the values.
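As a minimal sketch of reading these attributes defensively, the helper below collects them into a plain object and keeps "absent" (null) distinct from a real value. The name readRootIdentity and the shape of the returned object are illustrative assumptions, not part of Molkit.

```javascript
// Hypothetical helper: gather the root-level identity attributes.
// Every field is null when the corresponding attribute is absent.
function readRootIdentity(svg) {
  const text = name => svg.getAttribute(name); // null when absent
  const num = name => {
    const raw = svg.getAttribute(name);
    // Check for null before converting so a missing value never becomes 0 or NaN.
    return raw === null ? null : Number(raw);
  };
  return {
    version: text('data-mk-version'),
    formula: text('data-mk-formula'),
    molecularWeight: num('data-mk-mw'),
    smiles: text('data-mk-smiles'),
    canonicalSmiles: text('data-mk-canonical-smiles'),
    inchi: text('data-mk-inchi'),
    inchiKey: text('data-mk-inchi-key'),
    cid: num('data-mk-cid'),
  };
}
```

A caller can then branch on a single check such as identity.cid !== null before building a PubChem link.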

Per-atom attributes

Every atom group (the same element that carries the atom’s data-id) gets its own attribute set. Query atoms with [data-mk-element].

| Attribute | Type | When present | Description |
| --- | --- | --- | --- |
| data-mk-element | string | always | Element symbol, e.g. N |
| data-mk-x | number | always | Canvas X coordinate |
| data-mk-y | number | always | Canvas Y coordinate (SVG space, Y grows downward) |
| data-mk-index | integer | always | Zero-based atom index, matches MOL block order |
| data-mk-formal-charge | integer | always | Formal charge, can be 0 or negative |
| data-mk-lone-pairs | integer | always | Lone pair count |
| data-mk-domains-bonding | integer | always | Bonding electron domains (VSEPR) |
| data-mk-domains-nonbonding | integer | always | Nonbonding electron domains (VSEPR) |
| data-mk-implicit-h | integer | always | Implicit hydrogen count; reads 0 when the omit option was checked at export |
| data-mk-octet | enum | always | satisfied, deficient, or expanded |
| data-mk-valence-electrons | integer | always | Valence electrons contributed by this atom |
| data-mk-warnings | JSON array | always | Validator warning types; [] when clean |
| data-mk-hybridization | string | when determinable | e.g. sp2 |
| data-mk-geometry | string | when geometry is known | VSEPR geometry name, e.g. trigonal planar |
| data-mk-vsepr-angles | JSON array | with data-mk-geometry | Ideal angles in degrees, e.g. [120] |
| data-mk-chiral | "true" | potential stereocenter detected | Four distinct substituents (heuristic, not full CIP) |
| data-mk-isotope | integer | isotope label set | Mass number |
| data-mk-aromatic | "true" | atom in an aromatic ring | Aromaticity flag |
| data-mk-ring-sizes | JSON array | atom in at least one ring | Sizes of rings containing this atom, e.g. [6] |
| data-mk-oxidation-state | integer | when computable | Oxidation state |
| data-mk-abbreviation | string | atom drawn as an abbreviation | Custom label, e.g. OEt |
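One way to put the validator attributes to work is to list the atoms that deserve a second look. The sketch below, which assumes an inlined export and uses the hypothetical helper name atomsWithWarnings, flags any atom whose data-mk-warnings array is non-empty or whose data-mk-octet is something other than satisfied.

```javascript
// Hypothetical helper: return the atoms the exporter's validator flagged.
function atomsWithWarnings(svg) {
  const flagged = [];
  for (const g of svg.querySelectorAll('[data-mk-element]')) {
    // data-mk-warnings is documented as always present; '[]' is only a defensive default.
    const warnings = JSON.parse(g.getAttribute('data-mk-warnings') ?? '[]');
    const octet = g.getAttribute('data-mk-octet');
    const octetProblem = octet !== null && octet !== 'satisfied';
    if (warnings.length === 0 && !octetProblem) continue;
    flagged.push({
      id: g.getAttribute('data-id'),
      element: g.getAttribute('data-mk-element'),
      octet,
      warnings,
      node: g, // live element, e.g. for adding a highlight class
    });
  }
  return flagged;
}
```

Each entry keeps the live element, so a page script could loop over the result and call entry.node.classList.add(...) with a class of its own choosing.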

Per-bond attributes

Bond groups carry the graph topology plus rendered geometry. Query bonds with [data-mk-order].

| Attribute | Type | When present | Description |
| --- | --- | --- | --- |
| data-mk-order | number | always | Bond order: 1, 2, 3, or 1.5 |
| data-mk-atom1 | string | always | data-id of the first atom |
| data-mk-atom2 | string | always | data-id of the second atom |
| data-mk-x1 / data-mk-y1 | number | when geometry extracted | Rendered start point of the bond line |
| data-mk-x2 / data-mk-y2 | number | when geometry extracted | Rendered end point of the bond line |
| data-mk-stereo | enum | non-plain bonds only | Bond style: wedge, dash, wavy, aromatic, dative, and other drawn styles |
| data-mk-polarity | number | electronegativity difference at or above 0.05 | Pauling EN difference, two decimals |
| data-mk-polar-toward | string | with data-mk-polarity | Element symbol the dipole points toward |
| data-mk-conjugated | "true" | bond in a conjugated system | Conjugation flag |
| data-mk-resonance-order | number | aromatic bonds | Effective order, 1.5 |
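The polarity attributes lend themselves to a quick ranking of the most polar bonds. The following is a sketch only: polarBonds is a hypothetical name, and the default cutoff of 0.5 is an arbitrary illustration value, not a threshold Molkit defines.

```javascript
// Hypothetical helper: bonds whose electronegativity difference meets a cutoff,
// most polar first.
function polarBonds(svg, minDifference = 0.5) {
  const result = [];
  for (const g of svg.querySelectorAll('[data-mk-polarity]')) {
    const difference = parseFloat(g.getAttribute('data-mk-polarity'));
    // parseFloat yields NaN for a missing or malformed value; skip those.
    if (Number.isNaN(difference) || difference < minDifference) continue;
    result.push({
      atom1: g.getAttribute('data-mk-atom1'),
      atom2: g.getAttribute('data-mk-atom2'),
      difference,
      toward: g.getAttribute('data-mk-polar-toward'), // element symbol
      node: g,
    });
  }
  return result.sort((a, b) => b.difference - a.difference);
}
```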

The metadata element

Deeper data lives in the mk:chemistry element inside the SVG’s metadata element (namespace https://molkit.app/ns/chemistry/1.0). For a single-molecule canvas it contains the formula (plain and HTML), molecular weight and exact mass, SMILES, atom and bond counts, total valence electrons, degree of unsaturation, an element-composition JSON object, the point group with a confidence rating, a functional-groups JSON array, the full MOL block, and a Schema.org JSON-LD object. PubChem enrichment appends the IUPAC name, compound ID, InChI, InChIKey, and common names when the lookup succeeds.

The JSON-LD payload is a MolecularEntity, taken here from a reference export of the nitrate ion (the SMILES string reflects whichever generator produced the export; RDKit-backed exports carry canonical SMILES):

mk:schema-org contents (nitrate.svg)
```json
{
  "@context": "https://schema.org",
  "@type": "MolecularEntity",
  "name": "NO3",
  "molecularFormula": "NO3",
  "molecularWeight": "62.004",
  "smiles": "N=O(O)(O)",
  "hasRepresentation": {
    "@type": "ImageObject",
    "encodingFormat": "image/svg+xml",
    "creator": {
      "@type": "SoftwareApplication",
      "name": "Molkit",
      "url": "https://molkit.com"
    }
  }
}
```

Because the SVG ships its own structured-data block, search engines that index inlined SVG can associate the image with a chemical identity rather than treating it as anonymous vector art.
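If a page script wants the same payload as an object, a sketch along these lines could pull it out. It assumes that the JSON-LD is the text content of an mk:schema-org element directly under mk:chemistry; getSchemaOrg is a hypothetical name. The name match also tolerates a prefixed localName, which is what the HTML parser produces for inlined markup.

```javascript
// Hypothetical helper: return the embedded Schema.org object, or null.
function getSchemaOrg(svg) {
  // Matches 'schema-org' (XML-parsed) as well as 'mk:schema-org' (HTML-parsed).
  const named = name => el =>
    el.localName === name || el.localName.endsWith(':' + name);
  const meta = svg.querySelector('metadata');
  const chem = meta && [...meta.children].find(named('chemistry'));
  const node = chem && [...chem.children].find(named('schema-org'));
  if (!node) return null;
  try {
    return JSON.parse(node.textContent);
  } catch {
    return null; // empty or malformed payload
  }
}
```

The returned object can be read directly, for example getSchemaOrg(svg)?.molecularFormula, or serialized into a script element of type application/ld+json if the host page wants the structured data at document level.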

Extracting the MOL block

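The mk:mol-block element holds a complete V2000 molfile as text content, ready to feed to RDKit.js, OpenBabel, or any MOL-aware tool. One caveat: Y coordinates are flipped relative to the SVG (chemistry convention is Y-up, SVG is Y-down). Match metadata children by localName rather than with a prefixed selector. When the SVG is parsed as XML, localName omits the namespace prefix (chemistry); when the markup is inlined through the HTML parser, the prefix is not resolved and stays part of the name (mk:chemistry). The helper below accepts both forms:

get-mol-block.js
```javascript
function getMolBlock(svg) {
  // Accept 'chemistry' (XML-parsed) and 'mk:chemistry' (HTML-parsed inline SVG).
  const named = name => el =>
    el.localName === name || el.localName.endsWith(':' + name);
  const meta = svg.querySelector('metadata');
  if (!meta) return null;
  const chem = [...meta.children].find(named('chemistry'));
  const mol = chem && [...chem.children].find(named('mol-block'));
  return mol ? mol.textContent : null;
}
```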
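When a full cheminformatics library is more than the page needs, the atom block can be read directly. The sketch below relies on the fixed-column V2000 layout (atom count in the first three characters of the counts line; x in columns 1 to 10, y in columns 11 to 20, element symbol in columns 32 to 34 of each atom line). It assumes the text content is stored verbatim, without re-indentation, and parseMolAtoms is a hypothetical name.

```javascript
// Hypothetical helper: read element symbols and coordinates from a V2000 MOL block.
function parseMolAtoms(molBlock) {
  const lines = molBlock.split(/\r?\n/);
  // Locate the counts line by its version tag rather than by position,
  // so a leading blank line in the text content does not shift the parse.
  const countsAt = lines.findIndex(line => /V2000\s*$/.test(line));
  if (countsAt === -1) return null;
  const atomCount = parseInt(lines[countsAt].slice(0, 3), 10);
  if (Number.isNaN(atomCount)) return null;
  const atoms = [];
  for (let i = 0; i < atomCount; i++) {
    const line = lines[countsAt + 1 + i];
    if (line === undefined) return null; // truncated block
    atoms.push({
      index: i, // zero-based, same order as data-mk-index
      element: line.slice(31, 34).trim(),
      x: parseFloat(line.slice(0, 10)),
      y: parseFloat(line.slice(10, 20)), // Y-up, unlike data-mk-y
    });
  }
  return atoms;
}
```

Because data-mk-index matches MOL block order, entry i corresponds to the atom group whose data-mk-index equals i. Keep in mind that the y values here follow the Y-up convention, so they are not directly comparable with data-mk-y.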

Reading the molecule graph

The atom and bond attributes together form a complete graph. Here is a self-contained parser that builds an adjacency structure you can use for highlighting, tooltips, or analysis. It works against an inlined export (see embedding for how to get the SVG into your page):

parse-graph.js
```javascript
function parseGraph(svg) {
  const atoms = new Map();
  for (const g of svg.querySelectorAll('[data-mk-element]')) {
    const id = g.getAttribute('data-id');
    atoms.set(id, {
      id,
      element: g.getAttribute('data-mk-element'),
      x: parseFloat(g.getAttribute('data-mk-x')),
      y: parseFloat(g.getAttribute('data-mk-y')),
      charge: parseInt(g.getAttribute('data-mk-formal-charge'), 10),
      lonePairs: parseInt(g.getAttribute('data-mk-lone-pairs'), 10),
      implicitH: parseInt(g.getAttribute('data-mk-implicit-h'), 10),
      hybridization: g.getAttribute('data-mk-hybridization'), // null when absent
      ringSizes: JSON.parse(g.getAttribute('data-mk-ring-sizes') ?? '[]'),
      neighbors: [],
      node: g, // keep the live element for styling
    });
  }
  const bonds = [];
  for (const g of svg.querySelectorAll('[data-mk-order]')) {
    const bond = {
      order: parseFloat(g.getAttribute('data-mk-order')),
      atom1: g.getAttribute('data-mk-atom1'),
      atom2: g.getAttribute('data-mk-atom2'),
      stereo: g.getAttribute('data-mk-stereo'), // null for plain bonds
      node: g,
    };
    bonds.push(bond);
    atoms.get(bond.atom1)?.neighbors.push(bond.atom2);
    atoms.get(bond.atom2)?.neighbors.push(bond.atom1);
  }
  return { atoms, bonds };
}
```

For example, parseGraph(svg).atoms.values() over the nitrate export yields one N with charge: 1 and three O atoms, and the three bonds connect them through the central nitrogen.
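As one illustration of what the adjacency structure enables, the sketch below groups atom ids into connected components with a breadth-first search. It only relies on the atoms map and neighbors arrays that parseGraph returns; connectedComponents is a hypothetical name.

```javascript
// Hypothetical helper: split the graph into connected structures.
// Returns an array of arrays of atom ids.
function connectedComponents(graph) {
  const seen = new Set();
  const components = [];
  for (const start of graph.atoms.keys()) {
    if (seen.has(start)) continue;
    const component = [];
    const queue = [start];
    seen.add(start);
    while (queue.length > 0) {
      const id = queue.shift();
      component.push(id);
      for (const next of graph.atoms.get(id).neighbors) {
        // Ignore ids that point at atoms missing from the map.
        if (!seen.has(next) && graph.atoms.has(next)) {
          seen.add(next);
          queue.push(next);
        }
      }
    }
    components.push(component);
  }
  return components;
}
```

A result with more than one component indicates a canvas that held several separate structures, which is useful to know before looking for molecule-level attributes on the root.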

Gotchas

  • Conditional attributes need null-guards. getAttribute returns null for absent attributes like data-mk-stereo, data-mk-hybridization, or data-mk-isotope. Decide on a default before parsing.
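  • Never use logical-or fallbacks on numeric reads. Zero is a valid value for data-mk-formal-charge, data-mk-lone-pairs, data-mk-implicit-h, and data-mk-oxidation-state, and || would silently replace it with the fallback. Apply ?? to the raw getAttribute result before parsing, or test the parsed value with Number.isNaN. Applying ?? after parseInt never fires, because a missing attribute parses to NaN rather than null.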
  • JSON-array attributes need JSON.parse. data-mk-vsepr-angles, data-mk-ring-sizes, and data-mk-warnings are JSON text, not plain strings.
  • Molecule-level fields disappear on multi-molecule canvases. data-mk-formula, data-mk-mw, data-mk-smiles, and the single-molecule children of the mk:chemistry element are only written when the canvas held exactly one connected structure. Per-atom and per-bond attributes are still present.
  • Atom IDs are opaque strings. Do not parse them. Match bonds to atoms by comparing data-mk-atom1 and data-mk-atom2 against each atom group’s data-id, as the parser above does.
  • PubChem fields are best-effort. data-mk-inchi, data-mk-inchi-key, data-mk-cid, and the IUPAC name exist only when the export-time lookup succeeded online.
  • The omit-implicit-H export option only blanks one attribute. With it checked, every data-mk-implicit-h reads 0. Nothing else changes: formula, molecular weight, atom count, and total valence electrons always include implicit hydrogens so they agree with the SMILES and PubChem identity fields.
  • Re-export files made with older builds. Earlier exporters had implicit-hydrogen accounting bugs: line-structure carbons could carry data-mk-formal-charge="1", data-mk-octet="deficient", a phantom lone pair, and wrong hybridization, and molecule-level numbers could exclude implicit H while the formula included them. Current exports are internally consistent; treat old files as suspect and re-export.
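
Several of these gotchas can be handled once in a few small readers. The helpers below are a sketch with hypothetical names (readInt, readFlag, readJson). They check for null before parsing, since a missing attribute would otherwise parse to NaN, and they never treat 0 as missing.

```javascript
// Integer attribute: returns `fallback` when absent or unparseable; 0 stays 0.
function readInt(el, name, fallback = null) {
  const raw = el.getAttribute(name);
  if (raw === null) return fallback;
  const value = parseInt(raw, 10);
  return Number.isNaN(value) ? fallback : value;
}

// Flag attribute written as "true" when set and omitted otherwise.
function readFlag(el, name) {
  return el.getAttribute(name) === 'true';
}

// JSON-array attribute: returns `fallback` when absent or malformed.
function readJson(el, name, fallback = null) {
  const raw = el.getAttribute(name);
  if (raw === null) return fallback;
  try {
    return JSON.parse(raw);
  } catch {
    return fallback;
  }
}
```

For example, readInt(g, 'data-mk-isotope') gives null for an unlabeled atom, while readInt(g, 'data-mk-formal-charge') gives 0 for a neutral one.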

See also