CADpeek: a free in-browser CAD viewer, with a custom WASM parser for DGN
Free in-browser DXF viewers exist. Free in-browser DGN viewers did not. We built one in hours rather than months by writing a Rust WebAssembly parser for Bentley's DGN format - including the v8 binary container that requires a separate parser from v7. Here is the deep-dive.
The problem
Czech land surveyors, architects, and parcel owners regularly receive CAD files they need to look at - open them, check the geometry, confirm the parcel boundary, hand them back. They don’t need to edit. They don’t need a $5k MicroStation seat. They just need to view.
What existed:
- Free in-browser DXF viewers - a few, of varying quality. DXF is a well-documented format with at least one good open-source JS parser.
- Free in-browser DGN viewers - none. DGN is Bentley’s MicroStation format. The only parsers are server-side (GDAL, dgnlib, the official Bentley SDK), and most run native code that doesn’t compile cleanly to WebAssembly.
That gap was the entire reason to build CADpeek. Free in-browser DXF viewers are a commodity. Free in-browser DGN viewers were, until recently, an empty category.
The constraint that made this interesting wasn’t the format itself, though. It was privacy. The data Czech surveyors carry around - cadastral surveys, easement maps, parcel boundaries, draft architectural drawings - is sensitive. Their professional code, their clients’ expectations, and increasingly their GDPR posture rule out uploading these to someone else’s server. A “free DGN viewer that you upload your file to” is a non-product. The only acceptable architecture was one where the file never leaves the device.
Which meant: WebAssembly. In the browser. No server.
The wrong path
The path most people take when they want to view a vendor format in a browser is the wrong one, but it’s worth saying out loud because it’s tempting.
Option A: shell to ODA File Converter on a server, render the converted DXF. Cheap, fast, well-trodden. Loses on the privacy constraint - file leaves the device.
Option B: emscripten-port GDAL to WebAssembly. GDAL is C++, ~2M lines, with a hundred dependencies. There are partial WASM ports floating around but none with the DGN driver in a state we trusted. Estimated effort to make this work: weeks of toolchain wrangling for a working starting point, more for production. And the resulting bundle would be ~30 MB.
Option C: write our own parser in Rust, compile to WebAssembly. Daunting for a binary format with no clean spec, but pleasingly bounded: just the geometry types we need, not the entire DGN surface. Production bundle: a few hundred KB.
We picked C, with the caveat that we’d accept partial coverage. Cover the element types that show up in real Czech-survey files. Punt anything exotic to a future iteration.
The format, briefly
DGN is two unrelated binary formats sharing a name.
v7 (ISFF). Intergraph Standard File Format. Publicly documented. A flat sequence of elements. Each element starts with a small header (type + level + size), then a type-specific payload. No compression, no containers, no encryption. You walk a byte cursor through the file, dispatch on element type, accumulate geometry.
v8 (OLE2 / CFB). Microsoft Compound File Binary - same container format as old .doc and .xls files. Elements live inside zlib-compressed streams at named paths like /Dgn-Md/#000000/Dgn^G/$<N>. Bentley publishes the v8 SDK under NDA; everything else is observation. GDAL and dgnlib are the public reference points.
Same element-type vocabulary in both. Different on-disk layout. The parser shares almost no code between v7 and v8 beyond type names.
Detect by reading the first 8 bytes:
fn looks_like_v8(bytes: &[u8]) -> bool {
    bytes.len() >= 8 && bytes[..8] == [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]
}
That magic header (d0 cf 11 e0 a1 b1 1a e1) is the OLE2 compound-file signature. A file that instead starts with a 08 09 fe ... prefix and has a TCB element at offset 0 is v7. Anything else is not a DGN file, and we reject it.
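A minimal sketch of that three-way dispatch, using only the byte patterns quoted above. The names DgnKind and sniff_dgn are hypothetical, and the v7 branch checks just the three prefix bytes the text spells out rather than validating a full TCB element:

```rust
/// Which parser a file should be routed to (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum DgnKind {
    V7,
    V8,
    NotDgn,
}

/// OLE2 / CFB signature, as quoted in the text.
const OLE2_MAGIC: [u8; 8] = [0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1];

/// Leading bytes of a v7 file as quoted in the text. Only these three bytes
/// are checked; a fuller check would also validate the TCB element.
const V7_PREFIX: [u8; 3] = [0x08, 0x09, 0xfe];

/// Classify a buffer by its leading bytes. Short or empty input is NotDgn.
fn sniff_dgn(bytes: &[u8]) -> DgnKind {
    if bytes.starts_with(&OLE2_MAGIC) {
        DgnKind::V8
    } else if bytes.starts_with(&V7_PREFIX) {
        DgnKind::V7
    } else {
        DgnKind::NotDgn
    }
}
```

Because the OLE2 signature is shared with other compound files such as old .doc and .xls, a V8 result here only means "worth handing to the v8 parser", not "definitely a DGN".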
The bug that ate 40% of the geometry
The single biggest failure mode in the v8 parser, which cost us the most time to find, was element alignment.
The intuition is that binary elements are aligned to some natural boundary - 4 bytes, 8 bytes, often 16. So when you walk a stream of variable-length elements, you advance by the element’s declared size, then re-align to the next boundary. We assumed 8-byte alignment, which is the most common for floating-point-heavy formats:
// what we tried first - wrong
pos += bytes_used;
pos = (pos + 7) & !7; // re-align to 8 bytes
This worked on our first three test fixtures. We shipped a smoke test. It worked on the smoke fixture. We tried it on a real Czech-survey file (60_Sedliste_PV.dgn) and roughly 40% of the geometry was missing.
The actual rule: v8 streams pad with 0, 4, or 6 zero bytes between elements - never 8. Forced 8-byte re-alignment was stepping over the start of the next element header on the elements that had non-multiple-of-8 sizes. The fix:
// after parsing an element:
pos += bytes_used;
// skip leading zero u32s before the next header
while pos + 4 <= dec.len() && dec[pos..pos + 4] == [0, 0, 0, 0] {
    pos += 4;
}
That single fix took us from “30% of files render correctly” to “the parser works.” It’s the kind of bug that doesn’t surface in synthetic fixtures and only shows up when you load a file that came out of MicroStation 8.11 SS3 running on Windows 7 in a surveyor’s office.
This is also the bug that taught us the right reference data isn’t the spec - it’s a CSV dump from GDAL of an actual customer file. If the spec doesn’t tell you about the padding rule, the spec is incomplete. The CSV dump tells you what’s actually there.
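To make the failure concrete, here is a small self-contained sketch that puts the two strategies side by side. The function names are ours for illustration, and the sketch assumes, as the loop above does, that a valid element header never begins with an all-zero u32:

```rust
/// First attempt: round the cursor up to the next multiple of 8.
fn align8(pos: usize) -> usize {
    (pos + 7) & !7
}

/// Fix: skip only the padding that is actually present, one zero u32 at a time.
/// Assumes a real element header never starts with four zero bytes.
fn skip_zero_words(dec: &[u8], mut pos: usize) -> usize {
    while pos + 4 <= dec.len() && dec[pos..pos + 4] == [0, 0, 0, 0] {
        pos += 4;
    }
    pos
}
```

For a 12-byte element followed immediately by the next header, align8(12) lands on 16 and steps over the first four header bytes, while skip_zero_words stays at 12. When four zero bytes of padding follow the element, both land on 16, which is why fixtures with only that shape never exposed the bug.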
Czech-specific work
If you’re building this for a global audience, you stop at “files load.” We were not building it for a global audience - at least not first. The first 500 users were Czech surveyors. That meant a pile of Czech-specific work that doesn’t get talked about in international CAD-format discussions.
CP1250 vs UTF-16 text decoding. DGN’s TEXT element stores its string in one of two encodings, signalled by a BOM-style prefix. Files written with Czech locale produce CP1250 (Windows-1250, Latin-2) text without a BOM. Files written with explicit Unicode produce UTF-16 LE with the standard FF FE BOM. If you decode CP1250 as UTF-8, every Czech diacritic - á, é, í, ó, ú, č, š, ž, ř, ě, ů - turns into garbage. We ship a 128-entry CP1250 -> Unicode table and BOM-sniff per string.
Then there’s /Dgn^Nm/$17, the named-entity stream that contains level names. Inside, renamed levels use UTF-16 LE - but with a non-standard FF FD marker instead of FF FE. Same purpose as a BOM. Different bytes. The kind of thing that takes a day to find and twenty seconds to fix once you’ve found it.
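A hedged sketch of the per-string sniffing, not the shipped code: the function names are hypothetical, the marker is passed in so the same routine can cover both FF FE (TEXT elements) and FF FD (level names), and the CP1250 table is abbreviated to the lowercase Czech letters listed above. Any other high byte falls back to U+FFFD here, where the real table has 128 entries.

```rust
/// Abbreviated CP1250 -> Unicode mapping: ASCII plus the lowercase Czech
/// letters named in the text. Anything else maps to U+FFFD in this sketch.
fn cp1250_to_char(b: u8) -> char {
    match b {
        0x00..=0x7f => b as char,
        0x9a => 'š',
        0x9e => 'ž',
        0xe1 => 'á',
        0xe8 => 'č',
        0xe9 => 'é',
        0xec => 'ě',
        0xed => 'í',
        0xf3 => 'ó',
        0xf8 => 'ř',
        0xf9 => 'ů',
        0xfa => 'ú',
        _ => '\u{fffd}',
    }
}

/// Decode little-endian UTF-16 bytes; a trailing odd byte is ignored.
fn decode_utf16_le(body: &[u8]) -> String {
    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    String::from_utf16_lossy(&units)
}

/// If the string starts with the given two-byte marker, treat the rest as
/// UTF-16 LE; otherwise decode every byte as CP1250.
fn decode_dgn_text(raw: &[u8], utf16_marker: [u8; 2]) -> String {
    match raw.strip_prefix(&utf16_marker) {
        Some(body) => decode_utf16_le(body),
        None => raw.iter().map(|&b| cp1250_to_char(b)).collect(),
    }
}
```

With this shape, the CP1250 bytes 53 65 64 6c 69 9a 74 ec decode to "Sedliště", and the level-name case is the same call with [0xff, 0xfd] as the marker.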
S-JTSK coordinates and float32 precision. Czech surveying uses the S-JTSK / Křovák East-North coordinate reference system (EPSG:5514). Coordinates are large negative numbers - X around -460000 to -900000, Y around -1000000 to -1300000. Three.js renders with float32 GPU buffers. Float32 has a 24-bit significand, roughly 7 significant decimal digits; at S-JTSK magnitudes the gap between adjacent representable values is about 3 to 12.5 cm, so you get precision in the centimeters to decimeters, not millimeters. Geometry jitters when you zoom in.
The fix is straightforward but easy to forget: compute a median-based offset at load time, translate every vertex into a local coordinate frame near origin, render the local frame, display the real coordinates in the status bar. The whole CRS abstraction lives in the rendering layer; the parser stays honest.
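A minimal sketch of that translation step, assuming 2D points held as f64 pairs. The names median and to_local_frame are illustrative, and the median here is simply the middle element of the sorted values, which is enough for picking an offset:

```rust
/// Middle element of the sorted values (upper median for even counts).
/// Returns 0.0 for empty input so the offset is a no-op.
fn median(values: &[f64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(f64::total_cmp);
    sorted[sorted.len() / 2]
}

/// Subtract a per-axis median offset in f64, then narrow to f32 for the GPU.
/// Returns the offset (kept in f64 for the status bar) and the local vertices.
fn to_local_frame(points: &[[f64; 2]]) -> ([f64; 2], Vec<[f32; 2]>) {
    let xs: Vec<f64> = points.iter().map(|p| p[0]).collect();
    let ys: Vec<f64> = points.iter().map(|p| p[1]).collect();
    let offset = [median(&xs), median(&ys)];
    let local = points
        .iter()
        .map(|p| [(p[0] - offset[0]) as f32, (p[1] - offset[1]) as f32])
        .collect();
    (offset, local)
}
```

The important detail is the order of operations: the subtraction happens in f64 before the cast. Casting a Y value like -1050020.44 straight to f32 snaps it to the nearest multiple of 0.125 m, while the same point expressed as a few tens of meters from the offset keeps sub-millimeter precision.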
Surveyor’s field-book overlay. Czech surveyors record GPS points in a text file alongside the CAD drawing - point ID, X, Y, Z, and a code that maps to ČÚZK symbology (the cadastral office’s standard symbol set). The text file uses positive coordinates; the matching DGN uses negative coordinates. Tradition. On overlay load, we detect the sign mismatch and flip the TXT.
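One way the sign check could look, as a sketch under our own assumptions rather than the shipped logic: compare the mean of each axis between the DGN geometry and the overlay points, and negate the overlay axis when the signs disagree. The function names are hypothetical.

```rust
/// Mean of one axis; an empty slice yields 0.0.
fn mean(points: &[[f64; 2]], axis: usize) -> f64 {
    points.iter().map(|p| p[axis]).sum::<f64>() / points.len().max(1) as f64
}

/// Negate any overlay axis whose mean has the opposite sign to the DGN's.
/// Assumption for this sketch: a sign mismatch is the only difference
/// between the two coordinate conventions.
fn match_signs(dgn: &[[f64; 2]], overlay: &mut [[f64; 2]]) {
    for axis in 0..2 {
        if mean(dgn, axis) * mean(overlay, axis) < 0.0 {
            for p in overlay.iter_mut() {
                p[axis] = -p[axis];
            }
        }
    }
}
```

Checking each axis independently keeps the routine a no-op for overlays that already use negative coordinates.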
These are the kinds of details that make a tool feel native to its users. They’re also exactly the kind of detail that the global version of this product would never get right.
The agentic workflow
Why this could be built in hours rather than months, in one sentence: the parser was the kind of work where an agent’s verification loop is dramatically more efficient than a human’s.
Concretely, here’s what the loop looked like:
- The agent reads the GDAL DGN driver source as reference for v7 element types and TCB layout.
- It writes a Rust function that walks v7 elements, returning a list of types and positions.
- It runs cargo test with a fixture file. Fails on element count mismatch.
- It dumps hex of the first failing offset, compares against the GDAL reference CSV.
- It identifies that the words_to_follow field is in 16-bit words (so total element size is 4 + words_to_follow * 2 bytes, not 4 + words_to_follow).
- Fixes the offset math. Runs the test. Passes.
- Moves to the next failure mode. Repeats.
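The element walk with the corrected size math can be sketched as follows. This is an illustration, not the generated parser: ElementRef and walk_v7 are hypothetical names, and the header bit layout (level in the low 6 bits of byte 0, type in the low 7 bits of byte 1, ff ff as an end marker) is an assumption based on our reading of dgnlib rather than something stated in this article.

```rust
/// Type and position of one v7 element (hypothetical struct for this sketch).
#[derive(Debug, PartialEq)]
struct ElementRef {
    offset: usize,
    elem_type: u8,
    level: u8,
    size: usize,
}

/// Walk a flat v7 element stream and list each element's type and position.
fn walk_v7(bytes: &[u8]) -> Vec<ElementRef> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos + 4 <= bytes.len() {
        // Assumed end-of-file marker (per dgnlib): a header starting ff ff.
        if bytes[pos] == 0xff && bytes[pos + 1] == 0xff {
            break;
        }
        let level = bytes[pos] & 0x3f; // assumed: low 6 bits of byte 0
        let elem_type = bytes[pos + 1] & 0x7f; // assumed: low 7 bits of byte 1
        let words_to_follow = u16::from_le_bytes([bytes[pos + 2], bytes[pos + 3]]) as usize;
        // The fix from the loop above: the field counts 16-bit words, not bytes.
        let size = 4 + words_to_follow * 2;
        if pos + size > bytes.len() {
            break; // truncated element: stop rather than read past the end
        }
        out.push(ElementRef { offset: pos, elem_type, level, size });
        pos += size;
    }
    out
}
```

With the buggy 4 + words_to_follow formula, the cursor would land in the middle of the first payload and every later element would be misread, which is exactly the element count mismatch the test reported.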
A human doing the same loop would burn an hour per round-trip: open hex editor, find offset, count bytes, edit code, recompile, re-run. An agent does the same loop in minutes. Across the dozens of small fixes a binary parser needs, that gap compounds into the difference between an afternoon and a quarter.
The human’s job in this loop is at the goal level:
- “Parse v7 element types LINE, LINE_STRING, SHAPE, ARC, ELLIPSE, TEXT, CELL_HEADER for now. Skip the rest.”
- “Files of 60_Sedliste_PV.dgn shape are ground truth. Compare against gdalinfo output.”
- “Performance target: parse a 10 MB DGN file in under 200ms on a 2020 MacBook Pro.”
And reviewing the artifact:
- Read the smoketest output for sanity.
- Render a real file in the browser and look at it.
- Check that the layer panel shows real layer names rather than Level 0..21.
We don’t review every line of generated Rust. We review the artifact: does it parse the file correctly? That review takes minutes and catches the actual errors.
What this is not: handing the agent a one-line prompt and getting back a parser. We provided the GDAL source, the dgnlib reference, our priority list of element types, and our fixture files. We reviewed the smoketest output. We caught the alignment bug ourselves and pointed the agent at the failure. We made the architectural call to compile to WASM rather than to JS.
The compression came from the parts in between - the dozens of small fix-and-verify loops that would have eaten our week. Those happened in agent-minutes.
The architecture, post-hoc
The viewer that wraps the parser is straightforward by comparison:
- Vite + React + TypeScript for the UI. Single-page app, no router, no server.
- Three.js for the rendering layer. 2D top-down view is an orthographic camera looking down the Z axis; 3D orbit is the same scene with a perspective camera and orbit controls. Same geometry, different camera.
- Rust -> wasm-bindgen -> npm package for dgn-wasm. Compiled as a local package, imported as import init, { parse_dgn } from 'dgn-wasm'. The WASM module is ~180 KB compressed.
- dxf-parser (existing open-source) for DXF. We didn’t rewrite something that already worked.
- Native browser parsers for SVG, GeoJSON, and KML. Each is straightforward.
- No backend. No build-time API keys. No telemetry beyond the analytics script the operator chooses (CADpeek itself can ship with Plausible, Fathom, Umami, or nothing - set in a single config file).
The build pipeline: cargo build --release produces a .wasm binary that gets bundled by Vite. No special toolchain to run locally. npm run dev works on any developer’s machine.
Deployment is a static-hosting drop. The current production deployment runs as a Docker container under Dokploy, but dist/ is plain HTML/JS/WASM and works on Cloudflare Pages, Netlify, Vercel, GitHub Pages, or any nginx with the right cache headers.
What we left for V2
The honest accounting of what doesn’t work yet:
- DGN cell libraries. Czech-survey DGN files reference external .cel files for the surveyor’s symbol library (PUB_OBRAZY_PARCEL_VIEW). We place the cells at the right position; we draw them as generic dots because the actual cell geometry is in a file not embedded in the DGN. The fix is to bundle the ČÚZK template .cel files into CADpeek - pending license clearance from the cadastral office.
- DWG. Pre-convert via ODA File Converter, then load the converted DXF. CADpeek’s UI links to ODA. This is the right answer until someone WASM-ports the GDAL DWG driver, which is a separate (larger) project.
- B-spline rendering fidelity. We render B-splines as polylines through the control poles - visually close enough for most viewing tasks, not mathematically correct. Real evaluation requires implementing De Boor’s algorithm, which is a future iteration.
- Mobile UI. The current UI works on desktop and tablet. Phones get a degraded experience.
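For the B-spline item, this is roughly what the missing evaluation step looks like. It is a textbook De Boor sketch for 2D control points, not CADpeek code; the function name and argument layout are ours, and it assumes a non-decreasing knot vector of length ctrl.len() + degree + 1 with t inside the valid parameter range.

```rust
/// Evaluate a B-spline of the given degree at parameter t (De Boor's algorithm).
/// Assumes knots.len() == ctrl.len() + degree + 1 and
/// knots[degree] <= t <= knots[ctrl.len()].
fn de_boor(degree: usize, knots: &[f64], ctrl: &[[f64; 2]], t: f64) -> [f64; 2] {
    // Find the knot span k with knots[k] <= t < knots[k + 1], clamped so that
    // t equal to the last knot still maps to the final span.
    let mut k = degree;
    while k + 1 < ctrl.len() && t >= knots[k + 1] {
        k += 1;
    }
    // Copy the degree + 1 control points that influence this span.
    let mut d: Vec<[f64; 2]> = (0..=degree).map(|j| ctrl[j + k - degree]).collect();
    // Repeated linear interpolation, updating in place from the top down.
    for r in 1..=degree {
        for j in (r..=degree).rev() {
            let i = j + k - degree;
            let denom = knots[i + degree + 1 - r] - knots[i];
            let alpha = if denom == 0.0 { 0.0 } else { (t - knots[i]) / denom };
            for c in 0..2 {
                d[j][c] = (1.0 - alpha) * d[j - 1][c] + alpha * d[j][c];
            }
        }
    }
    d[degree]
}
```

Sampling this at a fixed number of parameter values per span and connecting the samples would give a polyline that follows the curve itself rather than the control poles.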
The roadmap publicly lists these so users know what’s coming. We don’t pretend more is shipped than is.
What it cost
The tangible costs:
- Time: hours rather than months. Genuinely hours - the first version that opened a Czech-survey DGN file and rendered it in a browser was running by the end of the day.
- Money: the domain, a Dokploy app slot. The agentic workflow is not free, but it’s an order of magnitude cheaper than the engineering time it replaces.
- Open-source debt: we owe references back to GDAL and dgnlib in our NOTICE file. We don’t redistribute their code; we cross-referenced their behavior, which is fair use, and we credit them anyway.
The intangible:
- Permission to talk publicly about our agentic-workflow practice. This is the part that mattered for the studio. CADpeek is the public artifact that lets us tell consulting clients: here is what an hours-not-months project looks like; here is the code; here is what the agent did and what we did. Without it, “we work in an AI-agentic way” is marketing. With it, it’s a reference.
What’s next
The immediate path: extend coverage on real customer files. We get bug reports - a file that crashes, a layer that renders the wrong color, a text element with garbled Czech diacritics - and turn them into fixes. The DGN format doc in our public repo (see docs/dgn-format.md) accumulates every weird thing we find. That doc is itself the kind of artifact most CAD-format projects never publish, and it’s the part of the project we’re proudest of.
The medium path: V2 of the DGN parser is a WASM port of the GDAL DGN driver - MIT-compatible - to get coverage of every DGN dialect rather than just Czech survey. That’s a bigger project than the original parser, and it’s exactly the kind of work the agentic workflow is built for.
The big-picture path: CADpeek is one tool in what we want to be a small portfolio of free, niche, privacy-preserving viewers and editors for vendor-locked formats. If you’re stuck on the wrong side of one - proprietary format you’d rather not pay $50k/year to access - that’s the kind of email we read.