Fast Thai Word Segmentation in Rust
kham is a batteries-included Thai NLP engine — zero external dependencies, a no_std core, and ready for Rust, WebAssembly, Python, C, PostgreSQL, and SQLite.
Everything you need for Thai NLP
A single library for the full pipeline — from raw text to structured tokens with semantic metadata.
Fast
Maximal Matching on a compressed DAWG dictionary. Outperforms PyThaiNLP's newmm in throughput while matching its accuracy.
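To give a feel for what maximal matching does, here is a self-contained sketch: it picks the segmentation with the fewest tokens via dynamic programming over a toy dictionary. This is illustrative only — kham's real implementation walks a compressed DAWG, not a `HashSet`, and handles unknown spans.

```rust
use std::collections::HashSet;

// Maximal matching: among all dictionary-covered segmentations,
// choose the one with the fewest tokens (DP over char positions).
fn maximal_match(text: &str, dict: &HashSet<&str>, max_len: usize) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let n = chars.len();
    // best[i] = (token count, start index of last token) for the prefix of length i
    let mut best: Vec<Option<(usize, usize)>> = vec![None; n + 1];
    best[0] = Some((0, 0));
    for i in 1..=n {
        for j in i.saturating_sub(max_len)..i {
            if let Some((count, _)) = best[j] {
                let word: String = chars[j..i].iter().collect();
                if dict.contains(word.as_str()) {
                    let cand = (count + 1, j);
                    if best[i].map_or(true, |(c, _)| cand.0 < c) {
                        best[i] = Some(cand);
                    }
                }
            }
        }
    }
    // Walk back through the DP table to recover the tokens.
    let mut tokens = Vec::new();
    let mut i = n;
    while i > 0 {
        let (_, j) = best[i].expect("text not fully covered by dictionary");
        tokens.push(chars[j..i].iter().collect());
        i = j;
    }
    tokens.reverse();
    tokens
}

fn main() {
    let dict: HashSet<&str> = ["กิน", "ข้าว", "กับ", "ปลา"].into();
    let tokens = maximal_match("กินข้าวกับปลา", &dict, 4);
    println!("{:?}", tokens); // ["กิน", "ข้าว", "กับ", "ปลา"]
}
```

A DAWG gives the same answers as this `HashSet` lookup, but lets the inner loop extend prefix matches incrementally instead of rebuilding candidate strings.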
Multi-target
One core, many targets: Rust crate, WebAssembly, Python (PyO3), C FFI, CLI, PostgreSQL FTS parser, SQLite FTS5 tokenizer.
no_std core
kham-core is pure Rust with no_std + alloc. Runs in embedded, WASM, and any environment without a standard library.
Full NLP pipeline
Segmentation, POS tagging, Named Entity Recognition, romanization (RTGS), phonetic codes (lk82 / udom83 / MetaSound), and number normalization.
Simple, zero-copy API
Segment Thai text into tokens with byte and Unicode char spans — suitable for search indexing, NLP pipelines, and binding to any language runtime.
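Why both byte and char spans? Thai characters occupy 3 bytes in UTF-8, so the two offset systems diverge immediately. The standalone sketch below computes both kinds of span for two pre-segmented tokens; the token handling is illustrative and does not use kham's actual token type.

```rust
// Byte vs. char spans over Thai text. Each Thai character is 3 bytes
// in UTF-8, so byte offsets grow three times faster than char offsets.
fn main() {
    let tokens = ["กิน", "ข้าว"]; // pretend these came from segmentation
    let mut byte_start = 0;
    let mut char_start = 0;
    for tok in tokens {
        let byte_end = byte_start + tok.len();            // len() counts bytes
        let char_end = char_start + tok.chars().count();  // chars() counts scalars
        println!("{tok}: bytes {byte_start}..{byte_end}, chars {char_start}..{char_end}");
        byte_start = byte_end;
        char_start = char_end;
    }
}
```

Byte spans slice the original `&str` without copying (useful for search indexing); char spans are what runtimes like Python and JavaScript expect.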
Getting started guide →

```rust
use kham_core::Segmenter;

let seg = Segmenter::new();
let tokens = seg.segment("กินข้าวกับปลา");
// ["กิน", "ข้าว", "กับ", "ปลา"]
```

Try it right now
Powered by WebAssembly — runs entirely in your browser, no server needed.
Ready to integrate?
Add kham to your Rust, Python, or Node.js project in minutes.