
Fast Thai Word Segmentation in Rust

kham is a batteries-included Thai NLP engine — zero external dependencies, no_std core, and ready for Rust, WebAssembly, Python, C, PostgreSQL, and SQLite.

# Cargo.toml
kham-core = "0.5"

$ pip install kham
$ npm install kham-wasm

Everything you need for Thai NLP

A single library for the full pipeline — from raw text to structured tokens with semantic metadata.

Fast

Maximal matching over a compressed DAWG dictionary. Higher throughput than PyThaiNLP's newmm, at matching accuracy.
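To make the idea concrete, here is a minimal sketch of maximal matching: pick the segmentation with the fewest tokens via dynamic programming, falling back to single characters for out-of-vocabulary runs. It uses a plain `HashSet` where the real engine walks a compressed DAWG; the function and variable names are illustrative, not kham's API.

```rust
use std::collections::HashSet;

/// Maximal-matching sketch: among all segmentations whose words are in
/// the dictionary (with a 1-char fallback for unknown characters),
/// choose the one with the fewest tokens.
fn segment(text: &str, dict: &HashSet<&str>) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let n = chars.len();
    // best[i] = (min token count from position i to end, length of first token)
    let mut best: Vec<(usize, usize)> = vec![(usize::MAX, 0); n + 1];
    best[n] = (0, 0);
    for i in (0..n).rev() {
        for len in 1..=(n - i) {
            let cand: String = chars[i..i + len].iter().collect();
            // Dictionary word, or single-char fallback for unknown input.
            if (len == 1 || dict.contains(cand.as_str()))
                && best[i + len].0 != usize::MAX
                && best[i + len].0 + 1 < best[i].0
            {
                best[i] = (best[i + len].0 + 1, len);
            }
        }
    }
    // Walk the DP table forward to emit tokens.
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < n {
        let len = best[i].1;
        tokens.push(chars[i..i + len].iter().collect());
        i += len;
    }
    tokens
}
```

With a toy dictionary of `["กิน", "ข้าว", "กับ", "ปลา"]`, `segment("กินข้าวกับปลา", &dict)` yields the four-token split shown in the API example below. A production implementation replaces the O(n²) candidate scan with a single DAWG traversal per position, which is where the speed comes from.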

📦 Multi-target

One core, many targets: Rust crate, WebAssembly, Python (PyO3), C FFI, CLI, PostgreSQL FTS parser, SQLite FTS5 tokenizer.

🔒 no_std core

kham-core is pure Rust with no_std + alloc. Runs in embedded, WASM, and any environment without a standard library.
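If kham-core follows the common Rust pattern of gating std support behind a default feature, an embedded or WASM build would disable default features. The feature layout below is an assumption for illustration, not taken from the crate's documentation:

```toml
# Hypothetical no_std build (assumes kham-core gates std behind a
# default feature, the usual convention for no_std-capable crates).
[dependencies]
kham-core = { version = "0.5", default-features = false }
```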

🌐 Full NLP pipeline

Segmentation, POS tagging, Named Entity Recognition, romanization (RTGS), phonetic codes (lk82 / udom83 / MetaSound), number normalization.

Simple, zero-copy API

Segment Thai text into tokens with byte and Unicode char spans — suitable for search indexing, NLP pipelines, and binding to any language runtime.
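A span-carrying token might look like the sketch below; the struct and field names are illustrative assumptions, not kham's actual types. The point is why both offsets matter: Thai code points are 3 bytes in UTF-8, so byte spans (for slicing and FFI) and char spans (for editor/cursor positions) diverge.

```rust
/// Illustrative token shape: a zero-copy slice into the input plus
/// both byte and Unicode-char spans. Not kham's actual API.
#[derive(Debug, PartialEq)]
struct Token<'a> {
    text: &'a str,                // zero-copy slice into the input
    byte_range: (usize, usize),   // offsets in UTF-8 bytes
    char_range: (usize, usize),   // offsets in Unicode scalar values
}

/// Attach spans to an already-segmented word list (the words are
/// assumed to concatenate exactly back into `input`).
fn spans<'a>(input: &'a str, words: &[&str]) -> Vec<Token<'a>> {
    let (mut byte, mut ch) = (0, 0);
    let mut out = Vec::new();
    for w in words {
        let b_len = w.len();               // bytes
        let c_len = w.chars().count();     // chars
        out.push(Token {
            text: &input[byte..byte + b_len],
            byte_range: (byte, byte + b_len),
            char_range: (ch, ch + c_len),
        });
        byte += b_len;
        ch += c_len;
    }
    out
}
```

For `"กินข้าว"` split as `["กิน", "ข้าว"]`, the second token spans bytes 9..21 but chars 3..7 — exactly the mismatch a search index or highlighter has to account for.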

Getting started guide →
main.rs
use kham_core::Segmenter;

let seg = Segmenter::new();
let tokens = seg.segment("กินข้าวกับปลา");
// ["กิน", "ข้าว", "กับ", "ปลา"]

Try it right now

Powered by WebAssembly — runs entirely in your browser, no server needed.


Ready to integrate?

Add kham to your Rust, Python, or Node.js project in minutes.