Healthcare Archives - Eukairos HealthData

A STS Annotation Tool for EHR Text

Apr 24, 2026

—

Semantic Textual Similarity (STS) evaluation is a standard way to measure how well an embedding model captures meaning. You present the model with pairs of sentences, compute the cosine similarity of their embeddings, and correlate those scores against human judgements. The correlation — Spearman’s ρ — tells you whether the model’s sense of “similar” matches…
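The evaluation loop described in this excerpt can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the post: the toy two-dimensional "embeddings", the human scores, and the helper names (`cosine`, `ranks`, `spearman`) are all assumptions made up for the example. A real run would use vectors produced by an embedding model and a library routine for the correlation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical toy embeddings for three sentence pairs (assumed data).
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),  # nearly the same direction
    ([1.0, 0.0], [1.0, 1.0]),  # 45 degrees apart
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal
]
# Hypothetical human similarity judgements for the same pairs (assumed data).
human_scores = [4.8, 2.5, 0.3]

model_scores = [cosine(u, v) for u, v in pairs]
rho = spearman(model_scores, human_scores)
print(f"Spearman rho = {rho:.3f}")
```

Because Spearman's ρ compares rankings rather than raw values, it does not matter that the cosine scores and the human scores live on different scales; only the ordering of the pairs has to agree.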

Teaching an Old Trick to a Newer, Smarter Dog

Mar 19, 2026

—

in AI, Data, Healthcare

A context-sensitive neural spell checker for clinical text, built on BioClinical-ModernBERT

Source code: github.com/eukairos/spellcheck • MIT License

The problem with spell-checking clinical notes

Clinical documentation is full of spelling errors. That is not a criticism of clinicians — it is a structural reality. Notes are written at speed, on shift, using a vocabulary that sits…

Building a Spell Screener for Clinical Text — And How You Can Adapt It for Any Domain

Mar 15, 2026

—

in AI, Data, Healthcare

Clinical notes are peculiarly messy. Written under time pressure by busy clinicians, they’re full of abbreviations, shorthand, and — inevitably — typos. When you’re building natural language processing (NLP) pipelines that depend on these notes, these ‘features’ become a real problem. This post describes a tool Anthropic’s Claude helped me build to tackle that problem,…

Adding Allergy Nodes to our MIMIC-IV Patient Graph

Mar 5, 2026

—

in Data, Healthcare

In our previous graph database exercise, we built a graph of MIMIC-IV patients, their admissions, and the diagnoses associated with each admission. In this exercise, we’ll load some of their allergies. The allergies documented in MIMIC-IV are not stored in structured fields but exist as free text inside clinical notes, which makes it a challenge to…

Topic Modelling 2: Latent Dirichlet Allocation

Feb 5, 2026

—

in Data, Healthcare

Latent Dirichlet Allocation (LDA) is a probabilistic topic modelling approach that treats documents as bags of words. Conceptually it is similar to Latent Semantic Analysis (LSA, discussed in the previous post) in that it tries to discover a latent space from observed variables, but instead of a deterministic matrix factorization, it uses probability distributions over random variables.…

Building a SNOMED Concept Graph

Jan 2, 2026

—

in Data, Healthcare

SNOMED Is A Knowledge Graph

The Systematized Nomenclature of Medicine, Clinical Terms (SNOMED CT) is a de facto standard for standardizing clinical vocabulary and ontology in many parts of the world, including in Singapore. You can access SNOMED CT in a number of ways. If you work in a large healthcare organization, it probably has…

Adjacent Possibles

Jan 1, 2026

—

in AI, Data, Healthcare

Welcome to Eukairos, a collection of musings at the confluence of artificial intelligence (AI), data management and healthcare. The term eukairos is derived from the Greek ευκαιρός, loosely meaning ‘timeliness’ or ‘opportunity’. The short explanation for the site’s name is that English-language domain names are pretty much saturated in the .sg domain. The more involved…
