The check is in the maize

Did I get your attention?

I am working on a new keyboard for our language, and this will be the first generation with predictive suggestions. We are now in late 2023, and out there AI is running wild. I see the concept of “lexical model” and get excited, but the documentation for Keyman and KAB seems to be all about word lists only.

There is one page in the Keyman help where they give the example of “on my w…” and talk about options like “way”, “website” and “whole”. My brain brings up phrases like “on my watch”. I guess that Keyman would bring up “on my waffles”, purely based on single-word frequencies, which would be nice in a part of the world where people write a lot about waffle toppings.

We have a text corpus (not world-record level, but substantial), and our language uses a lot of such “set phrases”, where an “unlikely candidate” (by total word count) would come up as “the best candidate” if a tool considered even a humble context of two previous words, as in the example of “on my way”.
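To make the difference concrete, here is a minimal sketch that ranks candidates for “w…” once by plain word count and once by the two preceding words. The toy corpus and the function names `suggest_by_frequency` and `suggest_with_context` are invented for illustration; this is not how Keyman ranks suggestions.

```python
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = (
    "i put syrup on my waffles . "
    "waffles waffles waffles are great . "
    "i am on my way . she is on my way . we are on my way home ."
)
tokens = corpus.split()

unigrams = Counter(tokens)
trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))


def suggest_by_frequency(prefix, k=3):
    """Rank candidates by overall word count only (hypothetical helper)."""
    candidates = [(w, c) for w, c in unigrams.items() if w.startswith(prefix)]
    ranked = sorted(candidates, key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in ranked[:k]]


def suggest_with_context(prev2, prev1, prefix, k=3):
    """Rank candidates by how often they followed the two previous words."""
    candidates = [
        (w3, c)
        for (w1, w2, w3), c in trigrams.items()
        if (w1, w2) == (prev2, prev1) and w3.startswith(prefix)
    ]
    ranked = sorted(candidates, key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in ranked[:k]]


print(suggest_by_frequency("w"))              # "waffles" ranks first
print(suggest_with_context("on", "my", "w"))  # "way" ranks first
```

In this toy data “waffles” is the most frequent word starting with “w”, yet after “on my” the word “way” wins, which is the effect described above.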

This would give us an “educated list” drawn from, theoretically, say 3.375.000.000.000 possible three-word combinations. Sounds frightening at first glance, but we would cull that list to keep only entries that show 10 or more actual occurrences in real-world texts. It should take my notebook a few hours to prepare such a list, I guess? I believe there would be no need to ever actually generate the entire list or hold it in memory; just crawl through the text corpus and grab what is actually there.
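A minimal sketch of such a crawl, under some assumptions: the corpus is plain UTF-8 text read line by line, words are found with a simple `\w+` pattern (which will not suit every orthography, for example it splits at apostrophes and combining marks), and three-word sequences do not run across line breaks. The names `count_trigrams` and `cull` are made up for this example.

```python
import re
from collections import Counter, deque

# Assumption: a simple Unicode-aware word pattern is good enough for a first pass.
WORD_RE = re.compile(r"\w+")


def count_trigrams(lines):
    """Count only the three-word sequences that actually occur in the text."""
    counts = Counter()
    for line in lines:
        window = deque(maxlen=3)  # sliding window, reset at each line
        for word in WORD_RE.findall(line.lower()):
            window.append(word)
            if len(window) == 3:
                counts[tuple(window)] += 1
    return counts


def cull(counts, min_count=10):
    """Keep only the sequences seen at least min_count times."""
    return {gram: n for gram, n in counts.items() if n >= min_count}


# Tiny invented demo; for a real corpus, pass an open file instead, e.g.
#   with open("corpus.txt", encoding="utf-8") as f:   # hypothetical file name
#       frequent = cull(count_trigrams(f))
demo_lines = ["I am on my way"] * 10 + ["on my waffles"]
frequent = cull(count_trigrams(demo_lines))
print(frequent)
```

Only sequences that were actually seen end up in the counter, so memory grows with the corpus and not with the theoretical number of combinations.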

Those who know me have now guessed that this is bait for discussion, and ultimately a feature request. In some other thread today, I wrote that I do not like “automatic” so much, because it often gets it wrong. But in the context of a keyboard, the more good data we developers provide, the better the output or response would feel to the users. I guess this needs a plug-in for Fieldworks, and maybe some student could have a go at the coding and do at least a feasibility stud? (pun indented)

For those who made it this far:
We have one specific question about the present state of things (using Keyman Developer 16.0.144): in our word list, as exported (and cleaned up) from Fieldworks, we have a few handfuls of multi-word phrases with spaces. This must be because one of our team members with know-how tagged certain common phrases (which might make more or different “sense” semantically than the individual words used).
So how do we handle those when preparing a lexical model for Keyman? Can the present system handle multi-word entries in the wordlist? Or do we have to split or delete such entries?
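
Whatever the answer turns out to be, the entries in question can at least be pulled out for review first. A minimal sketch, assuming the wordlist is tab-separated with the word form in the first column and that lines starting with `#` are comments (both are assumptions about the exported file); `partition_wordlist` is a made-up helper name and says nothing about what Keyman itself accepts.

```python
import csv
import io


def partition_wordlist(tsv_text):
    """Split wordlist rows into single-word and multi-word entries."""
    single, multi = [], []
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        # Skip blank lines and (assumed) comment lines.
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        entry = row[0].strip()
        # An inner space marks a multi-word phrase.
        (multi if " " in entry else single).append(row)
    return single, multi


# Invented sample data in the assumed format: word form, then count.
sample = "way\t120\non my way\t15\n# a comment line\nwaffles\t3\n"
single, multi = partition_wordlist(sample)
print([row[0] for row in multi])
```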
