Commit graph

298 commits

Author SHA1 Message Date
Eric House
ef7c0965ba fix to build wordlist from current sources
I'd lost the old source, so uncompressed a current list to recreate.
2020-12-14 08:57:11 -08:00
Eric House
57ab42223a fix to print older wordlists with tiny headers 2020-12-14 08:57:11 -08:00
Eric House
b8f359c3e5 add filtering to wordlist browser
Add a basic regular expression engine to the dictiter, and to the UI add
the ability to filter for "starts with", "contains" and "ends with",
which translate into ANDed RE_*, _*RE_* and _*RE, respectively (with
_ standing for blank/wildcard). The engine's tightly integrated with the
next/prevWord() functions for greatest possible speed, but whenever a
pattern is set it does slow things down a bit (especially when "ENDS
WITH" is used). The full engine is not exposed (users can't provide raw
REs), and while the parser will accept nesting (e.g. ([AB]_*[CD]){2,5}
to mean words from 2-5 tiles long starting with A or B and ending with C
or D) the engine can't handle it. Which is why filtering for word length
is handled separately from REs (but also tightly integrated.)

Users can enter strings that don't map to tiles. They now get an
error. It made sense for the error alert to have a "Show tiles"
button, so there's now a dialog listing all the tiles in a wordlist,
something the browser has needed all along.
2020-08-05 09:47:44 -07:00
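
The mapping from the three filter fields to ANDed patterns, described in the commit above, can be sketched roughly as follows. This is only an illustration built on Python's standard `re` module, not the dictiter's engine: it works on characters rather than tiles, `.` stands in for the `_` wildcard, and the names `build_patterns` and `word_passes` are hypothetical.

```python
import re

def build_patterns(starts_with="", contains="", ends_with=""):
    """Return one pattern per non-empty filter field.

    Mirrors RE_*, _*RE_* and _*RE from the commit message, with '.'
    playing the role of the '_' wildcard. User text is escaped because
    raw REs are not exposed to users.
    """
    patterns = []
    if starts_with:
        patterns.append(re.compile(re.escape(starts_with) + ".*"))
    if contains:
        patterns.append(re.compile(".*" + re.escape(contains) + ".*"))
    if ends_with:
        patterns.append(re.compile(".*" + re.escape(ends_with)))
    return patterns

def word_passes(word, patterns, min_len=0, max_len=None):
    """A word passes only if the length check and ALL patterns succeed.

    Length is checked separately from the patterns, as in the commit
    (here in characters; the real code counts tiles).
    """
    if len(word) < min_len:
        return False
    if max_len is not None and len(word) > max_len:
        return False
    return all(p.fullmatch(word) for p in patterns)

patterns = build_patterns(starts_with="CA", contains="T", ends_with="S")
print(word_passes("CATS", patterns))   # all three patterns match
print(word_passes("CARS", patterns))   # fails the "contains T" pattern
```

Keeping one pattern per field and requiring all of them to match is the simplest way to get the AND behaviour without needing nested expressions.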
Eric House
7daf3313e0 fix to include optional info.txt info 2020-07-25 13:58:29 -07:00
Eric House
d98430aa0d improving prep of byod files 2020-07-25 13:58:29 -07:00
Eric House
6f6e5516c9 add LANGFILTER so byod can build Hungarian 2020-07-25 13:58:29 -07:00
Eric House
d2a997d0ee don't exit badly when piped 2020-07-25 13:58:29 -07:00
Eric House
67d91111df more tweaks for byod 2020-07-25 13:58:29 -07:00
Eric House
a75264c8eb tweaks for byod 2020-07-25 13:58:29 -07:00
Eric House
666d2db62a add Makefile as symlink 2020-07-25 13:58:29 -07:00
Eric House
83b775a52c convert two more perl scripts to python 2020-07-25 13:58:29 -07:00
Eric House
042e5e6eab remove files I'll never need again 2020-07-25 13:58:29 -07:00
Eric House
f30bc77a5f rewrite some dawg perl scripts in python 2020-07-25 13:58:29 -07:00
Eric House
db30cea947 update to work with uncompressed Portuguese source 2020-05-18 20:24:36 -07:00
Eric House
4c28013439 fix per informant's instructions to build from git src 2020-05-15 08:33:23 -07:00
Eric House
0e9661aa19 fix search of wordlists containing duplicates
Hungarian is unique (so far) in having two-letter tiles that can be
spelled with one-letter tiles AND in allowing words to be spelled both
ways. This crashed search based on strings because there were
duplicates. So now search is done by tile arrays. Strings are first
converted, and then IFF there is more than one tile array result AND the
wordlist has the new flag indicating that duplicates are possible, THEN
the user is asked to choose among the possible tile spellings of the
search string.
2020-05-04 08:33:15 -07:00
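
A minimal sketch of the string-to-tile-array conversion described in the commit above, assuming a hypothetical tile set in which CS exists both as one two-letter tile and as the two tiles C and S. The function names and the `allows_duplicates` flag are illustrative only and are not taken from the actual code.

```python
def tile_spellings(word, tiles):
    """Return every way to spell `word` as a sequence of tiles."""
    if not word:
        return [[]]          # one way to spell the empty string: no tiles
    results = []
    for tile in tiles:
        # Skip empty tiles so the recursion always consumes input.
        if tile and word.startswith(tile):
            for rest in tile_spellings(word[len(tile):], tiles):
                results.append([tile] + rest)
    return results

def needs_user_choice(spellings, allows_duplicates):
    """Ask the user only if there are several spellings AND the
    wordlist is flagged as possibly containing duplicates."""
    return allows_duplicates and len(spellings) > 1

# Hypothetical subset of a Hungarian-style tile set.
TILES = ["A", "C", "K", "S", "CS"]

spellings = tile_spellings("CSAK", TILES)
print(spellings)
print(needs_user_choice(spellings, allows_duplicates=True))
```

Searching by the chosen tile array rather than by the string removes the ambiguity, because the two spellings are distinct arrays even though they print identically.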
Eric House
851fa1a76e let's not change the Spanish wordlist name rashly 2020-05-03 21:28:33 -07:00
Eric House
67f74b3808 cleanup hungarian Makefile 2020-05-01 09:26:08 -07:00
Eric House
f1f6d5b642 change name of Spanish wordlist
"Spanish" is redundant
2020-05-01 08:59:58 -07:00
Eric House
dfbbf2d480 don't assert wordlist length wrong
For some reason the header and dawg data in Spanish wordlist don't
agree. Until I fix this, remove the assertion from the (dev-use-only)
script that dumps wordlist since it breaks it for other uses.
2020-05-01 08:59:58 -07:00
Eric House
f7374f54c5 fix Spanish support for lowercase
"special" casing is specified in two places, and I forgot to modify the
second one when I added allowing lowercase alternative spellings
2020-04-30 13:06:01 -07:00
Eric House
fb2fcf15cc tmp fix for Hungarian: remove duplicate words
Find-prefix feature in current code crashes on Hungarian because it
allows duplicates (words that occur spelled with the same letters but
different tile combinations.) Modify Makefile to exclude those (as it
does for all other multi-letter-tile languages). And to pull the git
source of the wordlist on demand.
2020-04-29 12:29:26 -07:00
Eric House
1c0348dbf1 add option to print a delimiter between tiles
For Hungarian, there are "duplicate" words because e.g. the string CS
can be spelled with two tiles or one. If a delimiter is printed at tile
boundaries the duplication goes away.
2020-04-24 21:14:20 -07:00
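
To illustrate why a delimiter makes the "duplicates" in the commit above disappear, here is a small sketch. The `.` delimiter and the function name `format_word` are assumptions for the example, not the option's actual syntax.

```python
def format_word(tiles, delimiter=""):
    """Join a word's tiles, optionally marking tile boundaries."""
    return delimiter.join(tiles)

one_tile_cs = ["CS", "A", "K"]        # CS played as a single tile
two_tile_cs = ["C", "S", "A", "K"]    # CS played as two tiles

# Without a delimiter the two spellings print identically.
print(format_word(one_tile_cs) == format_word(two_tile_cs))

# With a delimiter at tile boundaries they become distinct.
print(format_word(one_tile_cs, "."), format_word(two_tile_cs, "."))
```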
Eric House
adadbd8647 make symlink relative
Useless if it specifies my machine :-)
2020-04-24 20:09:08 -07:00
Eric House
cc4776d29d Populate an actual wordlist for Hungarian
Add Makefile filters to create a wordlist with about 42K words derived
from a github project (thanks to pointers from an informant. :-) Per
him, and contrary to how Catalan does it, double-letter-tile words
also appear in single-letter variants if the tiles allow.
2020-04-24 13:44:55 -07:00
Eric House
cc28562061 files to create empty Hungarian wordlist 2020-04-24 06:34:30 -07:00
Eric House
2204d951a7 don't crash dumping empty wordlists 2020-04-23 22:10:25 -07:00
Eric House
3c1a748272 fix dictionary sum checking server-side 2020-03-18 22:28:58 -07:00
Eric House
390753ae3a fix to correctly wrap L·L in 2x2 cell case 2020-01-31 18:57:12 -08:00
Eric House
d4d4693def makefile for new French wordlist 2019-12-28 09:01:10 -08:00
Eric House
1e7ae2b2ec fix lower->upper translation: tr didn't like my strings
For whatever reason, though emacs thought the lower- and uppercase
strings I was passing to tr were the same but for case, two letters were
getting dropped. This lets tr figure things out itself.
2019-11-19 22:27:41 -08:00
Eric House
58c3ab4e4a first cut at python version of dawg2dict
Perl version doesn't work and I don't remember enough of the language to
fix it.
2019-11-13 13:22:30 -08:00
Eric House
c5a8319fa8 fix comment 2019-11-13 12:32:06 -08:00
Eric House
611e046987 check for undefined variable 2019-11-12 09:08:08 -08:00
Eric House
56fc359f42 fix counts/values for 'D' (thanks Peter) 2019-11-03 18:12:24 -08:00
Eric House
a1a71df7c6 update Makefile for latest Catalan wordlist 2019-10-22 13:24:40 +02:00
Eric House
2a54c9ed20 fix Slovak makefile for new wordlist 2019-08-19 07:49:23 +03:00
Eric House
2e4d3a1276 add source of current (dinky) Slovak wordlist 2019-08-08 09:57:40 -07:00
Eric House
df14108e4e add lowercase equivalents
where missing and seems possible
2019-07-07 13:00:06 -07:00
Eric House
4abefb025c add lower-case letters as alternatives 2019-07-07 13:00:06 -07:00
Eric House
411707a3a1 fix NPE with empty wordlist (and add note for Greek) 2019-06-29 16:44:38 -07:00
Eric House
896d63bc48 build from new wordlist 2019-06-07 21:16:56 -07:00
Eric House
838d0e5cc2 Makefile for CSW19 ... ish 2019-05-22 19:24:44 -07:00
Eric House
2f264e36ca add makefile for new wordlist 2019-03-23 18:53:48 -07:00
Eric House
3cf8d7571b fix md5sum calc for non-utf8 wordlists
And use apache logging
2019-01-05 18:46:58 -08:00
Eric House
309622b592 put back a couple of words -- not dirty! 2017-05-05 06:48:52 -07:00
Eric House
8752432de3 add ability to filter out "dirty" words
If a Makefile defines a dirty word list then a new python script is
invoked to filter for and remove those words as the dict is being
built. So far I have one for English only, which makes sense because only
English wordlists are built-in on Android and Google's rating system
cares only about what's built in.
2017-05-04 22:45:27 -07:00
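
The filtering step in the commit above might look something like the following sketch. It is not the actual script: the function names are hypothetical, and case-insensitive comparison is an assumption.

```python
def load_dirty_words(lines):
    """Build a set of words to exclude, ignoring blank lines."""
    return {line.strip().upper() for line in lines if line.strip()}

def filter_dirty(words, dirty):
    """Yield only the words that are not in the dirty set.

    Comparison is case-insensitive (an assumption for this sketch);
    surviving words are passed through unchanged.
    """
    for word in words:
        if word.strip().upper() not in dirty:
            yield word

dirty = load_dirty_words(["foo\n", "\n", "Bar\n"])
clean = list(filter_dirty(["FOO\n", "BAZ\n", "bar\n"], dirty))
print(clean)
```

Written as a generator, the filter can sit in the middle of a build pipeline and process a large wordlist one line at a time.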
Eric House
c62c9899eb Hack: use sed to strip utf-8 marker from start of file. 2016-01-04 20:38:49 -08:00
Eric House
22dde029c8 Merge tag 'android_beta_100' into android_branch
ready for release
2016-01-03 11:36:37 -08:00
Eric House
8c26cf726a file for new French wordlist (not publicly available yet) 2016-01-01 19:32:32 -08:00