Commit graph

308 commits

Author SHA1 Message Date
Eric House
047f68aafd version with checksum and note 2012-09-08 13:18:58 -07:00
Eric House
ca88c9850e add DICTNOTE 2012-09-08 13:17:49 -07:00
Eric House
977bee15d9 add DICTNOTEs 2012-09-08 10:10:17 -07:00
Eric House
8e58d8c1c0 print header elems, including md5sum, if present rather than just
skipping.
2012-09-07 20:34:55 -07:00
Eric House
0b81516682 add md5sum to dict header, summing not the whole file but the parts
that make the wordlist unique: tile counts and values, and bitmaps,
and the data.  This happens to be contiguous data on non-palm .xwd
files so it's easy to duplicate if the sum isn't there.
2012-09-07 20:32:10 -07:00
Eric House
568bef7ac3 add DICTNOTE 2012-08-30 07:08:55 -07:00
Eric House
a0e8b6c076 add description 2012-08-26 21:36:00 -07:00
Eric House
b29df8512a add null-terminated note to dawg header and modify linux client to
accept it if present.  Android client will successfully ignore it and
will need to be modified to capture and display it if present.  Idea's
to display information about copyright, source, etc. of wordlists.
2012-08-25 10:20:52 -07:00
Eric House
9185ec71ca use newest Catalan wordlist 2012-07-26 21:14:40 -07:00
Eric House
baa8c7472d include stylesheet in generated index.html 2012-05-30 22:16:43 -07:00
Eric House
fd7a25ba3c makefile for just-released DISC2 wordlist for Catalan 2012-05-23 20:04:11 -07:00
Eric House
07e93971d3 makefile for latest CSW 2012-01-17 18:19:57 -08:00
Eric House
cfa4c96d22 just for grins: Japanese dict-building files. There are too many kana
for the current format so this can only be for demos, but I might as
well record it.
2011-08-29 20:42:27 -07:00
Andy2
332767105c express size in K (rounding up) 2011-05-15 07:37:29 -07:00
Andy2
7ccacdc26d switch size and wordcount columns 2011-05-15 07:28:10 -07:00
Andy2
deeb2f3cba fix compile-command 2011-04-29 06:24:41 -07:00
Eric House
1ab5aa02b9 Makefile for new dict containing 4288 words: good for the robot. 2011-04-14 22:09:44 -07:00
Andy2
4272686034 Makefile for new smaller Dutch wordlist 2011-04-08 22:13:31 -07:00
Andy2
ce61427bba generate md5 sum file optionally. Later I'll want to download these
to check that the file arrived safely.
2011-03-02 19:00:25 -08:00
Eric House
beaa7ba5a5 assume dict is utf8-encoded but check and fail if it isn't 2011-02-08 20:57:41 -08:00
Eric House
481a533e58 ignore uncompressed dicts too 2011-01-24 22:21:44 -08:00
Eric House
c7b6d799f0 switch to utf8 2011-01-07 18:05:57 -08:00
Andy2
5459631c76 No need for empty .dict when creating empty .dict.gz 2011-01-06 18:20:56 -08:00
Andy2
6f2cde1304 create an index at the top of page; indent dict lines; drop ".xwd" 2011-01-06 18:09:10 -08:00
Andy2
2cc46d8a69 get rid of unused but oft-included file 2010-12-17 19:02:01 -08:00
Andy2
0ee156c9f0 add empty: case for WINCE type too 2010-12-17 18:55:44 -08:00
Andy2
c0bec75fd8 fix crash when input wordlist is empty by not counting zero-length
word as a word.
2010-12-17 18:55:25 -08:00
Andy2
c5e0955460 simplify build rule 2010-12-17 17:39:33 -08:00
Andy2
7e46163988 add counts and values -- from Wikipedia article, as are Arabic and
Turkish files just checked in.
2010-12-17 17:38:47 -08:00
Andy2
18f8b0d4e4 switch to utf-8, adding an iconv call to translate the wordlists. 2010-12-17 17:37:57 -08:00
Andy2
32fccca995 Turkish. As with Arabic, untested. 2010-12-17 17:36:38 -08:00
Andy2
71559e27c6 add Arabic. I have no wordlist but this should still allow play
between humans, even over the net.  Untested, though, as my phone
doesn't have any Arabic glyphs.
2010-12-17 17:36:03 -08:00
Andy2
d1605c4493 fix: convert to utf8 and replace grep that didn't work (presumably
because ranges have different meanings in utf-8) with one that does.
2010-12-13 20:39:04 -08:00
Andy2
d78584fddf remove obsolete, pre-utf8 files 2010-12-13 20:09:26 -08:00
Andy2
bb0a79914b add conversion from ISO88591 since the default dict's in that format. 2010-12-13 20:09:09 -08:00
Andy2
dc807c948a use sed instead of tr since as with Slovak a letter was getting
dropped.  Same one in fact.
2010-12-13 19:58:37 -08:00
Andy2
299c84bb2b use sed rather than tr to uppercase letters. tr was dropping the Á
letter for some reason.  The sed feature I'm using is a gnu extension
but has the advantage of working.  Should probably do this for all
languages and in the info files.
2010-12-13 18:16:22 -08:00
Eric House
894afdc0cb take words up to 15 letters long. This makes no difference with any
dict I've tried as there just aren't any words over 7 letters long
made up of only a-f.
2010-12-12 20:02:28 -08:00
Eric House
e8e0b25fad go back to old dict -- correcting a change I didn't mean to check in. 2010-12-12 20:01:33 -08:00
Eric House
9c5b2c9f4f add for current French list 2010-12-09 21:22:37 -08:00
Eric House
98456dd652 fix to build dicts, wince/android format by default 2010-12-09 21:22:14 -08:00
Eric House
6b58c9031f script to build html page for downloading dicts 2010-12-09 21:21:41 -08:00
Andy2
39b40a9a3d build with a header giving word count 2010-12-06 18:31:12 -08:00
Andy2
12508b7cd5 cleanup stderr output 2010-12-06 07:23:22 -08:00
Andy2
0072112b5a fix syntax for including newheader so only one gets included. Fixes
bug building multiple dicts where headers would accumulate.
2010-12-06 07:23:05 -08:00
Eric House
c4cdc24b78 initial changes to add a header to xwd format so that stuff like
number of words can be included.  Changed to build dicts and linux to
open them.  Android still needs to learn.  Also, some of the tools in
dawg/ need to be fixed to read old-format (pre-utf8) .xwd files.
2010-12-05 19:33:10 -08:00
Eric House
eff2324950 fix compile command 2010-12-05 19:30:00 -08:00
Eric House
bef1e125bf ignore .pdb files 2010-12-05 19:29:15 -08:00
Andy2
e89feb62d8 second part of manual merge of unicode_branch's dawg/ directory into
this one.  This adds the directories and their files created inside
dawg.
2010-11-30 18:38:05 -08:00
Andy2
79990bc7b1 first set of changes formed by applying diff of android_branch's
dawg/ directory against unicode_branch's.  The two branches seem to
have no common ancestor -- probably didn't survive translation from
svn -- so this is the best I can do.

This checkin is all the files that were modified by the patch plus a
couple of simple additions.  Next I'll be adding directories that the
patch created.  It also reintroduced a bunch of .cvsignore files; I
won't check those in.
2010-11-30 18:35:11 -08:00
Eric House
2a2f4d4395 been a while since cvs... 2010-11-09 05:53:49 -08:00
Eric House
3716218a1d ignore files in dawg/ 2010-07-07 23:18:14 -07:00
Eric House
48946996b8 ignore file in dawg/ 2010-07-07 23:17:13 -07:00
ehouse
8dca48b3ea Useful ftell, commented out. 2009-03-29 18:13:09 +00:00
ehouse
9e5b3f8f29 Changes to fix BYOD (though still need native speaker confirmation) 2009-03-14 22:33:53 +00:00
ehouse
690bf80b7b Fix so can build iso-8859-2 Polish dicts using make (won't work on
BYOD yet): add encoding to emacs mode line and fix the letters,
including hard-coding them as decimal numbers until I can figure out
how to get perl (in xloc.pm) to emit iso-8859-2 instead of utf8.
2009-03-14 19:27:29 +00:00
ehouse
0b0bf96cd5 accept ISO-8859-2; remove unused param; add assert that EOF/EOL aren't
part of a multibyte char
2009-03-14 19:22:15 +00:00
ehouse
b16a07d0ba build dict2dawg with debug symbols 2009-03-14 19:21:09 +00:00
ehouse
b9dce19a93 if setlocale doesn't work, try again with en_US -- works around
problem on my ISP.
2009-01-28 03:32:21 +00:00
ehouse
b7fa674c28 Set locale based on params passed in, only on ENV if not specified. 2009-01-25 20:13:36 +00:00
ehouse
90f8a276e1 Cleanup to run on a machine that's utf8: specify iso-8859-1 when needed. 2009-01-25 18:57:05 +00:00
ehouse
f6d8924593 make tarball ready to be dropped into byod 2009-01-25 18:48:29 +00:00
ehouse
b2dd3f02b0 Need to escape period in grep pattern to get literal dot! 2009-01-22 04:30:35 +00:00
ehouse
24622876bb change default dictionary 2009-01-21 05:36:43 +00:00
ehouse
c2f1ff3d06 smartphone-size small bitmaps 2009-01-21 05:25:43 +00:00
ehouse
f422305542 Make smaller bitmaps 8x8 since that's the smallest size that can be
required and StretchBlt to smaller can't work for letters.
2009-01-18 18:25:33 +00:00
ehouse
702940fe06 Tweaks to bitmaps; build for wince by default 2009-01-17 18:39:08 +00:00
ehouse
a56d84b64d add emacs mode line 2009-01-14 13:41:25 +00:00
ehouse
41ae10f8b6 Allow language Makefile to specify encoding. Pass to perl and c++
dict builders, using it to open files and to determine whether to do
multi-to-wide conversion.
2009-01-13 13:32:07 +00:00
ehouse
7b8e4e0fd3 Add target to build all languages. Stops on Swedish at the moment. 2009-01-13 13:19:15 +00:00
ehouse
4e619601c2 To support Catalan, add Makefile and bitmaps for three special tiles.
The first of these, L-high-dot-L, requires Unicode to be properly
drawn, but the current dict format doesn't support it so it'll be L-L
for now.  Bitmaps are still rough.
2009-01-13 13:17:58 +00:00
ehouse
eb1e667c17 Add type Letter to represent what are Tiles in Crosswords:
lang-independent indices into the set of letters in use.  Should be no
change in functionality or code generated.
2009-01-07 05:13:45 +00:00
ehouse
948981434b Fix compiler warnings. Should be no change in generated code. 2009-01-07 05:03:13 +00:00
ehouse
96d9baaac1 Compress user-visible name so more likely to fit on-device widgets 2008-10-29 08:47:12 +00:00
ehouse
a8fb37504d Don't choke when words are longer than 15 letters. 2008-10-08 04:37:44 +00:00
ehouse
564b827f6d Make new FAA 4.1 the default Spanish dictionary source; build three
dicts (8, 9 and 15) by default (all: target).
2008-09-18 03:55:04 +00:00
ehouse
ac03c4be61 Fix to compile with newer g++; increase size of buffer to handle largest Spanish wordlist. 2008-09-18 03:44:43 +00:00
ehouse
df52f6a47a Accept words that contain no vowels. 2008-07-12 19:37:27 +00:00
ehouse
8e25e205fc update in accordance with current Dutch practice (says an informant) 2008-07-10 03:13:33 +00:00
ehouse
5fd535d853 Break Czech into two "languages" as a way to support the two encodings in common use. 2008-03-19 04:47:03 +00:00
ehouse
dda5042690 Remove windows LF chars just in case; take SOURCEDICT via cmdline; add emacs modeline. 2008-03-15 15:00:46 +00:00
ehouse
a028b34a11 Compile dict2dawg by default since dict2dawg.pl has problems; fix warnings. 2008-03-15 14:52:23 +00:00
ehouse
9d0231a8b7 line column heads up correctly again 2008-02-23 22:00:40 +00:00
ehouse
907838591e Fix to work with BYOD: pass -r rather than use grep to pull illegal words; fix language code; include charset. 2008-02-23 21:59:38 +00:00
ehouse
f0b53fd605 First cut at handling Czech. Correspondent says the Palm dict looks right. Still need to test on Windows and on BYOD. 2008-02-20 03:50:32 +00:00
ehouse
ab73fc4d38 cleanup; add lineno so number of letters is apparent 2008-02-20 03:44:31 +00:00
ehouse
22909ce6fb add target for dict2dawg 2008-01-02 01:44:12 +00:00
ehouse
5457ea1b59 replace all __FUNCTION__ with __func__ 2007-12-02 19:13:25 +00:00
ehouse
a2f60cb1f8 Makefile for Collins dict 2007-05-26 14:47:46 +00:00
ehouse
ca7c69bff1 include Makefile.langcommon 2007-04-14 16:03:31 +00:00
ehouse
5d867cd81c Target to build tarball for uploading to byod. 2007-02-20 07:24:18 +00:00
ehouse
8faacfcede Fix to work with new byod scheme. 2007-02-20 05:49:57 +00:00
ehouse
b2ed436b74 Add support for Russian. So that Russian text can be processed on systems without setting LANG=ru_RU.CP1251, modify dict2dawg to skip duplicates and words outside of specified lengths. Modify all info.txt files for the new scheme (which includes change to byod.cgi not kept on sourceforge.) 2007-02-17 17:06:05 +00:00
ehouse
3b7e680f2c increment internal tile values by one so strings can be null-terminated 2007-02-14 15:17:00 +00:00
ehouse
599b43ab78 Y counts as a vowel when removing non-words. 2007-01-30 04:53:32 +00:00
ehouse
5b8e0e89d3 remove duplicates as part of sort process 2007-01-06 04:43:22 +00:00
ehouse
a2840b42ac Change LANG to XWLANG to avoid conflict with ENV variable. 2006-08-11 01:44:08 +00:00
ehouse
b5164aa0c5 hide dict files -- playing with svn:ignore 2006-07-29 21:36:24 +00:00
ehouse
44a4dab13a Cleanup prior to adding Swedish to BYOD. 2006-07-22 16:05:45 +00:00
ehouse
d43acd6b46 check for remaining memory being < 0, not just <=, since we allocate exactly as much as we need. Fixes failure due to being out of memory at same time as having finished parsing stdin. 2006-07-22 16:03:14 +00:00
ehouse
3fe3e05548 default dict now gzipped (no real change) 2006-07-01 14:13:29 +00:00
ehouse
5fb3705535 don't cast size to a char! 2006-06-28 14:11:46 +00:00
ehouse
de20e83bdb A couple of tweaks so it works on byod with sample wordlist. 2006-06-28 03:38:42 +00:00
ehouse
5ebbf3f4d0 Support for Portuguese based on info from user in Brazil 2006-06-28 03:08:22 +00:00
ehouse
99ba48ce3e add poolsize and fsize args to better warn users when dict is too big.
Later need to modify the build process to specify the size needed.
2006-05-02 13:28:07 +00:00
ehouse
653fdb6a7b Improve out-of-memory message; don't double-count words. 2006-05-01 14:00:06 +00:00
ehouse
22e6ddde2a Bring over from personal archive. I don't know if this works yet:
waiting for a wordlist.
2006-04-30 16:17:21 +00:00
ehouse
c0c5332098 add 'sort -u' to get rid of duplicates. All info files should have this.... 2006-04-30 15:15:28 +00:00
ehouse
328c96c617 fix filter to eliminate words with unused letters; catch up count of
'G' tiles with gtoal's list.
2006-04-30 14:52:43 +00:00
ehouse
0295579e32 More cleanup for Spanish dict building. Seems to work now. 2006-04-30 04:44:10 +00:00
ehouse
8124a01010 Cleanup for Spanish dict building: die when can't build correctly, and
do same for WINCE as for FRANK re: specials
2006-04-30 04:27:33 +00:00
ehouse
834c43e131 sort to get rid of duplicates and so sort inside dict2dawg won't be needed 2006-04-30 02:35:26 +00:00
ehouse
3a37c11970 check that this version number stuff works 2006-04-29 16:47:01 +00:00
ehouse
4493ed8482 attempt to print subversion revision number with -v option 2006-04-29 16:40:48 +00:00
ehouse
1d40eddbb5 exit if can't open table file; include assert for compile on sarge 2006-04-14 08:23:28 +00:00
ehouse
3df1e461e4 For already-sorted case, read words from file on as-needed basis rather
than build a vector to hold them.
2006-04-14 05:23:30 +00:00
ehouse
8f909cd3a7 Use new compiled dict2dawg when present. 2006-04-13 15:30:15 +00:00
ehouse
b70bee3d53 A final bit of cleanup. All the perl is gone. 2006-04-13 04:04:03 +00:00
ehouse
d6dc4bf30c Cleanup: remove dead code. 2006-04-13 03:58:54 +00:00
ehouse
131d4c9bd4 Use a single huge buffer for all strings rather than calling malloc
for each.  Makes a measurable speed difference.
2006-04-13 03:52:48 +00:00
ehouse
08557184a5 debug: works now! Also ifdef out debug/verbose code. 2006-04-13 03:49:41 +00:00
ehouse
72532d72a8 print letter as well as tile in text dumps (same as cpp version) 2006-04-13 03:06:18 +00:00
ehouse
b89ed5b999 add -debug arg for parity with cpp version, and add -mn flag to usage(). 2006-04-13 02:58:39 +00:00
ehouse
0c7081bf36 Tons of changes continuing port from perl. Doesn't quite work yet, but close. 2006-04-13 02:57:43 +00:00
ehouse
2863379b9b Starting work on cpp version of dict2dawg.pl. This is nowhere near complete. 2006-04-12 04:39:49 +00:00
ehouse
6f9e7ed94c add an underbar to separate numerals 2006-03-18 03:35:20 +00:00
ehouse
162cb99c53 ignore .stamp files 2006-03-04 15:36:06 +00:00
ehouse
772c262b5e first checked in. works 2006-02-26 23:51:57 +00:00
ehouse
233479a959 get rid of null-termination and 'sort -z' since that option isn't on
new ISP's BSD sort.
2006-02-10 05:12:25 +00:00
ehouse
92485783af update email address in header comments: no code change 2006-01-08 01:25:02 +00:00
ehouse
5e6eca025a fix so hex dicts build again 2005-10-30 19:05:40 +00:00
ehouse
fb8d643ea2 replace sed with awk 2005-10-30 19:04:49 +00:00
ehouse
3b12c4df87 syntax error 2005-07-09 15:36:39 +00:00
ehouse
77374484f8 ditch words without vowels 2005-07-06 00:58:44 +00:00
ehouse
78aefbefea fix description at user's suggestion 2005-06-27 05:23:14 +00:00
ehouse
5e02ca1c86 first checked in. Seems to work. 2005-06-22 06:40:53 +00:00
ehouse
e2cbee1210 path to local copy of wordlist 2005-06-16 05:12:49 +00:00
ehouse
835f81582d fix typos 2005-06-11 15:32:34 +00:00
ehouse
69277eee0b first checkin for Danish 2005-06-11 15:32:09 +00:00
ehouse
cf206900fd add note about use of iso-8859-2 character encoding 2004-12-12 06:29:16 +00:00
ehouse
abd8356964 catch up with the current Polish values (says a correspondent) 2004-12-11 05:36:31 +00:00
ehouse
1d2c533094 add .xwd 2004-11-05 14:37:38 +00:00
ehouse
6ca65b261e first checked in 2004-11-05 14:26:25 +00:00
ehouse
b24837669d TARGET_TYPE to PALM; use env variable to locate input wordlist 2004-11-05 14:24:47 +00:00
ehouse
d890b0c892 TARGET_TYPE to PALM 2004-11-05 14:20:25 +00:00
ehouse
eb168cf1f2 add -raw option to dump DAWG in way useful for debugging engine and
dawgshow.
2004-10-15 04:01:22 +00:00
ehouse
b8fb47ea6c add .xwd and .pdb 2004-10-05 02:34:36 +00:00
ehouse
00b53c233a ignore dicts 2004-10-02 03:58:47 +00:00
ehouse
1586797fef remove tabs indenting lines between ifdef and endif. For some reason
this was blocking some assignments.
2004-10-02 03:48:14 +00:00
ehouse
f091ea2a53 just copied from ../English 2004-07-21 14:36:00 +00:00