xwords/xwords4/dawg/Spanish/info.txt

# -*- mode: conf; coding: utf-8; -*-
# Copyright 2002-2006 by Eric House (xwords@eehouse.org).  All rights
# reserved.
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

# no way can unix sort handle the control chars I'm adding to text
# below

LANGCODE:es
LANGNAME:Spanish
NEEDSSORT:true
CHARSET: utf-8


# MS-DOS CR chars go bye-bye
LANGFILTER: tr -d '\r'

# convert accented vowels
LANGFILTER: | tr '\207\216\222\227\234\237\226' 'aeiouu\321'
# uppercase
LANGFILTER: | sed -e 's/[[:lower:]]*/\U&/'
# remove words with illegal letters
LANGFILTER: | grep -x '[A-JL-VX-ZÑ]\{2,15\}'
# substitute pairs (can't figure out how to use octal values)
LANGFILTER: | sed 's/CH/1/g'
LANGFILTER: | sed 's/LL/2/g'
LANGFILTER: | sed 's/RR/3/g'
# substitute in the octal control character values
LANGFILTER: | tr '123' '\001\002\003'
# now add nulls as terminators
LANGFILTER: | tr -s '\n' '\000'
LANGFILTER: | sort -u -z

D2DARGS: -r -term 0

LANGINFO: <p>Spanish words include all letters in the English alphabet
LANGINFO: except "K" and "W", and with "Ñ" added. Since there are no
LANGINFO: tiles for accented vowels, these are replaced by the
LANGINFO: unaccented forms.</p>


LANGINFO: <p>In addition, there are three special two-letter tiles
LANGINFO: "CH", "LL" and "RR".  The rules say that the corresponding
LANGINFO: two single tiles may not be used where a two-letter tile is
LANGINFO: possible (e.g. if a word contains "CH" you must use the "CH"
LANGINFO: tile rather than a "C" tile followed by an "H" tile).  Thus
LANGINFO: we remove all of these pairs from your wordlist and replace
LANGINFO: them with the appropriate two-letter "letter". </p>


LANGCODE:es_ES

# I think dealing with "specials" goes like this.  In the {} pairs
# below, if the first string is followed by other strings (one or two)
# they are assumed to be filenames.  The filenames will need to be
# found, and converted into binary files appropriate for the platform
# by rules given somewhere -- here?  No, since they're the same for
# all platforms.  Just put 'em in the byod.cgi file for now.

# It'll be assumed that the first name is for the "small" bitmap, and
# the second for the "large". It's ok for a file not to exist; it'll
# just be ignored.  In the unlikely case that you wanted to specify
# the large but not the small this is what you'd need to do.

# High bit means "official".  Next 7 bits are an enum where
# Spanish==6.  Low byte is padding
XLOC_HEADER:0x8600

<BEGIN_TILES>
{"_"}							0	2
'A|a'							1	12
'B|b'							3	2
'C|c'							3	4
{"CH|ch|Ch|cH",true,true}		5	1
'D|d'							2	5
'E|e'							1	12
'F|f'							4	1
'G|g'							2	2
'H|h'							4	2
'I|i'							1	6
'J|j'							8	1
'L|l'							1	4
{"LL|ll|Ll|lL",true,true}		8	1
'M|m'							3	2
'N|n'							1	5
'Ñ|ñ'							8	1
'O|o'							1	9
'P|p'							3	2
'Q|q'							5	1
'R|r'							1	5
{"RR|rr|Rr|rR",true,true}		8	1
'S|s'							1	6
'T|t'							1	4
'U|u'							1	5
'V|v'							4	1
'X|x'							8	1
'Y|y'							4	1
'Z|z'                       	10	1
<END_TILES>
# should ignore all after the <END> above
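
# A minimal sketch, in Python, of how the XLOC_HEADER value could be
# unpacked according to the comment above it (high bit = "official",
# next 7 bits = language enum, low byte = padding).  The function name
# decode_xloc_header and the field names are hypothetical and are not
# part of xwords; this only illustrates the bit layout.

```python
def decode_xloc_header(header: int) -> dict:
    """Split a 16-bit XLOC_HEADER value into the fields the comment describes."""
    high_byte = (header >> 8) & 0xFF
    return {
        "official": bool(high_byte & 0x80),  # high bit of the high byte
        "lang_enum": high_byte & 0x7F,       # remaining 7 bits (Spanish == 6)
        "padding": header & 0xFF,            # low byte, unused
    }


if __name__ == "__main__":
    # 0x8600 is the value given in this file.
    print(decode_xloc_header(0x8600))
    # → {'official': True, 'lang_enum': 6, 'padding': 0}
```

# Under that reading, 0x8600 is 0x80 (official) combined with 6
# (Spanish) in the high byte, with a zero low byte.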