Unicode and Collation Support in Microsoft SQL Server
Michael S. Kaplan
Globalization Infrastructure and Font Technology
Windows International
Microsoft
24-26 March 2003 Prague, Czech Republic (IUC23)

Unicode Support
Uses the "N" or national data types from the SQL-92 specification
NCHAR, NVARCHAR, NTEXT
What the SQL-99 spec says about Unicode
Interoperability with other clients

Collation in SQL Server <= 6.5
No Unicode support at all
One code page per server
One collation per server
No good solution for multilingual support
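
The one-code-page limitation can be illustrated outside SQL Server. The following is a minimal sketch that uses Python's codecs as a stand-in for a server code page; `fits_code_page` is a hypothetical helper for illustration, not a SQL Server API. Czech text fits Windows-1250 and Korean text fits Windows-949, but neither code page holds both, whereas a 16-bit Unicode encoding (UTF-16LE here, standing in for the storage behind the N data types) holds either.

```python
def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character in text can be encoded in code_page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


czech = "čeština"
korean = "한국어"

# A server fixed to Windows-1250 (Central European) can store Czech...
print(fits_code_page(czech, "cp1250"))
# ...but not Korean, which needs a different code page such as Windows-949.
print(fits_code_page(korean, "cp1250"))
print(fits_code_page(korean, "cp949"))

# A Unicode encoding stores both strings; each of these characters takes 2 bytes.
print(len(czech.encode("utf-16-le")), len(korean.encode("utf-16-le")))
```

With a single code page per server, whichever choice is made leaves some of the data unrepresentable, which is the multilingual problem the slide refers to.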

Collation in SQL Server 7.0
Unicode datatypes supported
Two collations
– Unicode
– Non-Unicode
Number of collations distilled down to the minimum necessary

7.0 flattening of collations
Example: the General Unicode sort order handles: Afrikaans, Albanian, Arabic, Basque, Belarusian, Bulgarian, English, Faeroese, Farsi, Georgian (Traditional), Greek, Hebrew, Hindi, Indonesian, Malay, Russian, Serbian, Swahili, and Urdu

OS independence
Collation independent of operating system
Based on the Jet “Unicorn” DLLs

SQL Language Support (limited locale information)
Messages
Date/Time
First Day of Week
Currency and currency symbols
Month/day names and abbreviated month names

SQL Language Support (list of languages)
Arabic, British English, Brazilian, Bulgarian, Simplified Chinese, Traditional Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian,
Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Thai, Turkish

Getting at the list of languages
sp_helplanguage stored procedure
syslanguages/sysmessages tables
SET LANGUAGE
– SET LANGUAGE čeština
– SET LANGUAGE 한국어
Each language has a langid (0 – 32)

Collation in SQL Server 2000
Combined code pages and collations into a single entity

"Windows" collations
Added for unique code pages (Example – Arabic)
Added for unique ordering (Example – French)
Removed for identical ordering (Example – Finnish_Swedish)

43 Windows Collations
Albanian, Arabic, Chinese_PRC, Chinese_PRC_Stroke, Chinese_Taiwan_Bopomofo, Chinese_Taiwan_Stroke, Cyrillic_General, Croatian, Czech, Danish_Norwegian, Estonian, Finnish_Swedish, French, Georgian_Modern_sort, German_PhoneBook, Greek, Hebrew, Hindi, Hungarian, Hungarian_Technical, Icelandic, Japanese,
Japanese_Unicode, Korean_Wansung, Korean_Wansung_Unicode, Latin1_General, Latvian, Lithuanian, Lithuanian_Classic, FYRO Macedonian, Spanish (Spain), Polish, Romanian, Slovak, Slovenian, Thai, Traditional_Spanish, Turkish, Ukrainian, Vietnamese

Windows collations, continued
Suffix meanings
– _BIN (Binary)
– _CI/_CS (Case sensitivity)
– _AI/_AS (Accent sensitivity)
– _KS (Kanatype sensitivity: hiragana/katakana)
– _WS (Width sensitivity: full/half width)
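
The suffix scheme above can be read mechanically from a collation name. The following is a minimal sketch, not part of SQL Server: `parse_collation_name` and `ParsedCollation` are hypothetical names, and the sketch assumes that the suffixes always sit at the right-hand end of the name and that a missing _KS or _WS means the collation is insensitive to that distinction.

```python
from dataclasses import dataclass

# The suffixes listed on the slide, without the leading underscore.
KNOWN_SUFFIXES = {"BIN", "CI", "CS", "AI", "AS", "KS", "WS"}


@dataclass(frozen=True)
class ParsedCollation:
    base: str
    binary: bool
    case_sensitive: bool
    accent_sensitive: bool
    kana_sensitive: bool
    width_sensitive: bool


def parse_collation_name(name: str) -> ParsedCollation:
    """Split a name such as 'Japanese_CI_AS_KS_WS' into its base and options."""
    parts = name.split("_")
    flags = set()
    # Peel recognised suffixes off the right-hand end, keeping at least the base.
    while len(parts) > 1 and parts[-1] in KNOWN_SUFFIXES:
        flags.add(parts.pop())
    # Assumption: a binary collation compares code points, so it distinguishes everything.
    binary = "BIN" in flags
    return ParsedCollation(
        base="_".join(parts),
        binary=binary,
        case_sensitive=binary or "CS" in flags,
        accent_sensitive=binary or "AS" in flags,
        kana_sensitive=binary or "KS" in flags,
        width_sensitive=binary or "WS" in flags,
    )


print(parse_collation_name("Japanese_CI_AS_KS_WS"))
print(parse_collation_name("Latin1_General_BIN"))
```

Base names that themselves contain underscores, such as Chinese_PRC_Stroke, survive because only trailing tokens from the known suffix set are removed.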

SQL Collations
Provided for backwards compatibility with prior versions of SQL Server

SQL Collations
SQL_1xCompat_CP850, SQL_AltDiction_CP1253, SQL_AltDiction_CP850, SQL_AltDiction_Pref_CP850, SQL_Croatian_CP1250, SQL_Czech_CP1250, SQL_Danish_Pref_CP1, SQL_EBCDIC037_CP1, SQL_EBCDIC273_CP1, SQL_EBCDIC277_CP1, SQL_EBCDIC278_CP1, SQL_EBCDIC280_CP1, SQL_EBCDIC284_CP1, SQL_EBCDIC285_CP1,
SQL_Estonian_CP1257, SQL_Hungarian_CP1250, SQL_Icelandic_Pref_CP1, SQL_Latin1_General_CP1, SQL_Latin1_General_CP1250, SQL_Latin1_General_CP1251, SQL_Latin1_General_CP1253, SQL_Latin1_General_CP1254, SQL_Latin1_General_CP1255, SQL_Latin1_General_CP1256, SQL_Latin1_General_CP1257, SQL_Latin1_General_CP437, SQL_Latin1_General_CP850, SQL_Latin1_General_Pref_CP1,
SQL_Latin1_General_Pref_CP437, SQL_Latin1_General_Pref_CP850, SQL_Latvian_CP1257, SQL_Lithuanian_CP1257, SQL_MixDiction_CP1253, SQL_Polish_CP1250, SQL_Romanian_CP1250, SQL_Scandinavian_CP850, SQL_Scandinavian_Pref_CP850, SQL_Slovak_CP1250, SQL_Slovenian_CP1250, SQL_SwedishPhone_Pref_CP1, SQL_SwedishStd_Pref_CP1, SQL_Ukrainian_CP1251

Collation at four levels
Server
Database
Column
Expression

At the server level
Acts as a default for all databases
Can be changed with RebuildM.exe in the tools\BINN directory
Querying the server collation:
SELECT CONVERT(nvarchar(128), SERVERPROPERTY('collation'))

At the database level
Every database has a collation (default is the server collation)
Collation can be changed under some circumstances

At the column level
Overrides database level collation
Specifies code page for non-Unicode columns
Again, can be changed under some circumstances
No multilingual columns with separate collations

At the expression level
Can be used to override any other collation
Uses the COLLATE keyword
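
The way the four levels fall back on one another can be sketched as a simple lookup. This is an illustrative model only, assuming each level either names a collation or leaves it unset; `effective_collation` is a hypothetical helper, not a SQL Server function.

```python
from typing import Optional


def effective_collation(
    server: str,
    database: Optional[str] = None,
    column: Optional[str] = None,
    expression: Optional[str] = None,
) -> str:
    """Pick the most specific collation that has been set.

    The expression level (COLLATE) overrides the column, the column overrides
    the database, and the database falls back to the server default.
    """
    for candidate in (expression, column, database):
        if candidate is not None:
            return candidate
    return server


# A database that never set its own collation inherits the server's.
print(effective_collation("Latin1_General_CI_AS"))
# A column-level collation overrides the database, and COLLATE overrides the column.
print(effective_collation("Latin1_General_CI_AS", "Czech_CI_AS", "French_CI_AS"))
print(effective_collation("Latin1_General_CI_AS", "Czech_CI_AS", "French_CI_AS", "Turkish_CS_AS"))
```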

Metadata in System Tables
All stored as Unicode no matter what the database collation is
Unicode 2.0 repertoire is used for identifiers (use brackets or quotes around anything else)

More on the COLLATE keyword
COLLATE [<Windows_Collation_name>|<SQL_Collation_Name>]
Specific rules of precedence:
– Explicit (two explicits == runtime error)
– Implicit (two implicits == no collation)
– Default
– <no collation>
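
The precedence rules above can be modelled as a function that combines the collation labels of two operands. This is a sketch under stated assumptions, not SQL Server's implementation: the slide gives only the ordering and the two clash cases, so the remaining combinations (for example, that an explicit collation wins over a no-collation operand, and that two identical collations do not clash) are assumptions, and `Label`, `Operand`, `CollationConflict`, and `combine` are hypothetical names.

```python
from enum import Enum
from typing import NamedTuple, Optional


class Label(Enum):
    EXPLICIT = "explicit"          # set with a COLLATE clause
    IMPLICIT = "implicit"          # inherited from a column
    DEFAULT = "default"            # e.g. a literal using the database default
    NO_COLLATION = "no-collation"  # result of two clashing implicit collations


class Operand(NamedTuple):
    label: Label
    collation: Optional[str]


class CollationConflict(Exception):
    """Two different explicit collations met in one expression."""


UNRESOLVED = Operand(Label.NO_COLLATION, None)


def combine(a: Operand, b: Operand) -> Operand:
    """Return the collation label of an expression built from a and b."""
    if a.label is Label.EXPLICIT and b.label is Label.EXPLICIT:
        if a.collation != b.collation:
            raise CollationConflict(f"{a.collation} vs {b.collation}")
        return a
    # A single explicit collation beats everything else (assumption for no-collation).
    for operand in (a, b):
        if operand.label is Label.EXPLICIT:
            return operand
    if Label.NO_COLLATION in (a.label, b.label):
        return UNRESOLVED
    if a.label is Label.IMPLICIT and b.label is Label.IMPLICIT:
        return a if a.collation == b.collation else UNRESOLVED
    # One implicit operand beats a default one.
    for operand in (a, b):
        if operand.label is Label.IMPLICIT:
            return operand
    return a  # both operands carry the default collation


french_column = Operand(Label.IMPLICIT, "French_CI_AS")
greek_column = Operand(Label.IMPLICIT, "Greek_CI_AS")
print(combine(french_column, greek_column).label)                           # two implicits clash
print(combine(french_column, Operand(Label.EXPLICIT, "Greek_CI_AS")).collation)  # explicit wins
```

In this model an expression that ends up with no collation can still be rescued by wrapping it in an explicit COLLATE, which is why the explicit case is checked first.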

Limitations
Features people will want for future versions
– LCID --> Collation
– ISO string <--> Collation
– Creating custom collations?

References
http://microsoft.com/globaldev/
“International Features in Microsoft SQL Server 2000” (by Michael Kaplan) at http://msdn.microsoft.com/

Questions?