Working with a database can be so easy - as long as you don't need to care about localization or languages with more or other letters than english. One of these Languages is German. And here the problem starts...
Actually the problem itself is very simple: The character encoding. However the solution is not as simple. I tried to connect to Oracle via an ODBC connection from Lazarus IDE for inserting german press releases with a lot of Umlauts (äöü). The result in the database was a mess. So how to solute this mess?
"(...) The NLS_LANG option specifies the current language, territory, and database character set, which are determined by session-wide globalization parameters." (Oracle; see http://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_options070.htm). So we specified the character set of codepage 850 because the Oracle Instant Client is a command line application and uses the command line's character set. Now you can insert and select Umlauts in Instant Client already correctly.
In a further step we will take a closer look at Lazarus IDE. After the NLS_LANG system variable is defined, a Lazarus application will still cause trouble when inserting o querying the Oracle Database. The reason is, once again, Windows. Since Lazarus uses AnsiStrings as String on Windows systems you need to convert these strings before inserting them into Oracle Insert-Statements. Usually you can Use the command
If this option is unchecked, there will always be a console window available beside the application. I use it for debugging and debugger output. But this inconsiderable and unseful windows has a terrible problem: it will cause the compiler the build the application using the console's (command line's) character set by default. Of if this option is unchecked, you need to use
Now it will look a little bit clearer. But there still is a strange thing: Why calling UTF8ToConsole instead of UTF8ToSys when you have set the compiler option "Win32 gui application"? This is a Windows specific problem. I guess on Linux you only need UTF8ToSys, but did never test it.
Introduction
Let's look at a simple scenario: Inserting a text with German letters like ä, ö, ü and ß into the Oracle 11g database. And right now there already could be a problem when trying to select it again. So what is the problem here?Actually the problem itself is very simple: The character encoding. However the solution is not as simple. I tried to connect to Oracle via an ODBC connection from Lazarus IDE for inserting german press releases with a lot of Umlauts (äöü). The result in the database was a mess. So how to solute this mess?
The "problematic chain"
First of all we have to consider that different operating systems have different character sets. Usually current Linux distributions have a Unicode character set in contrast to Windows, using CP-1252 for the graphics and codepage 850 for the command shell. Further, Lazarus Applications use different character sets (dependent of a specific compiler option) and at least the Lazarus Code Editor and Form Designer have an own character encoding (coonfigurable).The solution
For a correct handling of Umlauts a system variable needs to be defined. I will describe this process for Windows 7 and later operating systems. The path variable for the Oracle Instant Client has the name
NLS_LANG
For German Windows installations it should have the value
GERMAN_GERMANY.WE8PC850
"(...) The NLS_LANG option specifies the current language, territory, and database character set, which are determined by session-wide globalization parameters." (Oracle; see http://docs.oracle.com/cd/B28359_01/olap.111/b28126/dml_options070.htm). So we specified the character set of codepage 850 because the Oracle Instant Client is a command line application and uses the command line's character set. Now you can insert and select Umlauts in Instant Client already correctly.
In a further step we will take a closer look at Lazarus IDE. After the NLS_LANG system variable is defined, a Lazarus application will still cause trouble when inserting o querying the Oracle Database. The reason is, once again, Windows. Since Lazarus uses AnsiStrings as String on Windows systems you need to convert these strings before inserting them into Oracle Insert-Statements. Usually you can Use the command
UTF8ToConsole(String)
to modify the input for the INSERT-Query. For the SELECT-Query you contrarily use
ConsoleToUTF8
again. But now there is still one more Lazarus problem since there is the option to create a Win 32 GUI application.
UTF8ToSys
for inserting into a database and
SysToUTF8
for selecting textual data from the database.Background: Lazarus
Internally Lazarus is using the Qt-Interface to display widgets of the used operating system. Qt itself is using UTF8-encoded textual data for the widgets. Everything in textual from you get from Windows will be system encoded in CP-1252. The Lazarus code Editor and Form Designer again will user UTF8. So when you design the GUI you can use Umlauts without problems. But when you read file names from system yontaining Umlauts you can come into trouble when displaying it in the GUI. SysToUTF8 can convert this.Now it will look a little bit clearer. But there still is a strange thing: Why calling UTF8ToConsole instead of UTF8ToSys when you have set the compiler option "Win32 gui application"? This is a Windows specific problem. I guess on Linux you only need UTF8ToSys, but did never test it.
Kommentare
Kommentar veröffentlichen