
Posts

Showing posts from 2013.

Accelerating nested tables by "Letter Vectors"

One major problem when using text mining in PL/SQL is that sooner or later associative arrays or even nested tables are required to store large lists of words and their attributes. Those collections are very slow to iterate. Now you may think that a hash value of the word you need to find could be used as the index, which would be much faster. Well, it would be. But Oracle 11g2 contains no hash function that returns unique results on lists of 10,000 words. In several tests I got duplicate values after ~86 words, independent of the size specified in the function parameters. I tried ORA_HASH as well as the HASH function of the DBMS_CRYPTO package. Both showed the same problem. Because of this problem and the long run times of the procedures mining the text, it was necessary to speed up the algorithm somehow. And the algorithm was just iterating across a collection (nested table), comparing each entry with the wanted word, which was very slow. Here I got an idea. I used alrea...
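As a side note on why a bounded hash value cannot simply be assumed to be a unique index: the following is a minimal sketch in Python (not PL/SQL) of the classic birthday bound. It assumes an idealized hash that spreads keys uniformly over `n_buckets` values, and both function names are hypothetical helpers made up for this illustration. It neither reproduces nor explains the ~86-word measurement above, and it is not the "Letter Vectors" approach of the post.

```python
def collision_probability(n_keys, n_buckets):
    """Chance that at least two of n_keys share a value under an
    idealized uniform hash with n_buckets possible values (assumption)."""
    if n_keys > n_buckets:
        return 1.0  # pigeonhole principle: a collision is unavoidable
    p_unique = 1.0
    for i in range(n_keys):
        # The (i+1)-th key must avoid the i values already taken.
        p_unique *= (n_buckets - i) / n_buckets
    return 1.0 - p_unique


def keys_until_likely_collision(n_buckets, threshold=0.5):
    """Smallest number of keys whose collision probability reaches threshold."""
    p_unique = 1.0
    for n in range(1, n_buckets + 2):
        p_unique *= (n_buckets - (n - 1)) / n_buckets
        if 1.0 - p_unique >= threshold:
            return n
    return n_buckets + 1


if __name__ == "__main__":
    # Classic birthday example: 365 possible values.
    print(keys_until_likely_collision(365))
    print(collision_probability(23, 365))
```

Calling `collision_probability` with the word count and the size of the hash range gives a rough feeling for how quickly duplicates become likely, which is why a lookup keyed on a hash value normally still needs a collision check.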

Disaggregating CLOBs in PL/SQL

For a seminar during my Master's studies, I am currently occupied with text mining. For that, I use a lot of CLOBs in an Oracle 11g2 database. A CLOB is a Character Large Object and can store up to 8 terabytes of character data. The VARCHAR2 data type can store only up to 4000 characters. For a lot of applications 4000 characters is sufficient, but for storing texts like a publication or a newspaper article it is not enough. Here a CLOB is required. Handling a CLOB can be very difficult because of its size. You need to cut off single words or even sentences. This can be very easy using regular expressions. The process of disaggregating a text is called tokenization. The single words cut off from the text are the tokens. The code below shows a procedure I use for tokenizing a CLOB. The CLOB is loaded from the table into the variable b_text. Then a loop is run where a single word l_word is cut off from the CLOB using a regular expression and the function REGEXP_...
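The procedure itself is not part of this excerpt. As a language-neutral illustration of the idea (cutting one word after another out of a long text with a regular expression), here is a minimal sketch in Python. It is not the author's PL/SQL procedure. The pattern, the lower-casing and the names `iter_tokens` and `tokenize` are assumptions made for this example.

```python
import re

# Assumption: a token is a run of letters (including ä, ö, ü, ß);
# digits, underscores and punctuation act as separators.
WORD_PATTERN = re.compile(r"[^\W\d_]+")


def iter_tokens(text):
    """Yield one lower-cased word at a time, so that a very long text
    does not have to be turned into one big list up front."""
    for match in WORD_PATTERN.finditer(text):
        yield match.group(0).lower()


def tokenize(text):
    """Convenience wrapper that collects all tokens into a list."""
    return list(iter_tokens(text))


if __name__ == "__main__":
    sample = "Handling a CLOB can be very difficult, because of its size."
    print(tokenize(sample))
```

The generator mirrors the loop described above: each step takes the next word off the text, and the caller decides what to do with it (count it, store it, look it up).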

Lazarus IDE, Oracle 11g and German Umlaute

Working with a database can be so easy - as long as you don't need to care about localization or languages with more or other letters than English. One of these languages is German. And here the problem starts... Introduction Let's look at a simple scenario: inserting a text with German letters like ä, ö, ü and ß into the Oracle 11g database. Right away there can be a problem when trying to select it again. So what is the problem here? Actually, the problem itself is very simple: the character encoding. However, the solution is not as simple. I tried to connect to Oracle via an ODBC connection from Lazarus IDE to insert German press releases with a lot of umlauts (äöü). The result in the database was a mess. So how do we sort out this mess? The "problematic chain" First of all, we have to consider that different operating systems have different character sets. Usually current Linux distributions have a Unicode character set in contrast to Windows, u...

Lazarus IDE and TOracleConnection - A How-To

Free programming IDEs are a great benefit for everybody who is interested in programming and for small but ambitious companies. One of these free IDEs is the Lazarus IDE. It's a "clone" of the Delphi IDE by Embarcadero (originally by Borland). But actually Lazarus is much more than a clone: using the Free Pascal compiler, it has been platform-independent and capable of cross-compiling from the start. I use Lazarus very often - especially for building GUIs easily, because Java is still in the Stone Age when a GUI is required (there are a couple of GUI-building tools, but they all perform much worse than Delphi / Lazarus). Despite all the benefits of Lazarus, there is still one problem. Not all components are designed for use on 64-bit systems. Considering that 64-bit CPUs have been common in ordinary PCs since at least 2008, this is very unpleasant. One of the components which is not available on 64-bit installations is the TOracleConnection of Lazarus' SQLDB ...

How to boot openSuSE 12.2 from SD Card

My old Asus EeePC had been collecting dust since its display backlight driver chip broke in August 2012. Now, for a project at the university, its second life seems to have come. Unfortunately I had removed its RAM and the HDD. While it was no problem to reinstall the RAM, the HDD had already been built into an external HDD enclosure for use as a backup drive. I didn't want to disassemble it again, so what should I do to get my EeePC running again...? The solution was pretty simple: an SD card. But why use an SD card at all, when a USB flash drive would be more common? The answer is pretty simple. An SD card can be put into the reader in the EeePC and is covered completely. So it is mechanically robust against collisions and cannot be torn off or removed accidentally. And last but not least: why not use an SD card? So, naively, I bought an SDHC card, inserted it into the EeePC and created a bootable USB flash drive with openSuSE 12.2. An old flat screen, connec...