No Bytes

Posts

The Dark Side of Data Analytics

I currently work as a data analyst for an industrial company. Part of my job is to study production processes in order to reduce product defects. Unfortunately, analysis with decision trees and logistic regression models very often shows a correlation where, in many cases, no causality exists. And even where causality exists, I have to consider its direction to avoid confusing cause and effect. All of these considerations fall under the term "Business Understanding". Without the business understanding for your job, data mining and, more simply, analytics can become misleading very quickly. Business Understanding and its spread: Nearly every company that uses analytics and data mining has a strong interest in using the insights properly, because wrongly used information can have bad consequences. This issue is so present in the minds of many managers that they attach great importance to business understanding. Th...
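The correlation-versus-causality point in the excerpt above can be illustrated with a minimal Python sketch. All numbers and variable names here (machine temperature, coolant flow, defect rate) are invented for illustration and do not come from the post. A hidden third variable drives two measurements, which then correlate strongly although neither causes the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical confounder: machine temperature influences both measurements.
temperature = [60, 62, 65, 67, 70, 72, 75, 78]

# Fixed small perturbations so the example stays deterministic.
noise_a = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3]
noise_b = [-0.1, 0.2, -0.3, 0.1, 0.0, 0.2, -0.2, 0.1]

# Both series depend on temperature only, never on each other.
coolant_flow = [1.5 * t + 2 + a for t, a in zip(temperature, noise_a)]
defect_rate = [0.2 * t - 10 + b for t, b in zip(temperature, noise_b)]

# Strong correlation, yet throttling the coolant would not change the defects.
r = pearson(coolant_flow, defect_rate)
print(f"correlation between coolant flow and defect rate: {r:.3f}")
```

A model trained on these two columns alone would happily report the relationship. Only knowledge of the process reveals that temperature is the common cause.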

Unsolvable Windows Problems #1 - If you can't start the spooler service

A recent case of an unusual computer problem prompted me to write this post. Background: A friend asked me for help because he had a computer problem that even an expert couldn't solve; the expert had recommended re-installing Windows. Since this friend is an entrepreneur and needs his computer every day, he didn't want to re-install Windows and all the applications he uses daily, not to mention the cost of re-installing his service software. So I agreed to have a look at the problem. The problem itself: He couldn't print anymore, though it had still been possible seven days earlier and he (of course) hadn't changed anything on the system. The first thing I did was try to print. I got an error saying that the spooler service was not started. So I tried to start the spooler service, but this didn't work. Every time I tried to start it, I got the error message "Windows Error 0x800706b9 - Not enough resource...

Text Mining 1 - The problem of propositional logic and natural languages

Natural Language Processing (NLP) is becoming more and more important. It is used to determine the meaning of text written by human beings, and that is exactly why its importance keeps growing: every day, millions of people write. They write comments on products they ordered, they write comments on Facebook, they write their blogs. Of course the industry wants their opinions about products, songs, ideas and everything else - just look at Facebook. Opinions are money. There is (at least) one problem: how do you capture this money in the form of opinions? That's where Natural Language Processing comes in. The first thing needed to process a (natural) language is a parser. Most parsers process synthetic languages such as programming languages, which have a defined syntax and logical semantics. If the author of source code disregards the defined syntax or semantics, the parser will abort processing the code and throw an error. So the author of the code has to keep ...
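The strictness of a programming-language parser described in the excerpt above can be shown with a minimal Python sketch. It uses Python's own parser from the standard `ast` module as a stand-in for "a parser of a synthetic language"; the helper name `parses_as_python` and the sample strings are made up for illustration.

```python
import ast

def parses_as_python(source):
    """Return True if the Python parser accepts the text, False if it aborts."""
    try:
        ast.parse(source)
    except SyntaxError:
        # The parser gives up at the first rule violation.
        return False
    return True

# Well-formed code is accepted.
print(parses_as_python("total = price * 2"))

# One missing operand and the parser rejects the whole input.
print(parses_as_python("total = price * "))

# A human reader understands this sentence despite the typo;
# a programming-language parser cannot do anything with it.
print(parses_as_python("I liked this product alot"))
```

A parser for natural language cannot simply reject input like the last line, because human writers routinely bend or break the rules and still expect to be understood.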

Eisberg FileSync

Eisberg FileSync 2.1 BETA online. I have just finished a new version of "Eisberg FileSync". Until further tests are done, it remains a BETA version and is only for 64-bit systems. New Features: Multi-Sync in Arctos Traybar; Fast Sync (synchronize quickly without creating any profiles). Download Links: 64 Bit application: Eisberg FileSync 2.1 x64 BETA; 32 Bit application: Eisberg FileSync 2.1 x86 BETA

Using ORE 1.4 on Oracle 12.1c with pluggable databases

It is possible to use Oracle database tables in the R statistical software, and this is a very useful approach (if you know R's capabilities). The fact that you are on this blog now may mean that you had no success trying to use tables in R and that you received ORA-12541 once more. To solve this problem, you can skip the next paragraph. However, to get started with Oracle 12c and R you should take a look at the documentation provided by Oracle. Clearly and briefly, it says that you have to do the following to use an Oracle database in R: Install the Oracle Instant Client, if you don't have it already, from http://www.oracle.com/technetwork/database/features/instant-client/; install Oracle's R distribution (ORD) from https://oss.oracle.com/OR; modify the PATH variable to include the path of the Instant Client; set the environment variable OCI_LIB64 to the path of the Instant Client; install the ORE Client Package for R from http://www.oracle.com/technetwork/database/options/advanced-...
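The two environment settings from the list above (PATH and OCI_LIB64) are easy to get wrong, so a quick check can help before starting R. The following is only a sketch in Python: the function name `check_instant_client_env` and the directory used in the example are hypothetical, and the comparison is a plain string match that ignores case differences and trailing separators.

```python
import os

def check_instant_client_env(env, client_dir):
    """List problems with the Instant Client settings in an environment mapping."""
    problems = []

    # PATH must contain the Instant Client directory as one of its entries.
    path_entries = env.get("PATH", "").split(os.pathsep)
    if client_dir not in path_entries:
        problems.append("PATH does not contain the Instant Client directory")

    # OCI_LIB64 must point to the same directory.
    if env.get("OCI_LIB64") != client_dir:
        problems.append("OCI_LIB64 does not point to the Instant Client directory")

    return problems

# Hypothetical installation directory; replace it with your own.
client_dir = "/opt/oracle/instantclient_12_1"
for problem in check_instant_client_env(os.environ, client_dir):
    print(problem)
```

An empty result only means that the two variables are set consistently; it says nothing about whether the client itself is installed correctly.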

How to use TOracleConnection under Lazarus for Win64

Lazarus programmers have had no way to use TOracleConnection under 64-bit Windows for years. Even if you tried to use TOracleConnection with a correctly configured Oracle 11g client, you were not able to connect to the Oracle database. The error message was always: ORA-12154: TNS:could not resolve the connect identifier specified. Today I found a simple workaround for this problem. It seems that the OCI.DLL from Oracle Client 11g2 is buggy. All my attempts to identify the error ended here. I could rule out problems with the TNS system in Oracle or with the Free Pascal file oracleconnection.pp, though the error messages suggest those problems. After investigating the function calls with Process Monitor (Procmon), I found that even the file TNSNAMES.ORA was found and read correctly by the Lazarus test application. So trouble with missing files or wrong registry keys could also be eliminated. Finally I installed the Oracle Instant Client 12.1c - aft...

Accelerating nested tables by "Letter Vectors"

One major problem when using text mining in PL/SQL is that sooner or later associative arrays or even nested tables are required to store large lists of words and their attributes. Those collections are very slow to iterate. Now you may think that a hash value of the word you need to find could be used as an index, which would be much faster. Well, it would be. But Oracle 11g2 contains no hash function that returns unique results on lists of 10,000 words. In several tests I got duplicate values after ~86 words, independent of the size specified in the function parameters. I tried ORA_HASH as well as the HASH function of the DBMS_CRYPTO package. The problem was similar. Because of this problem and the long run times of the procedures mining the text, it was necessary to speed up the algorithm somehow. And the algorithm was just iterating across a collection (nested table), comparing each entry with the wanted word, which was very slow. Here I got an idea. I used alrea...
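The duplicate-hash problem mentioned in the excerpt above can be reproduced in miniature. The sketch below is Python, not PL/SQL, and it does not use ORA_HASH or DBMS_CRYPTO: as an assumption for illustration, an MD5 digest reduced modulo a bucket count stands in for a hash function with a limited result range. The helper names and the generated word list are hypothetical.

```python
import hashlib

def bucket_of(word, buckets):
    """Map a word to a bucket number in the range 0 .. buckets - 1."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets

def first_collision(words, buckets):
    """Return (position, earlier_word, word) for the first two different
    words that share a bucket, or None if every word gets its own bucket."""
    seen = {}
    for position, word in enumerate(words, start=1):
        bucket = bucket_of(word, buckets)
        if bucket in seen and seen[bucket] != word:
            return position, seen[bucket], word
        seen.setdefault(bucket, word)
    return None

# Hypothetical word list standing in for the mined vocabulary.
words = [f"word{i}" for i in range(10000)]

for buckets in (100, 10000, 2**32):
    print(buckets, first_collision(words, buckets))
```

With a limited bucket range, a first collision typically shows up long before the number of words reaches the number of buckets (the birthday problem), which is why a hash value alone is not a safe collection index unless collisions are handled.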