CIDEC    
ÜIK
Estonian Winter Schools in Computer Science    
Eesti arvutiteaduse talvekoolid
EWSCS 2005
EATTK 2005

10th Estonian Winter School in Computer Science (EWSCS)
X Eesti Arvutiteaduse Talvekool

Palmse, Estonia, February 27 - March 4, 2005

under the auspices of European Educational Forum

STUDENT TALKS AND POSTERS 2005 (abstracts)


Talks

Exponential Lower Bounds for the Running Time of DPLL Algorithms on Satisfiable Formulas

Dmitry Itsykson
St. Petersburg State University, Fac. of Math. and Mechanics, Russia

DPLL (for Davis, Putnam, Logemann, and Loveland) algorithms form the largest family of contemporary algorithms for SAT (the propositional satisfiability problem) and are widely used in applications. The recursion trees of DPLL algorithm executions on unsatisfiable formulas are equivalent to tree-like resolution proofs. Therefore, lower bounds for tree-like resolution which are known since 1960s) apply to them.

However, these lower bounds say nothing about the behavior of such algorithms on satisfiable formulas. Proving exponential lower bounds for them in the most general setting would be equivalent to proving P not equals NP. In my talk, I give exponential lower bounds for two families of DPLL algorithms: generalized myopic algorithms (that read upto $n^{1-epsilon}$ of clauses at each step and see the remaining part of the formula without negations) and drunk algorithms (that choose a variable using any complicated rule and then pick its value at random).

Propositional Logic for Program Synthesis

Vahur Kotkas
Institute of Cybernetics at TUT, Estonia

I would like to present a fragment of intuitionistic propositional calculus that is used for automated program synthesis. This logic together with prewritten pieces of programs (modules), automatic proof search and result optimization allows to generate logical clauses which in turn can be easily translated into executable clauses of given programming language.

Additive Conditional Disclosure of Secrets

Sven Laur
University of Tartu, Inst. of Computer Science / HUT, Finland, Estonia

This talk introduces various extensions and applications of the basic homomorphic oblivious transfer (HOT) protocol proposed by Aiello, Ishai and Reingold. First, we show how to extend HOT to work on top of groups of composite order. Previous constructions of HOT were secure only on top of groups of prime order; and the only well-known homomorphic cryptosystem working on such groups is the ElGamal. Since the ElGamal cryptosystem is not additively homomorphic, the applications of the previous constructions were seriously limited.

Importantly, the new HOT protocol allows to implement straightforwadly the conditional disclosure of secrets protocol, that enables to convert a large class of two-round protocols, private in the semi-honest model, to two-round protocols, private in the malicious model. The resulting protocols are secure in the standard model (without using random oracles), and extremely efficient. Some applications of our work include the most efficient known oblivious transfer protocol, and two-round private versions of various tasks in privacy-preserving data-mining and linear algebra.

The talk is based on a joint work with Helger Lipmaa. The resulting paper will be submitted before the winter school.

Different Selection Schemas in Multimodal Optimization Task

Irina Lovtsova
Riga Technical University, Faculty of CS and IT, Latvia

This work discusses the influence of different selection schemas on Genetic algorithm (GA) behaviour in multimodal functions optimization task. The aim of this work is to understand the dynamics of interaction in each population of individuals using different selection methods. To provide a comparison of the used selections, the following parameters were used: selection reproduction rate, loss of diversity, selection intensity and selection variance. In addition to observing selections effect, different parameters of GA were used, like population size, crossover and mutation rate. For each generation the maximal and average fitness values of individuals are calculated.

During the experiments it was decided to use five selection methods (tournament, truncation, roulette wheel, linear ranking and exponential ranking selections) and two crossover methods (uniform and unimodal normal distribution crossovers). In each experiment the number of generation is constant to observe GA convergence quality, speed and stability. In this work, GA convergence quality means the ability to find global extremum of multimodal functions. The experiments are divided in the groups using different GA parameters and operators.

Approximate String Matching Using Suffix Tries

Hendrik Nigul
University of Tartu, Inst. of Computer Science, Estonia

In bioinformatics one of the central problems is to find common substrings from DNA sequences.

Our approach to tackle the problem uses suffix tries - data structures, that provide concise representation of internal structure of strings by enumerating all substrings. To find patterns from a string, we create an index (i.e. suffix trie) from the string. This makes the searching process faster. It also gives us the possibility to search for several patterns at the same time or even search for all substrings of given patterns.

In this talk we show how to use suffix tries to find similar patterns from biosequences. We assume that there must exist some patterns in some sequences and these patterns do not occur in other sequences (that often). This means, we are searching for interesting patterns, that also have some biological meaning.

Triangle Strip Generation Using Tunneling Algorithm

Massimiliano Porcu
University of Cagliari, Dept. of Mathematics and Computer Science, Italy

In computer graphics applications, surfaces and volumes are often visualized using triangle meshes. The performance of the rendering task is usually strictly related to the rate at which triangulation data is sent to the graphics board. Using standard approach, we need to transmit to GPU three vertices per triangle. It is possible to effectively reduce the data rate, ordering triangles so that consecutive triangles share an edge. Using such an ordering, called triangle strip, the first triangle is defined by three vertices and each of the following triangles by only one additional vertex. The stripification encode is the most popular type of geometry compression technique in computer graphics.

The goal of a stripification algorithm is to compute a small set of triangle strips whose union covers the mesh. Unfortunately, the computing of an optimal set of triangle strips is NP-complete. Nevertheless, because the importance of this topic, many heuristic has been developed, each one with its own advantages and disadvantages.

In this talk, a new stripification approach, the so called Tunneling Algorithm, will be discussed and analyzed. The method, based upon the dual graph of a mesh, can be used to create a stripification from scratch or to improve an existing stripification. According to the main criteria of strip quality, it goes to show as an effective improvement in comparison with standard algorithms: lower number of stripe, lower number of isolated triangles, better cache coherence.

Keywords [Geometry Compression, Computer Graphics, Triangle Strip, Tunneling Algorithm]

Program Proofs and Compilation

Ando Saabas
Institute of Cybernetics at TUT, Estonia

So far the use of proof-carrying code has mostly been limited to showing well-typedness of binary code, in which case the compiler automatically creates a certificate (proof) to enable efficient checking of type safety of the code. If some stronger properties are required from the binary code like adherence to an interface specification, automatic generation of proofs by the compiler becomes impossible. Instead, we need a way to transform proofs of a high-level program into proofs of its compiled counterpart. Such transformations are not completely straightforward because of optimizations that typically take place during compilation. The talk will outline some of the problems that arise and discuss ways of tackling them.

(Joint work with T. Rezk and T. Uustalu.)

Calculation of Shannon's Interval Entropy

Aleksandrs Vališevskis
Riga Technical University, Faculty of CS and IT, Latvia

In my poster a method of calculation of Shannon’s entropy in case of interval probabilities is described. The value of entropy also is interval, as the probabilities are interval. This task corresponds to uncertain conditions in which it is not possible to determine exact values of probabilities of a system’s states. Other applications include calculation of the amount of information about different alternatives in an uncertain decision task, as described in [1]. During one of the previous Estonian Schools in Computer Science I presented a heuristic approach towards the solution of this problem. In this poster an analytical solution is described. The problem is presented as an optimization task with inequality constraints and is solved using method of Lagrange multipliers. The solution of the stated problem is given as a multi-step algorithm, which iteratively seeks the admissible entropy value, which fits all the constraints. Moreover, an example of applying the solution is presented.


[1] Aleksandrs Vališevskis, Arkady Borisov. Using Interval-Valued Entropy for Risk Assessment, Proceedings of “Modelling and Simulation of Business Systems”, Kaunas University of Technology Press – Technologija, Lithuania, 2003, pp. 74-78.

Recent Progress in Obfuscation

Nikolay Vyahhi
St. Petersburg State University, Fac. of Math. and Mechanics, Russia

To ensure platform independence, mobile programs are distributed in forms that are isomorphic to the original source code. Such codes are easy to decompile, and hence they increase the risk of malicious reverse engineering attacks. Several methods have been proposed to alleviate this situation. The highest level of protection is achieved with cryptographic solutions, but, unfortunately, this requires dedicated hardware with integrated decryption and execution units.

A more modest level of protection is achieved through obfuscation. An obfuscator is a tool which -- through the application of code transformations -- converts a program into an equivalent one that is more difficult to reverse engineer. The advantage of this method is that it runs on standard hardware and without any changes to virtual machines or available interpreters.
[ C.Collberg, C.Thomborson, D.Low. Breaking Abstractions and Unstructuring Data Structures (ICCL'98) ]

In the last works one was made conclusion about necessity of using different obfuscating methods against different attacks. In our research we construct mathematical models for the different types of obfuscation, we determine the different criteria of secrecy and develop methods for the realization of these determinations. This approach made it possible to do one additional step on the way to obfuscating with the proved secrecy.


Posters

Classification Trees as a Tool to Acquire Knowledge for Expert Systems

Ieva Bolakova
Daugavpils University, Dept. of Computer Science, Latvia

There exist several knowledge acquisition strategies in the field of knowledge engineering. Knowledge engineer has to elicit knowledge in order to construct a model of an application domain, which is used by experts for decision making.

Another strategy is based on the knowledge base filling by using the special programming tools. There is no necessity of knowledge engineer direct attendance in this process – expert himself inputs his own knowledge into computer.

However, for some situations experts have only weak causal knowledge or have not it at all. In such cases, machine learning tools can collect training data and process it through an induction engine in search of diagnostic knowledge. This machine learning strategy is known as a decision tree induction. Classification trees are one of more popular approaches to solve the data mining tasks. The induction mechanism is embedded within a data mining system that suggests plausible rules to an expert, who can override the rules or modify the data from which the rules were derived.

So the basic task is to extract knowledge (or information) from the data (databases). Classification trees build a hierarchical structure of rules, which has a form of a tree. In order to make a decision to what class an object belongs we have to answer the questions, which stand in the nodes of this tree starting from its root.

When the decision tree has been constructed, it is easy to convert it into a set of rules. But if the received decision tree is very large, that means that the rule set will be large too. Unfortunately, large tree sizes do not mean better expert systems.

This paper examines some possibilities of reducing the rule set sizes in order to increase the expected accuracy by using decision tree pruning methods.

Decision trees have proved to be valuable tools for the description, classification and generalisation of data for expert systems. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, and in many others.

The popularity of the classification trees approach is based on obviousness and simplicity of the method. However, we have to designate some faults too – trees cannot find more completed and more accurate rules in the data. Nevertheless, majority of systems uses just this method.

Optimal Scheduler Synthesis using Cost Optimal Reachability Analysis

Juhan Ernits
Tallinn University of Technology, Dept. of Computer Science / Institute of Cybernetics, Estonia

The work builds on the scheduler synthesis problem presented in EWSCS`04, last year. The concrete example is based on a radar system memory interface case study [1] from the IST AMETIST project. The poster presents a solution to the case study where size optimal buffer configuration for the specified signal processing task is synthesized using a model checker with cost optimal reachability analysis extensions - Uppaal Cora [2]. For this purpose the model is augmented with a monotonously increasing cost variable that is proportional
to the memory range the buffers use. Additionally, the initial case study is refined by constraining the behaviour to double data rate (aka
DDR) memory [3] that can be used to store intermediate values of signals for the purpose of signal processing.

In this example model checking is used in two distinct ways. First, we apply a model checker with cost optimal reachability analysis extensions for memory arbiter synthesis by looking for a (the best in terms of specified cost) trace to a desired location. Second, we integrate the synthesized component (the arbiter) into the initial system and verify that there are no paths to unwanted states and the system never gets deadlocked.

References
[1] Gerd Behrmann, Sébastien Bernicot, Thomas Hune, Kim G. Larsen, Sylvain Lecamp, and Arne Skou, Case study 2: Memory interface for radar system, Deliverable to the IST AMETIST project, July 2002.
[2] Gerd Behrmann et al, Uppaal CORA, http://www.cs.aau.dk/~behrmann/cora/, 2005.
[3] JEDEC Solid State Technology Association, Double data rate (DDR) sdram specification, JEDEC Standard No. 79 Releace C, March 2003.

Data Warehouse and Stable Financial Modelling

Audrius Kabasinskas
Institute of Mathematics and Informatics, Lithuania

Many financial institutions (banks, credit unions, stock exchanges, etc.) create data warehouses (DW) that contain large data sets of financial information. Stock price data and other stock market information are kept in data warehouses, but usually this information is very multimode and complex. Modules of knowledge discovery in financial data sets realize statistical data analysis with recommendations for investors. An important part of DW in solving such problems is data marts which have simpler architecture, so stock prices may be easily extracted and analyzed. Computational aspects of mathematical modelling of stock markets are the subject of this research.

Data analysis and financial modelling usually start from hypothesis testing of the distribution of stock price returns. It has been empirically tested that returns are often non-Gaussian. Since the classical models of financial market based on the hypotheses of normality and the efficient market become inadequate, a system for financial modelling has been developed, following the stability hypothesis of financial data. Several statistical and robust procedures are examined while creating the system for stock portfolio simulation and optimization. The development of software is discussed, which enables us to calculate option price, to estimate stability parameters of distribution, to build an efficient portfolio, as well as to solve other optimization problems. Web–enabled technologies for data mart were used to gather the required information on the Lithuanian stock market and also special software is in progress, which will be of use in to applying stable modelling in the optimization of portfolio of the Lithuanian stock market.

Adaptive Algorithm to Estimate Varying Parameters of the Regression Model

Helen Priisalu
Tallinn University of Technology, Dept. of Computer Control / Hansapank, Estonia

We consider the estimation of the varying parameters of the regression model, where the parameters can be seen as a random process, which can be described by Markov model, by the 1-order auto-regression process.

In the Markov model, when it is given the state transfer matrix and the white noise parameters, then the regression model’s parameters estimation algorithm is the same as Kalman’s filter equations.

The parameters of the Markov model, which are the state transfer model, the white noise and the distribution characteristics of initial states, can be found by numerical computation applied to the given dataset of the object’s observed inputs and outputs, in order to minimize the sum of squared errors by an optimization method. In the current study, it is used for the parameters’ numerical computation a Newton-Raphson optimization method.

However, in many practical situations, we lack the observation data about the object, but we have got an understanding about the structure of the model, which could be used to describe the object. For example, in the bank, customer credit risk model's structure is the same to a full set of borrowers, but differs only in parameters’ values at the borrower level. Because the adaptive algorithm provides a way to develop credit risk model to lately joined customers at the running base, it’s suitable for the practical banking case.

In the current work we propose several adaptive algorithms, which are derived by analogous to [1] and we compared them based on the simulated data sets. We conclude that not always is the best way to use precise but more complex algorithms. We propose a simple, robust and stable adaptive algorithm to estimate varying parameters of the regression model. The algorithm is applied to the bank’s data to estimate the parameters of the credit risk model and to make the calculation for prognosis. The model is implemented using the commercial software dedicated to handle very large datasets.

Literature

[1] Peter Eykhoff, Parameter and state estimation.
[2] Andrew P.Sage, James L.Melsa. System identification

Program Construction as an Optimization Task

Jelena Sanko
Institute of Cybernetics at TUT, Estonia

The poster describes simple functional constraint network and a value propagation method for program construction. We consider constraints as relations of finite arity on the universal domain specifying which combinations of values of program variables are acceptable. Functional constraint network or computational model is a specialization of constraint network, where constraints can only be interpreted as functional relations. We propose an approach for finding the best path on the functional constraint network from the initial state (knowing variables) to the goal state (asked variables). As the terms "search problem" and "optimization problem" can be considered as synonymous, the search problem for the best feasible solution can be converged to the optimization problem. So we propose to reduce program construction problem to the optimization problem and than use Differential Evolution (DE) method [1] that are basically used for optimizing real-valued multimodal objective function. A coding of programs compiled from relations of a computational model and an abstract framework for program construction by means of DE for optimization is described in [2].

References
[1] Storn, R., Price, K., Differential Evolution - A simple and effective adaptive scheme for global optimization over continuous space. Technical Report TR - 95-012, ICSI, 1995
[2] Penjam, J., Sanko, J., Deductive and Inductive Methods for Program Synthesis. ACIS International Journal of Computer & Information Science, Vol.5, No.3, Septeber 2004

Position Weight Matrices for Representing Signals in Sequences

Triinu Tasa
University of Tartu, Inst. of Computer Science, Estonia

Today, the most powerful method for inferring the biological function of a sequence is by similarity searching on databases. In those databases short (5 - 20bp) sequences with a common biological function are often represented in the form of matrices, as they describe better the uniform structure and increase the speed of database-queries. In my presentation I explain different weight-matrix construction methods, give an overview of most common applications and challenges.

Genome-Wide Identification of Putative Regulatory Signals in the DNA

Jelena Zaitseva
University of Tartu, Inst. of Computer Science, Estonia

During the last 10 years there has been an explosion in sequencing the complete DNA of many organisms. The first eukaryotic genome fully sequenced was yeast Saccharomyces cerevisiae (~12MB), completed in 1996. Since that human (3.12GB) and many other (10+?) higher eukaryotes have been sequenced, providing us with the wealth of information top be interpreted.

One of the biggest challenges of the bioinformatics discovery and modelling is to describe the complex regulatory networks that arise from gene products regulating the expression of other genes in a concerted fashion. For such description we need to identify the signals hidden in the DNA that are targeted by the regulating proteins (transcription factors).

Our goal is to study the distribution and co-occurrences of such signals in the whole genome. Particularly, we study how the basal regulatory signal, the so called TATA-box, is distributed. We propose some methods for identifying the true signals from random genomic features.

(Joint work with T. Tasa, H. Peterson, J. Vilo.)

http://www.cs.ioc.ee/yik/schools/win2005/

Modified Jun 09, 2009 11:34 by monika(at)cs.ioc.ee