| CIDEC ÜIK |
Estonian Winter Schools in Computer Science Eesti arvutiteaduse talvekoolid |
EWSCS 2006 EATTK 2006 |
Juhan Ernits
Tallinn University of Technology, Dept. of Computer Science, Estonia
Model checking in general involves searching possibly very large state spaces for proving or disproving a query --- a formula typically in some temporal logic.
Bitstate hashing [2], also known as super-trace, is a well known method for reducing memory consumption of the whole state space search by storing only a single bit for each seen state at the address calculated by a hash function.
The drawback of the method is the possibility of hash collisions that will result in unexplored parts of the search space rendering the method to be sound but incomplete. Still, fast reachability checks that yield a valid trace are essential for applying model checking in software/hardware synthesis [1], in scheduling, and in fast debugging of models.
In general, the bigger the hash table, the lower the probability of hash collisions. But, due to the pseudo-random nature of hash functions, big hash tables are much less cache efficient than those of the order of magnitude of the processor cache as prefetching adjacent hash table bytes from memory does not increase the cache hit rate. But present processors have cache line sizes of 32, 64 or 128 bytes which are also transfer units between memory and CPU, and it takes approximately 10^2 cycles to transfer instructions or data from memory to (second level) cache in the case of a cache miss.
We propose a method where we iteratively use independent hash functions after each other on fresh hash tables that are of the same order of magnitude as the CPU cache. Independent hash functions can easily be gained by using a modulus-based hash function and by changing the size of the hash table (and thus the base of the modulus function). An additional benefit of the method is its suitability for parallelisation as hash functions for different table sizes can be calculated in parallel on different CPUs and cores. The method has been experimentally implemented for Uppaal [3] and Spin [4]. As a result, in the case of several examples, the improvement rendered the parsing of models to be the bottleneck instead of the actual model checking.
[1] Ernits, Juhan. Memory Arbiter Synthesis and Verification for a Radar Memory Interface Card. Nordic Journal of Computing, 12(2), 68-88.
[2] Holzmann, Gerard J. 1998. An Analysis of Bitstate Hashing. Form. Methods Syst. Des. 13, 3, 289307.
[3] Model checker Spin, http://www.spinroot.com
[4] Model checker Uppaal, http://www.uppaal.com
Markulf Kohlweiss
Catholic University of Leuven, Dept. of Electrical Engineering, Belgium
While privacy is generally regarded as an important social value, and research on privacy enhancing technologies has recently made important progress, little progress has been made in the deployment of these technologies.
Our contribution is, first, the definition of a new signature mechanism, called assertion-based signatures, that are based on state-of-the-art privacy technology research and, second, the design of an XML signature profile for this new mechanism. Assertion-based signatures enable the signer to anonymously sign documents, while proving in the same signature that she possesses attribute certificates that fulfill the assertion made in the signature. Just one important use case for such signatures is to allow a web user to flexibly provide attribute assertions, e.g., as defined by the SAML, without the need to rely on a third party.
Because of this relation to other web service enabling XML formats such as SAML and XACML, the need for an XML signature profile of assertion-based signatures is well grounded. Moreover, our work provides an important testbed for the suitability of existing web technologies to the privacy challenge.
Sven Laur
University of Tartu, Inst. of Computer Science, Estonia
Solutions for an easy and secure setup of a wireless connection between two devices are urgently needed for WLAN, Wireless USB, Bluetooth and similar standards for short range wireless communication. In this talk, we propose a new three round protocol MA-3 for mutual data authentication based on a cryptographic commitment scheme and short manually authenticated out-of-band messages. We compare the solution to previous protocols like MANA-II and SAS.
We provide a rigorous security proof under the assumption that the commitment scheme is non-malleable. Moreover, we show that non-malleability of the commitment scheme is essential. Finally, provide suggestions how to instantiate the MA-3 protocol in practise.
The talk is based on joint work with N. Asokan and Kaisa Nyberg and the preliminary version of the article can be accessed via Eprint archive: http://eprint.iacr.org/2005/424
Yury Lifshits
Petersburg Department of Steklov Institute of Mathematics, Russia
There is a new topic in algorithm design: processing compressed objects without unpacking. For a number of problems, we now have more efficient algorithms than just "unpack-and-solve". In the talk we will study several problems on compressed strings. For some of them, we show new efficient algorithms, while for some others, we prove their NP-hardness.
The central result is an algorithm for fully compressed pattern matching. For given compressed strings P and T (Lempel-Ziv compression), we determine whether P is a substring in T. The algorithm works in cubic time in the COMPRESSED size of input. Previously known algorithm (1997) works in n^4. Then we briefly discuss algorithms for computing fingerprint table and for finding periods, subsequences, and covers. However, computing Hamming distance and longest common subsequence is NP-hard for compressed strings.
Dmitri Pavlov
Saint Petersburg Institute of Fine Mechanics and Optics, Russia
I want to present an O(m * n * 2^n) deterministic algorithm for solving the mean payoff game problem, m and n being respectively the number of arcs and vertices in a game graph. The algorithm is based on the potential theory, which is used in all preliminary theorems and is interesting by itself.
A mean payoff game is a game for two players, Alice and Bob, who move a token along the arcs of a weighted directed graph. Roughly speaking, the first player wants to make the sums of the weights of the arcs passed so far tend to positive infinity whereas the second player wants to make them tend to negative infinity. The problem is to find the winner given a game graph and a starting vertex.
This problem has important applications in the area of program verification, more precisely, the problem of model checking for modal mu-calculus reduces to the mean payoff game problem. The mean payoff game problem is also interesting from the theoretical viewpoint, since it is one of the few problems known to lie in NP and co-NP, but not known to lie in P so far.
The key idea of the algorithm is to remove one vertex from the graph, calculate the winning sets for the reduced graph, and reinsert the removed vertex into the graph.
Konstantin Pervyshev
St. Petersburg State University, Fac. of Math. and Mechanics, Russia
There is a common belief that quadratic time algorithms are more powerful than their linear time counterparts. Indeed, a "time hiearchy theorem" is known: there exists a lanuage recognizable in quadratic time not recognizable in linear time. Surprisingly, there is no evidence that this theorem holds for probabilistic algorithms.
Recently, Barak et al. proved a time hierarchy for one-bit non-uniform probabilistic algorithms. Roughly speaking, if one bit of an algorithm's description depends on the algorithm's input length, then such probabilistic algorithms having more time can recognize more languages.
We continue this line of research and prove that a time hiearchy exists for virtually any class of one-bit non-uniform algorithms (various types of non-uniform probabilistic algorithms, non-uniform interactive protocols etc.) Besides, our result implies that existence of a time hierarchy for non-uniform probabilistic algorithms can't be an evidence to the existence of a time hierarchy for uniform ones.
Philipp Rümmer
Chalmers University of Technology, Dept. of Computing Science, Sweden
Free variables and unification are standard techniques for constructing suitable instances of quantified formulas in the area of automated theorem proving. Combined with constraint methods, they enable (less standard) non-destructive tableau-like calculi that do not require backtracking. While this approach has proven to be very usable for pure first-order reasoning, the combination of theories (like arithmetic) that are commonly present in the context of program verification with free-variable methods is difficult and so far not very successful. Most theorem provers in this area are nowadays based on ground calculi. This abstract describes and discusses an idea for integrating theories (focusing on Presburger Arithmetic) in the constraint mechanism of a classical non-destructive free-variable calculus for program verification.
Hans Svensson
Chalmers University of Technology, Dept. of Computing Science, Sweden
Algorithms described in literature can often be used to solve practical, industrial problems. Often when a developer is faced with an algorithmic problem, a solution to the problem can be found in literature. In safety-critical industrial settings, algorithms that have been formally verified should be very attractive candidates for implementations. Nevertheless, little transfer of algorithms from research papers directly into products can be observed. There are several reasons for this. One reason is that algorithms are often designed under special assumptions that do not transfer to reality. Another reason is that it is hard to find an algorithm that exactly matches the requirements in a particular setting.
We have made a case study in which we have implemented algorithms for the widely known and broadly studied problem of leader election. Using our experience from the case study we describe some issues related to: choosing an algorithm for implementation, adapting the algorithm, implementing the algorithm and verifying the implementation. Despite thousands of articles on the leader election topic, it still requires a lot of engineering to select the relevant articles, and get a correct algorithm implemented in an industrial setting. Modifications are necessary to meet all requirements. We propose adaptation and testing of formal properties as a realistic and cheap way to check the correctness of the modifications, since performing a formal proof seems unrealistic for industrial systems. We show how we use the properties stated in the articles to guide our tests.
Ilja Tšahhirov
Tallinn University of Technology, Dept. of Informatics, Estonia
We enhance the technique of static analysis for confidentiality in cryptographic protocols with support for asymmetric encryption operation. The technique being extended was originally developed by Laud for symmetric encryption, and then extended with digital signature primitive support by Tshahhirov and Laud. The protocol analysis method, similar to Abadi and Rogaway, is based on replacing the cryptographic operations in the protocols with constructs that are "obviously secure". The replacements are made in such a way that no insecure protocol becomes secure. The transformed protocols are then statically analysed; they should be easier to analyse than the original protocol. Handling the asymmetric encryption is a step in exploring the general approach and making it able to handle realistic scenarios gracefully.
Vesal Vojdani
University of Tartu, Inst. of Computer Science, Estonia
As the software industry is under a fundamental paradigm shift from sequential to parallel programming, new languages and software tools are required to deal with the complexity of parallel execution. This presentation is about the Goblin, a new dataflow analyzer that detects race-conditions in Posix-threaded C programs.
The difficulty with analyzing multi-threaded programs is in dealing with all possible interleaving of multiple threads. The Goblin uses a solution to this problem based on the theory of partial global invariants developed originally for the Trier Data Race Analyzer by Helmut Seidl, Varmo Vene, and Markus Müller-Olm. In the presentation, we will briefly examine this approach and then move on to a demonstration of the Goblin Data Race Analyzer.
The Goblin improves on the Trier analyzer by moving to a more modern toolset. In particular, it uses the CIL infrastructure for Program Analysis and Transformation as a C frontend and features an Eclipse based user interface. The goal of the Goblin project is to build a user-friendly open-source tool that finds real bugs in real software. We assess the fulfillment of that goal by viewing performance benchmarks and conclude the talk with a discussion of the Goblin's future.
Dennis Walter
Chalmers University of Technology, Dept. of Computing Science, Sweden
One may take the viewpoint that an information flow type system --whose types are elements of some security lattice-- essentially performs a syntax-directed dependency analysis between program variables. We found that this analysis can also be phrased in a program logic based on symbolic execution, where instead of security types one has abstractions of variable values, meaning that the execution of (part of) a program is abstracted into an ``imprecise'' update of the values of the variables occuring in the program. This abstraction cannot be arbitrary, of course. In particular, the abstraction function for a program variable may have to depend on certain other (program) variables.
When applying this kind of abstraction during a formal proof of non-interference of program variables in a given program --the classical semantic concept for information flow security--, the arguments to the abstraction functions resemble security types. We provide proof rules for the abstract evaluation of a program and demonstrate their soundness. Some examples are given to highlight the similarity of our approach to recent security type systems for the simple while language. The ideas presented here are still work in progress.
Jan Cappaert
Catholic University of Leuven, Dept. of Electrical Engineering, Belgium
Code obfuscation tries to make analysis of code more difficult to an attacker. It consists of a number of transformations that alter the code in a way that preserves the program's functionality but makes the internals harder to understand.
The most common attack scenario is the reverse-engineering of executables. In order to modify an executable or reuse code fragments, one first has to analyze code, or at least parts of it.
From the owner's perspective, protection of the software against these analysis and tampering threats can be achieved by embedding several software security techniques. These techniques include several verification mechanisms, self-modifying code, tamper-resistant methods and cryptography.
In the scenario where an executable (or installer) is distributed, the owner has several decicions to make: how to protect his/her software, where to apply these techniques during software design (or deployment), etc. In our study we will only focus on code transformations in the context of code obfuscation.
Code obfuscation transformations can be classified into different groups. According to Collberg et al. [1] there are 4 groups: layout transformations, data flow transformations, control flow transformations, and preventive transformations. Most of Collberg's research finds its roots in Java (bytecode) protection against decompilation.
Code obfuscation however can be used for other languages as well and during differnt stages of the software's design and build. One can apply transformations before compilation at a source code level (e.g. by a preprocessor), during compilation by modifying the compiler, or after compilation at a binary level (e.g. on assembler code [2]).
Although the goal is often a self-protecting binary, we study the possibilities of performing code transformations at a highter level.
References
[1] C. Collberg and C. Thomborson and D. Low. A Taxonomy of Obfuscating Transformations. Internal report #148. Department of Computer Science, The University of Auckland, 1997.
[2] G. Wroblewski. General Method of Program Code Obfuscation. PhD thesis. Wroclaw University of Technology, Institute of Engineering Cybernetics, 2002.
Johan Glimming
Stockholm University, School of Computing Sciences and Communication, Sweden
This poster will contain some examples of structural recursion and structural corecursion on (co)algebraic datatypes, and then give some examples on current research problems on generalising these principles to higher-order datatypes useful for modelling object-based programs and classes. Such a generalisation means here a step away from set-theoretic semantics into domain theory. The reflexive domain D=D->D is a special case. Research on finding the right equality on the resulting datatypes (object types) will be mentioned (i.e. bisimulations on recursive function spaces).
Andre Karpištšenko
Tallinn University of Technology, Dept. of Computer Control, Estonia
Modern systems are software-intensive and must meet increasingly demanding requirements in order to cope with significant degrees of uncertainty, dynamic environments, and to provide greater flexibility. Agent-based development metaphors and technologies provide well suited structures and techniques to handle such complexity. However, when developing complex open systems, the tools provide insufficient support for effective specification and implementation of such systems.
With the official release of UML 2.0, the fundamental definition of behavior will offer means for modeling interactions and for specifying systems independently of where the system/environment boundary is drawn. This would suggest that the utilization of Model Driven Architecture (MDA) offers a content platform for modeling multi-agent systems. However, as critics of MDA suggest, it is distinctly nontrivial – many would argue impossible – to support and evolve semantically correct platform-specific models for complex platforms such as J2EE or .NET. Drawing the parallel with inherently complex multi-agent systems, additional means of modeling have to be considered.
The focus of this doctoral work is on the effective modeling practices in the early stages of software development. The purpose is to improve the modeling possibilities of time-aware multi-agent systems (TA-MAS) to enable formal analysis of composite system behavior
Petras Pauliunas
Vytautas Magnus University, Dept. of Applied Informatics, Lithuania
The experience shows that information technology development at high schools and universities opens a wide scope for creating virtual learning environments. A paperless knowledge testing system is the most important part of a virtual learning environment. It helps the lecturer to evaluate student knowledge fast and thorougly. Test as a knowledge evaluation and examination form lets us avoid value judgments, that is characteristic to other knowledge and ability verification patterns. The most important thing for such systems is to use a suitable assignment tool and a GUI attractive for most users. That is why paperless knowledge testing system has to be cross-platform and use open source software.
The framework of paperless knowledge testing system was created using the Apache web server, PHP development environment and MySQL database. Access to the system is restricted, because the paperless knowledge testing system is available for all computers attached to the internet. Based on the Computer Science course experience at Vytautas Magnus university, tests were created of “text fill-in” questions. Answers to such questions could be single words or whole sentences. A test created of “text fill-in” questions shows true student knowledge and doesn’t let the student to guess the right answer. The main problem of such tests is correct evaluation of student answers, that usually contain spelling errors, but can be logically correct. To solve this problem I have developed an algorithm for evaluating test results. This algorithm ignores spelling errors and evaluates logically correct answers. The developed paperless knowledge testing system helps the lecturer evaluate student tests and extends the scope of making assignments for lecture.
Gunnar Piho
Tallinn University of Technology, Dept. of Informatics, Estonia
According to Arlow and Neustadt [1], a business archetype is a primordial thing that occurs consistently and universally in business domains and in business software systems. A business archetype pattern is collaboration between business archetypes. Examples of archetype patterns are Party, Party Relationship, CRM, Product, Inventory, Order, Quantity, Money, and Rule.
Let us assume there are domain-specific archetypes (health care, education, banking, and so on) and archetype patterns. If so, how can they be identified, designed and used in software development. More precisely: what are archetypes and archetype patterns (A&AP)? What are domain specific A&AP? How to identify, prescribe and formalize them? Where and how can they be used? How can A&AP be used in domain models? Is there some universal (useful in most domains) set of A&AP? What about validation and verification of models with A&AP? How can A&AP relate to other software engineering artefacts (objects, components, agents, frameworks, services …)? How can A&AP be used in software development? What benefits and drawbacks might we expect? Could we improve maturity and capability of software development process in terms of CMMI [2] when using A&AP?
The goal is to design techniques to identify, design and use A≈ to use those techniques in identifying and designing clinical laboratory A≈ and to use those clinical laboratory A&AP in describing clinical laboratory domain and in prescribing LIMS [3] requirements.
The textbooks of Dines Bjørner [4-6] and other works on software and domain engineering will be used as theoretical background.
-------------------
1. J.Arlow and I.Neustadt, Enterprise Patterns and MDA: Building Better Software with Archetype Patterns and UML, Addison-Wesley. 2003.
2. CMMI, see online http://www.sei.cmu.edu/cmmi/general/general.html
3. ASTM Standard, E-1578-93 (Reapproved 1999), Standard Guide for Laboratory Information Management Systems (LIMS)
4. D.Bjørner, Software Engineering, vol.1, Abstraction and Modelling, Springer, 2005
5. D.Bjørner, Software Engineering, vol.2, Specification of Systems and Languages, Springer, 2005
6. D.Bjørner, Software Engineering, vol.3, Domains, Requirements, and Software Design, Springer, 2005
Risto Tamme
Tallinn University of Technology, Dept. of Computer Science, Estonia
Although program analysis has been a hot topic in computer science for a while, applications containing relational databases have not received sufficient attention. But almost every modern organization uses/produces considerable amounts of data and there are programs (for example for logistics, decision support, accounting, inventory systems) that use/process it. Thus more and more applications are related to some database which raises the acute need for methods and tools for efficient testing of such systems. Due to data dependencies an error might show up while testing with one data set and successfully hide itself in the case of some other data set. Thus populating the database with relevant data is crucial for successful and useful testing of applications using databases.
The aim of the study is to find effective methods for testing database related software applications. The main issues are how to generate data to enable the discovery of large portions of possible errors concerning a certain application.
Our work focuses on answering the following questions:
Indre Zliobaite
Vilnius University, Dept. of Informatics, Lithuania
I’m working on my Master thesis on topic “Financial time series prediction using artificial neural networks”. I would like to present the results achieved.
A characteristic feature of current research in economic and financial data mining methods is the environment dynamics.
Data changes could be of different nature, including trends, random or systematic substitution of prediction tasks, changes in underlying probability distribution, especially due to enormous amount of influencers to the financial asset price the data. Thus, in the financial prediction task, the algorithms have to be able to operate using small samples of historical information and adapt to rapid changes in prediction task.
The objective of the present research is to analyze the theoretical background of the factually unsolvable financial time series prediction task, to suggest methods to achieve more reliable forecasting results in situations when the environment is changing permanently.
In dynamic financial time series prediction, neural network training based on short data sequences results in more accurate predictions than using lengthy historical data. To take into account different character of distinct segments of non-stationary financial time series, we suggest a multi-agent system (MAS) based forecasting algorithm. We require that diverse agents be good for different segments of the data. The optimal training set size is determined theoretically and experimentally. To reduce the generalization error we suggest:
http://www.cs.ioc.ee/yik/schools/win2006/
Modified Jun 09, 2009 11:47 by ewscs06(at)cs.ioc.ee