Artificial Intelligence (AI) pervades more and more aspects of our lives. For some, it marks the beginning of a new era; for others, it is Pandora's box. Despite all the advantages of AI, the threats and challenges associated with it cannot be left unaddressed.

This topic has been explored by the European Commission in a comprehensive report entitled “Algorithmic discrimination in Europe: Challenges and opportunities for gender equality and non-discrimination law”, written by Janneke Gerards (Utrecht University Law School) and Raphaële Xenidis (Edinburgh University Law School and iCourts, Copenhagen University) and published in 2020. The problems raised in this report will be discussed below, as well as in future articles on this blog. The purpose of this article is to introduce the principles on which algorithms work.

Algorithms can be defined as a set of computer instructions that process a series of input data, which opens up a wide range of possibilities. Acting on the basis of such data, algorithms can generate a key output, directly indicating whether a decision is reasonable or estimating the probability that certain phenomena will occur. Examples of such qualitative outputs would be a feedback message with a decision to grant a social security benefit to a specific person, to fine someone who has been found to be speeding, or stating that crime is increasing in a certain neighbourhood. Examples of quantitative outputs, in turn, would be the determination of the amount of a social benefit granted and possible modifications of its value in relation to the extension or shortening of the working hours of a given employee, or even of an entire group of employees in a given workplace. More examples of the application of such outputs can be found in the articles in this series. This type of feedback, received within seconds of data input, does not remain a matter of theory: its practical importance is significant.

However, in order for algorithms to perform all these roles, they need to be verified in advance so as to exclude the risk that incorrect or underdeveloped algorithms generate undesirable results. Some of these results are immediately visible, while others are identified only by accident after months of use.

Alarming phenomena demonstrating that an algorithm is malfunctioning include explicit bias in its performance, which leads to the question: what went wrong?

In order to answer this question, it is not sufficient simply to recognise that not only humans but also algorithms may discriminate. It is necessary to scrutinise the methodology on the basis of which they operate.

Rule-based algorithms

Generally speaking, algorithms carry out reasoning according to the pattern: 'if x, then y'.

Let’s imagine a provision of law stating that driving faster than 200 km per hour on a motorway is prohibited and that a violation of this rule is punishable by a fine. As a result, if someone has been found to have been driving faster than 200 km per hour, then the consequence must be that they have to pay the fine.
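
A minimal sketch of such a rule expressed in code (Python is used here purely for illustration; the fine amount is an invented placeholder, only the 200 km/h threshold comes from the example) might look like this:

```python
# Illustrative rule-based check following the 'if x, then y' pattern.
SPEED_LIMIT_KMH = 200
FINE_EUR = 500  # hypothetical amount, not taken from any real regulation


def assess_speeding(measured_speed_kmh: float) -> str:
    """If the measured speed exceeds the limit (x), impose a fine (y)."""
    if measured_speed_kmh > SPEED_LIMIT_KMH:
        return f"Fine of EUR {FINE_EUR} imposed"
    return "No violation"


print(assess_speeding(215))  # Fine of EUR 500 imposed
print(assess_speeding(180))  # No violation
```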

In cases requiring more complex reasoning, the number of logical sub-rules and variables increases accordingly (in the next stage of our example, we can specify the financial range of the penalties to be imposed and add the separate offence of driving under the influence of alcohol). The decision-making process can be split into different modules, each operating in accordance with the initial pattern (see the decision tree below, illustrating the working mechanism of an exemplary rule-based algorithm).

A. Freitas, D. Wieser, R. Apweiler, On the Importance of Comprehensible Classification Models for Protein Function Prediction, ResearchGate: https://www.researchgate.net/figure/Example-of-a-decision-tree-with-three-leaf-nodes-converted-into-a-set-of-three_fig3_41425089 (accessed 15 November 2021)

Computer algorithms developed according to the 'if x, then y' rule create predictable knowledge-based systems, as they are based on fixed rules and a precisely defined set of variables. Rule-based algorithms can therefore become a more efficient alternative to the human decision-making process. However, it should be emphasised that for such a rule-based algorithm to operate properly, it must be provided with precise and unambiguous variables that can be interpreted in accordance with the 'if x, then y' rule.
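
Extending the same speeding example with the sub-rules mentioned above, a modular rule set might be sketched as follows; all thresholds and amounts are hypothetical placeholders, not taken from any real regulation:

```python
# Illustrative extension of the speeding rule into several "modules",
# each following the same 'if x, then y' pattern. All values are invented.


def speeding_fine(speed_kmh: float) -> int:
    """Sub-rule 1: the fine depends on how far the limit was exceeded."""
    if speed_kmh <= 200:
        return 0
    if speed_kmh <= 220:
        return 500
    return 1000


def alcohol_fine(blood_alcohol_permille: float) -> int:
    """Sub-rule 2: the separate offence of driving under the influence."""
    return 2000 if blood_alcohol_permille > 0.5 else 0


def total_penalty(speed_kmh: float, blood_alcohol_permille: float) -> int:
    """The modules are combined into a single, fully predictable decision."""
    return speeding_fine(speed_kmh) + alcohol_fine(blood_alcohol_permille)


print(total_penalty(215, 0.0))  # 500
print(total_penalty(230, 0.8))  # 3000
```

As long as the input variables (speed, blood alcohol level) are precise and unambiguous, the outcome of such a system remains entirely predictable.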

Machine-learning algorithms

Compared to rule-based algorithms, machine-learning algorithms are “more intelligent” because they are able to learn. They can autonomously adapt, evolve and improve in order to optimise the outcomes achieved from any input data, without being explicitly programmed to do so. Machine-learning algorithms can be characterised as dynamic because, unlike rule-based algorithms, their rules change depending on the input data. These algorithms make use of different analytical techniques (outlined below), which enable them to determine correlations and patterns, especially in large databases.
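
As a rough illustration of the difference, the sketch below, assuming the scikit-learn library is available and using invented data points, lets a decision-tree model learn the speed threshold from labelled examples instead of having it hard-coded by a programmer:

```python
# Instead of hard-coding "if speed > 200, then fine", the model infers the
# threshold from labelled examples. The data points are invented.
from sklearn.tree import DecisionTreeClassifier

speeds = [[150], [180], [195], [205], [220], [250]]  # measured speeds (km/h)
fined = [0, 0, 0, 1, 1, 1]                           # 1 = a fine was imposed

model = DecisionTreeClassifier(max_depth=1).fit(speeds, fined)
print(model.predict([[210], [170]]))  # e.g. [1 0]: fine / no fine
```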

With the use of classification techniques, an algorithm can be trained to recognise the categories corresponding to particular data on the basis of pre-defined classes. For example, an algorithm can learn to recognise spam on the basis of messages that humans have previously labelled as spam (including words and phrases typical of spam messages).

Next, after this training process, the algorithm can detect new spam messages by recognising the common patterns distinguishing them, for instance characteristic phrases. This is also why the subject lines and senders of unwanted e-mails keep changing: spammers try to evade such filters.
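
A minimal sketch of such a spam classifier, assuming the scikit-learn library is available and using invented training messages, could look like this:

```python
# Classification sketch: learn to label messages as spam or not spam from
# examples previously labelled by humans. All messages are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "Win a free prize now",       # spam
    "Cheap loans, click here",    # spam
    "Meeting agenda for Monday",  # not spam
    "Lunch tomorrow?",            # not spam
]
labels = ["spam", "spam", "ham", "ham"]

classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(messages, labels)

print(classifier.predict(["Free prize, click here"]))        # likely ['spam']
print(classifier.predict(["Agenda for tomorrow's meeting"]))  # likely ['ham']
```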

Clustering techniques can be useful in detecting fraudulent tax reports, which may be revealed by some kind of discrepancy from regular reports. These techniques help the algorithm learn to identify discrepancies, commonalities and correlations between highly diverse data. Besides being able to create specific clusters of situations, events or people with comparable features (e.g. interests, preferences), the algorithm can also (or especially!) detect anomalies.
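
A sketch of such a clustering-based check, assuming scikit-learn and invented numbers standing in for features of tax reports, might look like this:

```python
# Clustering sketch: group similar reports and flag those that lie unusually
# far from every cluster centre as potential anomalies. Numbers are invented
# stand-ins for report features (e.g. declared income, claimed deductions).
import numpy as np
from sklearn.cluster import KMeans

reports = np.array([
    [50_000, 2_000], [52_000, 2_100], [49_000, 1_900],  # one "regular" group
    [120_000, 5_000], [118_000, 5_200],                  # another group
    [51_000, 40_000],                                    # discrepant report
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(reports)
distances = kmeans.transform(reports).min(axis=1)  # distance to nearest centre

threshold = distances.mean() + distances.std()
print(np.where(distances > threshold)[0])  # index of the flagged report, e.g. [5]
```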

Regression techniques work particularly well in the banking sector: by developing a proficiency in estimating probabilities, an algorithm can calculate the level of credit risk based on a comparison of a person's data with data on their loan/credit history or life situation.
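
A sketch of such a probability-estimating model, assuming scikit-learn and invented applicant data (logistic regression is used here as one standard way of estimating such probabilities), might look like this:

```python
# Regression sketch: estimate the probability of default from simple applicant
# features. The features, values and labels are invented for illustration.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# features: [monthly income in EUR, number of previously missed payments]
applicants = [[1500, 4], [1800, 3], [2500, 0], [4000, 0], [1200, 5], [3500, 1]]
defaulted = [1, 1, 0, 0, 1, 0]  # 1 = the applicant defaulted on a past loan

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(applicants, defaulted)

new_applicant = [[2000, 2]]
print(model.predict_proba(new_applicant)[0][1])  # estimated probability of default
```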

On the other hand, entrepreneurs, particularly in the e-commerce industry, are likely to be most interested in association techniques. By using them, an algorithm can detect close correlations in data and use them to anticipate future behaviour. For example, the algorithm can recognise a close correlation between buying a smartphone and a matching phone case, or between watching TV series X and then TV series Y. Such correlations create so-called "association rules", which can be used to recommend additional purchases to customers, set prices, personalise information or support behavioural targeting practices.
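
A minimal sketch of how such an association rule could be derived from purchase histories, using invented shopping baskets and a hand-rolled confidence calculation rather than a dedicated library, might look like this:

```python
# Association-rule sketch: measure how often buying a smartphone is followed
# by buying a phone case. The shopping baskets are invented.
baskets = [
    {"smartphone", "phone case", "headphones"},
    {"smartphone", "phone case"},
    {"smartphone", "charger"},
    {"laptop", "mouse"},
    {"phone case"},
]


def confidence(antecedent: str, consequent: str) -> float:
    """Confidence of the rule 'antecedent -> consequent': the share of baskets
    containing the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    with_both = [b for b in with_antecedent if consequent in b]
    return len(with_both) / len(with_antecedent)


print(confidence("smartphone", "phone case"))  # 2/3, i.e. roughly 0.67
```

A high confidence value for such a rule is what allows a shop to recommend the phone case to customers who have just bought the smartphone.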

Summing up, each of the above techniques has its own advantages, but the best outcomes are achieved by using them together. For example, thanks to a combination of clustering and association techniques, the professional profile or even the marital status of a given person can be identified. By subsequently applying regression techniques, the algorithm can predict other preferences of that person, e.g. a tendency to use stimulants. Last but not least, the usefulness of algorithms in predicting human behaviour or the consequences of events should be particularly emphasised.

A wide range of further issues will be discussed in the second article in this series, which will provide more information, inter alia, about enhanced learning and the specific causes of discrimination by algorithms.