Software crisis

History has valuable lessons on AI. Many of us are aware of the replication crisis in social science. Were you aware of the software crisis, first famously discussed at the NATO Conference on Software Engineering in Garmisch, October 1968? We have got used to software being buggy, updates being required on a near-daily basis, often to fix security vulnerabilities – and, given the vast number of high profile cyber attacks, often too late. People are now suggesting using large language models, trained on code people have dumped on the web, to write software. Software testing and static program analysis are going to be more important than ever, whether you’re evaluating internet-connected apps or statistical analysis code.

The original reports are available online. It’s worth having a browse around to see the issues. In 1968, hardware and software had a tiny fraction of the computational and political power it has now.

Actual causes: two examples using the updated Halpern-Pearl definition

Halpern (2015) provides three variants of the Halpern-Pearl definitions of actual causation. I’m trying to get my head around the formalism, which is elegant, concise, and precise, but tedious to use in practice, so I wrote an R script to do the sums. This blog post is not self-contained – you will need to read the original paper for an introduction to the model. However, it works through two examples, which may help if you’re also struggling with the paper.

The second (“updated”) definition of an actual cause asserts that \(\vec{A} = \vec{a}\) is a cause of \(\varphi\) in \((M,\vec{u})\) iff the following conditions hold:

AC1 \((M,\vec{u}) \models (\vec{A} =\vec{a}) \land \varphi\).

This says, if \(\vec{A} = \vec{a}\) is an actual cause of \(\varphi\) then they both hold in the actual world, \((M,\vec{u})\). Note, for this condition, we are just having a look at the model and not doing anything to it.

AC2 There is a partition of the endogenous variables in \(M\) into \(\vec{Z} \supseteq \vec{X}\) and \(\vec{W}\) and there are settings \(\vec{x’}\) and \(\vec{w}\) such that

(a) \((M,\vec{u}) \models [ \vec{X} \leftarrow \vec{x’}, \vec{W} \leftarrow \vec{w}] \neg \varphi\).

So, we’re trying to show that undoing the cause, i.e., setting \(\vec{X}\) to \(\vec{x’} \ne \vec{x}\), prevents the effect. We are allowed to modify \(\vec{W}\) however we want to show this, whilst leaving \(\vec{Z}-\vec{X}\) free to do whatever the model tells these variables to do.

(b) If \((M,\vec{u}) \models \vec{Z} = \vec{z^{\star}}\), for some \(\vec{z^{\star}}\), then for all \(\vec{W’} \subseteq \vec{W}\) and \(\vec{Z’} \subseteq \vec{Z}-\vec{X}\),
\((M,\vec{u}) \models [ \vec{X} \leftarrow \vec{x}, \vec{W’} \leftarrow \vec{w’}, \vec{Z’} \leftarrow \vec{z^{\star}}] \varphi\).

This says, trigger the cause (unlike AC1, we aren’t just looking to see if it holds) and check whether it leads to the effect under all subsets of \(\vec{Z}\) (as per actual world) that aren’t \(\vec{X}\) and all subsets of the modified \(\vec{W}\) that we found for AC2(a). Note how we are setting \(\vec{Z}\) for those subsets, rather than just observing it.

AC3 There is no \(\vec{A’} \subset \vec{A}\) such that \(\vec{A’} = \vec{a’}\) satisfies AC1 and AC2.

This says, there’s no superfluous stuff in \(\vec{A}\). You taking a painkiller and waving a magic wand doesn’t cause your headache to disappear, under AC3, if the painkiller works without the wand.

Example 1: an (actual) actual cause

Let’s give it a go with an overdetermined scenario (lightly edited from Halpern) that Alice and Bob both lob bricks at a glasshouse and smash the glass. Define

\(\mathit{AliceThrow} = 1\)
\(\mathit{BobThrow} = 1\)
\(\mathit{GlassBreaks} = \mathit{max}(\mathit{AliceThrow},\mathit{BobThrow})\)

So, if either Alice or Bob (or both) hit the glasshouse, then the glass breaks. Strictly speaking, I should have setup one or more exogenous variables, \(\vec{u}\), that define the context and then defined \(\mathit{AliceThrow}\) and \(\mathit{BobThrow}\) in terms of \(\vec{u}\), but it works fine to skip that step as I have here since I’m holding \(\vec{u}\) constant anyway.

Is \(\mathit{AliceThrow} = 1\) an actual cause of \(\mathit{GlassBreaks} = 1\)?

AC1 holds since \((M,\vec{u}) \models \mathit{AliceThrow} = 1 \land \mathit{GlassBreaks} = 1\). The first conjunct comes directly from one of the model equations and none of the functions change it. Spelling out the second conjunct,

\(\mathit{GlassBreaks} = \mathit{max}(\mathit{AliceThrow},\mathit{BobThrow})\)
\(= \mathit{max}(1, 1)\)
\(= 1\)

For AC2, we need to find a partition of the endogenous variables such that AC2(a) and AC2(b) hold. Try \(\vec{Z} = \{ \mathit{AliceThrow}, \mathit{GlassBreaks} \}\) and \(\vec{W}= \{ \mathit{BobThrow} \}\).

AC2(a) holds since \((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 0, \mathit{BobThrow} \leftarrow 0] \mathit{GlassBreaks} = 0\).

For AC2(b), we begin with \(\vec{Z} = \{ \mathit{AliceThrow}, \mathit{GlassBreaks} \}\) and the settings as per the unchanged model, so

\((M,\vec{u}) \models \mathit{AliceThrow} = 1 \land \mathit{GlassBreaks} = 1\).

We need to check that for all \(\vec{W’} \subseteq \vec{W}\) and \(\vec{Z’} \subseteq \vec{Z}-\vec{X}\),
\((M,\vec{u}) \models [ \vec{X} \leftarrow \vec{x}, \vec{W’} \leftarrow \vec{w’}, \vec{Z’} \leftarrow \vec{z^{\star}}] \varphi\).

Here are the combinations and \(\varphi \equiv \mathit{GlassBreaks} = 1\) holds for all of them:

\((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1, \mathit{GlassBreaks} \leftarrow 1, \mathit{BobThrow} \leftarrow 0 ] \varphi\)
\((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1, \mathit{BobThrow} \leftarrow 0 ] \varphi\)
\((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1, \mathit{GlassBreaks} \leftarrow 1 ] \varphi\)
\((M,\vec{u}) \models [ \mathit{AliceThrow} \leftarrow 1 ] \varphi \)

(The third was rather trivially true; however, as far as I understand, has to be checked given the definition.)

AC3 is easy since the cause only has one variable, so there’s nothing superfluous.

Example 2: not an actual cause

Now let’s try an example that isn’t an actual cause: the glass breaking causes Alice to throw the brick. It’s obviously false; however, it wasn’t clear to me exactly where it would fail until I worked through this…

AC1 holds since in the actual world, \(\mathit{GlassBreaks} = 1\) and \(\mathit{AliceThrow} = 1\) hold.

Examining the function defintions, they don’t provide a way to link \(\mathit{AliceThrow}\) to a change in \(\mathit{GlassBreaks}\), so the only apparent way to do so is through \(\vec{W}\). Therefore, use the partition \(\vec{W} = \{\mathit{AliceThrow}\}\) and \(\vec{Z} = \{\mathit{GlassBreaks}, \mathit{BobThrow}\}\).

Now for AC2(a), we can easily get \(\mathit{AliceThrow} = 0\) as required, since we can do what we like with \(\vec{W}\). It doesn’t help when we move onto AC2(b) since we have to hold \(\mathit{AliceThrow} = 0\), which is the negation of what we want. The same is the case for the other partition including \(\mathit{AliceThrow}\) in \(\vec{W}\), i.e., \(\vec{W} = \{ \mathit{AliceThrow}, \mathit{BobThrow} \}\).

So, the broken glass does not cause Alice to throw a brick. The setup we needed to get through AC2(a) set us up to fail AC2(b).

References

Halpern, J. Y. (2015). A Modification of the Halpern-Pearl Definition of Causality. Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), 3022–3033.

See also this companion blog post.

Distinguishing between = and <-

‘Perhaps because of the use of = for assignment in FORTRAN […] assignment is often read as “x equals E”. This causes great confusion. The first author learned to distinguish between = and := while giving a lecture in Marktoberdorf, Germany, in 1975. At one point, he wrote “:=” on the board but pronounced it “equals”. Immediately, the voice of Edsger W. Dijkstra boomed from the back of the room: “becomes!”. After a disconcerted pause, the first author said, “Thank you; if I make the same mistake again, please let me know.”, and went on. Once more during the lecture the mistake was made, followed by a booming “becomes” and a “Thank you”. The first author has never made that mistake again! The second author, having received his undergraduate education at Cornell, has never experienced this difficulty.’

– David Gries and Fred B. Schneider (1993, p. 17, footnote 6) [A logical approach to discrete math. Springer.]

R lets you use both = and <- for assignment, FYI.

Experiments to protect against harvest now, decrypt later

Quantum computing needs thousands of qubits to crack the public key encryption we currently rely on and at present the largest (publicly announced) quantum computer only has 433 qubits. The sales pitch for protecting against quantum attacks now is that baddies could be quietly harvesting encrypted data, so that when quantum computers are ready, off they go and decrypt.

There are two main approaches to protect against quantum attacks.

One approach, quantum encryption key distribution (QKD), can be implemented one qubit at a time, e.g., by sending photons along fibre optic cables. Each photon is a realisation of a qubit. HSBC is trialling a QKD approach developed by BT and Toshiba, which builds on BB84 from forty years ago. I’ve attempted to explain BB84 in a previous blog post.

Vodafone has opted for a non-quantum approach to defend against attacks on crypto by trialling new public key encryption algorithms, on classical computers, that are thought to be quantum-safe: post-quantum cryptography (PQC).

The next major revision of Google Chrome will include a hybrid approach combining X25519 , a standard method already used by Chrome and others browsers, with Kyber-768, a PQC method that has so far resisted attack.

Since PQC methods are new, the hybrid approach offers protection while the maths continues to be stress-tested. This is important since there has already been an embarrassingly quick crack of a supposedly quantum-safe approach.

A survey of impossibility results in computer science and neighbours (Brcic & Yampolskiy, in press)

Abstract:

An impossibility theorem demonstrates that a particular problem or set of problems cannot be solved as described in the claim. Such theorems put limits on what is possible to do concerning artificial intelligence, especially the super-intelligent one. As such, these results serve as guidelines, reminders, and warnings to AI safety, AI policy, and governance researchers. These might enable solutions to some long-standing questions in the form of formalizing theories in the framework of constraint satisfaction without committing to one option. We strongly believe this to be the most prudent approach to long-term AI safety initiatives. In this paper, we have categorized impossibility theorems applicable to AI into five mechanism-based categories: deduction, indistinguishability, induction, tradeoffs, and intractability. We found that certain theorems are too specific or have implicit assumptions that limit application. Also, we added new results (theorems) such as the unfairness of explainability, the first explainability-related result in the induction category. The remaining results deal with misalignment between the clones and put a limit to the self-awareness of agents. We concluded that deductive impossibilities deny 100%-guarantees for security. In the end, we give some ideas that hold potential in explainability, controllability, value alignment, ethics, and group decision-making.

References

Brcic, M., & Yampolskiy, R. V. (in press). Impossibility Results in AI: A Survey. ACM Computing Surveys.

Writing a song “is an act of self-murder”: Nick Cave on ChatGPT

The best part of Nick Cave’s critique of ChatGPT is, IMHO, the following:

“Songs arise out of suffering, by which I mean they are predicated upon the complex, internal human struggle of creation and, well, as far as I know, algorithms don’t feel.”

A close second:

“Writing a good song is not mimicry, or replication, or pastiche, it is the opposite. It is an act of self-murder that destroys all one has strived to produce in the past.”