OpenAI Codex – making programmers more productive

by Torsten Arendrup, September 1, 2021

Last year OpenAI introduced GPT-3, a text-writing AI so powerful that it can write articles, reviews and other text that ordinary people cannot tell was written by an AI. Impressive, but also a bit scary when we consider how this could influence the content on social media.

Now OpenAI has released Codex, a new deep-learning-driven platform that writes fully functioning software code automatically. Yes, it can write code when given just a few commands in plain natural language. It can rewrite code from one programming language to another. It can explain in simple, clear language what a specific snippet of code does.

Sounds too good to be true? Well, I had seen the impressive demo where a dozen simple commands resulted in a simple, but impressive, JavaScript game. I was blown away. But surely, this was just a demo. When I got beta access I could not wait to see whether it was as mind-blowing as it seemed.

Testing Codex: Oneliners

Codex is a GPT language model trained on public code from GitHub. OpenAI says it solves more than 70% of the benchmark problems it was tested on, provided it may make many attempts per problem. Let’s give it a shot. I gave the command:

”create a button that countdown from ten when clicked”.

The result was a working piece of code. I clicked the button named Countdown (the name given by Codex), and the countdown worked! It even changed the label to ”Go!” when the countdown reached zero. This is a good example of how Codex benefits from being built on top of GPT-3: it does a great job of understanding the purpose, and then it can add design elements that support that purpose.

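/* create a button that countdown from ten when clicked */
var countdown = document.createElement('button');
countdown.innerHTML = 'Countdown';
document.body.appendChild(countdown);
var count = 10;
countdown.onclick = function() {
  var count = 10;
  // Tick once per second until the count reaches zero.
  var interval = setInterval(function() {
    if (count > 0) {
      countdown.innerHTML = count;
      count--;
    } else {
      countdown.innerHTML = 'Go!';
      // Stop the timer once the countdown is done.
      clearInterval(interval);
    }
  }, 1000);
};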

I then asked the ”Playground”:

Create a python regex that validates a danish cpr-number:

This resulted in:

import re

regex = re.compile(r'^(?P<day>[0-9]{2})(?P<month>[0-9]{2})(?P<year>[0-9]{2})(?P<century>[0-9]{2})(?P<control>[0-9]{2})$')

def validate_cpr(cpr):
    """Validate a danish cpr-number."""
    match = regex.match(cpr)
    if match:
        day = int(match.group('day'))
        month = int(match.group('month'))
        year = int(match.group('year'))
        century = int(match.group('century'))
        control = int(match.group('control'))

        # Check the date
        if day > 31 or month > 12:
            return False
        if day in (31, 30, 29) and month in (2, 4, 6, 9, 11):
            return False
        if day in (31, 30, 29) and month == 2 and year % 4 != 0:
            return False
        if day == 29 and month == 2 and year % 4 != 0 and year % 100 != 0:
            return False

        # Check the control number
        control_sum = (2 * (day % 10) + ((day // 10) % 10) + month + ((month + 1) // 2) + year + century) % 10
        if control_sum != control:
            return False

        return True
    return False

The checksum is a nice touch, but wrong 😊 Besides the regex, Codex also delivers date validation. That was nice.
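For comparison, the classic CPR control is a modulus-11 test: each of the ten digits is multiplied by a fixed weight (4, 3, 2, 7, 6, 5, 4, 3, 2, 1) and the weighted sum must be divisible by 11. Here is a minimal sketch of that check. The function name and the test numbers are made up for illustration. Note also that since 2007 CPR numbers have been issued that do not satisfy the check, so a failed test alone does not prove a number is invalid.

```python
import re

# Weights of the classic modulus-11 control for Danish CPR numbers.
CPR_WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def cpr_checksum_ok(cpr: str) -> bool:
    """Return True if the ten digits pass the classic modulus-11 control."""
    digits = cpr.replace("-", "")
    if not re.fullmatch(r"[0-9]{10}", digits):
        return False
    total = sum(int(d) * w for d, w in zip(digits, CPR_WEIGHTS))
    return total % 11 == 0


# Made-up numbers: the first has a weighted sum of 22, the second of 23.
print(cpr_checksum_ok("010101-1100"))
print(cpr_checksum_ok("010101-1101"))
```

The sketch only covers the control digit; the date part would still need its own validation.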

To my surprise, I got a totally different, but acceptable, result when I retried the test. It seems that Codex has inherited the ”Temperature” setting from GPT-3, which determines how random the answer should be. My guess is that this feature is there so you can get another result if the first was not acceptable.
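To illustrate what such a setting does, here is a minimal sketch of temperature scaling as it is commonly described for language models: the raw scores for each candidate are divided by the temperature before they are turned into probabilities. The scores and the function name are invented for the example, and this is not OpenAI's code.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
cold = softmax_with_temperature(scores, 0.2)
hot = softmax_with_temperature(scores, 2.0)
print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

At a low temperature almost all probability lands on the top candidate, so a retry tends to give the same answer. At a high temperature the alternatives get a real chance, which would explain a different result on the second attempt.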

Testing Codex: Explaining code

I got a code snippet from one of our developers, Jesper in the Copenhagen office. I asked Codex to explain what it does:

public func longestCommonSubstring(other: String) -> String {
	let (shortString, longString) = identifyShortLongStrings(first: self, second: other)
	let shortLength = shortString.count
	let longLength = longString.count
	// Copy the characters into arrays so they can be indexed by Int.
	let shortStringChars = shortString.map { $0 }
	let longStringChars = longString.map { $0 }

	// Only two rows of the table are kept: the previous one and the current one.
	var memo = [[Int]](repeating: Array<Int>(repeating: 0, count: shortLength+1), count: 2)
	var maxSubstringLength = 0
	var maxSubstring = ""

	for i in 1..<longLength+1 {
		for j in 1..<shortLength+1 {
			guard shortStringChars[j-1] == longStringChars[i-1]
			else {
				memo[1][j] = 0
				continue
			}
			memo[1][j] = memo[0][j-1] + 1
			if memo[1][j] > maxSubstringLength {
				maxSubstring = String(shortStringChars[j-maxSubstringLength-1..<j])
				maxSubstringLength = memo[1][j]
			}
		}
		memo[0] = memo[1]
		memo[1] = Array<Int>(repeating: 0, count: shortLength+1)
	}
	return maxSubstring
}

Codex answered:

Here's what the above function is doing:
1. We first identify which string is shorter and which is longer.
2. We initialize a 2D array of size (shortLength+1) x (longLength+1) to store the length of the longest common substring ending at each index of the short string.
3. We then iterate through the long string, and for each index, we iterate through the short string.
4. If the characters at the current index of the short string and long string are equal, we add 1 to the value of the longest common substring ending at the previous index of the short string.
5. If the characters at the current index of the short string and long string are not equal, we set the value of the longest common substring ending at the previous index of the short string to 0.
6. We then check if the value of the longest common substring ending at the current index of the short string is greater than the current maxSubstringLength.
7. If so, we update the maxSubstringLength to the value of the longest common substring ending at the current index of the short string, and update the maxSubstring to the substring of the short string of length maxSubstringLength.
8. We then set the value of the previous row of the memo table to the current row, and set the current row to a new array of size (shortLength+1).
9. We return the maxSubstring.
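The steps Codex lists can be sketched in Python as follows. This is a rough translation for illustration, not Jesper's code and not Codex output. Like the Swift function, it keeps only two rows of the table (the previous and the current one) rather than the full grid mentioned in step 2.

```python
def longest_common_substring(first: str, second: str) -> str:
    """Return the longest run of characters that appears in both strings."""
    # Put the shorter string on the inner loop so each row stays small.
    short, long_ = sorted((first, second), key=len)
    previous = [0] * (len(short) + 1)
    best_length = 0
    best = ""
    for i in range(1, len(long_) + 1):
        current = [0] * (len(short) + 1)
        for j in range(1, len(short) + 1):
            if short[j - 1] == long_[i - 1]:
                # Extend the common run that ended one character earlier.
                current[j] = previous[j - 1] + 1
                if current[j] > best_length:
                    best_length = current[j]
                    best = short[j - best_length:j]
        previous = current
    return best


print(longest_common_substring("abcdef", "zcdemf"))
```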

This is definitely usable!

Codex understands how to connect to APIs. It already knows many, including, of course, GPT-3:

/* Create a form where the user is asked to enter 2-3 keywords. */
var form = document.createElement('form');
form.innerHTML = '<input type="text" name="keyword1" placeholder="Keyword 1">' +
                 '<input type="text" name="keyword2" placeholder="Keyword 2">' +
                 '<input type="text" name="keyword3" placeholder="Keyword 3">' +
                 '<input type="submit" value="Submit">';

/* Give the page a name: "The Ad Whizz", placed at the top left of the page in 45pt deep blue. */
var title = document.createElement('h1');
title.style.color = '#0000ff';
title.style.fontSize = '45pt';
title.innerHTML = 'The Ad Whizz';
document.body.insertBefore(title, document.body.firstChild);

/* Add this logo for trifork: */
var logo = document.createElement('img');
logo.src = '';

/* Move the logo to top right of the page. */
logo.style.position = 'absolute';
logo.style.top = '20px';
logo.style.right = '20px';

/* When submitted, send the keywords to GPT3, and ask for text for an creative ad covering these keywords. Use "qwert" as credentials. */
form.onsubmit = function() {
  var keywords = [];
  for (var i = 0; i < 3; i++) {
    var input = document.getElementsByName('keyword' + (i + 1))[0];
    if (input.value) {
      keywords.push(input.value);
    }
  }
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '', true);
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.setRequestHeader('Authorization', 'qwert');
  xhr.onreadystatechange = function() {
    if (xhr.readyState == 4 && xhr.status == 200) {
      var ad = JSON.parse(xhr.responseText);
      var adText = document.createElement('div');
      adText.innerHTML = ad.text;
    }
  };
  xhr.send(JSON.stringify({
    keywords: keywords
  }));
  return false;
};

/* Display the result in an area, with a thin border and color the text blue. */
var adText = document.createElement('div');
adText.style.border = '1px solid #0000ff';
adText.style.color = '#0000ff';

Spooky: Codex knows what you are referring to!

The first thing we notice is that the model can refer back to its own earlier work. It remembers (within the same session) what you have done. If you create a title, it understands that “it” refers to the title when you ask ”Make it bigger”. This understanding of syntax and context is quite amazing: you develop in a dialogue that feels very satisfying.

Experts mention that one of the main drawbacks of GPT-3 was its limited memory. The amount of earlier commands and context it could remember was small, which led the system to commit errors: GPT-3 often enters loops or loses the thread. In Codex the available memory is increased from 4KB to 14KB, more than three times that of GPT-3. The difference in quality is great.

But sometimes Codex just fails. That is more than a minor problem: since you do not know why it failed, and cannot peek into the neural network, you have to guess how to rephrase the command. It is early days, and I’m confident there will be improvements.

Can you see where I am going with these instructions?

Create a form where the user can enter 2-3 keywords.
Add a title: "The Ad Wizard".
Move it to the top.
Add this logo:
Make it larger and place it at the top right.
When submitted, send the keywords to GPT3 and ask for a creative ad text using the keyword. 
Use "qwert" as credentials.
Display the result in a box 300x300 with blue text and a thin, black border.

It works. See it live (edited only for length) in this video.

Who will benefit from Codex?

Codex can transform the process of programming. It lowers the barrier to entry for software development. But it can also assist seasoned programmers, because it is so fast. And the fact that Codex can explain what code does is a great help when you are handed someone else’s code.

Students can use Codex to learn syntax and get from idea to code fast.

Using Codex to update old code takes over a task that is neither creative nor challenging, just tedious.

Using Codex to translate business logic from one programming language to another is a real time and money saver.

When, by 2038 at the latest, we have to solve the 32-bit integer problem, I expect it can be done unsupervised.
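For readers who have not met the 2038 problem: systems that store time as a signed 32-bit count of seconds since January 1, 1970 run out of room in January 2038. The sketch below works through the arithmetic. The helper `wrap_int32` is a hypothetical stand-in that simulates the overflow, since Python's own integers do not wrap around.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold


def wrap_int32(value: int) -> int:
    """Simulate what a signed 32-bit integer stores after overflow."""
    return (value + 2**31) % 2**32 - 2**31


last_good = EPOCH + timedelta(seconds=INT32_MAX)
after_overflow = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))
print(last_good)       # 2038-01-19 03:14:07+00:00
print(after_overflow)  # 1901-12-13 20:45:52+00:00
```

One second after the last representable moment, the counter wraps to a date in 1901, which is why affected code has to be found and updated before then.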