👾 Tips for using Copilot based on the original Codex article

Copilot is a code autocompletion system developed by GitHub and based on OpenAI's Codex model. It is available as an extension for IDEs (Integrated Development Environments), and the underlying Codex model was also offered through the OpenAI API. I use Copilot a lot, and it saves me a significant amount of time: I rarely need to search for code snippets online anymore. In this article, I share the tips I have gathered while working with Copilot, ranging from the obvious to the more nuanced.

TL;DR: Copilot is useful for repetitive tasks and can replace searching through the documentation of popular libraries. It is much less effective at solving complex tasks or building projects from scratch.

Article structure:

  1. Installing and paying for Copilot
  2. How Codex works
  3. Hotkeys
  4. Ways to use Copilot
  5. Tips for using Copilot
  6. Limitations of Codex and Copilot, and how to work around them
  7. Alternatives to Copilot
  8. Conclusions

Installing and paying for Copilot

To start using Copilot, sign in on the Copilot page with your GitHub account and add a payment method. Note that Russian bank cards are currently not accepted. If you don't have a foreign card, you can try Pyypl, a virtual card that can be topped up with USDT. I don't personally trust Pyypl, but it is handy whenever a service requires you to link a card: I usually keep $5 on it to sign up for product trials.

If you run into payment problems and your account gets blocked, contact GitHub Support and give them the details of an alternative card. I had to do this myself when I first got a foreign card.

Copilot offers a two-month free trial, after which it costs $10 per month. If you use Python in VS Code or Go in GoLand, just install the Copilot extension for your IDE to get started.

How Codex works

This section is based on the paper "Evaluating Large Language Models Trained on Code" (Chen et al., 2021), which introduced Codex.

The technology behind Copilot is Codex, a version of GPT-3 fine-tuned on code. Codex is designed to generate a function's code from its docstring, the string that describes what the function does. Given a function name, its arguments, and its description, the model predicts the continuation of that text, which is the function body. Codex was trained on a large dataset of open-source code from GitHub, with a focus on Python.
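To make this concrete, here is a hypothetical prompt in that format (the function and its docstring are invented for illustration, not taken from the paper). The model is given everything up to the end of the docstring, and the one-line body is the kind of continuation it is expected to produce.

```python
def count_vowels(s: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in s, ignoring case.

    >>> count_vowels("Codex")
    2
    >>> count_vowels("")
    0
    """
    # Everything above this line is the prompt; the body below is the completion.
    return sum(1 for ch in s.lower() if ch in "aeiou")
```

The examples in the docstring double as doctests, so running `python -m doctest your_file.py` checks the completion against them.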

Match-based metrics such as BLEU do not reliably tell whether generated code actually works, so the authors of Codex designed their own evaluation based on functional correctness. They wrote a set of Python programming tasks, sampled k solutions from the model for each task, and ran every solution against the task's unit tests. A task counts as solved if at least one of the k solutions passes all of its tests; this metric is called pass@k. The benchmark, named HumanEval, consists of 164 hand-written tasks.
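In practice, the paper draws n ≥ k samples per task, counts the c samples that pass, and estimates pass@k without bias as 1 - C(n-c, k) / C(n, k): the probability that a random subset of k samples contains at least one correct solution. The sketch below implements that estimator plus a toy test harness. The candidate functions are hypothetical stand-ins for model samples, and a real harness would also sandbox and time-limit the generated code.

```python
from math import comb
from typing import Callable, Sequence, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which pass all tests."""
    if n - c < k:
        # Any k samples drawn from n must include at least one correct one.
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def count_correct(
    candidates: Sequence[Callable[..., object]],
    tests: Sequence[Tuple[tuple, object]],
) -> int:
    """Count candidates that return the expected value for every test case."""
    correct = 0
    for fn in candidates:
        try:
            if all(fn(*args) == expected for args, expected in tests):
                correct += 1
        except Exception:
            pass  # a crashing sample counts as a failure
    return correct


# Hypothetical samples for the task "return the sum of a and b".
candidates = [
    lambda a, b: a + b,  # correct
    lambda a, b: a - b,  # wrong
    lambda a, b: a * b,  # wrong
    lambda a, b: b + a,  # correct
]
tests = [((1, 2), 3), ((2, 2), 4), ((-1, 5), 4)]

c = count_correct(candidates, tests)
print(c)                                   # 2
print(pass_at_k(len(candidates), c, k=1))  # 0.5
```

Using n > k samples and this estimator gives a lower-variance result than literally generating exactly k samples per task.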

About hotkeys

  • Tab accepts a suggestion; Esc dismisses it.
  • Alt + ] (Option + ] on macOS) shows the next suggestion, and Alt + [ (Option + [) shows the previous one.
  • If you've disabled automatic suggestions, Alt + \ (Option + \ on macOS) triggers one manually.
  • Ctrl + Enter opens a panel with up to 10 suggestions.

I didn't know about Ctrl + Enter for a long time, and it turned out to be one of the most useful shortcuts: seeing several alternatives at once often gets you to the right solution faster.

Ways to use Copilot

Converting a comment to code. Write a comment describing what a function does, then let Copilot generate the implementation. Start with small functions and utilities, and then use them to build the main logic.

Let's look at an example, the LeetCode problem Two Sum:

Given an array of integers nums and an integer target, return the indices of the two numbers that add up to target.

[Screenshot: the highlighted lines are the code generated by Copilot]

Copilot's first suggestion is a brute-force solution with O(n^2) time complexity. However, by stating the desired properties in the function name and comments, we can get it to suggest a faster O(n) solution for some functions.
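The screenshot isn't reproduced here, so below is a hand-written sketch of the two kinds of solution, not Copilot's actual output. The first function has a plain name and description, the kind of prompt that tends to produce the brute-force loop; the second uses the "fast_" prefix and states the complexity, the kind of prompt that steers towards a hash-map solution.

```python
from typing import Dict, List


def two_sum(nums: List[int], target: int) -> List[int]:
    """Return indices of the two numbers in nums that add up to target."""
    # Brute force, O(n^2): check every pair of positions.
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return [i, j]
    return []


def fast_two_sum(nums: List[int], target: int) -> List[int]:
    """Faster version of two_sum: O(n) time using a hash map."""
    seen: Dict[int, int] = {}  # value -> index where it was seen
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []


print(two_sum([2, 7, 11, 15], 9))       # [0, 1]
print(fast_two_sum([2, 7, 11, 15], 9))  # [0, 1]
```

The hash map trades O(n) extra memory for a single pass: each number is checked against the values already seen instead of against every later element.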

Replacing documentation search. For popular libraries and APIs, Copilot can save you a trip to the documentation: when you're working with an unfamiliar library or can't remember a method name, Copilot will suggest one. Always double-check the suggestion, though. One way is to read the method's docstring, which your IDE shows once the method name has been autocompleted.
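Outside the IDE, the same check can be done from Python itself. A minimal sketch, assuming Copilot has suggested `collections.Counter.most_common` for a "most frequent words" task: `hasattr` confirms the method exists, and `inspect.getdoc` prints its docstring (in a REPL, `help()` does the same).

```python
import collections
import inspect

# Suppose Copilot suggested Counter.most_common for "find the 2 most frequent words".
# First, confirm the method really exists on the class.
print(hasattr(collections.Counter, "most_common"))  # True

# Then read its docstring to check what it returns.
print(inspect.getdoc(collections.Counter.most_common))

words = "a b a c b a".split()
top = collections.Counter(words).most_common(2)
print(top)  # [('a', 3), ('b', 2)]
```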

Repetitive code generation. Copilot can generate many similar lines of code, such as data class definitions or column names turned into constants.
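For example, after you type the first constant or field, Copilot usually suggests the rest by analogy. The column names and the `Interaction` class below are hypothetical; the point is the repetitive shape that Copilot tends to complete well.

```python
from dataclasses import dataclass

# Column names as constants: after the first line, the rest follow the same pattern.
COL_USER_ID = "user_id"
COL_ITEM_ID = "item_id"
COL_RATING = "rating"
COL_TIMESTAMP = "timestamp"


@dataclass
class Interaction:
    """One row of a hypothetical user-item interactions table."""

    user_id: int
    item_id: int
    rating: float
    timestamp: int

    @classmethod
    def from_row(cls, row: dict) -> "Interaction":
        # Build an instance from a dict keyed by the column constants above.
        return cls(
            user_id=int(row[COL_USER_ID]),
            item_id=int(row[COL_ITEM_ID]),
            rating=float(row[COL_RATING]),
            timestamp=int(row[COL_TIMESTAMP]),
        )


row = {"user_id": "1", "item_id": "42", "rating": "4.5", "timestamp": "1700000000"}
print(Interaction.from_row(row))
# Interaction(user_id=1, item_id=42, rating=4.5, timestamp=1700000000)
```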

Writing tests. Copilot can also generate unit tests, for example for the functions we wrote above.
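A sketch of what such tests might look like, in pytest style (run with `pytest your_file.py`). To keep the block self-contained, it includes a compact hash-map version of Two Sum; the test cases are our own examples, not generated output.

```python
from typing import Dict, List


def two_sum(nums: List[int], target: int) -> List[int]:
    """Return indices of the two numbers in nums that add up to target."""
    seen: Dict[int, int] = {}  # value -> index
    for i, num in enumerate(nums):
        if target - num in seen:
            return [seen[target - num], i]
        seen[num] = i
    return []


def test_basic_pair():
    assert two_sum([2, 7, 11, 15], 9) == [0, 1]


def test_pair_not_at_start():
    assert two_sum([3, 2, 4], 6) == [1, 2]


def test_duplicate_values():
    assert two_sum([3, 3], 6) == [0, 1]


def test_negative_numbers():
    assert two_sum([-1, -2, -3, -4, -5], -8) == [2, 4]


def test_no_solution():
    assert two_sum([1, 2, 3], 100) == []
```

Edge cases such as duplicates, negatives, and a missing answer are worth adding by hand if Copilot's first batch of tests only covers the happy path.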

Translating code between languages. Porting code from one language to another can be tedious, and Copilot makes it easier. I write code in both Python and Go, and sometimes need to port data preprocessing or inference of simple models from Python to Go. Copilot speeds this up: I paste the Python code into comments, and it generates a Go draft that I then refine.

This helps most where Go lacks features built into Python. For example, Go has no built-in set type, so operations like set union and intersection have to be written by hand, and Copilot drafts them quickly.
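To illustrate the gap, the sketch below contrasts Python's built-in set operators with loop-based versions structured the way a Go port has to be (Go code usually emulates a set with a map such as map[string]struct{}). Both versions are written in Python for comparison; the function names are hypothetical, and the actual Go draft is what Copilot would generate from comments like these.

```python
from typing import List


def intersect_builtin(a: List[str], b: List[str]) -> List[str]:
    # Python: one operator on built-in sets.
    return sorted(set(a) & set(b))


def union_builtin(a: List[str], b: List[str]) -> List[str]:
    return sorted(set(a) | set(b))


def intersect_explicit(a: List[str], b: List[str]) -> List[str]:
    # Go-shaped version: a dict with dummy values plays the role of a map-based set.
    in_a = {item: None for item in a}
    out = {}
    for item in b:
        if item in in_a:
            out[item] = None
    return sorted(out)


def union_explicit(a: List[str], b: List[str]) -> List[str]:
    out = {}
    for item in a + b:
        out[item] = None  # duplicate keys collapse, as in a map-based set
    return sorted(out)


a, b = ["x", "y", "z"], ["y", "z", "w", "y"]
print(intersect_builtin(a, b), intersect_explicit(a, b))  # ['y', 'z'] ['y', 'z']
print(union_builtin(a, b), union_explicit(a, b))  # ['w', 'x', 'y', 'z'] ['w', 'x', 'y', 'z']
```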

Tips for using Copilot

  1. Break your task down into functions that each perform one or two semantic actions, and generate each function with Copilot.
  2. Use readable function names and descriptions, and make sure any examples you provide are free of bugs.
  3. Test and inspect the generated code.
  4. Use the function name as a modifier. For instance, if you want a faster version of a function, add "fast_" to its name and state in a comment that this version is faster than the original. Similarly, adding "typed_" to the name prompts Copilot to annotate the types of the arguments and the return value.
  5. For complicated functions, write out the steps in comments, reasoning step by step. This breaks the task into smaller pieces that are easier for Copilot (and for you) to follow.
  6. Copilot works better in a monorepo, since more of your related code is available as context. According to OpenAI's Codex announcement, Codex has a memory of 14 KB for Python code, compared with 4 KB for GPT-3, which helps it generate better code.
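Tips 4 and 5 combined might look like the sketch below: a hypothetical "typed_" function whose docstring states the goal and whose comments spell out the steps one at a time, so that each comment maps to a line or two of code that Copilot can fill in.

```python
from __future__ import annotations

import string

STOP_WORDS = frozenset({"a", "an", "the"})


def typed_tokenize(text: str, stop_words: frozenset[str] = STOP_WORDS) -> list[str]:
    """Split text into lowercase words without punctuation or stop words."""
    # Step 1: convert the text to lowercase.
    text = text.lower()
    # Step 2: remove punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Step 3: split on whitespace.
    words = text.split()
    # Step 4: drop stop words.
    return [word for word in words if word not in stop_words]


print(typed_tokenize("The cat, the HAT!"))  # ['cat', 'hat']
```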

Limitations of Codex and Copilot, and how to work around them

The authors of the paper note that Codex struggles to chain actions: each additional action in a description reduces the share of correct solutions by roughly a factor of 2 to 3. Here an "action" is a single operation such as "remove all instances of the letter e from the string" or "convert the string s to lowercase".

Codex also makes mistakes when a description involves several variables and different operations on them, for example: "Add 3 to y, then subtract 4 from both x and w. Return the product of the four numbers."
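One way to work around both problems, in line with the tips above, is to give each action its own function or its own commented step, and to name the variables explicitly. A sketch using the two examples from this section; the function names are hypothetical.

```python
def remove_letter_e(s: str) -> str:
    """Remove all instances of the letter e from the string."""
    return s.replace("e", "")


def to_lowercase(s: str) -> str:
    """Convert the string s to lowercase."""
    return s.lower()


def remove_e_then_lowercase(s: str) -> str:
    # Two chained actions, each delegated to a single-action helper.
    return to_lowercase(remove_letter_e(s))


def product_after_updates(x: int, y: int, z: int, w: int) -> int:
    """Add 3 to y, then subtract 4 from both x and w. Return the product of the four numbers."""
    # Step 1: add 3 to y.
    y += 3
    # Step 2: subtract 4 from both x and w.
    x -= 4
    w -= 4
    # Step 3: multiply all four numbers.
    return x * y * z * w


print(remove_e_then_lowercase("Hello There"))  # hllo thr
print(product_after_updates(5, 1, 2, 6))       # 16
```

Note that the order of the chained actions matters: because the letter is removed before lowercasing, an uppercase "E" survives and becomes "e". This is exactly the kind of detail worth checking in generated code.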

Common mistakes in Copilot-generated code include:

  • Missing variables
  • Calls to methods and variables that are undefined or out of scope
  • Syntactically incorrect code

To reduce these mistakes, follow the tips above: keep each function small and clearly described, and test what Copilot generates.
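Part of that testing can be automated. The sketch below is a crude check for two of the mistakes listed above: it reports syntax errors via `ast.parse` and names that are used but never defined anywhere in the snippet. It ignores scoping rules, so treat it as an illustration; a real linter such as pyflakes handles this properly.

```python
import ast
import builtins
from typing import List


def find_problems(source: str) -> List[str]:
    """Report syntax errors and undefined names in a generated Python snippet."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]

    defined = set(dir(builtins))
    used = []
    # Crude: collects definitions from the whole snippet, ignoring scopes.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.ExceptHandler) and node.name:
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.append(node)

    problems = [
        f"undefined name '{node.id}' on line {node.lineno}"
        for node in used
        if node.id not in defined
    ]
    return list(dict.fromkeys(problems))  # drop duplicates, keep order


snippet = "def area(r):\n    return pi * r ** 2\n"
print(find_problems(snippet))  # ["undefined name 'pi' on line 2"]
print(find_problems("def f(:\n    pass\n"))
```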

There is also a problem known as alignment failure. Codex is trained to predict the next token from the previous ones, following the statistics of its training data. So if the context contains a small error, such as an incorrect example, Codex may generate code that is wrong but consistent with that context, even though it is capable of generating the correct solution. In other words, Codex does not correct errors; it reproduces them. ChatGPT may well have the same issue.

Alternatives to Copilot

  • I tried the free version of TabNine, but it got in the way more than it helped.
  • If you don't want third-party services to have access to your code, you can host FauxPilot, an open-source Copilot alternative, on your own servers. However, it runs smaller open models (Salesforce CodeGen, converted to the GPT-J format), which are noticeably weaker than Codex.
  • Instead of generating code, you can search for it the old-fashioned way, for example with Phind, a search engine that finds code snippets.
  • I also use ChatGPT in some cases, usually for writing large ClickHouse queries. There are various community-made extensions that embed ChatGPT into VS Code, but I haven't used them; if you have, share your experience in the comments.

Conclusions

  1. Copilot accelerates development, especially if you know the basic principles of using Codex.
  2. Break tasks into atomic actions and generate the code for each one from a comment.
  3. Copilot helps you spend less time on repetitive actions and documentation searches.
  4. Code generated by Copilot still needs to be double-checked.