...

  • https://github.com/features/copilot - Continue passive investigation as part of my 60-day free trial while working on normal sprint tasks. Seems most useful as a sophisticated auto-complete to reduce copy-pasting.

  • https://openai.com/product/gpt-4 (ChatGPT) - Good at describing copy-pasted code. Might be useful for unit testing simple functions, but inadequate for anything complex. Inability to follow linked code limits its usefulness.

Investigation blocked

  • https://codeball.ai/ - Ran into issues setting up Codeball. Devs have not yet responded to requests for support.

  • https://www.codiga.io/ - Requires org-level GitHub permissions to install, even if I’m only installing on repositories that I have admin access over. Currently waiting on IT to respond to install request.

  • https://aws.amazon.com/codeguru/ - CodeGuru attempts to connect to the wrong fork of our repo. Blocked waiting for AWS Support to respond.

Not worth further investigation

  • https://www.diffblue.com/products/ - ChatGPT does a better job. However, DiffBlue has the advantage that it runs as an IDE plugin. Might be worth looking into again in 2025 as technology improves.

    • Update: GPT-4 was released recently and might be worth looking into.

...

Code Generation

OpenAI

OpenAI is best known for ChatGPT, but it also offers models that can be used for code generation (see https://platform.openai.com/docs/guides/code). Initial investigations were underwhelming. The better of the two code models (code-davinci-002) was frequently under load and unavailable, so I ran my investigation with the lesser of the two (code-cushman-001, described as “almost as capable as code-davinci-002, but slightly faster”).

...

Code Block
language: java
// Unit test with mocks.
@Test
public void testGetUploadViewForExporter3() {
    String appId = "appId";
    String studyId = "studyId";
    String uploadId = "uploadId";
    String healthCode = "healthCode";
    String userId = "userId";

    Upload upload = new Upload();
    upload.setId(uploadId);
    upload.setAppId(appId);
    upload.setHealthCode(healthCode);
    upload.setMetadata(new ObjectNode(JsonNodeFactory.instance));

    HealthDataRecordEx3 record = new HealthDataRecordEx3();
    record.setId(uploadId);
    record.setAppId(appId);
    record.setHealthCode(healthCode);
    record.setMetadata(new HashMap<>());

    UploadViewEx3 view = new UploadViewEx3();
    view.setId(uploadId);
    view.setHealthCode(healthCode);
    view.setRecord(record);
    view.setUpload(upload);
    view.setUserId(userId);

    UploadViewEx3 view2 = new UploadViewEx3();
    view2.setId(uploadId);
    view2.setHealthCode(healthCode);
    view2.setRecord(record);
    view2.setUpload(upload);
    view2.setUserId(userId);

    UploadViewEx3 view3 = new UploadViewEx3();
// repeat making UploadViewEx3's until we ran out of tokens

Conclusion: OpenAI isn’t ready for code generation, at least not as is.

GPT-4 / ChatGPT

GPT-4 is currently available only through ChatGPT at https://chat.openai.com/chat.

First, I copy-pasted the method at https://github.com/DwayneJengSage/BridgeServer2/blob/develop/src/main/java/org/sagebionetworks/bridge/services/UploadService.java#L522 into ChatGPT. ChatGPT responded with:

This method is a part of some application or service that is responsible for managing uploads of health data records, which are related to a study. The method fetches the view that includes both the upload and the record (if they exist) for a given upload ID, and optionally gets the timeline metadata and the adherence records, if they exist.

The method takes as input parameters the appId and uploadId, which are required, and the studyId, which is only required if we are fetching adherence. It also takes two boolean flags, fetchTimeline and fetchAdherence, which are used to determine whether to include the timeline metadata and the adherence records in the view, respectively.

The method first checks that the appId and uploadId are not null, and that the studyId is not blank if fetchAdherence is true. It then gets the upload and the record for the given uploadId, and checks that they are not null and that they belong to the same appId. If neither the upload nor the record exist, it throws an EntityNotFoundException.

The method then gets the healthCode from the upload or the record, and uses it to get the userId. It checks that the caller has permission to read the uploads for the given userId and studyId.

If fetchTimeline or fetchAdherence is true, the method gets the instanceGuid from the upload or the record metadata, and uses it to fetch the timeline metadata and/or the adherence records, if they exist. If the timeline metadata is associated with a different appId, the method logs an error and moves on. If the adherence records are associated with a different appId, it throws an EntityNotFoundException.

Finally, the method returns the UploadViewEx3 object that contains the upload, the record, the timeline metadata (if fetched), the adherence records (if fetched), the healthCode, and the userId.

This is an accurate summary of the code.

I then prompted ChatGPT: “Write unit tests for that method.” ChatGPT responded with:

Code Block
language: java
public class UploadViewEx3Test {

    private UploadViewEx3 uploadView;
    private String appId = "test-app-id";
    private String studyId = "test-study-id";
    private String uploadId = "test-upload-id";
    private boolean fetchTimeline = true;
    private boolean fetchAdherence = true;
    private UploadDao uploadDao;
    private HealthDataEx3Service healthDataEx3Service;
    private AccountService accountService;
    private Schedule2Service schedule2Service;
    private AdherenceService adherenceService;

    @Before
    public void setUp() {
        uploadDao = mock(UploadDao.class);
        healthDataEx3Service = mock(HealthDataEx3Service.class);
        accountService = mock(AccountService.class);
        schedule2Service = mock(Schedule2Service.class);
        adherenceService = mock(AdherenceService.class);

        uploadView = new UploadViewEx3();
        uploadView.setId(uploadId);
    }

    @Test(expected = NullPointerException.class)
    public void testGetUploadViewForExporter3WithNullAppId() {
        UploadViewEx3 uploadViewEx3 = new UploadViewEx3();
        uploadViewEx3.getUploadViewForExporter3(null, studyId, uploadId, fetchTimeline, fetchAdherence);
    }

    @Test(expected = NullPointerException.class)
    public void testGetUploadViewForExporter3WithNullUploadId() {
        UploadViewEx3 uploadViewEx3 = new UploadViewEx3();
        uploadViewEx3.getUploadViewForExporter3(appId, studyId, null, fetchTimeline, fetchAdherence);
    }

    @Test(expected = BadRequestException.class)
    public void testGetUploadViewForExporter3WithFetchAdherenceTrueAndBlankStudyId() {
        UploadViewEx3 uploadViewEx3 = new UploadViewEx3();
        uploadViewEx3.getUploadViewForExporter3(appId, "", uploadId, fetchTimeline, true);
    }

    @Test(expected = EntityNotFoundException.class)
    public void testGetUploadViewForExporter3WithNoUploadOrRecord() {
        when(uploadDao.getUploadNoThrow(uploadId)).thenReturn(null);
        when(healthDataEx3Service.getRecord(uploadId, false)).thenReturn(Optional.empty());

        uploadView.getUploadViewForExporter3(appId, studyId, uploadId, fetchTimeline, fetchAdherence);
    }

    @Test(expected = EntityNotFoundException.class)
    public void testGetUploadViewForExporter3WithUploadFromDifferentApp() {
        Upload upload = mock(Upload.class);
        when(upload.getAppId()).thenReturn("different-app-id");
        when(uploadDao.getUploadNoThrow(uploadId)).thenReturn(upload);

        uploadView.getUploadViewForExporter3(appId, studyId, uploadId, fetchTimeline, fetchAdherence);
    }

    @Test(expected = EntityNotFoundException.class)
    public void testGetUploadViewForExporter3WithRecordFromDifferentApp() {
        HealthDataRecordEx3 record = mock(HealthDataRecordEx3.class);
        when(record.getAppId()).thenReturn("different-app-id");
        when(healthDataEx3Service.getRecord(uploadId, false)).thenReturn(Optional.of(record));

        uploadView.getUploadViewForExporter3(appId, studyId, uploadId, fetchTimeline, fetchAdherence);
    }

    @Test
    public void testGetUploadViewForExporter3WithUploadOnly() {
        Upload upload = mock(Upload.class);
        when(upload.getAppId()).thenReturn(appId);
        when(upload.getHealthCode()).thenReturn("test-health-code");
        when(uploadDao.getUploadNoThrow(uploadId)).thenReturn(upload);

Note that the last test is incomplete. I prompted ChatGPT with “Please continue.” ChatGPT wrote one more unit test:

Code Block
language: java
@Test
public void testGetUploadViewForExporter3_RecordExistsButNoUpload() {
    String appId = "test-app";
    String studyId = "test-study";
    String uploadId = "test-upload";
    String healthCode = "test-health-code";
    String userId = "test-user-id";

    HealthDataRecordEx3 record = new DynamoHealthDataRecordEx3();
    record.setId(uploadId);
    record.setAppId(appId);
    record.setHealthCode(healthCode);

    when(uploadDao.getUploadNoThrow(uploadId)).thenReturn(null);
    when(healthDataEx3Service.getRecord(uploadId, false)).thenReturn(Optional.of(record));
    when(accountService.getAccountId(appId, "healthcode:" + healthCode)).thenReturn(Optional.of(userId));
    doNothing().when(CAN_READ_UPLOADS).checkAndThrow(studyId, userId);

    UploadViewEx3 result = service.getUploadViewForExporter3(appId, studyId, uploadId, false, false);

    assertNotNull(result);
    assertEquals(uploadId, result.getId());
    assertEquals(healthCode, result.getHealthCode());
    assertNull(result.getUpload());
    assertSame(record, result.getRecord());
    assertEquals(userId, result.getUserId());
    assertNull(result.getTimelineMetadata());
    assertNull(result.getAdherenceRecords());

    verify(uploadDao).getUploadNoThrow(uploadId);
    verify(healthDataEx3Service).getRecord(uploadId, false);
    verify(accountService).getAccountId(appId, "healthcode:" + healthCode);
    verify(CAN_READ_UPLOADS).checkAndThrow(studyId, userId);
}

It then added some descriptive text:

In this test, we simulate the case where a record exists but an upload does not. We create a HealthDataRecordEx3 object with the same ID as the upload ID and set its appId and healthCode properties. We then mock the UploadDao and HealthDataEx3Service to return null for the upload and the created record for the record. We also mock the AccountService to return the userId for the given appId and healthCode. We then invoke the method with fetchTimeline and fetchAdherence set to false, and assert that the returned UploadViewEx3 object has the expected properties. Finally, we verify that the appropriate methods were called on the mocked objects.

These two tests should cover the majority of the possible scenarios in the getUploadViewForExporter3 method. However, depending on the implementation details of the various services that the method depends on, there may be other edge cases that should be tested as well.

Attempts to get ChatGPT to regenerate testGetUploadViewForExporter3WithUploadOnly() only confused it, and I was unable to recover what ChatGPT originally intended (if it intended anything at all) with that test.

The tests that ChatGPT did write needed quite a bit of massaging, but they do compile and pass, and they are readable and logical. However, they reach only 51% line coverage and 38% branch coverage (even worse than DiffBlue), and they miss some important test cases.
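Some of the massaging needed is visible in the generated class itself: the mocks are created but never handed to anything, and getUploadViewForExporter3 is called on UploadViewEx3 rather than on the service. Below is a minimal sketch of the wiring a working test needs. It uses hypothetical stand-in types (UploadLookup, SketchUploadService) rather than the real BridgeServer2 classes, and a hand-written fake instead of Mockito so that it runs on its own.

```java
// Hypothetical stand-ins for illustration; the real UploadService has more
// dependencies and more checks than this sketch.
class WiringSketch {
    /** Stand-in for a DAO dependency: returns the owning app ID, or null if the upload is unknown. */
    interface UploadLookup {
        String findAppId(String uploadId);
    }

    /** Stand-in for the service under test, with its dependency injected through the constructor. */
    static class SketchUploadService {
        private final UploadLookup lookup;

        SketchUploadService(UploadLookup lookup) {
            this.lookup = lookup;
        }

        String getUploadView(String appId, String uploadId) {
            if (appId == null || uploadId == null) {
                throw new NullPointerException("appId and uploadId are required");
            }
            String owner = lookup.findAppId(uploadId);
            if (owner == null || !owner.equals(appId)) {
                // IllegalStateException stands in for the project's EntityNotFoundException.
                throw new IllegalStateException("upload not found: " + uploadId);
            }
            return "view:" + uploadId;
        }
    }

    /** Runs the call and reports the result, or the simple name of the exception it threw. */
    static String outcome(SketchUploadService service, String appId, String uploadId) {
        try {
            return service.getUploadView(appId, uploadId);
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // The fake is passed into the service, so the call under test actually uses it.
        SketchUploadService service =
                new SketchUploadService(id -> "upload-1".equals(id) ? "app-1" : null);
        System.out.println(outcome(service, "app-1", "upload-1"));
        System.out.println(outcome(service, "other-app", "upload-1"));
        System.out.println(outcome(service, null, "upload-1"));
    }
}
```

The point of the sketch is only the shape: the object under test is the service, and every fake or mock has to reach it through a constructor or setter before the call is made.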

I also tried the prompt: “Enumerate test cases for method getUploadViewForExporter3 in class UploadService at https://github.com/DwayneJengSage/BridgeServer2/blob/develop/src/main/java/org/sagebionetworks/bridge/services/UploadService.java”. ChatGPT replied with:

Here are some test cases for the getUploadViewForExporter3 method in the UploadService class:

1. Test case where the studyId parameter is null. Expect an exception to be thrown.

2. Test case where the studyId parameter is an empty string. Expect an exception to be thrown.

3. Test case where the healthCode parameter is null. Expect an exception to be thrown.

4. Test case where the healthCode parameter is an empty string. Expect an exception to be thrown.

5. Test case where the synapseUserId parameter is null. Expect an exception to be thrown.

6. Test case where the synapseUserId parameter is an empty string. Expect an exception to be thrown.

7. Test case where the numFiles parameter is negative. Expect an exception to be thrown.

8. Test case where the numBytes parameter is negative. Expect an exception to be thrown.

9. Test case where the numRecords parameter is negative. Expect an exception to be thrown.

10. Test case where the uploadId parameter is null. Expect an exception to be thrown.

11. Test case where the uploadId parameter is an empty string. Expect an exception to be thrown.

12. Test case where the uploadId parameter is not a valid UUID. Expect an exception to be thrown.

13. Test case where the numFiles, numBytes, and numRecords parameters are all greater than 0, and the uploadId parameter is a valid UUID. Expect an UploadView object to be returned with all the fields populated appropriately.

* testGetUploadViewForExporter3WithUploadOnlyAndNoData: test the case where the healthCode parameter is not null, studyId is null, startDate and endDate are null, and the exporter3 parameter is UploadValidationStrictness.UPLOAD_ONLY. This test should ensure that no records are returned in the response.

* testGetUploadViewForExporter3WithStrictValidationAndNoData: test the case where the healthCode parameter is not null, studyId is null, startDate and endDate are null, and the exporter3 parameter is UploadValidationStrictness.STRICT_VALIDATION. This test should ensure that no records are returned in the response.

* testGetUploadViewForExporter3WithUploadOnlyAndRecords: test the case where the healthCode parameter is not null, studyId is null, startDateand endDate are null, and the exporter3 parameter is UploadValidationStrictness.UPLOAD_ONLY. This test should ensure that records are returned in the response.

* testGetUploadViewForExporter3WithStrictValidationAndRecords: test the case where the healthCode parameter is not null, studyId is null, startDate and endDate are null, and the exporter3 parameter is UploadValidationStrictness.STRICT_VALIDATION. This test should ensure that records are returned in the response.

* testGetUploadViewForExporter3WithStudyAndNoData: test the case where the healthCode parameter is null, studyId is not null, startDate and endDate are null, and the exporter3 parameter is UploadValidationStrictness.UPLOAD_ONLY. This test should ensure that no records are returned in the response.

* testGetUploadViewForExporter3WithStudyAndRecords: test the case where the healthCode parameter is null, studyId is not null, startDate and endDate are null, and the exporter3 parameter is UploadValidationStrictness.UPLOAD_ONLY. This test should ensure that records are returned in the response.

* testGetUploadViewForExporter3WithDatesAndNoData: test the case where the healthCode parameter is null, studyId is null, startDate and endDate are not null, and the exporter3 parameter is UploadValidationStrictness.UPLOAD_ONLY. This test should ensure that no records are returned in the response.

* testGetUploadViewForExporter3WithDatesAndRecords: test the case where the healthCode parameter is null, studyId is null, startDate and endDate are not null, and the exporter3 parameter is UploadValidationStrictness.UPLOAD_ONLY. This test should ensure that records are returned in the response.

Note that the output refers to parameters not present in the linked code, which makes this list of test cases not particularly useful.
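For contrast, a list derived from the parameters that ChatGPT described correctly when given the pasted code (appId, studyId, uploadId, fetchTimeline, fetchAdherence) could be sketched as below. The case names are illustrative only. They are not the project's actual tests, and the list is not claimed to be complete.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: enumerates candidate test cases from the method's real
// parameters, as described in the summary earlier on this page.
class CaseEnumerationSketch {
    /** Returns the argument-check cases plus one case per data/flag combination. */
    static List<String> enumerateCases() {
        List<String> cases = new ArrayList<>();
        cases.add("null appId is rejected");
        cases.add("null uploadId is rejected");
        cases.add("blank studyId with fetchAdherence=true is rejected");
        cases.add("neither upload nor record exists");

        // Which of upload and record exist, crossed with both boolean flags.
        String[] sources = {"upload only", "record only", "upload and record"};
        boolean[] flags = {false, true};
        for (String source : sources) {
            for (boolean fetchTimeline : flags) {
                for (boolean fetchAdherence : flags) {
                    cases.add(source + ", fetchTimeline=" + fetchTimeline
                            + ", fetchAdherence=" + fetchAdherence);
                }
            }
        }
        return cases;
    }

    public static void main(String[] args) {
        for (String testCase : enumerateCases()) {
            System.out.println(testCase);
        }
    }
}
```

Even this small cross product yields 16 cases, which gives a sense of how much the two generated tests leave uncovered.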

Next, I tried the following prompt: “Summarize the pull request https://github.com/Sage-Bionetworks/BridgeServer2/pull/649 in a page of text, with the intended audience of other software developers.” ChatGPT responded with:

This pull request adds functionality to the BridgeServer2 project that allows a user to upload large files in chunks to the server. The goal of this feature is to reduce the risk of uploading a large file in a single request, which could cause issues with network timeouts, file size limitations, or server resource constraints.

The changes made in this pull request include modifications to the UploadService class, as well as the addition of a ChunkedFileHelper class that handles the chunking and reassembly of files. The UploadService now has a new method called "uploadChunk" that accepts a chunk of a file, and a separate method called "completeChunkedFileUpload" that reassembles the chunks into a single file once all the chunks have been uploaded.

Several new unit tests were added to test this new functionality, including tests to ensure that files are correctly split into chunks, that the correct error messages are returned when necessary, and that the reassembly of chunks produces the expected result.

Overall, this pull request adds a valuable feature to BridgeServer2 that improves the reliability and robustness of file uploads, especially for larger files.

This is completely wrong. Additional prompts asking for the title and description of the linked pull request confirm that ChatGPT is unable to follow links.
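Since only the text in the prompt itself reaches the model, the approach that worked earlier on this page was to paste the code in directly. A minimal sketch of assembling such a prompt follows. The buildPrompt helper and its layout are hypothetical and not part of any OpenAI tooling.

```java
// Hypothetical helper: puts the source text itself into the prompt instead of
// a link that the model cannot follow.
class PromptSketch {
    /** Joins the instruction and the trimmed source code, separated by a blank line. */
    static String buildPrompt(String instruction, String sourceCode) {
        return instruction + "\n\n" + sourceCode.trim() + "\n";
    }

    public static void main(String[] args) {
        String method = "public int add(int a, int b) {\n    return a + b;\n}";
        System.out.print(buildPrompt("Write unit tests for this method.", method));
    }
}
```

This keeps the request self-contained, at the cost of being limited to code that fits in a single prompt.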

Conclusion:

  • Better at generating unit tests than DiffBlue. Might be good for generating tests for simple code, but fails at anything complex.

  • Good at describing a copy-pasted piece of code. Unable to follow links (to either code or pull requests) and describe what they point to, let alone enumerate or write unit tests for it. This limits ChatGPT’s ability to work with large codebases.

GitHub Copilot

GitHub Copilot was similar to OpenAI in that both wrote repetitive code that didn’t compile and needed a lot of editing before it would compile and run. The one advantage Copilot has over OpenAI is that Copilot can run as an IDE plugin.

...

Conclusion: Copilot is worth continuing to passively investigate while working on normal sprint work. It seems okay for enumerating test cases, but it’s incomplete, and I wouldn’t yet trust it to generate code for me.

Update: Copilot is good as a more sophisticated auto-complete that can auto-complete entire blocks of code for you, which cuts down on repeated copy-pasting.

DiffBlue Cover

See https://www.diffblue.com/products/. DiffBlue Cover is a tool that analyzes code and generates unit tests. It is available as an IntelliJ plugin.

...

Also, the tests are almost unreadable. I wouldn’t accept this level of quality from a paid intern. I’m not even sure I would accept this level of quality for free.

Conclusion: ChatGPT does a better job. However, DiffBlue is more convenient to use as an IDE plugin. Might be worth investigating again in a year or two as technology improves.

Code Reviews

AI seems to have trouble generating code, but it might be able to help with code reviews.

...