Author Topic: GPT is better at generic code reviews.  (Read 4142 times)


Offline paulcaTopic starter

  • Super Contributor
  • ***
  • Posts: 4133
  • Country: gb
GPT is better at generic code reviews.
« on: February 16, 2024, 05:58:39 pm »
Previously I determined from interacting with ChatGPT...
Asking ChatGPT for generic boilerplate = win.
Asking ChatGPT to solve logic issues = fail.
Trying to interactively and iteratively develop code with its help = fail.

Tonight I tried it with just dumping code at it.

I gave it some pretty nasty code.  Without being asked, it just described what the code did, about 90% accurately, and it made multiple "generic" suggestions off the bat, which amounted to things I probably should have done but didn't, and which could lead to problems.

However it only almost hit on the reason I was asking for its help, so I specified the help I wanted.

I said the code is duplicated, copy-pasta hell, and I'd like to make it more concise, robust and readable.

It pretty much rewrote the sample I sent it.  It took me 10 minutes to verify that it was about 90% correct.  The 10% was just things it could not know, see or determine.

I gave it something a bit more tricky: a prototype high-availability MQTT client implementation.  It gave me a generic code review, which touched on an issue I was actually looking for help with: the "timer management" for heartbeats and timeouts.  Python seemed to force me to cancel the timers, then recreate them, then start them.  That just sounds like a big memory-churn issue to me, and it leaves me open to concurrency issues in the allocation of new timers as old ones are deleted.  It was able to detect that I didn't need to "recreate" the object each time: if I encapsulated my timer as a subclass of the timer class, I could implement a reset method with access to the internals, resetting the timer with a new interval without recreating it.
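For illustration, the idea of a resettable timer can be sketched roughly like this.  (A hypothetical sketch, not the code from the post: since Python's `threading.Timer` can't be restarted once it has fired, this version uses a long-lived thread plus a `threading.Event` that `reset()` pokes, which avoids the cancel-and-recreate churn described above.  All names are illustrative.)

```python
import threading

class ResettableTimer(threading.Thread):
    """One long-lived timer thread whose deadline can be pushed
    back repeatedly without allocating a new timer object."""

    def __init__(self, interval, callback):
        super().__init__(daemon=True)
        self.interval = interval
        self.callback = callback
        self._wake = threading.Event()   # set by reset() or cancel()
        self._cancelled = False

    def run(self):
        while True:
            # Wait out the interval; reset() wakes us early and we
            # start a fresh wait instead of firing the callback.
            woke_early = self._wake.wait(self.interval)
            if self._cancelled:
                return
            if woke_early:
                self._wake.clear()
                continue
            self.callback()              # deadline expired
            return

    def reset(self, interval=None):
        """Push the deadline back, optionally changing the interval."""
        if interval is not None:
            self.interval = interval
        self._wake.set()

    def cancel(self):
        self._cancelled = True
        self._wake.set()
```

A heartbeat watchdog built this way would call `reset()` on every message received, so the callback only fires if the bus goes quiet for longer than the interval.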

That said, when I pushed it on the bigger purpose of the client and on concurrency control to make a service a singleton on an asynchronous bus... it just spat out generic textbook answers.

It is still VERY useful.  Even if it were just a quick last check to dump your PR into GPT and have it "mummy" you with the generic stuff, it would still, IMHO, improve code quality, especially for juniors.

The concern, of course, is that many organisations, and an increasing number, now forbid the use of public AI with private code.  For very good reason.  ChatGPT now has a fair portion of my code, whether or not they record it.  It's fine: it's Apache licensed and public anyway, in this case.
"What could possibly go wrong?"
Current Open Projects:  STM32F411RE+ESP32+TFT for home IoT (NoT) projects.  Child's advent xmas countdown toy.  Digital audio routing board.
 

Offline golden_labels

  • Super Contributor
  • ***
  • Posts: 1287
  • Country: pl
Re: GPT is better at generic code reviews.
« Reply #1 on: February 17, 2024, 08:26:09 am »
Once more the algorithm behaves as expected.  And this is one of the valid uses of these solutions, at least as long as a human who can provide critical evaluation is reviewing the outputs.  In other words: a good way to quickly spot common mistakes without wasting time on reading the entire code manually.

You are asking the algorithm to generate the text which humans would likely say on seeing the subject code.  So it should be able to “notice” patterns any junior would spot and then produce the relevant text.  LLMs have been shown to be effective at giving hints in such cases.  But two things must not be forgotten:
  • These are hints.  It’s generated content that will often match human output, but it must from time to time produce wrong results, including absolute garbage like “variable `covfefe` is Donald Clinton in the loo == 5i”.  However convincing they may look, these are not produced by a thinking entity.
  • Accuracy is related to how often a given pattern appeared in the learning set, along with the relevant explanation.  LLMs may support the initial review and eliminate the most obvious errors, but their capabilities end at what they have seen.(1) Serious mistakes, in particular with complex logic, may easily slip through this kind of review.
Summing it up: save time on the trivial stuff, then donate that time to searching for the more serious mistakes.

Though I’m pretty sceptical about the numbers given.  To start with, what is the measure of accuracy?  What does 90% mean in this particular context?  Is it a subjective feeling?  If yes: that’s a misleading use of percentages.  Is it a proportion estimator?  If yes, what was the sample size, and what was the test pass/fail methodology?  I’m pretty picky about this: A.I. research is flooded with accuracy numbers which are either baseless or don’t represent what the reader might’ve assumed they do.


(1) With their “memories” not necessarily mapping to anything readily recognizable to a human brain.
People imagine AI as T1000. What we got so far is glorified T9.
 

