Merrick Christensen's Avatar
I have been impressed with the urgency of doing. Knowing is not enough; we must apply. Being willing is not enough; we must do. Leonardo da Vinci

Stateful Semantic Diffing

2016-12-23

Update June 7, 2018 - Hall of Shame

This article is a Hall of Shamer™ for offering little to no value from the moment it was written.

I'm trying to infer a programmer's intention by observing semantic changes as they edit files. My first attempt was to parse the syntax trees to determine the changes made by the programmer. I first used a generic diff implementation but quickly realized I would need something more semantically aware in order to infer any serious meaning about the changes the programmer intended. I started reading about change detection algorithms, which meant looking up a lot of mathematical symbols I'm not accustomed to.

As always seems to be the case with "great ideas that nobody has done," I've run into a lot of unforeseen issues. For example:

const name = "Merrick";
console.log(name);

Say a programmer changes this variable name:

const me = "Merrick";
console.log(name);

The code assistant should note the change as it occurs, character by character. As we diff the two trees we might see the following events:

The cursor jumps to the end of name and the programmer hits backspace:

{ node_type: 'Identifier', type: 'change', name: 'nam' }

And another:

{ node_type: 'Identifier', type: 'change', name: 'na' }

And another:

{ node_type: 'Identifier', type: 'change', name: 'n' }

And one more:

{ node_type: 'Identifier', type: 'change', name: '' }

But wait, we can't have an empty identifier; that won't parse. So we need to wait until the code is parseable again. The developer types "m":

{ node_type: 'Identifier', type: 'change', name: 'm' }

And one last event:

{ node_type: 'Identifier', type: 'change', name: 'me' }
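The event stream above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the implementation I was working on: identifierChangeEvents is a hypothetical helper, each snapshot is just the declared identifier's name after a keystroke, and null stands in for "the file did not parse at this moment" so that no real parser is needed.

```javascript
// Hypothetical helper: turn per-keystroke snapshots of one identifier's name
// into change events. A null snapshot means the file failed to parse.
function identifierChangeEvents(snapshots) {
  const events = [];
  let previous = snapshots[0];
  for (const current of snapshots.slice(1)) {
    // Unparseable state (e.g. an empty identifier): wait for the next good parse.
    if (current === null) continue;
    if (current !== previous) {
      events.push({ node_type: 'Identifier', type: 'change', name: current });
      previous = current;
    }
  }
  return events;
}

// "name" is deleted character by character, then "me" is typed.
const keystrokeEvents = identifierChangeEvents(
  ['name', 'nam', 'na', 'n', null, 'm', 'me']
);
console.log(keystrokeEvents.map((event) => event.name).join(', '));
// prints "nam, na, n, m, me"
```

Skipping the null snapshot means the empty-identifier state never produces an event; the stream simply resumes at "m".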

OK, now the code assistant should suggest that you update the use of name found in console.log. This poses a really challenging issue: connecting "me" to "name". The variable was "name" a long time ago, so how do we know to suggest renaming name to me at this point? Do we need to persist the scope someplace so that we can adjust references as we receive changes? Then name references would be updated to nam, na, n, (parse failure), m, me. And do we suggest updating after every identifier change event? How do we know "name" is the good state for the reference? How do we avoid pointing at variables that sit in between name and me? That is, if there were also a variable "nam", how do we avoid accidentally pointing nam to me? I suppose I could avoid destructive suggestions by checking whether there is a corresponding VariableDeclarator for nam.
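One possible answer to these questions, sketched very roughly: remember the last name the references actually agreed with (the "stable" name) instead of rewriting references on every keystroke, and only suggest a rename for references that no longer resolve to any declaration. Everything here is an assumption made for illustration: suggestRename is a hypothetical function, and the scope is reduced to a Set of declared names plus a flat list of referenced names rather than a real scope analysis.

```javascript
// Hypothetical sketch: decide whether to suggest renaming references after a
// run of change events on one declaration.
//   stableName: the name the declaration had when its references last resolved
//   events:     change events for that declaration, oldest first
//   scope:      { declared: Set of declared names, references: array of names }
function suggestRename(stableName, events, scope) {
  const last = events[events.length - 1];
  if (!last || last.name === stableName) return null;
  // If the old name still has a declaration, its references belong to that
  // variable, so rewriting them would be destructive.
  if (scope.declared.has(stableName)) return null;
  // Only references still using the stable name are candidates. A reference
  // to "nam" is never touched, even though "nam" was an intermediate state.
  const dangling = scope.references.filter((ref) => ref === stableName);
  if (dangling.length === 0) return null;
  return { from: stableName, to: last.name, count: dangling.length };
}

const renameEvents = ['nam', 'na', 'n', 'm', 'me'].map((name) => ({
  node_type: 'Identifier',
  type: 'change',
  name,
}));

// After the edit, "me" and an unrelated "nam" are declared; "name" is not.
const scopeAfterEdit = {
  declared: new Set(['me', 'nam']),
  references: ['name', 'nam'],
};

const suggestion = suggestRename('name', renameEvents, scopeAfterEdit);
console.log(suggestion);
```

With this input the suggestion covers the single dangling reference to name and leaves the reference to nam alone, because nam has its own declaration. It still does not answer when a name counts as "stable"; here that is simply handed in by the caller.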

Conclusion for the day: My mind is tired. I anticipated this would be extremely difficult, but in my initial excitement I definitely believed it would be easier than this. AI would be a long-term goal; at this point I'm just trying to solve the problem of determining programmer intention using stateful change observation. I've been battling a lot of shame and self-confidence issues as I've faced friction. It's hard not to feel stupid, or that I should have gone to school. I can't help but feel inadequate when it takes me 2 hours to read 6 pages, trying to comprehend them and looking up mathematical symbols on Wikipedia.

Resources Used