Questions about Abstract Syntax Tree (AST)


I am having difficulty understanding the following:

1) AST matching: how are two ASTs similar? Are types included in the comparison/matching, or are operations such as +, -, ++, etc. included?

2) Two statements being syntactically similar (I read this term somewhere in a paper): can the two statements in the example below be called syntactically similar?

     int x = 1 + 2
     string y = "1" + "2"

I am using Java in Eclipse right now, and I am trying to understand what the AST is for.

best regards,

What ASTs are:

An AST is a data structure representing program source text. It consists of nodes, each of which contains a node type, possibly a literal value, and a list of children nodes. The node type corresponds to what the OP calls "operations" (+, -, ...), but it also includes language commands (do, if, assignment, call, ...), declarations (int, struct, ...), and literals (number, string, boolean). [It is unclear what the OP means by "type".] (ASTs often have additional information in each node referring to its point of origin in the source text file.)
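
To make the node description concrete, here is a minimal Python sketch of such a node and of the tree for `int x = 1 + 2`. The names `Node`, `to_sexpr`, `identifier`, and `int_literal` are assumptions made up for illustration; they are not Eclipse's AST classes, and a real AST would also carry source-position information.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    """One AST node: a node type, an optional literal value, and children."""
    node_type: str
    value: Optional[Any] = None
    children: List["Node"] = field(default_factory=list)


def to_sexpr(node: Node) -> str:
    """Render a tree as an S-expression such as (+ 1 2)."""
    # A node with a literal value prints the value; otherwise its type.
    head = node.node_type if node.value is None else str(node.value)
    if not node.children:
        return head
    return "(" + " ".join([head] + [to_sexpr(c) for c in node.children]) + ")"


# Hypothetical tree for the statement: int x = 1 + 2
tree = Node("declaration_with_assignment", children=[
    Node("int", children=[Node("identifier", "x")]),
    Node("+", children=[Node("int_literal", 1), Node("int_literal", 2)]),
])

print(to_sexpr(tree))
```

The tree has six nodes: the declaration, the `int` type node, the identifier, the `+` node, and the two literals.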

What ASTs are for:

The OP seems puzzled by the presence of ASTs in Eclipse.

ASTs are used to represent program text in a form that is easier to interpret than the raw text. They provide a means to reason about the program's structure or content; they are sometimes used to enable modification of the program ("refactoring") by modifying the AST of the program and then regenerating text from the AST.

Comparing ASTs for similarity is not a very common use in my experience, except in clone detection and/or pattern matching.

Comparing ASTs:

Comparing ASTs for equality is easy: compare the root nodes for type/literal value equality; if they are not equal, the comparison is complete; otherwise, (recursively) compare the children nodes.
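
The equality check can be sketched in a few lines of Python. The `Node` class and `trees_equal` are illustrative names, not part of any real AST library, and the sketch additionally assumes that trees with different numbers of children are unequal.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:
    """Hypothetical AST node: type, optional literal value, children."""
    node_type: str
    value: Optional[Any] = None
    children: List["Node"] = field(default_factory=list)


def trees_equal(a: Node, b: Node) -> bool:
    """Structural equality: compare the roots, then recurse on children."""
    # Different type or literal value: the comparison is complete.
    if a.node_type != b.node_type or a.value != b.value:
        return False
    # Assumption: a different child count also means "not equal".
    if len(a.children) != len(b.children):
        return False
    return all(trees_equal(x, y) for x, y in zip(a.children, b.children))


def plus(left: int, right: int) -> Node:
    """Build the tree for: left + right."""
    return Node("+", children=[Node("int_literal", left),
                               Node("int_literal", right)])


print(trees_equal(plus(1, 2), plus(1, 2)))  # same structure and literals
print(trees_equal(plus(1, 2), plus(1, 3)))  # differs in one leaf
```

The first call reports the trees as equal; the second stops at the differing literal and reports them as unequal.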

Comparing ASTs for similarity is harder, because you must decide how to relax the equality comparison. In particular, you must decide on a precise definition of similarity. There are many ways to define this, some rather shallow syntactically, some more semantically sophisticated.

My paper, Clone Detection Using Abstract Syntax Trees, describes one way to do this, using similarity defined as the ratio of the number of nodes shared divided by the number of nodes total in both trees. Shared nodes are computed by comparing the trees top down to the point where some child is different. (The actual comparison is to compute an anti-unifier.) That similarity measure is rather shallow, but it works remarkably well in finding code clones in big software systems.

From that perspective, the OP's examples:

     int x = 1 + 2
     string y = "1" + "2"

have these trees, written as S-expressions:

     (declaration_with_assignment (int x) (+ 1 2))
     (declaration_with_assignment (string y) (+ "1" "2"))

These are not very similar; they share only a root node of type "declaration_with_assignment" and the top of the + subtree. (Here the node count is 12, with 2 matching nodes, for a similarity of 2/12.)

These are more similar:

     int x = 1 + 2
     float x = 1.0 + 2

(S-expressions)

     (declaration_with_assignment (int x) (+ 1 2))
     (declaration_with_assignment (float x) (+ 1.0 2))

which share the declaration with assignment, the add node, the literal leaf node 2, and arguably the literal nodes for integer 1 and float 1.0, depending on whether you wish to define them as "equal" or not, for a similarity of 4/12.
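
The counts above can be reproduced with a small Python sketch. It follows the ratio as worded in this answer (shared nodes over total nodes in both trees) with a plain top-down walk; it is not the anti-unifier computation from the paper. All names (`Node`, `decl`, `lit`, `strict_same`, `numeric_same`, `similarity`) are hypothetical, and the tree shape assumes that `(int x)` is a type node with the identifier as its child.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Node:
    node_type: str
    value: Optional[Any] = None
    children: List["Node"] = field(default_factory=list)


def lit(v: Any) -> Node:
    """Literal leaf whose node type depends on the Python type of v."""
    kinds = {int: "int_literal", float: "float_literal", str: "string_literal"}
    return Node(kinds[type(v)], v)


def decl(type_name: str, var: str, left: Any, right: Any) -> Node:
    """Tree for: <type_name> <var> = <left> + <right>"""
    return Node("declaration_with_assignment", children=[
        Node(type_name, children=[Node("identifier", var)]),
        Node("+", children=[lit(left), lit(right)]),
    ])


def size(n: Node) -> int:
    return 1 + sum(size(c) for c in n.children)


def strict_same(a: Node, b: Node) -> bool:
    return a.node_type == b.node_type and a.value == b.value


def numeric_same(a: Node, b: Node) -> bool:
    """Relaxed rule: integer 1 and float 1.0 count as the same literal."""
    numeric = {"int_literal", "float_literal"}
    if a.node_type in numeric and b.node_type in numeric:
        return a.value == b.value
    return strict_same(a, b)


def shared(a: Node, b: Node, same: Callable[[Node, Node], bool]) -> int:
    """Count matching nodes top down, stopping where a pair differs."""
    if not same(a, b):
        return 0
    return 1 + sum(shared(x, y, same) for x, y in zip(a.children, b.children))


def similarity(a: Node, b: Node, same=strict_same) -> Tuple[int, int]:
    """Return (shared nodes, total nodes in both trees)."""
    return shared(a, b, same), size(a) + size(b)


a = decl("int", "x", 1, 2)
b = decl("string", "y", "1", "2")
c = decl("float", "x", 1.0, 2)
print(similarity(a, b))                # root and + match
print(similarity(a, c, numeric_same))  # also 2, and 1 vs 1.0
```

With the strict rule the second pair shares 3 of 12 nodes; treating 1 and 1.0 as equal gives the 4 of 12 quoted above. The identifier `x` is not counted because the walk stops at the differing `int`/`float` parent.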

If you change one of the trees to be a pattern tree, in which some "leaves" are pattern variables, you can then use such pattern trees to find code that has a certain structure.

The surface syntax pattern:

  ?type ?variable = 1 + ?expression 

with the S-expression

  (declaration_with_assignment (?type ?variable) (+ 1 ?expression))

matches the first of the OP's examples but not the second.
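
A pattern match of this kind can be sketched in Python by writing the S-expressions as nested tuples. The `match` function and the tuple encoding are assumptions for illustration only (string literals keep their quotes so they stay distinct from identifiers); this is not how DMS or Eclipse represent patterns.

```python
from typing import Any, Dict, Optional


def is_var(p: Any) -> bool:
    """Pattern variables are strings that start with '?'."""
    return isinstance(p, str) and p.startswith("?")


def match(pattern: Any, tree: Any,
          bindings: Optional[Dict[str, Any]] = None) -> Optional[Dict[str, Any]]:
    """Return variable bindings if the pattern matches the tree, else None."""
    if bindings is None:
        bindings = {}
    if is_var(pattern):
        # A variable seen before must bind to the same subtree again.
        if pattern in bindings:
            return bindings if bindings[pattern] == tree else None
        return {**bindings, pattern: tree}
    if isinstance(pattern, tuple):
        if not isinstance(tree, tuple) or len(pattern) != len(tree):
            return None
        for p, t in zip(pattern, tree):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    # Atoms must have the same Python type and value (so 1 is not "1").
    same = type(pattern) is type(tree) and pattern == tree
    return bindings if same else None


pattern = ("declaration_with_assignment",
           ("?type", "?variable"), ("+", 1, "?expression"))
first = ("declaration_with_assignment", ("int", "x"), ("+", 1, 2))
second = ("declaration_with_assignment", ("string", "y"), ("+", '"1"', '"2"'))

print(match(pattern, first))   # bindings for ?type, ?variable, ?expression
print(match(pattern, second))  # no match: the literal 1 is not "1"
```

A successful match with no variables returns an empty dictionary, so callers should test the result with `is not None` rather than by truthiness.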

As far as I know, Eclipse doesn't offer any pattern-based matching abilities. But these are very useful in program analysis and/or program transformation tools. For some specific examples, too long to include here, see DMS Rewrite Rules.

(Full disclosure: DMS is a product of my company. I'm the architect.)

