Question on Entity Extraction from Raw Outputs of Agent #20

@enochii

Hi, thanks for your excellent work! I encountered an issue described below.

LocAgent uses a function called get_module_from_line_number to extract function entities, which are ultimately added to found_entities in location/loc_outputs.jsonl. The function is shown below.

```python
def get_module_from_line_number(line, file_path, searcher):
    assert file_path in searcher.G.nodes
    file_node = searcher.get_node_data([file_path])[0]
    print(f'file_path:line -> {file_path}:{line}')
    print(f'file_node -> {file_node}')
    cur_start_line = file_node['start_line']
    cur_end_line = file_node['end_line']
    cur_node = None

    for nid in searcher.G.nodes():
        # if not nid.startswith(file_path) or ':' not in nid:
        #     continue
        node = searcher.G.nodes[nid]
        if node['type'] != NODE_TYPE_FUNCTION: continue

        # to do: strict matching
        # if file_node['node_id'] not in nid: continue

        if 'start_line' in node and 'end_line' in node:
            if node['start_line'] < cur_start_line or node['end_line'] > cur_end_line:
                continue
            if line >= node['start_line'] and line <= node['end_line']:
                cur_node = node
                cur_node['name'] = nid
                cur_start_line = node['start_line']
                cur_end_line = node['end_line']
    if cur_node:
        print(f'cur_node -> {cur_node}')
        return (cur_node, cur_end_line)
    return (None, None)
```

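From my understanding, this function should return the function that contains line `line` of file `file_path`. If that is the case, the implementation is not entirely correct: the loop visits every function node in the whole graph and filters candidates only by line range, never by file, so a function defined in another file can be returned whenever its line span happens to fall inside the current search range. Since found_entities is used to compute the evaluation metrics, the calculated results may be inaccurate.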
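A minimal, self-contained reproduction of this behavior is sketched below. It does not use LocAgent's real classes: ToySearcher is a hypothetical plain-dict stand-in for searcher (it exposes nodes instead of G.nodes), and the node IDs and line numbers are made up. The hypothetical find_enclosing_function_unfiltered keeps only the matching rule of get_module_from_line_number, so, as in the original, the result depends on node iteration order.

```python
# Hypothetical reproduction: ToySearcher, node IDs and line numbers are made up.
NODE_TYPE_FUNCTION = "function"


class ToySearcher:
    """Plain-dict stand-in for LocAgent's searcher (uses .nodes, not .G.nodes)."""

    def __init__(self, nodes):
        self.nodes = nodes  # {node_id: attribute dict}

    def get_node_data(self, nids):
        return [dict(self.nodes[nid], node_id=nid) for nid in nids]


def find_enclosing_function_unfiltered(line, file_path, searcher):
    """Same matching rule as get_module_from_line_number: line ranges only."""
    file_node = searcher.get_node_data([file_path])[0]
    cur_start, cur_end = file_node["start_line"], file_node["end_line"]
    best_id = None
    for nid, node in searcher.nodes.items():
        if node["type"] != NODE_TYPE_FUNCTION:
            continue
        # Candidate must be nested inside the current best range...
        if node["start_line"] < cur_start or node["end_line"] > cur_end:
            continue
        # ...and contain the target line. Nothing checks which file it is in.
        if node["start_line"] <= line <= node["end_line"]:
            best_id, cur_start, cur_end = nid, node["start_line"], node["end_line"]
    return best_id


searcher = ToySearcher({
    "a.py": {"type": "file", "start_line": 1, "end_line": 100},
    "a.py:foo": {"type": "function", "start_line": 10, "end_line": 40},
    "b.py": {"type": "file", "start_line": 1, "end_line": 200},
    "b.py:bar": {"type": "function", "start_line": 20, "end_line": 30},
})
# Line 25 of a.py lies in a.py:foo, but b.py:bar (lines 20-30) nests inside
# foo's range, so it replaces foo as the "best" match.
print(find_enclosing_function_unfiltered(25, "a.py", searcher))  # b.py:bar
```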

Using the print statements I added (see the code above), I got the output shown in the figure below, in which the file path of the found entity (cur_node) does not match file_path.

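[Figure: output of the added print statements, showing a cur_node whose file path differs from file_path]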

Thanks for your time. If my analysis is correct, I am happy to submit a short PR (re-enabling one of the commented-out file checks in the code above) to address this issue.
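For reference, a fix along these lines might look like the sketch below. It is a hypothetical sketch, not a patch against LocAgent: ToySearcher is a plain-dict stand-in for the real searcher (so it reads searcher.nodes instead of searcher.G.nodes), and it assumes function node IDs have the form `<file_path>:<qualified name>`, which the commented-out `nid.startswith(file_path)` check suggests but which I have not confirmed. Besides filtering by file, it returns a copy of the node instead of writing `name` into the shared graph attributes.

```python
# Hypothetical sketch; ToySearcher and the "<file_path>:<name>" ID format are assumptions.
NODE_TYPE_FUNCTION = "function"


class ToySearcher:
    """Plain-dict stand-in for LocAgent's searcher (uses .nodes, not .G.nodes)."""

    def __init__(self, nodes):
        self.nodes = nodes  # {node_id: attribute dict}

    def get_node_data(self, nids):
        return [dict(self.nodes[nid], node_id=nid) for nid in nids]


def find_enclosing_function(line, file_path, searcher):
    """Return (node attrs plus 'name', end_line) for the innermost function of
    file_path that contains `line`, or (None, None) if there is none."""
    assert file_path in searcher.nodes
    file_node = searcher.get_node_data([file_path])[0]
    cur_start, cur_end = file_node["start_line"], file_node["end_line"]
    # ASSUMPTION: function node IDs look like "<file_path>:<qualified name>".
    prefix = file_path + ":"
    best_id = None
    for nid, node in searcher.nodes.items():
        # The missing check: skip non-functions and functions from other files.
        if node["type"] != NODE_TYPE_FUNCTION or not nid.startswith(prefix):
            continue
        if "start_line" not in node or "end_line" not in node:
            continue
        # Keep narrowing to candidates nested inside the current best range.
        if node["start_line"] < cur_start or node["end_line"] > cur_end:
            continue
        if node["start_line"] <= line <= node["end_line"]:
            best_id, cur_start, cur_end = nid, node["start_line"], node["end_line"]
    if best_id is None:
        return (None, None)
    # Return a copy rather than writing "name" into the shared graph attributes.
    return (dict(searcher.nodes[best_id], name=best_id), cur_end)


searcher = ToySearcher({
    "a.py": {"type": "file", "start_line": 1, "end_line": 100},
    "a.py:foo": {"type": "function", "start_line": 10, "end_line": 40},
    "a.py:foo.helper": {"type": "function", "start_line": 22, "end_line": 28},
    "b.py": {"type": "file", "start_line": 1, "end_line": 200},
    "b.py:bar": {"type": "function", "start_line": 20, "end_line": 30},
    "data.py": {"type": "file", "start_line": 1, "end_line": 50},
    "data.py:load": {"type": "function", "start_line": 24, "end_line": 26},
})
node, end_line = find_enclosing_function(25, "a.py", searcher)
print(node["name"], end_line)  # a.py:foo.helper 28
```

A prefix test on `file_path + ":"` is deliberately stricter than a substring test: under the assumed ID format, a check like `file_node['node_id'] not in nid` would still let `data.py:load` through when file_path is `a.py`, because `a.py` is a substring of `data.py`.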