Research topic
Report
Detailed summary
Several papers discuss using generative AI to build agent-like pipelines that manage entire codebases, and together they cover key aspects such as architecture design, training methodologies, real-world applications, and performance metrics.
For instance, AutoCoder [1] combines agent interaction with execution-verified tuning and posts notable results on the HumanEval benchmark. CodeAgent [3] details an LLM-based framework for repository-level code generation with improved performance on real-world tasks. AutoDev [5] presents a comprehensive AI framework for automated software development, covering autonomous codebase-management tasks. Collectively, these papers chart the progression toward generative coding assistants capable of managing complete software development processes.
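The execution-verified tuning idea attributed to AutoCoder [1] can be illustrated in miniature: generate candidate solutions, run each against its unit tests in a subprocess, and keep only the passing samples as fine-tuning data. The function names below are illustrative, not taken from the paper.

```python
import os
import subprocess
import sys
import tempfile

def passes_tests(candidate_code: str, test_code: str, timeout: float = 10.0) -> bool:
    """Run a candidate solution against its unit tests in a fresh
    subprocess; True only if the tests exit cleanly."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n\n" + test_code)
        path = f.name
    try:
        result = subprocess.run(
            [sys.executable, path], capture_output=True, timeout=timeout
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.remove(path)

def build_tuning_set(samples):
    """samples: iterable of (candidate_code, test_code) pairs, e.g. model
    outputs. Returns only the execution-verified pairs."""
    return [(code, tests) for code, tests in samples if passes_tests(code, tests)]
```

A real pipeline would add sandboxing and resource limits around the subprocess call; this sketch only shows the filtering step.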
Categories of papers
The most important categories are those that directly address generative AI for agent-like pipelines and coding assistants managing entire codebases, followed by categories covering architecture design, training methodologies, real-world applications, and performance metrics. This structure makes it easy to identify research that meets the criteria for complex, real-world coding assistant systems.
Generative AI for Agent-Like Pipelines/Coding Assistants Managing Entire Codebases
Description: Studies focused on utilizing generative AI to create agent systems capable of managing entire software development processes, including varied coding tasks across a codebase.
References: [1, 3, 5, 6, 7, 9, 11, 16, 21, 36]
Architectural Design of Generative AI Systems
Description: Papers detailing the architectural frameworks and system designs for implementing generative AI in coding assistants and agent-like pipelines.
References: [5, 6, 11, 14, 26, 30, 56]
Training Methodologies for Generative AI in Software Development
Description: Research focused on training methods for AI models used in code generation, including self-supervised learning, few-shot learning, and execution-verified tuning.
References: [1, 6, 8, 13, 19, 34, 44, 75]
Performance Metrics and Real-World Applications
Description: Studies providing evaluation metrics and real-world application examples for AI-driven coding assistants, including benchmarks like HumanEval and practical deployment reports.
References: [1, 3, 7, 19, 21, 24, 33, 36]
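Benchmarks like HumanEval, cited in the performance-metrics papers above, are conventionally scored with the pass@k metric: the probability that at least one of k samples, drawn from n generated solutions of which c are correct, passes the tests. A minimal sketch of the standard unbiased estimator:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n - c, k) / C(n, k).
    n: total samples generated, c: samples that passed, k: draw size."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # must include at least one correct solution.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 200 samples of which 50 pass, pass@1 evaluates to 0.25, matching the naive fraction; the combinatorial form matters for k > 1.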
These categories encapsulate the core areas of interest and present a concise guide to the most pertinent research.
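The agent-like pipelines in the first category share a common shape: a loop in which a model proposes an action over the codebase (edit, search, run tests) and the environment returns an observation. A hypothetical, library-free sketch of that loop, with all names invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class RepoAgent:
    """Toy plan-act-observe loop over a codebase. `tools` maps action
    names to callables; `history` records (tool, argument, observation)."""
    tools: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def step(self, propose):
        """propose: callable(history) -> (tool_name, argument).
        Stands in for a real LLM call, which is assumed here."""
        tool, arg = propose(self.history)
        observation = self.tools[tool](arg)
        self.history.append((tool, arg, observation))
        return observation

    def run(self, propose, done, max_steps=10):
        """Iterate until the `done` predicate accepts an observation
        or the step budget is exhausted; return the full trace."""
        for _ in range(max_steps):
            if done(self.step(propose)):
                break
        return self.history
```

Real systems such as those in [3, 5] layer repository context retrieval, tool sandboxing, and model-driven planning on top of this basic loop.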