Evaluation Methods for Intelligent Synthesis Technology of Aerospace Control Software

WANG Heran, ZHAO Ming, DONG Xiaogang, GU Bin, LI Xiaofeng, ZHONG Ruiming

Citation: WANG Heran, ZHAO Ming, DONG Xiaogang, GU Bin, LI Xiaofeng, ZHONG Ruiming. Evaluation Methods for Intelligent Synthesis Technology of Aerospace Control Software (in Chinese). Chinese Journal of Space Science, 2025, 45(3): 1-19. doi: 10.11728/cjss2025.03.2024-0041


doi: 10.11728/cjss2025.03.2024-0041 cstr: 32142.14.cjss.2024-0041
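Funding: Supported by the National Natural Science Foundation of China (62192730, 62192735, U21B2015)
Details
    About the author:
    • WANG Heran, male, born in March 1998, is an assistant engineer at the Beijing Institute of Control Engineering. His main research interests are program synthesis and evaluation methods for intelligent synthesis technology of aerospace embedded software. E-mail: wangheranjisuanji@163.com
    Corresponding author:
    • ZHAO Ming, male, born in February 1995, is an engineer at the Beijing Institute of Control Engineering. His main research interests are intelligent software synthesis and architecture design methods for embedded software. E-mail: zhaoming0205@outlook.com
  • CLC number: TP311.5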


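  • Abstract: Program synthesis is the software development activity of automatically generating program code that satisfies the user's intent. With the successful application of artificial intelligence to program synthesis, intelligent program synthesis is gradually becoming a new paradigm of software development. Although some evaluation methods for intelligent program synthesis technology exist, they still face many problems and need further refinement. By surveying the evaluation criteria used for intelligent program synthesis and analyzing the evaluation methods of current mainstream techniques, this paper analyzes and refines the evaluation indicators for intelligent program synthesis technology. Taking the characteristics of aerospace embedded software into account, it constructs a hierarchical evaluation indicator system for the intelligent synthesis of aerospace embedded software and designs a comprehensive evaluation method for intelligent synthesis technology of aerospace control software, based mainly on combining dynamic and static evaluation. Experiments verify the effectiveness of the combined dynamic-static evaluation method: it achieves a higher Pearson correlation coefficient with human ratings than the other evaluation methods compared.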

     

  • Figure 1. Research background
    Figure 2. Pyramid-based theoretical framework
    Figure 3. Key research aspects in the evaluation methodologies of intelligent program synthesis
    Figure 4. Summarized evaluation indicators for intelligent program synthesis technology
    Figure 5. Refined evaluation indicators for intelligent program synthesis technology
    Figure 6. Design approach for the evaluation method of intelligent synthesis technology for aerospace control software
    Figure 7. Characteristics of aerospace embedded software and their corresponding evaluation indicators
    Figure 8. Evaluation indicator system for intelligent synthesis technology of aerospace control software
    Figure 9. Evaluation method of intelligent program synthesis technology combining dynamic and static aspects
    Figure 10. A comprehensive evaluation method for intelligent synthesis technology of aerospace control software

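    Table 1. Evaluation indicators of intelligent program synthesis technology and their frequency of use

    Evaluation level | Evaluation indicator | Related literature | Frequency of use (%)
    Synthesis result | Program correctness | [2-55] | 46.49
    Synthesis result | Program size | [3,8-12,15,16,22,23,31,33,34,39,42,46] | 9.73
    Synthesis result | Program similarity | [10,11,19,47,48,56] | 5.41
    Synthesis process | Synthesis time | [6,8,10-13,15,19-21,24,27,34,36,39,41-45,47,49,57] | 13.51
    Synthesis process | Number of candidate programs | [7,12-14,16,18,24,26,27,31,39-41,57] | 9.19
    Synthesizer training | Training data volume | [9,14-17,33,35,36,38,44] | 6.49
    Other | | [11,15,22,25,44-46,48,50,51,55,57,58] | 9.18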

    Table 2. Experimental results (columns are the evaluated models)

    Evaluation method | ChatGLM-6B | ChatGLM2-6B | ChatGLM3-6B
    CHRF++ | 0.406261948 | 0.435981525 | 0.50124219
    AST_MATCH | 0.312514349 | 0.355593869 | 0.44914179
    DFG_MATCH | 0.517394044 | 0.548241746 | 0.60469280
    pass@k (k=1) | 0.040579268 | 0.093689024 | 0.62344512
    pass@k (k=10) | 0.135562582 | 0.192008573 | 0.79927887
    pass@k (k=100) | 0.257184867 | 0.307372033 | 0.83987480
    CodeBLEU | 0.259490443 | 0.289887388 | 0.35368544
    Proposed method (this paper) | 0.231646673 | 0.276125765 | 0.58140539
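    The pass@k rows above refer to the metric introduced in [61]. As a reference point, the unbiased per-task estimator from that work, 1 - C(n-c, k) / C(n, k), can be sketched as follows. This is a minimal sketch and not the authors' evaluation script: this page does not state how many samples were generated per task or which estimator was used, so the names `pass_at_k` and `mean_pass_at_k` and the sample counts in the usage lines are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task.

    n: number of programs sampled for the task
    c: number of those programs that pass all tests
    k: budget of attempts being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # If fewer than k samples fail, every k-subset contains a passing program.
    if n - c < k:
        return 1.0
    # Probability that a random k-subset contains no passing program,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k: int) -> float:
    """Average pass@k over tasks; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical counts for three tasks, 200 samples each (not from the paper).
    hypothetical_results = [(200, 12), (200, 0), (200, 150)]
    for k in (1, 10, 100):
        print(k, mean_pass_at_k(hypothetical_results, k))
```

    Because the estimator averages over all k-subsets of the n samples, it requires n >= k; reporting k = 100 therefore implies at least 100 samples per task under this formulation.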

    Table 3. Human ratings simulated by ChatGPT3.5

    Model | ChatGLM-6B | ChatGLM2-6B | ChatGLM3-6B
    Simulated human rating | 2.361817523 | 2.758730102 | 4.199085366

    Table 4. Pearson correlation coefficients between each evaluation method and the human ratings (columns are the evaluated models)

    Evaluation method | ChatGLM-6B | ChatGLM2-6B | ChatGLM3-6B
    CHRF++ | 0.340366625 | 0.460577583 | 0.382847599
    AST_MATCH | 0.370650915 | 0.379543662 | 0.342105016
    DFG_MATCH | 0.379022475 | 0.339336446 | 0.155618439
    pass@k (k=1) | 0.602161597 | 0.576083246 | 0.380828102
    pass@k (k=10) | 0.578192141 | 0.505663582 | 0.228842017
    pass@k (k=100) | 0.457669783 | 0.442143948 | 0.151757714
    CodeBLEU | 0.418648037 | 0.453922700 | 0.358580074
    Proposed method (this paper) | 0.614578225 | 0.594448137 | 0.425688413
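    The coefficients above measure how closely each evaluation method tracks the human ratings. A minimal sketch of the Pearson correlation computation is given below, assuming one metric score and one rating per generated program. The function name `pearson` and the numbers in the usage lines are made-up placeholders for illustration, not data or code from the paper.

```python
from math import sqrt


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical per-program metric scores and ratings on a 1-5 scale.
    metric_scores = [0.12, 0.40, 0.35, 0.80, 0.66]
    human_ratings = [1.0, 3.0, 2.5, 5.0, 4.0]
    print(pearson(metric_scores, human_ratings))
```

    A value closer to 1 means the metric ranks and spaces programs more like the raters do, which is the sense in which the proposed method is reported as best in each column.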
  • [1] YANG Mengfei, GU Bin, DUAN Zhenhua, et al. Intelligent program synthesis framework and key scientific problems for embedded software[J]. Chinese Space Science and Technology, 2022, 42(4): 1-7 (in Chinese)
    [2] SHIN R, POLOSUKHIN I, SONG D. Improving neural program synthesis with inferred execution traces[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal: Curran Associates Inc., 2018: 8931-8940
    [3] HUANG D, ZHANG R, HU X, et al. Neural program synthesis with query[C]//The 10th International Conference on Learning Representations. Virtual Event: OpenReview.net, 2022
    [4] RAMANI G, KARANDE S. Synthesis of mathematical programs from natural language specifications[OL]. arXiv preprint arXiv: 2304.03287, 2023
    [5] JAIN N, VAIDYANATH S, IYER A, et al. Jigsaw: large language models meet program synthesis[C]//Proceedings of the 44th International Conference on Software Engineering. Pittsburgh: ACM, 2022: 1219-1231
    [6] CHRISTAKOPOULOU K, KALAI A T. Glass-box program synthesis: a machine learning approach[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press, 2018: 646-653
    [7] ODENA A, SHI K, BIEBER D, et al. BUSTLE: bottom-up program synthesis through learning-guided exploration[C]//The 9th International Conference on Learning Representations. Austria: OpenReview.net, 2021
    [8] DUMANCIC S, GUNS T, CROPPER A. Knowledge refactoring for inductive program synthesis[C]//Proceedings of the 35th AAAI Conference on Artificial Intelligence. Virtual Event: AAAI Press, 2021: 7271-7278
    [9] ROSIN C D. Stepping stones to inductive synthesis of low-level looping programs[C]//Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Honolulu: AAAI Press, 2019: 2362-2370
    [10] ZOHAR A, WOLF L. Automatic program synthesis of long programs with a learned garbage collector[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal: Curran Associates Inc., 2018: 2098-2107
    [11] HONG J, DOHAN D, SINGH R, et al. Latent programmer: discrete latent codes for program synthesis[C]//Proceedings of the 38th International Conference on Machine Learning. Virtual Event: PMLR, 2021: 4308-4318
    [12] SHI K, DAI H J, ELLIS K, et al. CROSSBEAM: learning to search in bottom-up program synthesis[C]//The 10th International Conference on Learning Representations. Virtual Event: OpenReview.net, 2022
    [13] KALYAN A, MOHTA A, POLOZOV O, et al. Neural-guided deductive search for real-time program synthesis from examples[C]//The 6th International Conference on Learning Representations. Vancouver: OpenReview.net, 2018
    [14] VALKOV L, CHAUDHARI D, SRIVASTAVA A, et al. HOUDINI: lifelong learning as program synthesis[C]//The 32nd International Conference on Neural Information Processing Systems. Montréal: Curran Associates Inc., 2018: 8701-8712
    [15] HANDA S, RINARD M C. Inductive program synthesis over noisy data[C]//The 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. USA: ACM, 2020: 87-98
    [16] NYE M I, HEWITT L B, TENENBAUM J B, et al. Learning to infer program sketches[C]//Proceedings of the 36th International Conference on Machine Learning. Long Beach: PMLR, 2019: 4861-4870
    [17] CHEN X Y, SONG D, TIAN Y D. Latent execution for neural program synthesis[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Virtual Event: Curran Associates Inc., 2021: 22196-22208
    [18] FIJALKOW N, LAGARDE G, MATRICON T, et al. Scaling neural program synthesis with distribution-based search[C]//Proceedings of the 36th AAAI Conference on Artificial Intelligence. Virtual Event: AAAI Press, 2022: 6623-6630
    [19] THAKOOR S, SHAH S, RAMAKRISHNAN G, et al. Synthesis of programs from multimodal datasets[C]//Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans: AAAI Press, 2018: 184-191
    [20] RAZA M, GULWANI S. Automated data extraction using predictive program synthesis[C]//Proceedings of the 31st AAAI Conference on Artificial Intelligence. San Francisco: AAAI Press, 2017: 882-890
    [21] QUIRK C, MOONEY R, GALLEY M. Language to code: learning semantic parsers for if-this-then-that recipes[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing. Beijing: ACL, 2015: 878-888
    [22] ZHANG Y T. Scalability and precision improvement of neural program synthesis[C]//The 35th IEEE/ACM International Conference on Automated Software Engineering. Melbourne: IEEE, 2020: 1391-1393
    [23] CHASINS S, PHOTHILIMTHANA P M. Data-driven synthesis of full probabilistic programs[C]//The 29th International Conference on Computer Aided Verification. Heidelberg: Springer, 2017: 279-304
    [24] SI X J, LEE W, ZHANG R, et al. Syntax-guided synthesis of datalog programs[C]//The 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Lake Buena Vista: ACM, 2018: 515-527
    [25] LAICH L, BIELIK P, VECHEV M T. Guiding program synthesis by learning to generate examples[C]//The 8th International Conference on Learning Representations. Addis Ababa: OpenReview.net, 2020
    [26] ODENA A, SUTTON C. Learning to represent programs with property signatures[C]//The 8th International Conference on Learning Representations. Addis Ababa: OpenReview.net, 2020
    [27] SI X J, YANG Y, DAI H J, et al. Learning a meta-solver for syntax-guided program synthesis[C]//The 7th International Conference on Learning Representations. New Orleans: OpenReview.net, 2019
    [28] SHIN R, KANT N, GUPTA K, et al. Synthetic datasets for neural program synthesis[C]//The 7th International Conference on Learning Representations. New Orleans: OpenReview.net, 2019
    [29] CHEN X Y, LIU C, SONG D. Execution-guided neural program synthesis[C]//The 7th International Conference on Learning Representations. New Orleans: OpenReview.net, 2019
    [30] BUNEL R, HAUSKNECHT M J, DEVLIN J, et al. Leveraging grammar and reinforcement learning for neural program synthesis[C]//The 6th International Conference on Learning Representations. Vancouver: OpenReview.net, 2018
    [31] POLOSUKHIN I, SKIDANOV A. Neural program search: solving programming tasks from description and examples[C]//The 6th International Conference on Learning Representations. Vancouver: OpenReview.net, 2018
    [32] SHIN R, POLOSUKHIN I, SONG D. Towards specification-directed program repair[C]//The 6th International Conference on Learning Representations. Vancouver: OpenReview.net, 2018
    [33] PARISOTTO E, MOHAMED A R, SINGH R, et al. Neuro-symbolic program synthesis[C]//The 5th International Conference on Learning Representations. Toulon: OpenReview.net, 2017
    [34] BALOG M, GAUNT A L, BROCKSCHMIDT M, et al. DeepCoder: learning to write programs[C]//The 5th International Conference on Learning Representations. Toulon: OpenReview.net, 2017
    [35] ALET F, LOPEZ-CONTRERAS J, KOPPEL J, et al. A large-scale benchmark for few-shot program induction and synthesis[C]//Proceedings of the 38th International Conference on Machine Learning. Virtual Event: PMLR, 2021: 175-186
    [36] PU Y W, MIRANDA Z, SOLAR-LEZAMA A, et al. Selecting representative examples for program synthesis[C]//Proceedings of the 35th International Conference on Machine Learning. Stockholm: PMLR, 2018: 4158-4167
    [37] SUN S H, NOH H, SOMASUNDARAM S, et al. Neural program synthesis from diverse demonstration videos[C]//Proceedings of the 35th International Conference on Machine Learning. Stockholm: PMLR, 2018: 4797-4806
    [38] DEVLIN J, UESATO J, BHUPATIRAJU S, et al. RobustFill: neural program learning under noisy I/O[C]//Proceedings of the 34th International Conference on Machine Learning. Sydney: PMLR, 2017: 990-998
    [39] MENON A K, TAMUZ O, GULWANI S, et al. A machine learning framework for programming by example[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta: PMLR, 2013: 187-195
    [40] GU X D, ZHANG H Y, KIM S. Deep code search[C]//Proceedings of the 40th International Conference on Software Engineering. Gothenburg: ACM, 2018: 933-944
    [41] DESAI A, GULWANI S, HINGORANI V, et al. Program synthesis using natural language[C]//Proceedings of the 38th International Conference on Software Engineering. Austin: ACM, 2016: 345-356
    [42] SHRIVASTAVA D, LAROCHELLE H, TARLOW D. Learning to combine per-example solutions for neural program synthesis[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Virtual Event: Curran Associates Inc., 2021: 6102-6114
    [43] CUI G F, ZHU H. Differentiable synthesis of program architectures[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Virtual Event: Curran Associates Inc., 2021: 11123-11135
    [44] YANG Y D, INALA J P, BASTANI O, et al. Program synthesis guided reinforcement learning for partially observed environments[C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. Virtual Event: Curran Associates Inc., 2021: 29669-29683
    [45] SHAH A, ZHAN E, SUN J J, et al. Learning differentiable programs with admissible neural heuristics[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Virtual Event: Curran Associates Inc., 2020: 4940-4952
    [46] GUPTA K, CHRISTENSEN P E, CHEN X Y, et al. Synthesize, execute and debug: learning to repair for neural program synthesis[C]//Proceedings of the 34th International Conference on Neural Information Processing Systems. Virtual Event: Curran Associates Inc., 2020: 17685-17695
    [47] ELLIS K, NYE M, PU Y W, et al. Write, execute, assess: program synthesis with a REPL[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2019: 9165-9174
    [48] SHIN R, ALLAMANIS M, BROCKSCHMIDT M, et al. Program synthesis and semantic parsing with learned code idioms[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2019: 10824-10834
    [49] ELLIS K, MORALES L, SABLÉ-MEYER M, et al. Learning libraries of subroutines for neurally-guided Bayesian program induction[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal: Curran Associates Inc., 2018: 7816-7826
    [50] ZHANG L, ROSENBLATT G, FETAYA E, et al. Neural guided constraint logic programming for program synthesis[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal: Curran Associates Inc., 2018: 1744-1753
    [51] LIANG C, NOROUZI M, BERANT J, et al. Memory augmented policy optimization for program synthesis and semantic parsing[C]//Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal: Curran Associates Inc., 2018: 10015-10027
    [52] CHEN X Y, LIU C, SHIN R, et al. Latent attention for if-then program synthesis[C]//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona: Curran Associates Inc., 2016: 4581-4589
    [53] ELLIS K, SOLAR-LEZAMA A, TENENBAUM J B. Unsupervised learning by program synthesis[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Montréal: MIT Press, 2015: 973-981
    [54] ELLIS K, WONG C, NYE M, et al. DreamCoder: bootstrapping inductive program synthesis with wake-sleep library learning[C]//The 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation. Canada: ACM, 2021: 835-850
    [55] BEDNAREK J, PIASKOWSKI K, KRAWIEC K. Ain’t nobody got time for coding: structure-aware program synthesis from natural language[OL]. arXiv preprint arXiv: 1810.09717, 2019
    [56] MURALI V, QI L, CHAUDHURI S, et al. Neural sketch learning for conditional program generation[C]//The 6th International Conference on Learning Representations. Vancouver: OpenReview.net, 2018
    [57] RAGHOTHAMAN M, WEI Y, HAMADI Y. SWIM: synthesizing what I mean-code search and idiomatic snippet synthesis[C]//2016 IEEE/ACM 38th International Conference on Software Engineering. Austin: ACM, 2016: 357-367
    [58] BHUPATIRAJU S, AGRAWAL K K, SINGH R. Towards mixed optimization for reinforcement learning with program synthesis[OL]. arXiv preprint arXiv: 1807.00403, 2018
    [59] PAPINENI K, ROUKOS S, WARD T, et al. BLEU: a method for automatic evaluation of machine translation[C]//Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Philadelphia: ACL, 2002: 311-318
    [60] KULAL S, PASUPAT P, CHANDRA K, et al. SPoC: search-based pseudocode to code[C]//Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver: Curran Associates Inc., 2019: 11883-11894
    [61] CHEN M, TWOREK J, JUN H, et al. Evaluating large language models trained on code[OL]. arXiv preprint arXiv: 2107.03374, 2021
    [62] BANERJEE S, LAVIE A. METEOR: an automatic metric for MT evaluation with improved correlation with human judgments[C]//Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Ann Arbor: ACL, 2005: 65-72
    [63] POPOVIĆ M. chrF: character n-gram F-score for automatic MT evaluation[C]//Proceedings of the 10th Workshop on Statistical Machine Translation. Lisbon: ACL, 2015: 392-395
    [64] POPOVIĆ M. chrF++: words helping character n-gram[C]//Proceedings of the 2nd Conference on Machine Translation. Copenhagen: ACL, 2017: 612-618
    [65] TRAN N, TRAN H, NGUYEN S, et al. Does BLEU score work for code migration?[C]//IEEE/ACM 27th International Conference on Program Comprehension. Montréal: IEEE, 2019: 165-176
    [66] REN S, GUO D Y, LU S, et al. CodeBLEU: a method for automatic evaluation of code synthesis[OL]. arXiv preprint arXiv: 2009.10297, 2020
    [67] PAN Y, LYU C. Measuring efficient code generation with GEC[C]//The 14th Asia-Pacific Symposium on Internetware. Hangzhou: ACM, 2023: 249-258
    [68] IMPROTA C. Poisoning programs by un-repairing code: security concerns of AI-generated code[C]//IEEE 34th International Symposium on Software Reliability Engineering Workshops. Florence: IEEE, 2023: 128-131
    [69] SIDDIQ M L, SANTOS J C S. SecurityEval dataset: mining vulnerability examples to evaluate machine learning-based code generation techniques[C]//The 1st International Workshop on Mining Software Repositories Applications for Privacy and Security. Singapore: ACM, 2022: 29-33
    [70] SU H R, AI J, YU D, et al. An evaluation method for large language models’ code generation capability[C]//The 10th International Conference on Dependable Systems and Their Applications. Tokyo: IEEE, 2023: 831-838
    [71] KOVALCHUK S, FEDRUSHKOV D, LOMSHAKOV V, et al. Test-based and metric-based evaluation of code generation models for practical question answering[C]//The International Conference on Code Quality. St. Petersburg: IEEE, 2023: 73-86

Publication history
  • Received: 2024-03-14
  • Revised: 2024-04-25
  • Published online: 2024-06-20
