100x Structured Eval

Neta Skill Services 100x Comprehensive Evaluation Report

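This report is based on the 100x aggregate results dated 2026-03-21. For the domestic cn and international com service lines, it gives separate conclusions at the interface level and the category level, and on latency, retries, false success, and naturalness. The earlier, smaller-sample report is no longer the final delivery baseline.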

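| Item | Value | Note |
|---|---|---|
| Date | 2026-03-21 | |
| Interface Shards | 20 | |
| Naturalness Runs | 2 | |
| Interfaces | 50 | 50 `line::interface` dimensions |
| Min Attempts | 100 | Every individual function has at least 100 attempts |
| Total Attempts | 5,217 | Actual total attempts after 100x aggregation |
| Overall Gate | FAIL | 17/29 checks passed |

| Slice | Gate Status |
|---|---|
| overall | fail |
| cn public_read | conditional |
| cn authenticated_write | fail |
| cn heavy_creative | fail |
| com public_read | fail |
| com authenticated_write | conditional |
| com heavy_creative | fail |
| naturalness | fail |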

1. Executive Decision

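The user's hard requirement is met: every line::interface was tested at least 100 times. This is no longer an inference; it is verified directly by the aggregate data.

The top-level conclusion did not improve with the larger sample; it only became clearer. Overall, the set is still not fit to be promoted as a complete skill service suite that can be delivered with confidence.
  • The main CN problem is that write and creative calls fail almost across the board; many calls "appear to have run but produce no real result".
  • The main COM problem is that the public-read chain is slow, times out often, and repeatedly returns unreliable results on detail/read interfaces.
  • Naturalness is still insufficient: a new agent handed these skills does not naturally get the first step right and cannot reliably avoid depending on hidden knowledge.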

2. Coverage

| Coverage Metric | Value |
|---|---|
| Interface dimensions | 50 |
| Minimum attempts per dimension | 100 |
| Maximum attempts per dimension | 294 |
| Dimensions below 100 | 0 |
| Dimensions exactly 100 | 43 |

| Evidence Metric | Value |
|---|---|
| Interface run shards | 20 |
| Naturalness runs | 2 |
| Cases | 237 |
| Attempts | 5,217 |
| Attempts per case avg | 22.013 |
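The coverage figures can be cross-checked with a few lines of arithmetic. The sketch below assumes the per-dimension attempt counts implied by this report: 43 dimensions at exactly 100 attempts, plus the seven com dimensions whose counts appear in the full interface table (101, 106, 109, 294, 103, 101, 103). Variable names are illustrative and not part of the evaluation harness.

```python
# Per-dimension attempt counts as implied by the report (assumption: the seven
# non-100 values are read off the full interface table; the rest are 100).
attempts = [100] * 43 + [101, 106, 109, 294, 103, 101, 103]
cases = 237  # case count from the evidence table

coverage = {
    "dimensions": len(attempts),
    "min_attempts": min(attempts),
    "max_attempts": max(attempts),
    "below_100": sum(1 for a in attempts if a < 100),
    "exactly_100": sum(1 for a in attempts if a == 100),
    "total_attempts": sum(attempts),
    "attempts_per_case": round(sum(attempts) / cases, 3),
}

for key, value in coverage.items():
    print(f"{key}: {value}")
```

Under these assumptions the totals reproduce the table: 50 dimensions, 5,217 attempts, and about 22.013 attempts per case.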

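The interface 100x track is complete, but the naturalness data still comes from a separate set of 16 cold-start cases. It is not part of the 100x suite, so the report marks it separately to avoid mixing measurement bases.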

full-20260321-formal, full-20260321-formal-r2, soak-smoke-cn-cap1, soak-full-cn-100x-r3, soak-full-com-100x-r3, soak-full-com-tail-r1, quick-run, read-run, read-run-2, read-run-3, feed-run, search-run, request-run, adventure-run, image-bg-run, image-video-run, image-video-run-2, image-video-run-3, image-video-run-4, image-video-run-5

The invalid standalone shard `full-com-video-r1` is intentionally excluded from the 100x aggregate because it lacked the setup context required for a meaningful `make_video` attempt.

3. Scoring Contract

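This evaluation does not look at whether the shell exited successfully; it looks at whether the target capability was actually completed. Any call with an incomplete response, a required artifact that never materialized, or a CLI that disguises failure as success is scored as a non-success.

  • Effective success: a usable result was actually obtained and the downstream chain can continue.
  • False success: the call appears complete, but the real artifact or real effect does not exist.
  • Timeout: the timeout threshold was reached; this is not a success even if partial output was produced earlier.
  • Naturalness: whether a new agent naturally picks the right line, takes the right first step, and understands the interface semantics.
Note on counting: the false_success_count in this report is a reviewer-level quality flag; it is not the number of attempts whose final state is result_class=false_success. A call may ultimately be recorded as auth_error or dependency_missing, but if its intermediate output misleadingly "looked successful", it is still counted as a false success.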
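The scoring rules above can be sketched as two small predicates. This is a minimal illustration, not the harness's actual code: the `Attempt` fields and function names are hypothetical, chosen only to mirror the wording of the contract.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    # All field names are hypothetical; they mirror the scoring contract's wording.
    response_complete: bool   # the response contained everything it should
    artifacts_present: bool   # required artifacts actually materialized
    timed_out: bool           # the timeout threshold was reached
    looked_successful: bool   # reviewer flag: output at some point implied success


def is_effective_success(a: Attempt) -> bool:
    # A usable result needs a complete response and real artifacts, delivered in time.
    return a.response_complete and a.artifacts_present and not a.timed_out


def is_false_success(a: Attempt) -> bool:
    # Reviewer-level flag: the call looked successful but was not effective,
    # regardless of which terminal class (e.g. auth_error) it ends up in.
    return a.looked_successful and not is_effective_success(a)


misleading = Attempt(response_complete=False, artifacts_present=False,
                     timed_out=False, looked_successful=True)
print(is_effective_success(misleading), is_false_success(misleading))
```

Keeping the false-success flag separate from the terminal class reflects the counting note: an attempt can be both an auth_error and a false success.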

4. Aggregate Results

4.1 Overall

| Metric | Value |
|---|---|
| Cases | 237 |
| Attempts | 5,217 |
| Effective success count | 2,627 |
| Effective success rate | 0.504 |
| Effective success case rate | 0.646 |
| False success count | 936 |
| Timeout count | 106 |
| Retry count total | 2 |
| Latency p50 ms | 1,384 |
| Latency p95 ms | 41,264 |

4.2 By Line

| Line | Attempts | Success Rate | False Success | Timeout | Retry | p95 ms |
|---|---|---|---|---|---|---|
| cn | 2,500 | 0.349 | 604 | 1 | 0 | 1,576 |
| com | 2,717 | 0.646 | 332 | 105 | 2 | 58,155 |

4.3 High-Level Read

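  • cn looks fast because a large share of its failures occur at a very early stage.
  • com has a higher top-level success rate, but its latency tail is clearly out of control, especially on public-read.
  • After 5,217 attempts, the problems are no longer occasional fluctuations but stable, systemic defects.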

4.4 By Category

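| Scope | Class | Cases | Attempts | Success Rate | Case Success Rate | False Success | Timeout | Retry | p95 ms |
|---|---|---|---|---|---|---|---|---|---|
| cn::public_read | conditional | 80 | 1,400 | 0.624 | 0.825 | 0 | 1 | 0 | 1,657 |
| cn::authenticated_write | fail | 31 | 700 | 0.000 | 0.000 | 404 | 0 | 0 | 1,358 |
| cn::heavy_creative | fail | 16 | 400 | 0.000 | 0.000 | 200 | 0 | 0 | 1,445 |
| com::public_read | fail | 69 | 1,407 | 0.547 | 0.739 | 104 | 103 | 2 | 40,017 |
| com::authenticated_write | conditional | 25 | 701 | 0.859 | 0.800 | 3 | 0 | 0 | 3,668 |
| com::heavy_creative | fail | 16 | 609 | 0.627 | 1.000 | 225 | 2 | 0 | 79,543 |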
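Key structural finding: com authenticated_write is the only slice close to deliverable, but it still does not meet the strict gate; cn authenticated_write, cn heavy_creative, com public_read, and com heavy_creative cannot be promised externally as safe launch surfaces.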
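The per-category counts tie directly to the rates the gate evaluates. As a worked check, the sketch below recomputes false-success and timeout rates from the attempts, false-success, and timeout counts in the category table; the dictionary layout and names are illustrative only.

```python
# (attempts, false_success, timeout) per category, copied from the category table.
categories = {
    "cn.public_read": (1400, 0, 1),
    "cn.authenticated_write": (700, 404, 0),
    "cn.heavy_creative": (400, 200, 0),
    "com.public_read": (1407, 104, 103),
    "com.authenticated_write": (701, 3, 0),
    "com.heavy_creative": (609, 225, 2),
}

rates = {
    name: {"false_success_rate": fs / n, "timeout_rate": to / n}
    for name, (n, fs, to) in categories.items()
}

total_attempts = sum(n for n, _, _ in categories.values())
total_false = sum(fs for _, fs, _ in categories.values())
overall_false_success_rate = total_false / total_attempts

for name, r in rates.items():
    print(f"{name}: false={r['false_success_rate']:.4f} timeout={r['timeout_rate']:.4f}")
print(f"overall false_success_rate={overall_false_success_rate:.4f}")
```

The category counts sum to 5,217 attempts and 936 false successes, which gives the overall false-success rate of about 0.1794 reported in the gate section.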

5. Interface Hotspots

5.1 Worst Success Rate

| Scope | Category | Attempts | Success Rate | False Success | Timeout | p95 ms |
|---|---|---|---|---|---|---|
| cn::create_adventure_campaign | authenticated_write | 100 | 0.000 | 100 | 0 | 1,305 |
| cn::list_my_adventure_campaigns | authenticated_write | 100 | 0.000 | 100 | 0 | 1,333 |
| cn::list_my_characters | authenticated_write | 100 | 0.000 | 100 | 0 | 1,439 |
| cn::list_my_elementum | authenticated_write | 100 | 0.000 | 100 | 0 | 1,399 |
| cn::make_image | heavy_creative | 100 | 0.000 | 100 | 0 | 1,537 |
| cn::make_song | heavy_creative | 100 | 0.000 | 100 | 0 | 1,424 |
| com::request_interactive_feed | public_read | 100 | 0.000 | 50 | 0 | 3,668 |
| cn::like_collection | authenticated_write | 100 | 0.000 | 4 | 0 | 0 |
| com::read_collection | public_read | 103 | 0.000 | 0 | 103 | 40,025 |
| cn::make_video | heavy_creative | 100 | 0.000 | 0 | 0 | 0 |

5.2 Highest False Success

| Scope | Category | Attempts | Success Rate | False Success | Timeout | p95 ms |
|---|---|---|---|---|---|---|
| com::make_video | heavy_creative | 294 | 0.289 | 209 | 0 | 76,941 |
| cn::create_adventure_campaign | authenticated_write | 100 | 0.000 | 100 | 0 | 1,305 |
| cn::list_my_adventure_campaigns | authenticated_write | 100 | 0.000 | 100 | 0 | 1,333 |
| cn::list_my_characters | authenticated_write | 100 | 0.000 | 100 | 0 | 1,439 |
| cn::list_my_elementum | authenticated_write | 100 | 0.000 | 100 | 0 | 1,399 |
| cn::make_image | heavy_creative | 100 | 0.000 | 100 | 0 | 1,537 |
| cn::make_song | heavy_creative | 100 | 0.000 | 100 | 0 | 1,424 |
| com::request_interactive_feed | public_read | 100 | 0.000 | 50 | 0 | 3,668 |
| com::request_character_or_elementum | public_read | 100 | 0.500 | 50 | 0 | 3,732 |
| com::remove_background | heavy_creative | 100 | 0.880 | 12 | 0 | 19,971 |

5.3 Highest Timeout

| Scope | Category | Attempts | Success Rate | False Success | Timeout | p95 ms |
|---|---|---|---|---|---|---|
| com::read_collection | public_read | 103 | 0.000 | 0 | 103 | 40,025 |
| com::make_song | heavy_creative | 109 | 0.972 | 1 | 2 | 110,379 |
| cn::search_character_or_elementum | public_read | 100 | 0.990 | 0 | 1 | 1,588 |
| cn::create_adventure_campaign | authenticated_write | 100 | 0.000 | 100 | 0 | 1,305 |
| cn::like_collection | authenticated_write | 100 | 0.000 | 4 | 0 | 0 |
| cn::list_my_adventure_campaigns | authenticated_write | 100 | 0.000 | 100 | 0 | 1,333 |
| cn::list_my_characters | authenticated_write | 100 | 0.000 | 100 | 0 | 1,439 |
| cn::list_my_elementum | authenticated_write | 100 | 0.000 | 100 | 0 | 1,399 |
| cn::make_image | heavy_creative | 100 | 0.000 | 100 | 0 | 1,537 |
| cn::make_song | heavy_creative | 100 | 0.000 | 100 | 0 | 1,424 |

Stable Examples

  • CN stable subset: list_spaces, request_interactive_feed, suggest_categories, suggest_content, suggest_keywords, suggest_tags, validate_tax_path
  • COM stable subset: create_adventure_campaign, list_my_adventure_campaigns, list_my_characters, list_my_elementum, request_adventure_campaign, list_spaces, search_character_or_elementum, suggest_categories, suggest_content, suggest_keywords, suggest_tags, validate_tax_path

Priority Repair Surfaces

  • cn::create_adventure_campaign, cn::list_my_*, cn::make_image, cn::make_song
  • com::read_collection, com::request_interactive_feed, com::request_character_or_elementum
  • com::make_video is the single largest false-success sink

5.4 Interface / Action / Data Map

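This table puts interfaces, actions, input dependencies, success criteria, exported data, and downstream dependencies in one view. Reading it does not require working backwards from the case files.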

| Interface | User Action | Inputs | Success Evidence | Exported Data | Downstream Use | CN Health | COM Health | Priority |
|---|---|---|---|---|---|---|---|---|
| create_adventure_campaign | Create a new draft adventure campaign | `adventure_name` from line profile seed; `mission_plot` from line profile seed; `mission_rules` from line profile seed; `mission_task` from line profile seed | status; uuid | `campaign_uuid` <= uuid | request_adventure_campaign; update_adventure_campaign | critical (attempts 100, success 0.000, false 100, timeout 0) | healthy (attempts 101, success 1.000, false 0, timeout 0) | critical |
| get_hashtag_characters | Browse characters under a topic hashtag | `topic_hashtag` from `list_space_topics` | total | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.040, false 0, timeout 0) | critical (attempts 100, success 0.040, false 0, timeout 0) | critical |
| get_hashtag_collections | Browse collections under a space or topic hashtag | `main_hashtag` from `list_spaces`; `topic_hashtag` from `list_space_topics` | total | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.080, false 0, timeout 0) | critical (attempts 100, success 0.040, false 4, timeout 0) | critical |
| get_hashtag_info | Read a space or hashtag detail card | `main_hashtag` from `list_spaces` | hashtag.name | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.040, false 0, timeout 0) | critical (attempts 100, success 0.040, false 0, timeout 0) | critical |
| like_collection | Like or unlike a collection | `collection_uuid` from `suggest_content` | success | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.000, false 4, timeout 0) | critical (attempts 100, success 0.020, false 2, timeout 0) | critical |
| list_my_adventure_campaigns | List current user's adventure campaigns | - | total | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.000, false 100, timeout 0) | healthy (attempts 100, success 1.000, false 0, timeout 0) | critical |
| list_my_characters | List current user's created characters | - | total | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.000, false 100, timeout 0) | healthy (attempts 100, success 1.000, false 0, timeout 0) | critical |
| list_my_elementum | List current user's created elementa | - | total | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.000, false 100, timeout 0) | healthy (attempts 100, success 1.000, false 0, timeout 0) | critical |
| list_space_topics | Expand a space into sub-topics | `space_uuid` from `list_spaces` | topics.primary_topic.hashtag_name | `topic_hashtag` <= topics.primary_topic.hashtag_name, topics.topics[0].hashtag_name | get_hashtag_characters; get_hashtag_collections | critical (attempts 100, success 0.040, false 0, timeout 0) | critical (attempts 100, success 0.040, false 0, timeout 0) | critical |
| list_spaces | Discover available spaces and world entries | - | spaces[0].name; spaces[0].space_uuid | `main_hashtag` <= spaces[0].main_hashtag_name, spaces[0].name; `space_uuid` <= spaces[0].space_uuid | get_hashtag_collections; get_hashtag_info; list_space_topics | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy |
| make_image | Generate an image artifact | `image_prompt` from line profile seed | artifacts[0].uuid; task_uuid | `image_artifact_uuid` <= artifacts[0].uuid; `image_url` <= artifacts[0].url | make_video; remove_background | critical (attempts 100, success 0.000, false 100, timeout 0) | high (attempts 106, success 0.972, false 3, timeout 0) | critical |
| make_song | Generate song and lyric artifacts | `song_lyrics` from line profile seed; `song_prompt` from line profile seed | artifacts[0].audio_detail.lyric_url; task_uuid | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.000, false 100, timeout 0) | high (attempts 109, success 0.972, false 1, timeout 2) | critical |
| make_video | Generate video from an image source | `image_url` from `make_image`; `video_prompt` from line profile seed | artifacts[0].detail_url; task_uuid | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.000, false 0, timeout 0) | critical (attempts 294, success 0.289, false 209, timeout 0) | critical |
| read_collection | Open a concrete collection detail page | `collection_uuid` from `suggest_content` | collection.uuid | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.020, false 0, timeout 0) | critical (attempts 103, success 0.000, false 0, timeout 103) | critical |
| remove_background | Remove image background from a generated image | `image_artifact_uuid` from `make_image` | artifacts[0].uuid; task_uuid | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.000, false 0, timeout 0) | high (attempts 100, success 0.880, false 12, timeout 0) | critical |
| request_adventure_campaign | Read back one adventure campaign | `campaign_uuid` from `create_adventure_campaign` | name; uuid | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.000, false 0, timeout 0) | healthy (attempts 100, success 1.000, false 0, timeout 0) | critical |
| request_character_or_elementum | Read character or elementum detail | `character_uuid_top` from `search_character_or_elementum`; `keyword_element` from line profile seed | detail.name; detail.uuid | No exported variable; only response payload is validated | - | high (attempts 100, success 0.530, false 0, timeout 0) | critical (attempts 100, success 0.500, false 50, timeout 0) | critical |
| request_interactive_feed | Scroll interactive feed pages with session trace | `biz_trace_id` from `request_interactive_feed` | module_list[0].json_data.uuid | `biz_trace_id` <= page_data.biz_trace_id | request_interactive_feed | healthy (attempts 100, success 1.000, false 0, timeout 0) | critical (attempts 100, success 0.000, false 50, timeout 0) | critical |
| search_character_or_elementum | Search characters or elementum by keyword | `keyword_char` from line profile seed; `keyword_element` from line profile seed | list[0].uuid; total | `character_uuid_top` <= list[0].uuid | request_character_or_elementum | high (attempts 100, success 0.990, false 0, timeout 1) | healthy (attempts 101, success 1.000, false 0, timeout 0) | high |
| suggest_categories | Navigate the 3-level taxonomy tree | `primary_category` from `suggest_categories`; `primary_category>$secondary_category` from upstream runtime context | suggestions[0].name | `primary_category` <= suggestions[0].name; `secondary_category` <= suggestions[0].name; `tertiary_category` <= suggestions[0].name | suggest_categories | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy |
| suggest_content | Fetch recommend/search/exact content feed | `keyword_tag` from line profile seed; `primary_category>$secondary_category>$tertiary_category` from upstream runtime context | module_list[0].json_data.uuid; page_data.has_next_page | `collection_uuid` <= module_list[0].json_data.uuid | like_collection; read_collection | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy (attempts 103, success 1.000, false 0, timeout 0) | healthy |
| suggest_keywords | Get keyword suggestions from a prefix | `keyword_prefix` from line profile seed | suggestions[0].text | No exported variable; only response payload is validated | - | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy |
| suggest_tags | Get related tags from a keyword | `keyword_tag` from line profile seed | suggestions[0].name | No exported variable; only response payload is validated | - | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy |
| update_adventure_campaign | Update one field on an existing campaign | `campaign_uuid` from `create_adventure_campaign` | subtitle; uuid | No exported variable; only response payload is validated | - | critical (attempts 100, success 0.000, false 0, timeout 0) | healthy (attempts 100, success 0.990, false 1, timeout 0) | critical |
| validate_tax_path | Validate a candidate taxonomy path before use | `primary_category>$secondary_category>$tertiary_category` from upstream runtime context | valid | No exported variable; only response payload is validated | - | healthy (attempts 100, success 0.990, false 0, timeout 0) | healthy (attempts 100, success 1.000, false 0, timeout 0) | healthy |
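The Downstream Use column describes a dependency graph: when an upstream interface fails, everything that consumes its exports is blocked as well. The sketch below encodes the edges listed in the map (self-references such as request_interactive_feed feeding itself are left out) and walks them breadth-first. The `DOWNSTREAM` dictionary and the `blocked_by` helper are illustrative names, not part of the evaluation tooling.

```python
from collections import deque

# Upstream interface -> interfaces that consume its exported data,
# transcribed from the Downstream Use column (self-loops omitted).
DOWNSTREAM = {
    "create_adventure_campaign": ["request_adventure_campaign", "update_adventure_campaign"],
    "list_spaces": ["get_hashtag_collections", "get_hashtag_info", "list_space_topics"],
    "list_space_topics": ["get_hashtag_characters", "get_hashtag_collections"],
    "make_image": ["make_video", "remove_background"],
    "search_character_or_elementum": ["request_character_or_elementum"],
    "suggest_content": ["like_collection", "read_collection"],
}


def blocked_by(failed):
    """Return every interface transitively downstream of a failed interface."""
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for nxt in DOWNSTREAM.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(blocked_by("make_image")))   # ['make_video', 'remove_background']
print(sorted(blocked_by("list_spaces")))
```

This is consistent with the cn results, where make_image fails on every attempt and make_video and remove_background never succeed.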

5.5 Priority Problem Interfaces

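This is the list best suited for setting repair priorities directly. Each entry states the dominant failure cause and why it affects the downstream action chain.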

| Scope | Level | User Action | Success Rate | False Success | Timeout | Dominant Failure | Why It Matters | Suggested Fix |
|---|---|---|---|---|---|---|---|---|
| cn::create_adventure_campaign | critical | Create a new draft adventure campaign | 0.000 | 100 | 0 | auth_error | blocks request_adventure_campaign; blocks update_adventure_campaign | Verify token scope, line routing, and auth preconditions before treating the call as usable. |
| cn::list_my_adventure_campaigns | critical | List current user's adventure campaigns | 0.000 | 100 | 0 | auth_error | direct user-facing endpoint | Verify token scope, line routing, and auth preconditions before treating the call as usable. |
| cn::list_my_characters | critical | List current user's created characters | 0.000 | 100 | 0 | auth_error | direct user-facing endpoint | Verify token scope, line routing, and auth preconditions before treating the call as usable. |
| cn::list_my_elementum | critical | List current user's created elementa | 0.000 | 100 | 0 | auth_error | direct user-facing endpoint | Verify token scope, line routing, and auth preconditions before treating the call as usable. |
| cn::make_image | critical | Generate an image artifact | 0.000 | 100 | 0 | auth_error | blocks make_video; blocks remove_background | Verify token scope, line routing, and auth preconditions before treating the call as usable. |
| cn::make_song | critical | Generate song and lyric artifacts | 0.000 | 100 | 0 | auth_error | direct user-facing endpoint | Verify token scope, line routing, and auth preconditions before treating the call as usable. |
| com::request_interactive_feed | critical | Scroll interactive feed pages with session trace | 0.000 | 50 | 0 | dependency_missing | blocks request_interactive_feed | Validate upstream exports and fail fast when prerequisite IDs or artifacts are absent. |
| cn::like_collection | critical | Like or unlike a collection | 0.000 | 4 | 0 | dependency_missing | direct user-facing endpoint | Validate upstream exports and fail fast when prerequisite IDs or artifacts are absent. |
| com::read_collection | critical | Open a concrete collection detail page | 0.000 | 0 | 103 | timeout | direct user-facing endpoint | Add timeout recovery and pagination/session handling, especially for read/detail flows. |
| cn::make_video | critical | Generate video from an image source | 0.000 | 0 | 0 | dependency_missing | direct user-facing endpoint | Validate upstream exports and fail fast when prerequisite IDs or artifacts are absent. |
| cn::remove_background | critical | Remove image background from a generated image | 0.000 | 0 | 0 | dependency_missing | direct user-facing endpoint | Validate upstream exports and fail fast when prerequisite IDs or artifacts are absent. |
| cn::request_adventure_campaign | critical | Read back one adventure campaign | 0.000 | 0 | 0 | dependency_missing | direct user-facing endpoint | Validate upstream exports and fail fast when prerequisite IDs or artifacts are absent. |
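The suggested fix for the dependency_missing rows, validating upstream exports and failing fast, could look roughly like the guard below. It is a sketch under assumptions: the variable names come from the Inputs column of the interface map, while `REQUIRED_EXPORTS`, `DependencyMissing`, and `require_exports` are hypothetical names invented for illustration.

```python
# Upstream-exported variables each interface needs before it can be called,
# taken from the Inputs column of the interface map (seed inputs omitted).
REQUIRED_EXPORTS = {
    "make_video": ["image_url"],
    "remove_background": ["image_artifact_uuid"],
    "request_adventure_campaign": ["campaign_uuid"],
    "like_collection": ["collection_uuid"],
}


class DependencyMissing(Exception):
    """Raised before the call is made when a prerequisite export is absent."""


def require_exports(interface, context):
    # Treat missing keys and empty values the same way: the export is not usable.
    missing = [k for k in REQUIRED_EXPORTS.get(interface, []) if not context.get(k)]
    if missing:
        raise DependencyMissing(
            f"{interface}: missing upstream export(s): {', '.join(missing)}"
        )


try:
    require_exports("make_video", {"image_url": ""})
except DependencyMissing as exc:
    print(exc)
```

Failing before the call is issued keeps a broken upstream step from being recorded as a misleading downstream attempt.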

6. Naturalness

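Naturalness is not about whether an interface exists; it is about whether a new agent will "naturally understand and correctly use" it. This is one of the delivery requirements the user specifically emphasized.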

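| Metric | Value |
|---|---|
| Runs | 2 |
| Cases | 16 |
| Cold-start success count | 9 |
| Cold-start success rate | 0.562 |
| Manual hint rate | 0.000 |
| Wrong first command rate | 0.750 |
| Hidden knowledge dependency rate | 0.312 |

| Line | Cases | Cold-start Success | Manual Hint | Wrong First | Hidden Knowledge |
|---|---|---|---|---|---|
| cn | 8 | 0.500 | 0.000 | 0.625 | 0.125 |
| com | 8 | 0.625 | 0.000 | 0.875 | 0.500 |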
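  • The core problem is not that "the agent needs a human reminder", since the manual-hint rate is 0.
  • The real problem is that on cold start the agent often gets the very first step wrong, and it needs hidden knowledge that the docs do not naturally expose.
  • com is worse on naturalness for first-step and hidden-knowledge behavior: wrong-first-command reaches 0.875 and hidden-knowledge dependency reaches 0.500.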
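The overall naturalness rates follow from pooling the two lines. The sketch below recovers per-line counts as rate times cases (an inference from the table, since only the cold-start success count is stated directly) and pools them over the 16 cases; the names are illustrative.

```python
# Per-line rates from the naturalness table; 8 cases per line.
lines = {
    "cn": {"cases": 8, "cold_start": 0.500, "wrong_first": 0.625, "hidden": 0.125},
    "com": {"cases": 8, "cold_start": 0.625, "wrong_first": 0.875, "hidden": 0.500},
}
total_cases = sum(v["cases"] for v in lines.values())


def pooled(metric):
    """Recover per-line counts (rate * cases) and pool them across lines."""
    count = sum(round(v[metric] * v["cases"]) for v in lines.values())
    return count, count / total_cases


for metric in ("cold_start", "wrong_first", "hidden"):
    count, rate = pooled(metric)
    print(f"{metric}: {count}/{total_cases} = {rate}")
```

Pooling gives 9/16 = 0.5625 for cold-start success, 12/16 = 0.75 for wrong first command, and 5/16 = 0.3125 for hidden-knowledge dependency, matching the values used in the gate checks.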

7. Gate Result

The final gate result is FAIL: 17 of 29 checks passed.

| Check | Actual | Comparator | Threshold | Pass |
|---|---|---|---|---|
| overall.false_success_rate | 0.1794134560092007 | <= | 0.0 | fail |
| cn.authenticated_write.timeout_rate | 0.0 | <= | 0.02 | pass |
| cn.authenticated_write.p95_latency_ms | 1358.0 | <= | 10000.0 | pass |
| cn.authenticated_write.retry_budget_exceeded_rate | 0.0 | <= | 0.05 | pass |
| cn.authenticated_write.false_success_rate | 0.5771428571428572 | <= | 0.0 | fail |
| cn.heavy_creative.timeout_rate | 0.0 | <= | 0.05 | pass |
| cn.heavy_creative.p95_latency_ms | 1445.0 | <= | 90000.0 | pass |
| cn.heavy_creative.retry_budget_exceeded_rate | 0.0 | <= | 0.05 | pass |
| cn.heavy_creative.false_success_rate | 0.5 | <= | 0.0 | fail |
| cn.public_read.timeout_rate | 0.0007142857142857143 | <= | 0.0 | fail |
| cn.public_read.p95_latency_ms | 1657.0 | <= | 5000.0 | pass |
| cn.public_read.retry_budget_exceeded_rate | 0.0 | <= | 0.05 | pass |
| cn.public_read.false_success_rate | 0.0 | <= | 0.0 | pass |
| com.authenticated_write.timeout_rate | 0.0 | <= | 0.02 | pass |
| com.authenticated_write.p95_latency_ms | 3668.0 | <= | 10000.0 | pass |
| com.authenticated_write.retry_budget_exceeded_rate | 0.0 | <= | 0.05 | pass |
| com.authenticated_write.false_success_rate | 0.0042796005706134095 | <= | 0.0 | fail |
| com.heavy_creative.timeout_rate | 0.003284072249589491 | <= | 0.05 | pass |
| com.heavy_creative.p95_latency_ms | 79543.0 | <= | 90000.0 | pass |
| com.heavy_creative.retry_budget_exceeded_rate | 0.0 | <= | 0.05 | pass |
| com.heavy_creative.false_success_rate | 0.3694581280788177 | <= | 0.0 | fail |
| com.public_read.timeout_rate | 0.07320540156361052 | <= | 0.0 | fail |
| com.public_read.p95_latency_ms | 40017.0 | <= | 5000.0 | fail |
| com.public_read.retry_budget_exceeded_rate | 0.0 | <= | 0.05 | pass |
| com.public_read.false_success_rate | 0.07391613361762615 | <= | 0.0 | fail |
| naturalness.cold_start_success_rate | 0.5625 | >= | 0.8 | fail |
| naturalness.manual_hint_rate | 0.0 | <= | 0.2 | pass |
| naturalness.wrong_first_command_rate | 0.75 | <= | 0.15 | fail |
| naturalness.hidden_knowledge_dependency_rate | 0.3125 | <= | 0.1 | fail |

Failing Checks

  • overall.false_success_rate actual 0.1794134560092007 violates <= 0.0
  • cn.authenticated_write.false_success_rate actual 0.5771428571428572 violates <= 0.0
  • cn.heavy_creative.false_success_rate actual 0.5 violates <= 0.0
  • cn.public_read.timeout_rate actual 0.0007142857142857143 violates <= 0.0
  • com.authenticated_write.false_success_rate actual 0.0042796005706134095 violates <= 0.0
  • com.heavy_creative.false_success_rate actual 0.3694581280788177 violates <= 0.0
  • com.public_read.timeout_rate actual 0.07320540156361052 violates <= 0.0
  • com.public_read.p95_latency_ms actual 40017.0 violates <= 5000.0
  • com.public_read.false_success_rate actual 0.07391613361762615 violates <= 0.0
  • naturalness.cold_start_success_rate actual 0.5625 violates >= 0.8
  • naturalness.wrong_first_command_rate actual 0.75 violates <= 0.15
  • naturalness.hidden_knowledge_dependency_rate actual 0.3125 violates <= 0.1
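Each gate check is a comparison of an actual value against a threshold, and the gate passes only if every check passes. The sketch below evaluates a small subset of the checks from this report to show the mechanics; the tuple layout and the `evaluate` function are assumptions for illustration, not the format of the gate-check JSON.

```python
import operator

# Comparator symbols used in the gate table mapped to Python operators.
COMPARATORS = {"<=": operator.le, ">=": operator.ge}

# (check name, actual, comparator, threshold): a subset of the 29 checks.
checks = [
    ("overall.false_success_rate", 0.1794134560092007, "<=", 0.0),
    ("com.public_read.p95_latency_ms", 40017.0, "<=", 5000.0),
    ("naturalness.cold_start_success_rate", 0.5625, ">=", 0.8),
    ("naturalness.manual_hint_rate", 0.0, "<=", 0.2),
    ("cn.public_read.false_success_rate", 0.0, "<=", 0.0),
]


def evaluate(checks):
    """Return per-check pass/fail plus the overall gate (all checks must pass)."""
    results = {
        name: COMPARATORS[cmp](actual, threshold)
        for name, actual, cmp, threshold in checks
    }
    return results, all(results.values())


results, gate_passed = evaluate(checks)
for name, ok in results.items():
    print(f"{name}: {'pass' if ok else 'fail'}")
print("gate:", "PASS" if gate_passed else "FAIL")
```

Because the false-success thresholds are 0.0, a single false success in a slice is enough to fail that check, which is why com.authenticated_write fails despite a rate of only about 0.004.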

8. Full 100x Interface Table

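| Line | Interface | Category | Attempts | Success Rate | False Success | Timeout | Retry | p95 ms |
|---|---|---|---|---|---|---|---|---|
| cn | create_adventure_campaign | authenticated_write | 100 | 0.000 | 100 | 0 | 0 | 1,305 |
| cn | like_collection | authenticated_write | 100 | 0.000 | 4 | 0 | 0 | 0 |
| cn | list_my_adventure_campaigns | authenticated_write | 100 | 0.000 | 100 | 0 | 0 | 1,333 |
| cn | list_my_characters | authenticated_write | 100 | 0.000 | 100 | 0 | 0 | 1,439 |
| cn | list_my_elementum | authenticated_write | 100 | 0.000 | 100 | 0 | 0 | 1,399 |
| cn | request_adventure_campaign | authenticated_write | 100 | 0.000 | 0 | 0 | 0 | 0 |
| cn | update_adventure_campaign | authenticated_write | 100 | 0.000 | 0 | 0 | 0 | 0 |
| cn | make_image | heavy_creative | 100 | 0.000 | 100 | 0 | 0 | 1,537 |
| cn | make_song | heavy_creative | 100 | 0.000 | 100 | 0 | 0 | 1,424 |
| cn | make_video | heavy_creative | 100 | 0.000 | 0 | 0 | 0 | 0 |
| cn | remove_background | heavy_creative | 100 | 0.000 | 0 | 0 | 0 | 0 |
| cn | get_hashtag_characters | public_read | 100 | 0.040 | 0 | 0 | 0 | 0 |
| cn | get_hashtag_collections | public_read | 100 | 0.080 | 0 | 0 | 0 | 1,404 |
| cn | get_hashtag_info | public_read | 100 | 0.040 | 0 | 0 | 0 | 0 |
| cn | list_space_topics | public_read | 100 | 0.040 | 0 | 0 | 0 | 0 |
| cn | list_spaces | public_read | 100 | 1.000 | 0 | 0 | 0 | 1,855 |
| cn | read_collection | public_read | 100 | 0.020 | 0 | 0 | 0 | 0 |
| cn | request_character_or_elementum | public_read | 100 | 0.530 | 0 | 0 | 0 | 1,572 |
| cn | request_interactive_feed | public_read | 100 | 1.000 | 0 | 0 | 0 | 2,099 |
| cn | search_character_or_elementum | public_read | 100 | 0.990 | 0 | 1 | 0 | 1,588 |
| cn | suggest_categories | public_read | 100 | 1.000 | 0 | 0 | 0 | 1,474 |
| cn | suggest_content | public_read | 100 | 1.000 | 0 | 0 | 0 | 1,922 |
| cn | suggest_keywords | public_read | 100 | 1.000 | 0 | 0 | 0 | 1,505 |
| cn | suggest_tags | public_read | 100 | 1.000 | 0 | 0 | 0 | 1,478 |
| cn | validate_tax_path | public_read | 100 | 0.990 | 0 | 0 | 0 | 1,500 |
| com | create_adventure_campaign | authenticated_write | 101 | 1.000 | 0 | 0 | 0 | 3,903 |
| com | like_collection | authenticated_write | 100 | 0.020 | 2 | 0 | 0 | 0 |
| com | list_my_adventure_campaigns | authenticated_write | 100 | 1.000 | 0 | 0 | 0 | 2,999 |
| com | list_my_characters | authenticated_write | 100 | 1.000 | 0 | 0 | 0 | 3,286 |
| com | list_my_elementum | authenticated_write | 100 | 1.000 | 0 | 0 | 0 | 3,150 |
| com | request_adventure_campaign | authenticated_write | 100 | 1.000 | 0 | 0 | 0 | 3,500 |
| com | update_adventure_campaign | authenticated_write | 100 | 0.990 | 1 | 0 | 0 | 3,798 |
| com | make_image | heavy_creative | 106 | 0.972 | 3 | 0 | 0 | 65,034 |
| com | make_song | heavy_creative | 109 | 0.972 | 1 | 2 | 0 | 110,379 |
| com | make_video | heavy_creative | 294 | 0.289 | 209 | 0 | 0 | 76,941 |
| com | remove_background | heavy_creative | 100 | 0.880 | 12 | 0 | 0 | 19,971 |
| com | get_hashtag_characters | public_read | 100 | 0.040 | 0 | 0 | 0 | 0 |
| com | get_hashtag_collections | public_read | 100 | 0.040 | 4 | 0 | 0 | 2,903 |
| com | get_hashtag_info | public_read | 100 | 0.040 | 0 | 0 | 0 | 0 |
| com | list_space_topics | public_read | 100 | 0.040 | 0 | 0 | 0 | 0 |
| com | list_spaces | public_read | 100 | 1.000 | 0 | 0 | 0 | 4,021 |
| com | read_collection | public_read | 103 | 0.000 | 0 | 103 | 2 | 40,025 |
| com | request_character_or_elementum | public_read | 100 | 0.500 | 50 | 0 | 0 | 3,732 |
| com | request_interactive_feed | public_read | 100 | 0.000 | 50 | 0 | 0 | 3,668 |
| com | search_character_or_elementum | public_read | 101 | 1.000 | 0 | 0 | 0 | 3,770 |
| com | suggest_categories | public_read | 100 | 1.000 | 0 | 0 | 0 | 4,101 |
| com | suggest_content | public_read | 103 | 1.000 | 0 | 0 | 0 | 3,129 |
| com | suggest_keywords | public_read | 100 | 1.000 | 0 | 0 | 0 | 3,256 |
| com | suggest_tags | public_read | 100 | 1.000 | 0 | 0 | 0 | 2,965 |
| com | validate_tax_path | public_read | 100 | 1.000 | 0 | 0 | 0 | 3,620 |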

9. Skill To API Mapping

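This section adds the skill-layer interfaces that an agent can actually see and call, that is, the command surface this evaluation actually covered. For each interface it answers which skill it belongs to, what a typical invocation looks like, and where the documentation evidence lives.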

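Scope note: this records the skill -> CLI/API mapping, not a list of the underlying backend HTTP paths. If the raw REST / gRPC routes later need to be written into the report, the @talesofai/neta-skills package itself has to be unpacked further, rather than reading only the skill docs.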
| Interface | Primary Skill | Supporting Skills | Typical Command Surface | Source Doc |
|---|---|---|---|---|
| list_spaces | neta-space | neta-community | `neta-cli list_spaces` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-space/SKILL.md` |
| get_hashtag_info | neta-space | neta-community | `neta-cli get_hashtag_info --hashtag "space_tag_name"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-space/SKILL.md` |
| list_space_topics | neta-space | - | `neta-cli list_space_topics --space_uuid "space UUID"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-space/SKILL.md` |
| get_hashtag_characters | neta-space | neta-community | `neta-cli get_hashtag_characters --hashtag "tag_name" --sort_by "hot"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-space/SKILL.md` |
| get_hashtag_collections | neta-space | neta-community | `neta-cli get_hashtag_collections --hashtag "tag_name"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-space/SKILL.md` |
| read_collection | neta-community | neta-space, neta-creative | `neta-cli read_collection --uuid "collection-uuid"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-community/SKILL.md` |
| request_interactive_feed | neta-community | - | `neta-cli request_interactive_feed --page_index 0 --page_size 10` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-community/references/interactive-feed.md` |
| like_collection | neta-community | - | `neta-cli like_collection --uuid "target collection UUID"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-community/SKILL.md` |
| search_character_or_elementum | neta-community | neta-creative, neta-character, neta-elementum | `neta-cli search_character_or_elementum --keywords "keywords" --parent_type "character" --sort_scheme "exact"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-community/SKILL.md` |
| request_character_or_elementum | neta-community | neta-creative, neta-character, neta-elementum | `neta-cli request_character_or_elementum --name "character_name"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-community/SKILL.md` |
| suggest_keywords | neta-suggest | - | `neta-cli suggest_keywords --prefix "game" --size 20` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-suggest/SKILL.md` |
| suggest_tags | neta-suggest | - | `neta-cli suggest_tags --keyword "character design" --size 15` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-suggest/SKILL.md` |
| suggest_categories | neta-suggest | - | `neta-cli suggest_categories --level 2 --parent_path "Derivative Creation"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-suggest/SKILL.md` |
| validate_tax_path | neta-suggest | - | `neta-cli validate_tax_path --tax_path "Derivative Creation>Fan Works>Honkai: Star Rail"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-suggest/SKILL.md` |
| suggest_content | neta-suggest | - | `neta-cli suggest_content --intent search --search_keywords "character,creativity" --page_size 20` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-suggest/SKILL.md` |
| make_image | neta-creative | neta-character, neta-elementum | `neta-cli make_image --prompt "@character_name, /elementum_name, ref_img-uuid, description1, description2" --aspect "3:4"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-creative/SKILL.md` |
| make_video | neta-creative | - | `neta-cli make_video --image_source "image URL" --prompt "action description" --model "model_s"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-creative/SKILL.md` |
| make_song | neta-creative | - | `neta-cli make_song --prompt "style description" --lyrics "lyrics content"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-creative/SKILL.md` |
| remove_background | neta-creative | - | `neta-cli remove_background --input_image "image_url"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-creative/SKILL.md` |
| create_adventure_campaign | neta-adventure | - | `npx -y @talesofai/neta-skills create_adventure_campaign --name "汴京最后三天" --mission_plot "..."` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-adventure/SKILL.md` |
| update_adventure_campaign | neta-adventure | - | `npx -y @talesofai/neta-skills update_adventure_campaign --campaign_uuid "campaign-uuid-here" --mission_plot_attention "..."` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-adventure/SKILL.md` |
| list_my_adventure_campaigns | neta-adventure | - | `npx -y @talesofai/neta-skills list_my_adventure_campaigns --page_index 0 --page_size 10` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-adventure/SKILL.md` |
| request_adventure_campaign | neta-adventure | - | `npx -y @talesofai/neta-skills request_adventure_campaign --campaign_uuid "campaign-uuid-here"` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-adventure/SKILL.md` |
| list_my_characters | neta-character | - | `neta-cli list_my_characters --keyword "Ada" --page_size 10` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-character/SKILL.md` |
| list_my_elementum | neta-elementum | - | `neta-cli list_my_elementum --keyword "village" --page_size 10` | `/Users/atou/Library/Mobile Documents/com~apple~CloudDocs/Neta/skills/neta-elementum/SKILL.md` |

10. Artifacts

The final deliverable is defined by these files:

  • /Users/atou/agents-in-discord/workspaces/1484560502469165306/evals/neta-skill-services/reports/20260321-comprehensive-eval-report-100x.md
  • /Users/atou/agents-in-discord/workspaces/1484560502469165306/evals/neta-skill-services/reports/20260321-comprehensive-eval-report-100x.html
  • /Users/atou/agents-in-discord/workspaces/1484560502469165306/evals/neta-skill-services/reports/20260321-aggregate-interface-100x.json
  • /Users/atou/agents-in-discord/workspaces/1484560502469165306/evals/neta-skill-services/reports/20260321-aggregate-gate-check-100x.json
  • /Users/atou/agents-in-discord/workspaces/1484560502469165306/evals/neta-skill-services/reports/20260321-aggregate-naturalness.json