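Google has been busy lately. First it launched Gemini 3 and Antigravity, and then Nano Banana Pro went live on the Vertex AI platform under the model name gemini-3-pro-image-preview.
After a round of quick tests, we found that it does more than just "draw pictures." Its image quality is consistently solid, and, more strikingly, it seems to be starting to show signs of reasoning.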
Test 1: A cross-dimensional video conference
Prompt:
"A realistic HD screenshot-style image of a video conference interface, similar to Zoom, in 16:9 horizontal format. There are six participants, each in their own video tile: 1.Sam Altman, short hair, blue eyes, wearing a simple T-shirt or casual shirt, focused expression.2. Elon Musk, slightly slicked-back short hair, wearing a dark T-shirt or jacket, a faint smile. 3. Sundar Pichai, black-rim glasses, beard, wearing a dark suit with a light shirt, looking at the screen.4.Satya Nadella, bald, thin-frame glasses, business-casual suit, gentle expression.5. Mark Zuckerberg, short slightly curly hair, simple dark T-shirt, looking a bit tense but focused.6.the character in the uploaded image,turn the head toward the upper right.The interface shows classic video call UI elements: bottom bar with mute, stop video, share screen buttons, and a simple chat panel on the right side. Overall style: realistic, high resolution, soft lighting, modern tech atmosphere."
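This task poses several challenges. The first is generating real people. With faces as familiar as Sam Altman's or Elon Musk's, even a slight deviation from their real appearance gives the game away immediately. Nano Banana Pro, however, captured each person's features well, with convincing detail, coming close to passing for the real thing.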
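The second challenge is blending across "dimensions." We uploaded an image of an anime character, and Nano Banana Pro did not crudely convert it into a photorealistic style. Instead, it kept the character's original 2D look, so the anime character sits in a realistic video-call scene in a way that feels both jarring and plausible.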
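Finally, we planted a small trap in the prompt: the anime character had to turn its head toward the upper right, so that Nano Banana Pro could not get away with simply pasting in a copy of the uploaded image. As the result shows, it not only performed the head turn correctly, proving it did not just paste the image, but also understood that the video-call view is mirrored: from the viewer's perspective, the character actually turns toward the upper left.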
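The head-turn trap relies on one piece of spatial reasoning: mirroring an image left to right flips the horizontal part of a direction and leaves the vertical part unchanged. Here is a minimal Python sketch of that rule, assuming screen coordinates where x grows to the right and y grows upward; the helper names are hypothetical and say nothing about how the model works internally.

```python
# Hypothetical helpers: how a left-right mirror changes a direction.
# Assumed screen coordinates: x grows to the right, y grows upward.

DIRECTION_NAMES = {
    (1, 1): "upper right",
    (-1, 1): "upper left",
    (1, -1): "lower right",
    (-1, -1): "lower left",
}


def mirror_horizontally(dx: int, dy: int) -> tuple[int, int]:
    """Flip a direction vector across the vertical axis (a left-right mirror)."""
    return (-dx, dy)


def describe(direction: tuple[int, int]) -> str:
    """Return a human-readable name for a diagonal direction."""
    return DIRECTION_NAMES[direction]


if __name__ == "__main__":
    requested = (1, 1)  # the prompt asks for "upper right"
    seen = mirror_horizontally(*requested)
    print(describe(requested), "->", describe(seen))
```

Running it prints `upper right -> upper left`. Mirroring twice returns the original direction, which is why the trap only works if the model tracks which frame of reference the prompt uses.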
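Looking at other details, Nano Banana Pro also placed the logo of the corresponding company behind every participant except Sam Altman, as if to say, "I know who I'm drawing."
The chat messages in the lower right confirm this: each participant is discussing a topic related to their own company, with no spelling errors.
That made us curious: how well does Nano Banana Pro actually understand text?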
Test 2: Menus that fall apart on close inspection
Prompts:
"modern western bistro menu,vertical A4 layout, clean grid design,warm beige background with subtle paper texture,all text in English only, no other languages,sections as bold headings: Signature Dishes, Starters, Mains, Sides, Drinks,elegant handwritten-style restaurant title at the top,readable body font for dish names and prices,neat list layout with enough white space,small food illustrations in the corners: steak, salad, bread, wine glass,minimalist icons, soft warm lighting,high resolution, 4k, printable, no watermark, no logo."
"Japanese izakaya menu,modern Japanese style, vertical A4 layout, clean grid,warm beige background, soft paper texture,all text in Japanese only, no English,sections as bold Japanese headings:おすすめ, 焼き物, 揚げ物, ご飯もの, 飲み物,elegant handwritten-style Japanese title at the top,readable Japanese body font,neatly aligned dish names and prices, plenty of white space,small illustrations in the corners: 串焼き, 枝豆, たこ唐揚げ, 日本酒グラス,minimalist icon style, cozy warm lighting,high resolution, 4k, printable, no watermark, no logo。"
"Russian home-style cafe menu, cozy and traditional,vertical A4 page, clean and simple grid layout,warm beige background with gentle paper texture,all text in Russian only, no English,sections as bold Russian headings:Фирменные блюда, Горячие блюда, Закуски, Гарниры, Напитки,elegant handwritten-style Russian title at the top,clear serif body font for dish names and prices,neatly organized lists with generous white space,small corner illustrations: bowl of borscht, dumplings, slice of rye bread, vodka glass,minimalist icons, soft warm lighting,high resolution, 4k, printable, no watermark, no logo."
"Chinese Sichuan restaurant menu, modern Sichuan style, vertical A4 layout, clean grid design, warm beige background with subtle rough paper texture, menu hanging on the interior wall of a cozy Sichuan restaurant, soft spotlight from above and natural shadows, only Simplified Chinese text, bold section headings: 招牌川菜, 热菜, 凉菜, 主食, 饮品, top title in elegant handwritten Chinese, readable Chinese body font, dish names + prices neatly listed, small corner illustrations: 辣椒、花椒、蒜瓣、红油小碟, minimalist icons, warm ambient restaurant lighting, slight vignette, high resolution, 4k, printable, no watermark, no logo。"
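At a glance, each of the four menus is recognizably in the intended language, but none of them holds up to close scrutiny.
Take the Chinese Sichuan restaurant menu. The title "大正宗川味小馆" (roughly "Great Authentic Sichuan Bistro") and section headings such as "招牌川菜" (signature Sichuan dishes), "凉菜" (cold dishes), and "主食" (staples) are rendered perfectly. But look closely at the individual dishes and the AI gives itself away: the characters "蒜泥" (garlic paste) are blurry, and the name of the ¥58 dish is illegible. Our guess is that Nano Banana Pro reproduces text supplied in the prompt well, but has much weaker control over text it has to invent on its own.
To test this hypothesis, we put the entire menu text, in Chinese, into the prompt.
Prompt: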
"Sichuan restaurant menu poster,vertical A4 layout hanging on a textured wall,warm spotlight from above, soft shadow under the menu,light beige paper with subtle fiber texture,modern Sichuan style, clean grid layout,small corner illustrations: chili peppers, Sichuan peppercorns, garlic cloves,handwritten-style Chinese title, clear body font,only Simplified Chinese text, no English,cozy indoor lighting, slight vignette, natural restaurant ambience,high resolution, 4k, printable, no watermark, no logo.Menu text (Chinese only):招牌川菜:沸腾水煮鱼(招牌) ¥128 歌乐山辣子鸡 ¥88 毛血旺(精品) ¥98 夫妻肺片 ¥78 口水鸡 ¥68 热菜:宫保鸡丁 ¥58 回锅肉 ¥62 麻婆豆腐 ¥42 鱼香肉丝 ¥48 蒜泥白肉 ¥52 凉菜:拍黄瓜 ¥22 凉拌木耳 ¥28 川北凉粉 ¥26 口水茄子 ¥32 皮蛋豆腐 ¥24 主食:四川担担面 ¥28 钟水饺 ¥26 赖汤圆 ¥22 红油抄手 ¥24 米饭 ¥5 饮品:酸梅汤 ¥18 王老吉 ¥12 青岛啤酒 ¥15 热茶(壶) ¥38"
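The idea behind this prompt, spelling out every piece of text instead of letting the model invent it, is easy to automate when testing several menus. The sketch below is a hypothetical Python helper, not how we actually wrote the prompt above: it joins a shortened style description with a section-by-section menu (a subset of the dishes above) in the same "Menu text (Chinese only):" format.

```python
# Hypothetical helper: build an image prompt that spells out every line of menu text,
# so the model only has to copy text rather than invent it.

STYLE = (
    "Sichuan restaurant menu poster, vertical A4 layout hanging on a textured wall, "
    "only Simplified Chinese text, no English, high resolution, 4k, printable, "
    "no watermark, no logo."
)

# A subset of the menu from the prompt above: section heading -> (dish, price in yuan).
MENU: dict[str, list[tuple[str, int]]] = {
    "招牌川菜": [("沸腾水煮鱼(招牌)", 128), ("歌乐山辣子鸡", 88)],
    "热菜": [("宫保鸡丁", 58), ("蒜泥白肉", 52)],
    "饮品": [("酸梅汤", 18), ("热茶(壶)", 38)],
}


def build_menu_prompt(style: str, menu: dict[str, list[tuple[str, int]]]) -> str:
    """Append the exact menu text to the style description, one section at a time."""
    sections = []
    for heading, items in menu.items():  # dicts keep insertion order in Python 3.7+
        dishes = " ".join(f"{name} ¥{price}" for name, price in items)
        sections.append(f"{heading}:{dishes}")
    return f"{style} Menu text (Chinese only):" + " ".join(sections)
```

Calling `print(build_menu_prompt(STYLE, MENU))` prints a single-line prompt shaped like the one above. Keeping the menu as data makes it easy to swap in other dishes or languages while leaving the style description fixed.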
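Test 3: Playing TCM doctor and fortune teller. How much Chinese culture has Google absorbed?
Prompt: "给下面的手看看手相" (Read the palm of the hand below.)
As you can see, Nano Banana Pro, much like a fortune teller, clearly drew the life line, heart line, and head line (the "wisdom line" in Chinese palmistry) on the hand. It has not fully mastered the subject, though: it swapped the positions of the head line and the heart line.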
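Next, we tried acupressure points on the sole of the foot, a specialty of old-school traditional Chinese medicine practitioners.
Prompt: "我想要对肾好,该按哪里" (I want to do something good for my kidneys. Where should I press?)
Nano Banana Pro not only knew that the point to press for the kidneys is Yongquan (KI1, "Gushing Spring"), but also marked its location correctly.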
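Test 4: Just snap the problem you can't solve
The original Nano Banana already showed some potential for solving problems from photos, but its accuracy was low. Now let's see what Nano Banana Pro can do.
We found two problems online, one algebra and one geometry.
Prompt: "这题答案是什么?" (What is the answer to this problem?)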
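Since the author's own math skills are long gone, we asked GPT-5 to judge whether Nano Banana Pro's answers were correct.
For the first problem, the algebra one, GPT-5 replied: under the "default assumptions of middle-school math, namely that a, b, c are real numbers and a, b ≥ 0," the answer is correct. The only possible nitpick is that AM-GM requires a, b ≥ 0, which the problem does not state; in a seventh-grade problem this is usually taken for granted, so in this teaching context the solution holds.
For the second, more complex geometry problem, GPT-5 worked through it and arrived at the same answer as Nano Banana Pro.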
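GPT-5's caveat is easy to see from the usual one-line proof of the two-variable AM-GM inequality. The proof takes square roots of a and b, which only makes sense over the real numbers when a, b ≥ 0:

```latex
\begin{aligned}
(\sqrt{a}-\sqrt{b})^2 \ge 0
  &\;\Longrightarrow\; a - 2\sqrt{ab} + b \ge 0 \\
  &\;\Longrightarrow\; \frac{a+b}{2} \ge \sqrt{ab}, \qquad a, b \ge 0,
\end{aligned}
```

with equality exactly when a = b. Without the assumption the inequality can fail: for a = b = -1, the left side is -1 while the square root of ab is 1.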
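After these rounds, it is hard to file Nano Banana Pro simply under "drawing tool." At the pixel level, it reliably reproduces facial features, menu layouts, and interface details. At the semantic level, it does work that goes beyond a graphic designer's: it knows which CEO runs which company, and it can tell menu text that must be copied exactly from content it is free to improvise. On tasks that require structural understanding, such as reading palms, locating acupoints, and solving geometry problems, it does not just doodle. It first works out the logic, such as "where does the line start and at roughly what angle" or "which side is the altitude dropped onto," and only then draws.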
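It is certainly not perfect: it swaps the head line, and odd words crop up in the Russian menu. But you can clearly sense that it understands prompts and images through a "reasoning + generation" process, rather than mechanically mapping a vocabulary onto textures. For a model built mainly for image generation, this quietly pushes its capabilities toward a "world model." It knows not only "what to draw" but also builds a rough internal picture of the world: who belongs together in the same meeting room, what kind of paper a menu should be printed on, and roughly how mechanics and geometry work.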
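That is exactly why it is both exciting and a little unsettling. Once an image model starts to have a unified understanding of scenes, relationships between people, and physical and geometric structure, it is not far from "first understand the world, then draw it." Next, when you ask it to "draw the solution to a problem I can't solve," it may well solve the problem inside its own world model first, and then show you the reasoning as an image.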
This article was published by 主机测评网 (vpshk.cn) on 2026-01-26. If you have any questions, please contact us.
Article link: https://vpshk.cn/20260120757.html