Before we begin

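What is Arduino for?
- As we understand it, it hides complexity and makes it easier to hook up different devices.
What is the relationship between the R4 board and the servo?
- The servo is just an ordinary peripheral. Devices come and go; the board stays.
Where are the bottlenecks of AI-assisted programming?
- When the AI has never seen the error message
- When it is unfamiliar with the programming language
- When we ourselves do not know what we want to build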

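An unexpected takeaway this time: last month I put very few of my own ideas into the programming and let the AI lead me entirely. The work was efficient and the results were good. Over the last few days my past thinking and experience gradually resurfaced, I wanted to get involved much more, and the results got worse.
This confirms an idea I had a while ago about how to get along with AI: drop the arrogance and hand over control. At a certain stage you badly want to step in, but that is futile; a better way to participate has to be found.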

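Project background and origin

Among the AI hardware of the first half of the year, unlike "gadgets" such as the AI Pin, a little desktop pet appeared: LOOI. Powered by AI, it can glide freely around the desk, capture motion, and recognize faces. It occasionally throws a little tantrum and occasionally sets the mood for a couple. It delivers on emotional value, and it made quite a splash on the technical side.

Far across the ocean, Garman, a robotics expert who calls himself a liberal-arts guy, was quick to spot its value as a companion, an emotional anchor in fast-paced modern life. He has stressed this point in countless talks, which is rare among engineers.

However, at a price of 1500+, many people's inner monologue is probably: "Me? I have no emotional needs. None!"

Garman taught himself Python and Arduino programming and, after countless attempts, generously shared his results, driving the price way down: LOOI freedom for under 200. The barrier to entry is not low, though. After one of his sharing sessions, my home gained an R4 board that gathers dust.

For this AIPO university event, drawing on experience from the AI programming study group, I wrote an Arduino installation tutorial (because at the time I thought that was the main development tool) and had the honor of being Garman's assistant. Right under his nose, I watched myself plug the wires in wrong. I have no idea how the students in the livestream fared.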

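After coming back, I kept wondering:

Does Arduino have to be programmed in assembly?
What role does Python play in all this?
Why doesn't an Android phone work, only iOS?
What does "flashing" mean?
As a veteran of productivity tools, I also hope to get rid of the chore of copying down the serial port name by hand.

With these questions I set out on this journey. The result so far looks roughly like this: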




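Two problems are still unsolved:

How to link up with the phone and send messages to it
Camera following

If you are interested, leave a comment and let's tinker and improve it together.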

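Source code analysis: how the devices interact

Before this project I had no Arduino experience at all and knew nothing about hardware.

Analyzing the existing code logic with Claude

This part was done mostly with Claude. Below is the core logic of Garman's project.

Honestly, the last time I saw this diagram was half a year ago. Back then I thought it could not be that hard: only 3 files, I could brute-force my way through. I chewed on it for half a year with no progress at all. Only this time, with Claude's help, did things start to make sense.

If you are a complete beginner on the hardware side, see Garman's study-group videos: search for Garman in the waytoagi knowledge base.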

Overall logic

  
sequenceDiagram
participant User as User
participant Chat as chat.py (MacBook)
participant API as OpenAI API
participant Face as face.py (iPhone)
participant Head as head.py (MacBook)
participant Servo as Servo motor

User->>Chat: Enter message
Chat->>API: Send prompt
API-->>Chat: Return response
Chat->>Chat: Parse JSON response
Chat->>Chat: Generate speech
Chat->>Face: Send kaomoji and audio
Face->>Face: Display kaomoji
Face->>Face: Play audio
Chat->>Head: Send servo position
Head->>Head: Process servo position
Head->>Servo: Move servo
Head->>Head: Update image display

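Building on the original diagram, I redrew it as a sequence diagram; the arrows above show roughly how information flows. Each session goes roughly like this:

Start two programs on the computer, chat and head, and start face on the phone.
Type a message into chat. It talks to the AI, and the reply is turned into a piece of audio and a kaomoji.
chat then sends them to the phone. face on the phone receives them, plays the audio, and shows the kaomoji.

That is the end of one round. Now for head: it continuously captures the camera feed, analyzes the movement of the person in view, and adjusts the servo accordingly. As the diagram shows, it also accepts servo positions sent by chat.

Another version, as a flowchart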

  
graph LR
A[User input] --> B[chat.py on PC]
B --> C[OpenAI API]
C --> B
B --> D[face.py on iPhone]
B --> E[head.py on PC]
E --> F[Servo motor]
D --> G[Display kaomoji]
D --> H[Play audio]

style A fill:#f0f8ff,stroke:#b0e0e6,stroke-width:2px
style B fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style C fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style D fill:#fff0f5,stroke:#ffd9e6,stroke-width:2px
style E fill:#f0fff0,stroke:#98fb98,stroke-width:2px
style F fill:#fffaf0,stroke:#ffdab9,stroke-width:2px
style G fill:#fff5ee,stroke:#ffa07a,stroke-width:2px
style H fill:#f0e6ff,stroke:#d9b3ff,stroke-width:2px

Internal logic of the 3 modules

Head (runs on the computer)

  
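graph TD
A[Start] --> B[Initialize camera and Arduino]
B --> C[Start socket listener thread]
C --> D[Enter main loop]
D --> E{Data received in the last 5 seconds?}
E -->|Yes| F[Use the received servo position]
E -->|No| G[Detect face and compute servo position]
F --> H[Update servo position]
G --> H
H --> I[Draw status on the image]
I --> J[Show image]
J --> D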

style A fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style B fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style C fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style D fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style E fill:#fff0e6,stroke:#ffd9b3,stroke-width:2px
style F fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style G fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style H fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style I fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style J fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
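
The decision in the head flowchart (use the position sent by chat if one arrived within the last 5 seconds, otherwise fall back to face tracking) can be sketched in a few lines. This is only an illustration of that branch, not Garman's actual head.py: `HeadState`, `on_command`, `choose_target`, and the `(x, y)` position layout are hypothetical names chosen here, and timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Position = Tuple[int, int]  # (x, y) servo angles in degrees; hypothetical layout

OVERRIDE_WINDOW_S = 5.0  # from the flowchart: "data received in the last 5 seconds?"


@dataclass
class HeadState:
    """Remembers the most recent servo position received over the socket."""
    last_command: Optional[Position] = None
    last_command_time: float = float("-inf")  # timestamp of that message

    def on_command(self, position: Position, now: float) -> None:
        """Called by the socket listener thread when chat sends a position."""
        self.last_command = position
        self.last_command_time = now

    def choose_target(self, face_position: Optional[Position], now: float) -> Optional[Position]:
        """Pick the position the servo should move to on this loop iteration."""
        recent = (now - self.last_command_time) <= OVERRIDE_WINDOW_S
        if recent and self.last_command is not None:
            return self.last_command   # a fresh command from chat wins
        return face_position           # otherwise follow the detected face (may be None)
```

In a real loop, `now` would typically come from `time.monotonic()`, and a `None` result would mean "leave the servo where it is".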

Face (runs on the phone)

  
graph TD
A[Start] --> B[Initialize socket]
B --> C[Start receive_data thread]
C --> D[Initialize PyGame scene]
D --> E{New data received?}
E -->|Yes| F[Update text_to_display]
F --> G[Save and play audio]
E -->|No| H[Draw current text]
G --> H
H --> I{Touch event?}
I -->|Yes| J[Clean up audio files]
J --> K[Close scene]
I -->|No| E
K --> L[End]

style A fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style B fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style C fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style D fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style E fill:#fff0e6,stroke:#ffd9b3,stroke-width:2px
style F fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style G fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style H fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style I fill:#fff0e6,stroke:#ffd9b3,stroke-width:2px
style J fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style K fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style L fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
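
The face flowchart has a `receive_data` thread feeding a drawing loop that asks "new data received?" on every pass. One common way to hand data from such a thread to a loop is a thread-safe queue. The sketch below shows that pattern under stated assumptions: the socket is replaced by a plain list, the `(kaomoji, audio)` message layout and the helper `drain_latest` are hypothetical, and keeping only the newest message is a design choice made here, not something the original code is known to do.

```python
import queue
import threading
from typing import Iterable, Optional, Tuple

Message = Tuple[str, bytes]  # (kaomoji text, audio bytes); hypothetical layout


def receive_data(source: Iterable[Message], inbox: "queue.Queue") -> None:
    """Background thread body: push every incoming message into the inbox."""
    for message in source:  # stand-in for a blocking socket read loop
        inbox.put(message)


def drain_latest(inbox: "queue.Queue") -> Optional[Message]:
    """Return the newest queued message, discarding older ones; None if empty."""
    latest = None
    while True:
        try:
            latest = inbox.get_nowait()
        except queue.Empty:
            return latest


if __name__ == "__main__":
    inbox: "queue.Queue" = queue.Queue()
    fake_socket = [("(^_^)", b"audio-1"), ("(>_<)", b"audio-2")]
    worker = threading.Thread(target=receive_data, args=(fake_socket, inbox), daemon=True)
    worker.start()
    worker.join()

    message = drain_latest(inbox)       # what one pass of the drawing loop would do
    if message is not None:
        text_to_display, audio = message
        print(text_to_display, len(audio))
```

Because the drawing loop never blocks on the queue, it can keep redrawing the current text and checking for touch events while the thread waits for the next message.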

Chat (runs on the computer)

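graph TD
A[Start] --> B[Initialize OpenAI client]
B --> C[Enter main loop]
C --> D{User typed 'quit'?}
D -->|Yes| E[End]
D -->|No| F[Send prompt to GPT-3.5-turbo]
F --> G[Parse JSON response]
G --> H[Generate speech from response]
H --> I[Send kaomoji and audio to iPhone]
I --> J[Send servo position to head.py]
J --> C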

style A fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style B fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style C fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style D fill:#fff0e6,stroke:#ffd9b3,stroke-width:2px
style E fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style F fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style G fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style H fill:#e6fff2,stroke:#b3ffd9,stroke-width:2px
style I fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
style J fill:#e6f3ff,stroke:#b3d9ff,stroke-width:2px
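
The "parse JSON response" step in the chat flowchart is where a model reply becomes three things: text to speak, a kaomoji for the phone, and a servo position for head. The sketch below illustrates that step defensively. The field names `text`, `kaomoji`, `servo_x`, and `servo_y` are assumptions made for this example (the original prompt and schema are not shown in this post), as is the choice to fall back to a neutral pose when the model returns something that is not valid JSON.

```python
import json
from dataclasses import dataclass


@dataclass
class RobotReply:
    text: str      # what gets turned into speech
    kaomoji: str   # what face shows on the phone
    servo_x: int   # servo angles in degrees, 0..180
    servo_y: int


def _angle(value, default: int = 90) -> int:
    """Convert a model-supplied value to an int angle clamped to 0..180."""
    try:
        angle = int(value)
    except (TypeError, ValueError):
        return default
    return max(0, min(180, angle))


def parse_reply(raw: str) -> RobotReply:
    """Parse the model's reply; tolerate missing fields and non-JSON output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return RobotReply(
        text=str(data.get("text", raw)),          # fall back to the raw reply
        kaomoji=str(data.get("kaomoji", "(._.)")),
        servo_x=_angle(data.get("servo_x")),
        servo_y=_angle(data.get("servo_y")),
    )


if __name__ == "__main__":
    print(parse_reply('{"text": "hi", "kaomoji": "(^_^)", "servo_x": 200, "servo_y": 45}'))
    print(parse_reply("plain text, not JSON"))
```

Clamping here means a hallucinated angle such as 200 never reaches the servo, which is cheaper than debugging a stalled motor later.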

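That finally untangles the logic of the robot's software side. The principle of AI programming is to avoid writing code whenever possible. Unfortunately, although everything worked during the livestream, nothing worked once I got home. So I wondered: could this be done without Python? Could Node.js do it?

What I want it to become

First, a UI, so the command line is needed as little as possible
Second, all configuration pulled out into one place, so it is easy to switch later
Ideally, some intermediate state made visible, to help locate problems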
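
The second wish, pulling configuration out of the code, could look something like the sketch below. It is an assumption-laden illustration rather than the final design: the key names (`serial_port`, `baud_rate`, `phone_host`, `phone_port`, `model`) are invented for this example, and only the 9600 baud rate and the GPT-3.5-turbo model name come from elsewhere in this post. Machine-specific values are left empty instead of guessed.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class RobotConfig:
    serial_port: str = ""         # e.g. the port name shown by the Arduino IDE
    baud_rate: int = 9600         # matches the Node script in this post
    phone_host: str = ""          # address of the phone running face
    phone_port: int = 0
    model: str = "gpt-3.5-turbo"  # model named in the chat flowchart


def load_config(text: str) -> RobotConfig:
    """Build a config from JSON text, rejecting keys that are probably typos."""
    raw = json.loads(text)
    known = {f.name for f in fields(RobotConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return RobotConfig(**raw)


if __name__ == "__main__":
    config = load_config('{"serial_port": "/dev/ttyUSB0", "phone_port": 5000}')
    print(config)
```

Failing loudly on an unknown key supports the third wish as well: a misspelled setting shows up immediately instead of silently falling back to a default.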

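Starting the rework

After more than 50 chat sessions, I more or less figured out how it works.

How do the servo, the board, and Arduino relate?

To make troubleshooting easier (is the board connected to the computer, can it actually drive the servo), I had the AI write a small test program. That experience is also how I came to understand the logic behind it.

Arduino code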

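#include <Firmata.h>
#include <Servo.h>

Servo myservo;            // servo object
int servoPin = 9;         // pin the servo is connected to
int minAngle = 0;         // minimum angle
int maxAngle = 180;       // maximum angle
int delayTime = 1000;     // delay after each move (milliseconds)
boolean isRunning = true; // whether the servo sweep is running

void setup() {
  Firmata.setFirmwareVersion(FIRMATA_FIRMWARE_MAJOR_VERSION, FIRMATA_FIRMWARE_MINOR_VERSION);
  Firmata.attach(ANALOG_MESSAGE, analogWriteCallback);
  // Firmata.begin() opens Serial itself, so one call sets the baud rate
  // for both Firmata and the text commands (9600, as in the Node script).
  Firmata.begin(9600);
  myservo.attach(servoPin); // attach the servo object to its pin
  Serial.println("Send 's' to stop the servo, 'r' to restart it");
}

void loop() {
  // Handle the text commands first; every other byte is passed to Firmata.
  // Without the peek, Firmata would consume 's' and 'r' before they are checked.
  while (Serial.available() > 0) {
    int next = Serial.peek();
    if (next == 's') {
      Serial.read();
      isRunning = false;
      Serial.println("Servo stopped");
    } else if (next == 'r') {
      Serial.read();
      isRunning = true;
      Serial.println("Servo restarted");
    } else {
      Firmata.processInput(); // reads and parses one byte
    }
  }

  if (isRunning) {
    myservo.write(minAngle); // move to the minimum angle
    delay(delayTime);
    myservo.write(maxAngle); // move to the maximum angle
    delay(delayTime);
  }
}

// Firmata analog message: move the servo if the message targets its pin.
void analogWriteCallback(byte pin, int value) {
  if (pin == servoPin) {
    myservo.write(value);
  }
}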

It needs to be opened in the Arduino IDE and uploaded to the board.

Node code (arduino_servo_control.js)

const { SerialPort } = require('serialport');
const { ReadlineParser } = require('@serialport/parser-readline');

// Replace with your own serial port name (for example COM3 or /dev/ttyUSB0)
const port = new SerialPort({ path: '/dev/cu.usbmodemF0F5BD543E182', baudRate: 9600 });
const parser = port.pipe(new ReadlineParser({ delimiter: '\n' }));

port.on('open', () => {
  console.log('Serial port opened');

  // Send a command asking the servos to move to a position (e.g. X: 45 degrees, Y: 135 degrees).
  // The sketch on the board has to parse this "x,y" line itself for it to take effect.
  const position = '45,135\n';
  port.write(position, (err) => {
    if (err) {
      return console.log('Error on write: ', err.message);
    }
    console.log('Command sent: move to position 45,135');
  });
});

port.on('error', (err) => {
  console.log('Error: ', err.message);
});

// Print every line the Arduino sends back, which confirms the link works.
parser.on('data', (data) => {
  console.log('Data received from Arduino: ', data);
});
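
The Node script sends a position as a plain text line such as `45,135\n`. Whichever side receives such a line has to validate it before moving anything. The sketch below shows that parsing step in Python, purely as an illustration of the wire format: `format_position` and `parse_position` are hypothetical helpers, and the 0 to 180 degree limits are taken from `minAngle` and `maxAngle` in the Arduino sketch.

```python
from typing import Optional, Tuple

MIN_ANGLE = 0    # same limits as minAngle / maxAngle in the Arduino sketch
MAX_ANGLE = 180


def clamp(angle: int) -> int:
    """Keep an angle inside the range the servo accepts."""
    return max(MIN_ANGLE, min(MAX_ANGLE, angle))


def format_position(x: int, y: int) -> str:
    """Encode a position as one newline-terminated "x,y" line."""
    return f"{clamp(x)},{clamp(y)}\n"


def parse_position(line: str) -> Optional[Tuple[int, int]]:
    """Decode an "x,y" line; return None for anything malformed."""
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    try:
        x, y = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    return clamp(x), clamp(y)


if __name__ == "__main__":
    print(repr(format_position(45, 135)))
    print(parse_position("45,135\n"))
    print(parse_position("s"))
```

Returning `None` for malformed input lets single-character commands such as `s` and `r` share the same serial line without being mistaken for positions.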