Skip to main content

目前研究

Ego4d与EgoExo4d数据集准备

Ego4d与EgoExo4d的数据集是申请制,申请得到的license有access id与access key。这里根据官网信息进行下载

pip install awscli
aws configure
egoexo -o /data0/hh_data/egoexo4d/ --parts downscaled_takes/448

训练一个小型多模态模型(pretrain + sft)

  • 视觉编码器:Siglip2
  • 文本LLM:qwen2.5-0.5b

流程:

  1. 使用LLaVA-CC3M-Pretrain-595K图片集与中文翻译的文本数据集chat-translated.json进行预训练。chat-translated.json例子如下:
    [{
    "id": "GCC_train_002582585",
    "image": "GCC_train_002582585.jpg",
    "conversations": [
    {
    "from": "human",
    "value": "提供给定图像的简要描述。\n<image>"
    },
    {
    "from": "gpt",
    "value": "橄榄油是自由使用的健康成分。"
    }
    ]
    },
    {
    "id": "GCC_train_002429825",
    "image": "GCC_train_002429825.jpg",
    "conversations": [
    {
    "from": "human",
    "value": "<image>\n写一篇简短但有益的图片摘要."
    },
    {
    "from": "gpt",
    "value": "随机旋转的薄金符号的3D向量豪华字母。"
    }
    ]
    },
    {
    "id": "GCC_train_000190697",
    "image": "GCC_train_000190697.jpg",
    "conversations": [
    {
    "from": "human",
    "value": "分享提供图像的简要解释。\n<image>"
    },
    {
    "from": "gpt",
    "value": "气候变化——岩石的物理和化学分解。"
    }
    ]
    }]
  2. 111
    [
    {
    "id": "000000033471",
    "image": "000000033471.jpg",
    "conversations": [
    {
    "from": "human",
    "value": "<image>\n图中巴士的颜色是 什么?"
    },
    {
    "from": "gpt",
    "value": "图中的公共汽车是白色和红色。"
    },
    {
    "from": "human",
    "value": "公共汽车后部可看到什么特征?"
    },
    {
    "from": "gpt",
    "value": "公共汽车的后部设有广告。"
    },
    {
    "from": "human",
    "value": "公共汽车在街上行走 吗?还是停靠在一边 呢?"
    },
    {
    "from": "gpt",
    "value": "公共汽车正在沿着拥挤着人和其他车辆的街道行驶。"
    }
    ]
    },
    ]

element 0 of tensors does not require grad and does not have a grad_fn

参考: 网址

add model.enable_input_require_grads() before get_peft_model or

if peft_config is not None:
model.enable_input_require_grads()
model = get_peft_model(model, peft_config)

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

参考: 网址

错误复现:导入deepspeed后执行:

import deepspeed
deepspeed.ops.op_builder.CPUAdamBuilder().load()

该报错较为复杂,经过查阅和实践后,认为和以下四方面的因素有关:

  1. torch和deepspeed的版本冲突(参见:相关博客1);
  2. deepspeed版本问题,通过DS_BUILD_CPU_ADAM=1 pip install deepspeed重新安装后解决(参见:相关博客2);
  3. 环境变量没有定义,将cuda, library的环境变量导入(参见:相关链接3);
    export CUDA_HOME="/usr/local/cuda-12.5"
    export LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LIBRARY_PATH"
    export LD_LIBRARY_PATH="/usr/local/cuda-12.5/lib64:$LD_LIBRARY_PATH"
  4. gcc版本太低,需将conda环境中的gcc版本进行升级:在很多新的Linux设备上,gcc初始化版本为4.8.5,torch则要求最小为4.9,因此需要对gcc进行升级;(在conda环境中可单独升级gcc版本,而不影响系统环境,适用于不具备管理员sudo权限的设备)

AttributeError: 'Linear' object has no attribute 'ds_grads_remaining'

(vtg-r1) hh@amax:~/workspace/vtg-r1$ uv pip list | grep deepspeed
Using Python 3.11.13 environment at: /home/hh/.conda/envs/vtg-r1
deepspeed 0.18.0

deepspeed版本问题:

降到0.16.4

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

model.enable_input_require_grads() # 加上这个
# ...
model = get_peft_model()