RandBox

sidebar_position: 5

Random Boxes Are Open-world Object Detectors

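The core idea of RandBox is to use random proposal boxes to remove the confounding effect that known objects have on training. Concretely, in each training iteration RandBox samples 500 random boxes from 4-dimensional Gaussian noise to serve as proposals. After clamping and rescaling, the four values of each box lie in [0, 1] and represent the x and y coordinates of the box center and the box width and height. For these random boxes, RoI (region of interest) features are cropped from the backbone feature map and fed to the detection head to obtain the classification loss $L_{cls}$ and the regression loss $L_{reg}$.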
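
The sampling step can be sketched in plain Python as follows. This is a minimal illustration, not the repository's code: the helper name `sample_random_boxes` and the value `scale=2.0` are assumptions, and the clamp-and-rescale step mirrors the `model_predictions` excerpt shown in the code analysis.

```python
import random


def sample_random_boxes(num_boxes=500, scale=2.0, seed=0):
    """Sample boxes as (cx, cy, w, h) with every value normalized to [0, 1].

    `scale` is a hypothetical clamping bound chosen for illustration.
    """
    rng = random.Random(seed)
    boxes = []
    for _ in range(num_boxes):
        # Draw one 4-D Gaussian noise vector.
        noise = [rng.gauss(0.0, 1.0) for _ in range(4)]
        # Clamp each value to [-scale, scale].
        clamped = [max(-scale, min(scale, v)) for v in noise]
        # Map [-scale, scale] linearly onto [0, 1].
        boxes.append([(v / scale + 1.0) / 2.0 for v in clamped])
    return boxes


boxes = sample_random_boxes()
```

Because the proposals come from noise rather than from a network trained on known objects, they do not depend on the training labels.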

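To encourage exploration of unknown objects, RandBox proposes a new way to compute the matching score. Conventional Faster R-CNN-based methods use the objectness predicted by the RPN (region proposal network) as the matching score, whereas DETR-based methods use the average activation of the RoI features. Both have limitations: the RPN tends to give high scores to proposals that match known objects, and the average activation of RoI features is not reliable enough.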

RandBox computes the matching score as:

$$
s(\hat{y}) = \sum_{i=1}^{|K|+1} \frac{1}{1 + \exp(-\hat{y}_i)}
$$

[Figure: overview of RandBox]

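Here, $K$ is the set of known classes and $\hat{y}_i$ is the logit for the proposal belonging to class $i$. The score essentially measures how likely $\hat{y}$ is to correspond to foreground. To select Unknown-FG (foreground that was not matched), the top $N$ proposals by matching score are taken, excluding those already in Known-FG. These proposals are pseudo-labeled as "unknown" and used to train the Unknown-FG logit.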
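
The score and the Unknown-FG selection can be sketched as below. This is a toy illustration under stated assumptions: the function names `matching_score` and `select_unknown_fg` are hypothetical, the logits are made up, and the last logit of each row is taken to be the "unknown" class (inferred from the $|K|+1$ upper limit of the sum).

```python
import math


def matching_score(logits):
    """Sum of sigmoids over the |K|+1 class logits of one proposal."""
    return sum(1.0 / (1.0 + math.exp(-y)) for y in logits)


def select_unknown_fg(all_logits, known_fg_indices, top_n):
    """Indices of the top_n highest-scoring proposals that are not Known-FG."""
    known = set(known_fg_indices)
    candidates = [i for i in range(len(all_logits)) if i not in known]
    candidates.sort(key=lambda i: matching_score(all_logits[i]), reverse=True)
    return candidates[:top_n]


# Toy data: 4 proposals, |K| = 2 known classes plus one "unknown" logit.
logits = [
    [4.0, -3.0, -2.0],   # already matched to a ground-truth box (Known-FG)
    [-5.0, -5.0, -4.0],  # low on every class: background-like
    [0.5, 0.2, 1.5],     # unmatched but foreground-like
    [-1.0, 0.0, -2.0],
]
unknown_fg = select_unknown_fg(logits, known_fg_indices=[0], top_n=1)
```

Here proposal 2 is selected: it has the highest score among the proposals outside Known-FG, so it would be pseudo-labeled as "unknown".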

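The paper uses a Structural Causal Model (SCM) to analyze the causal relations among the key components of OWOD: the training data $D$, the region proposals $R$, the RoI features $X$, and the labels $Y$. It points out that in existing methods the region proposal network (RPN) or Transformer decoder is trained on the training data of known objects, so the generated proposals $R$ are inevitably biased toward known objects, which causes the confounding effect. RandBox introduces random proposals $R$ as an instrumental variable, removing the influence of $D$ on $R$ and thereby cutting the backdoor path $R \leftarrow D \rightarrow Y$, which simulates a randomized controlled experiment. In this setting, learning to predict $Y$ from $R$ captures the causal relation without the confounding effect.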

[Figure: structural causal model (SCM)]

Experimental results:

[Figure: experimental results]

Code Analysis

forward generates a set of random bboxes (at inference time, through ddim_sample):

core/detector.py
    def forward(self, batched_inputs, do_postprocess=True):
        images, images_whwh = self.preprocess_image(batched_inputs)
        if isinstance(images, (list, torch.Tensor)):
            images = nested_tensor_from_tensor_list(images)

        # Feature Extraction.
        src = self.backbone(images.tensor)
        features = list()
        for f in self.in_features:
            feature = src[f]
            features.append(feature)

        # Prepare Proposals.
        if not self.training:
            results = self.ddim_sample(batched_inputs, features, images_whwh, images, do_postprocess=do_postprocess)
            return results


    # Proposal bboxes are generated inside self.ddim_sample
    def ddim_sample(self, batched_inputs, backbone_feats, images_whwh, images, clip_denoised=True, do_postprocess=True):
        batch = images_whwh.shape[0]
        shape = (batch, self.num_proposals, 4)
        total_timesteps, sampling_timesteps, eta, objective = self.num_timesteps, self.sampling_timesteps, self.ddim_sampling_eta, self.objective

        # [-1, 0, 1, 2, ..., T-1] when sampling_timesteps == total_timesteps
        times = torch.linspace(-1, total_timesteps - 1, steps=sampling_timesteps + 1)
        times = list(reversed(times.int().tolist()))
        time_pairs = list(zip(times[:-1], times[1:]))  # [(T-1, T-2), (T-2, T-3), ..., (1, 0), (0, -1)]

        img = torch.randn(shape, device=self.device)  # sample self.num_proposals random bboxes from Gaussian noise

        x_start = None
        if self.sampling_method == 'Random':
            for time, time_next in time_pairs:
                time_cond = torch.full((batch,), time, device=self.device, dtype=torch.long)
                self_cond = x_start if self.self_condition else None
                # Run the detection head on the current random boxes
                preds, class_cat, objectness_cat, coord_cat = self.model_predictions(backbone_feats, images_whwh, img, time_cond, self_cond, clip_x_start=clip_denoised)
                pred_noise, x_start = preds.pred_noise, preds.pred_x_start
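
As a worked example of the `time_pairs` schedule built in `ddim_sample`, the sketch below reproduces the same arithmetic in plain Python. The helper `ddim_time_pairs` is a hypothetical stand-in for the `torch.linspace` call and the timestep values are chosen only for illustration.

```python
def ddim_time_pairs(total_timesteps, sampling_timesteps):
    """Pure-Python stand-in for the linspace-based schedule (needs sampling_timesteps >= 1)."""
    steps = sampling_timesteps + 1
    start, stop = -1.0, float(total_timesteps - 1)
    # Evenly spaced points from -1 to T-1, truncated to integers.
    times = [int(start + i * (stop - start) / (steps - 1)) for i in range(steps)]
    times = list(reversed(times))
    # Consecutive (current, next) timestep pairs, ending at -1.
    return list(zip(times[:-1], times[1:]))


pairs_full = ddim_time_pairs(total_timesteps=4, sampling_timesteps=4)
pairs_one = ddim_time_pairs(total_timesteps=1000, sampling_timesteps=1)
```

With four steps over four timesteps, the pairs are (3, 2), (2, 1), (1, 0), (0, -1). With a single sampling step, the loop body runs once with the pair (999, -1).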

model_predictions turns the random proposals into image-space boxes and passes them to the detection head:

core/detector.py
    # x holds the generated proposals
    def model_predictions(self, backbone_feats, images_whwh, x, t, x_self_cond=None, clip_x_start=False, sample_i=0):
        if self.sampling_method == 'Random':
            x_boxes = torch.clamp(x, min=-1 * self.scale, max=self.scale)
            x_boxes = ((x_boxes / self.scale) + 1) / 2
        else:
            x_boxes = self.x_dic.to(x.device)[self.num_proposals * sample_i:self.num_proposals * (sample_i + 1), :]
            x_boxes = ((x_boxes / self.scale) + 1) / 2

        x_boxes = box_cxcywh_to_xyxy(x_boxes)
        x_boxes = x_boxes * images_whwh[:, None, :]
        # Feed the boxes into the RCNNHead
        outputs_class, outputs_objectness, outputs_coord = self.head(backbone_feats, x_boxes, t, None)
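
The last two box transforms above can be illustrated for a single box as follows. This is a sketch under assumptions: `box_cxcywh_to_xyxy` is re-implemented here for one box using the usual center-to-corner conversion, `to_pixel_xyxy` is a hypothetical helper, and `images_whwh` is assumed to hold (width, height, width, height) for each image.

```python
def box_cxcywh_to_xyxy(box):
    """Convert (cx, cy, w, h) to corner format (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h]


def to_pixel_xyxy(box_cxcywh, image_w, image_h):
    """Scale a normalized box to pixels, i.e. multiply by (w, h, w, h)."""
    x1, y1, x2, y2 = box_cxcywh_to_xyxy(box_cxcywh)
    return [x1 * image_w, y1 * image_h, x2 * image_w, y2 * image_h]


# A normalized box centered in a 640x480 image, 20% wide and 40% tall.
pixel_box = to_pixel_xyxy([0.5, 0.5, 0.2, 0.4], image_w=640, image_h=480)
```

For this input the corners come out at about (256, 144) and (384, 336) in pixels, which is the form the detection head receives for RoI cropping.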
