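YoloV8 improvement strategies: an original, first-published reproduction of Drone-Yolo, plus further improvements
Abstract
Drone-Yolo has been very successful on drone datasets, with a marked gain in mAP0.5: +13.4% on VisDrone2019-test and +17.40% on VisDrone2019-val. In this article I first reproduce Drone-Yolo and then add my own small-object detection improvements on top of it.
Official YoloV8 results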
YOLOv8l summary (fused): 268 layers, 43631280 parameters, 0 gradients, 165.0 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 29/29 [
all 230 1412 0.922 0.957 0.986 0.737
c17 230 131 0.973 0.992 0.995 0.825
c5 230 68 0.945 1 0.995 0.836
helicopter 230 43 0.96 0.907 0.951 0.607
c130 230 85 0.984 1 0.995 0.655
f16 230 57 0.955 0.965 0.985 0.669
b2 230 2 0.704 1 0.995 0.722
other 230 86 0.903 0.942 0.963 0.534
b52 230 70 0.96 0.971 0.978 0.831
kc10 230 62 0.999 0.984 0.99 0.847
command 230 40 0.97 1 0.995 0.811
f15 230 123 0.891 1 0.992 0.701
kc135 230 91 0.971 0.989 0.986 0.712
a10 230 27 1 0.555 0.899 0.456
b1 230 20 0.972 1 0.995 0.793
aew 230 25 0.945 1 0.99 0.784
f22 230 17 0.913 1 0.995 0.725
p3 230 105 0.99 1 0.995 0.801
p8 230 1 0.637 1 0.995 0.597
f35 230 32 0.939 0.938 0.978 0.574
f18 230 125 0.985 0.992 0.987 0.817
v22 230 41 0.983 1 0.995 0.69
su-27 230 31 0.925 1 0.995 0.859
il-38 230 27 0.972 1 0.995 0.811
tu-134 230 1 0.663 1 0.995 0.895
su-33 230 2 1 0.611 0.995 0.796
an-70 230 2 0.766 1 0.995 0.73
tu-22 230 98 0.984 1 0.995 0.831
Speed: 0.2ms preprocess, 3.8ms inference, 0.0ms loss, 0.8ms postprocess per image
BiC module
The BiC module takes three inputs and produces one output, as shown in the figure below:

Following the YoloV6 source code, I adapted the BiC module to YoloV8 so that the input and output channel counts are configurable. The code is as follows:
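import torch
import torch.nn as nn
# Assumed import path; inside ultralytics/nn/modules these names are already in scope.
from ultralytics.nn.modules.conv import Conv, ConvTranspose


class BiFusion(nn.Module):
    """BiFusion block in PAN: fuses three feature maps into one."""

    def __init__(self, in_channels1, in_channels2, in_channels3, out_channels):
        super().__init__()
        # 1x1 convs project each input to out_channels.
        self.CV1 = Conv(in_channels1, out_channels, 1, 1)
        self.CV2 = Conv(in_channels2, out_channels, 1, 1)
        self.CV3 = Conv(in_channels3, out_channels, 1, 1)
        # 1x1 conv that mixes the three concatenated branches.
        self.cv_out = Conv(out_channels * 3, out_channels, 1, 1)
        # Default ConvTranspose settings (kernel 2, stride 2) double the resolution.
        self.upsample = ConvTranspose(out_channels, out_channels)
        # 3x3 conv with stride 2 halves the resolution.
        self.downsample = Conv(out_channels, out_channels, 3, 2)

    def forward(self, x):
        # x holds three feature maps: x[0] is upsampled, x[1] is kept as is,
        # and x[2] is downsampled so that all three share one spatial size.
        x0 = self.upsample(self.CV1(x[0]))
        x1 = self.CV2(x[1])
        x2 = self.downsample(self.CV3(x[2]))
        return self.cv_out(torch.cat((x0, x1, x2), dim=1))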
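The three branches can only be concatenated if they end up with the same spatial size. The sketch below checks this with the standard output-size formulas for convolution and transposed convolution. It assumes the upsample branch uses kernel 2, stride 2, padding 0 and the downsample branch uses kernel 3, stride 2, padding 1; the helper names and the 20/40/80 feature-map sizes are illustrative, not taken from the original code.

```python
def conv_out(size, k, s, p):
    """Output length of a standard convolution along one spatial axis."""
    return (size + 2 * p - k) // s + 1


def conv_transpose_out(size, k, s, p):
    """Output length of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * s - 2 * p + k


def bifusion_sizes(low, mid, high):
    """Spatial sizes of the three BiFusion branches just before torch.cat.

    Assumed settings: upsample with kernel 2, stride 2, padding 0;
    downsample with kernel 3, stride 2, padding 1.
    """
    up = conv_transpose_out(low, k=2, s=2, p=0)   # x[0] branch: 2x upsample
    down = conv_out(high, k=3, s=2, p=1)          # x[2] branch: 2x downsample
    return up, mid, down


# Hypothetical feature pyramid for a 640x640 input (strides 32, 16 and 8).
print(bifusion_sizes(20, 40, 80))
```

If the three returned sizes differ, the concatenation in forward would fail, so this is a quick way to sanity-check which layers are wired into the module.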
Improvement 1
Test results
YOLOv8l summary (fused): 291 layers, 49370288 parameters, 0 gradients, 194.9 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 15/15 [00:13<00:00, 1.11it/s]
all 230 1412 0.932 0.966 0.986 0.735
c17 230 131 0.955 0.992 0.995 0.827
c5 230 68 0.954 1 0.993 0.829
helicopter 230 43 0.902 0.977 0.964 0.585
c130 230 85 0.978 0.941 0.991 0.669
f16 230 57 0.853 0.912 0.957 0.649
b2 230 2 0.768 1 0.995 0.697
other 230 86 0.954 0.93 0.972 0.557
b52 230 70 0.972 0.957 0.97 0.794
kc10 230 62 0.99 0.968 0.987 0.845
command 230 40 0.975 0.99 0.975 0.785
f15 230 123 0.948 0.976 0.993 0.706
kc135 230 91 0.965 0.989 0.979 0.672
a10 230 27 0.95 0.697 0.951 0.436
b1 230 20 0.974 0.95 0.988 0.677
aew 230 25 0.926 1 0.99 0.772
f22 230 17 0.925 1 0.995 0.721
p3 230 105 1 0.977 0.995 0.81
p8 230 1 0.755 1 0.995 0.697
f35 230 32 0.967 0.906 0.967 0.54
f18 230 125 0.967 0.992 0.992 0.809
v22 230 41 0.989 1 0.995 0.624
su-27 230 31 0.982 1 0.995 0.856
il-38 230 27 0.997 1 0.995 0.801
tu-134 230 1 0.732 1 0.995 0.995
su-33 230 2 1 0.923 0.995 0.796
an-70 230 2 0.809 1 0.995 0.848
tu-22 230 98 0.99 1 0.995 0.834
Speed: 0.1ms preprocess, 7.3ms inference, 0.0ms loss, 2.8ms postprocess per image
Results saved to runs\detect\train3
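Accuracy is essentially unchanged (mAP50 stays at 0.986 and mAP50-95 moves from 0.737 to 0.735), while the computational cost goes up, from 165.0 to 194.9 GFLOPs!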
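To put numbers on that trade-off, the following sketch computes the relative change between the official YoloV8l run and Improvement 1, using only the figures from the two log summaries above. The dictionary keys and the helper name are my own.

```python
def relative_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Values copied from the "YOLOv8l summary" and "all" lines of the two logs.
baseline = {"params": 43631280, "gflops": 165.0, "map50": 0.986, "map50_95": 0.737}
improved = {"params": 49370288, "gflops": 194.9, "map50": 0.986, "map50_95": 0.735}

changes = {key: relative_change(baseline[key], improved[key]) for key in baseline}

for name, pct in changes.items():
    print(f"{name}: {pct:+.1f}%")
```

The parameter count grows by roughly 13% and the GFLOPs by roughly 18%, with no gain in mAP50.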
Improvement 2
Test results
Results on the original dataset
YOLOv8l summary (fused): 343 layers, 46326864 parameters, 0 gradients, 244.4 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 15/15 [00:05<00:00, 2.51it/s]
all 230 1412 0.906 0.975 0.986 0.73
c17 230 131 0.954 0.992 0.995 0.807
c5 230 68 0.919 0.997 0.993 0.825
helicopter 230 43 0.818 0.953 0.954 0.582
c130 230 85 0.971 0.965 0.993 0.655
f16 230 57 0.829 0.93 0.967 0.667
b2 230 2 0.768 1 0.995 0.722
other 230 86 0.881 0.953 0.975 0.539
b52 230 70 0.92 0.943 0.98 0.802
kc10 230 62 1 0.98 0.989 0.824
command 230 40 0.894 1 0.983 0.797
f15 230 123 0.926 1 0.992 0.694
kc135 230 91 0.967 0.989 0.981 0.689
a10 230 27 0.913 0.926 0.935 0.424
b1 230 20 0.927 1 0.995 0.739
aew 230 25 0.916 1 0.995 0.782
f22 230 17 0.931 1 0.995 0.763
p3 230 105 0.994 0.971 0.995 0.805
p8 230 1 0.886 1 0.995 0.796
f35 230 32 0.882 0.875 0.96 0.54
f18 230 125 0.949 0.984 0.989 0.81
v22 230 41 0.968 1 0.995 0.663
su-27 230 31 0.921 1 0.995 0.828
il-38 230 27 0.984 1 0.995 0.803
tu-134 230 1 0.605 1 0.995 0.895
su-33 230 2 1 0.86 0.995 0.697
an-70 230 2 0.759 1 0.995 0.749
tu-22 230 98 0.987 1 0.995 0.816
Speed: 0.1ms preprocess, 6.5ms inference, 0.0ms loss, 0.7ms postprocess per image
Results on the VisDrone2019 dataset
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 35/35 [00:06<00:00, 5.46it/s]
all 548 38759 0.584 0.469 0.491 0.302
pedestrian 548 8844 0.707 0.477 0.572 0.282
people 548 5125 0.619 0.425 0.464 0.199
bicycle 548 1287 0.315 0.282 0.224 0.104
car 548 14064 0.761 0.843 0.863 0.631
van 548 1975 0.604 0.498 0.527 0.382
truck 548 750 0.572 0.431 0.433 0.3
tricycle 548 1045 0.479 0.407 0.378 0.219
awning-tricycle 548 532 0.374 0.179 0.197 0.12
bus 548 251 0.718 0.614 0.639 0.48
motor 548 4886 0.621 0.561 0.58 0.2
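This is somewhat lower than the results in the paper, which comes down to batch size and the number of epochs! I trained for 150 epochs with a batch size of 8. Training for 300 epochs, as in the paper, would probably give a higher score.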
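For reference, here is a minimal sketch of how these settings could be passed to the ultralytics Python API. The wrapper function and both configuration file names in the example call are hypothetical, and the paper's batch size is not stated here, so only the epoch count is contrasted.

```python
# Settings stated in the text; the paper's epoch count is kept for comparison.
MY_RUN = {"epochs": 150, "batch": 8}
PAPER_EPOCHS = 300


def train(model_cfg, data_cfg, epochs=MY_RUN["epochs"], batch=MY_RUN["batch"]):
    """Launch training through the ultralytics API (imported lazily)."""
    from ultralytics import YOLO  # requires the ultralytics package

    model = YOLO(model_cfg)
    return model.train(data=data_cfg, epochs=epochs, batch=batch)


# Example call, not executed here; both file names are hypothetical:
# train("drone-yolo-l.yaml", "VisDrone.yaml", epochs=PAPER_EPOCHS)
```

Keeping the two schedules side by side makes it easy to rerun with the longer schedule and see how much of the gap it closes.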
Improvement 3
Test results
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 35/35 [00:19<00:00, 1.82it/s]
all 548 38759 0.57 0.473 0.491 0.305
pedestrian 548 8844 0.678 0.504 0.571 0.287
people 548 5125 0.628 0.418 0.466 0.2
bicycle 548 1287 0.37 0.237 0.227 0.112
car 548 14064 0.733 0.852 0.86 0.627
van 548 1975 0.569 0.507 0.517 0.376
truck 548 750 0.532 0.437 0.459 0.319
tricycle 548 1045 0.487 0.395 0.37 0.218
awning-tricycle 548 532 0.37 0.212 0.202 0.13
bus 548 251 0.709 0.618 0.656 0.49
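Compared with the previous model, the overall results are about the same, but a careful comparison shows that mAP50 is higher for the smaller objects while mAP50 for the larger objects drops somewhat! During training the validation score had already reached 0.512, close to the authors' figure, but because the final step performs a fusion operation, the result after fusion is lower.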
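The per-class comparison is easier to check with the differences computed directly. The sketch below takes the mAP50 column from the two VisDrone2019 tables above (Improvement 2 and Improvement 3) and prints the change per class. The motor row is absent from the second table, so it is skipped; the variable and function names are my own.

```python
# mAP50 per class, copied from the two VisDrone2019 result tables.
improvement2 = {
    "pedestrian": 0.572, "people": 0.464, "bicycle": 0.224,
    "car": 0.863, "van": 0.527, "truck": 0.433,
    "tricycle": 0.378, "awning-tricycle": 0.197, "bus": 0.639,
}
improvement3 = {
    "pedestrian": 0.571, "people": 0.466, "bicycle": 0.227,
    "car": 0.86, "van": 0.517, "truck": 0.459,
    "tricycle": 0.37, "awning-tricycle": 0.202, "bus": 0.656,
}


def map50_deltas(before, after):
    """Per-class mAP50 change for the classes present in both tables."""
    return {
        name: round(after[name] - before[name], 3)
        for name in before
        if name in after
    }


deltas = map50_deltas(improvement2, improvement3)

# Largest gains first.
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {delta:+.3f}")
```

Sorting by the change makes it easy to see which classes gained and which lost between the two runs.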

Link: