Interesting AI News Everyone Can Understand, 20230819 (updated regularly)

2023-08-19 22:20 Author: oneds6

A quick roundup from QQ group 926267297-CY. Rough machine translation, not carefully verified; for reference only.

NVIDIA has released the source code for Neuralangelo!

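The model can convert video from any device into detailed 3D structures, fully replicating buildings, sculptures, and other real-world objects or spaces. Here is how it works: the model takes 2D video showing an object or scene from multiple angles. It selects frames from different viewpoints to understand depth, size, and shape. The AI then creates an initial 3D representation, much as a sculptor roughs out a subject. The render is then optimized to enhance detail, like a sculptor refining textures. The result is a 3D object or scene suitable for virtual reality, digital twins, or robotics.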

GitHub: https://github.com/NVlabs/neuralangelo

https://research.nvidia.com/labs/dir/neuralangelo/

Image browser for SD WebUI: sd-webui-infinite-image-browsing

https://github.com/zanllp/sd-webui-infinite-image-browsing

CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

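An implementation similar to Gen-2. Its advantage is that you can swap out parts of the frame content or change the style of the video. https://github.com/qiuyu96/CoDeF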

A large SD 1.5 model file (2 GB) that imitates the style of old-school Flash animation

https://huggingface.co/nerijs/coralchar-diffusion

Homepage for the paper: DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

DragNUWA is a video generation model that uses text, images, and trajectories as three essential control factors to enable highly controllable video generation from semantic, spatial, and temporal aspects. Unlike existing research, DragNUWA lets users directly manipulate backgrounds or objects within an image, and the model seamlessly translates these actions into camera movements or object motions, generating the corresponding video.

Click the top-left "play" button to observe how DragNUWA manipulates the same image to create videos with the desired camera movements and object motions.