To do

  • Run the 7b evaluation (1 hour)
  • Build a training dataset with the pretrained model
  • [v] Read Chain of Verification; after reading a Medium summary, the paper itself can be skipped
    • https://arxiv.org/pdf/2309.11495.pdf
    • So Meta works on this kind of thing too
  • Look at Gemini
  • The items below

AK summary

  • Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
    • auxiliary alpha channel to suggest attentive regions and fine-tuned with constructed millions of RGBA region-text pairs
    • preserves the visual recognition ability of CLIP
    • enables precise control over the emphasis of image contents
  • Pearl: A Production-ready Reinforcement Learning Agent
    • facebookresearch
    • RL agent software package
  • AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
    • all attributes (e.g., appearance, motion) are learned and generated jointly without precise control ability
    • unveil the pre-trained text-to-video diffusion model
    • appearance control, temporal control
  • Gen2Det: Generate to Detect
    • create synthetic training data for object detection for free by leveraging state-of-the-art grounded image generation methods
    • Data generation for object detection (OD) training
  • PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
    • Research on preserving identity (ID)
    • an efficient personalized text-to-image generation method
    • mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information
  • DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
    • generating personalized videos from a few static images of the desired subject and a few videos of target motion
    • Video generation from a small number of images
  • Efficient Monotonic Multihead Attention
    • FAIR
  • Generating Illustrated Instructions
    • Explains things with accompanying pictures
    • Combines an LLM with a diffusion model
  • Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
    • leverage code-writing to improve Chain of Thought reasoning not only for logic and arithmetic tasks, but also for linguistic ones
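    • So it is not about running code per se, but about using some ability that comes from code-writing
    • Chain of Code (CoC), a simple yet surprisingly effective extension that improves LM code-driven reasoning
      • So it thinks the way code does
        • List the cities - for loop over the cities - get each city's country - add the country - return len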
  • Beyond Surface: Probing LLaMA Across Scales and Layers
    • Analyzes LLaMA: uses multiple-choice answers to see whether it handles high-order tasks well
    • multiple-choice tasks to probe its intrinsic understanding in high-order tasks such as reasoning and computation
  • LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
    • synthesize the action frame conditioning on the user prompt question and an input egocentric image
    • Uses a visual large language model (VLLM)
  • Large Language Models for Mathematicians
  • Scaling Laws of Synthetic Images for Model Training … for Now
    • scaling laws of synthetic images generated by state of the art text-to-image models
    • text prompts, classifier-free guidance scale, and types of text-to-image models
  • Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
    • current methods intertwine spatial content and temporal dynamics together, leading to a notably increased complexity
    • HiGen, a diffusion model-based method that improves performance by decoupling the spatial and temporal factors of videos from two perspectives, i.e., structure level and content level
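
The Chain of Code note above (list the cities, loop, get each country, add it, return len) can be sketched roughly as follows. This is only an illustration of the "think the way code does" idea, not the paper's implementation: `get_country`, `count_countries`, and the `_LM_KNOWLEDGE` table are hypothetical names, and the dict stands in for the language model that would answer the lookup an interpreter cannot run. Using a set so that a repeated country counts once is also an assumption; the note only says "add the country".

```python
# Hypothetical stand-in for the language model. In a Chain of Code style
# setup, a call the interpreter cannot execute would be answered by the LM;
# here a small dict simulates that answer.
_LM_KNOWLEDGE = {
    "Seoul": "South Korea",
    "Busan": "South Korea",
    "Paris": "France",
    "Tokyo": "Japan",
}


def get_country(city: str) -> str:
    """Semantic lookup that plain code cannot do on its own (simulated)."""
    return _LM_KNOWLEDGE[city]


def count_countries(cities: list[str]) -> int:
    """Count the distinct countries that the given cities belong to."""
    countries = set()  # assumption: each country is counted once
    for city in cities:
        countries.add(get_country(city))
    return len(countries)


if __name__ == "__main__":
    print(count_countries(["Seoul", "Busan", "Paris", "Tokyo"]))  # 3
```

The loop, the set, and `len` are ordinary executable code; only `get_country` needs outside knowledge, which is the part the note attributes to the language model.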