To do

Try the 7b evaluation (1 hour)
Build a training dataset with the pretrained model
[v] Read Chain of Verification; after reading a Medium write-up, the paper itself can be skipped
- https://arxiv.org/pdf/2309.11495.pdf
- So Meta works on this kind of thing too
Look at Gemini
The items below

AK summary

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
- auxiliary alpha channel to suggest attentive regions and fine-tuned with constructed millions of RGBA region-text pairs
- preserves the visual recognition ability of CLIP
- enables precise control over the emphasis of image contents
Pearl: A Production-ready Reinforcement Learning Agent
- facebookresearch
- RL agent software package
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators
- in existing text-to-video generation, all attributes (e.g., appearance, motion) are learned and generated jointly, without precise control
- unveil the pre-trained text-to-video diffusion model
- appearance control, temporal control
Gen2Det: Generate to Detect
- create synthetic training data for object detection for free by leveraging state-of-the-art grounded image generation methods
- Data generation for training object detection (OD)
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding
- Research on preserving identity (ID)
- an efficient personalized text-to-image generation method
- mainly encodes an arbitrary number of input ID images into a stack ID embedding for preserving ID information
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
- generating personalized videos from a few static images of the desired subject and a few videos of target motion
- Video generation from a small number of images
Efficient Monotonic Multihead Attention
- FAIR (Meta)
Generating Illustrated Instructions
- Explains how to do something with pictures alongside the text
- Combines an LLM with diffusion
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
- leverage code-writing to improve Chain of Thought reasoning not only for logic and arithmetic tasks, but also for linguistic ones
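- So it is not about coding as such, but about using some capability of code-writing
- Chain of Code (CoC), a simple yet surprisingly effective extension that improves LM code-driven reasoning
  - So it thinks the way code does
    - put the cities in a list - for loop over the cities - get each city's country - add the country - return len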
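
The city example in the note above can be sketched as ordinary Python. This is only an illustration of the "think like code" steps, not the paper's implementation: `CITY_TO_COUNTRY`, `country_of`, and `count_countries` are hypothetical names, the lookup table stands in for the step a language model would answer, and counting distinct countries (a set) is an assumption about what "add the country - return len" means.

```python
# Hypothetical stand-in for the semantic step: in Chain of Code this kind of
# lookup is what a language model would answer rather than an interpreter.
CITY_TO_COUNTRY = {
    "Seoul": "South Korea",
    "Busan": "South Korea",
    "Paris": "France",
    "Tokyo": "Japan",
}


def country_of(city: str) -> str:
    """Return the country for a city (illustrative lookup table only)."""
    return CITY_TO_COUNTRY[city]


def count_countries(cities: list[str]) -> int:
    """Count distinct countries among the given cities."""
    countries = set()               # countries seen so far
    for city in cities:             # for loop over the cities
        country = country_of(city)  # get the city's country
        countries.add(country)      # add the country
    return len(countries)           # return len


print(count_countries(["Seoul", "Busan", "Paris", "Tokyo"]))
```

The loop, the set, and `len` are steps an interpreter can run directly; only `country_of` needs world knowledge, which is why it is the natural place for a model to fill in.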
Beyond Surface: Probing LLaMA Across Scales and Layers
- Analyzes LLaMA: uses multiple-choice answers to check whether it handles high-order tasks well
- multiple-choice tasks to probe its intrinsic understanding in high-order tasks such as reasoning and computation
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
- synthesize the action frame conditioning on the user prompt question and an input egocentric image
- Uses a visual large language model (VLLM)
Large Language Models for Mathematicians
Scaling Laws of Synthetic Images for Model Training … for Now
- scaling laws of synthetic images generated by state of the art text-to-image models
- text prompts, classifier-free guidance scale, and types of text-to-image models
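
For reference, the classifier-free guidance scale listed above is commonly the weight $w$ in the combination below of conditional and unconditional noise predictions. The symbols ($\epsilon_\theta$, $x_t$, $c$, $\varnothing$ for the empty prompt) are assumptions for illustration, and the note does not say which convention the paper uses (some write the weight as $1 + w$).

```latex
\hat{\epsilon}_\theta(x_t, c)
  = \epsilon_\theta(x_t, \varnothing)
  + w \left( \epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing) \right)
```

Under this convention $w = 1$ recovers the plain conditional prediction, and larger $w$ pushes samples harder toward the text prompt.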
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
- current methods intertwine spatial content and temporal dynamics together, leading to a notably increased complexity
- HiGen, a diffusion model-based method that improves performance by decoupling the spatial and temporal factors of videos from two perspectives, i.e., structure level and content level