dec 1st

2023/11/30

할 일

생각없이 살지 말자.

InstructBLIP + BLIP-2

instructBLIP로 pretrained model 잘 되는지 확인하고 finetuning만 하자

논문 좀 읽고, InstructBLIP으로 finetuning dataset 모으기

내가 finetuning을 할 수 있을까?
Full finetuning말고, lora나 peft 기법으로 할 수 없을까?
일단 finetuning data를 모을 수나 있을까?
InstructBLIP을 benchmark들로 확인
BLIP-2를 무슨 데이터로 학습해야되지?

LLaMA-adapter

허깅페이스 llama1 테스트

reconstruction

AWRCP
RIDCP
StyleGAN, StyleGANv2나 읽어보기

이전 방법을 포기한 이유

llama adapter 안되는 이유

pretrained model이 llama 1버전으로 공개되어 있음. llama 1 다운 불가능(일단 llama 돌아다니는 weight로도 해볼게)
pretrained data를 얻을 수가 없음

llava랑 안 맞는 이유

q-former로 visual information을 control해야한다.

발표자료

제목

seq2seq를 포인트로 잡고 잘 했음

Motivation

pair가 없어서 robust feature 학습 어렵
생성한 이미지는 distribution gap 존재
이런 것은 codebook이 좋다.(codebook이 clean image로 학습되어 clean image의 prior를 잘 가지고 있다, 뭐가 있지?)

codebook 쓰는 논문 좀 읽어보자
codebook 사용 시 high-quality로 prior를 학습해야 하기 때문에 time consuming, dataset dependent
codebook에서는 context-aware understanding이 필요

Image restoration

training stability와 long-range generation을 위해 discrete 접근, 맞아 discrete하면 학습 잘되고, 더 간단해지니까 여러가지 쉬워지지
codebook으로 하면 robust prior를 가질 수 있나?

AWRCP, RIDCP

instruct blip

문제: lavis에 blip2 구현부분만 잘 나와있음, instructBlip 부분 거의 없음
search(instructBLIP과 관련된)
- https://github.com/salesforce/LAVIS/issues/575
  - instructblip을 paper에 있는 데로 평가하지 못함
- https://github.com/salesforce/LAVIS/issues/542
- https://github.com/salesforce/LAVIS/issues/534
  - paper와 성능이 다름
- https://github.com/salesforce/LAVIS/issues/465
  - 내가 만든 dataset으로 학습할 수 있나요?
- https://github.com/salesforce/LAVIS/issues/471
  - instructblip 학습 과정 구체적으로
- https://github.com/salesforce/LAVIS/issues/439
  - VisDial dataset v1.0 제로샷 결과가 이상해요
- https://github.com/salesforce/LAVIS/issues/435
  - OCR을 할때는 어떻게 했나요? 평가방법이 어떻게 되나요? 후처리를 했을거같은데..
- https://github.com/salesforce/LAVIS/issues/433
  - multi-turn 채팅이 가능한가요?

Hallucination

문제: 내가 hallucination을 잘 몰라..
논문을 읽어보자.