FigCapsHF Documentation#
- class FigCapsHF.FigCapsHF(benchmark_path)#
Main class for FigCapsHF
- generate_embeddings(image_paths, captions_list, embedding_model, MCSE_path=None)#
Generate embeddings from a specified model for a list of given figure-caption pairs
- Parameters:
image_paths (List) – paths to the figure images
captions_list (List) – list of captions for the corresponding figure images
embedding_model (String) – name of the model used to generate embeddings for the figure-caption pairs. Choose from [‘BLIP’, ‘BERT’, ‘SciBERT’, ‘MCSE’]
MCSE_path (String) – if ‘MCSE’ is selected, supply the path to the folder containing the model weights
- Returns:
embeddings for each figure-caption pair
- Return type:
(N*D) numpy array
Note: for MCSE, download the ‘flickr-mcse-roberta-base’ model from https://github.com/uds-lsv/MCSE
- generate_embeddings_ds_split(embedding_model, MCSE_path=None, split_name='train', max_num_samples=None)#
Generate embeddings from a specified model for the figure-caption pairs in the larger dataset.
- Parameters:
embedding_model (String) – name of the model used to generate embeddings for the figure-caption pairs. Choose from [‘BLIP’, ‘BERT’, ‘SciBERT’, ‘MCSE’]
split_name (String, default = 'train') – the data split to generate embeddings from. Choose from [‘train’, ‘test’, ‘val’]
max_num_samples (natural number, default = all pairs in the specified split) – maximum number of samples from the specified split to generate embeddings for (picks the first N)
MCSE_path (String) – if ‘MCSE’ is selected, supply the path to the folder containing the model weights
- Returns:
ds_split_embeddings: (N*D) numpy array containing embeddings for figure-caption pairs in the specified data split
image_names: list containing the paths of the figure-images
captions: list containing the captions corresponding to the figure-images
- Return type:
Numpy Array, List, List
Note: for MCSE, download the ‘flickr-mcse-roberta-base’ model from https://github.com/uds-lsv/MCSE
- generate_embeddings_hf_anno(hf_score_type, embedding_model, MCSE_path=None)#
Generate embeddings from a specified model for the ~400 figure-caption pairs in the human-annotated dataset
- Parameters:
hf_score_type (String) – the human-feedback score to use from the human-annotated dataset. Choose from [‘helpfulness’, ‘ocr’, ‘visual’, ‘takeaway’]
embedding_model (String) – name of the model used to generate embeddings for the figure-caption pairs. Choose from [‘BLIP’, ‘BERT’, ‘SciBERT’, ‘MCSE’]
MCSE_path (String) – if ‘MCSE’ is selected, supply the path to the folder containing the model weights
- Returns:
hf_ds_embeddings: (N*D) numpy array containing embeddings for figure-caption pairs in the human-annotated dataset. Note: pairs with an empty value for the specified human-feedback score are removed.
scores: (N,) numpy array containing the specified human-feedback scores (corresponding to the embeddings)
- Return type:
Numpy Array, Numpy Array
Note: for MCSE, download the ‘flickr-mcse-roberta-base’ model from https://github.com/uds-lsv/MCSE
- generate_jsonl()#
Used to generate a metadata.jsonl file for each of the train/test/val splits and place it under the corresponding folder in No-Subfig-Img. Also verifies that the dataset is consistent.
Visualize a single figure-caption pair from the large dataset
- Parameters:
data_split (String) – the data split (train/val/test) where the image is located
image_name (String) – the name of the image to visualize (without the .png suffix)
Visualize a single figure-caption pair from the human-annotated dataset (including HF factors)
- Parameters:
image_name (String) – the name of the human-annotated image to visualize (without the .png suffix)
- infer_hf(embeddings, scoring_model, quantization_levels=2)#
Using the trained scoring model, predicts the (inferred) human feedback scores for unseen figure-caption pairs
- Parameters:
embeddings ((N*D) numpy array) – embeddings of the figure-caption pairs for which human feedback needs to be inferred
scoring_model (scikit-learn MLP Regressor) – model which has been trained to predict human feedback scores given a figure-caption pair embedding
quantization_levels (natural number, default is 2) – calculates percentiles of the inferred scores and quantizes them into the selected number of levels/bins. A higher quantized score corresponds to a higher inferred score.
- Returns:
inferred_scores: (N,) numpy array with the predicted scores of figure-caption pairs using the supplied model
quantized_scores: (N,) numpy array with the quantized scores (calculated using the inferred scores)
- Return type:
Numpy Array, Numpy Array
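The quantization step can be sketched with NumPy percentile binning. This is a hypothetical re-implementation of the behavior described above, not the library's actual code:

```python
import numpy as np

def quantize_scores(inferred_scores, quantization_levels=2):
    """Bin scores into `quantization_levels` bins using percentile
    edges (illustrative sketch; the library's internals may differ)."""
    # Interior percentile boundaries, e.g. [50] for two levels.
    edges = np.percentile(
        inferred_scores,
        np.linspace(0, 100, quantization_levels + 1)[1:-1],
    )
    # Each score gets a bin index in [0, quantization_levels - 1];
    # a higher bin index corresponds to a higher inferred score.
    return np.digitize(inferred_scores, edges)

inferred = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
quantized = quantize_scores(inferred, quantization_levels=2)
# → array([0, 1, 0, 1, 1]): scores at or above the median land in the upper bin
```

With the default of two levels this splits the scores at the median; more levels add evenly spaced percentile boundaries.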
- infer_hf_training_set(hf_score_type, embedding_model, MCSE_path=None, max_num_samples=None, quantization_levels=2, mapped_hf_labels=['bad', 'good'])#
Predict the (inferred) human feedback scores for the training-split of the dataset, using a scoring model trained on the human-annotated dataset
- Parameters:
hf_score_type (String) – the human-feedback score to use from the human-annotated dataset. Choose from [‘helpfulness’, ‘ocr’, ‘visual’, ‘takeaway’]
embedding_model (String) – name of the model used to generate embeddings for the figure-caption pairs. Choose from [‘BLIP’, ‘BERT’, ‘SciBERT’, ‘MCSE’]
MCSE_path (String) – if ‘MCSE’ is selected, supply the path to the folder containing the model weights
max_num_samples (natural number, default = all pairs in the specified split) – maximum number of samples from the specified split to generate embeddings for (picks the first N)
quantization_levels (natural number, default is 2) – calculates percentiles of the inferred scores and quantizes them into the selected number of levels/bins. A higher quantized score corresponds to a higher inferred score.
mapped_hf_labels (List of strings, default is ["bad","good"]) – a list of string labels, one per quantization level (must contain exactly quantization_levels labels)
Note: for MCSE, download the ‘flickr-mcse-roberta-base’ model from https://github.com/uds-lsv/MCSE
- Returns:
inferred_hf_df, with one row per figure-caption pair and the columns
file_name: image name (with the .png suffix)
text: the corresponding caption
inferred_hf: human-feedback scores predicted by the trained scoring model
quantized_hf: quantized inferred human-feedback scores
mapped_hf: string labels mapped from the quantized scores
- Return type:
Pandas DataFrame
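The returned DataFrame layout can be illustrated with a small pandas sketch; `build_inferred_hf_df` is a hypothetical helper that only mirrors the column structure documented above:

```python
import numpy as np
import pandas as pd

def build_inferred_hf_df(file_names, captions, inferred, quantized,
                         mapped_hf_labels=("bad", "good")):
    """Assemble the documented column layout (hypothetical helper)."""
    return pd.DataFrame({
        "file_name": file_names,      # image name, with the .png suffix
        "text": captions,             # corresponding caption
        "inferred_hf": inferred,      # raw predicted scores
        "quantized_hf": quantized,    # quantized scores
        # map each quantization level to its string label
        "mapped_hf": [mapped_hf_labels[q] for q in quantized],
    })

df = build_inferred_hf_df(
    ["fig1.png", "fig2.png"],
    ["caption a", "caption b"],
    np.array([0.2, 0.9]),
    np.array([0, 1]),
)
# df["mapped_hf"] holds ["bad", "good"]
```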
- train_scoring_model(hf_embeddings, scores)#
Train an MLP Regressor model that, given a figure-caption pair embedding, predicts the desired human feedback score
- Parameters:
hf_embeddings (Numpy Array) – Embeddings for the figure-caption pairs
scores (Numpy Array) – (N,) numpy array containing the target human feedback scores
- Returns:
trained model which can predict human feedback score, given a figure-caption pair embedding
- Return type:
scikit-learn MLP Regressor
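Taken together with infer_hf, the training step can be sketched as follows. The data here is synthetic and the MLP hyperparameters are illustrative, not the library's defaults:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-ins for real figure-caption embeddings (N*D) and
# human-feedback scores (N,), shaped as the docs describe.
rng = np.random.default_rng(0)
hf_embeddings = rng.normal(size=(200, 16))
scores = hf_embeddings @ rng.normal(size=16)  # fabricated target signal

# Train the scoring model (hyperparameters chosen for illustration only).
scoring_model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500,
                             random_state=0)
scoring_model.fit(hf_embeddings, scores)

# The trained regressor can then be passed to infer_hf to score
# embeddings of unseen figure-caption pairs.
predicted = scoring_model.predict(hf_embeddings[:5])
```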