FigCapsHF Documentation#

class FigCapsHF.FigCapsHF(benchmark_path)#

Main class for FigCapsHF

generate_embeddings(image_paths, captions_list, embedding_model, MCSE_path=None)#

Generate embeddings from a specified model for a list of given figure-caption pairs

Parameters:
  • image_paths (List) – paths to the figure images

  • captions_list (List) – list of captions for the corresponding figure images

  • embedding_model (String) – name of the model used to generate embeddings for the figure-caption pairs. Choose from [‘BLIP’, ‘BERT’, ‘SciBERT’, ‘MCSE’]

  • MCSE_path (String) – If MCSE is selected, supply path to the folder containing model weights

Returns:

embeddings for each figure-caption pair

Return type:

(N*D) numpy array

Note: for MCSE, download the ‘flickr-mcse-roberta-base’ model from https://github.com/uds-lsv/MCSE

generate_embeddings_ds_split(embedding_model, MCSE_path=None, split_name='train', max_num_samples=None)#

Generate embeddings from a specified model for the figure-caption pairs in the larger dataset.

Parameters:
  • embedding_model (String) – name of the model used to generate embeddings for the figure-caption pairs. Choose from [‘BLIP’, ‘BERT’, ‘SciBERT’, ‘MCSE’]

  • split_name (String, default = 'train') – data split to generate embeddings from. Choose from [‘train’,’test’,’val’]

  • max_num_samples (natural number, default = all pairs in the specified split) – maximum number of samples from the specified split to generate embeddings for (picks the first N)

  • MCSE_path (String) – If MCSE is selected, supply path to the folder containing model weights

Returns:

ds_split_embeddings: (N*D) numpy array containing embeddings for figure-caption pairs in the specified data split

image_names: list containing the paths of the figure-images

captions: list containing the captions corresponding to the figure-images

Return type:

Numpy Array, List, List

Note: for MCSE, download the ‘flickr-mcse-roberta-base’ model from https://github.com/uds-lsv/MCSE

generate_embeddings_hf_anno(hf_score_type, embedding_model, MCSE_path=None)#

Generate embeddings from a specified model for the ~400 figure-caption pairs in the human-annotated dataset

Parameters:
  • hf_score_type (String) – the human-feedback score to be used from the human-annotated dataset. Select between [‘helpfulness’,’ocr’,’visual’,’takeaway’]

  • embedding_model (String) – name of the model used to generate embeddings for the figure-caption pairs. Choose from [‘BLIP’, ‘BERT’, ‘SciBERT’, ‘MCSE’]

  • MCSE_path (String) – If MCSE is selected, supply path to the folder containing model weights

Returns:

hf_ds_embeddings: (N*D) numpy array containing embeddings for figure-caption pairs in the human annotated dataset. Note: pairs with an empty specified human-feedback score are removed.

scores: (N,) numpy array containing the specified human-feedback scores (corresponding to the embeddings)

Return type:

Numpy Array, Numpy Array

Note: for MCSE, download the ‘flickr-mcse-roberta-base’ model from https://github.com/uds-lsv/MCSE

generate_jsonl()#

Generates a metadata.jsonl file for each of the train/test/val splits and places it under the corresponding folder in No-Subfig-Img. Also verifies that the dataset is consistent.
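These metadata.jsonl files appear to follow the line-delimited JSON convention of one record per figure, pairing a file name with its caption; the sketch below assumes the fields are file_name and text, matching the columns returned by infer_hf_training_set (the example rows are made up):

```python
import json
import os
import tempfile

# One JSON object per line: the image file name and its caption (illustrative rows).
rows = [
    {"file_name": "fig_001.png", "text": "Accuracy vs. epochs."},
    {"file_name": "fig_002.png", "text": "Loss curves."},
]

path = os.path.join(tempfile.gettempdir(), "metadata.jsonl")
with open(path, "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# A simple consistency check: every line must parse back to a valid record.
with open(path) as f:
    loaded = [json.loads(line) for line in f]
print(loaded == rows)
```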

get_image_caption_pair(data_split, image_name)#

Visualize a single figure-caption pair from the larger dataset

Parameters:
  • data_split (String) – the data split (train/val/test) where the image is located

  • image_name (String) – the name of the image to visualize (without the .png suffix)

get_image_caption_pair_hf(image_name)#

Visualize a single figure-caption pair from the human-annotated dataset (including HF factors)

Parameters:

image_name (String) – the name of the human-annotated image to visualize (without the .png suffix)

infer_hf(embeddings, scoring_model, quantization_levels=2)#

Using the trained scoring model, predicts the (inferred) human feedback scores for unseen figure-caption pairs

Parameters:
  • embeddings ((N*D) numpy array) – embeddings of the figure-caption pairs for which human feedback needs to be inferred

  • scoring_model (scikit-learn MLP Regressor) – model which has been trained to predict human feedback scores given a figure-caption pair embedding

  • quantization_levels (natural number, default is 2) – calculates different percentiles of the inferred scores and quantizes the scores into the selected number of levels/bins. A higher quantized score corresponds to a higher inferred score.

Returns:

inferred_scores: (N,) numpy array with the predicted scores of figure-caption pairs using the supplied model

quantized_scores: (N,) numpy array with the quantized scores (calculated using the inferred scores)

Return type:

Numpy Array, Numpy Array
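The percentile-based quantization step can be sketched as follows; this is a stand-alone illustration of the behaviour described above, not the library's exact implementation:

```python
import numpy as np

def quantize_scores(inferred_scores, quantization_levels=2):
    """Bin scores at equally spaced percentiles; a higher bin means a higher score.

    A sketch of the documented behaviour, not FigCapsHF's actual code.
    """
    # Cut points at e.g. the 50th percentile for 2 levels,
    # or the 33.3rd and 66.7th percentiles for 3 levels.
    cut_percentiles = [100 * i / quantization_levels
                       for i in range(1, quantization_levels)]
    cuts = np.percentile(inferred_scores, cut_percentiles)
    # Bin index in 0 .. quantization_levels-1 for each score.
    return np.digitize(inferred_scores, cuts)

scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(quantize_scores(scores))     # 2 levels: below/above the median
print(quantize_scores(scores, 3))  # 3 levels: low/middle/high thirds
```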

infer_hf_training_set(hf_score_type, embedding_model, MCSE_path=None, max_num_samples=None, quantization_levels=2, mapped_hf_labels=['bad', 'good'])#

Predict the (inferred) human feedback scores for the training-split of the dataset, using a scoring model trained on the human-annotated dataset

Parameters:
  • hf_score_type (String) – the human-feedback score to be used from the human-annotated dataset. Select between [‘helpfulness’,’ocr’,’visual’,’takeaway’]

  • embedding_model (String) – name of the model used to generate embeddings for the figure-caption pairs. Choose from [‘BLIP’, ‘BERT’, ‘SciBERT’, ‘MCSE’]

  • MCSE_path (String) – If MCSE is selected, supply path to the folder containing model weights

  • max_num_samples (natural number, default = all pairs in the specified split) – maximum number of samples from the specified split to generate embeddings for (picks the first N)

  • quantization_levels (natural number, default is 2) – calculates different percentiles of the inferred scores and quantizes the scores into the selected number of levels/bins. A higher quantized score corresponds to a higher inferred score.

  • mapped_hf_labels (List of strings, default is ["bad","good"]) – a list of string labels, one per quantization level (exactly quantization_levels labels are required)

Note: for MCSE, download the ‘flickr-mcse-roberta-base’ model from https://github.com/uds-lsv/MCSE

Returns:

inferred_hf_df, with one row per figure-caption pair and the following columns:

  • file_name: the image name (with the .png suffix)

  • text: corresponding caption

  • inferred_hf: predicted human-feedback scores using a trained scoring model

  • quantized_hf: quantized inferred human-feedback scores

  • mapped_hf: mapped quantized human-feedback scores

Return type:

Pandas DataFrame
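The relationship between the inferred_hf, quantized_hf, and mapped_hf columns can be sketched with synthetic values (the file names, captions, and scores below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical inferred scores for three figure-caption pairs.
inferred_hf = np.array([0.2, 0.55, 0.8])
# Two quantization levels: scores at or above the median map to level 1.
quantized_hf = np.digitize(inferred_hf, np.percentile(inferred_hf, [50]))

mapped_hf_labels = ["bad", "good"]  # one label per quantization level
inferred_hf_df = pd.DataFrame({
    "file_name": ["fig_001.png", "fig_002.png", "fig_003.png"],
    "text": ["Accuracy vs. epochs.", "Model architecture.", "Loss curves."],
    "inferred_hf": inferred_hf,
    "quantized_hf": quantized_hf,
    "mapped_hf": [mapped_hf_labels[q] for q in quantized_hf],
})
print(inferred_hf_df["mapped_hf"].tolist())
```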

train_scoring_model(hf_embeddings, scores)#

Train an MLP Regressor model to predict the desired human feedback score, given a figure-caption pair embedding

Parameters:
  • hf_embeddings (Numpy Array) – (N*D) numpy array containing embeddings for the figure-caption pairs

  • scores (Numpy Array) – (N,) numpy array containing the target human feedback scores

Returns:

trained model which can predict human feedback score, given a figure-caption pair embedding

Return type:

scikit-learn MLP Regressor
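A minimal end-to-end sketch of the train-then-infer loop: synthetic embeddings and scores stand in for the real output of generate_embeddings_hf_anno, and the MLP hyperparameters are illustrative, not necessarily those used by train_scoring_model internally:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic (N, D) embeddings and target scores (stand-ins for real data);
# the score is deliberately a simple function of the embedding.
hf_embeddings = rng.normal(size=(200, 16))
scores = hf_embeddings[:, 0] + 0.1 * rng.normal(size=200)

# Train a scoring model, as train_scoring_model does.
scoring_model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000,
                             random_state=0).fit(hf_embeddings, scores)

# Predict scores for unseen figure-caption embeddings, as infer_hf does.
new_embeddings = rng.normal(size=(5, 16))
inferred_scores = scoring_model.predict(new_embeddings)
print(inferred_scores.shape)  # (5,)
```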