sparse_caption.models package

Submodules

sparse_caption.models.att_model module

Created on 14 Oct 2020 14:19:19

https://github.com/ruotianluo/self-critical.pytorch/tree/3.2

This file contains the UpDown model.

UpDown is from “Bottom-Up and Top-Down Attention for Image Captioning and VQA” (https://arxiv.org/abs/1707.07998). However, this implementation may not be identical to the authors’ architecture.

class sparse_caption.models.att_model.AttModel(config, tokenizer: Optional[sparse_caption.tokenizer.Tokenizer] = None)

Bases: sparse_caption.models.caption_model.CaptionModel

clip_att(att_feats, att_masks)
get_logprobs_state(it, fc_feats, att_feats, p_att_feats, att_masks, state, output_logsoftmax=1)
init_hidden(bsz)
make_model()
training: bool
class sparse_caption.models.att_model.Attention(config)

Bases: torch.nn.modules.module.Module

forward(h, att_feats, p_att_feats, att_masks=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class sparse_caption.models.att_model.UpDownCore(config, use_maxout=False)

Bases: torch.nn.modules.module.Module

forward(xt, fc_feats, att_feats, p_att_feats, state, att_masks=None)

training: bool
class sparse_caption.models.att_model.UpDownModel(config, tokenizer: Optional[sparse_caption.tokenizer.Tokenizer] = None)

Bases: sparse_caption.models.att_model.AttModel

COLLATE_FN

alias of sparse_caption.data.collate.AttCollate

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])
training: bool

sparse_caption.models.att_model_prune module

Created on 14 Oct 2020 14:34:47 @author: jiahuei

class sparse_caption.models.att_model_prune.AttModel(config, tokenizer: Optional[sparse_caption.tokenizer.Tokenizer] = None)

Bases: sparse_caption.models.att_model.AttModel

make_model()
training: bool
class sparse_caption.models.att_model_prune.Attention(config)

Bases: sparse_caption.models.att_model.Attention

training: bool
class sparse_caption.models.att_model_prune.UpDownCore(config, use_maxout=False)

Bases: sparse_caption.models.att_model.UpDownCore

training: bool
class sparse_caption.models.att_model_prune.UpDownModel(config, tokenizer: Optional[sparse_caption.tokenizer.Tokenizer] = None)

Bases: sparse_caption.pruning.prune.PruningMixin, sparse_caption.models.att_model_prune.AttModel

COLLATE_FN

alias of sparse_caption.data.collate.AttCollate

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])

sparse_caption.models.caption_model module

https://github.com/ruotianluo/self-critical.pytorch/tree/3.2

This file contains the ShowAttendTell and AllImg models.

ShowAttendTell is from “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention” (https://arxiv.org/abs/1502.03044).

AllImg is a model in which the image feature is concatenated with the word embedding at every time step to form the LSTM input.

class sparse_caption.models.caption_model.CaptionModel

Bases: torch.nn.modules.module.Module

forward(*args, **kwargs)

static sample_next_word(logprobs, sample_method, temperature)
training: bool

sparse_caption.models.relation_transformer module

https://github.com/yahoo/object_relation_transformer

# Please see LICENSE file in the project root for terms.

class sparse_caption.models.relation_transformer.BoxMultiHeadedAttention(h, d_model, trigonometric_embedding=True, dropout=0.1, share_att=None)

Bases: torch.nn.modules.module.Module

Self-attention layer with relative position weights, following the paper “Relation Networks for Object Detection” (https://arxiv.org/pdf/1711.11575.pdf).

static BoxRelationalEmbedding(f_g, dim_g=64, wave_len=1000, trigonometric_embedding=True)

Given a tensor of bounding-box coordinates for the detected objects in each batch image, this function computes a matrix for each image whose entry (i, j) is a vector representation of the displacement between the coordinates of bbox_i and bbox_j.

input: np.array of shape=(batch_size, max_nr_bounding_boxes, 4)

output: np.array of shape=(batch_size, max_nr_bounding_boxes, max_nr_bounding_boxes, 64)
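
The displacement vector for one pair of boxes can be sketched as follows. This is a minimal pure-Python illustration of the four geometry features described in “Relation Networks for Object Detection”, not the library's implementation. The function name box_geometry_features, the (x_min, y_min, x_max, y_max) box layout, and the eps clamp are assumptions made for the example. The documented method additionally expands these features into the 64-dimensional embedding, which this sketch omits.

```python
import math


def box_geometry_features(box_m, box_n, eps=1e-3):
    """Return the 4 log-scaled displacement features between two boxes.

    Boxes are assumed (hypothetically) to be (x_min, y_min, x_max, y_max)
    with strictly positive width and height.
    """

    def center_size(box):
        x_min, y_min, x_max, y_max = box
        return (x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min

    cx_m, cy_m, w_m, h_m = center_size(box_m)
    cx_n, cy_n, w_n, h_n = center_size(box_n)
    return [
        # Centre offsets, normalised by the size of box m; clamped to avoid log(0).
        math.log(max(abs(cx_m - cx_n) / w_m, eps)),
        math.log(max(abs(cy_m - cy_n) / h_m, eps)),
        # Relative width and height of box n with respect to box m.
        math.log(w_n / w_m),
        math.log(h_n / h_m),
    ]


features = box_geometry_features((0, 0, 2, 2), (2, 0, 6, 2))
```

Applying this to every ordered pair of boxes in an image gives one such vector per (i, j) entry of the matrix.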

static box_attention(query, key, value, box_relation_embds_matrix, mask=None, dropout=None)

Compute “Scaled Dot Product Attention” as in the paper “Relation Networks for Object Detection”, following the implementation in https://github.com/heefe92/Relation_Networks-pytorch/blob/master/model.py#L1026-L1055

forward(input_query, input_key, input_value, input_box, mask=None)

Implements Figure 2 of “Relation Networks for Object Detection”.

training: bool
class sparse_caption.models.relation_transformer.Encoder(layer, N, share_layer=None)

Bases: torch.nn.modules.module.Module

Core encoder is a stack of N layers

forward(x, box, mask)

Pass the input (and mask) through each layer in turn.

training: bool
class sparse_caption.models.relation_transformer.EncoderDecoder(encoder, decoder, src_embed, tgt_embed, generator)

Bases: torch.nn.modules.module.Module

A standard Encoder-Decoder architecture. Base for this and many other models.

decode(memory, src_mask, tgt, tgt_mask)
encode(src, boxes, src_mask)
forward(src, boxes, tgt, src_mask, tgt_mask)

Take in and process masked src and target sequences.

training: bool
class sparse_caption.models.relation_transformer.EncoderLayer(size, self_attn, feed_forward, dropout)

Bases: torch.nn.modules.module.Module

Encoder is made up of self-attn and feed forward (defined below)

forward(x, box, mask)

Follow Figure 1 (left) for connections.

training: bool
class sparse_caption.models.relation_transformer.RelationTransformerModel(config)

Bases: sparse_caption.models.transformer.CachedTransformerBase

COLLATE_FN

alias of sparse_caption.data.collate.ObjectRelationCollate

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])
static clip_att(att_feats, att_masks)
get_logprobs_state(it, memory, mask, state)

state = [ys.unsqueeze(0)]

make_model(h=8, dropout=0.1)

Helper: Construct a model from hyperparameters.

static subsequent_mask(size)

Mask out subsequent positions.
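
A minimal pure-Python sketch of what masking out subsequent positions means. The helper name causal_mask and the nested-list output are assumptions for illustration; the documented method presumably returns a tensor instead.

```python
def causal_mask(size):
    """Return a size x size mask; True marks positions a query may attend to.

    Row `row` is the query position and column `col` is the key position,
    so every position after the query (col > row) is masked out.
    """
    return [[col <= row for col in range(size)] for row in range(size)]


mask = causal_mask(3)
for row in mask:
    print(row)
```

Each row allows attention only to itself and to earlier positions, which keeps decoding autoregressive.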

training: bool

sparse_caption.models.relation_transformer_prune module

Created on 09 Oct 2020 17:27:20 @author: jiahuei

class sparse_caption.models.relation_transformer_prune.BoxMultiHeadedAttention(mask_type, mask_init_value, h, d_model, trigonometric_embedding=True, dropout=0.03333333333333333, share_att=None)

Bases: sparse_caption.models.relation_transformer.BoxMultiHeadedAttention

Self-attention layer with relative position weights, following the paper “Relation Networks for Object Detection” (https://arxiv.org/pdf/1711.11575.pdf).

training: bool
class sparse_caption.models.relation_transformer_prune.CachedMultiHeadedAttention(mask_type, mask_init_value, h, d_model, dropout=0.03333333333333333, self_attention=False, share_att=None)

Bases: sparse_caption.models.transformer.CachedMultiHeadedAttention

training: bool
class sparse_caption.models.relation_transformer_prune.Embeddings(mask_type, mask_init_value, d_model, vocab)

Bases: sparse_caption.models.transformer.InputEmbedding

training: bool
class sparse_caption.models.relation_transformer_prune.EncoderDecoder(*, mask_type, mask_freeze_scope='', **kwargs)

Bases: sparse_caption.pruning.prune.PruningMixin, sparse_caption.models.relation_transformer.EncoderDecoder

A standard Encoder-Decoder architecture. Base for this and many other models.

class sparse_caption.models.relation_transformer_prune.Generator(mask_type, mask_init_value, d_model, vocab)

Bases: sparse_caption.models.transformer.OutputEmbedding

Define standard linear + softmax generation step.

training: bool
class sparse_caption.models.relation_transformer_prune.PositionwiseFeedForward(mask_type, mask_init_value, d_model, d_ff, dropout=0.03333333333333333)

Bases: sparse_caption.models.transformer.PositionwiseFeedForward

Implements FFN equation.

training: bool
class sparse_caption.models.relation_transformer_prune.RelationTransformerModel(config)

Bases: sparse_caption.pruning.prune.PruningMixin, sparse_caption.models.relation_transformer.RelationTransformerModel

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])
make_model(h=8, dropout=0.03333333333333333)

Helper: Construct a model from hyperparameters.

sparse_caption.models.transformer module

Created on 28 Dec 2020 18:00:01 @author: jiahuei

Based on The Annotated Transformer https://nlp.seas.harvard.edu/2018/04/03/attention.html

sparse_caption.models.transformer.CMHA

alias of sparse_caption.models.transformer.CachedMultiHeadedAttention

class sparse_caption.models.transformer.CachedMultiHeadedAttention(*args, **kwargs)

Bases: sparse_caption.models.transformer.MultiHeadedAttention

reset_cache()
training: bool
class sparse_caption.models.transformer.CachedTransformerBase(config)

Bases: sparse_caption.models.caption_model.CaptionModel

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])
static disable_incremental_decoding(module)
static enable_incremental_decoding(module)
training: bool
class sparse_caption.models.transformer.Decoder(layer, N, share_layer=None)

Bases: torch.nn.modules.module.Module

Generic N layer decoder with masking.

forward(x, memory, src_mask, tgt_mask)

training: bool
class sparse_caption.models.transformer.DecoderLayer(size, self_attn, src_attn, feed_forward, dropout)

Bases: torch.nn.modules.module.Module

Decoder is made of self-attn, src-attn, and feed forward (defined below)

forward(x, memory, src_mask, tgt_mask)

Follow Figure 1 (right) for connections.

training: bool
class sparse_caption.models.transformer.Encoder(layer, N, share_layer=None)

Bases: torch.nn.modules.module.Module

Core encoder is a stack of N layers

forward(x, mask)

Pass the input (and mask) through each layer in turn.

training: bool
class sparse_caption.models.transformer.EncoderDecoder(encoder: Callable, decoder: Callable, src_embed: Callable, tgt_embed: Callable, generator: Callable, autoregressive: bool = True, pad_idx: int = 0)

Bases: torch.nn.modules.module.Module

A standard Encoder-Decoder architecture. Base for this and many other models.

decode(tgt: torch.Tensor, memory: torch.Tensor, memory_mask: torch.Tensor)
Parameters
  • tgt – (N, T)

  • memory – (N, S, E)

  • memory_mask – (N, S)

Returns:

encode(src: torch.Tensor, src_mask: torch.Tensor)
Parameters
  • src – (N, S, E)

  • src_mask – (N, S)

Returns:

forward(src: torch.Tensor, src_mask: torch.Tensor, tgt: torch.Tensor)
Parameters
  • src – (N, S, E)

  • src_mask – (N, S)

  • tgt – (N, T)

Returns:

generate(x)
training: bool
class sparse_caption.models.transformer.EncoderLayer(size, self_attn, feed_forward, dropout)

Bases: torch.nn.modules.module.Module

Encoder is made up of self-attn and feed forward

forward(x, mask)

Follow Figure 1 (left) for connections.

training: bool
class sparse_caption.models.transformer.InputEmbedding(d_model, vocab)

Bases: torch.nn.modules.module.Module

forward(x)

training: bool
class sparse_caption.models.transformer.LayerNorm(features, eps=1e-06)

Bases: torch.nn.modules.module.Module

Construct a layernorm module (See citation for details).

forward(x)

training: bool
sparse_caption.models.transformer.MHA

alias of sparse_caption.models.transformer.MultiHeadedAttention

class sparse_caption.models.transformer.MultiHeadedAttention(h, d_model, dropout=0.1, self_attention=False, share_att=None)

Bases: torch.nn.modules.module.Module

static attention(query, key, value, mask=None, dropout=None)

Compute ‘Scaled Dot Product Attention’
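
As a rough illustration, scaled dot-product attention for a single query vector can be written in pure Python as below. The function name, the list-based inputs, and the boolean mask convention (True means keep) are assumptions for the sketch; the documented method operates on batched tensors and also applies dropout, which is omitted here.

```python
import math


def scaled_dot_product_attention(query, keys, values, mask=None):
    """Attend from one query vector over a list of key/value vectors.

    Assumes at least one position is left unmasked.
    """
    d_k = len(query)
    # Dot product of the query with every key, scaled by sqrt(d_k).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    if mask is not None:
        scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    # Numerically stable softmax over the scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the value vectors.
    dim_v = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim_v)]
    return output, weights


output, weights = scaled_dot_product_attention(
    [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
)
```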

forward(query, key, value, mask=None)

Implements Figure 2 of “Attention Is All You Need”.

training: bool
class sparse_caption.models.transformer.OutputEmbedding(d_model, vocab)

Bases: torch.nn.modules.module.Module

Define standard linear + softmax generation step.

forward(x)

training: bool
class sparse_caption.models.transformer.PositionalEncoding(d_model, dropout, max_len=5000)

Bases: torch.nn.modules.module.Module

Implement the PE function.
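
Assuming the module follows the sinusoidal formulation used in The Annotated Transformer, the PE table can be sketched in pure Python as below. The function name sinusoidal_encoding and the nested-list output are illustrative only; the dropout and caching behaviour of the documented class are not modelled.

```python
import math


def sinusoidal_encoding(max_len, d_model):
    """Build a max_len x d_model table of sinusoidal position encodings.

    Even columns hold sin(pos / 10000^(i / d_model)) and the following odd
    columns hold the matching cosine, where i is the even column index.
    """
    table = []
    for pos in range(max_len):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        # Truncate in case d_model is odd.
        table.append(row[:d_model])
    return table


table = sinusoidal_encoding(max_len=3, d_model=4)
```

In the usual design, the row for each position is added to the token embedding at that position before the first layer.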

forward(x)

reset_cache()
training: bool
class sparse_caption.models.transformer.PositionwiseFeedForward(d_model, d_ff, dropout=0.1)

Bases: torch.nn.modules.module.Module

Implements FFN equation.
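
A minimal sketch of the position-wise feed-forward computation, assuming the usual form FFN(x) = max(0, x W1 + b1) W2 + b2 with a ReLU activation. The function name and the list-of-columns weight layout are assumptions for the example, and dropout is omitted.

```python
def feed_forward(x, w1, b1, w2, b2):
    """Apply FFN(x) = max(0, x W1 + b1) W2 + b2 to one position vector.

    w1 holds d_ff weight columns of length d_model; w2 holds d_model
    weight columns of length d_ff (a hypothetical layout for this sketch).
    """
    # First linear layer followed by ReLU.
    hidden = [
        max(0.0, sum(xi * w for xi, w in zip(x, col)) + b)
        for col, b in zip(w1, b1)
    ]
    # Second linear layer projects back to the model dimension.
    return [sum(h * w for h, w in zip(hidden, col)) + b for col, b in zip(w2, b2)]


y = feed_forward([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0, 1.0]], [0.5])
```

The same weights are applied independently at every sequence position, which is why the layer is called position-wise.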

forward(x)

training: bool
class sparse_caption.models.transformer.SublayerConnection(size, dropout)

Bases: torch.nn.modules.module.Module

A residual connection followed by a layer norm. Note that, for code simplicity, the norm is applied first rather than last.

forward(x, sublayer)

Apply residual connection to any sublayer with the same size.
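
The norm-first residual pattern described for this class can be sketched as below. This is a simplified pure-Python illustration: the helper names are hypothetical, the normalisation has no learned scale or shift, and dropout is left out.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalise a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sublayer_connection(x, sublayer):
    """Return x + sublayer(norm(x)); the sublayer must preserve the size of x."""
    return [xi + yi for xi, yi in zip(x, sublayer(layer_norm(x)))]


# Example with an identity sublayer standing in for attention or feed-forward.
out = sublayer_connection([1.0, 2.0, 3.0], lambda v: v)
```

Because the input is added back unchanged, a sublayer that outputs zeros leaves x untouched.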

training: bool
class sparse_caption.models.transformer.Transformer(config)

Bases: sparse_caption.models.transformer.CachedTransformerBase

COLLATE_FN

alias of sparse_caption.data.collate.UpDownCollate

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])
get_logprobs_state(it, memory, mask, state)

state = [ys.unsqueeze(0)]

make_model()
training: bool

Module contents

Created on 28 Aug 2020 12:43:22 @author: jiahuei

sparse_caption.models.get_model(name: str) → Any
sparse_caption.models.register_model(name)

New models can be added with the register_model() function decorator.

For example:

@register_model('relation_transformer')
class RelationTransformerModel:
    (...)
Parameters

name (str) – the name of the model
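
A minimal sketch of how a decorator-based registry of this kind typically works. The _MODEL_REGISTRY dictionary, the duplicate-name check, and the ToyModel class are assumptions for illustration and may differ from the package's actual implementation.

```python
from typing import Any, Callable, Dict

# Hypothetical module-level registry mapping model names to classes.
_MODEL_REGISTRY: Dict[str, Any] = {}


def register_model(name: str) -> Callable:
    """Return a class decorator that stores the class under `name`."""

    def decorator(cls):
        if name in _MODEL_REGISTRY:
            raise ValueError(f"Model '{name}' is already registered.")
        _MODEL_REGISTRY[name] = cls
        return cls  # Return the class unchanged so it can be used normally.

    return decorator


def get_model(name: str) -> Any:
    """Look up a previously registered model class by name."""
    return _MODEL_REGISTRY[name]


@register_model("toy_model")
class ToyModel:
    pass


model_cls = get_model("toy_model")
```

With this pattern, a model can be selected by a string name, for example from a command-line argument, without the caller importing the class directly.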