sparse_caption.models package

Submodules

sparse_caption.models.att_model module

Created on 14 Oct 2020 14:19:19 https://github.com/ruotianluo/self-critical.pytorch/tree/3.2

This file contains the UpDown model.

UpDown is from “Bottom-Up and Top-Down Attention for Image Captioning and VQA” (https://arxiv.org/abs/1707.07998). However, it may not be identical to the authors’ architecture.

class sparse_caption.models.att_model.AttModel(config, tokenizer: Optional[sparse_caption.tokenizer.Tokenizer] = None)

Bases: sparse_caption.models.caption_model.CaptionModel

clip_att(att_feats, att_masks)

get_logprobs_state(it, fc_feats, att_feats, p_att_feats, att_masks, state, output_logsoftmax=1)

init_hidden(bsz)

make_model()

training: bool

class sparse_caption.models.att_model.Attention(config)

Bases: torch.nn.modules.module.Module

forward(h, att_feats, p_att_feats, att_masks=None)

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for the forward pass must be defined within this function, call the Module instance itself rather than forward() directly: calling the instance runs the registered hooks, whereas calling forward() directly silently ignores them.

training: bool

class sparse_caption.models.att_model.UpDownCore(config, use_maxout=False)

Bases: torch.nn.modules.module.Module

forward(xt, fc_feats, att_feats, p_att_feats, state, att_masks=None)

training: bool

class sparse_caption.models.att_model.UpDownModel(config, tokenizer: Optional[sparse_caption.tokenizer.Tokenizer] = None)

Bases: sparse_caption.models.att_model.AttModel

COLLATE_FN: alias of sparse_caption.data.collate.AttCollate

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])

training: bool

sparse_caption.models.att_model_prune module

Created on 14 Oct 2020 14:34:47 @author: jiahuei

class sparse_caption.models.att_model_prune.AttModel(config, tokenizer: Optional[sparse_caption.tokenizer.Tokenizer] = None)

Bases: sparse_caption.models.att_model.AttModel

make_model()

training: bool

class sparse_caption.models.att_model_prune.Attention(config)

Bases: sparse_caption.models.att_model.Attention

training: bool

class sparse_caption.models.att_model_prune.UpDownCore(config, use_maxout=False)

Bases: sparse_caption.models.att_model.UpDownCore

training: bool

class sparse_caption.models.att_model_prune.UpDownModel(config, tokenizer: Optional[sparse_caption.tokenizer.Tokenizer] = None)

Bases: sparse_caption.pruning.prune.PruningMixin, sparse_caption.models.att_model_prune.AttModel

COLLATE_FN: alias of sparse_caption.data.collate.AttCollate

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])

sparse_caption.models.caption_model module

https://github.com/ruotianluo/self-critical.pytorch/tree/3.2

This file contains the ShowAttendTell and AllImg models.

ShowAttendTell is from “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention” (https://arxiv.org/abs/1502.03044).

AllImg is a model in which the image feature is concatenated with the word embedding at every time step to form the input of the LSTM.

class sparse_caption.models.caption_model.CaptionModel

Bases: torch.nn.modules.module.Module

batch_beam_search(init_state, init_logprobs, *args, **kwargs)

forward(*args, **kwargs)

static sample_next_word(logprobs, sample_method, temperature)
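The signature suggests that the next token is chosen from per-token log-probabilities. As a rough, framework-free sketch of that idea: the method names "greedy" and "sample", the plain-list input, and the helper name are assumptions made for illustration, not the documented behaviour of sample_next_word().

```python
import math
import random


def sample_next_word_sketch(logprobs, sample_method="greedy", temperature=1.0, rng=None):
    """Pick one token index from a list of log-probabilities (hypothetical helper)."""
    if sample_method == "greedy":
        # Greedy decoding: take the most likely token.
        best = max(range(len(logprobs)), key=lambda i: logprobs[i])
        return best, logprobs[best]
    if sample_method == "sample":
        rng = rng or random.Random()
        # Temperature scaling: divide the log-probabilities, then renormalise.
        scaled = [lp / temperature for lp in logprobs]
        peak = max(scaled)
        weights = [math.exp(s - peak) for s in scaled]
        idx = rng.choices(range(len(logprobs)), weights=weights, k=1)[0]
        return idx, logprobs[idx]
    raise ValueError(f"unknown sample_method: {sample_method}")
```

A temperature below 1 sharpens the distribution towards the greedy choice, while a temperature above 1 flattens it.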

training: bool

sparse_caption.models.relation_transformer module

https://github.com/yahoo/object_relation_transformer

Please see the LICENSE file in the project root for terms.

class sparse_caption.models.relation_transformer.BoxMultiHeadedAttention(h, d_model, trigonometric_embedding=True, dropout=0.1, share_att=None)

Bases: torch.nn.modules.module.Module

Self-attention layer with relative position weights. Follows the paper “Relation Networks for Object Detection” (https://arxiv.org/pdf/1711.11575.pdf).

static BoxRelationalEmbedding(f_g, dim_g=64, wave_len=1000, trigonometric_embedding=True)

Given a tensor with bbox coordinates for detected objects on each batch image, this function computes a matrix for each image with entry (i,j) given by a vector representation of the displacement between the coordinates of bbox_i and bbox_j.

input: np.array of shape=(batch_size, max_nr_bounding_boxes, 4)

output: np.array of shape=(batch_size, max_nr_bounding_boxes, max_nr_bounding_boxes, 64)
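The displacement vector for a pair of boxes can be sketched without any tensor library. The version below handles a single image and assumes the log-ratio geometry terms of the Relation Networks paper followed by a sinusoidal expansion; the clamp value, the +1 width/height offset, the factor of 100, and the ordering of the output entries are assumptions for illustration and may differ from the actual implementation.

```python
import math


def box_relational_embedding_sketch(boxes, dim_g=64, wave_len=1000):
    """Return an n x n x dim_g nested list of pairwise box-displacement embeddings.

    `boxes` is a list of (x_min, y_min, x_max, y_max) tuples for one image.
    """
    eps = 1e-3  # assumed clamp that keeps log() finite when centres coincide
    geom = []
    for x0, y0, x1, y1 in boxes:
        w, h = (x1 - x0) + 1.0, (y1 - y0) + 1.0
        geom.append(((x0 + x1) / 2.0, (y0 + y1) / 2.0, w, h))

    # 4 displacement terms x n_freq frequencies x (sin, cos) = dim_g entries.
    n_freq = dim_g // 8
    freqs = [1.0 / (wave_len ** (k / n_freq)) for k in range(n_freq)]

    out = []
    for cx_i, cy_i, w_i, h_i in geom:
        row = []
        for cx_j, cy_j, w_j, h_j in geom:
            delta = (
                math.log(max(abs(cx_i - cx_j) / w_i, eps)),
                math.log(max(abs(cy_i - cy_j) / h_i, eps)),
                math.log(w_i / w_j),
                math.log(h_i / h_j),
            )
            angles = [100.0 * d * f for d in delta for f in freqs]
            row.append([math.sin(a) for a in angles] + [math.cos(a) for a in angles])
        out.append(row)
    return out
```

With the default dim_g=64, each pair (i, j) yields a 64-entry vector, matching the last dimension of the documented output shape.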

static box_attention(query, key, value, box_relation_embds_matrix, mask=None, dropout=None): Compute scaled dot-product attention as in the paper “Relation Networks for Object Detection”. Follows the implementation in https://github.com/heefe92/Relation_Networks-pytorch/blob/master/model.py#L1026-L1055

forward(input_query, input_key, input_value, input_box, mask=None): Implements Figure 2 of “Relation Networks for Object Detection”.

training: bool

class sparse_caption.models.relation_transformer.Encoder(layer, N, share_layer=None)

Bases: torch.nn.modules.module.Module

Core encoder is a stack of N layers

forward(x, box, mask): Pass the input (and mask) through each layer in turn.

training: bool

class sparse_caption.models.relation_transformer.EncoderDecoder(encoder, decoder, src_embed, tgt_embed, generator)

Bases: torch.nn.modules.module.Module

A standard Encoder-Decoder architecture. Base for this and many other models.

decode(memory, src_mask, tgt, tgt_mask)

encode(src, boxes, src_mask)

forward(src, boxes, tgt, src_mask, tgt_mask): Take in and process masked src and target sequences.

training: bool

class sparse_caption.models.relation_transformer.EncoderLayer(size, self_attn, feed_forward, dropout)

Bases: torch.nn.modules.module.Module

Each encoder layer is made up of self-attention and feed-forward sublayers.

forward(x, box, mask): Follow Figure 1 (left) of “Attention Is All You Need” for connections.

training: bool

class sparse_caption.models.relation_transformer.RelationTransformerModel(config)

Bases: sparse_caption.models.transformer.CachedTransformerBase

COLLATE_FN: alias of sparse_caption.data.collate.ObjectRelationCollate

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])

static clip_att(att_feats, att_masks)

get_logprobs_state(it, memory, mask, state): state = [ys.unsqueeze(0)]

make_model(h=8, dropout=0.1): Helper: Construct a model from hyperparameters.

static subsequent_mask(size): Mask out subsequent positions.
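Masking out subsequent positions means that position i may attend only to positions 0..i. A minimal sketch using nested lists instead of tensors (the helper name and the True-means-visible convention are assumptions for illustration):

```python
def subsequent_mask_sketch(size):
    """Return a size x size lower-triangular mask (True = position may be attended to)."""
    return [[j <= i for j in range(size)] for i in range(size)]


# Each row i allows columns 0..i and hides the future positions.
for row in subsequent_mask_sketch(4):
    print("".join("x" if allowed else "." for allowed in row))
```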

training: bool

sparse_caption.models.relation_transformer_prune module

Created on 09 Oct 2020 17:27:20 @author: jiahuei

class sparse_caption.models.relation_transformer_prune.BoxMultiHeadedAttention(mask_type, mask_init_value, h, d_model, trigonometric_embedding=True, dropout=0.03333333333333333, share_att=None)

Bases: sparse_caption.models.relation_transformer.BoxMultiHeadedAttention

Self-attention layer with relative position weights. Follows the paper “Relation Networks for Object Detection” (https://arxiv.org/pdf/1711.11575.pdf).

training: bool

class sparse_caption.models.relation_transformer_prune.CachedMultiHeadedAttention(mask_type, mask_init_value, h, d_model, dropout=0.03333333333333333, self_attention=False, share_att=None)

Bases: sparse_caption.models.transformer.CachedMultiHeadedAttention

training: bool

class sparse_caption.models.relation_transformer_prune.Embeddings(mask_type, mask_init_value, d_model, vocab)

Bases: sparse_caption.models.transformer.InputEmbedding

training: bool

class sparse_caption.models.relation_transformer_prune.EncoderDecoder(*, mask_type, mask_freeze_scope='', **kwargs)

Bases: sparse_caption.pruning.prune.PruningMixin, sparse_caption.models.relation_transformer.EncoderDecoder

A standard Encoder-Decoder architecture. Base for this and many other models.

class sparse_caption.models.relation_transformer_prune.Generator(mask_type, mask_init_value, d_model, vocab)

Bases: sparse_caption.models.transformer.OutputEmbedding

Define standard linear + softmax generation step.

training: bool

class sparse_caption.models.relation_transformer_prune.PositionwiseFeedForward(mask_type, mask_init_value, d_model, d_ff, dropout=0.03333333333333333)

Bases: sparse_caption.models.transformer.PositionwiseFeedForward

Implements FFN equation.

training: bool

class sparse_caption.models.relation_transformer_prune.RelationTransformerModel(config)

Bases: sparse_caption.pruning.prune.PruningMixin, sparse_caption.models.relation_transformer.RelationTransformerModel

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])

make_model(h=8, dropout=0.03333333333333333): Helper: Construct a model from hyperparameters.

sparse_caption.models.transformer module

Created on 28 Dec 2020 18:00:01 @author: jiahuei

Based on The Annotated Transformer https://nlp.seas.harvard.edu/2018/04/03/attention.html

sparse_caption.models.transformer.CMHA: alias of sparse_caption.models.transformer.CachedMultiHeadedAttention

class sparse_caption.models.transformer.CachedMultiHeadedAttention(*args, **kwargs)

Bases: sparse_caption.models.transformer.MultiHeadedAttention

reset_cache()

training: bool

class sparse_caption.models.transformer.CachedTransformerBase(config)

Bases: sparse_caption.models.caption_model.CaptionModel

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])

static disable_incremental_decoding(module)

static enable_incremental_decoding(module)

training: bool

class sparse_caption.models.transformer.Decoder(layer, N, share_layer=None)

Bases: torch.nn.modules.module.Module

Generic N layer decoder with masking.

forward(x, memory, src_mask, tgt_mask)

training: bool

class sparse_caption.models.transformer.DecoderLayer(size, self_attn, src_attn, feed_forward, dropout)

Bases: torch.nn.modules.module.Module

Each decoder layer is made up of self-attention, source-attention, and feed-forward sublayers.

forward(x, memory, src_mask, tgt_mask): Follow Figure 1 (right) of “Attention Is All You Need” for connections.

training: bool

class sparse_caption.models.transformer.Encoder(layer, N, share_layer=None)

Bases: torch.nn.modules.module.Module

Core encoder is a stack of N layers

forward(x, mask): Pass the input (and mask) through each layer in turn.

training: bool

class sparse_caption.models.transformer.EncoderDecoder(encoder: Callable, decoder: Callable, src_embed: Callable, tgt_embed: Callable, generator: Callable, autoregressive: bool = True, pad_idx: int = 0)

Bases: torch.nn.modules.module.Module

A standard Encoder-Decoder architecture. Base for this and many other models.

decode(tgt: torch.Tensor, memory: torch.Tensor, memory_mask: torch.Tensor)

Parameters

tgt – (N, T)
memory – (N, S, E)
memory_mask – (N, S)

Returns:

encode(src: torch.Tensor, src_mask: torch.Tensor)

Parameters

src – (N, S, E)
src_mask – (N, S)

Returns:

forward(src: torch.Tensor, src_mask: torch.Tensor, tgt: torch.Tensor)

Parameters

src – (N, S, E)
src_mask – (N, S)
tgt – (N, T)

Returns:

generate(x)

training: bool

class sparse_caption.models.transformer.EncoderLayer(size, self_attn, feed_forward, dropout)

Bases: torch.nn.modules.module.Module

Encoder is made up of self-attn and feed forward

forward(x, mask): Follow Figure 1 (left) for connections.

training: bool

class sparse_caption.models.transformer.InputEmbedding(d_model, vocab)

Bases: torch.nn.modules.module.Module

forward(x)

training: bool

class sparse_caption.models.transformer.LayerNorm(features, eps=1e-06)

Bases: torch.nn.modules.module.Module

Construct a layernorm module (See citation for details).

forward(x)

training: bool

sparse_caption.models.transformer.MHA: alias of sparse_caption.models.transformer.MultiHeadedAttention

class sparse_caption.models.transformer.MultiHeadedAttention(h, d_model, dropout=0.1, self_attention=False, share_att=None)

Bases: torch.nn.modules.module.Module

static attention(query, key, value, mask=None, dropout=None): Compute ‘Scaled Dot Product Attention’
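Scaled dot-product attention computes softmax(QK^T / sqrt(d_k)) V. The sketch below spells this out for a single head with nested lists; the helper name, the boolean mask convention (True = keep), the -1e9 fill value, and the omission of dropout are assumptions for illustration rather than a description of this method's internals.

```python
import math


def scaled_dot_product_attention_sketch(query, key, value, mask=None):
    """query: T x d, key: S x d, value: S x d_v (nested lists); mask: T x S of bools."""
    d_k = len(query[0])
    outputs, weights = [], []
    for t, q in enumerate(query):
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in key]
        if mask is not None:
            # Masked positions get a large negative score, hence ~0 weight.
            scores = [s if keep else -1e9 for s, keep in zip(scores, mask[t])]
        # Numerically stable softmax over the source positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        p = [e / total for e in exps]
        weights.append(p)
        # Weighted sum of the value vectors.
        outputs.append(
            [sum(p_s * v[c] for p_s, v in zip(p, value)) for c in range(len(value[0]))]
        )
    return outputs, weights
```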

forward(query, key, value, mask=None): Implements Figure 2

training: bool

class sparse_caption.models.transformer.OutputEmbedding(d_model, vocab)

Bases: torch.nn.modules.module.Module

Define standard linear + softmax generation step.

forward(x)

training: bool

class sparse_caption.models.transformer.PositionalEncoding(d_model, dropout, max_len=5000)

Bases: torch.nn.modules.module.Module

Implement the PE function.
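The PE function is not spelled out here. Assuming it is the sinusoidal encoding used in The Annotated Transformer, on which this module is based, the table of encodings can be sketched as follows (the helper name and the nested-list representation are illustrative only):

```python
import math


def positional_encoding_sketch(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal position encodings."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Even index i and odd index i + 1 share one frequency.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe
```

In that formulation, row `pos` of the table is added to the embedding at sequence position `pos` before dropout is applied.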

forward(x)

reset_cache()

training: bool

class sparse_caption.models.transformer.PositionwiseFeedForward(d_model, d_ff, dropout=0.1)

Bases: torch.nn.modules.module.Module

Implements FFN equation.

forward(x)

training: bool

class sparse_caption.models.transformer.SublayerConnection(size, dropout)

Bases: torch.nn.modules.module.Module

A residual connection followed by a layer norm. Note that, for code simplicity, the norm is applied first as opposed to last.

forward(x, sublayer): Apply residual connection to any sublayer with the same size.
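Applying the norm first means the sublayer sees a normalised input while the residual path carries the raw input, i.e. x + sublayer(norm(x)). A minimal sketch on a single feature vector, with dropout and the learnable gain and bias of the layer norm left out (both helper names are hypothetical):

```python
import math


def layer_norm_sketch(x, eps=1e-6):
    """Normalise a feature vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sublayer_connection_sketch(x, sublayer):
    """Pre-norm residual connection: x + sublayer(norm(x))."""
    y = sublayer(layer_norm_sketch(x))
    if len(y) != len(x):
        raise ValueError("sublayer must preserve the feature size")
    return [a + b for a, b in zip(x, y)]
```

Because the sublayer output is added back to x, the sublayer must return a vector of the same size as its input.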

training: bool

class sparse_caption.models.transformer.Transformer(config)

Bases: sparse_caption.models.transformer.CachedTransformerBase

COLLATE_FN: alias of sparse_caption.data.collate.UpDownCollate

static add_argparse_args(parser: Union[argparse._ArgumentGroup, argparse.ArgumentParser])

get_logprobs_state(it, memory, mask, state): state = [ys.unsqueeze(0)]

make_model()

training: bool

Module contents

Created on 28 Aug 2020 12:43:22 @author: jiahuei

sparse_caption.models.get_model(name: str) → Any

sparse_caption.models.register_model(name)

New models can be added with the register_model() function decorator.

For example:

@register_model('relation_transformer')
class RelationTransformerModel:
    (...)

Parameters: name (str) – the name of the model
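A decorator-based registry of this kind can be sketched as below. This is only an illustration of the pattern: the dictionary name MODEL_REGISTRY, the duplicate-name check, and the ToyModel class are assumptions, and the real register_model() and get_model() may behave differently.

```python
MODEL_REGISTRY = {}  # hypothetical mapping from model name to model class


def register_model(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise ValueError(f"Cannot register duplicate model: {name}")
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator


def get_model(name):
    """Look up a previously registered model class by name."""
    return MODEL_REGISTRY[name]


@register_model("toy_model")
class ToyModel:
    """Placeholder class used only to demonstrate registration."""
```

After the decorated class is defined, get_model("toy_model") returns the ToyModel class, so callers can select a model by name from a config value.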