πŸ“„ 1x1 conv
πŸ“„ Activation Functions
πŸ“„ Adagrad
πŸ“„ Adaptive Gradient Clipping
πŸ“„ Adaptive Input Representation
πŸ“„ Additive Attention
πŸ“„ Additive coupling layer
πŸ“„ ADVENT
πŸ“„ ALBERT
πŸ“„ Alex Net
πŸ“„ Alphacode
πŸ“„ architecture
πŸ“„ Atrous Convolution
πŸ“„ Attention NMT
πŸ“„ Attention
πŸ“„ AudioLM
πŸ“„ Auto Encoders
πŸ“„ AutoDistill
πŸ“„ autoregressive flows
πŸ“„ Backprop
πŸ“„ Bahdanau Attention
πŸ“„ BART
πŸ“„ Basic GAN
πŸ“„ Basic RNN Architectures
πŸ“„ Basic Transformer
πŸ“„ Basics of Federated Learning
πŸ“„ Beam search
πŸ“„ BEiT
πŸ“„ BERT
πŸ“„ Best Matching Unit
πŸ“„ Bi Directional RNN
πŸ“„ Bias nodes
πŸ“„ Big Bird
πŸ“„ Big-Bench
πŸ“„ BinaryBERT
πŸ“„ BlockDrop
πŸ“„ BlockNeRF
πŸ“„ Bruckhaus - 2024 - RAG Does Not Work for Enterprises
πŸ“„ CAM
πŸ“„ Capsule Layer
πŸ“„ Capsule Network
πŸ“„ Causal 1D Conv
πŸ“„ Causal Dilated Conv
πŸ“„ Causal Language Model
πŸ“„ Channels
πŸ“„ Chat GPT is Not All You Need
πŸ“„ ChatGPT
πŸ“„ Chinchilla
πŸ“„ Classifier Gradients
πŸ“„ CLIP
πŸ“„ Codex
πŸ“„ cognitivemodel
πŸ“„ Collaborative Topic Regression
πŸ“„ Complete AI Pipeline
πŸ“„ Computational Graph
πŸ“„ Conditional GAN
πŸ“„ Conformer
πŸ“„ Content Based Attention
πŸ“„ Contrastive Predictive Coding
πŸ“„ Conv
πŸ“„ ConvBERT
πŸ“„ ConvNeXt
πŸ“„ Convolutional RNN
πŸ“„ coupling flows
πŸ“„ cross-layer parameter sharing
πŸ“„ Curriculum Learning
πŸ“„ CvT
πŸ“„ CycleGAN
πŸ“„ DALL-E 3
πŸ“„ DALL-E
πŸ“„ DALL·E 2
πŸ“„ data2vec
πŸ“„ DCGAN
πŸ“„ DeepFM
πŸ“„ DeepLearning
πŸ“„ DeepNet
πŸ“„ DeepPERF
πŸ“„ DeiT
πŸ“„ Denoising Autoencoder
πŸ“„ Dense Net
πŸ“„ Dense Skip Connections
πŸ“„ Dense Vector Indexes
πŸ“„ Dense
πŸ“„ Density estimation using real NVP
πŸ“„ Depth Efficiency of Neural Networks
πŸ“„ Depthwise Separable
πŸ“„ dGSLM
πŸ“„ Diffusion LM
πŸ“„ Dilated Sliding Window Attention
πŸ“„ Dirac Delta
πŸ“„ DistilBERT
πŸ“„ DLRM
πŸ“„ Dot Product Attention
πŸ“„ Dreamfusion
πŸ“„ Dynamic Eager Execution
πŸ“„ Dynamic Sparsity
πŸ“„ Effect Of Depth
πŸ“„ EfficientNet
πŸ“„ EigenCAM
πŸ“„ ELECTRA
πŸ“„ Elementwise Flows
πŸ“„ ELMO
πŸ“„ Elu
πŸ“„ Encoder Decoder Attention
πŸ“„ Ensemble Distillation
πŸ“„ Exploding Gradient
πŸ“„ FaceNet
πŸ“„ Factorized Embedding Parameters
πŸ“„ Faster RCNN
πŸ“„ FastText
πŸ“„ Feature Correlation
πŸ“„ Fixed Factorization Attention
πŸ“„ Flamingo
πŸ“„ FLASH
πŸ“„ FLAVA
πŸ“„ FlowNet
πŸ“„ FTSwish
πŸ“„ Galactica
πŸ“„ GAN Z Space
πŸ“„ Gato
πŸ“„ GAU
πŸ“„ GELU
πŸ“„ Generalizing Adversarial Explanations with Grad-CAM
πŸ“„ Generative Models
πŸ“„ Generative RNN
πŸ“„ Generative Spoken Language Modeling
πŸ“„ Generative vs Discriminative Models
πŸ“„ Git Commands
πŸ“„ Global and Sliding Window Attention
πŸ“„ Global Average Pooling
πŸ“„ GloVE
πŸ“„ GLOW
πŸ“„ Google NMT
πŸ“„ GPT
πŸ“„ GPT3
πŸ“„ Grad-CAM
πŸ“„ Gradient Accumulation
πŸ“„ Gradient Clipping
πŸ“„ GRU
πŸ“„ Hallucination Text Generation
πŸ“„ Heaviside
πŸ“„ HiFI-GAN Denoising
πŸ“„ HiFI-GAN Synthesis
πŸ“„ Higher Layer Capsule
πŸ“„ Highway Convolutions
πŸ“„ HNSW
πŸ“„ Hopfield networks
πŸ“„ HyTAS
πŸ“„ i-Code
πŸ“„ ill conditioning
πŸ“„ Imagen
πŸ“„ Improved variational inference with inverse autoregressive flows
πŸ“„ Inception
πŸ“„ Influence of image classification accuracy on saliency map estimation
πŸ“„ Instant NeRF
πŸ“„ Interpreting Attention
πŸ“„ inverse autoregressive flows
πŸ“„ Isotropic Architectures
πŸ“„ IVFADC
πŸ“„ Joint Factor Analysis
πŸ“„ Jukebox
πŸ“„ Klue ML Engineer
πŸ“„ LaMDA
πŸ“„ Large Kernel in Attention
πŸ“„ Large Kernel in Convolution
πŸ“„ LASER
πŸ“„ Layers
πŸ“„ Le Net
πŸ“„ Learning Rate Scheduling
πŸ“„ Linear Classifier Probes
πŸ“„ Linear Flows
πŸ“„ Lisht
πŸ“„ Listen Attend Spell
πŸ“„ LLM Guide
πŸ“„ Location Aware Attention
πŸ“„ Location Based Attention
πŸ“„ Long Short Term Memory (LSTM)
πŸ“„ Longformer
πŸ“„ Lost in the Middle How Language Models Use Long Contexts
πŸ“„ Machine Learning Tool Landscape
πŸ“„ MADE - Masked autoencoder for distribution estimation
πŸ“„ Magic3D
πŸ“„ Masked Autoencoders
πŸ“„ Masked autoregressive flow for density estimation
πŸ“„ MCnet
πŸ“„ Minerva
πŸ“„ Mixed chunk attention
πŸ“„ ML Production Flow
πŸ“„ MLIM
πŸ“„ MLM
πŸ“„ MLOps Learning
πŸ“„ Mobile Net
πŸ“„ MobileOne
πŸ“„ Model Capacity
πŸ“„ Multi Head Attention
πŸ“„ Multi Scale Flows
πŸ“„ Multiplicative Attention
πŸ“„ Muse
πŸ“„ Nasnet
πŸ“„ Network Dissection Quantifying Interpretability of Deep Visual Representations
πŸ“„ Neural Network Architecture Cheat Sheet
πŸ“„ Neural Probabilistic Model
πŸ“„ Neural Text Degeneration
πŸ“„ NICE - non linear independent components estimation
πŸ“„ Noisy Relu
πŸ“„ Non Maxima Suppression
πŸ“„ On the overlap between Grad-CAM saliency maps and explainable visual features in skin cancer images
πŸ“„ OpenML x SURF
πŸ“„ OPT
πŸ“„ Padded Conv
πŸ“„ PaLM
πŸ“„ PEER
πŸ“„ Perceptron
πŸ“„ pGLSM
πŸ“„ Phase Transition Model Zoo
πŸ“„ PhD vs Startup vs Big Company
πŸ“„ Phenaki
πŸ“„ Phrase Representation Learning
πŸ“„ Pix2Seq
πŸ“„ PixelShuffle
πŸ“„ Point Cloud
πŸ“„ PointNet++
πŸ“„ Pooling
πŸ“„ Position Encoding
πŸ“„ Position Wise Feed Forward
πŸ“„ PQ
πŸ“„ PRelu
πŸ“„ Primary Capsule
πŸ“„ Problems facing MLOps
πŸ“„ Properties Required by Network Layers for Normalizing Flows
πŸ“„ RAGAS - automated evaluation of RAG
πŸ“„ RandAugment
πŸ“„ Real Time Image Saliency for Black Box Classifiers
πŸ“„ Receptive field
πŸ“„ Recurrent
πŸ“„ Region Proposal
πŸ“„ RegNet
πŸ“„ Relative Multi Head Self Attention
πŸ“„ Relu
πŸ“„ RepLKNet
πŸ“„ Representational Capacity
πŸ“„ RepVGG
πŸ“„ Res Net D
πŸ“„ Res Net
πŸ“„ Research Engineer in Human Modeling for Automated Driving delft
πŸ“„ residual flows
πŸ“„ ResNeXt
πŸ“„ Restricted Boltzmann Machine
πŸ“„ RetinaNet
πŸ“„ RETRO
πŸ“„ Rmsprop
πŸ“„ RoBERTa
πŸ“„ Routing by Agreement
πŸ“„ S2ST
πŸ“„ Saddle Points
πŸ“„ Salemi and Zamani - 2024 - Evaluating Retrieval Quality in Retrieval-Augmente
πŸ“„ Salience Map
πŸ“„ Scaled Dot Product Attention
πŸ“„ Scaling matrix for coupling layers
πŸ“„ Scene based text to image generation
πŸ“„ ScoreCAM
πŸ“„ SegNet
πŸ“„ Self Attention GAN
πŸ“„ Self Attention
πŸ“„ Seq2Seq
πŸ“„ Sequential auto encoding of neural embeddings - SANE
πŸ“„ Shake-Drop
πŸ“„ Shake-Shake
πŸ“„ ShuffleNet
πŸ“„ Sigmoid
πŸ“„ SimCLR
πŸ“„ Simulations Of language
πŸ“„ Skip Connection
πŸ“„ SLAK
πŸ“„ Sliding Window Attention
πŸ“„ Soft Attention
πŸ“„ Softmax
πŸ“„ Softplus
πŸ“„ Soundify
πŸ“„ Sparse Encoder Indexes
πŸ“„ Sparse Evolutionary Training
πŸ“„ Sparse Transformer
πŸ“„ Spatial Transformer
πŸ“„ Speaker Verification
πŸ“„ Speech Emotion Recognition
πŸ“„ Speech Recognition
πŸ“„ Speech Resynthesis
πŸ“„ Spiking Networks
πŸ“„ SRN
πŸ“„ Stable Diffusion
πŸ“„ Stack GAN
πŸ“„ Stacking RNN
πŸ“„ StarGAN v2
πŸ“„ StarGAN
πŸ“„ Static Graph Execution
πŸ“„ Strided Attention
πŸ“„ Strided
πŸ“„ Style GAN
πŸ“„ Swin Transformer
πŸ“„ Swish
πŸ“„ Tacotron
πŸ“„ Tanh
πŸ“„ Teacher Forcing
πŸ“„ Temporal Conv
πŸ“„ TemporalLearning
πŸ“„ Textless Speech Emotion Conversion
πŸ“„ The elephant in the interpretability room
πŸ“„ Thesis Flow
πŸ“„ TinyBERT
πŸ“„ Token Embedding
πŸ“„ Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms
πŸ“„ Transformer-XL
πŸ“„ Transformer
πŸ“„ Transposed Conv
πŸ“„ Tug of war between RAG and LLM prior
πŸ“„ Types of Normalizing flows
πŸ“„ ULMFit
πŸ“„ Un-LSTM
πŸ“„ Unet
πŸ“„ usermodel
πŸ“„ VAE
πŸ“„ Vanishing Gradient
πŸ“„ Vapnik chervonenkis dimension
πŸ“„ Vgg
πŸ“„ VGGish
πŸ“„ VICReg
πŸ“„ ViLT
πŸ“„ Vision Transformer
πŸ“„ VisualGPT
πŸ“„ VL-BEIT
πŸ“„ Voronoi Cell
πŸ“„ wave2vec
πŸ“„ WaveGlow
πŸ“„ WebGPT
πŸ“„ Weight space learning
πŸ“„ What is being Transferred in transfer learning
πŸ“„ Whisper
πŸ“„ Wide Deep Recommender
πŸ“„ Window Based Regression
πŸ“„ Word2Vec
πŸ“„ X Vectors
πŸ“„ Xception
πŸ“„ XLM-R
πŸ“„ XLNet
πŸ“„ YOLO
πŸ“„ Z-Space Entanglement
πŸ“„ Zeiler Fergus