Bazel

Introduction

This post is a basic introduction to the Bazel build framework. It summarizes the advantages of using Bazel, gives a technical overview of its core concepts and terminology, lists the files needed for a quick build setup, and notes the basic requirements and limitations of Bazel projects. The content is drawn from the official Bazel documentation, presentations, tutorials and my own experience using Bazel.

General Overview

Benefits & Advantages

  • Build Avoidance
    • Only rebuilds what is necessary using local and distributed caching, optimized dependency analysis and parallel execution, resulting in fast and incremental builds
  • Fast & reliable
    • Caches all previous work & tracks changes to both file content & build commands
    • Only rebuilds what has changed
    • Can be set up to build in a highly parallel & incremental manner
  • Multi Language
    • Supports projects in multiple languages
  • Multi Platform
    • Runs on Linux, macOS & Windows
    • Can build binaries & deployable packages for desktop, server & mobile from same project
  • Scalability
    • Supports large codebases across multiple repositories
    • Can handle 100K+ source files
  • Extensible
    • Many languages are already supported
    • Can be extended to support any language
  • Ease of Use
    • Uses a high-level build language
    • Describes build properties at a semantic level
    • Operates on concepts of libraries, binaries, scripts & data sets

Technical Overview

Concepts & Terminologies

Workspace

  • Root project directory which contains:
    • The source code files to be built
    • Symbolic links to directories with build outputs
    • A file named WORKSPACE, which identifies the directory and its contents as a Bazel workspace

Repositories

  • Used to organize code
    • Workspace is the root of the main repository, also called @
    • External repositories are defined in the WORKSPACE file using workspace rules
    • External repositories often are workspaces by themselves
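As a minimal sketch of how an external repository is declared, the WORKSPACE file below uses the built-in local_repository workspace rule, assuming a WORKSPACE-based setup as described in this post. The workspace name my_project, the repository name shared_libs and its path are hypothetical placeholders.

```starlark
# WORKSPACE (at the root of the main repository)
# All names and paths below are hypothetical placeholders.
workspace(name = "my_project")

# Makes the directory ../shared_libs, which is itself a Bazel workspace,
# available as the external repository @shared_libs.
local_repository(
    name = "shared_libs",
    path = "../shared_libs",
)
```

Targets inside that directory can then be referenced from the main repository with labels that start with @shared_libs.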

Package

  • Basic unit to organize code that belongs together
    • It is a collection of related files and a specification of the dependencies among them
    • Has a directory within the workspace and contains a BUILD file
    • Includes all files in its directory, plus all subdirectories beneath it, except those which themselves contain a BUILD file
    • The name of a package is the name of the directory containing its BUILD file, relative to the top-level directory of the source tree
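The package boundary rule can be illustrated with a small, hypothetical source tree. The sketch below shows two BUILD files in one listing, separated by comments; all file and target names are made up for illustration, and the built-in C++ rules are assumed.

```starlark
# Hypothetical source tree:
#
#   my/app/BUILD
#   my/app/app.cc
#   my/app/data/input.txt
#   my/app/tests/BUILD
#   my/app/tests/test.cc

# ---- my/app/BUILD: defines the package "my/app" ----
# data/input.txt belongs to this package because my/app/data has no
# BUILD file of its own.
cc_binary(
    name = "app",
    srcs = ["app.cc"],
    data = ["data/input.txt"],
)

# ---- my/app/tests/BUILD: defines the separate package "my/app/tests" ----
# test.cc is not part of "my/app", because its directory contains a
# BUILD file.
cc_test(
    name = "app_test",
    srcs = ["test.cc"],
)
```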

Targets

  • Elements of a package. Common types of targets are:
    1. Files: Further divided into 2 types
      • Source Files
      • Generated Files
    2. Rules:
      • Specifies the relationship between input and output files, including the steps to generate the outputs from the inputs
      • Output of a rule is always a generated file
      • Input to a rule could be source or generated file
      • Inputs to a rule could also be other rules
      • The files generated by a rule always belong to the same package as the rule itself, i.e., it is not possible to generate files into another package
      • A rule’s inputs can come from another package though
      • Every rule has a set of attributes. The applicable attributes for a given rule, and the significance and semantics of each attribute are a function of the rule’s class
      • Common Attributes:
        • srcs: List of labels naming the targets that are inputs to the rule
        • outs: List of output labels. Outputs cannot belong to another package, so they are always written in relative form
        • deps: Separately-compiled modules providing header files, symbols, libraries, data, etc
        • data: Files that are not source code but are needed by the rule at run time. Can also refer to directories, but prefer data = glob(["testdata/**"]) in that case: the glob enumerates the files inside the directory, so Bazel detects changes to individual files rather than only changes to the directory itself
    3. Package Groups: Sets of packages whose purpose is to limit accessibility of certain rules
      • Have two properties, the list of packages they contain and their name, and are defined by the package_group function
      • Do not generate or consume files
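To make the three kinds of targets concrete, here is a sketch of a BUILD file for a hypothetical package my/app. The file names, target names and the //my/tools package are assumptions made for illustration; the rules used (genrule, cc_library, cc_binary, package_group) are built into Bazel.

```starlark
# my/app/BUILD (hypothetical package)

# A rule with one source file as input and one generated file as output.
# The generated file version.h belongs to this package.
genrule(
    name = "gen_version",
    srcs = ["version.txt"],
    outs = ["version.h"],
    # $< expands to the single input, $@ to the single output.
    cmd = "sed 's/^/#define APP_VERSION /' $< > $@",
)

# A rule whose inputs are source files plus the generated file above.
cc_library(
    name = "greeter",
    srcs = ["greeter.cc"],
    hdrs = [
        "greeter.h",
        "version.h",  # generated by :gen_version
    ],
    # Only packages listed in the package group below may depend on this.
    visibility = [":friends"],
)

# A rule whose inputs include another rule (via deps) and run-time data.
cc_binary(
    name = "app",
    srcs = ["main.cc"],
    data = glob(["testdata/**"]),
    deps = [":greeter"],
)

# A package group: it has a name and a list of packages, and neither
# generates nor consumes files.
package_group(
    name = "friends",
    packages = ["//my/tools/..."],
)
```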

Labels

  • Name of the target
    • Eg. @repo//package:target
    • If the label refers to the same repository, @repo can be left out
    • If the colon and target name are omitted, the target name is assumed to be the last component of the package name
      • Eg. //my/app is same as //my/app:app
    • The package part of a label can be omitted in a BUILD file when referring to a target in the same package
      • Eg. :app
    • Relative labels cannot be used to refer to targets in other packages
    • Labels starting with @// are references to the main repository, which will still work even from external repositories. Therefore @//a/b/c is different from //a/b/c when referenced from an external repository. The former refers back to the main repository, while the latter looks for //a/b/c in the external repository itself
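The label forms above could appear in a BUILD file roughly as follows. This is only a sketch: the package my/app, the package my/lib, the repository @shared_libs and every target name are hypothetical.

```starlark
# my/app/BUILD in the main repository (all names are hypothetical)
cc_binary(
    name = "app",
    srcs = ["main.cc"],  # a file in this package, i.e. //my/app:main.cc
    deps = [
        ":greeter",  # same package, short for //my/app:greeter
        "//my/lib",  # colon omitted, short for //my/lib:lib
        "//my/lib:helpers",  # explicit target in another package
        "@shared_libs//strings:utils",  # target in an external repository
    ],
)
```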

Macro

  • Function called from the BUILD file that can instantiate rules
    • Written as an ordinary function in a .bzl file
    • Can be loaded into and called from the build file
    • Mainly used for encapsulation and code reuse of existing rules and other macros
    • Suitable for simple tasks such as preprocessing a source file or compressing a file
    • For more complicated tasks, such as adding support for a new programming language, use rules instead
    • Fully expanded by the end of the loading phase
    • bazel query --output=build //my/path:all 
      • shows the BUILD file after evaluation, expanding all macros, globs and loops
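A minimal sketch of a macro for the "compressing a file" case might look like this. The file my/defs.bzl, the macro name gzip_file and the target names are hypothetical; native.genrule is the standard way to instantiate the built-in genrule from a .bzl file.

```starlark
# ---- my/defs.bzl (hypothetical extension file) ----
def gzip_file(name, src, **kwargs):
    """Macro that compresses `src` into `<name>.gz` using a genrule."""
    native.genrule(
        name = name,
        srcs = [src],
        outs = [name + ".gz"],
        # $< expands to the single input, $@ to the single output.
        cmd = "gzip -c $< > $@",
        **kwargs
    )

# ---- my/BUILD (hypothetical) ----
load("//my:defs.bzl", "gzip_file")

gzip_file(
    name = "notes",
    src = "notes.txt",
)
```

Running bazel query --output=build //my:all on such a package should show the genrule that the macro expanded to, rather than the gzip_file call.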

Aspects

  • Allow augmenting build dependency graphs with additional information and actions
    • IDEs can collect information about the project using aspects
    • Code generation tools can leverage aspects to execute on their inputs in a “target-agnostic” manner
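For a feel of what an aspect looks like, the sketch below closely follows the introductory example in the official aspects documentation: it walks the dependency graph along deps and prints the source files of every rule it visits. The file name print.bzl and its location in a tools package are assumptions.

```starlark
# tools/print.bzl (hypothetical location)
def _print_aspect_impl(target, ctx):
    # Only rules that have a srcs attribute are of interest here.
    if hasattr(ctx.rule.attr, "srcs"):
        # Print the path of every file that makes up the sources.
        for src in ctx.rule.attr.srcs:
            for f in src.files.to_list():
                print(f.path)
    return []

print_aspect = aspect(
    implementation = _print_aspect_impl,
    # Propagate the aspect along the "deps" attribute of each rule.
    attr_aspects = ["deps"],
)
```

Assuming that layout, an invocation along the lines of bazel build //my/app:app --aspects //tools:print.bzl%print_aspect would apply the aspect to the target and its transitive deps without changing any BUILD file.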

Limitations

  • All the inputs & dependencies in a Bazel build must be in the same workspace. Files residing in different workspaces are independent of each other unless linked
  • Read side effects (where data flows inwards into Bazel without a registered dependency) are not allowed, because they are an unregistered dependency and as such, can cause incorrect incremental builds

Setup

  • To get started with a Bazel build, we need at least two files, a WORKSPACE file and a BUILD file, plus optional extension files:
    • WORKSPACE file: Create a file named “WORKSPACE” at the root of the project directory. This file
      • Identifies the directory and its contents as a Bazel workspace and lives at the root of the project’s directory structure
      • May be empty, or may contain references to external dependencies required to build the outputs
      • Subdirectories within a workspace that have their own WORKSPACE file are ignored by Bazel, since they form their own workspace
    • BUILD files: Tell Bazel how to build different parts of the project
      • Evaluated using an imperative language, Starlark
      • BUILD files cannot contain function definitions, for statements or if statements (but list comprehensions and if expressions are allowed)
      • BUILD files cannot perform arbitrary I/O, and depend only on a known set of inputs
    • Extension files (Optional): Files ending in .bzl
      • Contain rules, functions or constants
      • Symbols starting with _ are not exported and cannot be loaded from another file
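Putting the pieces together, a quick setup could consist of an empty WORKSPACE file plus the two files sketched below, shown in one listing. The file config.bzl, the module names and the -DLEXER_TRACE flag are hypothetical; the sketch mainly shows what a BUILD file may contain (a list comprehension and an if expression instead of for and if statements) and how the underscore rule works for .bzl symbols.

```starlark
# ---- config.bzl (hypothetical extension file at the workspace root) ----
# Exported symbols: can be loaded from BUILD files and other .bzl files.
MODULES = ["parser", "lexer"]

# Private symbol: the leading underscore means load() cannot import it.
_OPTIMIZE = True

COPTS = ["-O2"] if _OPTIMIZE else ["-g"]

# ---- BUILD (hypothetical, at the workspace root) ----
load("//:config.bzl", "COPTS", "MODULES")

# A list comprehension creates one cc_library per module; a for statement
# would not be allowed here.
[
    cc_library(
        name = m,
        srcs = [m + ".cc"],
        hdrs = [m + ".h"],
        # An if expression is allowed; an if statement is not.
        copts = COPTS + (["-DLEXER_TRACE"] if m == "lexer" else []),
    )
    for m in MODULES
]
```

Trying to load "_OPTIMIZE" from the BUILD file would fail, since it is not exported by config.bzl.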
