Building Custom Indicators with Qlib Quant#

Welcome to this comprehensive guide on creating custom indicators with Qlib Quant. Qlib is a powerful open-source quantitative investment platform by Microsoft Research, offering rich datasets, factor modeling, backtesting, and more. In this blog post, we will explore how you can leverage Qlib’s flexible infrastructure to build your own indicators, from simple signals to more advanced, professional-level constructs.

This post aims to cover:

  1. An overview of Qlib and why custom indicators matter.
  2. Installing and setting up Qlib.
  3. Fundamentals of indicator design.
  4. Creation of a basic custom indicator in Qlib.
  5. Using Qlib’s expression engine to handle more complex signals.
  6. Handling performance considerations.
  7. Advanced concepts and expansions to bolster your professional capabilities.

By the end of this guide, you’ll be equipped with both foundational knowledge and next-level insights to create, refine, and scale your own custom indicators.


Table of Contents#

  1. Introduction to Qlib
  2. Why Build Custom Indicators?
  3. Setting Up Your Environment
  4. Qlib Basics: Data Infrastructure
  5. Designing Your First Custom Indicator
    1. Example: Simple Moving Average (SMA)
    2. Qlib Expression Syntax
  6. Advanced Indicator Creation Using Qlib’s Expression Engine
    1. Combining Expressions
    2. Chaining Indicators
    3. Debugging Strategies
  7. Performance Considerations and Optimization
    1. Efficient Data Retrieval Practices
    2. Caching and Reuse of Indicators
    3. Batch Computation vs. On-Demand Computation
  8. Professional-Level Indicator Expansions
    1. Feature Engineering with Custom Factors
    2. Integration with Machine Learning Pipelines
    3. Batch Backtesting for Multiple Indicators
    4. Hierarchical Factor Models
  9. Conclusion and Further Resources

Introduction to Qlib#

Qlib is an open-source research platform designed to meet the daily research needs of quantitative investors. Key features include:

  • A well-organized data infrastructure: Qlib efficiently handles large amounts of financial data, especially stock market data.
  • Flexible backtesting environment: Evaluate the performance of your strategies with customizable backtest modules.
  • Rich expression library: Build complex signals by composing smaller building blocks (factors) offered by Qlib.
  • Integration with existing Python data analysis tools: Qlib can be seamlessly paired with pandas, NumPy, scikit-learn, PyTorch, etc.

If you’re looking to create robust, scalable quant strategies, Qlib offers an ideal ecosystem to power your experiments and production systems.


Why Build Custom Indicators?#

While many out-of-the-box trading indicators exist (e.g., technical indicators like RSI, MACD, Bollinger Bands), quantitative investors often need to fine-tune or create entirely new signals. Common motivations for custom indicators include:

  • Capturing unique market behaviors not addressed by classic indicators.
  • Exploring innovative trading ideas, including advanced factor models and domain-specific signals.
  • Integrating alternative data sources (e.g., sentiment data, supply chain data) to create new alpha factors.
  • Combining standard signals in ways that produce fresh market insights.

Custom indicators are essential because they can incorporate your own research ideas and differentiate your strategy from off-the-shelf solutions.


Setting Up Your Environment#

To follow along with the examples:

  1. Python Installation
    Make sure you have Python 3.7 or later.

  2. Install Qlib
    Qlib can be installed via pip:

    pip install pyqlib

    Alternatively, you can clone the Qlib repository and install from source:

    git clone https://github.com/microsoft/qlib.git
    cd qlib
    python setup.py install
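
  3. Data Preparation
    Qlib provides scripts to fetch data for various markets, such as scripts/get_data.py in the repository. The code examples in this post assume the China daily data set is available under ~/.qlib/qlib_data/cn_data; skip this step only if you already have data in Qlib's format. Refer to the Qlib documentation for details on how to import your data.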


Qlib Basics: Data Infrastructure#

Before diving into custom indicators, it’s important to understand how Qlib organizes data:

  • Instruments: Typically represent stocks, futures, or any other tradeable assets.
  • Data Fields: Represent daily bars (or other timescales) of open, high, low, close, volume, etc. In expressions, Qlib refers to these fields with a “$” prefix: “$open”, “$high”, “$low”, “$close”, and “$volume”.
  • Provider: Handles data retrieval. Qlib supports both offline and online providers.
  • Calendar: The list of all trading days used within Qlib.

With Qlib, you interact with these concepts largely through expressions. If you’re new, you might want to check out Qlib’s expression tutorial—understanding how to request and manipulate data fields is the key to building custom indicators.
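
To make the data model concrete, here is a minimal sketch of the shape that a feature query is generally understood to return: rows indexed by (instrument, datetime), one column per requested field. The frame is built by hand with pandas, so it runs without Qlib or any downloaded data; the instrument codes and prices are invented for illustration.

```python
import pandas as pd

# Synthetic stand-in for a feature query result (all values are made up).
calendar = pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"])
instruments = ["SH600000", "SH600519"]

index = pd.MultiIndex.from_product(
    [instruments, calendar], names=["instrument", "datetime"]
)
frame = pd.DataFrame(
    {
        "$close": [10.0, 10.2, 10.1, 1100.0, 1110.0, 1095.0],
        "$volume": [5e6, 6e6, 5.5e6, 3e5, 2.8e5, 3.2e5],
    },
    index=index,
)

# One instrument's time series: select on the first index level.
moutai = frame.loc["SH600519"]

# One trading day across all instruments: cross-section on the datetime level.
day = frame.xs(pd.Timestamp("2020-01-03"), level="datetime")

print(moutai)
print(day)
```

Keeping this two-level index in mind helps later: time-series operations should run within each instrument, while cross-sectional operations run within each date.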


Designing Your First Custom Indicator#

Example: Simple Moving Average (SMA)#

Let’s begin with a straightforward example: computing a simple moving average (SMA). Even though Qlib’s built-in expression library already includes moving averages, implementing our own is a great introduction.

Step-by-Step#

  1. Understand the Calculation:
    A simple moving average for a given window size n is calculated by summing the last n closing prices and dividing the result by n.

  2. Use Qlib’s Expression:
    Qlib’s expression engine can handle window-based operations, typically with the syntax:

    Mean(<EXP>, window)

    This means we’ll take the Mean of some expression over a specified window, with the expression first and the window size second.

  3. Example Code:

    import qlib
    from qlib.data import D
    from qlib.config import REG_CN
    # Initialize Qlib (assumes the CN daily data set is stored at this path)
    qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region=REG_CN)
    # Define our instrument
    instrument = 'SH600519' # Example: Kweichow Moutai, a famous Chinese stock
    # We will define a period
    start_date = '2020-01-01'
    end_date = '2020-12-31'
    # Retrieve close prices as a DataFrame indexed by (instrument, datetime)
    close_df = D.features([instrument], ['$close'], start_time=start_date, end_time=end_date)
    # Define a simple moving average of window size 20
    window_size = 20
    # Pandas-based approach: roll within each instrument so a window never spans two stocks
    sma_values = close_df['$close'].groupby(level='instrument').transform(
        lambda s: s.rolling(window_size).mean()
    )
    # The result is an ordinary pandas Series; the first window_size - 1 rows are NaN
    print(sma_values.tail())

The above code uses a pandas-style rolling window for demonstration, but Qlib also has its own expression-based approach that automatically handles shifting, rolling, and other manipulations. Once you understand the basics, you can start using Qlib’s built-in expressions such as Ref, Mean, Std, and more.

Qlib Expression Syntax#

Qlib expressions are typically structured in a nested manner (e.g., Mean(Ref($close,1), 20)), where you combine smaller building blocks into more complex signals. This syntax can get quite elaborate. Some common expression function categories include:

| Qlib Function | Description | Example Usage |
|---|---|---|
| Ref | Refer to past/future values | Ref($close, 1) → previous day’s close |
| Mean | Moving average | Mean($close, 20) → 20-day SMA of close |
| Std | Standard deviation | Std($close, 20) → 20-day standard deviation |
| Corr | Correlation coefficient | Corr($close, $volume, 10) → 10-day correlation of close and volume |
| Max | Rolling maximum | Max($high, 14) → 14-day maximum high |
| Min | Rolling minimum | Min($low, 14) → 14-day minimum low |

These expressions empower you to create custom signals by composing them like building blocks.
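
For readers who think in pandas, the following sketch gives rough single-instrument equivalents of the operators above. The helper names are hypothetical, and the `min_periods=1` setting reflects my understanding of how Qlib's rolling operators treat the start of a series; treat it as an assumption and verify against your Qlib version.

```python
import pandas as pd

# Hypothetical pandas stand-ins for Qlib operators on ONE instrument's series.

def ref(series: pd.Series, n: int) -> pd.Series:
    """Ref(x, n): the value n rows earlier (a negative n looks ahead)."""
    return series.shift(n)

def mean(series: pd.Series, n: int) -> pd.Series:
    """Mean(x, n): rolling average; partial windows allowed at the start."""
    return series.rolling(n, min_periods=1).mean()

def std(series: pd.Series, n: int) -> pd.Series:
    """Std(x, n): rolling sample standard deviation."""
    return series.rolling(n, min_periods=1).std()

def rolling_max(series: pd.Series, n: int) -> pd.Series:
    """Rolling maximum over the last n rows."""
    return series.rolling(n, min_periods=1).max()

def corr(left: pd.Series, right: pd.Series, n: int) -> pd.Series:
    """Corr(x, y, n): rolling correlation between two series."""
    return left.rolling(n, min_periods=1).corr(right)

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
volume = pd.Series([100.0, 120.0, 90.0, 150.0, 130.0])

sma3 = mean(close, 3)            # Mean($close, 3)
prev_close = ref(close, 1)       # Ref($close, 1)
nested = mean(ref(close, 1), 3)  # Mean(Ref($close, 1), 3)

print(pd.DataFrame({"close": close, "sma3": sma3, "prev": prev_close, "nested": nested}))
print(corr(close, volume, 3))
```

Nesting the helpers mirrors nesting in the expression string, which is a convenient way to sanity-check what a long expression should produce on a handful of rows.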


Advanced Indicator Creation Using Qlib’s Expression Engine#

For more sophisticated indicators, we’ll step fully into Qlib’s expression ecosystem. By learning to combine, chain, and debug expressions, you can build highly customized signals.

Combining Expressions#

Let’s say we want to define a custom factor that relates the current close to a rolling average of volume. We might do something like:

import qlib
from qlib.config import REG_CN
from qlib.data import D
from qlib.data.dataset.handler import DataHandlerLP
# Initialize Qlib (assumes the CN daily data set is stored at this path)
qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region=REG_CN)
base_close = "$close"
base_volume = "$volume"
window_size = 14
# Expression: Close / Mean(Volume, 14)
expr_custom = f"{base_close} / Mean({base_volume}, {window_size})"
factor_name = "close_over_mean_volume"
# Qlib parses the string expression when data is requested; no separate parse step is needed
df_factor = D.features(['SH600519'], [expr_custom], start_time='2020-01-01', end_time='2020-12-31')
df_factor.columns = [factor_name]
print(df_factor.tail())
# The same expression can feed a DataHandler for use in a dataset or pipeline
handler = DataHandlerLP(
    instruments=['SH600519'],
    start_time='2020-01-01',
    end_time='2020-12-31',
    data_loader={
        "class": "QlibDataLoader",
        "kwargs": {"config": {"feature": ([expr_custom], [factor_name])}},
    },
)
print(handler.fetch().tail())

In this example, we define a ratio between the close price and the 14-day moving average of volume. Qlib’s expression parser allows this to be processed seamlessly. You can mix other factors into this chain—just keep referencing them in the string expression.

Chaining Indicators#

Chaining indicators is a matter of referencing them in subsequent expressions. For instance, if we have:

  • A = Mean($close, 20)
  • B = Std($close, 20)

We could define another indicator C = (A - $close) / B. This approach leads to a wide variety of factor compositions, enabling you to incorporate classical technical signals, fundamental data, or even external signals (e.g., news sentiment) if integrated into Qlib’s data pipeline.
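
The chaining of A, B, and C can be sketched with plain string composition. The helper names below are hypothetical; the Qlib call is wrapped in a function with a lazy import so the string-building part runs even without Qlib installed. Calling `fetch_chained` assumes `qlib.init(...)` has already been run with a valid data path.

```python
from typing import List

def mean_expr(field: str, window: int) -> str:
    """Build the expression string for a rolling mean."""
    return f"Mean({field}, {window})"

def std_expr(field: str, window: int) -> str:
    """Build the expression string for a rolling standard deviation."""
    return f"Std({field}, {window})"

# A and B are reusable building blocks; C references both.
A = mean_expr("$close", 20)
B = std_expr("$close", 20)
C = f"({A} - $close) / {B}"

fields: List[str] = [A, B, C]
names: List[str] = ["sma20", "std20", "dist_from_sma"]

def fetch_chained(instruments, start_time, end_time):
    """Fetch A, B and C side by side (requires an initialized Qlib)."""
    from qlib.data import D  # imported lazily; assumes qlib.init() was called

    frame = D.features(instruments, fields, start_time=start_time, end_time=end_time)
    frame.columns = names
    return frame

print(C)
```

Requesting A and B alongside C costs little and leaves the intermediate values available for inspection.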

Debugging Strategies#

If an expression produces unexpected results:

  1. Check Dimensions: Ensure you have data for the time range and instruments you expect.
  2. Print Partials: Test sub-expressions. For instance, if your expression is C = A / B, print A and B individually, verifying correctness.
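  3. Evaluate Pieces of the Expression Tree: A nested expression is a tree of operators, so any sub-tree can be requested on its own (for example, as an extra field in the same D.features call) and compared with the full result.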
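
As a small illustration of printing partials, the sketch below evaluates C = A / B on synthetic data and counts missing and infinite values per column. The `summarize_factor` helper is hypothetical, not part of Qlib, and the input values are chosen to trigger both a NaN and a division by zero.

```python
import numpy as np
import pandas as pd

def summarize_factor(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column row count, NaN count and infinity count."""
    values = df.astype(float)
    return pd.DataFrame(
        {
            "rows": pd.Series(len(values), index=values.columns),
            "nan": values.isna().sum(),
            "inf": np.isinf(values).sum(),
        }
    )

# Synthetic inputs: one missing close and two zero-volume days.
close = pd.Series([10.0, np.nan, 10.0, 11.0, 12.0])
volume = pd.Series([0.0, 0.0, 100.0, 200.0, 300.0])

parts = pd.DataFrame(
    {
        "A": close,                                      # numerator
        "B": volume.rolling(2, min_periods=1).mean(),    # denominator
    }
)
parts["C"] = parts["A"] / parts["B"]  # the full expression

summary = summarize_factor(parts)
print(summary)
```

Here the summary shows that the infinity in C comes from a zero in B rather than from A, which is the kind of question the checklist above is meant to answer quickly.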

Performance Considerations and Optimization#

While expression-based development is convenient, performance can become an issue when you scale to many stocks, multiple time periods, or large indicator sets.

Efficient Data Retrieval Practices#

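  1. Leverage Qlib’s caching: Qlib can cache expression results and assembled datasets. Check the cache-related settings passed to qlib.init (such as expression_cache and dataset_cache) so that repeated queries are not recomputed.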
  2. Use minimal time ranges: Limit queries to only the dates you actually need.

Caching and Reuse of Indicators#

If you plan on reusing the same factor expression in multiple experiments (e.g., an RSI, a custom factor, etc.), consider saving those computation results to disk or using Qlib’s built-in data caching. This can be as simple as computing the factor once and saving the resulting DataFrame for future reference.
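
A minimal sketch of the compute-once-and-save idea follows. The `cached_factor` helper and its cache layout are assumptions made for illustration, not a Qlib feature; the `compute` callback stands in for whatever function actually evaluates the expression.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd

def cached_factor(
    expr: str,
    compute: Callable[[str], pd.DataFrame],
    cache_dir: str,
    tag: str = "",
) -> pd.DataFrame:
    """Return compute(expr), reusing a pickle on disk when one exists.

    `tag` should encode everything else that affects the result,
    such as the instrument universe and the date range.
    """
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha1(f"{expr}|{tag}".encode("utf-8")).hexdigest()
    path = directory / f"{key}.pkl"
    if path.exists():
        return pd.read_pickle(path)  # only load pickles you created yourself
    result = compute(expr)
    result.to_pickle(path)
    return result

# Demonstration with a fake compute function that records how often it runs.
calls = []

def fake_compute(expr: str) -> pd.DataFrame:
    calls.append(expr)
    return pd.DataFrame({expr: [1.0, 2.0, 3.0]})

with tempfile.TemporaryDirectory() as tmp:
    first = cached_factor("Mean($close, 20)", fake_compute, tmp, tag="csi300:2020")
    second = cached_factor("Mean($close, 20)", fake_compute, tmp, tag="csi300:2020")

print(len(calls))
```

Hashing the expression together with the tag keeps file names short and avoids characters such as `$` or `/` in paths.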

Batch Computation vs. On-Demand Computation#

Qlib usually interprets expressions lazily—meaning it only pulls data when needed. For large-scale research tasks, you can schedule expression computations to run in batches, avoiding repetitive data I/O.
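
One way to organize batch computation is to split the instrument universe into chunks and evaluate all fields for each chunk in a single request. The sketch below is generic: `compute_in_batches` and the `fetch` callback are hypothetical names, and a fake fetch function is used so the example runs without Qlib.

```python
from typing import Callable, Iterator, List, Sequence

import pandas as pd

def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])

def compute_in_batches(
    instruments: Sequence[str],
    fields: List[str],
    fetch: Callable[[List[str], List[str]], pd.DataFrame],
    batch_size: int = 200,
) -> pd.DataFrame:
    """Evaluate all fields once per batch of instruments and stack the results."""
    parts = [fetch(batch, fields) for batch in chunked(instruments, batch_size)]
    return pd.concat(parts)

# Fake fetch used for the demonstration; it records each batch it receives.
calls = []

def fake_fetch(batch: List[str], fields: List[str]) -> pd.DataFrame:
    calls.append(list(batch))
    index = pd.MultiIndex.from_product(
        [batch, pd.to_datetime(["2020-01-02"])], names=["instrument", "datetime"]
    )
    return pd.DataFrame(1.0, index=index, columns=fields)

universe = [f"STOCK{i:03d}" for i in range(5)]
result = compute_in_batches(universe, ["Mean($close, 20)"], fake_fetch, batch_size=2)
print(len(calls), len(result))
```

With Qlib, the `fetch` argument could be a small wrapper around `D.features` that fixes the start and end dates; the right batch size depends on memory and on how the data provider is configured.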


Professional-Level Indicator Expansions#

At this point, you have the foundation needed to create a variety of custom indicators. Below are some more advanced techniques to refine and expand your capabilities.

Feature Engineering with Custom Factors#

Rather than just building single numeric signals, consider how to transform your data for broader modeling:

  • Categorical transformations: Bucket volumes or returns into categories that can be used in classification tasks.
  • Lagging multiple signals: Create multiple versioned signals (e.g., lags of 1, 2, 5, 10 days) to capture short- and medium-term effects.
  • Rolling statistics: Expand from simple mean or std to advanced rolling transformations like skewness, kurtosis, or wavelet transforms.
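
The three ideas above can be sketched as follows. The lagged signals are written as Qlib-style expression strings, while the rolling statistics and bucketing are shown with pandas on a synthetic price series so the example runs anywhere. Column names are hypothetical. Qlib also ships rolling skewness and kurtosis operators, as far as I know, but check the operator list for your version.

```python
import numpy as np
import pandas as pd

# 1) Lagged signals: k-day returns expressed as Qlib-style strings.
lags = [1, 2, 5, 10]
lag_fields = [f"$close / Ref($close, {k}) - 1" for k in lags]
lag_names = [f"ret_{k}d" for k in lags]

# Synthetic close prices (a random walk) for the pandas part.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 60))))
returns = close.pct_change()

# 2) Rolling statistics beyond mean and std.
features = pd.DataFrame(
    {
        "ret_1d": returns,
        "skew_20": returns.rolling(20).skew(),
        "kurt_20": returns.rolling(20).kurt(),
    }
)

# 3) Categorical transformation: bucket daily returns into terciles.
features["ret_bucket"] = pd.qcut(returns, q=3, labels=["low", "mid", "high"])

print(dict(zip(lag_names, lag_fields)))
print(features.tail())
```

Note that the rolling columns are NaN until a full 20-observation window is available, so the usable sample is shorter than the raw series.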

Integration with Machine Learning Pipelines#

Qlib is designed to integrate well with scikit-learn, PyTorch, TensorFlow, etc. For example:

  1. Create your custom factor that uses Qlib expressions.
  2. Convert the factor’s output into a pandas DataFrame.
  3. Feed it into a machine learning model, training it to predict future returns or classify momentum patterns.

Below is a brief snippet illustrating how you might feed Qlib-based factors into an ML pipeline:

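import qlib
from qlib.config import REG_CN
from qlib.data import D
from sklearn.ensemble import RandomForestRegressor
# Initialize Qlib (assumes the CN daily data set is stored at this path)
qlib.init(provider_uri='~/.qlib/qlib_data/cn_data', region=REG_CN)
# Build factor_data: one factor column plus a next-day-return target
fields = ['$close / Mean($close, 20) - 1', 'Ref($close, -1) / $close - 1']
factor_data = D.features(['SH600519'], fields, start_time='2020-01-01', end_time='2020-12-31')
factor_data.columns = ['factor_value', 'target']
# Move the (instrument, datetime) index into columns and drop rows with missing values
factor_data = factor_data.reset_index().dropna()
# Split chronologically so the model never trains on data from the test period
train_data = factor_data.loc[factor_data['datetime'] < '2020-07-01']
test_data = factor_data.loc[factor_data['datetime'] >= '2020-07-01']
X_train = train_data[['factor_value']]
y_train = train_data['target']
X_test = test_data[['factor_value']]
y_test = test_data['target']
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
print("Predictions:", y_pred[:10])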

The same pattern extends to multiple factor columns and to other estimators, so you can swap in more advanced ML models without changing how the factors are produced.

Batch Backtesting for Multiple Indicators#

Rather than testing a single indicator at a time, Qlib allows you to define multiple signals and run them through backtesting in one go:

  1. Define your set of factors.
  2. Bundle them into a single expression or data pipeline.
  3. Configure a backtest using Qlib’s built-in backtest modules. The bundled Alpha158 and Alpha360 feature sets are examples of many factors packaged into a single data handler.

By configuring a multi-factor pipeline, you can quickly compare the performance of various indicators or combinations thereof.
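
Before running a full backtest, it can help to screen several indicators at once with a cheap metric. The sketch below is not Qlib's backtest engine: it compares two synthetic factors by rank correlation with forward returns and by a top-minus-bottom quintile spread. All data and names are invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Synthetic forward returns and two candidate factors.
forward_return = pd.Series(rng.normal(0, 0.02, n), name="forward_return")
factors = pd.DataFrame(
    {
        "informative": forward_return + rng.normal(0, 0.02, n),  # signal plus noise
        "noise": rng.normal(0, 1, n),                            # unrelated to returns
    }
)

def rank_ic(factor: pd.Series, target: pd.Series) -> float:
    """Rank correlation between factor values and forward returns."""
    return factor.rank().corr(target.rank())

def long_short(factor: pd.Series, target: pd.Series) -> float:
    """Mean forward return of the top quintile minus the bottom quintile."""
    quintile = pd.qcut(factor, 5, labels=False)
    return target[quintile == 4].mean() - target[quintile == 0].mean()

# Evaluate every factor column in one pass.
report = pd.DataFrame(
    {
        "rank_ic": factors.apply(rank_ic, target=forward_return),
        "long_short": factors.apply(long_short, target=forward_return),
    }
)
print(report)
```

Factors that survive this kind of screen are the ones worth the cost of a full backtest with realistic trading assumptions.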

Hierarchical Factor Models#

Professional quant strategies often use multiple levels of factors:

  • Sector-level factors: Evaluate indicators specific to certain industries or segments.
  • Global factors: Evaluate macroeconomic or cross-instrument effects.
  • Instrument-level factors: Standard signals for each instrument separately.

By weighting these factors in a hierarchical manner (e.g., a sector factor might weigh instrument-level factors differently for cyclical vs. defensive sectors), you can create advanced, targeted signals that reflect both broad market conditions and individual stock nuances.

Below is a conceptual example:

  1. Compute a market-wide sentiment factor.
  2. Compute a sector factor capturing average returns in that sector.
  3. Compute a stock-specific factor capturing historical price momentum.
  4. Combine them in a final factor:
    FinalFactor = w1 * MarketSentiment + w2 * SectorFactor + w3 * StockMomentum

Where the weights w1, w2, w3 can be learned or set heuristically based on modeling assumptions. Qlib can handle each piece (market, sector, instrument) distinctly, merging them when retrieving data at the final computation step.
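
The combination formula can be sketched for a single date with pandas. The instruments, sectors, values, and weights below are all made up; in practice each component would come from its own data source and the weights from a model or a heuristic.

```python
import pandas as pd

# One cross-section (a single date) with invented values.
data = pd.DataFrame(
    {
        "instrument": ["A", "B", "C", "D"],
        "sector": ["tech", "tech", "energy", "energy"],
        "ret": [0.01, 0.03, -0.02, 0.00],        # recent return
        "momentum": [0.10, 0.02, -0.04, 0.06],   # stock-specific momentum
    }
).set_index("instrument")

# 1) Market-wide sentiment: one value shared by every instrument (hypothetical).
data["market_sentiment"] = 0.5

# 2) Sector factor: average return of the instrument's sector.
data["sector_factor"] = data.groupby("sector")["ret"].transform("mean")

# 3) Stock-specific factor is the momentum column itself.
# 4) Weighted combination; the weights are illustrative, not recommendations.
w1, w2, w3 = 0.2, 0.3, 0.5
data["final_factor"] = (
    w1 * data["market_sentiment"]
    + w2 * data["sector_factor"]
    + w3 * data["momentum"]
)

print(data[["sector_factor", "final_factor"]])
```

Because the three inputs live on different scales, standardizing each component before weighting is usually advisable; that step is omitted here to keep the arithmetic easy to follow.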


Conclusion and Further Resources#

Building custom indicators in Qlib unlocks a vast realm of possibility. You can combine pre-built expressions, chain them together to form advanced signals, integrate them with machine learning pipelines, and optimize them for efficient large-scale research.

Key takeaways from this blog post:

  1. Getting Started: Qlib’s expression engine is intuitive for building standard indicators; it supports rolling, referencing, correlation, etc.
  2. Customization: You can extend Qlib’s capabilities by chaining and mixing expressions to form unique signals.
  3. Performance: Understand caching, limiting data retrieval, and batch computations to handle large-scale tasks.
  4. Professional-Level: Incorporate multi-dimensional signals (sector, market, instrument), advanced feature engineering, and machine learning integration to push your strategies to new levels.

For further reading, explore:

  • Qlib’s Official Documentation for details on advanced expressions and modules.
  • Qlib GitHub Issues for community discussions, beginner tips, and best practices.
  • Relevant financial machine learning resources, such as “Advances in Financial Machine Learning” by Marcos López de Prado, to inspire more sophisticated factor engineering.

With this knowledge, you are well on your way to discovering and implementing new alpha factors. May your quant research journey be both enlightening and profitable!

Building Custom Indicators with Qlib Quant
https://closeaiblog.vercel.app/posts/qlib/14/
Author: CloseAI
Published at: 2024-06-06
License: CC BY-NC-SA 4.0