Building Custom Indicators with Qlib Quant
Welcome to this comprehensive guide on creating custom indicators with Qlib Quant. Qlib is a powerful open-source quantitative investment platform by Microsoft Research, offering rich datasets, factor modeling, backtesting, and more. In this blog post, we will explore how you can leverage Qlib’s flexible infrastructure to build your own indicators, from simple signals to more advanced, professional-level constructs.
This post aims to cover:
- An overview of Qlib and why custom indicators matter.
- Installing and setting up Qlib.
- Fundamentals of indicator design.
- Creation of a basic custom indicator in Qlib.
- Using Qlib’s expression engine to handle more complex signals.
- Handling performance considerations.
- Advanced concepts and expansions to bolster your professional capabilities.
By the end of this guide, you’ll be equipped with both foundational knowledge and next-level insights to create, refine, and scale your own custom indicators.
Table of Contents
- Introduction to Qlib
- Why Build Custom Indicators?
- Setting Up Your Environment
- Qlib Basics: Data Infrastructure
- Designing Your First Custom Indicator
- Advanced Indicator Creation Using Qlib’s Expression Engine
- Performance Considerations and Optimization
- Professional-Level Indicator Expansions
- Conclusion and Further Resources
Introduction to Qlib
Qlib is an open-source research platform designed to meet the daily research needs of quantitative investors. Key features include:
- A well-organized data infrastructure: Qlib efficiently handles large amounts of financial data, especially stock market data.
- Flexible backtesting environment: Evaluate the performance of your strategies with customizable backtest modules.
- Rich expression library: Build complex signals by composing smaller building blocks (factors) offered by Qlib.
- Integration with existing Python data analysis tools: Qlib can be seamlessly paired with pandas, NumPy, scikit-learn, PyTorch, etc.
If you’re looking to create robust, scalable quant strategies, Qlib offers an ideal ecosystem to power your experiments and production systems.
Why Build Custom Indicators?
While many out-of-the-box trading indicators exist (e.g., technical indicators like RSI, MACD, Bollinger Bands), quantitative investors often need to fine-tune or create entirely new signals. Common motivations for custom indicators include:
- Capturing unique market behaviors not addressed by classic indicators.
- Exploring innovative trading ideas, including advanced factor models and domain-specific signals.
- Integrating alternative data sources (e.g., sentiment data, supply chain data) to create new alpha factors.
- Combining standard signals in ways that produce fresh market insights.
Custom indicators are essential because they can incorporate your own research ideas and differentiate your strategy from off-the-shelf solutions.
Setting Up Your Environment
To follow along with the examples:
- Python Installation: Make sure you have Python 3.7 or later. Newer Qlib releases may require a more recent Python version, so check the Qlib README if installation fails.
- Install Qlib: Qlib can be installed via pip:

```shell
pip install pyqlib
```

Alternatively, you can clone the Qlib repository and install from source:

```shell
git clone https://github.com/microsoft/qlib.git
cd qlib
pip install .
```

- Data Preparation (Optional): Qlib provides scripts to fetch data for various markets. Refer to the Qlib documentation for details on how to import your data.
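Before moving on, it can help to confirm that the interpreter and the package are both in place. The following is a minimal sketch using only the standard library; the helper name `check_environment` is made up for this post, and the minimum version simply mirrors the requirement stated above.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 7), package="qlib"):
    """Return (python_ok, package_installed) without importing the package."""
    # Compare only (major, minor) against the required minimum version
    python_ok = sys.version_info[:2] >= tuple(min_python)
    # find_spec returns None when the package cannot be located
    package_installed = importlib.util.find_spec(package) is not None
    return python_ok, package_installed


python_ok, has_qlib = check_environment()
print(f"Python version OK: {python_ok}")
print(f"qlib importable:   {has_qlib}")
```

Note that the pip distribution is called `pyqlib`, while the module you import is `qlib`, which is why the check looks for `qlib`.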
Qlib Basics: Data Infrastructure
Before diving into custom indicators, it’s important to understand how Qlib organizes data:
- Instruments: Typically represent stocks, futures, or any other tradeable assets.
- Data Fields: Represent daily bars (or other timescales) of open, high, low, close, volume, etc. In expressions, Qlib refers to stored fields with a leading dollar sign: “$close”, “$open”, “$volume”, “$high”, and “$low”.
- Provider: Handles data retrieval. Qlib supports both offline and online providers.
- Calendar: The list of all trading days used within Qlib.
With Qlib, you interact with these concepts largely through expressions. If you’re new, you might want to check out Qlib’s expression tutorial—understanding how to request and manipulate data fields is the key to building custom indicators.
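As a rough illustration of how instruments, fields, and a date range come together, the sketch below wraps a `D.features` call. The helper names `to_qlib_fields` and `load_bars` are invented for this post, and the function assumes `qlib.init(...)` has already been called with a valid data directory.

```python
def to_qlib_fields(names):
    """Prefix raw column names with '$', the marker used for stored fields."""
    return [name if name.startswith("$") else f"${name}" for name in names]


def load_bars(instruments, start_time, end_time,
              names=("open", "high", "low", "close", "volume")):
    """Fetch bars as a DataFrame indexed by (instrument, datetime).

    Assumes qlib.init(...) has already been called.
    """
    # Imported lazily so that to_qlib_fields stays usable without Qlib installed
    from qlib.data import D

    return D.features(
        list(instruments),
        to_qlib_fields(names),
        start_time=start_time,
        end_time=end_time,
    )


print(to_qlib_fields(["close", "volume"]))
```

A call such as `load_bars(["SH600519"], "2020-01-01", "2020-12-31")` would then return one row per instrument and trading day, provided that data exists locally.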
Designing Your First Custom Indicator
Example: Simple Moving Average (SMA)
Let’s begin with a straightforward example: computing a simple moving average (SMA). Even though Qlib’s built-in expression library already includes moving averages, implementing our own is a great introduction.
Step-by-Step
- Understand the Calculation: A simple moving average for a given window size n is calculated by summing the last n closing prices and dividing the result by n.
- Use Qlib’s Expression: Qlib’s expression engine can handle window-based operations, typically with the syntax Mean(<EXP>, window). This means we take the mean of some expression <EXP> over a specified window.
- Example Code:
```python
import qlib
from qlib.config import REG_CN
from qlib.data import D

# Initialize Qlib with locally prepared China-market data
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region=REG_CN)

# Define our instrument
instrument = "SH600519"  # Example: Kweichow Moutai, a famous Chinese stock

# We will define a period
start_date = "2020-01-01"
end_date = "2020-12-31"

# Retrieve close prices; D.features returns a pandas DataFrame
# indexed by (instrument, datetime)
close_df = D.features([instrument], ["$close"], start_time=start_date, end_time=end_date)

# Define a simple moving average of window size 20 (pandas-based approach).
# With a single instrument, a plain rolling window is sufficient.
window_size = 20
sma_values = close_df["$close"].rolling(window_size).mean()

# Print the tail of the resulting Series
print(sma_values.tail())
```
The above code uses a pandas-style rolling window for demonstration, but Qlib also has its own expression-based approach that handles shifting, rolling, and other manipulations inside the expression engine. Once you understand the basics, you can start using Qlib’s built-in expressions such as Ref, Mean, Std, and more.
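For comparison, the same moving average can be requested as an expression string and evaluated by Qlib itself. This is a sketch rather than a canonical recipe: `sma_expression` and `fetch_sma` are hypothetical helper names, and `fetch_sma` assumes `qlib.init(...)` has already been called.

```python
def sma_expression(field="$close", window=20):
    """Build the expression string for a rolling mean, e.g. 'Mean($close, 20)'."""
    if window < 1:
        raise ValueError("window must be a positive integer")
    return f"Mean({field}, {window})"


def fetch_sma(instrument, start_time, end_time, window=20):
    """Evaluate the SMA inside Qlib's expression engine.

    Assumes qlib.init(...) has already been called.
    """
    from qlib.data import D  # lazy import: only needed when data is fetched

    expr = sma_expression("$close", window)
    df = D.features([instrument], ["$close", expr],
                    start_time=start_time, end_time=end_time)
    # Replace the raw expression strings with friendlier column names
    df.columns = ["close", f"sma_{window}"]
    return df


print(sma_expression("$close", 20))
```

The first few rows may differ from the pandas version, depending on how each side treats windows that are not yet full, so compare the two outputs before relying on either.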
Qlib Expression Syntax
Qlib expressions are typically structured in a nested manner (e.g., Mean(Ref($close, 1), 20)), where you combine smaller building blocks into more complex signals. This syntax can get quite elaborate. Some common expression function categories include:
Qlib Function | Description | Example Usage |
---|---|---|
Ref | Refer to past/future values | Ref($close, 1) → previous day’s close |
Mean | Moving average | Mean($close, 20) → 20-day SMA of close |
Std | Rolling standard deviation | Std($close, 20) → 20-day standard deviation of close |
Corr | Rolling correlation between two series | Corr($close, $volume, 10) → 10-day correlation of close and volume |
Max | Rolling maximum | Max($high, 14) → 14-day maximum high |
Min | Rolling minimum | Min($low, 14) → 14-day minimum low |
These expressions empower you to create custom signals by composing them like building blocks.
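To make the shifting and rolling semantics concrete, here is a small pure-Python sketch of what a reference operator and a rolling operator compute on a toy price list. It is an illustration only, not Qlib's implementation; in particular, this version returns NaN until a full window is available, and Qlib's own operators may treat incomplete windows differently.

```python
import math


def ref(values, n):
    """Shift a series by n steps: ref(x, 1)[t] == x[t - 1]; gaps become NaN."""
    size = len(values)
    return [values[t - n] if 0 <= t - n < size else math.nan for t in range(size)]


def rolling(values, window, func):
    """Apply func to each full trailing window; earlier positions are NaN."""
    out = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(math.nan)  # not enough history yet
        else:
            out.append(func(values[t + 1 - window : t + 1]))
    return out


def mean(window_values):
    return sum(window_values) / len(window_values)


close = [10.0, 11.0, 12.0, 13.0, 14.0]
prev_close = ref(close, 1)       # analogue of Ref($close, 1)
sma_3 = rolling(close, 3, mean)  # analogue of Mean($close, 3)
high_3 = rolling(close, 3, max)  # analogue of a 3-day rolling maximum

print(prev_close)
print(sma_3)
print(high_3)
```

A negative shift, as in `ref(close, -1)`, looks one step into the future, which is useful for building labels but must never leak into features.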
Advanced Indicator Creation Using Qlib’s Expression Engine
For more sophisticated indicators, we’ll step fully into Qlib’s expression ecosystem. By learning to combine, chain, and debug expressions, you can build highly customized signals.
Combining Expressions
Let’s say we want to define a custom factor that relates the current close to a rolling average of volume. We might do something like:
```python
import qlib
from qlib.config import REG_CN
from qlib.data import D

# Initialize Qlib with locally prepared China-market data
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region=REG_CN)

base_close = "$close"
base_volume = "$volume"
window_size = 14

# Expression: Close / Mean(Volume, 14)
expr_custom = f"{base_close} / Mean({base_volume}, {window_size})"

# D.features parses the string expression and evaluates it for each instrument
df_factor = D.features(
    ["SH600519"],
    [expr_custom],
    start_time="2020-01-01",
    end_time="2020-12-31",
)

# Give the factor column a readable name, then inspect the result
df_factor.columns = ["close_to_avg_volume"]
print(df_factor.tail())
```
In this example, we define a ratio between the close price and the 14-day moving average of volume. Qlib’s expression parser processes the string directly, so you can mix other factors into this chain by referencing them in the same string expression.
Chaining Indicators
Chaining indicators is a matter of referencing them in subsequent expressions. For instance, if we have:
A = Mean($close, 20)
B = Std($close, 20)
We could define another indicator C = (A - $close) / B. This approach leads to a wide variety of factor compositions, enabling you to incorporate classical technical signals, fundamental data, or even external signals (e.g., news sentiment) if integrated into Qlib’s data pipeline.
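Because expressions are plain strings, chaining can be done with ordinary string composition. The sketch below builds A, B, and C from the example above; the helper names are invented for this post, and the resulting string is meant to be passed as a field to `D.features`.

```python
def mean_expr(field, window):
    """A = Mean(field, window)"""
    return f"Mean({field}, {window})"


def std_expr(field, window):
    """B = Std(field, window)"""
    return f"Std({field}, {window})"


def deviation_from_mean(field="$close", window=20):
    """C = (A - field) / B, built by nesting the smaller expressions."""
    a = mean_expr(field, window)
    b = std_expr(field, window)
    return f"({a} - {field}) / {b}"


expr_c = deviation_from_mean("$close", 20)
print(expr_c)
```

Keeping each building block in its own small function makes it easier to reuse A and B in other indicators and to inspect them separately when something looks wrong.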
Debugging Strategies
If an expression produces unexpected results:
- Check Dimensions: Ensure you have data for the time range and instruments you expect.
- Print Partials: Test sub-expressions. For instance, if your expression is C = A / B, print A and B individually, verifying correctness.
- Use Qlib’s Computation Graph: The expression engine can be debugged by evaluating pieces of the graph.
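One way to apply these checks is to fetch the sub-expressions and the final expression side by side and confirm that they agree. The following is a sketch under the assumption that `qlib.init(...)` has been called; `fetch_partials` and `max_ratio_error` are hypothetical helpers written for this post.

```python
import math


def fetch_partials(instruments, named_exprs, start_time, end_time):
    """Fetch several named expressions as columns of one DataFrame.

    Assumes qlib.init(...) has already been called.
    """
    from qlib.data import D  # lazy import

    names = list(named_exprs)
    df = D.features(list(instruments), [named_exprs[n] for n in names],
                    start_time=start_time, end_time=end_time)
    df.columns = names
    return df


def max_ratio_error(a_values, b_values, c_values):
    """Largest |c - a / b| over rows where the check is well defined."""
    worst = 0.0
    for a, b, c in zip(a_values, b_values, c_values):
        if any(math.isnan(v) for v in (a, b, c)) or b == 0:
            continue  # skip rows with missing data or a zero denominator
        worst = max(worst, abs(c - a / b))
    return worst


# Toy check: the second row of C is off by 0.5, the third row is skipped
print(max_ratio_error([2.0, 4.0, math.nan], [1.0, 2.0, 1.0], [2.0, 2.5, 0.0]))
```

With real data you might call `fetch_partials(["SH600519"], {"A": "$close", "B": "Mean($volume, 14)", "C": "$close / Mean($volume, 14)"}, ...)` and pass the three columns to `max_ratio_error`; a result far from zero points to the sub-expression that needs attention.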
Performance Considerations and Optimization
While expression-based development is convenient, performance can become an issue when you scale to many stocks, multiple time periods, or large indicator sets.
Efficient Data Retrieval Practices
- Leverage Qlib’s caching: Qlib can cache both raw data and computed expression results. Check that the cache settings in your Qlib configuration match your workflow so that repeated queries are not recomputed.
- Use minimal time ranges: Limit queries to only the dates you actually need.
Caching and Reuse of Indicators
If you plan on reusing the same factor expression in multiple experiments (e.g., an RSI, a custom factor, etc.), consider saving those computation results to disk or using Qlib’s built-in data caching. This can be as simple as computing the factor once and saving the resulting DataFrame for future reference.
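The "compute once, save, reuse" idea can be sketched with the standard library alone. This is a simplified illustration, not Qlib's built-in cache: `cached_compute` is a hypothetical helper, and the cache key here depends only on the expression string, so a real version should also include the instruments and date range.

```python
import hashlib
import pickle
from pathlib import Path


def cached_compute(expression, compute_fn, cache_dir="factor_cache"):
    """Return compute_fn(expression), reusing a pickled result when one exists."""
    cache_path = Path(cache_dir)
    cache_path.mkdir(parents=True, exist_ok=True)

    # Derive a stable file name from the expression text
    key = hashlib.sha256(expression.encode("utf-8")).hexdigest()[:16]
    target = cache_path / f"{key}.pkl"

    if target.exists():
        with target.open("rb") as fh:
            return pickle.load(fh)  # cache hit: skip the computation

    result = compute_fn(expression)  # cache miss: compute and store
    with target.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

Here `compute_fn` could be a small function that calls `D.features` for the given expression and returns the resulting DataFrame. Only load pickle files that you created yourself, since unpickling untrusted data is unsafe.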
Batch Computation vs. On-Demand Computation
Qlib usually interprets expressions lazily—meaning it only pulls data when needed. For large-scale research tasks, you can schedule expression computations to run in batches, avoiding repetitive data I/O.
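A batch run might request every expression for a group of instruments in a single call, instead of issuing one call per factor. The sketch below shows one possible shape for that; `chunked` and `compute_in_batches` are hypothetical helpers, the batch size of 200 is arbitrary, and `qlib.init(...)` is assumed to have been called.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def compute_in_batches(instruments, fields, start_time, end_time, batch_size=200):
    """Evaluate all fields for each batch of instruments in one call per batch.

    Assumes qlib.init(...) has already been called.
    """
    import pandas as pd
    from qlib.data import D  # lazy import

    frames = [
        D.features(batch, fields, start_time=start_time, end_time=end_time)
        for batch in chunked(list(instruments), batch_size)
    ]
    # Stack the per-batch results into one (instrument, datetime) frame
    return pd.concat(frames)


print(list(chunked(["A", "B", "C", "D", "E"], 2)))
```

Smaller batches keep memory use bounded, while larger batches reduce the number of round trips to the data provider; the right trade-off depends on your universe size and hardware.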
Professional-Level Indicator Expansions
At this point, you have the foundation needed to create a variety of custom indicators. Below are some more advanced techniques to refine and expand your capabilities.
Feature Engineering with Custom Factors
Rather than just building single numeric signals, consider how to transform your data for broader modeling:
- Categorical transformations: Bucket volumes or returns into categories that can be used in classification tasks.
- Lagging multiple signals: Create multiple versioned signals (e.g., lags of 1, 2, 5, 10 days) to capture short- and medium-term effects.
- Rolling statistics: Expand from simple mean or std to advanced rolling transformations like skewness, kurtosis, or wavelet transforms.
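Two of the ideas above, lagged signals and categorical buckets, can be sketched briefly. The helper names below are made up for this post; the generated strings follow the Ref syntax shown earlier, and the bucket edges are arbitrary example values.

```python
import bisect


def lagged_return_fields(lags=(1, 2, 5, 10), price="$close"):
    """Map a column name to an expression for the return over each lag."""
    return {f"ret_{lag}d": f"{price} / Ref({price}, {lag}) - 1" for lag in lags}


def bucketize(value, edges):
    """Return the index of the bucket that value falls into, given sorted edges."""
    return bisect.bisect_right(edges, value)


features = lagged_return_fields()
for name, expr in features.items():
    print(name, "=", expr)

# Three buckets: below -1%, between -1% and +1%, above +1%
edges = [-0.01, 0.01]
print([bucketize(r, edges) for r in (-0.03, 0.0, 0.02)])
```

The dictionary values can be passed to Qlib as fields, with the keys used as column names, while `bucketize` would be applied to the fetched values before feeding them to a classifier.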
Integration with Machine Learning Pipelines
Qlib is designed to integrate well with scikit-learn, PyTorch, TensorFlow, etc. For example:
- Create your custom factor that uses Qlib expressions.
- Convert the factor’s output into a pandas DataFrame.
- Feed it into a machine learning model, training it to predict future returns or classify momentum patterns.
Below is a brief snippet illustrating how you might feed Qlib-based factors into an ML pipeline:
```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def train_and_predict(factor_data: pd.DataFrame, split_date: str = "2020-07-01"):
    """Fit a random forest on one factor column and predict the later period.

    factor_data is assumed to be a DataFrame built from Qlib output with
    columns: datetime, instrument, factor_value, plus a target column.
    """
    # Split by date so the test period lies strictly after the training period
    train_data = factor_data.loc[factor_data["datetime"] < split_date]
    test_data = factor_data.loc[factor_data["datetime"] >= split_date]

    X_train, y_train = train_data[["factor_value"]], train_data["target"]
    X_test = test_data[["factor_value"]]

    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Predict
    y_pred = model.predict(X_test)
    print("Predictions:", y_pred[:10])
    return y_pred
```
The same pattern extends to multiple factor columns and to other model types, so you can swap in more advanced ML solutions without changing how the factors are produced.
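The target column deserves care: it should describe what happens after the date on which the factor is observed. As a sketch, the function below computes forward returns on a toy price list, and `LABEL_EXPR` shows how a similar label could be written as an expression, using a negative Ref offset to look ahead. Both names are illustrative rather than part of Qlib.

```python
import math

# Expression form of a one-day forward return (negative offset = future value)
LABEL_EXPR = "Ref($close, -1) / $close - 1"


def forward_returns(close, horizon=1):
    """target[t] = close[t + horizon] / close[t] - 1; the tail is NaN."""
    n = len(close)
    return [
        close[t + horizon] / close[t] - 1 if t + horizon < n else math.nan
        for t in range(n)
    ]


close = [100.0, 110.0, 99.0, 99.0]
target = forward_returns(close, horizon=1)
print(target)
```

Rows whose target is NaN, at the end of the sample, should be dropped before training, and forward-looking values must only ever appear in the target, never among the features.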
Batch Backtesting for Multiple Indicators
Rather than testing a single indicator at a time, Qlib allows you to define multiple signals and run them through backtesting in one go:
- Define your set of factors.
- Bundle them into a single expression or data pipeline.
- Configure a backtest using Qlib’s built-in backtest modules (for example, on top of a model trained with a feature set such as Alpha158 or Alpha360).
By configuring a multi-factor pipeline, you can quickly compare the performance of various indicators or combinations thereof.
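Before running full backtests, a rough first-pass comparison can rank candidate factors by how well they order next-period returns. The sketch below computes a rank correlation (often called rank IC) for one cross-section in pure Python. It is a screening aid under simplifying assumptions (no ties handling, a single date), not a replacement for Qlib's backtest, and all function names are invented for this post.

```python
def ranks(values):
    """Rank values from 1..n (ties broken by position; fine for a rough check)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def rank_ic(factor, forward_return):
    """Rank correlation between factor values and next-period returns."""
    return pearson(ranks(factor), ranks(forward_return))


def compare_factors(factors, forward_return):
    """Return {name: rank IC}, ordered from strongest to weakest absolute IC."""
    scores = {name: rank_ic(vals, forward_return) for name, vals in factors.items()}
    return dict(sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True))


# Toy cross-section of four stocks on one date (made-up numbers)
factors = {"noise": [2, 1, 4, 3], "momentum": [1, 2, 3, 4]}
next_returns = [0.01, 0.02, 0.03, 0.04]
print(compare_factors(factors, next_returns))
```

In practice you would compute this per date and average over time; factors that survive the screen can then go through the full backtest configuration.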
Hierarchical Factor Models
Professional quant strategies often use multiple levels of factors:
- Sector-level factors: Evaluate indicators specific to certain industries or segments.
- Global factors: Evaluate macroeconomic or cross-instrument effects.
- Instrument-level factors: Standard signals for each instrument separately.
By weighting these factors in a hierarchical manner (e.g., a sector factor might weigh instrument-level factors differently for cyclical vs. defensive sectors), you can create advanced, targeted signals that reflect both broad market conditions and individual stock nuances.
Below is a conceptual example:
- Compute a market-wide sentiment factor.
- Compute a sector factor capturing average returns in that sector.
- Compute a stock-specific factor capturing historical price momentum.
- Combine them in a final factor:
FinalFactor = w1 * MarketSentiment + w2 * SectorFactor + w3 * StockMomentum

Where the weights w1, w2, w3 can be learned or set heuristically based on modeling assumptions. Qlib can handle each piece (market, sector, instrument) distinctly, merging them when retrieving data at the final computation step.
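The weighted combination can be sketched for a single date as follows. The tickers, sector names, factor values, and weights are all made up for illustration, and `final_factors` is a hypothetical helper rather than a Qlib function.

```python
def final_factors(market_sentiment, sector_factor, stock_momentum, stock_sector,
                  weights=(0.2, 0.3, 0.5)):
    """FinalFactor = w1 * MarketSentiment + w2 * SectorFactor + w3 * StockMomentum.

    market_sentiment: one number shared by every stock
    sector_factor:    {sector: value}
    stock_momentum:   {stock: value}
    stock_sector:     {stock: sector}, used to look up each stock's sector value
    """
    w1, w2, w3 = weights
    return {
        stock: w1 * market_sentiment
        + w2 * sector_factor[stock_sector[stock]]
        + w3 * momentum
        for stock, momentum in stock_momentum.items()
    }


# Made-up inputs for two hypothetical stocks in two sectors
result = final_factors(
    market_sentiment=0.5,
    sector_factor={"consumer": 1.0, "bank": -1.0},
    stock_momentum={"AAA": 2.0, "BBB": 0.0},
    stock_sector={"AAA": "consumer", "BBB": "bank"},
)
print(result)
```

If the weights are to be learned rather than fixed, the three inputs can be passed as separate feature columns to a regression model, which then estimates w1, w2, and w3 from historical returns.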
Conclusion and Further Resources
Building custom indicators in Qlib unlocks a vast realm of possibility. You can combine pre-built expressions, chain them together to form advanced signals, integrate them with machine learning pipelines, and optimize them for efficient large-scale research.
Key takeaways from this blog post:
- Getting Started: Qlib’s expression engine is intuitive for building standard indicators; it supports rolling, referencing, correlation, etc.
- Customization: You can extend Qlib’s capabilities by chaining and mixing expressions to form unique signals.
- Performance: Understand caching, limiting data retrieval, and batch computations to handle large-scale tasks.
- Professional-Level: Incorporate multi-dimensional signals (sector, market, instrument), advanced feature engineering, and machine learning integration to push your strategies to new levels.
For further reading, explore:
- Qlib’s Official Documentation for details on advanced expressions and modules.
- Qlib GitHub Issues for community discussions, beginner tips, and best practices.
- Relevant financial machine learning resources, such as “Advances in Financial Machine Learning” by Marcos López de Prado, to inspire more sophisticated factor engineering.
With this knowledge, you are well on your way to discovering and implementing new alpha factors. May your quant research journey be both enlightening and profitable!